Removing Exists() check from S3 getters #1842
@linzhp @ngshiheng For the performance issues: I think we can replace the ListObjects call in S3 with HeadObject (as suggested; sorry, I did not read all of it) for all 3 files, and it should significantly speed things up. If this works, we can keep the exists check around. The key should be: |
I think Delete is to help Delete specific modules. It is a path
|
Yes, this is a good option too. I can do it in a separate PR. |
(I replied on the other PR; posting it here again for reference.) I think However, my main concern is that we would be making 2 additional requests/module (3 in total for Perhaps, the wdyt? edit: just to put it out there, the HEAD request limit on S3 is currently 5,500 req/s (reference). |
With this PR, the |
You can’t predict that. If this is a concern, the S3 client can be set up with some backoff. I would not worry, though. If we are throttled, we can always serialize these requests. |
With #1844, I think this is still a good improvement to getters. @marwan-at-work Can you take a look too? |
LGTM
I am not sure though! Can we hold on to merging this please? |
@manugupt1 What is your concern here? You are more familiar with S3 and AWS than I am so I'm happy to hear your thoughts |
I am not sure if we should download all the bytes for info, mod and version with a GET call. The bytes transferred will introduce latency as well as cost. I may be wrong, but it seems like an over-optimization to remove Exists now. |
From the diff of this PR, all it's doing is not calling ".Check()" and instead using the 404 from the GET call to detect that something does not exist. On the flip side, if a module does exist, then we just proceed to serving it as if Check() had returned true. Am I missing something? Thanks! |
You are correct! I need some time to think and understand. Let me get back to you! I am still not sure, though. |
@marwan-at-work @linzhp Sorry! This PR looks good to me and I think it is good to go. I did not read the commit properly, as I was going through it on my phone. |
No worries. Thanks for the quick review! |
What is the problem I am trying to address?
Athens first checks the existence of a module/version before:
Some of these checks are not necessary, because
Removing these checks will improve the atomicity and performance of the operations, which would also address the performance issue described in #1840
3 is the only case where we need to check for existence, to avoid duplicate uploads. Even there, we could potentially make HeadObject calls in the Uploader, which is called concurrently. That would make the check more granular: instead of uploading all 3 files again when one file is missing, we can upload only the missing file.
How is the fix applied?
Remove the Exists() check from getter.go and instead read the error returned from GetObjectWithContext to determine whether the object is not found.
If this approach is OK, I can apply similar changes to other storage clients.
@ngshiheng can you test this?
@marwan-at-work @r-ashish What do you think?