Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: retry API calls with exponential backoff #287

Merged
merged 15 commits into from Oct 16, 2020
Merged

feat: retry API calls with exponential backoff #287

merged 15 commits into from Oct 16, 2020

Conversation

@andrewsg
Copy link
Contributor

@andrewsg andrewsg commented Oct 2, 2020

Fixes #108

Related: https://github.com/googleapis/google-cloud-python/issues/9298

@andrewsg andrewsg requested a review from frankyn Oct 2, 2020
@google-cla google-cla bot added the cla: yes label Oct 2, 2020
@frankyn frankyn requested a review from crwilcox Oct 2, 2020
Copy link
Member

@frankyn frankyn left a comment

Thank you for your patience, it took a bit more time than I expected to review the PR.

google/cloud/storage/retry.py Show resolved Hide resolved
google/cloud/storage/retry.py Outdated Show resolved Hide resolved
google/cloud/storage/hmac_key.py Outdated Show resolved Hide resolved
google/cloud/storage/bucket.py Outdated Show resolved Hide resolved
google/cloud/storage/retry.py Outdated Show resolved Hide resolved
tests/unit/test_hmac_key.py Outdated Show resolved Hide resolved
google/cloud/storage/_http.py Outdated Show resolved Hide resolved
tests/unit/test__http.py Show resolved Hide resolved
@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Oct 8, 2020

@tritone FYI

google/cloud/storage/retry.py Outdated Show resolved Hide resolved
response.status_code = 200
data = b"brent-spiner"
response._content = data
http.request.return_value = response
Copy link
Contributor

@crwilcox crwilcox Oct 9, 2020

this test is missing the comment the other tests have. There may be a way to refactor this also, but unsure if it is warranted.

Copy link
Contributor Author

@andrewsg andrewsg Oct 9, 2020

Which comment is missing?

@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Oct 10, 2020

@crwilcox @frankyn PTAL. Changes include:

  • Retry predicate simplified; my testing suggests the google-api-core nuanced inspection of error messages is not needed for GCS.
  • Retry predicate now includes all HTTP error codes listed in retry design, except 408
    -- 408 is mysteriously missing from google-api-core exceptions and I have no way to test what happens if we actually get one (testbench was no help here)
  • Retry predicate now includes ConnectionError which encompasses failure to find host and broken connections of all types. LMK if we need to narrow it down; we have fairly rich and detailed info on the specific ConnectionError subtype which we can use here.

Responded to all other comments as well.

Copy link
Member

@frankyn frankyn left a comment

Generally LGTM, clean design with minimal impact on the code base!

Have documentation comment requests

exceptions.BadGateway, # 502
exceptions.ServiceUnavailable, # 503
exceptions.GatewayTimeout, # 504
requests.ConnectionError,
Copy link
Member

@frankyn frankyn Oct 15, 2020

@tritone this is a change in what we discussed around retrying ConnectionErrors in general. Discussed this with @andrewsg w.r.t retrying unable to connect errors and for now implementation is moving forward with this solution until it's baked into the api_core libraries. Rationale, there's a default timeout enabled that would timeout the retry compared to Golang which does not.

Copy link
Contributor

@tritone tritone Oct 15, 2020

Got it, this seems fine to me re: the fact that there is a default timeout in python. So the idea is that down the road, we can add an error in api_core which catches "connection reset by peer" more specifically?

Copy link
Member

@frankyn frankyn Oct 15, 2020

that's correct, because right now requests.ConnectionError is tied to a specific implementation, e.g. requests and should be wrapped by api_core / cloud_core for it to be used generally outside of just the GCS library.

Copy link
Contributor Author

@andrewsg andrewsg Oct 15, 2020

I'm not totally convinced that retrying for a little while by default if there is an interruption due to network outage is not the right thing to do anyways, but yes, the idea is that instead of tapping into requests here we would have a custom api_core exception like we do with other errors.

google/cloud/storage/retry.py Show resolved Hide resolved
google/cloud/storage/retry.py Show resolved Hide resolved
google/cloud/storage/retry.py Show resolved Hide resolved
@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Oct 15, 2020

@frankyn Added documentation as requested, PTAL

Copy link
Contributor

@tritone tritone left a comment

Couple small comments, generally looks really good!

exceptions.BadGateway, # 502
exceptions.ServiceUnavailable, # 503
exceptions.GatewayTimeout, # 504
requests.ConnectionError,
Copy link
Contributor

@tritone tritone Oct 15, 2020

Got it, this seems fine to me re: the fact that there is a default timeout in python. So the idea is that down the road, we can add an error in api_core which catches "connection reset by peer" more specifically?

To modify the default retry behavior, call a ``with_XXX`` method
on ``DEFAULT_RETRY``. For example, to change the deadline to 30 seconds,
pass ``retry=DEFAULT_RETRY.with_deadline(30)``. See google-api-core reference
(https://googleapis.dev/python/google-api-core/latest/retry.html) for details.
Copy link
Contributor

@tritone tritone Oct 15, 2020

Would it be worth having a short sample snippet here rather than explaining and inlining? Might make it easier to understand what's necessary.

Copy link
Contributor Author

@andrewsg andrewsg Oct 15, 2020

I decided against this since a useful snippet here would have to show how to actually inject the modified retry via user code, and that feature is out of scope for this PR (will require changing the signature of every method in the library).

@andrewsg andrewsg merged commit fbe5d9c into master Oct 16, 2020
4 checks passed
@andrewsg andrewsg deleted the retry branch Oct 16, 2020
@NoahDragon
Copy link

@NoahDragon NoahDragon commented Nov 9, 2020

Just wonder if the blob.update() also has the retry logic with this feature. Is there any way to specify the retry number?

Update: followed the API core doc, using DEFAULT_RETRY as decorator seems works. But how can I confirm if the retry happened?

Update: there is a fix on this PR #340

cojenco added a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
Retries errors for idempotent API calls by default. Some API calls are conditionally idempotent (only idempotent if etag, generation, if_generation_match, if_metageneration_match are specified); in those cases, retries are also conditional on the inclusion of that data in the call.
cojenco added a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
Retries errors for idempotent API calls by default. Some API calls are conditionally idempotent (only idempotent if etag, generation, if_generation_match, if_metageneration_match are specified); in those cases, retries are also conditional on the inclusion of that data in the call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

6 participants