Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding `num_retries` in `discovery.build()`? #1049

Closed
yohplala opened this issue Sep 26, 2020 · 2 comments
Closed

Understanding `num_retries` in `discovery.build()`? #1049

yohplala opened this issue Sep 26, 2020 · 2 comments

Comments

@yohplala
Copy link

@yohplala yohplala commented Sep 26, 2020

Hi @DFrenkel @busunkim96,

CONTEXT
Recently, I got an HttpError 502 when running googleapiclient.discovery.build().

  File "/usr/local/lib/python3.8/dist-packages/my_app-0.3.1-py3.8.egg/my_app/data/gd.py", line 58, in __init__
    self.drive = googleapiclient.discovery.build('drive', 'v3', http=auth_http)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 292, in build
    raise e
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 268, in build
    content = _retrieve_discovery_doc(
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 365, in _retrieve_discovery_doc
    resp, content = req.execute(num_retries=num_retries)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/http.py", line 907, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 502 when requesting https://www.googleapis.com/discovery/v1/apis/drive/v3/rest returned "Bad Gateway">

I could read about this error from this SO thread.
Quest.
Is anyone else getting a 502 Bad Gateway error when trying to access https://www.googleapis.com/discovery/v1/apis/drive/v2/rest
Ans.
It appears to have started working again after being down for ~ 1 hr. Nothing needed to be done.

Additionally, Google doc suggests to address 5xx HttpError with a retry strategy.

Heading in this direction, I finally noticed that a num_retries made its apparition in build a few weeks ago thanks to you @DFrenkel 👍 (PR)

So my trouble seems like easy to solve with this parameter. Nonetheless I have 2 questions.

QUESTION 1: Setting an appropriate num_retries value

  • What is the max time discovery can be down? Is this 1hr true? This seems to me like a large time, but then why not. I am not an expert on this topic.
  • What would be the appropriate num_retries from this max time?
    I read the formula used to assess the sleep time between 2 retries:
    sleep_time = rand() * 2 ** retry_num

    random() giving a number between 0 and 1, I considered it being equal to 0.5. I could then see that with 12 retries, I could have a process retrying for about 1hr 10 minutes. Do you consider this correct?

QUESTION 2: num_retries inheritance?
Does the subsequent num_retries in execute() inherit as default value the value set for num_retries in build()?

Thanks a lot for your help and support.
Have a good day,
Bests,

PS: num_retries does not appear documented yet in source doc, which is where I 1st looked to see how I could implement it.
I most notably found this other thread, suggesting to subclass HttpRequest, with default num_retries. Seemed like another level of work to conduct, and very happy you could implement it directly in build()! :) Thanks!

@DFrenkel
Copy link
Contributor

@DFrenkel DFrenkel commented Oct 2, 2020

@yohplala, thank you for your questions.

In regards to guidance related to appropriate number of retries: Guidance of setting a number of retries can vary, but we suggest that using large number of retries should not be considered as a mechanism to avoid longer endpoint outages. Retries with exponential backoff (with jitter) is well-known mechanism in the industry, and it is positioned as a correct response of a well-behaved client to occasional, short-lived throttling (503) or gateway timeouts (504). In fact, we should probably reconsider making error 502 retriable as it signifies an invalid response from the upstream server, therefore making 502 errors not temporal or transient in nature.
Setting a number of retries too high (above 2 or 3) can cause your own process to not use computing resources efficiently due to exponentially increasing waiting/sleeping times, and therefore may cause a cascading failure. It may be better to fail faster, after one or two retries, and have an alternative mechanism to handle an apparent outage, for example, by retrying the entire workflow or by returning an error to your customer that your customer’s client can handle with its own retries.

Answering your second question: yes, the number of retries is passed to downstream methods. If you take a look at this line, num_retries value is passed to the actual worker method responsible for fetching discovery document from the http endpoint, and then also passed to the http wrapped via this line.

Finally, thank you for reporting a miss in documentation update. We will refresh the docs to properly document usage of num_retries.

DFrenkel added a commit to DFrenkel/google-api-python-client that referenced this issue Oct 2, 2020
Updates for both discovery docs and epydoc API Documentation

Fixes: googleapis#1049
@yohplala
Copy link
Author

@yohplala yohplala commented Oct 5, 2020

Thanks @DFrenkel for your very complete answer!
All clear for me!
Thanks again!

@yohplala yohplala closed this Oct 5, 2020
gcf-merge-on-green bot pushed a commit that referenced this issue Oct 6, 2020
Updates for both discovery docs and epydoc API Documentation

Fixes: #1049
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.