ConnectionResetError missing in retry #558

Closed
thehesiod opened this issue Aug 16, 2018 · 12 comments
Assignees
Labels
type: question Request for information or clarification. Not an issue.

Comments

@thehesiod
Contributor

Here's a callstack we just got and it wasn't retried:

File "/pyenv/lib/python3.6/site-packages/googleapiclient/http.py", line 841, in execute
    method=str(self.method), body=self.body, headers=self.headers)
File "/pyenv/lib/python3.6/site-packages/googleapiclient/http.py", line 165, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
File "/pyenv/lib/python3.6/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
File "/pyenv/lib/python3.6/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
File "/pyenv/lib/python3.6/site-packages/httplib2/__init__.py", line 1514, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/pyenv/lib/python3.6/site-packages/httplib2/__init__.py", line 1264, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/pyenv/lib/python3.6/site-packages/httplib2/__init__.py", line 1217, in _conn_request
    response = conn.getresponse()
File "/pyenv/lib/python3.6/site-packages/aws_xray_sdk/core/recorder.py", line 374, in record_subsegment
    return_value = wrapped(*args, **kwargs)
File "/usr/local/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
File "/usr/local/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
File "/usr/local/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
File "/usr/local/lib/python3.6/ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
File "/usr/local/lib/python3.6/ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
File "/usr/local/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

Looks like a Python 3 support issue: this case is covered in py2 by socket.error. In my implementation I've centralized this logic here: https://github.com/thehesiod/google-api-python-client/blob/thehesiod/batch-retries/googleapiclient/http.py#L143
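
For reference on the Python 3 side: `socket.error` is an alias of `OSError` there, and `ConnectionResetError` derives from `ConnectionError`, which derives from `OSError`. An `except socket.error` clause therefore does see the reset, and whether it is retried comes down to how the handler filters it. Below is a minimal sketch of one possible filter. `is_retriable` and `RETRIABLE_ERRNO_NAMES` are hypothetical names and the errno list is an assumption; this is not the code from googleapiclient or from the linked branch.

```python
import errno
import socket

# Errno names treated as transient in this sketch (an assumption, not a library list).
# Matching by name rather than number, since errno values differ by platform.
RETRIABLE_ERRNO_NAMES = {"ECONNRESET", "ECONNABORTED", "EPIPE", "ETIMEDOUT"}


def is_retriable(exc):
    """Return True if exc looks like a transient connection failure."""
    # Covers ConnectionResetError, BrokenPipeError, ConnectionAbortedError, etc.
    if isinstance(exc, ConnectionError):
        return True
    # Any other OSError (socket.error is the same class in Python 3):
    # fall back to the errno name; a missing errno maps to None.
    if isinstance(exc, OSError):
        return errno.errorcode.get(exc.errno) in RETRIABLE_ERRNO_NAMES
    return False
```

Checking `ConnectionError` first keeps the decision independent of platform-specific errno numbers such as the 104 in the callstack above.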

@JustinBeckwith JustinBeckwith added triage me I really want to be triaged. 🚨 This issue needs some love. labels Aug 17, 2018
@thehesiod
Contributor Author

Probably should bring this in line with what the cloudsdk does (http://testcompany.info/google-cloud-sdk/lib/third_party/apitools/base/py/http_wrapper.py) and at the same time handle #563.

@yan-hic

yan-hic commented Nov 11, 2018

Is there any traction on this?
ConnectionResetError has become very common for those using the API in Google Cloud Functions.

We raised this with Google Support and their answer was to implement an exponential backoff.
As per @thehesiod, adding the error as retriable would be an easy fix.
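
For readers who want to try the exponential backoff suggested above on the caller side, here is a minimal sketch. `call_with_backoff` and all of its parameters are hypothetical names chosen for illustration, and the defaults are assumptions, not values recommended by Google Support or used by the library.

```python
import random
import time


def call_with_backoff(func, retriable=(ConnectionResetError,), max_attempts=5,
                      base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retriable:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Double the ceiling on every attempt, capped at max_delay,
            # and sleep a random amount below it ("full jitter").
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

With this library, `func` could be something like `lambda: request.execute()`, where `request` is whatever request object your code already builds. The `sleep` parameter exists only so the delay can be stubbed out in tests.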

@ppawiggers

ppawiggers commented Nov 19, 2018

+1, same issue here, often happens on Cloud Functions. Implementing exponential backoff on ConnectionResetError doesn't solve it entirely for me.

@eckardt

eckardt commented Nov 27, 2018

I'm having the same issue when using the google-cloud-bigquery client running within a cloud function in us-east1.

@thehesiod It looks like you implemented a fix for this in your private fork. There are also some unrelated changes in that branch. Do you plan to extract the fix for this issue and open a separate pull request here? That should make it easier for someone to review it and get it merged. Otherwise let me know, I might open a PR.

Edit: Never mind, google-cloud-bigquery, when running within Cloud Functions, uses requests with urllib3 instead of the googleapiclient/http.py from your stacktrace. That's a different story.

@thehesiod
Contributor Author

thehesiod commented Nov 27, 2018

@eckardt <rant> the Google APIs are IMHO a mess ;) and have a few issues. Here they switched to google-auth, however that doesn't support token storage so depending on your impl you may do a lot of auth calls. Also beware of batched API calls as they didn't add retries to that (nor cared for my fixes as technically this is just in "support" mode). Also there are some weird auth errors (saying you're unauth'd when you are, fixed with a retry); they didn't want my fix for that either. In the new cloud functions they switched to requests (googleapis/google-cloud-python#3674) as you've noticed, but there were a lot of fundamental bugs (googleapis/google-cloud-python#4274, perhaps like you're finding) so I'm still wary of using the new version. For now we're on 0.26.0 of google-cloud-pubsub and related sub-modules. With my branch I'm able to run for ~10 days before an SSL leak in the core Python executable forces us to restart. </rant>

update: for httplib2 you'll also need httplib2/httplib2#111, which isn't released yet

@theacodes
Contributor

Hey @thehesiod, we hear your frustration and we're working hard to improve all of these libraries, but please keep comments on this issue tracker on topic and respectful.

> Also beware of batched API calls as they didn't add retries to that (nor cared for my fixes as technically this is just in "support" mode)

Just want to add some context here: it's risky for us to do this as we don't want to introduce bugs for other users, and our APIs are largely moving away from API-level batching towards protocol-level batching (gRPC). We could definitely consider something like this for a 2.0, if that happens (as we're considering switching this library over to Requests as part of a major version change).

> In the new cloud functions they switched to requests (googleapis/google-cloud-python#3674) as you've noticed, but there were a lot of fundamental bugs (googleapis/google-cloud-python#4274, perhaps like you're finding)

The bugs in Pub/Sub are entirely unrelated to Requests and were due to the fact that the Pub/Sub library was under heavy active development. We are close to our 1.0.0 milestone on that, so I encourage you to try it again and give us feedback.

@sduskis sduskis added type: question Request for information or clarification. Not an issue. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Dec 7, 2018
@thehesiod
Copy link
Contributor Author

thehesiod commented Mar 8, 2019

fyi another issue I've found is that discovery is not retried on socket.timeout. We're seeing this call periodically fail. The reliance on httplib2 means that any time a socket.timeout, socket.gaierror, or unhandled socket.error occurs, the client (google-api-python-client) needs to implement the retry. Eventually I'll add fixes for these in the batch-retries branch and our httplib2 branch.
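
To illustrate how the exception types named above relate in Python 3: `socket.timeout` and `socket.gaierror` are both subclasses of `OSError` (that is, of `socket.error`), so the order of the checks matters when telling them apart. The sketch below is one way to classify them. `classify_socket_failure` and `TRANSIENT_KINDS` are hypothetical names, and treating every DNS failure as transient is an assumption, not something the library or the fork is known to do.

```python
import socket

# Kinds this sketch treats as worth retrying (an assumption).
TRANSIENT_KINDS = {"timeout", "dns", "connection"}


def classify_socket_failure(exc):
    """Map an exception to a coarse failure kind."""
    # The specific subclasses must be tested before the generic OSError,
    # because each of them is itself an OSError.
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ConnectionError):
        return "connection"
    if isinstance(exc, OSError):
        return "other-os-error"
    return "not-a-socket-error"


def should_retry(exc):
    """Return True if the failure kind is in TRANSIENT_KINDS."""
    return classify_socket_failure(exc) in TRANSIENT_KINDS
```

A caller wrapping a discovery or API call could consult `should_retry` in its `except OSError` handler before deciding to sleep and try again.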

update: fixed in thehesiod-forks@c7f43e4

@lucidsushi

Also getting a ton of these errors... going to assume the easiest way is exponential backoff on both ConnectionResetError and errors.HttpError (had some issues with that one too)?

@mauliksoneji

mauliksoneji commented Sep 30, 2019

Hi everyone, do we have any update on this? We are currently using airflow which uses google-api-python-client library to check status of dataproc job. Since this code is from airflow, we cannot implement exponential backoff.

This issue is a recurring issue and happens intermittently.

@lucidsushi

> Hi everyone, do we have any update on this? We are currently using airflow which uses google-api-python-client library to check status of dataproc job. Since this code is from airflow, we cannot implement exponential backoff.
>
> This issue is a recurring issue and happens intermittently.

can you link the code since it's in the airflow repo?

@mauliksoneji

mauliksoneji commented Oct 3, 2019

This is the line where it fails while querying the Dataproc job status: https://github.com/apache/airflow/blob/1.10.4/airflow/contrib/hooks/gcp_dataproc_hook.py#L123

Whole stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 921, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/dataproc_operator.py", line 1139, in execute
    super(DataProcPySparkOperator, self).execute(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/dataproc_operator.py", line 707, in execute
    self.hook.submit(self.hook.project_id, self.job, self.region, self.job_error_states)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/gcp_dataproc_hook.py", line 312, in submit
    if not submitted.wait_for_done():
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/gcp_dataproc_hook.py", line 124, in wait_for_done
    jobId=self.job_id).execute(num_retries=self.num_retries)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 851, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 165, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google_auth_httplib2.py", line 187, in request
    self._request, method, uri, request_headers)
  File "/usr/local/lib/python3.7/site-packages/google/auth/credentials.py", line 122, in before_request
    self.refresh(request)
  File "/usr/local/lib/python3.7/site-packages/google/oauth2/service_account.py", line 322, in refresh
    request, self._token_uri, assertion)
  File "/usr/local/lib/python3.7/site-packages/google/oauth2/_client.py", line 145, in jwt_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/usr/local/lib/python3.7/site-packages/google/oauth2/_client.py", line 106, in _token_endpoint_request
    method='POST', url=token_uri, headers=headers, body=body)
  File "/usr/local/lib/python3.7/site-packages/google_auth_httplib2.py", line 116, in __call__
    url, method=method, body=body, headers=headers, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/httplib2/__init__.py", line 1314, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python3.7/site-packages/httplib2/__init__.py", line 1064, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python3.7/site-packages/httplib2/__init__.py", line 1017, in _conn_request
    response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1336, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

damgad added a commit to damgad/google-api-python-client that referenced this issue Feb 7, 2020
busunkim96 added a commit that referenced this issue Mar 23, 2020
This commit fixes issue #558

Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>
@fhinkel

fhinkel commented Dec 7, 2020

Greetings, we're closing this. Looks like the issue got resolved. Please let us know if the issue needs to be reopened.

@fhinkel fhinkel closed this as completed Dec 7, 2020
@fhinkel fhinkel self-assigned this Dec 8, 2020