
Various Intermittent Request Errors #3845

Closed
lukas-gitl opened this issue Feb 2, 2017 · 11 comments
@lukas-gitl

When using requests.post (I have not tried other methods), I get the following errors intermittently.

Errors

  • ('Connection aborted.', BadStatusLine("''",))
  • ('Connection aborted.', error(104, 'Connection reset by peer'))
  • ("bad handshake: SysCallError(104, 'ECONNRESET')",)
  • EOF occurred in violation of protocol (_ssl.c:661)
  • HTTPSConnectionPool(host='...', port=443): Max retries exceeded with url: ... (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x...>: Failed to establish a new connection: [Errno 110] Operation timed out',))
  • Sometimes I just get a 502 or 504 back, but those might be remote issues

How often this happens varies, but it is at least a few times every hour (we are sending a few thousand logs per hour). I wasn't able to reproduce these errors locally or reliably.

I have experienced this with various target hosts, namely the APIs provided by loggly.com, logentries.com, and logz.io. Since they all show the issue, I don't think it is a server configuration problem.

This issue first appeared about a year ago when we set this up. However, it seems to have become more problematic as we send more requests.

Setup
Running Python on Elastic Beanstalk inside a Single Container Docker Configuration behind an Elastic Load Balancer. The outgoing requests happen in various threads (not just in the main thread).

More information that might be useful:

  • EC2 Instance OS (cat /etc/*-release)
NAME="Amazon Linux AMI"
VERSION="2016.09"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2016.09"
PRETTY_NAME="Amazon Linux AMI 2016.09"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2016.09:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
Amazon Linux AMI release 2016.09
  • Docker Version (docker --version)
Docker version 1.12.6, build 7392c3b/1.12.6
  • Docker Container OS (cat /etc/*-release)
3.4.6
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.4.6
PRETTY_NAME="Alpine Linux v3.4"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"
  • Python 2.7.13
  • pip freeze --local results:

alembic==0.8.10
aniso8601==1.2.0
apns==2.0.1
awsebcli==3.9.0
backports.ssl-match-hostname==3.5.0.1
blessed==1.9.5
boto==2.45.0
botocore==1.5.7
cement==2.8.2
cffi==1.9.1
click==6.7
colorama==0.3.7
contextlib2==0.5.4
coverage==4.3.4
cryptography==1.7.2
docker-py==1.7.2
dockerpty==0.4.1
docopt==0.4.0
docutils==0.13.1
enum34==1.1.6
Faker==0.7.7
Flask==0.12
Flask-Compress==1.4.0
Flask-Limiter==0.9.3
Flask-Migrate==2.0.3
Flask-RESTful==0.3.5
-e flask-restful-swagger==0.15.5
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-Testing==0.6.1
funcsigs==1.0.2
GeoAlchemy2==0.4.0
idna==2.2
ipaddress==1.0.18
itsdangerous==0.24
Jinja2==2.9.5
jmespath==0.9.1
limits==1.2.1
Mako==1.0.6
mandrill==1.0.57
MarkupSafe==0.23
mock==2.0.0
nose==1.3.7
pathspec==0.5.0
pbr==1.10.0
pep8==1.7.0
pexpect==4.2.1
psycopg2==2.6.2
ptyprocess==0.5.1
pyasn1==0.1.9
pybars3==0.9.2
pycparser==2.17
pycrypto==2.6.1
pymemcache==1.4.0
PyMeta3==0.5.1
pyOpenSSL==16.2.0
python-dateutil==2.6.0
python-editor==1.0.3
python-gcm==0.1.5
pytz==2016.10
PyYAML==3.12
requests==2.13.0
rollbar==0.13.10
semantic-version==2.5.0
six==1.10.0
SQLAlchemy==1.1.5
tabulate==0.7.5
uWSGI==2.0.14
vcrpy==1.10.5
wcwidth==0.1.7
websocket-client==0.40.0
Werkzeug==0.11.15
wrapt==1.10.8

Solutions that did not work
I have found similar issues here, but was not able to fix this with any of the described solutions. I have tried a lot of different approaches, none of which seem to work. I would much appreciate advice on what to try next.

  • Added a small sleep, since I thought this might be related to this GIL issue
  • Tried running requests in separate processes instead of threads, since the GIL issue implied this might help
  • Tried using a session per thread so as not to open too many TCP connections, but saw no difference in error rate
  • Tried installing various libraries (pyopenssl, pyasn1, and ndg-httpsclient) as recommended here and here
  • Tried injecting from requests.packages.urllib3.contrib import pyopenssl; pyopenssl.inject_into_urllib3() as described here
  • Switched from an Ubuntu to an Alpine Docker container
  • Updated everything to the latest version (including the EC2 AMI and Docker)
  • Some other things that I can't remember at this point

What else should I try? Any idea what might be causing this? What tools should I use to debug this? I'm kind of at a loss here.

@Lukasa
Member

Lukasa commented Feb 2, 2017

So the first four errors all amount to "the connection was closed by the remote peer at a time we didn't expect". The fifth one is a standard connection timeout.

While it's possible that some of these reflect underlying problems (for example, a connection closed during a TLS handshake usually indicates that your TLS configuration is not supported by the server), in general all of these can be lumped together as what I'd call "network noise". Essentially, when using keep-alive connections it is possible for all kinds of wacky stuff to go on at the connection management layer.

The simplest way to handle this, and something I highly recommend, is to turn on retries. This can be done at the transport adapter level like this:

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=3))
s.mount('https://', HTTPAdapter(max_retries=3))

If you want even more control you can pass a urllib3 Retry object as shown in urllib3's documentation instead of the integer value.
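For example, a minimal sketch of mounting a Retry object rather than an integer (this uses the standalone urllib3 import path; in requests 2.13 the same class is vendored as requests.packages.urllib3.util.retry.Retry):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=3,             # overall cap on retry attempts
    connect=3,           # retries for connection errors
    read=3,              # retries for read errors on an established connection
    backoff_factor=0.3,  # sleep 0.3s, 0.6s, 1.2s, ... between attempts
    status_forcelist=[500, 502, 503, 504],  # also retry on these status codes
)

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=retry))
s.mount('https://', HTTPAdapter(max_retries=retry))
```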

However, without more specificity we can't help you with the individual problems. If you can get down to single reproducing cases that'd be fab.

@Lukasa Lukasa closed this as completed Feb 2, 2017
@lukas-gitl
Author

lukas-gitl commented Feb 6, 2017

@Lukasa "If you want even more control you can pass a urllib3 Retry object as shown in urllib3's documentation instead of the integer value." - This does not seem to work. Is this broken?

The code doesn't seem to handle this correctly. Reference: https://github.com/kennethreitz/requests/blob/master/requests/adapters.py#L110

I'll try to set it on the adapter after initialisation with:

from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

retries = Retry(
    total=gitl.app.config['MAX_REQUEST_SESSION_RETRIES'],
    backoff_factor=0.1,
    status_forcelist=[500, 502, 503, 504],
)
adapter = HTTPAdapter()
adapter.max_retries = retries
self.session.mount('http://', adapter)
self.session.mount('https://', adapter)

and see if that works...

It seems to be passed through correctly. However, I still get the errors. Very odd...

Edit: Turns out POST requests are not automatically retried...

@Lukasa
Member

Lukasa commented Feb 7, 2017

Indeed, this is because POST requests are not idempotent, so it is not safe to automatically retry all POST requests. If you know POST requests are safe in this case, you can use urllib3's detailed Retry configuration object to allow retries for POST requests.
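If POST is known to be safe to retry for your endpoints, a sketch of that configuration (newer urllib3 releases call the parameter allowed_methods; older releases used method_whitelist):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Explicitly allow POST to be retried; by default Retry only retries
# methods considered idempotent (GET, HEAD, PUT, DELETE, ...).
retry = Retry(
    total=3,
    backoff_factor=0.1,
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=frozenset(['GET', 'POST']),
)

session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))
```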

@lukas-gitl
Author

Yes, that makes perfect sense. This now seems to be working fine. I'm just wondering how I would get urllib3's retry to also handle ("bad handshake: SysCallError(104, 'ECONNRESET')",). It does not currently seem to retry on that error. Any idea?

@Lukasa
Member

Lukasa commented Feb 7, 2017

Hrm, do you have the full traceback? I suspect that's a bug.

@lukas-gitl
Author

Kk, I'll try to get more information the next time it happens!

@lukas-gitl
Author

Here is the full stack trace:

Traceback (most recent call last):
  File "...", line ..., in ...
    r = self.session.post(..., data=...)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 535, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)

@Lukasa
Member

Lukasa commented Feb 8, 2017

This is one of those moments where I'd have liked to see more data out of the stack trace. I swear, Python 3's chained exceptions only show up in my life when they aren't helpful, and never when they would be. 😉

However, I think this is a real bug. Looking at urllib3's code, it simply does not consider SSL errors to be worth retrying on. I think that instinct is wrong: SSL errors are almost certainly worth a retry attempt. I recommend you raise this as a bug over on the urllib3 repository, where we'll deal with it.

For when you raise the bug, please attach the following repro code. Specifically, if the following server is run:

import time
import socket

s = socket.socket()
s.bind(('', 4433))
s.listen(10)

while True:
    # Accept the TCP connection, then close it before completing the
    # TLS handshake, so the client sees an unexpected EOF mid-handshake.
    news, _ = s.accept()
    time.sleep(0.5)
    news.close()

Then this code will exhibit the issue:

import urllib3
http = urllib3.PoolManager()
http.request('GET', 'https://localhost:4433/', retries=5)

That code should fire off a MaxRetryError, but instead gets an SSLError.

@lukas-gitl
Author

For anyone reading this: forking Python processes doesn't seem to play nicely with requests. Ideally, a single worker handles the outgoing requests on behalf of all processes. This seems to have resolved all our errors here.
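A sketch of one way to avoid the fork problem (the worker function and URL here are hypothetical; the key point is that each Session is created inside its own process, after the fork, so pooled TCP connections are never shared across processes):

```python
import multiprocessing

import requests

_session = None

def get_session():
    # Lazily create one Session per process. A Session's pooled TCP
    # connections must not be shared across a fork, so each worker
    # process builds its own on first use.
    global _session
    if _session is None:
        _session = requests.Session()
    return _session

def worker(url):
    session = get_session()
    # session.post(url, data=...)  # real work would go here
    return id(session)

if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        pool.map(worker, ['https://example.invalid'] * 2)
```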

@JDFagan

JDFagan commented Feb 16, 2018

I've noticed a pattern: I receive requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",)) just beyond 5 minutes after the request is initiated, and I've verified that the server I'm talking to (IIS 8.5) has its server-side connection timeout set to 5 minutes. I have not been able to convince the admin of this server to increase the IIS connection timeout from 5 to 10 or 15 minutes, even though one of their REST APIs can return quite a large payload that sometimes takes longer than the timeout threshold; when it does, I get this error. Here is the full stack trace. Any ideas how I can resolve this in my case? Let me know if this seems different enough from this issue's thread and I'll open another issue:

Traceback (most recent call last):
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 285, in recv_into
    raise SocketError(str(e))
OSError: (104, 'ECONNRESET')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 285, in recv_into
    raise SocketError(str(e))
urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "apollo/waypoint/etl/mri/cli.py", line 150, in <module>
    cli()
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "apollo/waypoint/etl/mri/cli.py", line 54, in etl
    etl.etl(route_set=route_set, routes=route, routes_params=route_params)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/waypoint/etl/mri/api_etl.py", line 95, in etl
    self.extract(route_set=route_set, routes=routes, routes_params=routes_params)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/waypoint/etl/mri/api_etl.py", line 171, in extract
    responses[extractor.name] = extractor.extract(params=params)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/waypoint/etl/mri/api_extractor.py", line 113, in extract
    responses = self.route.get_all(params=params)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/accounting/__init__.py", line 15, in wrapper
    result = f(self, *arg, **kw)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/accounting/mri/mri_api.py", line 432, in get_all
    response = self.get(params=params, stream=stream)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/accounting/__init__.py", line 15, in wrapper
    result = f(self, *arg, **kw)
  File "/home/jdfagan/Repositories/Waypoint/apollo/apollo/accounting/mri/mri_api.py", line 480, in get
    response = self.session.get(self.url, params=params, headers=self.headers, stream=stream)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/jdfagan/.miniconda/apollo/lib/python3.6/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

@sigmavirus24
Contributor

@JDFagan

  1. Reviving issues that have been closed for over a year because the exception looks vaguely similar isn't helpful to anyone subscribed to the issue or the issue tracker.

  2. You have a completely different issue, and there's literally nothing you can do to avoid ECONNRESET based on what you said. Basically you have to either convince the server not to take 5 minutes or get it to return the data within 5 minutes.

  3. Finally, you're asking for help on a defect tracker. Requests for help with requests belong on StackOverflow, and this is well documented.
