
Connection reset by peer when sending POST #4937

Closed
rdgoite opened this issue Jan 14, 2019 · 26 comments

Comments

@rdgoite

rdgoite commented Jan 14, 2019

Client code that uses Requests module to send data via HTTP POST encounters a ConnectionResetError. The entire operation (composed of multiple POST requests to a small set of service endpoints) can sometimes succeed, but most of the time, it fails with this error.

Expected Result

Operation succeeds (or fails) without connection issues.

Actual Result

The operation fails with ConnectionResetError.

Additional Information

It's a little difficult to provide basic reproduction for the issue as we're running into this problem with a test payload that's specific to our system. The server (peer) is a Java application that's configured to terminate/reset connection after a given time of no use (idle). The client sends multiple one-off POST requests, but it seems like internally, the connections are being reused similar to issue #4506, and the operation eventually runs into a connection that's already been reset and raises the error. However, unlike #4506, we are not using Sessions.

Here are some tracebacks that could hopefully describe the problem better:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

...

Traceback (most recent call last):
  ...
  File "../client-code.py", line 322, in createFile
    r = requests.post(fileSubmissionsUrl, data=json.dumps(fileToCreateObject), headers=self.headers)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.3.1"
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.1"
  },
  "platform": {
    "release": "4.14.42-61.37.amzn2.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1000211f",
    "version": "18.0.0"
  },
  "requests": {
    "version": "2.20.1"
  },
  "system_ssl": {
    "version": "20000000"
  },
  "urllib3": {
    "version": "1.24.1"
  },
  "using_pyopenssl": true
}
@evahteev

+1 same issue here


@lucas03

lucas03 commented May 12, 2019

We are also experiencing a lot of errors (14%) on some servers. The POST call is not retried by requests since it's not idempotent.

After installing requests[security], the error changed to ('Connection aborted.', OSError("(104, 'ECONNRESET')",)).

@lucas03

lucas03 commented May 24, 2019

I can reproduce a single ECONNRESET error. If there is no connection for 10 minutes, the next connection will raise ECONNRESET.

@lucas03

lucas03 commented Jul 3, 2019

I've got more info here. This is a message from our internal GitLab:

I have an idea I wanna put to test:

there are 10 connections in connection pool per server/worker.

We have a Slack notifier that makes about 10 requests per minute. We have 8 workers, which I assume share the same session (connection pool).

These connections are randomly assigned to workers. If a worker receives a task, it will pick a connection from the pool and use it. If it receives two tasks at the same time, it will pick two connections and use them, returning them to the pool. So something like this can happen for a single worker:

  1. minute: 2 simultaneous connections used, returned to pool.
  2. minute: max 1 simultaneous connection used, returned to pool.
  3. minute: max 1 simultaneous connection used, returned to pool.
  4. minute: max 1 simultaneous connection used, returned to pool.
  5. minute: max 1 simultaneous connection used, returned to pool.
    the server closed the second connection, as it was idle for 5 minutes.
  6. minute: 2 simultaneous connections used. When the second connection is picked, it's already closed and we receive the error.

When I moved the initialization of the session into the method that does the requests, the connection errors disappeared, though that effectively doesn't reuse the session and will consume more resources.

Shouldn't the requests lib be aware of these closed connections and retry them automatically?
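One possible mitigation (not from this thread, and only safe if duplicate POSTs are acceptable to your endpoint) is to opt in to retrying non-idempotent requests via urllib3's Retry. Setting allowed_methods=False disables the method filter so even POSTs are retried; older urllib3 releases (pre-1.26, such as the 1.24.1 in the report above) call this parameter method_whitelist instead. A sketch; the URL is a placeholder:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# allowed_methods=False turns off the "idempotent methods only" filter,
# so connection errors on POST are also retried. backoff_factor adds a
# growing delay between attempts.
retry = Retry(total=3, backoff_factor=0.5, allowed_methods=False)
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Hypothetical usage:
# session.post("http://example.internal/notify", json={"text": "hi"})
```

Whether a request that dies mid-response is retried still depends on urllib3's internals, so this is a sketch rather than a guaranteed fix.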

@jeicoo

jeicoo commented Jul 23, 2019

Been experiencing this issue as well. In my case, a session is being used and we are doing the actual request inside a coroutine (loop.run_in_executor).

@shankari

shankari commented Aug 21, 2019

I ran into this as well, and I can provide a way to reproduce 😄

To reproduce

Run any of the notebooks in
https://github.com/e-mission/e-mission-eval-public-data
The notebooks can be launched using binder:
Update: The issue does not occur on notebooks launched via binder, only on my laptop.
Update 2: The issue is intermittent even on notebooks launched on my laptop. When it works, it works consistently. When it fails, it fails for hours at a time.
Update 3: The issue does not occur on binder, but occurs 90% of the time on my laptop

Current code

The current code is in https://github.com/e-mission/e-mission-eval-public-data/blob/master/emeval/input/spec_details.py#L24 and uses the quickstart version without any sessions

        response = requests.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", headers=close_headers, json=post_msg, stream=False)

Workarounds tried

I have tried multiple workarounds, none of which have worked:

  • setting stream=False (as you can see above) DOES NOT WORK
  • using with requests.post(...) as response DOES NOT WORK
  • manually closing the response by adding response.close() DOES NOT WORK
         print("Found %d entries" % len(ret_list))
         response.close()
         return ret_list
    
  • tried using a session with keep_alive = False DOES NOT WORK
         s = requests.Session()
         s.keep_alive = False
         response = s.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", headers=close_headers, json=post_msg, stream=False)
    
  • tried using a session with retries, DOES NOT WORK
         s = requests.Session()
         adapter = requests.adapters.HTTPAdapter(max_retries=2)
         s.mount('http://', adapter)
         response = s.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", headers=close_headers, json=post_msg, stream=False)
    
  • tried using a session with max connections = 1 in an attempt to disable pooling DOES NOT WORK
         s = requests.Session()
         adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1,
             max_retries=5, pool_block=True)
         s.mount('http://', adapter)
         response = s.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", json=post_msg)
    
  • tried manually closing the session and the adapter DOES NOT WORK
         print("Found %d entries" % len(ret_list))
         adapter.close()
         s.close()
         return ret_list
    

I haven't tried using sessions throughout, but I am not sure why I need to. If I use the quickstart version and ensure that the response is closed, shouldn't the pooling be effectively disabled (as hinted at in #4937 (comment))?

At this point, I am tempted to give up on requests and drop down to urllib directly

@shankari

Also, I control both the client and the server, and I can confirm that the request that fails never makes it to the server.

On the client:

About to retrieve data for ucb-sdb-ios-3 from 1563764533.214622 -> 1563788061.999584
About to retrieve messages using {'user': 'ucb-sdb-ios-3', 'key_list': ['background/motion_activity'], 'start_time': 1563764533.214622, 'end_time': 1563788061.999584}
response = <Response [200]>
Found 11 entries
Retrieved 11 entries with timestamps [1563764540.2866454, 1563764653.7736874, 1563764657.5883818, 1563768002.560504, 1563771602.48612, 1563775203.1804562, 1563778802.582345, 1563782402.60982, 1563786002.906196, 1563788054.78684]...


About to retrieve data for ucb-sdb-ios-3 from 1563788059.5701566 -> 1563788061.999584
About to retrieve messages using {'user': 'ucb-sdb-ios-3', 'key_list': ['background/motion_activity'], 'start_time': 1563788059.5701566, 'end_time': 1563788061.999584}

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)

On the server, last call to /datastreams/find_entries is as below. Note that this is from the 11 entry successful call. The call generating the ConnectionResetError is not even making it to the server.

2019-08-20 19:58:20,794:DEBUG:140171047577344:START POST /datastreams/find_entries/timestamp
2019-08-20 19:58:20,794:DEBUG:140171047577344:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-08-20 19:58:20,794:DEBUG:140171047577344:Using the skip method to verify id token ucb-sdb-ios-3 of length 13
2019-08-20 19:58:20,795:DEBUG:140171047577344:retUUID = 7ed80490-6853-433d9d20-838fe4d3d71b
2019-08-20 19:58:20,796:DEBUG:140171047577344:curr_query = {'user_id': UUID('7ed80490-6853-433d-9d20-838fe4d3d71b'), '$or': [{'metadata.key': 'background/motion_activity'}], 'metadata.write_ts': {'$lte': 1563788061.999584, '$gte': 1563764533.214622}}, sort_key = metadata.write_ts
2019-08-20 19:58:20,796:DEBUG:140171047577344:orig_ts_db_keys = ['background/motion_activity'], analysis_ts_db_keys = []
2019-08-20 19:58:20,798:DEBUG:140171047577344:finished querying values for ['background/motion_activity'], count = 0
2019-08-20 19:58:20,798:DEBUG:140171047577344:finished querying values for [], count = 0
2019-08-20 19:58:20,800:DEBUG:140171047577344:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2019-08-20 19:58:20,802:DEBUG:140171047577344:Found 11 messages in response to query {'user_id': UUID('7ed80490-6853-433d-9d20-838fe4d3d71b'), '$or': [{'metadata.key': 'background/motion_activity'}], 'metadata.write_ts': {'$lte': 1563788061.999584, '$gte': 1563764533.214622}}
2019-08-20 19:58:20,807:DEBUG:140171047577344:END POST /datastreams/find_entries/timestamp 7ed80490-6853-433d-9d20-838fe4d3d71b 0.01349186897277832

@shankari

ok, so after poking around a bit more, I actually see an error on the server

  File ".../lib/python3.6/_pyio.py", line 1001, in read
    return self._read_unlocked(size)
  File ".../lib/python3.6/_pyio.py", line 1041, in _read_unlocked
    chunk = self.raw.read(wanted)
  File ".../lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

This seems to imply that the client is dropping the connection to the server. But I am no longer sure that dropping down to urllib directly is going to solve the problem.

@shankari

shankari commented Aug 21, 2019

A workaround that did work was to simply catch the error and retry. I am still not sure whether the client or the server is dropping the connection, and why they are doing so, but hopefully this workaround helps somebody else.

Note that both the request creation (requests.post) and the response read/parse (response.json()) are within the try block.

        try:
            response = requests.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", json=post_msg)
            print("response = %s" % response)
            response.raise_for_status()
            ret_list = response.json()["phone_data"]
        except Exception as e:
            # Hacky copy-paste of original code, TODO refactor into separate function
            print("Got %s error %s, retrying" % (type(e).__name__, e))
            time.sleep(10)
            response = requests.post(self.DATASTORE_URL+"/datastreams/find_entries/timestamp", json=post_msg)
            print("response = %s" % response)
            response.raise_for_status()
            ret_list = response.json()["phone_data"]
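The copy-pasted retry above could be factored into a small helper, as the TODO suggests. A sketch (the function name and parameters are mine, not from the thread):

```python
import time

def fetch_with_retry(make_request, retries=1, delay=10):
    """Call make_request(), parse the JSON body, and retry on any failure.

    make_request is a zero-argument callable returning a requests-style
    response object (hypothetical helper, not part of the requests API).
    Both the request and the response parsing stay inside the try block,
    matching the original workaround.
    """
    for attempt in range(retries + 1):
        try:
            response = make_request()
            print("response = %s" % response)
            response.raise_for_status()
            return response.json()["phone_data"]
        except Exception as e:
            if attempt == retries:
                raise
            print("Got %s error %s, retrying" % (type(e).__name__, e))
            time.sleep(delay)
```

The original call site would then become something like `ret_list = fetch_with_retry(lambda: requests.post(self.DATASTORE_URL + "/datastreams/find_entries/timestamp", json=post_msg))`.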

@ArondeParon

Experiencing this same issue as well. It seems to be happening 50% of the time.

@isaiah-lyra

isaiah-lyra commented Sep 9, 2019

+1, experiencing ~35% of the time and I am using urllib3 directly.

@loic-bellinger

Experiencing this issue as well. Added exponential backoff to bypass it, but that seems somewhat hacky.

@azin634

azin634 commented Nov 16, 2019

I'm having the same issue. What version is it happening on? I'm using 2.21.0.

@Hackeron

@LouisBellinger: Can you elaborate on your exponential backoff mechanism to bypass the issue?
Struggling to solve this issue while doing ETL. Losing 5-10% of the files in transit.

@vlade-rc

Hi everyone, any update on this issue? I'm having the same problem. Do you know of any hotfix?

@eliuha

eliuha commented Jun 4, 2020

Is there a workaround?

@shankari

shankari commented Jun 5, 2020

for me, a single retry has worked pretty reliably:
https://github.com/MobilityNet/mobilitynet-analysis-scripts/blob/master/emeval/input/spec_details.py#L18

For exponential backoff, you would just catch the exceptions multiple times, sleeping longer every time.
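The exponential backoff shankari describes — catching the exception multiple times and sleeping longer each time — might look like this (the attempt count and delays are my own choices, not from the thread):

```python
import time

def post_with_backoff(do_post, max_attempts=4, base_delay=1.0):
    """Retry do_post() with exponentially growing sleeps: 1s, 2s, 4s, ...

    do_post is a zero-argument callable (e.g. a lambda wrapping
    requests.post). Re-raises the last exception once attempts run out.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return do_post()
        except Exception as e:
            if attempt == max_attempts:
                raise
            print("Attempt %d failed (%s), sleeping %.1fs" % (attempt, e, delay))
            time.sleep(delay)
            delay *= 2  # double the wait before the next attempt
```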

@hedza06

hedza06 commented Jan 14, 2021

Any solutions?

@jakermx

jakermx commented Feb 4, 2021

Hello... The persistent connections feature does not handle the client side when a Connection: keep-alive is requested, because at least on Linux-based OSes the TCP keep-alive feature is not enabled, or when enabled it is set to start working only after 2 hours of idle time. So I found that the problem is that the OS is not handling the flow control correctly. You can fix it in a few ways...

For Python in Linux

import requests
import socket
from urllib3.connection import HTTPConnection

HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),  # enables the TCP keep-alive feature
    (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),    # idle seconds before the stack starts sending keep-alives on a persistent connection
    (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),   # seconds between successive keep-alive probes
    (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),      # unanswered probes before the connection is dropped
]

Now you can instantiate your Sessions or make your requests.

Without TCP Keep Alive: [packet capture screenshot]

With TCP Keep Alive: [packet capture screenshot]

If the client reaches the idle session timeout enforced by the server, the server will now close the connection cleanly, and the connection gets released back to the pool instead of being dropped...

[packet capture screenshot]

There are some other ways to handle it, depending on your OS and your needs.
I hope this solves some of your issues.

@jakermx

jakermx commented Feb 4, 2021

I found this, but I don't know how to wire setsockopt into requests... maybe you know, and that would fix it for all OSes.

https://gist.github.com/shi-yan/611cc0221eeff1644797

@enote-kane

enote-kane commented Mar 2, 2021

I was also having this issue (and not just on POST requests) while running stress tests of services using locust, which uses requests.

The very good investigation and fix provided by @jakermx really did it for me as well:

import socket
from urllib3.connection import HTTPConnection
# ...
HTTPConnection.default_socket_options = (
    HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
        (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
        (socket.SOL_TCP, socket.TCP_KEEPCNT, 6)
    ]
)

Insert that code somewhere that gets loaded early by your application/script and it will change the urllib3 default socket options globally.
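If changing the global default is too blunt, the same options can be scoped to a single Session by passing socket_options through a custom adapter; urllib3's PoolManager forwards them to each connection it opens. A sketch under that assumption — note the TCP_KEEP* constants are Linux-specific, so they're guarded for portability:

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

# Start from urllib3's defaults (e.g. TCP_NODELAY) and add keep-alive.
KEEPALIVE_OPTIONS = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
]
if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only constants
    KEEPALIVE_OPTIONS += [
        (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
        (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
        (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
    ]

class KeepAliveAdapter(HTTPAdapter):
    def init_poolmanager(self, *args, **kwargs):
        # Extra keyword arguments here are forwarded to urllib3's
        # PoolManager, which applies socket_options per connection.
        kwargs["socket_options"] = KEEPALIVE_OPTIONS
        super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount("http://", KeepAliveAdapter())
session.mount("https://", KeepAliveAdapter())
```

Only this session's connections get keep-alive probes; other code using requests in the same process is unaffected.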

This isn't a bug in requests or urllib3, since both offer solutions.

Also, urllib3 makes it pretty clear that, especially in this case, there is a simple detection issue that prevents automatic handling.

@jakermx

jakermx commented Mar 10, 2021

By default, the OS TCP/IP stack only starts sending TCP keep-alive packets after 2 hours of idle activity. This works for out-of-the-box web server installations, but if your content provider sets lower limits, the TCP connection will be dropped before those 2 hours, and when your app tries to use the "persistent" connection it will fail with Connection Reset by Peer, because it is no longer a valid connection. Setting the TCP options will avoid some errors, but not all. No matter whether you set the keep-alive header at L7, if the OS doesn't send keep-alive packets at the TCP layer, the only header that will avoid this condition is Connection: close.

This is an application, HTTP API, and IETF-level issue.
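For one-off requests where connection reuse doesn't matter, the Connection: close escape hatch mentioned above would look like this (the URL and payload are placeholders, not from the thread):

```python
import requests

# Asking the server to close the socket after each response sidesteps
# stale pooled connections entirely, at the cost of a fresh TCP (and TLS)
# handshake for every request.
headers = {"Connection": "close"}

# Hypothetical usage:
# response = requests.post("http://service.internal/endpoint",
#                          json={"key": "value"}, headers=headers)
```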

@elv1z

elv1z commented Mar 11, 2021

@jakermx you are my hero. Thank you for solution.
I killed almost 2 days trying to fix it.

@jakermx

jakermx commented Mar 11, 2021

There are no heroes, pal... just a great community workforce.

@nateprewitt
Member

There are several issues that have been tacked onto the original problem here because people are conflating multiple exceptions. I believe @jakermx has explained what is happening at the TCP level and the urllib3 issue #944 discussing that we're not able to distinguish between a dropped connection and RST has been linked. I'm going to resolve this as there isn't more information to provide at this point.
