
Connection pool exhausted when connection failures occur, should refill with empty connections #644

Closed
jlatherfold opened this issue Jun 4, 2015 · 15 comments


@jlatherfold

If preload_content is not specified on a request, the connection is implicitly returned to the pool after the response has been read. However, when a timeout occurs or the connection is dropped, the connection is closed but not returned to the pool (since the response is None). The problem is compounded when retries are enabled, because the next attempt grabs a new connection from the pool, depleting it further. With non-blocking pools this is not a problem, since a new connection is created whenever the pool is empty; with a blocking pool, however, we have found that the pool size eventually reaches zero (after as many timeout errors as there were connections), which causes the calling application to hang on its next request.
This can be fixed in connectionpool.urlopen, in the exception handlers, by explicitly returning the closed connection to the pool via _put_conn (if release_conn is false): subsequent calls to _get_conn check whether the connection has been dropped and return a new one.
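The depletion mechanism can be sketched with a plain queue. This is a hypothetical stand-in, not urllib3's code, though urllib3's pool really is a LifoQueue pre-filled with None placeholders:

```python
import queue

# Stand-in for a blocking connection pool of size 2. urllib3
# pre-fills its LifoQueue with one None placeholder per slot.
pool = queue.LifoQueue(maxsize=2)
for _ in range(2):
    pool.put(None)

def failed_request():
    # _get_conn takes a slot...
    conn = pool.get(block=True, timeout=1)
    # ...the request times out, the connection is closed, and the
    # exception handler never calls _put_conn: the slot is lost.

failed_request()
failed_request()

print(pool.qsize())  # 0 -- the next blocking get() waits forever
```

After as many failures as there are slots, every subsequent blocking `get()` hangs, which is exactly the behaviour described above.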

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

Hello again!

I think we need to be sure that we close a connection before we return it to the pool. Otherwise we run the risk of attempting to re-use a live connection that has timed out, which will end extremely poorly for us.

@shazow
Member

shazow commented Jun 4, 2015

We should be putting a None into the pool if we want to discard the connection. That will be replaced with a fresh connection when we try to get it out of the pool. That does sound like a bug.
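The None-refill semantics can be seen with urllib3's pool helpers (`_get_conn`/`_put_conn` are private internals, shown here only to illustrate; no network traffic occurs because a connection object is constructed lazily without connecting):

```python
import urllib3

pool = urllib3.HTTPConnectionPool('example.invalid', maxsize=1, block=True)

conn = pool._get_conn()   # take the pool's only slot
pool._put_conn(None)      # discard the dead connection: refill with None
fresh = pool._get_conn()  # the None placeholder becomes a new connection

print(fresh is not None and fresh is not conn)  # True
```

Putting None back keeps the slot count constant, so a blocking pool can never be starved by discarded connections.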

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

@shazow's idea is even better than mine.

@shazow
Member

shazow commented Jun 4, 2015

@jlatherfold Is this something you'd be interested in working on?

Producing a failing test would be the first step. Make a ConnectionPool with a small pool (1-2), trigger some connection exceptions, see if the pool gets exhausted.

@jlatherfold
Author

Hi,

I've already done this, hence this bug report. Locally I've fixed it by putting the [closed] connection back onto the pool in the exception handlers (which do close the connections). Those Nones or closed connections are discarded, and a new one is returned by subsequent calls to _get_conn. Sadly I don't know when I will get time to do any more on this right now.

@shazow
Member

shazow commented Jun 4, 2015

@jlatherfold Would be handy if you can share your code for reproducing the issue. :)

Also if you can confirm that putting None rather than the closed connection still works as expected, that would be great.

@jlatherfold
Author

I can confirm that _put_conn(None) works as expected.

Here's some basic code that reproduces the error. Setting the read timeout to a low value will cause read-timeout exceptions. You'll need to hit a URL with some large content (it doesn't have to be massive: 500 KB or so, although you can probably reproduce this with much less):

import requests

def get(req, url):
    r = req.get(url, stream=True, timeout=(0.5, 0.1)) # set get timeout low to force timeout exceptions
    r.raise_for_status()

    return r.iter_content(chunk_size=4096)

if __name__ == '__main__':

    # if you need to auth the session do so....
    session = requests.Session()

    # setting the poolsize to 1 and 1 retries will result in this program
    # hanging on the first retry attempt (failed connection or read timeout)
    # set the pool_maxsize=4 and set some trace statements in connectionpool.py
    # to print the queue size in _get_conn and _put_conn and watch it slowly
    # decrease as we hit timeouts and retries.....
    adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1,
                                            pool_block=True, max_retries=1)
    session.mount('http://', adapter)

    while True:

        try:
            print('request.get......')
            content = get(session, '<url to large content>')

            # consume the response
            for chunk in content:
                pass
        except Exception:
            print('oops.....')

        # after a while when you've hit few timeouts you're not going to get here....
        print('request.done......')

@jlatherfold
Author

Sorry - didn't know how to format for code - but you get the idea....

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

Updated code formatting.

@shazow shazow changed the title Connection pool not recycling connections on timeout or connection exceptions Connection pool exhausted when connection failures occur, should refill with empty connections Jun 5, 2015
@shazow
Member

shazow commented Jun 5, 2015

Bonus points if somebody wants to translate that to plain-urllib3 and double-bonus if you make it into a test. :)
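A plain-urllib3 translation might look like the following sketch. The silent local server (which accepts connections but never responds, forcing a read timeout) and all variable names are my own stand-in for the slow URL; on affected versions (<= 1.11) the final slot count drops below maxsize, while on fixed versions it stays at 1:

```python
import socket
import threading

import urllib3

# A server that accepts connections but never sends a response, so
# every request hits the client-side read timeout.
srv = socket.socket()
srv.bind(('127.0.0.1', 0))
srv.listen(5)
port = srv.getsockname()[1]

clients = []

def accept_forever():
    while True:
        try:
            c, _ = srv.accept()
            clients.append(c)  # keep the socket open, never respond
        except OSError:
            return  # server socket closed

threading.Thread(target=accept_forever, daemon=True).start()

pool = urllib3.HTTPConnectionPool(
    '127.0.0.1', port, maxsize=1, block=True,
    timeout=urllib3.Timeout(connect=1.0, read=0.2),
)

try:
    pool.urlopen('GET', '/', retries=0, preload_content=False)
except urllib3.exceptions.HTTPError:
    pass  # expected: the read times out

# On affected versions this prints 0 (slot lost); once fixed it prints 1.
print('free slots:', pool.pool.qsize())
srv.close()
```

Turning this into a test would then be a matter of asserting on the queue size after the failed request.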

@jlatherfold
Author

Ok, I'll make the patch myself. Please let me know what procedure I need to follow or point me in the right direction where I can get that info as I have not made any code contributions before.

@shazow
Member

shazow commented Jun 5, 2015

@jlatherfold Thanks! Fork the project, make a branch, start with a test that shows the failure (we have full test coverage, have a look at other tests for examples), then send a PR for feedback. We'll review the code and give you more pointers.

If you can think of a way to reproduce the scenario without doing a full end-to-end request, then urllib/tests/test_connectionpool.py would be a good place to put it. If you need a server, take a look at the with_dummyserver subdirectory of test suites. Lately we prefer the test_socketlevel.py style of testing with a dummyserver when possible.

Might be wise to just write a simple example of the scenario reproducing in urllib3 first, to make sure it's not requests' fault.

@jlatherfold
Author

Hi,

I now have a fix for this with unit test and am ready to submit a PR. The work has been done on a local branch. Would you like me to merge it to master or push the branch to my fork and use that?

@shazow
Member

shazow commented Jun 9, 2015

@jlatherfold Whatever you prefer, as long as it ends up as a Pull Request on our end. :)

openstack-gerrit pushed a commit to openstack/ceilometer that referenced this issue Sep 23, 2015
When a ConnectionError occurs with python-requests <= 2.8.0
and/or urllib3 <= 1.11, connections are not returned to
the pool.

This change adds a CustomHTTPAdapter to work around the
issue.

This HTTPAdapter doesn't trigger these urllib3 issues:
it works around them by enforcing preloading of the
response. When preloading is enabled, urllib3 releases the
connection back to the pool immediately after use, so the
issue is not triggered.

Enforcing preloading breaks some requests features
(like stream) that we didn't use in our GnocchiClient.

Upstream bugs:

* urllib3/urllib3#659
* urllib3/urllib3#651
* urllib3/urllib3#644
* https://github.com/kennethreitz/requests/issues/2687

We could remove this when requests 2.8.0 is released.

Closes-bug: #1498823
Change-Id: I7d40ade927b834909e230613777cba1f7537c0ec
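The adapter itself isn't quoted in the commit message; a minimal sketch of the approach it describes (the class name and structure are my assumption, not the actual ceilometer code) is to consume the body inside the adapter so urllib3 releases the connection immediately, at the cost of breaking stream=True:

```python
import requests
from requests.adapters import HTTPAdapter

class PreloadHTTPAdapter(HTTPAdapter):
    """Force full reads so the connection always returns to the pool."""

    def send(self, request, **kwargs):
        response = super(PreloadHTTPAdapter, self).send(request, **kwargs)
        response.content  # read the whole body; releases the connection
        return response

# Usage: mount the adapter on a session in place of the default one.
# session = requests.Session()
# session.mount('http://', PreloadHTTPAdapter())
```

Since the body is fully consumed before the response is handed back, iter_content and stream=True no longer provide incremental reads, matching the caveat noted in the commit.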
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Sep 23, 2015
Project: openstack/ceilometer  75491297e34a359fbc1faff7bba484693a0b32af

Workaround requests/urllib connection leaks

arcadia-devtools pushed a commit to catboost/catboost that referenced this issue May 28, 2018
…b3#644

Pull-request for branch users/faustkun/VHSUP-8556

ref:fe62c165ed60688fc43330aae53b43b889f132ce
@FrankSalad

FrankSalad commented Jun 3, 2019

I think we're experiencing a related issue when communicating over HTTPS. Using urllib3 version 1.25.3, I took @jlatherfold's proof of concept and used https://beeceptor.com/ to create an HTTPS endpoint with an artificial 10-second timeout (passed into the example here as url). The POC hangs after logging request.get 26 times.

import requests

def get(req, url):
    r = req.get(url, stream=True, timeout=(0.5, 0.1)) # set get timeout low to force timeout exceptions
    r.raise_for_status()

    return r.iter_content(chunk_size=4096)

if __name__ == '__main__':

    url = '<https endpoint with an artificial delay>'

    # if you need to auth the session do so....
    session = requests.Session()

    # setting the poolsize to 1 and 1 retries will result in this program
    # hanging on the first retry attempt (failed connection or read timeout)
    # set the pool_maxsize=4 and set some trace statements in connectionpool.py
    # to print the queue size in _get_conn and _put_conn and watch it slowly
    # decrease as we hit timeouts and retries.....
    adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1,
                                            pool_block=True, max_retries=1)
    session.mount('https://', adapter)

    while True:

        try:
            print('request.get......')
            content = get(session, url)

            # consume the response
            for chunk in content:
                pass
        except Exception:
            print('oops.....')

        # after a while when you've hit few timeouts you're not going to get here....
        print('request.done......')
