
Connection pool exhausted when connection failures occur, should refill with empty connections #644

Closed
jlatherfold opened this issue Jun 4, 2015 · 15 comments


@jlatherfold

If preload_content is not specified on a request, the connection is implicitly returned to the pool after the response has been read. However, when a timeout occurs or the connection is dropped, the connection is closed but not returned to the pool (since the response is None). The problem is compounded when retries are enabled, because the next attempt grabs a new connection from the pool, depleting it further. With non-blocking pools this is not a problem, since a new connection is created whenever the pool is empty; with a blocking pool, however, we have found that the pool size eventually reaches zero (after as many timeout errors as there were connections), which causes the calling application to hang on its next request.
This can be fixed in connectionpool.urlopen, in the exception handlers, by explicitly returning the closed connection to the pool via _put_conn (if release_conn is false): subsequent calls to _get_conn check whether the connection has been dropped and return a new one.
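The depletion mechanism can be sketched with a plain queue. This is a hypothetical stand-in, not urllib3's code, though urllib3's pool really is a LifoQueue pre-filled with None placeholders:

```python
import queue

# Stand-in for a blocking connection pool of size 2. urllib3
# pre-fills its LifoQueue with one None placeholder per slot.
pool = queue.LifoQueue(maxsize=2)
for _ in range(2):
    pool.put(None)

def failed_request():
    # _get_conn takes a slot...
    conn = pool.get(block=True, timeout=1)
    # ...the request times out, the connection is closed, and the
    # exception handler never calls _put_conn: the slot is lost.

failed_request()
failed_request()

print(pool.qsize())  # 0 -- the next blocking get() waits forever
```

After as many failures as there are slots, every subsequent blocking `get()` hangs, which is exactly the behaviour described above.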

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

Hello again!

I think we need to be sure that we close a connection before we return it to the pool. Otherwise we run the risk of attempting to re-use a live connection that has timed out, which will end extremely poorly for us.

@shazow
Member

shazow commented Jun 4, 2015

We should be putting a None into the pool if we want to discard the connection. That will be replaced with a fresh connection when we try to get it out of the pool. That does sound like a bug.
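The None-refill semantics can be seen with urllib3's pool helpers (`_get_conn`/`_put_conn` are private internals, shown here only to illustrate; no network traffic occurs because a connection object is constructed lazily without connecting):

```python
import urllib3

pool = urllib3.HTTPConnectionPool('example.invalid', maxsize=1, block=True)

conn = pool._get_conn()   # take the pool's only slot
pool._put_conn(None)      # discard the dead connection: refill with None
fresh = pool._get_conn()  # the None placeholder becomes a new connection

print(fresh is not None and fresh is not conn)  # True
```

Putting None back keeps the slot count constant, so a blocking pool can never be starved by discarded connections.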

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

@shazow's idea is even better than mine.

@shazow
Member

shazow commented Jun 4, 2015

@jlatherfold Is this something you'd be interested in working on?

Producing a failing test would be the first step. Make a ConnectionPool with a small pool (1-2), trigger some connection exceptions, see if the pool gets exhausted.

@jlatherfold
Author

Hi,

I've already done this, hence this bug report. Locally I've fixed it by putting the [closed] connection back onto the pool in the exception handlers (which do close the connections). Those Nones or closed connections are discarded, and a new one is returned by subsequent calls to _get_conn. Sadly I don't know when I will get time to do any more on this right now.

@shazow
Member

shazow commented Jun 4, 2015

@jlatherfold Would be handy if you can share your code for reproducing the issue. :)

Also if you can confirm that putting None rather than the closed connection still works as expected, that would be great.

@jlatherfold
Author

I can confirm that _put_conn(None) works as expected.

Here's some basic code that reproduces the error. Setting the read timeout to a low value will cause read-timeout exceptions. You'll need to hit a URL with some large content (it doesn't have to be massive: 500 KB or so, although you can probably reproduce this with much less):

import requests

def get(req, url):
    r = req.get(url, stream=True, timeout=(0.5, 0.1)) # set get timeout low to force timeout exceptions
    r.raise_for_status()

    return r.iter_content(chunk_size=4096)

if __name__ == '__main__':

    # if you need to auth the session do so....
    session = requests.Session()

    # setting the poolsize to 1 and 1 retries will result in this program
    # hanging on the first retry attempt (failed connection or read timeout)
    # set the pool_maxsize=4 and set some trace statements in connectionpool.py
    # to print the queue size in _get_conn and _put_conn and watch it slowly
    # decrease as we hit timeouts and retries.....
    adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1,
                                            pool_block=True, max_retries=1)
    session.mount('http://', adapter)

    while True:

        try:
            print('request.get......')
            content = get(session, '<url to large content>')

            # consume the response
            for chunk in content:
                pass
        except Exception:
            print('oops.....')

        # after a while when you've hit few timeouts you're not going to get here....
        print('request.done......')

@jlatherfold
Author

Sorry - didn't know how to format for code - but you get the idea....

@Lukasa
Sponsor Contributor

Lukasa commented Jun 4, 2015

Updated code formatting.

@shazow shazow changed the title Connection pool not recycling connections on timeout or connection exceptions Connection pool exhausted when connection failures occur, should refill with empty connections Jun 5, 2015
@shazow
Member

shazow commented Jun 5, 2015

Bonus points if somebody wants to translate that to plain-urllib3 and double-bonus if you make it into a test. :)
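A plain-urllib3 translation might look like the following sketch. The silent local server (which accepts connections but never responds, forcing a read timeout) and all variable names are my own stand-in for the slow URL; on affected versions (<= 1.11) the final slot count drops below maxsize, while on fixed versions it stays at 1:

```python
import socket
import threading

import urllib3

# A server that accepts connections but never sends a response, so
# every request hits the client-side read timeout.
srv = socket.socket()
srv.bind(('127.0.0.1', 0))
srv.listen(5)
port = srv.getsockname()[1]

clients = []

def accept_forever():
    while True:
        try:
            c, _ = srv.accept()
            clients.append(c)  # keep the socket open, never respond
        except OSError:
            return  # server socket closed

threading.Thread(target=accept_forever, daemon=True).start()

pool = urllib3.HTTPConnectionPool(
    '127.0.0.1', port, maxsize=1, block=True,
    timeout=urllib3.Timeout(connect=1.0, read=0.2),
)

try:
    pool.urlopen('GET', '/', retries=0, preload_content=False)
except urllib3.exceptions.HTTPError:
    pass  # expected: the read times out

# On affected versions this prints 0 (slot lost); once fixed it prints 1.
print('free slots:', pool.pool.qsize())
srv.close()
```

Turning this into a test would then be a matter of asserting on the queue size after the failed request.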

@jlatherfold
Author

Ok, I'll make the patch myself. Please let me know what procedure I need to follow or point me in the right direction where I can get that info as I have not made any code contributions before.

@shazow
Member

shazow commented Jun 5, 2015

@jlatherfold Thanks! Fork the project, make a branch, start with a test that shows the failure (we have full test coverage, have a look at other tests for examples), then send a PR for feedback. We'll review the code and give you more pointers.

If you can think of a way to reproduce the scenario without doing a full end-to-end request, then urllib/tests/test_connectionpool.py would be a good place to put it. If you need a server, take a look at the with_dummyserver subdirectory of test suites. Lately we prefer the test_socketlevel.py style of testing with a dummyserver when possible.

Might be wise to just write a simple example of the scenario reproducing in urllib3 first, to make sure it's not requests' fault.

@jlatherfold
Author

Hi,

I now have a fix for this with unit test and am ready to submit a PR. The work has been done on a local branch. Would you like me to merge it to master or push the branch to my fork and use that?

@shazow
Member

shazow commented Jun 9, 2015

@jlatherfold Whatever you prefer, as long as it ends up as a Pull Request on our end. :)

openstack-gerrit pushed a commit to openstack/ceilometer that referenced this issue Sep 23, 2015
When a ConnectionError occurs with python-requests <= 2.8.0
and/or urllib3 <= 1.11, connections are not returned to
the pool.

This change adds a CustomHTTPAdapter to work around the
issue.

This HTTPAdapter doesn't trigger these urllib3 issues:
it works around them by enforcing preloading of the
response. When preloading is enabled, urllib3 releases the
connection back to the pool immediately after use, so the
issue is not triggered.

Enforcing preloading breaks some requests features
(like stream) that we didn't use in our GnocchiClient.

Upstream bugs:

* urllib3/urllib3#659
* urllib3/urllib3#651
* urllib3/urllib3#644
* https://github.com/kennethreitz/requests/issues/2687

We could remove this when requests 2.8.0 is released.

Closes-bug: #1498823
Change-Id: I7d40ade927b834909e230613777cba1f7537c0ec
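The adapter itself isn't quoted in the commit message; a minimal sketch of the approach it describes (the class name and structure are my assumption, not the actual ceilometer code) is to consume the body inside the adapter so urllib3 releases the connection immediately, at the cost of breaking stream=True:

```python
import requests
from requests.adapters import HTTPAdapter

class PreloadHTTPAdapter(HTTPAdapter):
    """Force full reads so the connection always returns to the pool."""

    def send(self, request, **kwargs):
        response = super(PreloadHTTPAdapter, self).send(request, **kwargs)
        response.content  # read the whole body; releases the connection
        return response

# Usage: mount the adapter on a session in place of the default one.
# session = requests.Session()
# session.mount('http://', PreloadHTTPAdapter())
```

Since the body is fully consumed before the response is handed back, iter_content and stream=True no longer provide incremental reads, matching the caveat noted in the commit.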
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Sep 23, 2015
Project: openstack/ceilometer  75491297e34a359fbc1faff7bba484693a0b32af

Workaround requests/urllib connection leaks

arcadia-devtools pushed a commit to catboost/catboost that referenced this issue May 28, 2018
…b3#644

Pull-request for branch users/faustkun/VHSUP-8556

ref:fe62c165ed60688fc43330aae53b43b889f132ce
@FrankSalad

FrankSalad commented Jun 3, 2019

I think we're experiencing a related issue when communicating over HTTPS. Using urllib3 version 1.25.3, I took @jlatherfold's proof of concept and used https://beeceptor.com/ to create an HTTPS endpoint with an artificial 10-second timeout (passed into the example here as url). The POC hangs after logging request.get 26 times.

import requests

def get(req, url):
    r = req.get(url, stream=True, timeout=(0.5, 0.1)) # set get timeout low to force timeout exceptions
    r.raise_for_status()

    return r.iter_content(chunk_size=4096)

if __name__ == '__main__':

    url = '<https endpoint with an artificial delay>'

    # if you need to auth the session do so....
    session = requests.Session()

    # setting the poolsize to 1 and 1 retries will result in this program
    # hanging on the first retry attempt (failed connection or read timeout)
    # set the pool_maxsize=4 and set some trace statements in connectionpool.py
    # to print the queue size in _get_conn and _put_conn and watch it slowly
    # decrease as we hit timeouts and retries.....
    adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1,
                                            pool_block=True, max_retries=1)
    session.mount('https://', adapter)

    while True:

        try:
            print('request.get......')
            content = get(session, url)

            # consume the response
            for chunk in content:
                pass
        except Exception:
            print('oops.....')

        # after a while when you've hit few timeouts you're not going to get here....
        print('request.done......')
