Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pool Depletion / Leaking Connections #76

Closed
thatguystone opened this issue Jun 15, 2012 · 10 comments
Closed

Pool Depletion / Leaking Connections #76

thatguystone opened this issue Jun 15, 2012 · 10 comments

Comments

@thatguystone
Copy link
Contributor

In our production environment, we've noticed the following errors in our logs, pool size set to 500:

FetchError: FetchError: msg = "No retries left, giving up", original exception = "EmptyPoolError("HTTPConnectionPool(host='SUPER_SECRET', port=80): Pool reached maximum size and no more connections are allowed.",)", url = "http://SUPER_SECRET_URL/", retries = 2
ERROR - base.py:fetch:166 - FetchError: HTTPConnectionPool(host='SUPER_SECRET', port=80): Pool reached maximum size and no more connections are allowed., retries left = 2

After a bit of investigation, it looks like the number of connections in the pool is slow decreasing over time as we encounter other errors, forced timeouts, etc. On urllib3/connectionpool.py:424, if httplib/socket raises an error, the connection will be dropped from the pool because it was acquired on 382 and never put back on line 431 since conn == None.

This issue can be replicated with the following gist (uses gevent.Timeout, so quasi-related to #69): https://gist.github.com/2932793

It looks like the if conn on line 431 was originally added to ensure that a SocketError raised from _get_conn() didn't add None to the pool since a connection would never be gotten, but in the current incarnation, it has the effect of causing the pool size to shrink over time if a connection is acquired and any error is raised from httplib/socket.

@shazow
Copy link
Member

shazow commented Jun 15, 2012

Ah, thank you for the excellent bug report and analysis. I believe you're correct.

Is this something you may be interested in fixing and sending a pull request? Otherwise I'll take a look at it in the next week or two.

@thatguystone
Copy link
Contributor Author

I was looking at trying to fix it, but I didn't just want to go in and change stuff while breaking everything. If you're okay reviewing any changes I make to make sure they're not going to mess everything up, I'd be more than happy to give it a shot.

@shazow
Copy link
Member

shazow commented Jun 15, 2012

Absolutely.

Since we suspect the bug is simply the way the queue gets populated, we should be able to start with a simple unit test illustrating the bug. Have a look at some of these for example: https://github.com/shazow/urllib3/blob/master/test/test_connectionpool.py

urllib3 should be very thoroughly tested so hopefully we won't need to worry about breakage too much. :)

@thatguystone
Copy link
Contributor Author

@shazow: Could you run your tests of 358c789? I'm seeing the following failures. Just want to make sure I'm not missing something.

(ve)andi@Apollo:~/Dev/IHR/urllib3$ pip freeze
argparse==1.2.1
coverage==3.5.2
distribute==0.6.24
nose==1.1.2
tornado==2.3
wsgiref==0.1.2
(ve)andi@Apollo:~/Dev/IHR/urllib3$ nosetests
.........................F.F..................................
======================================================================
FAIL: test_cross_host_redirect (test.with_dummyserver.test_poolmanager.TestPoolManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andi/Dev/IHR/urllib3/test/with_dummyserver/test_poolmanager.py", line 36, in test_cross_host_redirect
    self.fail("Request succeeded instead of raising an exception like it should.")
AssertionError: Request succeeded instead of raising an exception like it should.
-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
root: INFO: 200 GET http://localhost:18081/redirect?target=http%3A%2F%2F127.0.0.1%3A18081%2Fecho%3Fa%3Db (127.0.0.1) 1.38ms
urllib3.connectionpool: DEBUG: "GET http://localhost:18081/redirect?target=http%3A%2F%2F127.0.0.1%3A18081%2Fecho%3Fa%3Db ('HTTP/1.1',)" 200 13
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_redirect (test.with_dummyserver.test_poolmanager.TestPoolManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andi/Dev/IHR/urllib3/test/with_dummyserver/test_poolmanager.py", line 20, in test_redirect
    self.assertEqual(r.status, 303)
AssertionError: 200 != 303
-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
root: INFO: 200 GET http://localhost:18081/redirect?target=http%3A%2F%2Flocalhost%3A18081%2F (127.0.0.1) 0.76ms
urllib3.connectionpool: DEBUG: "GET http://localhost:18081/redirect?target=http%3A%2F%2Flocalhost%3A18081%2F ('HTTP/1.1',)" 200 13
--------------------- >> end captured logging << ---------------------

Name                     Stmts   Miss  Cover   Missing
------------------------------------------------------
urllib3                     16      0   100%   
urllib3._collections        69      0   100%   
urllib3.connectionpool     152      7    95%   38, 251, 371-375, 408
urllib3.contrib              0      0   100%   
urllib3.exceptions          24      4    83%   43-47
urllib3.filepost            37      0   100%   
urllib3.poolmanager         48      3    94%   106-108
urllib3.request             23      0   100%   
urllib3.response            80      1    99%   152
urllib3.util                60      0   100%   
------------------------------------------------------
TOTAL                      509     15    97%   
----------------------------------------------------------------------
Ran 62 tests in 0.732s

FAILED (failures=2)

@shazow
Copy link
Member

shazow commented Jun 18, 2012

Hmm, passes on Tornado 2.1.1, fails on Tornado 2.3. Wonder what changed.

thatguystone added a commit to thatguystone/urllib3 that referenced this issue Jun 18, 2012
 * Always returning connections to the pool, unless release_conn == False
 * Adding test cases for all the exceptions in urlopen to verify pool never gets empty
 * Moved SocketError exception handling to is_connection_dropped to simplify urlopen
@thatguystone
Copy link
Contributor Author

Alrighty, 1bbda4e is linked and contains the fixes. If you could review, I would greatly appreciate it.

@shazow
Copy link
Member

shazow commented Jun 19, 2012

Took a quick glance, looks good! I'll take a more thorough look this weekend and merge it in. Thanks for doing this, @thatguystone. You're welcome to add yourself to the CONTRIBUTORS.txt if you wish. :)

thatguystone added a commit to thatguystone/urllib3 that referenced this issue Jun 19, 2012
@adamJLev
Copy link

adamJLev commented Jul 6, 2012

+1 please merge fix in. Thanks!

@shazow
Copy link
Member

shazow commented Jul 9, 2012

Looks like this has been merged? Hard to tell because it wasn't a real Github pull request. :)

Please reopen if I'm mistaken.

@shazow shazow closed this as completed Jul 9, 2012
@thatguystone thatguystone mentioned this issue Jul 9, 2012
@thatguystone
Copy link
Contributor Author

Just created #86 for this bug. Not sure why I couldn't just make this a pull request :-\

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants