Too many open files #239

Closed
daleharvey opened this Issue Nov 5, 2011 · 79 comments


I am building a basic load generator and started running into file descriptor limits. I haven't seen any documentation pertaining to how to release resources, so either I am doing it wrong and the docs need updating, or requests is leaking file descriptors somewhere (without support for keepalive I am slightly confused as to why any files would be left open at all).

Owner

kennethreitz commented Nov 5, 2011

Were you using requests.async?

Nope, all requests were reasonably plain requests.get / requests.post. I am still seeing a few in there:

$ lsof | grep localhost | wc -l
110

All but 4 or 5 of them are of the format:

Python    82117 daleharvey  123u    IPv4 0xffffff800da304e0       0t0      TCP localhost:61488->localhost:http (CLOSE_WAIT)
Owner

kennethreitz commented Nov 26, 2011

I'm a bit baffled by this, to be honest.

Hah, I'll take another shot at reproducing it reliably; if I can't, I'll close this.

Owner

kennethreitz commented Nov 26, 2011

I've seen this happening to me, but only when I'm using the async module w/ 200+ simultaneous connections.

tamiel commented Dec 13, 2011

Hi,
I got exactly the same problem using requests with gevent monkey patching: some connections stay in CLOSE_WAIT.
Maybe it's a problem with gevent, then.

Contributor

juanriaza commented Dec 22, 2011

It may be a problem with ulimit -n. Try a higher value.

tamiel commented Dec 22, 2011

"Too many open files" is the result of the bug caused by sockets staying in CLOSE_WAIT .
So ulimit won't fix just make a workaround .

Owner

kennethreitz commented Dec 22, 2011

@tamiel how do we fix this?

tamiel commented Dec 22, 2011

I will do more tests ASAP and try to fix it.

Owner

kennethreitz commented Dec 22, 2011

I've looked into it, and it seems to be a problem with all libraries using httplib.HTTPSConnection.

tamiel commented Dec 22, 2011

Posted an example here :

https://gist.github.com/1512329

Contributor

acdha commented Jan 17, 2012

I just encountered a very similar error using an async pool with only HTTP connections - I'm still investigating but passing a pool size to async.map makes the error reproduce quickly.

bevenky commented Jan 29, 2012

Any fixes to this? This makes Requests unusable with gevent..

Owner

kennethreitz commented Jan 29, 2012

It's all about the CLOSE_WAITs. Just have to close them. I'm not sure why they're still open though.

bevenky commented Jan 29, 2012

Is it a urllib3 issue? Having to close these ourselves isn't a great idea, I feel.

Owner

kennethreitz commented Jan 29, 2012

It's more of a general issue. We can keep the conversation here.

bevenky commented Jan 29, 2012

OK, just to give you some perspective: we are trying to move from httplib2 to requests, and we don't see this issue with httplib2. So it's not a general issue for sure.

Owner

kennethreitz commented Jan 29, 2012

By general I mean that it's a very serious issue that affects everyone involved.

bevenky commented Jan 29, 2012

So how do we solve this? We really want to use requests + slumber moving forward.

Owner

kennethreitz commented Jan 29, 2012

I'd love to know the answer to that.

@kennethreitz kennethreitz reopened this Jan 29, 2012

Contributor

acdha commented Jan 29, 2012

The leak appears to be due to the internal redirect handling, which causes new requests to be generated before the pending responses have been consumed. In testing, acdha@730c0e2 has an unsatisfying but effective fix: simply forcing each response to be consumed before continuing.

This required changes in two places, which makes me want to refactor the interface slightly, but I'm out of time to continue at the moment.
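
For illustration, consuming a response and its redirect chain at the call site looks roughly like this (the URL is a placeholder; this is only a sketch of the idea, not the linked commit):

import requests

resp = requests.get("http://example.com/redirecting-url")

# Read the body of every intermediate response in the redirect chain as well
# as the final one, so each underlying connection is fully consumed and can
# be returned to the pool or closed.
for hop in resp.history:
    _ = hop.content
_ = resp.content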

kennethreitz added a commit that referenced this issue Jan 29, 2012

Contributor

acdha commented Jan 30, 2012

#399 has a fix which works well in my async load generator (https://github.com/acdha/webtoolbox/blob/master/bin/http_bench.py) with thousands of requests and a low fd ulimit

I have run into the same issue when using async; I kludged a workaround by chunking requests and deleting responses / calling gc.collect.
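
A rough sketch of that kind of workaround, using grequests (chunk size, URLs, and names here are illustrative, not the actual code):

import gc
import grequests

def chunks(seq, n):
    # Yield successive n-sized chunks from seq.
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

urls = ["http://example.com/%d" % i for i in range(1000)]  # placeholder URLs

for batch in chunks(urls, 100):
    responses = grequests.map(grequests.get(u) for u in batch)
    # ... process responses here ...
    del responses
    gc.collect()  # force collection so any lingering sockets are freed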

I believe I was running into this today connecting to a licensed server that only allows 5 connections.

Using async I could only GET 4 things before it paused for 60 seconds.

Using the normal GET with consumption I could fetch about 150 things serially in under 40 seconds.

Haven't made my kludge yet since I saw this issue.

Contributor

dalanmiller commented Apr 30, 2012

Just got this error while using IPython. This is just making each request one at a time, but I think I got something similar when using async.

ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.
Traceback (most recent call last):
    File "/Library/Python/2.7/site-packages/IPython/core/ultratb.py", line 756, in structured_traceback
    File "/Library/Python/2.7/site-packages/IPython/core/ultratb.py", line 242, in _fixed_getinnerframes
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 1035, in getinnerframes
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 995, in getframeinfo
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 456, in getsourcefile
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 485, in getmodule
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 469, in getabsfile
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py", line 347, in abspath
OSError: [Errno 24] Too many open files

Unfortunately, your original traceback can not be constructed.

Oddly, when using just the normal Python interpreter I think I get a "Max Retries" error, but I think that is a separate issue with me making requests all to the same domain; not sure.

Contributor

acdha commented Apr 30, 2012

I ran into this on the first project I had where allow_redirects was True; it appears to be caused by the redirection chain leaking response objects which aren't released even with prefetch=True. This fixed it in my initial testing:

        [i.raw.release_conn() for i in resp.history]
        resp.raw.release_conn()
Owner

kennethreitz commented Apr 30, 2012

Hmmm..

Contributor

dalanmiller commented Apr 30, 2012

@acdha setting:

requests.defaults.defaults['allow_redirects'] = False

before I make any requests still results in the same error, but I think this isn't an option for my implementation as all the requests I'm making will require a redirect =/

Contributor

acdha commented Apr 30, 2012

@dalanmiller How are you processing your responses? I was previously using async.map with a response hook and it appears to be more stable using a simple loop over async.imap:

for resp in requests.async.imap(reqs, size=8):
    try:
        print resp.status_code, resp.url
    finally:
        [i.raw.release_conn() for i in resp.history]
        resp.raw.release_conn()
Contributor

dalanmiller commented Apr 30, 2012

@acdha

I was just using a for loop through a URL list and doing a requests.get on each with my settings and such.

for u in urls:
    response_list.append(requests.get(u))

I tried using your paste and it works for about 50 requests in my 900-item list, until I start to get "max retries exceeded with url" errors for the rest. That is a pretty standard error for hitting the same domain repeatedly, though, no?

dmishe commented Jun 7, 2012

Hey, I was crawling a huge list of URLs, 35k, and got this same error on some of the requests.

I am getting the URLs in chunks of 10, like this:

responses = requests.async.map([requests.async.get(u, params=self.params()) for u in chunk]) # chunk is a list of 10

Somewhere in the 20k range I started getting error 24; then it was OK through 30k, and then it happened again.

Any more info you would be interested in to narrow it down?

Contributor

piotr-dobrogost commented Jun 7, 2012

requests.async is gone. You might want to consider moving to grequests.

dmishe commented Jun 7, 2012

All right, thanks. Would be good to mention this in the docs.

Contributor

dalanmiller commented Jun 7, 2012

Kind of a noob when it comes to pull requests and writing documentation, but I took a stab at it and sent it. Please comment or criticize :)

kennethreitz#665

dmishe commented Jun 9, 2012

OK, this happens even without using async, with just requests.get, after 6k requests.

Owner

kennethreitz commented Jun 9, 2012

I suspected that.

For me the 'Too many open files' error occurred after downloading exactly 1k files. My solution was to disable the keep-alive property and fetch the requests in chunks (@acdha thank you for the hint). lsof -p PID | wc -l shows a non-increasing number of connections during the execution.

rsess = requests.session()
rsess.config['keep-alive'] = False  # disable keep-alive so connections close after use

rs = [grequests.get(l, session=rsess) for l in links]

# chunks() splits the request list into batches of 100; see [1] below.
for s in chunks(rs, 100):
    responses = grequests.map(s, size=concurrency)
    for r in responses:
        try:
            print(r.status_code, r.url)
        finally:
            r.raw.release_conn()

[1] chunking: http://stackoverflow.com/a/312464

Owner

kennethreitz commented Jul 27, 2012

Closing while deferring to the urllib3 fix.

Contributor

piotr-dobrogost commented Sep 22, 2012

@kennethreitz What's the urllib3 issue number?

Looks like this is the issue: http://bugs.python.org/issue16298

Owner

sigmavirus24 commented Nov 28, 2012

@silvexis it could very well be related to the urllib3 bug; now I'm just wishing someone had answered @piotr-dobrogost :P

@dmishe dmishe referenced this issue in kennethreitz/grequests Mar 18, 2013

Closed

error(24, 'Too many open files') #9

barapa commented Aug 7, 2013

Is anyone else still encountering this issue?

Owner

Lukasa commented Aug 7, 2013

I haven't heard any reports of it. Are you?

It's a problem of the box configuration, not of the framework. Look at the kernel configuration of your OS. On BSD it is called kern.maxfiles. There is a thread about ulimit on Linux systems: http://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
Hope it helps; I don't know how to change this parameter on Windows.
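
For what it's worth, on Unix-like systems the per-process limit can also be inspected and raised from Python itself; a minimal sketch (the soft limit can only be raised up to the hard limit):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file limit: soft=%d hard=%d" % (soft, hard))

# Raise the soft limit to the hard limit for this process.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))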

danfairs commented Aug 7, 2013

With the caveat that we're still running an older version of requests, we have the following, horrible, code in place to handle this:

if self._current_response is not None:
    # Requests doesn't have a clean API to actually close the
    # socket properly. Dig through multiple levels of private APIs
    # to close the socket ourselves. Icky.
    self._current_response.raw.release_conn()
    if self._current_response.raw._fp.fp is not None:
        sock = self._current_response.raw._fp.fp._sock
        try:
            logger.debug('Forcibly closing socket')
            sock.shutdown(socket.SHUT_RDWR)
            sock.close()
        except socket.error:
            pass

(I think self._current_response is the requests Response object.)

Owner

Lukasa commented Aug 7, 2013

Hmm, where is the chain of closing broken? We have a Response.close() method that calls release_conn(), so what needs to happen in release_conn() for this to work?
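
For reference, the calling pattern that relies on that method looks roughly like this (a sketch with a placeholder URL):

import requests

resp = requests.get("http://example.com/")
try:
    body = resp.content
finally:
    # Response.close() releases the connection back to the pool by calling
    # release_conn() on the underlying urllib3 response.
    resp.close()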

Owner

sigmavirus24 commented Aug 10, 2013

@Lukasa this was definitely fixed in urllib3, as I was part of the discussion. With an inclination towards being conservative in my estimate, I would say it has been there since requests 1.2.x, if not 1.1.x.

Owner

Lukasa commented Aug 10, 2013

Yeah, I did think this was fixed. Unless we see something on 1.2.3, I'm going to continue to assume this is fixed.

Contributor

tardyp commented Nov 28, 2013

I'm seeing a CLOSE_WAIT leak with 2.0.2. Do you have unit tests to ensure there is no regression on this topic?

Owner

Lukasa commented Nov 28, 2013

No, we don't. AFAIK urllib3 doesn't either. Can you reproduce your leak easily?

Contributor

tardyp commented Nov 28, 2013

We have been using requests in our internal app since Monday, and hit the 1024 max-open-files limit today.

Two hours after a reboot, we have 40 sockets in CLOSE_WAIT, as reported by lsof.

So I think we'll be able to reproduce it in a dev environment, yes. I'll keep you posted.
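
A rough way to count sockets stuck in CLOSE_WAIT from inside the process, roughly equivalent to the lsof check (a sketch, assuming psutil is installed):

import psutil

def close_wait_count():
    # Count this process's inet sockets currently in CLOSE_WAIT.
    proc = psutil.Process()
    return sum(1 for conn in proc.connections(kind="inet")
               if conn.status == psutil.CONN_CLOSE_WAIT)

print(close_wait_count())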

Owner

sigmavirus24 commented Nov 29, 2013

@tardyp also, how did you install requests? I think all of the OS package maintainers strip out urllib3. If they don't keep that up to date and you're using an old version, that could be the cause instead. If you're using pip, then feel free to open a new issue to track this instead of adding discussion onto this one.

Contributor

tardyp commented Nov 29, 2013

I installed with pip, but I use Python 2.6; I've seen a fix in Python 2.7 for this bug. Do you monkey-patch for older versions?

Pierre

On Fri, Nov 29, 2013 at 5:33 PM, Ian Cordasco notifications@github.comwrote:

@tardyp https://github.com/tardyp also, how did you install requests? I
think all of the OS package maintainers strip out urllib3. If they don't
keep that up-to-date and you're using an old version, that could be the
cause instead. If you're using pip, then feel free to open a new issue to
track this with instead of adding discussion onto this one.


Reply to this email directly or view it on GitHubhttps://github.com/kennethreitz/requests/issues/239#issuecomment-29526302
.

Owner

sigmavirus24 commented Nov 29, 2013

@tardyp please open a new issue with as much detail as possible including whether the requests you're making have redirects and whether you're using gevent. Also, any details about the operating system and an example of how to reproduce it would be fantastic.

Contributor

shazow commented Dec 4, 2013

FYI shazow/urllib3#291 has been reverted due to bugs.

sigmavirus24 added a commit to sigmavirus24/urllib3 that referenced this issue Dec 4, 2013

This should be enough to handle requests/requests#239
- The comment in the method explains why we need to check self.pool first

Should we re-open this?
I am having the same issue!

Owner

Lukasa commented Nov 25, 2014

@polvoazul There's no way this is the same issue, which was originally reported in 2011, so I don't think reopening is correct. However, if you're running the current release of requests (2.4.3) and can reproduce the problem, opening a new issue would be correct.

mygoda commented Jan 23, 2016

@Lukasa I need your help。I use eventlet + requests, and it always creates so many sockets that show up as "can't identify protocol"。My requests is 2.4.3. Does eventlet + requests cause this problem?

Owner

Lukasa commented Jan 23, 2016

I'm sorry @mygoda, but it's impossible to know. If you aren't constraining the number of requests that can be outstanding at any one time then it's certainly possible, but that's an architectural problem outside the remit of requests.
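
For illustration, one way to bound the number of requests in flight with eventlet looks roughly like this (pool size and URLs are placeholders, not a recommendation for any particular value):

import eventlet
eventlet.monkey_patch()

import requests

def fetch(url):
    resp = requests.get(url)
    _ = resp.content  # consume the body
    resp.close()      # release the connection promptly
    return resp.status_code

urls = ["http://example.com/%d" % i for i in range(1000)]  # placeholder URLs
pool = eventlet.GreenPool(20)  # at most 20 requests outstanding at once

for status in pool.imap(fetch, urls):
    print(status)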

mygoda commented Jan 23, 2016

@Lukasa thank you。I think my issue is similar to this one。My project is pyvmomi; that connection is a long-lived connection. I am always confused about why it can hold so many "can't identify protocol" sockets.

1a1a11a commented Mar 17, 2016

Having the same problem now: running 120 threads causes 100,000+ open files. Any solution right now?

Owner

kennethreitz commented Mar 17, 2016

@mygoda you use awesome periods。

Owner

Lukasa commented Mar 17, 2016

@1a1a11a What files do you have open? That'd be a useful first step to understanding this problem.

Owner

sigmavirus24 commented Mar 17, 2016

@1a1a11a what version of requests are you using? What version of python? What operating system? Can we get any information?

1a1a11a commented Mar 17, 2016

I am using requests 2.9.1, Python 3.4, Ubuntu 14.04. Basically I am writing a crawler using 30 threads with proxies to crawl some websites. Currently I have adjusted the file limit per process to 655350; otherwise it reports the error.

I am still receiving the error "Failed to establish a new connection: [Errno 24] Too many open files" from requests.packages.urllib3.connection.VerifiedHTTPSConnection. I'm using Python 3.4, requests 2.11.1 and requests-futures 0.9.7. I appreciate requests-futures is a separate library, but it seems like the error is coming from requests. I'm attempting to make 180k asynchronous requests over SSL. I've divided those requests into segments of 1000, so I only move onto the next 1000 once all the future objects have been resolved. I'm running Ubuntu 16.04.2 and my default open files limit is 1024. It would be good to understand the underlying reason for this error. Does the requests library create an open file for each individual request? And if so, why? Is this an SSL certificate file? And does the requests library automatically close those open files when the future object is resolved?

Owner

Lukasa commented Mar 17, 2017

Requests opens many files. Some of those files are opened for certificates, but they are opened by OpenSSL and not by Requests, so those aren't managed by Requests. Additionally, Requests will also open, if needed, the .netrc file, the hosts file, and many others.

You will be best served by using a tool like strace to work out which files are opened. There is a strict list of system calls that lead to file descriptors being allocated, so you should reasonably swiftly be able to enumerate them. That will also let you know whether there is a problem or not. But, yes, I'd expect that if you're actively making 1000 connections over HTTPS then at peak load we could easily use over 1000 FDs.
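
As a complement to strace, on Linux you can also snapshot what the process itself has open via /proc; a small sketch:

import os

def open_fds():
    # Map each open file descriptor to what it points at (Linux only).
    fd_dir = "/proc/self/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # fd closed between listdir and readlink
    return result

for fd, target in sorted(open_fds().items()):
    print(fd, target)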

tkaria commented Jul 11, 2017

I struggled with this issue as well, and found that using opensnoop on OS X worked great for seeing what was happening, in case anyone runs into the same issue.

nyurik commented Aug 29, 2017

I'm also frequently seeing this error when repeatedly calling requests.post(url, data=data) to an HTTP (not HTTPS) server. Running on Ubuntu 16.04.3, Python 3.5.2, requests 2.9.1

Owner

Lukasa commented Aug 29, 2017

What is data?

nyurik commented Aug 29, 2017

A few hundred KB of text.

Owner

Lukasa commented Aug 29, 2017

Not a file object?

nyurik commented Aug 29, 2017

No, I form a large query in memory.

Owner

Lukasa commented Aug 29, 2017

Are you running this code in multiple threads?

nyurik commented Aug 29, 2017

No, single thread, posting to localhost

Owner

Lukasa commented Aug 29, 2017

It seems almost impossible for us to be leaking that many FDs then: we should be repeatedly using the same TCP connection or aggressively closing it. Want to check what your server is up to?
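
For comparison, the pattern that reuses a single connection for repeated posts looks roughly like this (URL, payload, and counts are placeholders):

import requests

url = "http://localhost:8080/ingest"   # hypothetical local endpoint
data = "x" * (200 * 1024)              # a few hundred KB of text

with requests.Session() as session:
    for _ in range(10000):
        resp = session.post(url, data=data)
        _ = resp.content        # consume the body so the socket can be reused
        resp.raise_for_status()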

iraykhel commented Sep 26, 2017

I am having this problem. Python 2.7, requests 2.18.4, urllib3 1.22.
Running multi-threaded code (not multi-processed). Connecting to at most 6 URLs at one time, manually creating and closing a new session for each one.
