ClosedPoolError on crawling #951
Related Stack Overflow question here: https://stackoverflow.com/questions/39012081/python-urllib3-closedpoolerror-on-crawling
Do you have a complete traceback, please? I'd like to see where this failure is coming from.
This is what I have from my log:
Hrm, we're really going to need a better traceback. If you're not catching this, can you add a block that catches it and fires
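A catch block along those lines might look like this. A minimal sketch, assuming a shared PoolManager; the fetch function and logger name are illustrative, not from the original report:

```python
import logging
import traceback

from urllib3 import PoolManager
from urllib3.exceptions import ClosedPoolError

log = logging.getLogger("crawler")
http = PoolManager()

def fetch(url):
    try:
        return http.request("GET", url)
    except ClosedPoolError:
        # Record the complete traceback so the origin of the failure
        # is visible, then re-raise.
        log.error("ClosedPoolError for %s:\n%s", url, traceback.format_exc())
        raise
```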
Full traceback :)
So, that's convenient. It seems like we've followed a redirect and are now attempting to grab a connection out of the connection pool. Somehow, in between, the pool is getting closed. It looks like the PoolManager allows this behaviour: essentially, if a pool starts making requests to the host after

What's not clear to me is why there is redirect code in multiple places. If we're capable of doing redirects at the PoolManager level, why do the ConnectionPools also support it? @haikuginger, @shazow, got theories on that?
Because ConnectionPools were originally, and still are, a first-class object that can be used on their own for a single host. This was a cause of confusion for me as well, and is one of a number of reasons that redirects are refactored to be part of the
Historically, ConnectionPools came first, and the first attempt at redirection code lived there. Of course, it did not work for cross-host requests. Then PoolManagers came, and the redirection code was roughly copied, then later improved by the Retry objects work.

I'd be +1 for removing redirect support from ConnectionPools, as it's very edge-case-y anyway (since it breaks for cross-host). I feel that if you're going to use such a low-level primitive, you should be ready to do your own redirecting. Bonus points if our redirecting code can be used standalone for those who want to. This would be a great recipe to have in the docs, if it's not there already.
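A hand-rolled redirect loop could serve as that recipe. A sketch under stated assumptions: it disables built-in redirects with redirect=False and uses urllib3's response.get_redirect_location(); the function name, the redirect limit, and the 303 handling are illustrative choices, not settled API:

```python
from urllib3 import PoolManager

def fetch_following_redirects(http, method, url, max_redirects=5):
    """Follow redirects by hand instead of relying on built-in handling."""
    for _ in range(max_redirects + 1):
        resp = http.request(method, url, redirect=False)
        location = resp.get_redirect_location()
        if not location:
            return resp
        if resp.status == 303:
            # 303 See Other downgrades the request method to GET.
            method = "GET"
        url = location
    raise RuntimeError("Exceeded %d redirects for %s" % (max_redirects, url))

http = PoolManager()
# resp = fetch_following_redirects(http, "GET", "http://example.com/")
```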
I'd be +1 on pulling it out of the ConnectionPool too.
I have no objection to that.
I'm working on this since it's also necessary for #952.
Slightly refactor our PoolManager#urlopen method to turn integers into Retry objects sooner, and move our redirect handling tests for connection pools to our pool manager so we don't lose test coverage. Some tests were duplicated and thus not moved over. The rest were not, so I moved them over and modified their logic to exercise the PoolManager instead of HTTPConnectionPool.

Closes urllib3#951
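The normalization the commit describes can be seen with urllib3's existing Retry.from_int helper, which converts an integer to a Retry object and passes an existing Retry instance through unchanged; a quick sketch:

```python
from urllib3.util.retry import Retry

# PoolManager#urlopen accepts retries as an int, a bool, or a Retry
# instance; converting early with Retry.from_int keeps the later
# redirect logic uniform.
retries = Retry.from_int(3)
print(retries.total)  # 3

# An existing Retry instance passes through unchanged.
assert Retry.from_int(retries) is retries
```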
When will this fix get into the main repository?
When the fix is finished. =)
Any projection for that? :)
Nope. The nature of these things is that, without any form of contracted work, the work gets done when someone is sufficiently motivated to complete that contribution.
@eladbitton, posting a bounty on this bug is a good way to motivate people to fix it for you.
Hitting the same issue while using a bunch of threads to mass-download HTML content. Is there anything I could do to help you fix this, or do you reckon I should handle redirects on my own, as @shazow says:
Hitting the same issue here as well, but in my case I'm using

@Lukasa, @haikuginger and @shazow, I'm really aiming to get this solved. I'm studying how I can submit a PR with tested code and so on to get this fixed and help the community. If you guys have any tips, just leave them on the table 😄
One possible way to circumvent this is to write code like:
Then you can use it like this:
For those who are facing the same problem.
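The workaround code itself isn't shown above; a hypothetical sketch of the same idea, wrapping the PoolManager and replacing it when a ClosedPoolError escapes, might look like this (the class name and keyword handling are invented for illustration):

```python
import urllib3
from urllib3.exceptions import ClosedPoolError

class RenewingPoolManager:
    """Wrap a PoolManager and replace it if its pools get closed."""

    def __init__(self, **pool_kwargs):
        self._pool_kwargs = pool_kwargs
        self._http = urllib3.PoolManager(**pool_kwargs)

    def request(self, method, url, **kwargs):
        try:
            return self._http.request(method, url, **kwargs)
        except ClosedPoolError:
            # Throw away the manager whose pool was closed and retry once.
            self._http = urllib3.PoolManager(**self._pool_kwargs)
            return self._http.request(method, url, **kwargs)

# Usage:
# http = RenewingPoolManager(num_pools=50)
# resp = http.request("GET", "http://example.com/")
```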
So, digging into this.

While the doc says:
I ended up patching _get_conn like this:
It works, but I don't know the drawbacks. Any help here?
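The patch itself isn't shown above; a hypothetical reconstruction of the idea could look like the following. It leans on urllib3 internals (self.pool, QueueCls), hard-codes the pool size, and may break across urllib3 versions, which is exactly the kind of drawback the commenter is asking about:

```python
from urllib3.connectionpool import HTTPConnectionPool
from urllib3.exceptions import ClosedPoolError

POOL_MAXSIZE = 10  # assumed; must match the maxsize the pool was created with

_original_get_conn = HTTPConnectionPool._get_conn

def _get_conn(self, timeout=None):
    # If close() has discarded the connection queue, rebuild it and
    # retry once instead of raising ClosedPoolError.
    try:
        return _original_get_conn(self, timeout)
    except ClosedPoolError:
        self.pool = self.QueueCls(POOL_MAXSIZE)
        for _ in range(POOL_MAXSIZE):
            self.pool.put(None)
        return _original_get_conn(self, timeout)

HTTPConnectionPool._get_conn = _get_conn
```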
How large do bounties usually need to get before somebody bites? |
I am building a crawler with Python 3 and urllib3. I am using a PoolManager instance shared by 15 different threads. While crawling thousands of websites I get a lot of ClosedPoolError exceptions from different websites.
From the documentation on ClosedPoolError:
It appears that the PoolManager instance is trying to use a closed connection.
My code:
How can I make the PoolManager renew the connection and try again?
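The crawler code isn't included above; as a hedged sketch of one way to share a PoolManager across threads and retry when its pool turns out to be closed (pool sizes and retry counts here are illustrative values, not tuned recommendations):

```python
import urllib3
from urllib3.exceptions import ClosedPoolError
from urllib3.util.retry import Retry

# One PoolManager shared by all worker threads.
http = urllib3.PoolManager(
    num_pools=50,
    maxsize=15,
    retries=Retry(total=3, redirect=5),
)

def fetch(url, attempts=2):
    """Retry once more if the pool was closed out from under us."""
    for attempt in range(attempts):
        try:
            return http.request("GET", url)
        except ClosedPoolError:
            if attempt == attempts - 1:
                raise
```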