
Patch hanging HTTPConnectionPool.closeCachedConnections call #999

Merged
merged 1 commit into scrapy:master from curita:fix-985 on Dec 31, 2014

Conversation

@curita (Member) commented Dec 31, 2014

This PR fixes #985

The hang while closing the spider on that domain occurred while closing the HTTP11DownloadHandler (it can be seen here). The server doesn't close the connection properly, so the client waits for a confirmation that never arrives. I've reported this on the Twisted issue tracker (#7738), since it's a problem in their HTTPConnectionPool's cleanup of persistent connections, but I've patched it externally by firing the deferred from a DelayedCall.

This problem hadn't come up before the changes to the crawling API because the reactor was stopped along with the download handlers on the engine_stopped signal (CrawlerProcess@8fece4b and DownloadHandlers@8fece4b). Now the reactor is stopped after the crawl deferreds have fired (CrawlerProcess), which happens after each engine has stopped, so HTTP11DownloadHandler.close isn't abruptly terminated.

I chose a _disconnect_timeout of one second as a tradeoff between the previous instant termination and giving the connections a little time to close in an orderly manner. It could be made a new Scrapy setting (which is why I set this variable in the class's __init__), but I don't think that kind of parametrization is needed right now.

I struggled to mock a server that mimics this behavior (Twisted doesn't provide a way to do it, since it manages the sockets internally, and I'm still not sure how to do it otherwise), so I'm submitting the PR as it is and I'll try to add a proper test later.
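For reference, a sketch of the patched close method, assuming the Twisted HTTPConnectionPool API; the pool setup is elided and details may differ from the merged commit:

from twisted.internet import reactor
from twisted.web.client import HTTPConnectionPool

class HTTP11DownloadHandler(object):

    def __init__(self, settings):
        self._pool = HTTPConnectionPool(reactor, persistent=True)
        # Tradeoff between instant termination and an orderly shutdown;
        # kept in __init__ so it could become a Scrapy setting later.
        self._disconnect_timeout = 1

    def close(self):
        d = self._pool.closeCachedConnections()
        # closeCachedConnections doesn't handle external errbacks, so we
        # issue a callback ourselves after `_disconnect_timeout` seconds.
        delayed_call = reactor.callLater(self._disconnect_timeout, d.callback, [])

        def cancel_delayed_call(result):
            # If the pool closed cleanly before the timeout, cancel the
            # pending DelayedCall so it doesn't fire an already-called deferred.
            if delayed_call.active():
                delayed_call.cancel()
            return result

        d.addBoth(cancel_delayed_call)
        return d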

@dangra (Member) commented Dec 31, 2014

nice bugfix! makes sense and I can confirm #985 is fixed by this patch.

dangra added a commit that referenced this pull request Dec 31, 2014
Patch hanging HTTPConnectionPool.closeCachedConnections call
@dangra merged commit f3110aa into scrapy:master on Dec 31, 2014
@curita deleted the fix-985 branch on December 31, 2014 16:16
@nramirezuy (Contributor) commented

It is fixed for HTTP 1.1, but what happens with HTTP 1.0? Does the issue reproduce there?

@curita (Member, Author) commented Dec 31, 2014

HTTP 1.0 doesn't have a close method because it has no persistent resources to free (it doesn't keep track of persistent connections the way HTTP 1.1's pool does), so it can't hang and closes immediately.
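A minimal sketch of why that holds, with hypothetical names: the shutdown code only calls close() on handlers that define one, and the HTTP 1.0 handler holds no pool to drain:

def close_download_handlers(handlers):
    # `handlers` maps scheme -> download handler instance (hypothetical).
    for handler in handlers.values():
        close = getattr(handler, 'close', None)
        if close is not None:
            # Only the HTTP 1.1 handler defines close(); it returns the
            # deferred patched in this PR. The HTTP 1.0 handler is skipped.
            close()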

# closeCachedConnections doesn't handle external errbacks, so we'll
# issue a callback after `_disconnect_timeout` seconds.
delayed_call = reactor.callLater(self._disconnect_timeout, d.callback, [])


my understanding is that while this will "unblock" the deferred, it will leave the connection open...
so you should also do something like self.transport.abortConnection()
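A hypothetical sketch of that suggestion (illustrative only, not part of the merged patch; it reaches into HTTPConnectionPool's private _connections dict):

def abort_cached_connections(pool):
    # Force-close any persistent connections still open after the timeout,
    # instead of leaving their sockets dangling.
    for protocols in list(pool._connections.values()):
        for proto in protocols:
            transport = getattr(proto, 'transport', None)
            if transport is not None:
                # abortConnection() drops the socket immediately, without
                # waiting for the peer to finish the close handshake.
                transport.abortConnection()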

Successfully merging this pull request may close these issues:
Scrapy 0.25 hangs closing spider (#985)