Patch hanging HTTPConnectionPool.closeCachedConnections call #999
This PR fixes #985
The issue with closing the spider on that domain was raised while closing the `HTTP11DownloadHandler` (it can be seen here). The server doesn't close the connection properly, so the client waits for a confirmation that never arrives. I've reported this on the Twisted issue tracker (#7738), since it is a problem with their `HTTPConnectionPool` when cleaning up persistent connections, but I've patched it externally by firing the deferred from a `DelayedCall`.

This problem didn't come up before the changes to the crawling API because the reactor was stopped alongside the download handlers on the `engine_stop` signal (CrawlerProcess@8fece4b and DownloadHandlers@8fece4b). Now the reactor is stopped only after the crawl deferreds have fired (CrawlerProcess), which happens after each engine has stopped, so `HTTP11DownloadHandler.close` is no longer abruptly terminated and the hang becomes visible.

I chose a `_disconnect_timeout` of one second as a tradeoff between the previous instant termination and giving connections a little time to close in an orderly manner. It could become a new Scrapy setting, which is why I set this variable in the class `__init__`, but I don't think that kind of parametrization is needed right now.

I struggled to mock a server that mimics this behavior (Twisted doesn't provide a way to do it, since it manages the sockets internally, and I'm still not sure how to do it otherwise). That's why I'm submitting the PR as it is; I'll try to add a proper test later.