0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough. #1074

Closed
aliowka opened this issue Mar 16, 2015 · 4 comments

@aliowka aliowka commented Mar 16, 2015

The problem appears when HTTPCACHE_ENABLED is True.
Suppose I need to collect items while using httpcache with FilesystemCacheStorage.
In the item pipeline I want to send them out with a Twisted-based client.
With HTTPCACHE_ENABLED set, I observe that items are sent very slowly and memory grows very fast.
For instance:
When the cache is disabled, about 50 items/second are sent and the total process memory is 70M. When it's enabled, I get 3 items/second and 4G of memory used (my items are big). Fortunately my task is small (it ends after 3 min).
When the crawl ends and the spider closes, I observe the bulk of the items being transferred by the client at 2K items/s.
It's clear to me that when the cache is enabled the reactor is not released for long enough to accomplish its async tasks (sending items), and that is what causes this abnormal behavior, which makes using the cache very questionable.
I found that there is a delay of 0 in the defer_succeed function in scrapy/scrapy/utils/defer.py:
reactor.callLater(0, d.callback, result)
That does not seem to be enough for cached responses, which do not obey DOWNLOAD_DELAY and can potentially throw big bulks of items very fast (they do).
I tried increasing the delay to 0.1 and this fixed the issue.

aliowka referenced this issue Mar 16, 2015
… cycles to accomplish its needs.

@nramirezuy nramirezuy commented Mar 16, 2015

@aliowka What are your speed and memory usage after the change?


@aliowka aliowka commented Mar 16, 2015

It's reaching 80 items/s.


@rmax rmax commented May 15, 2015

@aliowka How big are your items? Did you try using the LevelDB backend?
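
For reference, switching the cache backend is a settings change. A minimal sketch of a project `settings.py` fragment, assuming a Scrapy release that still ships the LevelDB storage and exposes it under the module path below (the path is an assumption: it differs between releases, and older versions used a `scrapy.contrib` prefix):

```python
# settings.py (fragment)

# Enable the HTTP cache middleware, as in the original report.
HTTPCACHE_ENABLED = True

# Assumed module path; check the documentation for the installed Scrapy
# version, since the storage classes have moved between releases.
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.LeveldbCacheStorage"
```

This only changes where cached responses are stored. It does not change the delay used in defer_succeed.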


@curita curita commented May 27, 2015

Closed by #1253

@curita curita closed this May 27, 2015