[MRG+1] 0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough #1075

Closed
wants to merge 1 commit

Conversation


@aliowka aliowka commented Mar 16, 2015

This will make Scrapy more async by letting the reactor spend extra cycles to accomplish its needs.
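For reference, a sketch of the one-line change this PR proposes in scrapy/utils/defer.py, reconstructed from the title and the discussion below (the exact docstring may differ; the 0.1s value is the one later acknowledged in this thread as arbitrary):

```python
from twisted.internet import defer, reactor

def defer_succeed(result):
    """Like twisted.internet.defer.succeed, but fire the callback from a
    later reactor iteration so other pending work gets a chance to run."""
    d = defer.Deferred()
    # Proposed change: a small positive delay instead of callLater(0, ...)
    reactor.callLater(0.1, d.callback, result)
    return d
```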


@aliowka aliowka commented on 39aefca Mar 16, 2015

This fixes issue #1074

@aliowka aliowka changed the title from "The fix of issue #1074" to "0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough. Fix of issue #1074" on Mar 16, 2015
@aliowka aliowka changed the title from "0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough. Fix of issue #1074" to "0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough" on Mar 16, 2015
@pablohoffman pablohoffman (Member) commented Mar 17, 2015

do you have a particular real-world case where this change produces a clear improvement?

@aliowka aliowka (Author) commented Mar 17, 2015

Yes, sure. Please see issue #1074.
I need to store the whole HTML of a site with the httpcache middleware, to run/rerun different tasks from the cache without actually querying the web site. As I described in the issue, when using the httpcache middleware Scrapy effectively becomes synchronous (a 0 delay is not enough), so I can't use the reactor for storing the items to a queue. I can't use my txamqp-based RabbitMQ client, https://github.com/aliowka/txamqpr.git; it just blows up in memory.
Instead of 70M (without cache) it takes 4G (with cache).
A 0 delay implies a single reactor cycle, which is just not enough for the client to actually send the items, so the items remain in memory as deferred callbacks and their number only grows.
I suppose the same problem will hit everyone who wants to use the reactor for extra tasks.
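To illustrate the failure mode, here is a hypothetical pipeline sketch (publish_item is a stand-in for the client's publish call, not txamqpr's real API):

```python
from twisted.internet import defer, reactor

def publish_item(item):
    # Stand-in for the AMQP publish: the Deferred only fires once the
    # reactor gets around to running its delayed calls and doing I/O.
    d = defer.Deferred()
    reactor.callLater(0, d.callback, item)
    return d

class QueuePipeline:
    """Hypothetical item pipeline that hands every item to a queue client."""

    def __init__(self):
        self.pending = set()  # Deferreds for items not yet flushed

    def process_item(self, item, spider):
        d = publish_item(item)
        self.pending.add(d)

        def _sent(result, d=d):
            self.pending.discard(d)
            return result

        # If the reactor never gets spare cycles for the client's I/O,
        # these Deferreds pile up and every unsent item stays in memory.
        d.addBoth(_sent)
        return item
```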

@nramirezuy nramirezuy (Contributor) commented Mar 17, 2015

@aliowka Can we reproduce it with some simple code?

I tried here but I wasn't able to.

@dangra dangra (Member) commented May 15, 2015

+1 to merge.

The original intention of defer_succeed() and defer_fail() was to provide enough reactor cycles for other tasks to run; there is no point in using callLater() otherwise. I think the bug and its fix are valid, even if 0.1 is as arbitrary a value as 0 was.
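For context, the companion helper he names looks roughly like this at the time, with the zero delay this PR targets (a sketch reconstructed from the discussion, not a verbatim copy of scrapy/utils/defer.py):

```python
from twisted.internet import defer, reactor

def defer_fail(failure):
    """Like twisted.internet.defer.fail, but fire the errback from a later
    reactor iteration so other delayed calls get a chance to run."""
    d = defer.Deferred()
    reactor.callLater(0, d.errback, failure)
    return d
```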

@dangra dangra (Member) commented May 15, 2015

The background on this issue is that Twisted processes timed calls before file descriptors. A zero delay means the reactor will process all delayed calls (pending ones, plus new ones generated while processing the existing calls) immediately after it reaches the reactor loop, for 1 second max; only then does it attend to file descriptors, and the next iteration will again be busy for 1 second handling delayed calls.

Scrapy generates a lot of delayed calls, and it is not rare to add more delayed calls from within delayed calls, so the I/O loop is not handling real I/O most of the time.
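A minimal demo of this starvation pattern (hypothetical, not from the PR):

```python
from twisted.internet import reactor

def requeue():
    # With delay=0 this call is due again the moment the reactor returns
    # to its loop, so timed-call processing dominates every iteration and
    # file descriptors get little attention.
    reactor.callLater(0, requeue)

reactor.callLater(0, requeue)
reactor.callLater(5, reactor.stop)  # end the demo after 5 seconds
reactor.run()
# With a small positive delay (0.1s in this PR) the reactor would instead
# sleep in select/epoll until the call is due, servicing I/O meanwhile.
```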

@dangra dangra changed the title from "0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough" to "[MRG+1] 0 delay inside utils/defer.py->defer_succeed makes reactor not to be async enough" on May 21, 2015
@curita curita (Member) commented May 27, 2015

Closed by #1253

@curita curita closed this May 27, 2015