Refactor signals #8

Open
dangra opened this Issue Sep 9, 2011 · 16 comments

Comments

Projects
None yet
10 participants
@dangra
Member

dangra commented Sep 9, 2011

Refactor signal handling similar how Django did for 1.0.

According to them, the new approach brings up to 90% speed improvements

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Sep 7, 2012

Member

Signals got their own API on 0.15 (no longer need to import pydispatcher) but we can still rewrite the backend (which is what this issue is for) - it will actually be easier to do it now.

Member

pablohoffman commented Sep 7, 2012

Signals got their own API on 0.15 (no longer need to import pydispatcher) but we can still rewrite the backend (which is what this issue is for) - it will actually be easier to do it now.

@artemdevel

This comment has been minimized.

Show comment
Hide comment
@artemdevel

artemdevel Feb 20, 2013

Contributor

Perhaps we still need some refactoring here (or maybe I found a bug). I added some test code here https://gist.github.com/artem-dev/4996685 So at first I tried to use new API as shown in test5.py but this approach didn't work for me. I managed to make it work as shown in test5_2.py but as for me this is even less convenient than old approach which is shown in test5_1.py Did I use this new API wrong way or is it some sort of bug?

Contributor

artemdevel commented Feb 20, 2013

Perhaps we still need some refactoring here (or maybe I found a bug). I added some test code here https://gist.github.com/artem-dev/4996685 So at first I tried to use new API as shown in test5.py but this approach didn't work for me. I managed to make it work as shown in test5_2.py but as for me this is even less convenient than old approach which is shown in test5_1.py Did I use this new API wrong way or is it some sort of bug?

@artemdevel

This comment has been minimized.

Show comment
Hide comment
@artemdevel

artemdevel Feb 20, 2013

Contributor

Just after my previous post I decided to do one more small test, looks like this change https://gist.github.com/artem-dev/4996755 fixes the issue I described earlier, so sm = SignalManager() is working, but I'm not sure if using dispatcher.Any instead of dispatcher.Anonymous is ok for the rest of the engine (but my simple example worked fine)

Contributor

artemdevel commented Feb 20, 2013

Just after my previous post I decided to do one more small test, looks like this change https://gist.github.com/artem-dev/4996755 fixes the issue I described earlier, so sm = SignalManager() is working, but I'm not sure if using dispatcher.Any instead of dispatcher.Anonymous is ok for the rest of the engine (but my simple example worked fine)

@artemdevel

This comment has been minimized.

Show comment
Hide comment
@artemdevel

artemdevel Feb 27, 2013

Contributor

hi @pablohoffman this code https://gist.github.com/artem-dev/5046355 is working, is it a correct way to use the new SignalManager?

Contributor

artemdevel commented Feb 27, 2013

hi @pablohoffman this code https://gist.github.com/artem-dev/5046355 is working, is it a correct way to use the new SignalManager?

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Feb 27, 2013

Member

I'm pasting a comment I wrongly made in another issue:

@artem-dev you are instantiating a SignalManager in your own spider, that's not how it's supposed to work. You should be connecting to the SignalManager of the Crawler controlling your spider (accessible through crawler.signals).

User code should never instantiate a SignalManager, but use the one instantiated by framework code (and accessible through the Crawler object).

I hope this is more clear now.

Member

pablohoffman commented Feb 27, 2013

I'm pasting a comment I wrongly made in another issue:

@artem-dev you are instantiating a SignalManager in your own spider, that's not how it's supposed to work. You should be connecting to the SignalManager of the Crawler controlling your spider (accessible through crawler.signals).

User code should never instantiate a SignalManager, but use the one instantiated by framework code (and accessible through the Crawler object).

I hope this is more clear now.

@pablohoffman

This comment has been minimized.

Show comment
Hide comment
@pablohoffman

pablohoffman Feb 27, 2013

Member

That's right @artem-dev. However, doing it in parse would cause it to be called many times, so start_requests would be more appropriate.

Member

pablohoffman commented Feb 27, 2013

That's right @artem-dev. However, doing it in parse would cause it to be called many times, so start_requests would be more appropriate.

@artemdevel

This comment has been minimized.

Show comment
Hide comment
@artemdevel

artemdevel Feb 27, 2013

Contributor

yeah, it was just for a test

Contributor

artemdevel commented Feb 27, 2013

yeah, it was just for a test

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Jul 14, 2014

Member

For the reference, django's implementation: https://github.com/django/django/tree/master/django/dispatch
One advantage of switching to django implementation is that is supports Python 3.x. But I haven't checked if it is possible to switch.

Member

kmike commented Jul 14, 2014

For the reference, django's implementation: https://github.com/django/django/tree/master/django/dispatch
One advantage of switching to django implementation is that is supports Python 3.x. But I haven't checked if it is possible to switch.

jtwaleson pushed a commit to jtwaleson/scrapy that referenced this issue Nov 23, 2014

@jdemaeyer

This comment has been minimized.

Show comment
Hide comment
@jdemaeyer

jdemaeyer Mar 27, 2015

Contributor

Here is a partial proof of concept for switching the backend to django.dispatch while conserving support for the old signaling API: http://jakobdemaeyer.com/x/scrapy_signaling_proof.html

In this implementation, the SignalManager is deprecated and replaced by a NewSignalManager whose sole purpose is backwards-compatibility. The major open question is how to assign the Crawler instance as default sender for a signal, as the signals are now essentially completely decoupled from anything else.

A different implementation could preserve using the SignalManager by solely rewriting the backend, using the NewSignalManager implementation without warnings. Signals would then still be emitted (and handlers registered) via the Crawler.signals attribute instead of on the Signals themselves.

Contributor

jdemaeyer commented Mar 27, 2015

Here is a partial proof of concept for switching the backend to django.dispatch while conserving support for the old signaling API: http://jakobdemaeyer.com/x/scrapy_signaling_proof.html

In this implementation, the SignalManager is deprecated and replaced by a NewSignalManager whose sole purpose is backwards-compatibility. The major open question is how to assign the Crawler instance as default sender for a signal, as the signals are now essentially completely decoupled from anything else.

A different implementation could preserve using the SignalManager by solely rewriting the backend, using the NewSignalManager implementation without warnings. Signals would then still be emitted (and handlers registered) via the Crawler.signals attribute instead of on the Signals themselves.

@jdemaeyer

This comment has been minimized.

Show comment
Hide comment
@jdemaeyer

jdemaeyer Apr 12, 2015

Contributor

Here's a list of signaling frameworks that might be of interest: http://www.fortpedro.com/blog/2015/1/observer-pattern-event-oriented-programming-in-python

Contributor

jdemaeyer commented Apr 12, 2015

Here's a list of signaling frameworks that might be of interest: http://www.fortpedro.com/blog/2015/1/observer-pattern-event-oriented-programming-in-python

@kmike

This comment has been minimized.

Show comment
Hide comment
Member

kmike commented Apr 12, 2015

@eLRuLL

This comment has been minimized.

Show comment
Hide comment
@eLRuLL

eLRuLL May 4, 2015

Contributor

I would like to work on this issue 😄

Contributor

eLRuLL commented May 4, 2015

I would like to work on this issue 😄

@curita

This comment has been minimized.

Show comment
Hide comment
@curita

curita May 5, 2015

Member

If anybody is currently working on this issue let us know, otherwise @eLRuLL: all yours!

Member

curita commented May 5, 2015

If anybody is currently working on this issue let us know, otherwise @eLRuLL: all yours!

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Jun 15, 2015

Member

I've profiled a simple spider which downloads a page and follows all links from it using LinkExractor. There is a non-standard CrawlerProcess which listens to all signals from its Crawlers and re-sends them through its own SignalManager to provide a single place to listen for all signals - nothing complicated. I haven't checked it with a standard CrawlerProcess.

For this spider signal handling (SignalManager.send_catch_log function) is one of the bottlenecks - it takes ~5 times more time than HTML parsing.

Member

kmike commented Jun 15, 2015

I've profiled a simple spider which downloads a page and follows all links from it using LinkExractor. There is a non-standard CrawlerProcess which listens to all signals from its Crawlers and re-sends them through its own SignalManager to provide a single place to listen for all signals - nothing complicated. I haven't checked it with a standard CrawlerProcess.

For this spider signal handling (SignalManager.send_catch_log function) is one of the bottlenecks - it takes ~5 times more time than HTML parsing.

@redapple

This comment has been minimized.

Show comment
Hide comment
@redapple

redapple Sep 13, 2016

Contributor

See #2030

Contributor

redapple commented Sep 13, 2016

See #2030

@tifcty

This comment has been minimized.

Show comment
Hide comment
@tifcty

tifcty Sep 19, 2016

thanks to all of them

tifcty commented Sep 19, 2016

thanks to all of them

@redapple redapple added this to the v1.3 milestone Oct 28, 2016

@redapple redapple modified the milestones: v1.4, v1.3 Nov 23, 2016

@kmike kmike removed this from the v1.5 milestone Dec 22, 2017

@cathalgarvey cathalgarvey added stale and removed stale labels Feb 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment