Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] New signal for reqeuests reached downloader #3393

Conversation

whalebot-helmsman
Copy link
Contributor

New signal added with documentation to be emitted for requests reached downloader. It is done to provide better integration between scheduler and downloader without adding much complexity, based on previous discussions: #2435 #2474

Signal name is discussable. Documentation can be improved.

@codecov
Copy link

codecov bot commented Aug 17, 2018

Codecov Report

Merging #3393 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3393      +/-   ##
==========================================
+ Coverage   84.46%   84.47%   +<.01%     
==========================================
  Files         167      167              
  Lines        9362     9366       +4     
  Branches     1390     1390              
==========================================
+ Hits         7908     7912       +4     
  Misses       1199     1199              
  Partials      255      255
Impacted Files Coverage Δ
scrapy/signals.py 100% <100%> (ø) ⬆️
scrapy/core/downloader/__init__.py 88.97% <100%> (+0.08%) ⬆️
scrapy/spidermiddlewares/offsite.py 100% <0%> (ø) ⬆️

@whalebot-helmsman
Copy link
Contributor Author

Made a performance comparison with new signal and without one. Full log available https://gist.github.com/whalebot-helmsman/427908453e28e4d91ba24062a3a1aa05

==> /home/hotdox/exc/master <==
2018-08-28 09:11:37 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)

The results of the benchmark are (all speeds in items/sec) : 


Test = 'Book Spider' Iterations = '10'


Mean : 86.57660125097868 Median : 86.4017169447081 Std Dev : 0.8022010534209844


==> /home/hotdox/exc/branch <==
2018-08-28 09:09:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)

The results of the benchmark are (all speeds in items/sec) : 


Test = 'Book Spider' Iterations = '10'


Mean : 86.94572398605807 Median : 86.71087801245771 Std Dev : 0.9639990239084966

No substantial impact on performance

.. signal:: request_reached_downloader
.. function:: request_reached_downloader(request, spider)

Sent when a :class:`~scrapy.http.Request`, reached downloader.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: comma is not needed here

@kmike kmike changed the title New signal for reqeuests reached downloader [MRG+1] New signal for reqeuests reached downloader Aug 28, 2018
@kmike kmike added this to the v1.6 milestone Aug 28, 2018
@kmike
Copy link
Member

kmike commented Aug 28, 2018

👍 I think that's a good feature which may be used not only for custom schedulers; overhead doesn't seem significant. Signal name and the implementation look fine.

@dangra dangra merged commit 4da0b59 into scrapy:master Sep 5, 2018
@kmike kmike mentioned this pull request Sep 24, 2018
@whalebot-helmsman whalebot-helmsman deleted the singal-request-added-to-downloader-slot branch July 18, 2019 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants