Adding Round Robin Queue #21

Merged
merged 7 commits into from
Mar 8, 2018

Contributor

tianhuil commented Feb 25, 2018

This queue (with tests) addresses the issues raised in scrapy/scrapy#2474 and scrapy/scrapy#1802.

I would like a domain scheduler that scrapes in a domain-aware way, by cycling through the domains round-robin. This has two benefits:

  1. Spreading the load across target servers instead of hitting one server with many requests at once.
  2. Reducing delays caused by server-overloaded errors or CONCURRENT_REQUESTS_PER_IP-style restrictions.

This implements the solution proposed in scrapy/scrapy#1802. I would like to merge the round-robin queue first, and then merge the domain scheduler changes into scrapy.
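
To make the round-robin idea concrete, here is a minimal sketch of cycling through domains. It is an illustration only, not the code in this PR: the class name `DomainRoundRobin` and its `push`/`pop` methods are hypothetical, and it keeps everything in memory and keys on the URL's network location.

```python
from collections import deque
from urllib.parse import urlparse


class DomainRoundRobin:
    """Hypothetical in-memory sketch; not the queuelib implementation."""

    def __init__(self):
        self._queues = {}      # domain -> deque of pending URLs
        self._order = deque()  # domains that still have pending URLs

    def push(self, url):
        domain = urlparse(url).netloc
        if domain not in self._queues:
            # First pending URL for this domain: add it to the rotation.
            self._queues[domain] = deque()
            self._order.append(domain)
        self._queues[domain].append(url)

    def pop(self):
        if not self._order:
            return None
        # Take the next domain in the rotation and serve its oldest URL.
        domain = self._order.popleft()
        queue = self._queues[domain]
        url = queue.popleft()
        if queue:
            # Domain still has work: send it to the back of the rotation.
            self._order.append(domain)
        else:
            del self._queues[domain]
        return url

    def __len__(self):
        return sum(len(q) for q in self._queues.values())


rr = DomainRoundRobin()
for u in ["http://a.com/1", "http://a.com/2", "http://a.com/3",
          "http://b.com/1", "http://c.com/1"]:
    rr.push(u)

popped = [rr.pop() for _ in range(5)]
print(popped)
```

With three URLs queued for `a.com` and one each for `b.com` and `c.com`, the pops alternate between domains while more than one domain has pending work, so no single server receives a burst of consecutive requests.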


codecov bot commented Feb 25, 2018

Codecov Report

Merging #21 into master will decrease coverage by 0.99%.
The diff coverage is 92.3%.

@@           Coverage Diff           @@
##           master      #21   +/-   ##
=======================================
- Coverage   98.52%   97.53%   -1%     
=======================================
  Files           3        4    +1     
  Lines         204      243   +39     
  Branches       26       34    +8     
=======================================
+ Hits          201      237   +36     
- Misses          1        2    +1     
- Partials        2        4    +2

| Impacted Files | Coverage Δ |
|---|---|
| queuelib/__init__.py | 100% <100%> (ø) ⬆️ |
| queuelib/rrqueue.py | 92.1% <92.1%> (ø) |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 06439b7...b608deb.

.gitignore Outdated
@@ -0,0 +1,101 @@
# Byte-compiled / optimized / DLL files
Member

Please remove this file. Files of this kind are part of the developer's environment; if we have something project-specific to ignore, we add it here.

Member

dangra commented Feb 28, 2018

I like the idea and implementation. Let's merge as soon as my .gitignore comment is sorted out.

Contributor Author

tianhuil commented Mar 1, 2018

Thanks @dangra: I removed the .gitignore. Let me know if you'd like me to do anything else!

Member

dangra commented Mar 1, 2018

@cathalgarvey LGTM to merge and release, but we need to fix the travis-ci build, which seems broken due to a missing pypy binary.

@dangra dangra merged commit e013af8 into scrapy:master Mar 8, 2018
Member

dangra commented Mar 8, 2018

The broken travis-ci builds are addressed by #22.
