DFO vs BFO in Scrapy FAQ #1739

Closed
kmike opened this issue Jan 27, 2016 · 1 comment · Fixed by #3621

kmike (Member) commented Jan 27, 2016

@vincent-ferotin sent a message to scrapy-users a while ago (https://groups.google.com/forum/#!searchin/scrapy-users/Vincent$20F%C3%A9rotin/scrapy-users/n56O2sCAbp0/IbN8XGusAgAJ) and created a repo (https://github.com/vincent-ferotin/scraping-github) to demonstrate an issue: the FAQ (http://doc.scrapy.org/en/latest/faq.html#does-scrapy-crawl-in-breadth-first-or-depth-first-order) says Scrapy crawls in DFO by default, but in practice this is not what is observed.

It looks like the tricky part is the Downloader. By default Scrapy processes 16 requests in parallel. This means the Downloader asks the Scheduler for 16 requests; once the Downloader gets them, they are no longer handled by the Scheduler. The Downloader processes these requests in no particular order if there is enough concurrency per domain (8 by default), or uses a FIFO queue if there is not. FIFO means a BFO crawl, so for short request queues the order is BFO regardless of the Scheduler settings.
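The effect can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Scrapy's actual code: `TREE`, `crawl_order`, and the `concurrency` parameter are hypothetical names, the scheduler is modelled as a plain LIFO stack, and the downloader is assumed to drain each batch in FIFO order.

```python
# Toy model of a LIFO scheduler feeding a buffering downloader.
# Hypothetical illustration only; none of these names come from Scrapy.

# A small link graph: each page lists the pages it links to.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}


def crawl_order(concurrency):
    """Return the order in which pages are fetched.

    The scheduler is a LIFO stack (depth-first on its own). On each round
    the downloader takes up to `concurrency` requests off the stack and
    processes that batch in FIFO order.
    """
    stack = ["root"]
    order = []
    while stack:
        # The downloader pulls a batch; these requests leave the scheduler.
        batch = [stack.pop() for _ in range(min(concurrency, len(stack)))]
        for url in batch:
            order.append(url)
            # Links found on the page go back to the scheduler.
            stack.extend(TREE.get(url, []))
    return order


if __name__ == "__main__":
    print(crawl_order(1))   # one request at a time
    print(crawl_order(16))  # batch larger than the pending queue
```

With `concurrency=1` the model visits one branch completely before the other (depth-first). With `concurrency=16` every batch is larger than the pending queue, so the pages come out level by level even though the scheduler itself is still LIFO.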

I think we should at least clarify that in the docs.

See also: #1727, #1440, #1371.

nramirezuy (Contributor) commented

Yeah, the buffer is not clear from the docs; you need to read the code to notice it.

redapple added the docs label Jan 29, 2016
redapple changed the title from "DFO vs BFO ins Scrapy FAQ" to "DFO vs BFO in Scrapy FAQ" Sep 19, 2016