DFO vs BFO in Scrapy FAQ #1739

Closed
kmike opened this issue Jan 27, 2016 · 1 comment

Comments

@kmike (Member) commented Jan 27, 2016

@vincent-ferotin sent a message to scrapy-users a while ago (https://groups.google.com/forum/#!searchin/scrapy-users/Vincent$20F%C3%A9rotin/scrapy-users/n56O2sCAbp0/IbN8XGusAgAJ) and created a repo (https://github.com/vincent-ferotin/scraping-github) to demonstrate an issue: http://doc.scrapy.org/en/latest/faq.html#does-scrapy-crawl-in-breadth-first-or-depth-first-order says Scrapy crawls in depth-first order (DFO) by default, but in practice this is not what is observed.

It looks like the tricky part is the Downloader. By default Scrapy processes 16 requests in parallel. This means the Downloader asks the Scheduler for 16 requests; once the Downloader gets them, they are no longer handled by the Scheduler. The Downloader processes these requests in no particular order if there is enough concurrency per domain (8 by default), or uses a FIFO queue if there is not. FIFO means a breadth-first order (BFO) crawl, so for short request queues the order is BFO regardless of Scheduler settings.
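To make the effect concrete, here is a minimal toy model in plain Python. It is not Scrapy's code: the function `crawl_order`, the `slots` parameter, and the link graph `TREE` are hypothetical names invented for illustration, and the model assumes the Downloader drains its whole buffer in FIFO order before asking the Scheduler for more. Under those assumptions, a LIFO Scheduler gives a depth-first crawl only when requests are taken one at a time.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
}


def crawl_order(tree, root, slots):
    """Return the visit order for a LIFO scheduler feeding a FIFO buffer.

    `slots` stands in for the number of requests the downloader takes
    from the scheduler at once (an assumption of this toy model).
    """
    scheduler = [root]  # LIFO stack, i.e. a depth-first scheduler
    visited = []
    while scheduler:
        # The downloader takes up to `slots` requests; from here on the
        # scheduler no longer controls their order.
        batch = deque()
        while scheduler and len(batch) < slots:
            batch.append(scheduler.pop())
        # The buffered requests are processed first-in, first-out.
        while batch:
            page = batch.popleft()
            visited.append(page)
            # Push links in reverse so the first link is popped first.
            scheduler.extend(reversed(tree.get(page, [])))
    return visited


print(crawl_order(TREE, "A", slots=1))   # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
print(crawl_order(TREE, "A", slots=16))  # ['A', 'B', 'C', 'F', 'G', 'D', 'E']
```

With `slots=1` the order is depth-first. With `slots=16` every page at one depth is visited before any page at the next depth, which is the breadth-first behaviour described above for short request queues.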

I think we should at least clarify that in the docs.

See also: #1727, #1440, #1371.

@nramirezuy (Contributor) commented Jan 27, 2016

Yeah, the buffer is not clear from the docs; you need to read the code to notice it.

@redapple added the docs label Jan 29, 2016
@redapple changed the title from "DFO vs BFO ins Scrapy FAQ" to "DFO vs BFO in Scrapy FAQ" Sep 19, 2016
@kmike closed this in #3621 Jul 8, 2019