DFO vs BFO in Scrapy FAQ #1739

kmike · 2016-01-27T18:09:39Z

@vincent-ferotin sent a message to scrapy-users a while ago (https://groups.google.com/forum/#!searchin/scrapy-users/Vincent$20F%C3%A9rotin/scrapy-users/n56O2sCAbp0/IbN8XGusAgAJ) and created a repo (https://github.com/vincent-ferotin/scraping-github) to demonstrate an issue: http://doc.scrapy.org/en/latest/faq.html#does-scrapy-crawl-in-breadth-first-or-depth-first-order says Scrapy crawls in DFO by default, but that is not what is observed in practice.

It looks like the tricky part is the Downloader. By default Scrapy processes 16 requests in parallel. This means the Downloader asks the Scheduler for 16 requests; once the Downloader gets them, they are no longer handled by the Scheduler. The Downloader processes these requests in no particular order if there is enough concurrency per domain (8 by default), or uses a FIFO queue if there is not. FIFO means a BFO crawl, so for short request queues the order is BFO regardless of Scheduler settings.
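To illustrate the effect, here is a toy model, not Scrapy's actual code. It assumes a LIFO scheduler (as for a DFO crawl) and a downloader that pulls up to `concurrency` requests at once and handles them first-in, first-out. The `LINKS` graph and the `crawl_order` function are hypothetical names made up for this sketch; the real engine refills download slots continuously rather than in discrete batches.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}


def crawl_order(start, concurrency):
    """Return the order in which pages are processed in this toy model."""
    scheduler = [start]  # LIFO stack, which on its own would give DFO
    visited = []
    while scheduler:
        # The downloader takes up to `concurrency` requests from the
        # scheduler; from here on the scheduler no longer orders them.
        batch = deque()
        while scheduler and len(batch) < concurrency:
            batch.append(scheduler.pop())
        # Assumption of this sketch: the buffered requests are handled FIFO.
        while batch:
            page = batch.popleft()
            visited.append(page)
            # Push links reversed so the first link is popped first.
            scheduler.extend(reversed(LINKS[page]))
    return visited


if __name__ == "__main__":
    print(crawl_order("root", concurrency=1))
    print(crawl_order("root", concurrency=16))
```

With `concurrency=1` the model visits `root, a, a1, a2, b, b1, b2`, which is depth-first. With `concurrency=16` the whole queue fits in the downloader buffer, so pages are visited level by level (`root`, then `a` and `b`, then the four leaves), even though the scheduler itself is still LIFO.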

I think we should at least clarify that in docs.

See also: #1727, #1440, #1371.

nramirezuy · 2016-01-27T19:35:59Z

Yeah, the buffer is not clear from the docs; you need to read the code to notice it.

redapple added the docs label Jan 29, 2016

redapple changed the title ~~DFO vs BFO ins Scrapy FAQ~~ DFO vs BFO in Scrapy FAQ Sep 19, 2016

redapple added the help wanted label Sep 19, 2016

Gallaecio mentioned this issue Feb 12, 2019

Document that the crawl order is BFO for small numbers of start requests #3621

Merged

kmike closed this as completed in #3621 Jul 8, 2019
