@vincent-ferotin sent a message to scrapy-users a while ago (https://groups.google.com/forum/#!searchin/scrapy-users/Vincent$20F%C3%A9rotin/scrapy-users/n56O2sCAbp0/IbN8XGusAgAJ) and created a repo (https://github.com/vincent-ferotin/scraping-github) to demonstrate an issue: http://doc.scrapy.org/en/latest/faq.html#does-scrapy-crawl-in-breadth-first-or-depth-first-order says Scrapy crawls in DFO order by default, but in practice this is not what is observed.
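For context, the FAQ's documented way to switch the Scheduler to breadth-first order is via the queue settings. A sketch of what I mean (setting names as I understand them from the docs; exact queue class paths may differ between Scrapy versions):

```python
# settings.py -- switch the Scheduler from its default LIFO queues
# (intended DFO) to FIFO queues (BFO), per the Scrapy FAQ.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'
```

The point of this issue is that even without these settings, the observed order often isn't DFO.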
It looks like the tricky part is the Downloader. By default Scrapy processes 16 requests in parallel, which means the Downloader asks the Scheduler for up to 16 requests; once the Downloader gets them, they are no longer handled by the Scheduler. The Downloader processes these requests in no particular order when there is enough concurrency per domain (8 by default), or through a FIFO queue when there is not. FIFO means a BFO crawl, so for short request queues the order is BFO regardless of the Scheduler settings.
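To make the interaction concrete, here is a toy model (not Scrapy's actual code) of a LIFO Scheduler feeding a Downloader whose slot queue is FIFO. With the default concurrency of 16, a short scheduler queue is drained immediately and the FIFO slot turns the intended DFO into level-by-level BFO; with concurrency 1, DFO survives:

```python
from collections import deque

CONCURRENT_REQUESTS = 16  # Scrapy's default

def crawl_order(pages, concurrency=CONCURRENT_REQUESTS):
    """Toy model: the Scheduler is a LIFO stack (DFO intent), but the
    Downloader eagerly pulls up to `concurrency` requests from it and
    processes its own queue in FIFO order."""
    scheduler = ["start"]   # LIFO: pop() takes the newest request
    downloader = deque()    # FIFO slot queue
    order = []
    while scheduler or downloader:
        # The Downloader asks the Scheduler for requests until it is
        # full; once pulled, they are out of the Scheduler's hands.
        while scheduler and len(downloader) < concurrency:
            downloader.append(scheduler.pop())
        url = downloader.popleft()          # FIFO processing
        order.append(url)
        for link in pages.get(url, []):     # discovered links go back
            scheduler.append(link)          # to the Scheduler
    return order

# A hypothetical two-level site with two branches:
pages = {"start": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
print(crawl_order(pages))                 # ['start', 'b', 'a', 'b2', 'b1', 'a2', 'a1']
print(crawl_order(pages, concurrency=1))  # ['start', 'b', 'b2', 'b1', 'a', 'a2', 'a1']
```

With concurrency 16, every depth-1 page is fetched before any depth-2 page (BFO), even though the Scheduler is LIFO; only with concurrency 1 does the LIFO ordering, i.e. DFO, show through.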
I think we should at least clarify that in the docs.
See also: #1727, #1440, #1371.