From bebcd5081ce50be17434d8815e9037130d709f6e Mon Sep 17 00:00:00 2001
From: Jose Ricardo
Date: Tue, 18 Oct 2016 11:22:55 -0200
Subject: [PATCH 1/3] Add downloader middleware ordering details to the docs

Add more details, making it easier to understand what are the effects
of setting a downloader middleware order.
---
 docs/topics/downloader-middleware.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/topics/downloader-middleware.rst b/docs/topics/downloader-middleware.rst
index 31545d548d3..15069e56ec5 100644
--- a/docs/topics/downloader-middleware.rst
+++ b/docs/topics/downloader-middleware.rst
@@ -27,7 +27,11 @@ The :setting:`DOWNLOADER_MIDDLEWARES` setting is merged with the
 :setting:`DOWNLOADER_MIDDLEWARES_BASE` setting defined in Scrapy (and not meant to
 be overridden) and then sorted by order to get the final sorted list of enabled
 middlewares: the first middleware is the one closer to the engine and
-the last is the one closer to the downloader.
+the last is the one closer to the downloader. In other words,
+the :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_request`
+method of each middleware will be invoked in increasing
+middleware order (100, 200, 300, ...) and the :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_response` method
+of each middleware will be invoked in decreasing order.
 
 To decide which order to assign to your middleware see the
 :setting:`DOWNLOADER_MIDDLEWARES_BASE` setting and pick a value according to

From ea7bd39529347af3b8e30c0588428ac442151673 Mon Sep 17 00:00:00 2001
From: Jose Ricardo
Date: Tue, 18 Oct 2016 11:48:58 -0200
Subject: [PATCH 2/3] Make architecture overview references a little more clear on the docs

Expliciting what actually happens by adding links to the respective methods
that are invoked in each processing phase.
---
 docs/topics/architecture.rst | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/docs/topics/architecture.rst b/docs/topics/architecture.rst
index 91c80acc0c8..ea0cb0ea77f 100644
--- a/docs/topics/architecture.rst
+++ b/docs/topics/architecture.rst
@@ -41,25 +41,26 @@ this:
 
 4. The :ref:`Engine ` sends the Requests to the
    :ref:`Downloader `, passing through the
-   :ref:`Downloader Middleware `
-   (requests direction).
+   :ref:`Downloader Middlewares ` (see
+   :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_request`).
 
 5. Once the page finishes downloading the
    :ref:`Downloader ` generates a Response (with
    that page) and sends it to the Engine, passing through the
-   :ref:`Downloader Middleware `
-   (response direction).
+   :ref:`Downloader Middlewares ` (see
+   :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_response`).
 
 6. The :ref:`Engine ` receives the Response from the
    :ref:`Downloader ` and sends it to the
    :ref:`Spider ` for processing, passing
-   through the :ref:`Spider Middleware `
-   (input direction).
+   through the :ref:`Spider Middleware ` (see
+   :meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_spider_input`).
 
 7. The :ref:`Spider ` processes the Response and returns
    scraped items and new Requests (to follow) to the
    :ref:`Engine `, passing through the
-   :ref:`Spider Middleware ` (output direction).
+   :ref:`Spider Middleware ` (see
+   :meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_spider_output`).
 
 8. The :ref:`Engine ` sends processed items to
    :ref:`Item Pipelines `, then send processed Requests to

From e12e364a40e951ffa76ef15b3f24fa64abb0f1bb Mon Sep 17 00:00:00 2001
From: Jose Ricardo
Date: Tue, 18 Oct 2016 12:29:30 -0200
Subject: [PATCH 3/3] Add details to the spider middlewares docs

Document the effects of the middleware order in a more detailed way.
---
 docs/topics/spider-middleware.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/topics/spider-middleware.rst b/docs/topics/spider-middleware.rst
index a38c1ab6555..604f1864f73 100644
--- a/docs/topics/spider-middleware.rst
+++ b/docs/topics/spider-middleware.rst
@@ -28,7 +28,12 @@ The :setting:`SPIDER_MIDDLEWARES` setting is merged with the
 :setting:`SPIDER_MIDDLEWARES_BASE` setting defined in Scrapy (and not meant to
 be overridden) and then sorted by order to get the final sorted list of enabled
 middlewares: the first middleware is the one closer to the engine and the last
-is the one closer to the spider.
+is the one closer to the spider. In other words,
+the :meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_spider_input`
+method of each middleware will be invoked in increasing
+middleware order (100, 200, 300, ...), and the
+:meth:`~scrapy.spidermiddlewares.SpiderMiddleware.process_spider_output` method
+of each middleware will be invoked in decreasing order.
 
 To decide which order to assign to your middleware see the
 :setting:`SPIDER_MIDDLEWARES_BASE` setting and pick a value according to where