[MRG+1] Backwards compatible per key priorities #1586

Merged
24 changes: 13 additions & 11 deletions docs/topics/downloader-middleware.rst
@@ -23,20 +23,22 @@ Here's an example::
'myproject.middlewares.CustomDownloaderMiddleware': 543,
}

The specified :setting:`DOWNLOADER_MIDDLEWARES` setting is merged with the
default one (i.e. it does not overwrite it) and then sorted by order to get the
final sorted list of enabled middlewares: the first middleware is the one
closer to the engine and the last is the one closer to the downloader.

To decide which order to assign to your middleware see the default
:setting:`DOWNLOADER_MIDDLEWARES` setting and pick a value according to
The :setting:`DOWNLOADER_MIDDLEWARES` setting is merged with the
:setting:`DOWNLOADER_MIDDLEWARES_BASE` setting defined in Scrapy (and not meant
to be overridden) and then sorted by order to get the final sorted list of
enabled middlewares: the first middleware is the one closer to the engine and
the last is the one closer to the downloader.

To decide which order to assign to your middleware see the
:setting:`DOWNLOADER_MIDDLEWARES_BASE` setting and pick a value according to
where you want to insert the middleware. The order does matter because each
middleware performs a different action and your middleware could depend on some
previous (or subsequent) middleware being applied.
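
The merge-and-sort behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Scrapy's actual implementation: the helper name ``merge_and_sort`` is hypothetical, and the order given to the user-agent middleware in ``base`` is a made-up value for the example.

```python
def merge_and_sort(base, custom):
    """Merge two {path: order} dicts and return enabled paths sorted by order.

    Entries whose order is None are treated as disabled and dropped.
    """
    merged = dict(base)
    merged.update(custom)  # project values take precedence over built-in ones
    enabled = {path: order for path, order in merged.items() if order is not None}
    return [path for path, _ in sorted(enabled.items(), key=lambda kv: kv[1])]


# Hypothetical stand-in for DOWNLOADER_MIDDLEWARES_BASE (orders are illustrative).
base = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 400,
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
}
# Project-level DOWNLOADER_MIDDLEWARES: adds one middleware, disables another.
custom = {
    'myproject.middlewares.CustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}

ordered = merge_and_sort(base, custom)
print(ordered)
```

In this sketch the first element of ``ordered`` would sit closest to the engine and the last closest to the downloader.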

If you want to disable a built-in middleware you must define it in your
project's :setting:`DOWNLOADER_MIDDLEWARES` setting and assign ``None`` as its
value. For example, if you want to disable the user-agent middleware::
If you want to disable a built-in middleware (the ones defined in
:setting:`DOWNLOADER_MIDDLEWARES_BASE` and enabled by default) you must define it
in your project's :setting:`DOWNLOADER_MIDDLEWARES` setting and assign ``None``
as its value. For example, if you want to disable the user-agent middleware::

DOWNLOADER_MIDDLEWARES = {
'myproject.middlewares.CustomDownloaderMiddleware': 543,
@@ -162,7 +164,7 @@ middleware, see the :ref:`downloader middleware usage guide
<topics-downloader-middleware>`.

For a list of the components enabled by default (and their orders) see the
:setting:`DOWNLOADER_MIDDLEWARES` setting.
:setting:`DOWNLOADER_MIDDLEWARES_BASE` setting.

.. _cookies-mw:

13 changes: 7 additions & 6 deletions docs/topics/extensions.rst
@@ -42,13 +42,14 @@ by a string: the full Python path to the extension's class name. For example::

As you can see, the :setting:`EXTENSIONS` setting is a dict where the keys are
the extension paths, and their values are the orders, which define the
extension *loading* order. The specified :setting:`EXTENSIONS` setting is merged
with the default one (i.e. it does not overwrite it) and then sorted by order
to get the final sorted list of enabled extensions.
extension *loading* order. The :setting:`EXTENSIONS` setting is merged with the
:setting:`EXTENSIONS_BASE` setting defined in Scrapy (and not meant to be
overridden) and then sorted by order to get the final sorted list of enabled
extensions.

As extensions typically do not depend on each other, their loading order is
irrelevant in most cases. This is why the default :setting:`EXTENSIONS` setting
defines all extensions with the same order (``500``). However, this feature can
irrelevant in most cases. This is why the :setting:`EXTENSIONS_BASE` setting
defines all extensions with the same order (``0``). However, this feature can
be exploited if you need to add an extension which depends on other extensions
already loaded.
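
As a minimal sketch of that last point: assuming a hypothetical extension ``myproject.extensions.ReportAfterStats`` that relies on the built-in extensions (all at order ``0``) being loaded first, giving it a higher order places it after them in the sorted list. The class path and the value ``100`` are illustrative assumptions.

```python
# settings.py (sketch): the extension path below is hypothetical.
EXTENSIONS = {
    # Built-in extensions use order 0, so any larger value loads this one later.
    'myproject.extensions.ReportAfterStats': 100,
}
```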

@@ -63,7 +64,7 @@ Disabling an extension
======================

In order to disable an extension that comes enabled by default (i.e. those
included in the default :setting:`EXTENSIONS` setting) you must set its order to
included in the :setting:`EXTENSIONS_BASE` setting) you must set its order to
``None``. For example::

EXTENSIONS = {
47 changes: 34 additions & 13 deletions docs/topics/feed-exports.rst
@@ -265,6 +265,16 @@ Whether to export empty feeds (ie. feeds with no items).
FEED_STORAGES
-------------

Default: ``{}``

A dict containing additional feed storage backends supported by your project.
The keys are URI schemes and the values are paths to storage classes.

.. setting:: FEED_STORAGES_BASE

FEED_STORAGES_BASE
------------------

Default::

{
@@ -275,19 +285,30 @@ Default::
'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
}

A dict containing all feed storage backends supported by your project. The keys
are URI schemes and the values are paths to storage classes.
A dict containing the built-in feed storage backends supported by Scrapy. You
can disable any of these backends by assigning ``None`` to their URI scheme in
:setting:`FEED_STORAGES`. E.g., to disable the built-in FTP storage backend
(without replacement), place this in your ``settings.py``::

When you set :setting:`FEED_STORAGES` manually, e.g. in your project's settings
module, it will be merged with the default, not overwrite it. If you want to
disable any of the default feed storage backends, you must assign ``None`` as
their value.
FEED_STORAGES = {
'ftp': None,
}
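
How a feed URI might be resolved against the merged settings can be sketched as follows. This is an illustration, not Scrapy's code: the helper names ``enabled_storages`` and ``storage_for`` and the ``sftp`` backend path are hypothetical, and ``base`` contains only the FTP entry shown above.

```python
from urllib.parse import urlparse


def enabled_storages(base, custom):
    """Merge built-in and project backends, dropping schemes set to None."""
    merged = {**base, **custom}
    return {scheme: path for scheme, path in merged.items() if path is not None}


def storage_for(uri, storages):
    """Return the storage class path for a feed URI, or None if unsupported."""
    return storages.get(urlparse(uri).scheme)


# Partial stand-in for FEED_STORAGES_BASE.
base = {
    'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
}
# Project-level FEED_STORAGES: disable FTP, add a hypothetical SFTP backend.
custom = {
    'ftp': None,
    'sftp': 'myproject.storages.SFTPFeedStorage',
}

storages = enabled_storages(base, custom)
print(storage_for('ftp://example.com/items.json', storages))
print(storage_for('sftp://example.com/items.json', storages))
```

With FTP disabled, the first lookup finds no backend, while the project-defined scheme resolves to its class path.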

.. setting:: FEED_EXPORTERS

FEED_EXPORTERS
--------------

Default: ``{}``

A dict containing additional exporters supported by your project. The keys are
serialization formats and the values are paths to :ref:`Item exporter
<topics-exporters>` classes.

.. setting:: FEED_EXPORTERS_BASE

FEED_EXPORTERS_BASE
-------------------

Default::

{
@@ -300,14 +321,14 @@ Default::
'pickle': 'scrapy.exporters.PickleItemExporter',
}

A dict containing all feed exporters supported by your project. The keys are
URI schemes and the values are paths to :ref:`Item exporter <topics-exporters>`
classes.
A dict containing the built-in feed exporters supported by Scrapy. You can
disable any of these exporters by assigning ``None`` to their serialization
format in :setting:`FEED_EXPORTERS`. E.g., to disable the built-in CSV exporter
(without replacement), place this in your ``settings.py``::

When you set :setting:`FEED_EXPORTERS` manually, e.g. in your project's settings
module, it will be merged with the default, not overwrite it. If you want to
disable any of the default feed exporters, you must assign ``None`` as their
value.
FEED_EXPORTERS = {
'csv': None,
}

.. _URI: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
.. _Amazon S3: http://aws.amazon.com/s3/
128 changes: 86 additions & 42 deletions docs/topics/settings.rst
@@ -269,11 +269,6 @@ Default::
The default headers used for Scrapy HTTP Requests. They're populated in the
:class:`~scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware`.

When you set :setting:`DEFAULT_REQUEST_HEADERS` manually, e.g. in your
project's settings module, it will be merged with the default, not overwrite it.
If you want to disable any of the default request headers (and not replace them)
you must assign ``None`` as their value.

.. setting:: DEPTH_LIMIT

DEPTH_LIMIT
@@ -355,6 +350,16 @@ The downloader to use for crawling.
DOWNLOADER_MIDDLEWARES
----------------------

Default: ``{}``

A dict containing the downloader middlewares enabled in your project, and their
orders. For more info see :ref:`topics-downloader-middleware-setting`.

.. setting:: DOWNLOADER_MIDDLEWARES_BASE

DOWNLOADER_MIDDLEWARES_BASE
---------------------------

Default::

{
@@ -375,16 +380,11 @@ Default::
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
}

A dict containing the downloader middlewares enabled in your project, and their
orders. Low orders are closer to the engine, high orders are closer to the
downloader.

When you set :setting:`DOWNLOADER_MIDDLEWARES` manually, e.g. in your project's
settings module, it will be merged with the default, not overwrite it. If you
want to disable any of the default downloader middlewares you must assign
``None`` as their value.

For more info see :ref:`topics-downloader-middleware-setting`.
A dict containing the downloader middlewares enabled by default in Scrapy. Low
orders are closer to the engine, high orders are closer to the downloader. You
should never modify this setting in your project; modify
:setting:`DOWNLOADER_MIDDLEWARES` instead. For more info see
:ref:`topics-downloader-middleware-setting`.

.. setting:: DOWNLOADER_STATS

@@ -425,6 +425,16 @@ spider attribute.
DOWNLOAD_HANDLERS
-----------------

Default: ``{}``

A dict containing the request download handlers enabled in your project.
See :setting:`DOWNLOAD_HANDLERS_BASE` for example format.

.. setting:: DOWNLOAD_HANDLERS_BASE

DOWNLOAD_HANDLERS_BASE
----------------------

Default::

{
@@ -436,15 +446,16 @@ Default::
}


A dict containing the request downloader handlers enabled in your project.
A dict containing the request download handlers enabled by default in Scrapy.
You should never modify this setting in your project; modify
:setting:`DOWNLOAD_HANDLERS` instead.

When you set :setting:`DOWNLOAD_HANDLERS` manually, e.g. in your project's
settings module, it will be merged with the default, not overwrite it. If you
want to disable any of the default download handlers you must assign ``None``
as their value. For example, if you want to disable the file download handler::
You can disable any of these download handlers by assigning ``None`` to their
URI scheme in :setting:`DOWNLOAD_HANDLERS`. E.g., to disable the built-in FTP
handler (without replacement), place this in your ``settings.py``::

DOWNLOAD_HANDLERS = {
'file': None,
'ftp': None,
}

.. setting:: DOWNLOAD_TIMEOUT
@@ -544,6 +555,15 @@ to ``vi`` (on Unix systems) or the IDLE editor (on Windows).
EXTENSIONS
----------

Default: ``{}``

A dict containing the extensions enabled in your project, and their orders.

.. setting:: EXTENSIONS_BASE

EXTENSIONS_BASE
---------------

Default::

{
@@ -558,15 +578,10 @@ Default::
'scrapy.extensions.throttle.AutoThrottle': 0,
}

A dict containing the extensions enabled in your project, and their orders. By
default, this setting contains all stable built-in extensions. Keep in mind that
A dict containing the extensions available by default in Scrapy, and their
orders. This setting contains all stable built-in extensions. Keep in mind that
some of them need to be enabled through a setting.

When you set :setting:`EXTENSIONS` manually, e.g. in your project's settings
module, it will be merged with the default, not overwrite it. If you want to
disable any of the default enabled extensions you must assign ``None`` as their
value.

For more information see the :ref:`extensions user guide <topics-extensions>`
and the :ref:`list of available extensions <topics-extensions-ref>`.

@@ -589,6 +604,16 @@ Example::
'mybot.pipelines.validate.StoreMyItem': 800,
}

.. setting:: ITEM_PIPELINES_BASE

ITEM_PIPELINES_BASE
-------------------

Default: ``{}``

A dict containing the pipelines enabled by default in Scrapy. You should never
modify this setting in your project; modify :setting:`ITEM_PIPELINES` instead.

.. setting:: LOG_ENABLED

LOG_ENABLED
@@ -878,6 +903,16 @@ The scheduler to use for crawling.
SPIDER_CONTRACTS
----------------

Default: ``{}``

A dict containing the spider contracts enabled in your project, used for
testing spiders. For more info see :ref:`topics-contracts`.

.. setting:: SPIDER_CONTRACTS_BASE

SPIDER_CONTRACTS_BASE
---------------------

Default::

{
@@ -886,13 +921,17 @@ Default::
'scrapy.contracts.default.ScrapesContract': 3,
}

A dict containing the scrapy contracts enabled in your project, used for
testing spiders. For more info see :ref:`topics-contracts`.
A dict containing the spider contracts enabled by default in Scrapy. You should
never modify this setting in your project; modify :setting:`SPIDER_CONTRACTS`
instead. For more info see :ref:`topics-contracts`.

You can disable any of these contracts by assigning ``None`` to their class
path in :setting:`SPIDER_CONTRACTS`. E.g., to disable the built-in
``ScrapesContract``, place this in your ``settings.py``::

When you set :setting:`SPIDER_CONTRACTS` manually, e.g. in your project's
settings module, it will be merged with the default, not overwrite it. If you
want to disable any of the default contracts you must assign ``None`` as their
value.
SPIDER_CONTRACTS = {
'scrapy.contracts.default.ScrapesContract': None,
}

.. setting:: SPIDER_LOADER_CLASS

@@ -909,6 +948,16 @@ The class that will be used for loading spiders, which must implement the
SPIDER_MIDDLEWARES
------------------

Default: ``{}``

A dict containing the spider middlewares enabled in your project, and their
orders. For more info see :ref:`topics-spider-middleware-setting`.

.. setting:: SPIDER_MIDDLEWARES_BASE

SPIDER_MIDDLEWARES_BASE
-----------------------

Default::

{
@@ -919,14 +968,9 @@ Default::
'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
}

A dict containing the spider middlewares enabled in your project, and their
orders. Low orders are closer to the engine, high orders are closer to the
spider. For more info see :ref:`topics-spider-middleware-setting`.

When you set :setting:`SPIDER_MIDDLEWARES` manually, e.g. in your project's
settings module, it will be merged with the default, not overwrite it. If you
want to disable any of the default spider middlewares you must assign ``None``
as their value.
A dict containing the spider middlewares enabled by default in Scrapy, and
their orders. Low orders are closer to the engine, high orders are closer to
the spider. For more info see :ref:`topics-spider-middleware-setting`.

.. setting:: SPIDER_MODULES
