diff --git a/docs/topics/feed-exports.rst b/docs/topics/feed-exports.rst
index 37b7096f665..0f0f258dc0e 100644
--- a/docs/topics/feed-exports.rst
+++ b/docs/topics/feed-exports.rst
@@ -321,13 +321,14 @@ The following is a list of the accepted keys and the setting that is used
 as a fallback value if that key is not provided for a specific feed definition.
 
 * ``format``: the serialization format to be used for the feed.
-  See :ref:`topics-feed-format` for possible values.
+  See :ref:`topics-feed-format` for possible values. Mandatory, no fallback setting
+* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
 * ``encoding``: falls back to :setting:`FEED_EXPORT_ENCODING`
 * ``fields``: falls back to :setting:`FEED_EXPORT_FIELDS`
 * ``indent``: falls back to :setting:`FEED_EXPORT_INDENT`
 * ``store_empty``: falls back to :setting:`FEED_STORE_EMPTY`
-* ``batch_item_count``: falls back to :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`
+* ``uri_params``: falls back to :setting:`FEED_URI_PARAMS`
 
 .. setting:: FEED_EXPORT_ENCODING
 
@@ -500,7 +501,7 @@ generated:
 
 * ``%(batch_time)s`` - gets replaced by a timestamp when the feed is being created
   (e.g. ``2020-03-28T14-45-08.237134``)
-* ``%(batch_id)d`` - gets replaced by the sequence number of the batch.
+* ``%(batch_id)d`` - gets replaced by the 1-based sequence number of the batch.
 
   Use :ref:`printf-style string formatting ` to alter the number
   format. For example, to make the batch ID a 5-digit
@@ -517,16 +518,74 @@ And your :command:`crawl` command line is::
 
 The command line above can generate a directory tree like::
 
-->projectname
--->dirname
---->1-filename2020-03-28T14-45-08.237134.json
---->2-filename2020-03-28T14-45-09.148903.json
---->3-filename2020-03-28T14-45-10.046092.json
+    ->projectname
+    -->dirname
+    --->1-filename2020-03-28T14-45-08.237134.json
+    --->2-filename2020-03-28T14-45-09.148903.json
+    --->3-filename2020-03-28T14-45-10.046092.json
 
 Where the first and second files contain exactly 100 items.
 The last one contains 100 items or fewer.
 
 
+.. setting:: FEED_URI_PARAMS
+
+FEED_URI_PARAMS
+---------------
+
+Default: ``None``
+
+A string with the import path of a function to set the parameters to apply with
+:ref:`printf-style string formatting ` to the
+feed URI.
+
+The function signature should be as follows:
+
+.. function:: uri_params(params, spider)
+
+   Return a :class:`dict` of key-value pairs to apply to the feed URI using
+   :ref:`printf-style string formatting `.
+
+   :param params: default key-value pairs
+
+        Specifically:
+
+        - ``batch_id``: ID of the file batch. See
+          :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
+
+          If :setting:`FEED_EXPORT_BATCH_ITEM_COUNT` is ``0``, ``batch_id``
+          is always ``1``.
+
+        - ``batch_time``: UTC date and time, in ISO format with ``:``
+          replaced with ``-``.
+
+          See :setting:`FEED_EXPORT_BATCH_ITEM_COUNT`.
+
+        - ``time``: ``batch_time``, with microseconds set to ``0``.
+   :type params: dict
+
+   :param spider: source spider of the feed items
+   :type spider: scrapy.spiders.Spider
+
+For example, to include the :attr:`name ` of the
+source spider in the feed URI:
+
+#. Define the following function somewhere in your project::
+
+       # myproject/utils.py
+       def uri_params(params, spider):
+           return {**params, 'spider_name': spider.name}
+
+#. Point :setting:`FEED_URI_PARAMS` to that function in your settings::
+
+       # myproject/settings.py
+       FEED_URI_PARAMS = 'myproject.utils.uri_params'
+
+#. Use ``%(spider_name)s`` in your feed URI::
+
+       scrapy crawl -o "%(spider_name)s.jl"
+
+
 .. _URIs: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
 .. _Amazon S3: https://aws.amazon.com/s3/
 .. _botocore: https://github.com/boto/botocore
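
To illustrate how the documented ``uri_params`` hook might combine with printf-style formatting, here is a minimal standalone sketch. It does not import Scrapy: ``FakeSpider``, ``default_params`` and the URI template are hypothetical stand-ins, and the default values are built by hand from the descriptions of ``batch_id``, ``batch_time`` and ``time`` in the patch above rather than taken from Scrapy's internals.

```python
from datetime import datetime


class FakeSpider:
    """Hypothetical stand-in for a spider; only ``name`` is used here."""
    name = "quotes"


def uri_params(params, spider):
    # Same function as in the documentation example: keep the defaults
    # and add the spider name as an extra key.
    return {**params, "spider_name": spider.name}


def default_params(batch_id, now):
    # Assumption: mirrors the documented defaults. ``batch_time`` is the
    # ISO timestamp with ":" replaced by "-", and ``time`` is the same
    # value with microseconds set to 0.
    return {
        "batch_id": batch_id,
        "batch_time": now.isoformat().replace(":", "-"),
        "time": now.replace(microsecond=0).isoformat().replace(":", "-"),
    }


# Fixed timestamp so the result is reproducible.
now = datetime(2020, 3, 28, 14, 45, 8, 237134)
params = uri_params(default_params(1, now), FakeSpider())

# printf-style formatting with a mapping; %(batch_id)05d zero-pads to 5 digits.
uri = "%(spider_name)s/%(batch_id)05d-%(batch_time)s.jl" % params
print(uri)
```

Returning ``{**params, ...}`` rather than a fresh dict keeps the default keys available, so templates that already use ``%(batch_id)d`` or ``%(batch_time)s`` keep working after the custom function is enabled.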