Merge branch '2.11' into relnotes-2.11.1
wRAR committed Jan 15, 2024
2 parents 9a834e6 + 09a7efe commit ddb7367
Showing 7 changed files with 53 additions and 20 deletions.
docs/_tests/quotes.html (2 changes: 1 addition & 1 deletion)
@@ -273,7 +273,7 @@ <h2>Top Ten tags</h2>
Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
</p>
<p class="copyright">
-Made with <span class='sh-red'></span> by <a href="https://scrapinghub.com">Scrapinghub</a>
+Made with <span class='sh-red'></span> by <a href="https://www.zyte.com">Zyte</a>
</p>
</div>
</footer>
docs/_tests/quotes1.html (2 changes: 1 addition & 1 deletion)
@@ -273,7 +273,7 @@ <h2>Top Ten tags</h2>
Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
</p>
<p class="copyright">
-Made with <span class='sh-red'></span> by <a href="https://scrapinghub.com">Scrapinghub</a>
+Made with <span class='sh-red'></span> by <a href="https://www.zyte.com">Zyte</a>
</p>
</div>
</footer>
docs/contributing.rst (2 changes: 1 addition & 1 deletion)
@@ -178,7 +178,7 @@ Scrapy:
* We use `black <https://black.readthedocs.io/en/stable/>`_ for code formatting.
There is a hook in the pre-commit config
that will automatically format your code before every commit. You can also
-run black manually with ``tox -e black``.
+run black manually with ``tox -e pre-commit``.

* Don't put your name in the code you contribute; git provides enough
metadata to identify the author of the code.
docs/topics/feed-exports.rst (5 changes: 5 additions & 0 deletions)
@@ -13,6 +13,11 @@ Scrapy provides this functionality out of the box with the Feed Exports, which
allows you to generate feeds with the scraped items, using multiple
serialization formats and storage backends.

+This page provides detailed documentation for all feed export features. If you
+are looking for a step-by-step guide, check out `Zyte’s export guides`_.
+
+.. _Zyte’s export guides: https://docs.zyte.com/web-scraping/guides/export/index.html#exporting-scraped-data
+
.. _topics-feed-format:

Serialization formats
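The feed exports described above are configured through the ``FEEDS`` setting, which maps each output URI to its serialization options. A minimal sketch, assuming you want the same items exported in two formats (the file names are illustrative):

.. code-block:: python

    # settings.py: write scraped items to two feeds at once.
    FEEDS = {
        "output/items.jsonl": {"format": "jsonlines", "encoding": "utf8"},
        "output/items.csv": {"format": "csv"},
    }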
docs/topics/practices.rst (7 changes: 3 additions & 4 deletions)
@@ -288,9 +288,8 @@ Here are some tips to keep in mind when dealing with these kinds of sites:
* use a pool of rotating IPs. For example, the free `Tor project`_ or paid
services like `ProxyMesh`_. An open source alternative is `scrapoxy`_, a
super proxy that you can attach your own proxies to.
-* use a highly distributed downloader that circumvents bans internally, so you
-  can just focus on parsing clean pages. One example of such downloaders is
-  `Zyte Smart Proxy Manager`_
+* use a ban avoidance service, such as `Zyte API`_, which provides a `Scrapy
+  plugin <https://github.com/scrapy-plugins/scrapy-zyte-api>`__

If you are still unable to prevent your bot getting banned, consider contacting
`commercial support`_.
@@ -301,4 +300,4 @@ If you are still unable to prevent your bot getting banned, consider contacting
.. _Common Crawl: https://commoncrawl.org/
.. _testspiders: https://github.com/scrapinghub/testspiders
.. _scrapoxy: https://scrapoxy.io/
-.. _Zyte Smart Proxy Manager: https://www.zyte.com/smart-proxy-manager/
+.. _Zyte API: https://docs.zyte.com/zyte-api/get-started.html
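As a concrete take on the rotating-IP tip above, Scrapy's built-in ``HttpProxyMiddleware`` honours a per-request ``proxy`` meta key, so a small downloader middleware is enough to rotate a proxy pool. A minimal sketch (the proxy addresses are placeholders):

.. code-block:: python

    import random

    # Placeholder addresses; substitute your own pool.
    PROXIES = [
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
    ]


    class RotatingProxyMiddleware:
        def process_request(self, request, spider):
            # HttpProxyMiddleware routes the request through this proxy.
            request.meta["proxy"] = random.choice(PROXIES)

Enable it through the ``DOWNLOADER_MIDDLEWARES`` setting, with a priority below the built-in ``HttpProxyMiddleware`` so it runs first.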
docs/topics/request-response.rst (53 changes: 41 additions & 12 deletions)
@@ -193,18 +193,47 @@ Request objects
:meth:`replace`.

.. attribute:: Request.meta

-A dict that contains arbitrary metadata for this request. This dict is
-empty for new Requests, and is usually populated by different Scrapy
-components (extensions, middlewares, etc). So the data contained in this
-dict depends on the extensions you have enabled.
-
-See :ref:`topics-request-meta` for a list of special meta keys
-recognized by Scrapy.
-
-This dict is :doc:`shallow copied <library/copy>` when the request is
-cloned using the ``copy()`` or ``replace()`` methods, and can also be
-accessed, in your spider, from the ``response.meta`` attribute.
+:value: {}
+
+A dictionary of arbitrary metadata for the request.
+
+You may extend request metadata as you see fit.
+
+Request metadata can also be accessed through the
+:attr:`~scrapy.http.Response.meta` attribute of a response.
+
+To pass data from one spider callback to another, consider using
+:attr:`cb_kwargs` instead. However, request metadata may be the right
+choice in certain scenarios, such as to maintain some debugging data
+across all follow-up requests (e.g. the source URL).
+
+A common use of request metadata is to define request-specific
+parameters for Scrapy components (extensions, middlewares, etc.). For
+example, if you set ``dont_retry`` to ``True``,
+:class:`~scrapy.downloadermiddlewares.retry.RetryMiddleware` will never
+retry that request, even if it fails. See :ref:`topics-request-meta`.
+
+You may also use request metadata in your custom Scrapy components,
+such as to keep request state information relevant to your component.
+For example,
+:class:`~scrapy.downloadermiddlewares.retry.RetryMiddleware` uses the
+``retry_times`` metadata key to keep track of how many times a request
+has been retried so far.
+
+Copying all the metadata of a previous request into a new, follow-up
+request in a spider callback is a bad practice, because request
+metadata may include metadata set by Scrapy components that is not
+meant to be copied into other requests. For example, copying the
+``retry_times`` metadata key into follow-up requests can lower the
+number of retries allowed for those follow-up requests.
+
+You should only copy all request metadata from one request to another
+if the new request is meant to replace the old request, as is often the
+case when returning a request from a :ref:`downloader middleware
+<topics-downloader-middleware>` method.
+
+Also mind that the :meth:`copy` and :meth:`replace` request methods
+:doc:`shallow-copy <library/copy>` request metadata.

.. attribute:: Request.cb_kwargs

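The spider-facing side of the ``Request.meta`` documentation above can be summed up in one sketch: ``cb_kwargs`` passes data between callbacks, while meta keys such as ``dont_retry`` parameterize components. The URLs and selector here are hypothetical:

.. code-block:: python

    import scrapy


    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["https://example.com"]  # hypothetical

        def parse(self, response):
            for href in response.css("a.item::attr(href)").getall():
                yield response.follow(
                    href,
                    callback=self.parse_item,
                    # cb_kwargs become keyword arguments of the callback.
                    cb_kwargs={"source_url": response.url},
                    # dont_retry tells RetryMiddleware to give up on failure.
                    meta={"dont_retry": True},
                )

        def parse_item(self, response, source_url):
            yield {"url": response.url, "found_on": source_url}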
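On the component-facing side, a custom downloader middleware can keep per-request state in ``Request.meta``, much as ``RetryMiddleware`` does with ``retry_times``. A minimal sketch with a hypothetical ``timing_start`` key:

.. code-block:: python

    import time


    class TimingMiddleware:
        """Log how long each request took, keeping state in request.meta."""

        def process_request(self, request, spider):
            request.meta["timing_start"] = time.time()  # hypothetical key

        def process_response(self, request, response, spider):
            started = request.meta.pop("timing_start", None)
            if started is not None:
                spider.logger.debug(
                    "Fetched %s in %.2f s", response.url, time.time() - started
                )
            return response

Because ``copy()`` and ``replace()`` shallow-copy ``meta``, a request returned from a middleware via ``request.replace(...)`` keeps such state, which is exactly the replacement scenario the documentation above allows.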
tests/test_feedexport.py (2 changes: 1 addition & 1 deletion)
@@ -2300,7 +2300,7 @@ def run_and_export(self, spider_cls, settings):
content[feed["format"]].append(file.read_bytes())
finally:
self.tearDown()
-defer.returnValue(content)
+return content

@defer.inlineCallbacks
def assertExportedJsonLines(self, items, rows, settings=None):
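The ``defer.returnValue(content)`` to ``return content`` change works because, on Python 3, a generator can return a value directly and ``@defer.inlineCallbacks`` uses it as the result of the returned ``Deferred``. A minimal sketch (the function is illustrative):

.. code-block:: python

    from twisted.internet import defer, reactor, task


    @defer.inlineCallbacks
    def delayed_double(value):
        # Wait asynchronously; a plain `return` replaces defer.returnValue().
        yield task.deferLater(reactor, 0.1, lambda: None)
        return value * 2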