Use f-strings (#4307)
ammarnajjar committed Aug 23, 2020
1 parent f125017 commit e5e7952
Showing 134 changed files with 562 additions and 569 deletions.
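For context, the change is purely mechanical: percent-formatting and ``str.format()`` calls are rewritten as f-strings that produce identical output. A minimal illustration (not part of the diff)::

    page = 3
    assert 'quotes-%s.html' % page == 'quotes-{}.html'.format(page)
    assert 'quotes-{}.html'.format(page) == f'quotes-{page}.html'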
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -49,7 +49,7 @@

# General information about the project.
project = 'Scrapy'
copyright = '2008–{}, Scrapy developers'.format(datetime.now().year)
copyright = f'2008–{datetime.now().year}, Scrapy developers'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
6 changes: 3 additions & 3 deletions docs/intro/tutorial.rst
@@ -101,10 +101,10 @@ This is the code for our first Spider. Save it in a file named

def parse(self, response):
page = response.url.split("/")[-2]
filename = 'quotes-%s.html' % page
filename = f'quotes-{page}.html'
with open(filename, 'wb') as f:
f.write(response.body)
self.log('Saved file %s' % filename)
self.log(f'Saved file {filename}')


As you can see, our Spider subclasses :class:`scrapy.Spider <scrapy.spiders.Spider>`
@@ -190,7 +190,7 @@ for your spider::

def parse(self, response):
page = response.url.split("/")[-2]
filename = 'quotes-%s.html' % page
filename = f'quotes-{page}.html'
with open(filename, 'wb') as f:
f.write(response.body)

138 changes: 69 additions & 69 deletions docs/topics/developer-tools.rst
@@ -5,9 +5,9 @@ Using your browser's Developer Tools for scraping
=================================================

Here is a general guide on how to use your browser's Developer Tools
to ease the scraping process. Today almost all browsers come with
built in `Developer Tools`_ and although we will use Firefox in this
guide, the concepts are applicable to any other browser.

In this guide we'll introduce the basic tools to use from a browser's
Developer Tools by scraping `quotes.toscrape.com`_.
@@ -41,16 +41,16 @@ Therefore, you should keep in mind the following things:
Inspecting a website
====================

By far the most handy feature of the Developer Tools is the `Inspector`
feature, which allows you to inspect the underlying HTML code of
any webpage. To demonstrate the Inspector, let's look at the
`quotes.toscrape.com`_-site.

On the site we have a total of ten quotes from various authors with specific
tags, as well as the Top Ten Tags. Let's say we want to extract all the quotes
on this page, without any meta-information about authors, tags, etc.

Instead of viewing the whole source code for the page, we can simply right click
on a quote and select ``Inspect Element (Q)``, which opens up the `Inspector`.
In it you should see something like this:

@@ -97,16 +97,16 @@ Then, back to your web browser, right-click on the ``span`` tag, select
>>> response.xpath('/html/body/div/div[2]/div[1]/div[1]/span[1]/text()').getall()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']

Adding ``text()`` at the end we are able to extract the first quote with this
basic selector. But this XPath is not really that clever. All it does is
go down a desired path in the source code starting from ``html``. So let's
see if we can refine our XPath a bit:

If we check the `Inspector` again we'll see that directly beneath our
expanded ``div`` tag we have nine identical ``div`` tags, each with the
same attributes as our first. If we expand any of them, we'll see the same
structure as with our first quote: Two ``span`` tags and one ``div`` tag. We can
expand each ``span`` tag with the ``class="text"`` inside our ``div`` tags and
see each quote:

.. code-block:: html
@@ -121,7 +121,7 @@


With this knowledge we can refine our XPath: Instead of a path to follow,
we'll simply select all ``span`` tags with the ``class="text"`` by using
the `has-class-extension`_:

>>> response.xpath('//span[has-class("text")]/text()').getall()
@@ -130,45 +130,45 @@ the `has-class-extension`_:
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
...]

And with one simple, cleverer XPath we are able to extract all quotes from
the page. We could have constructed a loop over our first XPath to increase
the number of the last ``div``, but this would have been unnecessarily
complex and by simply constructing an XPath with ``has-class("text")``
we were able to extract all quotes in one line.

The `Inspector` has a lot of other helpful features, such as searching in the
source code or directly scrolling to an element you selected. Let's demonstrate
a use case:

Say you want to find the ``Next`` button on the page. Type ``Next`` into the
search bar on the top right of the `Inspector`. You should get two results.
The first is a ``li`` tag with the ``class="next"``, the second the text
of an ``a`` tag. Right click on the ``a`` tag and select ``Scroll into View``.
If you hover over the tag, you'll see the button highlighted. From here
we could easily create a :ref:`Link Extractor <topics-link-extractors>` to
follow the pagination. On a simple site such as this, there may not be
the need to find an element visually but the ``Scroll into View`` function
can be quite useful on complex sites.
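As a rough sketch of that idea (assuming a ``scrapy shell`` session on the page; the ``li.next`` selector follows from the markup described above and is not part of the diff)::

    from scrapy.linkextractors import LinkExtractor

    # Restrict link extraction to the pagination element so only the
    # "Next" link is returned.
    next_links = LinkExtractor(restrict_css='li.next').extract_links(response)
    next_urls = [link.url for link in next_links]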

Note that the search bar can also be used to search for and test CSS
selectors. For example, you could search for ``span.text`` to find
all quote texts. Instead of a full text search, this searches for
exactly the ``span`` tag with the ``class="text"`` in the page.
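The same selector can also be tried from a ``scrapy shell`` session on the page; for example (output omitted, not part of the diff)::

    quotes = response.css('span.text::text').getall()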

.. _topics-network-tool:

The Network-tool
================
While scraping you may come across dynamic webpages where some parts
of the page are loaded dynamically through multiple requests. While
this can be quite tricky, the `Network`-tool in the Developer Tools
greatly facilitates this task. To demonstrate the Network-tool, let's
take a look at the page `quotes.toscrape.com/scroll`_.

The page is quite similar to the basic `quotes.toscrape.com`_-page,
but instead of the above-mentioned ``Next`` button, the page
automatically loads new quotes when you scroll to the bottom. We
could go ahead and try out different XPaths directly, but instead
we'll check another quite useful command from the Scrapy shell:

.. skip: next
@@ -179,31 +179,31 @@
(...)
>>> view(response)
A browser window should open with the webpage but with one
crucial difference: Instead of the quotes we just see a greenish
bar with the word ``Loading...``.

.. image:: _images/network_01.png
:width: 777
:height: 296
:alt: Response from quotes.toscrape.com/scroll

The ``view(response)`` command lets us view the response our
shell or later our spider receives from the server. Here we see
that some basic template is loaded which includes the title,
the login-button and the footer, but the quotes are missing. This
tells us that the quotes are being loaded from a different request
than ``quotes.toscrape/scroll``.

If you click on the ``Network`` tab, you will probably only see
two entries. The first thing we do is enable persistent logs by
clicking on ``Persist Logs``. If this option is disabled, the
log is automatically cleared each time you navigate to a different
page. Enabling this option is a good default, since it gives us
control over when to clear the logs.

If we reload the page now, you'll see the log get populated with six
new requests.

.. image:: _images/network_02.png
:width: 777
@@ -212,31 +212,31 @@

Here we see every request that has been made when reloading the page
and can inspect each request and its response. So let's find out
where our quotes are coming from:

First click on the request with the name ``scroll``. On the right
you can now inspect the request. In ``Headers`` you'll find details
about the request headers, such as the URL, the method, the IP-address,
and so on. We'll ignore the other tabs and click directly on ``Response``.

What you should see in the ``Preview`` pane is the rendered HTML code,
which is exactly what we saw when we called ``view(response)`` in the
shell. Accordingly the ``type`` of the request in the log is ``html``.
The other requests have types like ``css`` or ``js``, but what
interests us is the one request called ``quotes?page=1`` with the
type ``json``.

If we click on this request, we see that the request URL is
``http://quotes.toscrape.com/api/quotes?page=1`` and the response
is a JSON-object that contains our quotes. We can also right-click
on the request and select ``Open in new tab`` to get a better overview.

.. image:: _images/network_03.png
:width: 777
:height: 375
:alt: JSON-object returned from the quotes.toscrape API

With this response we can now easily parse the JSON-object and
also request each page to get every quote on the site::

import scrapy
@@ -255,17 +255,17 @@
yield {"quote": quote["text"]}
if data["has_next"]:
self.page += 1
url = "http://quotes.toscrape.com/api/quotes?page={}".format(self.page)
url = f"http://quotes.toscrape.com/api/quotes?page={self.page}"
yield scrapy.Request(url=url, callback=self.parse)

This spider starts at the first page of the quotes-API. With each
response, we parse the ``response.text`` and assign it to ``data``.
This lets us operate on the JSON-object like on a Python dictionary.
We iterate through the ``quotes`` and print out the ``quote["text"]``.
If the handy ``has_next`` element is ``true`` (try loading
`quotes.toscrape.com/api/quotes?page=10`_ in your browser or a
page-number greater than 10), we increment the ``page`` attribute
and ``yield`` a new request, inserting the incremented page-number
into our ``url``.
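Because the hunk above hides the beginning of the spider, here is a hedged sketch of what a complete quotes-API spider along these lines could look like (class name and attributes are assumptions, not the exact code from the repository)::

    import json

    import scrapy


    class QuotesApiSpider(scrapy.Spider):
        name = "quotes_api"
        page = 1
        start_urls = ["http://quotes.toscrape.com/api/quotes?page=1"]

        def parse(self, response):
            # The endpoint returns JSON, so parse it into a dict.
            data = json.loads(response.text)
            for quote in data["quotes"]:
                yield {"quote": quote["text"]}
            if data["has_next"]:
                self.page += 1
                url = f"http://quotes.toscrape.com/api/quotes?page={self.page}"
                yield scrapy.Request(url=url, callback=self.parse)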

.. _requests-from-curl:
@@ -298,7 +298,7 @@ Note that to translate a cURL command into a Scrapy request,
you may use `curl2scrapy <https://michael-shub.github.io/curl2scrapy/>`_.
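Scrapy can also build a request object from a cURL command directly via ``Request.from_curl``; a small sketch (the command string is illustrative, copied via "Copy as cURL" from the Network tool)::

    import scrapy

    # Turn a copied cURL command into a scrapy.Request.
    request = scrapy.Request.from_curl(
        "curl 'http://quotes.toscrape.com/api/quotes?page=1'"
    )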

As you can see, with a few inspections in the `Network`-tool we
were able to easily replicate the dynamic requests of the scrolling
functionality of the page. Crawling dynamic pages can be quite
daunting and pages can be very complex, but it (mostly) boils down
to identifying the correct request and replicating it in your spider.
6 changes: 3 additions & 3 deletions docs/topics/exporters.rst
@@ -57,7 +57,7 @@ value of one of their fields::
adapter = ItemAdapter(item)
year = adapter['year']
if year not in self.year_to_exporter:
f = open('{}.xml'.format(year), 'wb')
f = open(f'{year}.xml', 'wb')
exporter = XmlItemExporter(f)
exporter.start_exporting()
self.year_to_exporter[year] = exporter
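The hunk only shows the branch that creates a new exporter; a hedged sketch of a full per-year pipeline built around it (class and method names are assumptions, and the open file handles are tracked separately here so they can be closed)::

    from itemadapter import ItemAdapter
    from scrapy.exporters import XmlItemExporter


    class PerYearXmlExportPipeline:
        """Write each item to an XML file named after its 'year' field."""

        def open_spider(self, spider):
            self.year_to_exporter = {}
            self.files = []

        def close_spider(self, spider):
            for exporter in self.year_to_exporter.values():
                exporter.finish_exporting()
            for f in self.files:
                f.close()

        def _exporter_for_item(self, item):
            adapter = ItemAdapter(item)
            year = adapter['year']
            if year not in self.year_to_exporter:
                f = open(f'{year}.xml', 'wb')
                self.files.append(f)
                exporter = XmlItemExporter(f)
                exporter.start_exporting()
                self.year_to_exporter[year] = exporter
            return self.year_to_exporter[year]

        def process_item(self, item, spider):
            exporter = self._exporter_for_item(item)
            exporter.export_item(item)
            return item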
@@ -98,7 +98,7 @@ Example::
import scrapy

def serialize_price(value):
return '$ %s' % str(value)
return f'$ {str(value)}'

class Product(scrapy.Item):
name = scrapy.Field()
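The collapsed part of this example is where the serializer gets attached; in general that is done through the field's ``serializer`` metadata key (a sketch, not the exact elided lines)::

    import scrapy

    def serialize_price(value):
        return f'$ {str(value)}'

    class Product(scrapy.Item):
        name = scrapy.Field()
        # The exporter calls this function when serializing the field.
        price = scrapy.Field(serializer=serialize_price)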
@@ -122,7 +122,7 @@

def serialize_field(self, field, name, value):
if field == 'price':
return '$ %s' % str(value)
return f'$ {str(value)}'
return super(Product, self).serialize_field(field, name, value)

.. _topics-exporters-reference:
6 changes: 3 additions & 3 deletions docs/topics/item-pipeline.rst
@@ -96,7 +96,7 @@ contain a price::
adapter['price'] = adapter['price'] * self.vat_factor
return item
else:
raise DropItem("Missing price in %s" % item)
raise DropItem(f"Missing price in {item}")


Write items to a JSON file
@@ -211,7 +211,7 @@ item.
# Save screenshot to file, filename will be hash of url.
url = adapter["url"]
url_hash = hashlib.md5(url.encode("utf8")).hexdigest()
filename = "{}.png".format(url_hash)
filename = f"{url_hash}.png"
with open(filename, "wb") as f:
f.write(response.body)

@@ -240,7 +240,7 @@ returns multiple items with the same id::
def process_item(self, item, spider):
adapter = ItemAdapter(item)
if adapter['id'] in self.ids_seen:
raise DropItem("Duplicate item found: %r" % item)
raise DropItem(f"Duplicate item found: {item!r}")
else:
self.ids_seen.add(adapter['id'])
return item
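Pipelines such as this one only run once they are enabled in the project settings; for example (module path and priority value are illustrative)::

    # settings.py
    ITEM_PIPELINES = {
        'myproject.pipelines.DuplicatesPipeline': 300,
    }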
2 changes: 1 addition & 1 deletion docs/topics/leaks.rst
@@ -102,7 +102,7 @@ A real example
Let's see a concrete example of a hypothetical case of memory leaks.
Suppose we have some spider with a line similar to this one::

return Request("http://www.somenastyspider.com/product.php?pid=%d" % product_id,
return Request(f"http://www.somenastyspider.com/product.php?pid={product_id}",
callback=self.parse, cb_kwargs={'referer': response})

That line is passing a response reference inside a request which effectively
7 changes: 4 additions & 3 deletions docs/topics/selectors.rst
@@ -328,8 +328,9 @@ too. Here's an example:
'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']

>>> for index, link in enumerate(links):
... args = (index, link.xpath('@href').get(), link.xpath('img/@src').get())
... print('Link number %d points to url %r and image %r' % args)
... href_xpath = link.xpath('@href').get()
... img_xpath = link.xpath('img/@src').get()
... print(f'Link number {index} points to url {href_xpath!r} and image {img_xpath!r}')
Link number 0 points to url 'image1.html' and image 'image1_thumb.jpg'
Link number 1 points to url 'image2.html' and image 'image2_thumb.jpg'
Link number 2 points to url 'image3.html' and image 'image3_thumb.jpg'
@@ -822,7 +823,7 @@ with groups of itemscopes and corresponding itemprops::
... props = scope.xpath('''
... set:difference(./descendant::*/@itemprop,
... .//*[@itemscope]/*/@itemprop)''')
... print(" properties: %s" % (props.getall()))
... print(f" properties: {props.getall()}")
... print("")

current scope: ['http://schema.org/Product']
2 changes: 1 addition & 1 deletion docs/topics/settings.rst
@@ -110,7 +110,7 @@ In a spider, the settings are available through ``self.settings``::
start_urls = ['http://example.com']

def parse(self, response):
print("Existing settings: %s" % self.settings.attributes.keys())
print(f"Existing settings: {self.settings.attributes.keys()}")

.. note::
The ``settings`` attribute is set in the base Spider class after the spider
4 changes: 2 additions & 2 deletions docs/topics/spiders.rst
@@ -279,7 +279,7 @@ Spiders can access arguments in their `__init__` methods::

def __init__(self, category=None, *args, **kwargs):
super(MySpider, self).__init__(*args, **kwargs)
self.start_urls = ['http://www.example.com/categories/%s' % category]
self.start_urls = [f'http://www.example.com/categories/{category}']
# ...

The default `__init__` method will take any spider arguments
@@ -292,7 +292,7 @@ The above example can also be written as follows::
name = 'myspider'

def start_requests(self):
yield scrapy.Request('http://www.example.com/categories/%s' % self.category)
yield scrapy.Request(f'http://www.example.com/categories/{self.category}')

Keep in mind that spider arguments are only strings.
The spider will not do any parsing on its own.
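Such arguments are supplied on the command line with the ``-a`` option; for example (the category value is illustrative)::

    scrapy crawl myspider -a category=electronics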
