From ca08e04198b94bd9583704f86316b57af3408adc Mon Sep 17 00:00:00 2001
From: Aditya
Date: Fri, 20 Mar 2020 02:31:35 +0530
Subject: [PATCH 1/4] [docs] update redirect links python2 -> python3

---
 docs/topics/downloader-middleware.rst |  5 ++---
 docs/topics/email.rst                 |  2 +-
 docs/topics/exporters.rst             |  8 ++++----
 docs/topics/extensions.rst            |  2 +-
 docs/topics/items.rst                 |  6 +++---
 docs/topics/logging.rst               | 16 ++++++++--------
 docs/topics/request-response.rst      | 10 +++++-----
 docs/topics/selectors.rst             |  2 +-
 docs/topics/settings.rst              |  6 +++---
 docs/topics/spider-middleware.rst     |  6 +++---
 10 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/docs/topics/downloader-middleware.rst b/docs/topics/downloader-middleware.rst
index 73648994de6..61a3806fbc0 100644
--- a/docs/topics/downloader-middleware.rst
+++ b/docs/topics/downloader-middleware.rst
@@ -739,7 +739,7 @@ HttpProxyMiddleware
     This middleware sets the HTTP proxy to use for requests, by setting the
     ``proxy`` meta value for :class:`~scrapy.http.Request` objects.
 
-    Like the Python standard library modules `urllib`_ and `urllib2`_, it obeys
+    Like the Python standard library module `urllib.request`_, it obeys
     the following environment variables:
 
     * ``http_proxy``
@@ -751,8 +751,7 @@ HttpProxyMiddleware
     Keep in mind this value will take precedence over ``http_proxy``/``https_proxy``
     environment variables, and it will also ignore ``no_proxy`` environment variable.
 
-.. _urllib: https://docs.python.org/2/library/urllib.html
-.. _urllib2: https://docs.python.org/2/library/urllib2.html
+.. _urllib.request: https://docs.python.org/3/library/urllib.request.html
 
 RedirectMiddleware
 ------------------
diff --git a/docs/topics/email.rst b/docs/topics/email.rst
index 72bf5222731..aed3deb2edb 100644
--- a/docs/topics/email.rst
+++ b/docs/topics/email.rst
@@ -15,7 +15,7 @@ IO of the crawler. It also provides a simple API for sending attachments and
 it's very easy to configure, with a few :ref:`settings <topics-email-settings>`.
 
-.. _smtplib: https://docs.python.org/2/library/smtplib.html
+.. _smtplib: https://docs.python.org/3/library/smtplib.html
 
 Quick example
 =============
diff --git a/docs/topics/exporters.rst b/docs/topics/exporters.rst
index e52682690c4..4ba8714bd37 100644
--- a/docs/topics/exporters.rst
+++ b/docs/topics/exporters.rst
@@ -320,7 +320,7 @@ CsvItemExporter
       Color TV,1200
       DVD player,200
 
-.. _csv.writer: https://docs.python.org/2/library/csv.html#csv.writer
+.. _csv.writer: https://docs.python.org/3/library/csv.html#csv.writer
 
 PickleItemExporter
 ------------------
@@ -342,7 +342,7 @@ PickleItemExporter
 
    Pickle isn't a human readable format, so no output examples are provided.
 
-.. _pickle module documentation: https://docs.python.org/2/library/pickle.html
+.. _pickle module documentation: https://docs.python.org/3/library/pickle.html
 
 PprintItemExporter
 ------------------
@@ -393,7 +393,7 @@ JsonItemExporter
    stream-friendly format, consider using :class:`JsonLinesItemExporter`
    instead, or splitting the output in multiple chunks.
 
-.. _JSONEncoder: https://docs.python.org/2/library/json.html#json.JSONEncoder
+.. _JSONEncoder: https://docs.python.org/3/library/json.html#json.JSONEncoder
 
 JsonLinesItemExporter
 ---------------------
@@ -417,7 +417,7 @@ JsonLinesItemExporter
   Unlike the one produced by :class:`JsonItemExporter`, the format produced by
   this exporter is well suited for serializing large amounts of data.
 
-.. _JSONEncoder: https://docs.python.org/2/library/json.html#json.JSONEncoder
+.. _JSONEncoder: https://docs.python.org/3/library/json.html#json.JSONEncoder
 
 MarshalItemExporter
 -------------------
diff --git a/docs/topics/extensions.rst b/docs/topics/extensions.rst
index 94fd2e36ec8..f57e37e6f37 100644
--- a/docs/topics/extensions.rst
+++ b/docs/topics/extensions.rst
@@ -372,5 +372,5 @@ For more info see `Debugging in Python`_.
 
 This extension only works on POSIX-compliant platforms (i.e. not Windows).
 
-.. _Python debugger: https://docs.python.org/2/library/pdb.html
+.. _Python debugger: https://docs.python.org/3/library/pdb.html
 .. _Debugging in Python: https://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/
diff --git a/docs/topics/items.rst b/docs/topics/items.rst
index 44643cb67f9..36731571e76 100644
--- a/docs/topics/items.rst
+++ b/docs/topics/items.rst
@@ -24,7 +24,7 @@ serialization can be customized using Item fields metadata, :mod:`trackref`
 tracks Item instances to help find memory leaks (see
 :ref:`topics-leaks-trackrefs`), etc.
 
-.. _dictionary-like: https://docs.python.org/2/library/stdtypes.html#dict
+.. _dictionary-like: https://docs.python.org/3/library/stdtypes.html#dict
 
 .. _topics-items-declaring:
 
@@ -249,7 +249,7 @@ Item objects
     :class:`Field` objects used in the :ref:`Item declaration
     <topics-items-declaring>`.
 
-.. _dict API: https://docs.python.org/2/library/stdtypes.html#dict
+.. _dict API: https://docs.python.org/3/library/stdtypes.html#dict
 
 Field objects
 =============
@@ -262,7 +262,7 @@ Field objects
    to support the :ref:`item declaration syntax <topics-items-declaring>`
   based on class attributes.
 
-.. _dict: https://docs.python.org/2/library/stdtypes.html#dict
+.. _dict: https://docs.python.org/3/library/stdtypes.html#dict
 
 
 Other classes related to Item
diff --git a/docs/topics/logging.rst b/docs/topics/logging.rst
index d4d22d8890f..a85e1a769a0 100644
--- a/docs/topics/logging.rst
+++ b/docs/topics/logging.rst
@@ -83,10 +83,10 @@ path::
 
 .. seealso::
 
-    Module logging, `HowTo <https://docs.python.org/2/howto/logging.html>`_
+    Module logging, `HowTo <https://docs.python.org/3/howto/logging.html>`_
        Basic Logging Tutorial
 
-    Module logging, `Loggers <https://docs.python.org/2/library/logging.html#logger-objects>`_
+    Module logging, `Loggers <https://docs.python.org/3/library/logging.html#logger-objects>`_
        Further documentation on loggers
 
 .. _topics-logging-from-spiders:
@@ -166,13 +166,13 @@ possible levels listed in :ref:`topics-logging-levels`.
 
 :setting:`LOG_FORMAT` and :setting:`LOG_DATEFORMAT` specify formatting strings
 used as layouts for all messages. Those strings can contain any placeholders
 listed in `logging's logrecord attributes docs
-<https://docs.python.org/2/library/logging.html#logrecord-attributes>`_ and
+<https://docs.python.org/3/library/logging.html#logrecord-attributes>`_ and
 `datetime's strftime and strptime directives
-<https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_
+<https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`_
 respectively.
 
 If :setting:`LOG_SHORT_NAMES` is set, then the logs will not display the Scrapy
-component that prints the log. It is unset by default, hence logs contain the 
+component that prints the log. It is unset by default, hence logs contain the
 Scrapy component responsible for that log output.
 
 Command-line options
@@ -190,7 +190,7 @@ to override some of the Scrapy settings regarding logging.
 
 .. seealso::
 
-    Module `logging.handlers <https://docs.python.org/2/library/logging.handlers.html>`_
+    Module `logging.handlers <https://docs.python.org/3/library/logging.handlers.html>`_
        Further documentation on available handlers
 
 .. _custom-log-formats:
@@ -201,7 +201,7 @@ Custom Log Formats
 A custom log format can be set for different actions by extending
 :class:`~scrapy.logformatter.LogFormatter` class and making
 :setting:`LOG_FORMATTER` point to your new class.
- 
+
 .. autoclass:: scrapy.logformatter.LogFormatter
    :members:
@@ -276,6 +276,6 @@ scrapy.utils.log module
 
     Refer to :ref:`run-from-script` for more details about using Scrapy this
     way.
 
-.. _logging.basicConfig(): https://docs.python.org/2/library/logging.html#logging.basicConfig
+.. _logging.basicConfig(): https://docs.python.org/3/library/logging.html#logging.basicConfig
diff --git a/docs/topics/request-response.rst b/docs/topics/request-response.rst
index b2a60ff39ee..6c5a084099a 100644
--- a/docs/topics/request-response.rst
+++ b/docs/topics/request-response.rst
@@ -189,7 +189,7 @@ Request objects
         ``copy()`` or ``replace()`` methods, and can also be accessed, in your
         spider, from the ``response.cb_kwargs`` attribute.
 
-    .. _shallow copied: https://docs.python.org/2/library/copy.html
+    .. _shallow copied: https://docs.python.org/3/library/copy.html
 
 .. method:: Request.copy()
@@ -706,7 +706,7 @@ Response objects
         A :class:`twisted.internet.ssl.Certificate` object representing
         the server's SSL certificate.
- 
+
         Only populated for ``https`` responses, ``None`` otherwise.
 
 .. method:: Response.copy()
@@ -724,17 +724,17 @@ Response objects
     Constructs an absolute url by combining the Response's :attr:`url` with
     a possible relative url.
 
-    This is a wrapper over `urlparse.urljoin`_, it's merely an alias for
+    This is a wrapper over `urllib.parse.urljoin`_, it's merely an alias for
     making this call::
 
-        urlparse.urljoin(response.url, url)
+        urllib.parse.urljoin(response.url, url)
 
 .. automethod:: Response.follow
 
 .. automethod:: Response.follow_all
 
-.. _urlparse.urljoin: https://docs.python.org/2/library/urlparse.html#urlparse.urljoin
+.. _urllib.parse.urljoin: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin
 
 .. _topics-request-response-ref-response-subclasses:
diff --git a/docs/topics/selectors.rst b/docs/topics/selectors.rst
index 1f7802c98f9..0f90b28c07b 100644
--- a/docs/topics/selectors.rst
+++ b/docs/topics/selectors.rst
@@ -36,7 +36,7 @@ defines selectors to associate those styles with specific HTML elements.
 
 .. _BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/
 .. _lxml: https://lxml.de/
-.. _ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html
+.. _ElementTree: https://docs.python.org/3/library/xml.etree.elementtree.html
 .. _XPath: https://www.w3.org/TR/xpath/all/
 .. _CSS: https://www.w3.org/TR/selectors
 .. _parsel: https://parsel.readthedocs.io/en/latest/
diff --git a/docs/topics/settings.rst b/docs/topics/settings.rst
index dc6843d759e..d78a6253eaa 100644
--- a/docs/topics/settings.rst
+++ b/docs/topics/settings.rst
@@ -28,7 +28,7 @@ The value of ``SCRAPY_SETTINGS_MODULE`` should be in Python path syntax, e.g.
 ``myproject.settings``. Note that the settings module should be on the
 Python `import search path`_.
 
-.. _import search path: https://docs.python.org/2/tutorial/modules.html#the-module-search-path
+.. _import search path: https://docs.python.org/3/tutorial/modules.html#the-module-search-path
 
 .. _populating-settings:
 
@@ -902,7 +902,7 @@ Default: ``'%(asctime)s [%(name)s] %(levelname)s: %(message)s'``
 
 String for formatting log messages. Refer to the
 `Python logging documentation`_ for the whole list of available placeholders.
 
-.. _Python logging documentation: https://docs.python.org/2/library/logging.html#logrecord-attributes
+.. _Python logging documentation: https://docs.python.org/3/library/logging.html#logrecord-attributes
 
 .. setting:: LOG_DATEFORMAT
 
@@ -915,7 +915,7 @@ String for formatting date/time, expansion of the ``%(asctime)s`` placeholder
 in :setting:`LOG_FORMAT`. Refer to the `Python datetime documentation`_ for
 the whole list of available directives.
 
-.. _Python datetime documentation: https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
+.. _Python datetime documentation: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
 
 .. setting:: LOG_FORMATTER
 
diff --git a/docs/topics/spider-middleware.rst b/docs/topics/spider-middleware.rst
index 0e8210130ac..3d7450c868a 100644
--- a/docs/topics/spider-middleware.rst
+++ b/docs/topics/spider-middleware.rst
@@ -173,18 +173,18 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
        :type spider: :class:`~scrapy.spiders.Spider` object
 
    .. method:: from_crawler(cls, crawler)
- 
+
       If present, this classmethod is called to create a middleware instance
       from a :class:`~scrapy.crawler.Crawler`. It must return a new instance
       of the middleware. Crawler object provides access to all Scrapy core
       components like settings and signals; it is a way for middleware to
       access them and hook its functionality into Scrapy.
- 
+
       :param crawler: crawler that uses this middleware
       :type crawler: :class:`~scrapy.crawler.Crawler` object
 
-.. _Exception: https://docs.python.org/2/library/exceptions.html#exceptions.Exception
+.. _Exception: https://docs.python.org/3/library/exceptions.html#Exception
 
 .. _topics-spider-middleware-ref:

From f37b1bdc5616f67460c645e26c49f9d5b34e3631 Mon Sep 17 00:00:00 2001
From: Aditya
Date: Fri, 20 Mar 2020 05:22:51 +0530
Subject: [PATCH 2/4] [docs] update redirect links to python3

---
 docs/intro/tutorial.rst               | 10 +++++-----
 docs/topics/contracts.rst             |  4 +---
 docs/topics/downloader-middleware.rst | 11 +++--------
 docs/topics/dynamic-content.rst       | 10 ++++------
 docs/topics/email.rst                 |  4 +---
 docs/topics/exporters.rst             | 20 ++++++--------------
 docs/topics/extensions.rst            |  3 +--
 docs/topics/items.rst                 | 21 ++++++---------------
 docs/topics/logging.rst               | 15 +++++----------
 docs/topics/request-response.rst      |  8 ++------
 docs/topics/selectors.rst             |  3 +--
 docs/topics/spider-middleware.rst     |  6 +-----
 docs/topics/spiders.rst               |  4 +---
 docs/topics/telnetconsole.rst         | 11 ++++-------
 scrapy/item.py                        |  4 +---
 15 files changed, 42 insertions(+), 92 deletions(-)

diff --git a/docs/intro/tutorial.rst b/docs/intro/tutorial.rst
index 1768badbb83..ab6fd48291e 100644
--- a/docs/intro/tutorial.rst
+++ b/docs/intro/tutorial.rst
@@ -25,16 +25,16 @@ Scrapy.
 If you're already familiar with other languages, and want to learn Python
 quickly, the `Python Tutorial`_ is a good resource.
 
 If you're new to programming and want to start with Python, the following books
-may be useful to you: 
+may be useful to you:
 
 * `Automate the Boring Stuff With Python`_
 
-* `How To Think Like a Computer Scientist`_ 
+* `How To Think Like a Computer Scientist`_
 
-* `Learn Python 3 The Hard Way`_ 
+* `Learn Python 3 The Hard Way`_
 
 You can also take a look at `this list of Python resources for non-programmers`_,
-as well as the `suggested resources in the learnpython-subreddit`_. 
+as well as the `suggested resources in the learnpython-subreddit`_.
 
 .. _Python: https://www.python.org/
 .. _this list of Python resources for non-programmers: https://wiki.python.org/moin/BeginnersGuide/NonProgrammers
@@ -62,7 +62,7 @@ This will create a ``tutorial`` directory with the following contents::
         __init__.py
 
         items.py          # project items definition file
- 
+
         middlewares.py    # project middlewares file
 
         pipelines.py      # project pipelines file
diff --git a/docs/topics/contracts.rst b/docs/topics/contracts.rst
index 43db8f1014a..319f577bcf8 100644
--- a/docs/topics/contracts.rst
+++ b/docs/topics/contracts.rst
@@ -136,7 +136,7 @@ Detecting check runs
 ====================
 
 When ``scrapy check`` is running, the ``SCRAPY_CHECK`` environment variable is
-set to the ``true`` string. You can use `os.environ`_ to perform any change to
+set to the ``true`` string. You can use :data:`os.environ` to perform any change to
 your spiders or your settings when ``scrapy check`` is used::
 
     import os
@@ -148,5 +148,3 @@ your spiders or your settings when ``scrapy check`` is used::
     def __init__(self):
         if os.environ.get('SCRAPY_CHECK'):
             pass # Do some scraper adjustments when a check is running
-
-.. _os.environ: https://docs.python.org/3/library/os.html#os.environ
diff --git a/docs/topics/downloader-middleware.rst b/docs/topics/downloader-middleware.rst
index 61a3806fbc0..d7ec53bfa52 100644
--- a/docs/topics/downloader-middleware.rst
+++ b/docs/topics/downloader-middleware.rst
@@ -739,7 +739,7 @@ HttpProxyMiddleware
     This middleware sets the HTTP proxy to use for requests, by setting the
     ``proxy`` meta value for :class:`~scrapy.http.Request` objects.
 
-    Like the Python standard library module `urllib.request`_, it obeys
+    Like the Python standard library module :mod:`urllib.request`, it obeys
     the following environment variables:
 
     * ``http_proxy``
@@ -751,8 +751,6 @@ HttpProxyMiddleware
     Keep in mind this value will take precedence over ``http_proxy``/``https_proxy``
     environment variables, and it will also ignore ``no_proxy`` environment variable.
 
-.. _urllib.request: https://docs.python.org/3/library/urllib.request.html
-
 RedirectMiddleware
 ------------------
@@ -982,7 +980,7 @@ RobotsTxtMiddleware
 Scrapy ships with support for the following robots.txt_ parsers:
 
 * :ref:`Protego <protego-parser>` (default)
-* :ref:`RobotFileParser <python-robotfileparser>`
+* :class:`~urllib.robotparser.RobotFileParser`
 * :ref:`Reppy <reppy-parser>`
 * :ref:`Robotexclusionrulesparser <rerp-parser>`
@@ -1030,13 +1028,10 @@ Based on `Protego <https://github.com/scrapy/protego>`_:
 
 Scrapy uses this parser by default.
 
-.. _python-robotfileparser:
-
 RobotFileParser
 ~~~~~~~~~~~~~~~
 
-Based on `RobotFileParser
-<https://docs.python.org/3/library/urllib.robotparser.html>`_:
+Based on :class:`~urllib.robotparser.RobotFileParser`:
 
 * is Python's built-in robots.txt_ parser
diff --git a/docs/topics/dynamic-content.rst b/docs/topics/dynamic-content.rst
index b981336764c..22bcac2686d 100644
--- a/docs/topics/dynamic-content.rst
+++ b/docs/topics/dynamic-content.rst
@@ -115,7 +115,7 @@ data from it depends on the type of response:
 
 - If the response is HTML or XML, use :ref:`selectors <topics-selectors>` as
   usual.
 
-- If the response is JSON, use `json.loads`_ to load the desired data from
+- If the response is JSON, use :func:`json.loads` to load the desired data from
   :attr:`response.text <scrapy.http.TextResponse.text>`::
 
       data = json.loads(response.text)
 
@@ -130,7 +130,7 @@ data from it depends on the type of response:
 
 - If the response is JavaScript, or HTML with a ``