[MRG] update redirect links to python3 #4445

Merged: 4 commits (Apr 15, 2020)
Changes from all commits
15 changes: 7 additions & 8 deletions docs/intro/tutorial.rst
@@ -25,16 +25,16 @@ Scrapy.
If you're already familiar with other languages, and want to learn Python quickly, the `Python Tutorial`_ is a good resource.

If you're new to programming and want to start with Python, the following books
-may be useful to you:
+may be useful to you:

* `Automate the Boring Stuff With Python`_

-* `How To Think Like a Computer Scientist`_
+* `How To Think Like a Computer Scientist`_

-* `Learn Python 3 The Hard Way`_
+* `Learn Python 3 The Hard Way`_

You can also take a look at `this list of Python resources for non-programmers`_,
-as well as the `suggested resources in the learnpython-subreddit`_.
+as well as the `suggested resources in the learnpython-subreddit`_.

.. _Python: https://www.python.org/
.. _this list of Python resources for non-programmers: https://wiki.python.org/moin/BeginnersGuide/NonProgrammers
@@ -62,7 +62,7 @@ This will create a ``tutorial`` directory with the following contents::
__init__.py

items.py # project items definition file

middlewares.py # project middlewares file

pipelines.py # project pipelines file
@@ -287,8 +287,8 @@ to be scraped, you can at least get **some** data.

Besides the :meth:`~scrapy.selector.SelectorList.getall` and
:meth:`~scrapy.selector.SelectorList.get` methods, you can also use
-the :meth:`~scrapy.selector.SelectorList.re` method to extract using `regular
-expressions`_:
+the :meth:`~scrapy.selector.SelectorList.re` method to extract using
+:doc:`regular expressions <library/re>`:

>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
@@ -305,7 +305,6 @@ with a selector (see :ref:`topics-developer-tools`).
`Selector Gadget`_ is also a nice tool to quickly find CSS selector for
visually selected elements, which works in many browsers.

-.. _regular expressions: https://docs.python.org/3/library/re.html
.. _Selector Gadget: https://selectorgadget.com/


4 changes: 1 addition & 3 deletions docs/topics/contracts.rst
@@ -136,7 +136,7 @@ Detecting check runs
====================

When ``scrapy check`` is running, the ``SCRAPY_CHECK`` environment variable is
-set to the ``true`` string. You can use `os.environ`_ to perform any change to
+set to the ``true`` string. You can use :data:`os.environ` to perform any change to
your spiders or your settings when ``scrapy check`` is used::

import os
@@ -148,5 +148,3 @@ your spiders or your settings when ``scrapy check`` is used::
def __init__(self):
if os.environ.get('SCRAPY_CHECK'):
pass # Do some scraper adjustments when a check is running
-
-.. _os.environ: https://docs.python.org/3/library/os.html#os.environ
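
For illustration, a spider that adapts itself during ``scrapy check`` runs might look as follows (a sketch: the spider name, the ``DOWNLOAD_DELAY`` override and the contract lines are illustrative, not part of this diff)::

    import os

    from scrapy import Spider


    class ExampleSpider(Spider):
        name = "example"

        # Hypothetical adjustment: disable throttling while contract
        # checks are running.
        if os.environ.get("SCRAPY_CHECK"):
            custom_settings = {"DOWNLOAD_DELAY": 0}

        def parse(self, response):
            """
            @url http://www.example.com
            @returns items 0 0
            """
            return
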
5 changes: 2 additions & 3 deletions docs/topics/coroutines.rst
@@ -76,8 +76,8 @@ becomes::

Coroutines may be used to call asynchronous code. This includes other
coroutines, functions that return Deferreds and functions that return
-`awaitable objects`_ such as :class:`~asyncio.Future`. This means you can use
-many useful Python libraries providing such code::
+:term:`awaitable objects <awaitable>` such as :class:`~asyncio.Future`.
+This means you can use many useful Python libraries providing such code::

class MySpider(Spider):
# ...
@@ -107,4 +107,3 @@ Common use cases for asynchronous code include:
:ref:`the screenshot pipeline example<ScreenshotPipeline>`).

.. _aio-libs: https://github.com/aio-libs
-.. _awaitable objects: https://docs.python.org/3/glossary.html#term-awaitable
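
As a concrete sketch of the pattern described above (it assumes the asyncio reactor is enabled and that ``aiohttp`` is installed; both URLs are placeholders)::

    import aiohttp

    from scrapy import Spider


    class AsyncSpider(Spider):
        name = "async_example"
        start_urls = ["https://example.com"]

        async def parse(self, response):
            # Await an awaitable from an asyncio-based library inside
            # a coroutine callback.
            async with aiohttp.ClientSession() as session:
                async with session.get("https://api.example.com/extra") as resp:
                    extra = await resp.json()
            yield {"url": response.url, "extra": extra}
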
8 changes: 2 additions & 6 deletions docs/topics/downloader-middleware.rst
@@ -739,7 +739,7 @@ HttpProxyMiddleware
This middleware sets the HTTP proxy to use for requests, by setting the
``proxy`` meta value for :class:`~scrapy.http.Request` objects.

-Like the Python standard library modules `urllib`_ and `urllib2`_, it obeys
+Like the Python standard library module :mod:`urllib.request`, it obeys
the following environment variables:

* ``http_proxy``
@@ -751,9 +751,6 @@ HttpProxyMiddleware
Keep in mind this value will take precedence over ``http_proxy``/``https_proxy``
environment variables, and it will also ignore ``no_proxy`` environment variable.

-.. _urllib: https://docs.python.org/2/library/urllib.html
-.. _urllib2: https://docs.python.org/2/library/urllib2.html

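For reference, setting the ``proxy`` meta key per request looks like this (a sketch; the proxy URL is a placeholder)::

    import scrapy


    class ProxySpider(scrapy.Spider):
        name = "proxy_example"

        def start_requests(self):
            # This per-request value takes precedence over the
            # http_proxy/https_proxy environment variables.
            yield scrapy.Request(
                "https://example.com",
                meta={"proxy": "http://user:pass@proxy.example.com:8080"},
            )

        def parse(self, response):
            self.logger.info("Fetched %s via proxy", response.url)
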
RedirectMiddleware
------------------

@@ -1036,8 +1033,7 @@ Scrapy uses this parser by default.
RobotFileParser
~~~~~~~~~~~~~~~

-Based on `RobotFileParser
-<https://docs.python.org/3.7/library/urllib.robotparser.html>`_:
+Based on :class:`~urllib.robotparser.RobotFileParser`:

* is Python's built-in robots.txt_ parser

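For context, the robots.txt parser implementation is selected via the ``ROBOTSTXT_PARSER`` setting; a sketch (treat the exact import path as an assumption to verify against the settings reference)::

    # settings.py
    ROBOTSTXT_OBEY = True
    ROBOTSTXT_PARSER = "scrapy.robotstxt.PythonRobotParser"  # stdlib-based parser
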
14 changes: 7 additions & 7 deletions docs/topics/dynamic-content.rst
@@ -115,7 +115,7 @@ data from it depends on the type of response:
- If the response is HTML or XML, use :ref:`selectors
<topics-selectors>` as usual.

-- If the response is JSON, use `json.loads`_ to load the desired data from
+- If the response is JSON, use :func:`json.loads` to load the desired data from
:attr:`response.text <scrapy.http.TextResponse.text>`::

data = json.loads(response.text)
@@ -130,8 +130,9 @@ data from it depends on the type of response:
- If the response is JavaScript, or HTML with a ``<script/>`` element
containing the desired data, see :ref:`topics-parsing-javascript`.

-- If the response is CSS, use a `regular expression`_ to extract the desired
-data from :attr:`response.text <scrapy.http.TextResponse.text>`.
+- If the response is CSS, use a :doc:`regular expression <library/re>` to
+extract the desired data from
+:attr:`response.text <scrapy.http.TextResponse.text>`.

.. _topics-parsing-images:

@@ -168,8 +169,9 @@ JavaScript code:
Once you have a string with the JavaScript code, you can extract the desired
data from it:

-- You might be able to use a `regular expression`_ to extract the desired
-data in JSON format, which you can then parse with `json.loads`_.
+- You might be able to use a :doc:`regular expression <library/re>` to
+extract the desired data in JSON format, which you can then parse with
+:func:`json.loads`.

For example, if the JavaScript code contains a separate line like
``var data = {"field": "value"};`` you can extract that data as follows:
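
(A minimal sketch of that extraction, with a hard-coded ``js_code`` standing in for the JavaScript source; this is not the example elided from this diff)::

    import json
    import re

    js_code = 'var data = {"field": "value"};'

    match = re.search(r"\bvar\s+data\s*=\s*(\{.*?\})\s*;", js_code)
    if match:
        data = json.loads(match.group(1))
        print(data["field"])  # -> value
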
@@ -241,9 +243,7 @@ along with `scrapy-selenium`_ for seamless integration.
.. _headless browser: https://en.wikipedia.org/wiki/Headless_browser
.. _JavaScript: https://en.wikipedia.org/wiki/JavaScript
.. _js2xml: https://github.com/scrapinghub/js2xml
-.. _json.loads: https://docs.python.org/3/library/json.html#json.loads
.. _pytesseract: https://github.com/madmaze/pytesseract
-.. _regular expression: https://docs.python.org/3/library/re.html
.. _scrapy-selenium: https://github.com/clemfromspace/scrapy-selenium
.. _scrapy-splash: https://github.com/scrapy-plugins/scrapy-splash
.. _Selenium: https://www.selenium.dev/
4 changes: 1 addition & 3 deletions docs/topics/email.rst
@@ -7,16 +7,14 @@ Sending e-mail
.. module:: scrapy.mail
:synopsis: Email sending facility

-Although Python makes sending e-mails relatively easy via the `smtplib`_
+Although Python makes sending e-mails relatively easy via the :mod:`smtplib`
library, Scrapy provides its own facility for sending e-mails which is very
easy to use and it's implemented using :doc:`Twisted non-blocking IO
<twisted:core/howto/defer-intro>`, to avoid interfering with the non-blocking
IO of the crawler. It also provides a simple API for sending attachments and
it's very easy to configure, with a few :ref:`settings
<topics-email-settings>`.

-.. _smtplib: https://docs.python.org/2/library/smtplib.html

Quick example
=============

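(The quick example itself is elided above; basic usage is roughly the following sketch, where the addresses are placeholders)::

    from scrapy.mail import MailSender

    mailer = MailSender()  # or MailSender.from_settings(settings)
    mailer.send(
        to=["someone@example.com"],
        subject="Some subject",
        body="Some body",
        cc=["another@example.com"],
    )
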
20 changes: 6 additions & 14 deletions docs/topics/exporters.rst
@@ -311,7 +311,7 @@ CsvItemExporter

The additional keyword arguments of this ``__init__`` method are passed to the
:class:`BaseItemExporter` ``__init__`` method, and the leftover arguments to the
-`csv.writer`_ ``__init__`` method, so you can use any ``csv.writer`` ``__init__`` method
+:func:`csv.writer` function, so you can use any :func:`csv.writer` function
argument to customize this exporter.

A typical output of this exporter would be::
@@ -320,8 +320,6 @@ CsvItemExporter
Color TV,1200
DVD player,200

-.. _csv.writer: https://docs.python.org/2/library/csv.html#csv.writer

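For illustration, forwarding a ``csv.writer`` argument such as ``delimiter`` looks like this (a sketch; the file name, delimiter and item values are arbitrary)::

    from scrapy.exporters import CsvItemExporter

    with open("items.csv", "wb") as f:
        exporter = CsvItemExporter(f, delimiter=";")  # passed through to csv.writer
        exporter.start_exporting()
        exporter.export_item({"product": "Color TV", "price": "1200"})
        exporter.finish_exporting()
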
PickleItemExporter
------------------

@@ -335,15 +333,13 @@ PickleItemExporter
:param protocol: The pickle protocol to use.
:type protocol: int

-For more information, refer to the `pickle module documentation`_.
+For more information, see :mod:`pickle`.

The additional keyword arguments of this ``__init__`` method are passed to the
:class:`BaseItemExporter` ``__init__`` method.

Pickle isn't a human readable format, so no output examples are provided.

-.. _pickle module documentation: https://docs.python.org/2/library/pickle.html

PprintItemExporter
------------------

@@ -372,8 +368,8 @@ JsonItemExporter
Exports Items in JSON format to the specified file-like object, writing all
objects as a list of objects. The additional ``__init__`` method arguments are
passed to the :class:`BaseItemExporter` ``__init__`` method, and the leftover
-arguments to the `JSONEncoder`_ ``__init__`` method, so you can use any
-`JSONEncoder`_ ``__init__`` method argument to customize this exporter.
+arguments to the :class:`~json.JSONEncoder` ``__init__`` method, so you can use any
+:class:`~json.JSONEncoder` ``__init__`` method argument to customize this exporter.

:param file: the file-like object to use for exporting the data. Its ``write`` method should
accept ``bytes`` (a disk file opened in binary mode, a ``io.BytesIO`` object, etc)
@@ -393,8 +389,6 @@ JsonItemExporter
stream-friendly format, consider using :class:`JsonLinesItemExporter`
instead, or splitting the output in multiple chunks.

-.. _JSONEncoder: https://docs.python.org/2/library/json.html#json.JSONEncoder

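For illustration, forwarding :class:`~json.JSONEncoder` arguments looks like this (a sketch; ``indent`` and ``ensure_ascii`` are standard encoder arguments, and the item values are arbitrary)::

    from scrapy.exporters import JsonItemExporter

    with open("items.json", "wb") as f:
        exporter = JsonItemExporter(f, indent=4, ensure_ascii=False)
        exporter.start_exporting()
        exporter.export_item({"name": "Café", "price": 1200})
        exporter.finish_exporting()
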
JsonLinesItemExporter
---------------------

@@ -403,8 +397,8 @@ JsonLinesItemExporter
Exports Items in JSON format to the specified file-like object, writing one
JSON-encoded item per line. The additional ``__init__`` method arguments are passed
to the :class:`BaseItemExporter` ``__init__`` method, and the leftover arguments to
-the `JSONEncoder`_ ``__init__`` method, so you can use any `JSONEncoder`_
-``__init__`` method argument to customize this exporter.
+the :class:`~json.JSONEncoder` ``__init__`` method, so you can use any
+:class:`~json.JSONEncoder` ``__init__`` method argument to customize this exporter.

:param file: the file-like object to use for exporting the data. Its ``write`` method should
accept ``bytes`` (a disk file opened in binary mode, a ``io.BytesIO`` object, etc)
@@ -417,8 +411,6 @@ JsonLinesItemExporter
Unlike the one produced by :class:`JsonItemExporter`, the format produced by
this exporter is well suited for serializing large amounts of data.

-.. _JSONEncoder: https://docs.python.org/2/library/json.html#json.JSONEncoder

MarshalItemExporter
-------------------

3 changes: 1 addition & 2 deletions docs/topics/extensions.rst
@@ -364,13 +364,12 @@ Debugger extension

.. class:: Debugger

-Invokes a `Python debugger`_ inside a running Scrapy process when a `SIGUSR2`_
+Invokes a :doc:`Python debugger <library/pdb>` inside a running Scrapy process when a `SIGUSR2`_
signal is received. After the debugger is exited, the Scrapy process continues
running normally.

For more info see `Debugging in Python`_.

This extension only works on POSIX-compliant platforms (i.e. not Windows).

-.. _Python debugger: https://docs.python.org/2/library/pdb.html
.. _Debugging in Python: https://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/
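
For reference, the signal can be sent from another process like so (a sketch; the PID is a placeholder, and this works on POSIX platforms only)::

    import os
    import signal

    scrapy_pid = 12345  # hypothetical PID of the running Scrapy process
    os.kill(scrapy_pid, signal.SIGUSR2)

Equivalently, run ``kill -USR2 <pid>`` from a shell.
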
33 changes: 11 additions & 22 deletions docs/topics/items.rst
@@ -15,17 +15,15 @@ especially in a larger project with many spiders.

To define common output data format Scrapy provides the :class:`Item` class.
:class:`Item` objects are simple containers used to collect the scraped data.
-They provide a `dictionary-like`_ API with a convenient syntax for declaring
-their available fields.
+They provide an API similar to the :class:`dict` API with a convenient syntax
+for declaring their available fields.

Various Scrapy components use extra information provided by Items:
exporters look at declared fields to figure out columns to export,
serialization can be customized using Item fields metadata, :mod:`trackref`
tracks Item instances to help find memory leaks
(see :ref:`topics-leaks-trackrefs`), etc.

-.. _dictionary-like: https://docs.python.org/2/library/stdtypes.html#dict

.. _topics-items-declaring:

Declaring Items
@@ -79,7 +77,7 @@ Working with Items

Here are some examples of common tasks performed with items, using the
``Product`` item :ref:`declared above <topics-items-declaring>`. You will
-notice the API is very similar to the `dict API`_.
+notice the API is very similar to the :class:`dict` API.

Creating items
--------------
@@ -145,7 +143,7 @@ KeyError: 'Product does not support field: lala'
Accessing all populated values
------------------------------

-To access all populated values, just use the typical `dict API`_:
+To access all populated values, just use the typical :class:`dict` API:

>>> product.keys()
['price', 'name']
@@ -162,11 +160,9 @@ Copying items
To copy an item, you must first decide whether you want a shallow copy or a
deep copy.

-If your item contains mutable_ values like lists or dictionaries, a shallow
-copy will keep references to the same mutable values across all different
-copies.
-
-.. _mutable: https://docs.python.org/3/glossary.html#term-mutable
+If your item contains :term:`mutable` values like lists or dictionaries,
+a shallow copy will keep references to the same mutable values across all
+different copies.

For example, if you have an item with a list of tags, and you create a shallow
copy of that item, both the original item and the copy have the same list of
@@ -175,9 +171,7 @@ other item as well.

If that is not the desired behavior, use a deep copy instead.

-See the `documentation of the copy module`_ for more information.
-
-.. _documentation of the copy module: https://docs.python.org/3/library/copy.html
+See :mod:`copy` for more information.

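For illustration, shallow versus deep copying behaves as follows (a sketch assuming a ``Product`` item with a declared ``tags`` field and the item copy methods described in this section)::

    import scrapy


    class Product(scrapy.Item):
        name = scrapy.Field()
        tags = scrapy.Field()


    product = Product(name="TV", tags=["electronics"])

    shallow = product.copy()   # or Product(product)
    deep = product.deepcopy()

    product["tags"].append("sale")
    print(shallow["tags"])  # ['electronics', 'sale'] -- the list is shared
    print(deep["tags"])     # ['electronics'] -- fully independent
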
To create a shallow copy of an item, you can either call
:meth:`~scrapy.item.Item.copy` on an existing item
@@ -235,8 +229,8 @@ Item objects

Return a new Item optionally initialized from the given argument.

-Items replicate the standard `dict API`_, including its ``__init__`` method, and
-also provide the following additional API members:
+Items replicate the standard :class:`dict` API, including its ``__init__``
+method, and also provide the following additional API members:

.. automethod:: copy

@@ -249,22 +243,17 @@ Item objects
:class:`Field` objects used in the :ref:`Item declaration
<topics-items-declaring>`.

-.. _dict API: https://docs.python.org/2/library/stdtypes.html#dict

Field objects
=============

.. class:: Field([arg])

-The :class:`Field` class is just an alias to the built-in `dict`_ class and
+The :class:`Field` class is just an alias to the built-in :class:`dict` class and
doesn't provide any extra functionality or attributes. In other words,
:class:`Field` objects are plain-old Python dicts. A separate class is used
to support the :ref:`item declaration syntax <topics-items-declaring>`
based on class attributes.

-.. _dict: https://docs.python.org/2/library/stdtypes.html#dict

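For illustration, declaring fields with metadata looks like this (a sketch; the field names and the ``serializer`` key are illustrative)::

    import scrapy


    class Product(scrapy.Item):
        name = scrapy.Field()                   # a Field is a plain dict
        price = scrapy.Field(serializer=float)  # arbitrary metadata for other components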

Other classes related to Item
=============================
