Merge pull request #1172 from bagratte/docs
minor corrections in documentation.
kmike committed Apr 19, 2015
2 parents bb4c8c3 + 1312bcd commit 1794a89
Showing 9 changed files with 56 additions and 53 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,3 +10,6 @@ venv
build
dist
.idea

+# Windows
+Thumbs.db
4 changes: 2 additions & 2 deletions docs/intro/tutorial.rst
@@ -108,7 +108,7 @@ define the three main mandatory attributes:
listed here. The subsequent URLs will be generated successively from data
contained in the start URLs.

-* :meth:`~scrapy.spider.Spider.parse` a method of the spider, which will
+* :meth:`~scrapy.spider.Spider.parse`: a method of the spider, which will
be called with the downloaded :class:`~scrapy.http.Response` object of each
start URL. The response is passed to the method as the first and only
argument.
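
To make the role of ``parse()`` concrete, here is a minimal sketch of a spider whose ``parse()`` callback only logs each downloaded response; the spider name and start URL below are illustrative, not part of the change above::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"                          # illustrative spider name
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            # called once per start URL with the downloaded Response
            self.log("Visited %s" % response.url)
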
@@ -248,7 +248,7 @@ To start a shell, you must go to the project's top level directory and run::

.. note::

-Remember to always enclose urls with quotes in running Scrapy shell from
+Remember to always enclose urls in quotes when running Scrapy shell from
command-line, otherwise urls containing arguments (ie. ``&`` character)
will not work.

12 changes: 6 additions & 6 deletions docs/topics/commands.rst
@@ -80,8 +80,8 @@ some usage help and the available commands::
fetch Fetch a URL using the Scrapy downloader
[...]

-The first line will print the currently active project, if you're inside a
-Scrapy project. In this, it was run from outside a project. If run from inside
+The first line will print the currently active project if you're inside a
+Scrapy project. In this example it was run from outside a project. If run from inside
a project it would have printed something like this::

Scrapy X.Y - project: myproject
@@ -135,7 +135,7 @@ Available tool commands
=======================

This section contains a list of the available built-in commands with a
-description and some usage examples. Remember you can always get more info
+description and some usage examples. Remember, you can always get more info
about each command by running::

scrapy <command> -h
@@ -196,7 +196,7 @@ genspider

Create a new spider in the current project.

-This is just a convenient shortcut command for creating spiders based on
+This is just a convenience shortcut command for creating spiders based on
pre-defined templates, but certainly not the only way to create spiders. You
can just create the spider source code files yourself, instead of using this
command.
@@ -298,7 +298,7 @@ edit
Edit the given spider using the editor defined in the :setting:`EDITOR`
setting.

-This command is provided only as a convenient shortcut for the most common
+This command is provided only as a convenience shortcut for the most common
case, the developer is of course free to choose any tool or IDE to write and
debug his spiders.

@@ -318,7 +318,7 @@ Downloads the given URL using the Scrapy downloader and writes the contents to
standard output.

The interesting thing about this command is that it fetches the page how the
-spider would download it. For example, if the spider has an ``USER_AGENT``
+spider would download it. For example, if the spider has a ``USER_AGENT``
attribute which overrides the User Agent, it will use that one.

So this command can be used to "see" how your spider would fetch a certain page.
4 changes: 2 additions & 2 deletions docs/topics/feed-exports.rst
@@ -8,7 +8,7 @@ Feed exports

One of the most frequently required features when implementing scrapers is
being able to store the scraped data properly and, quite often, that means
generating a "export file" with the scraped data (commonly called "export
generating an "export file" with the scraped data (commonly called "export
feed") to be consumed by other systems.

Scrapy provides this functionality out of the box with the Feed Exports, which
@@ -21,7 +21,7 @@ Serialization formats
=====================

For serializing the scraped data, the feed exports use the :ref:`Item exporters
-<topics-exporters>` and these formats are supported out of the box:
+<topics-exporters>`. These formats are supported out of the box:

* :ref:`topics-feed-format-json`
* :ref:`topics-feed-format-jsonlines`
10 changes: 5 additions & 5 deletions docs/topics/item-pipeline.rst
@@ -5,14 +5,14 @@ Item Pipeline
=============

After an item has been scraped by a spider, it is sent to the Item Pipeline
-which process it through several components that are executed sequentially.
+which processes it through several components that are executed sequentially.

Each item pipeline component (sometimes referred as just "Item Pipeline") is a
Python class that implements a simple method. They receive an item and perform
an action over it, also deciding if the item should continue through the
pipeline or be dropped and no longer processed.
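
A minimal sketch of such a component, assuming a hypothetical required ``title`` field (``process_item()`` is the method each component implements)::

    from scrapy.exceptions import DropItem

    class RequiredFieldsPipeline(object):
        """Drop items that lack a 'title' field, otherwise pass them on."""

        def process_item(self, item, spider):
            if item.get('title'):
                return item  # continue through the remaining pipeline components
            raise DropItem("Missing title in %s" % item)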

-Typical use for item pipelines are:
+Typical uses of item pipelines are:

* cleansing HTML data
* validating scraped data (checking that the items contain certain fields)
@@ -167,7 +167,7 @@ Duplicates filter
-----------------

A filter that looks for duplicate items, and drops those items that were
-already processed. Let say that our items have an unique id, but our spider
+already processed. Let's say that our items have a unique id, but our spider
returns multiples items with the same id::
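
    # The example itself falls outside this hunk; a sketch of such a filter,
    # with the 'id' field and class name assumed for illustration:
    from scrapy.exceptions import DropItem

    class DuplicatesPipeline(object):

        def __init__(self):
            self.ids_seen = set()

        def process_item(self, item, spider):
            if item['id'] in self.ids_seen:
                raise DropItem("Duplicate item found: %s" % item)
            self.ids_seen.add(item['id'])
            return item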


@@ -198,6 +198,6 @@ To activate an Item Pipeline component you must add its class to the
}

The integer values you assign to classes in this setting determine the
-order they run in- items go through pipelines from order number low to
-high. It's customary to define these numbers in the 0-1000 range.
+order in which they run: items go through from lower valued to higher
+valued classes. It's customary to define these numbers in the 0-1000 range.
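
For instance (the component paths below are hypothetical; the lower-valued class runs first)::

    ITEM_PIPELINES = {
        'myproject.pipelines.RequiredFieldsPipeline': 300,
        'myproject.pipelines.DuplicatesPipeline': 800,
    }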

2 changes: 1 addition & 1 deletion docs/topics/link-extractors.rst
@@ -82,7 +82,7 @@ LxmlLinkExtractor
module.
:type deny_extensions: list

-:param restrict_xpaths: is a XPath (or list of XPath's) which defines
+:param restrict_xpaths: is an XPath (or list of XPath's) which defines
regions inside the response where links should be extracted from.
If given, only the text selected by those XPath will be scanned for
links. See examples below.
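
A rough usage sketch, assuming the link-extractor import path of this era and a made-up page (URL, markup and XPath are illustrative)::

    from scrapy.contrib.linkextractors import LinkExtractor  # LxmlLinkExtractor alias
    from scrapy.http import HtmlResponse

    # a tiny in-memory response so the example is self-contained
    response = HtmlResponse(url='http://www.example.com',
                            body='<div id="content"><a href="/a.html">A</a></div>'
                                 '<div id="footer"><a href="/b.html">B</a></div>')

    extractor = LinkExtractor(restrict_xpaths='//div[@id="content"]')
    print(extractor.extract_links(response))  # only the link inside div#content
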
12 changes: 6 additions & 6 deletions docs/topics/loaders.rst
@@ -9,7 +9,7 @@ Item Loaders

Item Loaders provide a convenient mechanism for populating scraped :ref:`Items
<topics-items>`. Even though Items can be populated using their own
-dictionary-like API, the Item Loaders provide a much more convenient API for
+dictionary-like API, Item Loaders provide a much more convenient API for
populating them from a scraping process, by automating some common tasks like
parsing the raw extracted data before assigning it.

@@ -25,7 +25,7 @@ Using Item Loaders to populate items
====================================

To use an Item Loader, you must first instantiate it. You can either
-instantiate it with an dict-like object (e.g. Item or dict) or without one, in
+instantiate it with a dict-like object (e.g. Item or dict) or without one, in
which case an Item is automatically instantiated in the Item Loader constructor
using the Item class specified in the :attr:`ItemLoader.default_item_class`
attribute.
@@ -67,7 +67,7 @@ and finally the ``last_update`` field is populated directly with a literal value
(``today``) using a different method: :meth:`~ItemLoader.add_value`.

Finally, when all data is collected, the :meth:`ItemLoader.load_item` method is
-called which actually populates and returns the item populated with the data
+called which actually returns the item populated with the data
previously extracted and collected with the :meth:`~ItemLoader.add_xpath`,
:meth:`~ItemLoader.add_css`, and :meth:`~ItemLoader.add_value` calls.
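
Putting those calls together, a sketch of the pattern being described, assuming a hypothetical ``Product`` item and illustrative selectors (loader import path as of this era)::

    from scrapy.contrib.loader import ItemLoader
    from myproject.items import Product  # hypothetical Item with these fields

    # inside your spider
    def parse(self, response):
        l = ItemLoader(item=Product(), response=response)
        l.add_xpath('name', '//div[@class="product_name"]/text()')
        l.add_css('price', '#price::text')
        l.add_value('last_updated', 'today')
        return l.load_item()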

@@ -565,8 +565,8 @@ Here is a list of all built-in processors:
.. class:: Identity

The simplest processor, which doesn't do anything. It returns the original
-values unchanged. It doesn't receive any constructor arguments nor accepts
-Loader contexts.
+values unchanged. It doesn't receive any constructor arguments, nor does it
+accept Loader contexts.

Example::
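
    >>> # a sketch of what the elided example shows (processor import path as of this era)
    >>> from scrapy.contrib.loader.processor import Identity
    >>> proc = Identity()
    >>> proc(['one', 'two', 'three'])
    ['one', 'two', 'three']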

@@ -579,7 +579,7 @@ Here is a list of all built-in processors:

Returns the first non-null/non-empty value from the values received,
so it's typically used as an output processor to single-valued fields.
-It doesn't receive any constructor arguments, nor accept Loader contexts.
+It doesn't receive any constructor arguments, nor does it accept Loader contexts.

Example::

52 changes: 26 additions & 26 deletions docs/topics/selectors.rst
@@ -13,9 +13,9 @@ achieve this:
HTML code and also deals with bad markup reasonably well, but it has one
drawback: it's slow.

-* `lxml`_ is a XML parsing library (which also parses HTML) with a pythonic
-API based on `ElementTree`_ (which is not part of the Python standard
-library).
+* `lxml`_ is an XML parsing library (which also parses HTML) with a pythonic
+API based on `ElementTree`_. (lxml is not part of the Python standard
+library.)

Scrapy comes with its own mechanism for extracting data. They're called
selectors because they "select" certain parts of the HTML document specified
@@ -72,7 +72,7 @@ Constructing from response::
>>> Selector(response=response).xpath('//span/text()').extract()
[u'good']

-For convenience, response objects exposes a selector on `.selector` attribute,
+For convenience, response objects expose a selector on `.selector` attribute,
it's totally OK to use this shortcut when possible::

>>> response.selector.xpath('//span/text()').extract()
@@ -114,17 +114,17 @@ page, let's construct an XPath for selecting the text inside the title tag::
>>> response.selector.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]

-Querying responses using XPath and CSS is so common that responses includes two
-convenient shortcuts: ``response.xpath()`` and ``response.css()``::
+Querying responses using XPath and CSS is so common that responses include two
+convenience shortcuts: ``response.xpath()`` and ``response.css()``::

>>> response.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
>>> response.css('title::text')
[<Selector (text) xpath=//title/text()>]

-As you can see, ``.xpath()`` and ``.css()`` methods returns an
+As you can see, ``.xpath()`` and ``.css()`` methods return a
:class:`~scrapy.selector.SelectorList` instance, which is a list of new
-selectors. This API can be used quickly for selecting nested data::
+selectors. This API can be used for quickly selecting nested data::

>>> response.css('img').xpath('@src').extract()
[u'image1_thumb.jpg',
@@ -196,7 +196,7 @@ Now we're going to get the base URL and some image links::
Nesting selectors
-----------------

-The selection methods (``.xpath()`` or ``.css()``) returns a list of selectors
+The selection methods (``.xpath()`` or ``.css()``) return a list of selectors
of the same type, so you can call the selection methods for those selectors
too. Here's an example::

@@ -221,12 +221,12 @@ too. Here's an example::
Using selectors with regular expressions
----------------------------------------

-:class:`~scrapy.selector.Selector` also have a ``.re()`` method for extracting
+:class:`~scrapy.selector.Selector` also has a ``.re()`` method for extracting
data using regular expressions. However, unlike using ``.xpath()`` or
-``.css()`` methods, ``.re()`` method returns a list of unicode strings. So you
+``.css()`` methods, ``.re()`` returns a list of unicode strings. So you
can't construct nested ``.re()`` calls.

-Here's an example used to extract images names from the :ref:`HTML code
+Here's an example used to extract image names from the :ref:`HTML code
<topics-selectors-htmlcode>` above::

>>> response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
@@ -295,7 +295,7 @@ set \http://exslt.org/sets `set manipulation`_
Regular expressions
~~~~~~~~~~~~~~~~~~~

-The ``test()`` function for example can prove quite useful when XPath's
+The ``test()`` function, for example, can prove quite useful when XPath's
``starts-with()`` or ``contains()`` are not sufficient.

Example selecting links in list item with a "class" attribute ending with a digit::
@@ -440,7 +440,7 @@ you may want to take a look first at this `XPath tutorial`_.
Using text nodes in a condition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When you need to use the text content as argument to a `XPath string function`_,
+When you need to use the text content as argument to an `XPath string function`_,
avoid using ``.//text()`` and use just ``.`` instead.

This is because the expression ``.//text()`` yields a collection of text elements -- a *node-set*.
@@ -478,7 +478,7 @@ But using the ``.`` to mean the node, works::

.. _`XPath string function`: http://www.w3.org/TR/xpath/#section-String-Functions

-Beware the difference between //node[1] and (//node)[1]
+Beware of the difference between //node[1] and (//node)[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``//node[1]`` selects all the nodes occurring first under their respective parents.
@@ -559,7 +559,7 @@ Built-in Selectors reference
An instance of :class:`Selector` is a wrapper over response to select
certain parts of its content.

-``response`` is a :class:`~scrapy.http.HtmlResponse` or
+``response`` is an :class:`~scrapy.http.HtmlResponse` or an
:class:`~scrapy.http.XmlResponse` object that will be used for selecting and
extracting data.

@@ -593,7 +593,7 @@ Built-in Selectors reference

.. note::

-For convenience this method can be called as ``response.xpath()``
+For convenience, this method can be called as ``response.xpath()``

.. method:: css(query)

@@ -644,7 +644,7 @@ SelectorList objects

.. class:: SelectorList

-The :class:`SelectorList` class is subclass of the builtin ``list``
+The :class:`SelectorList` class is a subclass of the builtin ``list``
class, which provides a few additional methods.

.. method:: xpath(query)
@@ -680,17 +680,17 @@ Selector examples on HTML response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here's a couple of :class:`Selector` examples to illustrate several concepts.
-In all cases, we assume there is already an :class:`Selector` instantiated with
+In all cases, we assume there is already a :class:`Selector` instantiated with
a :class:`~scrapy.http.HtmlResponse` object like this::

sel = Selector(html_response)

-1. Select all ``<h1>`` elements from a HTML response body, returning a list of
+1. Select all ``<h1>`` elements from an HTML response body, returning a list of
:class:`Selector` objects (ie. a :class:`SelectorList` object)::

sel.xpath("//h1")

-2. Extract the text of all ``<h1>`` elements from a HTML response body,
+2. Extract the text of all ``<h1>`` elements from an HTML response body,
returning a list of unicode strings::

sel.xpath("//h1").extract() # this includes the h1 tag
@@ -705,12 +705,12 @@ Selector examples on XML response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here's a couple of examples to illustrate several concepts. In both cases we
-assume there is already an :class:`Selector` instantiated with a
+assume there is already a :class:`Selector` instantiated with an
:class:`~scrapy.http.XmlResponse` object like this::

sel = Selector(xml_response)

-1. Select all ``<product>`` elements from a XML response body, returning a list
+1. Select all ``<product>`` elements from an XML response body, returning a list
of :class:`Selector` objects (ie. a :class:`SelectorList` object)::

sel.xpath("//product")
@@ -752,12 +752,12 @@ nodes can be accessed directly by their names::
<Selector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
...

-If you wonder why the namespace removal procedure is not always called, instead
-of having to call it manually. This is because of two reasons which, in order
+If you wonder why the namespace removal procedure isn't called always by default
+instead of having to call it manually, this is because of two reasons, which, in order
of relevance, are:

1. Removing namespaces requires to iterate and modify all nodes in the
-    document, which is a reasonably expensive operation to performs for all
+    document, which is a reasonably expensive operation to perform for all
documents crawled by Scrapy

2. There could be some cases where using namespaces is actually required, in
10 changes: 5 additions & 5 deletions docs/topics/spiders.rst
@@ -190,7 +190,7 @@ scrapy.Spider
dicts or :class:`~scrapy.item.Item` objects.

:param response: the response to parse
-:type response: :class:~scrapy.http.Response`
+:type response: :class:`~scrapy.http.Response`

.. method:: log(message, [level, component])

@@ -297,10 +297,10 @@ See `Scrapyd documentation`_.
Generic Spiders
===============

-Scrapy comes with some useful generic spiders that you can use, to subclass
+Scrapy comes with some useful generic spiders that you can use to subclass
your spiders from. Their aim is to provide convenient functionality for a few
common scraping cases, like following all links on a site based on certain
-rules, crawling from `Sitemaps`_, or parsing a XML/CSV feed.
+rules, crawling from `Sitemaps`_, or parsing an XML/CSV feed.

For the examples used in the following spiders, we'll assume you have a project
with a ``TestItem`` declared in a ``myproject.items`` module::
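
    # (The module contents fall outside this hunk; a plausible sketch, field names assumed.)
    import scrapy

    class TestItem(scrapy.Item):
        id = scrapy.Field()
        name = scrapy.Field()
        description = scrapy.Field()
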
@@ -342,7 +342,7 @@ CrawlSpider
.. method:: parse_start_url(response)

This method is called for the start_urls responses. It allows to parse
-the initial responses and must return either a
+the initial responses and must return either an
:class:`~scrapy.item.Item` object, a :class:`~scrapy.http.Request`
object, or an iterable containing any of them.

@@ -417,7 +417,7 @@ Let's now take a look at an example CrawlSpider with rules::
This spider would start crawling example.com's home page, collecting category
links, and item links, parsing the latter with the ``parse_item`` method. For
each item response, some data will be extracted from the HTML using XPath, and
-a :class:`~scrapy.item.Item` will be filled with it.
+an :class:`~scrapy.item.Item` will be filled with it.
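
A sketch of the shape such a spider takes, with domain names, URL patterns and XPaths as placeholders (import paths as of this era)::

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors import LinkExtractor
    from myproject.items import TestItem  # hypothetical item module from earlier

    class ExampleSpider(CrawlSpider):
        name = 'example.com'
        allowed_domains = ['example.com']
        start_urls = ['http://www.example.com']

        rules = (
            # follow category links; hand item pages to parse_item()
            Rule(LinkExtractor(allow=(r'category\.php',))),
            Rule(LinkExtractor(allow=(r'item\.php',)), callback='parse_item'),
        )

        def parse_item(self, response):
            item = TestItem()
            item['name'] = response.xpath('//h1/text()').extract()
            return item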

XMLFeedSpider
-------------