Merge remote-tracking branch 'upstream/master' into py3_single_argument_processors
elacuesta committed Sep 3, 2019
2 parents 4d23a75 + d4b8bf1 commit b3981d3
Showing 50 changed files with 1,748 additions and 587 deletions.
41 changes: 41 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,41 @@
---
name: Bug report
about: Report a problem to help us improve
---

<!--
Thanks for taking an interest in Scrapy!
If you have a question that starts with "How to...", please see the Scrapy Community page: https://scrapy.org/community/.
The GitHub issue tracker's purpose is to deal with bug reports and feature requests for the project itself.
Keep in mind that by filing an issue, you are expected to comply with Scrapy's Code of Conduct, including treating everyone with respect: https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md
The following is a suggested template to structure your issue; you can find more guidelines at https://doc.scrapy.org/en/latest/contributing.html#reporting-bugs
-->

### Description

[Description of the issue]

### Steps to Reproduce

1. [First Step]
2. [Second Step]
3. [and so on...]

**Expected behavior:** [What you expect to happen]

**Actual behavior:** [What actually happens]

**Reproduces how often:** [What percentage of the time does it reproduce?]

### Versions

Please paste here the output of executing `scrapy version --verbose` in the command line.

### Additional context

Any additional information, configuration, data or output from commands that might be necessary to reproduce or understand the issue. Please try not to include screenshots of code or the command line; paste the contents as text instead. You can use [GitHub Flavored Markdown](https://help.github.com/en/articles/creating-and-highlighting-code-blocks) to make the text look better.
33 changes: 33 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,33 @@
---
name: Feature request
about: Suggest an idea for an enhancement or new feature
---

<!--
Thanks for taking an interest in Scrapy!
If you have a question that starts with "How to...", please see the Scrapy Community page: https://scrapy.org/community/.
The GitHub issue tracker's purpose is to deal with bug reports and feature requests for the project itself.
Keep in mind that by filing an issue, you are expected to comply with Scrapy's Code of Conduct, including treating everyone with respect: https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md
The following is a suggested template to structure your feature request; you can find more guidelines at https://doc.scrapy.org/en/latest/contributing.html#writing-patches and https://doc.scrapy.org/en/latest/contributing.html#submitting-patches
-->

## Summary

One paragraph explanation of the feature.

## Motivation

Why are we doing this? What use cases does it support? What is the expected outcome?

## Describe alternatives you've considered

A clear and concise description of the alternative solutions you've considered. Be sure to explain why Scrapy's existing customizability isn't suitable for this feature.

## Additional context

Any additional information about the feature request here.
43 changes: 23 additions & 20 deletions .travis.yml
@@ -1,31 +1,34 @@
language: python
dist: xenial
branches:
only:
- master
- /^\d\.\d+$/
- /^\d\.\d+\.\d+(rc\d+|\.dev\d+)?$/
matrix:
include:
- python: 2.7
env: TOXENV=py27
- python: 2.7
env: TOXENV=jessie
- python: 2.7
env: TOXENV=pypy
- python: 2.7
env: TOXENV=pypy3
- python: 3.4
env: TOXENV=py34
- python: 3.5
env: TOXENV=py35
- python: 3.6
env: TOXENV=py36
- python: 3.7
env: TOXENV=py37
dist: xenial
sudo: true
- python: 3.6
env: TOXENV=docs
- env: TOXENV=py27
python: 2.7
- env: TOXENV=py27-pinned
python: 2.7
- env: TOXENV=py27-extra-deps
python: 2.7
- env: TOXENV=pypy
python: 2.7
- env: TOXENV=pypy3
python: 3.5
- env: TOXENV=py35
python: 3.5
- env: TOXENV=py35-pinned
python: 3.5
- env: TOXENV=py36
python: 3.6
- env: TOXENV=py37
python: 3.7
- env: TOXENV=py37-extra-deps
python: 3.7
- env: TOXENV=docs
python: 3.6
install:
- |
if [ "$TOXENV" = "pypy" ]; then
2 changes: 1 addition & 1 deletion README.rst
@@ -40,7 +40,7 @@ https://scrapy.org
Requirements
============

* Python 2.7 or Python 3.4+
* Python 2.7 or Python 3.5+
* Works on Linux, Windows, Mac OSX, BSD

Install
10 changes: 10 additions & 0 deletions docs/conf.py
@@ -252,6 +252,16 @@

# Private exception used by the command-line interface implementation.
r'^scrapy\.exceptions\.UsageError',

# Methods of BaseItemExporter subclasses are only documented in
# BaseItemExporter.
r'^scrapy\.exporters\.(?!BaseItemExporter\b)\w*?\.',

# Extension behavior is only modified through settings. Methods of
# extension classes, as well as helper functions, are implementation
# details that are not documented.
r'^scrapy\.extensions\.[a-z]\w*?\.[A-Z]\w*?\.', # methods
r'^scrapy\.extensions\.[a-z]\w*?\.[a-z]', # helper functions
]


2 changes: 1 addition & 1 deletion docs/faq.rst
@@ -69,7 +69,7 @@ Here's an example spider using BeautifulSoup API, with ``lxml`` as the HTML pars
What Python versions does Scrapy support?
-----------------------------------------

Scrapy is supported under Python 2.7 and Python 3.4+
Scrapy is supported under Python 2.7 and Python 3.5+
under CPython (default Python implementation) and PyPy (starting with PyPy 5.9).
Python 2.6 support was dropped starting at Scrapy 0.20.
Python 3 support was added in Scrapy 1.1.
2 changes: 1 addition & 1 deletion docs/intro/install.rst
@@ -7,7 +7,7 @@ Installation guide
Installing Scrapy
=================

Scrapy runs on Python 2.7 and Python 3.4 or above
Scrapy runs on Python 2.7 and Python 3.5 or above
under CPython (default Python implementation) and PyPy (starting with PyPy 5.9).

If you're using `Anaconda`_ or `Miniconda`_, you can install the package from
17 changes: 11 additions & 6 deletions docs/news.rst
@@ -6,6 +6,11 @@ Release notes
.. note:: Scrapy 1.x will be the last series supporting Python 2. Scrapy 2.0,
planned for Q4 2019 or Q1 2020, will support **Python 3 only**.

Scrapy 1.7.3 (2019-08-01)
-------------------------

Enforce lxml 4.3.5 or lower for Python 3.4 (:issue:`3912`, :issue:`3918`).

Scrapy 1.7.2 (2019-07-23)
-------------------------

@@ -75,8 +80,8 @@ New features
provides a cleaner way to pass keyword arguments to callback methods
(:issue:`1138`, :issue:`3563`)

* A new :class:`~scrapy.http.JSONRequest` class offers a more convenient way
to build JSON requests (:issue:`3504`, :issue:`3505`)
* A new :class:`JSONRequest <scrapy.http.JsonRequest>` class offers a more
convenient way to build JSON requests (:issue:`3504`, :issue:`3505`)

* A ``process_request`` callback passed to the :class:`~scrapy.spiders.Rule`
constructor now receives the :class:`~scrapy.http.Response` object that
@@ -1264,8 +1269,8 @@ This 1.1 release brings a lot of interesting features and bug fixes:
this behavior, update :setting:`ROBOTSTXT_OBEY` in ``settings.py`` file
after creating a new project.
- Exporters now work on unicode, instead of bytes by default (:issue:`1080`).
If you use ``PythonItemExporter``, you may want to update your code to
disable binary mode which is now deprecated.
If you use :class:`~scrapy.exporters.PythonItemExporter`, you may want to
update your code to disable binary mode which is now deprecated.
- Accept XML node names containing dots as valid (:issue:`1533`).
- When uploading files or images to S3 (with ``FilesPipeline`` or
``ImagesPipeline``), the default ACL policy is now "private" instead
@@ -1403,8 +1408,8 @@ Bugfixes
- Fixed bug on ``XMLItemExporter`` with non-string fields in
items (:issue:`1738`).
- Fixed startproject command in OS X (:issue:`1635`).
- Fixed PythonItemExporter and CSVExporter for non-string item
types (:issue:`1737`).
- Fixed :class:`~scrapy.exporters.PythonItemExporter` and CSVExporter for
non-string item types (:issue:`1737`).
- Various logging related fixes (:issue:`1294`, :issue:`1419`, :issue:`1263`,
:issue:`1624`, :issue:`1654`, :issue:`1722`, :issue:`1726` and :issue:`1303`).
- Fixed bug in ``utils.template.render_templatefile()`` (:issue:`1212`).
30 changes: 27 additions & 3 deletions docs/topics/developer-tools.rst
@@ -252,17 +252,41 @@ If the handy ``has_next`` element is ``true`` (try loading
`quotes.toscrape.com/api/quotes?page=10`_ in your browser or a
page-number greater than 10), we increment the ``page`` attribute
and ``yield`` a new request, inserting the incremented page-number
into our ``url``.
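
A rough sketch of such a spider (the class name, the ``page`` attribute and
the JSON field names are assumptions based on the description above, not the
exact code from earlier in this page) could look like this::

    import json

    import scrapy


    class QuotesScrollSpider(scrapy.Spider):
        name = "quotes-scroll"
        page = 1
        start_urls = ["http://quotes.toscrape.com/api/quotes?page=1"]

        def parse(self, response):
            data = json.loads(response.text)
            # Yield one item per quote in the JSON payload
            for quote in data.get("quotes", []):
                yield {"author": quote["author"]["name"], "text": quote["text"]}
            # If the API reports another page, increment the page number and
            # request it, as described above
            if data.get("has_next"):
                self.page += 1
                url = "http://quotes.toscrape.com/api/quotes?page={}".format(self.page)
                yield scrapy.Request(url=url, callback=self.parse)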

You can see that with a few inspections in the `Network`-tool we
.. _requests-from-curl:

On more complex websites it can be difficult to reproduce requests by hand,
since we may need to add ``headers`` or ``cookies`` to make them work.
In those cases you can export the requests in `cURL <https://curl.haxx.se/>`_
format by right-clicking on each of them in the network tool, and then use the
:meth:`~scrapy.http.Request.from_curl()` method to generate an equivalent
request::

from scrapy import Request

request = Request.from_curl(
"curl 'http://quotes.toscrape.com/api/quotes?page=1' -H 'User-Agent: Mozil"
"la/5.0 (X11; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0' -H 'Acce"
"pt: */*' -H 'Accept-Language: ca,en-US;q=0.7,en;q=0.3' --compressed -H 'X"
"-Requested-With: XMLHttpRequest' -H 'Proxy-Authorization: Basic QFRLLTAzM"
"zEwZTAxLTk5MWUtNDFiNC1iZWRmLTJjNGI4M2ZiNDBmNDpAVEstMDMzMTBlMDEtOTkxZS00MW"
"I0LWJlZGYtMmM0YjgzZmI0MGY0' -H 'Connection: keep-alive' -H 'Referer: http"
"://quotes.toscrape.com/scroll' -H 'Cache-Control: max-age=0'")

Alternatively, if you want to know the arguments needed to recreate that
request, you can use the :func:`scrapy.utils.curl.curl_to_request_kwargs`
function to get a dictionary with the equivalent arguments.
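
A minimal sketch of that approach (using a shortened version of the cURL
command from the example above; the exact keys in the returned dictionary
depend on the command)::

    from scrapy import Request
    from scrapy.utils.curl import curl_to_request_kwargs

    curl_command = (
        "curl 'http://quotes.toscrape.com/api/quotes?page=1' "
        "-H 'Accept: */*' -H 'Referer: http://quotes.toscrape.com/scroll'")

    # Inspect the arguments that an equivalent Request would receive
    kwargs = curl_to_request_kwargs(curl_command)
    print(kwargs)  # e.g. a dict with "url", "method" and "headers" entries

    # The same dictionary can be passed back to Request
    request = Request(**kwargs)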

As you can see, with a few inspections in the `Network`-tool we
were able to easily replicate the dynamic requests of the scrolling
functionality of the page. Crawling dynamic pages can be quite
daunting and pages can be very complex, but it (mostly) boils down
to identifying the correct request and replicating it in your spider.

.. _Developer Tools: https://en.wikipedia.org/wiki/Web_development_tools
.. _quotes.toscrape.com: http://quotes.toscrape.com
.. _quotes.toscrape.com/scroll: quotes.toscrape.com/scroll/
.. _quotes.toscrape.com/scroll: http://quotes.toscrape.com/scroll
.. _quotes.toscrape.com/api/quotes?page=10: http://quotes.toscrape.com/api/quotes?page=10
.. _has-class-extension: https://parsel.readthedocs.io/en/latest/usage.html#other-xpath-extensions
