Merge 55602b2 into 61e95ba
yolile committed Jun 3, 2020
2 parents 61e95ba + 55602b2 commit 40d4293
Showing 3 changed files with 22 additions and 3 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -7,6 +7,7 @@ You can:

- :doc:`Download data to your computer, by installing Kingfisher Collect<local>`
- :doc:`Download data to a remote server, by using Scrapyd<scrapyd>`
- :doc:`Integrate with Kingfisher Process<kingfisher_process_integration>`

You can also try using Kingfisher Collect with `Scrapy Cloud <https://scrapinghub.com/scrapy-cloud>`_.

19 changes: 19 additions & 0 deletions docs/kingfisher_process_integration.rst
@@ -0,0 +1,19 @@
Integrate with Kingfisher Process
=================================

Besides storing the scraped data on disk, you can also send it to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.

To do that, deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy/settings.py``:

``KINGFISHER_API_URI``
  The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`_ is served. Do not include a trailing slash.
``KINGFISHER_API_KEY``
  One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.

For example, set the environment variables, then run ``scrapy crawl`` commands:

.. code-block:: bash

   export KINGFISHER_API_URI='http://127.0.0.1:5000'
   export KINGFISHER_API_KEY=1234
   scrapy crawl my_spider
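
As a rough illustration of how the two variables fit together, the sketch below reads them from the environment and treats the integration as disabled unless both are set. The helper name ``read_kingfisher_settings`` is hypothetical and is not part of Kingfisher Collect; stripping a trailing slash is an assumption based on the "no trailing slash" note above, not a description of what the project actually does.

```python
import os


def read_kingfisher_settings(environ=os.environ):
    """Return (uri, key) for Kingfisher Process, or None if either is unset.

    Hypothetical helper for illustration only; not part of Kingfisher Collect.
    """
    uri = environ.get("KINGFISHER_API_URI")
    key = environ.get("KINGFISHER_API_KEY")
    if not uri or not key:
        # Without both values, there is nowhere to send items.
        return None
    # The documentation asks for no trailing slash, so normalize defensively.
    return uri.rstrip("/"), key


if __name__ == "__main__":
    settings = read_kingfisher_settings()
    if settings is None:
        print("Kingfisher Process integration is not configured")
    else:
        print(f"Items would be sent to {settings[0]}")
```

Passing a plain dictionary instead of ``os.environ`` makes the helper easy to check without changing the real environment.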
5 changes: 2 additions & 3 deletions kingfisher_scrapy/settings.py
@@ -82,10 +82,9 @@
'kingfisher_scrapy.pipelines.Validate': 300,
}

# To send items to Kingfisher Process, set this to, for example, "http://kingfisher.example.com" (no trailing slash).
# To send items to Kingfisher Process, see
# https://kingfisher-collect.readthedocs.io/en/latest/kingfisher_process_integration.html
KINGFISHER_API_URI = os.getenv('KINGFISHER_API_URI')
# Set this to one of the API keys in Kingfisher Process' `API_KEYS` setting.
# See https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api
KINGFISHER_API_KEY = os.getenv('KINGFISHER_API_KEY')
# If Kingfisher Process can read Kingfisher Collect's `FILES_STORE`, then Kingfisher Collect can send file paths
# instead of files to Kingfisher Process' API. To enable that, set this to the absolute path to the `FILES_STORE`.