Create separate page for kingfisher integration documentation
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
yolile committed Jun 3, 2020
1 parent d0db793 commit d139421
Showing 3 changed files with 20 additions and 20 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -7,6 +7,7 @@ You can:

- :doc:`Download data to your computer, by installing Kingfisher Collect<local>`
- :doc:`Download data to a remote server, by using Scrapyd<scrapyd>`
- :doc:`Integrate with Kingfisher Process<kingfisher_process_integration>`

You can also try using Kingfisher Collect with `Scrapy Cloud <https://scrapinghub.com/scrapy-cloud>`_.

19 changes: 19 additions & 0 deletions docs/kingfisher_process_integration.rst
@@ -0,0 +1,19 @@
Integrate with Kingfisher Process
=================================

Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.

To do that, you need to deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy/settings.py``:

``KINGFISHER_API_URI``
The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`_ is served. Do not include a trailing slash.
``KINGFISHER_API_KEY``
One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.

For example, set the environment variables, then run ``scrapy crawl`` commands:

.. code-block:: bash

   export KINGFISHER_API_URI='http://127.0.0.1:5000'
   export KINGFISHER_API_KEY=1234
   scrapy crawl my_spider
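
The same configuration can instead live in the Scrapy settings module. A minimal sketch of what that could look like in ``kingfisher_scrapy/settings.py``, assuming the extension reads the setting names documented above (the values below are placeholders taken from the example):

.. code-block:: python

   # kingfisher_scrapy/settings.py
   # Sketch: the same configuration as Scrapy settings instead of
   # environment variables. The values are placeholders.
   KINGFISHER_API_URI = 'http://127.0.0.1:5000'  # no trailing slash
   KINGFISHER_API_KEY = '1234'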
20 changes: 0 additions & 20 deletions docs/local.rst
@@ -90,23 +90,3 @@ Use data
--------

You should now have a crawl directory within the ``data`` directory containing OCDS files. For help using data, read about `using open contracting data <https://www.open-contracting.org/data/data-use/>`__.

Integrate with `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_
-------------------------------------------------------------------------------------

Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.

To do that, you need to deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy/settings.py``:

``KINGFISHER_API_URI``
The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/web.html#web-app>`_ is served. Do not include a trailing slash.
``KINGFISHER_API_KEY``
One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.

For example, set the environment variables, then run ``scrapy crawl`` commands:

.. code-block:: bash

   export KINGFISHER_API_URI='http://127.0.0.1:5000'
   export KINGFISHER_API_KEY=1234
   scrapy crawl my_spider
