diff --git a/docs/index.rst b/docs/index.rst
index 3f2cff3f6..61291b2ae 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -7,6 +7,7 @@ You can:
 
 - :doc:`Download data to your computer, by installing Kingfisher Collect`
 - :doc:`Download data to a remote server, by using Scrapyd`
+- :doc:`Integrate with Kingfisher Process`
 
 You can also try using Kingfisher Collect with `Scrapy Cloud `_.
 
diff --git a/docs/kingfisher_process_integration.rst b/docs/kingfisher_process_integration.rst
new file mode 100644
index 000000000..d35299712
--- /dev/null
+++ b/docs/kingfisher_process_integration.rst
@@ -0,0 +1,19 @@
+Integrate with Kingfisher Process
+=================================
+
+Besides storing the scraped data on disk, you can also send it to an instance of `Kingfisher Process `_ for processing.
+
+To do so, deploy an instance of Kingfisher Process, including its `web app `__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy/settings.py``:
+
+``KINGFISHER_API_URI``
+    The URL from which Kingfisher Process' `web app `_ is served. Do not include a trailing slash.
+``KINGFISHER_API_KEY``
+    One of the API keys in Kingfisher Process' `API_KEYS `__ setting.
+
+For example, set the environment variables, then run ``scrapy crawl`` commands:
+
+.. code-block:: bash
+
+    export KINGFISHER_API_URI='http://127.0.0.1:5000'
+    export KINGFISHER_API_KEY=1234
+    scrapy crawl my_spider
diff --git a/kingfisher_scrapy/settings.py b/kingfisher_scrapy/settings.py
index 7db1f5bc5..7bda637d8 100644
--- a/kingfisher_scrapy/settings.py
+++ b/kingfisher_scrapy/settings.py
@@ -82,10 +82,9 @@
     'kingfisher_scrapy.pipelines.Validate': 300,
 }
 
-# To send items to Kingfishet Process, set this to, for example, "http://kingfisher.example.com" (no trailing slash).
+# To send items to Kingfisher Process, see
+# https://kingfisher-collect.readthedocs.io/en/latest/kingfisher_process_integration.html
 KINGFISHER_API_URI = os.getenv('KINGFISHER_API_URI')
-# Set this to the same value as Kingfisher Process' `API_KEYS` setting.
-# See https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api
 KINGFISHER_API_KEY = os.getenv('KINGFISHER_API_KEY')
 # If Kingfisher Process can read Kingfisher Collect's `FILES_STORE`, then Kingfisher Collect can send file paths
 # instead of files to Kingfisher Process' API. To enable that, set this to the absolute path to the `FILES_STORE`.
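The two settings above are read with `os.getenv`, so the integration is effectively off unless both environment variables are set. The following is a minimal sketch, assuming you want to check the values before a crawl: it reads them the same way and enforces the documented "no trailing slash" rule. The function name `kingfisher_process_config` is hypothetical and is not part of Kingfisher Collect.

```python
import os
from typing import Mapping, Optional, Tuple


def kingfisher_process_config(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (api_uri, api_key) if both settings are present, else None.

    Hypothetical helper, not Kingfisher Collect's own code. It reads the same
    variable names that kingfisher_scrapy/settings.py reads with os.getenv.
    """
    if env is None:
        env = os.environ
    api_uri = env.get('KINGFISHER_API_URI')
    api_key = env.get('KINGFISHER_API_KEY')
    # Treat the integration as disabled unless both values are provided.
    if not api_uri or not api_key:
        return None
    # The documentation asks for the URL without a trailing slash.
    if api_uri.endswith('/'):
        raise ValueError('KINGFISHER_API_URI must not end with a slash')
    return api_uri, api_key
```

With the values from the ``scrapy crawl`` example, `kingfisher_process_config({'KINGFISHER_API_URI': 'http://127.0.0.1:5000', 'KINGFISHER_API_KEY': '1234'})` returns the pair of strings, while omitting either variable returns `None`.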