From 548d5915ebf79fe89855f5b89f94e172474bbb4b Mon Sep 17 00:00:00 2001
From: Yohanna Lisnichuk
Date: Mon, 1 Jun 2020 13:43:46 -0400
Subject: [PATCH 1/4] Add docs about kingfisher process integration

Signed-off-by: Yohanna Lisnichuk
---
 docs/local.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/local.rst b/docs/local.rst
index 7d285881a..d8346d142 100644
--- a/docs/local.rst
+++ b/docs/local.rst
@@ -90,3 +90,20 @@ Use data
 --------
 
 You should now have a crawl directory within the ``data`` directory containing OCDS files. For help using data, read about `using open contracting data `__.
+
+Integrate with `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_
+--------
+
+Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.
+
+To do that, you need to have deployed an instance of `Kingfisher Process web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ and set the following environment variables:
+
+* ``KINGFISHER_API_URI``: The url where Kingfisher Process `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ is served
+* ``KINGFISHER_API_KEY``: The Api Key configured for Kingfisher Process `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_
+
+For example:
+
+.. code-block:: bash
+
+   export KINGFISHER_API_URI = 'http://127.0.0.1:5000'
+   export KINGFISHER_API_KEY = 1234
\ No newline at end of file

From d0db793691f6abecc103cca2aeef285f43a44d02 Mon Sep 17 00:00:00 2001
From: Yohanna Lisnichuk
Date: Wed, 3 Jun 2020 13:48:15 -0400
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: James McKinney <26463+jpmckinney@users.noreply.github.com>
---
 docs/local.rst | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/local.rst b/docs/local.rst
index d8346d142..0a80da824 100644
--- a/docs/local.rst
+++ b/docs/local.rst
@@ -96,14 +96,17 @@ Integrate with `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_
 
 Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.
 
-To do that, you need to have deployed an instance of `Kingfisher Process web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ and set the following environment variables:
+To do that, you need to deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy.settings.py``:
 
-* ``KINGFISHER_API_URI``: The url where Kingfisher Process `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ is served
-* ``KINGFISHER_API_KEY``: The Api Key configured for Kingfisher Process `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_
+``KINGFISHER_API_URI``
+  The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ is served. Do not include a trailing slash.
+``KINGFISHER_API_KEY``
+  One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.
 
-For example:
+For example, set the environment variables, then run ``scrapy crawl`` commands:
 
 .. code-block:: bash
 
-   export KINGFISHER_API_URI = 'http://127.0.0.1:5000'
-   export KINGFISHER_API_KEY = 1234
\ No newline at end of file
+   export KINGFISHER_API_URI='http://127.0.0.1:5000'
+   export KINGFISHER_API_KEY=1234
+   scrapy crawl my_spider

From d1394218b7afb0af1973202e4b792161f2ffd84e Mon Sep 17 00:00:00 2001
From: Yohanna Lisnichuk
Date: Wed, 3 Jun 2020 14:01:06 -0400
Subject: [PATCH 3/4] Create separate page for kingfisher integration documentation

Signed-off-by: Yohanna Lisnichuk
---
 docs/index.rst                          |  1 +
 docs/kingfisher_process_integration.rst | 19 +++++++++++++++++++
 docs/local.rst                          | 20 --------------------
 3 files changed, 20 insertions(+), 20 deletions(-)
 create mode 100644 docs/kingfisher_process_integration.rst

diff --git a/docs/index.rst b/docs/index.rst
index 3f2cff3f6..61291b2ae 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -7,6 +7,7 @@ You can:
 
 - :doc:`Download data to your computer, by installing Kingfisher Collect`
 - :doc:`Download data to a remote server, by using Scrapyd`
+- :doc:`Integrate with Kingfisher Process<kingfisher_process_integration>`
 
 You can also try using Kingfisher Collect with `Scrapy Cloud `_.
 
diff --git a/docs/kingfisher_process_integration.rst b/docs/kingfisher_process_integration.rst
new file mode 100644
index 000000000..d35299712
--- /dev/null
+++ b/docs/kingfisher_process_integration.rst
@@ -0,0 +1,19 @@
+Integrate with Kingfisher Process
+=================================
+
+Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.
+
+To do that, you need to deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy.settings.py``:
+
+``KINGFISHER_API_URI``
+  The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ is served. Do not include a trailing slash.
+``KINGFISHER_API_KEY``
+  One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.
+
+For example, set the environment variables, then run ``scrapy crawl`` commands:
+
+.. code-block:: bash
+
+   export KINGFISHER_API_URI='http://127.0.0.1:5000'
+   export KINGFISHER_API_KEY=1234
+   scrapy crawl my_spider

diff --git a/docs/local.rst b/docs/local.rst
index 0a80da824..7d285881a 100644
--- a/docs/local.rst
+++ b/docs/local.rst
@@ -90,23 +90,3 @@ Use data
 --------
 
 You should now have a crawl directory within the ``data`` directory containing OCDS files. For help using data, read about `using open contracting data `__.
-
-Integrate with `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_
---------
-
-Besides storing the scraped data on disk, you can also send them to an instance of `Kingfisher Process <https://kingfisher-process.readthedocs.io/>`_ for processing.
-
-To do that, you need to deploy an instance of Kingfisher Process, including its `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__. Then, set the following either as environment variables or as Scrapy settings in ``kingfisher_scrapy.settings.py``:
-
-``KINGFISHER_API_URI``
-  The URL from which Kingfisher Process' `web app <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`_ is served. Do not include a trailing slash.
-``KINGFISHER_API_KEY``
-  One of the API keys in Kingfisher Process' `API_KEYS <https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api>`__ setting.
-
-For example, set the environment variables, then run ``scrapy crawl`` commands:
-
-.. code-block:: bash
-
-   export KINGFISHER_API_URI='http://127.0.0.1:5000'
-   export KINGFISHER_API_KEY=1234
-   scrapy crawl my_spider

From 3a322fca61e92533c74330d124ee7e06e15fb20f Mon Sep 17 00:00:00 2001
From: Yohanna Lisnichuk
Date: Wed, 3 Jun 2020 14:04:23 -0400
Subject: [PATCH 4/4] Update kingfisher process reference in settings

Signed-off-by: Yohanna Lisnichuk
---
 kingfisher_scrapy/settings.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kingfisher_scrapy/settings.py b/kingfisher_scrapy/settings.py
index 53fc8d389..4e679b2a9 100644
--- a/kingfisher_scrapy/settings.py
+++ b/kingfisher_scrapy/settings.py
@@ -79,10 +79,9 @@
     'kingfisher_scrapy.pipelines.Validate': 300,
 }
 
-# To send items to Kingfishet Process, set this to, for example, "http://kingfisher.example.com" (no trailing slash).
+# To send items to Kingfisher Process, see
+# https://kingfisher-collect.readthedocs.io/en/latest/kingfisher_process_integration.html
 KINGFISHER_API_URI = os.getenv('KINGFISHER_API_URI')
-# Set this to the same value as Kingfisher Process' `API_KEYS` setting.
-# See https://kingfisher-process.readthedocs.io/en/latest/config.html#web-api
 KINGFISHER_API_KEY = os.getenv('KINGFISHER_API_KEY')
 # If Kingfisher Process can read Kingfisher Collect's `FILES_STORE`, then Kingfisher Collect can send file paths
 # instead of files to Kingfisher Process' API. To enable that, set this to the absolute path to the `FILES_STORE`.
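As a note on the behaviour the final patch relies on (a sketch with example values, not part of the patch series): `settings.py` reads both variables with `os.getenv`, which returns `None` for an unset variable, so leaving either variable unset is how the Kingfisher Process integration stays disabled.

```python
import os

# Example values only; use your own Kingfisher Process deployment's
# URL (no trailing slash) and one of its configured API keys.
os.environ["KINGFISHER_API_URI"] = "http://127.0.0.1:5000"
os.environ["KINGFISHER_API_KEY"] = "1234"

# Mirrors how kingfisher_scrapy/settings.py picks up the variables:
KINGFISHER_API_URI = os.getenv("KINGFISHER_API_URI")
KINGFISHER_API_KEY = os.getenv("KINGFISHER_API_KEY")

assert KINGFISHER_API_URI == "http://127.0.0.1:5000"
assert not KINGFISHER_API_URI.endswith("/")  # docs: no trailing slash
assert KINGFISHER_API_KEY == "1234"

# An unset variable yields None rather than raising:
assert os.getenv("SOME_UNSET_KINGFISHER_VARIABLE") is None
```

Because these are also plain Scrapy settings, they can alternatively be supplied per crawl with `scrapy crawl my_spider -s KINGFISHER_API_URI=...` instead of exporting environment variables.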