improve scrapy deploy documentation
pablohoffman committed Apr 9, 2015
1 parent 9ea309c commit 1a12922
Showing 3 changed files with 63 additions and 20 deletions.
docs/faq.rst (2 changes: 1 addition & 1 deletion)

@@ -154,7 +154,7 @@ For more info see:
 What is the recommended way to deploy a Scrapy crawler in production?
 ---------------------------------------------------------------------
 
-See :ref:`topics-scrapyd`.
+See :ref:`topics-deploy`.
 
 Can I use JSON for large exports?
 ---------------------------------
docs/index.rst (6 changes: 3 additions & 3 deletions)

@@ -150,7 +150,7 @@ Solving specific problems
    topics/leaks
    topics/images
    topics/ubuntu
-   topics/deployment
+   topics/deploy
    topics/autothrottle
    topics/benchmarking
    topics/jobs
@@ -186,8 +186,8 @@ Solving specific problems
 :doc:`topics/ubuntu`
     Install latest Scrapy packages easily on Ubuntu
 
-:doc:`topics/deployment`
-    Deploying your Scrapy project in production.
+:doc:`topics/deploy`
+    Deploying your Scrapy spiders and running them on a remote server.
docs/topics/deploy.rst (75 changes: 59 additions & 16 deletions)

@@ -1,29 +1,72 @@
-.. _topics-deployment:
+.. _topics-deploy:
 
-==========
-Deployment
-==========
+=================
+Deploying Spiders
+=================
 
-The recommended way to deploy Scrapy projects to a server is through `Scrapyd`_.
+This section describes the different options you have for deploying your
+Scrapy spiders to run them on a regular basis. Running Scrapy spiders on your
+local machine is very convenient for the (early) development stage, but not
+so much when you need to execute long-running spiders or move spiders to run
+in production continuously. This is where the solutions for deploying Scrapy
+spiders come in.
 
-.. _Scrapyd: https://github.com/scrapy/scrapyd
+The most popular choices for deploying Scrapy spiders are:
+
+* :ref:`Scrapy Cloud <deploy-scrapy-cloud>` (hosted, easier to set up)
+* :ref:`Scrapyd <deploy-scrapyd>` (open source, harder to set up)
+
+.. _deploy-scrapy-cloud:
+
+Deploying to Scrapy Cloud
+=========================
+
+`Scrapy Cloud`_ is a hosted, cloud-based service by `Scrapinghub`_, the
+company behind Scrapy.
+
+Advantages:
+
+- easy to set up (no need to set up or manage servers)
+- well-designed UI to manage spiders and review scraped items, logs and stats
+- cheap pricing (cheaper than renting a server, for small workloads)
+
+Disadvantages:
+
+- it's not open source
+
+To deploy spiders to Scrapy Cloud you can use the `shub`_ command line tool.
+Please refer to the `Scrapy Cloud documentation`_ for more information.
+
+The configuration is read from the ``scrapy.cfg`` file, just like for
+``scrapyd-deploy``.
+
+.. _deploy-scrapyd:
+
+Deploying to a Scrapyd Server
+=============================
 
-You can deploy to a Scrapyd server using the `Scrapyd client <https://github.com/scrapy/scrapyd-client>`_. You can add targets to your ``scrapy.cfg`` file which can be deployed to using the ``scrapyd-deploy`` command.
+`Scrapyd`_ is an open source application to run Scrapy spiders. It is
+maintained by some of the Scrapy developers.
 
-The basic syntax is as follows:
+Advantages:
 
-    scrapyd-deploy <target> -p <project>
+- it's open source, so it can be installed and run anywhere
 
-For more information please refer to the `Deploying your project`_ section.
+Disadvantages:
 
-.. _Deploying your project: https://scrapyd.readthedocs.org/en/latest/deploy.html
-
-Deploying to Scrapinghub
-========================
+- simple UI (no analytics, graphs or rich log/items browsing)
+- requires setting up servers and installing and configuring Scrapyd on them
+  (an APT repo with Ubuntu packages is provided by the Scrapyd team)
 
-You can deploy to Scrapinghub using Scrapinghub's command line client, `shub`_. The configuration is read from the ``scrapy.cfg`` file just like ``scrapyd-deploy``.
+To deploy spiders to Scrapyd, you can use the ``scrapyd-deploy`` tool
+provided by the `scrapyd-client`_ package. Please refer to the
+`scrapyd-deploy documentation`_ for more information.
 
-.. _shub: https://github.com/scrapinghub/shub
 .. _Scrapyd: https://github.com/scrapy/scrapyd
-.. _Deploying your project: https://scrapyd.readthedocs.org/en/latest/deploy.html
+.. _Scrapy Cloud: http://scrapinghub.com/scrapy-cloud/
+.. _scrapyd-client: https://github.com/scrapy/scrapyd-client
+.. _shub: http://doc.scrapinghub.com/shub.html
+.. _scrapyd-deploy documentation: http://scrapyd.readthedocs.org/en/latest/deploy.html
+.. _Scrapy Cloud documentation: http://doc.scrapinghub.com/scrapy-cloud.html
+.. _Scrapinghub: http://scrapinghub.com/
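
For reference, the Scrapy Cloud flow described in the new page boils down to a couple of shell commands. This is a minimal sketch assuming a recent version of ``shub``; the project ID ``12345`` is a placeholder, and subcommands may differ between shub versions::

    # install Scrapinghub's command line client
    pip install shub

    # authenticate with your Scrapy Cloud account (prompts for an API key)
    shub login

    # run from the directory containing scrapy.cfg; 12345 stands in for
    # the numeric project ID that Scrapy Cloud assigns to your project
    shub deploy 12345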
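
Similarly, a minimal sketch of the Scrapyd flow, assuming a Scrapyd instance already listening on a hypothetical ``localhost:6800`` and a project named ``myproject``. The deploy target is declared in the project's ``scrapy.cfg``::

    [settings]
    default = myproject.settings

    # "example" is a placeholder target name for the Scrapyd server
    [deploy:example]
    url = http://localhost:6800/
    project = myproject

The spiders are then eggified and uploaded using the ``scrapyd-deploy <target> -p <project>`` syntax from the previous version of the page::

    scrapyd-deploy example -p myproject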
