This documentation contains everything you need to know about Scrapy.
Having trouble? We'd like to help!
- Try the :doc:`FAQ <faq>` -- it's got answers to some common questions.
- Looking for specific information? Try the :ref:`genindex` or :ref:`modindex`.
- Ask or search for questions on Stack Overflow using the scrapy tag.
- Ask or search for questions in the Scrapy subreddit.
- Search for questions in the archives of the scrapy-users mailing list.
- Ask a question in the #scrapy IRC channel.
- Report bugs with Scrapy in our issue tracker.

.. toctree::
   :caption: First steps
   :hidden:

   intro/overview
   intro/install
   intro/tutorial
   intro/examples

- Understand what Scrapy is and how it can help you.
- Get Scrapy installed on your computer.
- Write your first Scrapy project.
- Learn more by playing with a pre-made Scrapy project.

.. toctree::
   :caption: Basic concepts
   :hidden:

   topics/commands
   topics/spiders
   topics/selectors
   topics/items
   topics/loaders
   topics/shell
   topics/item-pipeline
   topics/feed-exports
   topics/request-response
   topics/link-extractors
   topics/settings
   topics/exceptions

- Learn about the command-line tool used to manage your Scrapy project.
- Write the rules to crawl your websites.
- Extract the data from web pages using XPath.
- Test your extraction code in an interactive environment.
- Define the data you want to scrape.
- Populate your items with the extracted data.
- Post-process and store your scraped data.
- Output your scraped data using different formats and storages.
- Understand the classes used to represent HTTP requests and responses.
- Use convenient classes to extract the links to follow from pages.
- Learn how to configure Scrapy and see all :ref:`available settings <topics-settings-ref>`.
- See all available exceptions and their meaning.

.. toctree::
   :caption: Built-in services
   :hidden:

   topics/logging
   topics/stats
   topics/email
   topics/telnetconsole
   topics/webservice

- Learn how to use Python's built-in logging with Scrapy.
- Collect statistics about your scraping crawler.
- Send email notifications when certain events occur.
- Inspect a running crawler using a built-in Python console.
- Monitor and control a crawler using a web service.
Solving specific problems

.. toctree::
   :caption: Solving specific problems
   :hidden:

   faq
   topics/debug
   topics/contracts
   topics/practices
   topics/broad-crawls
   topics/firefox
   topics/firebug
   topics/leaks
   topics/media-pipeline
   topics/deploy
   topics/autothrottle
   topics/benchmarking
   topics/jobs

- Get answers to most frequently asked questions.
- Learn how to debug common problems in your Scrapy spider.
- Learn how to use contracts for testing your spiders.
- Get familiar with some Scrapy common practices.
- Tune Scrapy for crawling a lot of domains in parallel.
- Learn how to scrape with Firefox and some useful add-ons.
- Learn how to scrape efficiently using Firebug.
- Learn how to find and get rid of memory leaks in your crawler.
- Download files and/or images associated with your scraped items.
- Deploy your Scrapy spiders and run them on a remote server.
- Adjust crawl rate dynamically based on load.
- Check how Scrapy performs on your hardware.
- Learn how to pause and resume crawls for large spiders.

.. toctree::
   :caption: Extending Scrapy
   :hidden:

   topics/architecture
   topics/downloader-middleware
   topics/spider-middleware
   topics/extensions
   topics/api
   topics/signals
   topics/exporters

- Understand the Scrapy architecture.
- Customize how pages get requested and downloaded.
- Customize the input and output of your spiders.
- Extend Scrapy with your custom functionality.
- Use the core API in extensions and middlewares to extend Scrapy functionality.
- See all available signals and how to work with them.
- Quickly export your scraped items to a file (XML, CSV, etc.).
All the rest

.. toctree::
   :caption: All the rest
   :hidden:

   news
   contributing
   versioning