This documentation contains everything you need to know about Scrapy.
Having trouble? We'd like to help!
- Try the :doc:`FAQ <faq>` -- it's got answers to some common questions.
- Looking for specific information? Try the :ref:`genindex` or :ref:`modindex`.
- Search for information in the archives of the scrapy-users mailing list, or post a question.
- Ask a question in the #scrapy IRC channel.
- Report bugs with Scrapy in our issue tracker.
- Understand what Scrapy is and how it can help you.
- Get Scrapy installed on your computer.
- Write your first Scrapy project.
- Learn more by playing with a pre-made Scrapy project.
- Learn about the command-line tool used to manage your Scrapy project.
- Define the data you want to scrape.
- Write the rules to crawl your websites.
- Extract the data from web pages.
- Test your extraction code in an interactive environment.
- Populate your items with the extracted data.
- Post-process and store your scraped data.
- Output your scraped data using different formats and storages.
- Use convenient classes to extract links to follow from pages.
- Understand the simple logging facility provided by Scrapy.
- Collect statistics about your scraping crawler.
- Send email notifications when certain events occur.
- Inspect a running crawler using a built-in Python console.
- Monitor and control a crawler using a web service.
Solving specific problems
=========================
- Get answers to most frequently asked questions.
- Learn how to scrape with Firefox and some useful add-ons.
- Learn how to scrape efficiently using Firebug.
- Learn how to find and get rid of memory leaks in your crawler.
- Download static images associated with your scraped items.
- Install the latest Scrapy packages easily on Ubuntu.
- Deploy your Scrapy project in production.
- Learn how to pause and resume crawls for large spiders.
- Understand the Scrapy architecture.
- Customize how pages get requested and downloaded.
- Customize the input and output of your spiders.
- Add any custom functionality using :doc:`signals <topics/signals>` and the Scrapy API.
- Learn about the command-line tool and see all :ref:`available commands <topics-commands-ref>`.
- Understand the classes used to represent HTTP requests and responses.
- Learn how to configure Scrapy and see all :ref:`available settings <topics-settings-ref>`.
- See all available signals and how to work with them.
- See all available exceptions and their meaning.
- Quickly export your scraped items to a file (XML, CSV, etc.).
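Item export like the above is typically driven by the ``FEEDS`` setting. A minimal sketch of a ``settings.py`` fragment (the file names are illustrative):

```python
# settings.py fragment: write items to two feeds at once (example values).
FEEDS = {
    "items.json": {"format": "json", "encoding": "utf8"},
    "items.csv": {"format": "csv"},
}
```

With this in place, every item the spiders yield is serialized to both files at the end of the crawl.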