Awesome Scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast high-level web crawling & scraping framework for Python.

Apps

Visual Web Scraping

Portia Visual scraping for Scrapy

Distributed Spider

scrapy-cluster Distributed on demand scraping cluster using Redis and Kafka.
scrapy-redis Redis-based components for Scrapy.

Scrapy Service

scrapyscript Run a Scrapy spider programmatically from a script or a Celery task - no project required.
scrapyd A service daemon to run Scrapy spiders
scrapyd-client Command line client for Scrapyd server
python-scrapyd-api A Python wrapper for working with Scrapyd's API.
SpiderKeeper A scalable admin ui for spider service
scrapyrt HTTP server which provides API for scheduling Scrapy spiders and making requests with spiders.

Front-End Scrapy Managers

Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js
SpiderKeeper admin ui for scrapy/open source scrapinghub.
ScrapydWeb Scrapyd cluster management, Scrapy log analysis & visualization, Basic auth, Auto packaging, Timer Tasks, Email notice, and Mobile UI.

Monitor

scrapy-sentry Logs Scrapy exceptions into Sentry
scrapy-statsd-middleware Statsd integration middleware for scrapy
scrapy-jsonrpc An extension to control a running Scrapy web crawler via JSON-RPC
scrapy-fieldstats A Scrapy extension to log items coverage when the spider shuts down
spidermon Extension which provides useful tools for data validation, stats monitoring, and notification messages.

Avoid Ban

HttpProxyMiddleware A middleware for scrapy. Used to change HTTP proxy from time to time.
scrapy-proxies Processes Scrapy requests using a random proxy from list to avoid IP ban and improve crawling speed.
scrapy-rotating-proxies Use multiple proxies with Scrapy
scrapy-random-useragent Scrapy Middleware to set a random User-Agent for every Request.
scrapy-fake-useragent Random User-Agent middleware based on fake-useragent
scrapy-crawlera Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains, or have other problems.

Data Processing

scrapy-elasticsearch A scrapy pipeline which send items to Elastic Search server
scrapy-mongodb MongoDB pipeline for Scrapy.
scrapy-mysql-pipeline MySQL pipeline to persist items in MySQL databases.
scrapy-s3pipeline Scrapy pipeline to store chunked items into AWS S3 bucket
scrapy-sqs-exporter Scrapy extension for outputting scraped items to an Amazon SQS instance
scrapy-kafka-export Scrapy extension which writes crawled items to Kafka
scrapy-rss-exporter An RSS exporter for Scrapy

Process Javascript

scrapy-playwright Enable scraping dynamic pages using PlayWright.
scrapy-puppeteer Make Scrapy and Puppeteer work together to handle Javascript-rendered pages.
scrapy-splash Make Scrapy can understand Javascript

Other Useful Extensions

scrapy-djangoitem Scrapy extension to write scraped items using Django models
scrapy-deltafetch Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
scrapy-crawl-once This package provides a Scrapy middleware which allows to avoid re-crawling pages which were already downloaded in previous crawls.
scrapy-magicfields Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc.
scrapy-pagestorage A scrapy extension to store requests and responses information in storage service.
itemloaders Library to populate items using XPath and CSS with a convenient API.
itemadapter Adapter which provides a common interface to handle objects of different types in an uniform manner.
scrapy-poet Page Object pattern implementation which enables writing reusable and portable extraction and crawling code.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome Scrapy

Table of Contents

Apps

Visual Web Scraping

Distributed Spider

Scrapy Service

Front-End Scrapy Managers

Monitor

Avoid Ban

Data Processing

Process Javascript

Other Useful Extensions

Resources

Articles

Exercises

Video

Book

About

Releases

Packages

Contributors 3

AccordBox/awesome-scrapy

Folders and files

Latest commit

History

Repository files navigation

Awesome Scrapy

Table of Contents

Apps

Visual Web Scraping

Distributed Spider

Scrapy Service

Front-End Scrapy Managers

Monitor

Avoid Ban

Data Processing

Process Javascript

Other Useful Extensions

Resources

Articles

Exercises

Video

Book

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages