A Scrapy extension to store request and response information in a storage service
Scrapy middleware to add extra fields to items, such as timestamps, response fields, and spider attributes
Crawlera middleware for Scrapy
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
Scrapy schema validation pipeline and Item builder using JSON Schema
Scrapy extension to write scraped items using Django models
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
Scrapy extension to control spiders using JSON-RPC
A Scrapy pipeline to categorize items using MonkeyLearn
A Scrapy extension to sync the `.scrapy` folder to an S3 bucket
Scrapy spider middleware to split an item into multiple items using a multi-valued key
Scrapy spider middleware to clean up query parameters in request URLs
Scrapy pipeline for writing items to BigML datasets
Scrapy support for working with streamcorpus Stream Items
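
To make the "split an item into multiple items using a multi-valued key" idea from the list above concrete, here is a minimal sketch in plain Python. It does not use the listed project's code or Scrapy's API. The function name `split_item` and the dict-based item are assumptions chosen for illustration, and the actual middleware may behave differently (for example, in how it treats empty lists).

```python
from typing import Any, Dict, Iterator


def split_item(item: Dict[str, Any], key: str) -> Iterator[Dict[str, Any]]:
    """Yield one copy of `item` per value found under `key`.

    Hypothetical helper for illustration; not the listed middleware's API.
    """
    values = item.get(key)
    # Items without the key, or with a single (non-list) value,
    # pass through unchanged.
    if not isinstance(values, (list, tuple)):
        yield item
        return
    # Note: an empty list yields no items at all in this sketch.
    for value in values:
        new_item = dict(item)  # shallow copy so the original is not mutated
        new_item[key] = value
        yield new_item


if __name__ == "__main__":
    product = {"name": "T-shirt", "color": ["red", "blue"]}
    for part in split_item(product, "color"):
        print(part)
```

In a real spider middleware, logic of this kind would typically be applied to each item the spider produces, with requests passed through untouched.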