
Web extraction with Scrapy and Scrapyd

What is Scrapy?

According to the maintainers at Zyte, Scrapy is

An open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way.

What is Scrapyd?

According to the documentation,

Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
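That JSON API is plain HTTP. For example, once a Scrapyd instance is listening on its default port 6800, scheduling a spider is a single request (the project and spider names below are placeholders):

    curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider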

What is this project about?

I was looking for a decent Docker image for Scrapy. I wanted something

  • with the most recent version of Python
  • easy to update and maintain
  • based on Debian

This is it.
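The image is built from a multistage Dockerfile with pinned versions. A minimal sketch of that pattern (the base image tag, stage layout, and file names here are illustrative, not necessarily what this repository uses):

    # Build stage: install pinned dependencies into a virtual environment.
    FROM python:3.12-slim AS builder
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    # requirements.txt pins exact versions of scrapy, scrapyd, and friends
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Runtime stage: copy only the virtual environment, keeping the image small.
    FROM python:3.12-slim
    COPY --from=builder /opt/venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    EXPOSE 6800
    CMD ["scrapyd"]

Only the second stage ends up in the final image, so build tooling and pip caches never ship with it.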

Build the Scrapy image

docker-compose -f crawling.yaml build scrapy

does the trick.
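If you want to see how the pieces fit together, a crawling.yaml along these lines would match the commands in this README (the service layout, image tag, and ports are assumptions, not copied from this repository):

    version: "3.8"
    services:
      scrapy:
        build: .
        ports:
          - "6800:6800"   # Scrapyd's default HTTP port
        env_file:
          - .env
        depends_on:
          - rabbitmq
      rabbitmq:
        image: rabbitmq:3-management
        env_file:
          - .env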

Configuration

If you are using RabbitMQ together with Scrapy, you need to provide values for RABBITMQ_DEFAULT_USER and RABBITMQ_DEFAULT_PASS. I find that an environment file is quite convenient here.
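A minimal environment file for that looks like this (the values are placeholders):

    # .env, referenced via env_file: in crawling.yaml or --env-file on the CLI
    RABBITMQ_DEFAULT_USER=crawler
    RABBITMQ_DEFAULT_PASS=change-me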

Spin the containers up

docker-compose -f crawling.yaml up -d starts the containers for Scrapy and RabbitMQ in detached mode.
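Assuming Scrapyd's default port 6800 is published, a quick health check confirms the service is up:

    curl http://localhost:6800/daemonstatus.json
    # should return a small JSON document with "status": "ok"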

Acknowledgements

I thank Itamar Turner-Trauring (@itamarst) for his articles at https://pythonspeed.com and Maximilian Schwarzmüller of Academind for his excellent course on Docker and Kubernetes.
