Distributed Crawling with RMQ


This project showcases a distributed website crawler built from one producer and multiple workers. The basic idea is to use a RabbitMQ (RMQ) work queue to schedule scraping and parsing tasks for many pages, with several workers consuming from the queue simultaneously.
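
The producer/worker split described above can be sketched with pika roughly as follows. This is a minimal illustration, not the project's actual code: the queue name `crawl_tasks` and the helper names (`open_channel`, `publish_urls`, `make_callback`, `run_worker`) are assumptions made for this example.

```python
QUEUE_NAME = "crawl_tasks"  # hypothetical queue name, not taken from the project


def open_channel(host="localhost"):
    """Connect to RabbitMQ and declare a durable task queue."""
    import pika  # imported lazily so the pure helpers below work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    return connection, channel


def publish_urls(channel, urls):
    """Producer side: enqueue one message per URL to be crawled."""
    import pika

    for url in urls:
        channel.basic_publish(
            exchange="",             # default exchange routes by queue name
            routing_key=QUEUE_NAME,
            body=url.encode("utf-8"),
            properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        )


def make_callback(handle_url):
    """Wrap a URL handler so it can be used as a pika consumer callback."""

    def on_message(ch, method, properties, body):
        handle_url(body.decode("utf-8"))
        # Acknowledge only after the handler returns, so a task held by a
        # crashed worker is redelivered to another worker.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


def run_worker(channel, handle_url):
    """Worker side: consume tasks one at a time until interrupted."""
    channel.basic_qos(prefetch_count=1)  # give each worker one task at a time
    channel.basic_consume(
        queue=QUEUE_NAME, on_message_callback=make_callback(handle_url)
    )
    channel.start_consuming()
```

In this sketch the producer would call `open_channel()` once and then `publish_urls(channel, urls)`, while each worker process would call `open_channel()` followed by `run_worker(channel, handler)`. Starting the worker script several times gives the parallel consumers, since RabbitMQ distributes messages from one queue across all connected consumers.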


The input for this project is the UK Area Codes website:


The worker crawlers scrape the URLs passed by the producer and parse each city/town name along with its area code. The workers run in parallel.
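
The parsing step in each worker might look something like the sketch below. The page layout is an assumption made for illustration: it supposes each data row is a `<tr>` whose first `<td>` holds the area code and whose second holds the city/town name. The real site's markup may differ, and the function name `parse_area_codes` is hypothetical.

```python
from bs4 import BeautifulSoup


def parse_area_codes(html):
    """Return (area_code, place_name) pairs found in table rows.

    Assumes (hypothetically) that each data row has at least two <td>
    cells: the area code first, then the city/town name.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows (<th> only) and malformed rows
        code = cells[0].get_text(strip=True)
        place = cells[1].get_text(strip=True)
        if code and place:
            results.append((code, place))
    return results
```

A worker could pass the body of a fetched page to this function, for example `parse_area_codes(requests.get(url, timeout=10).text)`, and then store or print the resulting pairs.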


To run the code, you need to set up RabbitMQ and install the pika, requests, and BeautifulSoup Python libraries.