A Python-based web crawler for extracting real estate listings from Craigslist across multiple cities.
- Crawl Craigslist housing listings from multiple cities
- Extract advertisement data including titles, prices, and details
- Support for multiple storage backends (MongoDB, file storage)
- Image downloading capabilities
- Functional and object-oriented crawler implementations
- Clone the repository:

      git clone https://github.com/rezamobaraki/craigslist-crawler-python.git
      cd craigslist-crawler-python

- Install dependencies:

      pip install -r requirements.txt

The crawler supports three main operations:
Extract advertisement links from city pages:

    python main.py find_links

Extract detailed data from advertisement pages:

    python main.py extract_pages

Download images from advertisements:

    python main.py download_images

Modify config.py to adjust:
- Base URL patterns
- Storage type (MongoDB or file storage)
- Other crawler settings
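
As a rough illustration of the kind of settings listed above, a config.py could look something like the sketch below. Every name and value here (`BASE_URL`, `STORAGE_TYPE`, `CITIES`, `city_url`, and the URL pattern itself) is an assumption made for illustration; check the actual config.py in the repository for the real names.

```python
# Hypothetical config.py sketch; the real file's names and values may differ.

# URL template for a city's housing search page (assumed pattern).
# "{city}" is replaced with the city's Craigslist subdomain.
BASE_URL = "https://{city}.craigslist.org/search/hhh"

# Which storage backend to use: "mongo" or "file" (assumed values).
STORAGE_TYPE = "file"

# Example cities to crawl (illustrative only).
CITIES = ["paris", "berlin", "amsterdam"]


def city_url(city: str) -> str:
    """Build the listing URL for one city from the template."""
    return BASE_URL.format(city=city)
```

With a layout like this, switching backends or adding a city is a one-line change, and the crawler code only needs to call `city_url(city)` for each entry in `CITIES`.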
Supported storage backends:
- MongoDB: requires a running MongoDB instance
- File storage: saves data to local JSON files
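
One way to support both backends behind a single interface is sketched below. This is not necessarily how the project implements it: the class names (`Storage`, `FileStorage`, `MongoStorage`), the `make_storage` helper, the default directory, and the MongoDB URI and database name are all assumptions for illustration. The MongoDB variant relies on the third-party `pymongo` package, which is imported only when that backend is selected.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class Storage(ABC):
    """Common interface for storage backends (hypothetical)."""

    @abstractmethod
    def store(self, data: dict, name: str) -> None:
        """Persist one record under the given name."""


class FileStorage(Storage):
    """Writes each record to <directory>/<name>.json."""

    def __init__(self, directory: str = "data") -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def store(self, data: dict, name: str) -> None:
        path = self.directory / f"{name}.json"
        with path.open("w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)


class MongoStorage(Storage):
    """Inserts each record into a collection named after `name`."""

    def __init__(self, uri: str = "mongodb://localhost:27017",
                 database: str = "crawler") -> None:
        # Imported lazily so file storage works without pymongo installed.
        from pymongo import MongoClient
        self.db = MongoClient(uri)[database]

    def store(self, data: dict, name: str) -> None:
        self.db[name].insert_one(data)


def make_storage(kind: str, **kwargs) -> Storage:
    """Return the backend matching a config value such as "file" or "mongo"."""
    if kind == "file":
        return FileStorage(**kwargs)
    if kind == "mongo":
        return MongoStorage(**kwargs)
    raise ValueError(f"Unknown storage type: {kind!r}")
```

The rest of the crawler can then call `make_storage(...)` once and use `store(...)` without caring which backend is active.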
Reza Mobaraki
- GitHub: @rezamobaraki
- LinkedIn: reza-mobaraki
This project is licensed under the MIT License - see the LICENCE.txt file for details.