
craigslist-crawler is built with Python and uses requests and y. The project is implemented in two styles: a functional version and a base-class version.


rezamobaraki/craigslist-crawler-python


Craigslist Crawler Python

A Python-based web crawler for extracting real estate listings from Craigslist across multiple cities.

Features

  • Crawl Craigslist housing listings from multiple cities
  • Extract advertisement data including titles, prices, and details
  • Support for multiple storage backends (MongoDB, file storage)
  • Image downloading capabilities
  • Functional and object-oriented crawler implementations
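To show what the object-oriented style can look like, here is a minimal sketch of a base class that each crawler stage subclasses, plus a table mapping the three `main.py` commands to those classes. The names `BaseCrawler`, `LinkCrawler`, `DataCrawler`, `ImageDownloader`, `COMMANDS` and `run` are hypothetical and do not come from the repository. The `start()` bodies are stubs standing in for the real work.

```python
from abc import ABC, abstractmethod


class BaseCrawler(ABC):
    """Hypothetical base class: every crawler stage implements start()."""

    @abstractmethod
    def start(self) -> str:
        """Run this stage and return a short status message."""


class LinkCrawler(BaseCrawler):
    def start(self) -> str:
        return "finding links"  # stub: would fetch city pages and collect ad URLs


class DataCrawler(BaseCrawler):
    def start(self) -> str:
        return "extracting pages"  # stub: would parse each saved ad page


class ImageDownloader(BaseCrawler):
    def start(self) -> str:
        return "downloading images"  # stub: would fetch image URLs found in ads


# Map each main.py command to the crawler class that handles it.
COMMANDS = {
    "find_links": LinkCrawler,
    "extract_pages": DataCrawler,
    "download_images": ImageDownloader,
}


def run(command: str) -> str:
    """Instantiate the crawler registered for `command` and run it."""
    if command not in COMMANDS:
        raise SystemExit(f"unknown command {command!r}; choose from {sorted(COMMANDS)}")
    return COMMANDS[command]().start()
```

A `main.py` entry point could then call `run(sys.argv[1])`. Adding a new stage means writing one subclass and registering it in `COMMANDS`.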

Installation

  1. Clone the repository:
git clone https://github.com/rezamobaraki/craigslist-crawler-python.git
cd craigslist-crawler-python
  2. Install dependencies:
pip install -r requirements.txt

Usage

The crawler supports three main operations:

1. Find Links

Extract advertisement links from city pages:

python main.py find_links
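Here is a minimal sketch of the link-finding step using only the standard library's `html.parser`. It assumes that ad URLs on a Craigslist search page contain the path segment `/d/`. That is an assumption about Craigslist's markup, not something this project states. The names `AdLinkParser` and `find_links` are hypothetical. Relative links are resolved against the page URL, and duplicates are dropped in first-seen order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose URL contains `marker` (assumed ad-link pattern)."""

    def __init__(self, base_url: str, marker: str = "/d/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # keep first-seen order, skip duplicates
                self.links.append(absolute)


def find_links(html: str, base_url: str) -> list[str]:
    """Return absolute ad URLs found in one city search page."""
    parser = AdLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

In practice you would feed it the text of a fetched page, for example `find_links(response.text, url)` after a `requests.get(url)`.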

2. Extract Page Data

Extract detailed data from advertisement pages:

python main.py extract_pages
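Below is a minimal sketch of pulling fields out of an ad page with `html.parser`. The selectors (an element with `id="titletextonly"` for the title and `class="price"` for the price) are assumptions about Craigslist's markup and are not taken from this repository. `AdPageParser` and `extract_page` are hypothetical names. This simple version captures text only up to the first closing tag inside a matched element.

```python
from html.parser import HTMLParser


class AdPageParser(HTMLParser):
    """Capture the text of elements matching assumed (attribute, value) selectors."""

    # field name -> (HTML attribute, expected value); selectors are assumptions
    FIELDS = {"title": ("id", "titletextonly"), "price": ("class", "price")}

    def __init__(self):
        super().__init__()
        self.data = {}
        self._current = None  # field whose text is being captured, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for field, (attr, value) in self.FIELDS.items():
            # a class attribute may hold several space-separated names
            if value in (attrs.get(attr) or "").split() and field not in self.data:
                self._current = field

    def handle_endtag(self, tag):
        self._current = None  # stop capturing at the next closing tag

    def handle_data(self, data):
        if self._current and data.strip():
            self.data[self._current] = self.data.get(self._current, "") + data.strip()


def extract_page(html: str) -> dict:
    """Return the fields found on one ad page, e.g. {"title": ..., "price": ...}."""
    parser = AdPageParser()
    parser.feed(html)
    parser.close()
    return parser.data
```

Adding more fields only requires extending `FIELDS`, provided each one can be identified by a single attribute value.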

3. Download Images

Download images from advertisements:

python main.py download_images
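Here is a minimal sketch of the image-download step. The project uses requests. To keep this example self-contained, the default fetcher here uses the standard library's `urllib.request` instead, and you can inject any callable that maps a URL to bytes. `download_images` and `fetch_bytes` are hypothetical names. Each file is named after the last segment of its URL path, and existing files are skipped so that re-running the command does not repeat downloads.

```python
import os
from typing import Callable, Iterable
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher built on the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_images(
    urls: Iterable[str],
    dest_dir: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> list[str]:
    """Save each image into dest_dir and return the paths written.

    Files that already exist are skipped; fetch errors propagate to the caller.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(dest_dir, name)
        if os.path.exists(path):
            continue  # already downloaded on a previous run
        with open(path, "wb") as fh:
            fh.write(fetch(url))
        saved.append(path)
    return saved
```

Injecting `fetch` also makes the function easy to test without network access, and lets you swap in `lambda u: requests.get(u, timeout=10).content` to match the project's HTTP library.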

Configuration

Modify config.py to adjust:

  • Base URL patterns
  • Storage type (MongoDB or file storage)
  • Other crawler settings
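As an illustration, a `config.py` along these lines would cover the settings listed above. Every name and value here (`BASE_URL`, `CITIES`, `STORAGE_TYPE`, `city_urls`, the `apa` housing category, the city subdomains) is a hypothetical example and not the repository's actual configuration. The base URL is a pattern with a `{city}` placeholder that is expanded once per city.

```python
from typing import Iterable

# Hypothetical config.py values; the repository's real names and defaults may differ.
BASE_URL = "https://{city}.craigslist.org/search/apa"  # "apa": apartments/housing for rent (assumed)
CITIES = ("boston", "chicago", "newyork")
STORAGE_TYPE = "file"  # "mongo" or "file"


def city_urls(base_url: str = BASE_URL, cities: Iterable[str] = CITIES) -> list[str]:
    """Expand the base URL pattern once per city, preserving the given order."""
    return [base_url.format(city=city) for city in cities]
```

Keeping the pattern and the city list separate means you can add a city without touching any crawler code.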

Storage Options

  • MongoDB: Requires a running MongoDB instance
  • File Storage: Saves data to local JSON files
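The two backends could share one small interface, as in the minimal sketch below. `Storage`, `FileStorage`, `MongoStorage`, `get_storage` and the default `data` directory and `craigslist` database name are hypothetical. `MongoStorage` assumes the `pymongo` package and a MongoDB server at the given URI, and it imports `pymongo` only when constructed, so file storage works without it. `FileStorage` keeps each collection as one JSON list and appends on every `store()` call.

```python
import json
import os
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical interface shared by the file and MongoDB backends."""

    @abstractmethod
    def store(self, data: list[dict], collection: str) -> None: ...

    @abstractmethod
    def load(self, collection: str) -> list[dict]: ...


class FileStorage(Storage):
    """Keep each collection as a JSON list in <root>/<collection>.json."""

    def __init__(self, root: str = "data"):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, collection: str) -> str:
        return os.path.join(self.root, f"{collection}.json")

    def store(self, data, collection):
        records = self.load(collection) + list(data)  # append to what is already saved
        with open(self._path(collection), "w", encoding="utf-8") as fh:
            json.dump(records, fh, ensure_ascii=False, indent=2)

    def load(self, collection):
        try:
            with open(self._path(collection), encoding="utf-8") as fh:
                return json.load(fh)
        except FileNotFoundError:
            return []  # nothing stored yet


class MongoStorage(Storage):
    """Needs pymongo and a running MongoDB instance."""

    def __init__(self, uri: str = "mongodb://localhost:27017", db_name: str = "craigslist"):
        from pymongo import MongoClient  # lazy import: file storage works without pymongo

        self.db = MongoClient(uri)[db_name]

    def store(self, data, collection):
        docs = list(data)
        if docs:  # insert_many rejects an empty list
            self.db[collection].insert_many(docs)

    def load(self, collection):
        return list(self.db[collection].find({}, {"_id": 0}))


def get_storage(storage_type: str, **kwargs) -> Storage:
    """Pick a backend by name, e.g. the STORAGE_TYPE setting from config.py."""
    backends = {"file": FileStorage, "mongo": MongoStorage}
    return backends[storage_type](**kwargs)
```

Because the crawler stages only call `store()` and `load()`, switching backends comes down to changing a single configuration value.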

Author

Reza Mobaraki

License

This project is licensed under the MIT License - see the LICENCE.txt file for details.
