Siddharth Hathi 2022

The Listing Scraper

Code documentation is still in progress.

This repository is one part of a housing aggregator web service for students. It scrapes rental listings from multiple public listing websites and stores them in a MongoDB collection. It also includes functionality to review listings, cull expired or old ones, and migrate the database schema when fields are added or removed.

Scrape

The Scrape submodule contains the centralized scraper. Running it scrapes listings from all currently supported listing providers, based on the queries described in the MongoDB database.
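The flow described above can be pictured with a minimal sketch. It is not the code in `app.Scrape`: the names `run_scrape` and `fake_provider`, the query and listing fields, and the URL-keyed dictionary standing in for the MongoDB collection are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical shapes: queries and listings are plain dicts in this sketch.
Query = Dict[str, Any]
Listing = Dict[str, Any]
ProviderScraper = Callable[[Query], Iterable[Listing]]


def run_scrape(
    queries: Iterable[Query],
    providers: Dict[str, ProviderScraper],
    store: Dict[str, Listing],
) -> int:
    """Run every query against every provider and write the results to `store`.

    `store` is an in-memory stand-in for the database collection (assumption).
    Returns the number of listings written, counting overwrites.
    """
    written = 0
    for query in queries:
        for name, scrape in providers.items():
            for listing in scrape(query):
                # Tag each record with the provider it came from.
                record = dict(listing, provider=name)
                # Key by URL so repeated runs overwrite instead of duplicating.
                store[str(record["url"])] = record
                written += 1
    return written


def fake_provider(query: Query) -> List[Listing]:
    """Stand-in for a real site scraper: one canned listing per query."""
    city = query["city"]
    return [{"url": f"https://example.com/{city}/1", "city": city, "rent": 1200}]


if __name__ == "__main__":
    store: Dict[str, Listing] = {}
    count = run_scrape([{"city": "boston"}], {"example": fake_provider}, store)
    print(count, sorted(store))
```

Keying records by URL is one simple way to keep repeated scrapes idempotent. The real scraper may identify listings differently.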

Usage:

To install dependencies:

pip install -r requirements.txt

To run the scraper:

python3 -m app.Scrape

Cull

The Cull submodule contains the Culler. The Culler checks every listing currently stored in the database to determine whether it is still available to rent, and removes expired listings.
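As a rough sketch of that pass, assuming listings are plain dicts with a `scraped_at` timestamp and that availability is decided by a caller-supplied check: the function name `cull`, the field names, and the 60-day age cutoff are hypothetical and not taken from `app.Cull`.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Iterable, List, Tuple

Listing = Dict[str, Any]


def cull(
    listings: Iterable[Listing],
    is_available: Callable[[Listing], bool],
    now: datetime,
    max_age: timedelta = timedelta(days=60),  # assumed cutoff, not from the repo
) -> Tuple[List[Listing], List[Listing]]:
    """Split listings into (kept, removed).

    A listing is removed if it is older than `max_age` or if
    `is_available` reports that it can no longer be rented.
    """
    kept: List[Listing] = []
    removed: List[Listing] = []
    for listing in listings:
        too_old = now - listing["scraped_at"] > max_age
        if too_old or not is_available(listing):
            removed.append(listing)
        else:
            kept.append(listing)
    return kept, removed


if __name__ == "__main__":
    now = datetime(2022, 6, 1)
    sample = [
        {"url": "a", "scraped_at": datetime(2022, 5, 20), "active": True},
        {"url": "b", "scraped_at": datetime(2022, 1, 1), "active": True},
        {"url": "c", "scraped_at": datetime(2022, 5, 25), "active": False},
    ]
    kept, removed = cull(sample, lambda item: item["active"], now)
    print([item["url"] for item in kept], [item["url"] for item in removed])
```

In the real Culler the availability check would presumably involve revisiting the listing page. Here it is passed in as a function so the selection logic can be read and tested on its own.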

Usage:

To install dependencies:

pip install -r requirements.txt

To run the culler:

python3 -m app.Cull

