Very simple scrapy scraper to get stackoverflow jobs
Latest commit 6879ea1 Jan 26, 2017 @gyurisc updating notebooks

stackjobs

A very simple Scrapy scraper that collects Stack Overflow job listings, using MongoDB as the data store and pandas to enhance the data.
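The description pairs Scrapy with MongoDB. One common way to connect the two is a Scrapy item pipeline that inserts each scraped item into a MongoDB collection. The sketch below follows that pattern as an assumption; it is not taken from this repository. The class name `MongoPipeline`, the setting names `MONGO_URI` and `MONGO_DATABASE`, the database name `stackjobs` and the collection name `jobs` are all hypothetical.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that stores each scraped job in MongoDB."""

    # Hypothetical collection name; the project's real one may differ.
    collection_name = "jobs"

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the pipeline; MONGO_URI and MONGO_DATABASE
        # are assumed setting names, with local defaults as a fallback.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "stackjobs"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be defined without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # dict() copies plain dicts and scrapy.Item instances alike, so the
        # _id that insert_one adds does not leak back into the item.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

A pipeline like this is enabled through `ITEM_PIPELINES` in the Scrapy settings, for example `ITEM_PIPELINES = {"stackjobs.pipelines.MongoPipeline": 300}` (the module path is again hypothetical). A module-level `import pymongo` would be the more usual choice in a real project; the local import only keeps the sketch importable without the driver.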

Articles related to this project:

Workflow steps

The steps should be run in the following order:

  • run the scraper
  • export from mongodb (export_from_mongo.py)
  • merge exported jobs (merge_exported_jobs.py)
  • enhance data (enhance_data_with_pandas.py)
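The order above can be scripted so the steps always run in sequence. This is a minimal sketch, assuming it is run from the project root, that none of the scripts need command-line arguments, and that the spider is named `stackjobs` (a hypothetical name; the real one is defined in the project's spiders). `scrapy crawl <spider>` is the standard Scrapy command for running a spider; `build_commands` and `run_workflow` are illustrative helpers, not part of the repository.

```python
import subprocess
import sys

# Hypothetical spider name; check the project's spiders for the real one.
SPIDER_NAME = "stackjobs"

# Scripts in the order the workflow lists them (assumed to take no arguments).
SCRIPTS = [
    "export_from_mongo.py",
    "merge_exported_jobs.py",
    "enhance_data_with_pandas.py",
]


def build_commands(spider=SPIDER_NAME):
    """Return the workflow commands: scraper first, then each script in order."""
    commands = [["scrapy", "crawl", spider]]
    commands += [[sys.executable, script] for script in SCRIPTS]
    return commands


def run_workflow(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failing step."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)


if __name__ == "__main__":
    run_workflow(build_commands())
```

With the default `subprocess.run`, a step that exits with a non-zero status raises `subprocess.CalledProcessError`, so later steps never run on incomplete data. Passing a different `runner` allows a dry run that only records the commands.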