A very simple Scrapy scraper that collects Stack Overflow job listings, using MongoDB as the data store and pandas to enhance the data.
Articles written about this project:
- Scraping Stackoverflow Careers for Fun and Profit - Part 1. Starts writing the parser and shows how to parse the pages using XPath expressions.
- Scraping Stackoverflow Careers for Fun and Profit - Part 2. Adds MongoDB support to the parser and saves the scraped items to the database.
- Playing with Pandas - Part 3. Manipulates and enhances the data using Python pandas and Jupyter notebooks.
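To illustrate the kind of XPath-based parsing Part 1 covers, here is a minimal sketch. So that it runs without Scrapy, it uses Python's built-in `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup. The sample HTML, the `job`/`employer`/`location` class names and the `parse_jobs` helper are all hypothetical. They are not the real Stack Overflow Careers markup or this project's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a listing page; the real site's HTML differs
# and is generally not well-formed XML.
SAMPLE_PAGE = """
<html><body>
  <div class="job">
    <h2><a href="/jobs/1">Python Developer</a></h2>
    <span class="employer">Acme</span>
    <span class="location">Berlin</span>
  </div>
  <div class="job">
    <h2><a href="/jobs/2">Data Engineer</a></h2>
    <span class="employer">Globex</span>
    <span class="location">Remote</span>
  </div>
</body></html>
"""


def parse_jobs(page: str) -> list:
    """Extract one dict per job listing (hypothetical field names)."""
    root = ET.fromstring(page.strip())
    jobs = []
    # ElementTree's XPath subset: descendant search (.//) plus
    # attribute predicates such as [@class='job'].
    for job in root.findall(".//div[@class='job']"):
        link = job.find("./h2/a")
        jobs.append({
            "title": link.text,
            "url": link.get("href"),
            "employer": job.findtext("./span[@class='employer']"),
            "location": job.findtext("./span[@class='location']"),
        })
    return jobs


for job in parse_jobs(SAMPLE_PAGE):
    print(job)
```

Inside a Scrapy spider the same idea would go through Scrapy's own selectors, for example `response.xpath("//div[@class='job']")` followed by `.xpath("./h2/a/text()").get()` on each result. Those selectors support full XPath 1.0 and tolerate real-world HTML.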
The steps should be run in the following order:
- run the scraper
- export from mongodb (export_from_mongo.py)
- merge exported jobs (merge_exported_jobs.py)
- enhance data (enhance_data_with_pandas.py)
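The four steps above could be chained in a small driver script that stops as soon as one step fails. The following is a minimal sketch under these assumptions: the spider name `stackoverflow` is hypothetical (use the name defined in the project's spider), the scripts are run from the project root, and each script takes no arguments.

```python
import subprocess
import sys

# Hypothetical spider name; replace with the one defined in the project.
SPIDER_NAME = "stackoverflow"

# The steps in the order the README requires, as command lines.
STEPS = [
    ["scrapy", "crawl", SPIDER_NAME],
    [sys.executable, "export_from_mongo.py"],
    [sys.executable, "merge_exported_jobs.py"],
    [sys.executable, "enhance_data_with_pandas.py"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order and return the commands that completed.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so later steps never run on incomplete data.
    """
    completed = []
    for cmd in steps:
        runner(cmd, check=True)
        completed.append(cmd)
    return completed
```

Calling `run_steps(STEPS)` from a script in the project root would run the whole pipeline. The `runner` parameter exists only so that the ordering can be checked without actually launching Scrapy.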