Historical Job Postings API+Example
Repository for the statistics API and an example app for the dataset located at http://jobtechdev.se/assets/historical-job-postings.
- Anaconda 3.x
- Apache Spark
(Python + Spark)
Calculates sparse vectors for occurrences of all words/phrases used more than 20 times (~500k) across all job postings. The vectors are saved to a gzipped pickle. Note: this takes several hours when run on a local machine.
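The idea can be illustrated with a minimal sketch in plain Python. It is not the repository's Spark job: the function names, whitespace tokenization, and the choice to store each vector as a sorted list of posting indices are all assumptions made for illustration.

```python
import gzip
import pickle
from collections import defaultdict


def build_term_vectors(postings, min_count=20):
    """Map each frequent term to the sorted list of posting indices containing it.

    Hypothetical helper: a term is kept only if it is used more than
    `min_count` times in total across all postings.
    """
    occurrences = defaultdict(list)  # term -> posting indices (the sparse vector)
    totals = defaultdict(int)        # term -> total number of uses
    for doc_id, text in enumerate(postings):
        tokens = text.lower().split()  # assumed tokenization: whitespace split
        for term in tokens:
            totals[term] += 1
        for term in set(tokens):
            occurrences[term].append(doc_id)  # doc_id increases, so lists stay sorted
    return {t: ids for t, ids in occurrences.items() if totals[t] > min_count}


def save_vectors(vectors, path):
    """Write the vectors to a gzipped pickle."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(vectors, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_vectors(path):
    """Read vectors back from a gzipped pickle."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    sample = ["python developer", "java developer", "python python"]
    vectors = build_term_vectors(sample, min_count=1)
    save_vectors(vectors, "vectors.pkl.gz")
    print(load_vectors("vectors.pkl.gz"))
```

Storing only the indices of postings that contain a term keeps the output small, since most terms appear in a tiny fraction of postings.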
(Python + Flask)
Loads the computed sparse vectors into memory, where they serve as bitmaps for statistics calculations. Includes a simple interpreter for a query syntax based on bit operations (see notebook for examples). For a given query, it outputs sums for variables such as months, years, employers and occupations.
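As a rough sketch of how bitmap-based statistics can work, the example below uses Python integers as bitmaps, with bit `i` set when posting `i` contains a term. The helper names, the sample data, and the integer representation are assumptions; the API's actual query syntax is documented in the notebook and is not reproduced here.

```python
from collections import Counter


def to_bitmap(doc_ids):
    """Turn a list of posting indices into an integer bitmap (hypothetical helper)."""
    bitmap = 0
    for doc_id in doc_ids:
        bitmap |= 1 << doc_id
    return bitmap


def matching_ids(bitmap):
    """Yield the indices of the bits that are set, lowest first."""
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            yield doc_id
        bitmap >>= 1
        doc_id += 1


def sums_by(bitmap, labels):
    """Count matching postings per label, e.g. per year or per employer."""
    return Counter(labels[i] for i in matching_ids(bitmap))


if __name__ == "__main__":
    # Illustrative data: four postings and the year each was published.
    years = [2015, 2016, 2016, 2017]
    python = to_bitmap([0, 2, 3])
    developer = to_bitmap([0, 1, 3])

    print(sums_by(python & developer, years))   # postings with both terms
    print(sums_by(python | developer, years))   # postings with either term
    print(sums_by(python & ~developer, years))  # "python" without "developer"
```

Each query reduces to a few bitwise operations over whole bitmaps, which is what makes in-memory aggregation over many postings cheap.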
Visualizes some of the output from the API.