# Job.search()
* [The App](#app)
* [Historical Predictions](#historical)
* [Data Exploration](#exploration)

If you are a recent graduate, or someone who is switching fields, knowing where to apply for jobs is daunting without domain knowledge or connections. How do you know where your best options are? Are you looking in the right places? Job.search() is a job prediction engine that tells a user which US states are most suitable for a given set of job title keywords.

In [1]:
from utils import *

## The App <a class="anchor" id="app"></a>

A user can input any number of job title/description keywords and the model will return the probability that the given set of keywords is seen in each state. The red plot shows the unweighted predictions for given keywords, which is representative of the **volume** of matching job posts, while the blue plot shows a prediction normalized for state population, representative of the relative **demand** of matching job posts.

The model is a multinomial naive Bayes classifier that was iteratively trained on more than 84 million job posts over a span of one and a half years.
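To make the idea of iterative training concrete, here is a minimal pure-Python sketch of a multinomial naive Bayes classifier that is updated one batch at a time. It is not the code behind `make_prediction`: the class name `IncrementalMultinomialNB`, the whitespace tokenizer, the smoothing value, and the toy job titles are all assumptions made for illustration. Libraries such as scikit-learn offer the same pattern through `MultinomialNB.partial_fit`.

```python
import math
from collections import Counter, defaultdict


class IncrementalMultinomialNB:
    """Tiny multinomial naive Bayes that can be updated batch by batch (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing strength (assumed value)
        self.class_counts = Counter()            # number of posts seen per state
        self.word_counts = defaultdict(Counter)  # state -> keyword -> count
        self.vocab = set()

    def partial_fit(self, texts, labels):
        """Fold one batch of (job title, state) pairs into the running counts."""
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        """Return {state: P(state | keywords)} for a keyword string."""
        # Keywords never seen in training carry no information, so drop them.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_posts = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_label / n_posts)  # log prior for this state
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space for numerical stability.
        peak = max(log_scores.values())
        exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}


# Toy batches standing in for successive chunks of job posts (made-up data).
batch_1 = (["data scientist", "software engineer", "petroleum engineer"],
           ["CA", "CA", "TX"])
batch_2 = (["data scientist", "drilling engineer", "data analyst"],
           ["CA", "TX", "CA"])

model = IncrementalMultinomialNB()
model.partial_fit(*batch_1)
model.partial_fit(*batch_2)

probs = model.predict_proba("data scientist")
```

Because the classifier only stores counts, each new batch can be folded in without revisiting earlier data, which is what makes this family of models practical for a corpus too large to hold in memory.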

In [2]:
text = [input('job keywords: ')]

job keywords: data scientist


In [3]:
make_prediction(text)

## Historical Predictions <a class="anchor" id="historical"></a>

The user can also view the model's predictions over the historical data to see whether the demand for particular job keywords has been stable or volatile.

In [4]:
make_weekly_prediction(text)

## Data Exploration <a class="anchor" id="exploration"></a>

Job data came from The Data Incubator and Thinknum and consists of 175 GB of job posts from companies listed on the NYSE and NASDAQ. The job post data was then augmented with US census data to normalize for state populations.
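As a rough illustration of the population normalization, the sketch below divides each state's post count by its population. The function name `posts_per_capita`, the per-1,000-residents scaling, and all of the numbers are assumptions chosen for the example; they are not the real post counts or census figures.

```python
def posts_per_capita(post_counts, populations, per=1000):
    """Return posts per `per` residents for every state present in both tables."""
    return {
        state: per * count / populations[state]
        for state, count in post_counts.items()
        if state in populations and populations[state] > 0
    }


# Made-up numbers purely for illustration (not real post counts or census data).
post_counts = {"A": 500_000, "B": 200_000, "C": 90_000}
populations = {"A": 40_000_000, "B": 8_000_000, "C": 700_000}

rates = posts_per_capita(post_counts, populations)

# Rank states from highest to lowest posts per capita.
ranked = sorted(rates, key=rates.get, reverse=True)
```

In this toy data the state with the fewest raw posts ranks first once population is taken into account, which is the kind of reordering that normalization can produce.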

Looking at the total number of job posts summed across the US, we find a regular periodic pattern in post volume, while the overall trend is relatively flat.

In [2]:
IFrame('plots/total_posts_over_time.html', width=1000, height=500)

Averaging posts by day of the week shows that the periodic behavior is weekly, driven by low post volumes over the weekend. Moving forward, I choose to aggregate over weeks to smooth out this behavior.

In [3]:
IFrame('plots/weekly_distribution.html', width=1000, height=500)
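The weekly aggregation described above can be sketched with the standard library alone. This is an assumed approach, not the notebook's actual code: `weekly_totals` is a hypothetical helper, each week is keyed by its Monday, and the daily counts are made up to mimic the weekday/weekend pattern.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_counts):
    """Sum daily post counts into weeks, keyed by the Monday of each week."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        monday = day - timedelta(days=day.weekday())  # weekday() is 0 for Monday
        totals[monday] += count
    return dict(sorted(totals.items()))


# Two made-up weeks: high volume on weekdays, low volume on weekends.
start = date(2018, 1, 1)  # a Monday
daily = {}
for i in range(14):
    day = start + timedelta(days=i)
    daily[day] = 20 if day.weekday() >= 5 else 100

weekly = weekly_totals(daily)
```

Summing over whole weeks means every bucket contains exactly one weekend, so the weekday/weekend dip no longer shows up as an oscillation in the series.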

A simple look at the total posts per state does not yield anything particularly surprising: California and Texas are the most populous states, and likewise have the most job posts.

In [4]:
IFrame('plots/bar_posts_by_state.html', width=1000, height=500)

Normalizing for population tells a very different story: Washington, DC (using the DC metro area) and Washington state have the most job posts per capita.

In [5]:
IFrame(src='plots/bar_posts_per_capita.html', width=1000, height=500)

In [7]:
IFrame(src='plots/usa_total_posts.html', width=1000, height=500)

In [8]:
IFrame(src='plots/usa_per_capita.html', width=1000, height=500)

By using an online SVD algorithm in the gensim package, job keywords were grouped into "sectors" of industry.

In [9]:
IFrame(src='plots/sectors.html', width=1000, height=500)