# Data Jobs Analysis




## Contents

1. Filter Relevant Offers
2. General Statistics
3. Compare Selected Job Titles (BI, ETL, Data Engineer)
4. Identify Requirements
5. Identify Most Used Technologies
6. Identify Salary Trends
7. Identify Location Trends
8. Identify Major Companies


## Filter Relevant Offers

Suppose we are interested in job offers in data analysis, data integration, data wrangling, and related fields. The first step is to filter the offers that are relevant to our research. This was done with a targeted query on the offer titles, followed by searches for selected keywords in the offer contents to identify additional strings to look for in the titles. The patterns used in the final filter are listed below:

```
'bi( |$)'
'(data|business) intelligence'
'etl( |$)'
'data analy(st|tics|sis)'
'анализ.*данни'
'data (engineer|scientist|warehouse)'
'reporting (analyst|specialist)'
'tableau'
'qlikview'
```
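
As a rough illustration of how such patterns can be applied to offer titles, here is a minimal sketch in Python. It uses only a subset of the patterns above, and it assumes they are combined with OR and matched case-insensitively anywhere in the title; the actual query may have worked differently. The helper `is_data_offer` and the sample titles are hypothetical.

```python
import re

# A subset of the title patterns listed above.
TITLE_PATTERNS = [
    r'bi( |$)',
    r'(data|business) intelligence',
    r'etl( |$)',
    r'data analy(st|tics|sis)',
    r'анализ.*данни',
    r'data (engineer|scientist|warehouse)',
]

# Assumption: patterns are OR-ed and matched case-insensitively.
TITLE_RE = re.compile(
    '|'.join(f'(?:{p})' for p in TITLE_PATTERNS),
    re.IGNORECASE,
)


def is_data_offer(title: str) -> bool:
    """Return True if the title matches at least one pattern."""
    return TITLE_RE.search(title) is not None


# Hypothetical titles, for illustration only.
sample_titles = [
    'Senior BI Developer',
    'ETL Specialist',
    'Java Developer',
    'Анализатор на данни',
]
matching = [t for t in sample_titles if is_data_offer(t)]
print(matching)
```

Note that an unanchored pattern such as `'bi( |$)'` also matches titles that merely end a word in "bi", so a word boundary (`\b`) in front of it may reduce false positives.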

## General Statistics

Having identified and verified our set of targeted offers, we can provide some general stats and historical trends.

In [6]:
import psycopg2
import pandas as pd

%matplotlib inline

In [7]:
conn = psycopg2.connect("dbname=jobsbg")
datajobs_df = pd.read_sql_query('SELECT job_id, subm_date FROM v_full_data_offers_history', conn, index_col='subm_date')

print(f'The total number of offers that match the selected criteria is {len(datajobs_df.index)}.')
print(f'The first matching record is from {min(datajobs_df.index)}, and the last matching record is from {max(datajobs_df.index)}.')

The total number of offers that match the selected criteria is 1601.
The first matching record is from 2017-09-27, and the last matching record is from 2018-12-21.


Let's see how that number accumulated over time. 

In [35]:
%%HTML
<iframe width="100%" height="525px" src="./data_offers_over_time.html"></iframe>
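
The embedded chart is pre-rendered, so the computation behind it is not shown. One way to derive a running total from a frame shaped like `datajobs_df` (a `job_id` column indexed by `subm_date`) is sketched below. The sample data is hypothetical, and this is not necessarily how the chart was produced.

```python
import pandas as pd

# Hypothetical sample standing in for datajobs_df.
sample_df = pd.DataFrame(
    {'job_id': [1, 2, 3, 4, 5]},
    index=pd.to_datetime([
        '2017-09-27', '2017-09-27', '2017-09-29',
        '2017-10-02', '2017-10-02',
    ]),
)
sample_df.index.name = 'subm_date'

# Count offers per submission date, then accumulate into a running total.
daily_counts = sample_df['job_id'].groupby(level='subm_date').count()
cumulative = daily_counts.cumsum()
print(cumulative)
```

The last value of `cumulative` equals the total number of rows, so it can be checked against the count printed earlier.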

What are the salaries for the targeted offers?

## Top Job Titles 

Let's compare the different job titles that employers need.
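
A comparison like this could start by assigning titles to groups and counting them. The sketch below assumes three groups (BI, ETL, Data Engineer, as named in the contents) and uses hypothetical regular expressions and sample titles; a title may fall into more than one group.

```python
import pandas as pd

# Hypothetical titles; real ones would come from the filtered offers.
titles = pd.Series([
    'BI Developer', 'Senior ETL Developer', 'Data Engineer',
    'Junior Data Engineer', 'BI Analyst', 'Data Scientist',
])

# Assumed group definitions, not taken from the original analysis.
GROUPS = {
    'BI': r'\bbi\b|business intelligence',
    'ETL': r'\betl\b',
    'Data Engineer': r'data engineer',
}

# Number of titles matching each group (case-insensitive).
group_counts = pd.Series({
    name: int(titles.str.contains(pattern, case=False, regex=True).sum())
    for name, pattern in GROUPS.items()
})
print(group_counts.sort_values(ascending=False))
```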

## Top Used Technologies

Use the requirements to compile a list of the top technologies, then search the offers' contents for them and summarize the results.
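
A minimal sketch of the search step follows, assuming the technology list is already known. The technology names, the offer texts, and the helper `count_technologies` are hypothetical and serve only to show the counting logic.

```python
import re
from collections import Counter

# Hypothetical technology list and offer texts, for illustration only.
TECHNOLOGIES = ['sql', 'python', 'tableau', 'excel']
offers = [
    'Strong SQL skills and experience with Tableau.',
    'Python, SQL and Excel required.',
    'Knowledge of SQL; Python is a plus.',
]


def count_technologies(texts, technologies):
    """Count how many texts mention each technology at least once."""
    counts = Counter()
    for text in texts:
        for tech in technologies:
            # Whole-word, case-insensitive match.
            if re.search(rf'\b{re.escape(tech)}\b', text, re.IGNORECASE):
                counts[tech] += 1
    return counts


tech_counts = count_technologies(offers, TECHNOLOGIES)
print(tech_counts.most_common())
```

Counting each offer at most once per technology keeps long offers from dominating the totals. Word boundaries do not work well for names ending in symbols, such as `c++` or `c#`, which would need dedicated patterns.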

In [36]:
from IPython.display import HTML
with open('../resources/styles/datum.css', 'r') as f:
    style = f.read()
HTML(style)