## Data Offers Salary Transparency

In [2]:
import psycopg2
import pandas as pd
import matplotlib
import plotly
import plotly.offline  # offline rendering inside the notebook
import plotly.graph_objs as go

from nbstyler import DATA_STYLE as DS

plotly.offline.init_notebook_mode(connected=True)

%matplotlib inline

### Objectives

The main objective of this visualization is to show what share of data-related job ads disclose a salary, relative to the total number of such ads.


### Methodology

A percent stacked area chart was chosen to illustrate the salary transparency ratio. In this type of chart the value of each group is normalized at each time stamp and shown as a percentage of the whole, which lets the reader compare the groups that make up the total. To suppress the noise of daily fluctuations, the data is aggregated into bins spanning one week.

### Data Preparation

First we need to fetch the data from the database. A predefined view returns the weekly counts we need, but we drop the first and last week from the result because they cover only part of their periods. Finally, we inspect a couple of rows to confirm that the data looks as expected.

In [3]:
conn = psycopg2.connect("dbname=jobsbg")
query = 'SELECT * FROM v_data_offers_salary_disclosure_counts_weekly'

data_df = pd.read_sql_query(query, conn, index_col='week_ts')
data_df = data_df[1:-1]  # first and last week are incomplete so we drop them.
data_df.index = pd.to_datetime(data_df.index)  # convert the DataFrame index to DatetimeIndex object
data_df.head(2)

week_ts,published_salary,unpublished_salary
2017-10-02,5,28
2017-10-09,0,15
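
The definition of the `v_data_offers_salary_disclosure_counts_weekly` view is not shown in this notebook. As a rough illustration of what weekly binning and trimming of partial weeks involve, here is a minimal sketch in plain Python. It assumes weeks start on Monday, which matches the `week_ts` values above; the helper names and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


def weekly_counts(days):
    """Count items per Monday-based week, sorted by week start."""
    counts = Counter(week_start(d) for d in days)
    return sorted(counts.items())


# Hypothetical publication dates, for illustration only.
sample_days = [
    date(2017, 9, 28),   # falls in the week of 2017-09-25
    date(2017, 10, 2),   # falls in the week of 2017-10-02
    date(2017, 10, 4),   # falls in the week of 2017-10-02
    date(2017, 10, 10),  # falls in the week of 2017-10-09
    date(2017, 10, 17),  # falls in the week of 2017-10-16
]

bins = weekly_counts(sample_days)
# The first and last bins may cover only part of a week, so drop them,
# mirroring the `data_df[1:-1]` slice in the cell above.
complete_bins = bins[1:-1]
```

With these sample dates, only the weeks starting 2017-10-02 and 2017-10-09 remain after trimming.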


In [4]:
total_offers = data_df.published_salary + data_df.unpublished_salary  # offers per week

# Share of each group in the weekly total, as a percentage.
data_df = data_df.assign(
    publ_salary_ratio=data_df.published_salary / total_offers * 100,
    unpubl_salary_ratio=data_df.unpublished_salary / total_offers * 100,
)
data_df.head(1)

week_ts,published_salary,unpublished_salary,publ_salary_ratio,unpubl_salary_ratio
2017-10-02,5,28,15.151515,84.848485
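
The ratios can be checked by hand: for the week of 2017-10-02, 5 / (5 + 28) * 100 ≈ 15.15. The sketch below reproduces that arithmetic in plain Python for the two rows shown earlier. The `percent_shares` helper is hypothetical, and the guard for a week with no offers at all is an added assumption rather than something the notebook does.

```python
def percent_shares(published: int, unpublished: int):
    """Return (published %, unpublished %) of the weekly total.

    Returns None when the week has no offers, since the share is undefined.
    """
    total = published + unpublished
    if total == 0:
        return None
    return (published / total * 100, unpublished / total * 100)


# Counts taken from the first two rows of the query result.
shares_first = percent_shares(5, 28)   # week of 2017-10-02
shares_second = percent_shares(0, 15)  # week of 2017-10-09
```

The two shares of a week always add up to 100, which is what lets the stacked areas fill the full height of the chart.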


### Implementing the Chart in Plotly

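We construct one area trace for each of the two groups, published and unpublished salary. Together they form the data for the Plotly chart. Both traces share the same `stackgroup`, so Plotly stacks them, and the hover text shows the raw weekly count rather than the percentage. The published-salary trace is drawn with a linear line shape.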

In [45]:
trace_publ = go.Scatter(
    x=data_df.index,
    y=data_df.publ_salary_ratio,
    text=data_df.published_salary,
    hoverinfo='text', 
    mode='lines',
    line=dict(
        width=1.5, 
        color=DS['colors']['acc1'],
        shape='linear',
    ),
    fillcolor=DS['colorramp']['acc1'][2],
    stackgroup='one',
    name='Published Salary',
)

trace_unpubl = go.Scatter(
    x=data_df.index,
    y=data_df.unpubl_salary_ratio,
    text=data_df.unpublished_salary,
    hoverinfo='text', 
    mode='lines',
    line=dict(
        width=0, 
        color=DS['colorramp']['acc2'][4],
    ),
    stackgroup='one',
    name='Unpublished Salary'
)

data = [trace_publ, trace_unpubl]

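Next we prepare the layout for the Plotly chart: colors, fonts, legend, and axis settings. The y-axis is fixed to the range 0 to 100 with a `%` suffix to match the normalized values.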

In [46]:
layout = go.Layout(
    paper_bgcolor=DS['colors']['bg1'],
    plot_bgcolor=DS['colors']['bg1'],
    title='Salary Transparency in Data Jobs',
    titlefont=DS['chart_fonts']['title'],
    font=DS['chart_fonts']['text'],
    legend=dict(orientation='h'),
    modebar=dict(orientation='h'),
    autosize=True,
    showlegend=True,
    hidesources=True,
    dragmode='zoom',
    hovermode='closest',
    hoverlabel=dict(
        bgcolor=DS['colors']['bg1']
    ),
    xaxis = dict(
        type='date', 
        fixedrange=True,
        hoverformat='Week %W, %Y',
        ticks='outside',
        tickmode='auto',
        zerolinecolor=DS['colors']['fg2'],
        ), 
    yaxis = dict(
        title='Total submissions distribution',
        fixedrange=True,
        ticks='outside',
        tickwidth=1,
        gridcolor=DS['colors']['bg3'],
        type='linear',
        range=[0,100],
        ticksuffix='%',
    ),
)
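
The `hoverformat` string on the x-axis uses d3-style time directives, where `%W` is the Monday-based week number of the year. Python's `strftime` supports the same two directives, so the label can be previewed outside the chart. This is only an approximation of what Plotly renders, offered as a quick check.

```python
from datetime import date

# Same pattern as the x-axis `hoverformat` in the layout above.
HOVER_FORMAT = 'Week %W, %Y'

# Preview the hover label for the first week in the data set.
label = date(2017, 10, 2).strftime(HOVER_FORMAT)
print(label)  # Week 40, 2017
```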

In [49]:
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig, filename='data_jobs_salary_transparency')
# Uncomment the line below to export an HTML version of the chart.
# plotly.offline.plot(fig, filename='data_jobs_salary_transparency.html', show_link=False)


In [48]:
from IPython.core.display import HTML
with open('../resources/styles/datum.css', 'r') as f:
    style = f.read()
HTML(style)