## Salary Transparency Line Chart

```python
import pandas as pd
import plotly
import plotly.graph_objects as go
import psycopg2

# Project-specific style definitions (colors, fonts) used throughout the chart.
from nbstyler import DATA_STYLE as DS

# Load plotly.js (from a CDN, since connected=True) so figures render inline.
plotly.offline.init_notebook_mode(connected=True)
```

### Objectives

The main objective of this visualization is to show what share of job offers publish a salary, and how that share changes over time.


### Methodology

A percent stacked area chart was chosen to illustrate salary transparency. In this type of chart the value of each group is normalized at each time stamp and shown as a percentage of the whole, which lets the reader compare the groups that make up the whole. To hide the noise from daily fluctuations, the data is aggregated into weekly bins.
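A minimal sketch of the two steps named above (binning, then normalizing), assuming a hypothetical DataFrame of *daily* offer counts. The column names mirror the query output used below; the numbers are made up for illustration. Daily counts are summed into Monday-start weekly bins, then each bin is divided by its total so the groups add up to 100%.

```python
import pandas as pd

# Hypothetical daily counts for two full weeks starting Monday 2017-10-02.
days = pd.date_range("2017-10-02", periods=14, freq="D")
daily = pd.DataFrame(
    {
        "published_salary": [30, 45, 28, 50, 33, 5, 4] * 2,
        "unpublished_salary": [110, 130, 95, 140, 120, 20, 15] * 2,
    },
    index=days,
)

# Sum daily counts into weekly bins [Monday, next Monday), labelled by the Monday.
weekly = daily.resample("W-MON", closed="left", label="left").sum()

# Normalize each bin so the groups are expressed as a percentage of the whole.
shares = weekly.div(weekly.sum(axis=1), axis=0) * 100

print(shares.round(2))
```

Swapping the rule `"W-MON"` for `"MS"` would produce calendar-month bins instead, at the cost of hiding week-to-week movement.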

### Data Preparation

First, fetch the data from the database. A predefined query returns the weekly counts we need, but the first and last weeks must be dropped because they contain only partial data. Finally, visually inspect a couple of rows to confirm that the data looks as expected.

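```python
# Connect to the local `jobsbg` PostgreSQL database.
conn = psycopg2.connect("dbname=jobsbg")
query = 'SELECT * FROM all_offers.ao_publ_salary_count_weekly'

# All jobs: weekly counts of offers with and without a published salary.
all_df = pd.read_sql_query(query, conn, index_col='week_ts')
conn.close()

all_df.index = pd.to_datetime(all_df.index)  # convert the index to a DatetimeIndex
# The query has no ORDER BY, so sort chronologically before dropping
# the first and last weeks, which are incomplete.
all_df = all_df.sort_index().iloc[1:-1]
all_df.head(2)
```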

| week_ts | published_salary | unpublished_salary |
|---|---|---|
| 2017-10-02 | 2265 | 8017 |
| 2017-10-09 | 2179 | 7322 |
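The fetch-and-trim step can be exercised without a PostgreSQL server by using an in-memory SQLite database. This sketch assumes a hypothetical table `salary_count_weekly`; two of its rows are the counts shown above, while the two partial-week rows are invented placeholders. It also makes the row order explicit: a query without `ORDER BY` does not guarantee chronological order, so sorting before slicing off the edge weeks is the safer choice.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the PostgreSQL view (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary_count_weekly "
    "(week_ts TEXT, published_salary INTEGER, unpublished_salary INTEGER)"
)
conn.executemany(
    "INSERT INTO salary_count_weekly VALUES (?, ?, ?)",
    [
        ("2017-10-09", 2179, 7322),
        ("2017-09-25", 300, 1000),  # invented partial first week
        ("2017-10-16", 400, 1200),  # invented partial last week
        ("2017-10-02", 2265, 8017),
    ],
)

# Parse week_ts as datetimes while reading and use it as the index.
df = pd.read_sql_query(
    "SELECT * FROM salary_count_weekly",
    conn,
    index_col="week_ts",
    parse_dates=["week_ts"],
)
conn.close()

# Sort chronologically, then drop the incomplete first and last weeks.
df = df.sort_index().iloc[1:-1]
print(df)
```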


```python
# Compute each group's share (in %) of the weekly total, column-wise.
total = all_df.published_salary + all_df.unpublished_salary
all_df = all_df.assign(
    all_publ_salary_ratio=all_df.published_salary / total * 100,
    all_unpubl_salary_ratio=all_df.unpublished_salary / total * 100,
)

all_df.head(2)
```

| week_ts | published_salary | unpublished_salary | all_publ_salary_ratio | all_unpubl_salary_ratio |
|---|---|---|---|---|
| 2017-10-02 | 2265 | 8017 | 22.028788 | 77.971212 |
| 2017-10-09 | 2179 | 7322 | 22.934428 | 77.065572 |
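The ratios in the table can be checked by hand: for the week of 2017-10-02, 2265 / (2265 + 8017) × 100 ≈ 22.028788. The sketch below recomputes both rows from the counts shown above with plain column arithmetic and confirms that the two shares of each week sum to 100%.

```python
import pandas as pd

# The two weekly rows shown in the output above.
counts = pd.DataFrame(
    {"published_salary": [2265, 2179], "unpublished_salary": [8017, 7322]},
    index=pd.to_datetime(["2017-10-02", "2017-10-09"]),
)

# Divide each group by the row total to get its percentage share.
total = counts.published_salary + counts.unpublished_salary
publ_ratio = counts.published_salary / total * 100
unpubl_ratio = counts.unpublished_salary / total * 100

print(publ_ratio.round(6).tolist())  # matches all_publ_salary_ratio above

# The two shares of each week should add up to 100%.
assert ((publ_ratio + unpubl_ratio - 100).abs() < 1e-9).all()
```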


### Implementing the Chart in Plotly

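Construct the area trace that holds the data for the Plotly figure. Only the published-salary share is drawn: because the two shares always sum to 100%, the unfilled region above the curve represents offers without a published salary. `mode='lines'` draws the boundary as a line, and assigning a `stackgroup` fills the area beneath it.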

```python
trace_publ_all = go.Scatter(
    x=all_df.index,
    y=all_df.all_publ_salary_ratio,
    text=all_df.published_salary,  # raw weekly count, shown on hover
    hoverinfo='x+y+text',
    mode='lines',
    line=dict(
        width=1.5,
        color=DS['colors']['acc1'],
    ),
    fillcolor=DS['colorramp']['acc1'][2],
    stackgroup='two',  # stacking turns on the area fill under the line
    name='Published Salary',
)

data = [trace_publ_all]
```

Next, prepare the layout for the Plotly figure.

```python
layout = go.Layout(
    paper_bgcolor=DS['colors']['bg1'],
    plot_bgcolor=DS['colors']['bg1'],
    # A title dict replaces the deprecated `titlefont` property.
    title=dict(text='Salary Transparency', font=DS['chart_fonts']['title']),
    font=DS['chart_fonts']['text'],
    legend=dict(orientation='h'),
    modebar=dict(orientation='h'),
    autosize=True,
    showlegend=True,
    hidesources=True,
    dragmode='zoom',
    hovermode='closest',
    hoverlabel=dict(bgcolor=DS['colors']['bg1']),
    xaxis=dict(
        type='date',
        fixedrange=True,
        hoverformat='Week %W, %Y',  # Monday-based week of the year
        ticks='outside',
        tickmode='auto',
        gridcolor=DS['colors']['bg3'],
    ),
    yaxis=dict(
        title='Share of job offers',
        fixedrange=True,
        ticks='outside',
        tickwidth=1,
        gridcolor=DS['colors']['bg3'],
        type='linear',
        range=[0, 100],
        ticksuffix='%',
    ),
)
```
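Plotly formats dates with d3-time-format directives. Assuming its `%W` follows the same convention as Python's `strftime` (both define it as the Monday-based week of the year, with days before the first Monday in week 0), the hover labels produced by `hoverformat='Week %W, %Y'` can be previewed from Python. Note that `%W` is not the ISO week number, which can differ around New Year.

```python
from datetime import date

# Preview the hover label for the first complete week in the data.
label = date(2017, 10, 2).strftime("Week %W, %Y")
print(label)

# %W vs ISO week for a Sunday at the very start of the year.
jan1 = date(2017, 1, 1)
week_w = jan1.strftime("%W")    # days before the first Monday fall in week "00"
iso_week = jan1.isocalendar()[1]  # ISO places this day in the last week of 2016
print(week_w, iso_week)
```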

```python
fig = go.Figure(data=data, layout=layout)

# Render the figure inline in the notebook.
plotly.offline.iplot(fig)
```

```python
# Export a standalone HTML version of the chart.
fig.write_html(file='all_offers_weekly_salary_transparency_line.html', full_html=True)
```

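```python
# Apply the notebook's custom stylesheet. HTML() renders the file as raw HTML,
# so the stylesheet is expected to include its own <style> wrapper.
from IPython.display import HTML

with open('../resources/styles/datum.css', 'r') as f:
    style = f.read()
HTML(style)
```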
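If the stylesheet turns out to be plain CSS without a `<style>` element, `HTML()` displays it as text instead of applying it. A small hypothetical helper, `as_style_block`, can guard against that by wrapping bare CSS before it is rendered; it would be used as `HTML(as_style_block(style))`.

```python
def as_style_block(css: str) -> str:
    """Hypothetical helper: wrap raw CSS in a <style> element unless one is present."""
    if "<style" in css.lower():
        return css  # already wrapped, pass through unchanged
    return f"<style>\n{css}\n</style>"


print(as_style_block("h1 { color: red; }"))
```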