## Historical Job Profiles Demand - Bar Chart

In [1]:
import datetime
import psycopg2
import pandas as pd
import ipywidgets as widgets
import plotly
import chart_studio.plotly as py
import plotly.graph_objs as go

from nbstyler import DATA_STYLE as DS

plotly.offline.init_notebook_mode(connected=True)

%matplotlib inline

### Objectives

Create a stacked bar chart to show how the number of submitted data job offers changes over time for four job profiles: data analysis, business intelligence, data engineering, and data science. The data is aggregated by month.



### Data Preparation

Four separate queries, one per job profile, fetch prepared monthly counts from a database function that matches offers by title keywords. A `pandas.DataFrame` is constructed for each dataset, and the first and last time periods of each dataset are dropped to remove incomplete periods at the start and end. This step also converts each `DataFrame` index to a `DatetimeIndex` to use the temporal methods it exposes.

In [2]:
conn = psycopg2.connect("dbname=jobsbg")


# data analysis
da_montly = pd.read_sql_query("SELECT * FROM data_offers.do_count_monthly_by_title_kw('(data & (analyst | analysis | analytics | entry)) | ((анализатор | анализ | анализи | анализиране) & данни)')", 
                              conn, 
                              index_col='month_ts')
da_montly = da_montly[1:-1]
da_montly.index = pd.to_datetime(da_montly.index)


# business intelligence
bi_monthly = pd.read_sql_query("SELECT * FROM data_offers.do_count_monthly_by_title_kw('bi | ((business | data) & intelligence)')", 
                              conn, 
                              index_col='month_ts')
bi_monthly = bi_monthly[1:-1]
bi_monthly.index = pd.to_datetime(bi_monthly.index)


# data engineering
de_monthly = pd.read_sql_query("SELECT * FROM data_offers.do_count_monthly_by_title_kw('(data <-> (engineer | warehouse)) | dwh | etl')", 
                              conn, 
                              index_col='month_ts')
de_monthly = de_monthly[1:-1]
de_monthly.index = pd.to_datetime(de_monthly.index)

# data science
ds_monthly = pd.read_sql_query("SELECT * FROM data_offers.do_count_monthly_by_title_kw('data & (scientist | science)')", 
                              conn, 
                              index_col='month_ts')
ds_monthly = ds_monthly[1:-1]
ds_monthly.index = pd.to_datetime(ds_monthly.index)
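
The four blocks above repeat the same two post-processing steps: drop the first and last rows, then convert the index. As a minimal sketch, those steps could be factored into a helper. The name `trim_incomplete_periods` and the sample values are hypothetical and are not part of the original notebook; the sample frame only stands in for a query result.

```python
import pandas as pd


def trim_incomplete_periods(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the first and last rows and convert the index to a DatetimeIndex.

    Hypothetical helper: assumes the rows are already sorted by period and
    that only the first and last periods can be incomplete.
    """
    trimmed = df.iloc[1:-1].copy()
    trimmed.index = pd.to_datetime(trimmed.index)
    return trimmed


# Made-up sample standing in for a query result indexed by month_ts.
sample = pd.DataFrame(
    {"subm_count": [3, 21, 15, 10, 4]},
    index=pd.Index(
        ["2017-09-01", "2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01"],
        name="month_ts",
    ),
)

monthly = trim_incomplete_periods(sample)
print(monthly)
```

With such a helper, each dataset could be prepared as `trim_incomplete_periods(pd.read_sql_query(..., conn, index_col='month_ts'))`.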

In [3]:
print(f'Monthly dataframe head:\n{de_monthly.head(5)}')

Monthly dataframe head:
            subm_count
month_ts              
2017-10-01          21
2017-11-01          15
2017-12-01          10
2018-01-01          18
2018-02-01          19


### Implementing the Chart in Plotly

Four bar traces are prepared, one per job profile. Each trace uses the month as the x value and the submission count as the y value, and all four are visible by default. The traces share the same outline color and opacity and differ only in name, data, and fill color.

In [4]:
da_bar = go.Bar(
    name='Data Analysis',
    x=[month for month in da_montly.index],
    y=[value for value in da_montly.subm_count],
    hoverinfo='y+name',
    showlegend=True,
    marker=dict(
        line=dict(
            width=1,
            color=DS['colorramp']['acc1'][-1]
        ),
        color=DS['colorramp']['acc1'][8],
        opacity=0.8,
    ),
    visible=True,
)

bi_bar = go.Bar(
    name='Business Intelligence',
    x=[month for month in bi_monthly.index],
    y=[value for value in bi_monthly.subm_count],
    hoverinfo='y+name',
    showlegend=True,
    marker=dict(
        line=dict(
            width=1,
            color=DS['colorramp']['acc1'][-1]
        ),
        color=DS['colorramp']['acc1'][6],
        opacity=0.8,
    ),
    visible=True,
)

de_bar = go.Bar(
    name='Data Engineering',
    x=[month for month in de_monthly.index],
    y=[value for value in de_monthly.subm_count],
    hoverinfo='y+name',
    showlegend=True,
    marker=dict(
        line=dict(
            width=1,
            color=DS['colorramp']['acc1'][-1]
        ),
        color=DS['colorramp']['acc1'][4],
        opacity=0.8,
    ),
    visible=True,
)

ds_bar = go.Bar(
    name='Data Science',
    x=[month for month in ds_monthly.index],
    y=[value for value in ds_monthly.subm_count],
    hoverinfo='y+name',
    showlegend=True,
    marker=dict(
        line=dict(
            width=1,
            color=DS['colorramp']['acc1'][-1]
        ),
        color=DS['colorramp']['acc1'][2],
        opacity=0.8,
    ),
    visible=True,
)

data = [da_bar, bi_bar, de_bar, ds_bar]
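
Because the four trace definitions above differ only in name, data, and fill color, the shared settings could be generated in a loop. The sketch below builds plain keyword-argument dictionaries so it runs without Plotly installed. The function name `bar_trace_kwargs`, the color values, and the sample counts are hypothetical stand-ins for `DS['colorramp']['acc1']` and the queried data.

```python
def bar_trace_kwargs(name, x, y, fill_color, line_color):
    """Return keyword arguments mirroring the go.Bar calls above."""
    return dict(
        name=name,
        x=list(x),
        y=list(y),
        hoverinfo='y+name',
        showlegend=True,
        marker=dict(
            line=dict(width=1, color=line_color),
            color=fill_color,
            opacity=0.8,
        ),
        visible=True,
    )


# Hypothetical colors and counts, used only for illustration.
line_color = '#1b2a3a'
months = ['2018-01-01', '2018-02-01']
profiles = [
    ('Data Analysis', [12, 14], '#08519c'),
    ('Business Intelligence', [9, 11], '#3182bd'),
    ('Data Engineering', [18, 19], '#6baed6'),
    ('Data Science', [6, 8], '#bdd7e7'),
]

trace_specs = [
    bar_trace_kwargs(name, months, counts, fill, line_color)
    for name, counts, fill in profiles
]
print([spec['name'] for spec in trace_specs])
```

Each dictionary can then be unpacked into a trace with `go.Bar(**spec)`, which would give a `data` list equivalent in structure to the one built by hand.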

The layout sets the chart title, fonts, background colors, and axis formatting. Setting `barmode='stack'` stacks the four traces on top of each other for each month, and `fixedrange=True` on both axes disables zooming.

Preparing the figure layout:

In [5]:
layout = go.Layout(
    paper_bgcolor=DS['colors']['bg1'],
    plot_bgcolor=DS['colors']['bg1'],
    title='Monthly Data Jobs by Job Profile',
    titlefont=DS['chart_fonts']['title'],
    font=DS['chart_fonts']['text'],
    autosize=True,
    showlegend=True,
    hidesources=True,
    legend=dict(orientation='h'),
    modebar=dict(orientation='h'),
    xaxis=dict(
        type='date',
        fixedrange=True,
        hoverformat='%B %Y',
        ticks='outside',
        tickmode='auto',
        zerolinecolor=DS['colors']['fg2'],
    ),
    yaxis=dict(
        title='Number of submissions',
        fixedrange=True,
        ticks='outside',
        tickwidth=1,
        gridcolor=DS['colors']['bg3'],
    ),
    barmode='stack'
)
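
The x-axis `hoverformat` takes d3-style date directives. For `%W`, `%B`, and `%Y` these behave like Python's `strftime`, so a quick sketch can preview how a week-based and a month-based format would label a monthly timestamp. The date below is taken from the sample output earlier in the notebook; the variable names are illustrative, and the month name assumes the default English locale.

```python
from datetime import datetime

# First day of a month, as stored in the month_ts index.
month_start = datetime(2018, 2, 1)

# Week-based label: %W is the Monday-based week number of the year.
weekly_label = month_start.strftime('Week %W, %Y')

# Month-based label: %B is the full month name.
monthly_label = month_start.strftime('%B %Y')

print(weekly_label)   # Week 05, 2018
print(monthly_label)  # February 2018
```

For data aggregated by month, the month-based label describes the bar more accurately than a week number.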

In [6]:
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig, filename='data_offer_profiles_stacked_bar.html')

In [7]:
# Export a standalone HTML version of the chart.
plotly.offline.plot(fig, filename='data_offer_profiles_stacked_bar.html', show_link=False)

'data_offer_profiles_stacked_bar.html'

In [8]:
from IPython.display import HTML
with open('../resources/styles/datum.css', 'r') as f:
    style = f.read()
HTML(style)