## Data Offers Heatmap

In [2]:
import itertools
import datetime as dt
import psycopg2
import pandas as pd
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode(connected=True) # run at the start of every ipython notebook to use plotly.offline

%matplotlib notebook
%matplotlib inline

### Initial dataset preparation

Load the daily offer counts from the database into a `pandas.DataFrame` indexed by submission date.

In [3]:
conn = psycopg2.connect('dbname=jobsbg')
daily_df = pd.read_sql_query('SELECT * FROM v_data_offers_count_daily', conn, index_col='subm_date')
conn.close()

daily_df.index = pd.to_datetime(daily_df.index)
daily_df.head(5)

subm_date,subm_count
2017-09-27,4
2017-09-28,4
2017-09-29,0
2017-09-30,0
2017-10-01,0


### Shape the dataframe

I want the heatmap to show Monday, the most active day of the week, in the bottom row, with the remaining days stacked above it. `Plotly` therefore needs a list of lists, where each inner list holds the values of one weekday (e.g. every Monday) across the whole time period. To get there, I first fill in the missing dates so that the data covers whole weeks, from Monday to Sunday. The resulting rectangular table can then be reshaped.

_See also: https://stackoverflow.com/a/45850005_

In [4]:
min_ts = min(daily_df.index)
max_ts = max(daily_df.index)
idx = pd.date_range(
    min_ts - dt.timedelta(days=min_ts.weekday()),
    max_ts + dt.timedelta(days=6-max_ts.weekday()))

In [5]:
daily_df = daily_df.reindex(idx)
daily_df.head()

,subm_count
2017-09-25,
2017-09-26,
2017-09-27,4.0
2017-09-28,4.0
2017-09-29,0.0
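
The padding arithmetic can be checked in isolation. The sketch below uses only the standard library; `full_week_bounds` is a hypothetical helper name, and the end date is made up for illustration. It extends a date range back to the preceding Monday and forward to the following Sunday, so the number of days is always a multiple of 7.

```python
import datetime as dt

def full_week_bounds(first, last):
    """Return (Monday on or before `first`, Sunday on or after `last`)."""
    # weekday() is 0 for Monday and 6 for Sunday
    start = first - dt.timedelta(days=first.weekday())
    end = last + dt.timedelta(days=6 - last.weekday())
    return start, end

# 2017-09-27 is the first date in the dataset; the end date is hypothetical
start, end = full_week_bounds(dt.date(2017, 9, 27), dt.date(2017, 10, 4))
n_days = (end - start).days + 1
print(start, end, n_days)
```

Because the padded range always spans whole weeks, the later reshape into rows of 7 cannot fail on a ragged final week.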


Now the dataframe can be reshaped so that each row is one week and each column is one day of the week, and then transposed so that each row holds a single weekday. That matches the heatmap structure exactly and makes composing the chart itself very easy.

In [6]:
matrix_df = pd.DataFrame(daily_df.values.reshape(len(daily_df)//7, 7), columns=daily_df.index[:7].strftime('%A'))
matrix_df = matrix_df.T
matrix_df.head(7)

,0,1,2,3,4,5,6,7,8,9,...,56,57,58,59,60,61,62,63,64,65
Monday,,3.0,4.0,4.0,1.0,7.0,4.0,4.0,3.0,5.0,...,9.0,10.0,8.0,12.0,11.0,17.0,4.0,9.0,7.0,0.0
Tuesday,,10.0,2.0,4.0,8.0,1.0,5.0,4.0,5.0,5.0,...,6.0,7.0,7.0,5.0,1.0,6.0,9.0,5.0,5.0,0.0
Wednesday,4.0,2.0,4.0,5.0,5.0,1.0,3.0,4.0,2.0,5.0,...,7.0,7.0,9.0,3.0,4.0,3.0,5.0,4.0,4.0,
Thursday,4.0,5.0,2.0,3.0,4.0,2.0,3.0,2.0,2.0,1.0,...,8.0,5.0,2.0,2.0,6.0,9.0,2.0,2.0,11.0,
Friday,0.0,4.0,1.0,6.0,4.0,3.0,3.0,4.0,8.0,5.0,...,9.0,3.0,7.0,9.0,6.0,3.0,3.0,7.0,7.0,
Saturday,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
Sunday,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
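
A quick way to convince yourself that the reshape lines up with the calendar is to reshape the weekday numbers the same way and check that every row contains a single weekday. This is a minimal sketch on toy data: the counts and the names `toy`, `matrix`, and `weekday_grid` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: three full weeks starting on a Monday, with made-up counts 0..20
idx = pd.date_range('2017-09-25', periods=21)
toy = pd.DataFrame({'subm_count': np.arange(21, dtype=float)}, index=idx)

# Same reshape as above: one row per week, then transpose to one row per weekday
matrix = pd.DataFrame(
    toy['subm_count'].to_numpy().reshape(-1, 7),
    columns=idx[:7].strftime('%A'),
).T

# Reshape the weekday numbers identically; each row should hold one value only
weekday_grid = idx.weekday.to_numpy().reshape(-1, 7).T
rows_are_uniform = all(len(set(row)) == 1 for row in weekday_grid)
print(matrix.shape, rows_are_uniform)
```

The check only passes when the index starts on a Monday and has no gaps, which is what the padding and reindexing steps are for.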


### Build the heatmap

The heatmap is composed of three dimensions: x (the columns), y (the rows), and z (the matrix of values that determines each cell's color). Let's compose z first:

In [7]:
# One inner list per weekday, in the order the rows should appear on the y axis
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
vals = matrix_df.loc[weekdays].values.tolist()

A manually defined color scale that matches the presentation style:

In [8]:
colorscale =[
    [0.0, '#faf0dc'], 
    [0.1111111111111111, '#f2e6ce'], 
    [0.2222222222222222, '#ecd5bb'], 
    [0.3333333333333333, '#e6c4a9'], 
    [0.4444444444444444, '#e1b396'], 
    [0.5555555555555556, '#dba284'], 
    [0.6666666666666666, '#d59171'],
    [0.7777777777777778, '#d0805f'], 
    [0.8888888888888888, '#ca6f4c'], 
    [1.0, '#c45e3a']
]
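
The stop positions above are evenly spaced between 0 and 1, so they can also be generated instead of typed by hand. A minimal sketch, assuming the same ten colors; `colors` and `stops` are illustrative names, not part of the notebook:

```python
# The ten hex colors from the manual color scale, lightest to darkest
colors = [
    '#faf0dc', '#f2e6ce', '#ecd5bb', '#e6c4a9', '#e1b396',
    '#dba284', '#d59171', '#d0805f', '#ca6f4c', '#c45e3a',
]

# Spread the colors evenly over [0, 1]: stop i sits at i / (n - 1)
stops = [[i / (len(colors) - 1), color] for i, color in enumerate(colors)]

for position, color in stops:
    print(f'{position:.4f} {color}')
```

Generating the positions keeps the scale valid if a color is later added or removed, since the first stop stays at 0 and the last at 1.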

Finally, the date labels for the hover text and the heatmap trace definition:

In [9]:
z_dim_labels = [d.strftime('%d.%m.%Y') for d in daily_df.index]
z_dim_labels[:5]

['25.09.2017', '26.09.2017', '27.09.2017', '28.09.2017', '29.09.2017']

In [10]:
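hm = go.Heatmap(
    # z has one column per week, so x holds the Monday that starts each week
    x = list(daily_df.index[::7]),
    y = [d for d in matrix_df.index],
    z = vals,
    hoverinfo = 'all',
    # text must have the same shape as z: row i holds the dates of weekday i
    text = [z_dim_labels[i::7] for i in range(7)],
    colorscale = colorscale,
    xgap = 0,
    ygap = 0,
)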
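
A heatmap grid lines up only when z has one row per y label and one value per x entry in each row. The helper below is a hypothetical sanity check, not part of Plotly, that can be run on the three inputs before building the trace. It is sketched here on a toy grid with made-up values.

```python
def check_heatmap_dims(x, y, z):
    """Raise ValueError unless z has len(y) rows of len(x) values each."""
    if len(z) != len(y):
        raise ValueError(f'z has {len(z)} rows but y has {len(y)} labels')
    for label, row in zip(y, z):
        if len(row) != len(x):
            raise ValueError(
                f'row {label!r} has {len(row)} values but x has {len(x)} entries')
    return len(y), len(x)

# Toy grid: 2 weekdays x 3 weeks (hypothetical values)
shape = check_heatmap_dims(
    x=['2017-09-25', '2017-10-02', '2017-10-09'],
    y=['Monday', 'Tuesday'],
    z=[[1, 2, 3], [4, 5, 6]],
)
print(shape)
```

Any 2D hover `text` can be validated the same way by passing it in place of `z`.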

In [11]:
data = [hm]

Style the layout, and the heatmap is ready.

In [12]:
layout = go.Layout(
    paper_bgcolor = '#FFFAF0',            
    plot_bgcolor = '#FFFAF0',
    title = 'Data Job Offers Heatmap',
    titlefont = dict(
        size = 22,
        color = '#594B42',
        family = 'Fira Sans Extra Condensed',
    ),
    font = dict(
        size = 12,
        color = '#594B42',
        family = 'Fira Sans Extra Condensed',
    ),
    xaxis = dict(
        ticks = '',
    ),
    yaxis = dict(
        title = 'Day of week',
        titlefont = dict(
            size = 14,
            color = '#594B42',
            family = 'Fira Sans Extra Condensed',
        ),
    ),
)

In [15]:
fig = go.Figure(data=data, layout=layout)

plotly.offline.iplot(fig, filename = 'data_jobs_subm_heatmap.html')

In [16]:
# Uncomment the line below to export an HTML version of the chart.
# plotly.offline.plot(fig, filename = 'data_offers_subm_heatmap.html')

'file:///data/WORKSPACE/jpynb_Employment_Trends_Bulgaria/workbooks/data_offers_subm_heatmap.html'

In [1]:
from IPython.core.display import HTML
with open('../resources/styles/datum.css', 'r') as f:
    style = f.read()
HTML(style)