# Heatmaps
<img src="img/heatmap-logo.png" style="float: right; padding-left: 1em;"></img>

A heatmap is a graphical representation of data in which the individual values of a matrix are shown as colors. Be sure to normalize your data and to choose a suitable color palette.

Heatmaps are good for showing variance across multiple variables, revealing patterns, showing whether variables are similar to each other, and detecting correlations between them.
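
As a minimal sketch of the normalization step, assuming min-max scaling per column suits the data (other schemes, such as the per-team percentages used later in this notebook, may fit better), each column can be rescaled to the range 0 to 1 with pandas. The `scores` frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical measurements on very different scales
scores = pd.DataFrame(
    {'latency_ms': [120, 80, 200], 'errors': [3, 0, 6]},
    index=['svc-a', 'svc-b', 'svc-c'],
)

# Min-max scaling per column: the smallest value maps to 0, the largest to 1
normalized = (scores - scores.min()) / (scores.max() - scores.min())
print(normalized)
```

Without this step, the column with the largest raw values would dominate the color scale.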

See [10 Heatmaps 10 Python Libraries](https://blog.algorexhealth.com/2017/09/10-heatmaps-10-python-libraries/) for a complete tour through many frameworks.

In [None]:
import chromedriver_binary
import numpy as np
import pandas as pd
import altair as alt

# Enable Altair for notebooks (not needed for JupyterLab)
_ = alt.renderers.enable('notebook')

## ‘Deployed Versions’ Heatmap with Altair

This creates a heatmap of the package versions that different teams have deployed, normalized to 100% per team. It shows the spectrum of versions a team has to maintain, and how much of its inventory is on the ‘newer’ side.

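First, we read the list of installed packages into `raw_data`. From that we derive `data`, which keeps only the part of each installed version before the first `-` and drops the columns we do not need.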

In [None]:
from dfply import *

raw_data = pd.read_csv("../data/cmdb-packages.csv", sep=',')
print('♯ of Records: {}\n'.format(len(raw_data)))

data = (raw_data
    >> mutate(Version=X['Installed version'].str.split('-', n=1, expand=True)[0])
    >> drop(X.CMDB_Id, X['Last seen'], X['Last modified'], X['Installed version'])
)
print(data.head(6).transpose())
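
To see what the `Version` expression does in isolation, here is a small sketch using plain pandas without `dfply`. The sample strings are made up and assume a `<version>-<release>` layout; the real `Installed version` column may look different.

```python
import pandas as pd

# Made-up sample values in the style of "<version>-<release>"
installed = pd.Series(['1.4.2-3.el7', '2.0.0-1', '0.9'])

# Split on the first '-' only and keep the part before it
version = installed.str.split('-', n=1, expand=True)[0]
print(version.tolist())
```

Values without a `-` pass through unchanged, because column `0` of the expanded result always holds the text before the first separator.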

The `tvc` dataframe holds counts per team and version, including a normalized `Percent` column.

In [None]:
tvc = data.groupby(['Team', 'Version']).count()
tvc = tvc.index.to_frame().assign(Counts=tvc.iloc[:, 0])
tvc = tvc.reset_index(drop=True)

# Total number of installs per team, aligned row by row with `tvc`
team_totals = tvc.groupby('Team')['Counts'].transform('sum')
# Share of each version within its team, rounded to whole percent
tvc['Percent'] = (0.5 + 100 * tvc['Counts'] / team_totals).astype(int)

print(tvc.head(6).transpose())
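
The per-team normalization can be tried on a toy frame. This sketch uses `groupby(...).transform('sum')`, which is one possible way to get the team totals; the teams and counts are made up.

```python
import pandas as pd

# Made-up counts per team and version
counts = pd.DataFrame({
    'Team': ['red', 'red', 'blue', 'blue', 'blue'],
    'Version': ['1.0', '1.1', '1.0', '1.1', '2.0'],
    'Counts': [1, 3, 2, 2, 4],
})

# Team total repeated on every row of that team
team_totals = counts.groupby('Team')['Counts'].transform('sum')
# Round to whole percent by adding 0.5 and truncating
counts['Percent'] = (0.5 + 100 * counts['Counts'] / team_totals).astype(int)
print(counts)
```

Because each value is rounded separately, the percentages of a team do not always add up to exactly 100.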

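The chart code that follows is built from these parts:

* Version order: `versions_ordered` sorts the version strings numerically, component by component, newest first, and is passed as the `sort` of the X axis.
* Color palette: a gray ramp with one color per percent value, from light gray (`#c8c8c8`) at 0% to black (`#000000`) at 100%.
* Text and rect marks: `mark_rect` draws the colored cells and `mark_text` prints the percentage on top of them; the `+` operator layers the two charts.
* `rangeStep` sets the step size of the X scale in pixels, which controls the width of the chart.
* `properties()` applies global chart settings, here the background color and the title.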
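
The version order matters because plain string sorting puts `1.10` before `1.9`. A minimal sketch of numeric ordering, assuming every component is a plain integer (a value like `1.0rc1` would raise a `ValueError`); `version_key` is a hypothetical helper name:

```python
import re

def version_key(version, _sep=re.compile('[-.]')):
    """Turn '1.10.2' into (1, 10, 2) so that components compare as numbers."""
    return tuple(int(part) for part in _sep.split(version))

versions = ['1.9', '1.10', '1.2.1', '2.0']
by_text = sorted(versions, reverse=True)
by_number = sorted(versions, key=version_key, reverse=True)
print(by_text)    # string comparison, character by character
print(by_number)  # numeric comparison, component by component
```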

In [None]:
import re

# Sort versions numerically by component (so 1.10 ranks above 1.9), newest first
versions_ordered = sorted(set(tvc['Version']),
                          key=lambda x, _r=re.compile('[-.]'): tuple(map(int, _r.split(x))),
                          reverse=True)

base = alt.Chart(tvc)
rect = base.mark_rect(size=30).encode(
    alt.X('Version:O', sort=versions_ordered),
    y='Team:O',
    color=alt.Color('Percent:Q', scale=alt.Scale(
        domain=list(range(101)),
        range=['#' + ('%02x' % ((100 - x) * 200 // 100) * 3) for x in range(101)],
    ))
)
text = base.mark_text(baseline='middle', align='center', color='#fff', size=10, fontWeight=600).encode(
    alt.X('Version:O', scale=alt.Scale(rangeStep=25)),
    y='Team:O',
    text='Percent:Q'
)

chart = rect + text
chart = chart.properties(background='#f0f0f0', 
                         title='Version Distribution by Team')
#render_chart(chart, 'installs_by_team_heatmap', scale_factor=0 or 1350 / 520)
chart
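
The `range` list passed to `alt.Scale` in the chart code is compact, so it can help to look at the colors it produces. The sketch below rebuilds the same gray ramp step by step; `gray_ramp` is a hypothetical helper name, not part of Altair.

```python
def gray_ramp(lightest=200):
    """Return 101 hex colors, from light gray at 0% to black at 100%."""
    colors = []
    for percent in range(101):
        level = (100 - percent) * lightest // 100  # 200 at 0%, 0 at 100%
        colors.append('#' + ('%02x' % level) * 3)  # same level for R, G and B
    return colors

palette = gray_ramp()
print(palette[0], palette[50], palette[100])
```

Capping the lightest level at 200 keeps the cells for small percentages darker than the `#f0f0f0` chart background, and leaves the white text readable on all but the lightest cells.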