Recently, I needed to make a heatmap with a dendrogram for work. The only libraries I could find with that particular template were [seaborn](https://stanford.edu/~mwaskom/software/seaborn/) and [plotly](https://plot.ly/). However, I really like plotting with [bokeh](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=bokeh%20pyton), and after stumbling upon [this](http://stackoverflow.com/questions/23578753/how-to-make-a-cluster-style-dendrogram-in-bokeh) StackOverflow question, it seemed that no bokeh code for one was available. The more I program, the more I find myself preferring to code graphs myself, so I decided to forgo a high-level template and make my own. Here is a heatmap clustering the MLB teams by batting statistics.
<!-- TEASER_END -->
First import the standards...

In [1]:
import pandas as pd
import numpy as np

Load the team colors and the batting data, both downloaded from [Baseball Reference](http://www.baseball-reference.com/).

In [2]:
colors_df = pd.read_csv('files/colors.csv', index_col=0)

df = pd.read_csv('files/2016_teams_standard_batting_8_31_16.csv', index_col=0)

colors_df = colors_df.loc[df.index]

Scale each batting stat to [0,1] so I can use color intensity to compare teams.  Do a little housekeeping on the colors data as well.  

In [3]:
df_std = (df - df.min(axis=0)) / (df.max(axis=0) - df.min(axis=0))
df_scaled = df_std * (1.0 - 0.0) + 0.0

names = colors_df['Teams']
colors_df.drop(['Teams', 'Abbrv.1'], axis=1, inplace=True)
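
To make the scaling step concrete, here is a minimal sketch of the same min-max formula on a made-up table. The team labels and numbers are invented for illustration and do not come from the batting data.

```python
import pandas as pd

# Toy stand-in for the batting table: rows are teams, columns are stats.
# All labels and values are invented for illustration.
toy = pd.DataFrame(
    {"HR": [100, 150, 200], "SB": [30, 90, 60]},
    index=["AAA", "BBB", "CCC"],
)

# Same formula as above: subtract each column's minimum, divide by its range.
toy_scaled = (toy - toy.min(axis=0)) / (toy.max(axis=0) - toy.min(axis=0))

print(toy_scaled)
```

Each column now runs from 0 to 1 independently of the others, so the lowest team in a stat maps to 0 and the highest to 1. One caveat: a column where every team has the same value has a range of zero, and this formula would return NaN for it.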

Write a function that will generate a random color from a team's color palette.  

In [4]:
from random import randint

def color_gen(tm):
    """Generate a random color from a given team's palette that isn't white."""
    # Drop the empty palette slots first so the random index always hits a color.
    palette = colors_df.loc[tm].dropna()
    color = '#FFFFFF'
    while color == '#FFFFFF':
        color = palette.iloc[randint(0, len(palette) - 1)]
    return color

[Partial](https://docs.python.org/2/library/functools.html) is a handy way to "preload" a function with some of its arguments. Here I use `partial` to create a dictionary where each key is a team and each value is a `color_gen` function with that particular team already passed as an argument.

In [5]:
from functools import partial
color_map = {tm: partial(color_gen, tm) for tm in colors_df.index}
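
As a standalone illustration of the same pattern, here is a small sketch that uses only the standard library. The palettes and the `pick_color` helper are hypothetical stand-ins for `colors_df` and `color_gen`.

```python
from functools import partial
from random import choice

# Hypothetical palettes, invented for this example.
palettes = {
    "AAA": ["#0C2340", "#C8102E"],
    "BBB": ["#005A9C"],
}

def pick_color(team):
    """Return a random color from the given team's palette."""
    return choice(palettes[team])

# Each value is a zero-argument callable with its team already bound.
pickers = {team: partial(pick_color, team) for team in palettes}

print(pickers["BBB"]())  # BBB has a single color, so this is always "#005A9C"
print(pickers["AAA"]())  # one of AAA's two colors, chosen at random
```

Calling `pickers["AAA"]()` repeatedly can return a different color each time, which is what gives every heatmap cell its own shade from the team's palette.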

I create the dendrogram using [SciPy's dendrogram](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.dendrogram.html) function. The function takes a linkage matrix `Z` and returns a dictionary. Two of its entries, `icoord` and `dcoord`, hold the coordinates of the line segments that make up the dendrogram. A third entry, `ivl`, is a list of labels that maps the original order of the teams to their order along the leaves of the dendrogram.

In [6]:
from sklearn.metrics.pairwise import pairwise_distances
from scipy.cluster.hierarchy import linkage, dendrogram

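# Pairwise Euclidean distances between teams (a square teams-by-teams matrix).
X = pairwise_distances(df.values, metric='euclidean')
# Note: linkage() reads a 2-D array as observation vectors, so each row of X
# (a team's distances to every other team) is clustered as a feature vector.
Z = linkage(X, 'ward')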
results = dendrogram(Z, no_plot=True)
icoord, dcoord = results['icoord'], results['dcoord']
labels = list(map(int, results['ivl']))
df = df.iloc[labels]
df_scaled = df_scaled.iloc[labels]
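
For readers who have not used these SciPy functions, here is a minimal sketch on four made-up points. It is not the clustering from this post. It only shows what `linkage` accepts and what `dendrogram` returns when `no_plot=True`. The `obs` array is invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# Four made-up observations: two near the origin, two near (10, 10).
obs = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 12.0]])

# linkage accepts either raw observations (2-D) or a condensed,
# one-dimensional distance vector such as the output of pdist.
Z_obs = linkage(obs, 'ward')
Z_condensed = linkage(pdist(obs, metric='euclidean'), 'ward')
print(np.allclose(Z_obs, Z_condensed))

res = dendrogram(Z_obs, no_plot=True)
print(sorted(res.keys()))
print(res['ivl'])     # leaf labels as strings, in dendrogram order
print(res['leaves'])  # the same order as integer row positions
```

The `leaves` entry holds the same ordering as `ivl` but already as integers, so it could be used to reorder rows without the `map(int, ...)` conversion.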

I reshape the data a bit to make it work with bokeh's tooltips, a hover tool that allows you to see the data associated with a certain object on the plot.  

In [7]:
tms = []
batting = []
xs = []
ys = []
colors = []
alpha = []
value = []

# Build one entry per heatmap cell: every team (row) paired with every stat (column).
n_stats = len(df.columns)
for i, tm in enumerate(df.index):
    tms += [tm] * n_stats
    batting += df.columns.tolist()
    xs += list(np.arange(0.5, n_stats + 0.5))   # cell centers along x
    ys += [i + 0.5] * n_stats                   # cell center along y for this team
    colors += [color_map[tm]() for _ in range(n_stats)]
    value += df.loc[tm].tolist()
    alpha += df_scaled.loc[tm].tolist()
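
The loop above turns the team-by-stat table into a "long" layout with one entry per heatmap cell. As an alternative sketch (not the code used for the plot), the same layout can be built without a loop using `np.repeat` and `np.tile`. The toy table below is invented, and the random colors are left out for brevity.

```python
import numpy as np
import pandas as pd

# Toy 2-team x 3-stat table with invented values.
toy = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["AAA", "BBB"],
    columns=["H", "HR", "SB"],
)
n_rows, n_cols = toy.shape

cells = pd.DataFrame({
    # Repeat each team once per stat: AAA AAA AAA BBB BBB BBB
    "tms": np.repeat(toy.index.to_numpy(), n_cols),
    # Cycle through the stats once per team: H HR SB H HR SB
    "batting": np.tile(toy.columns.to_numpy(), n_rows),
    # Cell centers: x follows the stat, y follows the team
    "xs": np.tile(np.arange(n_cols) + 0.5, n_rows),
    "ys": np.repeat(np.arange(n_rows) + 0.5, n_cols),
    # Row-major flattening matches the team-by-team order above
    "value": toy.to_numpy().ravel(),
})
print(cells)
```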

The rest is just plotting.

In [8]:
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool

data = pd.DataFrame(dict(
        tms=tms,
        batting=batting,
        xs=xs,
        ys=ys,
        colors=colors,
        value=value,
        alpha=alpha
              ))

source = ColumnDataSource(data)

hover = HoverTool()

hover.tooltips = [
    ("Team", "@tms"),
    ("Batting Stat", "@batting: @value"),
    ("Value", "@value")
]

height, width = df.shape

icoord = pd.DataFrame(icoord)
icoord = icoord * (data['ys'].max() / icoord.max().max())
icoord = icoord.values

dcoord = pd.DataFrame(dcoord)
dcoord = dcoord * (data['xs'].max() / dcoord.max().max())
dcoord = dcoord.values

hm = figure(x_range=[-40, 40],
            height=600,
            width=600,
            tools=[]
)

for i, d in zip(icoord, dcoord):
    # Negate the heights so the tree grows leftward from the heatmap's edge at x = 0.
    d = [-x for x in d]
    hm.line(x=d, y=i, line_color='black')

hm.add_tools(hover)

hm.rect(x='xs', y='ys',
        height=1,
        width=1,
        fill_color='colors',
        line_color='black',
        source=source,
        line_alpha=0.2,
        fill_alpha='alpha'        
        )

hm.text([data['xs'].max()+0.51] * len(data['tms'].unique()), 
        data['ys'].unique().tolist(), 
        text=[nm for nm in names.iloc[labels]],
        text_baseline='middle',
        text_font_size='6pt'     
       )

hm.axis.major_tick_line_color = None
hm.axis.minor_tick_line_color = None
hm.axis.major_label_text_color = None
hm.axis.major_label_text_font_size = '0pt'
hm.axis.axis_line_color = None
hm.grid.grid_line_color = None
hm.outline_line_color = None
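
The rescaling of `icoord` in the cell above is what lines the dendrogram's leaves up with the heatmap rows. Here is a minimal sketch of why it works, on made-up data. SciPy places the leaves at positions 5, 15, 25, and so on, so multiplying by the largest row center over the largest `icoord` value lands each leaf on a row center. The `obs` array is invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Five made-up, well-separated 1-D observations.
obs = np.array([[0.0], [1.0], [5.0], [7.0], [20.0]])
res = dendrogram(linkage(obs, 'ward'), no_plot=True)

icoord = np.array(res['icoord'])
dcoord = np.array(res['dcoord'])

# Segment endpoints at height 0 are the leaves.
leaf_pos = np.unique(icoord[dcoord == 0])
print(leaf_pos)

n = len(res['leaves'])
y_max = (n - 1) + 0.5          # largest row center, as in data['ys'].max()
scale = y_max / icoord.max()   # same ratio used to rescale icoord above
print(scale)
print(leaf_pos * scale)        # leaf positions after rescaling
```

With five leaves the scale factor works out to roughly 0.1, and the rescaled leaves sit at about 0.5, 1.5, 2.5, 3.5, and 4.5, which are the row centers used for `ys`.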

In [9]:
from bokeh.plotting import output_notebook, show
output_notebook()

In [10]:
show(hm)

In the heatmap, MLB leaders have higher color intensities.  It looks like there are two distinct clusters with the bottom half of the heatmap having much more faded or white color spots.  Not surprisingly, a majority of the division leaders (sans the Dodgers) are in the top half of the map.

All the code and data are available on my GitHub page: https://github.com/russodanielp/bokeh_plots.