# Interactive visualization
## Data Loading
### Statistical GIS Boundary Files dataset
In this section, we use ```geopandas``` to load the [Statistical GIS Boundary Files for London](https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london). Although this dataset was made available in 2011, the boundaries have barely changed between then and 2015 (the date of the Tesco dataset), so this should not affect our visualisation.

For each aggregation level (```LSOA```, ```MSOA```, ```WARD```, ```BOROUGH```), we read the corresponding file and retrieve the following subset of columns:

* ```area_id ```: the id of the area considered
* ```name    ```: the name of the area considered
* ```geometry```: the geometric shape of the area on the 2D map of London (a polygon, or a multipolygon if the area is defined in multiple pieces)

In [5]:
import pandas as pd
import geopandas as gpd
import numpy as np
#data path
data_path = 'data/statistical-gis-boundaries-london/ESRI/'
#read geopandas df and selected subset of columns
gdf_lsoa = gpd.read_file(data_path + 'LSOA_2011_London_gen_MHW.shp' )[['LSOA11CD','LSOA11NM','geometry']]\
                .rename(columns={'LSOA11CD':'area_id','LSOA11NM':'name'})
gdf_msoa = gpd.read_file(data_path + 'MSOA_2011_London_gen_MHW.shp' )[['MSOA11CD','MSOA11NM','geometry']]\
                .rename(columns={'MSOA11CD':'area_id','MSOA11NM':'name'})
gdf_ward = gpd.read_file(data_path + 'London_Ward_CityMerged.shp')[['GSS_CODE','NAME','geometry']]\
               .rename(columns={'GSS_CODE':'area_id','NAME':'name'})
gdf_borough = gpd.read_file(data_path + 'London_Borough_Excluding_MHW.shp' )[['GSS_CODE','NAME','geometry']]\
               .rename(columns={'GSS_CODE':'area_id','NAME':'name'})
#store them into a dictionary
gdf = {'lsoa':gdf_lsoa,'msoa':gdf_msoa,'osward':gdf_ward,'borough':gdf_borough}
gdf_lsoa.head()

Unnamed: 0,area_id,name,geometry
0,E01000001,City of London 001A,"POLYGON ((532105.092 182011.230, 532162.491 18..."
1,E01000002,City of London 001B,"POLYGON ((532746.813 181786.891, 532671.688 18..."
2,E01000003,City of London 001C,"POLYGON ((532135.145 182198.119, 532158.250 18..."
3,E01000005,City of London 001E,"POLYGON ((533807.946 180767.770, 533649.063 18..."
4,E01000006,Barking and Dagenham 016A,"POLYGON ((545122.049 184314.931, 545271.917 18..."


### Tesco data set

Next, we read the Tesco dataset (the output of ```cluster.ipynb```). It is a subset of the original Tesco dataset and contains some of the typical product features:

> ```fat,saturate,sugar,protein,carb,fibre,energy_tot,h_nutrients_calories```

Moreover, it stores these features for each combination of

 * ```aggregation level``` : lsoa, msoa, ward, borough 
 * ```period```            : January, February,..., December as well as the yearly aggregation (Year)

The following cells build an efficient indexing data structure that makes it easy to retrieve the typical product for a given aggregation level and period, which speeds up the refresh time of the visualisation. We use a dictionary of dictionaries for this purpose: the outer key is the aggregation level and the inner key is the period.

In [6]:
tesco = pd.read_csv('data/tesco.csv')
tesco.head()

Unnamed: 0,area_id,fat,saturate,sugar,protein,carb,fibre,energy_tot,h_nutrients_calories,month,agg_level
0,E09000001,8.472985,3.361599,9.278065,5.253333,15.779639,1.61985,165.851751,1.618208,yea,borough
1,E09000002,9.209959,3.596834,10.793244,5.193872,19.784988,1.590335,187.17439,1.545272,yea,borough
2,E09000003,8.594464,3.407353,9.530548,5.129627,17.02595,1.638639,170.655504,1.581507,yea,borough
3,E09000004,9.11918,3.466346,10.941085,5.304496,19.997105,1.657118,187.754791,1.551703,yea,borough
4,E09000005,8.962466,3.559913,10.14861,5.132915,18.726476,1.585978,180.510586,1.555736,yea,borough


In [7]:
def create_datasource(tesco):
    periods          = list(tesco.month.unique())
    agg_levels       = list(tesco.agg_level.unique())
    feature_names    = [c for c in tesco.columns if c not in ['area_id','month','agg_level']]

    tesco_dict = dict()
    for level in agg_levels:
        inner = dict()
        for mo in periods:
            df_mo_level = tesco.query("agg_level == @level and month == @mo")
            inner[mo]   = gdf[level].merge(df_mo_level, on='area_id', how = 'left').fillna('No data')
        tesco_dict[level] = inner
    return tesco_dict,periods,agg_levels,feature_names

tesco_dict,periods,agg_levels,feature_names = create_datasource(tesco)
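
To illustrate the lookup pattern, here is a minimal plain-Python sketch of a two-level dictionary keyed by aggregation level and then by period. The rows, the `build_index` helper and the `jan` period code are invented for illustration; the notebook itself stores merged GeoDataFrames rather than lists of rows.

```python
from collections import defaultdict

# Hypothetical rows standing in for tesco.csv (ids and values are invented).
rows = [
    {"area_id": "A1", "agg_level": "borough", "month": "yea", "fat": 8.5},
    {"area_id": "A2", "agg_level": "borough", "month": "yea", "fat": 9.2},
    {"area_id": "A1", "agg_level": "borough", "month": "jan", "fat": 8.1},
    {"area_id": "B1", "agg_level": "msoa", "month": "yea", "fat": 8.9},
]

def build_index(rows):
    """Group rows as index[agg_level][month] -> list of rows."""
    index = defaultdict(lambda: defaultdict(list))
    for row in rows:
        index[row["agg_level"]][row["month"]].append(row)
    # Convert to plain dicts so that a missing key raises KeyError
    # instead of silently creating an empty entry.
    return {level: dict(by_month) for level, by_month in index.items()}

index = build_index(rows)
selection = index["borough"]["yea"]   # two dictionary lookups, no filtering
print([row["area_id"] for row in selection])
```

The grouping cost is paid once, up front; each later selection is then a pair of key lookups rather than a scan over the full table.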

## Convert data into JSON format for Bokeh

The Bokeh library that we use for the interactive visualisation requires the data to be encoded as JSON. Given an ```agg_level``` and a ```month```, the following function performs this encoding.

In [8]:
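import json

def json_data(agg_level,month,feature=None,data=tesco_dict):
    """
    Returns the GeoJSON string for the given aggregation level and month.
    `feature` is unused; it is accepted so the function can be called with **state.
    """
    mo = data[agg_level][month]              # merged GeoDataFrame for this level and period
    merged_json = json.loads(mo.to_json())   # GeoJSON string -> Python dict
    return json.dumps(merged_json)           # Python dict -> JSON string for Bokeh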
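
As a rough picture of what this string contains, the sketch below builds a tiny GeoJSON `FeatureCollection` by hand. The area ids, names, values and coordinates are invented; only the overall shape (features with `properties` and `geometry`) mirrors what a GeoDataFrame's `to_json()` produces. It also shows that parsing and re-serialising with the `json` module leaves the content unchanged, and that an area without Tesco data carries the string `'No data'` instead of a number.

```python
import json

# Hand-written stand-in for the GeoJSON produced from a merged GeoDataFrame.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"area_id": "X1", "name": "Area one", "fat": 8.47},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            # Missing values were replaced by the string 'No data' upstream.
            "properties": {"area_id": "X2", "name": "Area two", "fat": "No data"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]],
            },
        },
    ],
}

geojson_str = json.dumps(feature_collection)
round_trip = json.dumps(json.loads(geojson_str))   # parse, then serialise again

parsed = json.loads(round_trip)
names = [f["properties"]["name"] for f in parsed["features"]]
missing = [f["properties"]["area_id"] for f in parsed["features"]
           if f["properties"]["fat"] == "No data"]
print(names, missing)
```

Each key under `properties` (here `name` and `fat`) is what the hover tooltips and the fill color refer to by field name.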

# Visualisation creation

Since we want to create an interactive visualisation, we need to store the current state of the application. We chose to simply record it in a dict named ```state```, with the following keys:

1. ```agg_level```: the aggregation level 
2. ```feature  ```: the feature of the typical product (or clustering) to display
3. ```month    ```: the period: Jan, Feb, ..., Dec, or the yearly aggregation (Year)

## Color Bar helper functions
In this first section, we define several helper functions that are used by the visualisation in the last cells of this notebook. We focus on creating a ```color bar``` for the visualisation. It depends on the selected feature, which defines the range of the color bar.

In [13]:
def get_feature(state:dict,tesco_dict=tesco_dict):
    """
    Returns the pd.Series associated to state where the missing values are 
    filtered and the type is converted from string to double
    """
    #get the data from the aggregation
    data = tesco_dict[state['agg_level']][state['month']][state['feature']]
    data = data[data != 'No data']  # filter missing values
    data = data.astype(np.double)   # convert into double
    out  = data if len(data)>0 else None
    return out

def create_color_mapper(state:dict,n_default_colors=8,nan_color='#d9d9d9'):
    """
    Creates the color mapper used in the color bar for the given state
    """
    feature  = get_feature(state) # Select feature
    n_colors = n_default_colors
    palette  = brewer['YlGnBu'][n_colors]           # sequential palette, ordered from dark to light
    palette  = palette[::-1]                        # reverse: dark blue is for the highest values
    # clip the range to the 5%-95% quantiles so that extreme values do not stretch the scale
    low  = 0 if feature is None else feature.quantile(0.05)
    high = 1 if feature is None else feature.quantile(0.95)
    color_mapper = LinearColorMapper(               # linear color mapper with the right range
            palette = palette, nan_color = nan_color,   # nan color
            low = low, high = high)                     # right range
    return color_mapper

def create_color_bar(state:dict,plot,geosource):
    """
    Creates the color bar given the state and geosource and link it to the plot
    """
    color_mapper = create_color_mapper(state)       # use previous function to create the mapper
    color_bar = ColorBar(color_mapper=color_mapper, # set the mapper
        label_standoff=8,width = 500, height = 20,  # specify size 
        border_line_color=None,                     # style : no border lines
        location = (0,0), orientation ='horizontal')# horizontal bar 
    # link color bar and plot 
    plot.patches('xs','ys', source = geosource,     # link geosource
        fill_color = {'field' :state['feature'],    # color related to selected feature
                      'transform':color_mapper},    # use the defined colormapper
        line_color = 'black',                       # style : black borders
        line_width = 0.25, fill_alpha = 1)          # more styling
    plot.add_layout(color_bar, 'below')             # add color bar below plot
    return color_bar
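
To see why the range is taken from the 5% and 95% quantiles rather than from the minimum and maximum, here is a small plain-Python sketch. The `quantile` and `color_index` helpers and the data are made up for illustration; `color_index` only approximates a linear mapper by splitting the range into equal-width bins and clamping values outside it, so Bokeh's exact edge handling may differ.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def color_index(value, low, high, n_colors):
    """Index of the palette bin for value; values outside [low, high] are clamped."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    return int((value - low) / (high - low) * n_colors)

# Invented data: twenty ordinary values and one extreme outlier.
values = [float(v) for v in range(1, 21)] + [500.0]
n_colors = 8

# Range from min/max: the outlier stretches the scale.
naive = {color_index(v, min(values), max(values), n_colors) for v in values[:-1]}

# Range from the 5% and 95% quantiles: ordinary values spread over the palette.
low, high = quantile(values, 0.05), quantile(values, 0.95)
clipped = {color_index(v, low, high, n_colors) for v in values[:-1]}

print(sorted(naive), sorted(clipped))
```

With the min/max range every ordinary value falls into the same bin, whereas the quantile range uses the whole palette and simply saturates the outlier at the darkest color.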

## Handling events : user clicks and changes
The following function handles the interactivity in our plot. When the user selects a new ```month```, ```agg_level``` or ```feature```, we need to update:

1. The title of the plot.
2. The color mapper of the plot, since the range of values has changed. We chose to update it only when the feature or the aggregation level changes, so that the results remain comparable across months.
3. The hover tooltip, so that it shows the selected feature.

In [14]:
def update_plot(state_field:str, new_val:str,state:dict,plot,geosource,color_bar):
    """
    Handles interactivity in the visualization
    @param state_field : (str) element of the state that needs to be updated
    @param new_val     : (str) new value of state_field element 
    """
    state[state_field] = new_val  # state update
    # update title
    plot.title.text = "{feature} consumption for {agg_level} during the period {month}".format(**state)
    if state_field == 'feature' or state_field == 'agg_level':                    # update color mapper
        color_mapper = create_color_mapper(state)   # create new mapper
        color_bar.color_mapper = color_mapper       # set the new mapper in color bar
        plot.patches('xs','ys',                     # update patches
            source = geosource,
            fill_color = {'field' :state['feature'], 'transform' : color_mapper},
            line_color = 'black', line_width = 0.25, fill_alpha = 1)
    new_data = json_data(**state)                   # get the new data and convert to JSON
    geosource.geojson = new_data                    # set new data to trigger recoloring event
    # new hover for the new feature 
    hover = HoverTool(tooltips = [('Area name','@name'),(state['feature'], '@'+state['feature'])])
    # add hover tools to plot
    plot.tools = [hover]
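
The decision logic of this handler can be sketched without Bokeh. The `apply_change` helper, the `REMAP_FIELDS` set and the `jan` period code below are hypothetical; the sketch only shows how a single state change translates into a new title and into the decision whether to rebuild the color mapper.

```python
TITLE = "{feature} consumption for {agg_level} during the period {month}"

# Fields whose change alters the value range, and therefore the color mapper.
REMAP_FIELDS = {"feature", "agg_level"}

def apply_change(state, field, value):
    """Update state in place and report what the view has to refresh."""
    if field not in state:
        raise KeyError(f"unknown state field: {field}")
    state[field] = value
    return {
        "title": TITLE.format(**state),
        "rebuild_color_mapper": field in REMAP_FIELDS,
    }

state = {"month": "yea", "agg_level": "borough", "feature": "fat"}

month_change = apply_change(state, "month", "jan")        # keeps the color range
feature_change = apply_change(state, "feature", "sugar")  # needs a new color range

print(month_change)
print(feature_change)
```

Keeping the color range fixed on a month change is what makes two months visually comparable: the same color always stands for the same value.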

### Creation of the actual figure and components
The actual components are defined here. We create a ```bkapp``` (Bokeh application) function that, given the ```doc``` (Bokeh document), creates the following components:

1. ```plot          ```: map of colors for the selected areas
2. ```btn_period    ```: button to select the period
3. ```btn_agg_level ```: button to select the aggregation level
4. ```select_feature```: drop-down to select the feature of the typical product to display (or clustering)

It then adds the respective event handlers and links the components to the ```doc```.

In [23]:
# only the names used in this notebook are imported
from bokeh.io import output_file, output_notebook, save, show
from bokeh.layouts import column
from bokeh.models import ColorBar, GeoJSONDataSource, HoverTool, LinearColorMapper
from bokeh.models.widgets import RadioButtonGroup, Select
from bokeh.palettes import brewer
from bokeh.plotting import figure

# output notebook to allow inline jupyter interaction
output_notebook()

def bkapp(doc):
    # default state when the component is loaded for the first time
    state = {'month':'yea','agg_level':'borough','feature':'fat'}
    geosource = GeoJSONDataSource(geojson = json_data(**state))
    # Hover that will show the current feature value for the hovered area + its name
    hover = HoverTool(tooltips = [ ('Area name','@name'),(state['feature'], '@'+state['feature'])])
    # Create figure for the plot
    plot = figure(title = "{feature} consumption for {agg_level} during the period {month}".format(**state), 
            plot_height = 800 , plot_width = 950, toolbar_location = None,
            tools = [hover])
    # Remove grid lines and axes for a nicer layout
    plot.xgrid.grid_line_color = None
    plot.ygrid.grid_line_color = None
    plot.axis.visible = False
    # Add the color bar
    color_bar = create_color_bar(state,plot,geosource)    
    #create graphical components for the user to interact
    btn_period     = RadioButtonGroup(labels=periods, active=0)       # period selection
    btn_agg_level  = RadioButtonGroup(labels=agg_levels, active=0)    # aggregation level selection
    select_feature = Select(title="Typical product feature:", 
                            value=feature_names[0], options=feature_names)
    #add event handler
    local_update_plot = lambda st,v: update_plot(st, v,state,plot,geosource,color_bar)
    btn_period.on_click(              lambda new         : local_update_plot('month'    ,periods[new]))
    btn_agg_level.on_click(           lambda new         : local_update_plot('agg_level',agg_levels[new]))
    select_feature.on_change('value', lambda attr,old,new: local_update_plot('feature'  ,new))
    #add components to root document
    doc.add_root(column(btn_period,btn_agg_level, plot,select_feature))
    #output_file('images/vizu.html')
    #save(doc,filename='images/vizu.html')
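
The wiring between a button group and the shared state can be illustrated with a stand-in widget. `FakeRadioGroup` below is hypothetical and is not a Bokeh class; it only assumes, as the handlers above do, that the click callback receives the index of the selected label. The period codes other than `yea` are also assumptions.

```python
class FakeRadioGroup:
    """Hypothetical stand-in for a radio button group (not a Bokeh class)."""

    def __init__(self, labels, active=0):
        self.labels = labels
        self.active = active
        self._handlers = []

    def on_click(self, handler):
        """Register a callback that receives the index of the clicked button."""
        self._handlers.append(handler)

    def click(self, index):
        """Simulate a user click on the button at position index."""
        self.active = index
        for handler in self._handlers:
            handler(index)

periods = ["yea", "jan", "feb"]   # only "yea" is taken from the notebook output
state = {"month": "yea", "agg_level": "borough", "feature": "fat"}
changes = []

def update(field, value):
    """Record the change; the notebook's handler would also refresh the plot."""
    state[field] = value
    changes.append((field, value))

group = FakeRadioGroup(periods)
# The lambda translates the clicked index into the corresponding period label.
group.on_click(lambda new: update("month", periods[new]))
group.click(2)
print(state["month"], changes)
```

The lambdas keep `state`, `plot` and the other objects in a closure, so each handler only has to say which field changed and what its new value is.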

In [24]:
# show the viz in Jupyter; if it does not render, pass the notebook address, e.g. notebook_url="http://localhost:8888"
show(bkapp) 

# Cluster visualisation

For this visualisation, we follow a similar approach. The main difference is that we do not need a color bar, since the cluster labels are categories rather than values on a scale. Furthermore, we only computed the clusters for the yearly data, so the ```period``` button can also be removed.

In [9]:
cluster = pd.read_csv('data/tesco_cluster.csv')
cluster = cluster.query("month == 'yea'")
cluster = cluster[['area_id','month','agg_level']+[c for c in cluster.columns if c.startswith("cluster")]]
cluster.head()

Unnamed: 0,area_id,month,agg_level,cluster_2,cluster_3,cluster_4,cluster_5,cluster_6,cluster_7,cluster_8
0,E09000001,yea,borough,1,0,3,1,5,5,5
1,E09000002,yea,borough,0,1,1,2,4,1,1
2,E09000003,yea,borough,1,0,3,3,2,3,3
3,E09000004,yea,borough,0,1,1,2,4,1,1
4,E09000005,yea,borough,0,2,2,0,3,2,2


In [11]:
cluster_dict,_,c_agg_levels,c_feature_names = create_datasource(cluster)

In [16]:
from bokeh.palettes import Dark2,Spectral,Set2,Colorblind
def create_cat_color_mapper(state:dict,nan_color='#d9d9d9'):
    """This function create the categorical color mapper"""
    feature  = get_feature(state,tesco_dict=cluster_dict) # Select feature
    n_colors = max(feature.nunique(),3) if feature is not None else 1
    palette  = Set2[max(n_colors,3)][:n_colors][::-1]
    color_mapper=LinearColorMapper(palette=palette)
    return color_mapper
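
The function above relies on a linear mapper over a short qualitative palette to separate the integer cluster labels. As an alternative way to picture the same idea, the sketch below assigns colors through an explicit label-to-color lookup. The `cluster_colors` helper is hypothetical, and the hex codes are the eight colors of the ColorBrewer Set2 scheme, copied here so that the example needs no imports.

```python
# The eight colors of the ColorBrewer "Set2" qualitative scheme.
SET2 = ["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3",
        "#a6d854", "#ffd92f", "#e5c494", "#b3b3b3"]

def cluster_colors(labels, nan_color="#d9d9d9"):
    """Map each distinct cluster label to its own color; None gets nan_color."""
    distinct = sorted({label for label in labels if label is not None})
    if len(distinct) > len(SET2):
        raise ValueError("more clusters than available colors")
    lookup = {label: SET2[i] for i, label in enumerate(distinct)}
    return [lookup.get(label, nan_color) for label in labels]

# Invented cluster labels for four areas, one of them without data.
colors = cluster_colors([1, 0, None, 1])
print(colors)
```

An explicit lookup makes the guarantee obvious: two areas share a color if and only if they share a cluster label, and areas without data get the neutral grey.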

In [17]:
def update_cat_plot(state_field:str, new_val:str,state:dict,plot,geosource):
    """
    Handles interactivity in the visualization
    @param state_field : (str) element of the state that needs to be updated
    @param new_val     : (str) new value of state_field element 
    """
    state[state_field] = new_val  # state update
    # update title
    plot.title.text = "K-Means clustering (k = {0}) for {1}".format(state['feature'][-1],state['agg_level'])
    if state_field == 'feature':                    # update color mapper
        color_mapper = create_cat_color_mapper(state)   # create new mapper
        plot.patches('xs','ys',                     # update patches
            source = geosource,
            fill_color = {'field' :state['feature'], 'transform' : color_mapper},
            line_color = 'black', line_width = 0.25, 
            fill_alpha = 1,
                     #legend_label=state['feature']
                    )
    new_data = json_data(**state,data=cluster_dict) # get the new data and convert to JSON
    geosource.geojson = new_data                    # set new data to trigger recoloring event
    # new hover for the new feature 
    hover = HoverTool(tooltips = [('Area name','@name'),(state['feature'], '@'+state['feature'])])
    # add hover tools to plot
    plot.tools = [hover]
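
The title reads the value of k from the last character of the feature name, which is sufficient for the columns `cluster_2` to `cluster_8` used here. If larger values of k were ever added, a slightly more defensive parser could be used; the `k_from_feature` helper below is a hypothetical sketch, not part of the notebook.

```python
def k_from_feature(feature_name):
    """Return the number of clusters encoded in a name such as 'cluster_8'."""
    prefix, _, suffix = feature_name.rpartition("_")
    if prefix != "cluster" or not suffix.isdigit():
        raise ValueError(f"unexpected feature name: {feature_name!r}")
    return int(suffix)

print(k_from_feature("cluster_2"))
print(k_from_feature("cluster_12"), "cluster_12"[-1])
```

For a two-digit k, the last character alone would give only the final digit, whereas splitting on the underscore returns the full number.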

In [18]:
# output notebook to allow inline jupyter interaction
output_notebook()

def cluster_bkapp(doc):
    # default state when the component is loaded for the first time
    state = {'month':'yea','agg_level':'borough','feature':'cluster_2'}
    geosource = GeoJSONDataSource(geojson = json_data(**state,data=cluster_dict))
    # Hover that will show the current feature value for the hovered area + its name
    hover = HoverTool(tooltips = [ ('Area name','@name'),(state['feature'], '@'+state['feature'])])
    # Create figure for the plot
    plot = figure(title = "K-Means clustering (k = {0}) for {1}".format(state['feature'][-1],state['agg_level']), 
            plot_height = 800 , plot_width = 950, toolbar_location = None,
            tools = [hover])
    # Remove grid lines and axes for a nicer layout
    plot.xgrid.grid_line_color = None
    plot.ygrid.grid_line_color = None
    plot.axis.visible = False
    # link data and plot 
    color_mapper   = create_cat_color_mapper(state) # create new mapper
    plot.patches('xs','ys', source = geosource,     # link geosource
        fill_color = {'field' :state['feature'],    # color related to selected feature
                      'transform':color_mapper},    # use the defined colormapper
        line_color = 'black',                       # style : black borders
        line_width = 0.25, fill_alpha = 1,)           # more styling
    #create graphical components for the user to interact
    btn_agg_level  = RadioButtonGroup(labels=c_agg_levels, active=0)    # selection aggregation level
    select_feature = Select(title="Number of clusters (value of k)", 
                            value=c_feature_names[0], options=c_feature_names)
    #add event handler
    local_update_plot = lambda st,v: update_cat_plot(st, v,state,plot,geosource)
    btn_agg_level.on_click(           lambda new         : local_update_plot('agg_level',c_agg_levels[new]))
    select_feature.on_change('value', lambda attr,old,new: local_update_plot('feature'  ,new))
    #add components to root document
    doc.add_root(column(btn_agg_level, plot,select_feature))

In [19]:
show(cluster_bkapp) 