# Using Altair to create interactive graphs
## Objective
This notebook uses Altair to build an interactive graph from a cleaned time-series dataset.

The final graph supports the following:
* Tooltips and other basic functionality
* Highlighting datapoints based on mouse proximity
* Toggling datapoints on and off
* Showing the upper and lower (expected) bounds of the data when a line is clicked

### Internal links
* Creation of graph starts [here](#initial_graph)
* [Final graph and code](#final_graph)

## Data used
The final cleaned data has 3 index levels and 4 columns: `value`, `normalised` (the value normalised according to the expected upper and lower bounds), `UPPER` (upper bound) and `LOWER` (lower bound). The 1st index level is the category; it is not used in the graphs, but it could serve as another level of toggling since it is a higher level of categorisation for the next index level. The 2nd level is the subcategory and the last level is the date.

For reference, the final dataset looks like this:

```
                                        Value       Normalised value       Upper limit       Lower limit

Category       Subcategory       Date
```

The initial data that I used to create this was a medical record file containing time-series data in messy rows and columns. The first part of this notebook is dedicated to cleaning the data.

For reference, the raw data was in this format:
```
               Upper Limit       Lower Limit     Date 1       Date 2       Date 3
Category A       <null row>

Subcategory 1       <ulA1>       <llA1>       <val>       <val>

..

Category B       <null row>

Subcategory 1       <ulB1>       <llB1>       <val>       <val>

Subcategory 2       <ulB2>       <llB2>       <val>       <val>

..

.....
```

# References
* add selector in one part and filter in another: https://altair-viz.github.io/user_guide/transform/filter.html#selection-predicates
* rulers: https://altair-viz.github.io/gallery/bar_chart_with_mean_line.html
* highlighting: https://altair-viz.github.io/gallery/multiline_highlight.html
* toggling: https://github.com/altair-viz/altair/issues/954
* rulers, sliders and buttons: https://altair-viz.github.io/gallery/multiple_interactions.html

In [1]:
import pandas as pd
import numpy as np

In [2]:
UPPER = 'UPPER'
LOWER = 'LOWER'

In [3]:
f = pd.read_excel('report.xlsx')
# Drop the rows and columns used for padding in excel
f.dropna(axis=0,how='all',inplace=True)
f.dropna(axis=1,how='all',inplace=True)
f.reset_index(drop=True, inplace = True)

In [4]:
# The date headers are read as DateTime, so store every header as a string and drop the trailing ' 00:00:00' from the dates
f.columns = [str(x) if '00' not in str(x) else str(x)[:-9] for x in f.loc[0,:].tolist()]
# trackerID is the subcategory
f.rename(columns={'nan':'trackerID'},inplace=True)
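
The list comprehension above relies on how the headers print as strings. A minimal sketch of that behaviour, using made-up header values (standard-library `datetime` objects stand in for the parsed date headers, and `nan` for the empty header cell above the subcategory column):

```python
from datetime import datetime

# Hypothetical header row: two text headers, one date parsed as a datetime,
# and one empty cell that was read as NaN
headers = ['UPPER', 'LOWER', datetime(2020, 8, 17), float('nan')]

# Same rule as in the cell above: a header whose string form contains '00'
# is assumed to be a date, so its last 9 characters (' 00:00:00') are dropped
names = [str(x) if '00' not in str(x) else str(x)[:-9] for x in headers]
print(names)
```

The empty header becomes the string `'nan'`, which is why that column is renamed to `trackerID`. The `'00'` check is a heuristic: any text header that happens to contain `00` would also lose its last 9 characters.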

In [5]:
f.drop(0,axis=0,inplace=True)
f.reset_index(drop=True, inplace=True)

In [6]:
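# remove the null rows in between
# (copy so that the column assignments below act on a new frame rather than on a slice of the old one)
f = f.loc[f.iloc[:,0].notnull(),:].copy()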

In [7]:
trackerIDs = f['trackerID']
f['section'] = np.where(f.iloc[:,1:].isnull().all(axis=1), trackerIDs, None)

In [8]:
# percolate the category name to all rows (do the modification in a separate list and assign it back)
section_name = f['section'].tolist()
for i in range(len(section_name)):
    if section_name[i] is None:
        section_name[i] = section_name[i-1]
f['section'] = section_name
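
The loop above is a forward fill. A minimal sketch of the same step with pandas' built-in `ffill`, on a small made-up column (the values are illustrative, not taken from the report):

```python
import pandas as pd

# Hypothetical 'section' column: the category name sits on its header row,
# and the subcategory rows below it are empty
section = pd.Series(['BIOCHEMISTRY', None, None, 'URINE ANALYSIS', None],
                    dtype=object)

# Forward fill propagates the last seen category name down to the rows below it
filled = section.ffill()
print(filled.tolist())
```

Unlike the loop, `ffill` leaves a missing value in the first row untouched instead of wrapping around to the last element of the list.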

In [9]:
# remove all the rows which contain only section name
f = f[f.iloc[:,1:-1].notnull().any(axis=1)]
f.reset_index(drop=True,inplace=True)

In [10]:
# strip whitespace from the column names; a header that contains additional text in parentheses is replaced by its date
l = [x.strip() for x in f.columns.tolist()]
for i in range(len(l)):
    if '(' in l[i]:
        l[i] = '2020-08-21'
f.columns = l

In [11]:
# Remove all the rows which do not contain an upper and lower bound
f = f[f[UPPER].notnull() & f[LOWER].notnull()]

In [12]:
f = f.set_index(['section','trackerID']) # same as f.pivot_table(index=['section','trackerID'],aggfunc='first')

In [13]:
# get the column names except for the upper and lower bounds
col_list = f.columns.tolist()
col_list.remove(UPPER)
col_list.remove(LOWER)

index_names = f.index.names.copy()
# take the transpose and unstack to get date as index (reset the index to set the column name) 
temp = f[col_list].T.unstack().to_frame().reset_index()
# rename the column, set the index and join with f to get the upper and lower bounds too
f = temp.rename(columns={'level_2':'date',0:'value'}).set_index(index_names+['date']).join(f.loc[:,[UPPER,LOWER]])
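
The transpose, unstack and join above reshape the table from wide form (one column per date) to long form (one row per date). As an alternative sketch, not the method used in this notebook, `DataFrame.melt` can do a similar reshape in one call. The toy table below is hypothetical and starts from a flat frame, that is, before `set_index` is applied:

```python
import pandas as pd

# Hypothetical wide table: one row per tracker, one column per date
wide = pd.DataFrame({
    'section': ['BIOCHEMISTRY', 'BIOCHEMISTRY'],
    'trackerID': ['A', 'B'],
    'UPPER': [10.2, 5.0],
    'LOWER': [8.4, 1.0],
    '2020-08-17': [8.6, 2.0],
    '2020-08-24': [7.6, 3.0],
})

# melt keeps the id columns and stacks the date columns into (date, value) pairs
long = wide.melt(id_vars=['section', 'trackerID', 'UPPER', 'LOWER'],
                 var_name='date', value_name='value')

# rebuild the same three-level index as in the notebook
long = long.set_index(['section', 'trackerID', 'date']).sort_index()
print(long)
```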

In [14]:
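# replace text entries with NaN; since '[a-zA-Z]*' can also match the empty string,
# this should turn every string cell into NaN, not only the alphabetic ones
f.replace('[a-zA-Z]*',np.nan,regex=True,inplace=True)
# creating a column with normalised value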
upper = f[UPPER]
lower = f[LOWER]
f['normalised'] = ((f['value'] - (upper+lower)/2)/((upper-lower)/2)+1)/2
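
The expression above simplifies to min-max scaling between the two bounds, `(value - lower) / (upper - lower)`: 0 corresponds to the lower bound, 1 to the upper bound, and anything outside that interval is out of the expected range. A small check with plain floats (the function names are made up for this sketch; the numbers are the (Serum) Calcium bounds and values from the table in this notebook):

```python
def normalise(value, upper, lower):
    """Same expression as in the cell above, written for plain floats."""
    mid = (upper + lower) / 2
    half_range = (upper - lower) / 2
    return ((value - mid) / half_range + 1) / 2


def min_max(value, upper, lower):
    """Algebraically equivalent form: position of value between the bounds."""
    return (value - lower) / (upper - lower)


# bounds 8.4 to 10.2, values 8.6 and 7.6
print(round(normalise(8.6, 10.2, 8.4), 3))
print(round(normalise(7.6, 10.2, 8.4), 3))
```

The second value is negative because 7.6 lies below the lower bound.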

In [15]:
# round all the datapoints
df = f.round(3)

In [16]:
df

| section | trackerID | date | value | UPPER | LOWER | normalised |
|---|---|---|---|---|---|---|
| BIOCHEMISTRY | (Serum) Calcium | 2020-09-01 | 8.6 | 10.2 | 8.4 | 0.111 |
| BIOCHEMISTRY | (Serum) Calcium | 2020-08-24 | 7.6 | 10.2 | 8.4 | -0.444 |
| BIOCHEMISTRY | (Serum) Calcium | 2020-08-21 | NaN | 10.2 | 8.4 | NaN |
| BIOCHEMISTRY | (Serum) Calcium | 2020-08-20 | NaN | 10.2 | 8.4 | NaN |
| BIOCHEMISTRY | (Serum) Calcium | 2020-08-17 | 8.6 | 10.2 | 8.4 | 0.111 |
| ... | ... | ... | ... | ... | ... | ... |
| URINE ANALYSIS | PTH | 2020-07-24 | NaN | 68.3 | 15.0 | NaN |
| URINE ANALYSIS | PTH | 2020-07-23 | NaN | 68.3 | 15.0 | NaN |
| URINE ANALYSIS | PTH | 2020-07-22 | NaN | 68.3 | 15.0 | NaN |
| URINE ANALYSIS | PTH | 2020-07-21 | NaN | 68.3 | 15.0 | NaN |


# Graphing starts here
<a id='initial_graph'></a>
Note that Altair does not support a MultiIndex, so the index of the table needs to be reset before plotting.

The dataset has already been converted to [long form](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data), which should make things easier.
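
As a minimal sketch of what resetting the index does, the toy frame below (two rows copied from the table above) has the same three index levels as `df`. After `reset_index`, each level becomes an ordinary column that a chart encoding can refer to by name:

```python
import pandas as pd

# Toy frame with the same index levels as df
idx = pd.MultiIndex.from_tuples(
    [('BIOCHEMISTRY', '(Serum) Calcium', '2020-09-01'),
     ('BIOCHEMISTRY', '(Serum) Calcium', '2020-08-24')],
    names=['section', 'trackerID', 'date'])
toy = pd.DataFrame({'value': [8.6, 7.6]}, index=idx)

# Move the index levels into regular columns
flat = toy.reset_index()
print(flat.columns.tolist())
```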

In [17]:
import altair as alt
from altair import datum

## Creating a simple line graph with highlighting enabled

In [18]:
Sample_column = 'Haemoglobin'

In [19]:
"""
The selection can be bound through encodings (channels such as color) or through fields (e.g. trackerID).
Both are the same thing here (binding via channels provides abstraction, since the underlying data
to be tracked can be anything)
"""
# mouseover selection event
highlight = alt.selection(type='single', on='mouseover',
                          encodings=['color'], nearest=True)

In [20]:
# setting the base graph properties
# the :Q and :T suffixes tell altair to treat a field as quantitative or temporal respectively
base = alt.Chart().encode(
    x=alt.X('date:T', title='Date'),
    # impute fills in the missing data
    y=alt.Y('value:Q', title='Value', impute=alt.ImputeParams(method='mean')),
    color=alt.Color('trackerID')
).transform_filter(
    # datum is used to refer to the datapoints themselves
    (datum.trackerID == Sample_column) # using only a single column to prevent crowding of boxes to be used later on
)

In [21]:
# define the points to be used
# the property specified should actually be the property of the base
points = base.mark_circle().encode(
    opacity=alt.value(0.5)
).add_selection(
    highlight #adding the selector that we have defined
).properties(
    width=600
)

In [22]:
# define the size of the line according to the selector defined
# note that the selector is not added again here: add_selection needs to be called only once (on points)
# a selector added that way can still be used in alt.condition by other marks, such as these lines
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

In [23]:
bound = base.mark_rect(color='firebrick',opacity=0.3, filled = True).encode(
    x = 'min(date):T',
    x2 = 'max(date):T',
    y=str(LOWER)+':Q',
    y2=str(UPPER)+':Q',
 )

In [24]:
# multiIndex not supported by altair
source = df.reset_index()
# layer everything into one plot
alt.layer(points,lines,bound,data=source)

## Adding clickable legends and bounding boxes on click
To make the legend clickable, it is created as a separate chart component. This method does not work efficiently because the data source is specified in alt.Chart and inherited by each chart, which prevents linking the data filtered by the legend to the line plots in the graph. The opacity is set to 0 instead of actually filtering the data, which is why the graph does not rescale on selecting/deselecting legend entries.

Note 1: transform_filter probably does not work well when the selector used in the filter is added to the same chart object that applies the filter.

Note 2: transform_filter accepts different types of [predicates](https://altair-viz.github.io/user_guide/transform/filter.html). To apply several of them, chain the filters together through successive calls to transform_filter.

In [25]:
## DEFINING selectors
# mouseover condition
highlight = alt.selection(type='single', on='mouseover',
                          fields=['trackerID'], nearest=True)
# selecting from legends
selection = alt.selection_multi(encodings=['color'], empty='none') #use encodings or fields. Doesn't matter

# interval selection
brush = alt.selection_interval()

# click condition
click = alt.selection(type='single',encodings=['color'],empty='none')

In [26]:
# defining the source properties
# properties are defined at the base itself
# the legend is removed from here, since the legend will be a separate component now
base = alt.Chart(source).encode(
    x=alt.X('date:T', title='Date'),
    y=alt.Y('value:Q', title='Value'),
    color=alt.Color('trackerID',legend=None),
    tooltip=[ 'value:Q', 'normalised:Q', 'date:T', 'trackerID' ,'section']
).properties(
    width=650,
    height=500,
)

In [27]:
# defining the points and setting their opacity based on selection(via legend) and size based on mouse
points = base.mark_point().encode(
    opacity=alt.condition(selection, alt.value(0.0), alt.value(1.0)),
    size=alt.condition(~highlight, alt.value(20), alt.value(40)),
).add_selection(
    highlight,
    click
)
"""
 adding add_selection to base does not work, for some reason,
 probably because it would then be inherited by both points and lines
"""


In [28]:
# doing the same thing with lines
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3)),
    opacity=alt.condition(selection , alt.value(0.0), alt.value(1.0))
)

In [29]:
"""
transform_filter keeps only the rows selected by the click action.
Note that click was added as a selection to points, but is used here without any issues.

The commented-out line (building bound from base) is not compatible with transform_filter.
"""
bound = alt.Chart(source).mark_rect(opacity=0.3, filled=True).encode(
#bound = base.mark_rect(color='firebrick', opacity=0.3, filled=True).encode(
    x = 'min(date):T',
    x2 = 'max(date):T',
    y=str(LOWER)+':Q',
    y2=str(UPPER)+':Q',
    color = 'trackerID'
 ).transform_filter(click)

In [30]:
clickable_legend = alt.Chart(source).mark_circle(size=200,radius=200).encode(
    y='trackerID',
    color=alt.condition(selection, alt.value('lightgray'), 'trackerID', legend=None) # replacing selection with highlight is also useful
).add_selection(selection)

In [31]:
source = df.reset_index()
# + is layering, | is hconcat, & is vconcat
points + lines + bound | clickable_legend

## Final graph


### TODO: add filters for category as well
<a id='final_graph'></a>

In [32]:
INTERPOLATE = False

In [33]:
### Creating selectors

# mouseover condition
highlight = alt.selection(type='single', on='mouseover',
                          fields=['trackerID'], nearest=True)
# selecting from legends
selection = alt.selection_multi(fields=['trackerID'], empty='all') #use encodings or fields. Doesn't matter

# interval selection
brush = alt.selection_interval()

# click condition
click = alt.selection(type='single',encodings=['color'],empty='none')

In [34]:
# Note: the data source has not been specified yet
# Clickable legend which colors based on selection
clickable_legend = alt.Chart().mark_circle(size=200,radius=200).encode(
    y='trackerID',
    color=alt.condition(selection, 'trackerID', alt.value('lightgray'), legend=None)
).add_selection(selection)

In [35]:
# filter based on selection via clickable_legend. Wasn't possible in the previous section
# Base graph properties which filters based on selection
base = alt.Chart().encode(
    x=alt.X('date:T', title='Date'),
    y=alt.Y('value:Q', title='Value', impute=alt.ImputeParams(method='mean' if INTERPOLATE else 'value')),
    color=alt.Color('trackerID',legend=None),
    tooltip=[ 'value:Q', 'normalised:Q', 'date:T', 'trackerID' ,'section']
).transform_filter(
    selection
).properties(
    width=650,
    height=500,
)

In [36]:
# Same objects as before till the last code block
# Adding selectors for highlights and clicks
points = base.mark_point().encode(
    size=alt.condition(~highlight, alt.value(20), alt.value(40)),
).add_selection(
    highlight,
    click
)

In [37]:
#highlight the lines on mouseover and display if selected
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3)),
)

In [38]:
# Bounding rectangle which filters based on click
bound = alt.Chart().mark_rect(opacity=0.3, filled=True).encode(
    x = 'min(date):T',
    x2 = 'max(date):T',
    y=str(LOWER)+':Q',
    y2=str(UPPER)+':Q',
    color = 'trackerID'
).transform_filter(click)

In [39]:
# defining the data source here
source = df.reset_index()
graph = alt.concat(lines + points + bound, clickable_legend , data=source)

In [40]:
graph

### Guide for using the graph
* Hover near a line to highlight it and show a tooltip for the nearest datapoint
* Click on any line to show the range of expected values for that trackerID
* Click on a legend entry to show only that trackerID in the graph (Shift-click for multiple selections; to reset, deselect the selected entries)

In [41]:
graph.save('graph.html')