# Using Altair to create interactive graphs
## Objective
Build an interactive line graph with Altair, one feature at a time.

The final graph supports the following:
* Tooltips and other basic functionality
* Highlighting datapoints based on proximity
* Toggling datapoints
* Showing the upper and lower (expected) bounds of the data when a line is clicked
* Showing vertical rules on certain dates

### Internal links
* Creation of graph starts [here](#initial_graph)
* [Final graph and code](#final_graph)

# References
* add selector in one part and filter in another: https://altair-viz.github.io/user_guide/transform/filter.html#selection-predicates
* rulers: https://altair-viz.github.io/gallery/bar_chart_with_mean_line.html
* highlighting: https://altair-viz.github.io/gallery/multiline_highlight.html
* toggling: https://github.com/altair-viz/altair/issues/954
* rulers, sliders and buttons: https://altair-viz.github.io/gallery/multiple_interactions.html

# Data input 
Created using clean.ipynb

The input data should have a multi-index on section, trackerID and date. Each tracked value should have a corresponding lower and upper bound.

## Data format
The final cleaned data has 3 index levels and 4 columns: 'value'; 'normalised' (data normalised according to the expected upper and lower bounds); 'UPPER' (upper bound); and 'LOWER' (lower bound). The 1st index level is not used in the graphs (it could serve as another level of toggling, since it is a higher-level categorisation of the next index level). The last index level is the date.

For reference, the final dataset has the following layout:

```
                                        Value       Interpolated value       Upper limit       Lower limit

Category       Subcategory       Date
```

In [1]:
import pandas as pd
import numpy as np

In [2]:
UPPER = 'UPPER'
LOWER = 'LOWER'

## Read raw data from file
Reports data, and file containing additional info

In [120]:
df = pd.read_csv('input.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['section','trackerID','date'])

# Graphing starts here
<a id='initial_graph'></a>
Note that multiindexing is not supported by Altair, so the index of the table needs to be reset.

The dataset has already been converted to [long form](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data), which should make things easier
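
As a minimal sketch of what the reset does, the cell below builds a tiny frame with the same three index levels and flattens it. The rows and the section name `Blood` are made up for illustration, and the column names are assumed from the cells further down; the real data comes from `input.csv`.

```python
import pandas as pd

# Hypothetical sample rows; only the column and index names follow the notebook
sample = pd.DataFrame({
    "section": ["Blood", "Blood", "Blood"],
    "trackerID": ["Haemoglobin", "Haemoglobin", "Haemoglobin"],
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "value": [13.5, None, 14.1],               # None marks a date without a measurement
    "interpolated_value": [13.5, 13.8, 14.1],
    "LOWER": [12.0, 12.0, 12.0],
    "UPPER": [16.0, 16.0, 16.0],
}).set_index(["section", "trackerID", "date"])

# reset_index turns the three index levels back into ordinary columns,
# which is the flat, long-form table that Altair expects
flat = sample.reset_index()
print(list(flat.columns))
```

Each row of `flat` is one observation of one tracker on one date, so `trackerID` can be mapped directly to the color channel.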

In [4]:
import altair as alt
from altair import datum

## Creating a simple line graph with highlighting enabled

In [5]:
Sample_column = 'Haemoglobin'

In [6]:
"""
The selection can be bound either to an encoding channel (encodings=['color'])
or directly to a field (fields=['trackerID']).
Both select the same data here; binding to the channel adds a layer of abstraction,
since the field behind the color channel can be anything.
"""
# mouseover selection event
highlight = alt.selection(type='single', on='mouseover',
                          encodings=['color'], nearest=True)

In [7]:
# setting the base graph properties
# the :Q (quantitative) and :T (temporal) suffixes tell Altair how to treat each field
base = alt.Chart().encode(
    x=alt.X('date:T', title='Date'),
    # plot the interpolated series; no impute transform is applied here
    y=alt.Y('interpolated_value:Q', title='Value'),
    color=alt.Color('trackerID')
).transform_filter(
    # datum refers to each row of the data
    # only a single tracker is kept, to prevent crowding of the bounding boxes added later on
    (datum.trackerID == Sample_column)
)

In [8]:
# define the points to be used
# the property specified should actually be the property of the base
points = base.mark_circle().encode(
    opacity=alt.value(0.5)
).transform_filter(
    # show only those points where actual measurement was made
    alt.FieldValidPredicate(
        field='value',
        valid=True
    )
).add_selection(
    highlight #adding the selector that we have defined
).properties(
    width=600
)

In [9]:
def getVegaFormatDate(date):
    return { "year": date.year, "month": date.month, "date": date.day }
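
For illustration, the helper can be tried on a single date. The date below is arbitrary; the resulting dictionary follows the year/month/date layout of a Vega-Lite date-time object, where months are counted from 1.

```python
from datetime import datetime


def getVegaFormatDate(date):
    # Same helper as above: split a date into the fields of a Vega-Lite date-time object
    return {"year": date.year, "month": date.month, "date": date.day}


example = getVegaFormatDate(datetime(2020, 3, 15))
print(example)  # {'year': 2020, 'month': 3, 'date': 15}
```

Because only `.year`, `.month` and `.day` are read, the helper also accepts `pandas.Timestamp` values.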

In [10]:
# set the line width according to the highlight selector
# the selector is not added again here: add_selection must be called only once per selector,
# but the selector can then be used in alt.condition by any number of marks
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

In [11]:
bound = base.mark_rect(color='firebrick', opacity=0.3, filled=True).encode(
    x='min(date):T',
    x2='max(date):T',
    y=LOWER + ':Q',
    y2=UPPER + ':Q',
)

In [12]:
# multiIndex not supported by altair
source = df.reset_index()
# layer everything into one plot
alt.layer(points,lines,bound,data=source)

## Adding clickable legends and bounding boxes on click
To make the legend clickable, it is created as a separate component. This method is not ideal: the data source is specified in alt.Chart and inherited by every layer, so the data filtered via the legend cannot be linked to the line plots in the graph. Instead of actually filtering the data, the opacity is set to 0, which is why the graph does not rescale when legends are selected or deselected.

Note1: transform_filter probably does not work well when the filter and the selector it uses are defined on the same chart object.

Note2: transform_filter accepts different types of [predicates](https://altair-viz.github.io/user_guide/transform/filter.html). To apply several of them, chain successive calls to transform_filter.

In [13]:
## DEFINING selectors
# mouseover condition
highlight = alt.selection(type='single', on='mouseover',
                          fields=['trackerID'], nearest=True)
# selecting from legends
selection = alt.selection_multi(encodings=['color'], empty='none') # encodings or fields can be used interchangeably here

# interval selection
brush = alt.selection_interval()

# click condition
click = alt.selection(type='single',encodings=['color'],empty='none')

In [14]:
# defining the source properties
# properties are defined at the base itself
# the legend is removed from here, since the legend will be a separate component now
base = alt.Chart(source).encode(
    x=alt.X('date:T', title='Date'),
    y=alt.Y('interpolated_value:Q', title='Value'),
    color=alt.Color('trackerID',legend=None),
    tooltip=[ 'value:Q', 'date:T', 'trackerID' ,'section']
).properties(
    width=650,
    height=500,
)

In [15]:
# defining the points and setting their opacity based on selection(via legend) and size based on mouse
points = base.mark_point().encode(
    opacity=alt.condition(selection, alt.value(0.0), alt.value(1.0)),
    size=alt.condition(~highlight, alt.value(20), alt.value(40)),
).add_selection(
    highlight,
    click
).transform_filter(
    alt.FieldValidPredicate(
        field='value',
        valid=True
    )
)
# calling add_selection on base does not work,
# probably because the selection would then be inherited by both points and lines

In [16]:
# doing the same thing with lines
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3)),
    opacity=alt.condition(selection , alt.value(0.0), alt.value(1.0))
)

In [17]:
"""
transform_filter keeps only the data selected by the click action.
Note that the click selector was added (via add_selection) to points,
but it can be used here without any issues.

The commented-out line (building bound from base) is not compatible with the transform_filter.
"""
bound = alt.Chart(source).mark_rect(opacity=0.3, filled=True).encode(
#bound = base.mark_rect(color='firebrick', opacity=0.3, filled=True).encode(
    x='min(date):T',
    x2='max(date):T',
    y=LOWER + ':Q',
    y2=UPPER + ':Q',
    color='trackerID'
).transform_filter(click)

In [18]:
clickable_legend = alt.Chart(source).mark_circle(size=200,radius=200).encode(
    y='trackerID',
    color=alt.condition(selection, alt.value('lightgray'), 'trackerID', legend=None) # replacing selection with highlight is also useful
).add_selection(selection)

In [19]:
source = df.reset_index()
# + is layering, | is hconcat, & is vconcat
points + lines + bound | clickable_legend

## Final graph


### TODO: add filters for category as well
<a id='final_graph'></a>

In [122]:
INTERPOLATE = False

In [123]:
### Creating selectors

# mouseover condition
highlight = alt.selection(type='single', on='mouseover',
                          fields=['trackerID'], nearest=True)
# selecting from legends
selection = alt.selection_multi(fields=['trackerID'], empty='all') # encodings or fields can be used interchangeably here

# interval selection
brush = alt.selection_interval()

# click condition
click = alt.selection(type='single',encodings=['color'],empty='none')

In [124]:
# Note: the data source has not been specified yet
# Clickable legend which colors based on selection
clickable_legend = alt.Chart().mark_circle(size=200,radius=200).encode(
    y='trackerID',
    color=alt.condition(selection, 'trackerID', alt.value('lightgray'), legend=None)
).add_selection(selection)

In [125]:
# filter based on selection via clickable_legend. Wasn't possible in the previous section
# Base graph properties which filters based on selection
base = alt.Chart().encode(
    x=alt.X('date:T', title='Date'),
    y=alt.Y('interpolated_value:Q', title='Value'),#, impute=alt.ImputeParams(method='mean' if False else 'value')),
    color=alt.Color('trackerID',legend=None),
    tooltip=[ 'value:Q', 'date:T', 'trackerID' ,'section']
).transform_filter(
    selection
).properties(
    width=650,
    height=500,
)

In [126]:
# Same objects as before till the last code block
# Adding selectors for highlights and clicks
points = base.mark_point().encode(
    size=alt.condition(~highlight, alt.value(20), alt.value(40)),
).add_selection(
    highlight,
    click
).transform_filter(
    alt.FieldValidPredicate(
        field='value',
        valid=True
    )
)

In [127]:
#highlight the lines on mouseover and display if selected
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3)),
)

In [128]:
# Bounding rectangle which filters based on click
bound = alt.Chart().mark_rect(opacity=0.3, filled=True).encode(
    x = 'min(date):T',
    x2 = 'max(date):T',
    y=str(LOWER)+':Q',
    y2=str(UPPER)+':Q',
    color = 'trackerID'
).transform_filter(click)

Get the annotation file for vertical date markers

In [129]:
import os
ANNOTATION_FILE = 'annotate' # only rows with info_type == 'date' are required; the others are dropped
if os.path.exists(ANNOTATION_FILE):
    annotate_df = pd.read_csv(ANNOTATION_FILE, sep=';')

    # .copy() avoids pandas' SettingWithCopyWarning when the date column is converted below
    dates_df = annotate_df[annotate_df['info_type'] == 'date'].copy()
    dates_df = dates_df.drop(columns=['remarks'])
    dates_df['date'] = pd.to_datetime(dates_df['date'])
else:
    dates_df = pd.DataFrame(columns=['date'])
    print("No notes to be marked")
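
To make the expected layout of the annotation file concrete, here is a minimal sketch that parses an in-memory stand-in. Only the `;` separator and the column names `info_type`, `date` and `remarks` come from the cell above; the two rows and the `note` type are made up.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for the 'annotate' file
raw = io.StringIO(
    "info_type;date;remarks\n"
    "date;2020-02-01;started treatment\n"
    "note;;free-text entry\n"
)
annotate_df = pd.read_csv(raw, sep=";")

# Keep only the date annotations and convert them to datetime
dates_df = annotate_df[annotate_df["info_type"] == "date"].drop(columns=["remarks"]).copy()
dates_df["date"] = pd.to_datetime(dates_df["date"])
print(dates_df)
```

Only the rows that survive this filter are turned into vertical rules in the next cell.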

In [130]:
# ensure that the date column taken from the source is in datetime format
vertical_rule = alt.Chart().mark_rule().encode(
    x = 'date:T',
    color=alt.value('red'),
).transform_filter(
    alt.FieldOneOfPredicate(
        field='date',
        oneOf=[getVegaFormatDate(d) for d in dates_df['date']]
    )
)

In [131]:
# defining the data source here
source = df.reset_index()
graph = alt.concat(lines + points + bound + vertical_rule, clickable_legend, data=source)

In [132]:
graph

### Guide for using the graph
* Hover near a line to highlight it and show a tooltip for the nearest datapoint
* Click on any line to show the range of expected values for that trackerID
* Click on a legend to select that legend exclusively in the graph (Shift-click for multiple selections; to reset, deselect the selected legends)

In [50]:
graph.save('graph.html')