# Session 10: Interactive Visualization with Bokeh


# Background on Bokeh 

-   Interactive visualization library that renders plots in the browser through a JavaScript library (BokehJS)
-   Especially known for good performance when visualizing large datasets in web browsers


# Basic building block: the glyph 

-   While the basic building block of figures in `altair` was the `mark`, in Bokeh the building blocks are called `glyphs`
- Steps are:
    - Use the `figure` function to create the empty plot that will hold the data
    - Add one or more glyphs for the type of chart you want to create. Examples include:
        - line plots
        - bar plots
        - scatter plots 
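
A minimal sketch of these two steps. The data values are made up purely for illustration, and the `show` call is left as a comment so the snippet has no side effects:

```python
from bokeh.plotting import figure

# Step 1: create an empty figure (axes, grid, toolbar, but no data yet)
p = figure(title="Two glyphs on one figure",
           x_axis_label="x", y_axis_label="y")

# Made-up illustrative values
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]

# Step 2: add one glyph per visual layer
line_renderer = p.line(x, y)
point_renderer = p.scatter(x, y, size=8)

# Each glyph call adds a renderer to the figure
n_layers = len(p.renderers)

# bokeh.io.show(p) would display the result in a notebook or browser
```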

# Before plotting the data: convert data to a Bokeh data source

- Rather than feeding the pandas dataframe directly to Bokeh, we wrap it in the `ColumnDataSource` class
- As we'll illustrate later, we can also filter the `ColumnDataSource` itself rather than the dataframe; with large datasets this can be more efficient, since several filtered views can share one copy of the data
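
As a rough illustration of what the conversion produces, the sketch below wraps a tiny made-up dataframe (a stand-in for the WHO data, not real values) and inspects the result:

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up rows standing in for the WHO data
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "schooling": [9.5, 12.1, 14.0],
    "life_expectancy": [61.0, 72.5, 80.2],
})

source = ColumnDataSource(toy)

# Each DataFrame column becomes a named column in the source;
# the DataFrame index is stored as well
print(source.column_names)

# Glyphs later refer to columns by name, e.g. x='schooling'
schooling_values = list(source.data["schooling"])
```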

# Imports

In [76]:
import pandas as pd
import numpy as np
import re
import bokeh.io
import bokeh.plotting
import bokeh.models
from bokeh.transform import factor_cmap, factor_mark
from bokeh.models import (BooleanFilter, CDSView, ColumnDataSource, 
    Range1d, DataTable, TableColumn, FactorRange)
from bokeh.palettes import GnBu3, OrRd3
from bokeh.layouts import row, column
bokeh.io.output_notebook()


# Loading data

In [4]:
who = pd.read_csv('../session9_altair/Life Expectancy Data.csv')
col1 = [col.strip().lower() for col in 
          who.columns]
col2 = [re.sub(" ", '_', col) for col in col1]
who.columns = col2
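
To see what the column clean-up in the cell above does, here is the same pair of steps applied to a few hypothetical headers (illustrative names, not necessarily those in the CSV):

```python
import re

# Hypothetical raw headers with stray whitespace and mixed case
raw_columns = ["Country", "Life expectancy ", " GDP", "Status"]

# Same two steps as the cell above:
# strip + lowercase, then replace spaces with underscores
stripped = [col.strip().lower() for col in raw_columns]
cleaned = [re.sub(" ", "_", col) for col in stripped]

print(cleaned)
# prints ['country', 'life_expectancy', 'gdp', 'status']
```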

# Roadmap 

- Plots without interactivity
    - Scatterplots
    - Shading by group
    - Creating views of the data 
    - Bar charts
- Plots with interactivity
    - Hoverable points
    - Interactive legends
    - Linked chart selections 

# Basic scatterplot 

In [17]:
b_datasource = bokeh.models.ColumnDataSource(who[who.year == 2010])
type(b_datasource)

p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy")

p.scatter(x = 'schooling', y = 'life_expectancy', source = b_datasource)
bokeh.io.show(p)

# How do we color by group?

- Can use `factor_cmap()` to map categories onto colors
- Tell it the field you want to map, a list of colors or palette to use for mapping, and a list of categories corresponding to that palette 
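
Conceptually, the mapping pairs each category in `factors` with the color at the same position in the palette, and categories that are not listed fall back to a `nan_color` (gray by default). The plain-Python sketch below only mimics that lookup; it is not how Bokeh implements it, and `color_for` is a hypothetical helper:

```python
# Conceptual sketch only: Bokeh performs this lookup in the browser,
# not with a Python dict
domain = ["Developing", "Developed"]
colors = ["seagreen", "#7D3C98"]

lookup = dict(zip(domain, colors))

def color_for(status, nan_color="gray"):
    """Return the color paired with a category, or a fallback color."""
    return lookup.get(status, nan_color)

print(color_for("Developed"))  # prints #7D3C98
print(color_for("Unknown"))    # prints gray
```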

In [16]:
domain = ['Developing', 'Developed']
colors = ['seagreen', '#7D3C98']

p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy")

p.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource)


In [18]:
bokeh.io.show(p)

# How do we filter within the ColumnDataSource?

- Can use the `CDSView` class to create different filtered views of the data
- Can then pass these views to the plotting code directly 
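
The core of a filtered view is a list with one boolean per row. The sketch below builds such a mask on made-up rows and wraps it in a `BooleanFilter`. It stops there because the constructor arguments for `CDSView` have changed between Bokeh major versions:

```python
from bokeh.models import BooleanFilter, ColumnDataSource

# Made-up rows for illustration
source = ColumnDataSource(data=dict(
    year=[2000, 2015, 2000, 2010],
    schooling=[8.0, 11.5, 12.2, 10.1],
))

# One boolean per row: True keeps the row in the view
mask = [bool(y == 2000) for y in source.data["year"]]
filter_2000 = BooleanFilter(booleans=mask)

# Rows that the filter would keep
kept = [s for s, keep in zip(source.data["schooling"], mask) if keep]
print(kept)  # prints [8.0, 12.2]
```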

In [19]:
b_datasource_all = bokeh.models.ColumnDataSource(who)
log_2000 = [True if y == 2000 else False for y in b_datasource_all.data['year']]
view_2000 = CDSView(source = b_datasource_all,
                   filters = [BooleanFilter(log_2000)])
log_2015 = [True if y == 2015 else False for y in b_datasource_all.data['year']]
view_2015 = CDSView(source = b_datasource_all,
                   filters = [BooleanFilter(log_2015)])



In [20]:
p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy")

p.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource_all,
          view = view_2000)
p.legend.location = "bottom_right"
p.y_range = Range1d(0, 90)
p.x_range = Range1d(0, 20)
p.title.text = "Years of schooling v. life expectancy: 2000"

In [21]:
p_later = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy")

p_later.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource_all,
          view = view_2015)
p_later.legend.location = "bottom_right"
p_later.y_range = Range1d(0, 90)
p_later.x_range = Range1d(0, 20)
p_later.title.text = "Years of schooling v. life expectancy: 2015"

In [22]:
bokeh.io.show(row(p, p_later))

# Bar charts

- Same setup as before: use `figure` to initialize the basic structure of the chart
- The x-axis variable needs to be a string or categorical type for the plot to display correctly
- Layer on:
    - `vbar` for a basic bar chart
    - `vbar_stack` or `hbar_stack` for a stacked bar chart
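
The stacked variant is not demonstrated in the cells that follow, so here is a minimal sketch using the figure method `vbar_stack`. The columns `primary` and `secondary` and all of the values are hypothetical, chosen only to give the bars something to stack:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up values; 'primary' and 'secondary' are hypothetical columns
years = ["2013", "2014", "2015"]
source = ColumnDataSource(data=dict(
    year=years,
    primary=[5.0, 5.2, 5.3],
    secondary=[3.1, 3.4, 3.8],
))
stackers = ["primary", "secondary"]

# Categorical x axis: pass the string categories as x_range
p = figure(x_range=years, title="Stacked bars (illustrative data)")

# One bar segment per stacker column, stacked bottom to top
renderers = p.vbar_stack(stackers, x="year", width=0.9,
                         color=["seagreen", "#7D3C98"],
                         legend_label=stackers, source=source)
p.y_range.start = 0
```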

# Illustration with one country

In [50]:
one_country = who[(who.country == "China") &
        (who.year > 2010)].copy()
one_country['year_string'] = one_country.year.astype(str)
years = sorted(one_country.year_string.unique())
one_country_source = ColumnDataSource(one_country)

In [51]:
p = bokeh.plotting.figure(x_range= years,
                        toolbar_location=None, title="Years of schooling across time")
p.vbar(x= 'year_string', top='schooling',
       width=0.9, source=one_country_source)
p.y_range.start = 0
bokeh.io.show(p)

# Comparing multiple countries

- For a grouped comparison bar chart, Bokeh expects the x values to be tuples of nested categories, e.g.:
    ('China', '2011')
    ('China', '2012')
    ('Brazil', '2011')
    ('Brazil', '2012')
- The first element of each tuple defines the group of bars; the second element is the category repeated within each group
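
Because the tuples and the bar heights are passed as separate columns, they must be in the same row order. One way to guarantee that, sketched here on made-up rows, is to sort once and derive both from the sorted dataframe (an alternative to building the tuples with a nested comprehension):

```python
import pandas as pd

# Made-up rows, deliberately out of order
toy = pd.DataFrame({
    "country": ["China", "Brazil", "China", "Brazil"],
    "year": [2012, 2011, 2011, 2012],
    "schooling": [12.7, 13.8, 12.5, 14.0],
})

# Sort once, then derive both the factors and the values from the same rows
ordered = toy.sort_values(by=["country", "year"])
x_axis = list(zip(ordered["country"], ordered["year"].astype(str)))
schooling = ordered["schooling"].tolist()

print(x_axis)
```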


In [73]:
focal_df = who[(who.country.isin(['Brazil', 'China', 'India', 'Russian Federation'])) &
        (who.year > 2012)].copy()
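# Sort countries too, so the tuple order matches the sort_values() order used for the bar heights
x_axis = [ (country, str(year)) 
          for country in sorted(focal_df.country.unique()) 
          for year in sorted(focal_df.year.unique())]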
x_axis 

[('Brazil', '2013'),
 ('Brazil', '2014'),
 ('Brazil', '2015'),
 ('China', '2013'),
 ('China', '2014'),
 ('China', '2015'),
 ('India', '2013'),
 ('India', '2014'),
 ('India', '2015'),
 ('Russian Federation', '2013'),
 ('Russian Federation', '2014'),
 ('Russian Federation', '2015')]

In [81]:
schooling = focal_df.sort_values(by = ['country', 'year'])['schooling']
source = ColumnDataSource(data=dict(x=x_axis, schooling=schooling))

p = bokeh.plotting.figure(x_range=FactorRange(*x_axis), height=350,
                          title="Average years of schooling by year",
                          toolbar_location=None, tools="")
# Color bars by year: start=1, end=2 picks the second level of each (country, year) factor
p.vbar(x='x', top='schooling', width=0.9, source=source,
       fill_color=factor_cmap('x', palette=GnBu3,
                              factors=['2013', '2014', '2015'],
                              start=1, end=2))

In [None]:
bokeh.io.show(p)

# Where we are 

- Plots without interactivity
    - Scatterplots
    - Shading by group
    - Creating views of the data 
    - Bar charts
- **Plots with interactivity**
    - Hoverable points
    - Interactive legends
    - Linked chart selections 

# Adding Interactivity

Different types of interactivity:
- Hoverable tooltips
- Making legends interactive to show a subset of the points 
- Linked selections: as in `altair`, a selection in one plot can propagate through to another

# Hoverable tooltips

- Within the original `figure` call, use the:
    - `tools` argument: can specify `hover` or other tools
    - `tooltips` argument: use the @ symbol to specify which columns in the data source to display upon hovering
- Note the list-of-tuples syntax: each tuple pairs a display label with the field to show
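
A field reference can also carry a number format in braces, e.g. `@schooling{0.0}` for one decimal place. The sketch below uses made-up rows and then looks up the figure's `HoverTool` to confirm where the tooltips end up:

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up rows for illustration
source = ColumnDataSource(data=dict(
    country=["A", "B"],
    schooling=[9.53, 12.18],
    life_expectancy=[61.04, 72.51],
))

tooltips_map = [
    ("Country", "@country"),                    # column value as-is
    ("Years of schooling", "@schooling{0.0}"),  # one decimal place
    ("Life expectancy", "@life_expectancy{0.0}"),
]

p = figure(tools="hover,pan", tooltips=tooltips_map)
p.scatter(x="schooling", y="life_expectancy", size=10, source=source)

# The tooltips end up on the figure's HoverTool
hover = [t for t in p.toolbar.tools if isinstance(t, HoverTool)][0]
n_fields = len(hover.tooltips)
```

Column names that contain spaces need braces around the name as well, as in `@{life expectancy}`.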


In [34]:
tooltips_map = [
    ('Country', '@country'),
    ('Years of schooling', '@schooling'), 
    ('Life expectancy', '@life_expectancy')
]

p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy",
                         tools = "hover,pan,zoom_in",
                         tooltips = tooltips_map)

p.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource)


In [None]:
bokeh.io.show(p)

# Making legends interactive 

- Need to modify the chart code so that it iterates through the categories and adds a separate glyph layer for each one
- Use `legend_label` (set to the current category in each iteration) instead of `legend_field`, so that each layer gets its own legend entry
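
Besides `"hide"`, the legend's `click_policy` can be set to `"mute"`, which fades a layer instead of removing it. A minimal sketch on made-up rows, with one glyph per category as described above:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows for illustration
groups = {
    "Developing": dict(schooling=[8.0, 9.5], life_expectancy=[60.0, 65.0]),
    "Developed": dict(schooling=[14.0, 15.5], life_expectancy=[79.0, 81.0]),
}
group_colors = {"Developing": "seagreen", "Developed": "#7D3C98"}

p = figure(title="Click a legend entry to fade its points")
for status, rows in groups.items():
    # One glyph (and therefore one legend entry) per category
    p.scatter(x="schooling", y="life_expectancy", size=10,
              color=group_colors[status], legend_label=status,
              muted_alpha=0.1,  # opacity used while the entry is muted
              source=ColumnDataSource(data=rows))

p.legend.click_policy = "mute"  # "hide" would remove the points instead
```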

In [105]:
p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy",
                         tools = "hover,pan,zoom_in",
                         tooltips = tooltips_map)
p.title.text = 'Click on legend entries to hide the corresponding points'
for one_status in who.status.unique():
    # One data source and one glyph per status, so each status gets its own legend entry
    status_source = bokeh.models.ColumnDataSource(who[(who.year == 2010) & 
                                    (who.status == one_status)])
    p.scatter(x = 'schooling', y = 'life_expectancy', 
        legend_label = one_status,
        color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
        source = status_source)
p.legend.location = "bottom_right"
p.legend.click_policy = "hide"

In [106]:
bokeh.io.show(p)

# Linking charts

- Similar to `altair`, can have user behavior on one graph affect the output of another graph
- Unlike with `altair`, where we needed to code that behavior explicitly using either the color or `transform_filter` parameter, in Bokeh the linking happens automatically when the plots share the same `ColumnDataSource`
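
The reason this works is that the selection lives on the data source rather than on either plot. A minimal sketch with made-up rows, setting the selection by hand to mimic what a box selection would do:

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up rows for illustration
source = ColumnDataSource(data=dict(
    schooling=[8.0, 11.5, 14.2],
    gdp=[900.0, 5200.0, 41000.0],
    life_expectancy=[60.1, 71.3, 80.4],
))

left = figure(tools="box_select", title="Schooling")
right = figure(tools="box_select", title="GDP")

# Both glyphs read from the same source object
r_left = left.scatter(x="schooling", y="life_expectancy", source=source)
r_right = right.scatter(x="gdp", y="life_expectancy", source=source)

# The selection is stored on the source, so both plots see it.
# Setting it by hand mimics a box selection of rows 0 and 2.
source.selected.indices = [0, 2]

layout = row(left, right)  # pass to bokeh.io.show(...) to display
```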

In [108]:
tooltips_map = [
    ('Country', '@country'),
    ('Years of schooling', '@schooling'), 
    ('Life expectancy', '@life_expectancy'),
    ('GDP', '@gdp')
]

p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy",
                         tools = "hover,pan,zoom_in,box_select",
                         tooltips = tooltips_map)

p.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource_all,
          view = view_2015)

p.legend.location = "bottom_right"

In [109]:

g = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'GDP',
                         y_axis_label = "Life expectancy",
                         tools = "hover,pan,zoom_in,box_select",
                         tooltips = tooltips_map)

g.scatter(x = 'gdp', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource_all,
          view = view_2015)
g.legend.location = "bottom_right"

In [110]:
bokeh.io.show(row(p, g))

# Adding DataTable widget to a linked chart

- Can also add a `DataTable` widget to display in conjunction with a chart
- Similar to the selection across plots, this allows us to highlight data table rows that align with the selected values on a plot
- See here for a more advanced example that allows users to edit data table values: https://docs.bokeh.org/en/3.0.0/docs/user_guide/interaction/linking.html 
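
Table columns can also take a formatter to control how values are displayed. The sketch below assumes made-up rows and uses `NumberFormatter` to show one decimal place; the `width` and `height` values are arbitrary:

```python
from bokeh.models import (ColumnDataSource, DataTable,
                          NumberFormatter, TableColumn)

# Made-up rows for illustration
source = ColumnDataSource(data=dict(
    country=["A", "B"],
    life_expectancy=[61.04, 72.51],
))

columns = [
    TableColumn(field="country", title="Country"),
    # Display life expectancy with one decimal place
    TableColumn(field="life_expectancy", title="Life exp.",
                formatter=NumberFormatter(format="0.0")),
]

table = DataTable(source=source, columns=columns, width=300, height=150)
```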

In [129]:
columns = [TableColumn(field = "country", title = "Country"),
          TableColumn(field = "life_expectancy", title = "Life exp."),
          TableColumn(field = "schooling", title = "Years of Schooling"),
          TableColumn(field = "year", title = "Year")]
dt = DataTable(source = b_datasource_all, columns = columns,  view = view_2015,
              editable = True)




In [133]:
p = bokeh.plotting.figure(plot_width = 400, plot_height = 300,
                         x_axis_label = 'Years of schooling',
                         y_axis_label = "Life expectancy",
                         tools = "hover,pan,zoom_in,xbox_select",
                         tooltips = tooltips_map,
                         active_drag="xbox_select")

p.scatter(x = 'schooling', y = 'life_expectancy', 
          legend_field = 'status',
          color = factor_cmap(field_name = 'status',
                             palette = colors,
                             factors = domain),
          source = b_datasource_all,
          view = view_2015)

p.legend.location = "bottom_right"

In [134]:
bokeh.io.show(column(p, dt))

# Summing up

- For discussion: pros and cons relative to `altair`?
- Plots without interactivity
    - Scatterplots
    - Shading by group
    - Creating views of the data 
    - Bar charts
- Plots with interactivity
    - Hoverable points
    - Interactive legends
    - Linked chart selections 