# Interactive Data Visualization with Bokeh

## Inline Plots

1. Jupyter Notebook

To display plots inline, call `output_notebook()` from the `bokeh.io` module.

In [1]:
from bokeh.io import output_notebook
output_notebook()

2. JupyterLab

We'll install the `jupyterlab_bokeh` JupyterLab extension to embed Bokeh plots inside a JupyterLab notebook. This can be done by running the command below.

In [2]:
#!jupyter labextension install jupyterlab_bokeh

3. Interactors

We can make plots interactive using the Jupyter notebook widgets (or interactors).

In [3]:
from ipywidgets import interact

In [4]:
# Install extension (only for JupyterLab users)

#!jupyter labextension install @jupyter-widgets/jupyterlab-manager

## Importing and Exploring Data

We'll use the _Gapminder_ data in this notebook.

In [5]:
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
# Read in the csv file
df = pd.read_csv('data/gapminder.csv')
# Print the info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10111 entries, 0 to 10110
Data columns (total 8 columns):
Country            10111 non-null object
Year               10111 non-null int64
fertility          10100 non-null float64
life               10111 non-null float64
population         10108 non-null float64
child_mortality    9210 non-null float64
gdp                9000 non-null float64
region             10111 non-null object
dtypes: float64(5), int64(1), object(2)
memory usage: 632.0+ KB


In [7]:
df.sample(3)

| | Country | Year | fertility | life | population | child_mortality | gdp | region |
|---|---|---|---|---|---|---|---|---|
| 5060 | Lesotho | 1974 | 5.78 | 50.418 | 1123261.0 | 158.8 | 984.0 | Sub-Saharan Africa |
| 3917 | Hong Kong, China | 1981 | 1.85 | 75.215 | 5151920.0 | 13.93 | 17086.0 | East Asia & Pacific |
| 528 | Azerbaijan | 1992 | 2.883 | 64.245 | 7445925.0 | 95.4 | 6346.0 | Europe & Central Asia |


In [8]:
# Remove rows where any of the columns contains missing data
# df.dropna(how='any', inplace=True)
# df.shape

In [9]:
# Set index to 'Year'
df.set_index('Year', inplace=True)
df.head(3)

| Year | Country | fertility | life | population | child_mortality | gdp | region |
|---|---|---|---|---|---|---|---|
| 1964 | Afghanistan | 7.671 | 33.639 | 10474903.0 | 339.7 | 1182.0 | South Asia |
| 1965 | Afghanistan | 7.671 | 34.152 | 10697983.0 | 334.1 | 1182.0 | South Asia |
| 1966 | Afghanistan | 7.671 | 34.662 | 10927724.0 | 328.7 | 1168.0 | South Asia |
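
With `Year` as the index, the labels are no longer unique, because every country contributes one row per year. The sketch below uses a tiny placeholder frame (the country names and values are made up for illustration, not taken from the Gapminder file) to show what `.loc` returns in that situation.

```python
import pandas as pd

# Placeholder rows: made-up values, only the column names mirror the dataset
toy = pd.DataFrame({
    "Year": [1970, 1970, 1971],
    "Country": ["A", "B", "A"],
    "fertility": [5.0, 2.0, 4.9],
})
toy = toy.set_index("Year")

# Two rows share the label 1970, so the result is a DataFrame
rows_1970 = toy.loc[1970]

# Only one row has the label 1971, so the result is a Series (a single row)
row_1971 = toy.loc[1971]

# Passing a list always yields a DataFrame, even for a single match
rows_1971 = toy.loc[[1971]]
```

In the full dataset each year has many countries, so `df.loc[year]` returns a DataFrame whose columns can be handed to Bokeh directly.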


## Interactive Visualization

We'll focus on the `fertility`, `life` and `Country` columns of the dataset.

In [10]:
# Import necessary modules
from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource

from bokeh.io import curdoc

from bokeh.layouts import widgetbox, row
from bokeh.models import Slider

from bokeh.io import push_notebook

`ColumnDataSource`, a table-like object, is the most important data structure in Bokeh. We'll create a ColumnDataSource from our DataFrame to use with Bokeh.

In [11]:
# Create a ColumnDataSource from df
# source = ColumnDataSource(df)
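
As a minimal sketch of the two ways to build a `ColumnDataSource`, the example below uses the three Afghanistan rows shown earlier. The variable names (`afg`, `source_from_df`, `source_from_dict`) are illustrative only. When a DataFrame is passed directly, its index is included as a column too.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Three rows copied from the df.head(3) output above
afg = pd.DataFrame(
    {"fertility": [7.671, 7.671, 7.671], "life": [33.639, 34.152, 34.662]},
    index=pd.Index([1964, 1965, 1966], name="Year"),
)

# Option 1: pass the DataFrame; every column plus the index becomes a column
source_from_df = ColumnDataSource(afg)
names = source_from_df.column_names

# Option 2: pass a dict and choose the column names yourself
source_from_dict = ColumnDataSource(data={
    "x": afg["fertility"],
    "y": afg["life"],
})
```

The dict form is convenient when the glyph should refer to generic names such as `x` and `y` regardless of which DataFrame columns feed them.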

In [12]:
# Make the ColumnDataSource
source = ColumnDataSource(data={
    'x' : df.loc[1984].fertility,
    'y' : df.loc[1984].life,
    'country' : df.loc[1984].Country,
    'pop' : (df.loc[1984].population / 20000000) + 2,
    'region' : df.loc[1984].region,
})
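
The `pop` column rescales population to a small number with a floor of 2. A quick worked check of that formula, using two population values from the `df.sample(3)` output above (the helper name `marker_size` is hypothetical):

```python
def marker_size(population, scale=20000000, offset=2):
    """Apply the same scaling used for the 'pop' column."""
    return population / scale + offset

# Populations taken from the sample rows shown earlier
lesotho_1974 = marker_size(1123261.0)    # about 2.056
hong_kong_1981 = marker_size(5151920.0)  # about 2.258
```

The offset keeps even the smallest countries above a minimum value, while the divisor compresses the large spread in population.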

In [13]:
# Save the minimum and maximum values of the fertility column: xmin, xmax
# (the pandas methods skip missing values, unlike the built-in min/max)
xmin, xmax = df.fertility.min(), df.fertility.max()

# Save the minimum and maximum values of the life expectancy column: ymin, ymax
ymin, ymax = df.life.min(), df.life.max()

# Create the figure: plot
plot = figure(title='Gapminder data for 1984', plot_height=400, plot_width=700, x_range=(xmin, xmax), y_range=(ymin, ymax))

# Add circle glyphs to the plot
plot.circle(x='x', y='y', fill_alpha=0.8, source=source)

# Set the x-axis label
plot.xaxis.axis_label = 'Fertility (children per woman)'

# Set the y-axis label
plot.yaxis.axis_label = 'Life Expectancy (years)'

# Add the plot to the current document and add a title
curdoc().add_root(plot)
curdoc().title = 'Gapminder'
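
One caveat when computing axis ranges: according to `df.info()`, the `fertility` column has 11 missing values. Python's built-in `min` and `max` compare elements pairwise, and every comparison with NaN is false, so the result depends on where the NaN sits. The pandas methods skip NaN by default. A small sketch with placeholder numbers:

```python
import math
import pandas as pd

# Placeholder values with a missing entry in first position
s = pd.Series([float("nan"), 3.0, 1.5])

# Built-in min keeps the leading NaN because no value compares as smaller
builtin_min = min(s)

# Series.min skips NaN by default (skipna=True)
pandas_min = s.min()
pandas_max = s.max()
```

Here `builtin_min` is NaN while `pandas_min` is `1.5`, which is why the pandas methods are the safer choice for axis limits.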

In [21]:
# Define the callback function
def update(year):
    # Assign the value of the slider
    yr = year
    # Set new_data
    new_data = {
        'x'       : df.loc[yr].fertility,
        'y'       : df.loc[yr].life,
        'country' : df.loc[yr].Country,
        'pop'     : (df.loc[yr].population / 20000000) + 2,
        'region'  : df.loc[yr].region,
    }
    # Assign new_data to the source
    source.data = new_data

    # Add title to figure
    plot.title.text = 'Gapminder data for %d' % year

    push_notebook()
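
The dictionary built inside the callback mirrors the one used to create the source. One possible way to avoid repeating it is a small helper, sketched below. The name `year_data` is hypothetical, and the demo frame contains only two Afghanistan rows copied from the output shown earlier.

```python
import pandas as pd

def year_data(frame, year):
    """Return the column dict for one year (hypothetical helper)."""
    # A list selector keeps a DataFrame even if only one row matches
    rows = frame.loc[[year]]
    return {
        "x": rows["fertility"],
        "y": rows["life"],
        "country": rows["Country"],
        "pop": rows["population"] / 20000000 + 2,
        "region": rows["region"],
    }

# Two rows from the df.head(3) output, indexed by Year
demo = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Afghanistan"],
        "fertility": [7.671, 7.671],
        "life": [33.639, 34.152],
        "population": [10474903.0, 10697983.0],
        "region": ["South Asia", "South Asia"],
    },
    index=pd.Index([1964, 1965], name="Year"),
)

data_1965 = year_data(demo, 1965)
```

With such a helper, both the initial source and the callback could call `year_data(df, year)` and stay in sync if a column is added later.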

In [22]:
show(plot, notebook_handle=True)

In [23]:
interact(update, year=(1970, 2010, 1));


In [14]:
# Define the callback function
def update_plot(attr, old, new):
    # Assign the value of the slider
    year = slider.value
    # Set new_data
    new_data = {
        'x'       : df.loc[year].fertility,
        'y'       : df.loc[year].life,
        'country' : df.loc[year].Country,
        'pop'     : (df.loc[year].population / 20000000) + 2,
        'region'  : df.loc[year].region,
    }
    # Assign new_data to the source
    source.data = new_data

    # Add title to figure
    plot.title.text = 'Gapminder data for %d' % year

    push_notebook()

# Create a slider object
slider = Slider(start=1970, end=2010, step=1, value=1970, title='Year')

# Attach the callback to the 'value' property of slider
slider.on_change('value', update_plot)

# Make a row layout of widgetbox(slider) and plot and add it to the current document
layout = row(widgetbox(slider), plot)
curdoc().add_root(layout)
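
To see what the `(attr, old, new)` arguments of an `on_change` callback contain, the sketch below registers a recording callback on a separate slider and changes its value from Python. The names `record_change`, `seen` and `demo_slider` are illustrative. It assumes that assigning a property in Python invokes the registered callbacks directly when the model is not attached to a document.

```python
from bokeh.models import Slider

seen = []

def record_change(attr, old, new):
    # Store the arguments Bokeh passes to the callback
    seen.append((attr, old, new))

demo_slider = Slider(start=1970, end=2010, step=1, value=1970, title="Year")
demo_slider.on_change("value", record_change)

# Changing the property from Python triggers the callback
demo_slider.value = 1984
```

After the assignment, `seen` holds one tuple with the property name, the previous value and the new value. The callback in this section ignores those arguments and reads `slider.value` instead, which gives the same result as using `new`.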

In [15]:
show(plot)