# Interactive Visualizations

## Introduction

During exploratory analysis and when building dashboards or applications, data scientists often move from their analytical languages to BI tools (Tableau, MicroStrategy, etc.) or other traditional products (Excel, PowerPoint, etc.). The reasons are a lack of documentation and examples, along with limited plotting capabilities, including the absence of interactive or dynamic plots. Data visualization is critical in enabling decision makers to grasp difficult concepts, identify new patterns and hence make informed, data-driven decisions. Today, the best dashboard visualizations are those that let users tailor the view to their needs through widgets, interactors and dynamic elements.

This tutorial introduces data scientists to basic methods for creating interactive data visualizations and forms a foundation for learning about further possibilities. Currently, in Python, packages such as matplotlib, plotly and seaborn offer several features but are limited in their capabilities. Newer packages such as Bokeh and HoloViews have started to provide elegant construction of novel graphics.




## Tutorial Content

In this tutorial, we will teach you how to do some basic as well as some advanced interactive plots in Python, specifically using [Bokeh](https://bokeh.pydata.org) and [ipywidgets](https://ipywidgets.readthedocs.io).

We shall cover the following topics in this tutorial:
- [Installing Libraries](#Installing-Libraries)
- [Fundamentals](#Fundamentals)
- [Steps and Basic Plots](#Steps-and-Basic-Plots)
- [Categorical Data Plots](#Categorical-Data-Plots)
- [Graph Plots](#Graph-Plots)
- [Interactive](#Interactive)
- [Widgets](#Widgets)
- [Example application: Unemployment](#Example-application:-Unemployment)
- [Summary and References](#Summary-and-References)


## Installing Libraries

Before starting this tutorial, check whether the bokeh and ipywidgets libraries are already installed; if not, install them using `pip` or `conda`, depending on your configuration:
    
    $ conda install bokeh
    
    $ conda install ipywidgets
    
If the interactive widgets are not displayed, run the following command and then restart the Jupyter notebook on your local machine:

    $ jupyter nbextension enable --py widgetsnbextension
    
Besides the above visualization libraries, the examples also use numpy, scipy, sklearn and networkx. You can install these with `pip` or `conda` and then check that the following imports work for you:

In [185]:
# Import all necessary packages
from bokeh.plotting import figure, output_file, show
from bokeh.io import push_notebook,output_notebook
from bokeh.models.graphs import from_networkx
from bokeh.models import Range1d, Plot
from bokeh.models.graphs import NodesAndLinkedEdges
from bokeh.models import Circle, HoverTool, MultiLine
from bokeh.io import output_file, show
from bokeh.layouts import widgetbox
from bokeh.models.widgets import Button, RadioButtonGroup, Select, Slider, RadioGroup, Dropdown, TextInput
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.transform import dodge
from bokeh.core.properties import value
from bokeh.palettes import Spectral11
import networkx as nx
import numpy as np
from ipywidgets import interact
import scipy.special as spl
from sklearn import datasets
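
If one of the imports above fails, a quick check can report which packages are missing before you go further. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of any of the libraries used here.

```python
import importlib.util

# Top-level import names used in this tutorial
# (scikit-learn is imported under the name "sklearn").
REQUIRED = ["bokeh", "ipywidgets", "numpy", "scipy", "sklearn", "networkx"]

def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Anything the check lists can then be installed with `pip` or `conda` as described above.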

In [186]:
# Run the cell to display all the plots inline (in the notebook instead of exporting them)
output_notebook()

## Fundamentals

It is important to understand the context of the following key terms that we will use in the tutorial:
 - Application: A Bokeh document running in a browser.
 - Glyphs: The basic building blocks of Bokeh plots, such as lines and circles.
 - Widgets: User interface elements that let the user interact with the graph, for example by triggering computations.

## Steps and Basic Plots

Let us take an example to learn the steps to overlay a plot by plotting multiple functions:

    1) Set values of x and y axes for the functions to be plotted
    2) Call figure() to create a plot object, passing arguments which determine the labels, tools, title etc
    3) Overlay the plot object using multiple renderers and customize it according to your need (colors, sizes, legends etc)
    4) Call show() or save() by passing the plot object

It is important to understand how the `tools` argument of figure() works. Four basic, commonly used options are: `pan` lets the user move the graph by clicking and dragging it. `box_zoom` lets the user draw a box and zoom in on the selected region. `reset` restores the original view. `save` saves the current view of the plot as an image on the local machine.
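
The `tools` argument takes these names as a single comma-separated string. As a small sketch, the hypothetical helper `tools_string` below builds that string from a list and rejects anything other than the four tools discussed here. Bokeh itself accepts more tool names than these; the restriction is only an assumption made for this illustration.

```python
# The four tools discussed above, written the way they appear in the `tools` string.
BASIC_TOOLS = ("pan", "box_zoom", "reset", "save")

def tools_string(selected):
    """Join tool names into the comma-separated form passed to figure(tools=...)."""
    unknown = [name for name in selected if name not in BASIC_TOOLS]
    if unknown:
        raise ValueError(f"not one of the basic tools: {unknown}")
    return ",".join(selected)

# Build the same string that the example cell passes to figure().
print(tools_string(["pan", "box_zoom", "reset", "save"]))
```

Leaving a name out of the list simply removes that tool from the toolbar of the resulting plot.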


In [187]:
# Initialize x values
x = [0, 1, 2, 3, 4, 5, 6, 7]

# Initialize different functions to overlay
y0 = [i*2 for i in x]
y1 = [i**2.5 for i in x]
y2 = [2**i for i in x]
y3 = [i**2 for i in x]
y4 = []
for i in x:
    if i%2 == 0:
        y4.append(10)
    else:
        y4.append(5)
        
# Create figure object and overlay with customizations (width, shape, color, etc.)
p = figure(tools="pan,box_zoom,reset,save", y_range=[0, 20], title="Overlay Plots Example", x_axis_label='X', y_axis_label='Y')
p.line(x, y0, legend="y=2x")
p.circle(x, y0, legend="y=2x", size=9)
p.line(x, x, legend="y=x")
p.circle(x, x, legend="y=x", fill_color="white", size=7)
p.line(x, y1, legend="y=x^2.5", line_width=3, line_color="red")
p.line(x, y2, legend="y=2^x", line_color="green")
p.circle(x, y2, legend="y=2^x", fill_color="green", line_color="green", size=7)
p.line(x, y3, legend="y=x^2", line_color="chocolate", line_dash="5 5", line_width = 3)
p.line(x, y4, legend="y=10|5", line_color = "violet", line_width = 2)

In [189]:
# Display the Bokeh plot
show(p)
# Play around with the tool options to see how it works

## Categorical Data Plots

Let us now take a look at how to plot data in a categorical format. The most common plots for this kind of data are bars, intervals, scatters and heatmaps. For bar charts, the hbar() and vbar() glyph methods determine the orientation of the bars. When categories are nested, that is, a category has multiple levels within it, Bokeh allows us to group these levels. Another useful technique is to stack bars on top of each other to show the varying composition of each bar.

We shall take a basic example to learn how to plot categorical data by plotting multiple categorical graphs of our own data:

In [190]:
# Initialize data to be displayed
social_media = ["Facebook", "LinkedIn","Snapchat", "Instagram", "Reddit"]
years = ["2014", "2015", "2016", "2017", "2018"]
data = {'social_media': social_media, 
        '2014' : [60,30,10,20,20],
        '2015' : [60,30,40,50,20], 
        '2016' : [70,40,50,60,60], 
        '2017' : [75,40,70,70,70], 
        '2018' : [40,60,80,80,90]}
source = ColumnDataSource(data=data)

# Create figure object by specifying dimensions, title, tools etc
p = figure(x_range=social_media, y_range=(0, 120), tools="pan,box_zoom,reset,save", plot_height=300, 
           title="Social Media Analysis: Avg minutes spent by a user/day over years")

# Add vertical bars for 2014 data
p.vbar(x=dodge('social_media', -0.30, range=p.x_range), top='2014', width=0.125, source=source,
       color="red", legend=value("2014"))

# Add vertical bars for 2015 data
p.vbar(x=dodge('social_media', -0.15, range=p.x_range), top='2015', width=0.125, source=source,
       color="orange", legend=value("2015"))

# Add vertical bars for 2016 data
p.vbar(x=dodge('social_media',  0.0,  range=p.x_range), top='2016', width=0.125, source=source,
       color="yellow", legend=value("2016"))

# Add vertical bars for 2017 data
p.vbar(x=dodge('social_media',  0.15, range=p.x_range), top='2017', width=0.125, source=source,
       color="green", legend=value("2017"))

# Add vertical bars for 2018 data
p.vbar(x=dodge('social_media',  0.30, range=p.x_range), top='2018', width=0.125, source=source,
       color="blue", legend=value("2018"))

# Adjust the padding and placement of legend
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

# Display the plot
show(p)
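
The dodge offsets in the cell above (-0.30 to 0.30 in steps of 0.15) are evenly spaced and centered on zero, so the five bars sit symmetrically around each category. A minimal sketch of that arithmetic follows; `dodge_offsets` is a hypothetical helper, not a Bokeh function.

```python
def dodge_offsets(n_series, step):
    """Evenly spaced offsets centered on zero, one per series."""
    middle = (n_series - 1) / 2
    # Rounding only tidies floating-point noise in the printed values.
    return [round((i - middle) * step, 10) for i in range(n_series)]

years = ["2014", "2015", "2016", "2017", "2018"]
offsets = dodge_offsets(len(years), step=0.15)

# Pair each year with the offset its bars would be dodged by.
for year, offset in zip(years, offsets):
    print(year, offset)
```

Because the bar width used above (0.125) is smaller than the step (0.15), neighbouring bars within a category do not overlap.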

Another way to visualize the above data is a stacked bar chart. Here the bars for each level are stacked on top of each other. Stacked bar charts show the plot in a more compact format along with the composition of its parts. Let us go ahead and plot it!

In [191]:
# Create list of colors denoting varying components of the bar
shades = ['red', 'orange', 'yellow', 'green', 'blue']

# Create figure object by specifying dimensions, title, tools etc
p1 = figure(x_range=social_media, plot_height=450, title="Social Media Analysis: Avg minutes spent by a user/day over years", toolbar_location=None, tools="")

# Create a vertical stack of bars, one segment per year, along with customizations
p1.vbar_stack(years, x='social_media', width=0.75, color=shades, source=source, legend=[value(x) for x in years])
p1.y_range.start = 0
p1.x_range.range_padding = 0.1
p1.xgrid.grid_line_color = None
p1.axis.minor_tick_line_color = None
p1.outline_line_color = None
p1.legend.location = "top_right"
p1.legend.orientation = "horizontal"

# Display the plot
show(p1)
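
Conceptually, stacking places each year's segment on top of the running total of the years before it. The sketch below reproduces that running total in plain Python for the same data; `stack_tops` is a hypothetical helper written for illustration and is not how Bokeh is implemented internally.

```python
from itertools import accumulate

years = ["2014", "2015", "2016", "2017", "2018"]
data = {
    "social_media": ["Facebook", "LinkedIn", "Snapchat", "Instagram", "Reddit"],
    "2014": [60, 30, 10, 20, 20],
    "2015": [60, 30, 40, 50, 20],
    "2016": [70, 40, 50, 60, 60],
    "2017": [75, 40, 70, 70, 70],
    "2018": [40, 60, 80, 80, 90],
}

def stack_tops(data, columns):
    """For each category, the cumulative sums across `columns` (top of each segment)."""
    rows = zip(*(data[column] for column in columns))  # one tuple of values per platform
    return [list(accumulate(row)) for row in rows]

tops = stack_tops(data, years)
for name, row in zip(data["social_media"], tops):
    print(name, row)
```

The last value in each row is the total height of that bar, which is why the stacked plot needs a taller y range than the grouped plot.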

## Graph Plots

Having learnt how to plot basic plots and how to play around with the tool options, let us now learn to plot some advanced plots - graphs. Graph Plots are a great way to understand relationships between different objects and are heavily used in analyzing social networks. 

Let us take one of the well-known network graphs that ships with the networkx library, the Davis Southern Women graph, a bipartite graph linking women to the social events they attended. We shall use a shell layout, which arranges nodes on concentric circles, to make it look aesthetic and elegant. A plot makes it far easier to grasp how different items are connected than a spreadsheet table would. Run the cell below to see the power of a graph plot with a neat layout.

In [192]:
# Load the Davis Southern Club data
G = nx.davis_southern_women_graph()

# Create a basic plot object with the x,y ranges
plot = Plot(x_range=Range1d(-1.2,1.2), y_range=Range1d(-1.2,1.2))

# Decide on a layout from the networkx library and plot with customizations such as center, scale etc.
graph = from_networkx(G, nx.shell_layout, scale=1, center=(0,0))
plot.renderers.append(graph)
show(plot)
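
To see why the plot ranges above run from -1.2 to 1.2, it helps to look at what a circular layout computes: each node gets an angle, and its position lies on a circle of radius `scale` around `center`. The sketch below is a simplified stand-in written for illustration. It is not networkx's implementation, and the node names are hypothetical.

```python
import math

def circle_layout(nodes, scale=1.0, center=(0.0, 0.0)):
    """Place nodes at equal angles on a circle of radius `scale` around `center`."""
    n = len(nodes)
    cx, cy = center
    return {
        node: (cx + scale * math.cos(2 * math.pi * k / n),
               cy + scale * math.sin(2 * math.pi * k / n))
        for k, node in enumerate(nodes)
    }

# Hypothetical node names, only to show the shape of the result.
positions = circle_layout(["A", "B", "C", "D"], scale=1, center=(0, 0))
for node, (px, py) in positions.items():
    print(node, round(px, 3), round(py, 3))
```

With `scale=1` every coordinate stays within -1 and 1, so a range of -1.2 to 1.2 leaves a small margin around the outermost nodes.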

## Interactive

The problem with the above graph plot is that the connections between items become hard to see as we keep adding more items and edges. The original neat construction starts to become a mess. Moreover, with a different layout it can become impossible to identify the relationships.

Bokeh enables the user to select, highlight and hover over an item of interest. It allows users to make every type of plot interactive by adding renderers and interactive elements. The graph plot can be made interactive by adding node and edge renderers along with hovering capabilities. Let us recreate the Davis Southern Women graph with these features:

In [193]:
# Load the Davis Southern Club data
G = nx.davis_southern_women_graph()

# Create a basic plot object with the x,y ranges, layout and other customizations
plot = Plot(x_range=Range1d(-1.2,1.2), y_range=Range1d(-1.2,1.2))
graph = from_networkx(G, nx.shell_layout, scale=1, center=(0,0))

# To make it interactive, we add renderers.glyph
# Add customizations of how renderers are displayed along with how they will act when provided with hovering features
plot.renderers.append(graph)
graph.node_renderer.glyph = Circle(size=15, fill_color='red')
graph.edge_renderer.glyph = MultiLine(line_color="lightblue", line_alpha=0.7, line_width=2)
graph.node_renderer.hover_glyph = Circle(size=15, fill_color='red')
graph.edge_renderer.hover_glyph = MultiLine(line_color='blue', line_width=3)
graph.inspection_policy = NodesAndLinkedEdges()
plot.add_tools(HoverTool())
show(plot)
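
The `NodesAndLinkedEdges` policy used above means that hovering over a node inspects that node together with the edges attached to it. The selection logic can be sketched in a few lines of plain Python; the helper `linked_edges` and the tiny edge list are hypothetical and are not taken from the Davis graph.

```python
def linked_edges(edges, node):
    """Return the edges that have `node` as one of their two endpoints."""
    return [edge for edge in edges if node in edge]

# A tiny made-up graph: three people attending two events.
edges = [("ann", "event1"), ("ann", "event2"), ("bob", "event1"), ("cat", "event2")]

# Hovering over "ann" would highlight these edges.
print(linked_edges(edges, "ann"))
# Hovering over "event1" would highlight these instead.
print(linked_edges(edges, "event1"))
```

In the plot, the highlighted edges are drawn with the `hover_glyph` set on the edge renderer, which is why they turn a darker blue.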

## Widgets

Widgets are HTML objects such as filters, dropdowns, radio buttons and sliders that help with selections. Bokeh allows users to make graphs dynamic and interactive by adding one or more widgets. These are grouped in a widget box, created with widgetbox(), so that they work together.

The layout functions enable you to arrange the plots in an orderly manner along with the placement of the widget box. Use the row() and column() calls to arrange them in a row or in a column. Another important guideline is to make sure that all the objects in a row or column are the same size.

Let us take an example of a widgetbox which contains a dropdown to select a function along with 3 sliders which help modify the selected function in terms of amplitude, width and phase. Run the below cell and play around with the widget box to see how the graph changes:


In [194]:
# Load x,y range values and function to be plotted
x = np.linspace(-10, 10, 100)
y = 5*np.sinc(x)

# Create figure object by specifying dimensions, title, tools etc
p = figure(title="Example of Plots using Widgets", plot_height=300, plot_width=500, y_range=(-20,20))
r = p.line(x, y, color="blue", line_width=4)

In [195]:
# Write update function to display selected function along with desired parameters
def update(f, w=1, A=5, phi=0):
    if f == "sinc": 
        func = np.sinc
        r.data_source.data['y'] = A * func(w * x + phi)
    elif f == "bessel":  
        r.data_source.data['y'] = 5 * A * spl.jv(2,w*x + phi) 
    elif f == "sin": 
        func = np.sin
        r.data_source.data['y'] = A * func(w * x + phi)
    elif f == "cos": 
        func = np.cos
        r.data_source.data['y'] = A * func(w * x + phi)
    elif f == "tan": 
        func = np.tan
        r.data_source.data['y'] = A * func(w * x + phi)
    elif f == "sinh": 
        func = np.sinh
        r.data_source.data['y'] = A * func(w * x + phi)
    elif f == "cosh": 
        func = np.cosh
        r.data_source.data['y'] = A/10 * func(w * x + phi)
    elif f == "tanh": 
        func = np.tanh
        r.data_source.data['y'] = A * func(w * x + phi)
    push_notebook()

In [196]:
# Display the plot
show(p, notebook_handle=True)

In [197]:
# Play around with the widgets by selecting different functions and tuning parameters! 
interact(update, f=["sinc", "bessel", "sin", "cos", "tan", "sinh", "cosh", "tanh"], w=(1,20), A=(5,10), phi=(-20, 20, 0.5))

<function __main__.update>
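
The `update` function above repeats the same assignment in almost every branch. One way to shorten it is a dispatch table that maps each dropdown label to a function. The sketch below uses the standard `math` module instead of NumPy so that it runs without extra dependencies. It leaves out the Bessel option and the extra scaling applied to `cosh`, and the names `FUNCS` and `evaluate` are illustrative.

```python
import math

def sinc(t):
    """Normalized sinc, sin(pi*t) / (pi*t), the same convention as np.sinc."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

# Dropdown label -> function of one variable.
FUNCS = {
    "sinc": sinc,
    "sin": math.sin,
    "cos": math.cos,
    "tan": math.tan,
    "sinh": math.sinh,
    "cosh": math.cosh,
    "tanh": math.tanh,
}

def evaluate(f, xs, w=1, A=5, phi=0):
    """Compute A * func(w*x + phi) for every x, with func looked up by name."""
    func = FUNCS[f]
    return [A * func(w * x + phi) for x in xs]

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(evaluate("cos", xs, w=2, A=5, phi=0))
```

In the notebook, `update` would assign the result to `r.data_source.data['y']` and then call `push_notebook()`, exactly as the original function does.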

## Example application: Unemployment

Let us apply a few of the techniques learnt above in a way you may find useful as a data scientist. In the initial stages of a project, when we want to understand a dataset better, we start off with exploratory analysis.

We take a preloaded dataset, 'unemployment1948', which contains monthly unemployment rates from 1948 through 2016. Let us first understand the data and then create a basic interactive graph to explore it further.

In [198]:
# Load the unemployment data from the bokeh library
from bokeh.sampledata.unemployment1948 import data

In [199]:
# Let us see how the data looks by printing out the first 5 rows
print(data.head())

   Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
0  1948  4.0  4.7  4.5  4.0  3.4  3.9  3.9  3.6  3.4  2.9  3.3  3.6
1  1949  5.0  5.8  5.6  5.4  5.7  6.4  7.0  6.3  5.9  6.1  5.7  6.0
2  1950  7.6  7.9  7.1  6.0  5.3  5.6  5.3  4.1  4.0  3.3  3.8  3.9
3  1951  4.4  4.2  3.8  3.2  2.9  3.4  3.3  2.9  3.0  2.8  3.2  2.9
4  1952  3.7  3.8  3.3  3.0  2.9  3.2  3.3  3.1  2.7  2.4  2.5  2.5


In [200]:
# Convert Year to string format and set it as the index
data['Year'] = data['Year'].astype(str)
# Uncomment the next line only if your copy of the data has an 'Annual' column to drop
# data.drop('Annual', axis = 1, inplace = True)
data = data.set_index('Year')
data.head()

      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Year
1948  4.0  4.7  4.5  4.0  3.4  3.9  3.9  3.6  3.4  2.9  3.3  3.6
1949  5.0  5.8  5.6  5.4  5.7  6.4  7.0  6.3  5.9  6.1  5.7  6.0
1950  7.6  7.9  7.1  6.0  5.3  5.6  5.3  4.1  4.0  3.3  3.8  3.9
1951  4.4  4.2  3.8  3.2  2.9  3.4  3.3  2.9  3.0  2.8  3.2  2.9
1952  3.7  3.8  3.3  3.0  2.9  3.2  3.3  3.1  2.7  2.4  2.5  2.5


Neat!

In [201]:
# Let us see the last few rows of the dataset as well
data.tail()

      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Year
2012  8.8  8.7  8.4  7.7  7.9  8.4  8.6  8.2  7.6  7.5  7.4  7.6
2013  8.5  8.1  7.6  7.1  7.3  7.8  7.7  7.3  7.0  7.0  6.6  6.5
2014  7.0  7.0  6.8  5.9  6.1  6.3  6.5  6.3  5.7  5.5  5.5  5.4
2015  6.1  5.8  5.6  5.1  5.3  5.5  5.6  5.2  4.9  4.8  4.8  4.8
2016  5.3  5.2  5.1  4.7  4.5  5.1  5.1  5.0  4.8  4.7  4.4  4.5


Let us create a graph that will plot unemployment line plots for each year. Let us overlay them to compare each of them against each other by using the techniques we learnt earlier.

In [202]:
# Use the first year's rates as the initial y values and the month names as the x values
y = data.iloc[0]
x = list(data.columns)
# Create figure object by specifying dimensions, title, tools etc
p1 = figure(tools="pan,box_zoom,reset,save", y_range=[2, 12], plot_height=300, plot_width=500, x_range = x)
# Loop to add line plot of each year to the plot object
for index, row in data.iterrows():
    p1.line(x, row)
# Display the graph
show(p1)

The above plot looks pretty messy because of the large number of rows (years from 1948 through 2016) it contains. It would be immensely helpful to have a slider to step through the years and see how the unemployment rate has changed over time. Let us dive in!

In [203]:
# Create figure object by specifying dimensions, title, tools etc
p = figure(tools="pan,box_zoom,reset,save", y_range=[2, 13], plot_height=300, plot_width=500, x_range = x)
r = p.line(x, y)

In [204]:
def update(year = 1948):
    # Rows are ordered by year starting at 1948, so the offset from 1948 is the row position
    r.data_source.data['y'] = data.iloc[year - 1948]
    push_notebook()

In [205]:
show(p, notebook_handle=True)

In [206]:
interact(update, year = (1948,2016))

<function __main__.update>
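
The slider works because `update` turns a year into a row position by subtracting 1948. That only holds while the rows are contiguous and start at 1948, as they do in the output shown earlier. A small sketch of the same mapping with an explicit bounds check follows; the helper name `row_index_for_year` is hypothetical.

```python
FIRST_YEAR, LAST_YEAR = 1948, 2016

def row_index_for_year(year, first_year=FIRST_YEAR, last_year=LAST_YEAR):
    """Map a calendar year to its zero-based row position in the table."""
    if not first_year <= year <= last_year:
        raise ValueError(f"year must be between {first_year} and {last_year}, got {year}")
    return year - first_year

# The slider range used above covers exactly the valid years.
print(row_index_for_year(1948))
print(row_index_for_year(2016))
```

Since the index was set to the year as a string earlier, looking a row up by label, for example `data.loc[str(year)]`, is an alternative that does not depend on row order.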

Sometimes, static charts can also be extremely powerful for visualizing data. To show how values vary, we can use different shades of color. A heatmap is a graphical representation of data that uses color coding to represent different values.

We can use heatmaps to visualize the above unemployment data. A snapshot of an advanced heatmap plot created with Bokeh is shown below. Using the features of Bokeh, we can improve it further by adding widgets and hovering capabilities to make it interactive. This is just one example of how powerful interactive visualization can be!
Source: https://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html

![Heatmap of the unemployment data](unemployment.png "Unemployment heatmap")
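
The core of a heatmap is a rule that turns each value into a color. The sketch below shows one simple version of that rule: the range between `low` and `high` is split into equal-width bins, one per palette color. The function `color_for_value`, the five hex colors and the 2 to 12 range are all assumptions made for this illustration. In Bokeh, a mapper such as `LinearColorMapper` plays this role.

```python
def color_for_value(value, palette, low, high):
    """Pick the palette color whose equal-width bin within [low, high] contains value."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)      # values outside the range use the end colors
    fraction = (clamped - low) / (high - low)
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]

# Illustrative light-to-dark palette (not taken from the gallery example).
palette = ["#ffffcc", "#fed976", "#fd8d3c", "#e31a1c", "#800026"]

# Color a few monthly rates, assuming rates fall between 2 and 12 percent.
for rate in [2.9, 5.3, 7.0, 8.8]:
    print(rate, color_for_value(rate, palette, low=2, high=12))
```

Lower rates map to the lighter colors and higher rates to the darker ones, which is what makes periods of high unemployment stand out in a heatmap.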

## Summary and References

The above tutorial is just a basic introductory guide to interactive visualizations using Bokeh in Python. The possibilities for building advanced, creative applications are endless. You can find more details about Bokeh's features, and about visualization in general, at the following links:

1. Bokeh: https://bokeh.pydata.org/en/latest/
2. Matplotlib: https://matplotlib.org/
3. Pygal: http://www.pygal.org/en/latest/index.html
4. Seaborn: http://seaborn.pydata.org/index.html
5. Plotly: https://plot.ly/python/