# SI 370 - Visualization II

## Interactive Visualization using Bokeh

* round out our understanding of visualization in Python
  * matplotlib
  * seaborn
  * bokeh

## Bokeh

* part of [pydata stack](https://pydata.org/)
* consists of two cooperating libraries: BokehJS (javascript) and bokeh (python)
* you program in python, bokeh communicates via JSON with BokehJS running in browser
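
To make the Python-to-BokehJS handoff concrete, here is a minimal sketch that serializes a small plot to the JSON that BokehJS would consume. It assumes a Bokeh version that provides `bokeh.embed.json_item`; the target id `"myplot"` is a hypothetical name for the HTML element that would host the plot.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# build a small plot in Python
p = figure(width=300, height=300, title="Serialization sketch")
p.line([1, 2, 3], [4, 6, 5])

# json_item() returns a plain dict describing the plot for BokehJS;
# "myplot" is a hypothetical id of the <div> that would host the plot
item = json_item(p, "myplot")
payload = json.dumps(item)

print(sorted(item.keys()))
print(len(payload), "characters of JSON")
```

In a notebook you rarely call this yourself: `show()` does the equivalent work behind the scenes.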


## Bokeh
* https://bokeh.pydata.org/en/latest/
* Gallery: https://bokeh.pydata.org/en/latest/docs/gallery.html (note: we will not be covering Server Apps in this course)

In [1]:
# based on https://realpython.com/python-data-visualization-bokeh/
from bokeh.plotting import figure
import numpy as np
import pandas as pd
from bokeh.io import show, output_notebook

## 1. Opening a template with Bokeh

In [2]:
# open empty template
fig = figure()
show(fig)



In [3]:
# the previous cell didn't render inline: call output_notebook() so Bokeh displays plots inside Jupyter
output_notebook()

In [4]:
# now open it again
fig = figure()
show(fig)
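
Outside Jupyter, the same figure can be written to a standalone HTML file instead of being rendered inline. The sketch below assumes `bokeh.io.save` and the `CDN` resources object are available in your Bokeh version; the temporary directory and file name are hypothetical.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

fig = figure(width=400, height=400)
fig.line([1, 2, 3], [1, 4, 9])  # one renderer so the figure is not empty

# hypothetical output location in a temporary directory
html_path = os.path.join(tempfile.mkdtemp(), "standalone_figure.html")

# write a self-contained HTML page that loads BokehJS from the CDN
save(fig, filename=html_path, resources=CDN, title="Standalone figure")

print(os.path.exists(html_path))
```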



## 2. Creating a scatter plot with Bokeh

In [5]:
# create a new plot with default tools, using figure
p = figure(plot_width=400, plot_height=400)

# add a circle renderer with x and y coordinates, size, color, and alpha
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=15, line_color="navy", fill_color="green", fill_alpha=0.5)

show(p) # show the results

In [6]:
# create a new plot using different shapes
p = figure(plot_width=400, plot_height=400)

# add a circle_x renderer with per-point sizes, a color, and alpha
p.circle_x([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=[10, 15, 20, 25, 30], color="firebrick", alpha=0.6)

show(p) # show the results

There are many different marker types available in Bokeh:

- asterisk()

- circle()

- circle_cross()

- circle_x()

- cross()

- diamond()

- diamond_cross()

- hex()

- inverted_triangle()

- square()

- square_cross()

- square_x()

- triangle()

- x()
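
As an alternative to calling one method per marker, the marker type can be chosen by name. This is a minimal sketch, assuming a Bokeh version whose `scatter()` accepts a `marker` argument; the data values and the four marker names picked here are only for illustration.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
marker_names = ["asterisk", "circle_x", "hex", "triangle"]

p = figure(width=400, height=400)

# draw one row of points per marker type, offset vertically
for row, name in enumerate(marker_names):
    y = [row + 1] * len(x)
    p.scatter(x, y, marker=name, size=12, alpha=0.6)

print(len(p.renderers))
```

Call `show(p)` afterwards to see the four rows of markers.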

### <font color="magenta">Exercise 1: Re-create the above plot using a marker type of your choice.</font>


In [7]:
# create a new plot using different shapes
p = figure(plot_width=400, plot_height=400)

# add a hex renderer with per-point sizes, a color, and alpha
p.hex([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=[10, 15, 20, 25, 30], color="purple", alpha=0.6)

show(p) # show the results

## 3. Creating a line plot with Bokeh

In [8]:
# create a new plot (with a title) using figure
p = figure(plot_width=400, plot_height=400, title="My Line Plot")

# add a line renderer
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)

show(p) # show the results

In [9]:
# combine both line and circle plots

# set up some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 7, 3]

# create a new plot with figure
p = figure(plot_width=400, plot_height=400)

# add both a line and circles on the same plot
p.line(x, y, line_width=2)
p.circle(x, y, fill_color="white", size=8)

show(p) # show the results

## 4. Using the Hover function in Bokeh

In [10]:
# start by loading the iris dataset
import seaborn as sns
df = sns.load_dataset('iris')
print(df['species'].unique())
df.head()

['setosa' 'versicolor' 'virginica']


Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [11]:
p = figure(plot_width=400, plot_height=400)
p.circle('petal_length', 'petal_width', source=df)
show(p)

In [12]:
# divide into different colors
fig = figure(plot_width=400, plot_height=400)
fig.circle('petal_length', 'petal_width', source=df[df.species=='setosa'],color='red')
fig.circle('petal_length', 'petal_width', source=df[df.species=='versicolor'],color='blue')
fig.circle('petal_length', 'petal_width', source=df[df.species=='virginica'],color='green')
show(fig)

In [13]:
# Specify the selection tools to be made available
select_tools = ['box_select', 'lasso_select', 'poly_select', 'tap', 'reset']

# Create the figure
fig = figure(plot_height=400,
             plot_width=600,
             x_axis_label='Petal length',
             y_axis_label='Petal width',
             title='Petal length vs width',
             toolbar_location='below',
             tools=select_tools)

In [14]:
# plot all three species, each with its own color and legend entry
fig.circle('petal_length', 'petal_width', source=df[df.species=='setosa'],color='red',size=8,legend="setosa")
fig.circle('petal_length', 'petal_width', source=df[df.species=='versicolor'],color='blue',size=8,legend="versicolor")
fig.circle('petal_length', 'petal_width', source=df[df.species=='virginica'],color='green',size=8,legend='virginica')
show(fig)

In [15]:
from bokeh.models.tools import HoverTool

In [16]:
# Format the tooltip so that when hovered, it gives information of the following values
tooltips = [
            ('Species','@species'),
            ('Petal length', '@petal_length'),
            ('Petal width', '@petal_width')
           ]

# Add the HoverTool to the figure
fig.add_tools(HoverTool(tooltips=tooltips))

# Visualize
show(fig)
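
A hover tool can also be requested when the figure is created. The sketch below assumes a Bokeh version in which `figure()` accepts a `tooltips` argument and tooltip fields accept a `{0.0}` number format; the three data rows are hypothetical stand-ins for the iris dataframe.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# hypothetical stand-in rows with the same column names as the iris data
source = ColumnDataSource(data=dict(
    species=["setosa", "versicolor", "virginica"],
    petal_length=[1.4, 4.7, 6.0],
    petal_width=[0.2, 1.4, 2.5]))

tooltips = [
    ("Species", "@species"),
    ("Petal length", "@petal_length{0.0}"),  # one decimal place
    ("Petal width", "@petal_width{0.0}"),
]

# passing tooltips here asks figure() to add a HoverTool for us
p = figure(width=400, height=400, tooltips=tooltips)
p.scatter("petal_length", "petal_width", source=source, size=8)

hover_tools = [t for t in p.toolbar.tools if isinstance(t, HoverTool)]
print(len(hover_tools))
```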

## 5. Adding labels to data points

In [17]:
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, LabelSet

source = ColumnDataSource(data=dict(
    temp=[166, 171, 172, 168, 174, 162],
    pressure=[165, 189, 220, 141, 260, 174],
    names=['A', 'B', 'C', 'D', 'E', 'F']))

p = figure(x_range=(160, 175))
p.scatter(x='temp', y='pressure', size=8, source=source)
p.xaxis.axis_label = 'Temperature (C)'
p.yaxis.axis_label = 'Pressure (lbs)'

labels = LabelSet(x='temp', y='pressure', text='names', level='glyph',
                  x_offset=5, y_offset=5, source=source, render_mode='canvas')


p.add_layout(labels)

show(p)

Read the wine quality CSV file from last time:

In [18]:
df_wine = pd.read_csv('data/winequality-red.csv')

FileNotFoundError: [Errno 2] File b'data/winequality-red.csv' does not exist: b'data/winequality-red.csv'

Take a subsample of 10 wines:

In [None]:
df_wine = df_wine.sample(10)

In [None]:
print(df_wine.shape)
df_wine.head(1)

Set up a "color" column based on the numerical "quality" column:

In [None]:
from bokeh.palettes import RdYlBu10

f = lambda x: RdYlBu10[x]
df_wine['color'] = df_wine['quality'].map(f)
df_wine.head(1)
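
The color mapping relies on `RdYlBu10` being a sequence of ten hex color strings, so an integer quality score can be used directly as an index. A small sketch with hypothetical quality values (it only works for scores between 0 and 9):

```python
import pandas as pd
from bokeh.palettes import RdYlBu10

# hypothetical quality scores; the real ones come from df_wine['quality']
quality = pd.Series([3, 5, 6, 8], name="quality")

# each score selects the palette entry at that position
colors = quality.map(lambda q: RdYlBu10[q])

print(len(RdYlBu10))
print(colors.tolist())
```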

### <font color="magenta">Exercise 2: Using the subsampled wine dataframe, create a plot of pH on the y-axis vs. fixed acidity on the x-axis.  Label each point with the numerical "quality" of the wine.  Label your axes.</font>

In [None]:
### how i did this initially
# source = ColumnDataSource(data=dict(
#     acid = df_wine['fixed acidity'],
#     ph = df_wine['pH'],
#     qual = df_wine['quality']))

# p = figure(x_range=(6, 12))
# p.scatter(x='acid', y='ph', size=8, source=source)
# p.xaxis.axis_label = 'Fixed Acidity'
# p.yaxis.axis_label = 'pH'

# labels = LabelSet(x='acid', y='ph', text='qual', level='glyph',
#                   x_offset=5, y_offset=5, source=source, render_mode='canvas')


# p.add_layout(labels)

# show(p)


### another way to do this, a cleaner way
source = ColumnDataSource(data=dict(df_wine))

p = figure(title='pH vs. Fixed Acidity of Wine')
p.scatter(x='fixed acidity', y='pH', size=8, source=source, color='color')
p.xaxis.axis_label = 'Fixed Acidity'
p.yaxis.axis_label = 'pH'

labels = LabelSet(x='fixed acidity', y='pH', text='quality', level='glyph',
                  x_offset=5, y_offset=5, source=source, render_mode='canvas')


p.add_layout(labels)

show(p)

## 7. Multiple plots in a row

In [None]:
from bokeh.layouts import row

x = list(range(11))
y0, y1, y2 = x, [10-i for i in x], [abs(i-5) for i in x]

# create a new plot
s1 = figure(width=250, height=250)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# create another one
s2 = figure(width=250, height=250)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# create yet another
s3 = figure(width=250, height=250)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

# show the results in a row
show(row(s1, s2, s3))


In [None]:
from bokeh.layouts import gridplot

# create a new plot
s1 = figure(width=250, height=250)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# create another one
s2 = figure(width=250, height=250)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# create yet another
s3 = figure(width=250, height=250)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

# put all the plots in a gridplot
p = gridplot([[s1, s2], [s3, None]], toolbar_location=None)

# show the results
show(p)

### <font color="magenta">Exercise 3: Create a 1x2 grid using "fixed acidity" as the horizontal axis and "pH" and "chlorides" for the y-axes.  In other words, one scatterplot for "pH vs. fixed acidity" and one for "chlorides vs. fixed acidity".</font>

In [None]:
# left plot: pH vs. fixed acidity
s = figure(title='pH vs. Fixed Acidity of Wine', height=300, width=300)
s.scatter(x='fixed acidity', y='pH', size=8, source=source, color='color')
s.xaxis.axis_label = 'Fixed Acidity'
s.yaxis.axis_label = 'pH'

# right plot: chlorides vs. fixed acidity (same horizontal axis, as the exercise asks)
s2 = figure(title='Chlorides vs. Fixed Acidity of Wine', height=300, width=300)
s2.scatter(x='fixed acidity', y='chlorides', size=8, source=source, color='color')
s2.xaxis.axis_label = 'Fixed Acidity'
s2.yaxis.axis_label = 'Chlorides'


# put all the plots in a gridplot
p = gridplot([[s, s2]], toolbar_location=None)

# show the results
show(p)

## 8. Linked plots

In [None]:
from bokeh.layouts import gridplot

x = list(range(10))
y0, y1, y2 = x, [10-i for i in x], [abs(i-5) for i in x]

plot_options = dict(width=250, height=250, tools='pan,wheel_zoom')

# create a new plot
s1 = figure(**plot_options)
s1.circle(x, y0, size=10, color="navy")

# create a new plot and share both ranges
s2 = figure(x_range=s1.x_range, y_range=s1.y_range, **plot_options)
s2.triangle(x, y1, size=10, color="firebrick")

# create a new plot and share only one range
s3 = figure(x_range=s1.x_range, **plot_options)
s3.square(x, y2, size=10, color="olive")

p = gridplot([[s1, s2, s3]])

# show the results
show(p)
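
Linking works because the figures hold the very same range object rather than copies of it. The sketch below checks this with identity comparisons; it is an illustration of the idea and assumes `figure()` keeps a range passed in as-is.

```python
from bokeh.plotting import figure

s1 = figure(width=250, height=250)

# share both ranges with s1
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range)

# share only the x range with s1
s3 = figure(width=250, height=250, x_range=s1.x_range)

print(s2.x_range is s1.x_range, s2.y_range is s1.y_range)
print(s3.x_range is s1.x_range, s3.y_range is s1.y_range)
```

Because `s3` gets its own y range, zooming vertically in `s1` leaves `s3` unchanged, while horizontal pans move all three plots together.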

In [None]:
from bokeh.models import ColumnDataSource

x = list(range(-20, 21))
y0, y1 = [abs(xx) for xx in x], [xx**2 for xx in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "box_select,lasso_select,help"

# create a new plot and add a renderer
left = figure(tools=TOOLS, width=300, height=300)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, width=300, height=300)
right.circle('x', 'y1', source=source)

p = gridplot([[left, right]])

show(p)

### <font color="magenta">Exercise 4: Link the two scatterplots from the previous exercise</font>

In [None]:
s = figure(title='pH vs. Fixed Acidity of Wine', height=300, width=300)
s.scatter(x='fixed acidity', y='pH', size=8, source=source, color='color')
s.xaxis.axis_label = 'Fixed Acidity'
s.yaxis.axis_label = 'pH'

# share the x range (fixed acidity) so that panning or zooming one plot moves the other
s2 = figure(title='Chlorides vs. Fixed Acidity of Wine', height=300, width=300, x_range=s.x_range)
s2.scatter(x='fixed acidity', y='chlorides', size=8, source=source, color='color')
s2.xaxis.axis_label = 'Fixed Acidity'
s2.yaxis.axis_label = 'Chlorides'


# put all the plots in a gridplot
p = gridplot([[s, s2]], toolbar_location=None)

# show the results
show(p)

## 9. Custom widget (slider plots)
* this involves JavaScript: if you're not into JS then just be aware that this functionality exists

In [None]:
from bokeh.layouts import column
from bokeh.models import CustomJS, ColumnDataSource, Slider

x = [x*0.005 for x in range(0, 201)]

source = ColumnDataSource(data=dict(x=x, y=x))

plot = figure(plot_width=400, plot_height=400)
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

slider = Slider(start=0.1, end=6, value=1, step=.1, title="power")

update_curve = CustomJS(args=dict(source=source, slider=slider), code="""
    var data = source.data;
    var f = slider.value;
    var x = data['x'];
    var y = data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = Math.pow(x[i], f);
    }

    // necessary because we mutated source.data in-place
    source.change.emit();
""")
slider.js_on_change('value', update_curve)


show(column(slider, plot))

## 10. Bar plots in Bokeh

In [None]:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
fruits_df = pd.DataFrame(dict(fruits=fruits, counts=counts, color=Spectral6))

source = ColumnDataSource(fruits_df)

p = figure(x_range=fruits_df.fruits, plot_height=250, y_range=(0, 9), title="Fruit Counts")
p.vbar(x='fruits', top='counts', width=0.9, color='color', legend="fruits", source=source)

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"

show(p)

## Note the following examples do not use pandas dataframes:

In [None]:
# grouped bar charts

from bokeh.models import FactorRange

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
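
The `sum(zip(...), ())` idiom in the cell above is terse, so here it is in isolation using the same fruit data and no plotting. The `itertools.chain` variant is an equivalent alternative added for comparison, not part of the original example.

```python
from itertools import chain

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

# one (fruit, year) pair per bar, grouped by fruit
x = [(fruit, year) for fruit in fruits for year in years]

# zip() yields one (2015, 2016, 2017) tuple per fruit;
# sum(..., ()) concatenates those tuples into one flat tuple
counts = sum(zip(data['2015'], data['2016'], data['2017']), ())

# equivalent flattening with itertools.chain
counts_chain = tuple(chain.from_iterable(zip(*(data[year] for year in years))))

print(x[:3])       # [('Apples', '2015'), ('Apples', '2016'), ('Apples', '2017')]
print(counts[:6])  # (2, 5, 3, 1, 3, 2)
print(counts == counts_chain)  # True
```

The flat `counts` tuple lines up element by element with `x`, which is what lets `vbar` match each count to its (fruit, year) factor.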

In [None]:
from bokeh.transform import factor_cmap

p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year")

p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",

       # use the palette to colormap based on the x[1:2] values
       fill_color=factor_cmap('x', palette=['firebrick', 'olive', 'navy'], factors=years, start=1, end=2))

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

Let's take a look at a more advanced application of bokeh:

https://rebeccabilbro.github.io/interactive-viz-bokeh/

