DSC160 Data Science and the Arts - Twomey - Spring 2020 - [dsc160.roberttwomey.com](http://dsc160.roberttwomey.com)

## Bokeh for Interactive Plots

This notebook shows you how to make interactive plots in Jupyter using Bokeh.

What is Bokeh?

> Bokeh is an interactive visualization library for modern web browsers. 
> It provides elegant, concise construction of versatile graphics, and 
> affords high-performance interactivity over large or streaming datasets.
> Bokeh can help anyone who would like to quickly and easily make 
> interactive plots, dashboards, and data applications.

This notebook walks you through installing bokeh on your JupyterHub instance, then gives a few plotting examples:
- creating line graphs
- setting the theme (dark mode)
- creating scatter plots and a histogram
- using hover tooltips to examine individual values within the graph

Bokeh could be an ideal package to make an interactive visualization, like Jason Bailey's plot of complexity score vs. time in Mondrian's work. It is very well documented: https://docs.bokeh.org/en/latest/index.html

## Setup

Uncomment and run this once to upgrade the previously installed version of bokeh (or to create a fresh install):

In [None]:
#!pip install --user bokeh --upgrade

What version are we running?

In [None]:
import bokeh
bokeh.__version__ # Should be > 2.0
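
The comparison in the comment can also be made programmatically. The sketch below is an addition, not part of the original notebook: `major_minor` is a hypothetical helper that reads only the first two components of the version string, and the check is skipped if bokeh cannot be imported.

```python
def major_minor(version):
    """Return (major, minor) as ints from a version string such as '2.0.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

try:
    import bokeh
    ok = major_minor(bokeh.__version__) >= (2, 0)
    print(bokeh.__version__, "OK" if ok else "too old, run the pip cell above")
except ImportError:
    print("bokeh is not installed, run the pip cell above")
```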

Take care of imports:

In [None]:
from bokeh.io import output_notebook, curdoc, reset_output
from bokeh.plotting import figure, output_file, show, ColumnDataSource

import numpy as np
import scipy.special

And do some initial setup:

In [None]:
# used to standardize plot size below
GLOBAL_WIDTH = 800
GLOBAL_HEIGHT = 600

# display notebooks inline
output_notebook() 

## Line Plot

Let's create a simple line plot. You can customize the x and y axis labels, title the plot, add a legend label, or make other changes.

In [None]:
# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# optional: also write the plot to a static HTML file
# output_file("lines.html")

# display inline
output_notebook()

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)

Uncomment the `output_file()` call above to also save the plot as an HTML file, then open it in your browser. Does it look as expected?
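
If you would rather write the file without changing the notebook's output mode, `bokeh.io.save` accepts an explicit filename. A minimal sketch, assuming the same data as the cell above; the `lines.html` filename matches the commented-out call, and loading BokehJS from the CDN is a choice made here for illustration:

```python
from pathlib import Path

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# same data and styling as the line plot above
p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], legend_label="Temp.", line_width=2)

# write a standalone HTML page; save() returns the path of the file it wrote
saved = save(p, filename="lines.html", resources=CDN, title="simple line example")
print(saved, Path(saved).stat().st_size, "bytes")
```

Because the filename is passed directly, this does not call `output_file()` and leaves inline display untouched.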

Now reset the output format to display inline:

In [None]:
reset_output()

In [None]:
output_notebook()
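
As a sketch of the customizations mentioned at the start of this section, the example below draws two lines with separate legend entries and moves the legend. The second data series (`y_other`), the colors, and the plot size are made up for illustration.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y_temp = [6, 7, 2, 4, 5]
y_other = [3, 4, 6, 5, 1]  # made-up second series, for illustration only

p = figure(title="two line example", x_axis_label="x", y_axis_label="y",
           width=600, height=300)

# each call adds one line renderer with its own legend entry
p.line(x, y_temp, legend_label="Temp.", line_width=2)
p.line(x, y_other, legend_label="Other", line_width=2,
       line_color="firebrick", line_dash="dashed")

# legend and title properties can be changed after the renderers exist
p.legend.location = "top_left"
p.title.text_font_size = "14pt"
```

Call `show(p)` in the notebook to display the result.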

## Line Plot with Dark Mode

If you are using Jupyter in JupyterLab's dark mode, you may want plots with a dark background to match. Set `curdoc().theme` to one of Bokeh's built-in themes, such as `dark_minimal`:

In [None]:
curdoc().theme = 'dark_minimal'

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output to static HTML file
# output_file("lines.html") # also rendering to file

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)

Switch back to a light theme (`caliber` is one of Bokeh's built-in themes). I'll show you dark mode another day.

In [None]:
curdoc().theme = 'caliber'
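
To see which theme names `curdoc().theme` will accept in your installed version, you can list the built-in themes. This is a small sketch; the exact set of names may differ between Bokeh versions.

```python
from bokeh.themes import built_in_themes

# built_in_themes maps theme names to Theme objects
for name in sorted(built_in_themes):
    print(name)
```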

## Scatter Plot

Bokeh can produce scatter plots. Imagine recreating your plots from earlier exercises (mean features for x and y), but being able to interact with them. 

The example below generates N random data points (x and y) and plots them in red. Notice the toolbar on the right-hand side of the plot: it includes tools to pan the plot, box (marquee) zoom, scroll zoom, save a still image to file, and reset the plot settings. You can also add other tools or create your own.

In [None]:
# generate data
N = 120
x = np.random.random(size=N)
y = np.random.random(size=N)

# plotting
# width/height replace plot_width/plot_height, which were removed in Bokeh 3.0
p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)

# add a circle renderer with a size, color, and alpha
p.circle(x, y, size=5, color="red", alpha=0.9)

# show the results
show(p)
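
The toolbar contents can be chosen when the figure is created, and more tools can be added afterwards. A minimal sketch, assuming the same kind of random data as above; the particular tool list and the hover labels are choices made here for illustration:

```python
import numpy as np
from bokeh.models import HoverTool
from bokeh.plotting import figure

N = 120
x = np.random.random(size=N)
y = np.random.random(size=N)

# pick the toolbar tools explicitly with a comma-separated string
p = figure(width=800, height=600, tools="pan,box_zoom,wheel_zoom,save,reset")

# add one more tool after the figure exists
p.add_tools(HoverTool(tooltips=[("x", "$x"), ("y", "$y")]))

p.scatter(x, y, size=5, color="red", alpha=0.9)
```

Call `show(p)` in the notebook to display it; hovering over the plot then reports the cursor position.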

We can also color points individually by value:

In [None]:
# generate data (including colors)
N = 120
x = np.random.random(size=N)
y = np.random.random(size=N)

colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(256*x, 256*y)
]

# plot
p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)
p.circle(x, y, size=5, color=colors, alpha=0.9)

# show the results
show(p)
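
The color list above packs each point's x and y into the red and green channels of a `#rrggbb` hex string, with blue fixed at 150. The sketch below isolates that mapping in a hypothetical helper, `rgb_hex`, and adds a clamp (not in the original cell) so that an input of exactly 1.0 still fits in two hex digits:

```python
def rgb_hex(x, y, blue=150):
    """Map x, y in [0, 1] to a '#rrggbb' string with a fixed blue channel."""
    r = min(int(256 * x), 255)  # clamp: 256 would need three hex digits
    g = min(int(256 * y), 255)
    return "#%02x%02x%02x" % (r, g, blue)

print(rgb_hex(0.0, 0.0))   # #000096
print(rgb_hex(0.5, 0.25))  # #804096
print(rgb_hex(1.0, 1.0))   # #ffff96
```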

## Tooltips for More Information

Bokeh can also use tooltips (hover) to provide information about specific data points that you are viewing. See their extended discussion of this in the bokeh docs: [custom-tooltips](https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#custom-tooltip)

This example loads a list of data that includes labels (`desc`), thumbnail image URLs (`imgs`), and HTML-formatted text (`fonts`). Tooltips like these are useful for previewing the image file that corresponds to a specific data point, similar to what Jason Bailey/Artnome do with their interactive plot.

The `TOOLTIPS` HTML code controls what information is displayed in the tooltip. Fields prefixed with `@` (`@imgs`, `@desc`, `@fonts`) read columns from the ColumnDataSource, while the special variables `$index`, `$x`, and `$y` give the index of the hovered point and the cursor position in data coordinates.

In [None]:
source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y=[2, 5, 8, 2, 7],
    desc=['A', 'b', 'C', 'd', 'E'],
    imgs=[
        'https://docs.bokeh.org/static/snake.jpg',
        'https://docs.bokeh.org/static/snake2.png',
        'https://docs.bokeh.org/static/snake3D.png',
        'https://docs.bokeh.org/static/snake4_TheRevenge.png',
        'https://docs.bokeh.org/static/snakebite.jpg'
    ],
    fonts=[
        '<i>italics</i>',
        '<pre>pre</pre>',
        '<b>bold</b>',
        '<small>small</small>',
        '<del>del</del>'
    ]
))

TOOLTIPS = """
    <div>
        <div>
            <img
                src="@imgs" height="42" alt="@imgs" width="42"
                style="float: left; margin: 0px 15px 15px 0px;"
                border="2"
            ></img>
        </div>
        <div>
            <span style="font-size: 17px; font-weight: bold;">@desc</span>
            <span style="font-size: 15px; color: #966;">[$index]</span>
        </div>
        <div>
            <span>@fonts{safe}</span>
        </div>
        <div>
            <span style="font-size: 15px;">Location</span>
            <span style="font-size: 10px; color: #696;">($x, $y)</span>
        </div>
    </div>
"""

p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT, tooltips=TOOLTIPS,
           title="Mouse over the dots")

p.circle('x', 'y', size=10, source=source)

show(p)
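
When you do not need custom HTML, tooltips can also be given as a list of `(label, value)` pairs. A minimal sketch using a trimmed-down version of the data above; the labels and plot size are illustrative:

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y=[2, 5, 8, 2, 7],
    desc=["A", "b", "C", "d", "E"],
))

TOOLTIPS = [
    ("index", "$index"),     # special variable: row index of the hovered point
    ("cursor", "($x, $y)"),  # special variables: cursor position in data space
    ("point", "(@x, @y)"),   # @ fields: values stored in the ColumnDataSource
    ("desc", "@desc"),
]

p = figure(width=400, height=300, tooltips=TOOLTIPS, title="Mouse over the dots")
p.scatter("x", "y", size=10, source=source)
```

Comparing the `cursor` and `point` rows while hovering shows the difference between `$` variables and `@` fields.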

## Histogram

This shows a complex histogram with PDF and CDF in bokeh. (See https://docs.bokeh.org/en/latest/docs/gallery/histogram.html)

In [None]:
def make_plot(title, hist, edges, x, pdf, cdf):
    p = figure(title=title, tools='', width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)
    p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
           fill_color="navy", alpha=0.5)
    p.line(x, pdf, line_color="#ff8888", line_width=4, alpha=0.7, legend_label="PDF")
    p.line(x, cdf, line_color="orange", line_width=2, alpha=0.7, legend_label="CDF")

    p.y_range.start = 0
    p.legend.location = "center_right"
    p.legend.background_fill_color = "#fefefe"
    p.xaxis.axis_label = 'x'
    p.yaxis.axis_label = 'Pr(x)'
    p.grid.grid_line_color="white"
    return p

# Normal Distribution

mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2)) # probability density function
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2 # cumulative distribution function

p = make_plot("Normal Distribution (μ=0, σ=0.5)", hist, edges, x, pdf, cdf)

# to write to file
#output_file('histogram.html')

show(p)
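
The histogram bars and the PDF curve share a y-axis because `density=True` rescales the bar heights so that the total bar area is 1. The sketch below checks this numerically with the same parameters; the fixed random seed and the hand-written trapezoid rule are choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
mu, sigma = 0, 0.5
measured = rng.normal(mu, sigma, 1000)

# with density=True, sum(height * bin_width) equals 1
hist, edges = np.histogram(measured, density=True, bins=50)
area = np.sum(hist * np.diff(edges))
print("histogram area:", area)

# the analytic PDF integrates to just under 1 on [-2, 2], i.e. within 4 sigma
x = np.linspace(-2, 2, 1000)
pdf = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
pdf_area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(x))  # trapezoid rule
print("PDF area on [-2, 2]:", pdf_area)
```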

Check out the Gallery of Bokeh plots on their website: https://docs.bokeh.org/en/latest/docs/gallery.html

## Extensions
Explore bokeh:
- Look at the examples [on the cover page](https://docs.bokeh.org/en/latest/index.html)
- Find a good example "in the wild", e.g. from the user community.

Use bokeh to work with your images from exercise 1:
- Use bokeh to create a histogram plus tooltip images to see which kinds of images (via inspection) are the lowest resolution in your scraped collection
- Create scatter plots from pairs of metrics you calculated last week (mean brightness, mean hue, for instance)
- Create scatter plots from n > 2 metrics using PCA or another dimensionality reduction technique such as t-SNE or UMAP.
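
As a starting point for the last item, here is a minimal sketch of PCA done with NumPy's SVD, followed by a scatter plot of the first two components. The `metrics` array is random placeholder data standing in for your own per-image measurements, and the column names `pc1` and `pc2` are made up for this example.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

rng = np.random.default_rng(0)
metrics = rng.random((50, 4))  # placeholder: 50 images x 4 metrics

# PCA via SVD: center the columns, then project onto the top two components
centered = metrics - metrics.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T   # shape (50, 2)

source = ColumnDataSource(data=dict(pc1=coords[:, 0], pc2=coords[:, 1]))
p = figure(width=600, height=400, title="PCA projection (placeholder data)",
           x_axis_label="PC 1", y_axis_label="PC 2")
p.scatter("pc1", "pc2", size=6, alpha=0.8, source=source)
```

Call `show(p)` to display it. Adding an image URL column to `source` and a tooltip would let you inspect each point as in the tooltip example.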