## Bokeh for Visual EDA

Using Bokeh as an interactive plotting library for exploratory data analysis (EDA).

What is Bokeh?

> Bokeh is an interactive visualization library for modern web browsers.
> It provides elegant, concise construction of versatile graphics, and
> affords high-performance interactivity over large or streaming datasets.
> Bokeh can help anyone who would like to quickly and easily make
> interactive plots, dashboards, and data applications.

Bokeh is very well documented: https://docs.bokeh.org/en/latest/index.html



### Run once: upgrade the previously installed version

In [None]:
!pip install --user bokeh --upgrade  # restart the kernel afterwards so the upgraded version is the one imported

In [None]:
import bokeh
bokeh.__version__ # because we're curious

In [None]:
from bokeh.io import output_notebook, curdoc

output_notebook() # displays inline

GLOBAL_WIDTH = 800
GLOBAL_HEIGHT = 600
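
Several cells below repeat `width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT`. A minimal sketch of a hypothetical helper, `sized_figure`, that applies these defaults while still letting a single cell override them; it assumes a Bokeh version in which `figure` accepts `width` and `height` (the case for current releases).

```python
from bokeh.plotting import figure

GLOBAL_WIDTH = 800
GLOBAL_HEIGHT = 600


def sized_figure(**kwargs):
    """Hypothetical helper: a Bokeh figure with the notebook-wide default size.

    An explicit width= or height= argument takes precedence over the globals.
    """
    kwargs.setdefault("width", GLOBAL_WIDTH)
    kwargs.setdefault("height", GLOBAL_HEIGHT)
    return figure(**kwargs)


p = sized_figure(title="default size")
q = sized_figure(title="narrow", width=400)  # override just the width
print(p.width, p.height, q.width)  # 800 600 400
```

Using `setdefault` keeps the helper a thin wrapper: every other `figure` keyword (title, axis labels, tooltips) passes through unchanged.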

In [None]:
from bokeh.plotting import figure, output_file, show

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output to static HTML file
output_file("lines.html") # also rendering to file

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)
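
`output_file` followed by `show` both writes the HTML file and displays the plot. If you only want the standalone page, for example to save many plots in a loop, a sketch using `bokeh.embed.file_html` with the CDN resources returns the HTML as a string without displaying anything; the file name `lines_only.html` is just an illustrative choice.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")
p.line(x, y, legend_label="Temp.", line_width=2)

# Render a complete HTML page that loads BokehJS from the CDN
html = file_html(p, CDN, "simple line example")

# Write the page ourselves instead of relying on output_file() + show()
with open("lines_only.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Because the page references the CDN rather than inlining BokehJS, the file stays small but needs network access to render.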

### Dark mode

In [None]:
curdoc().theme = 'dark_minimal'


# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output to static HTML file
output_file("lines.html") # also rendering to file

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)
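
Setting `curdoc().theme` affects every plot shown afterwards in the notebook, which is why the following cells stay dark. A sketch of listing the built-in theme names and defining a small custom `Theme` from a JSON-style dict; the colors are arbitrary choices, and assigning `None` should restore Bokeh's default theme.

```python
from bokeh.io import curdoc
from bokeh.themes import Theme, built_in_themes

# Names accepted by curdoc().theme = "<name>", e.g. "dark_minimal"
print(sorted(built_in_themes))

# Keys under "attrs" are model class names; values are default property values
custom = Theme(json={
    "attrs": {
        "Plot": {"background_fill_color": "#2b2b2b"},
        "Grid": {"grid_line_alpha": 0.3},
    }
})

curdoc().theme = custom  # applies to plots shown after this line
# curdoc().theme = None  # uncomment to go back to the default theme
```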


### Scatter plot

In [None]:
from bokeh.plotting import figure, output_file, show
import numpy as np

output_file("scatter.html")

N = 1200
x = np.random.random(size=N)
y = np.random.random(size=N)

p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)  # plot_width/plot_height were removed in Bokeh 3.0

# add circle markers with a size, color, and alpha (scatter's default marker is a circle)
p.scatter(x, y, size=5, color="red", alpha=0.9)

# show the results
show(p)
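
The scatter above shows no information on hover. A sketch of adding a `HoverTool` with Bokeh's list-of-(label, value) tooltip format: `@x` reads the `x` column of the hovered point, `$index` is its row index, and `{0.000}` is a number format. The column names `x` and `y` are our choice, and 800 × 600 mirrors `GLOBAL_WIDTH`/`GLOBAL_HEIGHT`.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

N = 1200
source = ColumnDataSource(data=dict(
    x=np.random.random(size=N),
    y=np.random.random(size=N),
))

p = figure(width=800, height=600)
p.scatter("x", "y", size=5, color="red", alpha=0.9, source=source)

# Tooltip rows: a label and a field reference per row
hover = HoverTool(tooltips=[
    ("index", "$index"),
    ("(x, y)", "(@x{0.000}, @y{0.000})"),
])
p.add_tools(hover)
# show(p)  # display in the notebook
```

Putting the data in a `ColumnDataSource` is what makes the `@column` references work; plain lists passed directly to `scatter` are wrapped in one automatically, but their column names are then chosen by Bokeh.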

In [None]:
from bokeh.plotting import figure, output_file, show
import numpy as np

output_file("scatter_colors.html")

N = 1200
x = np.random.random(size=N)
y = np.random.random(size=N)
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(256*x, 256*y)
]

p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)  # plot_width/plot_height were removed in Bokeh 3.0

# add circle markers with a size, per-point color, and alpha
p.scatter(x, y, size=5, color=colors, alpha=0.9)

# show the results
show(p)
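
The list comprehension above turns each point's position into a color: red scales with x, green with y, and blue is fixed at 150, each written as a two-digit hex byte. Pulled out into a small function (the name `position_color` is ours), the mapping is easier to check:

```python
def position_color(x, y, blue=150):
    """Map x, y in [0, 1) to an "#rrggbb" string, as in the scatter cell above."""
    r = int(256 * x)  # 0 <= r <= 255 because x < 1
    g = int(256 * y)
    return "#%02x%02x%02x" % (r, g, blue)


print(position_color(0.0, 0.0))    # #000096
print(position_color(0.999, 0.5))  # #ff8096
```

An input of exactly 1.0 would give 256, which needs three hex digits; `np.random.random` never returns 1.0, so the original cell is safe.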

### Scatter plot with pop-up images through tooltips

Adapted from https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#custom-tooltip

(Note: our default DataHub image has Bokeh 1.3.4, not 1.4, so run the upgrade cell at the top of this notebook first.)
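
To check whether the installed Bokeh is recent enough before running the next cell, a small sketch that compares the major and minor version numbers; `version_at_least` is a hypothetical helper, and the 1.4 threshold comes from the note above.

```python
import re

import bokeh


def version_at_least(version, minimum):
    """Compare the leading "major.minor" of a version string with a tuple like (1, 4)."""
    m = re.match(r"(\d+)\.(\d+)", version)  # ignores patch and suffixes such as "rc1"
    return (int(m.group(1)), int(m.group(2))) >= tuple(minimum)


print(version_at_least("1.3.4", (1, 4)))  # False
print(version_at_least("3.4.1", (1, 4)))  # True

if not version_at_least(bokeh.__version__, (1, 4)):
    print("Bokeh", bokeh.__version__, "is older than 1.4: run the upgrade cell and restart the kernel.")
```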

In [None]:
from bokeh.models import ColumnDataSource

output_file("toolbar.html")

source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y=[2, 5, 8, 2, 7],
    desc=['A', 'b', 'C', 'd', 'E'],
    imgs=[
        'https://docs.bokeh.org/static/snake.jpg',
        'https://docs.bokeh.org/static/snake2.png',
        'https://docs.bokeh.org/static/snake3D.png',
        'https://docs.bokeh.org/static/snake4_TheRevenge.png',
        'https://docs.bokeh.org/static/snakebite.jpg'
    ],
    fonts=[
        '<i>italics</i>',
        '<pre>pre</pre>',
        '<b>bold</b>',
        '<small>small</small>',
        '<del>del</del>'
    ]
))

TOOLTIPS = """
    <div>
        <div>
            <img
                src="@imgs" height="42" alt="@imgs" width="42"
                style="float: left; margin: 0px 15px 15px 0px;"
                border="2"
            ></img>
        </div>
        <div>
            <span style="font-size: 17px; font-weight: bold;">@desc</span>
            <span style="font-size: 15px; color: #966;">[$index]</span>
        </div>
        <div>
            <span>@fonts{safe}</span>
        </div>
        <div>
            <span style="font-size: 15px;">Location</span>
            <span style="font-size: 10px; color: #696;">($x, $y)</span>
        </div>
    </div>
"""

p = figure(width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT, tooltips=TOOLTIPS,
           title="Mouse over the dots")

p.scatter('x', 'y', size=10, source=source)

show(p)
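
The same template idea carries over to your own image collection: every column of a `ColumnDataSource` can be referenced in the HTML as `@column`. A sketch that builds the source from a pandas DataFrame and plots image width against height, with a thumbnail on hover. The DataFrame is hypothetical placeholder data (the `example.com` URLs will not load real images); replace it with the metadata of your scraped images.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical metadata for a scraped image collection
df = pd.DataFrame({
    "url": ["https://example.com/a.jpg", "https://example.com/b.jpg", "https://example.com/c.jpg"],
    "img_width": [640, 1920, 120],
    "img_height": [480, 1080, 90],
})
df["pixels"] = df["img_width"] * df["img_height"]

# Each DataFrame column becomes a field usable as @name in the tooltip
source = ColumnDataSource(df)

TOOLTIPS = """
<div>
    <img src="@url" height="64" style="float: left; margin: 0 8px 8px 0;">
    <div>@img_width x @img_height px (@pixels pixels)</div>
</div>
"""

p = figure(width=800, height=600, tooltips=TOOLTIPS,
           x_axis_label="width (px)", y_axis_label="height (px)")
p.scatter("img_width", "img_height", size=10, source=source)
# show(p)
```

Points near the origin are the lowest-resolution images, and hovering shows which ones they are.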

### Histogram

Adapted from https://docs.bokeh.org/en/latest/docs/gallery/histogram.html

In [None]:
import scipy.special

curdoc().theme = 'dark_minimal'

def make_plot(title, hist, edges, x, pdf, cdf):
    p = figure(title=title, tools='', width=GLOBAL_WIDTH, height=GLOBAL_HEIGHT)
    p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
           fill_color="navy", alpha=0.5)
    p.line(x, pdf, line_color="#ff8888", line_width=4, alpha=0.7, legend_label="PDF")
    p.line(x, cdf, line_color="orange", line_width=2, alpha=0.7, legend_label="CDF")

    p.y_range.start = 0
    p.legend.location = "center_right"
    p.legend.background_fill_color = "#fefefe"
    p.xaxis.axis_label = 'x'
    p.yaxis.axis_label = 'Pr(x)'
    p.grid.grid_line_color = "white"
    return p

# Normal Distribution

mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2)) # probability density function
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2 # cumulative distribution function

p = make_plot("Normal Distribution (μ=0, σ=0.5)", hist, edges, x, pdf, cdf)
output_file('histogram.html')

show(p)
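
With `density=True`, `np.histogram` rescales the bar heights so that their total area (height × bin width) is 1, which is what makes the bars comparable with the PDF curve. A quick numeric check, plus a check that the erf-based CDF formula gives 0.5 at the mean and about 0.977 at μ + 2σ. This sketch uses `math.erf` in place of `scipy.special.erf` and a seeded generator so it is reproducible.

```python
import math

import numpy as np

mu, sigma = 0, 0.5
rng = np.random.default_rng(0)
measured = rng.normal(mu, sigma, 1000)

hist, edges = np.histogram(measured, density=True, bins=50)
area = np.sum(hist * np.diff(edges))  # total area under the bars
print(round(area, 6))  # 1.0


def normal_cdf(x, mu, sigma):
    # Same formula as the cell above, with math.erf for scalar input
    return (1 + math.erf((x - mu) / math.sqrt(2 * sigma**2))) / 2


print(normal_cdf(mu, mu, sigma))                         # 0.5
print(round(normal_cdf(mu + 2 * sigma, mu, sigma), 4))   # 0.9772
```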

### Tasks
Explore bokeh:
- Look at the examples [on the cover page](https://docs.bokeh.org/en/latest/index.html)
- Find a good example "in the wild", e.g. from the user community.

Use bokeh:
- Use bokeh to create a histogram with tooltip images, and inspect which kinds of images in your scraped collection have the lowest resolution
- Create scatter plots from pairs of metrics you calculated last week (mean brightness, mean hue, for instance)
- Create scatter plots from n > 2 metrics using PCA or another dimensionality-reduction technique such as t-SNE or UMAP.
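
For the last task, a minimal PCA sketch using only NumPy's SVD (scikit-learn's `sklearn.decomposition.PCA` is a common alternative): it standardizes each metric, projects onto the first two principal components, and plots them with Bokeh. The metric matrix is random placeholder data; replace it with one row per image and one column per metric you computed. Metrics with zero variance would cause a division by zero and should be dropped first.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def pca_2d(X):
    """Project rows of X (n_samples x n_metrics) onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize so metrics with large numeric ranges do not dominate
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Z @ Vt[:2].T, explained


# Placeholder data: 300 "images" x 5 metrics (e.g. mean brightness, mean hue, ...)
rng = np.random.default_rng(42)
metrics = rng.normal(size=(300, 5))
coords, explained = pca_2d(metrics)

source = ColumnDataSource(data=dict(pc1=coords[:, 0], pc2=coords[:, 1]))
p = figure(width=800, height=600, tooltips=[("index", "$index")],
           x_axis_label=f"PC1 ({explained[0]:.0%} of variance)",
           y_axis_label=f"PC2 ({explained[1]:.0%} of variance)")
p.scatter("pc1", "pc2", size=6, alpha=0.7, source=source)
# show(p)
```

PCA is linear; t-SNE and UMAP can separate non-linear clusters but their axes have no direct meaning, so labeling them with "variance explained" as above would not apply.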