In [1]:
# imports
import bokeh
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
import pandas as pd
# imports for figure tools
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LogColorMapper,
    CategoricalColorMapper
)

# imports for iris sample data set
from sklearn.datasets import load_iris
# load iris data into dataframe
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
name_map = {0:'setosa',1:'versicolor',2:'virginica'}
df['species']= [name_map[x] for x in data.target]


In [2]:
output_notebook()

<div class="alert alert-success">
<p>
Bokeh is a Python library for creating interactive visualizations that render in a modern web browser. It does this by generating JSON input for BokehJS, a JavaScript plotting library. Bokeh can also be used from R, Scala, and Julia (although the Julia bindings are currently unmaintained).
</p>
<p>
This tutorial draws on several sources; the following are particularly helpful.
    <li>Installation: https://bokeh.pydata.org/en/latest/docs/installation.html</li>
    <li>User Guide: https://bokeh.pydata.org/en/latest/docs/user_guide.html </li>
    <li>Examples: https://bokeh.pydata.org/en/latest/docs/gallery.html </li>
</p>
</div>
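
To make the "JSON input for BokehJS" idea concrete, here is a minimal sketch that serializes a small figure. It assumes a Bokeh version that provides `bokeh.embed.json_item`; the figure and its data are made up for illustration.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Build a tiny figure with one line glyph (illustrative data).
json_demo = figure(title="JSON demo")
json_demo.line([1, 2, 3], [4, 6, 5])

# json_item() returns a plain dict that describes the plot for BokehJS.
item = json_item(json_demo)

# The dict can be written out as JSON text, which is what the browser receives.
serialized = json.dumps(item)
print(sorted(item.keys()))
print(len(serialized), "characters of JSON")
```

Nothing is drawn by this snippet; it only shows that the Python figure object reduces to a JSON document that BokehJS can render.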

# Figures
<p> If you were drawing a plot by hand, the Bokeh figure object would be the first thing you put on the paper: the axes. A figure can exist without any data in it, but you cannot plot data without a figure.
</p>
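
As a quick check of the claim that a figure can exist without data, the sketch below creates an empty figure and inspects it. The variable names are illustrative, and filtering on `GlyphRenderer` is my own way of counting only the renderers that draw data.

```python
from bokeh.models import GlyphRenderer
from bokeh.plotting import figure

# A figure with no glyphs added yet.
empty_fig = figure(title="No data yet")

# The axes are already part of the figure.
x_axis_count = len(empty_fig.xaxis)
y_axis_count = len(empty_fig.yaxis)

# Count only the renderers that draw data (glyphs).
glyph_count = len([r for r in empty_fig.renderers if isinstance(r, GlyphRenderer)])

print(x_axis_count, y_axis_count, glyph_count)
```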

In [3]:
# instantiate figure
px_width = 400 # the number of pixels wide you want the figure
px_height = 400 # the number of pixels high you want the figure
tools = 'pan, box_zoom' # the type of tools you want the figure to have
x_label = 'x-axis'
y_label = 'y-axis'
x_data_type = 'datetime'
plot = figure(plot_width=px_width, plot_height=px_height, tools=tools, 
              x_axis_label= x_label, y_axis_label=y_label, x_axis_type = x_data_type)
show(plot)

# Glyphs
<p>
    Glyph objects are how data is displayed on a figure.
</p>

In [4]:
#add data to the figure
x_values = [1,2,3,4,5]
y_values = [8,3,2,7,1]
# defines the size and color of each point
sizes = [10,20,30,40,50] # the size of each point
fill_color = 'teal' # solid color of each point
# add circle markers to the figure
plot.circle(x=x_values, y=y_values, size=sizes, fill_color=fill_color, alpha=0.8) # scatter plot of the x and y data with circle markers
show(plot) # displays the plot inline, because output_notebook() was called above

In [5]:
# diamond markers
plot.diamond(x=x_values, y=y_values, size=sizes, fill_color='red', alpha=0.8) # adds diamond markers for the same x and y data
show(plot) # displays the updated plot inline


<div class="span5 alert alert-info">
Notice how the points overlap each other. Bokeh does not make a new figure with each call; it adds more points (glyphs) to the initial figure. If you want multiple figures, you need multiple figure objects.
</div>
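
The sketch below illustrates the same point outside the notebook's running example: two glyph calls on one figure accumulate, while separate figure objects stay independent. The helper `count_glyphs` and all variable names are hypothetical, and `scatter` with a `marker` argument is used here in place of the `circle()` and `diamond()` shortcuts.

```python
from bokeh.models import GlyphRenderer
from bokeh.plotting import figure


def count_glyphs(fig):
    """Count the renderers on a figure that draw data."""
    return len([r for r in fig.renderers if isinstance(r, GlyphRenderer)])


xs = [1, 2, 3]
ys = [8, 3, 2]

# Two calls on the same figure: both glyphs end up on one plot.
shared = figure()
shared.scatter(xs, ys, marker="circle", size=20)
shared.scatter(xs, ys, marker="diamond", size=20)

# One call on each of two figures: each plot holds a single glyph.
first = figure()
second = figure()
first.scatter(xs, ys, marker="circle", size=20)
second.scatter(xs, ys, marker="diamond", size=20)

print(count_glyphs(shared), count_glyphs(first), count_glyphs(second))
```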

#### Different built in markers (glyphs)
<p>
<li>asterisk()</li>
<li>circle()</li>
<li>circle_cross()</li>
<li>cross()</li>
<li>diamond()</li>
<li>diamond_cross()</li>
<li>inverted_triangle()</li>
<li>square()</li>
<li>square_cross()</li>
<li>square_x()</li>
<li>triangle()</li>
<li>x()</li>
</p>

#### Acceptable colors
<p>
<li> hexadecimal strings </li>
<li>tuples of RGB values between 0 and 255</li>
<li>any CSS color name</li>
</p>
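
The three forms can describe the same color. As a small illustration in plain Python (no Bokeh calls), the hypothetical helper `hex_to_rgb` below converts a hexadecimal string into the equivalent RGB tuple; the CSS name for the same color is `teal`.

```python
def hex_to_rgb(hex_color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints in 0-255."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


teal_hex = "#008080"              # hexadecimal string
teal_rgb = hex_to_rgb(teal_hex)   # tuple of RGB values
teal_name = "teal"                # CSS color name for the same color

print(teal_hex, teal_rgb, teal_name)
```

Any of the three values could be passed as `fill_color` in the glyph calls above.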


#### Glyphs with Pandas dataframes
<p>
    Bokeh integrates well with Pandas dataframes.
</p>
<p>
    A Pandas series can be used for the x and y data points (and also for the marker size).
</p>
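
As a sketch of the "series for the size" idea, the example below scales marker sizes from a dataframe column. The dataframe, its column names, and the scaling factor are invented for illustration, and the check on the `"size"` column reflects my understanding that Bokeh stores sequence arguments in an automatically created data source.

```python
import pandas as pd
from bokeh.plotting import figure

# Small illustrative dataframe (not the iris data).
measurements = pd.DataFrame({
    "length": [5, 6, 7],
    "width": [3, 2, 4],
    "weight": [1, 4, 5],
})

# Derive marker sizes from a column; the factor 5 is arbitrary.
marker_sizes = measurements["weight"] * 5

size_fig = figure(x_axis_label="length", y_axis_label="width")
size_renderer = size_fig.scatter(
    measurements["length"], measurements["width"], size=marker_sizes
)

# The per-point sizes are kept alongside the x and y data.
print(list(size_renderer.data_source.data["size"]))
```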

In [6]:
# scatterplot of two dataframe columns
fig = figure(plot_width=px_width, plot_height=px_height, tools=tools, 
              x_axis_label= 'Sepal length (cm)', y_axis_label='Sepal width (cm)')  # labels match the x and y columns plotted below

fig.circle(df['sepal length (cm)'], df['sepal width (cm)'], color= 'firebrick', size=15)
show(fig)

##### Below is an example of using a pandas series to color the points according to a categorical variable

In [7]:
# scatterplot of two dataframe columns, with categorical colors
fig = figure(plot_width=px_width, plot_height=px_height, tools=tools, 
              x_axis_label= 'Sepal length (cm)', y_axis_label='Sepal width (cm)')  # labels match the x and y columns plotted below

colormap = {'setosa': 'OliveDrab', 'versicolor': 'orchid', 'virginica': 'DarkTurquoise'}
colors = [colormap[x] for x in df['species']]

fig.diamond_cross(df['sepal length (cm)'], df['sepal width (cm)'], color= colors, size=15)
show(fig)

### Patches

<p>
    The patches glyph is used to represent irregular shapes. The following examples can be found in the Bokeh documentation: https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html 
</p>

In [8]:
# instantiate figure
p1 = figure(plot_width=500, plot_height=500)

# define vertices
x_values = [1, 3, 2]
y_values = [2, 1, 4]

# add a circle at each vertex
p1.circle(x_values, y_values, alpha=0.8, size = 8)

# add a patch renderer with an alpha and a line width
p1.patch(x_values, y_values, alpha=0.5, line_width=2)

show(p1)

##### Multiple patches
<div class="span5 alert alert-info">
<p>
    The figure above illustrates that the patch renderer takes the coordinates of a shape's vertices (x, y), draws a line that connects the vertices, and fills the enclosed area. To draw multiple patches in the same figure, pass a list of lists with one inner list per shape: ([[x values of shape 1], [x values of shape 2]], [[y values of shape 1], [y values of shape 2]]). This is shown below (and in the documentation).
</p>
</div>
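
If the shapes start out as lists of (x, y) vertices, the list-of-lists layout can be built with two comprehensions. This is a plain-Python sketch; the variable names are illustrative and the coordinates are the ones used in the next cell.

```python
# Each shape is a list of (x, y) vertices.
triangle = [(1, 2), (3, 1), (2, 4)]
quadrilateral = [(3, 4), (4, 7), (6, 8), (6, 5)]
shapes = [triangle, quadrilateral]

# Split the vertices into one list of x values and one list of y values per shape.
patch_xs = [[x for x, _ in shape] for shape in shapes]
patch_ys = [[y for _, y in shape] for shape in shapes]

print(patch_xs)
print(patch_ys)
```

The two resulting lists are in the form expected as the first two arguments of `patches()`.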

In [9]:
# plot 2 geometric shapes
p2 = figure(plot_width=700, plot_height=700)

p2.patches([[1, 3, 2], [3, 4, 6, 6]], [[2, 1, 4], [4, 7, 8, 5]],
          color=["green", "navy"], alpha=[0.8, 0.3], line_width=2)
show(p2)

# Column Data Source
<p>
    The ColumnDataSource object is the fundamental data structure of Bokeh; it is how data gets passed to a glyph. Even when one is not created explicitly, as in the plots above, Bokeh creates a ColumnDataSource behind the scenes.
</p>

<p>
    The data in a ColumnDataSource object must map the name of each column to its values. Some acceptable forms of data are:
    <li> dictionaries where each key:value pair is name:data </li>
    <li> Pandas dataframes where the column name is the name, and the values in the column are the data </li>
</p>

<div class="span5 alert alert-info">
Note: All columns in a ColumnDataSource object need to be the same length at all times
</div>
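
The two accepted forms can be sketched side by side. The column names and values below are invented for illustration; the length check simply confirms that every column in the source has the same number of entries.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Form 1: a dictionary of name -> data.
dict_source = ColumnDataSource(data={"x": [1, 2, 3], "y": [8, 3, 2]})

# Form 2: a Pandas dataframe, where each column name becomes a column in the source.
frame = pd.DataFrame({"x": [1, 2, 3], "y": [8, 3, 2]})
frame_source = ColumnDataSource(frame)

print(sorted(dict_source.column_names))
print(sorted(frame_source.column_names))

# Every column holds the same number of values.
column_lengths = {name: len(values) for name, values in frame_source.data.items()}
print(column_lengths)
```

In the versions I have used, the dataframe index is stored as an extra column in the source, so the second printout may list more names than the first.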

In [10]:
# import
from bokeh.models import ColumnDataSource

In [11]:
# data source object from Pandas dataframe
data_source = ColumnDataSource(df)

In [12]:
p3 = figure(plot_width=400, plot_height=400)

p3.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10)
show(p3)


##### Color mapping

In [13]:
color_mapper = CategoricalColorMapper(factors=['setosa','versicolor','virginica'], palette=['Cyan','DarkOrange','DeepPink'])

In [14]:
p3.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10,
         color=dict(field='species', transform=color_mapper))
show(p3)

# Layouts

##### multiple plots


In [15]:
# imports
from bokeh.layouts import (
    row,
    column,
    gridplot,
    layout
)
from bokeh.models.widgets import Tabs, Panel

In [16]:
# four figures
p1 = figure(plot_width=400, plot_height=400, tools='box_select,lasso_select, pan,box_zoom,reset')
p2 = figure(plot_width=400, plot_height=400, tools='box_select,lasso_select, pan,box_zoom,reset')
p3 = figure(plot_width=400, plot_height=400, tools='box_select,lasso_select, pan,box_zoom,reset')
p4 = figure(plot_width=400, plot_height=400, tools='box_select,lasso_select, pan,box_zoom,reset')
# four different data glyphs
p1.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10,
         color=dict(field='species', transform=color_mapper))
p2.circle(source=data_source, x='sepal length (cm)',y='petal length (cm)', size=10,
         color=dict(field='species', transform=color_mapper))
p3.circle(source=data_source, x='sepal length (cm)',y='petal width (cm)', size=10,
         color=dict(field='species', transform=color_mapper))
p4.circle(source=data_source, x='petal length (cm)',y='petal width (cm)', size=10,
         color=dict(field='species', transform=color_mapper))

##### Row layout (with lasso functionality)

In [17]:
# row of figures
plot_rows = row(p1,p2,p3)
show(plot_rows)

##### Column layout

In [18]:
# column of figures
plot_column = column(p1,p2,p3)
show(plot_column)

##### Rows of Columns and vice versa

In [19]:
#columns and rows of figures
column_of_figs = column(p1,p2,p3)
grid_plots = row(column_of_figs,p4)
show(grid_plots)

In [20]:
#columns and rows of figures
row_of_figs = row(p1,p2,p3)
grid_plots = column(row_of_figs,p4)
show(grid_plots)

##### Gridplot

In [21]:
grid = gridplot([[p1,p2,p3],[p4, None, None]])
show(grid)

In [22]:
grid= gridplot([p1,p2,p3,p4], ncols=2)
show(grid)
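
When `gridplot` is given a flat list and `ncols`, the plots fill the grid row by row. The plain-Python sketch below mimics that arrangement with a hypothetical helper, `chunk_rows`, using strings in place of figures; it is an illustration of the layout, not Bokeh's own code.

```python
def chunk_rows(items, ncols):
    """Split a flat list into rows of at most ncols items each."""
    return [items[i:i + ncols] for i in range(0, len(items), ncols)]


plot_names = ["p1", "p2", "p3", "p4"]

# With ncols=2, the four plots form a 2 x 2 grid.
rows = chunk_rows(plot_names, 2)
print(rows)
```

Under that reading, `gridplot([p1,p2,p3,p4], ncols=2)` lays the figures out the same way as `gridplot([[p1,p2],[p3,p4]])`.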

##### Tabbed Layouts
<p>
    Tabs contain Panel objects
</p>

In [23]:
# Create Panel with a title
A = Panel(child=row(p1,p2,p3), title='A')
B = Panel(child=row(p4), title='B')

In [24]:
tabs = Tabs(tabs=[A,B])
show(tabs)

##### General layouts with sizing mode

In [25]:
# stretches plots to fill the available space
L1 = layout([
    [p1,p2],
    [p3],
    [p4]],
    sizing_mode = 'stretch_both')
show(L1)

In [26]:
L2 = layout([
    [p1,p2],
    [p3],
    [p4]],
    sizing_mode = 'fixed')
show(L2)

#### sizing_mode possibilities
<p>
<li>'fixed'</li>
<li>'scale_width'</li>
<li>'scale_height'</li>
<li>'scale_both'</li>
<li>'stretch_both'</li>
</p>

# Linking Plots

##### Linked axes

In [27]:
# link x-axes
p1.x_range = p2.x_range = p3.x_range
# link y-axes
p1.y_range = p2.y_range = p3.y_range


In [28]:
#row of figures
fig_row = row(p1,p2,p3)
show(fig_row)
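
Linking works because the figures end up holding the very same range object, so panning or zooming one updates the others. A minimal sketch with made-up data follows; here the ranges are shared at construction time through the `x_range` and `y_range` arguments rather than by assignment afterwards.

```python
from bokeh.plotting import figure

base = figure()
base.scatter([1, 2, 3], [4, 5, 6])

# Reuse the ranges of the first figure when creating the second one.
linked = figure(x_range=base.x_range, y_range=base.y_range)
linked.scatter([1, 2, 3], [6, 5, 4])

# A figure created without those arguments gets its own ranges.
independent = figure()
independent.scatter([1, 2, 3], [5, 4, 6])

print(linked.x_range is base.x_range)
print(independent.x_range is base.x_range)
```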

##### Linked brushing
<p>
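    For linked brushing, the glyphs in each plot need to share a common data source, so that a selection made in one plot is reflected in the others.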
</p>
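
A minimal sketch of that requirement, using an invented three-column source: both figures draw from the same ColumnDataSource object, which is what lets a selection in one carry over to the other.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source, shared by both plots (illustrative columns).
shared_source = ColumnDataSource(data={
    "a": [1, 2, 3],
    "b": [3, 1, 2],
    "c": [2, 3, 1],
})

selection_tools = "box_select,lasso_select,reset"
left = figure(tools=selection_tools)
right = figure(tools=selection_tools)

# Each glyph names its columns and points at the same source object.
left_renderer = left.scatter("a", "b", source=shared_source, size=10)
right_renderer = right.scatter("a", "c", source=shared_source, size=10)

print(left_renderer.data_source is right_renderer.data_source)
```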

In [29]:
# row of figures
plot_rows = row(p1,p2,p3,p4)
show(plot_rows)

# Labeling
<p>
    Labels are what turn a picture into a figure.
</p>

In [30]:
# adding axis labels and a title
plot = figure(x_axis_label= "X-axis", y_axis_label='y-axis', title="Labeled Figure")
plot.circle(x=[1,2,3],y=[8,9,10])
show(plot)

##### Modifying labels

In [31]:
# title
plot.title.text_color = "olive"
plot.title.text_font = "helvetica"
plot.title.text_font_style = "bold" # or: normal, italic
plot.title.text_font_size = '12pt' # or: '12px', '12em'
plot.title.align = 'center' # or: left, right
show(plot)

##### Legends

In [32]:
p5 = figure()
p5.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10,
         color={'field':'species', 'transform' :color_mapper}, legend='species')
show(p5)

In [33]:
# change location
p5.legend.location = "bottom_left" # or: 'top_left', 'bottom_right', 'top_right'
show(p5)

##### Interactive legends from Bokeh's user guide (https://bokeh.pydata.org/en/latest/docs/user_guide/interaction/legends.html)

In [35]:
bokeh.sampledata.download()

Creating /home/aregel/.bokeh directory
Creating /home/aregel/.bokeh/data directory
Using data directory: /home/aregel/.bokeh/data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3182088 bytes)
   3182088 [100.00%]
Unpacking: US_Counties.csv
Downloading: us_cities.json (713565 bytes)
    713565 [100.00%]
Downloading: unemployment09.csv (253301 bytes)
    253301 [100.00%]
Downloading: AAPL.csv (166698 bytes)
    166698 [100.00%]
Downloading: FB.csv (9706 bytes)
      9706 [100.00%]
Downloading: GOOG.csv (113894 bytes)
    113894 [100.00%]
Downloading: IBM.csv (165625 bytes)
    165625 [100.00%]
Downloading: MSFT.csv (161614 bytes)
    161614 [100.00%]
Downloading: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip (5148539 bytes)
   5148539 [100.00%]
Unpacking: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.csv
Downloading: gapminder_fertility.csv (64346 bytes)
     64346 [100.00%]
Downloading: gapminder_population.csv (94509 bytes)
     94509 [100.00%]
Download

In [36]:
# hiding data using the legend
from bokeh.palettes import Spectral4
from bokeh.sampledata.stocks import AAPL, IBM, MSFT, GOOG


# make axis datetime
p = figure(plot_width=800, plot_height=250, x_axis_type="datetime")
p.title.text = 'Click on legend entries to hide the corresponding lines'

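# make a line glyph from a dataframe for each company
# (stock_data and stock_df are used so the iris `data` and `df` from the first cell are not overwritten)
for stock_data, name, color in zip([AAPL, IBM, MSFT, GOOG], ["AAPL", "IBM", "MSFT", "GOOG"], Spectral4):
    stock_df = pd.DataFrame(stock_data)
    stock_df['date'] = pd.to_datetime(stock_df['date'])
    p.line(stock_df['date'], stock_df['close'], line_width=2, color=color, alpha=0.8, legend=name)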

p.legend.location = "top_left"
p.legend.click_policy="hide"

show(p)

In [37]:
# muting the data using the legend
p = figure(plot_width=800, plot_height=250, x_axis_type="datetime")
p.title.text = 'Click on legend entries to mute the corresponding lines'

# same loop as above, with muted_color and muted_alpha controlling how a muted line looks
for stock_data, name, color in zip([AAPL, IBM, MSFT, GOOG], ["AAPL", "IBM", "MSFT", "GOOG"], Spectral4):
    stock_df = pd.DataFrame(stock_data)
    stock_df['date'] = pd.to_datetime(stock_df['date'])
    p.line(stock_df['date'], stock_df['close'], line_width=2, color=color, alpha=0.8,
           muted_color=color, muted_alpha=0.2, legend=name)

p.legend.location = "top_left"
p.legend.click_policy="mute"

show(p)


<div class="span5 alert alert-info">
    Note: the lines in the plots above do not share a data source; each line is drawn from its own dataframe.
</div>


# User Interaction
<p>
    Bokeh has built-in tools, like box_select, lasso_select and hover, that allow users to dynamically interact with a plot.
    Tool parameters are specified in the figure object.
</p>

<li> box_select: allows selected points to have unique properties </li>
<li> lasso_select: allows selected points to have unique properties </li>
<li> hover: displays unique properties or information when moused over </li>
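
To see how the tool names given to the figure turn into tool objects, the sketch below creates a figure with the three tools listed above and inspects its toolbar. The variable names are illustrative.

```python
from bokeh.plotting import figure

# Tools are requested by name when the figure is created.
tool_fig = figure(tools="box_select,lasso_select,hover")

# Each name becomes a tool object on the figure's toolbar.
tool_names = [type(tool).__name__ for tool in tool_fig.toolbar.tools]
print(tool_names)
```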


##### box_select

In [38]:
# Create a figure with the box_select tool
p = figure(tools='box_select')

# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10, selection_color='red', nonselection_alpha = 0.1)

show(p)

##### lasso_select

In [39]:
# Create a figure with the lasso_select
p = figure(tools='lasso_select')

# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10, selection_color='red', nonselection_alpha = 0.1)

show(p)

##### HoverTool

In [40]:
# import the HoverTool
from bokeh.models import HoverTool

# Instantiate hover object
hover = HoverTool(tooltips=[('Species', '@species')])

# make figure object with hover tool
p = figure(tools=['pan',hover],plot_width=600, plot_height=600)

# Add circle glyphs to figure p with hover parameters
p.circle(source=data_source, x='sepal length (cm)',y='sepal width (cm)', size=10,
         fill_color= 'black', alpha=0.5, line_color= None, hover_fill_color= 'firebrick',
         hover_alpha= 0.5, hover_line_color='white')
show(p)


#### Example of interactive figure using patches
<p>
    The bokeh gallery (https://bokeh.pydata.org/en/latest/docs/gallery.html) has some great examples of how to use patches, including GIS based figures.  Below is one such example (with some modifications). This plot has a lot of setup that will be addressed in other sections of the notebook.
</p>

In [42]:

# import for color palette
from bokeh.palettes import Viridis6 as palette
# imports for sample data
from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

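# reverse the palette without mutating the imported object (safe to re-run, and works if the palette is a tuple)
palette = list(reversed(palette))
# create a dictionary of county code -> county data for each county in Colorado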
counties = {
    code: county for code, county in counties.items() if county["state"] == "co"
}

# extract the longitude(lons) and latitude(lats) for each county in the dictionary

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

# extract the name of each county
county_names = [county['name'] for county in counties.values()]
# extract the unemployment rate for each county
county_rates = [unemployment[county_id] for county_id in counties]
# create color mapper object with Viridis6 palette
color_mapper = LogColorMapper(palette=palette)

# create columndatasource object with data lists
source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
))

TOOLS = "pan,wheel_zoom,reset,hover,save"
# instantiate figure
p = figure(
    title="Colorado Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None
)
p.grid.grid_line_color = None
# add patch glyphs
p.patches('x', 'y', source=source,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)
# add hover tool
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    ("Unemployment rate", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]

show(p)

# These are just the basics.  If you know what your plot should look like, browse the Bokeh gallery to find a good starting point.