# Introduction

The purpose of this tutorial is to introduce Python's Bokeh data visualization library.  Although still a young package, Bokeh is open source and actively maintained by Continuum Analytics.  It is designed for easy creation of interactive visualizations in modern browsers, standard HTML documents, or server-backed apps.  It is capable of versatile graphics and can handle large, dynamic, or streaming data.  Bokeh's interactive nature and D3.js-like graphics make it an excellent package for creating exploratory graphs as well as attractive graphics suitable for presentations.

# Contents

* [Installation](#Installation)
* [Example Charts](#Example-Charts)
* [Basic Graphing](#Basic-Graphing)
* [Advanced Interactive Graphs](#Advanced-Interactive-Graphs)

# Installation

Bokeh can be installed either by using Anaconda:

    $ conda install bokeh
    

or installed by using pip:

    $ pip install bokeh
    

When installing via pip, you will need to ensure that NumPy is installed.  After installing, make sure the following commands work:

In [1]:
import pandas as pd
import bokeh
from bokeh.io import push_notebook, show, output_notebook

#This is used to display the plots to Jupyter Notebooks
output_notebook()
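
If the imports above fail, it can help to check which packages are actually visible to the interpreter.  The following is a minimal sketch that uses only the standard library; the helper name <i>is_installed</i> is an illustrative choice, not part of Bokeh.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of each package this tutorial relies on.
for name in ("numpy", "pandas", "bokeh"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

Any package reported as missing can then be installed with conda or pip as described above.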

# Example Charts

Now that we have Bokeh successfully installed, let's see an example of a Bokeh graph.  We'll use Bokeh's sample data from the musical Les Miserables.  The chart will show the relationships between the characters in the musical.



In [2]:
from bokeh.charts import Chord
from bokeh.sampledata.les_mis import data

#Load the nodes and links from the sample data
nodes = data['nodes']
links = data['links']

nodes_df = pd.DataFrame(nodes)
links_df = pd.DataFrame(links)

#Merge the nodes and links into one dataframe
source_data = links_df.merge(nodes_df, how='left', left_on='source', right_index=True)
source_data = source_data.merge(nodes_df, how='left', left_on='target', right_index=True)
source_data = source_data[source_data["value"] > 5]  # Select those with more than 5 connections

#Plot the chord graph.  Specify the source, target, and value by using column names
chord_from_df = Chord(source_data, source="name_x", target="name_y", value="value")
show(chord_from_df)

Next, we can see that a more customized chart can be created using Bokeh's plotting interface, which allows for some interesting renders.  In the example below, line and circle glyphs are layered on top of each other to create a line graph whose points are marked by circles.  This demonstrates that graphs can be built incrementally.

You can read more about the plotting interface here:
http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh-plotting

In [3]:
from bokeh.plotting import figure

# x and every y series must have the same length (six points each)
x = list(range(1, 7))
y0 = [1, 2, 5, 9, 14, 19]
y1 = [3, 7, 2, 7, 4, 8]
y2 = [12, 9, 8, 6, 4, 1]

# create a new plot
p = figure(
   title="layered line and circle example",
   x_axis_label='sections', y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend="Baseline")
p.circle(x, x, legend= "Baseline", fill_color="white", size=8)
p.line(x, y0, legend="Line 1", line_width=3)
p.line(x, y1, legend="Line 2", line_color="red")
p.circle(x, y1, legend="Line 2", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend="Line 3", line_color="orange", line_dash="4 4")

# show the results
show(p)

For the rest of this tutorial we will explore the City of Chicago's crime data for September 2015.  The full dataset can be found on the city's website: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2.  It contains crime information dating back to 2001.

In [4]:
crimes = pd.read_csv("ChicagoCrimesSept2015.csv")
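# Drop rows that have a missing value in any column
crimes = crimes[~crimes.isnull().any(axis=1)].copy()
# Normalize the inconsistent spelling of this category
crimes["Primary Type"] = crimes["Primary Type"].replace("NON - CRIMINAL", "NON-CRIMINAL")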
crimes["Date"] = pd.to_datetime(crimes["Date"])
crimes.head()

| Unnamed: 0 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 10693199 | HZ447924 | 2016-09-01 | 041XX S FEDERAL ST | 810 | THEFT | OVER $500 | APARTMENT | False | False | ... | 3 | 38 | 06 | 1176378.0 | 1877703.0 | 2016 | 09/26/2016 03:52:09 PM | 41.819771 | -87.628481 | (41.819771291, -87.628481227) |
| 3 | 10689696 | HZ444063 | 2016-09-01 | 001XX N WOLCOTT AVE | 1751 | OFFENSE INVOLVING CHILDREN | CRIM SEX ABUSE BY FAM MEMBER | RESIDENCE | False | True | ... | 27 | 28 | 20 | 1163722.0 | 1900910.0 | 2016 | 09/26/2016 03:52:09 PM | 41.883729 | -87.674255 | (41.883729075, -87.67425529) |
| 4 | 10678497 | HZ431145 | 2016-09-01 | 033XX N SHEFFIELD AVE | 820 | THEFT | $500 AND UNDER | RESIDENTIAL YARD (FRONT/BACK) | False | False | ... | 44 | 6 | 06 | 1169017.0 | 1922375.0 | 2016 | 09/15/2016 03:48:20 PM | 41.942517 | -87.654188 | (41.942517023, -87.654187534) |
| 5 | 10675339 | HZ427545 | 2016-09-01 | 076XX S STEWART AVE | 560 | ASSAULT | SIMPLE | APARTMENT | False | True | ... | 17 | 69 | 08A | 1174950.0 | 1854270.0 | 2016 | 09/12/2016 03:57:28 PM | 41.755501 | -87.634418 | (41.755500696, -87.63441848) |
| 6 | 10674857 | HZ426794 | 2016-09-01 | 085XX S BENNETT AVE | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | RESIDENCE | False | False | ... | 8 | 45 | 11 | 1190338.0 | 1848793.0 | 2016 | 09/12/2016 03:57:28 PM | 41.740114 | -87.578202 | (41.740114429, -87.578202208) |
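
To make the cleaning steps easier to follow, here is a small sketch that applies the same three operations (dropping incomplete rows, normalizing a label, and parsing timestamps) to a toy dataframe.  The rows are made up for illustration, and the timestamp format is an assumption based on the "Updated On" column shown above.

```python
import pandas as pd

# Toy stand-in for the crimes file; the values are made up for illustration.
raw = pd.DataFrame({
    "Date": ["09/01/2016 01:15:00 PM", "09/01/2016 11:30:00 PM", "09/02/2016 08:00:00 AM"],
    "Primary Type": ["THEFT", "NON - CRIMINAL", "ASSAULT"],
    "Latitude": [41.82, 41.88, None],
})

# Keep only rows with no missing values (`~` negates the boolean mask).
clean = raw[~raw.isnull().any(axis=1)].copy()

# Normalize the inconsistent label, then parse the timestamps.
clean["Primary Type"] = clean["Primary Type"].replace("NON - CRIMINAL", "NON-CRIMINAL")
clean["Date"] = pd.to_datetime(clean["Date"], format="%m/%d/%Y %I:%M:%S %p")

print(clean)
```

The third row is dropped because its latitude is missing, which mirrors what happens to incomplete records in the real dataset.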


# Basic Graphing

The bokeh.charts interface is designed for quick and easy building of statistical plots.  This is different from the plotting example above, where the chart was built with the figure function and glyphs were then added to the figure to build the chart incrementally.  When using classes from the bokeh.charts interface, data can be passed directly to the charting functions to create basic graphics.  Many parameters can be modified to tweak the appearance.

You can read more about all of the different charts and options here:
http://bokeh.pydata.org/en/latest/docs/reference/charts.html#bokeh-charts

>Note that off to the right side of the charts, there are interactive options that allow you to pan and zoom around the charts.

Let's first chart the counts of each type of crime. We'll do this by simply passing the dataframe to the Bar function, specifying that we want to use the "Primary Type" as our values.  We also want to aggregate by "count", instead of the default, "sum".  We'll also remove the legend since the x-axis will automatically have labels.

In [5]:
from bokeh.charts import Bar, show

plot = Bar(crimes, "Primary Type", values = "Primary Type", agg="count", legend=None)
show(plot)
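
To see what the "count" aggregation computes, the same numbers can be produced directly in pandas.  This is a sketch on made-up data, not the chart's internal implementation: each bar's height corresponds to the number of rows that share a "Primary Type" value.

```python
import pandas as pd

# Toy data; in the tutorial this column comes from the crimes dataframe.
toy = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT", "BATTERY"],
})

# Count how many rows fall into each category, most frequent first.
counts = toy["Primary Type"].value_counts()
print(counts)
```

Running <i>crimes["Primary Type"].value_counts()</i> on the real data is a quick way to sanity-check the bar heights in the chart.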

Another interesting view would be to see how many incidents were reported, broken down by hour.  We'll first need to rearrange the data to create a new column that contains the hour that the incident took place.

In [6]:
# Extract the hour of each incident from the parsed timestamps
crimes = crimes.assign(Hour=crimes["Date"].dt.hour)

plot2 = Bar(crimes, "Hour", values = "Hour", agg="count", legend=None)
show(plot2)

Let's add another element to create a stacked bar chart.  This way we can see how the crime types are distributed within each hour.  This is easily done by adding the "stack" argument.

In [7]:
plot3 = Bar(crimes, "Hour", values = "Primary Type", stack = "Primary Type", agg = "count")
show(plot3)
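
The numbers behind a stacked bar chart form a table of counts with one row per hour and one column per crime type.  As a rough sketch on made-up data, pandas can build that table with <i>pd.crosstab</i>:

```python
import pandas as pd

# Toy data with the two columns used by the stacked chart.
toy = pd.DataFrame({
    "Hour": [0, 0, 0, 1, 1, 13],
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT", "ASSAULT", "BATTERY"],
})

# Rows are hours, columns are crime types, cells are incident counts.
stacked = pd.crosstab(toy["Hour"], toy["Primary Type"])
print(stacked)
```

Each row of the table corresponds to one bar, and each cell to the height of one segment in that bar.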

Unfortunately, the legend now obscures most of the graph.  While we could move the legend to the side of the graph, it would still be too long to see all the values.  Let's instead take advantage of Bokeh's interactive elements by adding a hover tool.

We'll add the "tooltips" argument, and specify a label and the data source.  The data source is specified by adding the "@" character before the name of the column that is used in the "stack" argument.  Because that name is used as a field reference, we first rename the "Primary Type" column to "PrimaryType" so that it contains no spaces.

You can now hover your mouse over the segments in the graph to see the primary type of the incident.

In [8]:
crimes2 = crimes.copy()
crimes2 = crimes2.rename(columns = {"Primary Type": "PrimaryType"})

plot4 = Bar(crimes2, "Hour", values = "PrimaryType", stack = "PrimaryType", agg = "count", 
            tooltips=[("Primary Type", "@PrimaryType")], legend = None)

show(plot4)

# Advanced Interactive Graphs

Bokeh can implement interactive features that go beyond panning, zooming, and hovering.  It has two types of advanced interactions: linked plots and widgets.  Linking relates two plots to one another, such that when you pan or select data on one plot, the linked plot also pans and has the same data selected.  This tutorial won't cover this kind of interactivity since it is very simple.  You can learn more about it here:

http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/linking.html#userguide-interaction-linking


We'll be focusing on using widgets to make the plots interactive.  There are a few ways to use widgets.  The first is to use JavaScript callbacks.  The second is to run a Bokeh server, which is outside the scope of this tutorial.  The last is to use Jupyter interactors to enable interactive graphs in a Jupyter notebook, and this is the approach that we'll cover in detail.


Let's first briefly look at JavaScript callbacks.  The code segment below demonstrates how a JavaScript callback can be used to create a simple interactive plot.  The <i>callback</i> function is the JavaScript part.  However, in this example we have actually written the code in Python, since many of you may not be familiar with JavaScript.  We can do this because Bokeh's CustomJS callback provides <i>from_py_func</i>, which uses PyScript to translate Python into JavaScript.  To enable this, you have to install flexx:

    $ conda install -c bokeh flexx

JavaScript callbacks have the advantage of running in most web browsers without requiring any Python code to execute.  On the other hand, JavaScript does not support many Python commands and packages, so it would be extremely difficult to use this method to interact with data that's stored in a pandas dataframe.

In [9]:
from bokeh.layouts import column
from bokeh.models import CustomJS, ColumnDataSource, Slider
from bokeh.plotting import Figure, output_file, show

x = [x*0.005 for x in range(0, 200)]
y = x

source = ColumnDataSource(data=dict(x=x, y=y))

plot = Figure(plot_width=400, plot_height=400)
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

def callback(source=source, window=None):
    data = source.data
    f = cb_obj.value
    x, y = data['x'], data['y']
    for i in range(len(x)):
        y[i] = window.Math.pow(x[i], f)
    source.trigger('change')

slider = Slider(start=0.1, end=4, value=1, step=.1, title="power",
                callback=CustomJS.from_py_func(callback))

layout = column(slider, plot)

show(layout)
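
The <i>from_py_func</i> helper is not available in every Bokeh release (it was dropped in later versions), so as a hedged alternative the same callback can be written directly as a JavaScript string.  The sketch below assumes a Bokeh version that provides <i>js_on_change</i> and <i>source.change.emit()</i>; the plot title is an illustrative addition.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

x = [i * 0.005 for i in range(200)]
source = ColumnDataSource(data=dict(x=x, y=list(x)))

plot = figure(title="y = x ** power")
plot.line("x", "y", source=source, line_width=3, line_alpha=0.6)

# The callback body is plain JavaScript; `cb_obj` is the widget that fired.
callback = CustomJS(args=dict(source=source), code="""
    const f = cb_obj.value;
    const x = source.data['x'];
    const y = source.data['y'];
    for (let i = 0; i < x.length; i++) {
        y[i] = Math.pow(x[i], f);
    }
    source.change.emit();
""")

slider = Slider(start=0.1, end=4, value=1, step=0.1, title="power")
# Run the JavaScript whenever the slider's value changes.
slider.js_on_change("value", callback)

layout = column(slider, plot)
# In a notebook, call show(layout) from bokeh.io to display it.
```

Because the logic lives in the browser, this version needs no Python process once the page has been rendered.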

In this next example we'll use Jupyter interactors to create an interactive graph that works in Jupyter notebooks.  The obvious advantage of using Jupyter interactors is that the code is 100% Python, so you can manipulate the data as you normally would.  This is much easier than trying to use JavaScript to work with complex data.  On the downside, the graphs will not work in all environments, as you may notice if viewing this notebook via NBViewer.

>The following graphs will not display if viewing the notebook in NBViewer.  They will work however, when viewed from Jupyter Notebooks.

In our Chicago crimes dataset, we've explored the types and timing of crimes.  Let's now look at the timing and the location of crimes, because different parts of the city may exhibit more or less crime at different times of day.

>Unfortunately, we can't plot the locations on Google Maps since Google recently changed its API and Bokeh has not yet addressed this.  We'll instead plot the crimes on a regular scatterplot, using the longitude and latitude as coordinates.

We'll be using a ColumnDataSource as the source of the data instead of directly passing the dataframe to the plot.  We want to do this because we'll now be slicing data, and updating the values of the data.  In the ColumnDataSource we'll define the structure of the data by creating a dictionary of what data we want to display on the plot.  In this case, we'll display the longitude and the latitude.

In the update function, we'll assign values to the lon and lat keys in the data source, slicing based on the hour selected.

Lastly, in another cell block below the graph, we have the following code to implement a sliding widget where you can select what hour of the day to display:

    widgets = interact(update, Hour=FloatSlider(min=0, max=23, step=1, continuous_update=False))
    
This code must be in another cell block, otherwise the graph will not display properly.  The keyword argument that receives the FloatSlider must also have the same name as the corresponding parameter of the <i>update()</i> function.  In this case, both are named "Hour".

The key to making the interactive plots work in Jupyter Notebooks is using notebook handles.  When the argument <i>notebook_handle=True</i> is passed to <i>show()</i>, a handle object is returned.  This handle object can then be used with <i>push_notebook()</i> to update the plot.

In [10]:
from bokeh.layouts import layout, widgetbox
from bokeh.models import HoverTool
from bokeh.io import push_notebook
from ipywidgets import interact, FloatSlider

source = ColumnDataSource(data=dict(lon=[], lat=[]))

fig = figure()
plot = fig.circle(x="lon", y="lat", source=source)

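def update(Hour=0):
    # Select the incidents for the chosen hour, then replace both columns in a
    # single assignment so that they always have the same length
    subset = crimes[crimes["Hour"] == Hour]
    plot.data_source.data = dict(lon=subset["Longitude"].tolist(),
                                 lat=subset["Latitude"].tolist())
    push_notebook(handle=graph)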

graph = show(fig, notebook_handle=True)

In [11]:
widgets = interact(update, Hour=FloatSlider(min=0, max=23, step=1, continuous_update=False))
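
The slicing step inside the update function can be tried on its own, without a plot.  The sketch below uses made-up coordinates and a hypothetical helper, <i>columns_for_hour</i>, that returns a dictionary with the same "lon" and "lat" keys as the data source.

```python
import pandas as pd


def columns_for_hour(frame, hour):
    """Build a column dictionary for the rows whose Hour equals `hour`."""
    subset = frame[frame["Hour"] == hour]
    return {
        "lon": subset["Longitude"].tolist(),
        "lat": subset["Latitude"].tolist(),
    }


# Toy data; the coordinates are made up for illustration.
toy = pd.DataFrame({
    "Hour": [0, 0, 13],
    "Longitude": [-87.63, -87.67, -87.65],
    "Latitude": [41.82, 41.88, 41.94],
})

print(columns_for_hour(toy, 0))
```

Assigning the returned dictionary to the data source in one step keeps every column the same length, and the comparison also works with the float values that a FloatSlider passes in.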

Pretty cool, right?  Let's now take it further and add more interactions.  We'll first want to add back the colors to identify the different types of crimes.  Unfortunately, when using the <i>figure</i> object, we can't simply pass the dataframe to the color argument to color data points by group.  We'll have to do this manually by creating a new column in the crimes dataframe and assigning each row a color.

Rather than looking up the color names that Bokeh supports, we can rely on the fact that Bokeh can use the same named colors that matplotlib does.  We'll take advantage of this by importing the color names from matplotlib and then mapping each type of crime to a color.

In [12]:
from matplotlib import colors

colorList = colors.cnames.keys()
# Sort the crime types so each type gets the same color on every run
crimeTypes = sorted(set(crimes["Primary Type"]))
colorMapping = {primaryType: color for primaryType, color in zip(crimeTypes, colorList)}
crimes["Color"] = crimes["Primary Type"].map(colorMapping)
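
If you would rather not depend on matplotlib, the same idea works with any list of color strings.  The following is a minimal sketch using a small hand-picked palette of hex colors (an assumption made for illustration) that is cycled when there are more categories than colors.

```python
from itertools import cycle

# A small illustrative palette of hex color strings.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def build_color_mapping(categories, palette=PALETTE):
    """Map each distinct category to a color, reusing colors if the palette runs out."""
    unique = sorted(set(categories))  # sorted for a stable assignment across runs
    return dict(zip(unique, cycle(palette)))


mapping = build_color_mapping(["THEFT", "ASSAULT", "THEFT", "BATTERY"])
print(mapping)
```

Sorting the categories first means a given crime type keeps its color from one run to the next.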

Let's next create the data that we'll use as our interaction widget options and the data source.

We're going to provide the ability to slice the data using the Primary Type, whether an arrest was made, and whether the crime was a domestic crime.  We'll also create a hover tool that will show the case number, the block where the crime occurred, the type, and the description.  The hover tool uses the following format: (Label, "@dataSourceName").  The dataSourceName must be the same as a key that was defined in the ColumnDataSource.

In the ColumnDataSource, we're going to include keys for all the data that will be displayed on the graph.

In [13]:
typeDropdown = ["ALL"]
temp = list(set(crimes["Primary Type"]))
typeDropdown.extend(sorted(temp))
adChoice = ["ALL", "TRUE", "FALSE"]

hoverTool = HoverTool(tooltips=[
        ("Case Number", "@caseNumber"),
        ("Block", "@block"),
        ("Primary Type", "@primaryType"),
        ("Description", "@description"),
    ])


source2 = ColumnDataSource(data=dict(
        lon=[], 
        lat=[], 
        caseNumber=[], 
        block=[], 
        primaryType=[], 
        description=[], 
        color=[]
    ))
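
Since a tooltip silently shows nothing useful when its field name does not match a key in the data source, it can be worth checking the names before plotting.  The helper below, <i>missing_tooltip_fields</i>, is a hypothetical name and a rough sketch: it only handles simple "@name" references and uses plain Python structures in place of Bokeh objects.

```python
import re


def missing_tooltip_fields(tooltips, source_columns):
    """Return the @field names used in tooltips that are not keys of the data source."""
    missing = []
    for _label, template in tooltips:
        for field in re.findall(r"@(\w+)", template):
            if field not in source_columns:
                missing.append(field)
    return missing


tooltips = [
    ("Case Number", "@caseNumber"),
    ("Block", "@block"),
    ("Primary Type", "@PrimaryType"),  # deliberate mismatch: the key is "primaryType"
]
columns = {"lon": [], "lat": [], "caseNumber": [], "block": [], "primaryType": []}

print(missing_tooltip_fields(tooltips, columns))
```

Field names are case-sensitive, which is why the capitalized reference in this example is reported as missing.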

To make things easier to read, we'll also create a new function that will perform the data slicing based on the options that the user has selected.  The function will take the different options as arguments and slice the data to return a dataframe that contains only the rows that meet all of the selected criteria.

In [14]:
def sliceData(Type, Hour, Arrested, Domestic):
    miniCrimes = crimes[crimes["Hour"] == Hour]
    if Type != "ALL":
        miniCrimes = miniCrimes[miniCrimes["Primary Type"] == Type]
    if Arrested == "TRUE":
        miniCrimes = miniCrimes[miniCrimes["Arrest"] == True]
    elif Arrested == "FALSE":
        miniCrimes = miniCrimes[miniCrimes["Arrest"] == False]
    if Domestic == "TRUE":
        miniCrimes = miniCrimes[miniCrimes["Domestic"] == True]
    elif Domestic == "FALSE":
        miniCrimes = miniCrimes[miniCrimes["Domestic"] == False]
            
    return miniCrimes
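
The "ALL" / "TRUE" / "FALSE" choices follow the same pattern for both the arrest and the domestic filters.  As an optional sketch, that pattern can be expressed once in a small helper; the name <i>filter_flag</i> and the toy data are illustrative assumptions.

```python
import pandas as pd

# Translate the dropdown strings into the boolean values stored in the dataframe.
CHOICE_TO_BOOL = {"TRUE": True, "FALSE": False}


def filter_flag(frame, column, choice):
    """Filter on a boolean column; "ALL" (or any unknown choice) keeps every row."""
    if choice in CHOICE_TO_BOOL:
        return frame[frame[column] == CHOICE_TO_BOOL[choice]]
    return frame


toy = pd.DataFrame({
    "Arrest": [True, False, False],
    "Domestic": [False, False, True],
})

# Keep incidents without an arrest, regardless of the domestic flag.
result = filter_flag(filter_flag(toy, "Arrest", "FALSE"), "Domestic", "ALL")
print(result)
```

Chaining the helper applies the filters one after another, which matches how the conditions are combined in the function above.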

Finally, we can implement the rest of the code to actually update the graph.  The update2 function will be called by the widgets, and the data source will have the data updated every time a widget's value changes.  The function will then push the updates to the notebook.

Again, we create the widgets in another code block.  The type of widget that is displayed depends on the type of value it is given.  Jupyter's <i>interact</i> selects the widget automatically, and you can read more about the different options here:
https://github.com/ipython/ipywidgets/blob/d3cd557972fc0b14aa4291fde62b8dd00a499213/docs/source/examples/Using%20Interact.ipynb

After running the code blocks, we can now slice the data based on time, type, arrest, and if it's a domestic crime.  We can also hover over data points for more information, and zoom and pan around.

In [15]:
fig2 = figure()
fig2.add_tools(hoverTool)
plot2 = fig2.circle(x="lon", y="lat", color="color", alpha=0.6, source=source2)

def update2(Type="ALL", Hour=0, Arrested="ALL", Domestic="ALL"):
    miniCrimes = sliceData(Type, Hour, Arrested, Domestic)
    
    # Replace every column in one assignment so that all columns stay the same length
    plot2.data_source.data = dict(
        lon=miniCrimes["Longitude"].tolist(),
        lat=miniCrimes["Latitude"].tolist(),
        caseNumber=miniCrimes["Case Number"].tolist(),
        block=miniCrimes["Block"].tolist(),
        primaryType=miniCrimes["Primary Type"].tolist(),
        description=miniCrimes["Description"].tolist(),
        color=miniCrimes["Color"].tolist(),
    )
    fig2.title.text = "Number of Crimes: " + str(len(miniCrimes))
    push_notebook(handle=graph2)

graph2 = show(fig2, notebook_handle=True)

In [16]:
widgets2 = interact(update2, Type=typeDropdown, 
                    Hour=FloatSlider(min=0, max=23, step=1, continuous_update=False), 
                    Arrested=adChoice, Domestic=adChoice,)
