# Data Visualization Using Bokeh

In [2]:
# Import packages used throughout the notebook
import numpy as np
import pandas as pd
import bokeh as bk

## Introduction

Bokeh is a Python library specialized in creating interactive visualizations for web browsers. Whether you are a beginner, an intermediate, or an advanced user of data visualization, you will find a corresponding section here and can create JavaScript-powered visualizations using Python.

## Beginner

While Bokeh specializes in creating interactive plots, it can also produce the essential plot types offered by other data visualization packages such as seaborn and matplotlib. At the beginning of our tutorial, we show how to create basic plots, such as line charts and bar plots, and how to make them readable to the audience by adding legends, text, and annotations. We then take a first look at Bokeh's interactive features by showing how to use widgets that let users customize plots.


In [37]:
## Generate random dataset
df = pd.DataFrame(np.random.randint(0,10,size=(5, 4)), columns=list('ABCD'))
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 6 | 3 | 2 | 7 |
| 1 | 2 | 8 | 0 | 5 |
| 2 | 5 | 0 | 0 | 7 |
| 3 | 0 | 3 | 1 | 8 |
| 4 | 1 | 7 | 3 | 4 |



We start with a simple line chart. After defining the figure p with a title and axis labels, we pass three arguments to the line() method: the x column, the y column, and the data source. We can make the chart more readable by adding the optional legend label and setting the line color and width. Outside a notebook, show() writes the plot to an HTML file and opens it in a new browser tab. To display plots inline in a Jupyter notebook instead, call output_notebook() from bokeh.io before calling show(). The plot comes with a toolbar, where you can use the pan tool and zoom tools such as box zoom and wheel zoom to explore the plot. The position of the toolbar can be set with the toolbar_location argument of figure().

In [38]:
from bokeh.io import output_notebook

In [39]:
output_notebook()

In [40]:
from bokeh.plotting import figure, show
# create a new plot with a title and axis labels, with the toolbar on the right
p = figure(title="Line Chart Example", x_axis_label="X axis", y_axis_label="Y axis",
           toolbar_location="right")

# add a line renderer with a legend entry, color, and line thickness
p.line('A', 'B', legend_label="A vs B", line_color='blue', line_width=2, source=df)
# add more lines to the same figure
p.line('A', 'C', legend_label="A vs C", line_color="red", line_width=2, source=df)
p.line('A', 'D', legend_label="A vs D", line_color="green", line_width=2, source=df)
# show the results
show(p)
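The toolbar can also be configured explicitly. The sketch below is one possible configuration, not the tutorial's own code: the particular tool list, the active_scroll choice, and the sample points are illustrative assumptions. It picks which tools appear, makes the wheel zoom active by default, and moves the toolbar above the plot.

```python
from bokeh.plotting import figure

# Hypothetical tool selection: pan, two zoom tools, reset, and save
p = figure(title="Toolbar example", width=400, height=300,
           tools="pan,box_zoom,wheel_zoom,reset,save",
           active_scroll="wheel_zoom",   # scroll wheel zooms without clicking the tool first
           toolbar_location="above")     # other options: "below", "left", "right", None
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
```

Passing tools as a comma-separated string keeps the call short; tools can also be added later with p.add_tools().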

## Renderer with glyphs

We can add other renderers to our line plot as well. Bokeh supports many kinds of glyphs, such as circles, bars, and hex tiles. Here we give a brief introduction to rendering with Bokeh: the demo code combines a line, circles, and vertical bars in a single plot. Users can also change a glyph's properties after the plot is created by modifying the renderer's glyph attribute.

In [55]:
# create a new plot with a mix of line chart, scatter plot and bar plot
p = figure(title="glyph plot example", x_axis_label="X axis", y_axis_label="Y axis")
# add a line renderer with legend and line thickness
line = p.line('A', 'B',
              legend_label="line chart", line_color='blue', line_width=2, source=df)
# add circle markers with a yellow outline
circle = p.circle('A', 'B',
                  legend_label="circle plot", line_color="yellow", size=12,
                  source=df)
# add semi-transparent vertical bars
vbar = p.vbar(x='A', top='B',
              legend_label="vbar", width=0.5, bottom=0, color="red",
              source=df, alpha=0.5)
# change a property of an existing glyph after it has been added to the plot
glyph = circle.glyph
glyph.fill_color = 'blue'
show(p)
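Hex tiles, mentioned above, are convenient when many points overlap. A minimal sketch using figure.hexbin(), assuming hypothetical normally distributed data and an arbitrary tile size of 0.5 data units:

```python
import numpy as np
from bokeh.plotting import figure

# Hypothetical data: 500 normally distributed points
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)

# match_aspect keeps the hexagons regular-looking on screen
p = figure(title="Hex tile example", match_aspect=True)

# hexbin() bins the points into hexagons of the given size (in data units),
# colors each tile by its count, and also returns the table of bins
renderer, bins = p.hexbin(x, y, size=0.5)
```

The returned bins table (axial coordinates q, r and a counts column) is useful for checking or labeling the tiles.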

## Customize the plot appearance

The plot's general attributes can be set directly in the figure() function. You can add a title to the graph, set its size manually with width and height, or let it resize to fit the screen with sizing_mode. The axis ranges can also be defined through the x_range and y_range arguments.<br>
Bokeh also provides a variety of axis types and tick formatters for different data types. By importing the built-in formatters from bokeh.models, users can display different units on the axes. For instance, NumeralTickFormatter can format numbers as plain numbers, currency, bytes, or time and show the corresponding symbols on the axes to make the graph more readable, while DatetimeTickFormatter does the same for dates and times. A formatter is applied by assigning it to an axis's formatter property.<br>
Like other plotting libraries, Bokeh comes with built-in themes that change a plot's appearance. There are five built-in themes: caliber, dark_minimal, light_minimal, night_sky, and contrast, and users can also create custom themes. A custom theme can be defined as a JSON-style dictionary in Python or loaded from a YAML file.


In [61]:
from bokeh.models import DatetimeTickFormatter, NumeralTickFormatter
# Define plot
p = figure(title="glyph plot example", x_axis_label="X axis", y_axis_label="Y axis")
# Add glyphs, each with its own legend entry
line = p.line('A', 'B',
              legend_label="line chart", line_color='blue', line_width=2, source=df)
circle = p.circle('A', 'B',
                  legend_label="circle plot",
                  line_color="yellow", size=12,
                  source=df)
vbar = p.vbar(x='A', top='B',
              legend_label="vbar", width=0.5, bottom=0, color="red",
              source=df, alpha=0.5)
p.legend.location = 'top_left' # the default position is 'top_right'
p.legend.title = 'glyph types' # add title to legend

# change the font, font style and text color of legend text
p.legend.label_text_font = "times"
p.legend.label_text_font_style = "italic"
p.legend.label_text_color = "blue"

# change location and style of title
p.title_location = "left"
p.title.text_font_style = "bold"

# format y-axis ticks as currency
p.yaxis[0].formatter = NumeralTickFormatter(format="$0.00")

show(p)
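The sizing, range, and theme options described above can be combined as in the following sketch. The specific ranges, colors, and theme attributes are arbitrary illustrative assumptions; the theme dictionary follows Bokeh's "attrs" layout, keyed by model class name.

```python
from bokeh.io import curdoc
from bokeh.plotting import figure
from bokeh.themes import Theme, built_in_themes

# Fixed axis ranges, a fixed height, and a width that stretches to the container
p = figure(title="Sized and ranged plot",
           x_range=(0, 10), y_range=(0, 100),
           height=300, sizing_mode="stretch_width")
p.line([0, 5, 10], [0, 50, 100], line_width=2)

# Option 1: apply one of the built-in themes by name
curdoc().theme = "dark_minimal"

# Option 2: a custom theme from a JSON-style dictionary (illustrative values)
custom_theme = Theme(json={
    "attrs": {
        "Plot": {"background_fill_color": "#f5f5f5"},
        "Axis": {"axis_label_text_font_style": "bold"},
        "Grid": {"grid_line_dash": [6, 4]},
    }
})
curdoc().theme = custom_theme
```

A theme set on curdoc() applies to every plot in the current document, so it is usually set once near the top of a notebook or script.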

## Mapping Data to Colors and Sizes

Besides setting a single color or size for a whole glyph (for example through the fill_color property), the user can vary color and size point by point based on the data values. This is done by passing a sequence, or the name of a data-source column, to arguments such as radius and color instead of a single value. Colors can also be mapped onto one of the pre-defined palettes built into Bokeh. In the example, we map the y values to the Magma palette, which originally comes from Matplotlib. After creating the linear color mapper with linear_cmap(), we pass it as the color argument of circle().

In [135]:
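from bokeh.models import ColumnDataSource
from bokeh.palettes import Magma256
from bokeh.transform import linear_cmap

# generate data: y = x**2, and a radius (in data units) that grows with x
x = [1, 2, 3, 4, 5]
y = [i**2 for i in x]
r = [0.1, 0.2, 0.3, 0.4, 0.5]
source = ColumnDataSource(data=dict(x=x, y=y, r=r))

# create a linear color mapper from the "y" column onto the Magma palette
mapper = linear_cmap(field_name="y", palette=Magma256, low=min(y), high=max(y))

# create plot
p = figure(width=500, height=250)

# create circle renderer: radius and color are both driven by the data
p.circle(x="x", y="y", radius="r", color=mapper, source=source)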
show(p)
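A color mapping is easier to read with a legend for the scale. The sketch below is an alternative to linear_cmap(), assuming hypothetical random data: it builds an explicit LinearColorMapper so that the same mapper can drive both the marker colors (through transform()) and a ColorBar, and it maps a data column to marker size in screen pixels.

```python
import numpy as np
from bokeh.models import ColorBar, ColumnDataSource, LinearColorMapper
from bokeh.palettes import magma
from bokeh.plotting import figure
from bokeh.transform import transform

# Hypothetical data: 50 random points, each with a value in [0, 1)
rng = np.random.default_rng(0)
n = 50
value = rng.uniform(0, 1, n)
source = ColumnDataSource(data=dict(
    x=rng.uniform(0, 10, n),
    y=rng.uniform(0, 10, n),
    value=value,
    size=5 + 20 * value,  # marker size in screen pixels grows with value
))

# magma(256) returns the 256-color Magma palette as a tuple of hex strings
color_mapper = LinearColorMapper(palette=magma(256), low=0, high=1)

p = figure(width=500, height=300, title="Color and size mapped to data")
p.scatter(x="x", y="y", size="size", source=source,
          fill_color=transform("value", color_mapper), line_color=None)

# A color bar explaining the color scale, placed to the right of the plot
p.add_layout(ColorBar(color_mapper=color_mapper, title="value"), "right")
```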

## Using Widgets

An important built-in feature of Bokeh is its interactive widgets, which let users adjust and control elements of a Bokeh document and produce dynamic results. In the following example, we use three widgets: a Div widget to display HTML text, a Spinner widget to select a numeric value, and a RangeSlider widget to adjust a range. After creating the Spinner and RangeSlider widgets, we link each one to the property it controls (the circle size and the x-axis range) with the js_link() method. Finally, we arrange the widgets and the plot into a dashboard with layout().<br>
In the resulting output, the widgets appear above the plot. We can adjust the x-axis range and the size of the circles by interacting with the controls.

In [136]:
from bokeh.models import Div, RangeSlider, Spinner

In [145]:
from bokeh.layouts import layout
# Data
x = df['A'].tolist()
y = df['B'].tolist()
# create plot with circle glyphs
p = figure(x_range=(1, 9), width=500, height=250)
points = p.circle(x=x, y=y, size=30, fill_color="#21a7df")

# Div
div = Div(
    text="""
          <p>Select the circle's size using this control element:</p>
          """,
    width=200,
    height=30,
)

# Spinner
spinner = Spinner(
    title="Circle size",
    low=0,
    high=60,
    step=5,
    value=points.glyph.size,
    width=200,
)
spinner.js_link("value", points.glyph, "size")

# RangeSlider
range_slider = RangeSlider(
    title="Adjust x-axis range",
    start=0,
    end=10,
    step=1,
    value=(p.x_range.start, p.x_range.end),
)
range_slider.js_link("value", p.x_range, "start", attr_selector=0)
range_slider.js_link("value", p.x_range, "end", attr_selector=1)

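# Layout: Div and Spinner on the first row, then the slider, then the plot
# (named "dashboard" so it does not shadow the layout() function)
dashboard = layout(
    [
        [div, spinner],
        [range_slider],
        [p],
    ]
)

# show result
show(dashboard)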
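js_link() covers one-to-one property links. For custom behavior, a widget can run a JavaScript snippet via js_on_change() and CustomJS. A minimal sketch, assuming a hypothetical Select widget that recolors the markers; the HTML is rendered to a string with file_html() only to show that the dashboard is self-contained:

```python
from bokeh.embed import file_html
from bokeh.layouts import column
from bokeh.models import CustomJS, Select
from bokeh.plotting import figure
from bokeh.resources import CDN

p = figure(x_range=(0, 10), width=500, height=250)
points = p.scatter(x=[1, 3, 5, 7], y=[2, 6, 4, 8], size=20, fill_color="#21a7df")

# Hypothetical color choices for the dropdown
select = Select(title="Circle color", value="#21a7df",
                options=["#21a7df", "orange", "green"])

# The JavaScript runs in the browser whenever the Select's value changes;
# cb_obj is the widget that triggered the callback
select.js_on_change("value", CustomJS(args=dict(glyph=points.glyph),
                                      code="glyph.fill_color = cb_obj.value;"))

dashboard = column(select, p)
html = file_html(dashboard, CDN, "Widget demo")
```

Because the callback is JavaScript, it works in a static HTML file without a running Python server.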

## Plot with missing values

Bokeh does not require the data to be free of missing values. When NaN values are passed to line or patch glyphs, Bokeh breaks the glyph at those points, drawing disjoint line segments or separate patches with gaps where the values are missing. In the following examples, we show how missing values appear in line charts and patch charts. This gives data scientists a clear picture of where data is missing and of the trend before and after the missing points.

In [156]:
# plot with NaN values in a line chart: the line is broken at the missing point
import numpy as np
A = [1, 2, np.nan, 3, 4]
B = [2, 4, 6, 8, 10]
p = figure(width=400, height=400)
p.line(A, B)
show(p)

In [163]:
# plot with NaN values in a patch chart: the NaN splits the patch into separate polygons
import numpy as np
A = [1, 3, 2, np.nan, 3, 10, 5]
B = [2, 8, 4, np.nan, 6, 10, 1]
p = figure(width=400, height=400)
p.patch(A, B)
show(p)
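To make the gaps even more visible, the missing positions can be marked explicitly. A minimal sketch, assuming a hypothetical pandas Series with two missing readings; each NaN position gets a dashed vertical Span annotation:

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, Span
from bokeh.plotting import figure

# Hypothetical series with two missing readings
s = pd.Series([3.0, 4.5, np.nan, 5.0, 6.2, np.nan, 7.1, 7.4])
source = ColumnDataSource(data=dict(x=np.arange(len(s)), y=s.to_numpy()))

p = figure(width=400, height=250, title="Series with gaps")
p.line("x", "y", source=source, line_width=2)  # NaN values split the line
p.scatter("x", "y", source=source, size=6)

# Mark each missing position with a dashed vertical line
missing_positions = np.flatnonzero(s.isna().to_numpy())
for pos in missing_positions:
    p.add_layout(Span(location=int(pos), dimension="height",
                      line_color="gray", line_dash="dashed"))
```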

## Categorical Values
Bokeh can also handle categorical data, representing categorical and hierarchical values as sequences of strings. The most common ways to visualize these non-continuous values are bar charts, categorical heatmaps, and jitter plots. In the following examples, we show how to present categorical data in these charts.

### Bar Chart
To plot categorical values in a bar chart, we first pass the list of categories as the x_range of the figure() function. We then call vbar() with two main arguments: x, the categorical column, and top, the numerical value. For a better visual effect, the user can add a column of colors to the data source and pass that column's name to the color argument of vbar().
Users can also build more advanced bar charts by stacking or grouping bars. The vbar_stack() function takes two main arguments: the list of columns to stack and the categorical x column. Grouped bars can be created by shifting each bar series with dodge() when defining the bar plot. In the following examples, we show a simple colored bar chart and a grouped bar chart.

In [206]:
#define a categorical dataset
data = {'name': ['Ann', 'Bob', 'Cath', 'Duke','Ella',"Frank"], 
        'Reading': [100,90,70,60,20,10],
       'Writing':[20,30,40,50,40,80],
       'Math':[50,60,90,20,30,50]
       }
pd.DataFrame.from_dict(data)

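|   | name | Reading | Writing | Math |
|---|------|---------|---------|------|
| 0 | Ann | 100 | 20 | 50 |
| 1 | Bob | 90 | 30 | 60 |
| 2 | Cath | 70 | 40 | 90 |
| 3 | Duke | 60 | 50 | 20 |
| 4 | Ella | 20 | 40 | 30 |
| 5 | Frank | 10 | 80 | 50 |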


In [203]:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6

# Use a ColumnDataSource with a "color" column so each category gets its own color
source_color = ColumnDataSource(data=dict(name=data['name'],
                                          Reading=data['Reading'],
                                          color=list(Spectral6)))

p = figure(x_range=data['name'], height=250, title="Reading Scores",
           toolbar_location=None, tools="")
# add a separate bar for each category
p.vbar(x='name', top='Reading', width=0.9,
       color='color',
       legend_field="name",  # one legend entry per category value
       source=source_color)

# hide vertical grid lines and lay out the legend horizontally at the top
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = 'top_center'
show(p)





In [215]:

from bokeh.transform import dodge
source = ColumnDataSource(data=data)

p = figure(x_range=data['name'], height=250, title="Scores for each student",
           toolbar_location=None, tools="")
# use the dodge function to group the categorical value
p.vbar(x=dodge('name', -0.25, range=p.x_range), top='Reading', width=0.2, source=source,
       color="Red", legend_label="Reading")

p.vbar(x=dodge('name',  0.0,  range=p.x_range), top='Writing', width=0.2, source=source,
       color="Green", legend_label="Writing")

p.vbar(x=dodge('name',  0.25, range=p.x_range), top='Math', width=0.2, source=source,
       color="Blue", legend_label="Math")

p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
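The stacked variant described above can be sketched with vbar_stack() on the same student scores (copied here so the block is self-contained). The colors and title are illustrative choices:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

data = {'name': ['Ann', 'Bob', 'Cath', 'Duke', 'Ella', 'Frank'],
        'Reading': [100, 90, 70, 60, 20, 10],
        'Writing': [20, 30, 40, 50, 40, 80],
        'Math': [50, 60, 90, 20, 30, 50]}
subjects = ['Reading', 'Writing', 'Math']
source = ColumnDataSource(data=data)

p = figure(x_range=data['name'], height=250, title="Total scores per student",
           toolbar_location=None, tools="")
# vbar_stack() draws one vbar per stacked column, each starting where the previous ends
renderers = p.vbar_stack(subjects, x='name', width=0.9, source=source,
                         color=["red", "green", "blue"],
                         legend_label=subjects)

p.y_range.start = 0
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
```

vbar_stack() returns one renderer per stacked column, which is handy if individual layers need restyling later.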

### Scatter plots and Heatmap


When each category contains many observations, a scatter plot can show the distribution of values better than a bar plot. To keep points in the same category from piling on top of each other, wrap the categorical coordinate in jitter() when calling circle(); this gives each point a small random offset.
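A minimal jitter sketch, assuming hypothetical data with 100 normally distributed values for each of three categories. scatter() is used here; the same x=jitter(...) argument works with circle():

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import jitter

# Hypothetical data: 100 observations per category
rng = np.random.default_rng(42)
days = ["Mon", "Tue", "Wed"]
day_col = [d for d in days for _ in range(100)]
value = rng.normal(loc=10, scale=2, size=len(day_col))
source = ColumnDataSource(data=dict(day=day_col, value=value))

p = figure(x_range=days, height=300, title="Jittered categorical scatter")
# jitter() adds a random offset to each categorical x position;
# width controls how far the points spread within their category
p.scatter(x=jitter("day", width=0.6, range=p.x_range), y="value",
          source=source, alpha=0.4, size=6)
```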
When a value depends on two categorical variables, bar or scatter plots make the relationship hard to see. In this situation, a heatmap works well: both the x and y axes are categorical, and each cell is colored by mapping its value onto a palette, so the audience can quickly see the distribution of values and compare categories. Heatmaps can be drawn with the rect() function.
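A categorical heatmap sketch using rect(), reusing the student scores from the bar-chart section (copied so the block is self-contained). The Viridis palette and the 0 to 100 color range are illustrative choices:

```python
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import transform

names = ["Ann", "Bob", "Cath", "Duke", "Ella", "Frank"]
subjects = ["Reading", "Writing", "Math"]
scores = {
    "Reading": [100, 90, 70, 60, 20, 10],
    "Writing": [20, 30, 40, 50, 40, 80],
    "Math": [50, 60, 90, 20, 30, 50],
}

# Flatten into one row per (name, subject) cell
rows = [(n, s, scores[s][i]) for s in subjects for i, n in enumerate(names)]
source = ColumnDataSource(data=dict(
    name=[r[0] for r in rows],
    subject=[r[1] for r in rows],
    score=[r[2] for r in rows],
))
mapper = LinearColorMapper(palette=Viridis256, low=0, high=100)

p = figure(x_range=names, y_range=subjects, height=250, title="Scores heatmap",
           toolbar_location=None, tools="")
# One rectangle per cell; width and height of 1 fill each categorical slot
p.rect(x="name", y="subject", width=1, height=1, source=source,
       fill_color=transform("score", mapper), line_color=None)
```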

## Display and export plot

Besides displaying plots inline in a notebook or in a browser tab, Bokeh lets users save a plot as a standalone HTML file, export it as a PNG image, or embed it in Jupyter/Zeppelin notebooks, depending on the occasion. To generate interactive output, call output_file() for an HTML file or output_notebook() for notebook output. These output functions are usually combined with show() or save(). PNG export is done with export_png() from bokeh.io, which additionally requires the selenium package and a browser driver.
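A minimal sketch of saving a standalone HTML file with output_file() and save(), writing into a temporary directory so the example does not depend on any particular path (the file name and title are illustrative):

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

p = figure(title="Export example", width=400, height=300)
p.line([1, 2, 3, 4], [4, 1, 3, 2], line_width=2)

out_dir = tempfile.mkdtemp()
html_path = os.path.join(out_dir, "export_example.html")

# Direct output to an HTML file, then write it without opening a browser
output_file(html_path, title="Export example")
save(p)

# export_png(p, filename=...) from bokeh.io would produce a PNG instead,
# but it needs selenium and a browser driver, so it is not run here.
```

Use show() instead of save() when you also want the file opened in a browser.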

## Plot with Geographic Data in real-life map

When working with real-world data, an important topic is geographic data analysis. Bokeh makes it easy to put data in a real-world context: through the TileSource mechanism, Bokeh can plot data in geographic coordinates on top of map tiles. <br>
WMTS is the most widely used web standard for tiled map data. It uses the Web Mercator projection, which expresses positions in meters east or west of the Greenwich meridian and north or south of the equator. Web Mercator preserves local shapes, although areas are increasingly exaggerated toward the poles. <br>In the following example, we give a brief introduction to the WMTS tile source.
While the user can define a WMTSTileSource manually from a tile provider's URL, Bokeh 2.x also ships predefined tile sources that can be obtained with the get_provider() function from the bokeh.tile_providers module. In the example, we use the ESRI_IMAGERY tile provider from Esri.

In [261]:
# import plotting functions and the WMTS tile source model
from bokeh.plotting import figure, show
from bokeh.models import WMTSTileSource

# Web Mercator coordinates (in meters) covering the contiguous USA
USA = x_range, y_range = ((-13884029, -7453304), (2698291, 6455972))
# define the plot; the mercator axis types label the ticks in longitude/latitude
p = figure(tools='pan,wheel_zoom', x_range=x_range, y_range=y_range,
           x_axis_type="mercator", y_axis_type="mercator")

In [262]:
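# import a predefined tile source (Bokeh 2.x API; bokeh.tile_providers is
# deprecated in Bokeh 3.x, where p.add_tile("Esri.WorldImagery") is the
# xyzservices-based equivalent)
from bokeh.tile_providers import get_provider, Vendors
esri = get_provider(Vendors.ESRI_IMAGERY)

p.add_tile(esri)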

In [263]:
show(p)

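After defining the map, we can add circles to it with the circle() method. Because the map axes use Web Mercator coordinates, the cities' longitudes and latitudes are first converted to Web Mercator with a small helper function.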

In [264]:
import pandas as pd
import numpy as np

def wgs84_to_web_mercator(df, lon="lon", lat="lat"):
    """Convert decimal longitude/latitude columns to Web Mercator x/y (meters)."""
    k = 6378137  # Earth radius in meters used by Web Mercator
    df["x"] = df[lon] * (k * np.pi/180.0)
    df["y"] = np.log(np.tan((90 + df[lat]) * np.pi/360.0)) * k
    return df

# define the input dataset with the longitude and latitude of our target cities
df = pd.DataFrame(dict(name=["Washington, DC", "NYC"],
                       lon=[-77.27, -74.0059], lat=[38.54, 40.7128]))
# convert lat and lon to Web Mercator coordinates (adds x and y columns)
wgs84_to_web_mercator(df)

|   | name | lon | lat | x | y |
|---|------|-----|-----|---|---|
| 0 | Washington, DC | -77.27 | 38.54 | -8601657.0 | 4655993.0 |
| 1 | NYC | -74.0059 | 40.7128 | -8238299.0 | 4970072.0 |


In [265]:
#plot
p.circle(x=df['x'], y=df['y'], fill_color='orange', size=10)
show(p)
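A variation on the map example, sketched under a few assumptions: the city names are kept in a ColumnDataSource so a HoverTool can show them, and a hypothetical inverse helper (from_web_mercator) converts back to longitude/latitude as a sanity check of the conversion formulas. The tile layer is omitted here to keep the block self-contained; it can be added with add_tile() as above.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

K = 6378137  # Earth radius (m) used by Web Mercator

def to_web_mercator(lon, lat):
    """Same formulas as wgs84_to_web_mercator above, for plain arrays."""
    x = np.asarray(lon) * (K * np.pi / 180.0)
    y = np.log(np.tan((90 + np.asarray(lat)) * np.pi / 360.0)) * K
    return x, y

def from_web_mercator(x, y):
    """Hypothetical inverse helper: Web Mercator meters back to degrees."""
    lon = np.asarray(x) / (K * np.pi / 180.0)
    lat = (2 * np.arctan(np.exp(np.asarray(y) / K)) - np.pi / 2) * 180.0 / np.pi
    return lon, lat

cities = pd.DataFrame(dict(name=["Washington, DC", "NYC"],
                           lon=[-77.27, -74.0059], lat=[38.54, 40.7128]))
cities["x"], cities["y"] = to_web_mercator(cities["lon"], cities["lat"])
source = ColumnDataSource(cities)

p = figure(x_axis_type="mercator", y_axis_type="mercator",
           tools="pan,wheel_zoom,reset")
p.scatter(x="x", y="y", source=source, size=10, fill_color="orange")
# Show the city name and original coordinates when hovering over a point
p.add_tools(HoverTool(tooltips=[("City", "@name"), ("Lon", "@lon"), ("Lat", "@lat")]))
```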

## Resource In Bokeh
Bokeh comes with many sample datasets for users to experiment with. They are available through the bokeh.sampledata module and must be downloaded once with bokeh.sampledata.download(). With these datasets, users can learn more about Bokeh's functions and explore its advanced data visualization tools.

In [216]:
import bokeh.sampledata
bokeh.sampledata.download()

Creating /Users/ruyiyang/.bokeh directory
Creating /Users/ruyiyang/.bokeh/data directory
Using data directory: /Users/ruyiyang/.bokeh/data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3171836 bytes)
   3171836 [100.00%]
Unpacking: US_Counties.csv
Downloading: us_cities.json (713565 bytes)
    713565 [100.00%]
Downloading: unemployment09.csv (253301 bytes)
    253301 [100.00%]
Downloading: AAPL.csv (166698 bytes)
    166698 [100.00%]
Downloading: FB.csv (9706 bytes)
      9706 [100.00%]
Downloading: GOOG.csv (113894 bytes)
    113894 [100.00%]
Downloading: IBM.csv (165625 bytes)
    165625 [100.00%]
Downloading: MSFT.csv (161614 bytes)
    161614 [100.00%]
Downloading: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip (4816256 bytes)
   4816256 [100.00%]
Unpacking: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.csv
Downloading: gapminder_fertility.csv (64346 bytes)
     64346 [100.00%]
Downloading: gapminder_population.csv (94509 bytes)
     94509 [100.00%]