Bokeh is an interactive data visualization library for Python (and other languages!) that targets modern web browsers for presentation. It can create versatile, data-driven graphics, and connect the full power of the entire Python data-science stack to rich, interactive visualizations.

## What are glyphs?

In Bokeh, glyphs are the visual shapes, such as circles, squares, lines and patches, that represent data. The visual properties of these glyphs, such as position or color, can be assigned single values or sequences of values (one per data point).
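The short sketch below illustrates the difference between single values and sequences. It assumes a reasonably recent Bokeh installation, and the coordinates and colors are made up. It uses `scatter()`, Bokeh's general marker method, with `marker='circle'`. The first renderer gets one color and one size for every point. The second renderer gets one color and one size per point.

```python
from bokeh.plotting import figure

p = figure(title='Single values vs. per-point values')

# Single values: every marker shares the same color and size
fixed = p.scatter([1, 2, 3], [1, 2, 3], marker='circle', color='navy', size=8)

# Sequences: one value per data point, stored as columns of the renderer's data source
varying = p.scatter([1, 2, 3], [3, 2, 1], marker='circle',
                    color=['red', 'green', 'blue'], size=[6, 12, 18])

print(len(p.renderers))  # 2
```

Each glyph method call adds its own renderer to the figure. That same mechanism is what makes overlaying several data sets possible.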

In [16]:
```python
import pandas as pd
df = pd.read_csv('literacy_birth_rate.csv')
```

In [17]:
```python
df
```

| | Country | Continent | female literacy | fertility | population |
|---|---|---|---|---|---|
| 0 | Chine | ASI | 90.5 | 1.769 | 1.324655e+09 |
| 1 | Inde | ASI | 50.8 | 2.682 | 1.139965e+09 |
| 2 | USA | NAM | 99 | 2.077 | 3.040600e+08 |
| 3 | Indonésie | ASI | 88.8 | 2.132 | 2.273451e+08 |
| 4 | Brésil | LAT | 90.2 | 1.827 | 1.919715e+08 |
| 5 | Pakistan | ASI | 40 | 3.872 | 1.661115e+08 |
| 6 | Bangladesh | ASI | 49.8 | 2.288 | 1.600001e+08 |
| 7 | Nigéria | AF | 48.8 | 5.173 | 1.512123e+08 |
| 8 | Fédération de Russie | EUR | 99.4 | 1.393 | 1.419500e+08 |
| 9 | Japan | ASI | 99 | 1.262 | 1.277040e+08 |


In [18]:
```python
df.columns
```

```
Index(['Country ', 'Continent', 'female literacy', 'fertility', 'population'], dtype='object')
```

In [19]:
```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_file and show from bokeh.io
from bokeh.io import output_file, show

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female literacy (% population)')

# Add a circle glyph to the figure p
p.circle(df['fertility'], df['female literacy'])

# Call the output_file() function and specify the name of the file
output_file('fert_lit.html')

# Display the plot
show(p)
```

## A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.
We plot female literacy vs fertility for two different regions, Africa and Latin America. 

Our job is to plot the Latin America data with the circle() glyph, and the Africa data with the x() glyph.
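The same overlay can also be written as a loop over groups, with one glyph call per region. The sketch below is an alternative pattern, not the notebook's approach. It uses three rows copied from the table above: Chine, Brésil and Nigéria. A dictionary maps each continent code to a marker name, and `scatter(marker=...)` is assumed as the single method that draws both circle and x markers.

```python
import pandas as pd
from bokeh.plotting import figure

# Three rows copied from literacy_birth_rate.csv (Chine, Brésil, Nigéria)
df = pd.DataFrame({
    'Continent': ['ASI', 'LAT', 'AF'],
    'fertility': [1.769, 1.827, 5.173],
    'female literacy': [90.5, 90.2, 48.8],
})

# One marker shape per region we want to plot
markers = {'LAT': 'circle', 'AF': 'x'}

p = figure(x_axis_label='fertility (children per woman)',
           y_axis_label='female literacy (% population)')

# Keep only the mapped regions, then add one renderer per region
subset = df[df['Continent'].isin(list(markers))]
for continent, group in subset.groupby('Continent'):
    p.scatter(group['fertility'], group['female literacy'],
              marker=markers[continent], size=8)

print(len(p.renderers))  # 2
```

Adding a region then only requires a new dictionary entry, instead of another filtered DataFrame and glyph call.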

In [8]:
```python
latinamerica = df[df['Continent'] == 'LAT']
africa = df[df['Continent'] == 'AF']
```

In [9]:

```python
# Create the figure: p
p = figure(x_axis_label='fertility', y_axis_label='female literacy (% population)')

# Add a circle glyph to the figure p
p.circle(latinamerica['fertility'], latinamerica['female literacy'])

# Add an x glyph to the figure p
p.x(africa['fertility'], africa['female literacy'])

# Specify the name of the file
output_file('fert_lit_separate.html')

# Display the plot
show(p)
```


## Customizing our scatter plots

The three most important arguments for customizing scatter glyphs are **color**, **size**, and **alpha**. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 CSS color names. Size values are given in screen units (pixels), so `size=10` draws markers 10 pixels in diameter.

The alpha parameter controls transparency. It takes floating-point numbers between 0.0 (completely transparent) and 1.0 (completely opaque).
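The sketch below shows that the three color formats are interchangeable. It draws three markers in the same red, written once as a CSS name, once as a hex string and once as an RGB tuple. Each marker uses a different alpha, so the transparency scale is visible. The helper `rgb_to_hex` is hypothetical and not part of Bokeh. It only makes the correspondence between a tuple and its hex string explicit.

```python
from bokeh.plotting import figure

def rgb_to_hex(rgb):
    """Hypothetical helper: convert an (r, g, b) tuple of 0-255 ints to '#rrggbb'."""
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

p = figure(x_range=(0, 4), y_range=(0, 2))

# The same red in three notations, from mostly transparent to fully opaque
styles = [('red', 0.2), ('#ff0000', 0.6), ((255, 0, 0), 1.0)]
for i, (color, alpha) in enumerate(styles, start=1):
    p.scatter([i], [1], marker='circle', color=color, size=30, alpha=alpha)

print(rgb_to_hex((255, 0, 0)))  # #ff0000
```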

In [10]:
```python
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(latinamerica['fertility'], latinamerica['female literacy'], color='blue', size=10, alpha=0.8)

# Add a red circle glyph to the figure p
p.circle(africa['fertility'], africa['female literacy'], color='red', size=10, alpha=0.8)

# Specify the name of the file
output_file('fert_lit_separate_colors.html')

# Display the plot
show(p)
```


## Lines

We can draw lines on Bokeh plots with the line() glyph function.
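`line()` joins the points in the order they are given. With `x_axis_type='datetime'`, the x values should be real datetime values rather than strings. The sketch below copies the first three rows of `aapl.csv` into an inline DataFrame. It then converts the `date` column with `pd.to_datetime` before plotting. The column names match the CSV, and the reduced row count is only for brevity.

```python
import pandas as pd
from bokeh.plotting import figure

# First rows of aapl.csv, with the dates still stored as plain strings
df = pd.DataFrame({
    'date': ['2000-03-01', '2000-03-02', '2000-03-03'],
    'close': [130.31, 122.00, 128.00],
})

# Convert the strings into datetime64 values that a datetime axis can place
df['date'] = pd.to_datetime(df['date'])

p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Consecutive points are joined by straight line segments
p.line(df['date'], df['close'])
```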

In [20]:
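```python
# Parse the date strings so they can be placed on a datetime axis
df = pd.read_csv('aapl.csv', parse_dates=['date'])
```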

In [21]:
```python
df
```

| | Unnamed: 0 | adj_close | close | date | high | low | open | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 31.68 | 130.31 | 2000-03-01 | 132.06 | 118.50 | 118.56 | 38478000 |
| 1 | 1 | 29.66 | 122.00 | 2000-03-02 | 127.94 | 120.69 | 127.00 | 11136800 |
| 2 | 2 | 31.12 | 128.00 | 2000-03-03 | 128.23 | 120.00 | 124.87 | 11565200 |
| 3 | 3 | 30.56 | 125.69 | 2000-03-06 | 129.13 | 125.00 | 126.00 | 7520000 |
| 4 | 4 | 29.87 | 122.87 | 2000-03-07 | 127.44 | 121.12 | 126.44 | 9767600 |
| 5 | 5 | 29.66 | 122.00 | 2000-03-08 | 123.94 | 118.56 | 122.87 | 9690800 |
| 6 | 6 | 29.72 | 122.25 | 2000-03-09 | 125.00 | 118.25 | 120.87 | 9884400 |
| 7 | 7 | 30.57 | 125.75 | 2000-03-10 | 127.94 | 121.00 | 121.69 | 8900800 |
| 8 | 8 | 29.50 | 121.31 | 2000-03-13 | 126.50 | 119.50 | 122.12 | 10864400 |
| 9 | 9 | 27.78 | 114.25 | 2000-03-14 | 124.25 | 114.00 | 121.22 | 15321200 |


In [23]:
```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(df['date'], df['close'])

# Specify the name of the output file and show the result
output_file('line.html')
show(p)
```


## Lines and Markers

In [25]:
```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Create a figure with x_axis_type='datetime': p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis
p.line(df['date'], df['close'])

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(df['date'], df['close'], fill_color='white', size=4)

# Specify the name of the output file and show the result
output_file('line.html')
show(p)
```

## Patches

In Bokeh, extended geometrical shapes can be plotted by using the patches() glyph function. The patches glyph takes as input a list-of-lists collection of numeric values specifying the vertices in x and y directions of each distinct patch to plot.
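The list-of-lists layout is easiest to see with two shapes. In the sketch below, the vertices of two unit squares are hypothetical coordinates chosen only for illustration. Each inner list holds one patch's vertices, and a per-patch `fill_color` list gives each square its own color.

```python
from bokeh.plotting import figure

# Hypothetical vertices for two unit squares: one inner list per patch
xs = [[0, 1, 1, 0], [2, 3, 3, 2]]
ys = [[0, 0, 1, 1], [0, 0, 1, 1]]

p = figure()

# One fill color per patch, not per vertex
renderer = p.patches(xs, ys, fill_color=['orange', 'purple'], line_color='white')

# The data source holds one row per patch
print(len(renderer.data_source.data['xs']))  # 2
```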

In [26]:
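```python
# patches() expects a list of lists: one inner list of vertex coordinates per patch
# (the Four Corners version would use x = [az_lons, co_lons, nm_lons, ut_lons])
x = [[2, 3, 4, 1]]

# y coordinates, again one inner list per patch
y = [[1, 2, 3, 4]]

# Start a new figure so the datetime axis from the previous plot is not reused
p = figure()

# Add patches to figure p with line_color='white' for x and y
p.patches(x, y, line_color='white')

# Specify the name of the output file and show the result
output_file('four_corners.html')
show(p)
```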

## Plotting data from NumPy arrays

We'll generate NumPy arrays using np.linspace() and np.cos() and plot them using the circle glyph.

np.linspace() is a function that returns an array of evenly spaced numbers over a specified interval. For example, np.linspace(0, 10, 5) returns an array of 5 evenly spaced samples calculated over the interval [0, 10]. 

np.cos(x) calculates the element-wise cosine of some array x.
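As a quick check of the two functions described above, the sketch below computes the `np.linspace(0, 10, 5)` example from the text. The spacing is (10 - 0) / (5 - 1) = 2.5, because both endpoints are included. It then applies `np.cos` element-wise to two angles.

```python
import numpy as np

# 5 evenly spaced samples over [0, 10], both endpoints included
samples = np.linspace(0, 10, 5)
print(samples.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]

# Neighbouring samples are (10 - 0) / (5 - 1) = 2.5 apart
print(np.diff(samples).tolist())  # [2.5, 2.5, 2.5, 2.5]

# np.cos works element by element: cos(0) = 1, cos(pi) = -1
print(np.cos(np.array([0.0, np.pi])).tolist())  # [1.0, -1.0]
```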

In [27]:
```python
# Import numpy as np
import numpy as np

# Create array using np.linspace: x
x = np.linspace(0, 5, 100)

# Create array using np.cos: y
y = np.cos(x)

# Create a fresh figure so the patch from the previous cell is not redrawn
p = figure()

# Add circles at x and y
p.circle(x, y)

# Specify the name of the output file and show the result
output_file('numpy.html')
show(p)
```

## Plotting data from Pandas DataFrames

In [29]:
```python
# Import pandas as pd
import pandas as pd

# Read in the CSV file: df
df = pd.read_csv('auto-mpg.csv')
```

In [30]:
```python
df
```

| | mpg | cyl | displ | hp | weight | accel | yr | origin | name | color | size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 6 | 250.0 | 88 | 3139 | 14.5 | 71 | US | ford mustang | blue | 15.0 |
| 1 | 9.0 | 8 | 304.0 | 193 | 4732 | 18.5 | 70 | US | hi 1200d | blue | 20.0 |
| 2 | 36.1 | 4 | 91.0 | 60 | 1800 | 16.4 | 78 | Asia | honda civic cvcc | red | 10.0 |
| 3 | 18.5 | 6 | 250.0 | 98 | 3525 | 19.0 | 77 | US | ford granada | blue | 15.0 |
| 4 | 34.3 | 4 | 97.0 | 78 | 2188 | 15.8 | 80 | Europe | audi 4000 | green | 10.0 |
| 5 | 32.9 | 4 | 119.0 | 100 | 2615 | 14.8 | 81 | Asia | datsun 200sx | red | 10.0 |
| 6 | 32.2 | 4 | 108.0 | 75 | 2265 | 15.2 | 80 | Asia | toyota corolla | red | 10.0 |
| 7 | 22.0 | 4 | 121.0 | 76 | 2511 | 18.0 | 72 | Europe | volkswagen 411 (sw) | green | 10.0 |
| 8 | 15.0 | 8 | 302.0 | 130 | 4295 | 14.9 | 77 | US | mercury cougar brougham | blue | 20.0 |
| 9 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | US | ford torino | blue | 20.0 |


In [31]:
```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Create the figure: p
p = figure(x_axis_label='HP', y_axis_label='MPG')

# Plot mpg vs hp by color
p.circle(df['hp'], df['mpg'], color=df['color'], size=10)

# Specify the name of the output file and show the result
output_file('auto-df.html')
show(p)
```


## The Bokeh ColumnDataSource

The ColumnDataSource is a table-like data object that maps string column names to sequences (columns) of data. It is the central and most common data structure in Bokeh. We can create a ColumnDataSource object directly from a Pandas DataFrame by passing the DataFrame to the class initializer.
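The sketch below builds a ColumnDataSource from two rows copied from `sprint.csv`. It shows that each DataFrame column becomes a named column of the source. When the DataFrame's index has no name, Bokeh also adds that index as a column called `index`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Two rows copied from sprint.csv
df = pd.DataFrame({
    'Name': ['Usain Bolt', 'Yohan Blake'],
    'Time': [9.63, 9.75],
    'Year': [2012, 2012],
})

source = ColumnDataSource(df)

# DataFrame columns plus the (unnamed) index, which Bokeh stores as 'index'
print(source.column_names)
```

Glyph methods can then refer to these columns by name, for example `x='Year'`, together with `source=source`.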

In [32]:
```python
# Import the ColumnDataSource class from bokeh.models
from bokeh.models import ColumnDataSource

df = pd.read_csv('sprint.csv')

# Create a ColumnDataSource: source
source = ColumnDataSource(df)

# Create a new figure for the sprint data: p
p = figure(x_axis_label='Year', y_axis_label='Time')

# Add circle glyphs to the figure p, reading x, y and color from the named columns
p.circle(x='Year', y='Time', color='color', size=8, source=source)

# Specify the name of the output file and show the result
output_file('sprint.html')
show(p)
```


In [33]:
```python
source
```

In [34]:
```python
df
```

| | Name | Country | Medal | Time | Year | color |
|---|---|---|---|---|---|---|
| 0 | Usain Bolt | JAM | GOLD | 9.63 | 2012 | goldenrod |
| 1 | Yohan Blake | JAM | SILVER | 9.75 | 2012 | silver |
| 2 | Justin Gatlin | USA | BRONZE | 9.79 | 2012 | saddlebrown |
| 3 | Usain Bolt | JAM | GOLD | 9.69 | 2008 | goldenrod |
| 4 | Richard Thompson | TRI | SILVER | 9.89 | 2008 | silver |
| 5 | Walter Dix | USA | BRONZE | 9.91 | 2008 | saddlebrown |
| 6 | Justin Gatlin | USA | GOLD | 9.85 | 2004 | goldenrod |
| 7 | Francis Obikwelu | POR | SILVER | 9.86 | 2004 | silver |
| 8 | Maurice Greene | USA | BRONZE | 9.87 | 2004 | saddlebrown |
| 9 | Maurice Greene | USA | GOLD | 9.87 | 2000 | goldenrod |


## Selection and non-selection glyphs

We're going to add the box_select tool to a figure and change the selected and non-selected circle glyph properties so that selected glyphs are red and non-selected glyphs are transparent blue.

We'll reuse the ColumnDataSource built from the Olympic sprint data.
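The box_select tool records the selected rows on the data source, in `source.selected.indices`. That property can also be set from Python. In that case the selected styling should already apply when the page first renders. This is a sketch of that idea, not part of the original exercise. It uses the three gold-medal rows for 2012, 2008 and 2004 from the table above, and `scatter()` in place of `circle()`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Gold-medal rows copied from sprint.csv
df = pd.DataFrame({'Year': [2012, 2008, 2004], 'Time': [9.63, 9.69, 9.85]})
source = ColumnDataSource(df)

p = figure(x_axis_label='Year', y_axis_label='Time', tools='box_select')
p.scatter('Year', 'Time', source=source, size=10,
          selection_color='red', nonselection_alpha=0.1)

# Selections live on the data source as row positions
source.selected.indices = [0, 2]
print(source.selected.indices)
```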

In [35]:
```python
# Create a figure with the "box_select" tool: p
p = figure(x_axis_label='Year', y_axis_label='Time', tools='box_select')

# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle(x='Year', y='Time', selection_color='red', nonselection_alpha=0.1, source=source)

# Specify the name of the output file and show the result
output_file('selection_glyph.html')
show(p)
```


## Hover glyphs


We're going to plot the blood glucose levels for an unknown patient. The blood glucose levels were recorded every 5 minutes on October 7th starting at 3 minutes past midnight.

The date and time of each measurement are stored in the datetime column, and the blood glucose levels (in mg/dL) are stored in the glucose column.

Our job is to add a circle glyph that turns red when the mouse hovers near the data points. We will also add a customized hover tool object to the plot.
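A customized hover tool can also display tooltips. The sketch below is a variant of the notebook's tooltip-free hover tool. It uses the first three readings from `glucose.csv`. Tooltip fields starting with `@` read values from the hovered row of a ColumnDataSource. The `formatters` entry tells Bokeh to render the `datetime` field with the `%H:%M` time format.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# First three readings from glucose.csv
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2010-10-07 00:03:00', '2010-10-07 00:08:00',
                                '2010-10-07 00:13:00']),
    'glucose': [150, 152, 149],
})
source = ColumnDataSource(df)

p = figure(x_axis_type='datetime', x_axis_label='Time',
           y_axis_label='Blood glucose (mg/dL)')
p.scatter('datetime', 'glucose', source=source, size=10,
          hover_fill_color='firebrick')

# '@column' pulls values from the hovered row; the formatter renders datetime as HH:MM
hover = HoverTool(tooltips=[('time', '@datetime{%H:%M}'), ('glucose', '@glucose mg/dL')],
                  formatters={'@datetime': 'datetime'}, mode='vline')
p.add_tools(hover)
```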

In [36]:
```python
import pandas as pd
df = pd.read_csv('glucose.csv')
```

In [37]:
```python
df
```

| | datetime | isig | glucose |
|---|---|---|---|
| 0 | 2010-10-07 00:03:00 | 22.10 | 150 |
| 1 | 2010-10-07 00:08:00 | 21.46 | 152 |
| 2 | 2010-10-07 00:13:00 | 21.06 | 149 |
| 3 | 2010-10-07 00:18:00 | 20.96 | 147 |
| 4 | 2010-10-07 00:23:00 | 21.52 | 148 |
| 5 | 2010-10-07 00:28:00 | 22.04 | 150 |
| 6 | 2010-10-07 00:33:00 | 21.94 | 152 |
| 7 | 2010-10-07 00:38:00 | 21.82 | 152 |
| 8 | 2010-10-07 00:43:00 | 21.70 | 152 |
| 9 | 2010-10-07 00:48:00 | 21.64 | 151 |


In [40]:
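```python
# Import the HoverTool
from bokeh.models import HoverTool

# Parse the timestamps so they can go on a datetime axis
df['datetime'] = pd.to_datetime(df['datetime'])

# Create a new figure with a datetime x-axis: p
p = figure(x_axis_type='datetime', x_axis_label='Time', y_axis_label='Blood glucose (mg/dL)')

# Add circle glyphs to figure p; the hover_* properties apply while the mouse is near a point
p.circle(df['datetime'], df['glucose'], size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white')

# Create a HoverTool without tooltips that inspects along a vertical line: hover
hover = HoverTool(tooltips=None, mode='vline')

# Add the hover tool to the figure p
p.add_tools(hover)

# Specify the name of the output file and show the result
output_file('hover_glyph.html')
show(p)
```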


## Colormapping

The final glyph customization we'll be using is the CategoricalColorMapper to color each glyph by a categorical property.

Here, we're going to use the automobile dataset to plot miles-per-gallon vs weight and color each circle glyph by the region where the automobile was manufactured.

The origin column will be passed through the CategoricalColorMapper to color automobiles manufactured in the US blue, in Europe red and in Asia green.
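A CategoricalColorMapper pairs `factors` and `palette` by position: the first factor gets the first color, and so on. The sketch below makes that pairing visible with a plain dictionary. It also shows `bokeh.transform.factor_cmap`, which builds a field-plus-transform color spec from the same lists. Using `factor_cmap` is a shortcut suggested here, not the notebook's approach.

```python
from bokeh.models import CategoricalColorMapper
from bokeh.transform import factor_cmap

factors = ['Europe', 'Asia', 'US']
palette = ['red', 'green', 'blue']

# The mapper matches factors[i] with palette[i]
color_mapper = CategoricalColorMapper(factors=factors, palette=palette)
mapping = dict(zip(color_mapper.factors, color_mapper.palette))
print(mapping)  # {'Europe': 'red', 'Asia': 'green', 'US': 'blue'}

# Equivalent shortcut: a color spec that maps the 'origin' column through the same factors
color_spec = factor_cmap('origin', palette=palette, factors=factors)
```

In a plot call, `color=color_spec` would take the place of the `dict(field='origin', transform=color_mapper)` argument.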

In [42]:
```python
# Import pandas as pd
import pandas as pd

# Import figure and the Bokeh models used below
from bokeh.plotting import figure
from bokeh.models import CategoricalColorMapper, ColumnDataSource

# Read in the CSV file: df
df = pd.read_csv('auto-mpg.csv')

# Convert df to a ColumnDataSource: source
source = ColumnDataSource(df)

# Make a CategoricalColorMapper object: color_mapper
color_mapper = CategoricalColorMapper(factors=['Europe', 'Asia', 'US'],
                                      palette=['red', 'green', 'blue'])

# Create a new figure for mpg vs weight: p
p = figure(x_axis_label='weight', y_axis_label='mpg')

# Add a circle glyph colored by origin; legend_field replaces the older legend= keyword
p.circle('weight', 'mpg', source=source,
         color=dict(field='origin', transform=color_mapper),
         legend_field='origin')

# Specify the name of the output file and show the result
output_file('colormap.html')
show(p)
```


## Time Series Data 

In [35]:
```python
import pandas as pd
df = pd.read_csv('DST(Time-Series Format).csv')

# Parse the timestamps and use them as the index so rows can be selected by date
df['Datetime'] = pd.to_datetime(df['Datetime'])
df.index = df['Datetime']
```

In [36]:
```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure
# Import output_file and show from bokeh.io
from bokeh.io import output_file, show

# Select the 2017 rows with partial string indexing on the DatetimeIndex
df_2017 = df.loc['2017']

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='DST Index')

# Plot the date along the x axis and the DST index along the y axis
p.line(df_2017.index, df_2017['DST Index'])

# Specify the name of the output file and show the result
output_file('line_DST.html')
show(p)
```

In [31]:
```python
df.loc['2017']
```

| Datetime (index) | Datetime | DST Index |
|---|---|---|
| 2017-01-01 00:00:00 | 2017-01-01 00:00:00 | -12.0 |
| 2017-01-01 01:00:00 | 2017-01-01 01:00:00 | -13.0 |
| 2017-01-01 02:00:00 | 2017-01-01 02:00:00 | -16.0 |
| 2017-01-01 03:00:00 | 2017-01-01 03:00:00 | -23.0 |
| 2017-01-01 04:00:00 | 2017-01-01 04:00:00 | -29.0 |
| 2017-01-01 05:00:00 | 2017-01-01 05:00:00 | -30.0 |
| 2017-01-01 06:00:00 | 2017-01-01 06:00:00 | -28.0 |
| 2017-01-01 07:00:00 | 2017-01-01 07:00:00 | -20.0 |
| 2017-01-01 08:00:00 | 2017-01-01 08:00:00 | -18.0 |
| 2017-01-01 09:00:00 | 2017-01-01 09:00:00 | -16.0 |
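Passing a year string to `.loc` on a DatetimeIndex selects every row whose timestamp falls in that year. In recent pandas versions, plain `df['2017']` is treated as a column lookup instead. The sketch below demonstrates the `.loc` form on four hourly readings. The two 2017 values are copied from the table above. The two 2016 values are hypothetical placeholders that straddle the year boundary.

```python
import pandas as pd

# Hourly readings around New Year; the 2016 values are hypothetical placeholders
idx = pd.to_datetime(['2016-12-31 22:00', '2016-12-31 23:00',
                      '2017-01-01 00:00', '2017-01-01 01:00'])
df = pd.DataFrame({'DST Index': [-10.0, -11.0, -12.0, -13.0]}, index=idx)

# Partial string indexing: all rows whose timestamp lies in 2017
df_2017 = df.loc['2017']
print(len(df_2017))  # 2
print(df_2017['DST Index'].tolist())  # [-12.0, -13.0]
```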
