# Data Visualization with Bokeh

*Bokeh* is a **data visualization** library for Python. Unlike Matplotlib and Seaborn, which are also Python packages for data visualization, Bokeh renders its plots using **HTML and JavaScript**. This makes it well suited to building web-based dashboards.

It has the following features:

1. **Flexibility:** Handles common plotting requirements as well as custom and complex use cases.

2. **Productivity:** Works easily with other popular PyData tools such as Pandas and Jupyter Notebook.

3. **Interactivity:** Plots are not static; they change when the user interacts with them.

4. **Powerful:** By adding custom JavaScript, it is possible to generate visualizations for specialised use cases.

5. **Shareable:** Plots can be embedded in the output of Flask or Django web applications. They can also be rendered in Jupyter notebooks.

6. **Open source:** Bokeh is an open source project, distributed under the Berkeley Software Distribution (BSD) license.
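
As a rough illustration of the HTML and JavaScript rendering mentioned above, the sketch below uses `components()` from `bokeh.embed` to turn a figure into a script block and a placeholder div that a Flask or Django template could include. The figure title and data values are made up for the example.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# Build a small figure; the data values are invented for illustration.
fig = figure(title="Embedded plot")
fig.line([1, 2, 3], [4, 6, 5], line_width=2)

# components() returns two strings: a <script> block holding the plot
# data and a <div> placeholder where the plot will be drawn.
script, div = components(fig)
```

The page that receives these two strings also has to load the BokehJS resources for the plot to appear.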

### Installation

In [1]:
import pandas as pd

In [2]:
#conda install bokeh
import bokeh
bokeh.__version__

'2.3.3'

In [3]:
#Download bokeh sample data
import bokeh.sampledata
bokeh.sampledata.download()

Using data directory: C:\Users\Dell\.bokeh\data
Skipping 'CGM.csv' (checksum match)
Skipping 'US_Counties.zip' (checksum match)
Skipping 'us_cities.json' (checksum match)
Skipping 'unemployment09.csv' (checksum match)
Skipping 'AAPL.csv' (checksum match)
Skipping 'FB.csv' (checksum match)
Skipping 'GOOG.csv' (checksum match)
Skipping 'IBM.csv' (checksum match)
Skipping 'MSFT.csv' (checksum match)
Skipping 'WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip' (checksum match)
Skipping 'gapminder_fertility.csv' (checksum match)
Skipping 'gapminder_population.csv' (checksum match)
Skipping 'gapminder_life_expectancy.csv' (checksum match)
Skipping 'gapminder_regions.csv' (checksum match)
Skipping 'world_cities.zip' (checksum match)
Skipping 'airports.json' (checksum match)
Skipping 'movies.db.zip' (checksum match)
Skipping 'airports.csv' (checksum match)
Skipping 'routes.csv' (checksum match)
Skipping 'haarcascade_frontalface_default.xml' (checksum match)


## Line Chart

In [4]:
from bokeh.plotting import figure, show, output_notebook

In [24]:
x = [1,3,4,7,8,9]
y = [5,8,3,6,7,1]

In [25]:
# create a new plot with a title and axis labels
fig = figure(title="Line chart", x_axis_label='x',y_axis_label = 'y')

# add a line with legend and line thickness
fig.line(x,y,line_color='red',legend_label="Temp",line_width=4)
output_notebook()         

#show the figure
show(fig)

In [7]:
import numpy as np
import math
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
output_notebook()
p = figure(title = "sine wave example", x_axis_label = 'x', y_axis_label = 'y')
p.line(x, y, legend_label = "sine", line_width = 2)
show(p)

### Combining multiple graphs

In [8]:
# Data
x = [1,3,4,6,7,9,10,12,15]
y1 =[1,3,4,7,9,6,4,6,11]
y2 =[2,5,6,4,7,1,5,9,3]
y3 =[3,4,5,6,7,9,10,14,15]
# Create new plot with title and axis labels
output_notebook()
fig = figure(title="Multiple line", x_axis_label='X', y_axis_label='Y')

# add multiple lines
fig.line(x,y1, legend_label="Distance", line_color="red", line_width=2)
fig.line(x,y2, legend_label="Temperature", line_color="blue", line_width=3)
fig.line(x,y3, legend_label="speed", line_color="green", line_width=5)

#show the fig
show(fig)

### Rendering circles

In [9]:
# Data
x = [1,3,4,6,7,9,10,12,15]
y1 =[1,3,4,7,9,6,4,6,11]
y2 =[2,5,6,4,7,1,5,9,3]
y3 =[3,4,5,6,7,9,10,14,15]

output_notebook()
fig = figure(title="Multiple line", x_axis_label='X', y_axis_label='Y')

fig.line(x,y1, legend_label="Distance", line_color="green", line_width=2)
fig.line(x,y2, legend_label="Temperature", line_color="blue", line_width=3)
fig.circle(x,y3, legend_label='Objects',color='red', line_color='yellow',line_width=3,size=12)
show(fig)

## Bar Plot

In [10]:
output_notebook()
fig = figure(title = 'Bar Plot', x_axis_label = "X", y_axis_label = "Y",plot_width=750,plot_height =350)

fig.vbar(x, width = 0.5,top=y1, bottom=0, color = 'red' )

show(fig)

In [11]:
output_notebook()
fig = figure(title = 'Horizontal Bar Plot',x_axis_label='X',y_axis_label='Y', plot_width=1500,plot_height=750)

fig.hbar(x, height = 0.5,right=y3, left=0, color='blue')

show(fig)

#### Combining multiple graphs

In [12]:
x = [1,2,3,4,5,6,7,8,9,10]
y = [4,7,6,10,8,15,13,20,19,30]

output_notebook()
fig = figure(title = 'Combining Multiple Graphs', x_axis_label = "X", y_axis_label = "Y",plot_width=700,plot_height =400)

fig.vbar(x,top=y,width=0.3,bottom=0,color='blue')
fig.circle(x,y, color = 'black',size=12)
fig.line(x,y, color = 'red' , line_width=5)

show(fig)

In [13]:
x = [1, 2, 3, 4, 5]
y = [4, 5, 5, 7, 2]

output_notebook()
fig = figure(plot_width=700,plot_height =400)

fig.circle(x,y, color='pink',line_color='black',line_width=3,size=80)

show(fig)

### Patch Plot
A patch plot shades a region of the plot in a single color to mark an area or a group with similar properties.

`patch()` adds a single patch glyph to the figure; its vertices are given by the **x and y** parameters.

In [14]:
fig = figure(plot_width=200, plot_height = 200)
fig.patch(x=[1,5,2,4],y=[2,3,5,7],color='cyan')
#fig.circle(x=[1,5,2,4],y=[2,3,5,7],color='black')
output_notebook()
show(fig)

`patches()` draws multiple polygonal patches at once; **xs and ys** are lists of coordinate lists, one per polygon.

In [15]:
xs = [[5,3,4], [2,4,3], [2,3,5,4]]
ys = [[6,4,2], [3,6,7], [2,4,7,8]]
fig = figure()
fig.patches(xs,ys, fill_color=['pink','green','blue'], line_color='white')
output_notebook()
show(fig)

### Scatter Markers

Scatter plots show the relationship between two variables. A scatter plot is drawn by calling the `scatter()` method of a Figure object, which takes the parameters **x, y, size, marker and color.**

The following marker types are defined in Bokeh:

    Asterisk
    Circle
    CircleCross
    CircleX
    Cross
    Dash
    Diamond
    DiamondCross
    Hex
    InvertedTriangle
    Square
    SquareCross
    SquareX
    Triangle
    X
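
When passed to `scatter()` through the `marker` argument, marker types are written as lowercase, underscore-separated strings. The sketch below draws one point per marker type for a subset of the names listed above; the coordinates and color are arbitrary choices for the example.

```python
from bokeh.plotting import figure

# Lowercase marker names accepted by the marker argument of scatter().
markers = ["asterisk", "circle", "cross", "diamond", "hex",
           "inverted_triangle", "square", "triangle", "x"]

fig = figure(title="Marker types")

# Place one marker per type along a horizontal line.
for i, name in enumerate(markers):
    fig.scatter(x=[i], y=[1], marker=name, size=15, color="navy")
```

Calling `show(fig)` afterwards renders the row of markers.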

In [16]:
fig = figure(title="Scatter Markers Graph")
fig.scatter([1, 4, 3, 2, 5], [6, 5, 2, 4, 7], marker = "hex", size = 20, fill_color = "blue")
output_notebook()
show(fig)

The `varea()` method draws a vertical directed area: it takes one x coordinate array and two y coordinate arrays, y1 and y2, and fills the region between them.

In [17]:
fig = figure(title="Vertical Directed Area Graph")
x = [1, 2, 3, 4, 5]
y1 = [2, 6, 4, 3, 5]
y2 = [1, 4, 2, 2, 3]
fig.varea(x = x,y1 = y1,y2 = y2)
output_notebook()
show(fig)

The `harea()` method, on the other hand, takes x1, x2 and y parameters and fills the region between x1 and x2.

In [18]:
fig = figure()
y = [1, 2, 3, 4, 5]
x1 = [2, 6, 4, 3, 5]
x2 = [1, 4, 2, 2, 3]
fig.harea(x1 = x1,x2 = x2,y = y)
output_notebook()
show(fig)

The `circle()` method adds a circle glyph to the figure and needs the x and y coordinates of its center. It can be configured with parameters such as fill_color, line_color and line_width.

The `circle_cross()` method adds a circle glyph with a '+' cross through the center.

The `circle_x()` method adds a circle glyph with an 'X' cross through the center.

In [19]:
plot = figure(plot_width = 300, plot_height = 300)
plot.circle(x = [1, 2, 3], y = [3,7,5], size = 20, fill_color = 'red')
plot.circle_cross(x = [2,4,6], y = [5,8,9], size = 20, fill_color = 'blue',fill_alpha = 0.2, line_width = 2)
plot.circle_x(x = [5,7,2], y = [2,4,9], size = 20, fill_color = 'green',fill_alpha = 0.6, line_width = 2)
show(plot)

The `rect()` method of the Figure class adds a rectangle glyph based on the x and y coordinates of its center, its width and its height. The `square()` method uses a size parameter to set its dimensions.

The `ellipse()` and `oval()` methods add ellipse and oval glyphs. Their signatures are similar to that of `rect()`, with x, y, width and height parameters. The angle parameter sets the rotation from the horizontal.

In [20]:
fig = figure(plot_width = 300, plot_height = 300)
fig.ellipse(x = 7,y = 6, width = 30, height = 10, fill_color = 'red' , line_width = 2)
fig.oval(x = 6,y = 6,width = 2, height = 1, angle = -0.4, fill_color = 'pink')
fig.rect(x = 10,y = 10,width = 100, height = 50, width_units = 'screen', height_units = 'screen')
fig.square(x = 2,y = 3,size = 80, color = 'cyan')
show(fig)



The `arc()` method draws a simple line arc based on x and y coordinates, start and end angles, and a radius. Angles are given in radians, whereas the radius may be in screen units or data units. A wedge is a filled arc.

The `wedge()` method has the same properties as the `arc()` method. Both methods accept an optional direction property, either "clock" or "anticlock", which sets the direction in which the arc or wedge is drawn. The `annular_wedge()` method renders a filled area between two arcs of inner and outer radius.

In [21]:
import math
fig = figure(plot_width = 300, plot_height = 300)
fig.arc(x = 3, y = 3, radius = 50, radius_units = 'screen', start_angle = 0.0, end_angle = math.pi/2,color='black')
fig.wedge(x = 3, y = 3, radius = 30, radius_units = 'screen',
start_angle = 0, end_angle = math.pi, direction = 'clock', color="pink")
fig.annular_wedge(x = 3,y = 3, inner_radius = 75, outer_radius = 100,outer_radius_units = 'screen',
inner_radius_units = 'screen',start_angle = 0.4, end_angle = 4.5,color = "red", alpha = 0.6)
show(fig)

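`bezier()` adds a cubic Bézier curve, a kind of parametric curve, to the figure object. Bézier curves are also used in computer font design, animation, user interface design and cursor trajectory smoothing. The method takes the start point (x0, y0), the end point (x1, y1) and two control points (cx0, cy0) and (cx1, cy1).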

In [22]:
x = 2
y = 4
xp02 = x+0.4
xp01 = x+0.1
xm01 = x-0.1
yp01 = y+0.2
ym01 = y-0.2
fig = figure(title="Bezier Graph",plot_width = 300, plot_height = 300)
fig.bezier(x0 = x, y0 = y, x1 = xp02, y1 = y, cx0 = xp01, cy0 = yp01,
cx1 = xm01, cy1 = ym01, line_color = "red", line_width = 2)
show(fig)
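
To make the role of the control points concrete, the sketch below evaluates the standard cubic Bézier formula in plain Python, using the same start, end and control points as the cell above. The helper name `cubic_bezier` is made up for this illustration and is not part of Bokeh.

```python
def cubic_bezier(t, p0, c0, c1, p1):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    p0 and p1 are the end points, c0 and c1 the control points.
    """
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * c0[0] + 3 * u * t**2 * c1[0] + t**3 * p1[0]
    y = u**3 * p0[1] + 3 * u**2 * t * c0[1] + 3 * u * t**2 * c1[1] + t**3 * p1[1]
    return (x, y)

# Same points as the bezier() call: start (x0, y0), end (x1, y1),
# control points (cx0, cy0) and (cx1, cy1).
start, end = (2.0, 4.0), (2.4, 4.0)
ctrl0, ctrl1 = (2.1, 4.2), (1.9, 3.8)

# Sample the curve at 11 evenly spaced parameter values.
points = [cubic_bezier(i / 10, start, ctrl0, ctrl1, end) for i in range(11)]
```

The curve starts at the first end point and finishes at the second; the control points pull it away from the straight line between them without lying on the curve themselves.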

`quadratic()` adds a parabola glyph. It takes the same start and end point parameters as `bezier()`, but a single control point (cx, cy) instead of the two control points (cx0, cy0) and (cx1, cy1).

In [34]:
graph = figure(title = "Bokeh Quadratic Graph",plot_width = 300, plot_height = 300) 
x0 = 1
y0 = 4
x1 = 5
y1 = 4
cx = 3
cy = 5 
graph.quadratic(x0, y0, 
                x1, y1, 
                cx, cy, line_color="blue", line_width=3)
output_notebook()
show(graph) 

### Categorical Axes
So far, the plots have shown numerical data along both the x and y axes. To use categorical data along either axis, we need to specify a FactorRange that lists the categories for that axis.

In the following example, a list of strings is used for the x-axis to mark the students' grades.

In [38]:
langs = ['A1', 'A2', 'B1', 'B2', 'C1']
students = [23,13,32,21,12]
fig = figure(x_range = langs, plot_width = 300, plot_height = 300, title="Categorical Axes")
fig.vbar(x = langs, top=students, width = 0.5, color = ['red','yellow','black','navy','orange'])
show(fig)
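
Passing a list of strings as `x_range` is a shorthand; the range can also be spelled out as a `FactorRange`. A minimal sketch with the same grade data follows; the variable name `grades` is chosen for this example.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

grades = ["A1", "A2", "B1", "B2", "C1"]
students = [23, 13, 32, 21, 12]

# Build the categorical range explicitly instead of passing a bare list.
fig = figure(x_range=FactorRange(factors=grades), title="Explicit FactorRange")
fig.vbar(x=grades, top=students, width=0.5)
```

The order of `factors` determines the order of the categories along the axis.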

To render vertically (or horizontally) stacked bars with the `vbar_stack()` or `hbar_stack()` method, we pass the list of fields to stack successively as the stackers argument and set source to a dict object containing the values for each field.

In the following example, sales is a dictionary holding the sales figures of three products over three months.

In [39]:
products = ['computer','mobile','printer']
months = ['October','November','December']
sales = {'products':products,
   'October':[10,40,5],
   'November':[8,0,10],
   'December':[25,60,22]}

In [42]:
pd.DataFrame(sales)

|   | products | October | November | December |
|---|----------|---------|----------|----------|
| 0 | computer | 10      | 8        | 25       |
| 1 | mobile   | 40      | 0        | 60       |
| 2 | printer  | 5       | 10       | 22       |


In [54]:
cols = ['red','green','blue']#,'navy', 'cyan']
fig = figure(x_range= products, plot_width =300, plot_height=300, title="Vertical Stacking in Categorical Axes")
fig.vbar_stack(months,x = 'products', source = sales, color=cols, width = 0.5)
output_notebook()
show(fig)
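
A stacked bar is easier to read when each layer is named. As a sketch, `vbar_stack()` also accepts a list for `legend_label`, one entry per stacked field; the data is the same sales dictionary as above.

```python
from bokeh.plotting import figure

products = ["computer", "mobile", "printer"]
months = ["October", "November", "December"]
sales = {"products": products,
         "October": [10, 40, 5],
         "November": [8, 0, 10],
         "December": [25, 60, 22]}

fig = figure(x_range=products, title="Stacked sales with legend")

# One renderer is created per stacked field; the lists passed for
# color and legend_label are matched to the fields by position.
renderers = fig.vbar_stack(months, x="products", source=sales,
                           color=["red", "green", "blue"], width=0.5,
                           legend_label=months)
```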

A grouped bar plot is obtained by specifying a visual displacement for the bars with the `dodge()` function from the bokeh.transform module.

The `dodge()` function applies a relative offset to each set of bars, which gives the visual impression of a group. In the following example, the `vbar()` glyphs for each month are separated by an offset of 0.25 within each product group.

In [56]:
from bokeh.transform import dodge
products = ['computer','mobile','printer']
months = ['Jan','Feb','Mar']
sales = {'products':products,
   'Jan':[10,40,5],
   'Feb':[8,45,10],
   'Mar':[25,60,22]}
fig = figure(x_range = products, plot_width = 300, plot_height = 300)
fig.vbar(x = dodge('products', -0.25, range = fig.x_range), top = 'Jan',
   width = 0.2,source = sales, color = "red")
fig.vbar(x = dodge('products', 0.0, range = fig.x_range), top = 'Feb',
   width = 0.2, source = sales,color = "green")
fig.vbar(x = dodge('products', 0.25, range = fig.x_range), top = 'Mar',
   width = 0.2,source = sales,color = "blue")
show(fig)
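
The three `vbar()` calls differ only in the offset, field and color, so they can be generated in a loop. The sketch below computes offsets centered on each category; the `step` value and the centering formula are choices made for this example.

```python
from bokeh.plotting import figure
from bokeh.transform import dodge

products = ["computer", "mobile", "printer"]
months = ["Jan", "Feb", "Mar"]
sales = {"products": products,
         "Jan": [10, 40, 5],
         "Feb": [8, 45, 10],
         "Mar": [25, 60, 22]}
colors = ["red", "green", "blue"]

# Spread the bars symmetrically around the category position.
step = 0.25
offsets = [(i - (len(months) - 1) / 2) * step for i in range(len(months))]

fig = figure(x_range=products, title="Grouped bars")
for month, offset, color in zip(months, offsets, colors):
    fig.vbar(x=dodge("products", offset, range=fig.x_range), top=month,
             width=0.2, source=sales, color=color, legend_label=month)
```

With three months and a step of 0.25, the offsets come out as -0.25, 0.0 and 0.25, matching the hand-written calls.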

### Log Scale Axes 	

1. If there exists a power law relationship between x and y data series, it is desirable to use log scales on both axes.

2. The `figure()` function of the bokeh.plotting API accepts x_axis_type and y_axis_type arguments; passing "log" for either of them makes that axis logarithmic.
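
For the power-law case described in the first point, both axis types can be set to "log" at once. A minimal sketch, assuming a made-up series where y is the square of x:

```python
from bokeh.plotting import figure

# A power-law relationship: y = x ** 2.
x = [1, 10, 100, 1000]
y = [v ** 2 for v in x]

# Log scales on both axes turn the power law into a straight line.
fig = figure(title="Power law on log-log axes",
             x_axis_type="log", y_axis_type="log")
fig.line(x, y, line_width=2)
```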

In [57]:
from bokeh.plotting import figure, output_notebook, show
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [10**i for i in x]
fig = figure(title = 'Linear scale example',plot_width = 400, plot_height = 400)
fig.line(x, y, line_width = 2)
show(fig)

In [58]:
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [10**i for i in x]
fig = figure(title = 'Log scale example',plot_width = 400, plot_height = 400, y_axis_type = "log")
fig.line(x, y, line_width = 2)
show(fig)

### Twin Axes
A figure can be given additional axes by setting its extra_x_ranges and extra_y_ranges properties. The following example displays a sine curve and a straight line in the same plot, each with its own y-axis range.

In [62]:
from numpy import pi, arange, sin, linspace
x = arange(-2*pi, 2*pi, 0.1)
y = sin(x)
y2 = linspace(0, 100, len(y))
from bokeh.plotting import output_file, figure, show
from bokeh.models import LinearAxis, Range1d
fig = figure(title='Twin Axis Example', y_range = (-1.1, 1.1))
fig.line(x, y, color = "red")
fig.extra_y_ranges = {"y2": Range1d(start = 0, end = 100)}
fig.add_layout(LinearAxis(y_range_name = "y2"), 'right')
fig.line(x, y2, color = "blue", y_range_name = "y2")
show(fig)

### Annotations
Annotations are pieces of explanatory text added to the diagram. A Bokeh plot can be annotated by setting the plot title and the labels for the x and y axes, as well as by inserting text labels anywhere in the plot area.
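
As a sketch of a text label placed inside the plot area, the example below adds a `Label` annotation from `bokeh.models` near the peak of a sine curve. The label text, position and offsets are arbitrary choices for the illustration.

```python
import math

from bokeh.models import Label
from bokeh.plotting import figure

# Sample one period of a sine wave.
xs = [i * 0.05 for i in range(126)]
ys = [math.sin(v) for v in xs]

fig = figure(title="sine wave with a text label",
             x_axis_label="angle", y_axis_label="sin")
fig.line(xs, ys, line_width=2)

# Position the label in data coordinates, nudged a few pixels
# away from the point it describes.
peak = Label(x=math.pi / 2, y=1.0, text="peak", x_offset=5, y_offset=5)
fig.add_layout(peak)
```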


In [64]:
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
fig = figure(title = "sine wave example", x_axis_label = 'angle', y_axis_label = 'sin')
fig.line(x, y,line_width = 2)
show(fig)

In [70]:
fig.title.text = "sine wave example"
fig.xaxis.axis_label = 'angle'
fig.yaxis.axis_label = 'sin'
fig.title.align = "center"
fig.title.text_color = "red"
fig.title.text_font_size = "25px"
fig.title.background_fill_color = "pink"
show(fig)

### Legends
The `legend_label` argument of any glyph method adds a legend entry for that glyph.

In [72]:
x = np.arange(0, math.pi*2, 0.05)
fig = figure()
fig.line(x, np.sin(x),line_width = 2, line_color = 'navy', legend_label = 'sine')
fig.circle(x,np.cos(x), line_width = 2, line_color = 'orange', legend_label = 'cosine')
fig.square(x,-np.sin(x),line_width = 2, line_color = 'grey', legend_label = '-sine')
show(fig)



### ColumnDataSource
A `ColumnDataSource` can be considered as a mapping between column names and lists of data. A Python dict object with one or more string keys and lists or numpy arrays as values is passed to the ColumnDataSource constructor.

In [74]:
from bokeh.models import ColumnDataSource
data = {'x':[1, 4, 3, 2, 5],
   'y':[6, 5, 2, 4, 7]}
cds = ColumnDataSource(data = data)
fig = figure()
fig.scatter(x = 'x', y = 'y',source = cds, marker = "circle", size = 20, fill_color = "lime")
show(fig)
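
A `ColumnDataSource` can also be built directly from a pandas DataFrame, in which case each DataFrame column becomes a column of the source. A minimal sketch with the same data as above:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({"x": [1, 4, 3, 2, 5],
                   "y": [6, 5, 2, 4, 7]})

# Each DataFrame column becomes a source column; the unnamed
# default index is stored under the name "index".
source = ColumnDataSource(df)

fig = figure()
fig.scatter(x="x", y="y", source=source, size=20)
```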

### Filtering Data
In the following example, a line glyph is drawn through the x and y data series of the ColumnDataSource. A view object is obtained by applying an `IndexFilter` to the source, and the view is used to draw circle glyphs only at the filtered indices.

In [80]:
from bokeh.models import ColumnDataSource, CDSView, IndexFilter
from bokeh.plotting import figure, output_notebook, show
source = ColumnDataSource(data = dict(x = list(range(1,11)), y = list(range(2,22,2))))
view = CDSView(source=source, filters = [IndexFilter([0, 2, 4,6])])
fig = figure(title = 'Line Plot example', x_axis_label = 'x', y_axis_label = 'y')
fig.circle(x = "x", y = "y", size = 10, source = source, view = view, legend_label = 'filtered')
fig.line(source.data['x'],source.data['y'], legend_label = 'unfiltered')
show(fig)
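
The same row selection can be expressed either as a list of positions (`IndexFilter`) or as a boolean mask (`BooleanFilter`). The sketch below builds both filters for the rows used above; it only constructs the filter objects and does not create a view or a plot.

```python
from bokeh.models import BooleanFilter, ColumnDataSource, IndexFilter

source = ColumnDataSource(data=dict(x=list(range(1, 11)),
                                    y=list(range(2, 22, 2))))

# Select rows by position.
indices = [0, 2, 4, 6]
index_filter = IndexFilter(indices=indices)

# The equivalent boolean mask has one entry per row of the source.
mask = [i in indices for i in range(len(source.data["x"]))]
boolean_filter = BooleanFilter(booleans=mask)
```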



A `CDSView` object can also be obtained by applying a `BooleanFilter` to the given data source. CDSView filters are not compatible with glyphs that have connected topology, such as Line or Patch, so the filtered rows are drawn as circles.

In [81]:
from bokeh.models import ColumnDataSource, CDSView, BooleanFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.unemployment1948 import data
source = ColumnDataSource(data)
# keep only the rows from 1980 onwards
booleans = [int(year) >= 1980 for year in source.data['Year']]
print(booleans)
view1 = CDSView(source = source, filters=[BooleanFilter(booleans)])
p = figure(title = "Unemployment data", x_range = (1980,2020), x_axis_label = 'Year', y_axis_label='Percentage')
# circle markers are used because CDSView filters cannot be applied to line glyphs
p.circle(x = 'Year', y = 'Annual', source = source, view = view1, color = 'red', size = 6)
show(p)

[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]


### Layouts
The `column()` function is defined in the bokeh.layouts module. A column layout displays figures vertically, one below the other.

In [97]:
from bokeh.layouts import column

x = np.arange(0, math.pi*2, 0.05)
y1 = np.sin(x)
y2 = np.cos(x)

fig1 = figure(plot_width = 200, plot_height = 200)
fig1.line(x, y1,line_width = 2, line_color = 'blue')

fig2 = figure(plot_width = 200, plot_height = 200)
fig2.line(x, y2,line_width = 2, line_color = 'red')

c = column(children = [fig1, fig2], sizing_mode = 'stretch_both')
show(c)

1. children: a list of plots and/or widgets.

2. sizing_mode: determines how items in the layout resize. Possible values are "fixed", "stretch_width", "stretch_height", "stretch_both", "scale_width", "scale_height" and "scale_both". Default is "fixed".

A row layout arranges plots horizontally, using the `row()` function defined in the `bokeh.layouts` module. It takes the same `children` and `sizing_mode` arguments.

In [99]:
from bokeh.layouts import row
x = np.arange(0, math.pi*2, 0.05)
y1 = np.sin(x)
y2 = np.cos(x)

fig1 = figure(plot_width = 200, plot_height = 200)
fig1.line(x, y1,line_width = 2, line_color = 'blue')

fig2 = figure(plot_width = 200, plot_height = 200)
fig2.line(x, y2,line_width = 2, line_color = 'red')

r = row(children = [fig1, fig2], sizing_mode = 'stretch_both')
show(r)
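
Rows and columns can be nested, since the children of a layout may themselves be layouts. The sketch below places two plots side by side above a third one; the helper `small_plot` and its data are made up for this illustration.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure

def small_plot(color):
    """Return a figure with a single line in the given color."""
    fig = figure()
    fig.line([1, 2, 3], [1, 4, 9], line_width=2, line_color=color)
    return fig

fig1, fig2, fig3 = small_plot("blue"), small_plot("red"), small_plot("green")

# A row of two plots stacked on top of a third plot.
top = row(fig1, fig2)
layout = column(top, fig3)
```

Passing `layout` to `show()` renders all three plots in one output.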

The `gridplot()` function in the bokeh.layouts module returns a grid with a single unified toolbar, which may be positioned with the toolbar_location property.
The `gridplot()` function takes children and sizing_mode parameters, where children is a list of lists, one inner list per row.

In [100]:
from bokeh.layouts import gridplot

x = list(range(1,11))

y1 = x
y2 =[11-i for i in x]
y3 = [i*i for i in x]
y4 = [math.log10(i) for i in x]

fig1 = figure(plot_width = 200, plot_height = 200)
fig1.line(x, y1,line_width = 2, line_color = 'blue')

fig2 = figure(plot_width = 200, plot_height = 200)
fig2.circle(x, y2,size = 10, color = 'green')

fig3 = figure(plot_width = 200, plot_height = 200)
fig3.circle(x,y3, size = 10, color = 'grey')

fig4 = figure(plot_width = 200, plot_height = 200, y_axis_type = 'log')
fig4.line(x,y4, line_width = 2, line_color = 'red')

grid = gridplot(children = [[fig1, fig2], [fig3,fig4]], sizing_mode = 'stretch_both')
show(grid)
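
Instead of a list of lists, `gridplot()` can also take a flat list of figures together with an `ncols` argument, and `toolbar_location` moves the shared toolbar. A sketch under those assumptions, with made-up data series:

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

x = list(range(1, 11))
series = [x, [11 - i for i in x], [i * i for i in x], [i ** 3 for i in x]]

# One figure per data series.
figs = []
for ys in series:
    fig = figure()
    fig.line(x, ys, line_width=2)
    figs.append(fig)

# A flat list is wrapped into rows of ncols plots each; the shared
# toolbar is placed to the right of the grid.
grid = gridplot(figs, ncols=2, toolbar_location="right")
```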

In [101]:
x2 = list(range(1,11))
y4 = [math.pow(i,2) for i in x2]
y2 = [math.log10(pow(10,i)) for i in x2]
fig = figure(y_axis_type = 'log')
fig.circle(x2, y2,size = 5, color = 'blue', legend_label = 'blue circle')
fig.line(x2,y4, line_width = 2, line_color = 'red', legend_label = 'red line')
fig.legend.location = 'top_left'
fig.legend.title = 'Legend Title'
fig.legend.title_text_font = 'Arial'
fig.legend.title_text_font_size = '20pt'
show(fig)

