In this tutorial we are going to learn the basics of the Bokeh library. Bokeh is an interactive visualization library for Python.

Credit: https://www.kaggle.com/kanncaa1/visualization-bokeh-tutorial-part-1/notebook

Topics covered:

- Basic Data Exploration with Pandas

- Explanation of Bokeh Packages

- Plotting with Glyphs

- Additional Glyphs

- Data Formats

- Customizing Glyphs

- Layouts

- Linking Plots

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = pd.read_csv("vgsales.csv")
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


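As the info() method shows, there are 16598 entries.

However, Year has only 16327 non-null entries, which means Year contains NaN values.

Year should also be an integer, but it is stored as float, because pandas uses float64 for a numeric column that contains NaN. Therefore we will convert it after dropping the missing values.

In addition, Publisher has NaN values (16540 non-null entries).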
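The reasoning above (NaN forces a float column, and the cast to int only works once the NaN rows are gone) can be checked on a tiny frame. This is a minimal sketch: the three rows and the names "Game A", "Game B" and "Game C" are hypothetical stand-ins for vgsales.csv, not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with the same kind of gaps as vgsales.csv
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006, np.nan, 1985],             # one missing year
    "Publisher": ["Nintendo", "Sega", None],  # one missing publisher
})

# A single NaN makes pandas store the whole Year column as float64
print(df["Year"].dtype)

# Casting to int while NaN is still present raises a ValueError
try:
    df["Year"].astype(int)
except ValueError as err:
    print("astype(int) failed:", err)

# Count missing values per column
missing = df.isna().sum()
print(missing)

# Drop every row that has any missing value, then the cast succeeds
clean = df.dropna(how="any").copy()
clean["Year"] = clean["Year"].astype(int)
print(clean)
```

Only "Game A" survives, since "Game B" lacks a year and "Game C" lacks a publisher, which mirrors how the row count in the real data drops after `dropna`.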

In [3]:
# Let's start by dropping every row that contains a NaN value
data.dropna(how="any",inplace = True)
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16291 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16291 non-null  int64  
 1   Name          16291 non-null  object 
 2   Platform      16291 non-null  object 
 3   Year          16291 non-null  float64
 4   Genre         16291 non-null  object 
 5   Publisher     16291 non-null  object 
 6   NA_Sales      16291 non-null  float64
 7   EU_Sales      16291 non-null  float64
 8   JP_Sales      16291 non-null  float64
 9   Other_Sales   16291 non-null  float64
 10  Global_Sales  16291 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.5+ MB


In [4]:
# Then convert Year from float to int
data.Year = data.Year.astype(int)
data.head()     # head() shows the first five rows, a quick overview of the data

|   | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


## Explanation of Bokeh Packages
From the Bokeh library we will use the following functions and classes:

- output_file: saves the figure to a file with an .html extension
- output_notebook: displays figures inline in the notebook
- show: displays the figure
- figure: creates an empty figure
- ColumnDataSource: Bokeh's data source
- HoverTool: shows information about the data point under the mouse cursor
- CategoricalColorMapper: maps categories to colors, like hue in seaborn. If you are not familiar with hue, see this seaborn tutorial:
https://www.kaggle.com/kanncaa1/seaborn-for-beginners
- row and column: arrange plots side by side in a row or stacked in a column
- gridplot: arranges plots in a grid
- Tabs and Panel: each Panel holds one plot, and Tabs displays the panels as clickable tabs
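The layout helpers in the list above can be sketched as follows. This is a minimal sketch under two assumptions: the helper `make_plot` is hypothetical, written only to give each layout its own small figure, and the `try/except` import covers the renaming of `Panel` to `TabPanel` in Bokeh 3.x while falling back to the older 2.x import path.

```python
from bokeh.plotting import figure
from bokeh.layouts import row, column, gridplot

# Bokeh 3.x calls the tab container TabPanel; older 2.x releases call it Panel
try:
    from bokeh.models import Tabs, TabPanel
except ImportError:
    from bokeh.models.widgets import Tabs, Panel as TabPanel


def make_plot(title):
    """Hypothetical helper: return a fresh small line plot with the given title."""
    p = figure(title=title)
    p.line(x=[1, 2, 3], y=[3, 1, 2], line_width=2)
    return p


side_by_side = row(make_plot("left"), make_plot("right"))   # plots in one row
stacked = column(make_plot("top"), make_plot("bottom"))     # plots in one column
grid = gridplot([[make_plot("a"), make_plot("b")],
                 [make_plot("c"), None]])                    # 2x2 grid, one empty cell
tabs = Tabs(tabs=[TabPanel(child=make_plot("first"), title="First"),
                  TabPanel(child=make_plot("second"), title="Second")])
```

Any of these layout objects can be passed to `show()`, for example `show(tabs)`, to display it.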

In [5]:
# bokeh packages
from bokeh.io import output_file,show,output_notebook,push_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,HoverTool,CategoricalColorMapper
from bokeh.layouts import row,column,gridplot
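# Bokeh 3.x renamed Panel to TabPanel; fall back to the 2.x import path if needed
try:
    from bokeh.models import Tabs, TabPanel as Panel
except ImportError:
    from bokeh.models.widgets import Tabs, Panel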
output_notebook()
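HoverTool and CategoricalColorMapper are imported above but only described in the package list. The following is a minimal sketch of how they can work together, assuming a small ColumnDataSource built from three rows copied from the head() output; the palette colors and the tooltip labels are arbitrary choices for illustration.

```python
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool, CategoricalColorMapper

# Three rows copied from the head() output, using the same column names
source = ColumnDataSource(data=dict(
    Name=["Wii Sports", "Super Mario Bros.", "Mario Kart Wii"],
    Year=[2006, 1985, 2008],
    Genre=["Sports", "Platform", "Racing"],
    Global_Sales=[82.74, 40.24, 35.82],
))

# Map each genre (a category) to a color, like hue in seaborn
mapper = CategoricalColorMapper(factors=["Sports", "Platform", "Racing"],
                                palette=["red", "green", "blue"])

plot = figure(x_axis_label="Year", y_axis_label="Global_Sales", tools="pan,box_zoom")
plot.scatter(x="Year", y="Global_Sales", source=source, size=12,
             fill_color=dict(field="Genre", transform=mapper), line_color="black")

# Show the name and sales of the point under the mouse cursor
hover = HoverTool(tooltips=[("Name", "@Name"), ("Global sales", "@Global_Sales")])
plot.add_tools(hover)
# show(plot)  # uncomment to display the plot
```

The `@Name` and `@Global_Sales` placeholders in the tooltips refer to column names in the source, the same way the glyph's `x` and `y` strings do.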

## Plotting with Glyphs
- Glyphs: visual shapes such as circles, squares, rectangles or diamonds
- figure: creates a figure
  - x_axis_label: label of the x axis
  - y_axis_label: label of the y axis
  - tools: interactive tools for moving or zooming the plot
    - pan: drags the plot around
    - box_zoom: zooms into a rectangle drawn with the mouse
- circle: draws circles, like scatter in matplotlib
  - size: size of the circles
  - color: color of the circles
  - alpha: opacity, from 0 (transparent) to 1 (opaque)
- output_file: saves the figure to a file with an .html extension
- show: displays the figure
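The options listed above can be inspected on the figure object, and the HTML file can also be written without opening a browser tab. This is a minimal sketch assuming `bokeh.io.save` with CDN resources and a temporary directory as the output location; it uses `scatter()`, whose default marker is a circle, because recent Bokeh releases deprecate passing `size` to `circle()`.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# Axis labels plus only the pan and box_zoom tools
plot = figure(x_axis_label="x", y_axis_label="y", tools="pan,box_zoom")
# scatter() draws circles by default; size, color and alpha behave as described above
plot.scatter(x=[5, 4, 3, 2, 1], y=[1, 2, 3, 4, 5], size=10, color="black", alpha=0.7)

# Inspect what was configured
tool_names = [type(tool).__name__ for tool in plot.tools]
print(tool_names)
print(plot.xaxis[0].axis_label, plot.yaxis[0].axis_label)

# Write a standalone HTML file without opening a browser tab
path = os.path.join(tempfile.mkdtemp(), "my_first_bokeh_plot.html")
save(plot, filename=path, resources=CDN, title="My first Bokeh plot")
print(os.path.getsize(path) > 0)
```

`show()` writes the same kind of file and then opens it, so `save()` is mainly handy in scripts that should not launch a browser.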

In [7]:
plot = figure(x_axis_label = "x",y_axis_label = "y",tools = "pan,box_zoom")
plot.circle(x=[5,4,3,2,1],y=[1,2,3,4,5],size = 10,color = "black",alpha = 0.7)
output_file("my_first_bokeh_plot.html")
show(plot)

In [8]:
# There are other types of glyphs
plot = figure()
plot.diamond(x=[5,4,3,2,1],y=[1,2,3,4,5],size = 10,color = "black",alpha = 0.7)
plot.cross(x=[1,2,3,4,5],y=[1,2,3,4,5],size = 10,color = "red",alpha = 0.7)
show(plot)
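Marker shapes such as diamond and cross can also be drawn with a single method, `scatter()`, by passing the shape name as `marker`. As far as I know, recent Bokeh releases (3.4 and later) deprecate the per-shape methods like `diamond()` and `cross()` in favor of this form, which also works in older versions. A minimal sketch, where the marker list and y offsets are arbitrary choices:

```python
from bokeh.plotting import figure

# One scatter() call per marker name; marker selects the glyph shape
markers = ["circle", "square", "diamond", "cross"]
plot = figure()
for offset, marker in enumerate(markers):
    # each marker type gets its own horizontal row of three points
    plot.scatter(x=[1, 2, 3], y=[offset] * 3, marker=marker, size=12, alpha=0.7)
# show(plot)  # uncomment to display the plot
```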

## Additional Glyphs
- line: line plot
  - line_width: width of the line
- circle (drawn on top of the line in the example below)
  - fill_color: color used to fill the inside of the circles
- patches: draws multiple polygons at once
  - fill_color: fill color of each patch
  - line_color: color of the outline around each patch

In [9]:
# line
plot = figure()
plot.line(x=[1,2,3,4,5,6,7],y = [1,2,3,4,5,5,5],line_width = 2)
plot.circle(x=[1,2,3,4,5,6,7],y = [1,2,3,4,5,5,5],fill_color = "white",size = 10)
show(plot)

In [10]:
# patches
plot = figure()
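# Each inner list holds one patch; its vertices are joined in the listed order,
# so ys = [1,2,2,1] walks around a square instead of crossing itself
plot.patches(xs = [[1,1,2,2],[2,2,3,3]],ys = [[1,2,2,1],[1,2,2,1]],fill_color = ["purple","red"],line_color = ["black","black"])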
show(plot)
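Because `patches` joins the vertices of each polygon in the order they are listed, the same four corners can give a square or a self-intersecting "bowtie" depending on that order. A quick pure-Python check with the shoelace area formula makes this visible; `shoelace_area` is a hypothetical helper written for illustration and is not part of Bokeh.

```python
def shoelace_area(xs, ys):
    """Area of a polygon via the shoelace formula, visiting vertices in the given order."""
    n = len(xs)
    twice_signed = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                       for i in range(n))
    return abs(twice_signed) / 2


# The same four corners of a unit square, in two different orders
bowtie = shoelace_area([1, 1, 2, 2], [1, 2, 1, 2])  # edges cross: the two lobes cancel
square = shoelace_area([1, 1, 2, 2], [1, 2, 2, 1])  # walks around the square
print(bowtie, square)
```

A zero (or otherwise surprising) area is a hint that the vertex order traces crossing edges rather than the intended outline.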

In [17]:
# Let's use a ColumnDataSource in a basic example
# As info() showed, we have Year and Global_Sales columns
# Let's plot them to learn how to use ColumnDataSource
source = ColumnDataSource(data)
plot = figure()
plot.circle(x="Year",y="Global_Sales",source = source)
show(plot)
# Our column names in the pandas data frame are "Year" and "Global_Sales".
# They stay the same when we convert the data frame to a source,
# so glyphs can refer to columns by name.
# For now, you can think of a source as something like a pandas data frame.
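To see what the source actually contains, its column names and data can be printed. This is a minimal sketch using two rows copied from the head() output instead of the full CSV; it also shows that a plain dict of equal-length lists can serve as data, which is an alternative format rather than what this notebook uses.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Two rows copied from the head() output, with the same column names
df = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros."],
    "Year": [2006, 1985],
    "Global_Sales": [82.74, 40.24],
})

source = ColumnDataSource(df)
# DataFrame columns become source columns; the DataFrame index is added as "index"
print(source.column_names)
print(list(source.data["Year"]))

# A dict of equal-length lists works as a data source too
dict_source = ColumnDataSource(data=dict(Year=[2006, 1985], Global_Sales=[82.74, 40.24]))
print(dict_source.column_names)
```

Either way, a glyph call such as `circle(x="Year", y="Global_Sales", source=...)` looks the columns up by these names.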