![](https://www.fullstackpython.com/img/logos/bokeh.jpg)

## Introduction

**Data visualization** is an interdisciplinary field that deals with the graphic representation of data [wiki]. It can be employed at any time during the lifecycle of a data science project, especially at the beginning, to explore the data at hand (to supplement the EDA), and at the end, to communicate the findings to the project's stakeholders. Understandably, it is a vital tool in a data scientist's or analyst's skill set. Depending on programming language preference and taste, one can choose from several visualization tools such as matplotlib, seaborn, plotly, **bokeh**, ggplot2, and so on. In the past I have used the first three of these tools. Today, using this notebook, I will explore Bokeh, a Python data visualization library, with the help of **holoviews** when needed.

**What is Bokeh ?**
> Bokeh is a Python library for creating interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself [bokeh's page].

**What is HoloViews ?**
> **HoloViews** is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting [holoviews' page].

**Objective**: 
> The **goal** of this notebook is to explore the Bokeh Python library for data visualization and, at the same time, **customize** the plots to make them look as good as possible while conveying the message as effectively as possible.

**Note**: This notebook is not an EDA of a single dataset; it is an exploration of a visualization tool, so I may use several datasets for demonstration purposes.

<a id="top"></a>

<h3 class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Table of Contents</h3>


* [0. Plots overview](#0)
* [1. Basic plots](#1)
    * [1.1 Scatter plots](#1.1)
    * [1.2 Line plots](#1.2)
    * [1.3 Bar graphs](#1.3)
    * [1.4 Histograms](#1.4)
    * [1.5 Pie charts](#1.5)
    * [1.6 Box plots](#1.6)
    * [1.7 Violin plots](#1.7)
* [2. Area plots](#2)
* [3. Heatmap](#3)
* [4. Jitter Scatter (categorical)](#4)
* [5. Kde plots](#5)
* [6. Subplots](#6)
* [7. Range tools (useful for time series data)](#7)
* [8. Pairplots (grouped grids)](#8)
* [9. Reference](#9)



## Install bokeh

In [None]:
!pip install bokeh

## Import libraries

In [None]:
# not every library imported here is necessarily used below

import numpy as np
import pandas as pd 
from scipy.stats import gaussian_kde  # the scipy.stats.kde namespace is deprecated

import colorcet as cc
from bokeh.models import (BoxAnnotation, RangeTool, ColumnDataSource, FixedTicker,
                          PrintfTickFormatter, LabelSet, HoverTool, Range1d, Label)
from bokeh.io import output_file, show, output_notebook, curdoc
from bokeh.plotting import figure
from bokeh.palettes import GnBu3, OrRd3, Category20c
from bokeh.transform import cumsum, jitter
from bokeh.layouts import column
from holoviews.operation import gridmatrix
output_notebook()

import holoviews as hv
from holoviews import opts, dim
hv.extension('bokeh')

theme_list = ['caliber', 'dark_minimal', 'light_minimal', "night_sky", "contrast"]
import warnings
warnings.filterwarnings('ignore')

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Load data

In [None]:
df_iris = pd.read_csv('/kaggle/input/iris/Iris.csv')
df_titanic = pd.read_csv('/kaggle/input/titanic/train.csv')
df_rain= pd.read_csv('/kaggle/input/weather-dataset-rattle-package/weatherAUS.csv')
df_study = pd.read_csv('/kaggle/input/students-performance-in-exams/StudentsPerformance.csv')
df_heart = pd.read_csv('/kaggle/input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv')
df_sale = pd.read_csv('/kaggle/input/competitive-data-science-predict-future-sales/sales_train.csv')
df_covid = pd.read_csv('/kaggle/input/d/gpreda/covid-world-vaccination-progress/country_vaccinations.csv')
df_housing = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')
df_ts = pd.read_csv('/kaggle/input/tabular-playground-series-jul-2021/train.csv')

<a id="0"></a>
<font color="skyblue" size=+2.5><b>0. Plots overview</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>
## Bokeh:

After installing and importing the necessary packages, creating a Bokeh plot is a two-step process: first, you select from Bokeh’s building blocks to create your visualization; second, you customize these building blocks to fit your needs. The steps are as follows:

* select a theme (optional)
* get/prepare the `data`
* create a `figure()` object
* add the `renderer(s)` or the `plot(s)`
* customize the figure layout (optional)
* `show()` the figure

Themes: Bokeh offers the following built-in themes: 'caliber', 'dark_minimal', 'light_minimal', 'night_sky', and 'contrast'. You can also define a custom theme.
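
The steps above can be sketched as a short, self-contained example. This is a minimal illustration rather than a recipe: the data values and the helper name `build_figure` are made up for the example, and `scatter()` is used instead of `circle()` only because it is available across recent Bokeh releases.

```python
from bokeh.io import curdoc
from bokeh.plotting import figure


def build_figure():
    # 1. select a theme (optional)
    curdoc().theme = "caliber"

    # 2. prepare the data (illustrative values)
    x_values = [1, 2, 3, 4, 5]
    y_values = [6, 7, 2, 3, 6]

    # 3. create the figure object
    fig = figure(title="Workflow sketch",
                 x_axis_label="x", y_axis_label="y")

    # 4. add a renderer (glyph)
    fig.scatter(x=x_values, y=y_values, size=12, alpha=0.6, color="#F78888")

    # 5. customize the layout (optional)
    fig.title.text_font_size = "16pt"

    return fig


fig = build_figure()
```

The final step, `show(fig)` (with `show` imported from `bokeh.io`), renders the figure; it is left out of the sketch so the function can be reused without opening an output.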

## HoloViews:

Similarly, creating a HoloViews plot is a two-step process: first, you select from HoloViews’s building blocks (`elements`) to create your visualization; second, you customize the elements to fit your needs. The steps are as follows:

* prepare the `data`
* call the `element` with the necessary arguments
* `customize` the element (optional)

Note: Customization in HoloViews is not as rich as in Bokeh, and HoloViews has no theme selection of its own. However, you can apply a Bokeh theme via `hv.renderer('bokeh').theme = 'your choice here'`.

## Providing Data

The basis for any data visualization is the underlying data. Below we will see the various ways to provide data to Bokeh, from passing data values directly to creating a `ColumnDataSource`.

* **Python lists/arrays**: You can use standard Python lists of data to pass values directly into a plotting function. For example, you could use Python lists to make a plot as follows.
        - x_values = [1, 2, 3, 4, 5] --> (x_values as a Python list)
        - y_values = [6, 7, 2, 3, 6] --> (y_values as a Python list)
        - fig = figure() --> (create the figure)
        - fig.circle(x=x_values, y=y_values) --> (add the renderer and pass the lists to it)
        

* **NumPy data**: As with Python lists and arrays, you can also work with NumPy data structures. An example is given below.
        - x = [1, 2, 3, 4, 5] --> (x as a list)
        - y = np.random.standard_normal(5) --> (y as a NumPy array)
        - fig = figure() --> (create the figure)
        - fig.circle(x=x, y=y) --> (add the renderer and pass the list and the NumPy array to it)
        
        
* **ColumnDataSource**: The `ColumnDataSource` is the core of most Bokeh plots. It provides the data to the glyphs of your plot.
When you pass sequences like Python lists or NumPy arrays to a Bokeh renderer, Bokeh automatically creates a ColumnDataSource with this data for you. However, creating a ColumnDataSource yourself gives you access to more advanced options. Let's see an example.

        * creating a ColumnDataSource: 
          data = {'x_values': [1, 2, 3, 4, 5],
                  'y_values': [6, 7, 2, 3, 6]}
          source = ColumnDataSource(data=data)
         
        * plotting using the ColumnDataSource: 
          fig = figure()
          fig.circle(x='x_values', y='y_values', source=source)
          
* **Pandas DataFrame**: The data parameter of the `ColumnDataSource` used above can also be a pandas DataFrame or GroupBy object. A simple example is given below.
        - source = ColumnDataSource(df) --> pass the DataFrame to the ColumnDataSource
        
 
 **Remark**: Please refer to the official webpage of the Bokeh library (see in the ref. section) to read more about providing data to Bokeh.
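
The following sketch puts the dict and DataFrame routes side by side. The values and the names `source_dict`, `source_df`, and `df_demo` are illustrative assumptions; the point is only that a renderer refers to columns by name once a `ColumnDataSource` is supplied.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# from a dict: the keys become the column names
source_dict = ColumnDataSource(data={"x_values": [1, 2, 3, 4, 5],
                                     "y_values": [6, 7, 2, 3, 6]})

# from a DataFrame: the DataFrame columns become the column names,
# and the (unnamed) index is added as an extra column called 'index'
df_demo = pd.DataFrame({"x_values": [1, 2, 3, 4, 5],
                        "y_values": [6, 7, 2, 3, 6]})
source_df = ColumnDataSource(df_demo)

# renderers refer to columns by name and take the data from `source`
fig = figure()
fig.scatter(x="x_values", y="y_values", source=source_df)

print(sorted(source_dict.column_names))
print(sorted(source_df.column_names))
```

Keeping the data in one `ColumnDataSource` also lets several glyphs share it, which is what tools such as hover tooltips (`@column_name`) rely on.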

<a id="1"></a>
<font color="skyblue" size=+2.5><b>1. Basic plots</b></font>

<a id="1.1"></a>
<font color="skyblue" size=+2.5><b>1.1 Scatter plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
# prepare your data to be plotted
# I used iris dataset for this example

Iris_setosa = df_iris[df_iris['Species'] =='Iris-setosa']
Iris_virginica = df_iris[df_iris['Species'] =='Iris-virginica']
Iris_versicolor = df_iris[df_iris['Species'] =='Iris-versicolor']


curdoc().theme = 'dark_minimal'

# create and define the figure object

fig = figure(title='Iris flowers: sepal length vs sepal width', 
             x_axis_label='Sepal Length [cm]', 
             y_axis_label='Sepal Width [cm]',          
             plot_width=750, plot_height=500,
             tools= 'hover',# set this to None if you do not want data to be displayed on hover
             toolbar_location="above", 
             toolbar_sticky=False)

# adding glyphs (scatter markers of your choice, e.g. circle, triangle, square)

fig.circle(x="SepalLengthCm", y="SepalWidthCm", 
           size=12, alpha=0.5, 
           color="#F78888", 
           legend_label='Setosa', 
           source=Iris_setosa)
fig.triangle(x="SepalLengthCm", y="SepalWidthCm", 
             size=12, alpha=0.99, 
             color="#F3D250", 
             legend_label='Virginica', 
             source=Iris_virginica)
fig.square(x="SepalLengthCm", y="SepalWidthCm", 
           size=12, alpha=0.6, 
           color="#3AAFA9", 
           legend_label='Versicolor', 
           source=Iris_versicolor)

# layout update

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"
fig.legend.location = 'top_left'
fig.legend.background_fill_color = "skyblue"

show(fig)

In [None]:
# data to plot: rain in australia dataset
source = ColumnDataSource(df_rain)

# theme style
curdoc().theme = 'light_minimal'

# figure object

fig = figure(title='Rainfall vs Sunshine', 
             x_axis_label='Sunshine', 
             y_axis_label='Rainfall [mm]',          
             plot_width=750, plot_height=500, 
             toolbar_location="above",
             tools="",
             toolbar_sticky=False)

# the scatter plot
fig.circle(x='Sunshine',y='Rainfall', 
           source = source,
           size=5, alpha=0.5,
           color='#F78888',
           )
         
# figure layout update
fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"

show(fig)

<a id="1.2"></a>
<font color="skyblue" size=+2.5><b>1.2 Line plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
curdoc().theme = 'night_sky'

ts=df_sale.groupby(["date_block_num"])["item_cnt_day"].sum()
ts2=df_sale.groupby(["date_block_num"])["item_price"].sum()

x=list(ts.index)
y=list(ts)
y2=list(ts2)

fig = figure(title='Sales In A Shop', 
             x_axis_label='Date block', 
             y_axis_label='Amount sold',          
             plot_width=750, plot_height=500, 
             toolbar_location="above",
             tools="hover",
             toolbar_sticky=False,
             #background_fill_color="#f4f0ec"
            )


#fig.line(x, y, line_color='red', line_width=3)
fig.line(x, y2,line_color='#5da2d5',line_width=3,line_dash="dashed")

# fig.add_layout(BoxAnnotation(left=10, fill_alpha=0.1, fill_color='yellow', line_color='red'))
# fig.add_layout(BoxAnnotation(right=25, fill_alpha=0.1, fill_color='yellow', line_color='red'))
        
# figure layout update
fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"
fig.y_range.range_padding = 0.2



show(fig)

<a id="1.3"></a>
<font color="skyblue" size=+2.5><b>1.3 Bar graphs</b></font>

<font color="skyblue" size=+1.5><b>1.3.1 Vertical bars</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>



In [None]:
# plot theme
curdoc().theme = 'caliber'

# data to plot
Iris_setosa = df_iris[df_iris['Species'] =='Iris-setosa']
Iris_virginica = df_iris[df_iris['Species'] =='Iris-virginica']
Iris_versicolor = df_iris[df_iris['Species'] =='Iris-versicolor']

se = Iris_setosa['PetalLengthCm'].mean()
vi = Iris_virginica['PetalLengthCm'].mean()
ve = Iris_versicolor['PetalLengthCm'].mean()


species = ['setosa', 'virginica', 'versicolor']
avg_petal_length = [se, vi, ve]

# sort the species and their averages together so each bar keeps its own value
pairs = sorted(zip(species, avg_petal_length), key=lambda p: p[1])
species_sorted = [name for name, _ in pairs]
avg_sorted = [avg for _, avg in pairs]

colors = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9']

# figure object
fig = figure(x_range=species_sorted,
             title="Average Petal Length",
             x_axis_label='Species', 
             y_axis_label='Average petal length [cm]',
             plot_width=750,plot_height=400,
             toolbar_location='above',
             tools="hover",
             background_fill_color="#f4f0ec"
            )

# the bar plot (one color per bar)
fig.vbar(x=species_sorted, top=avg_sorted, width=0.75, 
         fill_color=colors[:len(species_sorted)],
        )


# layout update
fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"

#optional customization
fig.title.text_color = "black"
fig.title.background_fill_color = "#f4f0ec"

# show the figure
show(fig)

<a id="1.3.2"></a>

<font color="skyblue" size=+1.5><b>1.3.2 Horizontal bars</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
# plot theme
curdoc().theme = 'caliber'


# data to plot
Iris_setosa = df_iris[df_iris['Species'] =='Iris-setosa']
Iris_virginica = df_iris[df_iris['Species'] =='Iris-virginica']
Iris_versicolor = df_iris[df_iris['Species'] =='Iris-versicolor']

se = Iris_setosa['SepalWidthCm'].mean()
vi = Iris_virginica['SepalWidthCm'].mean()
ve = Iris_versicolor['SepalWidthCm'].mean()

species = ['setosa', 'virginica', 'versicolor']
avg_sepal_width = [se, vi, ve]

fig = figure(y_range=species,
             title="Average Sepal Width",
             y_axis_label='Species', 
             x_axis_label='Average sepal width [cm]',
             plot_width=750,plot_height=400,
             toolbar_location='above',
             background_fill_color="#f4f0ec",
             )
fig.hbar(y=species,
         right=avg_sepal_width,
         height=0.8, 
         left=0,
         color=colors[:len(species)],  # one color per bar
        )

# layout update
fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"

#optional customization
fig.title.text_color = "black"
fig.title.background_fill_color = "#f4f0ec"

show(fig)


<a id="1.3.3"></a>

<font color="skyblue" size=+1.5><b>1.3.3 Stacked bars</b></font>


<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>



In [None]:
# data to plot
am =df_study[(df_study['race/ethnicity'] == 'group A') &  (df_study['gender'] == 'male')]
af =df_study[(df_study['race/ethnicity'] == 'group A') &  (df_study['gender'] == 'female')]
bm =df_study[(df_study['race/ethnicity'] == 'group B') &  (df_study['gender'] == 'male')]
bf =df_study[(df_study['race/ethnicity'] == 'group B') &  (df_study['gender'] == 'female')]
cm =df_study[(df_study['race/ethnicity'] == 'group C') &  (df_study['gender'] == 'male')]
cf =df_study[(df_study['race/ethnicity'] == 'group C') &  (df_study['gender'] == 'female')]
dm =df_study[(df_study['race/ethnicity'] == 'group D') &  (df_study['gender'] == 'male')]
df = df_study[(df_study['race/ethnicity'] == 'group D') &  (df_study['gender'] == 'female')]
em =df_study[(df_study['race/ethnicity'] == 'group E') &  (df_study['gender'] == 'male')]
ef = df_study[(df_study['race/ethnicity'] == 'group E') &  (df_study['gender'] == 'female')]


# define function

def stacked_bars_horizontal():
    curdoc().theme = 'caliber'

    groups = ['group A', 'group B','group C','group D','group E']
    gender = ['male', 'female']
    #colors = color#['skyblue', 'pink']
    colors = ['#F78878', '#5DA2D5', '#F3D250', '#3AAFA9']

    data = {'groups' : groups,
            'male' : [len(am), len(bm), len(cm), len(dm), len(em)],

            'female' :[len(af), len(bf), len(cf), len(df), len(ef)],
                       }

    fig = figure(y_range=groups, plot_height=550, 
               title="Gender distribution within race groups",
               x_axis_label='Students count', 
               y_axis_label='race group',
               toolbar_location='above',
               tools= 'wheel_zoom,box_zoom, reset, save, hover',
               toolbar_sticky=False,
               background_fill_color="#f4f0ec", 
               border_fill_color="#f4f0ec"
              )

    fig.hbar_stack(gender, y='groups',  height=0.75, color= colors[0:2], source=ColumnDataSource(data), legend_label=["%s" % x for x in gender])

    hover = fig.select(dict(type=HoverTool))
    hover.tooltips = [("groups", "@groups"), 
                      ('male', "@male"), 
                      ("female", "@female")]

    fig.y_range.range_padding = 0.1
    fig.ygrid.grid_line_color = None
    fig.legend.location = "top_right"
    fig.axis.minor_tick_line_color = None
    fig.outline_line_color = None
    fig.grid.grid_line_color = None
    fig.axis.axis_line_color = '#f4f0ec'


    fig.title.text_font_size = '20pt'
    fig.title.text_font_style = 'bold'
    fig.title.text_font = 'Serif'
    fig.xaxis.axis_label_text_font_size = "16pt"
    fig.yaxis.axis_label_text_font_size = "16pt"

    #show(fig)
    return (fig)

show(stacked_bars_horizontal())

In [None]:
def stacked_bars_vertical():
    curdoc().theme = 'caliber'

    groups = ['group A', 'group B','group C','group D','group E']
    gender = ['male', 'female']
    #colors = color#['skyblue', 'pink']
    colors = ['#F78878', '#5DA2D5', '#F3D250', '#3AAFA9']

    data = {'groups' : groups,
            'male' : [len(am), len(bm), len(cm), len(dm), len(em)],

            'female' :[len(af), len(bf), len(cf), len(df), len(ef)],
                       }

    fig = figure(x_range=groups, plot_height=550, 
               title="Gender distribution within race groups",
               y_axis_label='Students count', 
               x_axis_label='race group',
               toolbar_location='above',
               tools= 'wheel_zoom,box_zoom, reset, save, hover',
               toolbar_sticky=False,
               background_fill_color="#f4f0ec", 
               border_fill_color="#f4f0ec"
              )

    fig.vbar_stack(gender, x='groups', width=0.8, color= colors[2:], source=ColumnDataSource(data), legend_label=["%s" % x for x in gender])

    hover = fig.select(dict(type=HoverTool))
    hover.tooltips = [("groups", "@groups"), 
                      ('male', "@male"), 
                      ("female", "@female")]

    fig.y_range.range_padding = 0.1
    fig.ygrid.grid_line_color = None
    fig.legend.location = "top_right"
    fig.axis.minor_tick_line_color = None
    fig.outline_line_color = None
    fig.grid.grid_line_color = None
    fig.axis.axis_line_color = '#f4f0ec'


    fig.title.text_font_size = '20pt'
    fig.title.text_font_style = 'bold'
    fig.title.text_font = 'Serif'
    fig.xaxis.axis_label_text_font_size = "16pt"
    fig.yaxis.axis_label_text_font_size = "16pt"

    #show(fig)
    return (fig)

show(stacked_bars_vertical())

<a id="1.4"></a>
<font color="skyblue" size=+2.5><b>1.4 Histograms </b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
def make_histogram(title, hist1, hist2, edges):
    fig = figure(title=title, 
               #tools='',
               plot_width=750, 
               plot_height=450, 
               background_fill_color="#f4f0ec",
               )
    fig.quad(top=hist1, # histogram for survivors (DEATH_EVENT == 0)
             bottom=0, 
             left=edges[:-1], 
             right=edges[1:],
             fill_color='#F3D250', 
             line_color="white", 
             #alpha=0.5, 
             legend_label="0")
    fig.quad(top=hist2, # histogram for victims (DEATH_EVENT == 1)
             bottom=0, 
             left=edges[:-1], 
             right=edges[1:],
             fill_color='#3AAFA9', 
             line_color="white", 
             alpha=0.5, 
             legend_label="1")
    
    
    fig.y_range.start = 0
    fig.legend.location = "top_right"
    fig.legend.background_fill_color = "#f2f0ec"
    fig.legend.title = 'Death Event'
    fig.xaxis.axis_label = 'Age'
    fig.yaxis.axis_label = 'Probability density'
    fig.grid.grid_line_color="white"
    
    fig.title.text_font_size = '16pt'
    fig.title.text_font_style = 'bold'
    fig.title.text_font = 'Serif'
    fig.xaxis.axis_label_text_font_size = "12pt"
    fig.yaxis.axis_label_text_font_size = "12pt"

    return fig

# data to plot: age distribution of patients (heart-failure dataset)
surv = df_heart[df_heart['DEATH_EVENT'] == 0]['age']
vict = df_heart[df_heart['DEATH_EVENT'] == 1]['age']

# use the same bin edges for both groups so the two histograms line up
edges = np.histogram_bin_edges(df_heart['age'], bins=15)
hist1, _ = np.histogram(surv, density=True, bins=edges)
hist2, _ = np.histogram(vict, density=True, bins=edges)

fig1 = make_histogram("Heart Failure: Patients Age Distribution", hist1, hist2, edges)

show(fig1)


<a id="1.5"></a>
<font color="skyblue" size=+2.5><b>1.5 Pie Charts</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
def pie_chart():
    curdoc().theme = 'caliber'    
    # data: titanic Pclass       
    x = {
        'Pclass 1': len(df_titanic[df_titanic['Pclass'] ==1]),
        'Pclass 2': len(df_titanic[df_titanic['Pclass'] ==2]),
        'Pclass 3': len(df_titanic[df_titanic['Pclass'] ==3]),
        }

    data = pd.Series(x).reset_index(name='value').rename(columns={'index':'pclass'})
    data['angle'] = data['value']/data['value'].sum()*2*np.pi
    data['color'] = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9'][0:len(x)]
    data["value"] = data['value'].astype(str)
    data["value"] = data["value"].str.pad(35, side = "left")
    source = ColumnDataSource(data)

    fig = figure(plot_height=550, 
               plot_width=550,
               title="Titanic: Passenger Class", 
               toolbar_location=None,
               tools="hover", 
               tooltips="@pclass: @value",
               x_range=(-0.5, .75),                 
               background_fill_color="#f4f0ec", 
               border_fill_color="#f4f0ec"
              )

    fig.wedge(x=0, y=1, radius=0.4,
            start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
            line_color="black", fill_color='color', legend_field='pclass', source=data,)


    labels = LabelSet(x=0, y=1, text='value',
            angle=cumsum('angle', include_zero=True), source=source, render_mode='canvas')
    
    annotation = Label(x=0, y=0, x_units='screen', y_units='screen',
                       text='More passengers in class 3! This class has lower survival rate.', 
                       text_font_size = '12pt',
                       render_mode='css',
                       border_line_color=None, border_line_alpha=1.0,
                       background_fill_color='#f4f0ec', background_fill_alpha=1.0)

    fig.add_layout(annotation)
    fig.add_layout(labels)

    fig.axis.axis_label=None
    fig.axis.visible=False
    fig.grid.grid_line_color = None
    fig.title.text_font_size = '20pt'
    fig.title.text_font_style = 'bold'

    return (fig)

show(pie_chart())

In [None]:
def donut_chart():
    curdoc().theme = 'contrast'
    # data: titanic Pclass    
    x = {
        'Pclass 1': len(df_titanic[df_titanic['Pclass'] ==1]),
        'Pclass 2': len(df_titanic[df_titanic['Pclass'] ==2]),
        'Pclass 3': len(df_titanic[df_titanic['Pclass'] ==3]),
        }

    data = pd.Series(x).reset_index(name='value').rename(columns={'index':'pclass'})
    data['angle'] = data['value']/data['value'].sum()*2*np.pi
    data['color'] = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9'][0:len(x)]
    data["value"] = data['value'].astype(str)
    data["value"] = data["value"].str.pad(30, side = "left")
    source = ColumnDataSource(data)

    fig = figure(plot_height=550, 
               plot_width=550,
               title="Titanic: Passenger Class", 
               toolbar_location=None,
               tools="hover", 
               tooltips="@pclass: @value",
               x_range=(-0.5, .75)
              )

    fig.annular_wedge(x=0, y=1, outer_radius=0.4,inner_radius=0.2,
            start_angle=cumsum('angle', include_zero=True), 
            end_angle=cumsum('angle'),
            line_color="black", fill_color='color', legend_field='pclass', source=data)


    labels = LabelSet(x=0, y=1, text='value',
            angle=cumsum('angle', include_zero=True), source=source, render_mode='canvas')
    
    annotation = Label(x=200, y=250, x_units='screen', y_units='screen',
                       text='Pclass', 
                       text_font_size = '16pt',
                       text_color='white',
                       render_mode='css',
                       border_line_color=None, border_line_alpha=1.0,
                       background_fill_color=None, background_fill_alpha=1.0)

    fig.add_layout(annotation)
    fig.add_layout(labels)

    fig.axis.axis_label=None
    fig.axis.visible=False
    fig.grid.grid_line_color = None
    fig.title.text_font_size = '20pt'
    fig.title.text_font_style = 'bold'

    return (fig)

show(donut_chart())

<a id="1.6"></a>
<font color="skyblue" size=+2.5><b>1.6 Box plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

Function to call:
> boxwhisker = hv.**BoxWhisker**(data, dimensions, label)
>
> boxwhisker.**opts**(optional parameters)

In [None]:
# data: df_study

title = "Math test score: Effect of race and gender"
colors = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9', '#BDBDBD']


boxwhisker = hv.BoxWhisker(df_study, 
                           ['gender', 'race/ethnicity'], 
                           'math score', 
                           label=title)
boxwhisker.opts(show_legend=False,                
                width=600, 
                height= 400,
                box_fill_color=dim('race/ethnicity').str(), 
                cmap=colors,
                show_grid=True,
                )

<a id="1.7"></a>
<font color="skyblue" size=+2.5><b>1.7 Violin plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


Function to call:
> violin = hv.**Violin**(data, dimensions, label)
>
> violin.**opts**(optional parameters)

In [None]:
# data iris

title="Iris: Variation of Petal length"
colors = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9', '#BDBDBD']

# define hv element
violin1 = hv.Violin(df_iris, 
                   'Species', 
                   'PetalLengthCm', 
                    label=title)

# optional params 
violin1.opts(height=500, 
            width=750,
            violin_fill_color=dim('Species').str(), 
            cmap=colors)


<a id="2"></a>
<font color="skyblue" size=+2.5><b>2. Area Plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


In [None]:
curdoc().theme = 'dark_minimal'

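# data to plot: COVID vaccinations in the USA, the UK, and the Netherlands

usa = df_covid[df_covid['country'] == 'United States']
uk = df_covid[df_covid['country'] == 'United Kingdom']
nl = df_covid[df_covid['country'] == 'Netherlands']

# align the three series on a shared date index; missing values become 0
vacc = pd.DataFrame({
    'y1': usa.set_index('date')['daily_vaccinations_per_million'],
    'y2': uk.set_index('date')['daily_vaccinations_per_million'],
    'y3': nl.set_index('date')['daily_vaccinations_per_million'],
}).fillna(0)
vacc.index = pd.to_datetime(vacc.index)
vacc = vacc.sort_index()

source = ColumnDataSource(data=dict(
    x=vacc.index,
    y1=vacc['y1'],
    y2=vacc['y2'],
    y3=vacc['y3'],
    ))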
                          
fig = figure(title='Daily Vaccinations Per-Million: USA, UK and Netherlands',
             x_axis_label='Date', 
             y_axis_label='Daily vaccination', 
             plot_width=750, 
             plot_height=500, 
             x_axis_type="datetime")

fig.varea_stack(stackers=['y1', 'y2', 'y3'], x='x',
              color=(['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9'][0:3]), 
              alpha=0.99,
              legend_label=['USA', 'UK', 'Netherlands'],
              source=source)

fig.title.text_font_size = '16pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "12pt"
fig.yaxis.axis_label_text_font_size = "12pt"
fig.legend.location = 'top_left'
fig.legend.background_fill_color = None
show(fig)


In [None]:
curdoc().theme = 'dark_minimal'

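# data to plot: COVID vaccinations in the USA, the UK, and the Netherlands

usa = df_covid[df_covid['country'] == 'United States']
uk = df_covid[df_covid['country'] == 'United Kingdom']
nl = df_covid[df_covid['country'] == 'Netherlands']

# align the three series on a shared date index; missing values become 0
vacc = pd.DataFrame({
    'x1': usa.set_index('date')['daily_vaccinations_per_million'],
    'x2': uk.set_index('date')['daily_vaccinations_per_million'],
    'x3': nl.set_index('date')['daily_vaccinations_per_million'],
}).fillna(0)
vacc.index = pd.to_datetime(vacc.index)
vacc = vacc.sort_index()

source = ColumnDataSource(data=dict(
    y=vacc.index,
    x1=vacc['x1'],
    x2=vacc['x2'],
    x3=vacc['x3'],
    ))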
                          
fig = figure(title='Daily Vaccinations Per-Million: USA, UK and Netherlands',
             y_axis_label='Date', 
             x_axis_label='Daily vaccination', 
             plot_width=750, 
             plot_height=600, 
             y_axis_type="datetime")

fig.harea_stack(stackers=['x1', 'x2', 'x3'], y='y',
              color=(['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9'][1:]), 
              alpha=0.99,
              legend_label=['USA', 'UK', 'Netherlands'],
              source=source)

fig.title.text_font_size = '16pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "12pt"
fig.yaxis.axis_label_text_font_size = "12pt"
fig.legend.location = 'bottom_right'
fig.legend.background_fill_color = None
show(fig)


<a id="3"></a>
<font color="skyblue" size=+2.5><b>3. Heatmap</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>


Function to call: 

> heatmap = hv.HeatMap(data, label)
>
> heatmap.opts(optional parameters such as plot properties)


In [None]:
# make the dataframe 
feat= ['date_block_num', 'shop_id', 'item_cnt_day']
df = df_sale[feat]

# color palette
colors = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9', '#BDBDBD']

# hv element == HeatMap
heatmap = hv.HeatMap(df, label="Sales [items count per day] in a company").aggregate(function=np.sum)

# optional params to change the default setting
heatmap.opts(width=1000, 
             height=600,
             xrotation=0,
             xaxis='bottom', 
             xlabel='Date block', 
             ylabel='Shop Id',  
             tools=['hover'], 
             cmap=colors,
            )


<a id="4"></a>
<font color="skyblue" size=+2.5><b>4. Jitter Scatter (for categorical variable)</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

When plotting a scatter plot of a categorical variable, use the `jitter()` function to reduce the overlap between the many points that fall in a single category. `jitter()` gives each point a random offset, so that points of the same category no longer sit on a single line.

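To see what the random offset does independently of Bokeh, here is a minimal sketch in plain Python. It assumes a uniform offset of total width `width` around each category position; the integer slots and the `jitter_positions` helper are my own illustration, not Bokeh's implementation.

```python
import random

def jitter_positions(categories, width=0.5, seed=0):
    """Return a numeric position for each categorical value, offset at random.

    Each category sits at an integer slot (0, 1, 2, ...); every point is
    shifted by a uniform offset in [-width/2, +width/2].
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    slots = {name: i for i, name in enumerate(sorted(set(categories)))}
    return [slots[c] + rng.uniform(-width / 2, width / 2) for c in categories]

# Hypothetical passenger classes, made up for illustration
classes = ['pclass_1', 'pclass_3', 'pclass_3', 'pclass_2', 'pclass_3']
positions = jitter_positions(classes, width=0.5)
for c, p in zip(classes, positions):
    print(f"{c}: {p:.3f}")
```

Points of the same class now get slightly different positions, but each stays within half a `width` of its own slot, so the classes remain visually separate.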
### Scatter plot without jitter effect

In [None]:
curdoc().theme = 'caliber'

# data titanic
data = df_titanic
Pclass = ['pclass_1', 'pclass_2', 'pclass_3']
source = ColumnDataSource(data)

fig = figure(plot_width=800, 
           plot_height=400, 
           y_range=Pclass, 
           title="Titanic Passenger Age per Pclass",
           x_axis_label='Passenger Age',
           y_axis_label='Passenger Class',
           background_fill_color="#f4f0ec")

# y uses the raw category, so all points of a class fall on a single line
fig.circle(x='Age',
           y='Pclass',
           source=source,
           color='salmon',
           alpha=0.5)

fig.x_range.range_padding = 0
fig.y_range.range_padding = 0.2
fig.ygrid.grid_line_color = None

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"

show(fig)

### Scatter plot with jitter effect

In [None]:
curdoc().theme = 'caliber'
# data titanic
data = df_titanic
Pclass = ['pclass_1', 'pclass_2', 'pclass_3']
source = ColumnDataSource(data)

fig = figure(plot_width=800, 
           plot_height=400, 
           y_range=Pclass, 
           title="Titanic Passenger Age per Pclass",
           x_axis_label='Passenger Age',
           y_axis_label='Passenger Class',
           background_fill_color="#f4f0ec")

# jitter() spreads the points of each class by a random offset (width=0.5)
fig.circle(x='Age',
           y=jitter('Pclass', width=0.5, range=fig.y_range),
           source=source,
           color='salmon',
           alpha=0.5)

fig.x_range.range_padding = 0
fig.y_range.range_padding = 0.2
fig.ygrid.grid_line_color = None

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'
fig.xaxis.axis_label_text_font_size = "16pt"
fig.yaxis.axis_label_text_font_size = "16pt"

show(fig)

<a id="5"></a>
<font color="skyblue" size=+2.5><b>5. KDE plots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

Function to call: 

> dist = hv.Distribution(data, label)
>
> dist.opts(optional parameters such as plot properties)

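`hv.Distribution` draws a kernel density estimate (KDE) of the values it is given. The sketch below shows the idea in plain Python, assuming a Gaussian kernel and a hand-picked bandwidth; HoloViews chooses its own bandwidth, and the sample values and the `gaussian_kde` helper here are made up for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # One Gaussian bump per sample, summed and normalised
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

# Hypothetical sepal widths in cm, made up for illustration
widths = [3.0, 3.2, 3.4, 3.5, 3.1, 3.6, 2.9]
density = gaussian_kde(widths, bandwidth=0.2)

# Evaluate on a grid from 1.5 to 5.0; the area under the curve should be close to 1
step = 0.01
grid = [1.5 + i * step for i in range(351)]
area = sum(density(x) for x in grid) * step
print(f"density at 3.2 cm: {density(3.2):.3f}")
print(f"approximate area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is worth keeping in mind when comparing density plots.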

In [None]:
# Data to plot
Iris_setosa = df_iris[df_iris['Species'] =='Iris-setosa']
Iris_virginica = df_iris[df_iris['Species'] =='Iris-virginica']
Iris_versicolor = df_iris[df_iris['Species'] =='Iris-versicolor']

width = 400
height = 300

kde1 = (hv.Distribution(Iris_setosa.SepalWidthCm, label='Setosa')* 
        hv.Distribution(Iris_virginica.SepalWidthCm, label='Virginica')*
        hv.Distribution(Iris_versicolor.SepalWidthCm, label='Versicolor'))                
kde1.opts(width=width,
          height=height,
          title='Iris Flowers: Sepal Width Density Plot',
          xrotation=0,
          xaxis='bottom',
          xlabel='Sepal Width [cm]',
          ylabel='Density',
          bgcolor='#f4f0ec',  # HoloViews option for the plot background
          show_legend=False,
          )

kde2 = (hv.Distribution(Iris_setosa.PetalWidthCm, label='Setosa')* 
        hv.Distribution(Iris_virginica.PetalWidthCm, label='Virginica')*
        hv.Distribution(Iris_versicolor.PetalWidthCm, label='Versicolor'))                

kde2.opts(width=width,
          height=height,
          title='Iris Flowers: Petal Width Density Plot',
          xrotation=0,
          xaxis='bottom',
          xlabel='Petal Width [cm]',
          ylabel='Density',
          bgcolor='#f4f0ec',  # HoloViews option for the plot background
          )
kde3 = (hv.Distribution(Iris_setosa.SepalLengthCm, label='Setosa')* 
        hv.Distribution(Iris_virginica.SepalLengthCm, label='Virginica')*
        hv.Distribution(Iris_versicolor.SepalLengthCm, label='Versicolor'))                
kde3.opts(width=width,
          height=height,
          title='Iris Flowers: Sepal Length Density Plot',
          xrotation=0,
          xaxis='bottom',
          xlabel='Sepal Length [cm]',
          ylabel='Density',
          bgcolor='#f4f0ec',
          )

kde4 = (hv.Distribution(Iris_setosa.PetalLengthCm, label='Setosa')* 
        hv.Distribution(Iris_virginica.PetalLengthCm, label='Virginica')*
        hv.Distribution(Iris_versicolor.PetalLengthCm, label='Versicolor'))                

kde4.opts(width=width,
          height=height,
          title='Iris Flowers: Petal Length Density Plot',
          xrotation=0,
          xaxis='bottom',
          xlabel='Petal Length [cm]',
          ylabel='Density',
          tools=['hover'],
          show_legend=True,
          show_grid=True,
          )

(kde1 + kde2 + kde3 + kde4)

<a id="6"></a>
<font color="skyblue" size=+2.5><b>6. Subplots</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

> Making subplots in HoloViews is a simple two-step process: first create the elements you want to show, say `plot1`, `plot2`, ..., `plotn`, and combine them with `+`; then call the `.cols(k)` method on the result to arrange them in `k` columns. Below I show an example with four KDE plots, a violin plot, and a box plot arranged in two columns using the iris dataset.

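To picture what `.cols(k)` does to a layout built with `+`: as far as I understand, the panels are placed left to right and wrap to a new row after every `k` items. Here is a plain-Python sketch of that wrapping, using a hypothetical helper `arrange_in_columns` that is not part of HoloViews.

```python
def arrange_in_columns(plots, k):
    """Split a flat list of plots into rows of at most k items each."""
    return [plots[i:i + k] for i in range(0, len(plots), k)]

# Names stand in for the plot objects that are combined with `+`
panels = ['kde1', 'kde2', 'kde3', 'kde4', 'violin1', 'boxwhisker']
grid = arrange_in_columns(panels, 2)
for row in grid:
    print(row)
```

With six panels and `k = 2` this gives three rows of two panels each.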

In [None]:
# add a new box plot
colors = ['#F78888', '#5DA2D5', '#F3D250', '#3AAFA9', '#BDBDBD']
boxwhisker = hv.BoxWhisker(df_iris,
                           ['Species'],
                           'SepalWidthCm',
                           label='Sepal width variation')
boxwhisker.opts(show_legend=False,
                width=600,
                height=400,
                box_fill_color=dim('Species').str(),
                cmap=colors,
                show_grid=True,
                )

# make subplots of the four KDE plots + the violin plot + the new box plot
# note that I reuse plots from earlier sections:
# the four KDE plots from section 5, the violin plot from section 1.7 and the box plot created at the top of this cell

subplots = (kde1 + kde2 + kde3 + kde4 +
            violin1.opts(width=width, height=height) +
            boxwhisker.opts(width=width, height=height)).opts(width=600, height=500).cols(2)

subplots

<a id="7"></a>
<font color="skyblue" size=+2.5><b>7. Range tools (useful for time series data)</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

> * A range tool links two plots together
> * It gives a detailed view of a subset of the data while preserving an overview of the full data
> * In the examples below, which use the air-pollution data, it can highlight interesting regions of the sensor readings, such as peaks, valleys, or regions of faulty readings

In [None]:
curdoc().theme = 'dark_minimal'

dates = np.array(df_ts['date_time'], dtype=np.datetime64)
source = ColumnDataSource(data=dict(date=dates, signal=df_ts['target_carbon_monoxide']))

fig = figure(title='Carbon Monoxide',
             plot_height=250, plot_width=800, tools="xpan", toolbar_location=None,
             x_axis_type="datetime", x_axis_location="above",
             background_fill_color="#efefef", x_range=(dates[3500], dates[4500]))

fig.line('date', 'signal', color='red', source=source)
fig.yaxis.axis_label = None

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'

select = figure(title="Drag the middle and edges of the selection box to change the range above",
                plot_height=130, plot_width=800, y_range=fig.y_range,
                x_axis_type="datetime", y_axis_type=None,
                tools="", toolbar_location=None, background_fill_color="#efefef")

range_tool = RangeTool(x_range=fig.x_range)
range_tool.overlay.fill_color = "navy"
range_tool.overlay.fill_alpha = 0.2

select.line('date', 'signal', color='red', source=source)
select.ygrid.grid_line_color = None
select.add_tools(range_tool)
select.toolbar.active_multi = range_tool

show(column(fig,select))

In [None]:
curdoc().theme = 'dark_minimal'

dates = np.array(df_ts['date_time'], dtype=np.datetime64)
source = ColumnDataSource(data=dict(date=dates, signal=df_ts['target_benzene']))

fig = figure(title='Benzene',
    plot_height=250, plot_width=800, tools="xpan", toolbar_location=None,
           x_axis_type="datetime", x_axis_location="above",
           background_fill_color="#efefef", x_range=(dates[3500], dates[4500]))

fig.line('date', 'signal', color='gold',source=source)
fig.yaxis.axis_label = None

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'

select = figure(title="Drag the middle and edges of the selection box to change the range above",
                plot_height=130, plot_width=800, y_range=fig.y_range,
                x_axis_type="datetime", y_axis_type=None,
                tools="", toolbar_location=None, background_fill_color="#efefef")

range_tool = RangeTool(x_range=fig.x_range)
range_tool.overlay.fill_color = "salmon"
range_tool.overlay.fill_alpha = 0.2

select.line('date', 'signal', color='gold', source=source)
select.ygrid.grid_line_color = None
select.add_tools(range_tool)
select.toolbar.active_multi = range_tool

show(column(fig, select))

In [None]:
curdoc().theme = 'dark_minimal'
dates = np.array(df_ts['date_time'], dtype=np.datetime64)
source = ColumnDataSource(data=dict(date=dates, signal=df_ts['target_nitrogen_oxides']))

fig = figure(title='Nitrogen Oxides',
    plot_height=250, plot_width=800, tools="xpan", toolbar_location=None,
           x_axis_type="datetime", x_axis_location="above",
           background_fill_color="#efefef", x_range=(dates[5500], dates[6500]))

fig.line('date', 'signal', color='seagreen', source=source)
fig.yaxis.axis_label = None

fig.title.text_font_size = '20pt'
fig.title.text_font_style = 'bold'
fig.title.text_font = 'Serif'

select = figure(title="Drag the middle and edges of the selection box to change the range above",
                plot_height=130, plot_width=800, y_range=fig.y_range,
                x_axis_type="datetime", y_axis_type=None,
                tools="", toolbar_location=None, background_fill_color="#efefef")

range_tool = RangeTool(x_range=fig.x_range)
range_tool.overlay.fill_color = "salmon"
range_tool.overlay.fill_alpha = 0.2

select.line('date', 'signal', color='seagreen', source=source)
select.ygrid.grid_line_color = None
select.add_tools(range_tool)
select.toolbar.active_multi = range_tool

show(column(fig, select))

<a id="8"></a>
<font color="skyblue" size=+2.5><b>8. Pairplots (grouped grids)</b></font>


In [None]:
# data to use: drop the Id column without modifying df_iris in place,
# so that the cell can be re-run safely
iris = df_iris.drop(columns='Id')

# group by species and create an overlay
_df = hv.Dataset(iris).groupby('Species').overlay()

density_grid = gridmatrix(_df, diagonal_type=hv.Distribution, chart_type=hv.Bivariate)
point_grid = gridmatrix(_df, chart_type=hv.Points)

(density_grid * point_grid).opts(
    opts.Bivariate(bandwidth=0.5, cmap=hv.Cycle(values=['Blues', 'Oranges', 'Reds'])),
    opts.Points(size=2, alpha=0.5),
    opts.NdOverlay(batched=False))

<a id="9"></a>
<font color="skyblue" size=+2.5><b>9. References</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">go to top</a>

1.  https://docs.bokeh.org/en/latest/
2.  https://holoviews.org/gallery/index.html
3.  https://malouche.github.io/notebooks/index.html

>> ### I sincerely hope that you find this notebook interesting and `useful!` If you do, please <font color= 'gray'> UPVOTE! </font>

>>> ### `Critiques` and `feedback` are most welcome.

>>> ### I have made a similar notebook for the `plotly python library`. If you are interested, you can check it out [`HERE`](https://www.kaggle.com/desalegngeb/plotly-guide-customize-for-better-visualizations).

>># <font color= 'salmon'> Thank you! </font> 



