## Importing Libraries and Display Settings ##

We will need the standard Python libraries for data processing: Pandas and NumPy.

We will also need particular functions from the Bokeh data visualization library. To keep the notebook tidy, all imports for subsequent chapters are gathered in the current section. Where necessary, some additional clarifications will be added.

In [1]:
#Usual imports for data processing
import pandas as pd
import numpy as np
import random

#Bokeh libraries and modules
from bokeh.io import  show, reset_output, output_notebook, export_png
from bokeh.plotting import figure
from bokeh.models import Range1d, FactorRange, ColumnDataSource, LabelSet, HoverTool
from bokeh.layouts import gridplot, row, column
from bokeh.transform import factor_cmap
from bokeh.models.annotations import Label

#Setting visualizations' display to the in-notebook mode
output_notebook()

#Setting Bokeh's visualization toolset - a set of interactive tools attached to every plot
#Their names are pretty self-explanatory
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

## Loading the Dataset ##

In [2]:
data = pd.read_csv('../data/data_clean.csv')

## Simple Bar Charts ##

It is pretty straightforward to draw bar charts with Bokeh. As usual, we need to specify a type of chart (that is, choose a glyph) and pass the data to the plotting function.

### Vertical Bar Charts ###

Let's create a vertical bar chart showing changes in measles occurrences in the US over the years 2000-2015.

In [3]:
# Creating a list of categories
years = data[data['country']=='United States of America']['year']

#Creating the list of values
values = data[data['country']=='United States of America']['measles']

# Initializing the plot
p = figure( plot_height=300, 
           title="Measles in the USA 2000-2015",
          tools=TOOLS)

#Plotting
p.vbar(years,                            #categories
      top = values,                      #bar heights
       width = .9,
       fill_alpha = .5,
       fill_color = 'salmon',
       line_alpha = .5,
       line_color='green',
       line_dash='dashed'
      
  )

#Labeling the axes
p.xaxis.axis_label="Years"
p.yaxis.axis_label="Measles stats"

show(p)
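
The same boolean filter is repeated above for the categories and for the values. A small helper can do the filtering once. The sketch below uses a tiny made-up DataFrame and a hypothetical `country_series` function (neither is part of the original notebook), assuming the dataset has `country` and `year` columns as in the cells above.

```python
import pandas as pd

# Made-up numbers standing in for data_clean.csv
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [2001, 2000, 2000, 2001],
    'measles': [7, 3, 10, 12],
})

def country_series(df, country, column):
    """Return one column for one country, indexed and sorted by year."""
    rows = df[df['country'] == country]
    return rows.set_index('year')[column].sort_index()

series = country_series(toy, 'A', 'measles')
years = series.index.tolist()    # categories for the x-axis
values = series.tolist()         # bar heights
print(years, values)
```

The resulting lists can be passed to `p.vbar` or `p.hbar` in place of the two filtered Series.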

### Horizontal Bar Charts ###

Horizontal bar charts are created in exactly the same fashion. Let's use reported polio rates for Argentina in 2000-2015 for illustration purposes.

In [4]:
# Creating a list of categories
years = data[data['country']=='Argentina']['year']

#Creating the list of values
values = data[data['country']=='Argentina']['polio'].values

# Initializing the plot
p = figure( plot_height=300, 
           title="Polio in Argentina 2000-2015")

#Plotting
p.hbar(years,
       left = 0,
       right = values,
       height = .9,
       fill_color = 'azure',
       line_color='green',
       line_alpha=.5
      
  )

#Labeling the axes: in a horizontal bar chart the categories (years) run along the y-axis
p.xaxis.axis_label="Polio stats"
p.yaxis.axis_label="Years"


show(p)

All these operations are logical and simple, but their functionality is also limited.

## Styling Bar Charts ##

To add more style, interactivity and attractiveness to our bar charts, let's study how to use different palettes, add labels and add hover tooltips. To illustrate the corresponding techniques, let's pick one country from each of several continents.

In [5]:
countries = ['France', 'Canada', 'Brazil', 'Turkey', 'Australia']

Let's look at the corresponding statistics for these countries.

### Using Palettes ###

For a list of available palettes, please visit the [Bokeh palettes documentation](https://docs.bokeh.org/en/latest/docs/reference/palettes.html). To use any of them with Bokeh, we need to import them explicitly.

Let's look at the measles data for a number of countries in 2015. We'll render two graphs, one with a pre-set palette and one with randomly chosen colors, and arrange them with the *gridplot* technique.

In [6]:
#Importing palettes
from bokeh.palettes import Spectral5, Viridis256, Colorblind, Magma256, Turbo256

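# Creating the list of values in the same order as `countries`
# (assumes one row per country and year; a plain filter returns rows in dataset order)
values = data[data['year']==2015].set_index('country').loc[countries, 'measles'].tolist()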

# Set the x_range to the list of categories above
p1 = figure(x_range=countries,
           plot_height=250, 
           title="Measles in the world in 2015 (pre-set palette)")

# Categorical values can also be used as coordinates
p1.vbar(x=countries, top=values, width=0.9,
      color = Spectral5, fill_alpha=.75)

# Set some properties to make the plot look better
p1.yaxis.axis_label="Measles stats"
p1.xgrid.grid_line_color='gray'
p1.xgrid.grid_line_alpha=.75
p1.xgrid.grid_line_dash = 'dashed'
p1.ygrid.grid_line_color='blue'
p1.ygrid.grid_line_alpha = .55
p1.ygrid.grid_line_dash = 'dotted'


p2 = figure(x_range=countries,
           plot_height=250, 
           title="Measles in the world in 2015 (randomly selected colors from a palette)")

# Categorical values can also be used as coordinates
p2.vbar(x=countries, top=values, width=0.9,
      color = random.sample(Viridis256,5), fill_alpha=.75)

# Set some properties to make the plot look better
p2.yaxis.axis_label="Measles stats"
p2.xgrid.grid_line_color='gray'
p2.xgrid.grid_line_alpha=.75
p2.xgrid.grid_line_dash = 'dashed'
p2.ygrid.grid_line_color='blue'
p2.ygrid.grid_line_alpha = .55
p2.ygrid.grid_line_dash = 'dotted'

p = gridplot([[p1,None],[p2,None]], toolbar_location='right')
show(p)
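
A Bokeh palette is simply a sequence of hex color strings, which is why `random.sample` works on it. If randomly picked colors turn out too similar, one alternative is to take evenly spaced entries. The sketch below uses a made-up 256-step grey ramp in place of `Viridis256` so that it runs without Bokeh; `evenly_spaced` is a hypothetical helper, not a Bokeh function.

```python
import random

# Stand-in palette: 256 grey levels as hex strings (assumption, replaces Viridis256)
palette = ['#{0:02x}{0:02x}{0:02x}'.format(i) for i in range(256)]

def evenly_spaced(colors, k):
    """Pick k colors spread evenly from the first to the last entry (k >= 2)."""
    step = (len(colors) - 1) / (k - 1)
    return [colors[round(i * step)] for i in range(k)]

random.seed(0)                         # makes the random pick reproducible
random_pick = random.sample(palette, 5)
spread_pick = evenly_spaced(palette, 5)
print(spread_pick)
```

Either list can be passed as the `color` argument of `vbar`, as long as it has one entry per bar.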

## Grouped Bar Charts ##

Sometimes we need to plot a grouped bar chart. For example, we might need to group our health indicators for some countries. For that we need a special class from the bokeh.models module, FactorRange. Let's look at the data for measles, polio and hiv/aids\*1000 for our list of countries for 2014.

In [7]:
#List of used statistics
stats = ['measles','polio','hiv/aids*1000']

#Selecting the 2014 rows for our countries, in the same order as `countries`
#(assumes one row per country and year)
data_2014 = data[data['year']==2014].set_index('country').loc[countries]

#Creating a dictionary of our data
mdata = {'countries' : countries,
        'measles'   : data_2014['measles'],
        'polio'   : data_2014['polio'],
        'hiv/aids*1000'   : data_2014['hiv/aids']*1000}

# Creating tuples for individual bars [ ("France", "measles"), ("France", "polio"), ("France", "hiv/aids*1000"), ("Canada", "measles"), ... ]
x = [ (country, stat) for country in countries for stat in stats ]
counts = sum(zip(mdata['measles'], mdata['polio'], mdata['hiv/aids*1000']), ()) 

#Creating a column data source - Bokeh's own data type with the fields (Country,[stats],[values],[colors]) 
source = ColumnDataSource(data=dict(x=x, counts=counts, color=random.sample(Turbo256,15)))

#Initializing our plot
p = figure(x_range=FactorRange(*x), plot_height=350, title="Health Stats by Country")

#Plotting our vertical bar chart
p.vbar(x='x', top='counts', width=0.9  ,fill_color='color',  source=source)

#Enhancing our graph
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = .9
p.xgrid.grid_line_color = None

show(p)
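
The `sum(zip(...), ())` idiom above flattens the per-statistic columns into one tuple ordered country by country, which matches the order of the `x` factors. A minimal sketch with made-up numbers:

```python
countries = ['France', 'Canada']
stats = ['measles', 'polio']

# Made-up values, one entry per country
measles = [10, 20]
polio = [98, 91]

# One (country, stat) factor per bar
x = [(country, stat) for country in countries for stat in stats]

# zip groups the values per country; sum(..., ()) concatenates the tuples
counts = sum(zip(measles, polio), ())

for factor, count in zip(x, counts):
    print(factor, count)
```

The pairing only holds if the value columns are in the same order as `countries`.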

  


## ColumnDataSource ##

This is a very useful Bokeh data format. It is a mapping in which column names (strings) are mapped to their respective values (sequences). Even if we do not use it explicitly, Bokeh builds one for us under the hood. Let's look at an example.

In [16]:
#Importing the class
from bokeh.models import ColumnDataSource

#Creating our data as an empty dictionary
sdata = dict()

#Populating our data dictionary with key-value pairs
sdata.update({'x':[1,2,3,4]})
sdata.update({'y':[4,6,2,5]})
sdata.update({'color':['red','green','blue','orange']})

#Creating our ColumnDataSource instance
source = ColumnDataSource(sdata)

#Initializing our plot
p = figure(plot_width=400, plot_height=400)

#Plotting
p.vbar(x = 'x',                #coordinates of the bar centers
      top = 'y',               #bar heights
       width=.8,
      fill_color='color',
      fill_alpha=.6,
       source=source           #source of our data fields is a source instance of 
                               #ColumnDataSource class
  )
show(p)

Even though we do not have to explicitly create an instance of the ColumnDataSource class every time we call a plotting method in Bokeh, it's important to know that the framework always creates such a structure for us automatically under the hood.
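
Because a ColumnDataSource maps column names to sequences, its input is an ordinary dictionary whose columns are expected to have the same length. The sketch below checks that property in plain Python before the dictionary is handed to Bokeh; `column_lengths` is a hypothetical helper written for this illustration.

```python
def column_lengths(columns):
    """Map each column name to the length of its sequence."""
    return {name: len(values) for name, values in columns.items()}

sdata = {
    'x': [1, 2, 3, 4],
    'y': [4, 6, 2, 5],
    'color': ['red', 'green', 'blue', 'orange'],
}

lengths = column_lengths(sdata)
consistent = len(set(lengths.values())) == 1   # True when all columns match
print(lengths, consistent)

# Row i of the "table" is the i-th entry of every column
first_row = {name: values[0] for name, values in sdata.items()}
print(first_row)
```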

## Color Transformations ##

Quite often we are not satisfied with a pre-set or a random palette, and we need some additional color mapping. That's when we use the *factor_cmap* function imported from the bokeh.transform module. Let's look at the Canadian data for measles, polio and hiv/aids\*1000 in 2000, 2005, 2010 and 2015. 

In [23]:
#List of used statistics
stats = ['measles','polio','hiv/aids*1000']
#Years as integers for filtering (assuming the 'year' column is numeric, as in the cells above)
years = [2000, 2005, 2010, 2015]

#Selecting the Canadian rows for the chosen years
canada = data[(data['country']=="Canada") & (data['year'].isin(years))].sort_values('year')

#Creating a dictionary of our data
mdata = {'years' : years,
        'measles'   : canada['measles'],
        'polio'   : canada['polio'],
        'hiv/aids*1000'   : canada['hiv/aids']*1000}

# Creating tuples for individual bars; categorical factors must be strings
x = [ (str(year), stat) for year in years for stat in stats ]
counts = sum(zip(mdata['measles'], mdata['polio'], mdata['hiv/aids*1000']), ()) 

#Creating a column data source  
source = ColumnDataSource(data=dict(x=x, counts=counts, color=random.sample(Turbo256,12)))

#Initializing our plot with random colors
p1 = figure(x_range=FactorRange(*x), plot_height=350, title="Health Stats in Canada 2000-2015")

#Plotting our vertical bar chart
p1.vbar(x='x', top='counts', width=0.9  ,fill_color='color',  source=source)

#Enhancing our graph
p1.y_range.start = 0
p1.x_range.range_padding = 0.1
p1.xaxis.major_label_orientation = .9
p1.xgrid.grid_line_color = None

#Creating a new column data source without set colors   
source1 = ColumnDataSource(data=dict(x=x, counts=counts))

#Initializing our plot with synchronized fill colors with factor_cmap
p2 = figure(x_range=FactorRange(*x), plot_height=350,
            title="Health Stats in Canada 2000-2015, color mapped"
           )

p2.vbar(x='x', top='counts', width=0.9,
        
            source=source1,
           fill_color=factor_cmap('x', palette=['salmon', 'green', 'navy'], factors=stats, start=1, end=2)

)

p2.xaxis.major_label_orientation = .7
p=gridplot([[p1,None],[p2,None]], toolbar_location='right')

show(p)
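
In the call above, each bar's factor is a `(year, stat)` tuple, and `start=1, end=2` tells the mapper to look only at the second level (the statistic) when choosing a color. The sketch below imitates that lookup in plain Python to show the idea; `color_for` is a hypothetical function, not Bokeh's implementation.

```python
stats = ['measles', 'polio', 'hiv/aids*1000']
palette = ['salmon', 'green', 'navy']

def color_for(factor, factors, colors, start=0, end=None):
    """Slice a multi-level factor and look the remaining level up in `factors`."""
    key = factor[start:end]
    # A one-element slice is compared against the plain string factors
    if len(key) == 1:
        key = key[0]
    return colors[factors.index(key)]

bars = [('2000', 'measles'), ('2000', 'polio'), ('2005', 'hiv/aids*1000')]
bar_colors = [color_for(bar, stats, palette, start=1, end=2) for bar in bars]
print(bar_colors)
```

Bars for the same statistic therefore share a color across all years.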



## Adding Labels ##

Plotting a single label in Bokeh is quite straightforward and doesn't require any specific technique. We just need to import the *Label* class from the bokeh.models.annotations module; its syntax is quite simple. 
One just needs to know that Bokeh uses a separate layer for plotting, another one for labeling, etc. We will use the *add_layout()* method to assemble our graph. Let's look at an example and create a graph of measles in Spain in 2000-2015. 

In [24]:
#Initializing our plot
p = figure(x_range=(2000,2015), title='Measles in Spain 2000-2015')

#Plotting a line
p.line(data[data['country']=='Spain']['year'],
      data[data['country']=='Spain']['measles'],
       line_color='navy',
      line_width=3)

#Plotting data points as circles
p.circle(data[data['country']=='Spain']['year'],
      data[data['country']=='Spain']['measles'],
        radius=.2,
        fill_color='yellow',
        line_color='salmon')

#Instance of Label class as our 2011 Measles Outbreak label
label = Label(x=2011, 
              y=max(data[data['country']=='Spain']['measles']),
              x_offset=10, 
              text="2011 Outbreak",
              text_baseline="top")

#Adding our label to the graph as a layout element
p.add_layout(label)

#Styling the graph
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Measles stats'
p.xgrid.grid_line_dash = 'dashed'
p.xgrid.grid_line_color ='gray'
p.ygrid.grid_line_dash ='dotted'
p.ygrid.grid_line_color = 'gray'
p.background_fill_color='green'
p.background_fill_alpha=.05

show(p)

Adding a single "custom" label is really quite simple. The beauty of Bokeh is that adding a whole set of labels is hardly any more difficult. Let's look at the example of polio in India in 2000-2015 and try adding values to every datapoint. We simply need an instance of *ColumnDataSource* and the *LabelSet* class imported from the bokeh.models module.

In [25]:
#Instance of ColumnDataSource
source = ColumnDataSource(data=dict(
    x=data[data['country']=='India']['year'],
    y=data[data['country']=='India']['polio'],
    labels=data[data['country']=='India']['polio'].values))

#Initializing our plot
p = figure(x_range=(1999,2016),
           y_range=(50,90),
           title='Polio in India 2000-2015')


#Plotting data points as vertical bars
p.vbar(x = 'x',
         top = 'y',
       width = .8,
        fill_color='azure', fill_alpha = 1,
        line_color='navy', line_alpha=.25,
         line_width=2, line_dash='dotted',
        source=source)

#Plotting a line
p.line(x = 'x',
       y = 'y',
       line_color='red',line_width=4,
       line_alpha=.5,
      source=source)

#Plotting data points as circles
p.circle(x='x',y='y', 
         radius=.2, 
         fill_color='yellow', line_color='red', line_width=2,
         source=source)

#Instance of the LabelSet class
labels = LabelSet(x='x',                   #positions of labeled datapoints
                  y='y', 
                  text='labels',          #labels' text
                  level='glyph',          #labeling level
                 x_offset=-10, y_offset=15, #move from datapoints
                  source=source, 
                  render_mode='canvas',
                 text_baseline='bottom'   #relative position to datapoints
                 )

p.add_layout(labels)

p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Polio stats'
p.xgrid.grid_line_dash = 'dashed'
p.xgrid.grid_line_color ='gray'
p.ygrid.grid_line_dash ='dotted'
p.ygrid.grid_line_color = 'gray'
p.background_fill_color='salmon'
p.background_fill_alpha=.05

show(p)
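
The `text` field of a LabelSet points at a column of the data source, so the label text can be prepared as strings before the source is built. A small sketch with made-up values; the formatting choice is an assumption for illustration:

```python
# Made-up data points
years = [2000, 2001, 2002]
polio = [57.0, 58.0, 70.5]

# Format each value compactly: whole numbers lose the trailing ".0"
labels = ['{:g}'.format(value) for value in polio]

# Column-oriented dictionary that could be passed to ColumnDataSource(data=...)
columns = dict(x=years, y=polio, labels=labels)
print(columns['labels'])
```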

## Interactive techniques ##

### Linking ###

Bokeh provides an easy technique to link two or more visualizations. One of the best examples is linking the ranges of two graphs. Let's try plotting a comparison of measles rates for Poland and Ukraine, for example. We will use linked panning and a column layout (a particular case of gridplot) for that. 

In [12]:
#Setting the common range
x = data[data['country']=='Poland']['year']

#Setting the values for both graphs

y1, y2 = data[data['country']=='Poland']['measles'], data[data['country']=='Ukraine']['measles']

#Setting the option for the future plot
plot_options = dict(width=500, plot_height=250, 
                    tools=TOOLS)

#Creating a new plot
p1 = figure(**plot_options,
           title = 'Measles in Poland 2000-2015')
p1.line(x, y1, color="navy")
p1.circle(x, y1, size=10, fill_color="azure")

#Creating another plot with the ranges linked to the previous plot's ranges
p2 = figure(x_range=p1.x_range,       #linking the x-ranges
            **plot_options,
           title = "Measles in Ukraine 2000-2015")
p2.line(x,y2,color='red')
p2.circle(x, y2, size=10, fill_color="salmon")

#Joining both plots in a column
p = column([p1, p2])

# show the results
show(p)
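
Linking works because both figures hold a reference to the very same range object, so any tool that changes it on one plot changes it for the other. A minimal sketch with made-up data, assuming Bokeh is installed:

```python
from bokeh.plotting import figure

x = [2000, 2001, 2002, 2003]

p1 = figure(title='First plot')
p1.line(x, [1, 4, 2, 3])

# Passing p1's range object shares it instead of copying it
p2 = figure(x_range=p1.x_range, title='Second plot')
p2.line(x, [10, 30, 20, 40])

# The x-ranges are one shared object; the y-ranges remain separate
shared_x = p2.x_range is p1.x_range
shared_y = p2.y_range is p1.y_range
print(shared_x, shared_y)
```

Passing `y_range=p1.y_range` as well would link the vertical axes too.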

Now, thanks to the linked x-ranges, panning or zooming one graph moves the other as well, so we can inspect the same period on both graphs using the tools.
We could also use another linking technique to highlight certain data ranges on both graphs simultaneously. Let's illustrate this with a look at life expectancies in France and Germany in 2000-2015.

In [13]:
#Adding new interactive tools 
TOOLS += ", box_select,lasso_select"

In [14]:
x = data[data['country']=='France']['year']
y1, y2 = data[data['country']=='France']['life expectancy'], data[data['country']=='Germany']['life expectancy']

#Creating a ColumnDataSource's instance for the plots to share
source = ColumnDataSource(data=dict(x=x, y1=y1, y2=y2))

#Creating a new plot
p1 = figure(tools=TOOLS,
            width=350, height=300,
           title = 'Life Expectancy in France 2000-2015')
p1.line('x', 'y1', source=source,color='navy')
p1.circle('x', 'y1', source=source,
         color='navy', radius = .25, fill_color='azure')


#Creating another plot
p2 = figure(tools=TOOLS,
            width=350, height=300,
           
           title = 'Life Expectancy in Germany 2000-2015')
p2.line('x', 'y2', source=source,color='red')
p2.circle('x', 'y2', source=source,
         color='red', radius = .25, fill_color='yellow')

p1.xaxis.axis_label = 'Year'
p1.yaxis.axis_label = 'Life Expectancy'
p1.xgrid.grid_line_dash = 'dotted'
p1.xgrid.grid_line_color ='gray'
p1.ygrid.grid_line_dash ='dotted'
p1.ygrid.grid_line_color = 'gray'
p1.background_fill_color='salmon'
p1.background_fill_alpha=.05

p2.xaxis.axis_label = 'Year'
p2.yaxis.axis_label = 'Life Expectancy'
p2.xgrid.grid_line_dash = 'dotted'
p2.xgrid.grid_line_color ='gray'
p2.ygrid.grid_line_dash ='dotted'
p2.ygrid.grid_line_color = 'gray'
p2.background_fill_color='indigo'
p2.background_fill_alpha=.07

#Rendering the graphs in a row structure
p = row(p1,p2)

show(p)
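
The selection is linked because both plots draw from one ColumnDataSource, and the selected row indices are stored on the source itself rather than on either plot. A minimal sketch with made-up data, assuming Bokeh is installed; the selection is set in code here, whereas in the notebook the box or lasso tool sets it:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up values shared by both plots
source = ColumnDataSource(data=dict(x=[2000, 2001, 2002],
                                    y1=[80.1, 80.3, 80.6],
                                    y2=[78.0, 78.4, 78.9]))

p1 = figure(tools='box_select,reset')
r1 = p1.scatter('x', 'y1', source=source, size=10)

p2 = figure(tools='box_select,reset')
r2 = p2.scatter('x', 'y2', source=source, size=10)

# Both renderers point at the same source object
same_source = r1.data_source is r2.data_source
print(same_source)

# Selecting rows 0 and 2 on the source affects every glyph drawn from it
source.selected.indices = [0, 2]
print(source.selected.indices)
```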

### Hovering ###

Bokeh gives us another cool technique to add interactivity to our visualizations. It's called hovering, and it works exactly like the other techniques: through a special *'hover'* tool that exists as an instance of the *HoverTool* class.
Let's look at the Russian data for measles in 2000-2015.

In [15]:
#Creating a data structure
source = ColumnDataSource(
        data=dict(x=data[data['country']=='Russian Federation']['year'],
                  y=data[data['country']=='Russian Federation']['measles']
                 )
                 )

#Creating an instance of the HoverTool class with required parameters
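hover = HoverTool(
        tooltips=[
            ("year", "@x"),        #@name shows the value from that data source column
            ("measles", "@y"),     #($x and $y would show the cursor position instead)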
           
        ]
    )

#Plotting
p = figure(plot_width=500, plot_height=500,
           tools=[hover],                   #Adding the hover tool to the toolbar
           title="Measles in Russia 2000-2015")

p.line('x', 'y', source=source,
       color='blue', line_dash='dashed', line_width=2)
p.circle('x', 'y', source=source,
         color='magenta', radius = .35, fill_color='salmon', fill_alpha=.7)

p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Measles stats'
p.xgrid.grid_line_dash = 'dotted'
p.xgrid.grid_line_color ='gray'
p.ygrid.grid_line_dash ='dotted'
p.ygrid.grid_line_color = 'gray'
p.background_fill_color='lightblue'
p.background_fill_alpha=.25

show(p)
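
In tooltip strings, a field starting with `@` is read from the data source column of that name, while `$x` and `$y` report the cursor position in data coordinates. A minimal sketch with made-up data, assuming Bokeh is installed; the `{0,0}` number format is an optional addition:

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up values
source = ColumnDataSource(data=dict(x=[2000, 2001, 2002],
                                    y=[4800, 2100, 580]))

hover = HoverTool(tooltips=[
    ("year", "@x"),             # value from column 'x' of the hovered point
    ("measles", "@y{0,0}"),     # column 'y' with a thousands separator
    ("cursor", "($x, $y)"),     # mouse position in data space
])

p = figure(tools=[hover], title="Hover demo")
p.scatter('x', 'y', source=source, size=12)

print(len(hover.tooltips))
```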

These are just a few of Bokeh's available features, and there's a lot more to it. This was the second part of my mini-project on this wonderful visualization library. The following part is on geographical plots and is available at (). 