## Importing Libraries and Display Settings ##

We will need two standard Python libraries for data processing: pandas and NumPy.

We will also need particular functions from the Bokeh data visualization library. To keep the notebook tidy, all imports for the subsequent chapters are collected in this section. Additional clarifications are added where necessary.

In [51]:
#Usual imports for data processing
import pandas as pd
import numpy as np
import random

#Bokeh libraries and modules
from bokeh.io import  show, reset_output, output_notebook, export_png
from bokeh.plotting import figure
from bokeh.models import Range1d, FactorRange, ColumnDataSource
from bokeh.layouts import gridplot

#Setting visualizations' display to the in-notebook mode
output_notebook()

#Setting Bokeh's visualization toolset - a set of interactive tools attached to every plot
#Their names are pretty self-explanatory
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

## Loading the Dataset ##

In [2]:
data = pd.read_csv('../data/data_clean.csv')

## Simple Bar Charts ##

It is pretty straightforward to draw bar charts with Bokeh. As usual, we need to specify the type of chart (that is, choose a glyph) and pass the data to the plotting function.

### Vertical Bar Charts ###

Let's create a vertical bar chart showing changes in measles occurrences in the US over the years 2000-2015.

In [3]:
# Creating a list of categories
years = data[data['country']=='United States of America']['year']

#Creating the list of values
values = data[data['country']=='United States of America']['measles']

# Initializing the plot
p = figure( plot_height=300, 
           title="Measles in the USA 2000-2015",
          tools=TOOLS)

#Plotting
p.vbar(years,                            #categories
      top = values,                      #bar heights
       width = .9,
       fill_alpha = .5,
       fill_color = 'salmon',
       line_alpha = .5,
       line_color='green',
       line_dash='dashed'
      
  )

p.xaxis.axis_label="Years"
p.yaxis.axis_label="Measles stats"

show(p)
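
The cell above filters the DataFrame twice with the same condition. A minimal sketch of building the mask once and reusing it is shown below. The `toy` frame is hypothetical and stands in for `data_clean.csv`; only the column names `country`, `year` and `measles` are taken from the cells above, and the numbers are invented.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; values are invented for illustration
toy = pd.DataFrame({
    'country': ['United States of America', 'Canada',
                'United States of America', 'Canada'],
    'year':    [2000, 2000, 2001, 2001],
    'measles': [10, 20, 30, 40],
})

# Build the boolean mask once and reuse it for every column we need
is_usa = toy['country'] == 'United States of America'
usa = toy.loc[is_usa, ['year', 'measles']]

# Plain Python lists, ready to be passed to a plotting call
years = usa['year'].tolist()
values = usa['measles'].tolist()
```

Selecting both columns from the same filtered frame also guarantees that the categories and the bar heights stay aligned row by row.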

### Horizontal Bar Charts ###

We can create horizontal bar charts in exactly the same fashion. Let's use reported polio rates for Argentina in 2000-2015 for illustration purposes.

In [4]:
# Creating a list of categories
years = data[data['country']=='Argentina']['year']

#Creating the list of values
values = data[data['country']=='Argentina']['polio'].values

# Initializing the plot
p = figure( plot_height=300, 
           title="Polio in Argentina 2000-2015")

#Plotting
p.hbar(years,
       left = 0,
       right = values,
       height = .9,
       fill_color = 'azure',
       line_color='green',
       line_alpha=.5
      
  )

#In a horizontal bar chart the categories (years) run along the y-axis
p.xaxis.axis_label="Polio stats"
p.yaxis.axis_label="Years"


show(p)

All these operations look logical and simple, but their functionality is limited.

## Styling Bar Charts ##

To add more style, interactivity and attractiveness to our bar charts, let's study how to use different palettes, add labels and add hover tools. To illustrate the corresponding techniques, let's pick one country from each of several continents.

In [5]:
countries = ['France', 'Canada', 'Brazil', 'Turkey', 'Australia']

Let's look at the corresponding statistics for these countries.

### Using Palettes ###

For a list of available palettes, please visit the [Bokeh palettes documentation](https://docs.bokeh.org/en/latest/docs/reference/palettes.html). To use any of them with Bokeh, we need to import them explicitly.

In [94]:
#Importing palettes
from bokeh.palettes import Spectral5, Viridis256, Colorblind, Magma256, Turbo256

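# Creating the values, ordered to match the countries list
# (a plain boolean filter would keep the dataset's row order instead)
values = (data[data['year']==2015]
          .set_index('country')
          .loc[countries, 'measles']
          .values)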

# Set the x_range to the list of categories above
p1 = figure(x_range=countries,
           plot_height=250, 
           title="Measles in the world in 2015 (pre-set palette)")

# Categorical values can also be used as coordinates
p1.vbar(x=countries, top=values, width=0.9,
      color = Spectral5, fill_alpha=.75)

# Set some properties to make the plot look better
p1.yaxis.axis_label="Measles stats"
p1.xgrid.grid_line_color='gray'
p1.xgrid.grid_line_alpha=.75
p1.xgrid.grid_line_dash = 'dashed'
p1.ygrid.grid_line_color='blue'
p1.ygrid.grid_line_alpha = .55
p1.ygrid.grid_line_dash = 'dotted'


p2 = figure(x_range=countries,
           plot_height=250, 
           title="Measles in the world in 2015 (randomly selected colors from a palette)")

# Categorical values can also be used as coordinates
p2.vbar(x=countries, top=values, width=0.9,
      color = random.sample(Viridis256,5), fill_alpha=.75)

# Set some properties to make the plot look better
p2.yaxis.axis_label="Measles stats"
p2.xgrid.grid_line_color='gray'
p2.xgrid.grid_line_alpha=.75
p2.xgrid.grid_line_dash = 'dashed'
p2.ygrid.grid_line_color='blue'
p2.ygrid.grid_line_alpha = .55
p2.ygrid.grid_line_dash = 'dotted'

p = gridplot([[p1,None],[p2,None]], toolbar_location='right')
show(p)
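
Because `random.sample` picks different colors on every run, the second chart changes each time the cell is executed. A minimal sketch of two alternatives follows: a seeded sample and evenly spaced picks. It uses a made-up stand-in list rather than a real Bokeh palette; `fake_palette` and `evenly_spaced` are hypothetical names and are not part of Bokeh.

```python
import random

# Hypothetical stand-in for a 256-color palette such as Viridis256:
# 256 grey levels written as hex strings
fake_palette = ['#{0:02x}{0:02x}{0:02x}'.format(i) for i in range(256)]

# Option 1: a dedicated, seeded generator returns the same colors on every run
rng = random.Random(42)
seeded_colors = rng.sample(fake_palette, 5)

# Option 2: evenly spaced picks across the palette (no randomness at all)
def evenly_spaced(palette, n):
    """Return n colors spread evenly from the first to the last palette entry."""
    if n == 1:
        return [palette[0]]
    step = (len(palette) - 1) / (n - 1)
    return [palette[round(i * step)] for i in range(n)]

spaced_colors = evenly_spaced(fake_palette, 5)
```

Either list could then be passed as the `color` argument of `vbar` in place of `random.sample(Viridis256,5)`.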

## Grouped Bar Charts ##

Sometimes we need to plot a grouped bar chart, for example to group several health indicators for each country. For that we use the FactorRange class from the bokeh.models module, which we imported at the beginning. Let's look at the data for measles, polio and hiv/aids\*1000 for our list of countries for 2014.

In [95]:
#List of used statistics
stats = ['measles','polio','hiv/aids*1000']

#Selecting the 2014 rows once, ordered to match the countries list
stats_2014 = data[data['year']==2014].set_index('country').loc[countries]

#Creating a dictionary of our data
mdata = {'countries' : countries,
        'measles'   : stats_2014['measles'],
        'polio'   : stats_2014['polio'],
        'hiv/aids*1000'   : stats_2014['hiv/aids']*1000}

# Creating tuples for individual bars [ ("France", "measles"), ("France", "polio"), ("France", "hiv/aids*1000"), ("Canada", "measles"), ... ]
x = [ (country, stat) for country in countries for stat in stats ]
counts = sum(zip(mdata['measles'], mdata['polio'], mdata['hiv/aids*1000']), ()) 

#Creating a column data source - Bokeh's own data type, here with the fields x ((country, stat) tuples), counts (values) and color
source = ColumnDataSource(data=dict(x=x, counts=counts, color=random.sample(Turbo256,15)))

#Initializing our plot
p = figure(x_range=FactorRange(*x), plot_height=350, title="Health Stats by Country")

#Plotting our vertical bar chart
p.vbar(x='x', top='counts', width=0.9  ,fill_color='color',  source=source)

#Enhancing our graph
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = .9
p.xgrid.grid_line_color = None

show(p)
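
The `sum(zip(...), ())` idiom in the cell above interleaves the series so that the values line up with the `(country, stat)` tuples in `x`. A small pure-Python sketch shows the resulting order. The numbers below are invented for illustration and are not taken from the dataset.

```python
countries = ['France', 'Canada']
stats = ['measles', 'polio']

# Invented values, one per country, in the same order as `countries`
measles = [1, 2]
polio = [10, 20]

# One (country, stat) tuple per bar, grouped by country
x = [(country, stat) for country in countries for stat in stats]

# zip pairs up the values per country;
# sum(..., ()) concatenates those pairs into one flat tuple
counts = sum(zip(measles, polio), ())

# Each bar label now sits next to its value
pairs = list(zip(x, counts))
```

The alignment only holds if every series is ordered the same way as `countries`, which is why the order of the filtered rows matters.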

  


### ColumnDataSource ###

This is a very useful Bokeh data structure. It maps column names (strings) to their respective values (sequences). Even if we do not create it ourselves, this is the job Bokeh does for us under the hood. Let's look at an example.

In [103]:
#Importing the class
from bokeh.models import ColumnDataSource

#Creating our data as an empty dictionary
#(named cds_data so that it does not overwrite the `data` DataFrame loaded earlier)
cds_data = dict()

#Populating our data dictionary with key-value pairs
cds_data.update({'x':[1,2,3,4]})
cds_data.update({'y':[4,6,2,5]})
cds_data.update({'color':['red','green','blue','orange']})

#Creating our ColumnDataSource instance
source = ColumnDataSource(cds_data)

#Initializing our plot
p = figure(plot_width=400, plot_height=400)

#Plotting
p.vbar(x = 'x',                #coordinates of the bar centers
      top = 'y',               #bar heights
       width=.8,
      fill_color='color',
      fill_alpha=.6,
       source=source           #source of our data fields is a source instance of 
                               #ColumnDataSource class
  )
show(p)

Even though we do not have to explicitly create an instance of the ColumnDataSource class each time we call a plotting method in Bokeh, it's important to know that the framework creates such a structure for us automatically.
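
Since a ColumnDataSource is built from a mapping of column names to sequences, a pandas DataFrame converts naturally into that shape. A minimal sketch using only pandas is given below. The frame is made up and reuses the columns of the example above; passing the resulting dictionary to `ColumnDataSource` would follow the same pattern as in that cell.

```python
import pandas as pd

# Made-up frame with the same columns as the example above
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [4, 6, 2, 5],
    'color': ['red', 'green', 'blue', 'orange'],
})

# Column name -> list of values, the mapping shape described above
columns = df.to_dict(orient='list')

# All columns should have the same length so that each row describes one bar
lengths = {name: len(values) for name, values in columns.items()}
```

Checking `lengths` before plotting is a cheap way to catch a column that is shorter or longer than the others.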