# Exploring with Bokeh charts

In [21]:
import pandas as pd

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.io import output_notebook

Use the `WHO.csv` dataset in the `data` folder to build an interactive chart with Bokeh.

In [22]:
who_df = pd.read_csv('data/WHO.csv')

# examine dataset
who_df.head()

| | Country | Region | Population | Under15 | Over60 | FertilityRate | LifeExpectancy | ChildMortality | CellularSubscribers | LiteracyRate | GNI | PrimarySchoolEnrollmentMale | PrimarySchoolEnrollmentFemale |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Eastern Mediterranean | 29825 | 47.42 | 3.82 | 5.4 | 60 | 98.5 | 54.26 | NaN | 1140.0 | NaN | NaN |
| 1 | Albania | Europe | 3162 | 21.33 | 14.93 | 1.75 | 74 | 16.7 | 96.39 | NaN | 8820.0 | NaN | NaN |
| 2 | Algeria | Africa | 38482 | 27.42 | 7.17 | 2.83 | 73 | 20.0 | 98.99 | NaN | 8310.0 | 98.2 | 96.4 |
| 3 | Andorra | Europe | 78 | 15.2 | 22.86 | NaN | 82 | 3.2 | 75.49 | NaN | NaN | 78.4 | 79.4 |
| 4 | Angola | Africa | 20821 | 47.58 | 3.84 | 6.1 | 51 | 163.5 | 48.38 | 70.1 | 5230.0 | 93.1 | 78.2 |


In [23]:
# how many rows, columns?
who_df.shape

(194, 13)

In [24]:
# how many countries in each region?
who_df.Region.value_counts()

Europe                   53
Africa                   46
Americas                 35
Western Pacific          27
Eastern Mediterranean    22
South-East Asia          11
Name: Region, dtype: int64

In [25]:
# get column names for easy copy/pasting
who_df.columns

Index(['Country', 'Region', 'Population', 'Under15', 'Over60', 'FertilityRate',
       'LifeExpectancy', 'ChildMortality', 'CellularSubscribers',
       'LiteracyRate', 'GNI', 'PrimarySchoolEnrollmentMale',
       'PrimarySchoolEnrollmentFemale'],
      dtype='object')

I'd like to make a scatter plot of child mortality against life expectancy, so let's investigate those two columns.

In [26]:
# examine ChildMortality
who_df.ChildMortality.describe()

count    194.000000
mean      36.148969
std       37.992935
min        2.200000
25%        8.425000
50%       18.600000
75%       55.975000
max      181.600000
Name: ChildMortality, dtype: float64

In [27]:
# examine LifeExpectancy
who_df.LifeExpectancy.describe()

count    194.000000
mean      70.010309
std        9.259075
min       47.000000
25%       64.000000
50%       72.500000
75%       76.000000
max       83.000000
Name: LifeExpectancy, dtype: float64

Both columns have a count of 194, which matches the number of rows, and `describe()` counts only non-missing values. So every country has a value for child mortality and for life expectancy, and no cleaning or deleting of rows will be required.
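A more direct way to confirm this is to count missing values per column. The sketch below rebuilds a few columns of the five rows printed by `head()` above as a stand-in for the full file (an assumption, so it runs without `data/WHO.csv`); on the real data the same calls would be made on `who_df`.

```python
import numpy as np
import pandas as pd

# Stand-in for who_df: a few columns from the five rows shown by head() above.
# NaN marks the blank cells (Andorra has no FertilityRate).
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "FertilityRate": [5.4, 1.75, 2.83, np.nan, 6.1],
    "LifeExpectancy": [60, 74, 73, 82, 51],
    "ChildMortality": [98.5, 16.7, 20.0, 3.2, 163.5],
})

# Number of missing values in each column
missing = sample.isna().sum()
print(missing)

# True for a column only if every row has a value
complete = sample[["ChildMortality", "LifeExpectancy"]].notna().all()
print(complete)
```

On the full data set, `who_df[["ChildMortality", "LifeExpectancy"]].isna().sum()` should report 0 for both columns, consistent with the count of 194 above, while a column such as `FertilityRate` can have gaps.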

Let's start with a Bokeh chart. 

See the URLs in the comments in the next cell for the documentation pages I used to work out the tools, tooltips, figure settings, and scatter styling.

In [28]:
# make a chart where x-axis is ChildMortality and y-axis is LifeExpectancy 

# for tools, see https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html 
TOOLS = "zoom_in,zoom_out,hover,pan,crosshair,reset"

# define data source as the data frame
source = ColumnDataSource(who_df)

# for tooltips, see same page - the @ things are column names from the defined source 
TOOLTIPS = [
    ("country", "@Country"),
    ("mortality", "@ChildMortality"),
    ("life exp", "@LifeExpectancy")
]

# for figure, see https://bokeh.pydata.org/en/latest/docs/reference/plotting.html 
# (use width=...; the older plot_width=... argument was removed in Bokeh 3.0)
p = figure(tools=TOOLS,
           tooltips=TOOLTIPS,
           x_axis_label="Child Mortality per 1,000 Live Births",
           y_axis_label="Life Expectancy in Years",
           title="WHO Data: Child Mortality and Life Expectancy",
           width=900)

# for scatter, see https://bokeh.pydata.org/en/latest/docs/gallery/color_scatter.html 
p.scatter('ChildMortality', 'LifeExpectancy', 
          source=source,
          line_color="#6666ee",
          fill_color="#ee6666", 
          fill_alpha=0.6,
          size=16)

# if you delete or comment out the next line, the chart opens in a new browser tab instead
output_notebook()

# show the chart defined above as "p" 
show(p)
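Every `@` name in `TOOLTIPS` has to match a column in the source exactly; a misspelled field does not raise an error, the hover box typically just shows `???` for it. The helper below is a hypothetical sanity check (not part of Bokeh) that lists any `@` fields missing from a list of column names, using the tooltips and column names from this notebook.

```python
import re


# Hypothetical helper (not a Bokeh API): return the @field names used in a
# list of (label, spec) tooltip pairs that are not among `columns`.
def missing_tooltip_fields(tooltips, columns):
    known = set(columns)
    missing = []
    for _label, spec in tooltips:
        # Matches both @name and the braced form @{name with spaces}
        for braced, plain in re.findall(r"@\{([^}]*)\}|@(\w+)", spec):
            name = braced or plain
            if name not in known:
                missing.append(name)
    return missing


# Same tooltips and column names as in the notebook above
TOOLTIPS = [
    ("country", "@Country"),
    ("mortality", "@ChildMortality"),
    ("life exp", "@LifeExpectancy"),
]
columns = ["Country", "Region", "Population", "Under15", "Over60",
           "FertilityRate", "LifeExpectancy", "ChildMortality",
           "CellularSubscribers", "LiteracyRate", "GNI",
           "PrimarySchoolEnrollmentMale", "PrimarySchoolEnrollmentFemale"]

print(missing_tooltip_fields(TOOLTIPS, columns))  # → []
print(missing_tooltip_fields([("mortality", "@ChildMortalty")], columns))  # → ['ChildMortalty']
```

In the notebook, passing `list(source.data)` instead of a hand-typed list would also cover the extra `index` column that `ColumnDataSource` creates from the DataFrame's unnamed index.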


In [29]:
# added code for exporting the chart as stand-alone HTML + JS
# https://bokeh.pydata.org/en/latest/docs/user_guide/embed.html 
from bokeh.resources import CDN
from bokeh.embed import file_html

# create a complete HTML file (p is the variable from above)
html = file_html(p, CDN, "bokeh_WHO_data")

# regular python to write the file; "with" closes it even if write() fails
with open('bokeh_WHO_data.html', 'w', encoding='utf-8') as newfile:
    newfile.write(html)


I would like to color the dots by region, but that will wait for another day. 
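As a starting point for that, one approach (a sketch, not tested against the chart above) is to add a color column to the DataFrame before building the `ColumnDataSource`. The palette below is a hypothetical choice with one hex color for each of the six regions reported by `value_counts()`, and the five rows from `head()` stand in for `who_df`.

```python
import pandas as pd

# Stand-in for who_df: the Country and Region values from head() above
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
})

# Hypothetical palette: one hex color per WHO region in the data set
REGION_COLORS = {
    "Africa": "#1f77b4",
    "Americas": "#ff7f0e",
    "Eastern Mediterranean": "#2ca02c",
    "Europe": "#d62728",
    "South-East Asia": "#9467bd",
    "Western Pacific": "#8c564b",
}

# New column with each row's color; a region not in the dict would become NaN
sample["RegionColor"] = sample["Region"].map(REGION_COLORS)
print(sample)
```

After adding the same column to `who_df` and recreating `source`, passing `fill_color="RegionColor"` to `p.scatter` should make Bokeh read each dot's color from that column. Bokeh's `factor_cmap` (in `bokeh.transform`) is an alternative that maps the `Region` factors to a palette without an extra column.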