### Author: Ran Meng

This Jupyter notebook contains my work toward the certificate for "Interactive Data Visualization with Bokeh", taught by Team Anaconda on [DataCamp](https://learn.datacamp.com/courses/interactive-data-visualization-with-bokeh).

In [1]:
import pandas as pd
import numpy as np
import yaml
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_notebook, show
from bokeh.models import HoverTool, CategoricalColorMapper, Slider, Select, Button, \
CheckboxGroup, RadioGroup, Toggle
from bokeh.layouts import row, column, gridplot, widgetbox
from bokeh.models.widgets import Panel, Tabs
from bokeh.io import curdoc
from bokeh.themes import Theme
from bokeh.palettes import Spectral6

#### A simple scatter plot

In this example, you're going to make a scatter plot of female literacy vs fertility using data from the European Environmental Agency. This dataset highlights that countries with low female literacy have high birthrates. The x-axis data has been loaded for you as fertility and the y-axis data has been loaded as female_literacy.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

After you have created the figure, in this exercise and the ones to follow, play around with it! Explore the different options available to you on the tab to the right, such as "Pan", "Box Zoom", and "Wheel Zoom". You can click on the question mark sign for more details on any of these tools.

In [2]:
literacy_birth_rate = pd.read_csv('literacy_birth_rate.csv')

In [3]:
literacy_birth_rate.head()

Unnamed: 0,Country,Continent,female literacy,fertility,population
0,Chine,ASI,90.5,1.769,1324655000.0
1,Inde,ASI,50.8,2.682,1139965000.0
2,USA,NAM,99.0,2.077,304060000.0
3,Indonésie,ASI,88.8,2.132,227345100.0
4,Brésil,LAT,90.2,1.827,191971500.0


In [5]:
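# Pull the two plotted columns out of the DataFrame loaded above
fertility = literacy_birth_rate['fertility']
female_literacy = literacy_birth_rate['female literacy']

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility, female_literacy)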

# Call the output_notebook() function
output_notebook()
# Display the plot
show(p)

#### A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In [6]:
fertility_latinamerica = literacy_birth_rate[literacy_birth_rate['Continent'] == 'LAT']['fertility']
female_literacy_latinamerica = literacy_birth_rate[literacy_birth_rate['Continent'] == 'LAT']['female literacy']
fertility_africa = literacy_birth_rate[literacy_birth_rate['Continent'] == 'AF']['fertility']
female_literacy_africa = literacy_birth_rate[literacy_birth_rate['Continent'] == 'AF']['female literacy']

In [7]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica)

# Add an x glyph to the figure p
p.x(fertility_africa, female_literacy_africa)

output_notebook()
show(p)

In [8]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica, color = 'blue', size = 10, alpha = 0.8)

# Add a red circle glyph to the figure p
p.circle(fertility_africa, female_literacy_africa, color = 'red', size = 10, alpha = 0.8)

output_notebook()
show(p)
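
As a compact illustration of overlaying data sets, the sketch below calls two glyph methods on one figure and keeps the renderer each call returns. The numbers are made up for illustration, and it uses `scatter()` with a `marker` argument instead of `circle()` and `x()`, on the assumption that this behaves the same for the purpose here.

```python
from bokeh.plotting import figure

# Made-up values standing in for two regions of the dataset
fertility_a, literacy_a = [1.8, 2.3, 2.9], [90.2, 88.0, 81.5]
fertility_b, literacy_b = [5.2, 4.7, 6.1], [48.8, 55.0, 30.1]

p = figure(x_axis_label='fertility (children per woman)',
           y_axis_label='female literacy (% population)')

# Each glyph call adds one renderer to the same figure
r_a = p.scatter(fertility_a, literacy_a, marker='circle', color='blue', size=10, alpha=0.8)
r_b = p.scatter(fertility_b, literacy_b, marker='x', color='red', size=10, alpha=0.8)
```

Both renderers end up in `p.renderers`, which is why a single `show(p)` draws the two data sets together.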

#### Lines

We can draw lines on Bokeh plots with the line() glyph function.

In this exercise, you'll plot the daily adjusted closing price of Apple Inc.'s stock (AAPL) from 2000 to 2013.

In [9]:
aapl = pd.read_csv('aapl.csv', index_col = 'Unnamed: 0')
aapl.head()

Unnamed: 0,adj_close,close,date,high,low,open,volume
0,31.68,130.31,2000-03-01,132.06,118.5,118.56,38478000
1,29.66,122.0,2000-03-02,127.94,120.69,127.0,11136800
2,31.12,128.0,2000-03-03,128.23,120.0,124.87,11565200
3,30.56,125.69,2000-03-06,129.13,125.0,126.0,7520000
4,29.87,122.87,2000-03-07,127.44,121.12,126.44,9767600


In [10]:
aapl.dtypes

adj_close    float64
close        float64
date          object
high         float64
low          float64
open         float64
volume         int64
dtype: object

In [11]:
# Create a figure with x_axis_type="datetime": p
aapl['date'] = pd.to_datetime(aapl['date'])
p = figure(x_axis_type = 'datetime', x_axis_label ='Date', y_axis_label ='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(aapl['date'], aapl['adj_close'])

output_notebook()
show(p)
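
The conversion step matters because the `date` column is read as plain strings (dtype `object`). A minimal sketch, using three rows copied from the `aapl.head()` output above, of how `pd.to_datetime` and `x_axis_type='datetime'` fit together:

```python
import pandas as pd
from bokeh.models import DatetimeAxis
from bokeh.plotting import figure

# Three rows copied from the aapl.head() output above
dates = pd.to_datetime(pd.Series(['2000-03-01', '2000-03-02', '2000-03-03']))
adj_close = pd.Series([31.68, 29.66, 31.12])

# x_axis_type='datetime' gives the figure a datetime-aware x axis
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')
p.line(dates, adj_close)

x_axis = p.xaxis[0]
```

Here `x_axis` is a `DatetimeAxis`, so tick labels are formatted as dates rather than raw numbers.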

In [12]:
# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p = figure(x_axis_type = 'datetime', x_axis_label ='Date', y_axis_label ='US Dollars')
p.circle(aapl['date'][:100], aapl['adj_close'][:100], fill_color='white', size=4)

output_notebook()
show(p)

#### Patches

In Bokeh, extended geometrical shapes can be plotted by using the patches() glyph function. The patches glyph takes as input a list-of-lists collection of numeric values specifying the vertices in x and y directions of each distinct patch to plot.

In this exercise, you will plot the state borders of Arizona, Colorado, New Mexico and Utah. The latitude and longitude vertices for each state have been prepared as lists.

Your job is to plot longitude on the x-axis and latitude on the y-axis. The figure object has been created for you as p.

In [15]:
p = figure(x_axis_label ='Longitude', y_axis_label ='Latitude')

# Create a list of az_lons, co_lons, nm_lons and ut_lons: x
x = [az_lons, co_lons, nm_lons, ut_lons]

# Create a list of az_lats, co_lats, nm_lats and ut_lats: y
y = [az_lats, co_lats, nm_lats, ut_lats]

# Add patches to figure p with line_color=white for x and y
p.patches(x, y, line_color = 'white')

output_notebook()
show(p)
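
To make the list-of-lists input concrete, here is a small sketch with two made-up polygons (a square and a triangle, not real state borders). The i-th inner list of `xs` pairs with the i-th inner list of `ys` to describe one patch.

```python
from bokeh.plotting import figure

# Two made-up polygons: a square and a triangle.
# xs[i] and ys[i] together hold the vertices of patch i.
xs = [[0, 1, 1, 0], [2, 3, 2.5]]
ys = [[0, 0, 1, 1], [0, 0, 1]]

p = figure(x_axis_label='Longitude', y_axis_label='Latitude')
renderer = p.patches(xs, ys, line_color='white')

# Bokeh stores the literal lists in an implicit data source
n_patches = len(renderer.data_source.data['xs'])
```

The inner lists may have different lengths, since each patch has its own number of vertices.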

#### Plotting data from NumPy arrays

In the previous exercises, you made plots using data stored in lists. You learned that Bokeh can plot both numbers and datetime objects.

In this exercise, you'll generate NumPy arrays using np.linspace() and np.cos() and plot them using the circle glyph.

In [16]:
p = figure()
# Create array using np.linspace: x
x = np.linspace(0,5,100)

# Create array using np.cos: y
y = np.cos(x)

# Add circles at x and y
p.circle(x,y)

output_notebook()
show(p)

#### Plotting data from Pandas DataFrames

You can create Bokeh plots from Pandas DataFrames by passing column selections to the glyph functions.

Bokeh can plot floating point numbers, integers, and datetime data types. In this example, you will read a CSV file containing information on 392 automobiles manufactured in the US, Europe and Asia from 1970 to 1982.

The CSV file is provided for you as 'auto-mpg.csv'.

Your job is to plot miles-per-gallon (mpg) vs horsepower (hp) by passing Pandas column selections into the p.circle() function. Additionally, each glyph will be colored according to values in the color column.

In [17]:
# Read in the CSV file: auto
auto = pd.read_csv('auto-mpg.csv')

auto.head()

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,color,size
0,18.0,6,250.0,88,3139,14.5,71,US,ford mustang,blue,15.0
1,9.0,8,304.0,193,4732,18.5,70,US,hi 1200d,blue,20.0
2,36.1,4,91.0,60,1800,16.4,78,Asia,honda civic cvcc,red,10.0
3,18.5,6,250.0,98,3525,19.0,77,US,ford granada,blue,15.0
4,34.3,4,97.0,78,2188,15.8,80,Europe,audi 4000,green,10.0


In [18]:
# Create the figure: p
p = figure(x_axis_label='HP', y_axis_label='MPG')

# Plot mpg vs hp by color
p.circle(x = auto['hp'], y = auto['mpg'], color = auto['color'], size = 10)

output_notebook()
show(p)

#### The Bokeh ColumnDataSource (continued)

You can create a ColumnDataSource object directly from a Pandas DataFrame by passing the DataFrame to the class initializer.

In this exercise, we have imported pandas as pd and read in a data set containing all Olympic medals awarded in the 100 meter sprint from 1896 to 2012. A color column has been added indicating the CSS colorname we wish to use in the plot for every data point.

Your job is to import the ColumnDataSource class, create a new ColumnDataSource object from the DataFrame df, and plot circle glyphs with 'Year' on the x-axis and 'Time' on the y-axis. Color each glyph by the color column.

In [19]:
df = pd.read_csv('sprint.csv')

df.head()

Unnamed: 0,Name,Country,Medal,Time,Year,color
0,Usain Bolt,JAM,GOLD,9.63,2012,goldenrod
1,Yohan Blake,JAM,SILVER,9.75,2012,silver
2,Justin Gatlin,USA,BRONZE,9.79,2012,saddlebrown
3,Usain Bolt,JAM,GOLD,9.69,2008,goldenrod
4,Richard Thompson,TRI,SILVER,9.89,2008,silver


In [20]:
source = ColumnDataSource(df)

p = figure()
p.circle(source = source, x = 'Year', y = 'Time', color = 'color', size = 8)

output_notebook()
show(p)
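
The following sketch shows what the `ColumnDataSource` holds after it is built from a DataFrame. It uses three rows copied from the `df.head()` output above, and `scatter()` in place of `circle()` as an assumed equivalent.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Three rows copied from the df.head() output above
df = pd.DataFrame({
    'Name': ['Usain Bolt', 'Yohan Blake', 'Justin Gatlin'],
    'Time': [9.63, 9.75, 9.79],
    'Year': [2012, 2012, 2012],
    'color': ['goldenrod', 'silver', 'saddlebrown'],
})

source = ColumnDataSource(df)

# Each DataFrame column becomes a key of source.data
# (the DataFrame index is stored as an extra column as well)
column_names = set(source.data.keys())

# Glyph arguments given as strings are looked up as column names in source
p = figure(x_axis_label='Year', y_axis_label='Time')
p.scatter(x='Year', y='Time', color='color', size=8, source=source)
```

Passing column names instead of the columns themselves is what lets several glyphs, and later widgets and callbacks, share one source.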

#### Selection and non-selection glyphs

In this exercise, you're going to add the box_select tool to a figure and change the selected and non-selected circle glyph properties so that selected glyphs are red and non-selected glyphs are transparent blue.

In [21]:
# Create a figure with the "box_select" tool: p
p = figure(tools = 'box_select', x_axis_label = 'Year', y_axis_label = 'Time')

# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle(selection_color = 'red', nonselection_color = 'blue', nonselection_alpha = 0.1, \
         source = source, x = 'Year', y = 'Time')

output_notebook()
show(p)

#### Hover glyphs
Now let's practice using and customizing the hover tool.

In this exercise, you're going to plot the blood glucose levels for an unknown patient. The blood glucose levels were recorded every 5 minutes on October 7th starting at 3 minutes past midnight.

The date and time of each measurement are provided to you as x and the blood glucose levels in mg/dL are provided as y.

Your job is to add a circle glyph that will appear red when the mouse is hovered near the data points. You will also add a customized hover tool object to the plot.

When you're done, play around with the hover tool you just created! Notice how the points your mouse hovers over turn red.

In [22]:
glucose = pd.read_csv('glucose.csv')

glucose.head()

Unnamed: 0,datetime,isig,glucose
0,2010-10-07 00:03:00,22.1,150
1,2010-10-07 00:08:00,21.46,152
2,2010-10-07 00:13:00,21.06,149
3,2010-10-07 00:18:00,20.96,147
4,2010-10-07 00:23:00,21.52,148


In [23]:
glucose.dtypes

datetime     object
isig        float64
glucose       int64
dtype: object

In [24]:
glucose['datetime'] = pd.to_datetime(glucose['datetime'])

In [25]:
p = figure(x_axis_type = 'datetime', x_axis_label ='Time', y_axis_label ='Blood glucose level (mg/dL)')

p.circle(glucose['datetime'], glucose['glucose'], size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color= 'firebrick', hover_alpha= 0.5,
         hover_line_color='white')

# Create a HoverTool: hover
hover = HoverTool(tooltips = None, mode = 'vline')

# Add the hover tool to the figure p
p.add_tools(hover)

output_notebook()
show(p)

#### Colormapping

The final glyph customization we'll practice is using the CategoricalColorMapper to color each glyph by a categorical property.

Here, you're going to use the automobile dataset to plot miles-per-gallon vs weight and color each circle glyph by the region where the automobile was manufactured.

The origin column will be used in the ColorMapper to color automobiles manufactured in the US as blue, Europe as red and Asia as green.

In [26]:
# Convert auto to a ColumnDataSource: source
source = ColumnDataSource(auto)

# Make a CategoricalColorMapper object: color_mapper
color_mapper = CategoricalColorMapper(factors =['Europe', 'Asia', 'US'],
                                      palette =['red', 'green', 'blue'])
p = figure(x_axis_label = 'weight (lbs)', y_axis_label = 'mpg')
# Add a circle glyph to the figure p
p.circle('weight', 'mpg', source=source,
            color= dict(field = 'origin', transform = color_mapper),
            legend_field ='origin', nonselection_alpha = 0.1)

output_notebook()
show(p)
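
The pairing between factors and palette entries is positional, which the sketch below spells out. It uses three rows copied from the `auto.head()` output above; the `factor_to_color` dictionary is only an illustrative helper and not part of Bokeh.

```python
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.plotting import figure

# Three rows copied from the auto.head() output above
source = ColumnDataSource(data={
    'weight': [3139, 1800, 2188],
    'mpg': [18.0, 36.1, 34.3],
    'origin': ['US', 'Asia', 'Europe'],
})

factors = ['Europe', 'Asia', 'US']
palette = ['red', 'green', 'blue']
color_mapper = CategoricalColorMapper(factors=factors, palette=palette)

# Illustrative helper: the i-th factor is drawn with the i-th palette entry
factor_to_color = dict(zip(factors, palette))

p = figure(x_axis_label='weight (lbs)', y_axis_label='mpg')
p.scatter('weight', 'mpg', source=source, size=10,
          color={'field': 'origin', 'transform': color_mapper})
```

Reordering `factors` without reordering `palette` would therefore silently change which region gets which color.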

#### Creating rows of plots

Layouts are collections of Bokeh figure objects.

In this exercise, you're going to create two plots from the Literacy and Birth Rate data set to plot fertility vs female literacy and population vs female literacy.

By using the row() function, you'll create a single layout of the two figures.

Remember, as in the previous chapter, once you have created your figures, you can interact with them in various ways.

In this exercise, you may have to scroll sideways to view both figures in the row layout. Alternatively, you can view the figures in a new window by clicking on the expand icon to the right of the "Bokeh plot" tab.

In [27]:
source = ColumnDataSource(literacy_birth_rate)
# Create the first figure: p1
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
            x_range = [0,7], y_range = [0,100])

# Add a circle glyph to p1
p1.circle(x = 'fertility', y = 'female literacy', source = source)

# Create the second figure: p2
p2 = figure(x_axis_label='population', y_axis_label='female_literacy (% population)', y_range = [0,100])

# Add a circle glyph to p2
p2.circle(x = 'population', y = 'female literacy', source = source)

# Put p1 and p2 into a horizontal row: layout
layout = row(p1, p2)

output_notebook()
show(layout)

#### Column

In [28]:
# Create a blank figure: p1
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
           x_range = [0,7], y_range = [0,100])

# Add circle scatter to the figure p1
p1.circle('fertility', 'female literacy', source=source)

# Create a new blank figure: p2

p2 = figure(x_axis_label = 'population', y_axis_label = 'female literacy (% population)', y_range = [0,100])
# Add circle scatter to the figure p2
p2.circle('population', 'female literacy', source = source)

# Put plots p1 and p2 in a column: layout
layout = column(p1, p2)

output_notebook()
show(layout)

#### Nesting rows and columns of plots

You can create nested layouts of plots by combining row and column layouts. In this exercise, you'll make a 3-plot layout in two rows using the auto-mpg data set. Three plots have been created for you: avg_mpg, mpg_hp, and mpg_weight.

Your job is to use the row() and column() functions to make a two-row layout where the first row will have only the average mpg vs year plot and the second row will have mpg vs hp and mpg vs weight plots as columns.

By using the sizing_mode argument, you can scale the widths to fill the whole figure.

In [29]:
auto.head()

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,color,size
0,18.0,6,250.0,88,3139,14.5,71,US,ford mustang,blue,15.0
1,9.0,8,304.0,193,4732,18.5,70,US,hi 1200d,blue,20.0
2,36.1,4,91.0,60,1800,16.4,78,Asia,honda civic cvcc,red,10.0
3,18.5,6,250.0,98,3525,19.0,77,US,ford granada,blue,15.0
4,34.3,4,97.0,78,2188,15.8,80,Europe,audi 4000,green,10.0


In [30]:
# Select the mpg column before averaging so non-numeric columns are never aggregated
avg_mpg = auto.groupby('yr')['mpg'].mean()

In [31]:
avg_mpg.values

array([17.68965517, 21.11111111, 18.71428571, 17.1       , 22.76923077,
       20.26666667, 21.57352941, 23.375     , 24.06111111, 25.09310345,
       33.8037037 , 30.18571429, 32.        ])

In [32]:
# Convert auto to a ColumnDataSource: source
source = ColumnDataSource(auto)

# Create the mpg vs weight figure: p1
p1 = figure(x_axis_label = 'weight (lbs)', y_axis_label = 'mpg')
# Add a circle glyph to the figure p1
p1.circle('weight', 'mpg', source=source)

# Create the mpg vs hp figure: p2
p2 = figure(x_axis_label = 'hp', y_axis_label = 'mpg')
# Add a circle glyph to the figure p2
p2.circle('hp', 'mpg', source=source)

p3 = figure(x_axis_label = 'year', y_axis_label = 'average mpg', x_range = [70,85], y_range = [15,35])
p3.line(avg_mpg.index, avg_mpg.values)

In [33]:
# Make a row layout that will be used as the second row: row2
row2 = row([p1, p2], sizing_mode='scale_width')

# Make a column layout that includes the above row layout: layout
layout = column([p3, row2], sizing_mode='scale_width')

output_notebook()
show(layout)
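
The nesting itself can be checked without any data. In this sketch the three figures are empty placeholders whose titles are only labels, and the layout objects are inspected through their `children` lists.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure

# Three empty placeholder figures
top = figure(title='average mpg vs year')
left = figure(title='mpg vs hp')
right = figure(title='mpg vs weight')

# A row is itself a layout object, so it can be a child of a column
second_row = row(left, right)
layout = column(top, second_row)
```

`layout.children` is `[top, second_row]`, and `second_row.children` is `[left, right]`, which mirrors the two-row arrangement described above.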

#### Creating gridded layouts
Regular grids of Bokeh plots can be generated with gridplot.

In this example, you're going to display four plots of fertility vs female literacy for four regions: Latin America, Africa, Asia and Europe.

In [34]:
literacy_birth_rate.head(10)

Unnamed: 0,Country,Continent,female literacy,fertility,population
0,Chine,ASI,90.5,1.769,1324655000.0
1,Inde,ASI,50.8,2.682,1139965000.0
2,USA,NAM,99.0,2.077,304060000.0
3,Indonésie,ASI,88.8,2.132,227345100.0
4,Brésil,LAT,90.2,1.827,191971500.0
5,Pakistan,ASI,40.0,3.872,166111500.0
6,Bangladesh,ASI,49.8,2.288,160000100.0
7,Nigéria,AF,48.8,5.173,151212300.0
8,Fédération de Russie,EUR,99.4,1.393,141950000.0
9,Japan,ASI,99.0,1.262,127704000.0


In [35]:
fertility_asia = literacy_birth_rate[literacy_birth_rate['Continent'] == 'ASI']['fertility']
female_literacy_asia = literacy_birth_rate[literacy_birth_rate['Continent'] == 'ASI']['female literacy']
fertility_europe = literacy_birth_rate[literacy_birth_rate['Continent'] == 'EUR']['fertility']
female_literacy_europe = literacy_birth_rate[literacy_birth_rate['Continent'] == 'EUR']['female literacy']

In [36]:
# Create the Latin America figure: p1
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
            x_range = [1.5,4], y_range = [65,100], title = 'Latin America')
# Add a circle glyph to the figure p1
p1.circle(fertility_latinamerica, female_literacy_latinamerica)

# Create the Africa figure: p2
p2 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
            x_range = [2,7], y_range = [15,100], title = 'Africa')
# Add a circle glyph to the figure p2
p2.circle(fertility_africa, female_literacy_africa)

# Create the Asia figure: p3
p3 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
            x_range = [0,7], y_range = [0,100], title = 'Asia')
# Add a circle glyph to the figure p3
p3.circle(fertility_asia, female_literacy_asia)

# Create the Europe figure: p4
p4 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)', \
            x_range = [1,3], y_range = [90,100], title = 'Europe')
# Add a circle glyph to the figure p4
p4.circle(fertility_europe, female_literacy_europe)

# Create a list containing plots p1 and p2: row1
row1 = [p1, p2]

# Create a list containing plots p3 and p4: row2
row2 = [p3, p4]

# Create a gridplot using row1 and row2: layout
layout = gridplot([row1, row2])

output_notebook()
show(layout)

#### Starting tabbed layouts

Tabbed layouts can be created in Bokeh by placing plots or layouts in Panels.

In this exercise, you'll take the four fertility vs female literacy plots from the last exercise and make a Panel() for each.

No figure will be generated in this exercise. Instead, you will use these panels in the next exercise to build and display a tabbed layout.

In [37]:
# Create tab1 from plot p1: tab1
tab1 = Panel(child=p1, title='Latin America')

# Create tab2 from plot p2: tab2
tab2 = Panel(child=p2, title='Africa')

# Create tab3 from plot p3: tab3
tab3 = Panel(child=p3, title='Asia')

# Create tab4 from plot p4: tab4
tab4 = Panel(child=p4, title='Europe')

In [38]:
# Create a Tabs layout: layout
layout = Tabs(tabs=[tab1, tab2, tab3, tab4])

# Render inline in the notebook and show the result
output_notebook()
show(layout)

#### Linked axes

Linking axes between plots is achieved by sharing range objects.

In this exercise, you'll link four plots of female literacy vs fertility so that when one plot is zoomed or dragged, one or more of the other plots will respond.

The four plots p1, p2, p3 and p4 along with the layout that you created in the last section have been provided for you.

Your job is to link p1 with the three other plots by assigning their .x_range and .y_range attributes.

After you have linked the axes, explore the plots by clicking and dragging along the x or y axes of any of the plots, and notice how the linked plots change together.

In [39]:
layout = gridplot([row1, row2])
# Link the x_range of p2 to p1: p2.x_range
p2.x_range = p1.x_range

# Link the y_range of p2 to p1: p2.y_range
p2.y_range = p1.y_range

# Link the x_range of p3 to p1: p3.x_range
p3.x_range = p1.x_range

# Link the y_range of p4 to p1: p4.y_range
p4.y_range = p1.y_range

# Render inline in the notebook and show the result
output_notebook()
show(layout)
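
A minimal sketch of what "sharing range objects" means in practice: after the assignment, both figures hold a reference to the very same range object, so panning or zooming one updates the other. The data points are made up.

```python
from bokeh.plotting import figure

p1 = figure(x_range=[0, 7], y_range=[0, 100], title='first')
p2 = figure(x_range=[2, 7], y_range=[15, 100], title='second')
p1.scatter([1, 2, 3], [90, 80, 70])
p2.scatter([3, 4, 5], [60, 50, 40])

# Assigning p1's range object to p2 links the two x axes
p2.x_range = p1.x_range

shared_x = p2.x_range is p1.x_range   # one object referenced by two plots
shared_y = p2.y_range is p1.y_range   # y ranges were not linked
```

Only the axes whose ranges are assigned become linked, which is how the cell above can link p3 on x only and p4 on y only.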

#### Linked brushing

By sharing the same ColumnDataSource object between multiple plots, selection tools like BoxSelect and LassoSelect will highlight points in both plots that share a row in the ColumnDataSource.

In this exercise, you'll plot female literacy vs fertility and population vs fertility in two plots using the same ColumnDataSource.

After you have built the figure, experiment with the Lasso Select and Box Select tools. Use your mouse to drag a box or lasso around points in one figure, and notice how points in the other figure that share a row in the ColumnDataSource also get highlighted.

Before experimenting with the Lasso Select, however, click the Bokeh plot pop-out icon to pop out the figure so that you can definitely see everything that you're doing.

In [40]:
source = ColumnDataSource(literacy_birth_rate)

# Create the first figure: p1
p1 = figure(x_axis_label='fertility (children per woman)', y_axis_label='female literacy (% population)',
            tools='box_select, lasso_select', x_range = [0,7], y_range = [0,100])

# Add a circle glyph to p1
p1.circle( 'fertility', 'female literacy', source = source)

# Create the second figure: p2
p2 = figure(x_axis_label='fertility (children per woman)', y_axis_label='population',
            tools='box_select, lasso_select')

# Add a circle glyph to p2
p2.circle( 'fertility', 'population', source = source)

# Create row layout of figures p1 and p2: layout
layout = row(p1, p2)

# Render inline in the notebook and show the result
output_notebook()
show(layout)
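
Linked brushing works because the selection lives on the data source rather than on either plot. The sketch below uses three rows copied from the `literacy_birth_rate.head()` output and sets the selection from Python to imitate what a box or lasso select would do in the browser.

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Three rows copied from the literacy_birth_rate.head() output
source = ColumnDataSource(data={
    'fertility': [1.769, 2.682, 2.077],
    'female literacy': [90.5, 50.8, 99.0],
    'population': [1324655000.0, 1139965000.0, 304060000.0],
})

tools = 'box_select,lasso_select'
p1 = figure(tools=tools, x_axis_label='fertility', y_axis_label='female literacy')
p2 = figure(tools=tools, x_axis_label='fertility', y_axis_label='population')

# Both renderers read from the same source object
r1 = p1.scatter('fertility', 'female literacy', source=source)
r2 = p2.scatter('fertility', 'population', source=source)

# Imitate a selection of the first two rows
source.selected.indices = [0, 1]

layout = row(p1, p2)
```

Because `r1` and `r2` share `source`, the selected row indices apply to both plots at once.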

#### How to create legends
Legends can be added to any glyph by passing the legend_label keyword argument (the successor to the older legend keyword) to the glyph method.

In this exercise, you will plot two circle glyphs for female literacy vs fertility in Africa and Latin America.

Your job is to plot two circle glyphs for these two objects with fertility on the x axis and female_literacy on the y axis and add the legend values. 

In [41]:
# Select each region. rename() returns a new DataFrame, which avoids pandas'
# SettingWithCopyWarning, and strips stray whitespace from column names
# (the raw file labels the country column 'Country ' with a trailing space)
latin_america = literacy_birth_rate[literacy_birth_rate['Continent'] == 'LAT'].rename(columns=str.strip)
africa = literacy_birth_rate[literacy_birth_rate['Continent'] == 'AF'].rename(columns=str.strip)

p = figure(x_axis_label = 'fertility', y_axis_label = 'female literacy', x_range = [0,7], y_range = [0,100])
# Add the first circle glyph to the figure p
p.circle('fertility', 'female literacy', source=ColumnDataSource(latin_america), size=10, color='red', legend_label ='Latin America')

# Add the second circle glyph to the figure p
p.circle('fertility', 'female literacy', source=ColumnDataSource(africa), size=10, color='blue', legend_label = 'Africa')

# Render inline in the notebook and show the result
output_notebook()
show(p)


#### Positioning and styling legends

Properties of the legend can be changed by using the legend member attribute of a Bokeh figure after the glyphs have been plotted.

In this exercise, you'll adjust the background color and legend location of the female literacy vs fertility plot from the previous exercise.

In [42]:
# Assign the legend to the bottom left: p.legend.location
p.legend.location = 'bottom_left'

# Fill the legend background with the color 'lightgray': p.legend.background_fill_color

p.legend.background_fill_color = 'lightgray'

# Render inline in the notebook and show the result
output_notebook()
show(p)
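
The order of operations matters here: the legend only exists once at least one glyph carries a legend label. A small sketch with made-up points, using `scatter()` as an assumed stand-in for `circle()`:

```python
from bokeh.plotting import figure

p = figure(x_axis_label='fertility', y_axis_label='female literacy')
p.scatter([1.8, 2.3], [90.2, 88.0], size=10, color='red', legend_label='Latin America')
p.scatter([5.2, 4.7], [48.8, 55.0], size=10, color='blue', legend_label='Africa')

# Style the legend after the labelled glyphs have been added
p.legend.location = 'bottom_left'
p.legend.background_fill_color = 'lightgray'

# p.legend behaves like a list of Legend objects; this figure has one
legend = p.legend[0]
```

Each distinct `legend_label` contributes one entry to `legend.items`.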

#### Adding a hover tooltip

Working with the HoverTool is easy for data stored in a ColumnDataSource.

In this exercise, you will create a HoverTool object and display the country for each circle glyph in the figure that you created in the last exercise. This is done by assigning the tooltips keyword argument to a list-of-tuples specifying the label and the column of values from the ColumnDataSource using the @ operator.

In [43]:
# Create a HoverTool object: hover
hover = HoverTool(tooltips = [('Country', '@Country')])

# Add the HoverTool object to figure p
p.add_tools(hover)

# Render inline in the notebook and show the result
output_notebook()
show(p)
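
As a further sketch of the tooltip syntax, the example below shows two `(label, value)` pairs. It uses two rows from the dataset and adds a second tooltip, which is not in the original exercise, to show that column names containing spaces are written inside braces after the `@`.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data={
    'fertility': [1.827, 5.173],
    'female literacy': [90.2, 48.8],
    'Country': ['Brésil', 'Nigéria'],
})

p = figure(x_axis_label='fertility', y_axis_label='female literacy')
p.scatter('fertility', 'female literacy', source=source, size=10)

# '@Country' looks up the 'Country' column of the glyph's data source;
# a column name with a space must be wrapped in braces
hover = HoverTool(tooltips=[
    ('Country', '@Country'),
    ('Female literacy', '@{female literacy}'),
])
p.add_tools(hover)
```

The tooltip can only display columns that exist in the `ColumnDataSource` behind the hovered glyph, which is why the country names must be stored there.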

#### Using the current document

Let's get started with building an interactive Bokeh app. This typically begins with importing the curdoc, or "current document", function from bokeh.io. This current document will eventually hold all the plots, controls, and layouts that you create. Your job in this exercise is to use this function to add a single plot to your application.

In [44]:
# Build the app: add a single plot to the document passed in by Bokeh
def make_doc(doc):
    # Create a new plot: plot
    plot = figure()

    # Add a line to the plot
    plot.line(x = [1,2,3,4,5], y = [2,5,4,6,7])

    # Add the plot to the current document (add_root returns None, so do not reassign doc)
    doc.add_root(plot)

In [45]:
show(make_doc)
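
In this notebook the app is written as a function that receives the document, and `show(make_doc)` runs it as an embedded app. As a rough sketch of what that function does, it can also be called by hand on an empty `Document`; creating the `Document` directly like this is only for illustration.

```python
from bokeh.document import Document
from bokeh.plotting import figure

def make_doc(doc):
    plot = figure()
    plot.line(x=[1, 2, 3, 4, 5], y=[2, 5, 4, 6, 7])
    # add_root() attaches the plot to the document it is given
    doc.add_root(plot)

# Exercise the function on a bare Document instead of a running app
doc = Document()
make_doc(doc)
n_roots = len(doc.roots)
```

After the call the document has exactly one root, the plot; layouts and widgets are added the same way.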

#### Add a single slider

In the previous exercise, you added a single plot to the "current document" of your application. In this exercise, you'll practice adding a layout to your current document.

Your job here is to create a single slider, use it to create a widgetbox layout, and then add this layout to the current document.

In [46]:
# Build the app: a single slider inside a widgetbox
def make_doc(doc):
    # Create a slider: slider
    slider = Slider(title='my slider', start =0, end =10, step =0.1, value =2)

    # Create a widgetbox layout: layout
    layout = widgetbox(slider)

    # Add the layout to the current document
    doc.add_root(layout)
    
    return doc

In [47]:
show(make_doc)

#### Multiple sliders in one document

Having added a single slider in a widgetbox layout to your current document, you'll now add multiple sliders into the current document.

Your job in this exercise is to create two sliders, add them to a widgetbox layout, and then add the layout into the current document.

In [48]:
# Build the app: two sliders inside one widgetbox
def make_doc(doc):
    # Create two sliders: slider1, slider2
    slider1 = Slider(title='my slider', start =0, end =10, step =0.1, value =2)
    slider2 = Slider(title = 'slider2', start = 10, end = 100, step = 1, value = 20)
    # Create a widgetbox layout: layout
    layout = widgetbox(slider1, slider2)

    # Add the layout to the current document
    doc.add_root(layout)
    
    return doc

In [49]:
show(make_doc)

#### How to combine Bokeh models into layouts

Let's begin making a Bokeh application that has a simple slider and plot, that also updates the plot based on the slider.

In this exercise, your job is to first explicitly create a ColumnDataSource. You'll then combine a plot and a slider into a single column layout, and add it to the current document.

After you are done, notice how in the figure you generate, the slider will not actually update the plot, **because a widget callback has not been defined**. You'll learn how to update the plot using widget callbacks in the next exercise.

In [51]:
def make_doc(doc):
    # Create a slider and a figure: slider, plot
    slider = Slider(title='my slider', start =1, end =10, step =1, value =1)
    plot = figure(x_range = [0,10], y_range = [-0.2, 1])

    # Create ColumnDataSource: source
    source = ColumnDataSource(data = {'x': x, 'y': y})

    # Add a line to the plot
    plot.line(x = 'x', y = 'y', source = source)

    # Create a column layout: layout
    layout = column(widgetbox(slider), plot)

    # Add the layout to the current document
    doc.add_root(layout)

In [52]:
show(make_doc)

In [53]:
def make_doc(doc):
    # Create a slider and a figure: slider, plot
    slider = Slider(title='my slider', start =1, end =10, step =1, value =1)
    plot = figure(x_range = [0,10], y_range = [-0.2, 1])
    # Create ColumnDataSource: source
    source = ColumnDataSource(data = {'x': x, 'y': y})
    # Add a line to the plot
    plot.line(x = 'x', y = 'y', source = source)
    
    # Define a callback function: callback
    def callback(attr, old, new):

        # Read the current value of the slider: scale
        scale = slider.value

        # Compute the updated y using np.sin(scale/x): new_y
        new_y = np.sin(scale/x)

        # Update source with the new data values
        source.data = {'x': x, 'y': new_y}

    # Attach the callback to the 'value' property of slider
    slider.on_change('value', callback)

    # Create layout and add to current document
    layout = column(widgetbox(slider), plot)
    doc.add_root(layout)

In [54]:
show(make_doc)
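
The callback follows the `(attr, old, new)` signature that `on_change` expects, and it can be tried outside a running app by calling it directly. In this sketch the sample `x` values are an assumption (they start above zero so that `scale / x` stays finite), and the callback reads the new value from its `new` argument rather than from `slider.value`.

```python
import numpy as np
from bokeh.models import ColumnDataSource, Slider

# Assumed sample data; x starts above zero so new / x stays finite
x = np.linspace(0.3, 10, 300)
y = np.sin(1 / x)

source = ColumnDataSource(data={'x': x, 'y': y})
slider = Slider(title='my slider', start=1, end=10, step=1, value=1)

def callback(attr, old, new):
    # 'new' carries the slider's updated value
    source.data = {'x': x, 'y': np.sin(new / x)}

# Register the callback for changes to the slider's 'value' property
slider.on_change('value', callback)

# Call it by hand, as would happen after the slider moves from 1 to 3
callback('value', slider.value, 3)
```

Replacing `source.data` with a complete new dictionary keeps all columns the same length, which Bokeh expects of a `ColumnDataSource`.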

#### Updating data sources from dropdown callbacks

You'll now learn to update the plot's data using a drop down menu instead of a slider. This would allow users to do things like select between different data sources to view.

In [55]:
population = literacy_birth_rate['population'][:162]

def make_doc(doc):
    source = ColumnDataSource(data={
    'x' : fertility,
    'y' : female_literacy
    })
    
    # Create a new plot: plot
    plot = figure()

    # Add circles to the plot
    plot.circle('x', 'y', source=source)

    # Define a callback function: update_plot
    def update_plot(attr, old, new):
        # If the new Selection is 'female_literacy', update 'y' to female_literacy
        if new == 'female_literacy': 
            source.data = {
                'x' : fertility,
                'y' : female_literacy
            }
        # Else, update 'y' to population
        else:
            source.data = {
                'x' : fertility,
                'y' : population
            }
            
    # Create a dropdown Select widget: select    
    select = Select(title="distribution", options=['female_literacy', 'population'], value='female_literacy')

    # Attach the update_plot callback to the 'value' property of select
    select.on_change('value' , update_plot)

    # Create layout and add to current document
    layout = column(select, plot)
    doc.add_root(layout)

In [56]:
show(make_doc)

#### Synchronize two dropdowns

Here, you'll practice using a dropdown callback to update another dropdown's options. This will allow you to customize your applications even further and is a powerful addition to your toolbox.

Your job in this exercise is to create two dropdown select widgets and then define a callback such that one dropdown is used to update the other dropdown.

In [57]:
def make_doc(doc):
    # Create two dropdown Select widgets: select1, select2
    select1 = Select(title='First', options=['A', 'B'], value='A')
    select2 = Select(title='Second', options=['1', '2', '3'], value='1')

    # Define a callback function: callback
    def callback(attr, old, new):
        # If select1 is 'A' 
        if select1.value == 'A':
            # Set select2 options to ['1', '2', '3']
            select2.options = ['1', '2', '3']

            # Set select2 value to '1'
            select2.value = '1'
        else:
            # Set select2 options to ['100', '200', '300']
            select2.options = ['100', '200', '300']

            # Set select2 value to '100'
            select2.value = '100'

    # Attach the callback to the 'value' property of select1
    select1.on_change('value', callback)

    # Create layout and add to current document
    layout = widgetbox(select1, select2)
    doc.add_root(layout)
    

In [58]:
show(make_doc)
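
The if/else in the callback above works for two choices, but it grows quickly with more. One possible alternative is a lookup table, sketched below in plain Python. `OPTIONS_BY_CHOICE` and `dependent_options` are hypothetical names introduced for this sketch and are not part of the exercise.

```python
# Hypothetical lookup table: first-dropdown value -> options for the second
OPTIONS_BY_CHOICE = {
    'A': ['1', '2', '3'],
    'B': ['100', '200', '300'],
}


def dependent_options(choice):
    """Return (options, default_value) for the second dropdown."""
    options = OPTIONS_BY_CHOICE[choice]
    # Use the first option as the default, as the callback above does
    return options, options[0]


print(dependent_options('A'))
print(dependent_options('B'))
```

Inside the callback, the two returned values could then be assigned to `select2.options` and `select2.value` in that order.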

#### Button widgets

It's time to practice adding buttons to your interactive visualizations. Your job in this exercise is to create a button and use its on_click() method to update a plot.

In [59]:
def make_doc(doc):
    plot = figure(x_range = [0,10], y_range = [-1, 2])
    source = ColumnDataSource(data = {'x': x, 'y': y})
    # Create a scatter plot
    plot.circle(x = 'x', y = 'y', source = source)
    
    button = Button(label = 'Update Data')

    # Define an update callback with no arguments: update
    def update():

        # Compute new y values: y
        y = np.sin(x) + np.random.random(len(x))

        # Update the ColumnDataSource data dictionary
        source.data = {'x': x, 'y': y}

    # Add the update callback to the button
    button.on_click(update)

    # Create layout and add to current document
    layout = column(widgetbox(button), plot)
    doc.add_root(layout)

In [60]:
show(make_doc)
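
The `update` callback adds uniform noise from `np.random.random` to `np.sin(x)`, which is probably why the figure uses `y_range = [-1, 2]`: the sine term stays within [-1, 1] and the noise within [0, 1). The sketch below checks those bounds with the standard library instead of NumPy. The `xs` values are an assumption, since `x` is defined elsewhere in the notebook.

```python
import math
import random


def noisy_sine(xs, rng):
    # sin(x) lies in [-1, 1]; rng.random() lies in [0, 1)
    return [math.sin(x) + rng.random() for x in xs]


rng = random.Random(0)              # seeded only to make the sketch repeatable
xs = [i * 0.1 for i in range(101)]  # hypothetical x values covering 0..10
ys = noisy_sine(xs, rng)

# Both checks should hold for any seed
print(min(ys) >= -1.0, max(ys) <= 2.0)
```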

In [61]:
def make_doc(doc):
    # Add a Toggle: toggle
    toggle = Toggle(button_type = 'success', label = 'Toggle button')

    # Add a CheckboxGroup: checkbox
    checkbox = CheckboxGroup(labels=['Option 1', 'Option 2', 'Option 3'])

    # Add a RadioGroup: radio
    radio = RadioGroup(labels=['Option 1', 'Option 2', 'Option 3'])

    # Add widgetbox(toggle, checkbox, radio) to the current document
    doc.add_root(widgetbox(toggle, checkbox, radio))

In [62]:
show(make_doc)
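
The three widgets above are only displayed, not read. As far as I understand the Bokeh widgets, a `CheckboxGroup` reports its state as a list of selected indices in `active`, and a `RadioGroup` as a single index. The plain-Python sketch below shows how such indices could be turned back into labels. `checked_labels` and `chosen_label` are hypothetical helpers for this sketch.

```python
labels = ['Option 1', 'Option 2', 'Option 3']


def checked_labels(labels, active):
    # `active` is a list of selected indices (CheckboxGroup style)
    return [labels[i] for i in active]


def chosen_label(labels, active):
    # `active` is one index, or None when nothing is selected (RadioGroup style)
    return None if active is None else labels[active]


print(checked_labels(labels, [0, 2]))
print(chosen_label(labels, 1))
```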

### Introducing the project dataset

For the final chapter, you'll be looking at some of the Gapminder datasets combined into one tidy file called "gapminder_tidy.csv".

Here, you'll continue your Exploratory Data Analysis by making a simple plot of Life Expectancy vs Fertility for the year 1970.

Your job is to import the relevant Bokeh modules and then prepare a ColumnDataSource object with the fertility, life and Country columns, where you only select the rows with the index value 1970.

In [63]:
data = pd.read_csv('gapminder_tidy.csv', index_col = 'Year')

data.head()

Year,Country,fertility,life,population,child_mortality,gdp,region
1964,Afghanistan,7.671,33.639,10474903.0,339.7,1182.0,South Asia
1965,Afghanistan,7.671,34.152,10697983.0,334.1,1182.0,South Asia
1966,Afghanistan,7.671,34.662,10927724.0,328.7,1168.0,South Asia
1967,Afghanistan,7.671,35.17,11163656.0,323.3,1173.0,South Asia
1968,Afghanistan,7.671,35.674,11411022.0,318.1,1187.0,South Asia


In [64]:
data.shape

(10111, 7)

In [65]:
# Make the ColumnDataSource: source
source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country' : data.loc[1970].Country,
})

# Create the figure: p
p = figure(title='1970', x_axis_label='Fertility (children per woman)', y_axis_label='Life Expectancy (years)',
           plot_height=400, plot_width=700,
           tools=[HoverTool(tooltips='@country')])

# Add a circle glyph to the figure p
p.circle(x='x', y='y', source=source)

output_notebook()
show(p)

Life expectancy seems to go down as fertility goes up.

In [66]:
# Make the ColumnDataSource: source
source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country'      : data.loc[1970].Country,
    'pop'      : (data.loc[1970].population / 20000000) + 2,
    'region'      : data.loc[1970].region,
})

# Save the minimum and maximum values of the fertility column: xmin, xmax
xmin, xmax = min(data.fertility), max(data.fertility)

# Save the minimum and maximum values of the life expectancy column: ymin, ymax
ymin, ymax = min(data.life), max(data.life)

# Create the figure: plot
plot = figure(title='Gapminder Data for 1970', plot_height=400, plot_width= 700,
              x_range=(xmin, xmax), y_range=(ymin, ymax))

# Add circle glyphs to the plot
plot.circle(x='x', y='y', fill_alpha = 0.8, source=source)

# Set the x-axis label
plot.xaxis.axis_label ='Fertility (children per woman)'

# Set the y-axis label
plot.yaxis.axis_label = 'Life Expectancy (years)'

output_notebook()
show(plot)
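
The source above also carries a `'pop'` column computed as `population / 20000000 + 2`, although none of the `circle` calls in the cells shown here use it. My assumption is that it is meant as a marker size, with the `+ 2` keeping small countries from shrinking to nothing. The sketch below applies the same arithmetic to one value from `data.head()`. `scaled_pop` is a hypothetical helper name.

```python
def scaled_pop(population, divisor=20_000_000, offset=2):
    # Same arithmetic as the 'pop' column: population / 20000000 + 2
    return population / divisor + offset


# Population of Afghanistan in 1964, taken from data.head() above
print(scaled_pop(10474903.0))

# A population of zero still yields the offset
print(scaled_pop(0))
```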

#### Enhancing the plot with some shading

Now that you have the base plot ready, you can enhance it by coloring each circle glyph by region.

Your job is to make a list of the unique regions from the data frame, prepare a ColorMapper, and add it to the circle glyph.

In [67]:
# Make a list of the unique values from the region column: regions_list
regions_list = data.region.unique().tolist()

# Make a color mapper: color_mapper
color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

# Add the color mapper to the circle glyph
plot.circle(x='x', y='y', fill_alpha=0.8, source=source,
            color=dict(field = 'region', transform=color_mapper), legend = 'region')

# Set the legend.location attribute of the plot to 'bottom_left'
plot.legend.location = 'bottom_left'

output_notebook()
show(plot)
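
Conceptually, `CategoricalColorMapper` pairs each entry in `factors` with the palette color at the same position. The toy function below imitates that idea in plain Python. It is not Bokeh's code, the region names other than 'South Asia' and all hex codes are placeholders, and the gray fallback for unlisted values is my assumption about the default behaviour.

```python
def map_colors(values, factors, palette, nan_color='gray'):
    # Pair each factor with the palette entry at the same position
    lookup = dict(zip(factors, palette))
    # Values that are not listed as factors fall back to nan_color
    return [lookup.get(v, nan_color) for v in values]


# Placeholder factors and hex codes, chosen only for illustration
factors = ['South Asia', 'Region B', 'Region C']
palette = ['#111111', '#222222', '#333333']

print(map_colors(['Region C', 'South Asia', 'Elsewhere'], factors, palette))
```

Because the pairing is positional, the order of `regions_list` decides which region gets which color.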



#### Adding a slider to vary the year

Until now, we've been plotting data only for 1970. In this exercise, you'll add a slider to your plot to change the year being plotted. To do this, you'll create an update_plot() function and associate it with a slider to select values between 1970 and 2010.

After you are done, you may have to scroll to the right to view the entire plot.

In [68]:
def make_doc(doc):
    data = pd.read_csv('gapminder_tidy.csv', index_col = 'Year')
    
    source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country'      : data.loc[1970].Country,
    'pop'      : (data.loc[1970].population / 20000000) + 2,
    'region'      : data.loc[1970].region,
    })

    # Save the minimum and maximum values of the fertility column: xmin, xmax
    xmin, xmax = min(data.fertility), max(data.fertility)

    # Save the minimum and maximum values of the life expectancy column: ymin, ymax
    ymin, ymax = min(data.life), max(data.life)

    # Create the figure: plot
    plot = figure(title='Gapminder Data for 1970', plot_height=400, plot_width= 700,
                  x_range=(xmin, xmax), y_range=(ymin, ymax))

    # Add circle glyphs to the plot
    plot.circle(x='x', y='y', fill_alpha = 0.8, source=source)

    # Set the x-axis label
    plot.xaxis.axis_label ='Fertility (children per woman)'

    # Set the y-axis label
    plot.yaxis.axis_label = 'Life Expectancy (years)'

    # Make a list of the unique values from the region column: regions_list
    regions_list = data.region.unique().tolist()

    # Make a color mapper: color_mapper
    color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

    # Add the color mapper to the circle glyph
    plot.circle(x='x', y='y', fill_alpha=0.8, source=source,
                color=dict(field = 'region', transform=color_mapper), legend = 'region')

    # Set the legend.location attribute of the plot to 'bottom_left'
    plot.legend.location = 'bottom_left'

    # Define the callback function: update_plot
    def update_plot(attr, old, new):
        # Read the year from the slider, build new_data, and assign it to source.data
        yr = slider.value
        new_data = {
            'x'       : data.loc[yr].fertility,
            'y'       : data.loc[yr].life,
            'country' : data.loc[yr].Country,
            'pop'     : (data.loc[yr].population / 20000000) + 2,
            'region'  : data.loc[yr].region,
        }
        source.data = new_data
        plot.title.text = 'Gapminder Data for {0}'.format(yr)
        

    # Make a slider object: slider
    slider = Slider(start = 1970, end = 2010, step = 1, value = 1970, title = 'Year')

    # Attach the callback to the 'value' property of slider
    slider.on_change('value', update_plot)

    # Make a row layout of widgetbox(slider) and plot and add it to the current document
    layout = row(widgetbox(slider), plot)

    doc.add_root(layout)

In [69]:
show(make_doc)
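
The core of `update_plot` is "pick the rows for one year and rebuild the columns". The sketch below shows that step without pandas or Bokeh, using a few rows copied from `data.head()`. `columns_for_year` and `title_for_year` are hypothetical helpers for this sketch.

```python
# Rows copied from data.head() above: (Year, Country, fertility, life)
ROWS = [
    (1964, 'Afghanistan', 7.671, 33.639),
    (1965, 'Afghanistan', 7.671, 34.152),
    (1966, 'Afghanistan', 7.671, 34.662),
]


def columns_for_year(rows, yr):
    """Build a dict of equal-length columns for the selected year."""
    selected = [r for r in rows if r[0] == yr]
    return {
        'x': [r[2] for r in selected],
        'y': [r[3] for r in selected],
        'country': [r[1] for r in selected],
    }


def title_for_year(yr):
    # Same format string as the callback uses for plot.title.text
    return 'Gapminder Data for {0}'.format(yr)


new_data = columns_for_year(ROWS, 1965)
print(new_data)
print(title_for_year(1965))
```

Replacing the whole dictionary at once, as the callback does with `source.data = new_data`, keeps all columns the same length.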

#### Adding a hover tool

In this exercise, you'll practice adding a hover tool to drill down into data column values and display more detailed information about each scatter point.

After you're done, experiment with the hover tool and see how it displays the name of the country when your mouse hovers over a point!

In [70]:
def make_doc(doc):
    data = pd.read_csv('gapminder_tidy.csv', index_col = 'Year')
    
    source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country'      : data.loc[1970].Country,
    'pop'      : (data.loc[1970].population / 20000000) + 2,
    'region'      : data.loc[1970].region,
    })

    # Save the minimum and maximum values of the fertility column: xmin, xmax
    xmin, xmax = min(data.fertility), max(data.fertility)

    # Save the minimum and maximum values of the life expectancy column: ymin, ymax
    ymin, ymax = min(data.life), max(data.life)

    # Create the figure: plot
    plot = figure(title='Gapminder Data for 1970', plot_height=400, plot_width= 700,
                  x_range=(xmin, xmax), y_range=(ymin, ymax))

    # Add circle glyphs to the plot
    plot.circle(x='x', y='y', fill_alpha = 0.8, source=source)

    # Set the x-axis label
    plot.xaxis.axis_label ='Fertility (children per woman)'

    # Set the y-axis label
    plot.yaxis.axis_label = 'Life Expectancy (years)'

    # Make a list of the unique values from the region column: regions_list
    regions_list = data.region.unique().tolist()

    # Make a color mapper: color_mapper
    color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

    # Add the color mapper to the circle glyph
    plot.circle(x='x', y='y', fill_alpha=0.8, source=source,
                color=dict(field = 'region', transform=color_mapper), legend = 'region')

    # Set the legend.location attribute of the plot to 'bottom_left'
    plot.legend.location = 'bottom_left'

    # Define the callback function: update_plot
    def update_plot(attr, old, new):
        # Read the year from the slider, build new_data, and assign it to source.data
        yr = slider.value
        new_data = {
            'x'       : data.loc[yr].fertility,
            'y'       : data.loc[yr].life,
            'country' : data.loc[yr].Country,
            'pop'     : (data.loc[yr].population / 20000000) + 2,
            'region'  : data.loc[yr].region,
        }
        source.data = new_data
        plot.title.text = 'Gapminder Data for {0}'.format(yr)
        

    # Make a slider object: slider
    slider = Slider(start = 1970, end = 2010, step = 1, value = 1970, title = 'Year')

    # Attach the callback to the 'value' property of slider
    slider.on_change('value', update_plot)

    # Create a HoverTool: hover
    hover = HoverTool(tooltips = [('Country', '@country')])

    # Add the HoverTool to the plot
    plot.add_tools(hover)
    # Create layout: layout
    layout = row(widgetbox(slider), plot)

    doc.add_root(layout)

In [71]:
show(make_doc)
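
The tooltip `('Country', '@country')` pairs a label with a template in which `@country` refers to a column of the data source. The toy renderer below imitates that substitution for a single point. It is a simplified sketch, not Bokeh's formatter, and `render_tooltip` is a hypothetical name.

```python
import re


def render_tooltip(tooltips, columns, index):
    """Fill '@name' placeholders with the value at `index` in each column."""
    lines = []
    for label, template in tooltips:
        text = re.sub(r'@(\w+)',
                      lambda m: str(columns[m.group(1)][index]),
                      template)
        lines.append('{0}: {1}'.format(label, text))
    return lines


# Values copied from the first row of data.head() above
columns = {'country': ['Afghanistan'], 'x': [7.671], 'y': [33.639]}
tooltips = [('Country', '@country'), ('Fertility', '@x')]

print(render_tooltip(tooltips, columns, 0))
```

Adding a second tuple such as `('Fertility', '@x')` to the real `HoverTool` would follow the same pattern, since `'x'` is a column of `source`.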

#### Adding dropdowns to the app

As a final step in enhancing your application, in this exercise you'll add dropdowns for interactively selecting different data features. In combination with the hover tool you added in the previous exercise, as well as the slider to change the year, you'll have a powerful app that allows you to interactively and quickly extract some great insights from the dataset!

In [72]:
def make_doc(doc):
    data = pd.read_csv('gapminder_tidy.csv', index_col = 'Year')
    
    source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country'      : data.loc[1970].Country,
    'pop'      : (data.loc[1970].population / 20000000) + 2,
    'region'      : data.loc[1970].region,
    })

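    # Save the minimum and maximum values of the fertility column: xmin, xmax
    xmin, xmax = min(data.fertility), max(data.fertility)

    # Save the minimum and maximum values of the life expectancy column: ymin, ymax
    ymin, ymax = min(data.life), max(data.life)

    # Create the figure: plot
    plot = figure(title='Gapminder Data for 1970', plot_height=400, plot_width= 700,
                  x_range=(xmin, xmax), y_range=(ymin, ymax))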

    # Add circle glyphs to the plot
    plot.circle(x='x', y='y', fill_alpha = 0.8, source=source)

    # Make a list of the unique values from the region column: regions_list
    regions_list = data.region.unique().tolist()

    # Make a color mapper: color_mapper
    color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

    # Add the color mapper to the circle glyph
    plot.circle(x='x', y='y', fill_alpha=0.8, source=source,
                color=dict(field = 'region', transform=color_mapper), legend = 'region')

    # Set the legend.location attribute of the plot to 'bottom_left'
    plot.legend.location = 'bottom_left'

    # Define the callback function: update_plot
    def update_plot(attr, old, new):
        # Read the current value off the slider and 2 dropdowns: yr, x, y
        yr = slider.value
        x = x_select.value
        y = y_select.value
        # Label axes of plot
        plot.xaxis.axis_label = x
        plot.yaxis.axis_label = y
        # Set new_data
        new_data = {
            'x'       : data.loc[yr][x],
            'y'       : data.loc[yr][y],
            'country' : data.loc[yr].Country,
            'pop'     : (data.loc[yr].population / 20000000) + 2,
            'region'  : data.loc[yr].region,
        }
        
        # Set the range of all axes
        plot.x_range.start = min(data[x])
        plot.x_range.end = max(data[x])
        plot.y_range.start = min(data[y])
        plot.y_range.end = max(data[y])
        
        source.data = new_data
        plot.title.text = 'Gapminder Data for {0}'.format(yr)
        

    # Make a slider object: slider
    slider = Slider(start = 1970, end = 2010, step = 1, value = 1970, title = 'Year')

    # Attach the callback to the 'value' property of slider
    slider.on_change('value', update_plot)

    # Create a HoverTool: hover
    hover = HoverTool(tooltips = [('Country', '@country')])

    # Add the HoverTool to the plot
    plot.add_tools(hover)
    
    # Create a dropdown Select widget for the x data: x_select
    x_select = Select(
        options=['fertility', 'life', 'child_mortality', 'gdp'],
        value='fertility',
        title='x-axis data'
    )

    # Attach the update_plot callback to the 'value' property of x_select
    x_select.on_change('value', update_plot)

    # Create a dropdown Select widget for the y data: y_select
    y_select = Select(
        options=['fertility', 'life', 'child_mortality', 'gdp'],
        value='life',
        title='y-axis data'
    )

    # Attach the update_plot callback to the 'value' property of y_select
    y_select.on_change('value', update_plot)
    
    
    # Create layout: layout
    layout = row(widgetbox(slider, x_select, y_select), plot)

    doc.add_root(layout)

In [73]:
show(make_doc)
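
When a dropdown changes, the callback above resets each axis range to the minimum and maximum of the chosen column over all years, so the axes stay fixed while the slider moves. The sketch below shows that lookup in plain Python on a few rows copied from `data.head()`. `axis_range` is a hypothetical helper name.

```python
# Rows copied from data.head() above, keyed by column name
RECORDS = [
    {'fertility': 7.671, 'life': 33.639, 'child_mortality': 339.7},
    {'fertility': 7.671, 'life': 34.152, 'child_mortality': 334.1},
    {'fertility': 7.671, 'life': 34.662, 'child_mortality': 328.7},
]


def axis_range(records, column):
    # Collect the chosen column, then take its extremes
    values = [r[column] for r in records]
    return min(values), max(values)


print(axis_range(RECORDS, 'life'))
print(axis_range(RECORDS, 'child_mortality'))
```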

