# 1. Interactive Data Visualization with Bokeh

## A simple scatter plot
In this example, you're going to make a scatter plot of female literacy vs fertility using data from the European Environmental Agency. This dataset highlights that countries with low female literacy have high birthrates. The x-axis data has been loaded for you as fertility and the y-axis data has been loaded as female_literacy.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

After you have created the figure, in this exercise and the ones to follow, play around with it! Explore the different options available to you on the tab to the right, such as "Pan", "Box Zoom", and "Wheel Zoom". You can click on the question mark sign for more details on any of these tools.

Note: You may have to scroll down to view the lower portion of the figure.

In [3]:
import pandas as pd
df = pd.read_csv('./literacy_birth_rate.csv')
fertility = df.fertility
female_literacy = df['female literacy']

In [8]:
len(fertility)

182

In [9]:
len(female_literacy)

182

In [11]:
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_file and show from bokeh.io
from bokeh.io import output_file, show

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)',
           y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility, female_literacy)

# Call the output_file() function and specify the name of the file
output_file('fert_lit.html')

# Display the plot
show(p)


## A scatter plot with different shapes
By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In this exercise, you will plot female literacy vs fertility for two different regions, Africa and Latin America. Each set of x and y data has been loaded separately for you as fertility_africa, female_literacy_africa, fertility_latinamerica, and female_literacy_latinamerica.

Your job is to plot the Latin America data with the circle() glyph, and the Africa data with the x() glyph.

figure has already been imported for you from bokeh.plotting.

In [19]:
df.head()
df_lat = df.loc[df.Continent == 'LAT',:]
fertility_latinamerica = df_lat.fertility
female_literacy_latinamerica = df_lat["female literacy"]
df_af = df.loc[df.Continent == 'AF',:]
fertility_africa = df_af.fertility
female_literacy_africa = df_af["female literacy"]
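The split by region above relies on pandas boolean masks. As a rough plain-Python sketch of the same idea, the snippet below groups rows by region code so that each region ends up with its own x and y lists. The rows are hypothetical values made up for illustration, and `split_by_region` is a hypothetical helper name, not part of pandas or Bokeh.

```python
from collections import defaultdict

# Hypothetical (continent, fertility, female literacy) rows; not real data.
rows = [
    ("AF", 5.2, 48.8),
    ("LAT", 2.3, 90.2),
    ("AF", 4.1, 66.3),
    ("LAT", 1.9, 97.6),
]


def split_by_region(rows):
    """Group x (fertility) and y (literacy) values by region code."""
    xs = defaultdict(list)
    ys = defaultdict(list)
    for region, fertility, literacy in rows:
        xs[region].append(fertility)
        ys[region].append(literacy)
    return dict(xs), dict(ys)


xs, ys = split_by_region(rows)
# xs["AF"] and ys["AF"] could then go to one glyph call,
# xs["LAT"] and ys["LAT"] to another.
```

Each region's pair of lists has the same length, which is what a glyph function expects of its x and y arguments.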

In [20]:
# Create the figure: p
p = figure(x_axis_label='fertility', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica)

# Add an x glyph to the figure p
p.x(fertility_africa, female_literacy_africa)

# Specify the name of the file
output_file('fert_lit_separate.html')

# Display the plot
show(p)

## Customizing your scatter plots
The three most important arguments to customize scatter glyphs are color, size, and alpha. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 CSS color names. Size values are given in screen units (pixels), so a glyph keeps the same on-screen size when you zoom in or out.
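The color formats are different spellings of the same thing. As a minimal sketch in plain Python, the helper below converts an RGB tuple into the hexadecimal string for the same color. `rgb_to_hex` is a hypothetical name introduced for illustration and is not part of Bokeh.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers to a '#rrggbb' string."""
    r, g, b = rgb
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value {channel} is outside 0-255")
    return f"#{r:02x}{g:02x}{b:02x}"


# The CSS names 'red' and 'blue' correspond to these RGB tuples.
red_hex = rgb_to_hex((255, 0, 0))
blue_hex = rgb_to_hex((0, 0, 255))
print(red_hex, blue_hex)  # prints "#ff0000 #0000ff"
```

Passing 'red', '#ff0000', or (255, 0, 0) as a color should therefore produce the same result.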

The alpha parameter controls transparency. It takes in floating point numbers between 0.0, meaning completely transparent, and 1.0, meaning completely opaque.
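To see why a value such as 0.8 helps with overlapping points, the sketch below computes the combined opacity when several identical glyphs are drawn on top of each other. It assumes standard "source-over" compositing, the usual default for browser canvases. `stacked_opacity` is a hypothetical helper, not a Bokeh function.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` identical glyphs drawn on top of each other.

    Assumes source-over compositing: each layer lets a fraction (1 - alpha)
    of what lies beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** layers


for n in (1, 2, 3):
    print(n, round(stacked_opacity(0.8, n), 3))
```

Under this assumption, regions where markers overlap appear darker than isolated markers, which makes dense clusters visible.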

In this exercise, you'll plot female literacy vs fertility for Africa and Latin America as red and blue circle glyphs, respectively.

In [23]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica, color='blue', size=10, alpha=0.8)

# Add a red circle glyph to the figure p
p.circle(fertility_africa, female_literacy_africa, color='red', size=10, alpha=0.8)

# Specify the name of the file
output_file('fert_lit_separate_colors.html')

# Display the plot
show(p)


## Lines
We can draw lines on Bokeh plots with the line() glyph function.

In this exercise, you'll plot the daily adjusted closing price of Apple Inc.'s stock (AAPL) from 2000 to 2013.

The data points are provided for you as lists. date is a list of datetime objects to plot on the x-axis and price is a list of prices to plot on the y-axis.

Since we are plotting dates on the x-axis, you must add x_axis_type='datetime' when creating the figure object.

In [49]:
from datetime import datetime
def convert_datetime(date):
    return datetime.strptime(date, '%Y-%m-%d')


In [50]:
df = pd.read_csv('./aapl.csv')
df.head()
date = df.date.map(convert_datetime)
price = df.close
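A line glyph joins points in the order they are supplied, so it can be worth confirming that the parsed dates are in ascending order before plotting. The following is a minimal sketch with hypothetical sample strings. `parse_dates` and `is_ascending` are illustrative helper names, not part of Bokeh or pandas.

```python
from datetime import datetime


def parse_dates(strings, fmt="%Y-%m-%d"):
    """Parse date strings into datetime objects using the given format."""
    return [datetime.strptime(s, fmt) for s in strings]


def is_ascending(values):
    """Return True if each value is no earlier than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical sample in the same 'YYYY-MM-DD' format as the date column.
sample = ["2000-03-01", "2000-03-02", "2000-03-03"]
dates = parse_dates(sample)
print(is_ascending(dates))
```

If the check fails, sorting the dates together with their prices before plotting avoids a line that doubles back on itself.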

In [51]:
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(date, price)

# Specify the name of the output file and show the result
output_file('line.html')
show(p)

## Lines and markers
Lines and markers can be combined by plotting them separately using the same data points.

In this exercise, you'll plot a line and circle glyph for the AAPL stock prices. Further, you'll adjust the fill_color keyword argument of the circle() glyph function while leaving the line_color at the default value.

The date and price lists are provided. The Bokeh figure object p that you created in the previous exercise has also been provided.

In [52]:
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Create a figure with x_axis_type='datetime': p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis
p.line(date, price)

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(date, price, fill_color='white', size=4)

# Specify the name of the output file and show the result
output_file('line.html')
show(p)

## Patches
In Bokeh, extended geometrical shapes can be plotted by using the patches() glyph function. The patches glyph takes as input a list-of-lists collection of numeric values specifying the vertices in x and y directions of each distinct patch to plot.

In this exercise, you will plot the state borders of Arizona, Colorado, New Mexico and Utah. The latitude and longitude vertices for each state have been prepared as lists.

Your job is to plot longitude on the x-axis and latitude on the y-axis. The figure object has been created for you as p.
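The list-of-lists layout can be sketched with two small shapes. Everything in the snippet is hypothetical: the two squares stand in for state outlines, and `check_patches` is an illustrative helper that only verifies that every patch has matching x and y vertex counts.

```python
# Hypothetical vertices: two unit squares side by side.
square_a_x = [0, 1, 1, 0]
square_a_y = [0, 0, 1, 1]
square_b_x = [2, 3, 3, 2]
square_b_y = [0, 0, 1, 1]

# One inner list per patch; the outer lists line up index by index.
xs = [square_a_x, square_b_x]
ys = [square_a_y, square_b_y]


def check_patches(xs, ys):
    """Return the number of patches, or raise ValueError if the shapes disagree."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must hold the same number of patches")
    for i, (px, py) in enumerate(zip(xs, ys)):
        if len(px) != len(py):
            raise ValueError(f"patch {i} has {len(px)} x values but {len(py)} y values")
    return len(xs)


n_patches = check_patches(xs, ys)
print(n_patches)  # prints 2
```

Lists shaped like `xs` and `ys` are what the patches() glyph function takes as its first two arguments.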

In [54]:
az_lons = [-114.63332,
 -114.63349,
 -114.63423,
 -114.60899,
 -114.63064,
 -114.57354,
 -114.58031,
 -114.61121,
 -114.6768,
 -114.66076,
 -114.65449,
 -114.68702,
 -114.69704,
 -114.70415,
 -114.67489,
 -114.70883,
 -114.74365,
 -114.73513,
 -114.6729,
 -114.51122,
 -114.32346,
 -114.22646,
 -114.1139,
 -114.04404,
 -114.04338,
 -114.04736,
 -114.05014,
 -114.0506,
 -114.0506,
 -114.05052,
 -113.94557,
 -113.86852,
 -113.62465,
 -113.4727,
 -113.32097,
 -113.17698,
 -113.02079,
 -112.99281,
 -112.96895,
 -112.75086,
 -112.48455,
 -112.32985,
 -111.99142,
 -111.58602,
 -111.39598,
 -111.2523,
 -111.03957,
 -110.73783,
 -110.54945,
 -110.272,
 -110.13851,
 -109.83491,
 -109.43568,
 -109.26993,
 -109.04538,
 -109.04522,
 -109.04522,
 -109.04531,
 -109.04544,
 -109.04547,
 -109.04579,
 -109.04575,
 -109.04601,
 -109.04578,
 -109.04606,
 -109.04621,
 -109.04636,
 -109.04662,
 -109.04644,
 -109.04598,
 -109.04603,
 -109.04633,
 -109.04692,
 -109.047,
 -109.04691,
 -109.0474,
 -109.04762,
 -109.04764,
 -109.04811,
 -109.04905,
 -109.04911,
 -109.05004,
 -109.0587,
 -109.25062,
 -109.30069,
 -109.33682,
 -109.38186,
 -109.45105,
 -109.5287,
 -109.62562,
 -109.79302,
 -109.97582,
 -110.20503,
 -110.49327,
 -110.56918,
 -110.65415,
 -110.77828,
 -110.87564,
 -110.93778,
 -110.94286,
 -110.97553,
 -111.12565,
 -111.24082,
 -111.29191,
 -111.32558,
 -111.3574,
 -111.38483,
 -111.44337,
 -111.47861,
 -111.49725,
 -111.53479,
 -111.56975,
 -111.62412,
 -111.66,
 -111.73365,
 -111.79498,
 -111.9182,
 -111.97172,
 -111.99115,
 -112.02937,
 -112.09379,
 -112.13972,
 -112.15906,
 -112.21295,
 -112.32605,
 -112.39932,
 -112.43603,
 -112.52208,
 -112.57141,
 -112.63294,
 -112.67695,
 -112.72356,
 -112.75567,
 -112.8055,
 -112.83423,
 -112.8711,
 -112.90863,
 -113.20884,
 -113.2279,
 -113.30314,
 -113.61086,
 -113.78489,
 -113.90756,
 -113.97121,
 -114.11135,
 -114.20719,
 -114.25559,
 -114.28755,
 -114.38472,
 -114.61337,
 -114.77804,
 -114.81394,
 -114.81518,
 -114.80524,
 -114.81037,
 -114.81335,
 -114.80551,
 -114.80529,
 -114.79282,
 -114.79206,
 -114.79555,
 -114.81362,
 -114.80894,
 -114.80404,
 -114.80093,
 -114.80804,
 -114.80891,
 -114.80192,
 -114.79518,
 -114.7819,
 -114.77309,
 -114.76427,
 -114.74805,
 -114.74638,
 -114.74505,
 -114.74456,
 -114.74203,
 -114.74,
 -114.73874,
 -114.73062,
 -114.72924,
 -114.72377,
 -114.71994,
 -114.71919,
 -114.69096,
 -114.63501,
 -114.58576,
 -114.46563,
 -114.48131,
 -114.62973,
 -114.68157,
 -114.72123,
 -114.61185,
 -114.5402,
 -114.49649,
 -114.52801,
 -114.51318,
 -114.49813,
 -114.4355,
 -114.35765,
 -114.26017,
 -114.14737,
 -114.29195,
 -114.38169,
 -114.44166,
 -114.48236,
 -114.56953,
 -114.63305]

az_lats = [34.87057,
 35.00186,
 35.00332,
 35.07971,
 35.11791,
 35.14231,
 35.21811,
 35.37012,
 35.49125,
 35.5417,
 35.60517,
 35.66942,
 35.73579,
 35.81412,
 35.86436,
 35.9167,
 35.98542,
 36.05493,
 36.11546,
 36.15058,
 36.10119,
 36.01461,
 36.09833,
 36.21464,
 36.37619,
 36.60322,
 36.817,
 36.99997,
 37.0004,
 37.0004,
 36.99998,
 36.99998,
 36.99998,
 36.99998,
 36.99998,
 36.99998,
 37.00022,
 37.00017,
 37.00012,
 37.00048,
 37.00094,
 37.00105,
 37.00097,
 37.00166,
 37.00147,
 37.00102,
 37.00247,
 37.00325,
 37.00383,
 36.99828,
 36.99845,
 36.99831,
 36.9991,
 36.99926,
 36.99908,
 36.99908,
 36.99897,
 36.8531,
 36.70384,
 36.54513,
 36.41637,
 36.29154,
 36.18724,
 36.03128,
 35.93088,
 35.81044,
 35.65092,
 35.45859,
 35.30697,
 34.91388,
 34.71264,
 34.44558,
 34.08446,
 33.71335,
 33.3477,
 33.07165,
 32.70386,
 32.40743,
 32.1771,
 31.87069,
 31.63698,
 31.3325,
 31.33252,
 31.3338,
 31.33396,
 31.334,
 31.33394,
 31.33406,
 31.33393,
 31.33408,
 31.33399,
 31.33341,
 31.33363,
 31.33296,
 31.33299,
 31.33305,
 31.33363,
 31.33328,
 31.33281,
 31.33283,
 31.33257,
 31.34898,
 31.38586,
 31.40231,
 31.41305,
 31.42333,
 31.43196,
 31.45068,
 31.46195,
 31.4678,
 31.47995,
 31.49099,
 31.50825,
 31.51945,
 31.54305,
 31.56227,
 31.6012,
 31.61823,
 31.62425,
 31.63623,
 31.65645,
 31.67094,
 31.67701,
 31.69377,
 31.72891,
 31.75165,
 31.76301,
 31.78954,
 31.80473,
 31.82357,
 31.83702,
 31.8513,
 31.86132,
 31.87666,
 31.88514,
 31.89671,
 31.90787,
 31.99917,
 32.0054,
 32.02905,
 32.12566,
 32.17992,
 32.21797,
 32.2376,
 32.28088,
 32.31044,
 32.32538,
 32.33509,
 32.36468,
 32.43408,
 32.48373,
 32.49526,
 32.50602,
 32.50999,
 32.51839,
 32.52419,
 32.53277,
 32.54351,
 32.55396,
 32.56772,
 32.56625,
 32.56133,
 32.57093,
 32.58137,
 32.5955,
 32.60317,
 32.61295,
 32.6238,
 32.62325,
 32.6247,
 32.63705,
 32.65006,
 32.66489,
 32.66985,
 32.67414,
 32.6785,
 32.68221,
 32.68517,
 32.68732,
 32.6986,
 32.70545,
 32.71192,
 32.71829,
 32.71943,
 32.73946,
 32.73137,
 32.73487,
 32.87408,
 32.97206,
 33.03255,
 33.23376,
 33.39691,
 33.47131,
 33.58709,
 33.6969,
 33.84446,
 33.91285,
 33.96372,
 34.04257,
 34.12866,
 34.17212,
 34.31087,
 34.41527,
 34.47903,
 34.64288,
 34.71453,
 34.79181,
 34.86997]

In [57]:
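# Create a fresh figure so the patches are not drawn on the earlier stock-price plot: p
p = figure(x_axis_label='longitude', y_axis_label='latitude')

# Create a list of longitude lists, one per state: x
# (only az_lons is defined in this notebook; co_lons, nm_lons and ut_lons would be added here)
x = [az_lons]

# Create a list of latitude lists in the same order: y
y = [az_lats]

# Add patches to figure p with line_color='white' for x and y
p.patches(x, y, line_color='white')

# Specify the name of the output file and show the result
output_file('four_corners.html')
show(p)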