## Basic plotting with Bokeh

### A simple scatter plot

In this example, you'll make a scatter plot of female literacy vs fertility using data from the European Environmental Agency. The dataset shows that countries with low female literacy tend to have high birth rates. The x-axis data is loaded as `fertility` and the y-axis data as `female_literacy`.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

In [1]:
import pandas as pd
df = pd.read_csv('literacy_birth_rate.csv')
fertility = df['fertility']
female_literacy = df['female literacy']

# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

output_notebook()

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility,female_literacy)

# Output is rendered inline because output_notebook() was already called above

# Display the plot
show(p)


### A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In this exercise, you will plot female literacy vs fertility for two different regions, Africa and Latin America. Each set of x and y data has been loaded separately for you as fertility_africa, female_literacy_africa, fertility_latinamerica, and female_literacy_latinamerica.

Your job is to plot the Latin America data with the circle() glyph, and the Africa data with the x() glyph.

In [13]:
latinamerica = df.loc[df['Continent'] == 'LAT']
fertility_latinamerica=latinamerica['fertility'] 
female_literacy_latinamerica=latinamerica['female literacy']

africa = df.loc[df['Continent'] == 'AF']
fertility_africa=africa['fertility'] 
female_literacy_africa=africa['female literacy']

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica)

# Add an x glyph to the figure p
p.x(fertility_africa, female_literacy_africa)

# To write a standalone HTML file instead, import output_file from bokeh.io and uncomment:
# output_file('fert_lit_separate.html')

# Display the plot
show(p)

### Customizing your scatter plots

The three most important arguments to customize scatter glyphs are `color`, `size`, and `alpha`. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 CSS color names. Size values are supplied in screen units (pixels), so markers keep the same on-screen size when you zoom.

The `alpha` parameter controls transparency. It takes in floating point numbers between 0.0, meaning completely transparent, and 1.0, meaning completely opaque.
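
The three color formats can be tried side by side. The following is a minimal sketch, assuming a Bokeh version that provides the general `scatter()` method; the figure name `demo` and the coordinates are made up for illustration and are not part of the literacy dataset.

```python
from bokeh.plotting import figure

# Hypothetical demo figure; the coordinates are placeholders for illustration.
demo = figure(x_axis_label='x', y_axis_label='y')

# The same circle marker drawn three times, once per supported color format.
hex_glyph = demo.scatter([1], [1], marker='circle', color='#1f77b4', size=20, alpha=0.8)
rgb_glyph = demo.scatter([2], [1], marker='circle', color=(255, 0, 0), size=20, alpha=0.5)
css_glyph = demo.scatter([3], [1], marker='circle', color='firebrick', size=20, alpha=0.2)

# Each call adds one glyph renderer to the figure.
print(len(demo.renderers))
```

Calling `show(demo)` afterwards should display three circles of equal size, with opacity decreasing from left to right.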

In this exercise, you'll plot female literacy vs fertility for Africa and Latin America as red and blue circle glyphs, respectively.

In [14]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica, color='blue', size=10, alpha=0.8)

# Add a red circle glyph to the figure p
p.circle(fertility_africa, female_literacy_africa, color='red', size=10, alpha=0.8)


# Specify the name of the file
#output_file('fert_lit_separate_colors.html')


# Display the plot
show(p)


### Lines

We can draw lines on Bokeh plots with the `line()` glyph function.

In this exercise, you'll plot the daily adjusted closing price of Apple Inc.'s stock (AAPL) from 2000 to 2013.

The data is read from `aapl.csv` into the DataFrame `apple`. The `date` column holds datetime values to plot on the x-axis and the `adj_close` column holds the adjusted closing prices to plot on the y-axis.

Since we are plotting dates on the x-axis, you must add `x_axis_type='datetime'` when creating the figure object.

In [22]:
apple=pd.read_csv('aapl.csv', parse_dates=['date'])
display(apple.head(2))

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type="datetime", x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(apple.date, apple.adj_close)

# Specify the name of the output file and show the result
#output_file('line.html')
show(p)



| Unnamed: 0.1 | Unnamed: 0 | adj_close | close | date | high | low | open | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 31.68 | 130.31 | 2000-03-01 | 132.06 | 118.5 | 118.56 | 38478000 |
| 1 | 1 | 29.66 | 122.0 | 2000-03-02 | 127.94 | 120.69 | 127.0 | 11136800 |


### Lines and markers

Lines and markers can be combined by plotting them separately using the same data points.

In this exercise, you'll plot a line and circle glyph for the AAPL stock prices. Further, you'll adjust the `fill_color` keyword argument of the `circle()` glyph function while leaving the `line_color` at the default value.

In [29]:
apple_2000=apple[(apple.date > '2000-03-01') & (apple.date <= '2000-09-01')]
display(apple_2000.head(2))

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type="datetime", x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(apple_2000.date, apple_2000.adj_close)

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(apple_2000.date, apple_2000.adj_close, fill_color='white', size=4)

show(p)

| Unnamed: 0.1 | Unnamed: 0 | adj_close | close | date | high | low | open | volume |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 29.66 | 122.0 | 2000-03-02 | 127.94 | 120.69 | 127.0 | 11136800 |
| 2 | 2 | 31.12 | 128.0 | 2000-03-03 | 128.23 | 120.0 | 124.87 | 11565200 |


### Plotting data from NumPy arrays

In this exercise, you'll generate NumPy arrays using `np.linspace()` and `np.cos()` and plot them using the circle glyph.

`np.linspace()` is a function that returns an array of evenly spaced numbers over a specified interval. For example, `np.linspace(0, 10, 5)` returns an array of 5 evenly spaced samples calculated over the interval `[0, 10]`. `np.cos(x)` calculates the element-wise cosine of some array x.

For more information on NumPy functions, you can refer to the NumPy User Guide and NumPy Reference.
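
As a quick check of the `np.linspace(0, 10, 5)` example described above, the short sketch below prints the samples and applies `np.cos()` to them. The variable names `samples` and `cosines` are chosen here for illustration only.

```python
import numpy as np

# Five evenly spaced samples over the closed interval [0, 10].
samples = np.linspace(0, 10, 5)
print(samples.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]

# np.cos is applied element by element, so the result has the same shape.
cosines = np.cos(samples)
print(cosines.shape)  # (5,)
```

Both endpoints are included by default, which is why the spacing is 2.5 rather than 2.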

In [30]:
# Import numpy as np
import numpy as np

p=figure(x_axis_label='x', y_axis_label='y')

# Create array using np.linspace: x
x = np.linspace(0, 5, 100)

# Create array using np.cos: y
y = np.cos(x)

# Add circles at x and y
p.circle(x, y)

# Specify the name of the output file and show the result
#output_file('numpy.html')
show(p)

### Plotting data from Pandas DataFrames

You can create Bokeh plots from Pandas DataFrames by passing column selections to the glyph functions.

Bokeh can plot floating point numbers, integers, and datetime data types. In this example, you will read a CSV file containing information on 392 automobiles manufactured in the US, Europe and Asia from 1970 to 1982.

The CSV file is provided for you as 'auto-mpg.csv'.

Your job is to plot miles-per-gallon (`mpg`) vs horsepower (`hp`) by passing Pandas column selections into the `p.circle()` function. Additionally, each glyph will be colored according to values in the color column.

In [33]:
# Read in the CSV file: df
df = pd.read_csv('auto-mpg.csv')

# Create the figure: p
p = figure(x_axis_label='HP', y_axis_label='MPG')

# Plot mpg vs hp by color
p.circle(df['hp'], df['mpg'], color=df['color'], size=10)

# Specify the name of the output file and show the result
# output_file('auto-df.html')
show(p)


### The Bokeh ColumnDataSource

In [37]:
# Import the ColumnDataSource class from bokeh.plotting
from bokeh.plotting import ColumnDataSource

sprint = pd.read_csv('sprint.csv', parse_dates=['Year'])

p = figure(x_axis_type="datetime", x_axis_label='Year', y_axis_label='Time')

# Create a ColumnDataSource from the sprint DataFrame: source
source = ColumnDataSource(sprint)

# Add circle glyphs to the figure p
p.circle("Year", "Time", source=source, color='color', size=8)

# Specify the name of the output file and show the result
# output_file('sprint.html')
show(p)
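
A ColumnDataSource maps column names to sequences of values, which is why the glyph call above can refer to `"Year"`, `"Time"`, and `'color'` by name. The following is a minimal sketch of that mapping, using a tiny made-up DataFrame; the names `toy` and `toy_source` and all values are placeholders, not rows from `sprint.csv`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical stand-in for the sprint data; values are illustrative only.
toy = pd.DataFrame({
    'Year': pd.to_datetime(['2000-01-01', '2004-01-01']),
    'Time': [10.0, 9.9],
    'color': ['royalblue', 'royalblue'],
})

toy_source = ColumnDataSource(toy)

# Every DataFrame column becomes a named entry in the source,
# and the DataFrame index is added as an extra column.
print(sorted(toy_source.column_names))

# The underlying values are available through the data dictionary.
print(list(toy_source.data['Time']))
```

Because several glyphs can share one source, this is also what makes linked selections like the ones in the next exercise possible.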


### Selection and non-selection glyphs

In this exercise, you're going to add the `box_select` tool to a figure and change the selected and non-selected circle glyph properties so that selected glyphs are red and non-selected glyphs are transparent blue.

You'll use the ColumnDataSource object of the Olympic Sprint dataset you made in the last exercise. 

After you have created the figure, be sure to experiment with the Box Select tool you added! As in previous exercises, you may have to scroll down to view the lower portion of the figure.

In [40]:
# Create a figure p with an x-axis label of 'Year', y-axis label of 'Time', and the 'box_select' tool.
# To add the 'box_select' tool, you have to specify the keyword argument tools='box_select' inside the figure() function.
p = figure(x_axis_type="datetime", x_axis_label='Year', y_axis_label='Time', tools='box_select')

# Add in circle glyphs with p.circle() such that the selected glyphs are red and non-selected glyphs are transparent blue. 
# This can be done by specifying 'red' as the argument to selection_color and 0.1 to nonselection_alpha. 
# Remember to also pass in the arguments for the x ('Year'), y ('Time'), and source parameters of p.circle().
p.circle("Year", "Time", source=source, color='color', size=4, selection_color='red', nonselection_alpha=0.1, nonselection_fill_color='blue')

show(p)

### Hover glyphs

In this exercise, you're going to plot the blood glucose levels for an unknown patient. The blood glucose levels were recorded every 5 minutes on October 7th starting at 3 minutes past midnight.  
Your job is to add a circle glyph that will appear red when the mouse is hovered near the data points. You will also add a customized hover tool object to the plot.

When you're done, try out the hover tool you just created. Notice how the points near your mouse pointer turn red.

In [41]:
glucose = pd.read_csv('glucose.csv', parse_dates=['datetime'])

p = figure(x_axis_type="datetime", x_axis_label='Time of day', y_axis_label='Blood glucose(mg/dL)', title="Blood Glucose")

# import the HoverTool
from bokeh.models import HoverTool

# Add a circle glyph to the existing figure p for x and y with a size of 10, 
# fill_color of 'grey', alpha of 0.1, line_color of None, 
# hover_fill_color of 'firebrick', hover_alpha of 0.5, and hover_line_color of 'white'
p.circle(glucose.datetime, glucose.glucose, size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white')

# Use the HoverTool() function to create a HoverTool called hover with tooltips=None and mode='vline'
hover = HoverTool(tooltips=None, mode='vline')

# Add the hover tool to the figure p
p.add_tools(hover)

show(p)
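
The hover tool above is created with `tooltips=None`, so it only restyles the glyphs. As a variation, `HoverTool` also accepts a list of `(label, value)` pairs that are shown in a pop-up. The sketch below assumes a small made-up data source; the names `readings` and `demo` and the values are hypothetical and not taken from `glucose.csv`.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical readings for illustration; not taken from glucose.csv.
readings = ColumnDataSource(data={'minute': [0, 5, 10], 'glucose': [95, 98, 102]})

# tools='' starts the figure without the default toolbar tools.
demo = figure(x_axis_label='Minutes', y_axis_label='Blood glucose (mg/dL)', tools='')
demo.scatter('minute', 'glucose', source=readings, size=10)

# Each tooltip is a (label, value) pair; '@name' looks up a column in the source.
hover = HoverTool(tooltips=[('minute', '@minute'), ('glucose', '@glucose')], mode='vline')
demo.add_tools(hover)
```

With `mode='vline'`, the tooltip should appear whenever the pointer is vertically aligned with a point, not only when it is directly over it.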

### Colormapping

The final glyph customization we'll practice is using the CategoricalColorMapper to color each glyph by a categorical property.

Here, you're going to use the automobile dataset to plot miles-per-gallon vs weight and color each circle glyph by the region where the automobile was manufactured.

The `origin` column will be used in the ColorMapper to color automobiles manufactured in the US as blue, Europe as red and Asia as green.

In [44]:
#Import CategoricalColorMapper from bokeh.models
from bokeh.models import CategoricalColorMapper

p = figure(x_axis_label='weight(lbs)', y_axis_label='miles-per-gallon')

# Convert df to a ColumnDataSource: source
source = ColumnDataSource(df)

# Make a CategoricalColorMapper object: color_mapper with the CategoricalColorMapper() function. 
# It has two parameters here: factors and palette.
color_mapper = CategoricalColorMapper(factors=['Europe', 'Asia', 'US'],
                                      palette=['red', 'green', 'blue'])

# Add a circle glyph to the figure p to plot 'mpg' (on the y-axis) vs 'weight' (on the x-axis).
# Pass source to the source parameter and 'origin' to legend_field
# (the older legend keyword is deprecated in newer Bokeh releases).
# For the color parameter, use dict(field='origin', transform=color_mapper)
p.circle('weight', 'mpg', source=source, legend_field='origin',
         color=dict(field='origin', transform=color_mapper))

# Specify the name of the output file and show the result
# (output_file is not imported in this notebook, so the call stays commented out)
# output_file('colormap.html')
show(p)
