# Intro to Visualization using Bokeh

## What is Bokeh?
* Interactive visualization, controls, and tools
* Versatile and high-level graphics
* High-level statistical charts
* Streaming, dynamic, large data
* For the browser, with or without a server
* No JavaScript

## What you will learn
* Basic plotting with bokeh.plotting
* Layouts, interactions, and annotations
* Statistical charting with bokeh.charts
* Interactive data applications in the browser
* Case Study: A Gapminder explorer

## What are Glyphs?
* Visual shapes
  * circles, squares, triangles
  * rectangles, lines, wedges
* With properties attached to data
  * coordinates (x,y)
  * size, color, transparency

## Markers
* asterisk()
* circle()
* circle_cross()
* circle_x()
* cross()
* diamond()
* diamond_cross()
* inverted_triangle()
* square()
* square_cross()
* square_x()
* triangle()
* x()
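
As a minimal sketch of how these marker types look side by side, the snippet below draws one of each on a single figure. It uses `p.scatter()` with the `marker` argument, which newer Bokeh releases appear to favor over the per-marker methods; the positions and the size of 15 are arbitrary choices for illustration.

```python
from bokeh.plotting import figure

# Marker names matching the glyph methods listed above
marker_names = ["asterisk", "circle", "circle_cross", "circle_x", "cross",
                "diamond", "diamond_cross", "inverted_triangle", "square",
                "square_cross", "square_x", "triangle", "x"]

p = figure(title="Marker types")

# Draw one marker per x position; p.scatter() picks the shape from `marker`
for i, name in enumerate(marker_names):
    p.scatter([i], [0], marker=name, size=15)

# from bokeh.io import show; show(p)  # uncomment to render the figure
```

Each call adds a separate renderer to the same figure, which is the same overlay mechanism the later exercises rely on.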

In [34]:
import pandas as pd
import numpy as np

from bokeh.plotting import figure

from bokeh.io import output_file, show, output_notebook

from bokeh.plotting import ColumnDataSource


## A simple scatter plot

In this example, you're going to make a scatter plot of female literacy vs fertility using data from the [European Environmental Agency](http://www.eea.europa.eu/data-and-maps/figures/correlation-between-fertility-and-female-education). This dataset highlights that countries with low female literacy have high birthrates. The x-axis data has been loaded for you as fertility and the y-axis data has been loaded as female_literacy.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

**To display Bokeh plots inline in a classic Jupyter notebook, use the output_notebook() function from bokeh.io instead of (or in addition to) the output_file() function.**
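
For comparison, here is a rough sketch of the file-based route: `output_file()` configures where the HTML goes and `save()` writes it without opening a browser. The file name `example_plot.html`, the temporary directory, and the plotted numbers are assumptions made for this illustration only.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

p = figure(x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3], [4, 6, 5])

# Hypothetical output location: a temporary directory, to avoid
# cluttering the working directory
html_path = os.path.join(tempfile.mkdtemp(), "example_plot.html")

# output_file() only configures the destination; save() writes the file
output_file(html_path, title="Example plot")
saved_path = save(p)
```

In a notebook you would typically call `output_notebook()` once and then `show(p)` instead.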

In [11]:
# load the data
gap = pd.read_csv('/home/sousae/projects/dataCamp/data/gapminder_tidy.csv')

In [12]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p (1983 data only; the y-values come from the gapminder 'life' column)
p.circle(gap[gap.Year == 1983].fertility, gap[gap.Year == 1983].life)

# Render Bokeh output inline in the notebook
output_notebook()

# Display the plot
show(p)

## A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In this exercise, you will plot female literacy vs fertility for two different years, 1983 and 1993. Your job is to plot the 1983 data with the circle() glyph, and the 1993 data with the x() glyph.

In [14]:
# Create the figure: p
p = figure(x_axis_label='fertility', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(gap[gap.Year == 1983].fertility, gap[gap.Year == 1983].life)

# Add an x glyph to the figure p
p.x(gap[gap.Year == 1993].fertility, gap[gap.Year == 1993].life)

# Display the plot
show(p)

## Customizing your scatter plots
The three most important arguments to customize scatter glyphs are color, size, and alpha. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 [CSS color names](http://www.colors.commutercreative.com/grid/). Size values are given in screen units (pixels), so a marker keeps the same on-screen size regardless of the data ranges.

The alpha parameter controls transparency. It takes in floating point numbers between 0.0, meaning completely transparent, and 1.0, meaning completely opaque.
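
The three color formats and a few alpha values can be sketched as follows. The coordinates are made up, and `p.scatter()` stands in here for the marker methods used elsewhere in this notebook.

```python
from bokeh.plotting import figure

p = figure(x_axis_label="x", y_axis_label="y")

# Hexadecimal string, mostly opaque
p.scatter([1, 2, 3], [1, 2, 3], size=10, alpha=0.8, color="#1f77b4")

# RGB tuple with components between 0 and 255, half transparent
p.scatter([1, 2, 3], [2, 3, 4], size=10, alpha=0.5, color=(255, 0, 0))

# CSS color name, mostly transparent
p.scatter([1, 2, 3], [3, 4, 5], size=10, alpha=0.2, color="green")

# from bokeh.io import show; show(p)  # uncomment to render the figure
```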

In [15]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(gap[gap.Year == 1983].fertility, gap[gap.Year == 1983].life, size=10, alpha=0.8, color='blue')

# Add a red circle glyph to the figure p
p.circle(gap[gap.Year == 1993].fertility, gap[gap.Year == 1993].life, size=10, alpha=0.8, color='red')

# Specify the name of the file
#output_file('fert_lit_separate_colors.html')

# Display the plot
show(p)

## Lines

We can draw lines on Bokeh plots with the line() glyph function.

In this exercise, you'll plot the daily closing price of Apple Inc.'s stock (AAPL) from 2000 to 2013.

Since we are plotting dates on the x-axis, you must add x_axis_type='datetime' when creating the figure object.
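
A small sketch of what `x_axis_type='datetime'` changes, using five made-up prices rather than the real AAPL file: the figure gets a datetime-aware x-axis, so tick labels are formatted as dates.

```python
import pandas as pd
from bokeh.models import DatetimeAxis
from bokeh.plotting import figure

# Made-up prices for five consecutive days (not real AAPL data)
dates = pd.to_datetime(["2000-03-01", "2000-03-02", "2000-03-03",
                        "2000-03-04", "2000-03-05"])
prices = [10.0, 10.5, 10.2, 10.8, 11.1]

p = figure(x_axis_type="datetime", x_axis_label="Date", y_axis_label="US Dollars")
p.line(dates, prices)

# The x-axis is now a DatetimeAxis instead of the default linear axis
print(isinstance(p.xaxis[0], DatetimeAxis))
```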

In [19]:
aapl = pd.read_csv('/home/sousae/projects/dataCamp/data/aapl.csv')

In [20]:
aapl.head()

| Unnamed: 0.1 | Unnamed: 0 | adj_close | close | date | high | low | open | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 31.68 | 130.31 | 2000-03-01 | 132.06 | 118.5 | 118.56 | 38478000 |
| 1 | 1 | 29.66 | 122.0 | 2000-03-02 | 127.94 | 120.69 | 127.0 | 11136800 |
| 2 | 2 | 31.12 | 128.0 | 2000-03-03 | 128.23 | 120.0 | 124.87 | 11565200 |
| 3 | 3 | 30.56 | 125.69 | 2000-03-06 | 129.13 | 125.0 | 126.0 | 7520000 |
| 4 | 4 | 29.87 | 122.87 | 2000-03-07 | 127.44 | 121.12 | 126.44 | 9767600 |


In [28]:
# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type="datetime", x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(pd.to_datetime(aapl.date), aapl.close.tolist())

# Show the result
show(p)

## Lines and markers

Lines and markers can be combined by plotting them separately using the same data points.

In this exercise, you'll plot a line and circle glyph for the AAPL stock prices. Further, you'll adjust the fill_color keyword argument of the circle() glyph function while leaving the line_color at the default value.

In [29]:
# Create a figure with x_axis_type='datetime': p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis
p.line(pd.to_datetime(aapl.date), aapl.close.tolist())

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(pd.to_datetime(aapl.date), aapl.close.tolist(), fill_color='white', size=4)

# Specify the name of the output file and show the result
#output_file('line.html')
show(p)

## Patches

In Bokeh, extended geometrical shapes can be plotted with the patches() glyph function. It takes as input list-of-lists collections of numeric values specifying the x and y vertices of each distinct patch to plot. To draw a single shape, the patch() glyph function takes one flat list of x values and one flat list of y values instead.

In [30]:
p = figure(plot_width=400, plot_height=400)

# Add a single patch renderer with an alpha and a line width
p.patch([1, 2, 3, 4, 5], [6, 7, 8, 7, 3], alpha=0.5, line_width=2)

show(p)
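
The list-of-lists input that `patches()` expects can be sketched like this. The two shapes and their colors are made up for illustration: each inner list holds the vertices of one patch, and per-patch properties such as color can be given as a list of the same length.

```python
from bokeh.models import Patches
from bokeh.plotting import figure

# Each inner list holds the vertices of one patch:
# a square (4 vertices) and a triangle (3 vertices)
xs = [[1, 2, 2, 1], [3, 4, 5]]
ys = [[1, 1, 2, 2], [1, 3, 1]]

p = figure()

# One call draws both shapes; color is given per patch
renderer = p.patches(xs, ys, color=["navy", "firebrick"], alpha=0.5, line_width=2)

# from bokeh.io import show; show(p)  # uncomment to render the figure
```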

## Plotting data from Pandas DataFrames

You can create Bokeh plots from Pandas DataFrames by passing column selections to the glyph functions.

Bokeh can plot floating point numbers, integers, and datetime data types. In this example, you will read a CSV file containing information on 392 automobiles manufactured in the US, Europe and Asia from 1970 to 1982.

Your job is to plot miles-per-gallon (mpg) vs horsepower (hp) by passing Pandas column selections into the p.circle() function. Additionally, each glyph will be colored according to values in the color column.
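
As a minimal sketch of passing column selections, the snippet below builds a three-row DataFrame by hand instead of reading the CSV. The values are made up; only the column names `hp`, `mpg`, and `color` mirror the description above.

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up rows; the column names hp, mpg and color mirror the exercise text
cars = pd.DataFrame({
    "hp": [88, 193, 60],
    "mpg": [18.0, 9.0, 36.1],
    "color": ["blue", "blue", "red"],
})

p = figure(x_axis_label="HP", y_axis_label="MPG")

# Each argument is a pandas Series; Bokeh stores them in a data source
renderer = p.scatter(cars["hp"], cars["mpg"], color=cars["color"], size=10)

# from bokeh.io import show; show(p)  # uncomment to render the figure
```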

In [39]:
# Read in the CSV file: auto
auto = pd.read_csv('/home/sousae/projects/dataCamp/data/auto-mpg.csv')

# Create the figure: p
p = figure(x_axis_label='HP', y_axis_label='MPG')

# Plot mpg vs hp by color
p.circle(auto['hp'], auto['mpg'], color=auto['color'], size=10)

# Specify the name of the output file and show the result
#output_file('auto-df.html')
show(p)

## The Bokeh ColumnDataSource

The ColumnDataSource is a table-like data object that maps string column names to sequences (columns) of data. It is the central and most common data structure in Bokeh.

You can create a ColumnDataSource object directly from a Pandas DataFrame by passing the DataFrame to the class initializer.
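
To make the mapping from column names to sequences concrete, here is a small sketch with toy rows (not taken from sprint.csv). Passing the DataFrame to the initializer turns each DataFrame column into a named column of the source, and the DataFrame index is carried along as an extra column.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Toy rows for illustration (not taken from sprint.csv)
df = pd.DataFrame({
    "Year": [2000, 2004, 2008],
    "Time": [10.0, 9.9, 9.8],
    "color": ["goldenrod", "silver", "saddlebrown"],
})

source = ColumnDataSource(df)

# Each DataFrame column becomes a named column; the index is added as well
print(source.column_names)
print(list(source.data["Time"]))
```

Glyph functions can then refer to columns by name, for example `p.circle('Year', 'Time', source=source)`.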

In this exercise, we'll read in a data set containing all Olympic medals awarded in the 100 meter sprint from 1896 to 2012. A color column has been added indicating the CSS color name we wish to use in the plot for every data point.

Your job is to import the ColumnDataSource class, create a new ColumnDataSource object from the DataFrame sprint, and plot circle glyphs with 'Year' on the x-axis and 'Time' on the y-axis. Color each glyph by the color column.

In [40]:
sprint = pd.read_csv('/home/sousae/projects/dataCamp/data/sprint.csv')

In [41]:
# Create a ColumnDataSource from sprint: source
source = ColumnDataSource(sprint)

# Create the figure: p
p = figure(x_axis_label='Year', y_axis_label='Time')

# Add circle glyphs to the figure p
p.circle('Year', 'Time', source=source, color='color', size=8)

# Specify the name of the output file and show the result
#output_file('sprint.html')
show(p)

## Selection and non-selection glyphs

In this exercise, you're going to add the box_select tool to a figure and change the selected and non-selected circle glyph properties so that selected glyphs are red and non-selected glyphs are transparent blue.

In [42]:
# Create a figure with the "box_select" tool: p
p = figure(x_axis_label='Year', y_axis_label='Time', tools='box_select')

# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle('Year','Time',source=source,
         selection_color='red',
         nonselection_alpha=0.1)

# Specify the name of the output file and show the result
# output_file('selection_glyph.html')
show(p)

## Hover glyphs

Now let's practice using and customizing the hover tool.

In this exercise, you'll add the name of the sprinter, the country he is from, and the time he ran to earn the medal.

Your job is to add a circle glyph that will appear red when the mouse is hovered near the data points. You will also add a customized hover tool object to the plot.

In [45]:
# import the HoverTool
from bokeh.models import HoverTool

# Add circle glyphs to figure p
p.circle('Year','Time',source=source, size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white')

# Create a HoverTool: hover
hover = HoverTool(tooltips=None, mode='vline')

# Add the hover tool to the figure p
p.add_tools(hover)

# Specify the name of the output file and show the result
#output_file('hover_glyph.html')
show(p)
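
The cell above passes `tooltips=None`, so hovering only changes the glyph appearance. A hedged sketch of a hover tool that also displays fields from the data source follows; the column names `Name`, `Country`, `Year`, and `Time` and the rows themselves are assumptions for illustration, not the real contents of sprint.csv.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Toy rows; the column names Name, Country, Year and Time are assumptions
source = ColumnDataSource(data={
    "Name": ["Runner A", "Runner B"],
    "Country": ["AAA", "BBB"],
    "Year": [2000, 2004],
    "Time": [10.0, 9.9],
})

p = figure(x_axis_label="Year", y_axis_label="Time")
p.scatter("Year", "Time", source=source, size=10,
          hover_fill_color="firebrick")

# Each tooltip row is a (label, "@column") pair looked up in the data source
hover = HoverTool(tooltips=[("Name", "@Name"),
                            ("Country", "@Country"),
                            ("Time", "@Time")])
p.add_tools(hover)

# from bokeh.io import show; show(p)  # uncomment to render the figure
```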

## Colormapping

The final glyph customization we'll practice is using the CategoricalColorMapper to color each glyph by a categorical property.

Here, you're going to use the automobile dataset to plot miles-per-gallon vs weight and color each circle glyph by the region where the automobile was manufactured.

The origin column will be used in the ColorMapper to color automobiles manufactured in the US as blue, Europe as red and Asia as green.

In [47]:
# Import CategoricalColorMapper from bokeh.models
from bokeh.models import CategoricalColorMapper

# Convert auto to a ColumnDataSource: source
source = ColumnDataSource(auto)

# Make a CategoricalColorMapper object: color_mapper
color_mapper = CategoricalColorMapper(factors=['Europe', 'Asia', 'US'],
                                      palette=['red', 'green', 'blue'])
# Create the figure: p
p = figure(x_axis_label='Weight', y_axis_label='MPG')

# Add a circle glyph to the figure p
p.circle('weight', 'mpg', source=source,
         color=dict(field='origin', transform=color_mapper),
         legend='origin')

# Specify the name of the output file and show the result
#output_file('colormap.html')
show(p)
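
Conceptually, the mapper pairs each factor with the palette entry at the same position. The plain-Python sketch below illustrates that lookup only; it is not Bokeh's implementation, and the helper name `color_for` and its gray fallback for unknown values are choices made for this illustration.

```python
# Plain-Python illustration of the lookup, not Bokeh's implementation
factors = ["Europe", "Asia", "US"]
palette = ["red", "green", "blue"]

# Factors and palette entries are matched by position
lookup = dict(zip(factors, palette))


def color_for(origin, nan_color="gray"):
    """Return the palette color for a factor, or a fallback for unknown values."""
    return lookup.get(origin, nan_color)


print(color_for("US"))      # blue
print(color_for("Europe"))  # red
```

Reordering `factors` without reordering `palette` would therefore change which region gets which color.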