# Interactive Data Visualization with Bokeh

Bokeh is an interactive data visualization library for Python (and other languages!) that targets modern web browsers for presentation. It can create versatile, data-driven graphics, and connect the full power of the entire Python data-science stack to rich, interactive visualizations.

http://bokeh.pydata.org/en/latest/


### What are glyphs?

In Bokeh, **glyphs** are the visual shapes, such as circles, lines, and rectangles, that represent the data. The visual properties of these glyphs, such as position or color, can be assigned single values, for example x=10 or fill_color='red'.

What other kinds of values can glyph properties be set to in normal usage?
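In normal usage, glyph properties can also be set to sequences (lists, NumPy arrays, or pandas columns) that hold one value per data point, or to the name of a column in a ColumnDataSource. A minimal sketch, assuming Bokeh 2.x or 3.x is installed. It uses p.scatter(), Bokeh's general marker method, and the coordinates and sizes are arbitrary illustration values:

```python
from bokeh.plotting import figure

p = figure(x_axis_label="x", y_axis_label="y")

# Scalar values apply to every marker; sequences supply one value per data point.
# The x, y, and size values are arbitrary illustration data.
r = p.scatter(
    x=[1, 2, 3],
    y=[4, 5, 6],
    size=[10, 20, 30],  # a sequence: one size per marker
    fill_color="red",   # a single value: the same color for all markers
)

# Bokeh stores the sequences as columns of an automatically created data source
print(sorted(r.data_source.data.keys()))
```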

## A simple scatter plot

In this example, you're going to make a scatter plot of **female literacy vs fertility** using data from the [European Environment Agency](http://www.eea.europa.eu/data-and-maps/figures/correlation-between-fertility-and-female-education/trend01-5g-soer2010-xls/at_download/file). The dataset suggests that countries with lower female literacy tend to have higher birth rates. In this notebook the data is read from fertility.csv into the DataFrame **data**: the x-axis values are the **fertility** column and the y-axis values are the **female literacy** column.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

After you have created the figure, in this exercise and the ones that follow, explore it interactively with the tools in the toolbar to the right of the plot, such as "Pan", "Box Zoom", and "Wheel Zoom". Click the question mark icon for more details on any of these tools.

Note: You may have to scroll down to view the lower portion of the figure.


### Instructions

- Import the **figure** function from bokeh.plotting, and the **output_notebook** and **show** functions from bokeh.io.
- Create the figure **p** with **figure()**, passing the **x_axis_label** and **y_axis_label** parameters. These set the axis titles, so they take strings such as 'fertility (children per woman)' and 'female_literacy (% population)', not the data itself.
- Add a circle glyph to the figure **p** with **p.circle()**, passing, in order, the **x-axis** data (data["fertility"]) and the **y-axis** data (data["female literacy"]).
- Call **output_notebook()** so the figure is rendered inline in the notebook.
- Display the figure by passing **p** to **show()**.



```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import pandas as pd
import pandas as pd

# Read the fertility.csv data into a DataFrame: data
data = pd.read_csv("fertility.csv", encoding="latin2")

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(data["fertility"], data["female literacy"])

# Render plots inline in the notebook
output_notebook()

# Display the plot
show(p)
```
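output_notebook() renders the figure inline. Outside a notebook, the same figure can be written to a standalone HTML file with bokeh.io.save (or with output_file() followed by show(), which also opens a browser). A minimal sketch, assuming Bokeh 2.x or 3.x; the three rows are copied from the fertility data printed below, p.scatter() stands in for p.circle(), and the file name fertility_scatter.html is a hypothetical choice:

```python
import os
import tempfile

import pandas as pd
from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# Three rows copied from the fertility data printed in this notebook
data = pd.DataFrame({
    "Country": ["Chine", "Inde", "USA"],
    "female literacy": [90.5, 50.8, 99.0],
    "fertility": [1.769, 2.682, 2.077],
})

p = figure(x_axis_label="fertility (children per woman)",
           y_axis_label="female_literacy (% population)")
p.scatter(data["fertility"], data["female literacy"], size=8)

# Write a standalone HTML file that loads BokehJS from the Bokeh CDN
# (hypothetical file name, placed in the system temp directory)
out_path = os.path.join(tempfile.gettempdir(), "fertility_scatter.html")
saved_path = save(p, filename=out_path, resources=CDN,
                  title="Female literacy vs fertility")
print(saved_path)
```

Opening the saved file in any browser shows the same interactive plot, with no notebook required.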

```python
print(data)
```

```
                               Country Continent  female literacy  fertility  \
0                                Chine       ASI             90.5      1.769   
1                                 Inde       ASI             50.8      2.682   
2                                  USA       NAM             99.0      2.077   
3                            Indonesie       ASI             88.8      2.132   
4                               Bresil       LAT             90.2      1.827   
5                             Pakistan       ASI             40.0      3.872   
6                           Bangladesh       ASI             49.8      2.288   
7                              Nig_ria        AF             48.8      5.173   
8                 F_d_ration de Russie       EUR             99.4      1.393   
9                                Japan       ASI             99.0      1.262   
10                             Mexique       LAT             91.5      2.156   
11                         Philippines  
```

## A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In this exercise, you will plot female literacy vs fertility for two different regions, Africa and Latin America. The x and y values for each region are selected from **data** by continent code and stored as **fertility_africa**, **female_literacy_africa**, **fertility_latinamerica**, and **female_literacy_latinamerica**.

Your job is to plot the Latin America data with the **circle()** glyph, and the Africa data with the **x()** glyph.

### Instructions

- Select the fertility and female literacy values for the Latin America (LAT) and Africa (AF) rows of **data**.
- Create the figure **p** with the **figure()** function, passing the x_axis_label and y_axis_label parameters.
- Add a circle glyph to the figure p using the function **p.circle()** where the inputs are the x and y data from Latin America: fertility_latinamerica and female_literacy_latinamerica.
- Add an x glyph to the figure p using the function **p.x()** where the inputs are the x and y data from Africa: fertility_africa and female_literacy_africa.



```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import pandas as pd
import pandas as pd

# Read the fertility.csv data into a DataFrame: data
data = pd.read_csv("fertility.csv", encoding="latin2")

# Select the x and y values for each continent with a boolean mask

# Latin America
is_latinamerica = data["Continent"] == "LAT"
fertility_latinamerica = data.loc[is_latinamerica, "fertility"]
female_literacy_latinamerica = data.loc[is_latinamerica, "female literacy"]

# Africa
is_africa = data["Continent"] == "AF"
fertility_africa = data.loc[is_africa, "fertility"]
female_literacy_africa = data.loc[is_africa, "female literacy"]

# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph for Latin America
p.circle(fertility_latinamerica, female_literacy_latinamerica)

# Add an x glyph for Africa
p.x(fertility_africa, female_literacy_africa)

# Render plots inline in the notebook
output_notebook()

# Display the plot
show(p)
```
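The filter-then-overlay pattern above generalizes to any number of groups with pandas groupby: one glyph call per group, all on the same figure. A minimal sketch, assuming Bokeh 2.x or 3.x; it uses p.scatter() with a marker argument in place of separate p.circle() and p.x() calls, the rows are copied from the fertility data printed earlier, and the continent-to-marker mapping is a hypothetical choice:

```python
import pandas as pd
from bokeh.plotting import figure

# A handful of rows copied from the fertility data printed earlier
data = pd.DataFrame({
    "Continent": ["ASI", "NAM", "LAT", "LAT", "AF"],
    "female literacy": [90.5, 99.0, 90.2, 91.5, 48.8],
    "fertility": [1.769, 2.077, 1.827, 2.156, 5.173],
})

# Hypothetical mapping from continent code to marker shape
markers = {"ASI": "square", "NAM": "triangle", "LAT": "circle", "AF": "x"}

p = figure(x_axis_label="fertility (children per woman)",
           y_axis_label="female_literacy (% population)")

# One glyph call per continent overlays every group on the same figure
for continent, group in data.groupby("Continent"):
    p.scatter(group["fertility"], group["female literacy"],
              marker=markers[continent], size=8)

print(len(p.renderers))
```

Each call adds one renderer, so the figure ends up with one renderer per continent.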


## Customizing your scatter plots

The three most important arguments to customize scatter glyphs are **color**, **size**, and **alpha**. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 [CSS color names](http://www.colors.commutercreative.com/grid/). Size values are specified in screen units (pixels), so they do not change as you zoom.

The alpha parameter controls transparency. It takes in floating point numbers between 0.0, meaning completely transparent, and 1.0, meaning completely opaque.
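The three ways of writing a color, together with size and alpha, can be compared side by side. A minimal sketch, assuming Bokeh 2.x or 3.x; it uses p.scatter() (which draws circle markers by default), and the coordinates and the color name firebrick are arbitrary illustration values:

```python
from bokeh.plotting import figure

p = figure(x_axis_label="x", y_axis_label="y")

xs, ys = [1, 2, 3], [3, 1, 2]  # arbitrary illustration data

# The same kind of color can be written three ways
hex_r = p.scatter(xs, ys, color="#0000bb", size=10, alpha=0.8)    # hexadecimal string
rgb_r = p.scatter(xs, ys, color=(255, 0, 0), size=10, alpha=0.8)  # RGB tuple, 0-255
css_r = p.scatter(xs, ys, color="firebrick", size=10, alpha=0.8)  # CSS color name

# `color` and `alpha` are shorthands that set both the fill and the line properties
print(hex_r.glyph.fill_color, hex_r.glyph.line_color, hex_r.glyph.fill_alpha)
```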

In this exercise, you'll plot female literacy vs fertility for Africa and Latin America as red and blue circle glyphs, respectively.

### Instructions

- Using the Latin America data (fertility_latinamerica and female_literacy_latinamerica), add a **blue** circle glyph of size=10 and alpha=0.8 to the figure p. To do this, you will need to specify the color, size and alpha keyword arguments inside p.circle().
- Using the Africa data (fertility_africa and female_literacy_africa), add a **red** circle glyph of size=10 and alpha=0.8 to the figure p.

```python
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p (alpha must lie between 0.0 and 1.0)
p.circle(fertility_latinamerica, female_literacy_latinamerica, color="#0000BB", size=10, alpha=0.8)

# Add a red circle glyph to the figure p
p.circle(fertility_africa, female_literacy_africa, color="#ff0000", size=10, alpha=0.8)

# Render plots inline in the notebook
output_notebook()

# Display the plot
show(p)
```

## Lines

We can draw lines on Bokeh plots with the **line()** glyph function.

In this exercise, you'll plot the daily **closing price** of Apple Inc.'s stock (AAPL) from January 2000 onward.

The prices are downloaded with pandas-datareader into a pandas Series, **data**: **data.index** holds the dates (datetime values) to plot on the x-axis and **data** holds the prices to plot on the y-axis.

Since we are plotting dates on the x-axis, you must add x_axis_type='datetime' when creating the figure object.
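A minimal, offline sketch of the same idea, assuming Bokeh 2.x or 3.x: it builds a short pandas Series indexed by dates (the prices are hypothetical placeholders, not real AAPL data) and draws it with line() on a datetime x-axis:

```python
import pandas as pd
from bokeh.models import DatetimeAxis
from bokeh.plotting import figure

# Hypothetical placeholder prices on five consecutive days (not real market data)
dates = pd.date_range("2017-01-02", periods=5, freq="D")
close = pd.Series([1.0, 1.5, 1.2, 1.8, 2.0], index=dates)

# x_axis_type="datetime" tells Bokeh to format the x ticks as dates
p = figure(x_axis_type="datetime", x_axis_label="Date", y_axis_label="US Dollars")
p.line(close.index, close, line_width=2)

print(type(p.xaxis[0]).__name__)  # DatetimeAxis
```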



```python
# Import pandas as pd
import pandas as pd

# pandas-datareader downloads financial data (pip install pandas-datareader);
# import its data module as `web` so the name `data` stays free for the prices
import pandas_datareader.data as web

# Module for the start and end dates
import datetime

# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Look at stock prices from January 1, 2000 up to today
start = datetime.datetime(2000, 1, 1)
end = datetime.date.today()

# Get Apple stock data; Apple's ticker symbol is AAPL.
# Arguments: ticker symbol, data source ("google" for Google Finance,
# "yahoo" for Yahoo! Finance), start date, end date.
# The "google" source has since been deprecated in pandas-datareader.
apple = web.DataReader("AAPL", "google", start, end)
apple_yahoo = web.DataReader("AAPL", "yahoo", start, end)

# Closing prices from each source
data = apple["Close"]
data2 = apple_yahoo["Close"]

# Create a figure with x_axis_type="datetime": p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis, one line per source
p.line(data.index, data, color="#0000BB")
p.line(data2.index, data2, color="#ff0000")

# Render plots inline in the notebook
output_notebook()

# Display the plot
show(p)
```

```python
print(apple)
```

```
              Open    High     Low   Close     Volume
Date                                                 
2001-04-23    1.74    1.79    1.71    1.73  132970600
2001-04-24    1.74    1.77    1.68    1.72   91861000
2001-04-25    1.73    1.78    1.68    1.77   81222400
2001-04-26    1.80    1.86    1.75    1.76  197206800
2001-04-27    1.80    1.88    1.77    1.87  114543800
2001-04-30    1.90    1.94    1.78    1.82  121542400
2001-05-01    1.81    1.89    1.80    1.85  102646600
2001-05-02    1.88    1.91    1.84    1.90   90347600
2001-05-03    1.88    1.88    1.77    1.78   73033800
2001-05-04    1.73    1.85    1.71    1.84   67862200
2001-05-07    1.83    1.84    1.77    1.78   67835600
2001-05-08    1.81    1.82    1.71    1.76   77050400
2001-05-09    1.72    1.75    1.69    1.71   80414600
2001-05-10    1.73    1.75    1.63    1.64   70197400
2001-05-11    1.64    1.68    1.63    1.63   48841800
2001-05-14    1.64    1.69    1.62    1.66   76511400
2001-05-15    1.67    1.74  
```

## Lines and markers

Lines and markers can be combined by plotting them separately using the same data points.
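The same layering works with any data. A minimal sketch, assuming Bokeh 2.x or 3.x, with arbitrary illustration values; it uses p.scatter() (circle markers by default) for the markers. Bokeh draws renderers in the order they are added, so adding the markers after the line keeps their white fill on top of it:

```python
from bokeh.plotting import figure

xs = [1, 2, 3, 4, 5]
ys = [2.0, 3.5, 3.0, 4.5, 4.0]  # arbitrary illustration data

p = figure(x_axis_label="x", y_axis_label="y")

# Draw the connecting line first ...
p.line(xs, ys)
# ... then white-filled markers at the same points, drawn on top of the line
p.scatter(xs, ys, fill_color="white", size=8)

print([type(r.glyph).__name__ for r in p.renderers])  # ['Line', 'Scatter']
```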

```python
# Create a figure with x_axis_type='datetime'
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis
p.line(data.index, data)

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(data.index, data, fill_color="white", size=4)

# Render plots inline in the notebook and show the graphic
output_notebook()
show(p)
```

## Plotting data from NumPy arrays

In the previous exercises, you made plots from sequences of values (pandas columns here; plain lists work the same way) and saw that Bokeh can plot both numbers and datetime objects.

In this exercise, you'll generate **NumPy** arrays using **np.linspace()** and **np.cos()** and plot them using the circle glyph.

**np.linspace()** returns an array of evenly spaced numbers over a specified interval. For example, np.linspace(0, 10, 5) returns 5 evenly spaced samples over the interval [0, 10], namely 0, 2.5, 5, 7.5, and 10. **np.cos(x)** calculates the element-wise cosine of an array x.
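The np.linspace() example can be checked directly. With the endpoint included (np.linspace's default), the spacing between samples is (stop - start) / (num - 1):

```python
import numpy as np

# 5 evenly spaced samples on [0, 10]; the step is (10 - 0) / (5 - 1) = 2.5
samples = np.linspace(0, 10, 5)
print(samples)  # [ 0.   2.5  5.   7.5 10. ]

# Element-wise cosine: cos(0) = 1 and cos(pi) = -1
print(np.cos(np.array([0.0, np.pi])))  # [ 1. -1.]
```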

```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import numpy as np
import numpy as np

# Create a figure
p = figure(x_axis_label='X', y_axis_label='Y')

# Create array using np.linspace: x (100 points from 0 to 5)
x = np.linspace(0, 5, 100)

# Create array using np.cos: y
y = np.cos(x)

# Add circles at x and y
p.circle(x, y)

# Render plots inline in the notebook and show the result
output_notebook()
show(p)
```


## The Bokeh ColumnDataSource 

You can create a **ColumnDataSource** object directly from a pandas DataFrame by passing the DataFrame to the class initializer.

In this exercise, the code below reads in a data set containing all **Olympic medals awarded in the 100 meter sprint from 1896 to 2012**. A color column has been added indicating the **CSS color name** to use in the plot for every data point.
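A minimal sketch of what the initializer produces, assuming Bokeh 2.x or 3.x. The three rows are copied from the Olympic data printed later in this notebook, and p.scatter() stands in for p.circle(). Every DataFrame column becomes a column of the source, Bokeh also adds an "index" column, and glyph arguments given as strings refer to those column names:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Three rows copied from the Olympic 100 m data printed in this notebook
df = pd.DataFrame({
    "Name": ["Usain Bolt", "Yohan Blake", "Justin Gatlin"],
    "Time": [9.63, 9.75, 9.79],
    "Year": [2012, 2012, 2012],
    "color": ["goldenrod", "silver", "saddlebrown"],
})

# Each DataFrame column becomes a column of the source, plus an "index" column
source = ColumnDataSource(df)
print(sorted(source.data.keys()))

p = figure(x_axis_label="Year", y_axis_label="Time")

# "Year", "Time", and "color" are column names looked up in `source`
r = p.scatter("Year", "Time", source=source, color="color", size=8)
```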


```python
# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import pandas as pd
import pandas as pd

# Import the ColumnDataSource class from bokeh.models
from bokeh.models import ColumnDataSource

# Read the olympics.csv data into a DataFrame: data
data = pd.read_csv("olympics.csv")
print(data)

# Create a figure
p = figure(x_axis_label='Year', y_axis_label='Time')

# Create a ColumnDataSource from the DataFrame: source
source = ColumnDataSource(data)

# Add circle glyphs; 'Year', 'Time', and 'color' name columns of source
p.circle('Year', 'Time', size=8, source=source, color='color')

# Render plots inline in the notebook and show the figure
output_notebook()
show(p)
```

```
                  Name Country   Medal   Time  Year        color
0           Usain Bolt     JAM    GOLD   9.63  2012    goldenrod
1          Yohan Blake     JAM  SILVER   9.75  2012       silver
2        Justin Gatlin     USA  BRONZE   9.79  2012  saddlebrown
3           Usain Bolt     JAM    GOLD   9.69  2008    goldenrod
4     Richard Thompson     TRI  SILVER   9.89  2008       silver
5           Walter Dix     USA  BRONZE   9.91  2008  saddlebrown
6        Justin Gatlin     USA    GOLD   9.85  2004    goldenrod
7     Francis Obikwelu     POR  SILVER   9.86  2004       silver
8       Maurice Greene     USA  BRONZE   9.87  2004  saddlebrown
9       Maurice Greene     USA    GOLD   9.87  2000    goldenrod
10          Ato Boldon     TRI  SILVER   9.99  2000       silver
11    Obadele Thompson     BAR  BRONZE  10.04  2000  saddlebrown
12      Donovan Bailey     CAN    GOLD   9.84  1996    goldenrod
13  Frankie Fredericks     NAM  SILVER   9.89  1996       silver
14          Ato Boldon   
```

```python
# Selection and non-selection glyphs

# Create a figure with the "box_select" tool: p
p = figure(x_axis_label='Year', y_axis_label='Time', tools="box_select")

# Add circle glyphs: selected points turn red, non-selected points fade to alpha 0.1
p.circle('Year', 'Time', source=source, selection_color='red', nonselection_alpha=0.1)

# Render plots inline in the notebook and show the figure
output_notebook()
show(p)
```