## A simple scatter plot

In this example, you're going to make a scatter plot of female literacy vs fertility using data from the [European Environment Agency](https://www.eea.europa.eu/data-and-maps/figures/correlation-between-fertility-and-female-education). This dataset highlights that countries with low female literacy tend to have high birth rates. The x-axis data has been loaded for you as `fertility` and the y-axis data has been loaded as `female_literacy`.

Your job is to create a figure, assign x-axis and y-axis labels, and plot female_literacy vs fertility using the circle glyph.

After you have created the figure, in this exercise and the ones to follow, play around with it! Explore the tools available in the toolbar to the right of the plot, such as "Pan", "Box Zoom", and "Wheel Zoom". You can click on the question mark sign for more details on any of these tools.

    # Import figure from bokeh.plotting
    from bokeh.plotting import ____

    # Import output_file and show from bokeh.io
    from bokeh.io import ____, ____

    # Create the figure: p
    p = ____(____='fertility (children per woman)', ____='female_literacy (% population)')

    # Add a circle glyph to the figure p
    p.circle(____, ____)

    # Call the output_file() function and specify the name of the file


    # Display the plot



In [15]:
import pandas as pd 
from bokeh.plotting import figure
from bokeh.io import output_file, show,output_notebook
import numpy as np
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
from bokeh.charts.utils import df_from_json
output_notebook()

In [16]:
data = pd.read_excel("../data/TREND01-5G-educ-fertility-bubbles.xls")
data.head()

Unnamed: 0,Country,Continent,female literacy,fertility,population
0,Chine,ASI,90.5,1.769,1324655000.0
1,Inde,ASI,50.8,2.682,1139965000.0
2,USA,NAM,99.0,2.077,304060000.0
3,Indonésie,ASI,88.8,2.132,227345100.0
4,Brésil,LAT,90.2,1.827,191971500.0


In [17]:
data.tail()

Unnamed: 0,Country,Continent,female literacy,fertility,population
177,Antilles néerlandaises,,96.3,,
178,Iles Caïmanes,,99.0,,
179,Seychelles,,92.3,,
180,Territoires autonomes palestiniens,,90.9,,
181,WORLD,WORLD,77.0,,


In [18]:
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

In [19]:
data.shape

(182, 5)

In [20]:
data.dropna(subset=['fertility', 'population','female literacy'], inplace=True)

In [21]:
data.shape

(162, 5)

In [22]:
df = df_from_json(data)

In [23]:
print(df)

None


In [24]:
data = data.fillna('')

In [25]:
# Convert the two columns to plain Python lists for plotting
fertility = data['fertility'].tolist()
female_lit = data['female literacy'].tolist()

In [26]:
p.circle(fertility,female_lit)

<bokeh.models.renderers.GlyphRenderer at 0x1fae2da60b8>

In [28]:
show(p)
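
The cells above can be condensed into one short script. The sketch below is only an illustration: it uses the five rows printed by `data.head()` instead of the full Excel file, the output file name `fert_lit.html` is an assumption, and it calls `p.scatter(..., marker='circle')` rather than `p.circle()` on the assumption that `scatter()` is available in your Bokeh version.

```python
from bokeh.plotting import figure
from bokeh.io import output_file, show

# The five rows shown by data.head() above (China, India, USA, Indonesia, Brazil)
fertility = [1.769, 2.682, 2.077, 2.132, 1.827]
female_literacy = [90.5, 50.8, 99.0, 88.8, 90.2]

# Create the figure with axis labels
p = figure(x_axis_label='fertility (children per woman)',
           y_axis_label='female_literacy (% population)')

# Draw one circle marker per (x, y) pair
p.scatter(fertility, female_literacy, marker='circle', size=8)

# Hypothetical output file name; nothing is written until show() or save() runs
output_file('fert_lit.html')

# Uncomment to open the plot in a browser
# show(p)
```

With only five points the trend is not visible yet; the full dataset is needed for that.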

***

## A scatter plot with different shapes

By calling multiple glyph functions on the same figure object, we can overlay multiple data sets in the same figure.

In this exercise, you will plot female literacy vs fertility for two different regions, Africa and Latin America. Each set of x and y data has been loaded separately for you as **`fertility_africa`**, **`female_literacy_africa`**, **`fertility_latinamerica`**, and **`female_literacy_latinamerica`**.

Your job is to plot the Latin America data with the `circle()` glyph, and the Africa data with the `x()` glyph.

`figure` has already been imported for you from `bokeh.plotting`.


#### Instructions 
   - Create the figure p with the figure() function. Pass it two keyword arguments: x_axis_label and y_axis_label.
   - Add a circle glyph to the figure p using the function p.circle() where the inputs are the x and y data from Latin America: fertility_latinamerica and female_literacy_latinamerica.
   - Add an x glyph to the figure p using the function p.x() where the inputs are the x and y data from Africa: fertility_africa and female_literacy_africa.
   - The code to create, display, and specify the name of the output file has been written for you, so after adding the x glyph, hit 'Submit Answer' to view the figure.



            # Create the figure: p
            p = ____(____='fertility', ____='female_literacy (% population)')

            # Add a circle glyph to the figure p


            # Add an x glyph to the figure p


            # Specify the name of the file
            output_file('fert_lit_separate.html')

            # Display the plot
            show(p)


In [29]:
data.Continent.unique()

array(['ASI', 'NAM', 'LAT', 'AF', 'EUR', 'OCE'], dtype=object)

In [30]:
p = figure(x_axis_label='fertility', y_axis_label='female_literacy (% population)')

In [31]:
latinamerica = data[data['Continent'] == 'LAT']
latinamerica.shape

(24, 5)

In [32]:
africa = data[data['Continent'] == 'AF']
africa.shape

(49, 5)

In [33]:
# Latin America columns as plain Python lists
lat_fertility = latinamerica['fertility'].tolist()
lat_female_lit = latinamerica['female literacy'].tolist()

In [34]:
# Africa columns as plain Python lists
af_fertility = africa['fertility'].tolist()
af_female_lit = africa['female literacy'].tolist()

In [35]:
p.circle(lat_fertility,lat_female_lit)

<bokeh.models.renderers.GlyphRenderer at 0x1fae494afd0>

In [36]:
p.x(af_fertility,af_female_lit)

<bokeh.models.renderers.GlyphRenderer at 0x1fae49449e8>

In [37]:
show(p)
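
The same overlay can be written with the general `scatter()` method and its `marker` argument, which, as far as I know, newer Bokeh releases recommend over marker-specific methods such as `circle()` and `x()`. This is a minimal sketch under that assumption. The two small groups below are arbitrary placeholder values, not rows from the dataset.

```python
from bokeh.plotting import figure

# Arbitrary placeholder values for two groups (NOT real data)
x_a, y_a = [1.8, 2.4, 3.1], [92.0, 85.0, 78.0]
x_b, y_b = [4.5, 5.2, 6.0], [60.0, 48.0, 35.0]

p = figure(x_axis_label='fertility',
           y_axis_label='female_literacy (% population)')

# Each glyph call adds one more renderer to the same figure,
# so both groups end up overlaid on shared axes
p.scatter(x_a, y_a, marker='circle')
p.scatter(x_b, y_b, marker='x')

# The figure now holds two renderers, one per glyph call
print(len(p.renderers))
```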

***
## Customizing your scatter plots

The three most important arguments to customize scatter glyphs are **color**, **size**, and **alpha**. Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 CSS color names. Size values are supplied in screen space units (pixels), so a glyph keeps the same on-screen size when you zoom in or out.

The **alpha** parameter controls transparency. It takes in floating point numbers between 0.0, meaning completely transparent, and 1.0, meaning completely opaque.
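
To make the color formats concrete, the sketch below shows how an RGB tuple maps to its hexadecimal string. The helper `rgb_to_hex` is a hypothetical function written for this illustration; it is not part of Bokeh. Any of the three spellings of blue could then be passed as the `color` argument of a glyph method.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers to a '#rrggbb' string."""
    for channel in rgb:
        if not 0 <= channel <= 255:
            raise ValueError('each RGB channel must be between 0 and 255')
    r, g, b = rgb
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

# Three ways to describe the same color
blue_as_name = 'blue'                   # CSS color name
blue_as_rgb = (0, 0, 255)               # RGB tuple
blue_as_hex = rgb_to_hex(blue_as_rgb)   # hexadecimal string

print(blue_as_hex)            # '#0000ff'
print(rgb_to_hex((255, 0, 0)))  # '#ff0000', the CSS color 'red'
```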

In this exercise, you'll plot female literacy vs fertility for Africa and Latin America as red and blue circle glyphs, respectively.

#### Instructions

 - Using the Latin America data (**fertility_latinamerica** and **female_literacy_latinamerica**), add a **blue** circle glyph of **size=10** and **alpha=0.8** to the figure **p**. To do this, you will need to specify the **color**, **size** and **alpha** keyword arguments inside **p.circle()**.
 - Using the Africa data (**fertility_africa** and **female_literacy_africa**), add a **red** circle glyph of **size=10** and **alpha=0.8** to the figure **p**.


    # Create the figure: p
    p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

    # Add a blue circle glyph to the figure p
    p.circle(fertility_latinamerica, female_literacy_latinamerica, ____, ____, ____)

    # Add a red circle glyph to the figure p


    # Specify the name of the file
    output_file('fert_lit_separate_colors.html')

    # Display the plot
    show(p)


In [38]:

p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')


In [39]:

p.circle(lat_fertility, lat_female_lit, color='blue', size=10, alpha=0.8)
p.circle(af_fertility , af_female_lit, color='red', size=10, alpha=0.8)


<bokeh.models.renderers.GlyphRenderer at 0x1fae48ec5f8>

In [40]:
show(p)

*** 

## Lines 

We can draw lines on Bokeh plots with the **line()** glyph function.

In this exercise, you'll plot the daily adjusted closing price of Apple Inc.'s stock (AAPL) from 2000 to 2013.

The data points are provided for you as lists. **date** is a list of **datetime objects** to plot on the x-axis and **price** is a list of prices to plot on the y-axis.

Since we are plotting dates on the x-axis, you must add **x_axis_type='datetime'** when creating the figure object.
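
As a quick illustration of what `x_axis_type='datetime'` changes, the sketch below builds a tiny figure from three `datetime` objects. The prices are placeholder numbers chosen for the example, not real AAPL prices.

```python
from datetime import datetime

from bokeh.plotting import figure

# Placeholder values for illustration only (NOT real AAPL prices)
date = [datetime(2000, 3, 1), datetime(2000, 3, 2), datetime(2000, 3, 3)]
price = [10.0, 11.0, 12.0]

# x_axis_type='datetime' makes Bokeh use a date-aware axis for x
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Connect the points with a line glyph
p.line(date, price)

# The x-axis is now a datetime axis rather than a plain linear one
print(type(p.xaxis[0]).__name__)
```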

#### Instructions

  - Import the **figure** function from **bokeh.plotting**.
  - Create a figure **p** using the **figure()** function with **x_axis_type** set to **'datetime'**. The other two parameters are **x_axis_label** and **y_axis_label**.
  - Plot **date** and **price** along the x- and y-axes using **p.line()**.

    # Import figure from bokeh.plotting
    from ____ import ____

    # Create a figure with x_axis_type="datetime": p
    p = ____(____, ____='Date', ____='US Dollars')

    # Plot date along the x axis and price along the y axis


    # Specify the name of the output file and show the result
    output_file('line.html')
    show(p)

In [41]:
from bokeh.sampledata.stocks import AAPL, GOOG, IBM, MSFT

In [45]:
aapl = np.array(AAPL['adj_close'])
aapl_dates = np.array(AAPL['date'], dtype=np.datetime64)


In [56]:
aapl_dates

array(['2000-03-01', '2000-03-02', '2000-03-03', ..., '2013-02-27',
       '2013-02-28', '2013-03-01'], dtype='datetime64[D]')

In [48]:
aapl_price = np.array(AAPL['adj_close'])

In [47]:
AAPL.keys()

dict_keys(['open', 'high', 'date', 'volume', 'close', 'low', 'adj_close'])

In [49]:
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

In [53]:
p.line(aapl_dates, aapl_price,color='red')

<bokeh.models.renderers.GlyphRenderer at 0x1fae5bb2da0>

In [54]:
p.circle(aapl_dates, aapl_price, fill_color='white', size=4)

<bokeh.models.renderers.GlyphRenderer at 0x1fae5bb29e8>

In [55]:
show(p)

***
## Plotting data from NumPy arrays

In the previous exercises, you made plots using data stored in lists. You learned that Bokeh can plot both numbers and datetime objects.

In this exercise, you'll generate NumPy arrays using **np.linspace()** and **np.cos()** and plot them using the circle glyph.

**np.linspace()** is a function that returns an array of evenly spaced numbers over a specified interval. For example, **np.linspace(0, 10, 5)** returns an array of 5 evenly spaced samples calculated over the interval **[0, 10]**. **np.cos(x)** calculates the element-wise cosine of some array **x**.
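
The example from the paragraph above can be checked directly. This short sketch only prints the two arrays; it does not create a plot.

```python
import numpy as np

# 5 evenly spaced samples over the closed interval [0, 10]
x = np.linspace(0, 10, 5)
print(x.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]

# Element-wise cosine: y has the same shape as x
y = np.cos(x)
print(y.shape)     # (5,)
print(y[0])        # 1.0, since cos(0) = 1
```

Note that both endpoints are included, so the spacing is 10 / (5 - 1) = 2.5.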

The figure **p** has been provided for you.

#### Instructions
 - Import **numpy** as **np**.
 - Create an array **x** using **np.linspace()** with 0, 5, and 100 as inputs.
 - Create an array **y** using **np.cos()** with **x** as input.
 - Add circles at **x** and **y** using **p.circle()**.

    # Import numpy as np
    import numpy as np

    # Create array using np.linspace: x
    x = ____

    # Create array using np.cos: y
    y = ____

    # Add circles at x and y


    # Specify the name of the output file and show the result
    output_file('numpy.html')
    show(p)

In [57]:
x = np.linspace(0, 5, 100)
y = np.cos(x)


In [60]:
p = figure( x_axis_label='X', y_axis_label='Y')

In [61]:
p.circle(x,y)

<bokeh.models.renderers.GlyphRenderer at 0x1fae5bda8d0>

In [62]:
show(p)

***
## Plotting data from Pandas DataFrames

You can create Bokeh plots from Pandas DataFrames by passing column selections to the glyph functions.

Bokeh can plot floating point numbers, integers, and datetime data types. In this example, you will read a CSV file containing information on 392 automobiles manufactured in the US, Europe and Asia from 1970 to 1982.

The CSV file is provided for you as **'auto.csv'**.

Your job is to plot miles-per-gallon (**mpg**) vs horsepower (**hp**) by passing Pandas column selections into the **p.circle()** function. Additionally, each glyph will be colored according to values in the **color** column.
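
Here is a minimal sketch of passing column selections to a glyph method. Since `auto.csv` is not available here, the DataFrame below is a hypothetical three-row stand-in with the same column names, and its values are made up for illustration. The sketch uses `p.scatter()` on the assumption that it accepts the same arguments as `p.circle()`.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical stand-in for auto.csv (made-up values, NOT the real data)
df = pd.DataFrame({
    'hp': [90, 150, 70],
    'mpg': [28.0, 15.0, 35.0],
    'color': ['blue', 'red', 'green'],
})

p = figure(x_axis_label='HP', y_axis_label='MPG')

# x first, then y; the color column sets one color per glyph
p.scatter(df['hp'], df['mpg'], color=df['color'], size=10)
```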

#### Instructions
 - Import **pandas** as **pd**.
 - Use the **read_csv()** function of pandas to read in **'auto.csv'** and store it in the DataFrame **df**.
 - Import **figure** from **bokeh.plotting**.
 - Use the **figure()** function to create a figure **p** with the x-axis labeled **'HP'** and the y-axis labeled **'MPG'**.
 - Plot **mpg** (on the y-axis) vs **hp** (on the x-axis) by **color** using **p.circle()**. Note that the x-axis should be specified before the y-axis inside **p.circle()**. You will need to use Pandas DataFrame indexing to pass in the columns. For example, to access the **color** column, you can use **df['color']**, and then pass it in as an argument to the **color** parameter of **p.circle()**. Also specify a **size** of **10**.

    # Import pandas as pd


    # Read in the CSV file: df
    df = ____

    # Import figure from bokeh.plotting


    # Create the figure: p
    p = ____(____='HP', ____='MPG')

    # Plot mpg vs hp by color


In [85]:
from bokeh.sampledata.autompg import autompg
from bokeh.sampledata.autompg2 import autompg2

In [86]:
autompg2.columns

Index(['Unnamed: 0', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans',
       'drv', 'cty', 'hwy', 'fl', 'class'],
      dtype='object')

In [65]:
p = figure(x_axis_label='HP', y_axis_label='MPG')

In [87]:
p.circle(autompg['hp'], autompg['mpg'])

<bokeh.models.renderers.GlyphRenderer at 0x1fae5be4278>

In [88]:
show(p)

In [75]:
type(autompg)

pandas.core.frame.DataFrame

In [78]:
autompg.columns

Index(['mpg', 'cyl', 'displ', 'hp', 'weight', 'accel', 'yr', 'origin', 'name'], dtype='object')

***
## The Bokeh ColumnDataSource 
You can create a **ColumnDataSource** object directly from a Pandas DataFrame by passing the DataFrame to the class initializer.

In this exercise, we have imported pandas as **pd** and read in a data set containing all Olympic medals awarded in the 100 meter sprint from 1896 to 2012. A **color** column has been added indicating the CSS color name we wish to use in the plot for every data point.

Your job is to import the **ColumnDataSource** class, create a new **ColumnDataSource** object from the DataFrame **df**, and plot circle glyphs with **'Year'** on the x-axis and **'Time'** on the y-axis. Color each glyph by the **color** column.

The figure object **p** has already been created for you.

#### Instructions 
 - Import the ColumnDataSource class from bokeh.plotting.
 - Use the ColumnDataSource() constructor to make a new ColumnDataSource object called source from the DataFrame df.
 - Use p.circle() to plot circle glyphs of size=8 on the figure p with 'Year' on the x-axis and 'Time' on the y-axis. Be sure to also specify source=source and color='color' so that the ColumnDataSource object is used and each glyph is colored by the color column.
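
The steps above can be sketched as follows. The exercise's DataFrame is not available here, so `df` below is a three-row stand-in typed in by hand for illustration, with the same column names. The sketch uses `p.scatter()` on the assumption that it takes the same `source` and `color` arguments as `p.circle()`.

```python
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource

# Hand-typed stand-in for the exercise's DataFrame (illustration only)
df = pd.DataFrame({
    'Year': [2004, 2008, 2012],
    'Time': [9.85, 9.69, 9.63],
    'color': ['goldenrod', 'goldenrod', 'goldenrod'],
})

# Each DataFrame column becomes a named column in the source
source = ColumnDataSource(df)

p = figure(x_axis_label='Year', y_axis_label='Time')

# With source=..., x, y and color are given as column names (strings)
p.scatter('Year', 'Time', source=source, color='color', size=8)

print(sorted(source.data.keys()))
```

Because `'color'` is not itself a CSS color name, Bokeh treats it as a reference to the column of that name.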

In [95]:
from bokeh.plotting import ColumnDataSource
from bokeh.sampledata import olympics2014

In [102]:
type(olympics2014.data)

dict

In [104]:
olympics2014.data.keys()

dict_keys(['data', 'object', 'count'])