# Interactive Data Visualization with Bokeh (3) - High-level Charts


In [1]:
from bokeh.io import output_notebook
output_notebook()

In [12]:
import pandas as pd
df = pd.read_csv('literacy_birth_rate.csv')
df = df.rename(columns={'Country ':'Country', 'female literacy':'female_literacy'})
df.female_literacy = pd.to_numeric(df.female_literacy, errors='coerce')
df.fertility = pd.to_numeric(df.fertility, errors='coerce')
df = df.dropna()
df.head()

Unnamed: 0,Country,Continent,female_literacy,fertility,population
0,Chine,ASI,90.5,1.769,1324655000.0
1,Inde,ASI,50.8,2.682,1139965000.0
2,USA,NAM,99.0,2.077,304060000.0
3,Indonésie,ASI,88.8,2.132,227345100.0
4,Brésil,LAT,90.2,1.827,191971500.0
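
The cleaning step above relies on `errors='coerce'`, which turns unparseable entries into NaN so that `dropna()` can discard those rows. As a rough plain-Python sketch of the same idea (the rows and the `to_number` helper are hypothetical, not taken from `literacy_birth_rate.csv` or from pandas):

```python
import math

def to_number(value):
    """Return float(value), or NaN when the value cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# Hypothetical raw rows (assumption: values arrive as strings, some malformed)
raw_rows = [
    {"Country": "A", "female_literacy": "90.5", "fertility": "1.769"},
    {"Country": "B", "female_literacy": "n/a", "fertility": "2.682"},
    {"Country": "C", "female_literacy": "99.0", "fertility": ""},
]

# Step 1: coerce the two numeric columns, mapping bad entries to NaN
coerced = [
    {**row,
     "female_literacy": to_number(row["female_literacy"]),
     "fertility": to_number(row["fertility"])}
    for row in raw_rows
]

# Step 2: drop every row that contains at least one NaN
clean = [
    row for row in coerced
    if not any(isinstance(v, float) and math.isnan(v) for v in row.values())
]

print([row["Country"] for row in clean])
```

Only the row with two parseable numbers survives, which is why the cleaned DataFrame can end up with fewer rows than the CSV file.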


## Histograms

### A basic histogram

Create a simple histogram with the Histogram() function. Again we'll use the fertility data set to plot the distribution of female literacy around the world.

As in the previous two chapters, the figures you create in this chapter are interactive, and you may have to scroll down to view the lower portion of some of them.

In [13]:
# Import Histogram, output_file, and show from bokeh.charts
from bokeh.charts import Histogram, output_file, show

# Make a Histogram: p
p = Histogram(df, 'female_literacy', title='Female Literacy')

# Set the x axis label
p.xaxis.axis_label = 'literacy'

# Set the y axis label
p.yaxis.axis_label = 'count'

# Specify the name of the output_file and show the result
output_file('histogram.html')
show(p)

### Controlling the number of bins

By default, Bokeh makes histograms with 10 bins. By controlling the bins parameter of the Histogram() function, you can adjust the number of bins.

In this exercise, you'll plot the 'female_literacy' column of df with 40 bins.

In [14]:
# Import Histogram, output_file, and show from bokeh.charts
from bokeh.charts import Histogram, output_file, show

# Make the Histogram: p
p = Histogram(df, 'female_literacy', title='Female Literacy', bins=40)

# Set axis labels
p.xaxis.axis_label = 'Female Literacy (% population)'
p.yaxis.axis_label = 'Number of Countries'

# Specify the name of the output_file and show the result
output_file('histogram.html')
show(p)
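
To see what the number of bins changes, here is a minimal plain-Python sketch of equal-width binning. It is an illustration only: `histogram_counts` is a hypothetical helper, and it is not claimed to match how `Histogram()` computes its bins internally. The last bin is treated as closed on the right, which is the convention `numpy.histogram` uses.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: put everything in the first bin
        else:
            # Clamp so the maximum value lands in the last bin instead of overflowing
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# The five female_literacy values shown in the df.head() output
literacy = [90.5, 50.8, 99.0, 88.8, 90.2]

coarse = histogram_counts(literacy, bins=2)
fine = histogram_counts(literacy, bins=4)
print(coarse)
print(fine)
```

More bins give narrower intervals, so the same values spread over more (and often emptier) bars.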

### Generating multiple histograms at once

Now you'll make a separate histogram, with 10 bins, for each of the 6 continents.

To do this with Bokeh charts, you need to pass a column name to the color parameter of the Histogram() function. In this case, the 'Continent' column contains the continent abbreviation that you'll use to group the female literacy values by continent.

In [15]:
# Import Histogram, output_file, and show from bokeh.charts
from bokeh.charts import Histogram, output_file, show

# Make a Histogram: p
p = Histogram(df, 'female_literacy', title='Female Literacy',
              color='Continent', legend='top_left', bins=10)

# Set axis labels
p.xaxis.axis_label = 'Female Literacy (% population)'
p.yaxis.axis_label = 'Number of Countries'

# Specify the name of the output_file and show the result
output_file('hist_bins.html')
show(p)
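
Conceptually, grouping by a column means splitting the values by category and then counting each group separately. The sketch below does this in plain Python for the five rows shown in `df.head()`. The shared 0 to 100 range and the `bin_counts` helper are assumptions made for illustration; this is not a description of what `Histogram()` does internally.

```python
from collections import defaultdict

# (Continent, female_literacy) pairs from the df.head() output
rows = [("ASI", 90.5), ("ASI", 50.8), ("NAM", 99.0), ("ASI", 88.8), ("LAT", 90.2)]

def bin_counts(values, lo, hi, bins):
    """Count values per equal-width bin over the fixed range [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so a value equal to `hi` falls in the last bin
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

# Step 1: split the literacy values by continent
groups = defaultdict(list)
for continent, literacy in rows:
    groups[continent].append(literacy)

# Step 2: bin each group over the same range so the histograms are comparable
# (assumption: literacy is a percentage, so 0 to 100 covers every value)
per_continent = {
    continent: bin_counts(values, lo=0.0, hi=100.0, bins=10)
    for continent, values in groups.items()
}

for continent, counts in per_continent.items():
    print(continent, counts)
```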

## BoxPlots

### A basic box plot

In this exercise, you'll make a box plot of female literacy per continent by setting values='female_literacy' and label='Continent' with the BoxPlot() function.

In [16]:
# Import BoxPlot, output_file, and show from bokeh.charts
from bokeh.charts import BoxPlot, output_file, show

# Make a box plot: p
p = BoxPlot(df, values='female_literacy', label='Continent',
            title='Female Literacy (grouped by Continent)', legend='bottom_right')

# Set the y axis label
p.yaxis.axis_label = 'Female literacy (% population)'

# Specify the name of the output_file and show the result
output_file('boxplot.html')
show(p)
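
A box plot summarizes each group with its quartiles and flags points far from the box as outliers. The sketch below computes those numbers with the standard library, using the common 1.5 × IQR rule for the fences. The groups and values are hypothetical, and the quartile method chosen here (`method="inclusive"`) is an assumption, so the numbers may differ slightly from what `BoxPlot()` draws.

```python
import statistics

def box_stats(values):
    """Return the quartiles, fences, and outliers of a conventional box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Hypothetical literacy values for two made-up groups
groups = {
    "X": [50.0, 60.0, 70.0, 80.0, 90.0],
    "Y": [10.0, 80.0, 85.0, 90.0, 95.0],
}

summary = {name: box_stats(values) for name, values in groups.items()}
for name, stats in summary.items():
    print(name, stats)
```

Group "Y" has one value far below its lower fence, so it would be drawn as a separate outlier point rather than as part of the whisker.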

### Color different groups differently

Like in the Histogram() function, you can use the color parameter of the BoxPlot() function to color the box plot of each continent separately.

In this exercise, you'll distinguish between the six continents by setting the color parameter of the BoxPlot() function to 'Continent'.

In [18]:
# Import BoxPlot, output_file, and show
from bokeh.charts import BoxPlot, output_file, show

# Make a box plot: p
p = BoxPlot(df, values='female_literacy', label='Continent', color='Continent',
            title='Female Literacy (grouped by Continent)', legend='bottom_right')

# Set y-axis label
p.yaxis.axis_label = 'Female literacy (% population)'

# Specify the name of the output_file and show the result
output_file('boxplot.html')
show(p)

## Scatter

### A basic scatter plot

In this exercise, you'll make a simple scatter plot with female literacy on the y axis and population on the x axis. The dataset you have been working with has been pre-loaded into a DataFrame called df.

In [19]:
# Import Scatter, output_file, and show from bokeh.charts
from bokeh.charts import Scatter, output_file, show

# Make a scatter plot: p
p = Scatter(df, x='population', y='female_literacy',
            title='Female Literacy vs Population')

# Set the x-axis label
p.xaxis.axis_label = 'population'

# Set the y-axis label
p.yaxis.axis_label = 'female literacy'

# Specify the name of the output_file and show the result
output_file('scatterplot.html')
show(p)

### Using colors to group data

Just like you've seen with other Bokeh charts, you can use the color parameter in the Scatter() function to color each circle by its 'Continent' in the plot of Female Literacy vs Population.

Once again, the DataFrame has been pre-loaded as df.

In [23]:
# Import Scatter, output_file, and show from bokeh.charts
from bokeh.charts import Scatter, output_file, show

# Make a scatter plot such that each circle is colored by its continent: p
p = Scatter(df, x='population', y='female_literacy', color='Continent',
            title='Female Literacy vs Population',
            legend='bottom_right')

# Set x-axis and y-axis labels
p.xaxis.axis_label = 'Population'
p.yaxis.axis_label = 'Female literacy (% population)'

# Specify the name of the output_file and show the result
output_file('scatterplot.html')
show(p)

### Using shapes to group data

Like the color parameter, the marker parameter can be set to a column of categorical data to select a different marker for each value.

Here you'll plot Female Literacy vs Population and set a different marker for each of the 6 continents using the marker parameter of the Scatter() function. As before, the DataFrame has been pre-loaded as df.

In [22]:
# Import Scatter, output_file, and show from bokeh.charts
from bokeh.charts import Scatter, output_file, show

# Make a scatter plot such that each continent has a different marker type: p
p = Scatter(df, x='population', y='female_literacy',
            color='Continent', marker='Continent',
            title='Female Literacy vs Population',
            legend='bottom_right')

# Set x-axis and y-axis labels
p.xaxis.axis_label = 'Population'
p.yaxis.axis_label = 'Female literacy (% population)'

# Specify the name of the output_file and show the result
output_file('scatterplot.html')
show(p)
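
Passing a column name to `color` or `marker` amounts to giving each distinct category its own style. A minimal sketch of that mapping in plain Python is shown below. The palette, the marker names, and the `assign_styles` helper are hypothetical choices for illustration, not the defaults that `Scatter()` uses.

```python
from itertools import cycle

# Continent codes of the five rows shown in the df.head() output
continents = ["ASI", "ASI", "NAM", "ASI", "LAT"]

# Hypothetical style options (assumption: not Bokeh's actual defaults)
palette = ["red", "green", "blue"]
markers = ["circle", "square", "triangle"]

def assign_styles(categories, options):
    """Give each distinct category the next option, in order of first appearance."""
    assigned = {}
    pool = cycle(options)  # wrap around if there are more categories than options
    for category in categories:
        if category not in assigned:
            assigned[category] = next(pool)
    return assigned

color_of = assign_styles(continents, palette)
marker_of = assign_styles(continents, markers)

# One (continent, color, marker) triple per data point
styled_points = [(c, color_of[c], marker_of[c]) for c in continents]
print(styled_points)
```

Using the same column for both mappings, as in the cell above, makes each continent distinguishable by shape as well as by color.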