## Plotting with Bokeh

In this activity, we will use Bokeh's higher-level `bokeh.plotting` interface, which is designed for creating visualizations quickly.   

We will use the plotting interface to get some insights into the population density development of Germany and Switzerland.

We are already familiar with this dataset since we've used it before.   
As a reminder, the world population dataset contains information about the population density of each country for different years.

#### Loading our dataset

In [None]:
# importing the necessary dependencies
import pandas as pd
from bokeh.plotting import figure, show

In [None]:
# make bokeh display figures inside the notebook
from bokeh.io import output_notebook

output_notebook()

**Note:**   
The cell above renders Bokeh visualizations inline in the notebook. Without the call to `output_notebook()`, `show` opens the plot in a new browser tab instead.

In [None]:
# downloading the dataset from Google Drive (Google Colab only)
# and loading it into a DataFrame

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# 2. Get the file
downloaded = drive.CreateFile({'id': "1qG5JXbgk2Z19zcX_EYVQTcUJMlhmXDJU"})  # replace the id with the id of the file you want to access
downloaded.GetContentFile('world_population.csv')

# 3. Load the dataset, using the first column (the country name) as the index
dataset = pd.read_csv('world_population.csv', index_col=0)

In [None]:
# looking at the dataset
dataset.head()

| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aruba | ABW | Population density (people per sq. km of land ... | EN.POP.DNST | | 307.972222 | 312.366667 | 314.983333 | 316.827778 | 318.666667 | 320.622222 | ... | 562.322222 | 563.011111 | 563.422222 | 564.427778 | 566.311111 | 568.85 | 571.783333 | 574.672222 | 577.161111 | |
| Andorra | AND | Population density (people per sq. km of land ... | EN.POP.DNST | | 30.587234 | 32.714894 | 34.914894 | 37.170213 | 39.470213 | 41.8 | ... | 180.591489 | 182.161702 | 181.859574 | 179.614894 | 175.161702 | 168.757447 | 161.493617 | 154.86383 | 149.942553 | |
| Afghanistan | AFG | Population density (people per sq. km of land ... | EN.POP.DNST | | 14.038148 | 14.312061 | 14.599692 | 14.901579 | 15.218206 | 15.545203 | ... | 39.637202 | 40.634655 | 41.674005 | 42.830327 | 44.127634 | 45.533197 | 46.997059 | 48.444546 | 49.821649 | |
| Angola | AGO | Population density (people per sq. km of land ... | EN.POP.DNST | | 4.305195 | 4.384299 | 4.464433 | 4.544558 | 4.624228 | 4.703271 | ... | 15.387749 | 15.915819 | 16.459536 | 17.020898 | 17.600302 | 18.196544 | 18.808215 | 19.433323 | 20.070565 | |
| Albania | ALB | Population density (people per sq. km of land ... | EN.POP.DNST | | 60.576642 | 62.456898 | 64.329234 | 66.209307 | 68.058066 | 69.874927 | ... | 108.394781 | 107.566204 | 106.843759 | 106.314635 | 106.013869 | 105.848431 | 105.717226 | 105.60781 | 105.444051 | |


---

#### Plotting the first data

Before we can plot the line that shows the population density development of Germany, we need to extract the relevant data.   
We only want the column headers that are years, so that we can later retrieve the values for those years for a given country.   
Looking at the dataset, only the year columns start with a numerical character. We can use this to write a filter condition in a list comprehension.   

Once we have the years, we can use this list to retrieve the value for each year from the country's row using the `loc` indexer.   
We will use a second list comprehension to extract those values from the DataFrame.

In [None]:
# preparing our data for Germany
years = [year for year in dataset.columns if not year[0].isalpha()]
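# look up one scalar per year (a list-wrapped label such as ['Germany'] would return a one-element Series instead)
de_vals = [dataset.loc['Germany', year] for year in years]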
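To make the two list comprehensions easier to follow, here is a minimal sketch on a small made-up frame. The `toy` DataFrame, its values, and the `toy_*` names are hypothetical and only mimic the shape of the real dataset (country names in the index, year columns stored as strings).

```python
import pandas as pd

# Hypothetical toy frame shaped like the real dataset: the index holds
# country names, and the year columns are strings that start with a digit.
toy = pd.DataFrame(
    {
        "Country Code": ["DEU", "CHE"],
        "1961": [210.0, 137.0],
        "1962": [212.0, 140.0],
    },
    index=["Germany", "Switzerland"],
)

# Keep only the column headers whose first character is not a letter.
toy_years = [col for col in toy.columns if not col[0].isalpha()]

# Look up one scalar per year from the Germany row.
toy_de_vals = [toy.loc["Germany", year] for year in toy_years]

print(toy_years)  # ['1961', '1962']
print(toy_de_vals)
```

The non-year column `Country Code` is filtered out because it starts with a letter, which is the same condition used for the real dataset.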

After preparing the data, we set up the visualization by creating a plot with the `figure` function.   
This is where we define the `title` and the labels for the `x` and `y` axes.   

Once the figure is defined, we can add "*layers*" to it.   
Bokeh ships with several predefined visualizations, including:   
- Lines
- Scatters
- Bars
- Patches

> You can find even more in the official documentation: https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html#userguide-plotting


Comparable to Matplotlib, the final call that displays the plot is `show`.   
By default, this opens a new browser window or tab with the visualization. Since we called `output_notebook` here, the plot is embedded in this notebook instead.

In [None]:
# plotting the population density change in Germany in the given years
plot = figure(title='Population Density of Germany', x_axis_label='Year', y_axis_label='Population Density')

# `legend_label` replaces the `legend` keyword, which was deprecated in Bokeh 1.4
plot.line(years, de_vals, line_width=2, legend_label='Germany')

show(plot)



**Note:**   
The first and second arguments passed to a plotting method must contain the same number of data points.   
If your `x` list has 10 values, your `y` list must also have 10 values.
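The length requirement from the note can be checked before plotting. The following is a minimal sketch in plain Python; `check_same_length` is a hypothetical helper written for illustration, not part of Bokeh, and the sample values are made up.

```python
def check_same_length(xs, ys):
    """Raise a ValueError if the two sequences differ in length."""
    if len(xs) != len(ys):
        raise ValueError(f"x has {len(xs)} values but y has {len(ys)} values")
    return len(xs)


# Matching lengths: the helper returns the number of data points.
n_points = check_same_length(["1961", "1962", "1963"], [210.0, 212.0, 214.0])
print(n_points)  # 3
```

Calling the helper with `years` and `de_vals` before `plot.line` would surface a mismatch early, with a message that names both lengths.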

---

#### Simple plotting with gridplot

With Bokeh's high-level plotting interface, we can display several graphs in one plot.   
We simply add layers on top of each other to build more complex visualizations.   
In our case, we want to add the population density development of Switzerland on top of that of Germany.   
We also want to distinguish them visually by using different colors and by adding circle markers to the Switzerland line.   

We first need to filter our data to extract the values for Switzerland.   
After that, we can use the same technique as in the first task to plot the line.

In [None]:
# preparing the data for the second country
# look up one scalar per year from the Switzerland row
ch_vals = [dataset.loc['Switzerland', year] for year in years]

In [None]:
# plotting the data for Germany and Switzerland in one visualization, 
# adding circles for each data point for Switzerland
plot = figure(title='Population Density of Germany and Switzerland', x_axis_label='Year', y_axis_label='Population Density')

# `legend_label` replaces the `legend` keyword, which was deprecated in Bokeh 1.4
plot.line(years, de_vals, line_width=2, legend_label='Germany')
plot.line(years, ch_vals, line_width=2, color='orange', legend_label='Switzerland')
plot.circle(years, ch_vals, size=4, color='orange', fill_color='white', legend_label='Switzerland')
show(plot)




**Note:**   
Because Bokeh lets us layer several glyphs in one figure, we can distinguish graphs not only by color but also by other features, such as point markers or line styles.
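As an illustration of distinguishing lines by style rather than by markers, here is a small sketch that uses the `line_dash` property of Bokeh's line glyph. The data points and the `style_demo` name are made up for this example.

```python
from bokeh.plotting import figure

# Hypothetical demo figure with made-up data points.
style_demo = figure(title="Line style sketch", x_axis_label="x", y_axis_label="y")

# A solid line using the default dash pattern.
solid = style_demo.line([1, 2, 3], [1, 2, 3], line_width=2, color="navy")

# A dashed line layered on the same figure.
dashed = style_demo.line(
    [1, 2, 3], [3, 2, 1], line_width=2, color="orange", line_dash="dashed"
)
```

Passing `style_demo` to `show` would render both lines in one figure, in the same way as the Germany and Switzerland plot above.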

Next, we want to display the plots next to each other instead of layering them in one figure.   
In some cases this makes it easier to compare many plots, since they no longer overlap.   
We can still give all plots the same `x` and `y` ranges for better comparison by passing the `x_range` and `y_range` of the first figure to the other figures.   

Bokeh provides the `bokeh.layouts` submodule, which gives us access to several layouts, so we need an additional import here.   
The `gridplot` function takes a two-dimensional list that describes the positions of the plots on a grid.

In [None]:
# plotting the Germany and Switzerland plot in two different visualizations
# that are interconnected in terms of view port
from bokeh.layouts import gridplot


**Note:**   
Each inner list describes one row of the grid. A single inner list with two plots places them next to each other, while two inner lists with one plot each stack the plots vertically.
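The effect of the nesting can be sketched with placeholder figures. `small_plot` is a hypothetical helper that exists only for this illustration, and the plotted numbers are made up.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure


def small_plot(title):
    """Build a tiny placeholder figure with one made-up line."""
    p = figure(title=title, width=250, height=200)
    p.line([1, 2, 3], [1, 4, 9], line_width=2)
    return p


# One inner list is one row, so these two plots sit next to each other.
side_by_side = gridplot([[small_plot("left"), small_plot("right")]])

# Two inner lists are two rows, so these two plots are stacked vertically.
stacked = gridplot([[small_plot("top")], [small_plot("bottom")]])
```

Either layout object can be passed to `show` in place of a single figure.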

In [None]:
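# first figure: Germany
# (`height` and `width` replace the older `plot_height` and `plot_width` keywords)
plot_de = figure(
    title='Population Density of Germany',
    x_axis_label='Year',
    y_axis_label='Population Density',
    height=300,
    width=450)

# second figure: Switzerland, sharing the ranges of the first figure
# so that panning or zooming one plot also moves the other
plot_ch = figure(
    title='Population Density of Switzerland',
    x_axis_label='Year',
    y_axis_label='Population Density',
    height=300,
    width=450,
    x_range=plot_de.x_range,
    y_range=plot_de.y_range)

plot_de.line(years, de_vals, line_width=2)
plot_ch.line(years, ch_vals, line_width=2)

# a single inner list places both figures in one row
plot = gridplot([[plot_de, plot_ch]])
show(plot)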

We can see that once our data is prepared, it is fairly simple to visualize it in a basic plot.   
The `bokeh.plotting` interface lets us display the data with different so-called `glyphs` without much work.   