# <center>NPS Python for Data Analysis Primer</center>
## <center> <img src='Images/NPS_Logo.jpg' height=250/></center>
<center style="font-size:24px">LTC Matt Smith</center>
<center style="font-size:24px">NPS Operations Research Dept.</center>
<center>matthew.smith@nps.edu</center>

# <center>Lesson 3: Data Visualization</center>

In this notebook:
- [Matplotlib](#Matplotlib)
    - [Quick Matplotlib Orientation](#Quick-Matplotlib-Orientation)
    - [Figures, Axes, and Subplots](#Figures,-Axes,-and-Subplots)
    - [Common Plot Types](#Common-Plot-Types)
    - [Labeling Plots](#Labeling-Plots)
- [Plotting with Pandas](#Plotting-with-Pandas)
    - [Pandas Plotting Overview](#Pandas-Plotting-Overview)
    - [Example: Plotting GFEBS Data](#Example:-Plotting-GFEBS-Data)
- [Seaborn](#Seaborn)
- [Interactive Graphics with Bokeh](#Interactive-Graphics-with-Bokeh)
    - [Example: COVID19 Dashboard](#Example:-COVID19-Dashboard)
- [Making GIFs](#Making-GIFs)

# Matplotlib

Matplotlib is the most common plotting library in Python.  Those familiar with MATLAB will notice that it has a similar interface and appearance.  That is by design: it was originally developed in the early 2000s by John Hunter, a neurobiologist who liked MATLAB's plotting interface and wanted an open-source equivalent in Python.  Matplotlib has matured since then and now provides a robust platform for producing high quality, mainly 2-dimensional and static plots.

This notebook gives a quick intro to the main mechanics and functionality to help get started in Matplotlib.  For more details, see the [matplotlib documentation](https://matplotlib.org/).  In particular, the matplotlib gallery is a great place to look for pre-built products that you can then tweak or adapt to your specific use case.

### Quick Matplotlib Orientation

In [None]:
#Standard import convention for pyplot, the primary matplotlib plotting tool
import matplotlib.pyplot as plt

#Let's also import pandas and numpy
import pandas as pd
import numpy as np

Now let's plot!

In [None]:
#A simple plot
x = np.arange(0,15,.01)
y = np.sin(x)
plt.plot(x,y,color='red',linewidth=3)
#Assuming the %matplotlib notebook magic command has been run, the plot will automatically display in the cell output

In [None]:
#Say we want to make another plot with y2 = 2*sin(x)
y2 = 2*np.sin(x)
plt.plot(x,y2,color='blue',linewidth=3)

We can create new figures with plt.figure() to move on to a new plot.

In [None]:
plt.figure(figsize=(3,3)) #Specify figsize in (x inches, y inches) to make it smaller
plt.plot(x,y,color='red',linewidth=3)

In [None]:
#Move on to a NEW figure instead of continuing to add to the previous plot
plt.figure(figsize=(3,3))
plt.plot(x,y2,color='blue',linewidth=3)

### Figures, Axes, and Subplots

Before moving on and creating cool looking plots, it's helpful to understand the way that matplotlib sets up the containers that will hold the plots, using Figure, Subplot, and Axes objects.

This can be a bit confusing, because matplotlib plots into **Axes** objects, not into **Figure** objects.  Essentially, each Axes object is a canvas which we can draw on with pyplot, and a Figure object is a container that holds one or more Axes objects.

Let's look at a couple of examples of why this distinction matters.

In [None]:
#Set up a new figure
fig1 = plt.figure(figsize=(4,4))
#A blank figure should show up below (use fig1.show() if it doesn't)
#You can ignore it for now, but it will populate as we run following cells

In [None]:
#fig1 is a Figure object
type(fig1)

In [None]:
#When we establish a new Figure, it automatically creates a single Axes object
#We can access it with plt.gca(), short for "get current axes"
ax1 = plt.gca()
type(ax1)
#Notice that this now initiated the x and y axes in the figure above

Since a Figure contains one or more Axes objects, Axes objects are effectively subplots, so sometimes you will see the terms axes and subplot used interchangeably.  In fact, older matplotlib versions report `ax1` as an "AxesSubplot" object (newer versions simply report `Axes`), further proof that the terms are intermingled.

In [None]:
#A Figure object has no plot() method, so we can't plot directly into the figure and this won't work
#fig1.plot(x,y,color='red')

#However, we CAN plot into the axes object
ax1.plot(x,y,'red')

Note that if we simply run `plt.plot()` instead of `ax.plot()`, it will automatically plot into the current Axes object.  Either method will work, so it's a personal preference which one to use.  I prefer to use ax.plot() to make sure I know exactly what plot I am referring to.  For example, if you use plt.plot() in your cell, then try to come back to the current cell after moving on to a new Axes object elsewhere in the notebook, the plt.plot() command may not do what you expect. 

In [None]:
#Plot into most recent Axes object with plt.plot()
plt.plot(x,y2,'blue')
#This should have added a blue curve to the previous plot
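
To make the difference concrete, here is a minimal sketch of how the "current Axes" shifts.  The names `fig_a`, `ax_a`, `fig_b`, and `ax_b` are illustrative only and are not used elsewhere in this lesson.  It assumes two freshly created figures, and shows that `plt.plot()` follows whichever Axes was created most recently, while a method call on a specific Axes object always lands in that object.

```python
import matplotlib.pyplot as plt

# Create two figures; the most recently created Axes becomes the "current" one
fig_a, ax_a = plt.subplots(figsize=(3, 3))
fig_b, ax_b = plt.subplots(figsize=(3, 3))

# plt.plot() draws into the current Axes, which is now ax_b
plt.plot([0, 1], [0, 1], color='blue')

# ax_a.plot() draws into ax_a no matter which Axes is current
ax_a.plot([0, 1], [1, 0], color='red')

# Confirm where each line ended up
print("Current Axes is ax_b:", plt.gca() is ax_b)
print("Lines in ax_a:", len(ax_a.lines))
print("Lines in ax_b:", len(ax_b.lines))
```

Each Axes ends up with exactly one line, even though only one of the two calls named its target explicitly.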

In [None]:
#We can use the Axes object to set up and plot in subplots within a figure
fig2 = plt.figure(figsize=(4,4))
ax1 = fig2.add_subplot(2, 2, 1)
ax2 = fig2.add_subplot(2, 2, 2)
ax3 = fig2.add_subplot(2, 2, 3)
ax1.plot(x,y,'red')
ax2.plot(x,y2,'blue')
ax3.plot(x,y,'green')

It is also common to create an array of axes/subplot objects when initializing a new figure with subplots.

In [None]:
fig, axes = plt.subplots(2,2)

In [None]:
#Now note that axes is an array of AxesSubplot objects
axes

In [None]:
#This makes it easy to reference a desired subplot, starting with 0,0 for upper left
axes[0,0].plot(x,y,'red')
axes[0,1].plot(x,2*y,'blue')
axes[1,0].plot(x,3*y,'green')
#See plot above for updated subplots
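
One detail worth knowing is that the shape of what `plt.subplots()` returns depends on the grid you ask for.  The sketch below (all variable names are illustrative) shows the cases you are most likely to hit, plus the `squeeze=False` option, which always returns a 2-D array so that `axes[row, col]` indexing works regardless of the grid size.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# No grid arguments: a single Axes object, not an array
fig_single, ax_single = plt.subplots()

# One row: a 1-D array, indexed as ax_row[i]
fig_row, ax_row = plt.subplots(1, 3)

# Multiple rows and columns: a 2-D array, indexed as ax_grid[row, col]
fig_grid, ax_grid = plt.subplots(2, 2)

# squeeze=False: always a 2-D array, even for a single row
fig_2d, ax_2d = plt.subplots(1, 3, squeeze=False)

print(type(ax_single).__name__)
print(ax_row.shape, ax_grid.shape, ax_2d.shape)
```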

In [None]:
#We can also adjust the padding and whitespace around and between the subplots
fig, axes = plt.subplots(2,2)
axes[0,0].plot(x,y,'red')
axes[0,1].plot(x,2*y,'blue')
axes[1,0].plot(x,3*y,'green')
axes[1,1].plot(x,4*y,'purple')
#Adjust padding on left, right, top and bottom, and adjust width-space and height space between
plt.subplots_adjust(wspace=0, hspace=0)

In [None]:
#You may have noticed that the y-axes are different for each subplot
#We can force subplots to use same axes
fig, axes = plt.subplots(2,2,sharex=True,sharey=True)
axes[0,0].plot(x,y,'red')
axes[0,1].plot(x,2*y,'blue')
axes[1,0].plot(x,3*y,'green')
axes[1,1].plot(x,4*y,'purple')
#Adjust padding on left, right, top and bottom, and adjust width-space and height space between
plt.subplots_adjust(wspace=0, hspace=0)

We can save a figure with `fig.savefig(filename)`, where fig is the name of a Figure object.  We could also use `plt.savefig(filename)` to save the current active figure, though this seems to work best when used inside the same cell where the figure was defined.

In [None]:
fig.savefig('Images/sine_waves.png')

The savefig() method infers the file type from the name, so if you provided a filename with .pdf or .jpg or .svg, it will automatically save into that format.  
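
As a quick sketch of this behavior, the cell below saves one figure in three formats.  The temporary output directory and the file name `demo` are assumptions made for illustration; in practice you would point at your own folder.  The `dpi` and `bbox_inches` arguments are optional extras: `dpi` controls the resolution of raster formats such as PNG, and `bbox_inches='tight'` trims surrounding whitespace.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig_demo, ax_demo = plt.subplots(figsize=(3, 3))
ax_demo.plot([0, 1, 2], [0, 1, 4])

# Write into a throwaway directory (illustrative; use your own path in practice)
out_dir = tempfile.mkdtemp()

saved = {}
for ext in ['png', 'pdf', 'svg']:
    path = os.path.join(out_dir, 'demo.' + ext)
    # The file format is inferred from the extension in the file name
    fig_demo.savefig(path, dpi=150, bbox_inches='tight')
    saved[ext] = path

for ext, path in saved.items():
    print(ext, os.path.getsize(path), 'bytes')
```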

With the basics covered, let's move on to making cool plots!

### Common Plot Types

#### Line Plots

We've already seen basic line plots, but let's take a closer look at how they are constructed and get more familiar with the mechanics of building plots in matplotlib.

In [None]:
#Build some data to plot
x_data = np.array([1,2,3,4,5,6,7,8,9,10])
linear_data = 2*x_data
exponential_data = x_data**2 #Strictly speaking x**2 is quadratic, not exponential; the name is kept for the examples below

In [None]:
#initialize figure and axes
fig, axes = plt.subplots(3,3)

#We can just specify x and y value to plot a single line
axes[0,0].plot(x_data,linear_data,color='blue')

#If we just specify y, it automatically uses index as the x coordinates
axes[0,1].plot(linear_data,color='red')

#To plot multiple lines, we could call plot multiple times
axes[0,2].plot(linear_data)
axes[0,2].plot(exponential_data)

#We can also plot multiple lines by feeding in ax.plot(x1,y1,x2,y2,...)
axes[1,0].plot(x_data,linear_data,x_data,exponential_data)

#Specify various line and marker styles with (x1,y1,format1,x2,y2,format2,...)
axes[1,1].plot(x_data,linear_data,'-o',x_data,exponential_data,':x')

Instead of feeding in x and y arrays, we can also feed in a dataset and then reference the name of the column containing the x and y data.

In [None]:
#Build a dataframe
df_lines = pd.DataFrame({'x':x_data,'y1':linear_data,'y2':exponential_data})
df_lines

In [None]:
#Plot single line by specifying columns in data
axes[1,2].plot('x','y1',data=df_lines)

In [None]:
#It's tricky to plot multiple columns using the data argument
#Easier to just feed in multiple columns directly
axes[2,0].plot(df_lines['x'],df_lines[['y1','y2']])

In [None]:
#Another useful line function is filling between two lines
plt.figure(figsize=(3,3))
plt.plot(x_data,linear_data,x_data,exponential_data)
plt.fill_between(x_data,linear_data, exponential_data, 
                       facecolor='blue', 
                       alpha=0.25)

#### Scatterplots

In [None]:
fig = plt.figure(figsize=(3,3))
#Basic scatter plot with plt.scatter() or ax.scatter()
plt.scatter(x_data,linear_data)

In [None]:
#Adjust attributes such as size (s is the marker area in points squared), color, and marker
fig = plt.figure(figsize=(3,3))
plt.scatter(x_data,linear_data,s=100,color='red',marker='^')

#### Barplots

In [None]:
plt.figure()
plt.bar(x_data, linear_data, width = 0.3,color='blue')

In [None]:
#It can be clunky to plot multiple bars side by side.  We generally have to manually build the new list of x coordinates
x_data2 = [x+.25 for x in x_data]
plt.bar(x_data2,exponential_data,width = 0.3,color='red')
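
The manual offset above generalizes to any number of series.  The following is a minimal sketch, not the only way to do it: the dictionary `series`, the 0.8 total group width, and the variable names are all assumptions chosen for illustration.  Each series is shifted so that the whole group of bars stays centered on its category.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = np.arange(1, 6)
# Illustrative data: one entry per series to draw side by side
series = {'linear': 2 * categories, 'squared': categories ** 2}

# Split a total group width of 0.8 evenly among the series
bar_width = 0.8 / len(series)

fig_g, ax_g = plt.subplots(figsize=(4, 3))
for i, (name, values) in enumerate(series.items()):
    # Offset each series so the group is centered on the category position
    offset = (i - (len(series) - 1) / 2) * bar_width
    ax_g.bar(categories + offset, values, width=bar_width, label=name)

ax_g.set_xticks(categories)
ax_g.legend()
```

Adding a third entry to `series` narrows the bars automatically, with no change to the plotting loop.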

In [None]:
#To make stacked bar charts, we need to specify the start of the upper stack as the top of the lower stack
plt.figure(figsize=(3,3))
plt.bar(x_data,linear_data,width = 0.3,color='blue')
plt.bar(x_data,exponential_data,width=0.3,color='red',bottom=linear_data)

In [None]:
#Horizontal bar charts
plt.figure(figsize=(3,3))
plt.barh(x_data, linear_data, height = 0.3, color='b')
plt.barh(x_data, exponential_data, height = 0.3, left=linear_data, color='r')

#### Histograms

Neat example drawn from Coursera course Applied Plotting in Python.

In [None]:
fig, axes = plt.subplots(2, 2, sharex=True)
axes_list = list(np.reshape(axes,(4,1)))
axes_list[0]

In [None]:
# create 2x2 grid of axis subplots
fig, axes = plt.subplots(2, 2, sharex=True)
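axes_list = axes.flatten() #Flatten the 2x2 array of Axes into a 1-D array of 4 Axes

# draw n = 10, 100, 1000, and 10000 samples from the normal distribution and plot corresponding histograms
for n in range(len(axes_list)): #Loop over all 4 subplots; len(axes) would only count the 2 rows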
    sample_size = 10**(n+1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axes_list[n].hist(sample)
    axes_list[n].set_title('n={}'.format(sample_size))

In [None]:
fig, axes = plt.subplots(2, 2, sharex=True)
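axes_list = axes.flatten() #Flatten the 2x2 array of Axes into a 1-D array of 4 Axes

# draw n = 10, 100, 1000, and 10000 samples from the normal distribution and plot corresponding histograms
for n in range(len(axes_list)): #Loop over all 4 subplots; len(axes) would only count the 2 rows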
    sample_size = 10**(n+1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axes_list[n].hist(sample,bins=50)
    axes_list[n].set_title('n={}'.format(sample_size))
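
An alternative way to visit every subplot, sketched here with illustrative variable names, is to iterate over `axes.flat`, which walks a 2-D array of Axes row by row.  Pairing it with `zip` removes the need for index arithmetic.  The seeded random generator is an assumption added so that the plots are reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so results are repeatable
sample_sizes = [10, 100, 1000, 10000]

fig_h, axes_h = plt.subplots(2, 2, sharex=True)

# axes_h.flat visits the subplots in row-major order: [0,0], [0,1], [1,0], [1,1]
for ax, size in zip(axes_h.flat, sample_sizes):
    sample = rng.normal(loc=0.0, scale=1.0, size=size)
    ax.hist(sample, bins=20)
    ax.set_title('n={}'.format(size))
```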

#### Heatmaps

In [None]:
#We can create heatmaps in matplotlib using plt.hist2d
plt.figure(figsize=(3,3))

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
plt.hist2d(X, Y, bins=25)

#Add a colorbar
plt.colorbar()

We can also create heatmaps using the plt.pcolor() plotting call.  Here's a neat example adapted from https://stackoverflow.com/questions/14391959/heatmap-in-matplotlib-with-pcolor.  

In [None]:
nba = pd.read_csv("http://datasets.flowingdata.com/ppg2008.csv", index_col=0)

# Normalize data columns
nba_norm = (nba - nba.mean()) / (nba.max() - nba.min())

# Sort data according to Points, lowest to highest
# This was just a design choice made by Yau
# inplace=False (default) ->thanks SO user d1337
nba_sort = nba_norm.sort_values('PTS', ascending=True)

nba_sort['PTS'].head(10)

# Plot it out
fig, ax = plt.subplots()
heatmap = ax.pcolor(nba_sort, cmap=plt.cm.Blues, alpha=0.8)

# Format
fig = plt.gcf()
fig.set_size_inches(8, 11)

# turn off the frame
ax.set_frame_on(False)

# put the major ticks at the middle of each cell
ax.set_yticks(np.arange(nba_sort.shape[0]) + 0.5, minor=False)
ax.set_xticks(np.arange(nba_sort.shape[1]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()

# Set the labels

# label source:https://en.wikipedia.org/wiki/Basketball_statistics
labels = [
    'Games', 'Minutes', 'Points', 'Field goals made', 'Field goal attempts', 'Field goal percentage', 'Free throws made', 'Free throws attempts', 'Free throws percentage',
    'Three-pointers made', 'Three-point attempt', 'Three-point percentage', 'Offensive rebounds', 'Defensive rebounds', 'Total rebounds', 'Assists', 'Steals', 'Blocks', 'Turnover', 'Personal foul']

# note I could have used nba_sort.columns but made "labels" instead
ax.set_xticklabels(labels, minor=False)
ax.set_yticklabels(nba_sort.index, minor=False)

# rotate the xticks
plt.xticks(rotation=90)

ax.grid(False)

# Turn off all the tick marks (the tick labels are kept)
# tick_params is used here because the per-tick tick1On/tick2On attributes are no longer supported in recent matplotlib versions
ax = plt.gca()
ax.tick_params(axis='both', which='both', length=0)

### Labeling Plots

The NBA heatmap example highlights the importance of being able to manipulate the axis and tick labels.  Let's take a closer look at how to manipulate these features of a plot.

#### Title, Ticks, and Axis Labels

In [None]:
#Recreate a basic bar chart
x_data = np.array([1,2,3,4,5,6,7,8,9,10])
linear_data = 2*x_data
exponential_data = x_data**2
fig,ax = plt.subplots(figsize=(6,6))
bar_width = 0.3
ax.bar(x_data,linear_data,width=bar_width,color='blue')
x_data2 = [x+bar_width for x in x_data]
ax.bar(x_data2,exponential_data,width=bar_width,color='red')

In general there are two ways to adjust the plot labels.  We can either use plt commands (e.g. `plt.title("Plot Title")` to set title), or we can use methods of the AxesSubplot objects (e.g. `ax.set_title("Plot Title")`).  I prefer to use the Axes methods, because they tend to be more explicit and clear.  For one, when calling an Axes object you know exactly what object it is referring to, whereas plt.title() refers to the currently active axes object (which can change).

Second, the Axes methods tend to be more explicit.  For example, we can call `plt.xlim()` to GET a tuple of the current x limits, or we can call `plt.xlim([0,5])` to SET the xlimits.  By contrast, the commands for Axes objects tend to be more clear, e.g. `ax.get_xlim()` to get the limits and `ax.set_xlim([0,5])` to set it.  Hence, I mainly use the Axes methods, but both are available.
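
Here is a short sketch of the two styles side by side.  The figure and variable names are illustrative only.  Note how the single function `plt.xlim` acts as a getter or a setter depending on whether it receives arguments, whereas the Axes methods state their intent in the name.

```python
import matplotlib.pyplot as plt

fig_l, ax_l = plt.subplots(figsize=(3, 3))
ax_l.plot([0, 10], [0, 100])

# Axes style: separate, explicitly named getter and setter
limits_before = ax_l.get_xlim()   # limits chosen automatically from the data
ax_l.set_xlim([0, 5])
limits_after = ax_l.get_xlim()

# pyplot style: one function, acting on the current Axes
same_via_pyplot = plt.xlim()      # no arguments: GET the limits
plt.xlim([2, 4])                  # with arguments: SET the limits
limits_final = ax_l.get_xlim()

print(limits_before, limits_after, same_via_pyplot, limits_final)
```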

In [None]:
#Some common methods to update labels.  These should update the bar plot above after running this cell.

#Set title
#Matplotlib doesn't have built in subtitle, but you can add newline character \n to move to new line
ax.set_title('Bar Chart \nBuilt in matplotlib')

#Set x and y limits
ax.set_xlim([-1,12])
ax.set_ylim([-10,np.max(exponential_data)+20])

#Set x and y axis labels
ax.set_xlabel('Categories')
ax.set_ylabel('Value')

#Set exact xtick locations and labels
ax.set_xticks([2,4,6,8,10])
ax.set_xticklabels(['A','B','C','D','E'],rotation=90)



#### Legends

There are two common implementations to get legends.  First is to add all the plots, and then call `ax.legend(labels)` with a list of the labels you want to add.

In [None]:
fig,ax = plt.subplots(figsize=(4,4))
ax.plot(x_data,linear_data,'-bo',x_data,exponential_data,':rx')
ax.legend(['Linear','Exponential'])

Another approach is to add a label to each element as it is plotted, then simply call legend() at the end.

In [None]:
fig,ax = plt.subplots(figsize=(4,4))
ax.plot(x_data,linear_data,'-bo',label='Linear Data')
ax.plot(x_data,exponential_data,':rx',label='Exponential Data')
ax.legend(loc='center left') #See matplotlib documentation for more possible legend locations

#### Adding Annotations

In [None]:
fig,ax = plt.subplots(figsize=(4,4))
ax.plot(x_data,linear_data,'-bo',x_data,exponential_data,':rx')

#We can add text with ax.text()
ax.text(np.min(x_data),np.max(exponential_data)-10,'My Plot',ha='left')

#We can also add fancier annotations such as arrows with ax.annotate()
labels = ['Close Together','Far Apart']
xs=[3,7]
ys = [exponential_data[x-1] for x in xs]
for i, label in enumerate(labels):
    ax.annotate(label,
                xy=(xs[i],ys[i]+5),
                xytext = (xs[i],ys[i]+25),
               arrowprops=dict(facecolor='black', headwidth=4, width=2,headlength=4),
               horizontalalignment='center', verticalalignment='top')
#The xytext option pushes the text up above

# Plotting with Pandas

Pandas has many built-in methods for creating plots from the data within a DataFrame or Series.  These methods are built on top of matplotlib, but provide some useful shortcuts and functionality that are often better suited to plotting the multiple columns of data we have in a DataFrame.  

Since the pandas plotting methods are implemented in matplotlib, we will need one of the matplotlib magic commands (`%matplotlib notebook` or `%matplotlib inline`), even if we do not call the matplotlib package at all.  Let's re-import the key components in case you are starting from this point in the notebook.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#### Pandas Plotting Overview

In [None]:
#Create some fake stock data
np.random.seed(1)

df = pd.DataFrame({'A': np.random.randn(365).cumsum(0), 
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20}, 
                  index=pd.date_range('1/1/2017', periods=365))

#If we assign a name to the dataframe index and columns, then those names will show up in the plots
df.index.name = "Date"
df.columns.name = "Stocks"

df.head()

In [None]:
#Now plot a line chart
df.plot(kind='line',y=['A','B','C']) 
#y specifies which columns to plot.  Default is all columns, so we technically don't need to include it here. 
#By default df.plot() uses DataFrame index for x, or we could specify a different column if we wanted with df.plot(x='col_name')

In [None]:
#We can also feed in many of the plot properties when we call df.plot()
df.plot(kind='line',title='Title',figsize=(5,3),rot=45) #Default is to plot all columns, so we don't need y=

In [None]:
#Can also create separate subplot for each column
df.plot(subplots=True,layout=(3,1))

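While we can specify some of the plot properties directly in the arguments of df.plot(), this only provides limited options.  For example, older versions of pandas offer no direct way to add a custom axis label (newer versions accept `xlabel` and `ylabel` arguments), and finer control over ticks, legends, and annotations is not available at all.  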

To get more ability to customize the plot, we can take advantage of the fact that the plotting call `df.plot()` technically returns an AxesSubplot object.  For example, we could get the Axes object by running `ax = plt.gca()` right after making the plot (gca = get current axes), and then update the plot properties.  Or we could simply assign a plot call to an axes variable, as follows.

In [None]:
ax = df.plot(kind='line',figsize=(5,4))
ax.set_title('Cool Title')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
#Can also use matplotlib.pyplot to manipulate plot
plt.xticks(rotation=45)
plt.subplots_adjust(bottom=.25)
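
The relationship also works in the other direction: we can create the Axes ourselves and tell pandas where to draw with the `ax` argument.  The sketch below uses a small made-up DataFrame (`df_demo`) and illustrative names; it places two different pandas plots side by side in one matplotlib figure.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up dataset for illustration
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((50, 2)).cumsum(axis=0),
                       columns=['A', 'B'])

# Build the layout in matplotlib first
fig_p, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Pass each Axes to pandas; the same Axes object is returned
returned = df_demo.plot(ax=ax_left, title='Lines')
df_demo.plot.hist(ax=ax_right, alpha=0.5, bins=10, title='Histograms')

# Customize with the usual Axes methods afterwards
ax_left.set_ylabel('Value')
```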

So far we've seen line plots, implemented with `df.plot(kind='line')`.  Pandas supports many other types of plots using other values of the "kind" argument, including 'area', 'bar', 'barh', 'density', 'hist', 'kde', 'line', 'pie'.  

A quick note on plotting methods before moving on.  For any of the plot kinds, you can also create the plots with `df.plot.line()`, `df.plot.hist()`, etc.  These two approaches (`df.plot(kind='line')` vs `df.plot.line()`) are roughly the same, though sometimes the direct plotting call (e.g. `df.plot.hist()`) will offer some additional plotting parameters that you won't necessarily find in the generic plotting call (e.g. `df.plot(kind='hist')`).  Hence, I generally prefer to use `df.plot.kind()` methods.  

In [None]:
#Scatter plots
#This will plot 'A' on x axis, 'C' on y axis, and both color code (c) and size (s) the points according to values in column B
ax = df.plot.scatter('A', 'C', c='B', s=df['B'], colormap='viridis')
ax.set_aspect('equal')

In [None]:
#Plot a histogram
#Here the histograms will overlap, so we use the alpha parameter to make the plots transparent (alpha=0 for fully transparent)
df.plot.hist(alpha=0.7,bins=10);

Kernel density estimates are a neat way to fit a smooth, continuous distribution curve to the data samples.

In [None]:
df.plot.kde()

In [None]:
#Bar plot example, taken from McKinney's book Python for Data Analysis
df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df.plot.bar()

We can easily create stacked bar charts with stacked=True.  This was not an available option with basic matplotlib, which highlights some of the added utility of using the pandas plotting methods.

In [None]:
#Create horizontal stacked bar chart
df.plot.barh(stacked=True,alpha=.5)

We can also create a scatter matrix, very useful for machine learning applications.

In [None]:
#Let's load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.keys())
df_iris = pd.DataFrame(data=iris['data'],columns=iris['feature_names'])
df_iris['Name'] = iris['target']
df_iris.Name = df_iris.Name.map(dict(zip(np.arange(len(iris['target_names'])),iris['target_names'])))
df_iris.head(3)

In [None]:
#The on diagonal plots show hist of values for that feature; off-diag show relation between each pair
ax = pd.plotting.scatter_matrix(df_iris,figsize=(8,8))

# Seaborn

The Python plotting package [Seaborn](https://seaborn.pydata.org/index.html) is popular in the Python data visualization ecosystem as it provides many great looking plot templates and integrates well with matplotlib and pandas.  Here are a few quick examples.

In [None]:
#Standard import 
import seaborn as sns

In [None]:
#Pairplots
sns.pairplot(df_iris, hue='Name', diag_kind='kde', height=2); #height (in inches per facet) replaces the older size argument

Seaborn has a nifty [heatmap](https://seaborn.pydata.org/generated/seaborn.heatmap.html) function as well.  Here's a quick example adapted from https://www.statology.org/seaborn-heatmap/:

In [None]:
#load "flights" dataset
df_flights = sns.load_dataset("flights")

df_flights.head()

In [None]:
#pivot to get into right structure
df_flights_plot = df_flights.pivot_table(columns="month",index="year", 
                                         values="passengers",aggfunc='sum')
df_flights_plot.head()

In [None]:
#Aaaand, plot!
hm = sns.heatmap(df_flights_plot)

In [None]:
#We can also change the look and colors with all kinds of options
sns.heatmap(df_flights_plot, annot=True, fmt="d", 
            cmap="coolwarm",
            annot_kws={"size":8},
            cbar_kws={'label': 'colorbar title'})

As with pandas plots, Seaborn plots are built on top of matplotlib.  That means we can access the underlying matplotlib object to alter some of the plot characteristics.

For example, the `sns.heatmap` function doesn't have a built-in argument to adjust figure size or add a title, but we can use the matplotlib figure to adjust these values.

In [None]:
#set heatmap size
plt.figure(figsize = (8,3))

sns.heatmap(df_flights_plot, annot=True, fmt="d", 
            cmap="coolwarm",
            annot_kws={"size":8},
            cbar_kws={'label': 'colorbar title'})

#Add title
plt.title('Total Passengers by Year and Month')
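
The same result can be reached with explicit Axes objects, in keeping with the Axes-first style used earlier in this lesson.  This is a minimal sketch using a small made-up DataFrame (`data_demo`) rather than the flights data, so it runs without downloading anything.  `sns.heatmap` accepts an `ax` argument and returns the Axes it drew into.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small made-up integer table for illustration
data_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                         index=['r1', 'r2', 'r3'],
                         columns=['a', 'b', 'c', 'd'])

# Create the figure and Axes in matplotlib, then hand the Axes to seaborn
fig_s, ax_s = plt.subplots(figsize=(5, 3))
returned_ax = sns.heatmap(data_demo, annot=True, fmt='d',
                          cmap='coolwarm', ax=ax_s)

# Standard Axes methods work on the result
ax_s.set_title('Heatmap drawn into an existing Axes')
ax_s.set_xlabel('Column')
ax_s.set_ylabel('Row')
```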

# Interactive Graphics with Bokeh

Although matplotlib, pandas, and seaborn are great for producing static images, they provide hardly any ability to produce interactive graphics.  Bokeh provides the ability to produce rich, interactive graphics that can be shared over the web, and has grown in popularity in recent years.  I used it extensively in my last data science role and got a lot of positive feedback from users who enjoyed the interactive graphics.  

As an example, see https://mdsmith44.github.io/COVID19_Analysis/ for a COVID19 dashboard built in bokeh.  We will show how to create the "square map" portion here.

The [bokeh documentation](https://docs.bokeh.org/en/latest/index.html) has a useful tutorial and a great [gallery](https://docs.bokeh.org/en/latest/docs/gallery.html#gallery) with many helpful examples.  We give a quick demo here to give an idea for the power of the bokeh package.

In [None]:
#Re-import in case you are starting from this point in the notebook
import pandas as pd
import numpy as np

### Bokeh Plotting Basics

Let's start with a simple example from the bokeh gallery to give a feel for how to create plots and how to add user interactions.  We will build off of the [Penguin Species](https://docs.bokeh.org/en/latest/docs/gallery/marker_map.html) example from bokeh's website.

Let's first import bokeh and configure it to display output in the notebook. 

In [None]:
import pandas as pd
import numpy as np

#Import the primary bokeh figure object
from bokeh.plotting import figure

#The following calls are needed to view bokeh output in a Jupyter environment
from bokeh.io import output_notebook, save, show
from bokeh.resources import INLINE
output_notebook(INLINE)

If it works, you should see something like: 
>BokehJS 3.7.2 successfully loaded.

We now have the option to use the `show` method to view a bokeh figure directly in the Jupyter cell output, or we can `save` the figure to an html file.  

Let's check out a first example.

## Bokeh Example 1: Basic Line Chart
Let's start with the basic workflow of creating bokeh plots:

1. Prepare some data
2. Create a bokeh `figure` object
3. Add `glyphs` to the figure
4. Output the figure (either show or save it)

In [None]:
# 1. Prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. Create a new plot/figure with a title and axis labels
p = figure(title="simple line example", 
           x_axis_label='x', y_axis_label='y',
           height=400,width=400)
#height and width are in screen pixels

#3. Add Glyphs directly to our figure object
p.line(x,y,legend_label='A Line',line_width=2)
p.scatter(x,y,marker='circle_x',size=20)

#4. Select whether to show or save figure
show(p)

We have a line plot, similar to what we could have built with matplotlib or seaborn.  But notice that our plot is interactive!  You can click and drag the plot to move around, or click on other tools to do things like wheel-zoom or box-zoom.  

That is, the plot contains the javascript needed to handle those interactions, even though we only needed python to code it up.  Cool!

There are many other interactive tools we can add to our bokeh plots.  See [Bokeh Plot Tools](https://docs.bokeh.org/en/latest/docs/user_guide/interaction/tools.html) for complete reference. 

<blockquote>
    <u>Bokeh Plot Tools</u>
    <p>
    <b>Pan/Drag Gesture Tools:</b> Respond to user panning (on touch devices) or left-dragging (on mouse devices).
    <ul><li>Tool Names: box_select, box_zoom, lasso_select, pan
    </li></ul>
    <b>Click/Tap Gesture Tools:</b> Respond to user tapping (on touch device) or left-clicking (on mouse devices).
    <ul><li>Tool Names: poly_select, tap
        </li></ul>
<b>Scroll/Pinch Gesture Tools:</b> Respond to user pinching (on touch devices) or wheel scrolling (on mouse devices).
<ul><li>Tool Names: wheel_zoom, xwheel_zoom, ywheel_zoom
    </li></ul>
    <b>Action Tools:</b> Take some action when user clicks on the tool button.
<ul><li>Tool Names: save, reset, undo, redo, zoom_out, zoom_in
    </li></ul>
    <b>Inspection Tools:</b> Report information based on current cursor position. 
<ul><li>Tool Names: hover, crosshair
    </li></ul>
    </blockquote>
    
We can add as many tools as we want, but only one tool of each gesture type can be active at a time.

In [None]:
# Plot with plot tools

#random points
np.random.seed(1)
x = np.random.random(size=100)
y = np.random.random(size=100)

# Specify tools as string of tool names
# Set active drag tool with active_drag
# Can change toolbar_location
tools = 'pan,box_select,lasso_select,tap,wheel_zoom,hover,reset,save'
p1 = figure(title="A Basic Bokeh Plot", 
           x_axis_label='x', y_axis_label='y',
           height=400,width=400,
          tools=tools,
          active_drag='box_select',active_scroll='wheel_zoom',active_tap='tap',
          toolbar_location='right')


#Now add Glyphs directly to our figure object
p1.scatter(x,y,size=8)

#Select whether to show or save figure
show(p1)

Note that the `box_select` is our active Drag tool, `wheel_zoom` is our active Scroll tool, and `tap` is the active Tap tool.

We also added the Hover tool.  If you hover over any point, you will see some default information, but we can add custom Tooltips.

In [None]:
# Redo previous example with Hover tooltips
# The '@x' notation means display value of the "x" field of the data
TOOLTIPS = [
    ('x val','@x'),
    ('y val','@y')
]

tools = 'pan,box_select,lasso_select,tap,wheel_zoom,hover,reset,save'
p1 = figure(title="A Basic Bokeh Plot", 
           x_axis_label='x', y_axis_label='y',
           height=400,width=400,
          tools=tools,
          active_drag='box_select',active_scroll='wheel_zoom',active_tap='tap',
          toolbar_location='right',
          tooltips=TOOLTIPS)


#Now add Glyphs directly to our figure object
p1.scatter(x,y,size=5)

#Select whether to show or save figure
show(p1)
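
Tooltips can be refined a little further.  The sketch below is one possible variation, with illustrative names (`FORMATTED_TOOLTIPS`, `p_fmt`, `x_pts`, `y_pts`): a format in curly braces after a field, such as `@x{0.00}`, controls how the number is displayed, and the special variable `$index` shows the row number of the point under the cursor.  The last lines simply look up the HoverTool on the figure to confirm the tooltips were attached.

```python
import numpy as np
from bokeh.plotting import figure
from bokeh.models import HoverTool

rng = np.random.default_rng(1)
x_pts = rng.random(100)
y_pts = rng.random(100)

# '@field{format}' formats a data column; '$index' is the point's row number
FORMATTED_TOOLTIPS = [
    ('point #', '$index'),
    ('x val', '@x{0.00}'),
    ('y val', '@y{0.00}'),
]

p_fmt = figure(title='Formatted tooltips',
               height=400, width=400,
               tools='pan,wheel_zoom,hover,reset',
               tooltips=FORMATTED_TOOLTIPS)
p_fmt.scatter(x_pts, y_pts, size=8)

# Inspect the hover tool that the figure is using
hover_tools = [t for t in p_fmt.tools if isinstance(t, HoverTool)]
print(hover_tools[0].tooltips)
```

Call `show(p_fmt)` in the notebook to display the plot and hover over a point to see the two-decimal values.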

### Saving bokeh plots
One of the powers of bokeh is that you can export the fully interactive graphics to a stand-alone html file, which you can then email, share, or host online.  

We can save them locally using bokeh's `save` method.  

There are two options for how we save a bokeh html file:
- `resources=CDN`: When you open the html file, your machine will reach out to remote URLs to pull in the javascript and css resources needed to display and render the graphics.  (CDN = Content Delivery Network)
- `resources=INLINE`: All of those resources are included in the html file.  This means it works even when not connected to the internet (very useful on closed or classified networks), but file will be a bit bulkier (extra 1-2 MBs or so).  
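
To see the size difference between the two modes without writing any files, we can render the html to a string.  This is a sketch using `file_html` from `bokeh.embed` with a throwaway figure (`p_demo` and the other names are illustrative); the exact sizes will vary with your bokeh version.

```python
from bokeh.plotting import figure
from bokeh.embed import file_html
from bokeh.resources import CDN, INLINE

# A throwaway figure, just to have something to render
p_demo = figure(height=300, width=300, title='Resource mode demo')
p_demo.line([1, 2, 3], [4, 6, 5])

# Render the complete html document as a string in each mode
html_cdn = file_html(p_demo, resources=CDN, title='CDN demo')
html_inline = file_html(p_demo, resources=INLINE, title='INLINE demo')

print('CDN html size:   ', len(html_cdn), 'characters')
print('INLINE html size:', len(html_inline), 'characters')
```

The CDN version contains only links to the bokeh javascript files, while the INLINE version carries the javascript itself, which accounts for the extra size.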

In [None]:
#Make sure we have the resources to save to html
from bokeh.resources import INLINE, CDN

#Save as CDN mode
save(p1,'bokeh_ex1_CDN.html',resources=CDN,title='Bokeh Plot')

#Save as INLINE mode
save(p1,'bokeh_ex1_INLINE.html',resources=INLINE,title='Bokeh Plot')

#title is what shows up in the browser tab
#you will get a warning if you don't include it in the save() method

You should now be able to open either file from your local environment.  Notice that the INLINE file will be a bit larger than the CDN version (for me about 1MB vs 10 kB).  

**Note:** In older versions of bokeh, you could either `show` or `save` a figure, but not both.  More recent versions seem to have fixed this issue, but if you get an error saying `Models must be owned by only a single document` when you try to save a figure, then you may have to comment out the `show(p1)` line in a previous cell in order to save it.

## Bokeh Example 2: Plotting from Pandas DataFrame
Bokeh (as with any Python plotting package) makes it easy to plot directly from a pandas DataFrame.  In bokeh this is done by converting the DataFrame to a bokeh `ColumnDataSource` object, which makes it easy for bokeh to pass the data to javascript arrays. 
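
Before plotting real data, it can help to peek inside a `ColumnDataSource`.  The sketch below builds one from a tiny made-up DataFrame (`df_demo`, with values invented for illustration) and prints what bokeh stored: each DataFrame column becomes a named array, and the DataFrame index is carried along as an extra column.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Tiny made-up dataset for illustration
df_demo = pd.DataFrame({'mpg': [18.0, 15.0, 36.0],
                        'hp': [130, 165, 58],
                        'origin': ['North America', 'North America', 'Asia']})

source_demo = ColumnDataSource(df_demo)

# Column names available to glyphs and tooltips (includes the index)
print(source_demo.column_names)

# The underlying data is a dict mapping column name -> array of values
print(source_demo.data['mpg'])
```

Any name listed in `column_names` can be passed as a string to a glyph method (e.g. `x='mpg'`) or referenced in a tooltip (e.g. `'@hp'`).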

In [None]:
from bokeh.models import ColumnDataSource

In [None]:
#If cell below doesn't work, may have to uncomment and run this cell
#to download all sample data sets
# Only need to run this one time

# !pip install bokeh_sampledata

In [None]:
#Read in sample auto data
from bokeh.sampledata.autompg import autompg_clean as df_auto
df_auto.head()

In [None]:
#Plot from dataframe
p2 = figure(height=400,width=400,title='Plotting autompg data using DataFrame')

#Bokeh feeds plots with a ColumnDataSource object
#We can create a bokeh ColumnDataSource directly from the pandas DataFrame
source = ColumnDataSource(df_auto)

#Now add circle glyphs, plotting hp (horsepower) vs mpg (miles per gallon)
#   We specify that plot data is coming from the ColumnDataSource "source"
#   We can reference column names from that source as follows
p2.scatter(source=source,x='mpg',y='hp')

#Here's another way to update axis labels
p2.xaxis.axis_label = 'Miles per Gallon'
p2.yaxis.axis_label = 'Horsepower'

show(p2)

An additional advantage of creating a ColumnDataSource is that we now have all fields of the dataframe within the plot.

Here's how we can use those extra fields to add color coding and a more descriptive hover tool.
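
The hover tool expects a list of `(label, field)` pairs, where each field is a column name prefixed with `@`.  One wrinkle worth knowing: if a column name contains a space, bokeh needs it wrapped in curly braces, e.g. `@{model year}`.  The autompg columns have no spaces, so the simple version in the cells below works fine.  Here is a small sketch of a more defensive builder; the helper name `build_tooltips` and the `'model year'` column are hypothetical.

```python
def build_tooltips(columns):
    """Return (label, field) pairs suitable for HoverTool(tooltips=...)."""
    tooltips = []
    for c in columns:
        name = str(c)
        # Field names containing spaces must be wrapped in curly braces
        field = '@{' + name + '}' if ' ' in name else '@' + name
        tooltips.append((name, field))
    return tooltips

# 'model year' is a made-up column used to show the brace syntax
example_tooltips = build_tooltips(['mpg', 'hp', 'model year'])
print(example_tooltips)
```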

In [None]:
#Import a few extra modules we can use

#Use this to add a HoverTool and to add Color
from bokeh.models import HoverTool, CategoricalColorMapper

#Import a color palette to use
#See https://docs.bokeh.org/en/latest/docs/reference/palettes.html
from bokeh.palettes import Spectral6

In [None]:
#Build Tooltips and see what they look like
TOOLTIPS = [(c,'@'+c) for c in df_auto.columns]
TOOLTIPS

In [None]:
#Plot from dataframe
p2 = figure(height=400,width=400,
           title='Plotting autompg data using DataFrame')

#Build ColumnDataSource
source2 = ColumnDataSource(df_auto)

#Now add circle glyphs, plotting hp (horsepower) vs mpg (miles per gallon)
#Add unique colors for each value of "origin" field
origin_list = list(df_auto.origin.unique())
color_mapper = CategoricalColorMapper(palette=Spectral6, factors=origin_list)
p2.scatter(source=source2,x='mpg',y='hp',
         color={'field': 'origin', 'transform': color_mapper},
        fill_alpha=0.8,
         legend_group='origin'
        )

#Update Axis Labels
p2.xaxis.axis_label = 'Miles per Gallon'
p2.yaxis.axis_label = 'Horsepower'

#Add Hovertool to existing plot
h = HoverTool(tooltips=TOOLTIPS)
p2.add_tools(h)

show(p2)

#Hover over points to see tooltips

## Bokeh Example 3: Adding Widgets and Interactions
Now comes some of the real magic of bokeh: Adding widgets and handling user interactions. 

However, this magic doesn't come for free.  You do need to add a bit of javascript code to handle the callback behavior, but bokeh provides convenient methods and frameworks that make this relatively painless.  And once you master this, you can create incredibly powerful interactive web graphics as a stand-alone html file without needing to set up a server infrastructure. 

The key bokeh mechanism is to use a `CustomJS` object to add JavaScript snippets that control how widget interactions should change the plot data or the plot itself.  

Let's look at an example.

In [None]:
#Import a few more things we will need

#Import the widgets we plan to use
from bokeh.models.widgets import Select, RangeSlider

#Import layout methods for combining plots and widgets
from bokeh.layouts import row, column

#Import CustomJS, which enables us to add JavaScript code to control behavior
from bokeh.models import CustomJS

In [None]:
#Let's first build the new layout with plot and widgets

#Build standard plot, same as before
p3 = figure(height=400,width=400,
           title='Autompg Data with Interactive Widgets')
source3 = ColumnDataSource(df_auto)
origin_list = list(df_auto.origin.unique())
color_mapper = CategoricalColorMapper(palette=Spectral6, factors=origin_list)
p3.scatter(source=source3,x='mpg',y='hp',
         color={'field': 'origin', 'transform': color_mapper},
        fill_alpha=0.8,
         legend_group='origin'
        )
p3.xaxis.axis_label = 'Miles per Gallon'
p3.yaxis.axis_label = 'Horsepower'
p3.add_tools(h)

#Now we want to add some widgets
origin_select = Select(options=['All'] + list(df_auto.origin.unique()),
                       value='All',
                      title='Select Origin')
low_yr = df_auto.yr.min()
high_yr = df_auto.yr.max()
yr_slider = RangeSlider(start=low_yr,end=high_yr,value=(low_yr,high_yr),
                        width=200,
                       title='Select Year Range')

#Create a widget layout
widgets = column(origin_select,yr_slider)

#Create final layout
layout3 = row(p3,widgets)

show(layout3)

Now we have a cool layout showing the plot next to our widgets.  But so far the widgets don't **do anything**.

**`CustomJS`** to the rescue!  Let's see it in action.

One small change: let's add an `x_active` field to the original DataFrame and use that to plot our points.  This will enable us to control (with the CustomJS JavaScript snippet) whether each point is active or not on the plot.  Points whose `x_active` value is set to NaN are simply not drawn.
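
Before writing any JavaScript, it can help to prototype the filter rule in pandas.  The sketch below applies the same logic the callback further down uses (keep a point if its origin matches, or "All" is selected, and its year is within the slider range; otherwise set it to NaN).  The helper name `filter_x_active` and the four toy rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only (not the real autompg data)
toy = pd.DataFrame({
    'mpg': [18.0, 32.0, 25.0, 27.0],
    'origin': ['North America', 'Asia', 'Europe', 'Asia'],
    'yr': [70, 74, 78, 82],
})

def filter_x_active(df, origin='All', yr_min=70, yr_max=82):
    """Return mpg where a row passes both filters, NaN otherwise."""
    origin_ok = (df['origin'] == origin) | (origin == 'All')
    yr_ok = df['yr'].between(yr_min, yr_max)  # inclusive on both ends
    return df['mpg'].where(origin_ok & yr_ok)

# Only the 1974 Asia row survives this filter
result = filter_x_active(toy, origin='Asia', yr_min=72, yr_max=80)
print(result)
```

Once the rule behaves as expected here, translating it to the JavaScript loop is mostly a matter of syntax.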

In [None]:
#Add x_active to df
df_auto['x_active'] = df_auto.mpg
df_auto.head(2)

In [None]:
#Add JavaScript callbacks to wire up widgets

#Build standard plot, same as before
#NEW: Fix x and y range
p3 = figure(height=400,width=400,
           title='Plotting autompg data with Interactive Widgets ',
          x_range=(df_auto.mpg.min()-1,df_auto.mpg.max()+1),
          y_range=(df_auto.hp.min()-5,df_auto.hp.max()+5))
source3 = ColumnDataSource(df_auto)
origin_list = list(df_auto.origin.unique())
color_mapper = CategoricalColorMapper(palette=Spectral6, factors=origin_list)
#NEW: Change x to be x_active
p3.scatter(source=source3,x='x_active',y='hp',
         color={'field': 'origin', 'transform': color_mapper},
        fill_alpha=0.8,
         legend_group='origin'
        )
p3.xaxis.axis_label = 'Miles per Gallon'
p3.yaxis.axis_label = 'Horsepower'
p3.add_tools(h)

#Now we want to add some widgets
origin_select = Select(options=['All'] + list(df_auto.origin.unique()),
                       value='All',
                      title='Select Origin')
low_yr = df_auto.yr.min()
high_yr = df_auto.yr.max()
yr_slider = RangeSlider(start=low_yr,end=high_yr,value=(low_yr,high_yr),
                        width=200,
                       title='Select Year Range')

#Create a widget layout
widgets = column(origin_select,yr_slider)

#Create final layout
layout3 = row(p3,widgets)

#NEW: Add CustomJS
#  "args" makes python bokeh objects available within the javascript environment
#  "code" is javascript snippet that can access and update those bokeh objects
callback = CustomJS(args=dict(source=source3, origin_select=origin_select, 
                              yr_slider=yr_slider),
                    code="""
                    //Now we are in javascript world!
                    //Use double forward slash for comments
                    /*
                    Can also do multi line comment like this
                    Useful for debugging..
                    */
                    
                    //Get the current value of all widgets
                    var origin = origin_select.value;
                    var yr_range = yr_slider.value; //A JS array such as [70,82]
                    var yr_min = yr_range[0];
                    var yr_max = yr_range[1];
                    
                    //Grab the source data
                    //This returns a JavaScript object that maps column names to arrays
                    var data = source.data;
                    
                    //Now grab the x_active column so we can update it
                    var x_active_col = data['x_active'];
                    
                    //Go through every row, and remove point if needed
                    for (var i=0; i<x_active_col.length; i++) {
                        //Check all conditions
                        var origin_bool = ((data['origin'][i]==origin) || (origin=='All'));
                        var yr_bool = ((data['yr'][i] <= yr_max) && (data['yr'][i] >= yr_min));
                        
                        if ( origin_bool && yr_bool) {
                            //This point fits with all filters
                            //Set it equal to the mpg column
                            x_active_col[i] = data['mpg'][i];
                        } else {
                            //This point does NOT fit with filters
                            //Set it to javascript NaN to null it out
                            x_active_col[i] = NaN;
                        }
                    }
                    
                    //We have now updated the x_active column based on filters
                    //One more step to push the changes
                    source.change.emit();
""")

#Now "wire up" the widgets with this CustomJS callback
#We can attach this callback to the widgets by calling js_on_change()
origin_select.js_on_change('value',callback)
yr_slider.js_on_change('value',callback)

show(layout3)

There are many other useful options for building interactivity into Bokeh plots.  See the [Bokeh Interactivity User Guide](https://docs.bokeh.org/en/latest/docs/user_guide/interaction.html#ug-interaction) as well as the [Bokeh Gallery](https://docs.bokeh.org/en/latest/docs/gallery.html) (particularly the Interactivity examples).  

Here are a couple other interaction tricks that might be useful.

#### Adding Download Button
With the right JavaScript code, you can give users the option to download the underlying dataset.

In [None]:
#Function to generate the JavaScript code used to export the source data as a CSV file
def get_source_download_code(df,source_name='source'):
    download_code = """
        let data = {}.data;
        let L = data['{}'].length;
        let out = \"""".format(source_name,df.columns[0])
    
    for c in df.columns:
        download_code += c + ","
        
    download_code += """\\n";
        for (let i = 0; i < L; i++) {"""
    
    for c in df.columns:
        download_code += """
            out += data['{}'][i] + ",";""".format(c)
        
    download_code += """
            out += "\\n";
        }
        
        //create a text file
        let file = new Blob([out], {type: 'text/plain'});
        
        //create a link element (<a>link</a>)
        let elem = window.document.createElement('a');
        
        //modify the link to point to the text file
        elem.href = window.URL.createObjectURL(file);
        
        //make that file download, from the link
        elem.download = 'plot_data.csv'
        
        //add the link to the web page
        document.body.appendChild(elem);
        
        elem.click(); //on click..
        document.body.removeChild(elem); //remove the link
    """
    
    return download_code

#Get a feel for what the javascript code looks like
print(get_source_download_code(df_auto))

Now add a bokeh `Button` that uses that code as its javascript callback when clicked.

In [None]:
#Use Button to Export Data
from bokeh.models import Button

#Re-build basic autompg plot
p4 = figure(height=400,width=400,
           title='Plotting autompg data')
source4 = ColumnDataSource(df_auto)
origin_list = list(df_auto.origin.unique())
color_mapper = CategoricalColorMapper(palette=Spectral6, factors=origin_list)

#Plot points
p4.scatter(source=source4,x='mpg',y='hp',
           color={'field': 'origin', 'transform': color_mapper},
           fill_alpha=0.8,
           legend_group='origin'
        )
p4.xaxis.axis_label = 'Miles per Gallon'
p4.yaxis.axis_label = 'Horsepower'
p4.add_tools(h)

#Add button 
button = Button(label='Download Data',button_type='success')
button_callback = CustomJS(args=dict(source=source4),
                           code=get_source_download_code(df_auto))
button.js_on_click(button_callback)

layout = row(p4,button)
show(layout)

## Bokeh Example 4: Creating Layouts
Bokeh also shines in its ability to create layouts of multiple elements and multiple charts side by side.

In [None]:
my_dashboard = column(row(p1,p2),
                      layout3)
show(my_dashboard)

Another useful feature is Bokeh Tabs.

In [None]:
from bokeh.models import Tabs, TabPanel

tab1 = TabPanel(child=p1, title="Basic Plot")
tab2 = TabPanel(child=p2, title="Autompg from DataFrame")
tab3 = TabPanel(child=layout3, title="Autompg with Widgets")
all_tabs = Tabs(tabs=[tab1, tab2,tab3])
show(all_tabs)

In [None]:
from bokeh.models import Div

header = Div()
#Add some HTML text to the Div element
header.text = """
<img src='Images/NPS_Logo.jpg' style="float:left;height:75px;"/>
<h1>My First Bokeh Plots</h1>
<h3>Here are a few plots built using Bokeh</h3>
<ul><li>Some plots use 
<a href="https://docs.bokeh.org/en/latest/docs/reference/sampledata.html#sampledata-autompg2"
   target="_blank">
autompg</a> sample data from the bokeh library.
</li></ul>
"""

#Combine into final layout
my_layout = column(header,all_tabs)
show(my_layout)

## Bokeh and Interactive Plotting Summary
There is certainly a learning curve with bokeh, but once you understand the basic mechanics of integrating custom JavaScript callbacks, you can create very powerful interactive graphics that are fully contained within a stand-alone html file.  

Bokeh also has a server mode where user interactions trigger *python* callbacks.  This of course can be more powerful, but requires setting up a server infrastructure.  

There are also other useful python packages that build on top of bokeh to make it easier to create interactive dashboards, such as [Panel](https://panel.holoviz.org/) and [HoloViews](https://holoviews.org/).  These packages are beyond the scope of this primer, but worth checking out if you want to build more complex interactive dashboards.

Finally, there are other python visualization packages that provide similar functionality, such as [Plotly](https://plotly.com/python/) and [streamlit](https://docs.streamlit.io/).  These are also worth checking out if you want to explore other options for interactive graphics.

# Making GIFs

Now what everyone came here for, making GIFs!

<img src="Images/carlton_dance.gif" width=400 align=center>

In [None]:
#Re-import in case you're starting at this point in the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
#Create a folder called "GIF_Images" in the same directory as this notebook
import os
if not os.path.exists('GIF_Images'):
    os.makedirs('GIF_Images')

In [None]:
#Create sequence of image files
base_filename = "GIF_Images/Image_sequence_{:02d}.png" #Formats numbers to 2 places, e.g. 3 becomes 03
x_list = [np.sin(a) for a in np.linspace(0,2*np.pi,20)] #One full sine cycle in 20 steps
for i, x in enumerate(x_list):
    fig, ax = plt.subplots()
    plt.scatter(x,0,s=400,c='orange')
    ax.set_xlim([-2,2])
    plt.savefig(base_filename.format(i))
    plt.close() #Stops current plot from showing up in output
    
#Now look in the GIF_Images folder to see the sequence of image files

In [None]:
#Convert sequence of images to a GIF
from PIL import Image
import glob
 
# Create the frames
frames = []
imgs = glob.glob("GIF_Images/Image_sequence_*.png") 
#glob grabs all files that match this format, with * being any substring
#sort the images in order
imgs = sorted(imgs)
for i in imgs:
    new_frame = Image.open(i)
    frames.append(new_frame)

#Save into a GIF file that loops forever (loop=0)
#duration specifies time (in ms) to stay on each image
frames[0].save('GIF_Images/my_gif.gif', format='GIF',
               append_images=frames[1:],
               save_all=True,
               duration=300, loop=0)

Now check out the gif file in the GIF_Images folder.  You can play around with duration to make the gif faster or slower. 
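
One detail worth a closer look is why the filenames were zero-padded with `{:02d}`.  The frames are ordered by sorting filenames alphabetically, and unpadded numbers do not sort in frame order.  The short sketch below uses twelve hypothetical filenames to show the difference, and also works out how long one loop of the 20-frame GIF lasts at `duration=300`.

```python
# Twelve hypothetical frame names, with and without zero padding
unpadded = ["Image_sequence_{}.png".format(i) for i in range(12)]
padded = ["Image_sequence_{:02d}.png".format(i) for i in range(12)]

# Alphabetical sorting puts 10 and 11 ahead of 2 when names are not padded
print(sorted(unpadded)[:4])

# Zero-padded names sort in true frame order
print(sorted(padded)[:4])

# Length of one loop: 20 frames at 300 ms per frame
n_frames, duration_ms = 20, 300
loop_seconds = n_frames * duration_ms / 1000
print(loop_seconds)
```

Note that two digits only cover frames 0 through 99; for a longer sequence you would want a wider format such as `{:03d}`.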