# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Visualizing data: Grammar of Graphics ("lite") using Bokeh

There are many packages for visualizing data in Python, including Seaborn, ggplot (a Python adaptation of the famous R equivalent), matplotlib, plotly, and [numerous others](https://blog.modeanalytics.com/python-data-visualization-libraries/). This tutorial-only mini-lab, or _lablet_, focuses on [Bokeh](http://bokeh.pydata.org/), an especially popular package for creating _interactive_ (rather than only _static_) visual displays.

If you just need static charts in Python, see this tutorial on Seaborn: [./seaborn.ipynb](./seaborn.ipynb).

**Historical note.** The design and use of Bokeh are based on Leland Wilkinson's Grammar of Graphics (GoG). The best-known implementation of GoG is probably Hadley Wickham's R package, [ggplot2](http://ggplot2.org/).

## Setup

Here are the modules we'll need for this lablet:

In [None]:
from IPython.display import display, Markdown
import pandas as pd
import bokeh

If any of the imports above failed, install the missing package (uncomment the appropriate line and execute):

In [None]:
#!pip install pandas
# OR
#!conda install pandas

#!pip install bokeh
# OR
#!conda install bokeh

Bokeh is designed to output HTML, which you can then embed in any website. To embed Bokeh output into a Jupyter notebook, we need to do the following:

In [None]:
from bokeh.io import output_notebook
output_notebook ()

Lastly, we need a recent version of Bokeh. If the code cell below prints a version number lower than `0.12.x`, you may need to upgrade.

In [None]:
print ("Bokeh version:", bokeh.__version__)

#!conda upgrade bokeh
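
The cell above only prints the version, so the comparison is left to the reader. Here is a minimal sketch of doing the comparison in code, using only the standard library. The helper names `version_tuple` and `is_recent_enough` are hypothetical, and the default minimum of `(0, 12)` is simply the threshold mentioned in the text.

```python
def version_tuple(version):
    """Convert a version string such as '0.12.4' into a tuple of ints, (0, 12, 4).

    Any non-numeric suffix in a component (e.g., 'rc1' in '0rc1') is ignored.
    """
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_recent_enough(version, minimum=(0, 12)):
    """Return True if `version` is at least `minimum`, comparing leading components."""
    return version_tuple(version)[:len(minimum)] >= minimum


print(is_recent_enough('0.12.4'))  # True
print(is_recent_enough('0.11.1'))  # False
```

In the notebook, you could call `is_recent_enough(bokeh.__version__)` instead of passing a literal string.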

## Philosophy: Grammar of Graphics

[The Grammar of Graphics](http://www.springer.com.prx.library.gatech.edu/us/book/9780387245447) is due to Leland Wilkinson. Its premise is that most people approach data visualization in an ad hoc, unsystematic way, even though there exists a "formal language" for describing visual displays.

This idea matters in the context of our course because it makes visualization more systematic, and a systematic description is easier to express in code.

The high-level concept is simple:
1. Start with a (tidy) data set.
2. Transform it into a new (tidy) data set.
3. Map variables to geometric objects (e.g., bars, points, lines) or other aesthetic "flourishes" (e.g., color).
4. Rescale or transform the visual coordinate system.
5. Render and enjoy!
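
To make the five steps concrete, here is a toy sketch in plain Python that walks through them and "renders" a bar chart as text. It does not use Bokeh, and the data values, the `palette` dictionary, and the `render` helper are all made up for illustration.

```python
# Step 1: start with a tiny tidy data set, one observation per row (made-up values).
rows = [
    {'species': 'a', 'length': 1.0},
    {'species': 'a', 'length': 3.0},
    {'species': 'b', 'length': 4.0},
    {'species': 'b', 'length': 6.0},
]

# Step 2: transform it into a new tidy data set (mean length per species).
groups = {}
for row in rows:
    groups.setdefault(row['species'], []).append(row['length'])
summary = [{'species': s, 'mean_length': sum(v) / len(v)}
           for s, v in sorted(groups.items())]

# Step 3: map variables to a geometric object (a bar) and an aesthetic (color).
palette = {'a': 'red', 'b': 'blue'}
bars = [{'label': r['species'], 'value': r['mean_length'], 'color': palette[r['species']]}
        for r in summary]

# Step 4: rescale from data coordinates to visual coordinates (0 to 20 characters).
max_value = max(b['value'] for b in bars)
for b in bars:
    b['width'] = round(20 * b['value'] / max_value)

# Step 5: render, here as lines of plain text.
def render(bars):
    return ['{} ({}) {}'.format(b['label'], b['color'], '#' * b['width']) for b in bars]

print('\n'.join(render(bars)))
```

A plotting library such as Bokeh plays the role of steps 3 through 5; steps 1 and 2 are usually done with a data-frame library such as Pandas.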

![From data to visualization](http://r4ds.had.co.nz/images/visualization-grammar-3.png)

> This image is "liberated" from: http://r4ds.had.co.nz/data-visualisation.html

## (High-Level) Charts

The easiest way to use Bokeh is to call its interface for "canned" charts. For instance, let's reload the Iris data set from last time and study relationships among its variables, such as petal length vs. petal width.

The cells below demonstrate histograms, simple scatter plots, and box plots. However, there is a much larger gallery of options: http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html

In [None]:
flora = pd.read_csv ('iris.csv')
display (flora.head ())

In [None]:
# Load some Bokeh stuff
# YOUR CODE HERE
raise NotImplementedError()

Here is a histogram:

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Some observations:

- The `Histogram(df, c)` function can take a Pandas data frame `df` as input and refer to the aggregation variable, `c`, by column name.
- The plot is interactive and comes with a bunch of tools. You can customize these tools as well; for your many options, see http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html.
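
In case the `bokeh.charts` interface (and thus `Histogram`) is not available in your installation, here is a hedged sketch of a histogram built by hand with the plotting interface, along with a custom tool list. This is not the `Histogram` function described above: the bins are counted in plain Python and drawn as `quad` glyphs. The petal lengths are illustrative values, and the `resize` tool is left out on the assumption that newer Bokeh releases may not recognize it.

```python
from bokeh.plotting import figure

# Illustrative petal lengths in cm (not read from iris.csv).
values = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]

# Count how many values fall into each of `num_bins` equal-width bins.
num_bins = 3
lo, hi = min(values), max(values)
width = (hi - lo) / num_bins
counts = [0] * num_bins
for v in values:
    k = min(int((v - lo) / width), num_bins - 1)  # the maximum lands in the last bin
    counts[k] += 1

# Left and right edges of each bin.
lefts = [lo + k * width for k in range(num_bins)]
rights = [left + width for left in lefts]

# Create the canvas with a custom tool list, then draw one rectangle per bin.
p = figure(title='Petal length (hand-built histogram)',
           tools='pan,box_zoom,wheel_zoom,save,reset')
p.quad(top=counts, bottom=0, left=lefts, right=rights)
```

To display the result in the notebook, call `show(p)` (from `bokeh.io`) after `output_notebook()` has been run.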

In [None]:
TOOLS = 'resize,pan,box_zoom,wheel_zoom,lasso_select,save,reset,help'

# YOUR CODE HERE
raise NotImplementedError()

Here is an example of a scatter plot, with different colors and markers for each unique species.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Lastly, here is a box plot.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Mid-level charts: the Plotting interface

Beyond the canned methods above, Bokeh provides a "mid-level" interface that more directly exposes the grammar of graphics methodology for constructing visual displays.

The basic procedure is to first create a blank canvas by calling `bokeh.plotting.figure` and then add glyphs, which are geometric shapes.

> For a full list of glyphs, refer to the methods of `bokeh.plotting.figure`: http://bokeh.pydata.org/en/latest/docs/reference/plotting.html
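
As a rough illustration of this canvas-then-glyphs procedure, the sketch below creates a figure, adds one scatter glyph, and labels the axes. The data are a few illustrative values typed in by hand rather than loaded from `iris.csv`, and the titles and labels are arbitrary choices.

```python
from bokeh.plotting import figure

# Illustrative measurements in cm (not read from iris.csv).
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

# 1. Create a blank canvas.
p = figure(title='Petal width vs. petal length')

# 2. Add a glyph: one marker per (x, y) pair.
p.scatter(petal_length, petal_width, size=8, color='navy', alpha=0.5)

# 3. Add flourishes, such as axis labels.
p.xaxis.axis_label = 'petal length (cm)'
p.yaxis.axis_label = 'petal width (cm)'
```

Nothing is drawn until you render the figure, for example with `show(p)`.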

In [None]:
from bokeh.plotting import figure

In [None]:
# Create a canvas
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Add one or more glyphs
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Add any other flourishes and then render
# YOUR CODE HERE
raise NotImplementedError()

show (p)

Here is another way to do the same thing, but using a Pandas data frame as input.
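
One possible shape for this, sketched under assumptions: wrap the data frame in a `ColumnDataSource` and refer to columns by name when adding the glyph. The small data frame below is a hand-typed stand-in for `flora`, and the column names `petal_length` and `petal_width` are assumed rather than taken from `iris.csv`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hand-typed stand-in for `flora`; the column names are assumptions.
df = pd.DataFrame({
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Wrap the data frame so that glyphs can refer to its columns by name.
source = ColumnDataSource(df)

p = figure(title='Petal width vs. petal length')
p.scatter('petal_length', 'petal_width', source=source, size=8)
```

With the real data, `ColumnDataSource(flora)` would take the place of the stand-in, provided the column names match.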

In [None]:
from bokeh.models import ColumnDataSource

# YOUR CODE HERE
raise NotImplementedError()

Incidentally, there are many choices of colors! http://bokeh.pydata.org/en/latest/docs/reference/palettes.html

Let's make a map that assigns each unique species its own color from one of the built-in palettes (see above).
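
A minimal sketch of such a map, assuming the species labels are `setosa`, `versicolor`, and `virginica` (the actual labels in `iris.csv` may differ) and picking the `Set1` Brewer palette arbitrarily:

```python
from bokeh.palettes import brewer

# Assumed species labels; in the notebook these would come from flora['species'].unique().
species_names = ['setosa', 'versicolor', 'virginica']

# brewer[name][n] is a sequence of n hex color strings.
# Brewer palettes start at n = 3, so this lookup needs at least three species.
palette = brewer['Set1'][len(species_names)]

# Pair each species with one color.
color_map = dict(zip(species_names, palette))
print(color_map)
```

Any other palette with enough colors would work the same way, since the map is just a dictionary from label to color string.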

In [None]:
unique_species = flora['species'].unique ()
print (unique_species)

In [None]:
from bokeh.palettes import brewer

# Make a color map
# YOUR CODE HERE
raise NotImplementedError()

Now let's create data sources for each species.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Now we can more programmatically generate the same plot as above, but use a unique color for each species.
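
Putting the pieces together, the following sketch builds one data source per species and loops over them, adding one glyph per species in its own color. As before, the data frame is a hand-typed stand-in for `flora`, the column names are assumptions, and the choice of the `Set1` palette is arbitrary.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import brewer
from bokeh.plotting import figure

# Hand-typed stand-in for `flora`; the column names are assumptions.
df = pd.DataFrame({
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Map each species to a color (Brewer palettes need at least three entries).
species_names = sorted(set(df['species']))
color_map = dict(zip(species_names, brewer['Set1'][len(species_names)]))

# One data source per species, holding only that species' rows.
sources = {name: ColumnDataSource(df[df['species'] == name]) for name in species_names}

# One glyph per species, each drawn in its own color.
p = figure(title='Petal width vs. petal length, by species')
for name in species_names:
    p.scatter('petal_length', 'petal_width', source=sources[name],
              color=color_map[name], size=8)
```

Because each species has its own glyph, you can later style or hide species individually.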

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Here is an example that creates many plots ("small multiples") and uses `gridplot()` to arrange them.
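
As a hedged sketch of the idea, the code below builds four small scatter plots from hand-typed illustrative data and arranges them in a 2-by-2 grid. The variable names and the particular pairs being plotted are assumptions chosen for illustration.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Illustrative measurements in cm (not read from iris.csv).
data = {
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
}

# Pairs of variables to plot against each other, one plot per pair.
pairs = [('sepal_length', 'sepal_width'), ('petal_length', 'petal_width'),
         ('sepal_length', 'petal_length'), ('sepal_width', 'petal_width')]

plots = []
for x, y in pairs:
    p = figure(title='{} vs. {}'.format(y, x))
    p.scatter(data[x], data[y], size=6)
    plots.append(p)

# gridplot takes a list of rows, each row being a list of plots.
grid = gridplot([plots[:2], plots[2:]])
```

Calling `show(grid)` would render all four plots together.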

In [None]:
from bokeh.layouts import gridplot

# YOUR CODE HERE
raise NotImplementedError()

One cool aspect of programmable interaction is that we can _link_ the panning of these plots, in this case by having them share their axis ranges.
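
One way to sketch the linking: create the first figure, then pass its `x_range` and `y_range` objects to the second figure, so that both plots read from the same ranges. The data are illustrative values typed in by hand, and plotting the same points twice is only meant to make the linked movement easy to see.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Illustrative measurements in cm (not read from iris.csv).
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

p1 = figure(title='Left plot')
p1.scatter(petal_length, petal_width, size=8, color='navy')

# Reuse p1's range objects, so that panning or zooming one plot moves the other.
p2 = figure(title='Right plot (linked)', x_range=p1.x_range, y_range=p1.y_range)
p2.scatter(petal_length, petal_width, size=8, color='firebrick')

linked = gridplot([[p1, p2]])
```

Sharing only `x_range` (or only `y_range`) would link panning along just that one axis.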

In [None]:
# YOUR CODE HERE
raise NotImplementedError()