# Plotting With Python

You can do a lot of data science and machine learning with Python, but those techniques are useless if you can't communicate the story of the data to the outside world. This can be done with tables and words, but as they say, a picture is worth a thousand words.

In this notebook we'll take a tour through various plotting packages in Python. We'll start with the foundation, `matplotlib`, move on to more stunning plots with `seaborn`, and end with interactive visualizations in `bokeh`.

Let's go ahead and get started.

## matplotlib

Now, I'm sure a number of you have experience with MATLAB. `matplotlib` was a project started by John Hunter in 2002 to enable MATLAB-like plotting in Python. If you've done a lot of plotting in MATLAB, `matplotlib` will come very naturally to you. Let's start by importing the package and setting it up so that our plots display properly in the notebook.

In [None]:
# matplotlib.pyplot contains most of the matplotlib functionality
# we'll need
import matplotlib.pyplot as plt

# Run this code to make plots display properly in notebooks
%matplotlib inline

In [None]:
# We'll use this package to generate data
import numpy as np

# We'll use pandas as well
import pandas as pd

In [None]:
# Let's consider this command we've seen before
plt.plot(np.random.randn(20))

plt.show()

What did this command do? Well, it plotted the NumPy array, duh. But what is happening behind the scenes? `matplotlib` creates a figure object, places a subplot object on that figure, puts the points from our array on the subplot, and finally connects the points with straight lines. This works well when we want to investigate a single plot, but perhaps we want more. Let's see how to create figure objects and subplots manually.
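To make that concrete, here is a minimal sketch of the same plot written against the figure and subplot objects directly, roughly what `plt.plot` sets up for us behind the scenes. The `matplotlib.use("Agg")` line is an addition so the snippet also runs as a plain script; leave it out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(20)

fig = plt.figure()               # 1. a blank figure
ax = fig.add_subplot(1, 1, 1)    # 2. one subplot (an Axes object) filling the figure
(line,) = ax.plot(data)          # 3. the points, joined by straight line segments

print(len(fig.axes), len(ax.lines))  # 1 1
```

When we only pass y values, matplotlib uses 0, 1, 2, ... as the x values.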

In [None]:
# This creates a figure object
fig = plt.figure()
plt.show()

Now nothing showed up because we didn't put a subplot on the figure. We can add a subplot onto the figure and then we'll see an empty box.

In [None]:
# We have to make a new figure in each code block
# In Jupyter the figure is closed at the end of each cell
# We can set the figsize; the units are inches
fig = plt.figure(figsize = (10,6))

# let's add subplots!
fig.add_subplot(2, 2, 1)
fig.add_subplot(2, 2, 4)

# We added subplots to a 2 x 2 grid in positions 1 (top left) and 4 (bottom right)
plt.show()
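The third argument to `add_subplot` is a position in the grid that starts at 1 in the top left and counts left to right, row by row. A small sketch that checks this by asking each subplot for its (0-based) row and column via `get_subplotspec()`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 6))

positions = {}
for index in range(1, 5):
    ax = fig.add_subplot(2, 2, index)   # add_subplot(nrows, ncols, index)
    spec = ax.get_subplotspec()
    # rowspan / colspan are ranges of 0-based grid cells this subplot covers
    positions[index] = (spec.rowspan.start, spec.colspan.start)
    ax.set_title(f"index {index}")

print(positions)  # {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}
```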

In [None]:
# That may have been annoying
# We can create a figure with many subplots at once
fig, axes = plt.subplots(2, 3, figsize = (10,6))

plt.show()
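`plt.subplots` hands back the figure plus a NumPy array of subplots shaped like the grid. One thing to watch for, sketched below: with the default `squeeze=True`, a single row comes back as a 1-D array, and a 1 x 1 grid comes back as a lone subplot object rather than an array. `axes.flat` is a convenient way to loop over every subplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
print(axes.shape)  # (2, 3), indexed as axes[row, col]

# A single row is squeezed down to a 1-D array ...
fig_row, row_axes = plt.subplots(1, 3)
# ... and a 1 x 1 grid is squeezed down to a single Axes object
fig_one, single_ax = plt.subplots(1, 1)

# axes.flat walks the grid row by row
for i, ax in enumerate(axes.flat):
    ax.set_title(f"axes.flat[{i}]")
```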

Now we've created some blank plots... how do we make them not.. blank.. plots?

Like so.

In [None]:
# import numpy to get some data
import numpy as np

In [None]:
# fig is the figure object
# axes is an array of subplot objects
# you can index it like a list or array
fig, axes = plt.subplots(2, 2, figsize = (8,6))

# We'll plot a histogram on axes[0,0]
axes[0,0].hist(np.random.randn(1000), bins = 50)

# A random walk on axes[0,1]
axes[0,1].plot(np.random.randn(20).cumsum(),'r--')

# A scatter plot on axes[1,0]
axes[1,0].scatter(np.random.random(20), np.random.randn(20), color = 'g')

# Some text on axes[1,1]
axes[1,1].text(0.2, 0.5, "Hi Mom", fontsize = 14)

plt.show()

There are even more options we can add to our plots.

For instance notice that there is white space between the plots. We can adjust that with `plt.subplots_adjust` with the `wspace` and `hspace` arguments. These arguments accept a nonnegative number.

In [None]:
# Play around with wspace and hspace here
fig, axes = plt.subplots(2, 2, figsize = (8,6))

# We'll plot a histogram on axes[0,0]
axes[0,0].hist(np.random.randn(1000), bins = 50)

# A random walk on axes[0,1]
axes[0,1].plot(np.random.randn(20).cumsum(),'r--')

# A scatter plot on axes[1,0]
axes[1,0].scatter(np.random.random(20), np.random.randn(20), color = 'g')

# Some text on axes[1,1]
axes[1,1].text(0.2, 0.5, "Hi Mom", fontsize = 14)

plt.subplots_adjust(wspace = 0, hspace=0)

plt.show()
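`wspace` and `hspace` are measured as fractions of the average subplot width and height, so `wspace=0.4` leaves a gap 40% as wide as a typical subplot. A quick sketch that sets them on the figure object and reads them back:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Same effect as plt.subplots_adjust, but aimed at this specific figure
fig.subplots_adjust(wspace=0.4, hspace=0.1)

print(fig.subplotpars.wspace, fig.subplotpars.hspace)  # 0.4 0.1
```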

With `wspace` and `hspace` both set to 0, isn't it unappealing to have the vertical axis tick labels overlap with the other plots? In this instance, that suggests we should leave some white space between the plots. However, there are instances where it makes sense not to have axis tick labels on each plot, for instance when the plots share the same scale, or when we'd like to compare the outputs to see if there are differences. If we want the x and y axes to be the same for all the plots, we can use the `sharex` and `sharey` arguments.

In [None]:
fig, axes = plt.subplots(2, 2, figsize = (8,6), sharex = True, sharey = True)

axes[0,0].plot(np.random.randn(30))
axes[0,1].plot(np.random.randn(30))
axes[1,0].plot(np.random.randn(30))
axes[1,1].text(0,0,"Yay sharex and sharey", fontsize = 14)

plt.show()
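Sharing an axis really does link the subplots. A minimal sketch (not part of the original notebook) showing that changing the limits on one panel moves every panel that shares that axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
axes[0, 0].plot(np.random.randn(30))

# Set the limits on the top-left panel only
axes[0, 0].set_xlim(0, 10)
axes[0, 0].set_ylim(-5, 5)

# The bottom-right panel picked up the same limits through sharing
print(axes[1, 1].get_xlim(), axes[1, 1].get_ylim())
```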

You may have noticed that we can control the appearance of what is plotted, for example with the `'r--'` format string above (red, dashed line). Here's a quick cheatsheet of the characters you can use:



| Color           | Description  |
| :-------------: |:------------:|
| r               | red          |
| b               | blue         |
| k               | black        |
| g               | green        |
| y               | yellow       |
| m               | magenta      |
| c               | cyan         |
| w               | white        |

|Line Style | Description   |
|:---------:|:-------------:|
| -         | Solid line    |
| --        | Dashed line   |
| :         | Dotted line   |
| -.        | Dash-dot line |

| Marker | Description    |
|:------:|:--------------:|
| o      | Circle         |
| +      | Plus sign      |
| *      | Star           |
| .      | Point          |
| x      | X (cross)      |
| s      | Square         |
| d      | Thin diamond   |
| ^      | Up triangle    |
| <      | Left triangle  |
| >      | Right triangle |
| p      | Pentagon       |
| h      | Hexagon        |

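These characters can be combined into a single format string, in the order color, marker, line style. The sketch below checks what `'g^:'` produces and looks up a few marker codes in `MarkerStyle.markers`, matplotlib's own table of marker names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
from matplotlib.markers import MarkerStyle

fig, ax = plt.subplots()

# 'g^:' = green color + up-triangle markers + dotted line
(line,) = ax.plot([1, 2, 3], [3, 1, 2], 'g^:')
print(line.get_marker(), line.get_linestyle())  # ^ :

# matplotlib's names for some of the marker codes in the table
for code in ['<', '>', 'p', 'h', 'd']:
    print(code, MarkerStyle.markers[code])
```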


#### Practice :D

In [None]:
fig, ax = plt.subplots(1,1)

n = 75

ax.plot(np.random.random(n), np.random.random(n),'kx',alpha = .7)

# Play around with the following commands
# ax.get_xlim()
# ax.set_ylim()
# ax.set_xlabel()
# ax.set_xticks()
# ax.set_xticklabels()

plt.show()

We'll finish up our `matplotlib` section by adding legends to our plots and then seeing how we can add shapes to our plots.

We'll examine legends with the iris data set.

In [None]:
iris = pd.read_csv("iris.csv")

iris.head()

In [None]:
# sorted() fixes the class order, so each class gets the same color on every run
classes = sorted(iris['class'].unique())
colors = ['red','blue','green']

for i in range(len(classes)):
    subset = iris.loc[iris['class'] == classes[i],['sepal_length','sepal_width']]
    # label is used to identify the points that were plotted
    plt.scatter(subset.sepal_length, subset.sepal_width, color = colors[i], label = classes[i])

plt.xlabel("Sepal Length", fontsize = 12)
plt.ylabel("Sepal Width", fontsize = 12)
    
# Now we insert the legend
# By default the legend is placed in the 'best' location
plt.legend()

plt.show()
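The loop above can also be written with pandas' `groupby`, which hands over each class name together with its rows; whatever we pass as `label=` is what the legend shows. This sketch uses a small made-up DataFrame in place of `iris.csv` so it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for iris.csv
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 30),
    "sepal_width": rng.normal(3.0, 0.4, 30),
    "class": ["setosa", "versicolor", "virginica"] * 10,
})
colors = {"setosa": "red", "versicolor": "blue", "virginica": "green"}

fig, ax = plt.subplots()
# groupby yields (class name, rows for that class), sorted by class name
for name, subset in df.groupby("class"):
    ax.scatter(subset.sepal_length, subset.sepal_width,
               color=colors[name], label=name)  # label -> legend entry

ax.set_xlabel("Sepal Length")
ax.set_ylabel("Sepal Width")
legend = ax.legend()
print([t.get_text() for t in legend.get_texts()])
```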

#### Practice :D

In [None]:
beers = pd.read_csv('beer.csv')
beers.head()

In [None]:
beers.tail()

In [None]:
# Plot IBU vs ABV for the beers dataframe
# plot IPA in red and Stouts in black











We'll end our `matplotlib` section with how to put shapes in a plot. The `matplotlib.patches` module contains classes for making a number of shapes. Luckily, `matplotlib.pyplot` also exposes a few of them (`Rectangle`, `Circle`, and `Polygon`), so we can make rectangles, circles, and triangles without importing anything else.

In [None]:
figure, ax = plt.subplots(1,1, figsize = (6,6))

rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color = 'r', alpha = 0.3)
circ = plt.Circle((.8,.3), .1, color = 'b', alpha = .5)
pgon = plt.Polygon(((.2,.2), (.7,.2), (.5,.6)), color = 'g', alpha = .7)

ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)

plt.show()
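`plt.Rectangle`, `plt.Circle`, and `plt.Polygon` are the very same classes that live in `matplotlib.patches`, so any other shape from that module can be added the same way. A short sketch using `Ellipse`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# pyplot's shape classes are the ones from matplotlib.patches
print(plt.Rectangle is mpatches.Rectangle, plt.Circle is mpatches.Circle)  # True True

fig, ax = plt.subplots(figsize=(6, 6))

# Ellipse centred at (0.5, 0.5), 0.6 wide, 0.3 tall, rotated 30 degrees
ellipse = mpatches.Ellipse((0.5, 0.5), width=0.6, height=0.3, angle=30,
                           color='purple', alpha=0.4)
ax.add_patch(ellipse)

# Equal aspect keeps circles round and rotation angles true on screen
ax.set_aspect('equal')
```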

We're ready to move on from base `matplotlib`. Next we'll move on to `seaborn`, a package built on top of `matplotlib`. If you'd like to learn more about `matplotlib`, check out the documentation here: https://matplotlib.org

## Seaborn

`seaborn` is a powerful plotting package; if you've used R's ggplot2, its style of building plots straight from the columns of a data frame will feel familiar. It simplifies some of the work we had to do in `matplotlib`. Here's an example.

In [None]:
import seaborn as sns

In [None]:
# We return to the iris data


# This plot makes a scatter plot, it will also fit a linear
# regression line, but we've told it not to with fit_reg = False
sns.lmplot(data = iris, x = 'sepal_length', y = 'sepal_width', hue = 'class', 
           fit_reg = False, height = 6, aspect = 1)
plt.xlabel("Sepal Length", fontsize = 12)
plt.ylabel("Sepal Width", fontsize = 12)

plt.show()

That was much nicer; we didn't need a for loop at all! We can also take advantage of the extra features `seaborn` offers. If we wanted a regression line in `matplotlib`, we would have needed to fit the model ourselves and then plot the line. With `seaborn` we get it for free!
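For comparison, here is roughly what "fit the model and then plot the line" looks like by hand in `matplotlib`, sketched with `np.polyfit` on synthetic data with a known slope of 2 and intercept of 1. Note that `lmplot` also shades a confidence band around its line by default, which this sketch does not attempt.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(42)
x = rng.uniform(4, 8, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

# Step 1: fit the model (least-squares straight line, a degree-1 polynomial)
slope, intercept = np.polyfit(x, y, deg=1)

# Step 2: evaluate the fitted line on a grid and draw it over the points
grid = np.linspace(x.min(), x.max(), 100)
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)
ax.plot(grid, slope * grid + intercept, color='k')
```

With several classes you would repeat both steps inside a loop over the groups, which is exactly the bookkeeping `lmplot` does for us.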

In [None]:
sns.lmplot(data = iris, x = 'sepal_length', y = 'sepal_width', 
           hue = 'class', height = 6, aspect = 1)
plt.xlabel("Sepal Length", fontsize = 12)
plt.ylabel("Sepal Width", fontsize = 12)

plt.show()

#### Practice :o

In [None]:
# Remake the IBU vs ABV plot with a different marker shape
# for each beer type










Now that we've seen how `seaborn` can make some plotting jobs easier, let's see how we can use it to make statistically descriptive plots. Our examples will be histograms/density plots, box and whisker plots, and swarm plots.

### Histograms

Let's start with histograms.

In [None]:
# First make some normal data
data = 2*np.random.randn(100) + 3

# distplot helps visualize the distribution of continuous data
# (distplot is deprecated as of seaborn 0.11; histplot and displot replace it)
sns.distplot(data, bins = 10, kde = False, hist_kws = {'alpha':1})

plt.show()

In [None]:
# Earlier we had kde = False, what if it was True?

sns.distplot(data, bins = 10, kde = True, hist_kws = {'alpha':.5})


plt.show()
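The smooth curve is a kernel density estimate: a small Gaussian bump is placed on every observation and the bumps are averaged. Here is a minimal NumPy sketch of that idea. The bandwidth uses Scott's rule, which I believe is seaborn's default, but treat the exact bandwidth as an assumption since seaborn's internals vary by version; `gaussian_kde_1d` is a hypothetical name.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: Gaussian kernel density estimate evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth of a 1-D Gaussian kernel
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Distance from every grid point to every sample, in bandwidth units
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps; dividing by the bandwidth makes the curve integrate to 1
    return bumps.mean(axis=1) / bandwidth

data = 2 * np.random.randn(100) + 3
grid = np.linspace(data.min() - 5, data.max() + 5, 1000)
density = gaussian_kde_1d(data, grid)

# Riemann-sum check that the estimate is a proper density
area = density.sum() * (grid[1] - grid[0])
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual points more closely.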

#### Practice 8o

In [None]:
# It puts a kernel density estimate on top of the histogram
# There's an additional argument rug, examine what rug = True does
# Hint set kde = False







### Box and Whisker Plots

A box and whisker plot is an excellent way to better understand the distribution of a continuous variable. It draws a box whose bottom (or left) edge is the 25th percentile of the data, whose middle line is the median, and whose top (or right) edge is the 75th percentile. Two whiskers then cover the extremes: by default they reach the most extreme data points lying within 1.5 times the interquartile range (the height of the box) of the box's edges, and any points beyond that are drawn individually as outliers. Let's see one with the iris data.

In [None]:
sns.boxplot(data = iris, x = 'class', y = 'sepal_length')

plt.show()
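To see where the parts of the box come from, here is a sketch computing them by hand on a tiny made-up array, then cross-checking against `matplotlib.cbook.boxplot_stats`, the helper matplotlib uses to summarize data for its own box plots.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences
lower_whisker = data[data >= low_fence].min()
upper_whisker = data[data <= high_fence].max()
outliers = data[(data < low_fence) | (data > high_fence)]

print(q1, median, q3)                # 3.25 5.5 7.75
print(lower_whisker, upper_whisker)  # 1 9
print(outliers)                      # [100]

# matplotlib's own summary of the same data
stats = boxplot_stats(data)[0]
```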

What if we want a horizontal box plot?

#### Practice

In [None]:
# Try to intuit how to make a horizontal box plot







Note that we should be careful with box plots; they can be misleading. What if the iris data had only 10 virginica flowers but 90 setosa flowers? Both boxes would look equally trustworthy. Here's one quick way around that, labelling each box with its number of observations; we'll see another approach in the next section.

In [None]:
plt.figure(figsize = (10,8))

# Fix the box order explicitly so the labels below line up with the boxes
order = sorted(iris['class'].unique())
ax = sns.boxplot(x="class", y="sepal_length", data=iris, order=order)

# Median (to position each label) and number of observations per group,
# both put into the same order as the boxes
stats = iris.groupby('class')['sepal_length'].agg(['median', 'count']).loc[order]

# Write "n: <count>" just above each box's median line
for pos, (median, n) in enumerate(zip(stats['median'], stats['count'])):
    ax.text(pos, median + 0.03, f"n: {n}",
            horizontalalignment='center', size=12,
            color='w', weight='semibold')

plt.show()

### Swarm plots

What's a swarm plot? Sounds weird.

Swarm plots are somewhat similar to box and whisker plots, but instead of summarizing the data they draw every single observation as its own point, spreading points sideways so they don't overlap. We'll see an example with a Pokémon data set found here: https://elitedatascience.com/python-seaborn-tutorial.

In [None]:
pokemon = pd.read_csv("Pokemon.csv")

In [None]:
pokemon.head()

In [None]:
pokemon.describe()

In [None]:
plt.figure(figsize = (14, 8))

sns.swarmplot(data = pokemon, x = 'Type 1', y = 'HP')

plt.show()

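This is nice, but for any Pokémon fan it isn't aesthetically pleasing. I mean, pink for Grass types... come on. We can fix that with the `palette` option; a list of colors is applied in order to the types along the x axis, so the list below follows the order in which the types appear. Let's also make the points larger.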

In [None]:
pkmn_type_colors = ['#78C850',  # Grass
                    '#F08030',  # Fire
                    '#6890F0',  # Water
                    '#A8B820',  # Bug
                    '#A8A878',  # Normal
                    '#A040A0',  # Poison
                    '#F8D030',  # Electric
                    '#E0C068',  # Ground
                    '#EE99AC',  # Fairy
                    '#C03028',  # Fighting
                    '#F85888',  # Psychic
                    '#B8A038',  # Rock
                    '#705898',  # Ghost
                    '#98D8D8',  # Ice
                    '#7038F8',  # Dragon
                   ]

plt.figure(figsize = (14, 8))

sns.swarmplot(data = pokemon, x = 'Type 1', y = 'HP', 
              palette = pkmn_type_colors, size = 7)

plt.show()
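Because a list palette is matched by position, it silently breaks if the order of the types changes. For a plain text column, seaborn lays categories out in order of first appearance, which `pd.unique` reproduces. A sketch of making the mapping explicit with a dictionary, on a made-up miniature of the `Type 1` column:

```python
import pandas as pd

# Made-up miniature of the pokemon 'Type 1' column
types = pd.Series(["Grass", "Fire", "Water", "Grass", "Bug", "Fire"])

# Categories in order of first appearance
order = list(pd.unique(types))
print(order)  # ['Grass', 'Fire', 'Water', 'Bug']

# Pair each type name with its color so the mapping no longer depends on position
colors = ["#78C850", "#F08030", "#6890F0", "#A8B820"]
type_colors = dict(zip(order, colors))
```

If I recall correctly, recent seaborn versions (0.13 and later) accept such a dictionary as `palette` when `hue='Type 1'` is also set, which ties each color to a type name rather than a position.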

There's another way we can use swarm plots to help us spot potential systematic differences between classes. We first 'melt' the dataframe so that the categorical variable sits in one column and the relevant measurements in another. We'll see this demonstrated below with the iris data.

In [None]:
iris = pd.read_csv("iris.csv")

# "Melt" the dataset to "long-form" or "tidy" representation
iris_melt = pd.melt(iris, "class", var_name="measurement")

iris_melt.head(10)
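`pd.melt` keeps the identifier column(s) and stacks every other column into a pair of columns: which column the value came from (named by `var_name`) and the value itself. A tiny made-up example makes the reshaping easy to see:

```python
import pandas as pd

# Made-up "wide" data: one column per measurement
wide = pd.DataFrame({
    "class": ["setosa", "virginica"],
    "petal_length": [1.4, 5.5],
    "petal_width": [0.2, 2.0],
})

# "class" stays as an identifier; the other columns are stacked into
# a "measurement" column (old column name) and a "value" column
long = pd.melt(wide, id_vars="class", var_name="measurement")
print(long)
```

Two rows times two measurement columns gives four rows in long form, which is the shape `swarmplot` wants for `x="measurement", y="value", hue="class"`.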

In [None]:
plt.figure(figsize=(6,8))
# Draw a categorical scatterplot to show each observation
sns.swarmplot(x = "measurement", y = "value", hue = "class",
              palette = ["r", "c", "y"], data = iris_melt)

plt.show()

That seems pretty useful! There appears to be a clear cutoff in petal_length and petal_width between setosa and the other two species.

That's it for `matplotlib` and `seaborn`. Before moving on to `bokeh` I'll leave the following practice problem for you to look at later. 

Also you can find the documentation for `seaborn` here, https://seaborn.pydata.org. We've only scratched the surface of `seaborn`, it can do a ton of cool stuff.

#### Practice :^)

Below I've written a bunch of code to make a couple of plots. Go through it and convince yourself you understand it.

In [None]:
jr_shots = pd.read_csv("JR_Smith_Shots_2015_16.csv")

jr_shots.head()

In [None]:
# This code comes from Savvas Tjortjoglou; I have written none of it
# Visit his page to see his explanations of the code for more help
from matplotlib.patches import Circle, Rectangle, Arc

def draw_court(ax=None, color='black', lw=2, outer_lines=False):
    # If an axes object isn't provided to plot onto, just get current one
    if ax is None:
        ax = plt.gca()

    # Create the various parts of an NBA basketball court

    # Create the basketball hoop
    # Diameter of a hoop is 18" so it has a radius of 9", which is a value
    # 7.5 in our coordinate system
    hoop = Circle((0, 0), radius=7.5, linewidth=lw, color=color, fill=False)

    # Create backboard
    backboard = Rectangle((-30, -7.5), 60, -1, linewidth=lw, color=color)

    # The paint
    # Create the outer box of the paint, width=16ft, height=19ft
    outer_box = Rectangle((-80, -47.5), 160, 190, linewidth=lw, color=color,
                          fill=False)
    # Create the inner box of the paint, width=12ft, height=19ft
    inner_box = Rectangle((-60, -47.5), 120, 190, linewidth=lw, color=color,
                          fill=False)

    # Create free throw top arc
    top_free_throw = Arc((0, 142.5), 120, 120, theta1=0, theta2=180,
                         linewidth=lw, color=color, fill=False)
    # Create free throw bottom arc
    bottom_free_throw = Arc((0, 142.5), 120, 120, theta1=180, theta2=0,
                            linewidth=lw, color=color, linestyle='dashed')
    # Restricted Zone, it is an arc with 4ft radius from center of the hoop
    restricted = Arc((0, 0), 80, 80, theta1=0, theta2=180, linewidth=lw,
                     color=color)

    # Three point line
    # Create the side 3pt lines, they are 14ft long before they begin to arc
    corner_three_a = Rectangle((-220, -47.5), 0, 140, linewidth=lw,
                               color=color)
    corner_three_b = Rectangle((220, -47.5), 0, 140, linewidth=lw, color=color)
    # 3pt arc - center of arc will be the hoop, arc is 23'9" away from hoop
    # I just played around with the theta values until they lined up with the 
    # threes
    three_arc = Arc((0, 0), 475, 475, theta1=22, theta2=158, linewidth=lw,
                    color=color)

    # Center Court
    center_outer_arc = Arc((0, 422.5), 120, 120, theta1=180, theta2=0,
                           linewidth=lw, color=color)
    center_inner_arc = Arc((0, 422.5), 40, 40, theta1=180, theta2=0,
                           linewidth=lw, color=color)

    # List of the court elements to be plotted onto the axes
    court_elements = [hoop, backboard, outer_box, inner_box, top_free_throw,
                      bottom_free_throw, restricted, corner_three_a,
                      corner_three_b, three_arc, center_outer_arc,
                      center_inner_arc]

    if outer_lines:
        # Draw the half court line, baseline and side out bound lines
        outer_lines = Rectangle((-250, -47.5), 500, 470, linewidth=lw,
                                color=color, fill=False)
        court_elements.append(outer_lines)

    # Add the court elements onto the axes
    for element in court_elements:
        ax.add_patch(element)

    return ax
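The comments in `draw_court` imply the coordinate system uses 10 units per foot (a 9-inch hoop radius becomes 7.5). A hypothetical helper that checks a few of the hard-coded numbers against the court dimensions quoted in those comments:

```python
def feet_to_court_units(feet=0.0, inches=0.0):
    """Hypothetical helper: real-world distance -> draw_court coordinates (10 units per foot)."""
    return (feet + inches / 12) * 10

hoop_radius = feet_to_court_units(inches=9)                     # Circle radius=7.5
three_pt_diameter = 2 * feet_to_court_units(feet=23, inches=9)  # Arc size 475
paint_width = feet_to_court_units(feet=16)                      # outer_box width 160
restricted_diameter = 2 * feet_to_court_units(feet=4)           # Arc size 80

print(hoop_radius, three_pt_diameter, paint_width, restricted_diameter)  # 7.5 475.0 160.0 80.0
```

Note that `Arc` takes a width and height (a diameter), not a radius, which is why the 23'9" radius becomes 475.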

In [None]:
plt.figure()
draw_court(outer_lines=True)
plt.xlim(-300,300)
plt.ylim(-100,500)
plt.show()

In [None]:
# stat_func is no longer accepted by newer seaborn versions, so it is left out
JointChart = sns.jointplot(data = jr_shots, x = "LOC_X", y = "LOC_Y", 
                           kind = 'scatter', space = 0, alpha = .7)

JointChart.fig.set_size_inches((12,11))

Court = JointChart.ax_joint
draw_court(Court)

Court.set_xlim(-250,250)
Court.set_ylim(422.5, -47.5)

Court.tick_params(labelbottom = False, labelleft = False)
Court.set_xlabel('')
Court.set_ylabel('')

Court.text(70,390,"JR Smith Shots",color = 'black',fontsize = 14)
Court.text(70,410,"2015 - 2016 Season",color = 'black',fontsize = 14)



plt.show()
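Two details in the cell above are easy to miss: passing the larger value first to `set_ylim` flips the y axis (so the hoop ends up at the top), and `tick_params` expects booleans to hide the tick labels. A minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Larger value first: the y axis now increases downward
ax.set_ylim(422.5, -47.5)

# Booleans control whether the tick labels are drawn
ax.tick_params(labelbottom=False, labelleft=False)

print(ax.yaxis_inverted())  # True
```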

In [None]:
cmap = plt.cm.plasma
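# x and y are passed by keyword: newer seaborn versions no longer accept them
# positionally, and stat_func/n_levels have been removed or renamed (levels)
JointChart = sns.jointplot(data=jr_shots, x="LOC_X", y="LOC_Y",
                           kind='kde', space=0, color=cmap(0.1),
                           cmap=cmap, levels=15,
                           # Shade the contours all the way out so the white text stays
                           # readable (older seaborn shaded this plot by default)
                           fill=True, thresh=0)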

JointChart.fig.set_size_inches(12,11)
Court = JointChart.ax_joint
draw_court(Court)
Court.set_xlim(-250,250)
Court.set_ylim(422.5, -47.5)
Court.set_xlabel('')
Court.set_ylabel('')
Court.tick_params(labelbottom=False, labelleft=False)

Court.text(70,390,"JR Smith Shot Distribution",color = 'white',fontsize = 14)
Court.text(70,410,"2015 - 2016 Season",color = 'white',fontsize = 14)

plt.show()

## Bokeh

We'll complete this notebook by seeing how we can make some interactive plots with Python. For this we'll use `bokeh`; if you don't have it installed, take this moment to install it by running `pip install bokeh` (or `pip3 install bokeh` if `pip` on your machine points at Python 2, as it did on older Macs).

In [None]:
# Run the following to check
import bokeh

Here's a snippet from `bokeh`'s official documentation page:

"Bokeh is an interactive visualization library that targets modern web browsers for presentation. Bokeh provides elegant, concise construction of versatile graphics with high-performance interactivity over very large or streaming datasets in a quick and easy way from Python (or other languages)."

That's what's nice about `bokeh`: not only does it let us make neat interactive plots in Python, it can also produce HTML files that contain those graphics! This saves us from having to learn HTML and JavaScript, although those are still valuable tools.

Let's start with a simple example of a `bokeh` plot, and then dissect what is going on.

In [None]:
# We import the tools we need from bokeh.plotting
from bokeh.plotting import figure, output_notebook, show

# Next we generate the data we'll be plotting
x = [1,2,3,4,5,6]
y = [10,1,-2,7,12,10]

# Set our settings to display our plot inside of jupyter
output_notebook()

# We make a bokeh figure object
p = figure(title = "Our First bokeh Plot", x_axis_label = "x", 
           y_axis_label = "y", y_range = [-3,14])

# We now add a line to our figure
p.line(x, y, legend_label = "Quarterly Returns", line_width = 3)

# Show the plot
show(p)

This plot is certainly different from our previous plots. For one you can interact with it! Click on it and drag it around. There are also tools along the right. Play around with those and see what they do.

Let's review the process here:
1. Prepared some data.
2. Set where we want the output. In this instance it was the notebook, but an HTML file is also an option with `output_file()`.
3. Made a figure with the desired settings.
4. Added a 'renderer' with `.line()`, giving the desired settings and inputs.
5. Told `bokeh` to show our figure.

This is the general procedure for all `bokeh` plots. Why don't you investigate what `circle` does with the following practice problem.

#### Practice :0

In [None]:
# plot the following data with circle
x = [1,11,3,-2,5]
y = [-2,2,4,5,1]

# set it so the plot shows in the notebook
output_notebook()

# Create the figure here


# Create a list of positive multiples of 5
# It should be the same length as x and y
# Call it rs


# Plot the circles here, set size = rs



# Uncomment this when you'd like to show your plot
# show(p)

We've made two plots, and we've demonstrated a few of the core concepts in `bokeh`.

1. Plots - plots are what we put everything on; the `figure()` function allowed us to make them.

2. Glyphs - glyphs are what represent our data on the plot; `line()` and `circle()` were two examples of this.

3. Ranges - ranges determine the plotting area displayed; these are set with the `x_range` and `y_range` arguments of `figure()`.

There are many more things we could cover explicitly with `bokeh`, but that is beyond the scope of this notebook. We'll finish off with a couple more examples of what `bokeh` can do, and I'll end by linking you to where you can learn more about it. Note that the code in the following examples has been adapted from the `bokeh` documentation; I am not the original author.


### Hovering Over Data

`bokeh` allows us to bind additional information to our data beyond just the x and y coordinates. We'll see an example with the chemical elements of the periodic table now.

In [None]:
# Import everything we need from bokeh
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.sampledata.periodic_table import elements # This is the data

In [None]:
elements.head()

We'll plot atomic weight by density and then provide additional information on the element as we hover over the data points.

In [None]:
## Data Wrangling ##

# Make a copy of the data frame
elements = elements.copy()

# Grab only the elements with atomic number less than 83
elements = elements.loc[elements['atomic number'] < 83,]

# Remove those elements that are missing a melting point
elements = elements[~pd.isnull(elements["melting point"])]

# Replace atomic mass with the float version of the data
mass = [float(x.strip("[]")) for x in elements["atomic mass"]]
elements["atomic mass"] = mass

# Set a color palette that we'll use
palette = ["#053061", "#2166ac", "#4393c3", "#92c5de", "#d1e5f0",
           "#f7f7f7", "#fddbc7", "#f4a582", "#d6604d", "#b2182b", "#67001f"]

# Record the melting points
melting_points = elements["melting point"]

# Get the min
low = min(melting_points)

# Get the max
high = max(melting_points)

# Bin each melting point into an index from 0 to 10 for color assignment
# (11 palette colors; only the highest melting point lands in bin 10)
melting_point_inds = [int(10*(x-low)/(high-low)) for x in melting_points]
elements['melting_colors'] = [palette[i] for i in melting_point_inds]

# ColumnDataSource is bokeh's core data container: it maps column names
# to lists/arrays of values (columns rather than rows). Glyphs and hover
# tooltips refer to its columns by name, and it is what gets sent to the
# browser along with the plot.
source = ColumnDataSource(elements)
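The binning line uses `int(10*(x-low)/(high-low))`, which yields indices 0 through 10, so the palette needs 11 colors and only the maximum lands in the last one. A quick check with made-up values:

```python
# Made-up melting points, just to exercise the binning rule from the cell above
melting_points = [0.0, 50.0, 99.0, 100.0]
palette = ["#053061", "#2166ac", "#4393c3", "#92c5de", "#d1e5f0",
           "#f7f7f7", "#fddbc7", "#f4a582", "#d6604d", "#b2182b", "#67001f"]

low, high = min(melting_points), max(melting_points)
melting_point_inds = [int(10 * (x - low) / (high - low)) for x in melting_points]
print(melting_point_inds)  # [0, 5, 9, 10]

# Every index is a valid position in the 11-color palette
colors = [palette[i] for i in melting_point_inds]
```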

In [None]:
## Preparing the Figure ##

TITLE = "Density vs Atomic Weight of Elements (colored by melting point)"

# The tools control what will be shown along the side of the plot
# Note the hover tool
TOOLS = "hover,pan,wheel_zoom,box_zoom,reset,save"

# Make the figure and give settings
p = figure(tools=TOOLS, toolbar_location="above", width=900, 
           title=TITLE, x_range = (-10,230))
p.toolbar.logo = "grey"
p.background_fill_color = "#dddddd"
p.xaxis.axis_label = "atomic weight (amu)"
p.yaxis.axis_label = "density (g/cm^3)"
p.grid.grid_line_color = "white"

# Here we set what we want to show up when
# the mouse hovers over the data point.
p.hover.tooltips = [
    ("name", "@name"),
    ("symbol", "@symbol"),
    ("density", "@density"),
    ("atomic weight", "@{atomic mass}"),
    ("melting point", "@{melting point}")
]

In [None]:
## Add the glyphs ##

# We plot circle glyphs, with atomic mass on the x and
# density on the y, we input our columnar data as the source
# We also set the color according to the palette we made earlier
p.circle("atomic mass", "density", size=12, source = source,
         color='melting_colors', line_color="black", fill_alpha=0.8)

# Now we add a label glyph
# the x and y are the same
# we add the symbol as the text and give some aesthetic settings
labels = LabelSet(x="atomic mass", y="density", text="symbol", y_offset=8,
                  text_font_size="8pt", text_color="#555555",
                  source=source, text_align='center')
p.add_layout(labels)


# show the plot
show(p)

Play around with the plot we just made. What happens when we hover over a point? This is a nice plot, and there appears to be some sort of pattern in the data, though it could just be spurious.

#### Practice :P

Make a plot of the beer data. Put ABV on the x axis and IBU on the y axis. Each unique beer should be a circle glyph, colored by the type of beer. Add a hover feature so that when I hover I can see the name, IBU, ABV, and rating of the beer.

In [None]:
# Practice here












### We did not plot until today

We'll now see a `bokeh` plot that was used to analyze a piece of literature, Les Mis (to be fair, it is unclear to me whether they're analyzing the novel or the musical).

In [None]:
# Import numpy
import numpy as np

# import the data
from bokeh.sampledata.les_mis import data

# Set the output as our notebook
output_notebook()

In [None]:
# examine the data
data

The data we just loaded is a dictionary with two entries, `nodes` and `links`. Each node is a character from Les Miserables, and each link has a source, a target, and a value. So it appears that this dictionary describes a weighted graph (possibly directed) between the characters of Les Mis.

Upon further inspection (aka Googling) I found the original data comes from Mike Bostock in a `d3.js` demonstration, https://bost.ocks.org/mike/miserables/. Here is an adaptation of his description.

Each node is a character from the novel Les Miserables. The source and the target are numerical representations (indices into the node list) of two of the characters. The value of the link is the number of chapters of the novel in which both characters occur. So this data measures the co-occurrences of Les Mis characters. Looking at the raw data is hard, so it is reasonable to want a visualization of it. What follows is one example of such a visualization.

In [None]:
## Data Wrangling ##

# Follow our pattern we first handle the data

# Grab the nodes
nodes = data['nodes']

# Now we extract the character names from the nodes,
# We sort them by group, all this does is make our plot prettier down the line
names = [node['name'] for node in sorted(data['nodes'], key=lambda x: x['group'])]

# Now make an N by N matrix (np array)
# each entry will be the number of co-occurences of the two characters
N = len(nodes)
counts = np.zeros((N, N))
for link in data['links']:
    counts[link['source'], link['target']] = link['value']
    counts[link['target'], link['source']] = link['value']
    
# This will be used to color our data by group
colormap = ["#444444", "#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99",
            "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6", "#6a3d9a"]

# Now we record all of our data in lists for plotting later
xname = []
yname = []
color = []
alpha = []
for i, node1 in enumerate(nodes):
    for j, node2 in enumerate(nodes):
        xname.append(node1['name'])
        yname.append(node2['name'])

        # We're scaling our counts data here
        # We take the min of the count/4 or .9 then we add 0.1
        # Why 4? I don't know! We can play around with this 
        # and see how the plot changes.
        alpha.append(min(counts[i,j]/4.0, 0.9) + 0.1)

        # If these two characters are in the same group
        # we assign them the color for that group
        # Else we set their color as light grey
        if node1['group'] == node2['group']:
            color.append(colormap[node1['group']])
        else:
            color.append('lightgrey')

# Now we store our data in a dictionary
# Note that flatten() turns our 2-D array into a 1-D array row by row,
# the same order the nested loops above used to build xname and yname
data=dict(
    xname=xname,
    yname=yname,
    colors=color,
    alphas=alpha,
    count=counts.flatten(),
)
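Why does `count=counts.flatten()` line up with `xname` and `yname`? `flatten()` reads the matrix row by row, the same order as the nested loops (outer loop over rows, inner loop over columns). A hypothetical three-character sketch, which also shows the alpha rule mapping counts into the range 0.1 to 1.0:

```python
import numpy as np

# Hypothetical three-character example with symmetric co-occurrence counts
names = ["A", "B", "C"]
counts = np.array([[0, 2, 8],
                   [2, 0, 1],
                   [8, 1, 0]], dtype=float)

# Same loop order as the notebook: outer loop -> xname, inner loop -> yname
xname = [x for x in names for y in names]
yname = [y for x in names for y in names]

# flatten() reads the matrix row by row (C order), matching the loops above
flat = counts.flatten()

# The notebook's alpha rule: count/4, capped at 0.9, plus a floor of 0.1
alphas = [min(c / 4.0, 0.9) + 0.1 for c in flat]

for x, y, c, a in zip(xname, yname, flat, alphas):
    print(f"{x}-{y}: count={c:.0f}, alpha={a:.2f}")
```

Any count of 4 or more ends up fully opaque, which is why changing the 4 changes how many squares look "dark".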

In [None]:
## Making the Plot ##

# We first make our figure
# We set it so our x axis is on the top
# We have the hover tool
# We set our "ranges" to be the names of the characters
# We do it so that the picture will look like our matrix
# Finally we tell the figure what we want to show when we hover
p = figure(title="Les Mis Occurrences",
           x_axis_location="above", tools="hover,save",
           x_range=list(reversed(names)), y_range=names,
           tooltips = [('names', '@yname, @xname'), ('count', '@count')])

# Set the plot aesthetics
p.width = 800
p.height = 800
p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = np.pi/3

In [None]:
## Add Glyph ##

# We'll add rectangles
# we set the width and height of each rectangle to 0.9 (of one category's spacing)
p.rect('xname', 'yname', 0.9, 0.9, source=data,
       color='colors', alpha='alphas', line_color=None,
       hover_line_color='black', hover_color='colors')

# Show the plot
show(p)

Pretty nifty! 

Now be aware: the reason this is so aesthetically pleasing is the `group` values. Someone prepared the data ahead of time so that plotting it this way gives nice squares along the diagonal. A lot of work went into cleaning the data beforehand that we didn't have to worry about.

### The State of Unemployment in 2009

We'll end our jaunt through `bokeh` with a plot of county-level unemployment data from 2009. Our example will be on that state up north.

In [None]:
# You need to run this once to download the data
import bokeh.sampledata
bokeh.sampledata.download()

In [None]:
# Import what we need
from bokeh.models import LogColorMapper
from bokeh.palettes import Viridis6 as palette
from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

output_notebook()

In [None]:
## Wrangling Data ##


# Create a dictionary with all of the counties in Michigan
counties = {
    code: county for code, county in counties.items() if county["state"] == "mi"
}

# Extract the longitude and latitude of each county
county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

# Extract the county names
county_names = [county['name'] for county in counties.values()]

# Extract the unemployment rate for each county
county_rates = [unemployment[county_id] for county_id in counties]


# Prepare colors for the plot
# bokeh palettes are tuples, which have no .reverse(), so build a reversed copy
palette = tuple(reversed(palette))
color_mapper = LogColorMapper(palette=palette)

# Save all our data in a dictionary
data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
)
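A log color mapper assigns palette colors on a logarithmic scale, so small differences between low unemployment rates still get distinct colors. The sketch below is my own approximation of that idea (equal-width bins in log space, with the top value clipped into the last bin), not Bokeh's actual implementation, whose edge handling may differ; `log_bin` is a hypothetical name. It also shows why the palette is reversed by building a new tuple.

```python
import numpy as np

def log_bin(values, n_colors, low=None, high=None):
    """Hypothetical helper: palette index for each value on a log scale."""
    values = np.asarray(values, dtype=float)
    low = values.min() if low is None else low
    high = values.max() if high is None else high
    # Fraction of the way from log(low) to log(high), between 0 and 1
    frac = (np.log(values) - np.log(low)) / (np.log(high) - np.log(low))
    # Equal-width bins in log space; the top value is clipped into the last bin
    return np.minimum((frac * n_colors).astype(int), n_colors - 1)

rates = [1.0, 3.0, 10.0, 30.0, 100.0]
print(log_bin(rates, n_colors=3))  # [0 0 1 2 2]

# Tuples are immutable, so reverse by building a new tuple
palette = ("dark", "mid", "light")  # stand-in for a real palette tuple
reversed_palette = tuple(reversed(palette))
print(reversed_palette)  # ('light', 'mid', 'dark')
```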

In [None]:
## Make the Figure ##

# We'll use these tools
TOOLS = "pan,wheel_zoom,reset,hover,save"

# Make the figure
p = figure(
    title="Michigan Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None,
    tooltips=[
        ("Name", "@name"), ("Unemployment rate", "@rate%"), ("(Long, Lat)", "($x, $y)")
    ])

# Set the grid line to be off
p.grid.grid_line_color = None


p.hover.point_policy = "follow_mouse"

In [None]:
## Plot the glyphs ##

p.patches('x', 'y', source=data,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)

show(p)

#### Practice 

In [None]:
# Pick a state and make a 2009 unemployment plot
# Don't just copy and paste, do your best to 
# recreate the plot on your own









To learn more about `bokeh`, check out the docs: https://bokeh.pydata.org/en/latest/index.html. You should now have enough of an idea of how it works to get started. I hope to see it in some of the group projects for this year's boot camps!

## Wrapping it Up

Well, we've done a lot in this notebook. We started with 'simple' plots in `matplotlib`, got a little fancier with `seaborn`, and saw the power of `bokeh` first hand. There are a number of other plotting packages in Python you can explore; that's part of the beauty of an open source language. Each has pros and cons, and the one you use really depends on your needs.