## Notebook Outcomes

In this notebook we will learn:
<ul>
    <li>how to make basic plots in matplotlib,</li>
    <li>what plotting functionality matplotlib offers,</li>
    <li>how to make figures with subplots, and</li>
    <li>how to plot real data sets.</li>
</ul>

In [None]:
import numpy as np
import pandas as pd

# Basic Plotting

A picture is worth a thousand words. Oftentimes it is easier and more informative to plot the data we are examining than to rely on descriptive statistics alone. In this notebook we'll go over the minimal `python` plotting skills you'll need to get through this boot camp.

## `matplotlib`

A number of you have experience with MATLAB. `matplotlib` was a project started by John Hunter in 2002 to enable MATLAB-like plotting in Python. If you've done a lot of plotting in MATLAB, `matplotlib` will come very naturally to you. If you've never even heard of MATLAB, don't worry! `matplotlib` is very intuitive and you'll be plotting like a pro in no time.

Let's start by importing the package.

In [None]:
# We will be using the pyplot subpackage
# it is standard to call it plt
import matplotlib.pyplot as plt

### A First Plot

Let's jump right in and make our first plot.

In [None]:
# Here's our data
x = [0,1,2,3,4,5,6,7,8,9,10]
y = [2*i - 3 for i in x]

# plt.plot will make the plot
# First put what you want on the x, then the y
plt.plot(x,y)

# Always end your plotting block with plt.show
# in jupyter this makes sure that the plot displays 
# properly
plt.show()

### What Happened?

So what happened when we ran the above code?

`matplotlib` creates a figure object, and on that object it places a subplot object, and finally it places the points on the subplot then connects the points with straight lines.

We'll return to the topic of subplots later in the notebook.
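
To make the figure and subplot objects described above visible, here is a minimal sketch that retrieves them with `plt.gcf()` and `plt.gca()` right after calling `plt.plot`. The variable names `lines`, `line`, `fig`, and `ax` are illustrative choices for this sketch, not names used elsewhere in the notebook.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2*i - 3 for i in x]

# plt.plot returns a list of the line objects it drew
lines = plt.plot(x, y)
line = lines[0]

# gcf = "get current figure", gca = "get current axes" (the subplot)
fig = plt.gcf()
ax = plt.gca()

# The line lives on the subplot, and the subplot lives on the figure
line_on_subplot = line in ax.lines
subplot_on_figure = ax in fig.axes
print(type(fig).__name__, type(ax).__name__)
print(line_on_subplot, subplot_on_figure)

plt.show()
```

Holding on to `fig` and `ax` like this is optional for simple plots, but it becomes useful once a figure contains more than one subplot.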

Now you try plotting the following `x` and `y`.

In [None]:
x = 10*np.linspace(-5,5,100)
y = x**2 - 3

In [None]:
# Plot here

plt.plot(x,y)
plt.show()
plt.clf()







### Getting More Control

#### Making The Figure Object

We can have more control over how the plot itself looks by creating the figure object ourselves.

In [None]:
# plt.figure() will make the figure object
# figsize can control how large it is (width,height)
plt.figure(figsize = (10,12))

# This still creates the subplot object
# that we plot on
plt.plot(x,y)

# we can add axis labels
# and control their fontsize
plt.xlabel("x", fontsize = 16)
plt.ylabel("y", fontsize = 16)

# we can set the plot axis limits like so
plt.xlim(-20,20)
plt.ylim(-100,100)

# Also a title
plt.title("A Plot Title", fontsize = 20)

# Now we show the plot
plt.show()

#### Controlling How the Plotted Data Looks

We can control the appearance of what is plotted. Here's a quick cheatsheet of easy-to-use options:



| Color           | Description  |
| :-------------: |:------------:|
| r               | red          |
| b               | blue         |
| k               | black        |
| g               | green        |
| y               | yellow       |
| m               | magenta      |
| c               | cyan         |
| w               | white        |

|Line Style | Description   |
|:---------:|:-------------:|
| -         | Solid line    |
| --        | Dashed line   |
| :         | Dotted line   |
| -.        | Dash-dot line |

| Marker | Description    |
|:------:|:--------------:|
| o      | Circle         |
| +      | Plus sign      |
| *      | Star           |
| .      | Point          |
| x      | Cross          |
| s      | Square         |
| d      | Thin diamond   |
| ^      | Up triangle    |
| <      | Left triangle  |
| >      | Right triangle |
| p      | Pentagon       |
| h      | Hexagon        |
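
The codes in these three tables can be combined into a single format string passed as the third argument to `plt.plot`: one color, optionally one marker, and optionally one line style. As a small illustration (the data, labels, and variable names below are made up for this sketch):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 11)

# 'r--' : red dashed line, no markers
red_dashed = plt.plot(x, x, 'r--', label="red dashed")[0]
# 'bo'  : blue circles, no connecting line
blue_circles = plt.plot(x, x + 2, 'bo', label="blue circles")[0]
# 'k:s' : black dotted line with square markers
black_dotted_squares = plt.plot(x, x + 4, 'k:s', label="black dotted squares")[0]

plt.legend()
plt.show()
```

If a format string contains a marker but no line style, only the markers are drawn; if it contains a line style but no marker, only the line is drawn.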


In [None]:
# plt.figure() will make the figure object
# figsize can control how large it is (width,height)
plt.figure(figsize = (10,12))

# This still creates the subplot object
# that we plot on
# This will make our plot magenta pentagons
# label will allow us to add a legend to the plot
plt.plot(x,y,'mp', label="points")

## We can even plot two things on the same plot
plt.plot(x+10,y-100,'g--', label="shifted line")

# we can add axis labels
# and control their fontsize
plt.xlabel("x", fontsize = 16)
plt.ylabel("y", fontsize = 16)

# Also a title
plt.title("A Plot Title", fontsize = 20)

# plt.legend() adds the legend to the plot
plt.legend(fontsize=14)


# Now we show the plot
plt.show()

In [None]:
## Now you try with the following data
x = 10*np.random.random(100) - 5
y = x**3 - x**2 + x

In [None]:
# Plot it here
# What's the best way to plot it?

plt.figure(figsize=(10,5))

plt.plot(x,y, 'ko', label='raw data')
plt.plot(x+10,y, 'bs', label='x-shift by 10')
plt.plot(x,y+10, 'rs', label='y-shift by 10')
plt.plot(x+10, y+10, 'gp', label='x and y shifted by 10')

plt.xlim(-15, 15)
plt.ylim(-150,150)

plt.xlabel("x", fontsize = 16)
plt.ylabel("y", fontsize = 16)
plt.title("A Plot Title", fontsize = 20)
plt.legend(fontsize=14)

plt.show()









### Subplots

What if we want to plot more than one thing in the same figure? We'll want to make some subplots.

In [None]:
# plt.subplots makes a figure object
# then populates it with subplots
# the first number is the number of rows
# the second number is the number of columns
# so this makes a 2 by 2 subplot matrix
# fig is the figure object
# axes is a 2 by 2 array containing the four subplots
fig, axes = plt.subplots(2, 2, figsize = (10,8))

# We can plot like before but instead of plt.plot
# we use axes[i,j].plot
# A random walk on axes[0,0]
axes[0,0].plot(np.random.randn(20).cumsum(),'r--')
# I can set x and y labels on subplots like so
axes[0,0].set_xlabel("X")
axes[0,0].set_ylabel("y")


# .hist() plots a histogram, you can control the number of 
# bins with bins
axes[0,1].hist(np.random.randn(1000), bins = 50)

# A scatter plot on axes[1,0]
# .scatter() is a quicker way to produce a scatter plot
axes[1,0].scatter(np.random.random(20), np.random.randn(20), color = 'g')

# Some text on axes[1,1]
axes[1,1].text(0.2, 0.5, "Hi Mom", fontsize = 14)


plt.show()
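
Because `plt.subplots(2, 2)` returns the subplots as a 2 by 2 NumPy array, they can also be visited in a loop instead of being indexed one at a time. A minimal sketch, assuming we simply want to draw a random walk on each subplot and title it with its grid position (the titles are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# axes is a NumPy array of subplot objects with shape (rows, columns)
print(axes.shape)

# np.ndenumerate yields ((row, column), subplot) pairs
for (i, j), ax in np.ndenumerate(axes):
    ax.plot(np.random.randn(20).cumsum())
    ax.set_title(f"axes[{i},{j}]")

# tight_layout keeps titles and tick labels from overlapping
fig.tight_layout()
plt.show()
```

Looping like this is handy when every subplot gets the same kind of plot; indexing with `axes[i,j]` remains the clearer choice when each subplot is different.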

Now you practice!

In [None]:
# the data
x1 = 2*np.random.randn(500) + 3
x2 = np.random.randn(500) + 4

y = x1 + x1**2 + np.log(x2) + .5*np.random.randn(500)

In [None]:
# Make a 3 by 3 subplot
# Have histograms of x1, x2, and y in the diagonal
# plot x1 vs x2 and x1 vs y in the remaining columns of the top row
# plot x2 vs y in the second row 3rd column


fig, axes = plt.subplots(3, 3, figsize = (12,12))

axes[0,0].hist(x1, bins = 50)
axes[1,1].hist(x2, bins = 50)
axes[2,2].hist(y,  bins = 50)

axes[0,1].scatter(x1, x2)
axes[0,2].scatter(x1, y)

axes[1,2].scatter(x2, y)


axes[1,0].text(0.2, 0.5, "Nope", fontsize = 14)
axes[2,0].text(0.2, 0.5, "Nada", fontsize = 14)
axes[2,1].text(0.2, 0.5, "Nothing\nto\nsee", fontsize = 14)


plt.show()







### Plotting Data

Finally we'll see how we can use `matplotlib` to examine some real data.

#### JR Smith's Attempted Shots 2015-16 NBA Season

We'll plot the $(x,y)$ position of each shot JR Smith attempted in the 2015-16 NBA season. Made shots will be drawn as blue dots and missed shots as red X's.

In [None]:
# Read the Data
shots = pd.read_csv("JR_Smith_Shots_2015_16.csv")

In [None]:
shots.head()

In [None]:
# Make the figure
plt.figure(figsize=(12,14))

# We'll use plt.scatter()
# Made Shots
plt.scatter(shots.loc[shots.SHOT_MADE_FLAG == 1,'LOC_X'],
            shots.loc[shots.SHOT_MADE_FLAG == 1,'LOC_Y'],
            marker = 'o', c='blue')
# Missed Shots
plt.scatter(shots.loc[shots.SHOT_MADE_FLAG == 0,'LOC_X'],
            shots.loc[shots.SHOT_MADE_FLAG == 0,'LOC_Y'],
            marker = 'x', c='red')


plt.show()

#### Beer ABV vs IBU

Your turn.

Explore the following beer data. Play around plotting the `ABV` by the `IBU`.

In [None]:
# Read in the data
beer = pd.read_csv("beer.csv")

In [None]:
beer.head()

In [None]:
beer['Beer_Type'].unique()

In [None]:
# Plot here

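plt.figure()

# Stouts as blue dots
plt.scatter(beer.loc[beer.Beer_Type == 'Stout','IBU'],
            beer.loc[beer.Beer_Type == 'Stout','ABV'],
            marker = 'o', c='blue', label='Stout')

# IPAs as red X's
plt.scatter(beer.loc[beer.Beer_Type == 'IPA','IBU'],
            beer.loc[beer.Beer_Type == 'IPA','ABV'],
            marker = 'x', c='red', label='IPA')

# Label the axes and add a legend so the two beer types can be told apart
plt.xlabel("IBU", fontsize = 16)
plt.ylabel("ABV", fontsize = 16)
plt.legend(fontsize=14)

plt.show()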




## Plotting the test_dnapar file

In [None]:
## Load the test_dnapar file into a dataframe


In [None]:
## Draw a line plot of Roll and Twist side-by-side


In [None]:
## Draw a line plot of Shift, Slide, and Rise

In [None]:
## Draw a scatter plot with Roll as the x-axis and Twist as the y

## Repeat for Tilt vs. Twist and for Rise vs. Twist



## Advanced: Plotting the test_dnarefframe.dat file

In [None]:
## load the test_dnarefframe file

## A refframe file has 5 lines for every base-pair:
## the nucleotide base-pair,
## the base-pair origin,
## and three lines that describe the reference frame (coordinate axes) of the base-pair

## Go through the file and store only the origin values into a dataframe with x, y, and z column labels


In [None]:
## Create a 3-D scatter plot of this dataframe
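
As a starting point for this exercise, here is a hedged sketch of a 3-D scatter plot. It does not read the refframe file: the `points` dataframe below is stand-in data (a helix) invented for illustration, with the same `x`, `y`, and `z` column labels the exercise asks for. It also assumes a reasonably recent `matplotlib`, where `projection="3d"` works without any extra import.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: a helix-like path (NOT the real refframe origins)
t = np.linspace(0, 4*np.pi, 50)
points = pd.DataFrame({"x": np.cos(t), "y": np.sin(t), "z": t})

fig = plt.figure(figsize=(8, 8))
# projection="3d" asks matplotlib for a 3-D subplot
ax = fig.add_subplot(projection="3d")
ax.scatter(points["x"], points["y"], points["z"], c="b", marker="o")

# 3-D subplots have a z label in addition to x and y
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
```

Swapping `points` for the dataframe of origins built in the previous cell should be the only change needed.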






In [None]:
## You can calculate the Euclidean distance between each pair of origins
## This can be done using a specific module:

from scipy.spatial import distance 

## This module contains a function: pdist()

## Look up information on pdist() here: https://docs.scipy.org/doc/scipy/reference/spatial.distance.html

## Calculate the pdist of the refframe origins and then create a heatmap plot
## This plot will be a square plot that shows different colors based on how close or how far two points are
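
One possible shape for this exercise is sketched below. The `origins` array is random stand-in data, not the refframe origins, and pairing `pdist()` with `squareform()` and `plt.imshow()` is one assumed approach among several. `pdist()` returns the distances as a flat vector, so `squareform()` is used to expand them into the square matrix that a heatmap needs.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance

# Stand-in data: 30 random 3-D points (NOT the real refframe origins)
rng = np.random.default_rng(0)
origins = rng.random((30, 3))

# pdist returns the pairwise distances in condensed (1-D) form;
# squareform expands them into a symmetric 30 by 30 matrix
condensed = distance.pdist(origins, metric="euclidean")
dist_matrix = distance.squareform(condensed)

plt.figure(figsize=(8, 8))
# imshow colors each cell by the distance between point i and point j
plt.imshow(dist_matrix, cmap="viridis")
plt.colorbar(label="Euclidean distance")
plt.xlabel("point index", fontsize=16)
plt.ylabel("point index", fontsize=16)
plt.show()
```

The diagonal of the heatmap is zero because every point is at distance zero from itself, and the plot is symmetric about that diagonal.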







### The End

That's it for this intro notebook! If you want to know more check out the `matplotlib` docs here, <a href="https://matplotlib.org/">https://matplotlib.org/</a>, or just do a web search if you have a specific question.

You should now be ready for the Basic Plotting - Skill Check Notebook!