# Data Visualization with Matplotlib
### BIOF309 - Week 11
---

### [Matplotlib](https://matplotlib.org/) is a widely used visualization library for Python, but there are [many others](https://blog.modeanalytics.com/python-data-visualization-libraries/):
* [Altair](https://altair-viz.github.io/)
* [Bokeh](https://bokeh.pydata.org/en/latest/)
* [Plotly](https://plot.ly/python/)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# generate some random data to work with
x1 = np.random.uniform(0,1,10000)
x2 = np.random.normal(0, 1, 10000)
x3 = np.random.poisson(2, 10000)

In [None]:
# make a histogram of the uniform data (x1)
plt.hist(x1)


In [None]:
# now add in some labels
plt.hist(x1)
plt.title("Title of My Plot")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")


In [None]:
# add a legend
plt.hist(x1, label='random_uniform')
plt.legend( loc='upper right', numpoints = 1 )


In [None]:
# notice that the title and axis labels set earlier are missing from the plot above:
# each cell draws a new figure, so labels have to be set again in the same cell as the plot
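
Since every cell starts a fresh figure, the title, axis labels, and legend have to be set together in one cell. A minimal sketch of one way to do that with `plt.subplots`; the fixed seed is an assumption added here only to make the figure reproducible, and the label strings are the placeholders used earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
x1 = np.random.uniform(0, 1, 10000)

# one figure with one Axes: everything set on `ax` ends up in the same plot
fig, ax = plt.subplots()
ax.hist(x1, label="random_uniform")
ax.set_title("Title of My Plot")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")
ax.legend(loc="upper right")
```

Calling the `set_*` methods on an explicit Axes object makes it clear which plot each label belongs to, which matters once a figure holds several subplots.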

In [None]:
# plot multiple histograms side-by-side
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.hist(x1)
ax2.hist(x2)
ax3.hist(x3)
ax1.set_title("Uniform")
ax2.set_title("Normal")
ax3.set_title("Poisson")

In [None]:
# the bins argument sets how many intervals the observations are sorted into, which changes the resolution
# (when bins is not given, plt.hist uses 10 bins by default)
fig, (ax1, ax2) = plt.subplots(1,2)
ax1.hist(x1)
ax2.hist(x1, bins=100)
ax1.set_title("Default (10 Bins)")
ax2.set_title("With 100 Bins")
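
To see what the bin count changes numerically, the same counts can be computed without drawing anything. A small sketch using `np.histogram`; the seed and the variable names `counts_default`, `counts_fine`, `edges_default`, and `edges_fine` are illustrative choices, not part of the original notebook:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
x1 = np.random.uniform(0, 1, 10000)

# np.histogram returns the per-bin counts and the bin edges
counts_default, edges_default = np.histogram(x1)        # default: 10 bins
counts_fine, edges_fine = np.histogram(x1, bins=100)

# n bins are delimited by n + 1 edges
print(len(counts_default), len(edges_default))
print(len(counts_fine), len(edges_fine))

# every observation falls into exactly one bin, so both totals equal len(x1)
print(counts_default.sum(), counts_fine.sum())
```

More bins means fewer observations per bin, so the finer histogram shows more detail but also more sampling noise.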


In [None]:
# make a scatterplot of two independent samples from the normal distribution
plt.plot(np.random.normal(0, 1, 1000), np.random.normal(0, 1, 1000), "g.") # "g." is format notation for green points; try "bs" for blue squares
plt.title("Independent Gaussians")
plt.xlabel("First Random Var")
plt.ylabel("Second Random Var")
plt.show()

In [None]:
# How to make a QQ plot
LMDA = 1000.
xn1 = np.random.poisson(LMDA, 10000) # Poisson with lambda = 1000
xn2 = np.random.normal(LMDA, np.sqrt(LMDA), 10000) # approximate it with a Gaussian of the same mean and variance
# sort your data
xn1.sort()
xn2.sort()
# scatter plot sorted data
plt.plot(xn1, xn2, "b.")
# plot a line with slope of 1 for perspective
m1 = np.min(np.concatenate([xn1, xn2]))
m2 = np.max(np.concatenate([xn1, xn2]))
plt.plot([m1, m2], [m1, m2], "r", linestyle = "dashed")
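
Plotting one sorted array against the other works here because both samples have the same length. A sketch of a variant for samples of different sizes, which compares a shared set of quantiles computed with `np.quantile`; the sample sizes, the seed, and the choice of 99 evenly spaced probabilities are assumptions made for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
LMDA = 1000.
xn1 = np.random.poisson(LMDA, 10000)
xn2 = np.random.normal(LMDA, np.sqrt(LMDA), 5000)  # deliberately a different size

# evaluate both samples at the same probabilities (1st to 99th percentile)
probs = np.linspace(0.01, 0.99, 99)
q1 = np.quantile(xn1, probs)
q2 = np.quantile(xn2, probs)

fig, ax = plt.subplots()
ax.plot(q1, q2, "b.")
# dashed reference line with slope 1
lo = min(q1.min(), q2.min())
hi = max(q1.max(), q2.max())
ax.plot([lo, hi], [lo, hi], "r", linestyle="dashed")
ax.set_xlabel("Poisson quantiles")
ax.set_ylabel("Normal quantiles")
```

Points that lie close to the dashed line indicate that the two distributions agree at those quantiles.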


## Exercise
* Take the following arrays:  
```
x4 = np.random.normal(10, 2, 10000)
x5 = np.random.normal(0, 1, 10000)
```
* Transform x5 so it looks like x4
* Check your results with a QQ plot
* Check again by creating histograms of each in a subplot
* Measure the means and standard deviations of each distribution to see if they are similar


In [None]:
# starting qq plot
x4 = np.random.normal(10, 2, 10000)
x5 = np.random.normal(0, 1, 10000)
x4.sort()
x5.sort()
plt.plot(x4,x5)
m1 = np.min(np.concatenate([x4, x5]))
m2 = np.max(np.concatenate([x4, x5]))
plt.plot([m1, m2], [m1, m2], "r", linestyle = "dashed")

In [None]:
# can you get x5 to overlay x4 by transforming the x5 array?
plt.hist(x4, label='x4')
plt.hist(x5, label = 'x5')
plt.legend( loc='upper right', numpoints = 1 )

In [None]:
# Hint: try adding 5 to all the values in x5
x6 = x5 + 5
plt.hist(x4, label='x4')
plt.hist(x5, label = 'x5')
plt.hist(x6, label = 'x6')
plt.legend( loc='upper right', numpoints = 1 )

In [None]:
# solution: scale by the target standard deviation (2), then shift by the target mean (10)
x6 = (x5 * 2) + 10
plt.hist(x4, label='x4')
plt.hist(x5, label='x5')
plt.hist(x6, label='x6')
plt.legend( loc='upper left', numpoints = 1 )
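
The exercise also asks for histograms of each array in a subplot. One possible sketch, assuming shared axes and 50 bins (both choices are made here for easier comparison and are not prescribed by the exercise):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
x4 = np.random.normal(10, 2, 10000)
x5 = np.random.normal(0, 1, 10000)
x6 = (x5 * 2) + 10  # transformed copy of x5

# sharex/sharey give both panels identical axis ranges
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True)
ax1.hist(x4, bins=50)
ax2.hist(x6, bins=50)
ax1.set_title("x4")
ax2.set_title("x6 (transformed x5)")
```

With shared axes, any difference in location or spread between the two panels is visible directly instead of being hidden by automatic axis scaling.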

In [None]:
# qq plot solution
x4.sort()
x6.sort()
plt.plot(x4,x6)
m1 = np.min(np.concatenate([x4, x6]))
m2 = np.max(np.concatenate([x4, x6]))
plt.plot([m1, m2], [m1, m2], "r", linestyle = "dashed")
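
The last exercise step, comparing means and standard deviations, can be sketched as follows. The arrays are regenerated with a fixed seed (an assumption added for reproducibility), so the printed values will differ from those in a notebook run without that seed:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
x4 = np.random.normal(10, 2, 10000)
x5 = np.random.normal(0, 1, 10000)
x6 = (x5 * 2) + 10  # scale, then shift

# x4 and x6 should both have mean near 10 and standard deviation near 2
for name, arr in [("x4", x4), ("x5", x5), ("x6", x6)]:
    print(f"{name}: mean = {arr.mean():.3f}, std = {arr.std():.3f}")
```

Multiplying by 2 doubles the standard deviation of `x5` and adding 10 shifts its mean, so `x6` matches `x4` up to sampling noise.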