<h1>17. More Matplotlib</h1>
<h2>11/16/2020</h2>

<h2>17.0 Last Time...</h2>
<ul>
    <li><b>matplotlib</b>'s <b>pyplot</b> module gives us a MATLAB-style plotting interface in Python.</li>
    <li>The <b>matplotlib.pyplot.plot()</b> function is a simple way to plot 2-D data.</li>
    <li>We can specify axis limits as well as line style and color.</li>
</ul>

<h2>17.1 Keyword Strings</h2>

With a normal scatterplot, you can convey two pieces of information for each point: (1) what the x value is, and (2) what the y value is. You can get additional information crammed into one plot by allowing the size and color of the points being plotted to vary!

matplotlib.pyplot has a handy function for this particular application called <b>scatter()</b>. At minimum, it needs two arguments: an array of your x data and an array of your y data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Let's have our x values just be a count from 0 to 49.
var_1 = np.arange(50)

# And let's generate y values that follow x, plus some random noise!
var_2 = var_1 + 10*np.random.randn(50)

plt.scatter(var_1,var_2)
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.show()

Okay! Some information is already visible: there's a general positive trend. But what if we have additional information (say this is temperature versus dewpoint temperature and we also know something about relative humidity)? We can convey that information through the color of the data points, using the 'c' argument of scatter().

In [None]:
# Now we have a third variable: random integers from 0 to 49 (randint excludes its upper bound of 50).
var_3 = np.random.randint(0,50,50)

plt.scatter(var_1,var_2,c = var_3)
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.show()
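The colors above don't mean much without a key. Here's a minimal sketch that adds one with <b>plt.colorbar()</b> and picks a colormap through the cmap argument of scatter(); 'viridis' and the label text are just illustrative choices, and the data is regenerated so the cell runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same kind of made-up data as above.
var_1 = np.arange(50)
var_2 = var_1 + 10*np.random.randn(50)
var_3 = np.random.randint(0, 50, 50)

# cmap chooses the colormap; 'viridis' is one of many built-in options.
sc = plt.scatter(var_1, var_2, c=var_3, cmap='viridis')

# The colorbar maps each color back to a value of var_3.
cbar = plt.colorbar(sc, label='Variable 3')
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.show()
```

Keeping the object returned by scatter() is what lets colorbar() know which colors and values to draw.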

More information has been conveyed! Let's try adding even more information, say a fourth meteorological variable like wind speed, through the size of each circle, which is set by the 's' argument in scatter(). This marker size is given in points squared, so it controls the marker's area rather than its diameter.

In [None]:
var_4 = np.abs(np.random.randn(50))*100  # Absolute values of N(0,1) Gaussian draws, scaled up by 100.

plt.scatter(var_1,var_2,c = var_3,s = var_4)
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.show()
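When your variables live in a named collection (a dictionary, or a pandas DataFrame), matplotlib also lets you pass the collection as the data keyword and refer to each variable by a string. This is the "keyword strings" style of calling scatter(). The dictionary below and its keys 'a' through 'd' are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dictionary of named variables.
data = {'a': np.arange(50),
        'c': np.random.randint(0, 50, 50),
        'd': np.abs(np.random.randn(50)) * 100}
data['b'] = data['a'] + 10 * np.random.randn(50)

# Each string argument is looked up as a key in `data`.
sc = plt.scatter('a', 'b', c='c', s='d', data=data)
plt.xlabel('a')
plt.ylabel('b')
plt.show()
```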

<h2>17.2 Categorical Variables</h2>

Sometimes you have data in the form of categories! You may have, for instance, two different sets of tornado data (like in Homework 3), or three different future climate regimes, or five different locations. Any comparative research will require this kind of categorical data analysis!

In [None]:
# Let's look at the example of three pieces of data.
# They might be mean values of three parameters, average grades on an assignment, etc.

names = ['group_a','group_b','group_c']
values = [1,10,100]

# We can start with a bar plot.
plt.bar(names,values)

In [None]:
# We can also create a scatterplot as seen above.
plt.scatter(names,values)

In [None]:
# Or a line plot!
plt.plot(names,values)
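Bar plots of categories often benefit from one color per group. Here's a minimal sketch using the color argument of bar(); the specific colors are an arbitrary choice. It also shows that bar() returns a container with one rectangle per category, which you can inspect afterwards.

```python
import matplotlib.pyplot as plt

names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

# One color per bar, in the same order as `names`.
bars = plt.bar(names, values, color=['tab:blue', 'tab:orange', 'tab:green'])
plt.ylabel('Value')
plt.show()

# Each element of the container is a Rectangle whose height is the bar's value.
heights = [rect.get_height() for rect in bars]
print(heights)
```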

<h2>17.3 Controlling Line Properties</h2>

There are a bunch of line attributes you can set! The line style/color/marker format strings we saw earlier (like 'r--') are shortcuts for common configurations, but there are a <i>lot</i> more. You can find the full list by searching for <b>matplotlib.lines.Line2D</b>, or by calling the <b>plt.setp()</b> function with a line or list of lines as its only argument.

In [None]:
lines = plt.plot([1,2,3])
plt.setp(lines)
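setp() can also <i>set</i> properties, not just list them. A minimal sketch: it accepts either keyword arguments or MATLAB-style name/value string pairs, and it applies the change to every line in the list you pass.

```python
import matplotlib.pyplot as plt

# Two lines from a single plot() call.
lines = plt.plot([1, 2, 3], [1, 4, 9], [1, 2, 3], [1, 2, 3])

# Keyword-argument style.
plt.setp(lines, color='r', linewidth=2.0)

# MATLAB-style name/value pairs do the same thing.
plt.setp(lines, 'linestyle', '--')
plt.show()
```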

Let's say you want to increase the width of a given line. You'd want to use the <b>linewidth</b> argument.

In [None]:
plt.plot([1,2,3],linewidth=5.0)

In [None]:
# Likewise, you can set the color.

plt.plot([1,2,3],linewidth=5.0,color='purple')

In [None]:
# If you have markers, you can change their properties as well!

plt.plot([1,2,3],'-o',markeredgecolor='red',markerfacecolor='yellow')
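You can also change a line after it's been drawn. plot() returns a list of Line2D objects, and the properties that setp() lists correspond to set_* methods on those objects. A minimal sketch (the marker size of 12 is arbitrary):

```python
import matplotlib.pyplot as plt

# The trailing comma unpacks the single Line2D from the returned list.
line, = plt.plot([1, 2, 3], '-o')

# Change properties one at a time with the line's setter methods.
line.set_linestyle('--')
line.set_markersize(12)
line.set_markerfacecolor('yellow')
line.set_markeredgecolor('red')
plt.show()
```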

<h2>17.4 Multiple Figures and Axes</h2>

We often want multiple subplots within the same figure. Behind the scenes, pyplot keeps track of the "current" figure and axes, which you can access with <b>gcf()</b> and <b>gca()</b>, respectively. You probably won't have to worry about this too often.

The <b>subplot()</b> function selects a particular subplot within a grid. It takes three arguments: the number of rows, the number of columns, and the index of this subplot, which ranges from 1 to number_rows*number_columns and counts left to right across each row before moving down to the next row.

In [None]:
# Let's create a couple of subplots of fairly complex data:
# a damped oscillation and an undamped oscillation.

# Start by creating a function that gives a damped oscillation: a cosine whose amplitude decays exponentially.

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

# Next, let's have two sets of x values:
# the first is more widely spaced than the second,
# but they cover the same range of data.

t1 = np.arange(0.0,5.0,0.1)
t2 = np.arange(0.0,5.0,0.02)

# First, we create a setup where we have two rows and 1 column of
# plots, and we're referring to the first plot.
plt.subplot(2,1,1)
plt.plot(t1,f(t1),'bo',t2,f(t2),'k')

# Next, we'll refer to the second plot.
plt.subplot(2,1,2)
plt.plot(t2,np.cos(2*np.pi*t2),'r--')
plt.show()
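An alternative you'll see often is <b>plt.subplots()</b> (note the s), which creates the figure and the whole grid of axes in one call and returns them, so you plot on each axes object directly instead of switching the "current" one. Here's a minimal sketch of the same two-panel figure in that style.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(t):
    # Damped oscillation: a cosine whose amplitude decays like exp(-t).
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

# Two rows, one column; `axes` is an array holding the two Axes objects.
fig, axes = plt.subplots(2, 1)
axes[0].plot(t1, f(t1), 'bo', t2, f(t2), 'k')
axes[1].plot(t2, np.cos(2*np.pi*t2), 'r--')

# The newly created figure is now pyplot's "current" figure.
current_fig = plt.gcf()
plt.show()
```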

<b>Exercise:</b> Make 4 subplots (2 rows, 2 columns) using the x values below and plot whatever functions you like on them (sin(x), cos(x), 2/x, etc.)! Note that 2/x is undefined at x = 0, so NumPy will warn about dividing by zero there.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0.0,5.0,0.01)




<h2>17.5 Working with Text</h2>

You can use the <b>text()</b> function to place text at any location, given in data coordinates. Other useful text-related functions include <b>xlabel()</b> and <b>ylabel()</b> (seen above) and <b>title()</b>.

As a side note, using mathematical expressions in text can get a little confusing. Put the letter r in front of the string (making it a raw string, so Python leaves backslashes alone), and inside the quotation marks wrap the math in dollar signs, e.g. r'$\mu=60$'. The math syntax follows LaTeX conventions, so you can find the details by searching for 'LaTeX math'.
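Why the r? In a normal string, Python treats sequences like \a (bell) and \b (backspace) as special characters, so '$\alpha$' or '$\beta$' would silently get mangled. Here's a minimal sketch showing that a raw string is the same as escaping every backslash by hand, then using it as a title.

```python
import matplotlib.pyplot as plt

# The r prefix keeps the backslashes as literal characters.
raw_label = r'$\alpha_i > \beta_i$'
# Without r, each backslash has to be doubled by hand.
escaped_label = '$\\alpha_i > \\beta_i$'
print(raw_label == escaped_label)  # True

plt.plot([1, 2, 3])
title = plt.title(raw_label)
plt.show()
```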

In [None]:
# Let's generate a histogram from a set of random values in a distribution
# with a specified mean and standard deviation.
mu,sigma = 60,15
x = mu + sigma * np.random.randn(10000)

# Let's create a histogram with 50 bins, normalized so that its total area is 1.
plt.hist(x,50,density=True,facecolor='g')

plt.xlabel('Grades')
plt.ylabel('Probability density')
plt.title('Histogram of Grades',fontsize=16)
plt.text(20,.025,r'$\mu=60,\ \sigma=15$',color='b')
plt.axis([0,100,0,0.03])
plt.grid(True)
plt.show()
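What does the density option actually do? hist() returns the bar heights, the bin edges, and the drawn patches. With density turned on, the heights are scaled so that height times bin width, summed over all bins, equals 1; that's why the y-axis holds a probability density rather than a count. A minimal check:

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 60, 15
x = mu + sigma * np.random.randn(10000)

# n: bar heights, bins: the 51 edges of the 50 bins, patches: the drawn bars.
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g')

# Area under the histogram: sum of height * width over all bins.
total_area = np.sum(n * np.diff(bins))
print(total_area)
plt.show()
```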

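There's also a function for labeling a specific point with text and an arrow, called, as you might expect, <b>annotate()</b>. The xy argument is the point being annotated and xytext is where the text goes; both are in data coordinates by default. An example follows!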

In [None]:
t = np.arange(0.0,5.0,0.01)
s = np.cos(2*np.pi*t)

plt.plot(t,s)
plt.annotate('local max',xy=(2,1),
            xytext=(3,1.5),
            arrowprops = dict(facecolor='black',shrink=0.05))
plt.ylim(-2,2)
plt.show()
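The arrow can be styled through arrowprops. Here's a minimal sketch that labels a local minimum instead, using the arrowstyle key for a thin '->' arrow; the text position (2.5, -1.6) is just a choice that keeps the label clear of the curve.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)

plt.plot(t, s)
# cos(2*pi*t) reaches -1 at t = 0.5, 1.5, 2.5, ...; we label t = 1.5.
ann = plt.annotate('local min', xy=(1.5, -1), xytext=(2.5, -1.6),
                   arrowprops=dict(arrowstyle='->', color='black'))
plt.ylim(-2, 2)
plt.show()
```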

<h2>17.6 Nonlinear Axes</h2>

If your data spans many orders of magnitude, it can be helpful to create nonlinear axes using <b>xscale()</b> or <b>yscale()</b>.

In [None]:
# Let's do something called fixing the random state.
# Typically when you generate random numbers, every time you
# run the code you'll get a different result.
# Setting a particular random seed will ensure that we can reproduce
# the same results every time for demonstration purposes.

np.random.seed(19680801)

# Let's just make up some data in the interval (0,1)
y = np.random.normal(loc=0.5,scale=0.4,size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

plt.subplot(1,2,1)
plt.plot(x,y)
plt.yscale('linear')
plt.title('linear')
plt.grid(True)

plt.subplot(1,2,2)
plt.plot(x,y)
plt.yscale('log')
plt.title('log')
plt.grid(True)
plt.show()
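Linear and log aren't the only options. Here's a minimal sketch, reusing the same made-up data, of two more built-in scales: 'symlog', which is logarithmic away from zero but linear within linthresh of zero (so it can handle negative values), and 'logit', which stretches values near 0 and 1 and only works for data strictly between them. The linthresh keyword assumes matplotlib 3.3 or newer; older versions called it linthreshy.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(19680801)
y = np.random.normal(loc=0.5, scale=0.4, size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

# symlog: subtracting the mean gives values on both sides of zero.
ax1 = plt.subplot(1, 2, 1)
plt.plot(x, y - y.mean())
plt.yscale('symlog', linthresh=0.01)
plt.title('symlog')
plt.grid(True)

# logit: every value of y is strictly between 0 and 1.
ax2 = plt.subplot(1, 2, 2)
plt.plot(x, y)
plt.yscale('logit')
plt.title('logit')
plt.grid(True)
plt.show()
```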

<h2>17.7 Take-Home Points</h2>
<ul>
    <li>The scatter() function's keyword arguments (c and s) set the color and size of points on the plot.</li>
    <li>Categorical variables (such as group names) can be plotted directly with bar(), scatter(), and plot().</li>
    <li>There are many line properties that can be edited!</li>
    <li>Subplots can be added using the subplot() function.</li>
    <li>Text can be added or annotated on plots.</li>
    <li>Axes can be given nonlinear (e.g., logarithmic) scales using xscale() or yscale().</li>
</ul>