# Lecture 8 - matplotlib

### Pre-readings
- Chapter 4

### Learning Objectives
- Create a time series plot showing a single data set
- Create a scatter plot showing the relationship between two data sets
- Use methods to plot directly from pandas DataFrames
- Customize basic features of a plot, such as axis labels, titles, colours, and line styles
---

## Object oriented vs procedural plotting
Both approaches can achieve the same result, but they are fundamentally different. In essence, procedural programming focuses on calling functions, while object-oriented programming uses methods. A function is called by name, takes in data (parameters), and returns data (outputs); the function defines the operations performed on that data. A method may also take in and return data, but the object that the method belongs to is implicitly passed as an input.

We have used both functions and methods previously, though you may not have been aware of the difference. Let's look at an example we already know:

**Calculating a mean with numpy and pandas**
- We have commonly used the mean function from the `numpy` library. This is a function because we call it as `output = np.mean(input)`. Note how this function takes some input and returns the output.
- Now let's compare this with the `pandas` library. Suppose we have a DataFrame called `df`, which for simplicity contains only one column. If we wanted to calculate the mean of `df`, we can just call `df.mean()` without explicitly inputting the data. This is a method, because the data from `df` is implicitly passed in. More specifically, `mean()` is a method associated with the object `df`.
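To make the comparison concrete, here is a minimal sketch of the two calls side by side. The values and the column name `measurement` are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, chosen only for illustration)
values = [2.0, 4.0, 6.0]
df = pd.DataFrame({"measurement": values})

# Function: the data is passed in explicitly as an argument
mean_from_function = np.mean(values)

# Method: the data comes from the object the method is called on.
# For a DataFrame, mean() returns one value per column (a Series).
mean_from_method = df.mean()

print(mean_from_function)
print(mean_from_method["measurement"])
```

Both calls give the same number here. The only difference is where the data comes from: an explicit argument for the function, the object itself for the method.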

**Matplotlib**
- In matplotlib we can create figures using either an object-oriented (method-based) or a procedural (function-based) approach. It is important to be comfortable with both ways of plotting data, as you will commonly see both used.
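As a rough sketch of what the two styles look like in practice, the cell below draws the same curve twice. The variables `t` and `signal` are placeholders invented for this example. In a notebook each figure is displayed at the end of the cell.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (hypothetical, for illustration only)
t = np.linspace(0, 1, 50)
signal = t ** 2

# Procedural style: pyplot functions act on the "current" figure and axes
plt.figure()
plt.plot(t, signal)
plt.xlabel("t")
plt.ylabel("t squared")
plt.title("Procedural style")

# Object-oriented style: keep references to the Figure and Axes objects
# and call their methods directly
fig, ax = plt.subplots()
ax.plot(t, signal)
ax.set_xlabel("t")
ax.set_ylabel("t squared")
ax.set_title("Object-oriented style")
```

Note the naming pattern: `plt.xlabel(...)` becomes `ax.set_xlabel(...)`, and likewise for the y label and title.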

### Anatomy of a figure

The code below is taken straight from [Matplotlib's documentation](https://matplotlib.org/stable/gallery/showcase/anatomy.html). It's provided here so you have a reference to all of the different parts of a Matplotlib figure. For now, just focus primarily on the methods and functions we talk about in the textbook and lecture notebooks, but understand that later you may want to control other details of your figures. 

- What is a figure and what are the elements of a figure?

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator, MultipleLocator, FuncFormatter
%config InlineBackend.figure_format='retina'

np.random.seed(19680801)

X = np.linspace(0.5, 3.5, 100)
Y1 = 3+np.cos(X)
Y2 = 1+np.cos(1+X/0.75)/2
Y3 = np.random.uniform(Y1, Y2, len(X))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1, aspect=1)


def minor_tick(x, pos):
    if not x % 1.0:
        return ""
    return "%.2f" % x

ax.xaxis.set_major_locator(MultipleLocator(1.000))
ax.xaxis.set_minor_locator(AutoMinorLocator(4))
ax.yaxis.set_major_locator(MultipleLocator(1.000))
ax.yaxis.set_minor_locator(AutoMinorLocator(4))
ax.xaxis.set_minor_formatter(FuncFormatter(minor_tick))

ax.set_xlim(0, 4)
ax.set_ylim(0, 4)

ax.tick_params(which='major', width=1.0)
ax.tick_params(which='major', length=10)
ax.tick_params(which='minor', width=1.0, labelsize=10)
ax.tick_params(which='minor', length=5, labelsize=10, labelcolor='0.25')

ax.grid(linestyle="--", linewidth=0.5, color='.25', zorder=-10)

ax.plot(X, Y1, c=(0.25, 0.25, 1.00), lw=2, label="Blue signal", zorder=10)
ax.plot(X, Y2, c=(1.00, 0.25, 0.25), lw=2, label="Red signal")
ax.plot(X, Y3, linewidth=0,
        marker='o', markerfacecolor='w', markeredgecolor='k')

ax.set_title("Anatomy of a figure", fontsize=20, verticalalignment='bottom')
ax.set_xlabel("X axis label")
ax.set_ylabel("Y axis label")

ax.legend(loc='upper right')


def circle(x, y, radius=0.15):
    from matplotlib.patches import Circle
    from matplotlib.patheffects import withStroke
    circle = Circle((x, y), radius, clip_on=False, zorder=10, linewidth=1,
                    edgecolor='black', facecolor=(0, 0, 0, .0125),
                    path_effects=[withStroke(linewidth=5, foreground='w')])
    ax.add_artist(circle)


def text(x, y, text):
    ax.text(x, y, text, backgroundcolor="white",
            ha='center', va='top', weight='bold', color='blue')


# Minor tick
circle(0.50, -0.10)
text(0.50, -0.32, "Minor tick label")

# Major tick
circle(-0.03, 4.00)
text(0.03, 3.80, "Major tick")

# Minor tick
circle(0.00, 3.50)
text(0.00, 3.30, "Minor tick")

# Major tick label
circle(-0.15, 3.00)
text(-0.15, 2.80, "Major tick label")

# X Label
circle(1.80, -0.27)
text(1.80, -0.45, "X axis label")

# Y Label
circle(-0.27, 1.80)
text(-0.27, 1.6, "Y axis label")

# Title
circle(1.60, 4.13)
text(1.60, 3.93, "Title")

# Blue plot
circle(1.75, 2.80)
text(1.75, 2.60, "Line\n(line plot)")

# Red plot
circle(1.20, 0.60)
text(1.20, 0.40, "Line\n(line plot)")

# Scatter plot
circle(3.20, 1.75)
text(3.20, 1.55, "Markers\n(scatter plot)")

# Grid
circle(3.00, 3.00)
text(3.00, 2.80, "Grid")

# Legend
circle(3.70, 3.80)
text(3.70, 3.60, "Legend")

# Axes
circle(0.5, 0.5)
text(0.5, 0.3, "Axes")

# Figure
circle(-0.3, 0.65)
text(-0.3, 0.45, "Figure")

color = 'blue'
ax.annotate('Spines', xy=(4.0, 0.35), xytext=(3.3, 0.5),
            weight='bold', color=color,
            arrowprops=dict(arrowstyle='->',
                            connectionstyle="arc3",
                            color=color))

ax.annotate('', xy=(3.15, 0.0), xytext=(3.45, 0.45),
            weight='bold', color=color,
            arrowprops=dict(arrowstyle='->',
                            connectionstyle="arc3",
                            color=color))

ax.text(4.0, -0.4, "Made with http://matplotlib.org",
        fontsize=10, ha="right", color='.5')

plt.show()



---
## Practice making simple figures
### Practice Problem
- Using a procedural plotting framework, plot the sine wave stored in `y` against `x`.
- Add appropriate x and y labels, as well as a title.

In [None]:
# imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# starter code
x = np.linspace(0, 4*np.pi, 2000)
y = np.sin(x)

# your answer here


---
### Practice Problem
- Repeat the same figure in an object oriented framework

In [None]:
# your answer here


---
### Practice Problem
- Given the three sine waves, y1, y2, y3, plot them all on the same plot.
- Use an object oriented approach

In [None]:
# starter code
x = np.linspace(0, 4*np.pi, 2000)
y1 = np.sin(x)
y2 = np.sin(x)*1.5
y3 = np.sin(x)*2

# your answer here


---
### Practice Problem
- Plot the three sine waves above on separate subplots.
- Create a subplot that contains each sine wave in a different row
- On a separate figure, create a subplot that contains each sine wave in a different column

In [None]:
# your answer here
# Plot 1: Three sine waves in different rows


In [None]:
# your answer here
# Plot 2: Three sine waves in different columns


---
## Analyze and plot some data

Our imaginary colleague “Dr. Maverick” has invented a new miracle drug that promises to cure arthritis inflammation flare-ups only 3 weeks after first taking the medication! Naturally, we wish to see the clinical trial data, so they have provided us with a CSV spreadsheet containing it.

The CSV file contains the number of inflammation flare-ups per day for the 60 patients in the initial clinical trial, with the trial lasting 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial (i.e., wide-form data). Once a patient has their first inflammation flare-up, they take the medication and wait a few weeks for it to take effect and reduce flare-ups.

To see how effective the treatment is we would like to:

- Calculate the average inflammation per day across all patients.
- Plot the result to discuss and share with colleagues.


---
### Task 1:
Using NumPy, import the CSV file named "inflammation-01.csv" and enter the correct value for the "delimiter" keyword argument (kwarg). Make sure your array has shape 60x40 (see: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html). 

**Of course, you could read in the data first with `pandas`, but we want to at least see how NumPy could be used to read in data as well.**
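As a hint about how the `delimiter` kwarg behaves, here is a small sketch that reads a tiny made-up table from an in-memory string rather than from the real file. The text in `csv_text` is invented for illustration. `np.loadtxt` accepts a file name in the same position.

```python
import io
import numpy as np

# A tiny comma-separated table (hypothetical values, not the trial data)
csv_text = "0,1,2\n3,4,5\n"

# delimiter tells NumPy which character separates the values in each row
toy = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(toy.shape)  # (2, 3): two rows, three columns
```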

In [None]:
# your answer here



---
### Task 2: Create a `subjects` variable containing each subject's ID

Create a variable called `subjects` that holds a unique subject ID for each subject (i.e., "s01", "s02", ..., "s60"), which you will use as indices for a data frame you will create in later steps. Note that each ID should be 3 characters in length. **Do this in one line by combining a list comprehension with a conditional.** The syntax for this is: "do something if some-condition else do something-else..." followed by the remaining list comprehension syntax you are already familiar with.
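If the syntax is unfamiliar, here is a sketch of the same pattern on an unrelated toy problem (labelling numbers as even or odd), so the task itself is still left to you. The name `labels` is just an example.

```python
# Pattern: [value_if_true if condition else value_if_false for item in iterable]
labels = ["even" if n % 2 == 0 else "odd" for n in range(1, 6)]

print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']
```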

In [None]:
# your answer here


---
### Task 3: Create a DataFrame containing the inflammation data and use `subjects` as the index. 

Also, display the first several lines of your DataFrame to make sure the `subjects` variable serves as the index column. 

In [None]:
# your answer here


---
### Task 4: Plot the average inflammation for each day. 
We want to see how the average inflammation across subjects changes on a day-to-day basis. Take this step-by-step: First, assign these values to a variable `avg_inflammation`; second, use matplotlib to plot `avg_inflammation`.
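When averaging a table, the `axis` argument decides the direction of the average. The sketch below uses a tiny made-up DataFrame (2 subjects by 3 days) to show the difference. The names `toy`, `per_day`, and `per_subject` are placeholders for this example.

```python
import pandas as pd

# Toy wide-form table: rows are subjects, columns are days (hypothetical values)
toy = pd.DataFrame([[0, 2, 4],
                    [2, 4, 6]], index=["s01", "s02"])

# axis=0 collapses the rows: one mean per column (per day)
per_day = toy.mean(axis=0)

# axis=1 collapses the columns: one mean per row (per subject)
per_subject = toy.mean(axis=1)

print(list(per_day))      # [1.0, 3.0, 5.0]
print(list(per_subject))  # [2.0, 4.0]
```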

In [None]:
# your answer here


---
### Task 5: Plot the max inflammation for each day.

The result above should have been a reasonably linear rise and fall, in line with Dr. Maverick’s claim that the medication takes 3 weeks to take effect. But a good data scientist doesn’t just consider the average of a dataset, so let’s have a look at two other statistics, the max and min of the data. 

In [None]:
# your answer here


### Task 6: Plot the min of the data. 

In [None]:
# your answer here


---
### Task 7: Plotting with subplots

Since these plots are similar and can be thought of as all part of the same analysis, let's plot them in a single figure using subplots. First, create a single row of three figure panels (i.e., subplots). Specify a sensible figure size, too, but don't worry if it seems off - you can always go back and change it.

Now, in the first set of axes, plot the average. In the second, plot the max, and in the third, plot the min.
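As a reminder of how a row of panels is created, here is a minimal sketch with placeholder curves instead of the inflammation statistics. The figure size and the variable `x` are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 50)  # placeholder data

# One row, three columns; axes is a 1-D array holding three Axes objects
fig, axes = plt.subplots(1, 3, figsize=(10, 3))

# Draw a different curve in each panel by indexing into the array
axes[0].plot(x, x)
axes[1].plot(x, x ** 2)
axes[2].plot(x, x ** 3)
```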

In [None]:
# your answer here


### Task 8: Customize the plots
- You should see the same figures you plotted earlier, but now all in one nice row. Now copy your code into the cell below and add appropriate `x` and `y` axis labels.
- After you add sensible axis labels, add a good figure title. And, as a last step, make sure the y-axis labels aren't smooshed up against the neighboring subplot.
- One last thing, add a fourth subplot to show the standard deviation across days. Of course, you'll want to label its axes and make sure everything looks nice, just like you did for the first three panels. 

In [None]:
# your answer here


---
### Task 9: Using the `label` parameter to easily create legends

Let's say you wanted to plot both max and min in the same figure. A nice way to do that would be to use different colors for each, and also to label them appropriately. Go ahead and do that, and remember to continue using object-oriented plotting as opposed to procedural. 
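Here is a sketch of how `label` and `legend()` work together, using sine and cosine curves as stand-ins for your own data. The colors and labels are arbitrary choices for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)  # placeholder data

fig, ax = plt.subplots()

# Each line gets its own color and a label describing it
ax.plot(x, np.sin(x), color="tab:blue", label="sin")
ax.plot(x, np.cos(x), color="tab:orange", label="cos")

# legend() collects the labels given above and draws the legend box
legend = ax.legend()
```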

In [None]:
# your answer here


### Task 10:
Now add the average inflammation to the figure, making sure to include it in your legend as well. 

Personally, I prefer not having Matplotlib's default frame around the legend. Look at Matplotlib's API docs to see how you can get rid of it and test it out. 

In [None]:
# your answer here


---
## Fancy heatmap

You've seen figures like this before and may have even been a bit perplexed by them. Don't be! The figure below is a 2D grid of rows and columns in which the "warmth" of the color indicates the cell value (hence the name "heatmap"). In this particular case, the rows represent individual subjects and the columns represent the days. The color corresponds to the number of inflammation bouts for a particular subject on a particular day. In other words, all of the data are plotted compactly in one pretty cool looking figure! Every heatmap you see will have this same structure; only what the rows and columns represent will differ.

In [None]:
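# `data` is assumed to hold the 60x40 NumPy array you loaded in Task 1
image = plt.imshow(data)
# Add a narrow axes on the right ([left, bottom, width, height]) for the colorbar
cax = plt.axes([0.75, 0.1, 0.025, 0.78])
plt.colorbar(image, cax=cax)
plt.show()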

---
This lab comes from Software Carpentry's [Programming with Python workshop](https://swcarpentry.github.io/python-novice-inflammation/). 