# Lab 5: Practice plotting data with `Matplotlib`
---

## Scenario

Our imaginary colleague “Dr. Maverick” has invented a new miracle drug that promises to cure arthritis inflammation flare-ups only 3 weeks after a patient first takes the medication! Naturally, we want to see the clinical trial data, so they have provided us with a CSV file of the results.

The CSV file contains the number of inflammation flare-ups per day for each of the 60 patients in the initial clinical trial, which lasted 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial (i.e., wide-form data). Once a patient has their first inflammation flare-up, they take the medication and wait a few weeks for it to take effect and reduce flare-ups.
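To make "wide-form" concrete, here is a small sketch using made-up numbers (not the trial data): a wide table has one row per patient and one column per day, while the equivalent long-form table has one row per (patient, day) observation. The names `wide` and `long`, the IDs, and the values are all hypothetical; `DataFrame.melt` is the pandas function that reshapes wide to long.

```python
import pandas as pd

# Hypothetical toy data: 2 patients (rows) x 3 days (columns), values made up.
wide = pd.DataFrame([[0, 1, 3],
                     [0, 2, 1]],
                    index=["s01", "s02"],
                    columns=[1, 2, 3])
wide.index.name = "patient"

# Long form: one row per (patient, day) observation.
long = wide.reset_index().melt(id_vars="patient",
                               var_name="day",
                               value_name="flare_ups")
print(wide.shape)  # (2, 3): one row per patient
print(long.shape)  # (6, 3): one row per patient-day
```

This lab works with the wide layout throughout, which is why a single row holds a whole patient's trial.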

To see how effective the treatment is, we would like to:

- Calculate the average inflammation per day across all patients.
- Plot the result to discuss and share with colleagues.


---
## Preliminaries  

Import the `NumPy`, `Matplotlib`, and `pandas` libraries.

In [None]:
# ENTER YOUR ANSWER


# # Uncomment the next line on a retina display to render higher-resolution figures in the notebook
# %config InlineBackend.figure_format = 'retina'

Using NumPy, load the CSV file named "inflammation-01.csv" from the `data` subdirectory, supplying the correct value for the `delimiter` keyword argument (kwarg). Make sure your array has shape 60x40, i.e., 60 rows (patients) by 40 columns (days) (see: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html).

**Of course, you could read in the data with `pandas` instead, but we want to see how NumPy can be used to read in data as well.**
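As a sketch of how `np.loadtxt` and its `delimiter` kwarg behave, the snippet below parses a tiny, made-up CSV held in memory via `io.StringIO` (a stand-in for the real file on disk; `csv_text` and `toy` are hypothetical names). Without `delimiter=","`, `loadtxt` splits on whitespace and cannot convert a comma-separated line to numbers.

```python
import io
import numpy as np

# Hypothetical 3-row x 4-column CSV standing in for data/inflammation-01.csv
# (the real file is 60 x 40).
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

toy = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(toy.shape)  # (3, 4): rows = patients, columns = days
print(toy.dtype)  # float64, loadtxt's default dtype
```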

In [None]:
# REPLACE ELLIPSES (...)
data = np.loadtxt(..., delimiter=...)

In [None]:
# CHECK THE SHAPE


---
## Create a `subjects` variable containing each subject's ID

Create a variable called `subjects` that holds a unique ID for each subject (i.e., "s01", "s02", ..., "s60"), which you will use as the index of a DataFrame you will create in later steps. Note that each ID should be 3 characters long. **Do this in one line by combining a list comprehension with a conditional expression.** The syntax is `[value_if_true if condition else value_if_false for item in iterable]`: the `if ... else` part comes before the `for` clause you are already familiar with.
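Here is a small illustration of the conditional-expression pattern on a different (hypothetical) task, so the exercise itself is still yours to solve. Note the contrast with a *filtering* `if`, which goes after the `for` clause and takes no `else`.

```python
# Conditional expression: one output item per input item.
labels = ["even" if n % 2 == 0 else "odd" for n in range(1, 6)]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']

# Filtering if: items that fail the condition are dropped entirely.
evens_only = [n for n in range(1, 6) if n % 2 == 0]
print(evens_only)  # [2, 4]
```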

In [None]:
# YOUR ANSWER HERE


---
## Create a DataFrame containing the inflammation data and use `subjects` as the index. 

Also, display the first few rows of your DataFrame to make sure the `subjects` values serve as its index.
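A minimal sketch of attaching row labels with the `index=` argument of `pd.DataFrame`, using a made-up 3x4 array and hypothetical labels `ids` in place of the real data and `subjects`:

```python
import numpy as np
import pandas as pd

# Toy 3 x 4 array standing in for the 60 x 40 inflammation data.
toy = np.array([[0, 0, 1, 3],
                [0, 1, 2, 1],
                [0, 1, 1, 3]])
ids = ["p1", "p2", "p3"]  # hypothetical row labels

# index= sets the row labels; columns default to 0, 1, 2, ...
df_toy = pd.DataFrame(toy, index=ids)
print(df_toy.head(2))  # first 2 rows; head() with no argument shows up to 5
```

Rows can then be selected by label, e.g. `df_toy.loc["p2"]`.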

In [None]:
# YOUR ANSWER HERE


---
## Plot the average inflammation for each day. 
We want to see how the average inflammation across subjects changes from day to day, i.e., the mean of each column (day) taken over all rows (patients). Take this step by step: first, assign these values to a variable `avg_inflammation`; second, use Matplotlib to plot `avg_inflammation`.
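The key idea is which axis you average over. A sketch on a made-up 3x4 array (`toy` and `daily_mean` are hypothetical names): `axis=0` collapses the rows, leaving one value per column, and pandas' `DataFrame.mean()` averages each column by default. The plot uses the object-oriented `fig, ax` interface.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data: 3 patients (rows) x 4 days (columns).
toy = np.array([[0, 0, 1, 3],
                [0, 1, 2, 1],
                [0, 1, 1, 3]])

# axis=0 averages down each column: one value per day.
daily_mean = toy.mean(axis=0)
# pandas equivalent: DataFrame.mean() works column-wise by default.
daily_mean_pd = pd.DataFrame(toy).mean()

fig, ax = plt.subplots()
ax.plot(daily_mean)
```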

In [None]:
# YOUR ANSWER HERE


---
## Plot the max inflammation for each day.

The result above should show a reasonably linear rise and fall, in line with Dr. Maverick’s claim that the medication takes 3 weeks to take effect. But a good data scientist doesn’t just consider the average of a dataset, so let’s look at two other statistics: the daily maximum and minimum across patients.
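The same column-wise logic applies to the max and min. As a sketch on made-up data, pandas' `DataFrame.agg` can compute several column-wise statistics at once, returning one row per statistic and one column per day (in NumPy, `toy.max(axis=0)` and `toy.min(axis=0)` do the same).

```python
import pandas as pd

# Toy data: 3 patients (rows) x 4 days (columns).
toy = pd.DataFrame([[0, 0, 1, 3],
                    [0, 1, 2, 1],
                    [0, 1, 1, 3]])

# Rows are the statistics, columns are the days.
summary = toy.agg(["mean", "max", "min"])
print(summary)
```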

In [None]:
# YOUR ANSWER HERE


Next, plot the minimum inflammation for each day.

In [None]:
# YOUR ANSWER HERE


---
## Plotting with subplots

Since these plots are similar and belong to the same analysis, let's plot them in a single figure using subplots. First, create a single row of three figure panels (i.e., subplots). Specify a sensible figure size, too, but don't worry if it seems off; you can always go back and change it.

Now, in the first set of axes, plot the average. In the second, plot the max, and in the third, plot the min.
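A sketch of how `plt.subplots` lays out a row of panels, using made-up data: with `nrows=1`, the returned `axes` is a 1-D NumPy array, so you index panels as `axes[0]`, `axes[1]`, and so on. The `figsize=(10, 3)` value is just one reasonable choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: 3 patients (rows) x 4 days (columns).
toy = np.array([[0, 0, 1, 3],
                [0, 1, 2, 1],
                [0, 1, 1, 3]])

# One row of three panels; figsize is (width, height) in inches.
fig, axes = plt.subplots(1, 3, figsize=(10, 3))

axes[0].plot(toy.mean(axis=0))  # left panel
axes[1].plot(toy.max(axis=0))   # middle panel
axes[2].plot(toy.min(axis=0))   # right panel
```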

In [None]:
# YOUR ANSWER HERE


You should see the same figures you plotted earlier, but now all in one nice row. Now copy your code into the cell below and add appropriate `x` and `y` axis labels.

After you add sensible axis labels, add a good figure title. Then make sure the y-axis labels aren't smooshed up against the neighboring subplot.

Finally, add a fourth subplot to show the standard deviation for each day. Of course, you'll want to label its axes and make sure everything looks nice, just like you did for the first three panels.
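The sketch below shows, on made-up data, the Matplotlib calls involved: `set_xlabel`/`set_ylabel` on each axes, `fig.suptitle` for a figure-wide title, and `fig.tight_layout()` to adjust spacing so labels don't overlap neighboring panels. The dictionary-and-loop structure is just one way to avoid repeating code. Note that NumPy's `std` uses `ddof=0` (population standard deviation) by default, whereas pandas' `DataFrame.std` uses `ddof=1`, so the two can differ slightly.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: 3 patients (rows) x 4 days (columns).
toy = np.array([[0, 0, 1, 3],
                [0, 1, 2, 1],
                [0, 1, 1, 3]])

stats = {
    "Average": toy.mean(axis=0),
    "Max": toy.max(axis=0),
    "Min": toy.min(axis=0),
    "Std. dev.": toy.std(axis=0),  # NumPy default: ddof=0
}

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, (name, values) in zip(axes, stats.items()):
    ax.plot(values)
    ax.set_xlabel("Day")
    ax.set_ylabel(f"{name} inflammation")

fig.suptitle("Toy inflammation summary")
# Adjust spacing so axis and tick labels don't collide with neighbors.
fig.tight_layout()
```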

In [None]:
# YOUR ANSWER HERE


---
## Using the `label` parameter to easily create legends

Let's say you want to plot both the max and min in the same figure. A nice way to do that is to use a different color for each and to label them appropriately. Go ahead and do that, and remember to keep using the object-oriented interface (`fig, ax = plt.subplots()` followed by `ax.plot(...)`) rather than the procedural `plt.plot(...)` style.
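A sketch of the `label` mechanism on made-up data: each `ax.plot(..., label=...)` call registers a legend entry, and `ax.legend()` collects them. The colors `"tab:red"` and `"tab:blue"` are arbitrary picks from Matplotlib's default palette.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: 3 patients (rows) x 4 days (columns).
toy = np.array([[0, 0, 1, 3],
                [0, 1, 2, 1],
                [0, 1, 1, 3]])

fig, ax = plt.subplots()
# label= sets the text ax.legend() will show for each line.
ax.plot(toy.max(axis=0), color="tab:red", label="Max")
ax.plot(toy.min(axis=0), color="tab:blue", label="Min")
ax.set_xlabel("Day")
ax.set_ylabel("Inflammation")
ax.legend()
```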

In [None]:
# YOUR ANSWER HERE


Now add the average inflammation to the figure, making sure to include it in your legend as well. 

Personally, I prefer not having Matplotlib's default frame around the legend. Look at Matplotlib's API docs to see how you can get rid of it and test it out. 
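If you want to check your answer after searching the docs: `Axes.legend` accepts a `frameon` keyword argument. A minimal sketch with a single made-up line:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], label="Average")
# frameon=False turns off the box drawn around the legend.
leg = ax.legend(frameon=False)
```

To make frameless legends the default for every plot in a session, you can instead set `matplotlib.rcParams["legend.frameon"] = False`.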

In [None]:
# YOUR ANSWER HERE


---
## Fancy heatmap.

You've seen figures like this before and may even have been a bit perplexed by them. Don't be! The figure below is a 2D grid of rows and columns in which the "warmth" of each cell's color indicates its value (hence the name "heatmap"). Here, the rows represent individual subjects and the columns represent days, so each cell's color corresponds to the number of inflammation flare-ups for a particular subject on a particular day. In other words, all of the data are plotted compactly in one pretty cool-looking figure! Every heatmap has this same structure; only what the rows and columns represent changes from one heatmap to the next.

In [None]:
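# Object-oriented version: draw the array as an image, then attach a colorbar to it
fig, ax = plt.subplots()
image = ax.imshow(data)  # rows = patients, columns = days
ax.set_xlabel("Day")
ax.set_ylabel("Patient")
fig.colorbar(image, ax=ax, label="Inflammation")
plt.show()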

---
This lab comes from Software Carpentry's [Programming with Python workshop](https://swcarpentry.github.io/python-novice-inflammation/). 