# Exercise 3: Data visualisation and plotting with `Matplotlib`

[Matplotlib](https://matplotlib.org/) is a Python package for visualising data and generating graphics.
It is used by many Python programmers for plotting and, like NumPy, is a core part of the scientific Python ecosystem.

Basic plotting tasks can be accomplished using Matplotlib in just a few lines of code.
Matplotlib is also very flexible, and can easily be used to generate complex, publication-quality graphics with multiple subplots, insets, *etc.*

This flexibility does come at a cost, though: it takes a lot of practice to learn the more advanced commands.
Most people use Matplotlib by gradually building up a library of small plotting programs for different tasks, from which they can take pieces of code to prepare more complex graphics when they need to.

On that point, *Google is your friend*: searching for things like "changing line colours Matplotlib" will almost always bring you to a site with a code example and explanation you can follow/adapt to your needs.

## a. Introduction to Matplotlib

### i. Plotting basics

To illustrate the basics of plotting in Matplotlib, we will plot $y = \mathrm{sin}(x)$ over the interval $0 \rightarrow 2\pi$.
First, we use some routines from NumPy, introduced in Exercise 2, to generate some data:

In [None]:
import numpy as np

x_data = np.linspace(0, 2 * np.pi, 101)
y_data = np.sin(x_data)

The `import <module> as <name>` syntax and `np.linspace()` function were both introduced in Exercise 2.

`np.pi` is a constant with the value of $\pi$; we could also just as well have imported the `math` module and used `math.pi` as we did in Exercise 1.

The `np.sin()` call is new.
The NumPy `sin()` function takes a NumPy array as its argument, computes the sine of each element, and returns a new array with the result.

This can be verified using the `shape` attribute:

In [None]:
print("x_data.shape =", x_data.shape)
print("y_data.shape =", y_data.shape)

NumPy implements most of the standard mathematical operations, including the trigonometric functions `np.sin()`, `np.cos()` and `np.tan()` as well as other operations such as `np.power()` and `np.exp()`.
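To see that these functions operate element-wise, here is a small sketch comparing NumPy's vectorised functions with an explicit loop using the `math` module. The sample values are arbitrary and chosen only for illustration.

```python
import math

import numpy as np

# Arbitrary sample values, chosen only for illustration.
values = np.array([0.0, 0.5, 1.0, 2.0])

# NumPy functions act on every element and return a new array of the same shape.
squares = np.power(values, 2)
exponentials = np.exp(values)

# The same results computed one element at a time with the math module.
squares_loop = [v ** 2 for v in values]
exponentials_loop = [math.exp(v) for v in values]

print(np.allclose(squares, squares_loop))            # True
print(np.allclose(exponentials, exponentials_loop))  # True

# np.pi and math.pi hold the same value.
print(np.pi == math.pi)                              # True
```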

The following is a minimal Matplotlib script for plotting this data:

In [None]:
# This is a special directive to instruct Jupyter to show Matplotlib graphics "inline" in the code cell output.

%matplotlib inline

# Import main pyplot API.

import matplotlib.pyplot as plt

# Create a new figure object.

plt.figure()

# Add a line to the plot.

plt.plot(x_data, y_data)

# Set axis labels.

plt.xlabel("x")
plt.ylabel("sin(x)")

# Show the figure (will appear in the output below the cell).

plt.show()

# Clean up.

plt.close()

Most of the core Matplotlib functionality is available *via* the `pyplot` API, which we import at the top of the script using the "rename" import we use for NumPy (referring to `pyplot` by the shorthand `plt` is standard in most scripts that use Matplotlib).

The `figure()` function creates a new figure.
`plot()` adds a line to the figure, and `xlabel()` and `ylabel()` set the axis labels.
`show()` draws the figure, and `close()` cleans up and frees the resources used to keep track of the content and to draw it.
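Each of these `pyplot` functions acts on the "current" figure and axes. As an illustrative sketch (not needed for plotting), the settings can be read back using `plt.gca()` ("get current axes") together with the axes' getter methods:

```python
import matplotlib.pyplot as plt
import numpy as np

x_data = np.linspace(0, 2 * np.pi, 101)
y_data = np.sin(x_data)

# plt.figure() returns the Figure object it creates.
fig = plt.figure()

plt.plot(x_data, y_data)
plt.xlabel("x")
plt.ylabel("sin(x)")

# plt.gca() returns the Axes that the calls above drew on.
ax = plt.gca()

print(len(ax.get_lines()))   # 1 - one line was added by plt.plot()
print(ax.get_xlabel())       # x
print(ax.get_ylabel())       # sin(x)

# Close this specific figure to free its resources.
plt.close(fig)
```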

The following example is similar, but shows some additional customisation:

In [None]:
# Generate two data series.

x = np.linspace(0, 2 * np.pi, 101)

sin_x = np.sin(x)
cos_x = np.cos(x)

plt.figure()

# Plot both data series with optional labels and line colours.

plt.plot(x, sin_x, label = "f(x) = sin(x)", color = 'b')
plt.plot(x, cos_x, label = "f(x) = cos(x)", color = 'r')

plt.xlabel("x")
plt.ylabel("f(x)")

# Add a legend in the lower-left corner.

plt.legend(loc = 'lower left')

# Set axis limits.

plt.xlim(0, 2 * np.pi)
plt.ylim(-1, 1)

plt.show()

plt.close()

This time the code generates and plots two data series: $y = \mathrm{sin}(x)$ and $y = \mathrm{cos}(x)$.

The two calls to `plot()` specify a label and a line colour for each curve using the `label` and `color` "keyword arguments".
Keyword arguments are optional arguments that can be passed to Python functions to override default values.
(By default, Matplotlib will choose its own colours, and will set the label to the "null" value `None`.)

There are several ways to specify colours in Matplotlib:

* The easiest method is to use the one-letter codes for common colours such as red (`color = 'r'`), green (`g`) and blue (`b`).
* Alternatively, functions that take a `color` keyword argument also accept "named colours" from the [standard web palette](https://www.w3schools.com/colors/colors_names.asp), for example `color = 'aquamarine'`.
* A third method is to manually specify colours as a tuple of red, green and blue values between 0 and 1 - for example `color = (1.0, 0.5, 0.5)` specifies 100 % red, 50 % green and 50 % blue, which results in a light salmon-pink colour.

Feel free to modify the code cell above and try these options out.
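One way to check how Matplotlib interprets these notations is `matplotlib.colors.to_rgb()`, which converts any colour specification Matplotlib accepts into a tuple of red, green and blue values between 0 and 1. A short sketch:

```python
from matplotlib.colors import to_rgb

# One-letter codes.
print(to_rgb('r'))               # (1.0, 0.0, 0.0)
print(to_rgb('b'))               # (0.0, 0.0, 1.0)

# Note that Matplotlib's 'g' is a darker green than pure (0, 1, 0).
print(to_rgb('g'))               # (0.0, 0.5, 0.0)

# A named web colour.
print(to_rgb('aquamarine'))

# An RGB tuple is returned unchanged.
print(to_rgb((1.0, 0.5, 0.5)))   # (1.0, 0.5, 0.5)
```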

Adding labels to the data enables us to add a legend with the `legend()` function.
This takes a `loc` keyword that allows a location to be specified.
The `loc` parameter can either accept text strings, as used here, or integer codes - a list of these can be found mid-way down the [online documentation page](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html).

Finally, we also set the x- and y-axis limits using the `xlim()` and `ylim()` functions.
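When called with arguments, `xlim()` and `ylim()` set the limits; called with no arguments, they return the current limits. Likewise, `legend()` returns a `Legend` object whose text entries can be inspected. A small sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 101)

fig = plt.figure()
plt.plot(x, np.sin(x), label = "f(x) = sin(x)", color = 'b')
plt.plot(x, np.cos(x), label = "f(x) = cos(x)", color = 'r')

# legend() returns the Legend it creates ('lower left' has the integer code 3).
legend = plt.legend(loc = 'lower left')
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)

plt.xlim(0, 2 * np.pi)
plt.ylim(-1, 1)

# With no arguments, xlim() and ylim() report the current limits.
x_limits = plt.xlim()
y_limits = plt.ylim()
print(x_limits, y_limits)

plt.close(fig)
```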

If we were being picky, there is one more point of formatting that we might want to fix:

In [None]:
# Generate a sequence of values to place x-axis "ticks" at multiples of pi/2.
# np.linspace(0, 2 * np.pi, 5) will generate five points between 0 -> 2pi.

x_ticks = np.linspace(0, 2 * np.pi, 5)
x_tick_labels = ["0", "pi/2", "pi", "3pi/2", "2pi"]

plt.figure()

plt.plot(x, sin_x, label = "f(x) = sin(x)", color = 'b')
plt.plot(x, cos_x, label = "f(x) = cos(x)", color = 'r')

plt.xlabel("x")
plt.ylabel("f(x)")

plt.legend(loc = 'lower left')

plt.xlim(0, 2 * np.pi)
plt.ylim(-1, 1)

# Manually specify x tick locations and labels.

plt.xticks(x_ticks, x_tick_labels)

plt.show()

plt.close()

This code cell makes use of the `xticks()` function to manually specify the locations and labels of the x-axis "ticks".

The `np.linspace()` function is used to obtain a sequence of five values between $0 \rightarrow 2\pi$, and a list of five text labels is created to accompany it.
These are then passed to the `xticks()` function.
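As a quick check, `np.linspace(0, 2 * np.pi, 5)` produces exactly the multiples of $\frac{\pi}{2}$. The sketch below also shows an optional variation, assuming you want typeset symbols: Matplotlib renders text wrapped in `$...$` with its built-in mathtext engine, so tick labels such as `r"$\pi/2$"` display a proper π.

```python
import matplotlib.pyplot as plt
import numpy as np

x_ticks = np.linspace(0, 2 * np.pi, 5)

# The five values are 0, pi/2, pi, 3pi/2 and 2pi.
print(np.allclose(x_ticks, np.arange(5) * np.pi / 2))   # True

# Optional: mathtext labels (raw strings avoid escaping the backslashes).
x_tick_labels = [r"$0$", r"$\pi/2$", r"$\pi$", r"$3\pi/2$", r"$2\pi$"]

fig = plt.figure()
plt.plot(x_ticks, np.sin(x_ticks), color = 'b')
plt.xticks(x_ticks, x_tick_labels)

# Read back the tick positions that were set.
tick_positions = plt.gca().get_xticks()
print(np.allclose(tick_positions, x_ticks))   # True

plt.close(fig)
```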

As noted above, this is really being picky - if we just wanted to generate a quick plot of some data to look at, we probably wouldn't bother.
If, on the other hand, we were preparing graphics for a report, paper or presentation, subtleties like this can add up to making things that little bit more "polished".

As mentioned at the start of this exercise, one of the big advantages of Matplotlib is that it is highly customisable, so once you are used to making these small changes you will be able to produce better-quality figures than you would with, say, Excel, with less effort.
Another advantage is that you can combine Matplotlib with other Python code to analyse/process and visualise data in the same script or Jupyter notebook.

### ii. Exercise

The $y = \mathrm{cos}(x)$ curve can be shifted by adding a phase shift $\phi$ to $x$:

$y = \mathrm{cos}(x + \phi)$

This is quite straightforward to code using NumPy:

In [None]:
# Phase shift phi = pi / 4.

phi = np.pi / 4

# x values between 0 -> 2 pi.

x = np.linspace(0, 2 * np.pi, 101)

# y = cos(x + phi).

y = np.cos(x + phi)

The code defines a phase shift $\phi = \frac{\pi}{4}$.
It then uses `np.linspace()` to generate a sequence of 101 $x$ values between $0 \rightarrow 2 \pi$ as in the previous examples.

The next line calculates $y = \mathrm{cos}(x + \phi)$.
This can be done in a single line of code with `y = np.cos(x + phi)`.
This illustrates another useful feature of NumPy's `ndarray`: adding a single (scalar) value to an array adds it to every element, similar to Matlab.
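The sketch below makes this element-wise behaviour explicit by comparing `x + phi` with an explicit Python loop. It also checks the standard trigonometric identity $\mathrm{cos}(x + \frac{\pi}{2}) = -\mathrm{sin}(x)$, which gives a hint of what to expect in the exercise.

```python
import numpy as np

phi = np.pi / 4
x = np.linspace(0, 2 * np.pi, 101)

# Adding a scalar to an array adds it to every element.
shifted = x + phi
shifted_loop = np.array([value + phi for value in x])
print(np.array_equal(shifted, shifted_loop))   # True

# cos(x + pi/2) should equal -sin(x) up to floating-point rounding.
print(np.allclose(np.cos(x + np.pi / 2), -np.sin(x)))   # True
```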

Using this code and the plotting code from above, write some code to do the following:

* Calculate $y = \mathrm{cos}(x + \phi)$ for $\phi = 0$, $\phi = \frac{\pi}{4}$ and $\phi = \frac{\pi}{2}$.
* Create a new figure.
* Draw the three curves on a single plot using three different colours and appropriate labels.
* Add axis labels and a legend to the plot.
* Display the plot.
* Close the plot to free resources.

*If you wish, it is perfectly fine to copy/paste pieces of code from other cells - this is very often how plots are put together using Matplotlib in practice.*

In [None]:
# Enter and test your code here.

## b. Other plot types

### i. Area plots

In an area plot, the area between the curve and the x-axis is filled with a solid colour.
This can help to add visual interest to plots where the area is important.

To generate some data for this plot, we will use a Gaussian function $G(x)$, normalised so that its area is $a$, with centre $\mu$ and width $\sigma$:

$G(x) = \frac{a}{\sigma \sqrt{2 \pi}} e^{- \frac{(x - \mu)^{2}}{2 \sigma^{2}}}$

NumPy does not have a Gaussian function, but it is easy to define one:

In [None]:
import math

def Gaussian(x, a, mu, sigma):
    norm = a / (sigma * math.sqrt(2 * math.pi))
    exp = (x - mu) ** 2 / (2 * sigma ** 2)
    
    return norm * np.exp(-1 * exp)

This function takes a NumPy array of $x$ values and three scalar values for $a$, $\mu$ and $\sigma$ and returns a NumPy array of $y$ values.
It can be called as follows:

In [None]:
x = np.linspace(-5, 5, 1001)
g_x = Gaussian(x, 1.0, 0.0, 0.5)

We can now plot this data using the `fill_between()` function to shade the area under the curve:

In [None]:
plt.figure()

plt.plot(x, g_x, color = 'b')

# Fills the area between G(x) and x = 0.

plt.fill_between(x, g_x, color = 'b')

plt.xlabel("x")
plt.ylabel("G(x)")

plt.xlim(-5, 5)
plt.ylim(0, 1)

plt.show()

plt.close()

It often looks better if, instead of shading the curve area a solid colour, we make the colour semi-transparent.
This can be achieved with the `alpha` keyword:

In [None]:
plt.figure()

plt.plot(x, g_x, color = 'b')

# Set the transparency with the alpha keyword.

plt.fill_between(x, g_x, color = 'b', alpha = 0.25)

plt.xlabel("x")
plt.ylabel("G(x)")

plt.xlim(-5, 5)
plt.ylim(0, 1)

plt.show()

plt.close()

`alpha` specifies a transparency between 0 (100 % transparent) and 1 (100 % opaque).
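Colours with transparency are stored as red, green, blue and alpha (RGBA) values. As a sketch, `matplotlib.colors.to_rgba()` shows the combined colour specification, and the object returned by `fill_between()` reports the alpha value it was given. The bell-shaped curve here is a stand-in chosen for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

# 'b' with 25 % opacity as an (r, g, b, alpha) tuple.
print(to_rgba('b', alpha = 0.25))   # (0.0, 0.0, 1.0, 0.25)

x = np.linspace(-5, 5, 1001)
g_x = np.exp(-x ** 2)   # simple bell-shaped curve, for illustration only

fig = plt.figure()
area = plt.fill_between(x, g_x, color = 'b', alpha = 0.25)
print(area.get_alpha())   # 0.25
plt.close(fig)
```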

This method of presenting data can be particularly effective for peak fitting.

In many areas of spectroscopy, a spectrum is fit to a series of peak functions to try and separate out the contributions from individual processes.
To give an example, we will simulate this by adding up three Gaussian functions:

In [None]:
x = np.linspace(0, 10, 1001)

# Three peaks with different area (a), centre (mu) and width (sigma).

g_1 = Gaussian(x, 1.0, 2.0, 0.5)
g_2 = Gaussian(x, 1.5, 4.0, 0.8)
g_3 = Gaussian(x, 0.6, 6.5, 1.0)

# Add peaks together to simulate a composite spectrum.

g_tot = g_1 + g_2 + g_3

This illustrates another NumPy capability: adding arrays with the `+` operator performs element-wise addition (provided the arrays have compatible shapes).
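Because each peak is normalised so that its area equals `a`, the area under the composite spectrum should be close to the sum of the three peak areas, $1.0 + 1.5 + 0.6 = 3.1$. The sketch below checks this with a simple rectangle-rule sum. It repeats the `Gaussian()` definition so it can run on its own, and the expected values are approximate because a small part of peak 3 lies outside the $x$ range.

```python
import math

import numpy as np

def Gaussian(x, a, mu, sigma):
    norm = a / (sigma * math.sqrt(2 * math.pi))
    exponent = (x - mu) ** 2 / (2 * sigma ** 2)
    return norm * np.exp(-1 * exponent)

x = np.linspace(0, 10, 1001)
dx = x[1] - x[0]   # spacing between x values (0.01)

g_1 = Gaussian(x, 1.0, 2.0, 0.5)
g_2 = Gaussian(x, 1.5, 4.0, 0.8)
g_3 = Gaussian(x, 0.6, 6.5, 1.0)
g_tot = g_1 + g_2 + g_3

# Approximate each area by summing rectangles of width dx.
areas = [float(np.sum(g) * dx) for g in (g_1, g_2, g_3)]
total_area = float(np.sum(g_tot) * dx)

print([round(area, 3) for area in areas])   # approximately [1.0, 1.5, 0.6]
print(round(total_area, 3))                 # approximately 3.1
```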

In [None]:
plt.figure()

# Plot composite spectrum as a black line (color = 'k').

plt.plot(x, g_tot, color = 'k')

# Plot each of the three peaks with shaded areas in different colours.

plt.plot(x, g_1, label = "Peak 1", color = 'b')
plt.fill_between(x, g_1, color = 'b', alpha = 0.25)

plt.plot(x, g_2, label = "Peak 2", color = 'r')
plt.fill_between(x, g_2, color = 'r', alpha = 0.25)

plt.plot(x, g_3, label = "Peak 3", color = 'g')
plt.fill_between(x, g_3, color = 'g', alpha = 0.25)

# Add a legend.

plt.legend(loc = 'upper right')

plt.xlabel("x")
plt.ylabel("I(x)")

plt.xlim(0, 10)
plt.ylim(0, 1)

plt.show()

plt.close()

### ii. Images

Matplotlib also has the ability to read and "plot" image data.
The following code reads a local copy of a high-resolution electron micrograph of some pollen grains, taken from [Wikipedia](https://en.wikipedia.org/wiki/File:Misc_pollen.jpg):

In [None]:
from matplotlib.image import imread

image = imread("Part1a-Exercise3-Pollen.jpg")

Here we use another variation of the `import` statement, which imports one specific function, `imread()`, from the `matplotlib.image` module.
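For reference, the different forms of `import` all give access to the same function; which one to use is mostly a matter of convenience and readability. In this sketch the alias `mpimg` is just a common choice, not a requirement.

```python
# Import the whole module and use the full dotted name.
import matplotlib.image

# Import the module under a shorter alias.
import matplotlib.image as mpimg

# Import a single function directly into the current namespace.
from matplotlib.image import imread

# All three names refer to the same function object.
print(imread is matplotlib.image.imread)   # True
print(imread is mpimg.imread)              # True
```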

`imread()` takes the path to an image file and returns a NumPy array containing the image data:

In [None]:
print("image.ndim:", image.ndim)
print("image.shape:", image.shape)
print("image.dtype:", image.dtype)

The image has three dimensions.
The first two are the pixel dimensions - 935 pixels tall by 1228 wide.
The third dimension is the pixel colour - one value each for the red, green and blue components.

The data type is `uint8`, which is shorthand for "8-bit unsigned integer".
This means each colour component is an integer value from 0 to 255 (8 bits can store 2<sup>8</sup> = 256 values).
This 24-bit format - 8 bits per red, green and blue colour channel - is quite common for colour images.
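The sketch below builds a tiny synthetic "image" (2 × 2 pixels, made-up values) to illustrate the `uint8` range. One consequence worth knowing: arithmetic on `uint8` arrays wraps around rather than going above 255, so it is usually safer to convert to floating point before doing calculations with pixel values.

```python
import numpy as np

# The range of values an 8-bit unsigned integer can hold.
info = np.iinfo(np.uint8)
print(info.min, info.max)   # 0 255

# A made-up 2 x 2 pixel RGB image: shape (rows, columns, colour channels).
tiny_image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [250, 250, 250]]],
    dtype = np.uint8,
)
print(tiny_image.shape)   # (2, 2, 3)

# uint8 arithmetic wraps around: 250 + 10 becomes 4 (260 - 256).
wrapped = tiny_image[1, 1] + np.uint8(10)
print(wrapped)   # [4 4 4]

# Converting to float first gives the expected result.
print(tiny_image[1, 1].astype(float) + 10)   # [260. 260. 260.]
```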

Images can be included in Matplotlib plots using the `imshow()` function.

In [None]:
plt.figure()

# imshow() displays image data on a Matplotlib plot.

plt.imshow(image)

plt.show()

plt.close()

The x and y axes are labelled with pixel indices.
Quite often we don't want this, so we can use the `axis()` function to hide the axes:

In [None]:
plt.figure()

plt.imshow(image)

# Hide the axes, including their labels and ticks.

plt.axis('off')

plt.show()

plt.close()

One of the main reasons to use Matplotlib to draw images is that we can apply a "colour map" to render the data in false colour.

To do this, we must first convert the image data to greyscale (a single colour channel), which can be done by averaging over the third dimension:

In [None]:
image = np.average(image, axis = 2)

print("image.ndim is now:", image.ndim)
print("image.shape is now:", image.shape)

The `np.average()` function takes a NumPy array and the index of the dimension to average.
The index is zero-based, so the first dimension is 0, the second is 1 and the third is 2.
The return value is a new array with the averaged axis "removed" - this is confirmed by inspecting the `ndim` and `shape` properties.
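To make the averaging concrete, here is a sketch on a made-up 2 × 2 RGB array: averaging along `axis = 2` replaces each pixel's three colour values with their mean, leaving a 2-D array of brightness values. (A simple mean is only one way to convert to greyscale; weighted sums that account for the eye's different sensitivity to each colour are also common.)

```python
import numpy as np

# Made-up 2 x 2 RGB image.
tiny_image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [30, 60, 90]]],
    dtype = np.uint8,
)

grey = np.average(tiny_image, axis = 2)

print(tiny_image.shape, "->", grey.shape)   # (2, 2, 3) -> (2, 2)
print(grey)
# [[85. 85.]
#  [85. 60.]]

# The same result written out by hand for the last pixel.
print(grey[1, 1] == (30 + 60 + 90) / 3)   # True
```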

We can now plot the image with a colour map:

In [None]:
plt.figure()

# False colour the image using the `cmap` keyword.

plt.imshow(image, cmap = 'afmhot')

plt.axis('off')

plt.show()

plt.close()

Matplotlib has a large selection of preset colour maps, the names of which are listed in the [online documentation]().
You can experiment with these in the exercise below.
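As a rough sketch of what `cmap` does: for single-channel data, `imshow()` by default scales the values linearly so that the minimum maps to 0 and the maximum to 1, and the colour map then converts each scaled value into an RGBA colour. The same steps can be done by hand with `Normalize` and a colour map object, and `plt.colormaps()` returns the names of the registered colour maps. The 2 × 2 data array here is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Made-up single-channel "image".
grey = np.array([[0.0, 50.0], [100.0, 200.0]])

# Names of the available colour maps.
print("afmhot" in plt.colormaps())   # True

cmap = plt.get_cmap("afmhot")

# Scale the data to 0 -> 1 using its own minimum and maximum.
norm = Normalize(vmin = grey.min(), vmax = grey.max())
scaled = norm(grey)

# Look up an RGBA colour for every pixel.
rgba = cmap(scaled)
print(rgba.shape)   # (2, 2, 4)

# afmhot runs from black at 0 to white at 1.
print(rgba[0, 0], rgba[1, 1])
```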

### iii. Exercise

You have been provided with a second electron micrograph, of some amosite asbestos crystals (from the [US Geological Survey](https://www.usgs.gov)).
The following code cell loads this image and averages the colour channel:

In [None]:
image = imread("Part1a-Exercise3-AmositeAsbestos.png")

# The colour dimension of this PNG file has four components: r, g, b and alpha (transparency).
# We do not want to include the alpha channel in the average.
# To exclude it, we select only the first three elements along the last (colour) axis before passing the image to np.average().
# This slicing syntax is another feature of NumPy arrays that you may recognise from Matlab.

image = np.average(image[:, :, :3], axis = 2)
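The slice `image[:, :, :3]` keeps every row and column but only the first three entries along the last axis. A sketch on a made-up 2 × 2 RGBA array shows the effect on the shape. Note that, according to the Matplotlib documentation, `imread()` returns PNG images as floating-point values between 0 and 1 rather than `uint8` integers, so the made-up values below use that range.

```python
import numpy as np

# Made-up 2 x 2 RGBA image with float values in 0 -> 1, alpha fully opaque.
rgba_image = np.array(
    [[[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]],
     [[0.0, 0.0, 1.0, 1.0], [0.3, 0.6, 0.9, 1.0]]]
)

rgb_only = rgba_image[:, :, :3]   # drop the alpha channel
print(rgba_image.shape, "->", rgb_only.shape)   # (2, 2, 4) -> (2, 2, 3)

grey = np.average(rgb_only, axis = 2)
print(grey.shape)   # (2, 2)

# Including the alpha channel would skew the average upwards.
print(np.average(rgba_image, axis = 2)[0, 0], grey[0, 0])
```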

In the code cell below, write some code to:

* Create a new figure.
* Draw the image with a colour map of your choice.
* Disable the axes.
* Show the coloured image.
* Clean up.

In [None]:
# Enter your code here.