# Matplotlib tutorial

[*John Pinney*](https://github.com/johnpinney) and [*Tony Yang*](https://github.com/tonyyzy)

*Part of this notebook's material was adapted from the [Software Carpentry](https://swcarpentry.github.io/python-novice-gapminder/09-plotting/index.html) Python course and [Imperial Chemistry Department's Python courses](https://github.com/imperialchem)*

## Introduction

In this session, we will introduce you to `matplotlib`, the most commonly used Python plotting package. Along the way, we will also need a few concepts from the `numpy` and `pandas` packages.


Let's see if the `numpy` package is available in your notebook's environment. If you're using the default 'base (root)' environment in Anaconda, it is probably already there.

In [None]:
import numpy as np

The `as np` instruction means that we will refer to the `numpy` module in our code using the shorthand `np`.

If it isn't found (you will get a *ModuleNotFoundError*), use your *package manager* (see below) to install it and try again.
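
If you would like to check for several packages at once without triggering that error, here is a minimal sketch using the standard library's `importlib.util.find_spec`. The list `required` is just an illustrative choice of package names, not something the notebook defines.

```python
import importlib.util

# Package names to look for (an illustrative list for this tutorial).
required = ["numpy", "matplotlib"]

# find_spec returns None when a top-level package cannot be located.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages are available.")
```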


**NB** If you have any difficulties loading packages during the session, we recommend that you switch to using the online Binder version of this notebook - see https://github.com/johnpinney/irc_viz



### Using a package manager

To work with modules that are not part of the core Python distribution, we need a framework that will deal with downloading the external code and ensuring that different modules are compatible with each other.

With Anaconda, the easiest way to do this is using the Anaconda Navigator GUI. Go to *Environments* and use the search facility to find the packages that you want to install or uninstall. The package manager will attempt to install these from the internet, and you can then import the corresponding modules within your Jupyter notebook.

The command-line utility `conda` gives access to the same package management system. For example, the command `conda install numpy` will install `numpy` in the current environment.

Anaconda/conda is highly recommended as the most straightforward way to manage your Python environments. If you have a different Python install, you will need to use a different package manager to download packages (usually `pip`, e.g. `pip install numpy`).



---

## `numpy`

`numpy` provides a set of general data structures and utilities to support numerical computing in python. It is one of the most widely used packages in scientific computing. 

### Mathematical functions

The first thing to note about `numpy` is the huge range of [mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html) it provides. Here are a few examples:

In [None]:
np.log(44)

In [None]:
np.log10(44)

In [None]:
np.sin(np.pi/2)

In [None]:
np.tanh(1.5)

### `ndarray`

A major feature of `numpy` is the *n-dimensional array* (`ndarray`) data type that it provides. This is similar to a `list`, but has (at least) three advantages for numerical computing:

* Every element must be of the same data type (e.g. `float` or `int`).
* Operations are much faster and more memory-efficient using `ndarray` than using `list`.
* The resulting code is easier to read and write.
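
As a small sketch of the first and third points, the example below doubles every value in a list and in an array, and shows how mixed input is converted to a single data type. The variable names are made up for illustration.

```python
import numpy as np

values_list = [1.0, 2.0, 3.0, 4.0]
values_array = np.array(values_list)

# With a list, element-wise arithmetic needs an explicit loop or comprehension.
doubled_list = [2 * v for v in values_list]

# With an ndarray, the same operation applies to every element at once.
doubled_array = 2 * values_array

print(doubled_list)
print(doubled_array)

# Mixed int/float input is converted so that all elements share one data type.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```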

In [None]:
n = 10

a = list()
for i in range(n):
    a.append(1.0)
print(a)
type(a)

In [None]:
b = np.array(a)
print(b)
type(b)

Notice that `np.array()` is a constructor, making a new `ndarray` object using data from the `list` provided. When we refer to "an array" in scientific python, we almost always mean an object of type `ndarray`.

#### Notes

The length of an `ndarray` is fixed when it is created.
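
One consequence, sketched below with illustrative names, is that functions such as `np.append` do not grow an array in place. They build and return a new array, leaving the original unchanged.

```python
import numpy as np

original = np.ones(3)

# np.append does not modify `original`; it returns a new, longer array.
extended = np.append(original, 2.0)

print(original.size, extended.size)
```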

You can check the data type of an `ndarray` using the `dtype` attribute, and the number of elements it holds using `size`:

In [None]:
b.dtype

In [None]:
b.size
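
The same attributes work for arrays with more than one dimension. In the sketch below, `grid` is a made-up 2D array, and the `shape` and `ndim` attributes report how its elements are laid out.

```python
import numpy as np

# A hypothetical 2D array with 2 rows and 3 columns.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.dtype)  # an integer type; the exact width depends on the platform
print(grid.size)   # total number of elements
print(grid.shape)  # (rows, columns)
print(grid.ndim)   # number of dimensions
```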


You can find the documentation for the `ndarray` [here](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).

---

## `matplotlib`

`matplotlib` is modelled after MATLAB's plotting functions and aims to be easy to use and intuitive.

Import `matplotlib` with

In [None]:
import matplotlib.pyplot as plt

Firstly, we need to create some sample data. Let's plot the sine function from -10 to 10. 

In [None]:
# makes a 1D array of 101 equally spaced values from -10 to 10.
x = np.linspace(-10, 10, 101)
print(x)
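
To check the spacing that `np.linspace` produces, here is a minimal sketch using its `retstep=True` option, which returns the step size alongside the array. The names `x_demo` and `step` are used only for illustration.

```python
import numpy as np

# Both endpoints are included, so 101 points span 100 equal intervals.
x_demo, step = np.linspace(-10, 10, 101, retstep=True)

print(x_demo[0], x_demo[-1])
print(step)
```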

In [None]:
# here you can see the benefits of a numpy array
# we can apply a numpy function to the whole array at once
# rather than looping over all elements of a list!
sine_x = np.sin(x) 
print(sine_x)

And here's our first plot!

In [None]:
plt.plot(x, sine_x)

# Sometimes you may find that without plt.show()
# the figure is still displayed. This is Jupyter notebook
# trying to be helpful. Do note that in other environments
# you will almost definitely need this line.
plt.show() 

Note that although we only used two lines to plot the graph, matplotlib has set many defaults for us behind the scenes. Let's go over some of the settings so we know how to customise our graphs when we want to.

In [None]:
# Create a new figure of size 6 by 4 inches (width and height)
plt.figure(figsize=(6, 4))

# Plot x and y using color blue from the Tableau palette
# with a continuous line of width 1.5 points without any markers
plt.plot(x, sine_x, color="tab:blue", linewidth=1.5, linestyle="-", marker="")

# Set x limits to be slightly bigger than the data range
plt.xlim(-11, 11)

# Set x ticks to be nine ticks evenly spaced between -10 and 10
plt.xticks(np.linspace(-10, 10, 9))

# Set y limits
plt.ylim(-1.1, 1.1)

# Set y ticks
plt.yticks(np.linspace(-1, 1, 9))

# Show the plot on screen
plt.show()

### More choices of colors, linestyles, and markers
List of colors: https://matplotlib.org/3.1.0/gallery/color/named_colors.html  
List of linestyles: https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html  
List of markers: https://matplotlib.org/api/markers_api.html

## Completing a plot
It's always a good idea to label your graphs! Here we can add a title, x/y axis labels, and a legend.

In [None]:
plt.figure(figsize=(6.4, 4.8))
plt.plot(x, sine_x, label="y=sin(x)")
plt.title("An example sine plot") # add title to the plot
plt.xlabel("x") # label the axes
plt.ylabel("y")
plt.legend() # show legend
plt.show()

We can also plot multiple lines on the same graph.

In [None]:
plt.plot(x, sine_x, label="y=sin(x)")
plt.plot(x, x, label="y=x")
plt.legend()
plt.show()

### Exercise 1
- Plot both sine and cosine functions on the same graph
- Use `tab:purple` as color, `--` as linestyle, `*` as marker for cosine
- Take your pick of options for sine
- Make sure to give your plot a title, axis labels, and a legend

## More types of plots

We have covered a *line plot* so far. Now let's dive into scatter and bar plots, and the histogram.

If we only want to plot the data points without the continuous line, we can use a *scatter plot*.

In [None]:
plt.scatter(x, sine_x, marker="+")
plt.show()

A bar plot is useful when displaying categorical data.

In [None]:
bar_x = ["A", "B", "C", "D", "E"]
bar_y = np.array([32, 49, 11, 20, 44])
plt.bar(bar_x, bar_y)
plt.show()

Finally, we can use histograms to show the distribution of a set of measurements.

Pay attention to how the number of bins for the histogram is chosen. Too few bins can hide features of the data, while too many can exaggerate random fluctuations.

In [None]:
# generate 100 random numbers from the standard normal distribution
rand_n = np.random.randn(100) 
print(rand_n)

# we can change bins to change how the histogram looks
plt.hist(rand_n, bins=20) 
plt.show()
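
To see the effect of the bin count numerically, the sketch below uses `np.histogram`, which computes the counts and bin edges without drawing anything. The seed and the bin counts tried here are arbitrary choices for illustration.

```python
import numpy as np

# A fixed seed is used only so that the sketch is repeatable.
rng = np.random.default_rng(seed=0)
sample = rng.standard_normal(100)

for n_bins in (5, 20, 50):
    # counts has n_bins entries; edges has n_bins + 1 entries.
    counts, edges = np.histogram(sample, bins=n_bins)
    width = edges[1] - edges[0]
    print(n_bins, "bins: width", round(width, 3), "largest count", counts.max())
```

With more bins, each bin is narrower and holds fewer points, so the heights become noisier.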

---

## Extra material

### Decorating figures
We can customise our figures to add error bars or show confidence intervals.

In [None]:
# let's assume the error is proportional to the magnitude of the y value
# (error bar sizes must not be negative, hence np.abs)
error = np.abs(sine_x) * 0.2

# we use errorbar to plot a line plot with error bars
# yerr and xerr are the keyword arguments for supplying error bar values
plt.errorbar(x, sine_x, yerr=error)
                               
plt.show()

Error bars can be added to bar plots in a similar fashion.

In [None]:
bar_x = np.array(range(1, 11))
bar_y = np.array((range(10, 0, -1)))
plt.bar(bar_x, bar_y, yerr=0.1 * bar_y, xerr=0.1 * bar_x)
plt.show()

For confidence intervals, use `plt.fill_between()` to color a region on the graph:

In [None]:
# For an example, let's assume the half-width of our confidence interval
# is proportional to the magnitude of the y value
ci = np.abs(sine_x) * 0.2
plt.plot(x, sine_x)

# shade the region between the lower and upper bounds
plt.fill_between(x, (sine_x-ci), (sine_x+ci), color='b', alpha=0.1)

plt.show()
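
In practice, the interval usually comes from data rather than from a fixed fraction of the y value. The sketch below is one possible approach under stated assumptions: it simulates 30 hypothetical noisy repeats of the sine curve and shades an approximate 95% interval around their mean, taking the interval to be 1.96 standard errors wide on each side (a normal approximation). The noise level, the number of repeats, and all variable names are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)  # fixed seed only for repeatability
x_ci = np.linspace(-10, 10, 101)

# 30 hypothetical noisy repeats of sin(x), one repeat per row.
repeats = np.sin(x_ci) + rng.normal(scale=0.3, size=(30, x_ci.size))

# Mean and standard error of the mean at each x position.
mean = repeats.mean(axis=0)
sem = repeats.std(axis=0, ddof=1) / np.sqrt(repeats.shape[0])

# Approximate 95% interval, assuming normally distributed noise.
half_width = 1.96 * sem

plt.plot(x_ci, mean)
plt.fill_between(x_ci, mean - half_width, mean + half_width, alpha=0.3)
plt.show()
```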

### Further reading

Be sure to check out [matplotlib's gallery](https://matplotlib.org/3.1.1/gallery/index.html) to glance over all the pretty figures you can make, together with the code that makes them!
