# Visualization with matplotlib

In this notebook you will learn about the visualization of numerical data using *matplotlib*, the most popular Python plotting library.  We will focus on static 2D plotting, though the library can also make animations and 3D visualizations.

---

## Using this notebook

- You should already have completed [the numpy notebook](1_Numpy.ipynb) before trying this one.  The layout is the same.

- **Make sure to execute every cell or the ones below may not work**



In [None]:
# This line makes the plots that we will make below appear here in the browser,
# instead of opening a new window.
%matplotlib inline

# The simplest to use matplotlib is via its "pyplot" interface,
# which contains all the functions we will use below. We rename it
# as `"plt" since we will be typing it so often
import matplotlib.pyplot as plt
import numpy as np


---

# 1. Essential plots

Matplotlib can make plots of data in either python lists or numpy arrays.  The simplest kind of plots are line and scatter plots, which both just plot *x* vs *y* values.  The `plt.plot` command makes a line graph by default:

In [None]:
# make some data to plot
a = np.arange(-10, 11)
b = a**2

# make a line plot of b against a
plt.plot(a, b)

---

If we want a scatter plot, without lines connecting the points, we use the same function but add an optional format argument to change it:

In [None]:
# Unsorted data - if we were to plot this as a line graph
# it would jump around the axes
c = np.array([2, 4, 6, 8, -2, -4, -6, -8])
d = c**2

plt.plot(c, d, '.')

---


The third argument can specify the colour of the plot and the style of the line or points.  

- Choose the colour with a single letter, one of <font color='red'>**r**</font>, <font color='green'>**g**</font>, <font color='blue'>**b**</font>, <font color='cyan'>**c**</font>, <font color='magenta'>**m**</font>, <font color='yellow'>**y**</font>, <font color='black'>**k**</font>


- Choose the style with one of the choices on <a href="https://matplotlib.org/stable/api/markers_api.html">the matplotlib markers web page</a>, and/or one of '-' (solid lines), '--' (dashed lines), ':' (dotted lines) or '-.' (dash-dot lines)

For example, if you wanted red dashed lines as the style, you would use 'r--'.
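
As a rough illustration of how the colour, marker and line-style characters combine, here is a small sketch. The data and the variable names (`t`, `dashed`, `dotted`, `circles`) are made up for this example and are not used elsewhere in the notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
t = np.linspace(0, 2, 20)

# 'r--' : red, dashed line
dashed, = plt.plot(t, t, 'r--')

# 'k:' : black, dotted line
dotted, = plt.plot(t, t**2, 'k:')

# 'bo' : blue circles; no line is drawn because no line style is given
circles, = plt.plot(t, t**3, 'bo')
```

`plt.plot` returns a list of line objects, one per data set, which is why each call is unpacked with a trailing comma. You can ignore the return value if you do not need it.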


## <font color='blue'>Exercise 1</font>  

Repeat the last plot, but with `c` and `d` the other way around, and using green stars for the markers.

In [None]:
# Space for your working


Scatter plots like these are particularly useful for plotting one set of data against another to learn about whether there is some statistical relationship between them (rather than a simple mathematical relationship like here).
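
To give a feel for what a statistical relationship looks like, the sketch below plots invented data in which one quantity depends on another with random scatter added. The names `hours` and `score` and all the numbers are assumptions made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented data: a linear trend plus Gaussian noise.
# A fixed seed makes the example repeatable.
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=200)
score = 5 * hours + rng.normal(0, 8, size=200)

# The points do not fall on a single curve, but the upward trend is visible
plt.plot(hours, score, '.')
```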

---

You can put multiple data sets on the same plot by including them in the same cell.  Matplotlib will automatically cycle through colours if you do not specify them explicitly:

In [None]:
e = np.arange(-0.5, 1.5, 0.01)
f = e**2
g = e**3

plt.plot(e, f)
plt.plot(e, g)

Plots can become crowded very easily if you include many points, so experimenting with colours and styles is often needed to make something readable.

---

You can do lots more customization of each plot by specifying various keywords to `plt.plot` (some of these override the format option we used above).  You can change:

- Colours of different components from the wider <a href="https://matplotlib.org/stable/gallery/color/named_colors.html">matplotlib colour list</a> using `color`, `markerfacecolor`, or `markeredgecolor`.


- The size of points using `markersize`


- The width of lines using `linewidth`


- The marker and line style using `marker` and `linestyle`
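
The keywords listed above can be combined in one call. The following is a minimal sketch with arbitrary data and arbitrary style choices; the variable names `s` and `line` are only for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
s = np.linspace(0, 5, 15)

line, = plt.plot(
    s, np.sqrt(s),
    color='darkorange',          # line colour, taken from the named colour list
    linestyle='--',              # dashed line
    linewidth=2,                 # line width in points
    marker='^',                  # upward-pointing triangles
    markersize=8,                # marker size in points
    markerfacecolor='white',     # fill colour of each marker
    markeredgecolor='navy',      # outline colour of each marker
)
```

Setting `linestyle='none'` would draw the markers alone, without a connecting line.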


## <font color='blue'>Exercise 2</font>  


Make a single plot of:
- a sine curve
- with 50 evenly spaced points from 0 .. 10
- drawn as size 10 red squares with black edges
- with no line connecting them

Hint: recall the numpy functions `np.linspace` and `np.sin` for creating the data.

In [None]:
# Space for your working


---


Another standard plot type is a histogram, which shows the frequency of different values in a data set.

In [None]:
# This numpy function generates Normal (also known as "Gaussian") random numbers,
# in this case 100000 numbers with mean 10 and standard deviation 3.
m = np.random.normal(10, 3, size=100000)
plt.hist(m)

You may notice that the notebook has printed out a bunch of numbers before the plot.  This is because the `plt.hist` function returns the values it computed (the count in each bin and the bin edges, plus the drawn bars) as well as showing the plot itself.  You can suppress this output by putting a semicolon `;` at the end of the last line in a cell.
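
If you want to use the returned values rather than hide them, you can assign them to variables. This sketch assumes the same kind of Normal data as above; the names `sample`, `counts`, `edges` and `bars` are chosen just for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Normal random numbers with mean 10 and standard deviation 3.
# A fixed seed makes the example repeatable.
rng = np.random.default_rng(seed=1)
sample = rng.normal(10, 3, size=100000)

# plt.hist returns the count in each bin, the bin edges, and the drawn bars
counts, edges, bars = plt.hist(sample)

# Every value lands in exactly one bin, so the counts add up to the sample size
print(counts.sum())

# There is always one more edge than there are bins
print(len(edges), len(counts))
```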

---

## <font color='blue'>Exercise 3</font>  

The histogram above is a bit blocky - matplotlib only uses a small number of bins as a default.  Have a look at the documentation for `plt.hist`, and plot this histogram with 100 bins instead of the default.

*Hint*: you can get documentation on the function by running a cell containing `plt.hist?`

In [None]:
# Space for your working


---


# 2. Labelling and customizing

Good graphs have labels describing what they are showing. You can add titles and axis labels by running several matplotlib functions after you create a plot:

In [None]:
# As an aside, here's a nice way to plot a spiral, using a parametric plot:
r = np.arange(0., 40., 0.1)
p = np.sin(r) * np.exp(-r/10)
q = np.cos(r) * np.exp(-r/10)
plt.plot(p, q)

# Add our labels
plt.xlabel("The x axis label")
plt.ylabel("The y axis label")
plt.title("The title at the top")

---

You can use the functions `plt.xlim` and `plt.ylim` to adjust the ranges of data shown on the `x` and `y` axes of plots, to zoom in or out.

They take two arguments, `left`/`right` for `xlim` and `bottom`/`top` for `ylim` (in that order), which adjust the specified edge. 

Either can be set/left as the default value `None`, which means "don't change this edge".
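
Here is a short sketch of both ways of calling these functions, using a sine curve rather than the spiral. The variable names (`w`, `x_limits`, `y_limits`) and the chosen limits are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
w = np.linspace(0, 10, 200)
plt.plot(w, np.sin(w))

# Set both edges of the x axis: left first, then right
plt.xlim(2, 6)

# Change only the top of the y axis; the bottom edge is left as it was
plt.ylim(top=0.5)

# Called with no arguments, the functions return the current limits
x_limits = plt.xlim()
y_limits = plt.ylim()
print(x_limits, y_limits)
```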

---

## <font color='blue'>Exercise 4</font>  

Remake the spiral plot above, but zooming the x-axis in to the range (0.0, 0.3) and change the y-axis bottom to zero.

In [None]:
r = np.arange(0., 40., 0.1)
p = np.sin(r) * np.exp(-r/10)
q = np.cos(r) * np.exp(-r/10)
plt.plot(p, q)
plt.xlabel("The x axis label")
plt.ylabel("The y axis label")
plt.title("The title at the top")

# Space for your working


---

You can also customize the position and labels of the tick marks on the x and y axes using the `xticks` and `yticks` commands:

In [None]:
u = np.arange(0, 4*np.pi, 0.1)
v = np.cos(u)
plt.plot(u, v)
plt.yticks([-1, 0, 1], ["Min", "Mid", "Max"], color='Blue');

# This command switches on the mini-ticks in between the larger ones
plt.minorticks_on()

---

On plots with multiple lines it is often a good idea to include a legend to tell viewers what the different lines mean.  To do this in matplotlib:

- Add the keyword `label` to each plot you want in the legend


- Call `plt.legend()` after you finish:



In [None]:
x = np.arange(0., 5., 0.01)
y = x**2
z = np.exp(x)
plt.plot(x, y, label='Quadratic')
plt.plot(x, z, label='Exponential')
plt.legend()

---


## <font color='blue'>Exercise 5</font>  

Look at the documentation for `plt.legend` and repeat the plot above, but with the legend in the upper-middle of the axes and with a larger font size.

Hint: If you look at the <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html">legend documentation on the matplotlib website</a> then it is easier to search through for the things you want

In [None]:
# Space for your workings


---

# 3. Multiple Plots

It's often useful to include multiple plots in the same image.  Matplotlib uses the term *Figure* to mean the complete image, and *Axes* to mean one rectangular plotting region within that image (each Axes has its own x and y axis).  

The plot commands we have been using above automatically make a new figure and axes internally, but we can also create them manually.  In the latter case, the axes objects have a method `plot` which behaves just like the `plt.plot` command.

You can use the function `plt.subplots` to make a figure divided into different subplots:

In [None]:
A = np.arange(0, 1., 0.1)

# Set up our plotting area
fig, axes = plt.subplots(ncols=2)

# "fig" is now a figure object representing the whole image.
# "axes" is an array of Axes objects, each of which we can use to plot things.

# Plot something in each of the axes
axes[0].plot(A,  A**2, 'blue')
axes[1].plot(A, -A**2, 'red')
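
The labelling functions from section 2 have counterparts on each axes object, usually with a `set_` prefix. The sketch below uses the matplotlib methods `set_title`, `set_xlabel`, `set_ylabel` and `set_xlim`, which are not otherwise covered in this notebook; the array `B` and the label text are made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
B = np.arange(0, 1., 0.1)

fig, axes = plt.subplots(ncols=2)

# Left-hand plot, with its own title and axis labels
axes[0].plot(B, B**2)
axes[0].set_title("Quadratic")       # counterpart of plt.title
axes[0].set_xlabel("B")              # counterpart of plt.xlabel
axes[0].set_ylabel("B squared")      # counterpart of plt.ylabel

# Right-hand plot, zoomed in on the x axis
axes[1].plot(B, B**3)
axes[1].set_title("Cubic")
axes[1].set_xlim(0, 0.5)             # counterpart of plt.xlim

# Optional: adjust spacing so neighbouring labels do not overlap
fig.tight_layout()
```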

---

## <font color='blue'>Exercise 6</font>  

Make a set of four plots over two rows and two columns, and plot `A` vs  `A`, `A**2`, `A**3`, and `A**4` in them.

Hint: `axes` will be a 2x2 array in this case.


In [None]:
A = np.arange(0, 1., 0.1)

# Space for your workings


---

You can provide the keyword `figsize` to `plt.subplots` to make a larger or smaller image.  It takes a `(width, height)` pair measured in inches (1 inch = 2.54 cm):

In [None]:
# make a wide and short figure
fig, axes = plt.subplots(ncols=3, figsize=(10, 2))
axes[0].plot(A)
axes[1].plot(A**2)
axes[2].plot(A**3)

---

# 4. 2D Data


As well as plotting 1D data as we've done above, matplotlib can display 2D data in a variety of ways, including directly as images as well as contour plots that show levels.

Let's load a height map of the Edinburgh area from a file, which we downloaded from the Ordnance Survey, and use that as our 2D data.



In [None]:
# This numpy function loads simple text data
height = np.loadtxt("data/elevation_map.txt")

# Make an empty plot, 8 inches x 8 inches (though it may appear smaller on your screen)
plt.subplots(figsize=(8, 8))

# The imshow function shows a 2D data set with numbers mapped to colours
plt.imshow(height)

---

The choice of relationship between number and colour is important, and can bring out details that otherwise the eye does not see:

- The function `imshow` has arguments `vmin` and `vmax`, which choose the lowest and highest values at which to saturate the colours.


- It also has an argument `cmap`, which you usually set to a string naming one of the matplotlib colour maps.  Have a look at the [complete list of matplotlib colour maps](https://matplotlib.org/stable/tutorials/colors/colormaps.html), part-way down the page.


- After running a plot you can call `plt.colorbar()` to add a colour scale.
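
To see how these three options work together, here is a sketch on synthetic data rather than the height map. The array `bump` is an invented smooth hill, and the limits and colour map are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 2D data: a smooth bump with values between about 0 and 100
yy, xx = np.mgrid[-2:2:100j, -2:2:100j]
bump = 100 * np.exp(-(xx**2 + yy**2))

# Values at or below vmin all get the first colour of the map,
# and values at or above vmax all get the last colour
image = plt.imshow(bump, vmin=10, vmax=60, cmap='viridis')

# Add a scale showing which colour corresponds to which value
plt.colorbar()
```

Narrowing the range between `vmin` and `vmax` spends the whole colour map on a smaller span of values, which brings out detail in that span at the cost of flattening everything outside it.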

## <font color='blue'>Exercise 7</font>  

Remake the visualization above, using the range 0 .. 200 meters as the colour range.  Change the colour map to a sensible one of your choice, and add a colour scale.

In [None]:
# Space for your workings


---

As well as a colour map like this one, matplotlib can also plot contours (lines of constant value, in this case lines of constant height, as you would find on a survey map).  This can help the eye pick out flat versus steep areas.

In [None]:
plt.subplots(figsize=(8,8))

# By default matplotlib picks a small number of evenly spaced contour
# levels automatically.  That's not very useful here - instead let's explicitly
# specify the contour levels we want as 0, 10m, 20m, 30m, 40m, ..., 270m
levels=np.arange(0, 280, 10)

# Matplotlib assumes that our data starts at the bottom left for
# contour plots. We have to explicitly tell it with the origin
# keyword to start at the upper left instead.  Without 'origin=upper'
# our map would be upside-down.
plt.contour(height, origin='upper', levels=levels)


plt.colorbar()

---

## <font color='blue'>Exercise 8</font>  

Repeat the above plot with the function `contourf` instead of `contour`, and again choose a new colour scheme.

In [None]:
# Space for your workings for this exercise


---

# 5. Saving images

After making a plot, you can save it with the `plt.savefig` command.  When using an EDINA-based notebook, this will save into the cloud next to your notebook, and you can download it from the main Jupyter page where you explore your notebooks.

You can save in many formats - the suffix on the filename you give to `plt.savefig` will decide what format is used. In general PNGs are best for web usage and PDFs for publications.

**Taking screen shots of images will usually give you degraded quality**
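
The sketch below saves one plot in two formats. The file names and the use of a temporary directory are assumptions made so the example does not clutter your notebook folder; in practice you would normally pass a file name of your own directly to `plt.savefig`.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
k = np.linspace(0, 1, 50)
plt.plot(k, k**2)
plt.xlabel("k")
plt.ylabel("k squared")

# Hypothetical output location, used only for this example
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "example_plot.png")
pdf_path = os.path.join(out_dir, "example_plot.pdf")

# The suffix of the file name selects the format
plt.savefig(png_path)   # ".png" gives a PNG image
plt.savefig(pdf_path)   # ".pdf" gives a PDF document
```

In a notebook, call `plt.savefig` in the same cell that makes the plot, since each cell starts a new figure.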

---

## <font color='blue'>Exercise 9</font>  


The file `data/hiv_contraception_2010.txt` is derived from a 2010 World Bank development indicator data set.  The first column is the percentage of women aged 15-49 who use some kind of contraceptive measure. The second is the percentage of the population with HIV.


- Load the data file with `np.loadtxt`.


- Turn it into two variables, for the first and second columns. 


- Make a scatter plot of the two variables


- Zoom to the range 0 - 10% on the HIV axis


- Add axis labels and a title


- Save the plot as a PNG image


In [None]:
# Space for your working
