<!--NAVIGATION-->
<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>< [Quiz](04.03-Quiz.ipynb) | [Contents](00.00-Index.ipynb) | [Evolutions: Seaborn, Plotly and Bokeh](05.02-Seaborn.ipynb) ></span>

<a href="https://colab.research.google.com/github/eurostat/e-learning/blob/main/python-official-statistics/05.01-Matplotlib.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

<a id='top'></a>

# Matplotlib
## Content  
- [Saving Figures to File](#save)
- [Two Ways of Using](#two)
- [Simple Line Plots](#simple-line)
- [Adjusting the Plot](#adjust)
- [Simple Scatter Plots](#scatter)
- [Visualizing Errors](#error)
- [Visualizing a 3D Function](#f3d)
- [Histograms and Density](#histo)
- [Multiple Subplots](#multiple)

Matplotlib is a multi-platform data visualization library built on NumPy arrays, and designed to work with the broader SciPy stack.
It was conceived by John Hunter in 2002, originally as a patch to IPython for enabling interactive MATLAB-style plotting.  
Some preparations before using matplotlib and the first example:

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt

# directive to choose appropriate aesthetic styles for our figures
plt.style.use('classic')

# static images of your plot embedded in the notebook
# for interactive plots use:
# %matplotlib notebook
%matplotlib inline

# one small example
import numpy as np
x = np.linspace(0, 10, 100)

# taking the plot as a figure object (for saving)
fig = plt.figure()
plt.plot(x, np.sin(x), '--')
plt.plot(x, np.cos(x))

<a id='save'></a>

## Saving Figures to File

One nice feature of Matplotlib is the ability to save figures in a wide variety of formats.
Saving a figure can be done using the ``savefig()`` command.
For example, to save the previous figure as a PNG file, you can run this:

In [None]:
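# make sure the target directory exists (savefig does not create it)
import os
os.makedirs('img', exist_ok=True)
# saving as png image file
fig.savefig('img/saved_figure.png')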
# checking the image saved
from IPython.display import Image
Image('img/saved_figure.png')
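
To see which formats a given installation supports, the figure canvas can be asked directly. The following is a minimal sketch, not part of the original notebook: the names `demo_fig`, `demo_ax`, `supported` and `buffer` are illustrative. It writes the PNG into an in-memory buffer instead of a file, so it does not depend on any directory existing.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

# illustrative figure, independent of the one created earlier
demo_fig, demo_ax = plt.subplots()
x_demo = np.linspace(0, 10, 100)
demo_ax.plot(x_demo, np.sin(x_demo), '--')

# dictionary mapping file extensions to format descriptions
supported = demo_fig.canvas.get_supported_filetypes()
print(sorted(supported))

# savefig also accepts a file-like object; dpi sets the raster resolution
buffer = io.BytesIO()
demo_fig.savefig(buffer, format='png', dpi=150)
png_bytes = buffer.getvalue()
```

When saving to a file name, the format is normally inferred from the extension, so no `format` argument is needed in that case.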

<a id='two'></a>

## Two Ways of Using
A powerful but potentially confusing feature of Matplotlib is its dual interfaces: a convenient MATLAB-style state-based interface, and a more powerful object-oriented interface. We'll quickly highlight the differences between the two here. Throughout this chapter we will switch between the MATLAB-style and object-oriented interfaces, depending on which is more convenient.
### MATLAB-style Interface
Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact.
The MATLAB-style tools are contained in the pyplot (``plt``) interface.  
The following code will probably look quite familiar to MATLAB users:

In [None]:
plt.figure()  # create a plot figure

# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))

# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))

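While this stateful interface is fast and convenient for simple plots, it becomes awkward for more complicated ones: every call acts on the "current" figure and axes, so once the second panel has been created, going back to add something to the first requires switching the current axes again.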

### Object-oriented Interface

The object-oriented interface is available for when you want more control over your figure.
Rather than depending on some notion of an "active" figure or axes, in the object-oriented interface the plotting functions are *methods* of explicit ``figure`` and ``axes`` objects.
To re-create the previous plot using this style of plotting, you might do the following:

In [None]:
# First create a grid of plots
# ax will be an array of two Axes objects
fig, ax = plt.subplots(2)
# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
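
The practical benefit of explicit objects shows up when an earlier panel has to be modified later. The sketch below is an illustration under assumed names (`fig_oo`, `ax_top`, `ax_bottom` are not from the original notebook). It adds a second curve to the first panel after the second panel has been drawn, and shows the extra step the stateful interface would need.

```python
import numpy as np
import matplotlib.pyplot as plt

x_vals = np.linspace(0, 10, 100)

# unpack the two Axes objects into descriptive (illustrative) names
fig_oo, (ax_top, ax_bottom) = plt.subplots(2)
ax_top.plot(x_vals, np.sin(x_vals))
ax_bottom.plot(x_vals, np.cos(x_vals))

# going back to the first panel needs no notion of a "current" axes
ax_top.plot(x_vals, 0.5 * np.sin(x_vals), linestyle=':')
ax_top.set_title('two curves in the top panel')

# the stateful interface would first have to switch the current axes
plt.sca(ax_top)
current = plt.gca()
```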

<a id='simple-line'></a>

## Simple Line Plots
The simplest of all plots is the visualization of a single function $y = f(x)$.
Here we will take a first look at creating a simple plot of this type.  
Some notions first:  
`figure`: The container that contains all the objects representing axes, graphics, text, and labels.  
`axes`: The bounding box with ticks and labels, which will contain the plot elements that make up the visualization.

In [None]:
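# playing a little with the styling
# the seaborn styles were renamed to 'seaborn-v0_8-*' in Matplotlib 3.6,
# so pick whichever name this installation provides
new_name = 'seaborn-v0_8-whitegrid'
plt.style.use(new_name if new_name in plt.style.available else 'seaborn-whitegrid')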

fig = plt.figure()
ax = plt.axes()

x = np.linspace(0.1, 10, 1000)
ax.plot(x, np.sin(x))

<a id='adjust'></a>

## Adjusting the Plot
We'll now look in more detail at how to control the appearance of the axes and lines. Many elements can be adjusted; we will cover these:
- Line colors
- Line styles
- Axes limits
- Labeling plots  

First with the MATLAB-style interface:

In [None]:
# specify color and style by name
plt.plot(x, np.sin(x), color='green', linestyle='dashed', label='sin(x)')
# grayscale between 0 and 1
plt.plot(x, np.cos(x), color='0.45', label='cos(x)')
# hex code (RRGGBB from 00 to FF)
# style using a code (- solid, -- dashed, -. dashdot, : dotted)
plt.plot(x, np.log(x), color='#EE4422', linestyle=':', label='log(x)')
# axes limits
plt.xlim(-1, 11)
plt.ylim(-1.3, 2.3)
# title, labels, legend
plt.title("Sine, Cosine & Log Curves")
plt.xlabel("x")
plt.ylabel("f(x)")
# the aspect ratio can also be set, e.g. with plt.axis('equal')
plt.legend()

In the object-oriented interface to plotting, rather than calling these functions individually, it is often more convenient to use the ``ax.set()`` method to set all these properties at once:

In [None]:
ax = plt.axes()
ax.plot(x, np.sin(x), color='green', linestyle='dashed', label='sin(x)')
ax.plot(x, np.cos(x), color='0.45', label='cos(x)')
ax.plot(x, np.log(x), color='#EE4422', linestyle=':', label='log(x)')
ax.set(xlim=(-1, 11), ylim=(-1.3, 2.3), xlabel='x', ylabel='f(x)', title='Sine, Cosine & Log Curves')
ax.legend()
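
Most `plt` functions used above have a direct counterpart among the `Axes` methods, usually with a `set_` prefix. The following sketch spells out that correspondence one call at a time; the names `fig_map` and `ax_map` are illustrative and not part of the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

x_vals = np.linspace(0.1, 10, 1000)
fig_map, ax_map = plt.subplots()
ax_map.plot(x_vals, np.sin(x_vals), label='sin(x)')

# plt.xlim()   -> ax.set_xlim()
ax_map.set_xlim(-1, 11)
# plt.ylim()   -> ax.set_ylim()
ax_map.set_ylim(-1.3, 2.3)
# plt.xlabel() -> ax.set_xlabel()
ax_map.set_xlabel('x')
# plt.ylabel() -> ax.set_ylabel()
ax_map.set_ylabel('f(x)')
# plt.title()  -> ax.set_title()
ax_map.set_title('Sine Curve')
# plt.legend() keeps its name: ax.legend()
ax_map.legend()
```

Each setter has a matching getter, such as `ax_map.get_xlim()`, which is handy for checking what a figure currently uses.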

<a id='scatter'></a>

## Simple Scatter Plots

One way to create scatter plots is the ``plt.scatter`` function, which is used in much the same way as ``plt.plot``:

In [None]:
x = np.linspace(0, 10, 20)
plt.scatter(x, np.sin(x), marker='o', s=60)

The main difference between ``plt.scatter`` and ``plt.plot`` is that ``plt.scatter`` can create scatter plots in which the properties of each point (size, face color, edge color, etc.) are individually controlled or mapped to data.
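
The difference is visible in what the two calls return. This is a small sketch under assumed names (`fig_cmp`, `ax_line`, `ax_scatter`, `point_sizes` are illustrative): `plot` with a marker-only format string yields a single line object with one style, while `scatter` yields a collection that stores one size and one color value per point.

```python
import numpy as np
import matplotlib.pyplot as plt

x_pts = np.linspace(0, 10, 20)
y_pts = np.sin(x_pts)

fig_cmp, (ax_line, ax_scatter) = plt.subplots(1, 2)

# plot() with a marker-only format string: one Line2D, a single style for all points
lines = ax_line.plot(x_pts, y_pts, 'o', color='black')

# scatter(): one PathCollection, here with a size and a color per point
point_sizes = np.linspace(10, 200, x_pts.size)
points = ax_scatter.scatter(x_pts, y_pts, s=point_sizes, c=y_pts, cmap='viridis')

print(type(lines[0]).__name__, type(points).__name__)
```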

Let's create a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we'll also use the ``alpha`` keyword to adjust the transparency level:

In [None]:
rng = np.random.RandomState(131)
x = rng.randn(20)
y = rng.randn(20)
colors = rng.rand(20)
sizes = 4000 * rng.rand(20)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
# show color scale
plt.colorbar()

<a id='error'></a>

## Visualizing Errors
In scientific measurement, accurate accounting for errors is nearly as important as, if not more important than, accurate reporting of the number itself. When visualizing data and results, showing these errors effectively lets a plot convey much more complete information.

A basic errorbar can be created with a single Matplotlib function call:

In [None]:
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)

plt.plot(x, np.sin(x))
plt.errorbar(x, y, yerr=dy, fmt='.k')
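
Errors are not always symmetric or equally sized. The sketch below uses made-up error sizes (`err_lower`, `err_upper`) and an arbitrary random seed purely for illustration. It passes separate lower and upper errors and lightens the error bars so that the data points stay prominent.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
x_obs = np.linspace(0, 10, 20)
y_obs = np.sin(x_obs) + 0.3 * rng.randn(20)

# illustrative (made-up) error sizes: larger upwards than downwards
err_lower = np.full(20, 0.2)
err_upper = np.full(20, 0.5)

fig_err, ax_err = plt.subplots()
# a (2, N) array-like for yerr gives separate lower and upper errors
bars = ax_err.errorbar(x_obs, y_obs, yerr=[err_lower, err_upper],
                       fmt='o', color='black',
                       ecolor='lightgray', elinewidth=3, capsize=0)
```

Horizontal errors can be given in the same way through the `xerr` argument.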

<a id='f3d'></a>

## Visualizing a 3D Function
### Visualizing in 2D
Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions.
There are three Matplotlib functions that can be helpful for this task: ``plt.contour`` for contour plots, ``plt.contourf`` for filled contour plots, and ``plt.imshow`` for showing images.  

A contour plot can be created with the ``plt.contour`` function.
It takes three arguments: a grid of *x* values, a grid of *y* values, and a grid of *z* values.
The *x* and *y* values represent positions on the plot, and the *z* values will be represented by the contour levels.
Perhaps the most straightforward way to prepare such data is to use the ``np.meshgrid`` function, which builds two-dimensional grids from one-dimensional arrays:

In [None]:
import numpy as np

def f(x, y):
    return np.sin(x) + np.cos(y * x)* np.cos(y)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

plt.contour(X, Y, Z, colors='black')
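
It can help to look at what ``np.meshgrid`` actually returns. The following sketch uses only NumPy, with illustrative names (`x_1d`, `y_1d`, `X_grid`, `Y_grid`), and inspects the shapes and contents of the two grids built from 50 x positions and 40 y positions.

```python
import numpy as np

x_1d = np.linspace(0, 5, 50)   # 50 positions along x
y_1d = np.linspace(0, 5, 40)   # 40 positions along y

X_grid, Y_grid = np.meshgrid(x_1d, y_1d)

# both grids have one row per y value and one column per x value
print(X_grid.shape, Y_grid.shape)

# every row of X_grid repeats x_1d; every column of Y_grid repeats y_1d
rows_repeat_x = np.array_equal(X_grid[0], x_1d) and np.array_equal(X_grid[-1], x_1d)
cols_repeat_y = np.array_equal(Y_grid[:, 0], y_1d) and np.array_equal(Y_grid[:, -1], y_1d)
```

Because both grids have the same shape, a function written with NumPy operations can be evaluated on all grid points in one call, as done for `Z` above.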

When a single color is used, negative values are drawn with dashed lines by default and positive values with solid lines.
The lines can instead be color-coded by specifying a colormap with the ``cmap`` argument (here ``RdBu``, which shows positive values in blue and negative values in red).
Here, we'll also ask for more lines to be drawn, requesting roughly 20 equally spaced levels across the data range:

In [None]:
plt.contour(X, Y, Z, 20, cmap='RdBu')
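
When an integer is passed, Matplotlib chooses the level values itself. If specific values are needed, they can be given explicitly through the `levels` argument. The sketch below re-creates the grid with illustrative names (`f_demo`, `X_grid`, `Y_grid`, `Z_grid`, `chosen_levels`), and the particular level values are an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

def f_demo(x, y):
    # same function as used in the text
    return np.sin(x) + np.cos(y * x) * np.cos(y)

X_grid, Y_grid = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z_grid = f_demo(X_grid, Y_grid)

# explicit contour levels instead of a requested number of levels
chosen_levels = np.linspace(-1.5, 1.5, 7)

fig_lv, ax_lv = plt.subplots()
contour_set = ax_lv.contour(X_grid, Y_grid, Z_grid, levels=chosen_levels, cmap='RdBu')

# the levels actually used are stored on the returned object
print(contour_set.levels)
```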

We can improve this by switching to a filled contour plot using the ``plt.contourf()`` function (notice the ``f`` at the end), which uses largely the same syntax as ``plt.contour()``.

Additionally, we'll add a ``plt.colorbar()`` command for color/values:

In [None]:
plt.contourf(X, Y, Z, 20, cmap='RdBu')
plt.colorbar()

One issue with this plot is that the color steps are discrete rather than continuous.
This could be remedied by using the ``plt.imshow()`` function, which interprets a two-dimensional grid of data as an image.

In [None]:
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdBu')
plt.colorbar()
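
Unlike ``plt.contour``, ``plt.imshow`` does not take *x* and *y* grids; it only sees the two-dimensional array. The coordinate range therefore has to be supplied through `extent`, and because images are by default drawn with the origin in the upper left, `origin='lower'` is needed to match the contour plots. The following sketch derives the extent from the one-dimensional arrays instead of hard-coding it; the names `x_1d`, `y_1d`, `Z_grid`, `fig_im`, `ax_im` and `image` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x_1d = np.linspace(0, 5, 50)
y_1d = np.linspace(0, 5, 40)
X_grid, Y_grid = np.meshgrid(x_1d, y_1d)
Z_grid = np.sin(X_grid) + np.cos(Y_grid * X_grid) * np.cos(Y_grid)

fig_im, ax_im = plt.subplots()
# imshow() only sees the 2D array, so the coordinate range is passed
# separately as extent=[xmin, xmax, ymin, ymax]
image = ax_im.imshow(Z_grid,
                     extent=[x_1d.min(), x_1d.max(), y_1d.min(), y_1d.max()],
                     origin='lower',   # first row of Z_grid at the bottom
                     cmap='RdBu')
fig_im.colorbar(image, ax=ax_im)
```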

We can also combine the contour plot with an image plot and add labels to the contour levels:

In [None]:
contours = plt.contour(X, Y, Z, 5, colors='black')
plt.clabel(contours, inline=True, fontsize=8)
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdBu', alpha=0.7)
plt.colorbar()

### Visualizing in 3D
A three-dimensional axes can be created by passing the keyword ``projection='3d'`` to any of the normal axes creation routines:

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
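
The same keyword works in the object-oriented interface. This is a minimal sketch with illustrative names (`fig_3d`, `ax_3d`). It assumes a reasonably recent Matplotlib; versions before 3.2 additionally required `from mpl_toolkits import mplot3d` before the `'3d'` projection became available.

```python
import matplotlib.pyplot as plt

fig_3d = plt.figure()
# projection='3d' selects the three-dimensional axes class
ax_3d = fig_3d.add_subplot(1, 1, 1, projection='3d')

# 3D axes add z-specific methods alongside the usual x and y ones
ax_3d.set_xlabel('x')
ax_3d.set_ylabel('y')
ax_3d.set_zlabel('z')
print(type(ax_3d).__name__)
```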

The most basic three-dimensional plot is a line or a collection of scattered points created from sets of (x, y, z) triples.
These can be created using the ``ax.plot3D`` and ``ax.scatter3D`` methods:

In [None]:
ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')

Like two-dimensional ``ax.contour`` plots, ``ax.contour3D`` requires all the input data to be in the form of two-dimensional regular grids, with the Z data evaluated at each point.
Here we'll show a three-dimensional contour of the same function as before, with some added change in the viewing angle:

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.view_init(50, 25)
ax.contour3D(X, Y, Z, 50, cmap='RdBu')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

The same example as a wireframe:

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z)


A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon:

In [None]:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='RdBu', edgecolor='none')

<a id='histo'></a>

## Histograms and Density
A simple histogram can be a great first step in understanding a dataset. The example below sets a few common options:

In [None]:
data = np.random.randn(1000)
plt.hist(data, bins=20, histtype='stepfilled', color='steelblue', edgecolor='none')
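
If only the numbers behind the histogram are needed, and not the drawing, the counting can be done with ``np.histogram``. The sketch below uses NumPy only, with an arbitrarily chosen seed and illustrative names (`sample`, `counts`, `bin_edges`).

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
sample = rng.randn(1000)

# counts per bin and the bin edges, without drawing anything
counts, bin_edges = np.histogram(sample, bins=5)
print(counts)
print(bin_edges)
```

There is always one more edge than there are bins, since each bin is bounded on both sides.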

It is often useful to compare histograms of several distributions by making them semi-transparent with the ``alpha`` keyword. Here the same options are reused for every call:

In [None]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

options = dict(histtype='stepfilled', alpha=0.4, bins=30)

plt.hist(x1, **options)
plt.hist(x2, **options)
plt.hist(x3, **options)

### Two-dimensional Histogram
Just as we create histograms in one dimension by dividing the number-line into bins, we can also create histograms in two-dimensions by dividing points among two-dimensional bins:

In [None]:
mean = [0, 0]
cov = [[2, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
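
As in the one-dimensional case, the counts can be computed without plotting, here with ``np.histogram2d``. This is a NumPy-only sketch with an arbitrary seed and illustrative names (`x_s`, `y_s`, `counts_2d`, `x_edges`, `y_edges`).

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
mean = [0, 0]
cov = [[2, 1], [1, 2]]
x_s, y_s = rng.multivariate_normal(mean, cov, 10000).T

# 30 x 30 grid of counts plus the bin edges along each dimension
counts_2d, x_edges, y_edges = np.histogram2d(x_s, y_s, bins=30)
print(counts_2d.shape)
```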

The two-dimensional histogram creates a tessellation of squares across the axes.
Another natural shape for such a tessellation is the regular hexagon:

In [None]:
plt.hexbin(x, y, gridsize=26, cmap='Blues')
cb = plt.colorbar()

<a id='multiple'></a>

## Multiple Subplots
Sometimes it is helpful to have different views of data side by side.
For this Matplotlib has the concept of *subplots*: groups of smaller axes that can exist together within a single figure.  
### Using ``plt.axes``
The most basic method of creating an axes is to use the ``plt.axes`` function.
By default this creates a standard axes object that fills the entire figure.
``plt.axes`` also takes an optional argument, a list that represents ``[left, bottom, width, height]`` in the figure coordinate system, which ranges from 0 at the bottom left of the figure to 1 at the top right of the figure:

In [None]:
ax1 = plt.axes([0, 0, 1, 1])  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.3, 0.3])

In [None]:
# the equivalent with the object-oriented interface
fig = plt.figure()
ax1 = fig.add_axes([0, 0, 1, 1])
ax2 = fig.add_axes([0.65, 0.65, 0.3, 0.3])
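
A typical use of manually positioned axes is an inset. The following sketch, with illustrative names and arbitrarily chosen positions (`fig_in`, `main_axes`, `inset_axes`), draws a small plot inside a larger one and reads the inset position back in the same four-number form.

```python
import numpy as np
import matplotlib.pyplot as plt

fig_in = plt.figure()
# [left, bottom, width, height] as fractions of the figure size
main_axes = fig_in.add_axes([0.1, 0.1, 0.8, 0.8])
inset_axes = fig_in.add_axes([0.65, 0.65, 0.2, 0.2])

x_vals = np.linspace(0, 10, 100)
main_axes.plot(x_vals, np.sin(x_vals))
inset_axes.plot(x_vals, np.cos(x_vals), color='gray')

# the position can be read back in the same four-number form
inset_bounds = inset_axes.get_position().bounds
print(inset_bounds)
```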

### Simple Grids of Subplots

To align columns or rows of subplots we can use ``plt.subplot()``, which creates a single subplot within a grid.
This command takes three integer arguments: the number of rows, the number of columns, and the index of the plot to be created in this scheme, which runs from the upper left to the bottom right:

In [None]:
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str(i), fontsize=18, ha='center')

In [None]:
# the equivalent with the object-oriented interface, with some spacing
fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str(i), fontsize=18, ha='center')

### The Whole Grid in One Go

The approach just described can become quite tedious when creating a large grid of subplots, especially if you'd like to hide the x- and y-axis labels on the inner plots.
``plt.subplots()`` is the easier tool to use (note the ``s`` at the end of ``subplots``). Rather than creating a single subplot, this function creates a full grid of subplots in a single line, returning them in a NumPy array.
The arguments are the number of rows and number of columns, along with optional keywords ``sharex`` and ``sharey``, which allow you to specify the relationships between different axes:

In [None]:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
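
The effect of `sharex='col'` can be checked directly: panels in the same column follow each other's x-limits, while other columns stay independent. The sketch below uses illustrative names (`fig_sh`, `axes_sh`, `panel`) and also shows `.flat`, a convenient way to loop over all panels of the returned array.

```python
import matplotlib.pyplot as plt

fig_sh, axes_sh = plt.subplots(2, 3, sharex='col', sharey='row')

# changing the x-limits of one panel ...
axes_sh[0, 0].set_xlim(0, 10)

# ... also changes the panel below it (same column) ...
same_column = axes_sh[1, 0].get_xlim()
# ... but not the panels in other columns
other_column = axes_sh[0, 1].get_xlim()
print(same_column, other_column)

# .flat iterates over all Axes regardless of the grid shape
for panel in axes_sh.flat:
    panel.grid(True)
```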

### More Complicated Arrangements

To go beyond a regular grid to subplots that span multiple rows and columns, ``plt.GridSpec()`` is the best tool.
The ``plt.GridSpec()`` object does not create a plot by itself; it is simply a convenient interface that is recognized by the ``plt.subplot()`` command.
For example, a gridspec for a grid of two rows and three columns with some specified width and height space looks like this:

In [None]:
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2])
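
The same layout can be built in the object-oriented style by creating the grid from the figure itself with `fig.add_gridspec()`. This is a sketch with illustrative names (`fig_gs`, `grid_oo`, `ax_a` to `ax_d`), assuming a Matplotlib version that provides `Figure.add_gridspec`.

```python
import matplotlib.pyplot as plt

fig_gs = plt.figure()
# same layout as above, but created from the Figure object
grid_oo = fig_gs.add_gridspec(2, 3, wspace=0.4, hspace=0.3)

ax_a = fig_gs.add_subplot(grid_oo[0, 0])    # top-left cell
ax_b = fig_gs.add_subplot(grid_oo[0, 1:])   # top row, columns 1 and 2
ax_c = fig_gs.add_subplot(grid_oo[1, :2])   # bottom row, columns 0 and 1
ax_d = fig_gs.add_subplot(grid_oo[1, 2])    # bottom-right cell
```

The slices follow ordinary Python indexing, so `grid_oo[0, 1:]` spans every column of the first row except the first one.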

This type of flexible grid alignment has a wide range of uses.
It is often used when creating multi-axes histogram plots like the ones shown here:

In [None]:
# Create some normally distributed data
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 3000).T

# Set up the axes with gridspec
fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)
main_ax = fig.add_subplot(grid[:-1, 1:])
y_hist = fig.add_subplot(grid[:-1, 0], xticklabels=[], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], yticklabels=[], sharex=main_ax)

# scatter points on the main axes
# ('o' selects the marker only; the color is given once, via the keyword)
main_ax.plot(x, y, 'o', markersize=3, alpha=0.4, color='green')

# histogram on the attached axes
x_hist.hist(x, 40, histtype='stepfilled', orientation='vertical', color='gray')
x_hist.invert_yaxis()

y_hist.hist(y, 40, histtype='stepfilled', orientation='horizontal', color='gray')
y_hist.invert_xaxis()


<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>This is the Jupyter notebook version of the __Python for Official Statistics__ produced by Eurostat; the content is available [on GitHub](https://github.com/eurostat/e-learning/tree/main/python-official-statistics).
<br>The text and code are released under the [EUPL-1.2 license](https://github.com/eurostat/e-learning/blob/main/LICENSE).</span>