<img src="images/inmas.png" width=130x align='right' />

# Notebook 17 -  Visualization with Matplotlib

In this notebook we will learn how to visualize data using the Matplotlib module.

You’ll learn how to present your data visually using the following graphs:

- Box plots
- Histograms
- Pie charts
- Bar charts
- Figures with multiple plots



### Prerequisite
Notebook 16

### Housekeeping for matplotlib

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

### Figures
What if we wanted multiple plots? This is where figures help us:
- A figure is the overall diagram, formed by a collection of one or more plots
- You can think of the figure object as a canvas that holds subplots and other plot elements inside it
- A figure can have one or more subplots inside it, called axes, arranged in rows and columns
- Every figure created with `plt.subplots()` has at least one axes

First let's look at a simple figure:

In [None]:
# Create a figure of one subplot (default)
fig, ax1 = plt.subplots() 

# Plot
ax1.scatter([1,2,3,4,5], [1,2,3,4,10], color='purple', marker='*') # scatter plot 

# Title, x and y labels, x and y limits
ax1.set_title('Scatterplot Purple Stars')
ax1.set_xlabel('x')  # x label
ax1.set_ylabel('y')  # y label
ax1.set_xlim(0, 6)   # x axis limits
ax1.set_ylim(0, 12); # y axis limits
# plt.show()         # Not required in interactive Python

### Subplots are an array of plots
We used `plt.subplots()` to create a figure that contains a single plot (default)

This creates and returns two objects:
- the figure 
- the axes (subplots) inside the figure

We called `ax1.scatter()` to draw the points on that axes. Because the figure has only one axes by default, `plt.subplots()` returned it directly as `ax1`.

To add a title, labels, and limits, we call methods on the specific axes:
- `set_title()`, `set_ylabel()`, `set_xlabel()`, `set_xlim()`, and `set_ylim()`

### Markers in matplotlib (`marker=`)
- **'.'** : Point
- **'o'** : Circle
- **'s'** : Square
- **'^'** : Triangle (up)
- **'v'** : Triangle (down)
- **'+'** : Plus
- **'x'** : X
- **'\*'** : Star
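
To compare the markers listed above side by side, here is a minimal sketch. The names `marker_codes`, `fig`, and `ax` are chosen for illustration only, and the layout (one point per marker on a single row) is just one possible way to display them.

```python
import matplotlib.pyplot as plt

# Marker codes from the list above
marker_codes = ['.', 'o', 's', '^', 'v', '+', 'x', '*']

fig, ax = plt.subplots(figsize=(8, 2))
for i, m in enumerate(marker_codes):
    # One point per marker, spaced evenly along the x axis
    ax.scatter(i, 0, marker=m, s=100, color='purple')

# Label each tick with the marker code drawn above it
ax.set_xticks(range(len(marker_codes)))
ax.set_xticklabels(marker_codes)
ax.set_yticks([])  # the y axis carries no information here
ax.set_title('Marker styles');
```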

### Line styles in matplotlib (`linestyle=`)
- **'-'** or 'solid': Solid line (default)
- **'--'** or 'dashed': Dashed line
- **'-.'** or 'dashdot': Dash-dot line
- **':'** or 'dotted': Dotted line
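
The line styles above apply to line plots drawn with `plot()`. The following is a small sketch, assuming we simply want to see all four styles at once; the variable names `t` and `line_styles` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 20)
line_styles = ['-', '--', '-.', ':']

fig, ax = plt.subplots()
for offset, ls in enumerate(line_styles):
    # Shift each line vertically so the styles do not overlap
    ax.plot(t, t + offset, linestyle=ls, color='purple', label=repr(ls))
ax.set_title('Line styles')
ax.legend();
```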

### Creating more than one subplot
- `subplots()` can create a 1x2 array of plots by passing arguments `plt.subplots(1, 2)` 

- We demonstrate this capability by drawing two side-by-side plots with different symbols (purple stars and orange circles)

`plt.subplots(1, 2)` then returns the figure and an array of two axes, which we unpack into one handle per subplot

- When displaying 2 plots side-by-side, we need to be mindful of the aspect ratio
    - For 1x2 figures, we would pass a geometry tuple of (10, 4.5) to figsize so that each graph is roughly square


In [None]:
def makeplot():
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,4.5))
    colors = {ax1: 'purple', ax2: 'orange'}
    markers = {ax1: '*', ax2: 'o'}
    titles = {ax1: 'Scatterplot Purple Stars', ax2: 'ScatterPlot Orange Circle'}
    for ax in (ax1, ax2):
        ax.scatter([1,2,3,4,5], [1,4,9,16,25], color=colors[ax], marker=markers[ax])
        ax.set_title(titles[ax])
        ax.set_xlabel('x')
        ax.set_ylabel('y')
        ax.set_xlim(0, 6)
        ax.set_ylim(0, 26)
    # plt.show() # Only required in a script - not in interactive Python
makeplot()
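
The same idea extends to grids with more than one row. As a sketch (the 2x2 layout, the `grid_markers` name, and the figure size are assumptions made for illustration), `plt.subplots(2, 2)` returns a 2-D NumPy array of axes that can be indexed by row and column:

```python
import matplotlib.pyplot as plt

# A 2x2 grid: `axes` is a 2-D NumPy array indexed as axes[row, col]
fig, axes = plt.subplots(2, 2, figsize=(8, 8))
grid_markers = [['*', 'o'], ['s', '^']]

for row in range(2):
    for col in range(2):
        ax = axes[row, col]
        ax.scatter([1, 2, 3, 4, 5], [1, 4, 9, 16, 25],
                   marker=grid_markers[row][col])
        ax.set_title(f'row {row}, col {col}')

fig.tight_layout()  # avoid overlapping titles and tick labels
```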

### Saving a Figure to a File

- Figures can be saved to a file by using `plt.savefig()`
- The argument is the file name, including the extension
- The image format is dictated by the file extension (.png, .jpg, .tiff, etc.)

Let's see an example:

- File *myfig.png* will contain an image of that plot
- Run cell twice and notice that file gets happily overwritten - beware!

In [None]:
makeplot()
plt.savefig("myfig.png")
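
A figure object can also save itself with `fig.savefig()`. The sketch below is only an illustration: the file names are made up, and it writes into a temporary directory so that nothing existing gets overwritten.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# Write into a temporary directory so no existing file is overwritten
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'sketch.png')
pdf_path = os.path.join(out_dir, 'sketch.pdf')

fig.savefig(png_path, dpi=150)  # raster image; dpi sets the resolution
fig.savefig(pdf_path)           # vector format chosen by the .pdf extension

print(os.path.exists(png_path), os.path.exists(pdf_path))
```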


### Visualizing Data as Boxplots
Let's generate some arrays of random numbers drawn from a Normal distribution:

In [None]:
x1 = 1 + 1*np.random.randn(100)
x2 = 2 + 2*np.random.randn(100)
x3 = 3 + 3*np.random.randn(100)
x4 = 4 + 4*np.random.randn(100)

`x1` is a normal variable with mean 1 and standard deviation of 1, `x2` ...

We will plot these four data sets together using a boxplot:

In [None]:
fig, ax = plt.subplots()
_ = ax.boxplot((x1, x2, x3, x4), vert=False, showmeans=True, meanline=True,
           labels=('x1', 'x2', 'x3', 'x4'), patch_artist=True,
           medianprops={'linewidth': 2, 'color': 'purple'},
           meanprops={'linewidth': 2, 'color': 'red'})
# plt.show()

### Boxplot arguments explained
The parameters of `boxplot()` define the following:

- **(x1,..):** is your data.
- **vert:** sets the plot orientation to horizontal when False. The default orientation is vertical
- **showmeans**: shows the mean of your data when True
- **meanline:** represents the mean as a line when True. The default representation is a point
- **labels:** the labels of your data
- **patch_artist:** when True, draws each box as a filled patch (which can be colored) instead of a plain outline
- **medianprops:** denotes the properties of the line representing the median
- **meanprops:** indicates the properties of the line or dot representing the mean

### Boxplot figure explained

You can see four box plots. Each of them corresponds to a single dataset (x1, x2, x3, or x4) and shows the following:

- The **mean** is the red dashed line
- The **median** is the purple line
- The **first quartile** is the left edge of the blue rectangle
- The **third quartile** is the right edge of the blue rectangle
- The **interquartile range** is the length of the blue rectangle
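- The **range** (excluding outliers) is spanned by the whiskers, which by default reach the farthest data points within 1.5 times the interquartile range of the rectangle's edges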
- The **outliers** are the circles to the left and right

A box plot can show so much information in a single figure!

With this information, let's see the boxplot again:

In [None]:
fig, ax = plt.subplots()
ax.boxplot((x1, x2, x3, x4), vert=False, showmeans=True, meanline=True,
           labels=('x1', 'x2', 'x3', 'x4'), patch_artist=True,
           medianprops={'linewidth': 2, 'color': 'purple'},
           meanprops={'linewidth': 2, 'color': 'red'});
# plt.show()
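
The quantities marked on a box plot can also be computed directly with NumPy. This is a rough sketch on a freshly generated sample (the names `sample`, `low_fence`, and `high_fence` are illustrative, and the seed is there only to make the sketch repeatable); it assumes the default rule that points farther than 1.5 times the interquartile range from the box are drawn as outliers.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
sample = 1 + 1 * rng.standard_normal(100)

# Quartiles and interquartile range (the edges and length of the box)
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Default whisker rule: points beyond 1.5 * IQR from the box are outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = sample[(sample < low_fence) | (sample > high_fence)]

print(f'mean={sample.mean():.2f} median={median:.2f}')
print(f'Q1={q1:.2f} Q3={q3:.2f} IQR={iqr:.2f}')
print(f'{outliers.size} outlier(s)')
```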

### Histograms

- Histograms are particularly useful when there are a large number of unique values in a dataset
    - The histogram divides the values from a sorted dataset into intervals, also called **bins**
- Often, all bins are of equal width, though this doesn’t have to be the case
    - The values of the lower and upper bounds of a bin are called the **bin edges**
- By convention, all bins but the rightmost one are half-open
    - Bins include the values equal to the lower bounds, but not values equal to the upper bounds
    - The rightmost bin is closed because it includes both bounds
- The **frequency** is a single value that corresponds to each bin
    - It’s the number of elements of the dataset with the values between the edges of the bin   

### A simple example of a histogram
Suppose you have data in the \[0, 15\] interval
- If you divide a dataset in three equal bins with the 4 bin edges 0, 5, 10, and 15:
    - the first and leftmost bin contains the values in interval \[ 0, 5)
    - second bin contains the values in interval \[5, 10)
    - third and rightmost bin contains the values in interval \[10, 15\]
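
The bin convention above can be checked with a small sketch. The values in `edge_values` are made up so that several of them fall exactly on a bin edge:

```python
import numpy as np

# Values chosen to sit exactly on the bin edges 0, 5, 10, and 15
edge_values = np.array([0, 5, 5, 10, 15, 15])
counts, edges = np.histogram(edge_values, bins=[0, 5, 10, 15])

# [0, 5) holds 0; [5, 10) holds 5 and 5; [10, 15] holds 10, 15, and 15
print(counts)  # [1 2 3]
```

Both values equal to 5 land in the second bin, while both values equal to 15 stay in the last bin because it is closed on the right.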


### Using NumPy to generate histograms
The function `np.histogram()` is a convenient way to parse data for making histograms:

In [None]:
x = 1 + 1*np.random.randn(100)  # named `x` to match the calls that follow
hist, bin_edges = np.histogram(x, bins=10)

In [None]:
hist

In [None]:
bin_edges

`np.histogram()` takes the data array and the number of bins and returns two NumPy arrays:

- `hist` contains the frequency or the number of items corresponding to each of the N bins
- `bin_edges` contains the N + 1 edges or bounds of the bins
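
These properties can be verified in a few lines. The sketch below uses its own sample and its own names (`samples`, `counts`, `edges`) so that it does not depend on the cells above:

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only to make the sketch repeatable
samples = 1 + 1 * rng.standard_normal(100)
counts, edges = np.histogram(samples, bins=10)

print(len(counts), len(edges))  # N bins and N + 1 edges
print(counts.sum())             # every sample falls in exactly one bin

widths = np.diff(edges)         # distance between consecutive edges
print(np.allclose(widths, widths[0]))  # equal widths when `bins` is an integer
```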



### Creating Histogram plots with matplotlib
Histogram plots are generated by the `hist()` method

In [None]:
fig, ax = plt.subplots()
ax.hist(x, bin_edges, cumulative=False)
ax.set_title('Histogram')
ax.set_xlabel('x')
ax.set_ylabel('Frequency');
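
The `hist()` method also returns the counts and bin edges it used, and `cumulative=True` turns the plot into a running total. Here is a sketch with its own sample data; the names `samples`, `ax_left`, and `ax_right` are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)  # seeded only to make the sketch repeatable
samples = 1 + 1 * rng.standard_normal(100)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4.5))

# hist() returns the counts, the bin edges, and the drawn bars
n, edges, _ = ax_left.hist(samples, bins=10)
ax_left.set_title('Histogram')

# Reuse the same edges and accumulate the counts from left to right
n_cum, _, _ = ax_right.hist(samples, bins=edges, cumulative=True)
ax_right.set_title('Cumulative histogram')

print(n_cum[-1])  # the last cumulative bar equals the number of samples
```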

### Pie Charts
- Pie charts represent data with a small number of labels and given relative frequencies
- They work well even with the labels that can’t be ordered (like nominal data)

Let’s define data associated to three labels:

In [None]:
a, b, c = 128, 256, 1024

### Creating Pie Chart plots with matplotlib
We use the method `pie()` to create a pie chart

In [None]:
fig, ax = plt.subplots()
ax.pie((a, b, c), labels=('x', 'y', 'z'), autopct='%2.1f%%')
ax.set_title('Pie Chart');

**Note:** `autopct` defines the float format of the relative frequencies (in percent) shown on the figure
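
To see where the percentages on the chart come from, this sketch applies the same format string by hand to the three values used above. The names `slice_values` and `percent_labels` are illustrative:

```python
slice_values = (128, 256, 1024)
total = sum(slice_values)

# Each slice's share of the total, formatted like autopct='%2.1f%%'
percent_labels = ['%2.1f%%' % (100 * v / total) for v in slice_values]
print(percent_labels)  # ['9.1%', '18.2%', '72.7%']
```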


### Bar Charts

- Bar charts also illustrate data that correspond to given labels or discrete numeric values
- They can show the pairs of data from two datasets
- Items of one set are the labels, while the corresponding items of the other are their frequencies
    - Optionally, they can show the errors related to the frequencies, as well

- Bar charts show parallel rectangles called bars
- Each bar corresponds to a single label and has a height proportional to the frequency or relative frequency of its label 

### Creating Bar Chart plots in matplotlib

We can create a bar chart using the `bar()` method:

In [None]:
x = np.arange(21)
y = np.random.randint(21, size=21)
fig, ax = plt.subplots()
ax.bar(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y');
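
As mentioned earlier, bar charts can optionally show the errors related to the frequencies. Below is a minimal sketch using the `yerr` argument of `bar()`; the heights and error values are made up purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

positions = np.arange(5)
heights = np.array([4, 7, 3, 8, 5])
errors = np.array([0.5, 1.0, 0.4, 1.2, 0.7])  # made-up uncertainties

fig, ax = plt.subplots()
# yerr draws a vertical error bar on each bar; capsize adds end caps
bars = ax.bar(positions, heights, yerr=errors, capsize=4)
ax.set_xlabel('x')
ax.set_ylabel('y');
```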

### Bar Charts with Categorical Data
What if we have categorical data? We can visualize it using bar charts as well! 

In [None]:
animals = ['Lions', 'Elephants', 'Birds', 'Sloths', 'Monkeys']
amounts = [23, 17, 35, 29, 12]
fig, ax= plt.subplots()
ax.bar(animals, amounts)
ax.set_title('Amount of Animals in the Zoo');

### Advanced bar charts in matplotlib
- We can plot multiple bar charts by playing with the thickness and the positions of the bars

Consider the following example:
- The code in the next cell shows three bar charts of four bars
    - Bars are given a thickness of 0.25 units
    - Each bar chart is shifted 0.25 units from the previous one
    - The data object is a nested list containing the number of candies sold each year

Note a new function called `plt.xticks()`

This function sets the positions of the x-axis ticks (here shifted by 0.25 so that each tick sits under the middle bar of its group) and then assigns a label to each of the 4 ticks, in order

In [None]:
data = [[30, 25, 50, 20], [40, 23, 51, 17], [35, 22, 45, 19]]
X = np.arange(4)
plt.bar(X + 0.00, data[0], width = 0.25, label= 'Lollipops' , color="pink")
plt.bar(X + 0.25, data[1], width = 0.25, label= 'Chocolate', color="brown")
plt.bar(X + 0.50, data[2], width = 0.25, label= 'Gummies', color='yellow')
plt.title('Candy sold from 2017-2020')
plt.xticks(X+0.25, ('2017','2018', '2019', '2020')) 
plt.legend();
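
The same grouped chart can be drawn with the axes methods used in the rest of this notebook. This is one possible sketch, not the only way to do it: it replaces `plt.xticks()` with `ax.set_xticks()` and `ax.set_xticklabels()`, and the names `sales`, `candies`, `bar_colors`, and `years` are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

sales = [[30, 25, 50, 20], [40, 23, 51, 17], [35, 22, 45, 19]]
candies = ['Lollipops', 'Chocolate', 'Gummies']
bar_colors = ['pink', 'brown', 'yellow']
years = ['2017', '2018', '2019', '2020']
width = 0.25
positions = np.arange(len(years))

fig, ax = plt.subplots()
for i, (row, name, color) in enumerate(zip(sales, candies, bar_colors)):
    # Shift each series by one bar width so the bars sit side by side
    ax.bar(positions + i * width, row, width=width, label=name, color=color)

ax.set_xticks(positions + width)  # center each tick under the middle bar
ax.set_xticklabels(years)
ax.set_title('Candy sold from 2017-2020')
ax.legend();
```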

### Key Points
- matplotlib can create scatterplots, box plots, histograms, pie charts, and more
- Plots can be tiled using `plt.subplots()`
- Each plot is drawn on its own axes
- Different plot styles can be shown side-by-side


### Further Reading
- This notebook focuses on representing data and keeps stylistic settings to a minimum.
[Here is a link](https://matplotlib.org/stable/api/pyplot_summary.html#basic) to the official documentation for the `matplotlib.pyplot` routines used here, so you can explore the options that you didn't see.

There are many tutorials out there that will help you build the type of graph you want:
* [Official Matplotlib Tutorials](https://matplotlib.org/3.3.3/tutorials/index.html)
* [Adding Flair To Your Graphs](https://data-flair.training/blogs/python-matplotlib-tutorial/)
* [More information at the Matplotlib web page](http://matplotlib.org/)
