# Introduction to Statistics with Python

```
Koen Plevoets
Last modified: 2020-09-10
```

# Class 4

## Chapter 6: Visualization with Matplotlib

- 6.1: Creating graphs with Matplotlib
- 6.2: Statistical graphs with Matplotlib
- 6.3: Style sheets
- 6.4: Object-oriented approach
- 6.5: Graphical methods in pandas

Python has some interesting modules for visualizing data. The main one is **Matplotlib**, which builds on NumPy.

The Matplotlib module consists of many submodules which provide flexible ways of making graphs. There are **two** common **approaches** (APIs) in Matplotlib:

 - The various **functions** in the submodule `pyplot` which create graphs like in MATLAB or R.
 - The **object-oriented** creation of graphs, which involves methods equivalent to the `pyplot` functions.

**Caution**: There used to be a third approach making use of the `pylab` module but this is now **deprecated**! The reason is that `pylab` conflicts with other functionalities in Python.

An important step **before** you import matplotlib is to specify your "**backend**". The backend is the "**engine**" which makes your graphs (the frontend is your code). See more details on [this FAQ page](https://matplotlib.org/faq/usage_faq.html#what-is-a-backend).

In **IPython** or **Jupyter Notebook**, the backend can be selected with the following ("magic") commands:
 - `%matplotlib inline`
 - `%matplotlib qt4`
 - `%matplotlib qt5`
 - `%matplotlib tk`
 - Etc.

In **Spyder**, the backend can be set in the menu **Tools > Preferences > IPython console > Graphics**. If you want your graphs to appear in external windows, select the option "Automatic".
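The magic commands are not available in a plain Python script. There the backend can be chosen with the function `matplotlib.use()`. The following is a minimal sketch, assuming you want the non-interactive `Agg` backend, which only renders to files:

```python
import matplotlib

# Select the backend before pyplot is imported.
# 'Agg' is a non-interactive backend: it writes files but opens no windows.
matplotlib.use('Agg')

import matplotlib.pyplot as plt

# Check which backend is active.
backend = matplotlib.get_backend()
print(backend)
```

The choice of `Agg` is only an illustration; on a desktop you would normally pick an interactive backend instead.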

For a **Jupyter Notebook**, inline plots make most sense:

In [None]:
%matplotlib inline

After you have specified the backend, you can import (submodule `pyplot` of) Matplotlib:

In [None]:
import matplotlib.pyplot as plt

### 6.1 Creating graphs with Matplotlib

The basic function in `pyplot` is `plot()` which produces a **2-D line plot** by default.

In [None]:
plt.plot([1, 2, 3, 4, 5])

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

If you make plots in **external windows**, they may not appear immediately. In that case the function `show()` will make the external window visible. That is why the `show()` function often concludes a block of visualization commands.

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
plt.show()

A plot in an **external window** will be **updated as long as it stays open**. New plot commands will automatically adjust the appearance of the plot, e.g. the axis scales. **Inline plots**, on the other hand, are fully defined by **one code block**.

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
plt.plot([1, 2, 3, 4, 5], [3, 6, 9, 12, 15])
plt.show()

This means that external plot windows need to be **closed** before you can start a new plot. This can be done **manually** or with the function `close()`.

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
plt.plot([1, 2, 3, 4, 5], [3, 6, 9, 12, 15])
plt.close()    # Disappears

The function `savefig()` **saves** the current plot to an external file.

In [None]:
plt.plot([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
plt.savefig('cubeplot.png')    # See your working directory.
plt.savefig('cubeplot.pdf')    # Idem.
plt.close()

The **available graphical formats** depend on your **backend**. Usually, all the common formats are supported: `ps`, `eps`, `jpg`, `jpeg`, `pdf`, `pgf`, `png`, `raw`, `rgba`, `svg`, `svgz`, `tif` and `tiff`. See the help file of the `savefig()` function for other useful arguments such as `dpi`, `transparent`,  etc.
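As a sketch of these options, the following block lists the formats that the active backend supports and saves a high-resolution PNG with a transparent background. The file name `cubeplot_hires.png` and the temporary output directory are assumptions made for this illustration only:

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])

# Dictionary of file extensions supported by the current backend.
formats = plt.gcf().canvas.get_supported_filetypes()
print(sorted(formats))

# Write into a temporary directory so that the working directory stays clean.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'cubeplot_hires.png')

# dpi sets the resolution in dots per inch;
# transparent = True leaves out the white background.
plt.savefig(png_path, dpi = 300, transparent = True)
plt.close()

print(os.path.exists(png_path))
```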

It is possible to visualize **multiple datasets** by simply specifying the **coordinates next to each other**. Matplotlib figures out that the **odd arguments** relate to the **horizontal axis** and the **even arguments** to the **vertical axis**:

In [None]:
plt.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 2, 3, 4], [3, 6, 9, 12], [1, 2, 3], [1, 8, 27])
plt.show()

When your data are in an object with **column names** (e.g. a **DataFrame**), you can specify the strings of the column names. The **data object** then has to be specified as the argument `data`.

In [None]:
import pandas as pd
dat = pd.DataFrame( {
        'one' : [1, 2, 3, 4, 5],
        'two' : [2, 4, 6, 8, 10],
        'tri' : [3, 6, 9, 12, 15],
        'cub' : [1, 8, 27, 64, 125]
        } )
print(dat)

In [None]:
plt.plot('one', 'tri', data = dat)
plt.show()

However, the specification of **multiple column names** next to each other is **not possible**. Instead, you have to make **several calls** to `plot()` in an open window/one code block:

In [None]:
plt.plot('one', 'two', 'one', 'tri', 'one', 'cub', data = dat)    # Error

In [None]:
plt.plot('one', 'two', data = dat)
plt.plot('one', 'tri', data = dat)
plt.plot('one', 'cub', data = dat)
plt.show()
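When there are many columns, the repeated calls can be written as a loop. The following sketch repeats the DataFrame `dat` from above so that it runs on its own; the variable `n_lines` is only there to check the result:

```python
import pandas as pd
import matplotlib.pyplot as plt

dat = pd.DataFrame({
    'one': [1, 2, 3, 4, 5],
    'two': [2, 4, 6, 8, 10],
    'tri': [3, 6, 9, 12, 15],
    'cub': [1, 8, 27, 64, 125]
})

plt.figure()    # Start from a fresh figure.

# One call to plot() per column; all lines end up in the same plot.
for col in ['two', 'tri', 'cub']:
    plt.plot('one', col, data = dat)

# Count the lines that were drawn in the current plot.
n_lines = len(plt.gca().get_lines())
print(n_lines)
```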

The appearance of a plot can be specified with the **third argument**, the "**format**" string (named `fmt` in the Matplotlib documentation). It is an **abbreviation** of the **color**, the **marker type**, and optionally the **line type**:

- Color:
  - `b` : blue
  - `g` : green
  - `r` : red
  - `c` : cyan
  - `m` : magenta
  - `y` : yellow
  - `k` : black
  - `w` : white

- Marker:
  - `.` : point marker
  - `,` : pixel marker
  - `o` : circle marker
  - `v` : triangle_down marker
  - `^` : triangle_up marker
  - `<` : triangle_left marker
  - `>` : triangle_right marker
  - `1` : tri_down marker
  - `2` : tri_up marker
  - `3` : tri_left marker
  - `4` : tri_right marker
  - `s` : square marker
  - `p` : pentagon marker
  - `*` : star marker
  - `h` : hexagon1 marker
  - `H` : hexagon2 marker
  - `+` : plus marker
  - `x` : x marker
  - `D` : diamond marker
  - `d` : thin diamond marker
  - `|` : vline marker
  - `_` : hline marker

- Line:
  - `-` : solid line
  - `--` : dashed line
  - `-.` : dash-dot line
  - `:` : dotted line

In [None]:
plt.plot('one', 'cub', 'o', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', '+', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', 'ro', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', 'bs', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', 'g^--', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', 'kx:', data = dat)
plt.show()

A simple **scatter plot** can also be made with the function `scatter()` but the arguments differ: the format needs to be separated into two arguments `color` and `marker`.

In [None]:
plt.scatter('one', 'cub', color = 'g', marker = '*', data = dat)
plt.show()

There are many more **colors** in Matplotlib than the eight above, of course. Colors can be specified in a multitude of ways. The following is taken from [this webpage](https://matplotlib.org/api/colors_api.html):

 - an **RGB** or **RGBA tuple** of **float values** in the closed interval [0, 1] (e.g. `(0.1, 0.2, 0.5)` or `(0.1, 0.2, 0.5, 0.3)`)
 - a **hex RGB** or **RGBA string** (e.g. `'#0F0F0F'` or `'#0F0F0F0F'`)
 - a **string** representation of a **float value** in (0, 1) inclusive for **gray level** (e.g. `'0.5'`)
 - one of {`'b'`, `'g'`, `'r'`, `'c'`, `'m'`, `'y'`, `'k'`, `'w'`}
 - a [X11/CSS4 color name](https://matplotlib.org/gallery/color/named_colors.html)
 - a name from the [xkcd color survey](https://xkcd.com/color/rgb/); prefixed with `xkcd:` (e.g., `'xkcd:sky blue'`)
 - one of {`'tab:blue'`, `'tab:orange'`, `'tab:green'`, `'tab:red'`, `'tab:purple'`, `'tab:brown'`, `'tab:pink'`, `'tab:gray'`, `'tab:olive'`, `'tab:cyan'`} which are the **Tableau Colors** from the "T10" categorical palette (which is the default color cycle)
 - a "**CN**" color spec, i.e. `C` followed by a single digit, which is an index into the default property cycle (`matplotlib.rcParams['axes.prop_cycle']`)

In [None]:
plt.plot('one', 'cub', color = (1, 0.65, 0), marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = '#40E0D0', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = '0.3', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = '0.7', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = 'k', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = 'lime', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = 'xkcd:ocean green', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = 'tab:olive', marker = 'D', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', color = 'C3', marker = 'D', data = dat)
plt.show()
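All these specifications end up as the same kind of value internally. As a small sketch, the function `to_rgba()` of the submodule `matplotlib.colors` converts any of them to an RGBA tuple, which makes it easy to compare them (the names `specs`, `black` and `gray` are just for this illustration):

```python
from matplotlib import colors

# The color specifications used in the examples above.
specs = [(1, 0.65, 0), '#40E0D0', '0.3', 'k', 'lime',
         'xkcd:ocean green', 'tab:olive', 'C3']

# Convert every specification to a tuple (red, green, blue, alpha).
for spec in specs:
    print(spec, '->', colors.to_rgba(spec))

black = colors.to_rgba('k')    # Single-letter abbreviation
gray = colors.to_rgba('0.3')   # Gray level given as a string
```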

It is also possible to select and even create **colormaps** in Matplotlib, but we will not cover that here.

If you use the `marker` argument, then you can select other **markers**. The full list can be found on [this webpage](https://matplotlib.org/api/markers_api.html). There are also some arguments for changing the appearance of the marker:

- `markeredgecolor`
- `markeredgewidth`
- `markerfacecolor`
- `markersize`
- Etc.

In [None]:
plt.plot('one', 'cub', marker = 'D', markersize = 1, data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', marker = 'D', markersize = 10,
         markeredgecolor = 'crimson', markeredgewidth = 2, data = dat)
plt.show()

The **line style** can also be selected with the argument `linestyle`. The **line width** can be changed with the argument `linewidth`.

In [None]:
plt.plot('one', 'cub', marker = '*', linestyle = '-.', data = dat)
plt.show()

In [None]:
plt.plot('one', 'cub', linestyle = '-.', linewidth = 3, data = dat)
plt.show()

The `plot()` function always creates (a list of) objects of the **class** `Line2D`. See all its attributes/arguments at [this webpage](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html).
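A short sketch of what this means in practice: the returned list can be stored, and the `Line2D` object in it can be modified afterwards with its `.set_...()` methods. The variable names `lines` and `line` are chosen for this example:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

plt.figure()    # Start from a fresh figure.

# plot() returns a list with one Line2D object per plotted dataset.
lines = plt.plot([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
line = lines[0]
print(isinstance(line, Line2D))

# The arguments of plot() correspond to setter methods of the object.
line.set_linestyle('--')
line.set_linewidth(3)
line.set_color('crimson')
print(line.get_linestyle(), line.get_linewidth(), line.get_color())
```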

As you can see, `plot()` by default selects (and updates) **axis scales** so that it can visualize all data coordinates. You can set **axis limits** yourself with the function `axis()` which requires a sequence of four numbers: the lower limit of the horizontal axis, its upper limit, the lower limit of the vertical axis and its upper limit.

In [None]:
plt.plot('one', 'cub', data = dat)
plt.axis( [-2, 7, -2, 100] )
plt.show()

The values in `axis()` in fact map onto the **methods** `.set_xlim()` and `.set_ylim()` in the **object-oriented approach**. The tick locations and formats can also be changed but we will not cover that here.
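The correspondence can be checked with a small sketch. It assumes the function `gca()` ("get current axes") of `pyplot`, which is not used elsewhere in this chapter, to retrieve the object behind the current plot:

```python
import matplotlib.pyplot as plt

plt.figure()    # Start from a fresh figure.
plt.plot([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
plt.axis([-2, 7, -2, 100])

# Retrieve the current Axes object and read back its limits.
axs = plt.gca()
x_limits = axs.get_xlim()
y_limits = axs.get_ylim()
print(x_limits, y_limits)
```

The limits read back from the object are the four numbers given to `axis()`, split into a pair for each axis.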

The **scale** of the axes can be changed with the functions `xscale()` and `yscale()` which take a string argument: `'linear'`, `'log'`, `'symlog'` or `'logit'`.

In [None]:
plt.plot('one', 'cub', data = dat)
plt.xscale('log')
plt.yscale('log')
plt.show()

The appearance of a graph can also be modified by adding **text** to it. The Matplotlib module has the functions `xlabel()`, `ylabel()`, `title()` and `text()`.

In [None]:
plt.plot('one', 'cub', data = dat)
plt.xlabel('x')
plt.ylabel('$y = x^3$')
plt.title('Cubic graph')
plt.text(4, 64, '64 = 4*4*4')
plt.show()

The `xlabel()` and `ylabel()` functions can be used with the argument `labelpad` which controls the distance to the axis:

In [None]:
plt.plot('one', 'cub', data = dat)
plt.xlabel('x', labelpad = 0.1)
plt.ylabel('$y = x^3$', labelpad = 12)
plt.show()

The `title()` function can be used with an argument `loc` in order to determine its position. It can take the values `'center'`, `'left'` or `'right'`.

In [None]:
plt.plot('one', 'cub', data = dat)
plt.title('Cubic graph', loc = 'left')
plt.show()

Furthermore, all the text functions can have arguments controlling the text properties:

- `color`
- `family` : one of `'serif'`, `'sans-serif'`, `'cursive'`, `'fantasy'` or `'monospace'`
- `fontname` : one of `'Courier'`, `'Helvetica'` , `'Times New Roman'`, ...
- `fontsize` : number or one of `'xx-small'`, `'x-small'`, `'small'`, `'medium'`, `'large'`, `'x-large'` or `'xx-large'`
- `fontstyle` : one of `'normal'`, `'italic'` or `'oblique'`
- `fontweight` : one of `'normal'`, `'bold'`, `'heavy'`, `'light'`, `'ultrabold'` or `'ultralight'`
- `horizontalalignment` : one of `'center'`, `'right'` or `'left'`
- `ha` : idem as `horizontalalignment`
- `name` : idem as `fontname`
- `rotation` : angle in degrees or one of `'vertical'` or `'horizontal'`
- `size` : idem as `fontsize`
- `style` : idem as `fontstyle`
- `variant` : one of `'normal'` or `'small-caps'`
- `verticalalignment` : one of `'center'`, `'top'`, `'bottom'`, `'baseline'` or `'center_baseline'`
- `va` : idem as `verticalalignment`
- `weight` : idem as `fontweight`
- Etc.

See the full list at [this webpage](https://matplotlib.org/tutorials/text/text_props.html).

In [None]:
plt.plot('one', 'cub', data = dat)
plt.title('Cubic graph', loc = 'left', size = 3)
plt.show()   # Poor layout

In [None]:
plt.plot('one', 'cub', data = dat)
plt.title('Cubic graph', size = 'x-large', style = 'oblique', weight = 'bold')
plt.ylabel('$y = x^3$', labelpad = 12, rotation = 'horizontal')
plt.show()

An important feature of the `text()` function is that it **cannot** take **sequences/vectors** of values. That means that you have to run a **loop** if you wish to depict several text labels:

In [None]:
plt.plot('one', 'cub', data = dat)
xx = [2, 3, 4]
yy = [8, 27, 64]
for x, y in zip(xx, yy):
    plt.text(x, y, str(y) + '=' + str(x) + '$^3$', ha = 'right')

plt.show()

If your plot contains multiple data sets, then you can add a **legend** with clarifying labels. The function `legend()` **automatically** figures out which line type or marker type belongs with which label:

In [None]:
plt.plot('one', 'two', data = dat, color = 'blue')
plt.plot('one', 'tri', data = dat, color = 'red')
plt.plot('one', 'cub', data = dat, color = 'green')
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In [None]:
plt.plot('one', 'two', 'bo', data = dat)
plt.plot('one', 'tri', 'r+', data = dat)
plt.plot('one', 'cub', 'g*', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In [None]:
plt.plot('one', 'two', 'bo-', data = dat)
plt.plot('one', 'tri', 'r+--', data = dat)
plt.plot('one', 'cub', 'g*:', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In fact, the `plot()` function can be used with an argument `label` which `legend()` will retrieve:

In [None]:
plt.plot('one', 'two', 'bo-', data = dat, label = 'Double')
plt.plot('one', 'tri', 'r+--', data = dat, label = 'Triple')
plt.plot('one', 'cub', 'g*:', data = dat, label = 'Cubic')
plt.legend() # No arguments necessary!
plt.show()

The `legend()` function also has an argument `loc` for its placement. It can take either a string value or a numeric code from the following list:

- `'best'` or `0`
- `'upper right'` or `1`
- `'upper left'` or `2`
- `'lower left'` or `3`
- `'lower right'` or `4`
- `'right'` or `5`
- `'center left'` or `6`
- `'center right'` or `7`
- `'lower center'` or `8`
- `'upper center'` or `9`
- `'center'` or `10`

In [None]:
plt.plot('one', 'two', 'bo-', data = dat)
plt.plot('one', 'tri', 'r+--', data = dat)
plt.plot('one', 'cub', 'g*:', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'], loc = 'center left' )
plt.show()

### 6.2 Statistical graphs with Matplotlib

Of course, Matplotlib has other plot types than the 2D line plot. For **univariate** data, there is the **histogram** with `hist()` and the **boxplot** with `boxplot()`:

In [None]:
scores = [7, 8, 9, 2, 10, 9, 9, 9, 9, 4, 5, 6, 1, 5, 6, 7, 8, 6, 1, 10]

In [None]:
plt.hist(scores)
plt.show()

In [None]:
plt.hist(scores, density = True)
plt.show()

In [None]:
plt.boxplot(scores)
plt.show()

In [None]:
plt.boxplot(scores, notch = True)
plt.show()

In [None]:
plt.boxplot(scores, vert = False)
plt.show()

The `boxplot()` function can also be used with a **matrix** in order to produce a **grouped boxplot**:

In [None]:
scores2 = [8, 5, 4, 7, 7, 8, 9, 10, 10, 9, 8, 5, 3, 9, 8, 5, 6, 10, 7, 9]
scr_grp = [ scores, scores2]
scr_grp

In [None]:
plt.boxplot(scr_grp, labels = ['Group 1', 'Group 2'])
plt.show()

For **categorical** data, there is the **bar chart** with both `bar()` and `barh()` and there is the **pie chart** with `pie()`:

In [None]:
plt.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
plt.show()

In [None]:
plt.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], height = [10, 30, 70, 90], width = 0.5)
plt.show()

In [None]:
plt.barh(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
plt.show()

In [None]:
plt.barh(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], width = [10, 30, 70, 90], height = 0.5)
plt.show()

Both a **stacked bar chart** and a **grouped bar chart** are only possible with the **methods** of either Matplotlib or pandas.
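As a preview of the object-oriented approach in section 6.4, here is a minimal sketch of both chart types with the `.bar()` method. The counts in `first` and `second` are made up for the illustration, and the variable names are assumptions of this example:

```python
import matplotlib.pyplot as plt

groups = ['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4']
first = [10, 30, 70, 90]     # Hypothetical counts of a first series
second = [20, 25, 15, 40]    # Hypothetical counts of a second series

# One Figure with two plots next to each other.
fig, (ax_stack, ax_group) = plt.subplots(1, 2)

# Stacked: the second series starts where the first one ends.
ax_stack.bar(groups, first, label = 'First')
ax_stack.bar(groups, second, bottom = first, label = 'Second')
ax_stack.set_title('Stacked')
ax_stack.legend()

# Grouped: use numeric positions and shift each series by half a bar width.
width = 0.4
pos = list(range(len(groups)))
ax_group.bar([p - width / 2 for p in pos], first, width = width, label = 'First')
ax_group.bar([p + width / 2 for p in pos], second, width = width, label = 'Second')
ax_group.set_xticks(pos)
ax_group.set_xticklabels(groups)
ax_group.set_title('Grouped')
ax_group.legend()
```

With pandas the same charts take a single call, as shown in section 6.5.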

In [None]:
plt.pie([10, 30, 70, 90])
plt.show()

In [None]:
plt.pie([10, 30, 70, 90], explode = [0, 0, 0, 0], labels = ['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'],
        colors = ['blue', 'red', 'green', 'purple'])
plt.show()

In [None]:
plt.pie([10, 30, 70, 90], explode = [0, 0.1, 0, 0], labels = ['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'],
        colors = ['blue', 'red', 'green', 'purple'])
plt.show()

There are many more plot types in Matplotlib, but we will not cover them here. See the overview on [this webpage](https://matplotlib.org/tutorials/introductory/sample_plots.html).

### 6.3 Style sheets

A final way of customizing graphs is by using **style sheets**. The following style sheets are supported (the exact list depends on your Matplotlib version):

- `'bmh'`
- `'classic'`
- `'dark_background'`
- `'fast'`
- `'fivethirtyeight'`
- `'ggplot'`
- `'grayscale'`
- `'seaborn'` with its variants (`'seaborn-bright'`, `'seaborn-darkgrid'` etc.)
- `'Solarize_Light2'`
- `'tableau-colorblind10'`
- `'_classic_test'`
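Because the list differs between Matplotlib versions (the `'seaborn'` styles in particular have been renamed in later releases), it is safest to ask your own installation. A minimal sketch, using the attribute `available` of the submodule `style`:

```python
import matplotlib.pyplot as plt

# List of the style sheet names known to this Matplotlib installation.
styles = plt.style.available
print(styles)

# Check for a particular style before using it.
print('ggplot' in styles)
```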

You import a style sheet with the function `use()` of the submodule `style`:

In [None]:
plt.style.use('dark_background')

In [None]:
plt.plot('one', 'two', data = dat)
plt.plot('one', 'tri', data = dat)
plt.plot('one', 'cub', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In [None]:
plt.hist(scores)
plt.show()

In [None]:
plt.boxplot(scr_grp, labels = ['Group 1', 'Group 2'])
plt.show()    # Poor choice

In [None]:
plt.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
plt.show()

ggplot style:

In [None]:
plt.style.use('ggplot')

In [None]:
plt.plot('one', 'two', data = dat)
plt.plot('one', 'tri', data = dat)
plt.plot('one', 'cub', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In [None]:
plt.hist(scores)
plt.show()

In [None]:
plt.boxplot(scr_grp, labels = ['Group 1', 'Group 2'])
plt.show()

In [None]:
plt.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
plt.show()

Bayesian Methods for Hackers style:

In [None]:
plt.style.use('bmh')

In [None]:
plt.plot('one', 'two', data = dat)
plt.plot('one', 'tri', data = dat)
plt.plot('one', 'cub', data = dat)
plt.legend( ['Double', 'Triple', 'Cubic'] )
plt.show()

In [None]:
plt.hist(scores)
plt.show()

In [None]:
plt.boxplot(scr_grp, labels = ['Group 1', 'Group 2'])
plt.show()

In [None]:
plt.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
plt.show()

Etc.

Default style of matplotlib:

In [None]:
plt.style.use('default')

One can also change the graphical settings with the attribute `rcParams` and/or the `matplotlibrc` file. We will not cover that here.

### 6.4 Object-oriented approach

All of `pyplot`'s functions are essentially wrappers around Matplotlib's **methods**, which offer much more flexibility. Incidentally, the **object-oriented** API/approach is also the way to visualize **multiple plots** in a window.

The object-oriented creation of graphs always starts with objects of the class `Figure` and `Axes`. The first is made with the function `figure()`:

In [None]:
fig = plt.figure()    # Empty Figure object

The `figure()` function can take an argument `figsize` specifying the width and the height (in inches) of the Figure window:

In [None]:
fig = plt.figure(figsize = [7.5, 7.5])
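With a square Figure the order of the two numbers does not show. The following sketch therefore uses a wide Figure and reads the size back with the method `.get_size_inches()`; the values 8 and 4 are arbitrary:

```python
import matplotlib.pyplot as plt

# figsize is given as [width, height] in inches.
fig = plt.figure(figsize = [8, 4])

# Read the size back from the Figure object.
width, height = fig.get_size_inches()
print(width, height)
```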

The next step is to divide the Figure up into "plots". In Matplotlib these are objects of the class `Axes`. An Axes object is made with the Figure method `.add_subplot()` which needs to be specified with at least three arguments:

- The first argument is the number of **plot rows** in your figure.
- The second argument is the number of **plot columns** in your figure.
- The third argument is the **sequence number** of the individual plot in question.

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)    # Plot with axes

In [None]:
type(fig)

In [None]:
type(axs)

The Axes object can also be made with the function `axes()` which, when called without arguments, is just a wrapper around the `.add_subplot()` method.

In [None]:
fig = plt.figure()
axs = plt.axes()

In [None]:
type(fig)

In [None]:
type(axs)

The `.add_subplot()` method can even be specified with one three-digit **integer** instead of three arguments (older Matplotlib versions also accepted the string `'111'`, but recent versions no longer do):

In [None]:
fig = plt.figure()
axs = fig.add_subplot(111)

In [None]:
type(fig)

In [None]:
type(axs)

The Axes objects have many **methods** for customizing the region of the plot. The most useful ones are `.set_xlim()`, `.set_ylim()`, `.set_xlabel()`, `.set_ylabel()` and `.set_title()`.

The `.set_xlim()` and `.set_ylim()` methods set the **limits** of the **horizontal** and the **vertical axis**, respectively. (As mentioned above, the `axis()` function calls these methods, in fact.)

In [None]:
axs.set_xlim(-2, 8)

In [None]:
axs.set_ylim(-5, 145)

The `.set_xlabel()` and `.set_ylabel()` methods do the same as the `xlabel()` and `ylabel()` functions, respectively:

In [None]:
axs.set_xlabel('x', labelpad = 10)

In [None]:
axs.set_ylabel('$y = x^3$', labelpad = 12, rotation = 'horizontal')

Similarly, the `.set_title()` method corresponds to the `title()` function:

In [None]:
axs.set_title('Cubic function', loc = 'left', weight = 'bold')

An Axes object also has a method `.plot()` which adds data to the figure:

In [None]:
axs.plot('one', 'cub', 'g*:', data = dat)

The Figure itself can then finally be **visualized** by simply **calling/printing** it (or by calling `fig.show()` for an external window):

In [None]:
fig

However, it is customary for the `.plot()` method to end a **code block** starting with the `figure()` function:

In [None]:
fig = plt.figure()
axs = fig.add_subplot(111)    # Three-digit integer: 1 row, 1 column, plot 1
axs.set_xlim(-2, 8)
axs.set_ylim(-5, 145)
axs.set_xlabel('x', labelpad = 10)
axs.set_ylabel('$y = x^3$', labelpad = 12, rotation = 'horizontal')
axs.set_title('Cubic function', loc = 'left', weight = 'bold')
axs.plot('one', 'cub', 'g*:', data = dat)

Besides the `.plot()` method, Axes objects have methods for **other plot types** as well:

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)
axs.hist(scores)

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)
axs.barh(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])

Another possibility is to use these methods **multiple times** for visualizing **different data** (this works because the Figure will be updated as long as it stays open):

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)
axs.set_xlim(-2, 8)
axs.set_ylim(-5, 145)
axs.plot('one', 'two', 'bo-', data = dat)
axs.plot('one', 'tri', 'r+--', data = dat)
axs.plot('one', 'cub', 'g*:', data = dat)

The `.legend()` method of Axes objects helps to clarify such a plot (identically to the `legend()` function):

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)
axs.set_xlim(-2, 8)
axs.set_ylim(-5, 145)
axs.plot('one', 'two', 'bo-', data = dat)
axs.plot('one', 'tri', 'r+--', data = dat)
axs.plot('one', 'cub', 'g*:', data = dat)
axs.legend(['Double', 'Triple', 'Cubic'], loc = 'center right')

This `.legend()` method is again compatible with the `label` argument of the `.plot()` method:

In [None]:
fig = plt.figure()
axs = fig.add_subplot(1, 1, 1)
axs.set_xlim(-2, 8)
axs.set_ylim(-5, 145)
axs.plot('one', 'two', 'bo-', data = dat, label = 'Double')
axs.plot('one', 'tri', 'r+--', data = dat, label = 'Triple')
axs.plot('one', 'cub', 'g*:', data = dat, label = 'Cubic')
axs.legend(loc = 'center right')

Working with Figure and Axes objects is necessary when one wants to arrange **multiple plots** in **one window**. Then the `.add_subplot()` method takes values **larger than one** for the first and/or the second argument:

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
ax1.plot('one', 'two', 'bo-', data = dat)
ax2.plot('one', 'tri', 'r+--', data = dat)
ax3.plot('one', 'cub', 'g*:', data = dat)
ax4.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])

Such a "plot matrix" can be given a **main title** with the method `.suptitle()` (of the Figure object):

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
ax1.plot('one', 'two', 'bo-', data = dat)
ax2.plot('one', 'tri', 'r+--', data = dat)
ax3.plot('one', 'cub', 'g*:', data = dat)
ax4.bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])
fig.suptitle('Four plots in 2x2', size = 'x-large', weight = 'heavy')

For the arrangement of multiple plots in one Figure, Matplotlib also has the **single function** `subplots()`. It is specified with the number of plot rows and columns in the Figure (the default is `1, 1` for a single plot). Its return value is the **tuple** of the Figure object and the (array of) Axes objects.

In [None]:
fig, axs_lst = plt.subplots(2, 2)

In [None]:
type(fig)

In [None]:
type(axs_lst)

In [None]:
axs_lst

In [None]:
type(axs_lst[0, 0])

In [None]:
fig, axs_lst = plt.subplots(2, 2)
axs_lst[0,0].plot('one', 'two', 'bo-', data = dat)
axs_lst[1,0].plot('one', 'tri', 'r+--', data = dat)
axs_lst[0,1].plot('one', 'cub', 'g*:', data = dat)
axs_lst[1,1].bar(['Grp 1', 'Grp 2', 'Grp 3', 'Grp 4'], [10, 30, 70, 90])

Let us again work with the `iris` dataset (which should be in your working directory):

In [None]:
iris = pd.read_table('iris.csv', sep = ';')

In [None]:
fig, axs_lst = plt.subplots(2, 2)
axs_lst[0, 0].plot('Sepal_Length', 'Sepal_Width', 'g.', data = iris)
axs_lst[1, 0].plot('Sepal_Length', 'Petal_Length', 'g.', data = iris)
axs_lst[0, 1].plot('Petal_Width', 'Sepal_Width', 'g.', data = iris)
axs_lst[1, 1].plot('Petal_Width', 'Petal_Length', 'g.', data = iris)
axs_lst[0, 0].set_ylabel('Sepal width')
axs_lst[1, 0].set_ylabel('Petal length')
axs_lst[1, 0].set_xlabel('Sepal length')
axs_lst[1, 1].set_xlabel('Petal width')
fig.suptitle('Iris dataset', size = 'xx-large')

Poor layout:

In [None]:
ind_set = iris.Species == 'setosa'
ind_ver = iris.Species == 'versicolor'
ind_vir = iris.Species == 'virginica'
fig3, ax3_lst = plt.subplots(2, 2)
ax3_lst[0, 0].set_title('Setosa')
ax3_lst[0, 1].set_title('Versicolor')
ax3_lst[1, 1].set_title('Virginica')
ax3_lst[0, 0].plot(iris.Sepal_Length[ind_set], iris.Sepal_Width[ind_set], 'g.')
ax3_lst[0, 1].plot(iris.Sepal_Length[ind_ver], iris.Sepal_Width[ind_ver], 'g.')
ax3_lst[1, 1].plot(iris.Sepal_Length[ind_vir], iris.Sepal_Width[ind_vir], 'g.')

Better layout:

In [None]:
fig3 = plt.figure()
ax1 = fig3.add_subplot(2, 2, 1)
ax2 = fig3.add_subplot(2, 2, 2)
ax3 = fig3.add_subplot(2, 2, 3)
ax1.set_xlim(4, 8)
ax1.set_ylim(2, 6)
ax2.set_xlim(4, 8)
ax2.set_ylim(2, 6)
ax3.set_xlim(4, 8)
ax3.set_ylim(2, 6)
ax1.set_title('Setosa', color = 'green')
ax2.set_title('Versicolor', color = 'red')
ax3.set_title('Virginica', color = 'blue')
ax1.plot(iris.Sepal_Length[ind_set], iris.Sepal_Width[ind_set], 'g.')
ax2.plot(iris.Sepal_Length[ind_ver], iris.Sepal_Width[ind_ver], 'r.')
ax3.plot(iris.Sepal_Length[ind_vir], iris.Sepal_Width[ind_vir], 'b.')
fig3.suptitle('Sepal length x Sepal width')

Best layout:

In [None]:
fig3,  axs = plt.subplots()    # 1x1 plot
axs.set_xlim([4, 8])
axs.set_ylim([2, 6])
axs.set_xlabel('Sepal length')
axs.set_ylabel('Sepal width')
axs.plot(iris.Sepal_Length[ind_set], iris.Sepal_Width[ind_set], 'g.', label = 'Setosa')
axs.plot(iris.Sepal_Length[ind_ver], iris.Sepal_Width[ind_ver], 'r.', label = 'Versicolor')
axs.plot(iris.Sepal_Length[ind_vir], iris.Sepal_Width[ind_vir], 'b.', label = 'Virginica')
axs.legend(loc = 'upper right')

### 6.5 Graphical methods in pandas

A convenient feature of pandas is that it has compatible **methods** for **graphical output**. For instance, the scatter plot above can also be made as follows:

In [None]:
iris.plot('Sepal_Length', 'Sepal_Width', kind = 'scatter')
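The pandas methods are compatible with Matplotlib in the sense that they return Matplotlib objects. As a sketch with a small made-up DataFrame `df` (so that the block runs without the `iris` file), the returned Axes object can be customized with the methods from section 6.4:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# Small made-up dataset for the illustration.
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1, 8, 27, 64, 125]})

# The pandas method draws the plot and returns the Axes object.
axs = df.plot('x', 'y', kind = 'scatter')
print(isinstance(axs, Axes))

# The usual Axes methods can then be applied to it.
axs.set_title('Cubic function', loc = 'left', weight = 'bold')
axs.set_ylabel('$y = x^3$')
```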

**Histograms** and **boxplots** can be made by changing the `kind` argument. There are also **density plots** in pandas. (All three plot types are available as separate methods of `.plot` as well.)

In [None]:
iris['Sepal_Length'].plot(kind = 'hist')

In [None]:
iris['Sepal_Length'].plot(kind = 'box')

In [None]:
iris['Sepal_Length'].plot(kind = 'kde')    # Takes a little time to compute.

In [None]:
iris['Sepal_Length'].plot.hist()

In [None]:
iris['Sepal_Length'].plot.box()

In [None]:
iris['Sepal_Length'].plot.kde()

Histograms and boxplots can also be made with the DataFrame methods `.hist()` and `.boxplot()`:

In [None]:
iris.hist('Sepal_Length')

In [None]:
iris.hist('Sepal_Length', by = 'Species')

In [None]:
iris.boxplot('Sepal_Length')

In [None]:
iris.boxplot('Sepal_Length', by = 'Species')

**Bar plots** visualize frequencies of various groups, so we use the categorical variables in the dataset `chol.txt`:

In [None]:
chol = pd.read_table('chol.txt')
print(chol)

In [None]:
chol_tab = pd.crosstab(chol.Mortality, chol.SmokeGroup)
print(chol_tab)

In [None]:
smok_frq = chol_tab.sum(axis = 0)
print(smok_frq)

In [None]:
smok_frq.plot(kind = 'bar')

In [None]:
smok_frq.plot(kind = 'bar', rot = 0, color = ['green', 'orange', 'red'])

In [None]:
smok_frq.plot.bar(rot = 0, color = ['green', 'orange', 'red'])

In [None]:
smok_frq.plot(kind = 'barh', color = ['green', 'orange', 'red'])

In [None]:
smok_frq.plot.barh(color = ['green', 'orange', 'red'])

In fact, you can make **grouped** or **stacked** bar plots for several variables. However, pandas groups all **columns per row**, so one may have to transpose the data object first.

In [None]:
chol_tab.plot(kind = 'bar', rot = 0, color = ['green', 'orange', 'red'])

In [None]:
chol_swp = chol_tab.T
print(chol_swp)

In [None]:
chol_swp.plot(kind = 'bar', rot = 0)

In [None]:
chol_swp.plot(kind = 'bar', rot = 0, stacked = True)

In [None]:
chol_swp.plot(kind = 'bar', rot = 0, stacked = True, color = ['blue', 'red'])

In [None]:
chol_swp.plot(kind = 'bar', rot = 0, subplots = True, color = ['green', 'orange', 'red'])

Pie plots can be made in the same way (by setting `kind` equal to `'pie'`) but we will not go into the details here.

### Exercises

12. Graphics of the `cholesterol` dataset

  12.1 Make histograms and boxplots of the variable `Cholesterol` for every category in `SmokeGroup`.

  12.2 Make a scatterplot of the variables `Age` and `Cholesterol` (on the x-axis and y-axis respectively). Separate the three categories in `SmokeGroup` by means of colors, plotting symbols etc. and add a legend.


13. Graphics of the `substance` dataset

  13.1 Read in the dataset `substance.txt` and construct three frequency tables of `Gender` with the three substances. Create a single Figure with three subplots, each showing the bar chart of the genders in the `Yes` columns.

  13.2 Create a single (grouped) bar chart showing the frequencies of `Race` for the three substances. The bars of the three substances should be grouped per race. You again need to use only the `Yes` frequencies.