# L07 - Basic Visualization 

Credits to Ben Root and the [Anatomy of Matplotlib](https://github.com/matplotlib/AnatomyOfMatplotlib) tutorial on which this document is based, as well as to previous lecturers of this course, who have put much work into this document. *The Anatomy of Matplotlib* is also available as a [3h video talk](https://www.youtube.com/watch?v=6gdNUDs6QPc) in case that suits your learning style better.

This week is about basic visualization, which in our case mostly means learning how to use the package [matplotlib](https://matplotlib.org/).

## Introduction

`matplotlib` is a library for producing publication-quality figures. It can create static, animated, and interactive plots - we will focus on static plots this week. Its great strength is its comprehensiveness and modifiability. This makes it possible to create virtually any kind of plot in Python, if one is ready to invest time into understanding the low-level API and tinkering with it.

The most often used part of matplotlib is the `pyplot` submodule, commonly imported as follows:

In [None]:
import matplotlib.pyplot as plt

If that didn't work, close this notebook, shut down the server and install matplotlib with `pip install matplotlib`. Alternatively, you can also try executing the cell below:

In [None]:
!pip install matplotlib

## Matplotlib Resources

The [matplotlib.org](http://matplotlib.org) project website is the primary online resource for the library's documentation. It contains [tutorials](https://matplotlib.org/stable/tutorials/index.html), [FAQs](https://matplotlib.org/stable/faq/index.html), [API documentation](http://matplotlib.org/stable/api/index.html), and, most importantly, the [gallery](http://matplotlib.org/stable/gallery/index.html).

### Gallery
Many users of Matplotlib are often faced with the question, "I want to make a figure that has X with Y in the same figure, but it needs to look like Z". Good luck getting an answer from a web search with that query! This is why the [gallery](https://matplotlib.org/stable/gallery/index.html) is so useful: it showcases the variety of ways one can make figures. Browse through the gallery and click on any figure that has pieces of what you want in order to see the code that generated it. Soon enough, you will be like a chef, mixing and matching components to produce your masterpiece!


### Main Docs

* Documentation of `plt`: [matplotlib.pyplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html)
* Documentation of `Axes`: [matplotlib.axes](https://matplotlib.org/stable/api/axes_api.html)
* Documentation of `Figure`: [matplotlib.figure](https://matplotlib.org/stable/api/figure_api.html)

### StackOverflow
Another community resource is [StackOverflow](http://stackoverflow.com/questions/tagged/matplotlib), so if you need to build up karma points, submit your questions here, and help others out too!

### Github Repository
[Matplotlib](https://github.com/matplotlib) is hosted on GitHub. If you think you found a bug, you can also submit an issue there.



## Note on Backends in Jupyter Notebooks

Matplotlib has multiple backends. The backends allow matplotlib to be used on a variety of platforms with a variety of GUI toolkits (GTK, Qt, Wx, etc.), all of them written so that most of the time, you will not need to care which backend you are using. 

In [None]:
import matplotlib

print(matplotlib.__version__)
print(matplotlib.get_backend())

Normally we wouldn't need to think about this too much, but IPython/Jupyter notebooks behave a touch differently than "normal" Python. In JupyterLab notebooks, we either want to use the `inline` backend to statically embed figures in the notebook, or the `widget` backend to get interactive figures.


We can do this in two ways:

1. The IPython ``%matplotlib [BACKEND-NAME]`` magic command

   &rarr; Figures will be shown automatically, even if you don't call ``plt.show()``.  
   
     
2. ``matplotlib.use("[BACKEND-NAME]")``

   &rarr; Figures will only be shown when you call ``plt.show()``.

We will be using the `inline` backend in this notebook - you can activate it by running the cell below.

However, it's important to note that when you use matplotlib in a plain `.py` script, you will always have to run `plt.show()` to see plots. 

In [None]:
%matplotlib inline

You can find more information on the different matplotlib backends [here](https://matplotlib.org/stable/tutorials/introductory/usage.html).

# Anatomy of a Plot

Matplotlib is a large project and can seem daunting at first. However, by learning the components, it should begin to feel much smaller and more approachable.

People use "plot" to mean many different things.  Here, we'll be using a consistent terminology (mirrored by the names of the underlying classes, etc):

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/figure_axes_axis_labeled.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/figure_axes_axis_labeled.png)

The ``Figure`` is the top-level container in this hierarchy.  It is the overall window/page that everything is drawn on.  You can have multiple independent figures and ``Figure``s can contain multiple ``Axes``. 

Most plotting occurs on an ``Axes``.  The axes is effectively the area that we plot data on and any ticks/labels/etc associated with it.  Usually we'll set up an `Axes` with a call to `plt.subplots`, so the names **axes** and **subplot** can mostly be used interchangeably.

Each ``Axes`` has an ``XAxis`` and a ``YAxis``.  These contain the ticks, tick locations, labels, etc.  In this tutorial, we'll mostly control ticks, tick labels, and data limits through other mechanisms, so we won't touch the individual ``Axis`` part of things all that much.  However, it is worth mentioning here to explain where the term ``Axes`` comes from.
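The hierarchy described above can be inspected directly on the objects. The following is a minimal sketch, assuming only a default call to `plt.subplots` (which is introduced in more detail below):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# the Figure keeps a list of all Axes it contains
print(fig.axes)

# each Axes knows which Figure it belongs to
print(ax.figure is fig)

# each Axes owns one XAxis and one YAxis object
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)

plt.show()
```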


## Getting Started

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
fig = plt.figure()

Awww, nothing happened! This is because by default matplotlib will not show anything until told to do so, as we mentioned earlier in the "backend" discussion.

Instead, we'll need to call ``plt.show()``

In [None]:
plt.show()

Still nothing. That is because the notebook does not want to show empty figures. In a plain python script, you would actually get an empty figure here. To see the figure in the notebook, we need to add some axes...

## Axes

All plotting is done with respect to an [Axes](https://matplotlib.org/stable/api/axes_api.html). An `Axes` is made up of multiple [Axis](http://matplotlib.org/stable/api/axis_api.html) objects (such as `XAxis` and `YAxis`). An `Axes` object must belong to exactly one `Figure`. Most commands you will ever use will be with respect to an `Axes` object.

You can either set up a `Figure` and add `Axes` to it later, or - more commonly done - create both in one go with `plt.subplots`, which returns one `Figure` and one `Axes` object when called without arguments.

In [None]:
fig = plt.figure()
ax = fig.add_subplot()

# alternatively: fig, ax = plt.subplots()

plt.show()

You can control the size of the figure through the ``figsize`` argument, which expects a tuple of ``(width, height)`` in inches. 

In [None]:
fig = plt.figure(figsize=(5, 10))
ax = fig.add_subplot()
plt.show()
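To check which size a figure actually has, you can ask the `Figure` itself. This small sketch reads the size back with `get_size_inches` and converts it to pixels using the figure's `dpi` attribute (dots per inch); the printed pixel values depend on your backend and settings:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 10))

# size in inches, as passed to figsize
width, height = fig.get_size_inches()
print(width, height)

# the size in pixels depends on the resolution (dots per inch) of the figure
print(width * fig.dpi, height * fig.dpi)

plt.show()
```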

Matplotlib's objects typically have lots of "explicit setters" -- in other words, functions that start with ``set_<something>`` and control a particular option. 

To demonstrate this (and as an example of IPython's tab-completion), try typing `ax.set_` in a code cell, then hit the `<Tab>` key.  You'll see a long list of `Axes` methods that start with `set`.

For example, we can set the limits, the title, and the labels of an `Axes` with one setter call each:

In [None]:
fig, ax = plt.subplots()

ax.set_xlim([0.5, 4.5])
ax.set_ylim([-2, 8])
ax.set_title("An Example Axes")
ax.set_ylabel("Y-Axis")
ax.set_xlabel("X-Axis")

plt.show()

Clearly this can get repetitive quickly.  Therefore, Matplotlib's `set` method can be very handy.  It takes each kwarg you pass it and tries to call the corresponding "setter".  For example, `ax.set(foo='bar')` would call `ax.set_foo('bar')`.

Note that the `set` method doesn't just apply to `Axes`; it applies to more-or-less all matplotlib objects.

However, there are cases where you'll want to use things like `ax.set_xlabel('Some Label', size=25)` to control other options for a particular function.

In [None]:
fig, ax = plt.subplots()

ax.set(
    xlim=[0.5, 4.5], 
    ylim=[-2, 8], 
    title="An Example Axes",
    ylabel="Y-Axis", 
    xlabel="X-Axis",
)

plt.show()
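Both styles can be mixed. As a sketch of the case mentioned above, the general properties are set with `set`, while the labels use their individual setters so that a font size can be passed along (the value 25 is just an example):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# general properties in one call ...
ax.set(xlim=[0.5, 4.5], ylim=[-2, 8], title="An Example Axes")

# ... and individual setters where extra options such as the font size are needed
ax.set_xlabel("X-Axis", size=25)
ax.set_ylabel("Y-Axis", size=25)

plt.show()
```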

## Basic Plotting

Most plotting happens on an `Axes`.  Therefore, if you're plotting something on an axes, then you'll use one of its methods.

We'll talk about different plotting methods in more depth in the next section.  For now, let's focus on two methods: `plot` and `scatter`.

`plot` draws points and **interpolates lines** in between them - that's why the resulting plots are also called `lineplots`.  

`scatter` draws unconnected points, optionally scaled or colored by additional variables, a `scatterplot`.  

As a basic example:

In [None]:
fig, ax = plt.subplots(figsize=(9, 6))

x = [1, 2, 3, 4]
y = [10, 20, 25, 35]

ax.plot(x, y)

plt.show()
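For comparison, here is a sketch of the same data drawn with `scatter` instead of `plot`. The points are no longer connected, and the returned object holds one position per point:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 35]

fig, ax = plt.subplots(figsize=(9, 6))

# the same points as before, drawn as unconnected markers
points = ax.scatter(x, y)

# one (x, y) position per marker
print(points.get_offsets())

plt.show()
```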

Multiple calls to a plotting method on the same `Axes` draw into the same plot. This works with many methods in matplotlib (`plot`, `scatter`, `bar`, etc.), and each call is assigned a different color.

In [None]:
fig, ax = plt.subplots(figsize=(9, 6))

ax.plot([10, 20, 25, 35]) # when no x is given, an integer range is assumed as x
ax.plot([65, 34, 12, -12])

plt.show()

## `Axes` methods (object-oriented interface) vs. `plt` (state-machine interface)

Many methods of an `Axes` object can also be called on the `plt` module itself.  

For example, when calling `plt.xlim(1, 10)`, `plt` calls `ax.set_xlim(1, 10)` on whichever `Axes` is *current*. Here is an equivalent version of the above example using just `pyplot`.

In [None]:
plt.figure()
plt.plot([10, 20, 25, 35])
plt.plot([65, 34, 12, -12])
plt.show()

Much cleaner, and much clearer! So, why will most of my examples not follow the pyplot approach? Because [PEP20](http://www.python.org/dev/peps/pep-0020/) *The Zen of Python* says:

``Explicit is better than implicit``

While very simple plots in short scripts benefit from the conciseness of the implicit `plt` approach, for more complicated plots or larger scripts you will want to explicitly pass around the `Axes` and/or `Figure` object to operate upon.

The advantage of keeping which axes we're working with very clear in our code will become more obvious when we start to have multiple axes in one figure.
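A small sketch of the difference, assuming a figure with two side-by-side axes (grids of subplots are covered in the next section). With the implicit interface, `plt.plot` draws on whichever `Axes` is current, which here should be the one that was created last. With the explicit interface, the target is visible in the code:

```python
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(9, 4))

# implicit: pyplot draws on the "current" Axes
plt.plot([10, 20, 25, 35])
print(plt.gca() is ax_right)

# explicit: the target Axes is named in the code itself
ax_left.plot([65, 34, 12, -12])

plt.show()
```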

Fun fact: The *Zen of Python* is ingrained in the very heart of Python:

In [None]:
import this

## Multiple Subplots

We've mentioned before that a `Figure` can have more than one `Axes` on it.  If you want your axes to be on a regular grid system, then it's easiest to use `plt.subplots` to create a `Figure` and `Axes` on it automatically.

For example:

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2)
plt.show()

`plt.subplots` created a new figure and added 4 subplots to it.  The `Axes` object that was returned is a 2D numpy object array.  Each item in the array is one of the subplots.  They're laid out as you see them on the figure.  

Therefore, when we want to work with one of these axes, we can index the `Axes` array and use that item's methods.

For example:

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2)

axes[0, 0].set(title="Upper Left") # each subplot can have its individual title, labels, etc.
axes[0, 1].set(title="Upper Right")
axes[1, 0].set(title="Lower Left")
axes[1, 1].set(title="Lower Right")

# tight_layout makes sure that things do not overlap - call it after you have finished everything else
fig.tight_layout()

plt.show()
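When all subplots should be treated the same way, indexing each one by hand becomes tedious. As a sketch, the array of axes can also be iterated over with NumPy's `flat` attribute, which visits the subplots row by row:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2)

# axes.flat iterates over the 2D array row by row
for index, ax in enumerate(axes.flat):
    ax.set(title=f"Subplot {index}")

fig.tight_layout()
plt.show()
```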

Speaking of titles, you can also set a **supertitle** or `suptitle` for an entire `Figure`.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2)

axes[0, 0].set(title="Upper Left")
axes[0, 1].set(title="Upper Right")
axes[1, 0].set(title="Lower Left")
axes[1, 1].set(title="Lower Right")

fig.suptitle("Four Subplots")

fig.tight_layout()

plt.show()

## Sharing an Axis

Multiple subplots can *share* an axis so as to prevent duplication of ticks and to make them comparable. `plt.subplots` has the kwargs `sharex` and `sharey` for that.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(9, 6))

axes[0].set(title="Upper Plot")
axes[1].set(title="Lower Plot")

fig.suptitle("Two Subplots")

fig.tight_layout()

plt.show()
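Sharing an axis also links the data limits. In this sketch with made-up data, changing the x-limits of the upper subplot changes those of the lower one as well:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(9, 6))

axes[0].plot([0, 1, 2], [0, 1, 4])
axes[1].plot([0, 5, 10], [3, 2, 1])

# changing the x-limits of one subplot also changes them for the other one
axes[0].set_xlim(0, 10)
print(axes[1].get_xlim())

plt.show()
```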

As you may have observed, the `Axes` array returned by `plt.subplots` is *squeezed*, meaning that dimensions with only a single element are dropped. A single subplot is therefore returned as a plain `Axes` object rather than as an array.

In [None]:
fig, four_plots = plt.subplots(nrows=2, ncols=2)
fig, two_plots = plt.subplots(nrows=2, ncols=1)
fig, one_plot = plt.subplots(nrows=1, ncols=1)

print(four_plots.shape)
print(two_plots.shape)
print(type(one_plot))
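If code should work for any grid size, the squeezing can get in the way. As a sketch, passing `squeeze=False` to `plt.subplots` keeps the result two-dimensional, so that `[row, column]` indexing always works:

```python
import matplotlib.pyplot as plt

# without squeezing, even a single row is returned as a 2D array
fig, axes = plt.subplots(nrows=1, ncols=3, squeeze=False)
print(axes.shape)

# the same [row, column] indexing now works for any grid size
axes[0, 2].set(title="Third Column")

fig.tight_layout()
plt.show()
```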

# Types of Plots

We've talked a lot about laying things out, etc, but we haven't talked about actually plotting data yet. Matplotlib has a number of different plotting functions -- many more than we'll cover here, in fact. However, a full list and/or the gallery can be a bit overwhelming at first. Instead we'll condense it down and give you a look at some of the ones you're most likely to use. Below are some **common examples** of plot types:

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/plot_example.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/plot_example.png)

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/scatter_example.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/scatter_example.png)

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/vector_example.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/vector_example.png)

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/statistical_example.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/statistical_example.png)

![https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/imshow_example.png](https://github.com/matplotlib/AnatomyOfMatplotlib/raw/master/images/imshow_example.png)

## Input Data: 1D Series

We've briefly mentioned `ax.plot(x, y)` and `ax.scatter(x, y)` to draw lines and points, respectively.  We'll cover some of their options (markers, colors, linestyles, etc) in the next section. Let's move on to a couple of other common plot types.

### Bar Plots: `Axes.bar` and `Axes.barh`

Bar plots are one of the most common plot types.  Matplotlib's `ax.bar` method can also plot general rectangles, but the default is optimized for a simple sequence of x, y values, where the rectangles have a constant width.  There's also `ax.barh` (for horizontal), which makes a constant-height assumption instead of a constant-width assumption.

### Simple Bar Plot

In [None]:
import numpy as np

np.random.seed(2)

x = np.arange(5)
y = np.random.random(5) * 2

fig, ax = plt.subplots()

ax.bar(x, y)

plt.show()

In [None]:
x

In [None]:
y
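The horizontal variant `ax.barh` mentioned above works the same way. In this sketch with made-up values, the first argument gives the position of each bar on the y-axis and the second one its length:

```python
import matplotlib.pyplot as plt
import numpy as np

positions = np.arange(5)
values = np.array([0.9, 0.1, 1.1, 0.9, 0.8])  # made-up example values

fig, ax = plt.subplots()

# for barh, the first argument gives the y-positions and the second the bar lengths
horizontal_bars = ax.barh(positions, values)

plt.show()
```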

Adding errorbars:

In [None]:
x = np.arange(5)
y = np.random.random(5) * 2
error = y * 0.1

fig, ax = plt.subplots()

ax.bar(x, y, yerr=error)

plt.show()

If we have negative values, we can use `axhline` to draw an axis "spine" to mark the zero line.

In [None]:
x = np.arange(5)
y = np.random.randn(5)

fig, ax = plt.subplots()

ax.bar(x, y)
ax.axhline(y=0, color='black', linewidth=2)

plt.show()

Matplotlib plotting methods return an `Artist` or a sequence of artists.  Anything you can see in a Matplotlib figure/axes/etc is an `Artist` of some sort. Most of the time, you will not need to retain these returned objects. You will want to capture them for special customizing that may not be possible through the normal plotting mechanism.

Let's re-visit that last example and modify what's plotted.  In the case of `bar`, a container artist is returned, so we'll modify its contents instead of the container itself (thus, `for bar in vert_bars`).

In [None]:
x = np.arange(5)
y = np.random.randn(5)

fig, ax = plt.subplots()
vert_bars = ax.bar(x, y) # store the output of the call to bar

for bar, height in zip(vert_bars, y):
    if height < 0:
        bar.set(color='red')
        
ax.axhline(y=0, color='black', linewidth=2)

plt.show()

Looking at the artists returned by `ax.bar`, we can see that they are all plain rectangles.

In [None]:
for bar in vert_bars:
    print(bar)

### Stacking Bar Plots

Bar plots can also be stacked by controlling where to start drawing the bars with the `bottom` keyword.

In [None]:
x = np.arange(5)
y = np.random.randn(5)

fig, ax = plt.subplots()

ax.bar(x, y, color="green")
ax.bar(x, y, bottom=y, color="red")

plt.show()
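Stacking is easiest to see with two different, non-negative series. In this sketch with made-up values, the second series is drawn on top of the first by passing the first series as `bottom`:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)
first = np.array([1.0, 2.0, 1.5, 0.5, 2.5])   # made-up example values
second = np.array([0.5, 1.0, 2.0, 1.5, 0.5])  # made-up example values

fig, ax = plt.subplots()

ax.bar(x, first, color="green")
# the second series starts where the first one ends
upper_bars = ax.bar(x, second, bottom=first, color="red")

plt.show()
```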

### Filled Regions: `Axes.fill` and `Axes.fill_between`

Of these functions, `Axes.fill_between` is probably the one you'll use the most often.  In its most basic form, it fills between the given y-values and 0:

In [None]:
y = np.random.randn(100).cumsum()
x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.fill_between(x, y, color='lightblue')
plt.show()

In [None]:
x = np.linspace(0, 10, 200)

y1 = 2 * x + 1
y2 = 3 * x + 1.2
y_mean = 0.5 * x * np.cos(2*x) + 2.5 * x + 1.1

fig, ax = plt.subplots()

# plot the envelope with `fill_between`
ax.fill_between(x, y1, y2, color='yellow')

# plot the "centerline" with `plot`
ax.plot(x, y_mean, color='black')

plt.show()

## Input Data: 2D Arrays or Images

There are several options for plotting 2D data such as images. `imshow`, `pcolor`, and `pcolormesh` have a lot of overlap, at first glance.

In short, `imshow` can interpolate and display large arrays very quickly, while `pcolormesh` and `pcolor` are much slower, but can handle flexible (i.e. more than just rectangular) arrangements of cells.

We won't dwell too much on the differences and overlaps here.  They have overlapping capabilities, but different default behavior because their primary use-cases are a bit different (there's also `matshow`, which is `imshow` with different defaults).  

Instead we'll focus on what they have in common.

`imshow`, `pcolor`, `pcolormesh`, `scatter`, and any other matplotlib plotting methods that map a range of data values onto a colormap will return artists that are instances of `ScalarMappable.`  In practice, what that means is you can display a colorbar for them, and they share several keyword arguments. `imshow` is perhaps the most useful within this course.

### Displaying 2D Data with `imshow`

In [None]:
arr_2d = np.arange(9).reshape((3, 3))
arr_2d

In [None]:
fig, ax = plt.subplots()
ax.imshow(arr_2d)
plt.show()

`imshow` is most often used to display images.

In [None]:
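import io
import urllib.request

grace_hopper_url = "https://matplotlib.org/stable/_images/sphx_glr_image_clip_path_001.png"

# recent Matplotlib versions no longer accept URLs in plt.imread, so download the bytes first;
# for a local file (relative or absolute path), plt.imread(path) works directly
with urllib.request.urlopen(grace_hopper_url) as response:
    img = plt.imread(io.BytesIO(response.read()), format="png")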

fig, ax = plt.subplots()
ax.imshow(img)
ax.axis("off")
plt.show()

For visualizing matrices, `matshow` provides better defaults, e.g. axis labelling.

In [None]:
fig, ax = plt.subplots()
ax.matshow(arr_2d)
plt.show()

### Colorbars

Just seeing the colors does not necessarily tell us something about the values beneath. Let's add a colorbar to the figure to display which colors correspond to which values of the data we've plotted. 

In [None]:
fig, ax = plt.subplots()
im = ax.imshow(arr_2d)
fig.colorbar(im)
plt.show()

In [None]:
print(arr_2d)

You may notice that `colorbar` is a `Figure` method and not an `Axes` method.  That's because `colorbar` doesn't operate on the axes. Instead, it shrinks the current axes by a bit, adds a _new_ axes to the figure, and places the colorbar on that axes.
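That a colorbar gets its own axes can be verified by counting the axes of the figure before and after the call. A minimal sketch, using the same small example array as above:

```python
import matplotlib.pyplot as plt
import numpy as np

arr_2d = np.arange(9).reshape((3, 3))

fig, ax = plt.subplots()
im = ax.imshow(arr_2d)
print(len(fig.axes))  # number of Axes before adding the colorbar

colorbar = fig.colorbar(im)
print(len(fig.axes))  # the colorbar lives on an additional Axes
print(colorbar.ax in fig.axes)

plt.show()
```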

The new axes that `fig.colorbar` creates is fairly limited in where it can be positioned.   For example, it's always outside the axes it "steals" room from. Sometimes you may want to avoid "stealing" room from an axes or maybe even have the colorbar _inside_ another axes.  In that case, you can manually create the axes for the colorbar and position it where you'd like:

In [None]:
fig, ax = plt.subplots()
cax = fig.add_axes([0.27, 0.8, 0.5, 0.05])

im = ax.imshow(arr_2d)
fig.colorbar(im, cax=cax, orientation="horizontal")
plt.show()

### Shared parameters for `imshow`, `pcolormesh`, `contour`, `scatter`, etc
  
  As we mentioned earlier, any plotting method that creates a `ScalarMappable` will have some common kwargs.  The ones you'll use the most frequently are:
  
  * `cmap` : The colormap (or name of the colormap) used to display the input (a mapping from numbers to colors)
  * `vmin` : The minimum data value that will correspond to the "bottom" of the colormap (defaults to the minimum of your input data).
  * `vmax` : The maximum data value that will correspond to the "top" of the colormap (defaults to the maximum of your input data).
  * `norm` : A `Normalize` instance to control how the data values are mapped to the colormap. By default, this will be a linear scaling between `vmin` and `vmax`, but other norms are available (e.g. `LogNorm`, `PowerNorm`, etc).
  
`vmin` and `vmax` are particularly useful.  Quite often, you'll want the colors to be mapped to a set range of data values, which aren't the min/max of your input data. For example, you might want a symmetric range of values around 0.

As an example of that, let's use a divergent colormap on some example data. Note how the colormap is **not** centered at zero.

In [None]:
from matplotlib.cbook import get_sample_data

data = get_sample_data("axes_grid/bivariate_normal.npy", np_load=True)

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="seismic")
fig.colorbar(im)
plt.show()

In [None]:
fig, ax = plt.subplots()
im = ax.imshow(data, cmap="seismic", vmin=-2, vmax=2)
fig.colorbar(im)
plt.show()
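Instead of hard-coding the limits, a symmetric range can also be derived from the data. This sketch uses made-up random data (not the sample data from above) and takes the largest absolute value as the limit in both directions:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# made-up data that is not centered at zero
example = rng.normal(loc=0.5, scale=1.0, size=(20, 20))

# use the largest absolute value as limit in both directions
limit = np.abs(example).max()

fig, ax = plt.subplots()
im = ax.imshow(example, cmap="seismic", vmin=-limit, vmax=limit)
fig.colorbar(im)
plt.show()
```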

## Scatter for n-dimensional data
`scatter` allows mapping several dimensions to different aesthetics such as x-position, color, size, and shape.

In [None]:
n = 50
x1 = np.random.random(n)
x2 = np.random.random(n) * 50
x3 = np.random.random(n)
y = x1 + x2 + x3

fig, ax = plt.subplots()
sc = ax.scatter(x=x1, y=y, s=x2, c=x3, marker='o')
fig.colorbar(sc)
plt.show()

There are lots of different markers that matplotlib supports which can help to emphasize different distributions. Have a look [here](https://matplotlib.org/stable/api/markers_api.html) to get a list of marker options.

In [None]:
num = 100
x1 = np.random.normal(-1, 2, num)
y1 = np.random.normal(3, 4, num)
x2 = np.random.normal(3, 2, num)
y2 = np.random.normal(1, 1, num)

fig, ax = plt.subplots()
ax.scatter(x1, y1, marker='.')
ax.scatter(x2, y2, marker='x')
plt.show()

## Visualizing Statistical Distributions

Draw samples from a normal distribution.

In [None]:
μ = 0
σ = 1
num_samples = 1000
dist = np.random.normal(μ, σ, num_samples)

**Side note:** Since Python 3 we can use Unicode letters such as μ and σ in variable names. This can make sense in scientific programming if strong naming conventions exist. A good example is the mean and standard deviation of a normal distribution. To easily obtain common characters in Jupyter, you can use `LaTeX` style and type e.g. `\mu` followed by <kbd>tab</kbd> to obtain μ. However, do not overuse this. Clearly named variables are often easier to read.

### Histograms
Histograms are a great way to visualize univariate distributions. The data is divided into equally sized bins. Then we count how many data points fall into each bin. Finally, we draw a bar with the width of the corresponding bin and the height of the count.

The standard histogram looks like this.

In [None]:
fig, ax = plt.subplots()
ax.hist(dist)
plt.show()

We can control the number of bins.

In [None]:
fig, ax = plt.subplots()
ax.hist(dist, bins=500)
plt.show()

Or let it be automatically determined.

In [None]:
fig, ax = plt.subplots()
ax.hist(dist, bins="auto")
plt.show()

Using `density=True` will create a normalized histogram that can be interpreted as a probability density. 

In [None]:
fig, ax = plt.subplots()
ax.hist(dist, bins="auto", density=True)
plt.show()
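What "normalized" means can be checked with the values that `hist` returns. In this sketch with freshly drawn samples, the area of all bars (height times bin width) should sum to one, as expected for a probability density:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0, 1, 1000)

fig, ax = plt.subplots()

# hist returns the bar heights, the bin edges and the drawn bars
heights, bin_edges, bars = ax.hist(samples, bins="auto", density=True)

# area of each bar = height * bin width; all areas together sum to one
total_area = np.sum(heights * np.diff(bin_edges))
print(total_area)

plt.show()
```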

### Boxplots
Boxplots are another standard way to summarize univariate distributions. They give a compact visual description of important *summary statistics*.  

The box spans the range from the 25% quantile to the 75% quantile, so it contains the middle half of the data. Additionally, the median is marked by a line inside the box.  

The *whiskers* extend from the box to the most extreme data points that lie within 1.5 times the *inter-quartile range* of the quartiles. Every point beyond that is drawn individually as a *flier* or *outlier*. Check also [this blog post](https://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/) for a nice illustration and further explanation.

In [None]:
fig, ax = plt.subplots()
ax.boxplot(dist)
plt.show()
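The numbers behind a boxplot can be reproduced with a few lines of NumPy. The following is a sketch with freshly drawn samples, assuming the default whisker setting of 1.5 times the inter-quartile range; the variable names are chosen for illustration only:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0, 1, 1000)

# the box: 25% quantile, median and 75% quantile
q1, median, q3 = np.percentile(samples, [25, 50, 75])
iqr = q3 - q1  # inter-quartile range

# whiskers end at the most extreme samples within 1.5 * IQR of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
lower_whisker = samples[samples >= lower_limit].min()
upper_whisker = samples[samples <= upper_limit].max()

# everything beyond the whiskers is drawn as an individual flier
fliers = samples[(samples < lower_limit) | (samples > upper_limit)]
print(q1, median, q3, lower_whisker, upper_whisker, len(fliers))

fig, ax = plt.subplots()
box = ax.boxplot(samples)  # returns a dict of the drawn artists
plt.show()
```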

`boxplot` can also be used to display several distributions at once.

In [None]:
# distribution parameters
means = [0, -1, 2.5, 4.3, -3.6]
sigmas = [1.2, 5, 3, 1.5, 2]

# each distribution has a different number of samples.
nums = [150, 1000, 100, 200, 500]

dists = [np.random.normal(*args) for args in zip(means, sigmas, nums)]

In [None]:
fig, ax = plt.subplots()
ax.boxplot(dists)
plt.show()

### Violinplots
Violinplots are a third common way to visualize distributions. For violinplots, a *kernel density estimate* is computed for the whole range of data. This gives a smooth estimate of the probability density function underlying the data.
The `violinplot` function behaves similarly to `boxplot`. For further information, have a look at [Wikipedia](https://en.wikipedia.org/wiki/Violin_plot).

In [None]:
fig, ax = plt.subplots()
ax.violinplot(dist)
plt.show()

In [None]:
fig, ax = plt.subplots()
ax.violinplot(dists)
plt.show()

### Pie Charts
Pie charts are a well-known way to visualize categorical distributions.

In [None]:
# pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = ["Frogs", "Hogs", "Dogs", "Logs"]
sizes = [15, 30, 45, 20]

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct="%.1f")

plt.show()

We will cover more advanced and more convenient methods for statistical visualization in a later lecture.

## Annotating Plots
Especially for scientific figures, you often want to highlight the part of a plot that supports your hypothesis. With matplotlib you have full flexibility to do this using `annotate`. By default, it simply adds text at a given x, y coordinate.  

In [None]:
fig, ax = plt.subplots()

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

# plot a line and add some simple annotations
line, = ax.plot(t, s)
ax.annotate("local maximum", xy=(3.5, 1.5))
ax.set(ylim=(-2, 2))
plt.show()

We can also put the text at a different x, y coordinate using the `xytext` argument.

In [None]:
fig, ax = plt.subplots()

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

line, = ax.plot(t, s)
ax.annotate(
    "local maximum", 
    xy=(3, 1), 
    xytext=(3.5, 1.5)
)

ax.set(ylim=(-2, 2))
plt.show()

Now that we have two x, y coordinates, we can connect them using an arrow.

In [None]:
fig, ax = plt.subplots()

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

# plot a line and add some simple annotations
line, = ax.plot(t, s)
ax.annotate(
    "local maximum", 
    xy=(3, 1), 
    xytext=(3.5, 1.5),
    arrowprops=dict()
)

ax.set(ylim=(-2, 2))
plt.show()


If we have to, we can go very fancy on the arrow styles. See [this tutorial](https://matplotlib.org/tutorials/text/annotations.html) for a more detailed overview.

In [None]:
fig, ax = plt.subplots()

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)

# plot a line and add some simple annotations
line, = ax.plot(t, s)
ax.annotate(
    "local maximum", 
    xy=(3, 1), 
    xytext=(3.5, 1.5),
    arrowprops={
        "arrowstyle": "wedge", 
        "connectionstyle": "angle3", 
        "facecolor": "red"
    }
)

ax.set(ylim=(-2, 2))
plt.show()
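The annotated position does not have to be typed in by hand. As a sketch, the highest point of a curve can be located with `np.argmax` and passed on to `annotate`; the sine curve and the text offset are chosen for illustration only:

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 5, 501)
s = np.sin(t)

# locate the highest point of the curve
peak_index = np.argmax(s)
t_peak, s_peak = t[peak_index], s[peak_index]

fig, ax = plt.subplots()
ax.plot(t, s)
annotation = ax.annotate(
    "maximum",
    xy=(t_peak, s_peak),                  # the point that the arrow points to
    xytext=(t_peak + 1.0, s_peak + 0.5),  # where the text is placed
    arrowprops={"arrowstyle": "->"},
)
ax.set(ylim=(-2, 2))
plt.show()
```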

# How to speak "Matplotlib"
In the previous parts, you learned how Matplotlib organizes plot-making by figures and axes. We broke down the components of a basic figure and learned how to create them. You also learned how to add one or more axes to a figure, and how to tie them together. You even learned how to change some of the basic appearances of the axes. Finally, we went over some of the many plotting methods that Matplotlib has to draw on those axes. With all that knowledge, you should be off making great and wonderful figures.

Why are you still here?

`We don't know how to control our plots and figures!` 

says some random voice in the back of the room.

Of course! While the previous sections may have taught you some of the structure and syntax of matplotlib, they did not describe much of the substance and vocabulary of the library. This section will go over many of the properties that are used throughout the library. Note that while many of the examples in this section may show one way of setting a particular property, that property may be applicable elsewhere in a completely different context. This is the "language" of Matplotlib.

## Colors
This is, perhaps, the most important piece of vocabulary in Matplotlib. Given that Matplotlib is a plotting library, colors are associated with everything that is plotted in your figures. Matplotlib supports a robust [language](http://matplotlib.org/stable/api/colors_api.html#module-matplotlib.colors) for specifying colors that should be familiar to a wide variety of users.

By default, matplotlib will choose different colors when combining data on the same axes.

In [None]:
t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()
ax.plot(t, t, linewidth=5)
ax.plot(t, t**2, linewidth=5)
ax.plot(t, t**3, linewidth=5)
plt.show()
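Which colors were chosen can be read back from the returned line objects. In this sketch, the colors come from the current color cycle, which is stored in `plt.rcParams` and may differ depending on your style settings:

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 5.0, 0.2)

fig, ax = plt.subplots()
# plot returns a list of lines; we keep the single line of each call
lines = [ax.plot(t, t**power, linewidth=5)[0] for power in (1, 2, 3)]

# each call to plot took the next color from the current color cycle
for line in lines:
    print(line.get_color())

# the cycle itself is stored in the rcParams
print(plt.rcParams["axes.prop_cycle"].by_key()["color"])

plt.show()
```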

### Colornames
First, colors can be given as strings. For very basic colors, you can even get away with just a single letter:

- b: blue
- g: green
- r: red
- c: cyan
- m: magenta
- y: yellow
- k: black
- w: white

Other colornames that are allowed are the HTML/CSS colornames such as "burlywood" and "chartreuse". See the [full list](https://www.w3schools.com/colors/colors_names.asp) of the 147 colornames.

### Hex values
Colors can also be specified by supplying an HTML/CSS hex string, such as `'#0000FF'` for blue. Support for an optional alpha channel was added in v2.0. For more information about hex colors have a look at [Wikipedia](https://en.wikipedia.org/wiki/Web_colors#Hex_triplet).

In [None]:
t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()
ax.plot(t, t, linewidth=5, color="#00ffff")
ax.plot(t, t**2, linewidth=5, color="#ff00ff")
ax.plot(t, t**3, linewidth=5, color="#ffcc00")
plt.show()

### 256 Shades of Gray
A gray level can be given instead of a color by passing a string representation of a number between 0 and 1 (inclusive). `"0.0"` is black, while `"1.0"` is white. `"0.75"` would be a light shade of gray.


In [None]:
t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()
ax.plot(t, t, linewidth=5, color="1.0")
ax.plot(t, t**2, linewidth=5, color="0.5")
ax.plot(t, t**3, linewidth=5, color="0.0")
plt.show()

### RGB[A] Tuples
You may come upon instances where the previous ways of specifying colors do not work. This can sometimes happen in some of the deeper, stranger levels of the library. When all else fails, the universal language of colors for matplotlib is the RGB[A] tuple. This is the "Red", "Green", "Blue", and sometimes "Alpha" tuple of floats in the range of [0, 1]. One means full saturation of that channel, so a red RGBA tuple would be `(1.0, 0.0, 0.0, 1.0)`, whereas a partly transparent green RGBA tuple would be `(0.0, 1.0, 0.0, 0.75)`.  The documentation will usually specify whether it accepts RGB or RGBA tuples. Sometimes, a list of tuples would be required for multiple colors, and you can even supply a Nx3 or Nx4 numpy array in such cases.

In `plot`, the optional third positional argument may look like a color specification, but it is really a "format specification" such as `"r--"`, which includes color as part of the format. Such specifications are string only, so RGB[A] tuples are not supported there (but you can still pass an RGB[A] tuple for a "color" argument).

Oftentimes there is a separate argument for "alpha" whenever you can specify a color. The value for "alpha" will usually take precedence over the alpha value in the RGBA tuple. There is no easy way around this inconsistency.

In [None]:
t = np.arange(0.0, 3.0, 0.2)
fig, ax = plt.subplots()

ax.plot(t, t, linewidth=5, color=(0, 0, 1))
ax.plot(t, t**2, linewidth=5, color=(0, 0.5, 0.5))
ax.plot(t, t**3, linewidth=5, color=(0, 0, 1, 0.3))

# the alpha value can also be specified as an additional kwarg
ax.plot(t, t**4, linewidth=5, color=(0, 1, 1), alpha=0.2)
plt.show()
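
The precedence of a separate alpha value over the alpha inside an RGBA tuple can be checked directly with the conversion helper `matplotlib.colors.to_rgba`. This is only a sketch of the conversion step, not of how every plotting function handles alpha internally.

```python
import matplotlib.colors as mcolors

rgba = (0.0, 0.0, 1.0, 0.3)  # blue with alpha 0.3 inside the tuple

# without an explicit alpha, the value from the tuple is kept
as_given = mcolors.to_rgba(rgba)

# an explicit alpha replaces the value from the tuple
overridden = mcolors.to_rgba(rgba, alpha=0.8)

print(as_given)
print(overridden)
```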

## Markers
[Markers](http://matplotlib.org/stable/api/markers_api.html) are commonly used in line and scatter plots, but also show up elsewhere. There is a wide set of markers available, and custom markers can even be specified.

marker     |  description  | marker    |  description    |marker    |  description  | marker    |  description  
:----------|:--------------| :---------|:--------------  |:---------|:--------------| :---------|:--------------
"."        |  point        | "+"       |  plus           |","       |  pixel        | "x"       |  cross
"o"        |  circle       | "D"       |  diamond        |"d"       |  thin_diamond |           |
"8"        |  octagon      | "s"       |  square         |"p"       |  pentagon     | "\*"      |  star
"&#124;"   |  vertical line| "\_"      | horizontal line | "h"      |  hexagon1     | "H"       |  hexagon2
0          |  tickleft     | 4         |  caretleft      |"<"       | triangle_left | "3"       |  tri_left
1          |  tickright    | 5         |  caretright     |">"       | triangle_right| "4"       |  tri_right
2          |  tickup       | 6         |  caretup        |"^"       | triangle_up   | "2"       |  tri_up
3          |  tickdown     | 7         |  caretdown      |"v"       | triangle_down | "1"       |  tri_down
"None"     |  nothing      | `None`    |  default        |" "       |  nothing      |""         |  nothing


In [None]:
t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()
ax.plot(t, t, '.', linewidth=5)
ax.plot(t, t**2, 'o', linewidth=5)
ax.plot(t, t**3, marker='+', linewidth=1) # with explicit arguments, you can set marker and linestyle separately.
ax.plot(t, -t, ls='', marker='v', linewidth=5) 
plt.show()

## Linestyles
Line styles are about as commonly used as colors. There are a few predefined linestyles available to use. Note that there are some advanced techniques to specify some custom line styles. 

linestyle          | description
-------------------|------------------------------
'-'                | solid
'--'               | dashed
'-.'               | dashdot
':'                | dotted
'None'             | draw nothing
' '                | draw nothing
''                 | draw nothing

Also, don't mix up `".-"` (line with dot markers) and `"-."` (dash-dot line) when using the ``plot`` function!

In [None]:
t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()
ax.plot(t, t, linestyle='-', linewidth=5)
ax.plot(t, t**2, linestyle='--', linewidth=5)
ax.plot(t, t**3, linestyle='-.', linewidth=5)
ax.plot(t, -t, linestyle=':', linewidth=5)
plt.show()

In [None]:
fig, ax = plt.subplots(1, 1)
ax.bar([1, 2, 3, 4], [10, 20, 15, 13], linestyle='--', edgecolor='r', linewidth=4)
plt.show()
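
The difference between `".-"` and `"-."` is easiest to see by asking the resulting lines what they ended up with. The sketch below also shows one of the custom line styles mentioned above, given as a tuple `(offset, (on, off, on, off, ...))`; the particular pattern is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 5.0, 0.2)
fig, ax = plt.subplots()

(dots_on_line,) = ax.plot(t, t, ".-")    # solid line with point markers
(dash_dot,) = ax.plot(t, t + 2, "-.")    # dash-dot line without markers

# custom dash pattern: long dash, gap, short dash, gap
(custom,) = ax.plot(t, t + 4, linestyle=(0, (5, 2, 1, 2)), linewidth=3)

print(dots_on_line.get_linestyle(), dots_on_line.get_marker())
print(dash_dot.get_linestyle(), dash_dot.get_marker())
plt.show()
```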

## Colormaps
Another very important property of many figures is the colormap. The job of a colormap is to relate a scalar value to a color. In addition to the regular portion of the colormap, an "over", "under" and "bad" color can be optionally defined as well. NaNs will trigger the "bad" part of the colormap.

As we all know, we create figures in order to convey information visually to our readers. Much care and consideration has gone into the design of these colormaps. Which colormap to use depends on what you are displaying. In matplotlib, the "jet" colormap was historically the default, but it is often not the colormap you want. Much discussion took place on the mailing lists about what the default colormap should be. The v2.0 release of Matplotlib adopted a new default colormap, 'viridis', along with some other stylistic changes to the defaults.

[Here is the talk](https://www.youtube.com/watch?v=xAoljeRJ3lU) by Nathaniel Smith and Stéfan van der Walt at SciPy 2015 that does an excellent job explaining colormaps and how the new perceptually uniform colormaps were designed.



In [None]:
def plot_cmap(name, value_range=(0, 1)):
    """Show the part of colormap `name` that covers `value_range`."""
    # two identical rows, so that imshow gets a 2D image to draw
    gradient = np.linspace(*value_range, 256)
    gradient = np.vstack((gradient, gradient))
    fig, ax = plt.subplots(figsize=plt.figaspect(0.1))
    ax.imshow(gradient, aspect="auto", cmap=plt.get_cmap(name), vmin=0, vmax=1)
    ax.set_title(name, fontsize=20)
    ax.axis("off")

plot_cmap("inferno")

In [None]:
plot_cmap("winter")

[Here](https://matplotlib.org/stable/tutorials/colors/colormaps.html) you can find the full gallery of all the pre-defined colormaps, organized by the types of data they are usually used for.
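
The "over", "under" and "bad" colors mentioned at the start of this section are not set in the examples above. A minimal sketch of how they could be used follows; the data values and the three colors are arbitrary choices for illustration.

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

# work on a copy so that the registered "viridis" colormap stays untouched
cmap = copy.copy(plt.get_cmap("viridis"))
cmap.set_under("blue")  # used for values below vmin
cmap.set_over("red")    # used for values above vmax
cmap.set_bad("gray")    # used for masked values such as NaNs

data = np.array([[np.nan, -0.5, 0.0],
                 [0.5, 1.0, 1.5]])

fig, ax = plt.subplots()
im = ax.imshow(np.ma.masked_invalid(data), cmap=cmap, vmin=0, vmax=1)
fig.colorbar(im, ax=ax, extend="both")  # the pointed ends show the over/under colors
plt.show()
```

Only the three cells inside the range from 0 to 1 are drawn with regular viridis colors; the other three fall back to the special colors.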

## Mathtext
Oftentimes, you just simply need that superscript or some other math text in your labels. Matplotlib provides a very easy way to do this for those familiar with `LaTeX`. Any text that is surrounded by dollar signs will be treated as [mathtext](https://matplotlib.org/stable/tutorials/text/mathtext.html). Do note that because backslashes are prevalent in LaTeX, it is often a good idea to prepend an `r` to your string literal so that Python will not treat the backslashes as escape characters.

In [None]:
print("a\nb")
print(r"a\nb")

In [None]:
fig, ax = plt.subplots()
ax.scatter([1, 2, 3, 4], [4, 3, 2, 1])
ax.spines["top"].set(visible=False)  # remove spines so they don't intersect with the title
ax.spines["right"].set(visible=False)
ax.set_title(r"$\sigma_i=\frac{3}{5}$", fontsize=25)
plt.show()
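
The `r` prefix matters for the title above in particular, because `\f` happens to be a valid Python escape sequence (form feed). The short check below, using two illustrative variables, shows what would get lost without the prefix.

```python
plain = "$\frac{3}{5}$"   # "\f" is turned into a single form-feed character
raw = r"$\frac{3}{5}$"    # the raw string keeps the backslash for mathtext

print(len(plain), len(raw))
print("\\frac" in plain, "\\frac" in raw)
```

The plain string no longer contains `\frac`, so mathtext would not see the fraction command.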

# Limits, Legends and Layouts

In this section, we'll focus on what happens around the edges of the axes: ticks, ticklabels, limits, layouts, and legends.

## Legends

As you've seen in some of the examples so far, the X and Y axis can also be labeled, as well as the subplot itself via the title. 

However, another thing you can label is the line/point/bar/etc. that you plot. You can attach a `label` to each plot call; calling `ax.legend()` then builds the legend from these labels automatically.

In [None]:
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])  # Philadelphia
ax.plot([1, 2, 3, 4], [30, 23, 13, 4])  # Boston
ax.set(ylabel="Temperature (°C)", xlabel="Time", title="A Tale of Two Cities")
plt.show()

In [None]:
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], label="Philadelphia")
ax.plot([1, 2, 3, 4], [30, 23, 13, 4], label="Boston")
ax.set(ylabel="Temperature (°C)", xlabel="Time", title="A Tale of Two Cities")
ax.legend()  # build the legend from the labels given above
plt.show()

The keyword argument `loc` places the legend at a given position. The default, `"best"`, automatically chooses the location that overlaps the plot elements as little as possible.

| Location String | Location Code |
| --- | --- |
| best | 0 |
| upper right | 1 |
| upper left | 2 |
| lower left | 3 |
| lower right | 4 |
| right | 5 |
| center left | 6 |
| center right | 7 |
| lower center | 8 |
| upper center | 9 |
| center | 10 |

In [None]:
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], label="Philadelphia")
ax.plot([1, 2, 3, 4], [30, 23, 13, 4], label="Boston")
ax.set(ylabel="Temperature (°C)", xlabel="Time", title="A Tale of Two Cities")
ax.legend(loc="lower center")
plt.show()
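
To check which plot elements a legend will pick up, you can ask the axes for its handles and labels. The sketch below adds a third, unlabeled line (an addition made only for this illustration) to show that elements without a label are left out, and uses a location code from the table instead of the string.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], label="Philadelphia")
ax.plot([1, 2, 3, 4], [30, 23, 13, 4], label="Boston")
ax.plot([1, 2, 3, 4], [20, 20, 20, 20], color="0.5")  # no label: not in the legend

# the handles and labels that ax.legend() would use
handles, labels = ax.get_legend_handles_labels()
print(labels)

legend = ax.legend(loc=8)  # location code 8 is the same as "lower center"
plt.show()
```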

## Ticks, Ticklines, Ticklabels
This is a constant source of confusion:

* A Tick is the *location* of a Ticklabel.
* A Tickline is the small line that denotes the location of the tick.
* A Ticklabel is the text that is displayed at that tick.

In [None]:
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 35])

# manually set ticks and ticklabels on the x-axis
ax.xaxis.set(ticks=range(1, 5), ticklabels=[3, 100, -12, "foo"]) 

# make the y-ticks a bit longer and go both in and out
ax.tick_params(axis="y", direction="inout", length=10)

plt.show()
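
The three terms can also be told apart by looking at what the axes returns for each of them. This is a small sketch with made-up labels: the ticks are numbers, the ticklabels are text objects, and the ticklines are line objects.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 35])
ax.set_xticks([1, 2, 3, 4])
ax.set_xticklabels(["one", "two", "three", "four"])

fig.canvas.draw()  # make sure ticks and labels are up to date

tick_locations = list(ax.get_xticks())                             # ticks: positions
tick_texts = [label.get_text() for label in ax.get_xticklabels()]  # ticklabels: text
ticklines = ax.xaxis.get_ticklines()                               # ticklines: Line2D objects

print(tick_locations)
print(tick_texts)
print(type(ticklines[0]).__name__)
plt.show()
```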

In [None]:
data = [('apples', 2), ('oranges', 3), ('peaches', 1)]
fruit, value = zip(*data)

fig, ax = plt.subplots()
x = np.arange(len(fruit))

ax.bar(x, value, align="center", color="blue")

ax.set_xticks(x) # use the manual setter functions
ax.set_xticklabels(fruit, rotation=45, ha="right", rotation_mode="anchor") # rotate ticklabels for readability

plt.show()

## Subplot Spacing

The spacing between the subplots can be adjusted using [Figure.subplots_adjust](https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure.subplots_adjust). Play around with the example below to see how the different arguments affect the spacing.

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(9, 9))

fig.subplots_adjust(wspace=0.3, hspace=0.7,
                    left=0.125, right=0.8,
                    top=0.7, bottom=0.2)

plt.show()
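
If it is unclear what the arguments mean, it can help to read back the position of one subplot. In the sketch below, `left`, `right`, `top` and `bottom` are the edges of the whole subplot grid as fractions of the figure size, so the top-left subplot starts at `left` and ends at `top`; `wspace` and `hspace` set the gaps between subplots relative to the average subplot size.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(9, 9))
fig.subplots_adjust(wspace=0.3, hspace=0.7,
                    left=0.125, right=0.8,
                    top=0.7, bottom=0.2)

# position of the top-left subplot in figure fractions
top_left = axes[0, 0].get_position()
print("x0 (left edge):", top_left.x0)
print("y1 (top edge): ", top_left.y1)
plt.show()
```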

A common "gotcha" is that the labels are not automatically adjusted to avoid overlapping those of another subplot. By default, Matplotlib does not rearrange your plot elements for you; it is a design decision to minimize the amount of "magical plotting" and to leave users in complete control over their plots. LaTeX users will be familiar with the frustration that automatic placement of figures can cause.

That said, there have been some efforts to develop tools that help address the most common complaints. The [tight layout](https://matplotlib.org/stable/api/tight_layout_api.html) feature, when invoked, will attempt to resize margins and subplots so that nothing overlaps. However, be warned - this is not a silver bullet and can sometimes create more problems than it solves.

If you have multiple subplots, and want to avoid overlapping titles/axis labels/etc, `Figure.tight_layout` is a great way to do so:

In [None]:
def example_plot(ax):
    
    ax.plot([1, 2])
    ax.set_xlabel('x-label', fontsize=16)
    ax.set_ylabel('y-label', fontsize=8)
    ax.set_title('Title', fontsize=24)

In [None]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)

example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)

plt.show()

In [None]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)
example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)

fig.tight_layout()

plt.show()
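
What `tight_layout` actually changes can be made visible by comparing the position of a subplot before and after the call. The following sketch repeats the setup from above in a loop and prints the position of the top-left subplot; the exact numbers depend on your figure size and fonts.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2)
for ax in axes.flat:
    ax.plot([1, 2])
    ax.set_xlabel("x-label", fontsize=16)
    ax.set_title("Title", fontsize=24)

# bounds are (x0, y0, width, height) in figure fractions
before = axes[0, 0].get_position().bounds
fig.tight_layout()
after = axes[0, 0].get_position().bounds

print("before:", before)
print("after: ", after)
plt.show()
```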

# 2021-homework07

If you have reached the bottom of this notebook, you are now hopefully prepared to take on the current homework assignment, `2021-homework07`. You will find the link in a StudIP assignment as always.  

In this homework, you will create four types of plots:

* Line plot (with `Axes.plot`)
* Bar plot (with `Axes.bar`)
* Heatmap (with `Axes.imshow`)
* Set plot (also with `Axes.imshow`)

Have a good week!