# DAML 05 - Seaborn

Michal Grochmal <michal.grochmal@city.ac.uk>

A higher-level plotting library.
By higher level we mean that it is capable of producing specific types of plots
without the need to draw every element of the figure by hand.  Also, unlike `matplotlib`, `seaborn`
can operate directly on `pandas` data frames.

`seaborn` depends on `matplotlib`, i.e. it is built on top of it, but also depends on
`numpy`, `pandas` and `scipy` for several of its plots.  Moreover, the Python `statsmodels`
library can also be used by `seaborn` to get nicer plots.  Although we will not
cover the `statsmodels` or `scipy` libraries, both deserve a mention:

-   [scipy][] is the scientific computing library for Python built on top of NumPy.
    It is known for numerical routines such as signal processing, yet it also
    contains some regression and statistical models.

-   [statsmodels][] has a good deal of statistical models but also several
    tools to perform statistical tests (e.g. several null hypothesis tests).

[scipy]: https://docs.scipy.org/doc/scipy/reference/ "scipy documentation"
[statsmodels]: http://www.statsmodels.org "statsmodels documentation"

And there's more: `seaborn` has a better set of plot aesthetics that makes `matplotlib` plots
look considerably nicer.  We have been using the `seaborn-whitegrid` style for a while already,
yet the full extent of `seaborn` aesthetics is often even more pleasing to the eye.
And good-looking graphs are particularly important if you need to convince someone that
your work is meaningful (unfortunate, but that is the harsh reality).

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('classic')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12.5, 6.0)
import numpy as np
import pandas as pd

With the imports done, let's compare `matplotlib` graphs before and after applying
`seaborn`'s styles.

In [None]:
x = np.linspace(0, 10, 500)
y = np.array([np.sin(x), np.cos(x), np.sin(x+1)]).T
plt.plot(x, y)
plt.legend(['sin(x)', 'cos(x)', 'sin(x+1)']);

In [None]:
import seaborn as sns
sns.set()

In [None]:
plt.plot(x, y)
plt.legend(['sin(x)', 'cos(x)', 'sin(x+1)']);

## Distributions

Let's use the Iris dataset again to go on a whirlwind tour of `seaborn` histograms
and sample distribution plots.

In [None]:
iris = sns.load_dataset('iris')
iris.head()

In [None]:
ax = plt.axes()
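# explicit labels, otherwise the legend has nothing to show
ax.hist(iris.sepal_length, alpha=0.5, label='sepal_length')
ax.hist(iris.petal_length, alpha=0.5, label='petal_length')
ax.hist(iris.petal_width, alpha=0.5, label='petal_width')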
ax.legend();

In [None]:
ax = plt.axes()
# `fill` replaces the `shade` argument of older seaborn versions
sns.kdeplot(iris.sepal_length, fill=True, ax=ax)
sns.kdeplot(iris.petal_length, fill=True, ax=ax)
sns.kdeplot(iris.petal_width, fill=True, ax=ax)
ax.set_ylim((0, 0.5));  # KDE is always normalised
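
The normalisation remark can be checked numerically.  The sketch below is an illustration under stated assumptions, not `seaborn`'s own code: it uses `scipy.stats.gaussian_kde` on synthetic data (the `sample` array and the grid limits are made up here) and sums the density over a fine grid.  The area under the curve comes out close to one, which is why the y-axis of a KDE plot does not depend on the number of observations.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for a measured feature (not the Iris data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.8, scale=0.8, size=150)

# Fit a Gaussian kernel density estimate with scipy's default bandwidth.
kde = gaussian_kde(sample)

# Evaluate the density on a grid that extends well past the data range.
grid = np.linspace(sample.min() - 3.0, sample.max() + 3.0, 2000)
density = kde(grid)

# Riemann sum: the area under a density curve should be close to 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under the KDE: {area:.4f}")
```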

In [None]:
ax = plt.axes()
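# `histplot` replaces `distplot`, which is deprecated in newer seaborn versions
sns.histplot(iris.sepal_length, kde=True, stat='density', ax=ax)
sns.histplot(iris.petal_length, kde=True, stat='density', ax=ax)
sns.histplot(iris.petal_width, kde=True, stat='density', ax=ax)
ax.set_ylim((0, 0.7));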

In [None]:
#sns.kdeplot(iris.sepal_width, iris.sepal_length)  # old seaborn
sns.kdeplot(x=iris.sepal_width, y=iris.sepal_length);

In [None]:
sns.jointplot(x=iris.sepal_width, y=iris.sepal_length);

In [None]:
sns.jointplot(x=iris.sepal_width, y=iris.sepal_length, kind='kde');

In [None]:
mean = [0, 0]
cov = [[3, 1],
       [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10240).T
sns.jointplot(x=x, y=y, kind='hex');

## Joint Plots

The joint plot plots one vector against another, and it can even perform a regression for us.
We will see much more about regression later; for now let's see how the orbital period and mass
of a planet are related to each other.  We can load a dataset about recently discovered planets
directly from `seaborn`.

In [None]:
planets = sns.load_dataset('planets')
planets.head()

In [None]:
sns.jointplot(x=planets.orbital_period, y=planets.mass, kind='reg');
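
To a first approximation, the straight line drawn by `kind='reg'` is a least-squares fit of one vector against the other.  The sketch below reproduces that idea with `np.polyfit` on synthetic data; the arrays `x_obs` and `y_obs` and the chosen trend are invented for illustration and have nothing to do with the planets dataset.

```python
import numpy as np

# Synthetic data with a known linear trend (made up for illustration).
rng = np.random.default_rng(42)
x_obs = np.linspace(0.0, 10.0, 200)
y_obs = 2.0 * x_obs + 1.0 + rng.normal(scale=0.5, size=x_obs.size)

# Least-squares fit of a degree-1 polynomial: y ~ slope * x + intercept.
slope, intercept = np.polyfit(x_obs, y_obs, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept should land near the values used to generate the data, with the gap shrinking as the noise gets smaller or the sample gets larger.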

## Pair Plot

If we do not require the lateral histograms, which is normally the case when data points
do not overlap much, we can compare more vectors at the same time with a `pairplot`.
Here we can see why the Iris dataset is so popular: it is a great example of
three classes where one can be easily separated but the other two cannot.

In [None]:
sns.pairplot(iris, hue='species');

## Facets

When comparing a numeric quantity against a categorical quantity we would often
group by the categorical quantity.  Facet plots allow us to perform the grouping
against one or even two categorical quantities as a grid, with the group by
operation implicit in the grid.  This is very visible in the Titanic dataset.

In [None]:
titanic = sns.load_dataset('titanic')
titanic.head()

In [None]:
grid = sns.FacetGrid(titanic, row='sex', col='class', margin_titles=True)
grid.map(plt.hist, 'age');
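
The implicit group by can be spelled out with plain `pandas`.  The sketch below uses a tiny hand-made table (the `passengers` frame and its values are hypothetical, not the real Titanic data) and collects the ages that would end up in each cell of a grid with `sex` on the rows and `class` on the columns.

```python
import pandas as pd

# Hypothetical miniature passenger table, made up for illustration only.
passengers = pd.DataFrame({
    "sex":   ["male", "female", "female", "male", "male", "female"],
    "class": ["First", "First", "Third", "Third", "Third", "First"],
    "age":   [40.0, 35.0, 22.0, 19.0, 30.0, 50.0],
})

# One group per (row, column) cell of the facet grid.
cells = {key: group["age"].tolist()
         for key, group in passengers.groupby(["sex", "class"])}

for (sex, cls), ages in cells.items():
    print(f"{sex:>6} | {cls:<5} | ages: {ages}")
```

Each entry of `cells` holds the data for one panel, so drawing a histogram per entry would give a picture similar in spirit to the grid above.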

## Factor (Cat) Plots

Quantiles are often used to describe one feature of the data.  Just like with the facets,
`seaborn` allows us to look at quantiles whilst grouping by a category,
and it can do so with several different visuals.

In [None]:
#sns.factorplot('class', 'fare', 'sex', data=titanic, kind='box', size=12)  # old seaborn
sns.catplot(x='class', y='fare', hue='sex', data=titanic, kind='box', height=12);

In [None]:
#sns.factorplot('class', 'fare', 'sex', data=titanic, kind='violin', size=12)  # old seaborn
sns.catplot(x='class', y='fare', hue='sex', data=titanic, kind='violin', height=12);
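
The numbers behind a box plot can be computed directly: the box spans the first to the third quartile and the line inside it marks the median.  The sketch below groups a small hand-made table by category and computes those three quantiles with `pandas`; the `fares` frame and its values are hypothetical and only loosely modelled on the Titanic columns.

```python
import pandas as pd

# Hypothetical fares, invented for illustration only.
fares = pd.DataFrame({
    "class": ["First"] * 5 + ["Third"] * 5,
    "fare":  [30.0, 50.0, 80.0, 120.0, 250.0,
              7.0, 8.0, 8.0, 10.0, 15.0],
})

# Quartiles per category: one row per class, one column per quantile.
quartiles = (fares.groupby("class")["fare"]
                  .quantile([0.25, 0.5, 0.75])
                  .unstack())
print(quartiles)
```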

## Bars

As with facets and factor plots, bar plots also allow for groupings.

In [None]:
#sns.factorplot('year', data=planets, kind='count', aspect=3)  # old seaborn
sns.catplot(x='year', data=planets, kind='count', aspect=3);

In [None]:
#sns.factorplot('year', data=planets, hue='method', kind='count', aspect=2, size=6)  # old seaborn
sns.catplot(x='year', data=planets, hue='method', kind='count', aspect=2, height=6);
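
The heights of the bars in a count plot are plain row counts, and adding `hue` splits each count by a second category.  The sketch below computes both with `pandas` on a hand-made table; the `discoveries` frame is hypothetical and merely borrows the column names of the planets dataset.

```python
import pandas as pd

# Hypothetical discoveries table, invented for illustration only.
discoveries = pd.DataFrame({
    "year":   [2009, 2009, 2010, 2010, 2010, 2011],
    "method": ["Transit", "Radial Velocity", "Transit",
               "Transit", "Radial Velocity", "Transit"],
})

# Counterpart of kind='count' on a single column: rows per year.
per_year = discoveries["year"].value_counts().sort_index()

# Counterpart of adding hue='method': rows per (year, method) pair.
per_year_method = pd.crosstab(discoveries["year"], discoveries["method"])

print(per_year)
print(per_year_method)
```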