# 1 Plotting with Seaborn

Seaborn is a data visualisation library for Python which builds on the
matplotlib package.

It is designed primarily with data exploration in mind. In particular:

- Seaborn integrates much more closely with pandas data structures than
  matplotlib does
- It can aggregate and summarise entire datasets as part of the plotting call
- Its visualisation functions are designed to quickly produce detailed and
  informative statistical plots with few lines of code

When importing seaborn, the convention is to use the alias `sns`:


In [None]:
import seaborn as sns

## 1.1 Basic plots

Before we can start making plots, we need some data! Let's load in the "iris"
dataset:


In [None]:
iris = sns.load_dataset("iris")
iris.head()

### Scatter plots

We will begin with a basic scatter plot of petal length versus sepal length:


In [None]:
sns.scatterplot(
    data=iris,
    x="sepal_length",
    y="petal_length",
)

Notice how seaborn is able to set the axis labels using the column names, so it
is not necessary to specify the labels manually!

With seaborn, we are able to control the formatting of the markers using the data:


In [None]:
sns.scatterplot(
    data=iris,
    x="sepal_length",
    y="petal_length",
    hue="species",
    style="species",
)

Seaborn also takes care of the legend, giving us a lovely detailed scatter plot
with just a single function call!
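
The same idea extends to other marker properties. As a rough sketch, the example below maps a numeric column to marker size with the `size` argument, alongside `hue`. The `df` DataFrame and its column names (`length`, `width`, `group`) are made up for illustration and are not part of the iris dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, invented purely for this illustration
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "length": rng.uniform(1, 7, size=60),
        "width": rng.uniform(0.1, 2.5, size=60),
        "group": rng.choice(["a", "b", "c"], size=60),
    }
)

ax = sns.scatterplot(
    data=df,
    x="length",
    y="width",
    hue="group",   # colour varies with a categorical column
    size="width",  # marker size varies with a numeric column
)
```

As before, the axis labels and the legend are filled in from the column names.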

We can also set a theme with seaborn. Running `sns.set_theme()` will set all
subsequent figures to seaborn's "default" theme:


In [None]:
sns.set_theme()  # sets all figures to the default theme

In [None]:
sns.scatterplot(
    data=iris,
    x="sepal_length",
    y="petal_length",
    hue="species",
    style="species",
)

This can be undone by running `sns.reset_orig()`.
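
As a quick check of what these two functions do, the sketch below inspects one of matplotlib's settings (the axes background colour) before the theme is applied, after it is applied, and after it is reset. The choice of `axes.facecolor` is just an example; the theme changes many other settings as well.

```python
import matplotlib as mpl
import seaborn as sns

before = mpl.rcParams["axes.facecolor"]    # value before any theme is applied

sns.set_theme()
themed = mpl.rcParams["axes.facecolor"]    # changed by the seaborn theme

sns.reset_orig()
restored = mpl.rcParams["axes.facecolor"]  # back to the original value

print(before, themed, restored)
```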

### Line plots

To demonstrate line plots, let's first load in another dataset:


In [None]:
flights = sns.load_dataset("flights")
flights.head()

This time we will call the `lineplot()` function:


In [None]:
sns.lineplot(data=flights, x="year", y="passengers")

In addition to a solid line, we also get a shaded error region which, by
default, represents the 95% confidence interval.

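In seaborn 0.12 and later, the error region is controlled by the `errorbar`
argument (older versions used `ci`, which is now deprecated). To remove the
error region, we just need to include `errorbar=None`. Alternatively we could
pass `("ci", 90)` to change the size of the confidence interval, or `"sd"` to
display the standard deviation:


In [None]:
sns.lineplot(
    data=flights,
    x="year",
    y="passengers",
    errorbar="sd",
)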

We can also control the line style, colour, etc. using the data. For example:


In [None]:
sns.lineplot(
    data=flights,
    x="year",
    y="passengers",
    hue="month",
    style="month",
)

## Exercises Q1

Please complete *Q1* of [this exercise sheet](1-exercises.ipynb#Q1\))

## 1.2 Seaborn vs matplotlib

Let's motivate why we believe that seaborn is a good choice for statistical
visualisations!

To do this, we will create a basic regression plot with seaborn and attempt to
replicate it with matplotlib.

### Example: scatter plot with a trend line

Let's make a basic scatter plot and overlay with a linear trend line. This is
trivial with seaborn:


In [None]:
sns.regplot(data=iris, x="sepal_length", y="petal_length")

Now let's see how to make a similar figure with matplotlib:


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Calculate the linear relationship
x, y = iris["sepal_length"], iris["petal_length"]
lin = np.polyfit(x, y, 1)
pred = np.poly1d(lin)

# Generate the plot
plt.scatter(x, y)
plt.plot(x, pred(x))
plt.xlabel("sepal_length")
plt.ylabel("petal_length")

plt.show()

We've used a lot more code, and don't even have a shaded confidence interval!

This highlights a number of drawbacks with solely using matplotlib:

- Matplotlib has no regression functionality, so we have to calculate the linear
  model separately
- The data points and the trend line have to be added via separate function
  calls
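- Matplotlib is built around arrays, so we extract the columns from the
  `DataFrame` ourselves before plotting (some functions accept a `data`
  argument, but this is far more limited than in seaborn)
- Matplotlib does not use the `DataFrame` column names as axis labels, so we
  have to supply the labels manually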
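
To give a sense of the extra work involved, here is a rough sketch of adding a shaded confidence band by hand, using a bootstrap: the line is refitted on resampled data many times and the spread of the fits is shaded. This is similar in spirit to what `sns.regplot()` does, but it is not necessarily the exact procedure seaborn uses. The data are made up, and the choices of 1000 resamples and a 95% interval are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for two numeric columns
x = rng.uniform(4, 8, size=100)
y = 1.8 * x - 7 + rng.normal(scale=0.8, size=100)

# Points along the x axis at which to evaluate each fitted line
grid = np.linspace(x.min(), x.max(), 50)

# Refit the line on resampled (with replacement) data many times
n_boot = 1000
boot_lines = np.empty((n_boot, grid.size))
for i in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)
    b_slope, b_intercept = np.polyfit(x[idx], y[idx], 1)
    boot_lines[i] = b_slope * grid + b_intercept

# 95% percentile band across the bootstrap fits
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)

# Fit on the full data for the central trend line
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(grid, slope * grid + intercept)
ax.fill_between(grid, lower, upper, alpha=0.3)
```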

### Why use matplotlib?

If all you want is a quick statistical plot to visualise and explore your data,
seaborn is probably the better choice.

However, seaborn actually *wraps around* matplotlib! The `sns.regplot()`
function is just a wrapper designed to give us a detailed statistical
visualisation with much shorter code.

We have also seen that matplotlib is excellent for customisation and
fine-tuning of figures.

So it is generally helpful to work with _both_ libraries when creating
statistical visualisations!

## 1.3 Customising seaborn figures with matplotlib

Because seaborn is built on top of matplotlib, we are able to work with both
libraries simultaneously.

For example, we can initialise a figure using matplotlib, then modify the
figure using seaborn.

### Example: scatter plot with a trend line

First, let's select a more visually appealing style using matplotlib:


In [None]:
plt.style.use("seaborn-v0_8")

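Selecting the "seaborn-v0_8" style sheet gives all figures a look that closely
matches seaborn's default theme, which we saw earlier with `sns.set_theme()`.
(In matplotlib versions before 3.6, this style sheet was called "seaborn".)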

Let's now reproduce our regression plot using a combination of matplotlib's
object-oriented interface and seaborn's `regplot()` function.

### Aside: Matplotlib's Object-Oriented (OO) interface

With Matplotlib's object-oriented interface, we work with explicit `Figure` and
`Axes` objects, which makes it much easier to keep track of what and where we
are plotting.

To create a plot with the OO interface, we first create an instance of the
`Figure` class:


In [None]:
fig = plt.figure(figsize=(6, 4))

We then need to add a set of axes to the figure. If we wish to create an `Axes`
that occupies the entire `Figure`, we can use the shorthand `plt.subplots()`,
which creates the `Figure` and the `Axes` together:


In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
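
For comparison, the sketch below puts the long form and the shorthand side by side. The variable names (`fig1`, `ax1`, `fig2`, `ax2`) and the plotted numbers are arbitrary choices for this illustration.

```python
import matplotlib.pyplot as plt

# Long form: create the Figure, then add a single Axes to it
fig1 = plt.figure(figsize=(6, 4))
ax1 = fig1.add_subplot(1, 1, 1)  # 1 row, 1 column, first position

# Shorthand: create the Figure and its Axes in one call
fig2, ax2 = plt.subplots(figsize=(6, 4))

# Either way, plotting then happens through methods on the Axes
ax1.plot([0, 1, 2], [0, 1, 4])
ax2.plot([0, 1, 2], [0, 1, 4])
```

Both routes end with one `Figure` holding one `Axes`; the shorthand simply saves a step.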

### Back to our example...

To add seaborn's regression plot to our matplotlib figure, we just need to
specify an `ax` argument when calling `sns.regplot()`:


In [None]:
fig, ax = plt.subplots(figsize=(5, 6))
sns.regplot(
    data=iris,
    x="sepal_length",
    y="petal_length",
    ax=ax,
)
ax.set_xlabel("Sepal length")
ax.set_ylabel("Petal length")
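
The `ax` argument becomes especially handy when a figure has more than one set of axes. As a sketch, the example below places a plain scatter plot and a regression plot side by side. The `df` DataFrame and the names `ax_left` and `ax_right` are made up for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for this illustration
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=80)})
df["y"] = 0.5 * df["x"] + rng.normal(scale=1.0, size=80)

# One figure with two side-by-side Axes
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Each seaborn call draws on the Axes it is given and returns that Axes
returned = sns.scatterplot(data=df, x="x", y="y", ax=ax_left)
sns.regplot(data=df, x="x", y="y", ax=ax_right)

# The Axes objects can then be customised with matplotlib as usual
ax_left.set_title("Raw data")
ax_right.set_title("With trend line")
```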

### Standard workflow

The previous examples motivate a sensible workflow when working with seaborn:

1. Set a style using matplotlib
2. Initialise the figure using matplotlib's object-oriented interface
3. Insert the desired statistical visualisations using seaborn
4. Customise and fine-tune the figure using matplotlib
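
The four steps above can be sketched end to end as follows. The `df` DataFrame, its column names (`dose`, `response`) and the label text are hypothetical, and the fallback to the default style is only there in case the "seaborn-v0_8" style sheet is missing from the installed matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for this illustration
rng = np.random.default_rng(3)
df = pd.DataFrame({"dose": rng.uniform(0, 5, size=50)})
df["response"] = 2.0 * df["dose"] + rng.normal(scale=1.5, size=50)

# 1. Set a style using matplotlib (falls back to the default style if the
#    "seaborn-v0_8" sheet is not available in the installed version)
style = "seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "default"
plt.style.use(style)

# 2. Initialise the figure with the object-oriented interface
fig, ax = plt.subplots(figsize=(6, 4))

# 3. Insert the statistical visualisation using seaborn
sns.regplot(data=df, x="dose", y="response", ax=ax)

# 4. Customise and fine-tune using matplotlib
ax.set_xlabel("Dose (arbitrary units)")
ax.set_ylabel("Response")
ax.set_title("Response against dose")
ax.set_xlim(0, 5)
fig.tight_layout()
```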

[Next Chapter >>>](../chapter2/2-demo.ipynb)

