# Exercise solutions



## Q1)

In this question you will create some scatter and line plots using the health
expenditure data, which can be loaded by running:

In [None]:
import seaborn as sns

data = sns.load_dataset("healthexp")
data.head()

### Q1a)

First, set the seaborn theme for your plots by running the code below.


In [None]:
sns.set_theme()

### Q1b)

Using seaborn, create a scatter plot showing life expectancy (on the $y$-axis)
against year (on the $x$-axis).


In [None]:
# Your Q1b) code here
sns.scatterplot(data=data, x="Year", y="Life_Expectancy")

### Q1c)

Now update the colour and shape of the data points based on country.


In [None]:
# Your Q1c) code here
sns.scatterplot(
    data=data,
    x="Year",
    y="Life_Expectancy",
    hue="Country",
    style="Country",
)

### Q1d)

The code below will generate a subset of the data for Great Britain:


In [None]:
gb = data[data["Country"] == "Great Britain"]

With this data, use seaborn to create a scatter plot of the life expectancy in
Great Britain against time. Then resize the data points in your plot based on
the annual health expenditure.

*Hint*: use the `size` argument to vary the point size.


In [None]:
# Your Q1d) code here
sns.scatterplot(
    data=gb,
    x="Year",
    y="Life_Expectancy",
    size="Spending_USD",
)

### Q1e)

Using the "full" dataset again (not just data relating to Great Britain),
create a line plot showing the average health expenditure for each year.

Use a shaded region to indicate the standard deviation.

*Hint*: set the argument `errorbar="sd"` to use the standard deviation (in
seaborn versions before 0.12 this was `ci="sd"`, which is now deprecated).


In [None]:
# Your Q1e) code here
sns.lineplot(
    data=data,
    x="Year",
    y="Spending_USD",
    errorbar="sd",
)
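
To see what the shaded region represents, the sketch below computes the per-year mean and a one-standard-deviation band by hand. It uses a small hypothetical table (`toy`) rather than the real health expenditure data, and it assumes the band is the mean plus or minus one sample standard deviation, so treat it as an illustration rather than a description of seaborn's internals.

```python
import pandas as pd

# Hypothetical toy data standing in for the health expenditure dataset
toy = pd.DataFrame(
    {
        "Year": [2000, 2000, 2000, 2001, 2001, 2001],
        "Spending_USD": [1000.0, 2000.0, 3000.0, 2000.0, 3000.0, 4000.0],
    }
)

# Mean and sample standard deviation of spending within each year
summary = toy.groupby("Year")["Spending_USD"].agg(["mean", "std"])

# Lower and upper edges of a one-standard-deviation band around the mean
summary["lower"] = summary["mean"] - summary["std"]
summary["upper"] = summary["mean"] + summary["std"]
print(summary)
```

The `lower` and `upper` columns play the role of the edges of the shaded band, and the `mean` column plays the role of the line.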

### Q1f)

Create another line plot showing the health expenditure against time for each
country. Use a distinct line style and colour for each country.


In [None]:
# Your Q1f) code here
sns.lineplot(
    data=data,
    x="Year",
    y="Spending_USD",
    hue="Country",
    style="Country",
)

## Q2)

We will now see how to combine matplotlib syntax with seaborn's visualisation functions
to construct subplots.

### Q2a)

Run the code below to initialise a `plt.subplots()` figure with two panels.


In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(
    nrows=2,
    ncols=1,
    figsize=(6, 5),
)

The top panel is accessed using `ax[0]` and the bottom panel is accessed with `ax[1]`.
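
As a quick check of this indexing, the sketch below prints the shape of `ax` and writes each panel's index in its centre. The text annotations are purely illustrative and are not part of the exercise.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(6, 5))

# With two rows and one column, `ax` is a 1-D NumPy array holding two Axes
print(type(ax).__name__, ax.shape)

# Annotate each panel with the index used to reach it
for i, panel in enumerate(ax):
    panel.text(
        0.5,
        0.5,
        f"ax[{i}]",
        ha="center",
        va="center",
        transform=panel.transAxes,
    )
```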

### Q2b)

The code below adds a regression plot to the upper panel, showing the best-fit trendline
for the life expectancy vs the health expenditure for the Great Britain data.

Display the residuals in the lower panel.

*Hint*: use the argument `ax` to specify which axes to use.


In [None]:
# Your Q2b) code here
fig, ax = plt.subplots(
    nrows=2,
    ncols=1,
    figsize=(6, 5),
)
sns.regplot(
    data=gb,
    x="Spending_USD",
    y="Life_Expectancy",
    ax=ax[0],
)
sns.residplot(
    data=gb,
    x="Spending_USD",
    y="Life_Expectancy",
    ax=ax[1],
)

_NB. Because our figure has been created with matplotlib, it can be customised
using matplotlib syntax. For example, you could change the y-axis label of the lower
panel to "Residuals" using_ `ax[1].set_ylabel("Residuals")`
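
A minimal sketch of this kind of customisation is shown below, using an empty two-panel figure so that it runs without the dataset. The label and title strings, and the use of `sharex=True`, are illustrative choices rather than part of the exercise.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(6, 5), sharex=True)

# Axes-level customisation: label the y-axis of each panel separately
ax[0].set_ylabel("Life expectancy")
ax[1].set_ylabel("Residuals")

# With a shared x-axis, only the bottom panel needs an x-axis label
ax[1].set_xlabel("Spending (USD)")

# Figure-level customisation: a title spanning both panels
fig.suptitle("Great Britain")
fig.tight_layout()
```

The same calls can be made after `sns.regplot` and `sns.residplot` have drawn onto `ax[0]` and `ax[1]`, since seaborn draws onto ordinary matplotlib axes.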

## Q3)

In this question we will plot the distribution of pulse rates for different
types of exercise and diet. First, load the dataset with:


In [None]:
import seaborn as sns

exercise = sns.load_dataset("exercise")
exercise.head()

### Q3a)

Plot the distribution of pulse rates using a histogram.


In [None]:
# Your Q3a) code here
sns.histplot(data=exercise, x="pulse")

### Q3b)

Plot the distributions of pulse rates, with a separate histogram for each
exercise type (listed in the `"kind"` column). Use stepped histograms.

*Hint*: specify the argument `element="step"` to plot a stepped histogram.


In [None]:
# Your Q3b) code here
sns.histplot(
    data=exercise,
    x="pulse",
    element="step",
    hue="kind",
)

### Q3c)

Recreate the plot from Q3b), but visualising the distributions with a KDE,
rather than a histogram.


In [None]:
# Your Q3c) code here
sns.kdeplot(data=exercise, x="pulse", hue="kind")

## Q4)

In this question we will practise generating facet grids and pair grids.

### Q4a)

Using a facet grid of histograms, display the distributions of pulse rates for
every combination of diet and exercise type.


In [None]:
# Your Q4a) code here
g = sns.FacetGrid(data=exercise, row="diet", col="kind")
g.map(sns.histplot, "pulse")

### Q4b)

We will now construct a pair grid plot using the iris dataset, which can be loaded by
running:


In [None]:
iris = sns.load_dataset("iris")
iris.head()

The code below will generate an empty pair grid. Update this so that the
diagonal panels display histograms, the upper panels display scatter plots, and
the lower panels display KDE plots.


In [None]:
# Your Q4b) code here
g = sns.PairGrid(data=iris, diag_sharey=False)
g.map_diag(sns.histplot) \
 .map_lower(sns.kdeplot) \
 .map_upper(sns.scatterplot)

## Q5)

Change the axis labels of your pair plot from Q4b) to `"Sepal length"`,
`"Sepal width"`, `"Petal length"`, `"Petal width"`.

*Hint*: try using a for loop to update the labels.


In [None]:
# Your Q5) code here
g = sns.PairGrid(data=iris, diag_sharey=False)
g.map_diag(sns.histplot) \
 .map_lower(sns.kdeplot) \
 .map_upper(sns.scatterplot)

fig, ax = g.figure, g.axes
labels = [
    "Sepal length",
    "Sepal width",
    "Petal length",
    "Petal width",
]
for i, lab in enumerate(labels):
    ax[i, 0].set_ylabel(lab)
    ax[3, i].set_xlabel(lab)
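
The loop above only touches the left-hand column and the bottom row, because those are the panels that carry the outer axis labels. The sketch below shows the same indexing pattern on a plain matplotlib grid, used here as a stand-in for `g.axes` so that it runs without the iris data.

```python
import matplotlib.pyplot as plt

labels = ["Sepal length", "Sepal width", "Petal length", "Petal width"]
n = len(labels)

# Plain matplotlib grid used here as a stand-in for `g.axes`
fig, ax = plt.subplots(nrows=n, ncols=n, figsize=(8, 8))

for i, lab in enumerate(labels):
    ax[i, 0].set_ylabel(lab)      # left-hand column: one y label per row
    ax[n - 1, i].set_xlabel(lab)  # bottom row: one x label per column
```

Writing the bottom row as `n - 1` rather than a hard-coded `3` keeps the loop valid if the number of variables changes.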