In [None]:
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

In exercise 4.1 we used seaborn to control figure aesthetics, but seaborn has much more to offer. Seaborn works especially well with pandas DataFrames; in most cases seaborn will take care of extracting the data from the DataFrame. In the following exercises we will explore some of the plotting functionality of seaborn. Most plotting functions in seaborn have a similar syntax: `sns.functionname(data=df,x='column_x',y='column_y')`, where `column_x` and `column_y` are columns in DataFrame `df`. 
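
As a minimal sketch of this calling pattern, the snippet below uses a small made-up DataFrame. The name `toy` and the columns `height` and `weight` are invented for illustration and are not part of the exercise data.

```python
import pandas as pd
import seaborn as sns

# Made-up example data; the column names are illustrative only.
toy = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 69, 80, 92],
})

# The DataFrame goes in `data`, the column names go in `x` and `y`.
# The function returns the matplotlib Axes it drew on.
ax_toy = sns.scatterplot(data=toy, x="height", y="weight")

# Inspect the axis labels that seaborn chose.
print(ax_toy.get_xlabel(), ax_toy.get_ylabel())
```

The same three arguments carry over to most of the other plotting functions used in this exercise.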

In this exercise we will use the sample dataset *tips*. Load it with the code below and have a look at the data.

In [None]:
df = sns.load_dataset("tips")
display(df)

# Scatterplot

Next, we use `sns.scatterplot` to make a scatter plot: 

In [None]:
sns.scatterplot(data=df,x='tip',y='total_bill')

As you can see, seaborn uses the column names to automatically set the axes labels. 

There is a lot more data that is not being visualized. More information could be put in the color of the dots and the shape of the dots, which can be achieved with the arguments `hue` and `style`. Both these arguments should be set to a column name. Add `hue` and `style` to the scatter plot. What do they do?
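
Before trying this on the tips data, here is a small sketch on made-up data showing how the two arguments are passed. The DataFrame `toy` and its columns `group` and `kind` are hypothetical and only serve to show the syntax.

```python
import pandas as pd
import seaborn as sns

# Made-up data with two numerical and two categorical columns.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 1, 4, 3, 6, 5],
    "group": ["a", "a", "b", "b", "c", "c"],
    "kind": ["p", "q", "p", "q", "p", "q"],
})

# `hue` and `style` each take the name of a column in `data`.
ax_toy = sns.scatterplot(data=toy, x="x", y="y", hue="group", style="kind")

# The legend lists the levels of both columns.
legend_labels = [t.get_text() for t in ax_toy.get_legend().get_texts()]
print(legend_labels)
```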

In [None]:
#sns.scatterplot(data=df,x='tip',y='total_bill',...)

The scatter plot shown above gives you insight into the relation between tip and bill. Set up a plot that gives insight into the relation between the tip and the number of diners.

In [None]:
# make plot

Do you think this is an informative plot? Can you think of any better kinds of plots?

# Categorical plots

Data can be divided into two categories (no pun intended): numerical data and categorical data. Some columns, such as the group size, can be treated as both: the group size is a number, but it takes only a few distinct values.

Have a look at the data on tipping. Which columns do you think are categorical and which are numerical, and are there any columns that could be both?
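
One way to check how pandas itself classifies columns is to look at their dtypes. The sketch below does this on a made-up DataFrame; `toy` and its columns `price`, `guests` and `weekday` are invented for illustration.

```python
import pandas as pd

# Made-up data: a float column, an integer column and a categorical column.
toy = pd.DataFrame({
    "price": [9.5, 12.0, 7.25, 15.0],
    "guests": [2, 4, 2, 3],
    "weekday": pd.Categorical(["Mon", "Tue", "Mon", "Wed"]),
})

# dtypes shows how pandas stores each column.
print(toy.dtypes)

# select_dtypes filters columns by the kind of data they hold.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(include="category").columns.tolist()
print(numeric_cols, categorical_cols)

# An integer column with only a few distinct values can also be
# treated as categorical when plotting.
print(toy["guests"].nunique())
```

Note that the dtype only tells you how a column is stored; whether it makes sense to treat it as categorical in a plot is still your decision.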


The function `sns.stripplot` is used in much the same way as `sns.scatterplot`. Create a strip plot with the group size on the x-axis and the tip on the y-axis.

In [None]:
#sns.stripplot(...)

How does this differ from the scatter plot you made earlier?

`hue` also works with `stripplot`; use this to differentiate between men and women.

In [None]:
# add hue to stripplot

There are several other functions to plot categorical data:
- `sns.swarmplot`
- `sns.boxplot`
- `sns.violinplot`
- `sns.barplot`

Try out these functions with the tipping data.

In [None]:
# try different plotting functions

# Seaborn and matplotlib

Seaborn is built on top of matplotlib, which means that matplotlib functions can be used to customize seaborn plots. Before, we used the matplotlib `Axes` object (`ax`) to modify plots. To draw a seaborn plot on a specific matplotlib `Axes`, pass it through the `ax` argument:

In [None]:
fig, ax = plt.subplots()
sns.scatterplot(data=df,x='tip',y='total_bill',ax=ax)
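
The `ax` argument becomes especially useful when a figure has several axes. The sketch below, on made-up data (the names `toy`, `ax_left` and `ax_right` are invented for illustration), draws two different seaborn plots side by side and then adjusts one of them with a regular matplotlib method.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up example data.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1.0, 2.5, 2.0, 4.0],
    "label": ["a", "b", "a", "b"],
})

# One figure with two axes next to each other.
fig_toy, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each seaborn call draws on the Axes passed through `ax`
# and returns that same Axes object.
returned = sns.scatterplot(data=toy, x="x", y="y", ax=ax_left)
sns.boxplot(data=toy, x="label", y="y", ax=ax_right)

# Regular matplotlib methods work on these axes.
ax_left.set_xlim(0, 5)
fig_toy.tight_layout()
```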

Use matplotlib functions to:
* add a title
* change the axes labels
* save this figure to file

# Pair plots

With the plotting functions above, we could only look at one or two columns at a time. With `pairplot` we can inspect all numerical columns at once:

In [None]:
sns.pairplot(df)

You can specify the columns that should be included with the `vars` argument. What column(s) do you think could be left out? Make a pairplot that does not include that column.

In [None]:
# make more useful pairplot

`pairplot` also supports `hue`; use it to map gender (the `sex` column) onto the pairplot.

In [None]:
# use hue in pairplot

Interestingly, more changes than just the coloring by gender. This is because the kind of plot on the diagonal is set to `'auto'` by default. You can change this with the argument `diag_kind`. Use the documentation to figure out what values `diag_kind` can have and get the histograms back.

In [None]:
# change type of plot on the diagonal

It is also possible to draw a regression plot instead of the scatter plot on the off-diagonal plots. Look into the documentation of `pairplot` for this option and create a pairplot with regression plots.

In [None]:
# put regression plot on off-diagonal

# Advanced plot

Reproduce this plot using `subplot`, Seaborn's function for figure aesthetics and Seaborn's categorical plot function.

<img src="advanced_plot.svg" width="70%">



In [None]:
# use everything you learned in the previous exercises to recreate this plot