# Introduction to Seaborn

Data Science  
TECNUN - Escuela de Ingeniería  
Universidad de Navarra

- Idoia Ochoa: iochoal@unav.es

Credits: Iñigo Apaolaza




Seaborn is built on top of Matplotlib and provides a higher-level interface for creating statistical graphics. As a result, complex visualizations usually require less code in Seaborn than in Matplotlib.

First, we load the required modules.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set(style="darkgrid")

## Scatterplots

Scatterplots show how two numerical variables relate to each other. Let's load some data to create some examples.

In [None]:
tips = sns.load_dataset("tips")
print(tips.shape)
tips.head()

With this data, the first question we ask ourselves is whether the tip is related in any way to the price of the meal; in other words, we want to know if a higher meal price leads to a higher tip.

In [None]:
# parameter `s` modifies the size of the points
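
A minimal sketch of the kind of call this cell is meant for. To stay runnable without downloading anything, it uses a small made-up stand-in for `tips` (the `demo` dataframe and its values are hypothetical); in the notebook, `data = tips` would be passed instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: made-up bills and tips, not the real data
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=60)
demo = pd.DataFrame({"total_bill": bill,
                     "tip": 0.15 * bill + rng.normal(0, 1, size=60)})

# `s` sets the marker area (in points squared) for every point
g = sns.relplot(x="total_bill", y="tip", data=demo, s=100)
ax = g.axes.flat[0]  # relplot returns a FacetGrid; this is its single Axes
```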

With the `hue` variable (hue semantic), we can add a third dimension to the plot using colors. This way, we can see that there is a positive correlation between the bill and the tip, but there is no notable difference between men and women.

In [None]:
sns.relplot(x = "total_bill", y = "tip", hue = "sex", data = tips, s = 100)

The following cells show how to adjust the parameters of the function to change the size, shape, color, etc., of the points.

In [None]:
# alpha for transparency


In [None]:
# Change shape of points


In [None]:
# Size of points based on a value
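
The three adjustments named in the cells above (transparency, marker shape, and size mapped to a value) could look roughly like this. The `demo` dataframe is a made-up stand-in for `tips`, and the specific values (`alpha=0.5`, `marker="^"`, `sizes=(20, 200)`) are only illustrative choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: made-up values with the same column names
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=60)
demo = pd.DataFrame({"total_bill": bill,
                     "tip": 0.15 * bill + rng.normal(0, 1, size=60),
                     "size": np.repeat([1, 2, 3, 4, 5, 6], 10)})

# alpha: transparency, useful when many points overlap
g_alpha = sns.relplot(x="total_bill", y="tip", data=demo, s=100, alpha=0.5)

# marker: the same Matplotlib marker shape for every point
g_marker = sns.relplot(x="total_bill", y="tip", data=demo, s=100, marker="^")

# size: marker size taken from a column; sizes gives the (min, max) range
g_size = sns.relplot(x="total_bill", y="tip", data=demo,
                     size="size", sizes=(20, 200))
```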


So far, the variable passed to `hue` has been categorical, which is why the points are shown in two colors. If it is numerical, a continuous color palette is used instead.

In [None]:
# To modify the color palette.
sns.relplot(x = "total_bill", y = "tip", data = tips, hue = "size", palette="Spectral") # pastel

We can add a fourth dimension to the plot by mapping the color and the shape of the points to different variables. This is not recommended in general, because the result can be difficult to interpret.
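
As a sketch, color and shape can be mapped to separate columns with `hue` and `style`. The `demo` dataframe below is a made-up stand-in for `tips`; the choice of `smoker` and `time` as the two extra variables is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: made-up values with the same column names
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=60)
demo = pd.DataFrame({"total_bill": bill,
                     "tip": 0.15 * bill + rng.normal(0, 1, size=60),
                     "smoker": np.tile(["Yes", "No"], 30),
                     "time": np.repeat(["Lunch", "Dinner"], 30)})

# hue controls the color, style controls the marker shape
g = sns.relplot(x="total_bill", y="tip", hue="smoker", style="time",
                data=demo, s=100)
ax = g.axes.flat[0]
```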

## Line charts

To plot time series, for example, line charts are a very good option. Let's simulate some data to create some examples.



In [None]:
df = pd.DataFrame(dict(time = np.arange(500), value = np.random.randn(500).cumsum()))
df.head()

In [None]:
g = sns.relplot(x = "time", y = "value", kind = "line", data = df)

By using the `hue` parameter, we can draw several lines on the same graph. To do this, we first create a new dataframe with the variables `x`, `y`, and `label`. The first 250 observations have label A and the last 250 have label B. Then we plot the graph.


In [None]:
# Create a new df and add label column
df = pd.DataFrame(dict(x = np.concatenate((np.arange(250), np.arange(250))), y = np.random.randn(500).cumsum()))
df["label"] = 250*['A']+250*['B']
df.head()
df.tail()

Similarly, if there are several observations for the same x value, Seaborn plots their mean as the line and draws a shaded band around it (by default, a 95% confidence interval) to show the dispersion. We can also vary the line style to distinguish subgroups.

In [None]:
df = pd.DataFrame(dict(x = np.concatenate((np.arange(125), np.arange(125), np.arange(125), np.arange(125))), y = np.random.randn(500).cumsum()))
df["label"] = 250*['A']+250*['B']
df["label2"] = 125*['C']+125*['D']+125*['C']+125*['D']
df

To finish with the line chart, it's worth mentioning that the other parameters available for scatterplots (size, palette...) can also be used for line charts.

## Histograms

Let's simulate some data from a normal distribution and plot the corresponding histogram.

In [None]:
x = np.random.randn(100)

Setting `kde = False` plots only the histogram, without the density curve.

If we want to plot only the density curve, we will do the following:

## Categorical Scatterplots

Returning to the tips dataset, we will now draw different scatter plots that Seaborn provides for categorical variables.

In [None]:
tips.head()

In [None]:
# kind = "swarm" so that the points do not overlap


In [None]:
# Plot only for parties bigger than 1
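
A sketch of the categorical scatterplots these cells refer to, using `sns.catplot`. The `demo` dataframe is a made-up stand-in for `tips`; in the notebook, `tips` and `tips.query("size > 1")` would be passed instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: made-up values with the same column names
rng = np.random.default_rng(0)
demo = pd.DataFrame({"day": np.tile(["Thur", "Fri", "Sat", "Sun"], 15),
                     "total_bill": rng.uniform(5, 50, size=60),
                     "size": np.repeat([1, 2, 3, 4, 5, 6], 10)})

# Default kind is "strip": points are jittered along the categorical axis
g_strip = sns.catplot(x="day", y="total_bill", data=demo)

# kind="swarm" places the points so that they do not overlap
g_swarm = sns.catplot(x="day", y="total_bill", kind="swarm", data=demo)

# Filter the rows first to plot only parties bigger than 1
g_filtered = sns.catplot(x="size", y="total_bill", kind="swarm",
                         data=demo.query("size > 1"))
```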


## Boxplots

Below are several box plots with different parameters that can be defined.
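
A few variations, sketched with `sns.catplot(kind = "box")` on a made-up stand-in for `tips` (the `demo` dataframe and its values are hypothetical):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: 16 made-up bills per day, half per smoker level
rng = np.random.default_rng(0)
demo = pd.DataFrame({"day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 16),
                     "smoker": np.tile(["Yes", "No"], 32),
                     "total_bill": rng.uniform(5, 50, size=64)})

# One box per day
g_box = sns.catplot(x="day", y="total_bill", kind="box", data=demo)

# hue splits every day into one box per smoker level
g_hue = sns.catplot(x="day", y="total_bill", hue="smoker", kind="box", data=demo)

# Swapping x and y draws the boxes horizontally
g_horiz = sns.catplot(x="total_bill", y="day", kind="box", data=demo)
```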

## Violin Plots

Violin plots are very similar to box plots. The main difference is that, instead of a box, they display a kernel density estimate of the distribution of the values.
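
A short sketch with `sns.catplot(kind = "violin")`, again on a made-up stand-in for `tips` (hypothetical `demo` dataframe). The `split = True` variant is an optional extra that draws the two `hue` levels as the two halves of each violin.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `tips`: 16 made-up bills per day, half per smoker level
rng = np.random.default_rng(0)
demo = pd.DataFrame({"day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 16),
                     "smoker": np.tile(["Yes", "No"], 32),
                     "total_bill": rng.uniform(5, 50, size=64)})

# One violin per day
g_violin = sns.catplot(x="day", y="total_bill", kind="violin", data=demo)

# With a two-level hue, split=True puts both levels in the same violin
g_split = sns.catplot(x="day", y="total_bill", hue="smoker",
                      kind="violin", split=True, data=demo)
```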

## Bar charts

Let's use the Titanic data to draw bar charts.

In [None]:
titanic = sns.load_dataset("titanic")
titanic.head()

In [None]:
titanic.tail()

In [None]:
# Survival rate for each combination of sex and class
for sex in ["male", "female"]:
    for cls in ["First", "Second", "Third"]:
        group = titanic[(titanic["sex"] == sex) & (titanic["class"] == cls)]
        print(group.survived.sum() / len(group))
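
The survival rates computed above are the group means of `survived`, which is also the height that a bar chart shows for each group. A sketch on a tiny made-up table (the `demo` rows are hypothetical, not Titanic data), using `groupby` for the rates and `sns.catplot(kind = "bar")` for the plot:

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as `titanic`
demo = pd.DataFrame({"sex": ["male"] * 4 + ["female"] * 4,
                     "class": ["First", "First", "Third", "Third"] * 2,
                     "survived": [1, 0, 0, 0, 1, 1, 1, 0]})

# Mean of a 0/1 column = fraction of survivors in each group
rates = demo.groupby(["sex", "class"])["survived"].mean()
print(rates)

# Bar height is the group mean; the error bars show its uncertainty
g = sns.catplot(x="sex", y="survived", hue="class", kind="bar", data=demo)
ax = g.axes.flat[0]
```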


## Graph panels

Seaborn can draw a grid of panels, with one plot for each level of the variables passed to `row` and `col`.

In [None]:
g = sns.catplot(x="fare", y="survived", row="class", kind="box", orient="h", height=1.5, aspect=4, data=titanic.query("fare > 0"))
g.set(xscale="log")

In [None]:
g = sns.catplot(x="fare", y="survived", row="class", col = "sex", kind="box", orient="h", height=1.5, aspect=4, data=titanic.query("fare > 0"))
g.set(xscale="log")