Seaborn References and Tutorial

The purpose of this notebook is to keep a running list of examples of Seaborn for use in the future. Feel free to take anything for your own use.

**Step 1: Import libraries**

First we import the libraries we'll need for our future datasets and visualization.

In [None]:
import numpy as np 
# Pandas is a good library for managing datasets
import pandas as pd 

# Matplotlib allows for additional customization
# The %matplotlib inline magic renders plots inside the notebook.
from matplotlib import pyplot as plt
%matplotlib inline

# Seaborn for plotting and styling
import seaborn as sns

**Step 2: Import datasets**

I've imported a simple Pokemon dataset to run tests on.

In [None]:
combats = pd.read_csv("../input/pokemon/combats.csv")
pokemon = pd.read_csv("../input/pokemon/pokemon.csv")
tests = pd.read_csv("../input/pokemon/tests.csv")


In [None]:
# head function displays the first five rows. 
pokemon.head()

**Step 3: Run some Seaborn plots**

I wanted a place to reference Seaborn because of all its plotting possibilities. I'll run some plots here and keep adding more as I come across them.

In [None]:
pokemon.columns

In [None]:
sns.lmplot(x="Attack", y="Defense", data=pokemon);
 
# An alternative: regplot also accepts the columns directly, without data=
# sns.regplot(x=pokemon["Attack"], y=pokemon["Defense"])
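
To make the fitted line less of a black box, here is a minimal sketch of the ordinary least-squares line that a linear regression plot draws through the points. The data are synthetic stand-ins (not the Pokemon file), and `np.polyfit` is used only for illustration; it is not necessarily the routine Seaborn calls internally.

```python
import numpy as np

# Synthetic stand-in for the Attack/Defense columns (made-up values).
rng = np.random.default_rng(0)
attack = rng.uniform(20, 150, size=200)
defense = 0.5 * attack + 30 + rng.normal(0, 10, size=200)

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(attack, defense, deg=1)

# Points on the fitted line, e.g. for plt.plot(grid, fitted).
grid = np.linspace(attack.min(), attack.max(), 50)
fitted = slope * grid + intercept
print(f"Defense ~ {slope:.2f} * Attack + {intercept:.2f}")
```

The recovered slope should land close to the 0.5 used to generate the data, which is a quick sanity check that the line means what we think it means.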

In [None]:
# Adding a bit more style and coloring the points by Legendary status
sns.set_style('whitegrid')
sns.lmplot(
    x="Attack",
    y="Defense",
    data=pokemon,
    fit_reg=False,
    hue='Legendary',
    palette="Set1")

In [None]:
sns.set_style('darkgrid')  # changes the background of the plot
plt.figure(figsize=(14, 6))
sns.regplot(
    x="Attack", y="Defense", data=pokemon,
    fit_reg=True)  # fit_reg=True fits and draws a regression line

In [None]:
# Faceted plots split the data into one panel per level of another categorical
# variable: Generation in this case. lmplot creates its own figure, so its size
# is controlled with height and aspect rather than plt.figure.

sns.set_style('whitegrid')
sns.lmplot(
    x="Attack",
    y="Defense",
    data=pokemon,
    fit_reg=False,
    hue='Legendary',
    col="Generation",
    aspect=0.4,
    height=10)

In [None]:
# We can also plot a continuous variable against a categorical column.
# Below we look at the relationship between Speed and Legendary status.

plt.figure(figsize=(14, 6))
sns.set_style('whitegrid')
sns.regplot(x="Legendary", y="Speed", data=pokemon)

In [None]:
# One issue with this plot is that we cannot see the distribution of Speed
# within each Legendary group because the points overlap.
# The x_jitter option fixes this by spreading the points horizontally.

plt.figure(figsize=(14, 6))
sns.set_style("ticks")
sns.regplot(x="Legendary", y="Speed", data=pokemon, x_jitter=0.3)
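
As a rough illustration of what jitter does: uniform random noise is added to the plotted positions only, so the points spread out while the underlying values stay the same. The sketch below mimics `x_jitter=0.3` in NumPy on made-up 0/1 values; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a boolean column coded as 0/1 (made-up values).
legendary = rng.integers(0, 2, size=100).astype(float)

# Uniform noise in [-0.3, 0.3], mirroring x_jitter=0.3.
jitter = 0.3
x_display = legendary + rng.uniform(-jitter, jitter, size=legendary.size)

# Display positions move by at most `jitter`; the raw data are untouched.
print(np.abs(x_display - legendary).max())
```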

Fitting a logistic relationship

In [None]:
plt.figure(figsize=(14, 6))
sns.set_style("ticks")
# logistic=True fits a logistic regression for a binary y; it requires statsmodels
sns.regplot(x="Attack", y="Legendary", data=pokemon, logistic=True)
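
A logistic fit models the probability of a binary outcome as p = 1 / (1 + exp(-(b0 + b1 * x))). The sketch below fits that curve to synthetic data with a few Newton steps written in NumPy. Everything here is an assumption made for illustration: the data, the "true" coefficients, and the hand-rolled solver. Seaborn itself hands the fit to statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: higher Attack makes Legendary more likely (made-up model).
attack = rng.uniform(20, 180, size=500)
true_p = 1 / (1 + np.exp(-(-8.0 + 0.06 * attack)))
legendary = rng.random(500) < true_p

# Design matrix with an intercept column, and the 0/1 response.
X = np.column_stack([np.ones_like(attack), attack])
y = legendary.astype(float)

# Newton-Raphson updates for the logistic log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

print(f"intercept={beta[0]:.2f}, slope={beta[1]:.3f}")
```

A positive slope means the estimated probability of being Legendary rises with Attack, which is the S-shaped curve the plot draws.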

Setting kde=True overlays a kernel density estimate on the histogram: a smooth curve built from Gaussian kernels.

In [None]:
plt.figure(figsize=(12, 6))
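# norm_hist=True would plot a density instead of counts; with kde=True the
# histogram is normalized anyway. distplot is deprecated since seaborn 0.11
# in favor of histplot and displot.
ax = sns.distplot(pokemon['Defense'], kde=True, norm_hist=False)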
ax.set_title('Defense')
plt.show()
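
For intuition, a kernel density estimate places a small Gaussian bump on every observation and averages the bumps. Here is a minimal NumPy sketch on made-up values. The bandwidth uses Scott's rule of thumb, which is an assumption for this example rather than a statement about what any particular Seaborn version does.

```python
import numpy as np

rng = np.random.default_rng(0)
defense = rng.normal(70, 30, size=300)  # synthetic stand-in (made-up values)

# Scott's rule of thumb for the bandwidth (an assumption for this sketch).
n = defense.size
bandwidth = defense.std(ddof=1) * n ** (-1 / 5)

# Evaluate the average of the Gaussian bumps on a grid.
grid = np.linspace(defense.min() - 3 * bandwidth,
                   defense.max() + 3 * bandwidth, 400)
z = (grid[:, None] - defense[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# A density should integrate to roughly 1 over the grid.
area = (density * (grid[1] - grid[0])).sum()
print(round(area, 3))
```

A larger bandwidth gives a smoother curve; a smaller one follows the individual points more closely.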

Joint plots

I love joint plots. They aren't always the best choice for presenting data, but they are fun to experiment with, and they show the relationship between two variables together with each variable's own distribution.

In [None]:
# jointplot creates its own figure; use height= to change its size
sns.jointplot(x='Attack', y='Defense', data=pokemon)

In [None]:
sns.jointplot(x='HP', y='Speed', data=pokemon, kind='kde')

In [None]:
# kind='hex' is interesting: it bins the points into hexagons
sns.jointplot(x='HP', y='Speed', data=pokemon, kind='hex')

Pairplots

To see the relationships between all pairwise combinations of variables, we can use pairplot.

In [None]:
sns.pairplot(
    pokemon,
    hue='Legendary',
    vars=['Speed', 'HP', 'Attack', 'Defense', 'Generation'],
    diag_kind='kde')

Count Plots

In [None]:
plt.figure(figsize=(20, 6))
ax = sns.countplot(x="Type 1", data=pokemon, color='green')

In [None]:
plt.figure(figsize=(20, 6))
# dodge=False draws the hue levels at the same x position, so the bars
# overlap (they are overlaid, not summed into a true stacked bar)
sns.countplot(
    x="Type 1", data=pokemon, hue='Legendary', color='green',
    dodge=False)
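
The bar heights in a count plot are plain frequency counts, so they can be checked numerically with pandas. A minimal sketch on a tiny made-up frame; `pokemon_demo` and its rows are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Tiny synthetic stand-in with the two columns used above (made-up rows).
pokemon_demo = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire", "Water", "Water", "Dragon"],
    "Legendary": [False, False, False, True, False, False, True],
})

# Heights of the bars in countplot(x="Type 1").
type_counts = pokemon_demo["Type 1"].value_counts()

# Heights of the bars once hue="Legendary" splits each type.
type_by_legendary = pd.crosstab(pokemon_demo["Type 1"], pokemon_demo["Legendary"])

print(type_counts)
print(type_by_legendary)
```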

Bar plots

In the first plot, each bar's height is the mean Speed for that Type 1, and the black line on top of each bar shows the confidence interval around that mean (95% by default).

In [None]:
sns.set_style('darkgrid')
plt.figure(figsize=(20, 6))
sns.barplot(x="Type 1", y='Speed', data=pokemon, color='green')
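
To see where a bar height and its error line could come from, here is a minimal sketch that computes a group mean and a percentile-bootstrap interval by hand. The data are made up, and `mean_with_bootstrap_ci` is a hypothetical helper written for this example, not a Seaborn function.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in with the two columns used above (made-up values).
pokemon_demo = pd.DataFrame({
    "Type 1": rng.choice(["Fire", "Grass", "Water"], size=150),
    "Speed": rng.normal(70, 25, size=150).round(),
})


def mean_with_bootstrap_ci(values, n_boot=1000, level=95):
    """Return (mean, low, high) using a percentile bootstrap of the mean."""
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and take each resample's mean.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


summary = {
    type_name: mean_with_bootstrap_ci(group["Speed"])
    for type_name, group in pokemon_demo.groupby("Type 1")
}
for type_name, (mean, low, high) in summary.items():
    print(f"{type_name}: mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

Groups with fewer rows get wider intervals, which is why rare types tend to show longer error lines.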

In [None]:
sns.set_style('darkgrid')
plt.figure(figsize=(20, 6))
sns.barplot(x="Type 1", y='Speed', data=pokemon, hue='Legendary')

Point Plot

In [None]:
plt.figure(figsize=(20, 6))
sns.pointplot(x="Generation", y='Speed', data=pokemon, hue='Legendary')

Strip Plot

In [None]:
plt.figure(figsize=(12, 6))
sns.stripplot(x="Generation", y="Speed", data=pokemon)

Remember that jitter controls how far the points are spread out sideways within each category, which makes overlapping points easier to see.

In [None]:
plt.figure(figsize=(12, 6))
sns.stripplot(x="Generation", y="Speed", data=pokemon, jitter=0.4)

Swarm Plot

A swarm plot goes one step further: it shifts the points along the categorical axis so that none of them overlap.

In [None]:
sns.set_style('ticks')
plt.figure(figsize=(12, 6))
sns.swarmplot(x="Generation", y="Speed", data=pokemon, hue='Legendary')

Box Plots

In [None]:
sns.boxplot(data=pokemon)

Let's clean that up a bit.

In [None]:
# Pre-format: drop the columns that are not battle stats
stats_pokemon = pokemon.drop(['Generation', 'Legendary'], axis=1)
sns.boxplot(data=stats_pokemon)
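
The clean-up step can also be written by selecting columns by dtype instead of listing everything to drop. A minimal sketch on a made-up frame; `pokemon_demo` and its rows are hypothetical, and only a few of the notebook's column names are reproduced.

```python
import pandas as pd

# Tiny synthetic stand-in mixing text, numeric and boolean columns.
pokemon_demo = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Type 1": ["Fire", "Water", "Grass"],
    "HP": [45, 60, 80],
    "Attack": [49, 62, 82],
    "Generation": [1, 1, 2],
    "Legendary": [False, False, True],
})

# Keep numeric columns only (this excludes text and booleans),
# then drop the numeric column that is not a battle stat.
stats_demo = pokemon_demo.select_dtypes(include="number").drop(columns=["Generation"])
print(list(stats_demo.columns))
```

The resulting frame can be passed to `sns.boxplot(data=...)` in the same way as `stats_pokemon`.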

In [None]:
# Set theme
sns.set_style('whitegrid')
 
# Violin plot
plt.figure(figsize=(15, 6))
sns.violinplot(x='Type 1', y='Attack', data=pokemon)

Facet Grids

A FacetGrid lets us draw the same kind of plot on several panels in one figure, one panel per level of a categorical variable.

In [None]:
grid1 = sns.FacetGrid(data=pokemon, col='Generation', col_wrap=3)

grid1.map(plt.hist, "Speed")
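
To show roughly what a wrapped facet grid is doing, here is a hand-written equivalent using plain Matplotlib subplots and a pandas groupby. It is a sketch on made-up data, not Seaborn's actual implementation, and `pokemon_demo` is a hypothetical stand-in.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in with the two columns used above (made-up values).
pokemon_demo = pd.DataFrame({
    "Generation": np.repeat(np.arange(1, 7), 30),
    "Speed": rng.normal(70, 25, size=180),
})

n_cols = 3  # plays the role of col_wrap=3
groups = list(pokemon_demo.groupby("Generation"))
n_rows = -(-len(groups) // n_cols)  # ceiling division

fig, axes = plt.subplots(n_rows, n_cols, figsize=(9, 3 * n_rows),
                         sharex=True, sharey=True, squeeze=False)

# One histogram per Generation, filling the grid row by row.
for ax, (generation, subset) in zip(axes.flat, groups):
    ax.hist(subset["Speed"])
    ax.set_title(f"Generation = {generation}")
    ax.set_xlabel("Speed")

fig.tight_layout()
```

The FacetGrid version is shorter and handles titles, shared axes and legends for us, which is the main reason to prefer it.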

In [None]:
# Something a little more complex
grid2 = sns.FacetGrid(data=pokemon, col='Generation', col_wrap=3, hue="Legendary")

grid2.map(sns.regplot, "Speed", "HP", fit_reg=False).add_legend()

Separating rows by a second variable

In [None]:
grid3 = sns.FacetGrid(
    data=pokemon, col='Generation', row='Legendary', margin_titles=True)

grid3.map(sns.regplot, "Speed", "HP", fit_reg=False)
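
With both a row and a column variable, every panel holds the rows that match one combination of the two. A quick way to see how many points land in each panel is a grouped count. This is a sketch on made-up data, and `pokemon_demo` is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in with the two facet columns (made-up values).
pokemon_demo = pd.DataFrame({
    "Generation": rng.integers(1, 7, size=200),
    "Legendary": rng.random(200) < 0.1,
})

# One cell per facet: rows are Legendary levels, columns are Generation levels.
facet_sizes = (
    pokemon_demo.groupby(["Legendary", "Generation"])
    .size()
    .unstack(fill_value=0)
)
print(facet_sizes)
```

Panels with very few points (often the Legendary row) are worth reading with caution.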