# Intro to seaborn

As charts get more complex, you have to write more code. For example, matplotlib has no direct method for drawing a density plot, or a scatterplot with a line of best fit. You get the idea.

So, what you can do instead is to use a higher level package like seaborn, and use one of its prebuilt functions to draw the plot.

We are not going in-depth into seaborn. But let’s see how to get started and where to find what you want. A lot of seaborn’s plots are suitable for data analysis and the library works seamlessly with pandas dataframes.

seaborn is typically imported as `sns`. Like matplotlib, it comes with its own set of pre-built styles and palettes.

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
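
To get a feel for those pre-built styles and palettes, here is a minimal sketch. It only inspects them and plots nothing; the variable names `whitegrid` and `deep3` are made up for illustration.

```python
import seaborn as sns

# Look up the settings behind a named style without applying it
whitegrid = sns.axes_style("whitegrid")
print(whitegrid["axes.grid"])  # whether this style draws grid lines

# Ask for the first three colors of the built-in "deep" palette
deep3 = sns.color_palette("deep", 3)
print(deep3.as_hex())
```

`sns.set_style(...)` applies a style globally, which is what the cells below do.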

In [None]:
# Read dataset
df = pd.read_csv('Pokemon.csv', index_col=0, encoding='unicode_escape')
df.head()

## Seaborn's plotting functions

One of Seaborn's greatest strengths is its diversity of plotting functions. For instance, making a scatter plot is just one line of code using the `lmplot` function.

For example, let's compare the Attack and Defense stats for our Pokémon:

In [None]:
sns.set_style("whitegrid") # let's try a different theme
sns.lmplot(x='Attack', y='Defense', data=df);

By the way, the diagonal line appears because `lmplot` is Seaborn's function for fitting and plotting a regression line, not a plain scatter plot function. (Seaborn 0.9 and later also ship a dedicated `scatterplot` function.)

Thankfully, each plotting function has several useful options that you can set. Here's how we can tweak the `lmplot`:

* First, we'll set `fit_reg=False` to remove the regression line, since we only want a scatter plot.
* Next, we'll set `hue='Stage'` to color our points by the Pokémon's evolution stage. This hue argument is very useful because it allows you to express a third dimension of information using color.

In [None]:
sns.set_style("darkgrid") # let's try a different theme
sns.lmplot(x='Attack', y='Defense', data=df,
           fit_reg=False, # No regression line
           hue='Stage');   # Color by evolution stage
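
If you don't have `Pokemon.csv` at hand, the same two options can be tried on a tiny hand-typed table. This is only a sketch: the `toy` DataFrame and its values are a hypothetical stand-in, and the `Agg` backend line is there so it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Pokémon table (illustrative values)
toy = pd.DataFrame({
    "Attack":  [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Stage":   [1, 2, 3, 1, 2, 3],
})

# fit_reg=False drops the regression line, hue colors points by Stage
g = sns.lmplot(x="Attack", y="Defense", data=toy, fit_reg=False, hue="Stage")

# The axes hold scatter points only, no line artists
print(len(g.ax.lines))
```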

## Customizing with matplotlib

Remember, Seaborn is a high-level interface to Matplotlib. Seaborn will get you most of the way there, but you'll sometimes need to bring in Matplotlib.

Look at the previous plot: do you notice that the bottom left corner doesn't start at 0? Setting axis limits is one of those times when you need Matplotlib, but the process is pretty simple:

* First, invoke your Seaborn plotting function as normal.
* Then, invoke Matplotlib's customization functions. In this case, we'll use its `ylim` and `xlim` functions.

Here's our new scatter plot with sensible axes limits:

In [None]:
sns.set_style("dark") # let's try a different theme
# Plot using Seaborn
sns.lmplot(x='Attack', y='Defense', data=df,
           fit_reg=False, 
           hue='Stage')
 
# Tweak using Matplotlib
plt.ylim(0, None)
plt.xlim(0, None);

# (look at the bottom left corner now)
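
As an alternative sketch, `lmplot` returns a `FacetGrid`, and its `set` method forwards the limits to the underlying Matplotlib axes. The `toy` DataFrame is again a hypothetical stand-in for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Pokémon table (illustrative values)
toy = pd.DataFrame({
    "Attack":  [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Stage":   [1, 2, 3, 1, 2, 3],
})

g = sns.lmplot(x="Attack", y="Defense", data=toy, fit_reg=False, hue="Stage")

# Pin the lower limits to 0 and leave the upper limits on autoscale
g.set(xlim=(0, None), ylim=(0, None))
print(g.ax.get_xlim(), g.ax.get_ylim())
```

Both routes end up calling the same Matplotlib machinery, so pick whichever reads better in your notebook.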

## The role of Pandas

Even though this is a Seaborn tutorial, Pandas actually plays a very important role. You see, Seaborn's plotting functions benefit from a base DataFrame that's reasonably formatted.

For example, let's say we wanted to make a box plot for our Pokémon's combat stats:

In [None]:
sns.set_style("white") # let's try a different theme
# Boxplot
sns.boxplot(data=df);

Well, that's a reasonable start, but there are some columns we'd probably like to remove:

* We can remove the *Total* since we have individual stats.
* We can remove the *Stage* and *Legendary* columns because they aren't combat stats.

It turns out that this isn't easy to do within Seaborn alone. Instead, it's much simpler to pre-format your `DataFrame`.

Let's create a new `DataFrame` called `stats_df` that only keeps the stats columns:

In [None]:
# Pre-format DataFrame
stats_df = df.drop(['Total', 'Stage', 'Legendary'], axis=1)
 
# New boxplot using stats_df
sns.boxplot(data=stats_df);
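
To see exactly what the pre-formatting step does, here is a small Pandas-only sketch on a hypothetical two-row table. It uses the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import pandas as pd

# Hypothetical two-row stand-in for the Pokémon table
toy = pd.DataFrame({
    "Name": ["A", "B"],
    "Total": [318, 405],
    "HP": [45, 60],
    "Attack": [49, 62],
    "Stage": [1, 2],
    "Legendary": [False, False],
})

# drop returns a new DataFrame and leaves the original untouched
stats_toy = toy.drop(columns=["Total", "Stage", "Legendary"])
print(list(stats_toy.columns))  # ['Name', 'HP', 'Attack']
```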

While we're at it, let's talk about violin plots:

* Violin plots are useful alternatives to box plots.
* They show the distribution (through the thickness of the violin) instead of only the summary statistics.

For example, we can visualize the distribution of Attack by Pokémon's primary type:

In [None]:
# Set theme
sns.set_style('whitegrid')
 
# Violin plot
sns.violinplot(x='Type 1', y='Attack', data=df);

As you can see, Dragon types tend to have higher Attack stats than Ghost types, but they also have greater variance.
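
A reading like this can be backed with numbers by grouping on the type column. The sketch below uses a few hand-typed rows rather than `Pokemon.csv`, so the figures it prints are illustrative only.

```python
import pandas as pd

# Hand-typed illustrative rows, not read from Pokemon.csv
toy = pd.DataFrame({
    "Type 1": ["Dragon", "Dragon", "Dragon", "Ghost", "Ghost", "Ghost"],
    "Attack": [64, 84, 134, 35, 50, 65],
})

# Median for the typical value, standard deviation for the spread
summary = toy.groupby("Type 1")["Attack"].agg(["median", "std"])
print(summary)
```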

Now, Pokémon fans might find something quite jarring about that plot: The colors are nonsensical. Why is the Grass type colored pink or the Water type colored orange? We must fix this!

## Color palettes

Fortunately, Seaborn allows us to set custom color palettes. We can simply create an ordered Python list of color hex values.

In [None]:
pkmn_type_colors = ['#78C850',  # Grass
                    '#F08030',  # Fire
                    '#6890F0',  # Water
                    '#A8B820',  # Bug
                    '#A8A878',  # Normal
                    '#A040A0',  # Poison
                    '#F8D030',  # Electric
                    '#E0C068',  # Ground
                    '#EE99AC',  # Fairy
                    '#C03028',  # Fighting
                    '#F85888',  # Psychic
                    '#B8A038',  # Rock
                    '#705898',  # Ghost
                    '#98D8D8',  # Ice
                    '#7038F8',  # Dragon
                   ]

Now we can simply use the `palette=` argument to recolor our chart.

In [None]:
# Violin plot with Pokemon color palette
sns.violinplot(x='Type 1', y='Attack', data=df, 
               palette=pkmn_type_colors); # Set color palette
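
A list palette relies on the colors being in the same order as the types appear in the data. As a sketch of a more explicit alternative, `palette` also accepts a dictionary that maps each category to a color. The names `type_palette` and `toy` below are hypothetical, and only three types are shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Map each type name to its color, so ordering no longer matters
type_names = ["Grass", "Fire", "Water"]
type_hex = ["#78C850", "#F08030", "#6890F0"]
type_palette = dict(zip(type_names, type_hex))

# Hypothetical data where the types appear in a different order
toy = pd.DataFrame({
    "Type 1": ["Water"] * 3 + ["Grass"] * 3 + ["Fire"] * 3,
    "Attack": [48, 63, 83, 49, 62, 82, 52, 64, 84],
})

ax = sns.violinplot(x="Type 1", y="Attack", data=toy, palette=type_palette)
```

Recent Seaborn releases may warn when `palette` is given without `hue`; the plot is still drawn.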

Violin plots are great for visualizing distributions. However, since we only have 151 Pokémon in our dataset, we may want to simply display each point.

That's where the swarm plot comes in. This visualization will show each point, while "stacking" those with similar values:

In [None]:
# Swarm plot with Pokemon color palette
sns.swarmplot(x='Type 1', y='Attack', data=df, 
              palette=pkmn_type_colors);

That's handy, but can't we combine our swarm plot and the violin plot? After all, they display similar information, right?

## Overlaying plots

The answer is yes.

It's pretty straightforward to overlay plots using Seaborn, and it works the same way as with Matplotlib. Here's what we'll do:

* First, we'll make our figure larger using Matplotlib.
* Then, we'll plot the violin plot. However, we'll set `inner=None` to remove the bars inside the violins.
* Next, we'll plot the swarm plot. This time, we'll make the points black so they pop out more.
* Finally, we'll set a title using Matplotlib.

In [None]:
# Set figure size with matplotlib
plt.figure(figsize=(10,6))
 
# Create plot
sns.violinplot(x='Type 1',
               y='Attack', 
               data=df, 
               inner=None, # Remove the bars inside the violins
               palette=pkmn_type_colors)
 
sns.swarmplot(x='Type 1', 
              y='Attack', 
              data=df, 
              color='k', # Make points black
              alpha=0.7) # and slightly transparent
 
# Set title with matplotlib
plt.title('Attack by Type');

## Let's go crazy

Well, we could certainly repeat that chart for each stat. But we can also combine the information into one chart... we just have to do some data wrangling with Pandas beforehand.

First, here's a reminder of our data format:

In [None]:
stats_df.head()

As you can see, all of our stats are in separate columns. Instead, we want to "melt" them into one column.

To do so, we'll use Pandas's `melt` function. It takes 3 arguments:

* First, the DataFrame to melt.
* Second, ID variables to keep (Pandas will melt all of the other ones).
* Finally, a name for the new, melted variable.

Here's the output:

In [None]:
# Melt DataFrame
melted_df = pd.melt(stats_df, 
                    id_vars=["Name", "Type 1", "Type 2"], # Variables to keep
                    var_name="Stat") # Name of melted variable
melted_df.head()

All 6 of the stat columns have been "melted" into one, and the new Stat column indicates the original stat (HP, Attack, Defense, Sp. Attack, Sp. Defense, or Speed). For example, it's hard to see here, but Bulbasaur now has 6 rows of data.

In fact, if you print the shape of these two DataFrames...

In [None]:
print( stats_df.shape )
print( melted_df.shape )

...you'll find that `melted_df` has 6 times as many rows as `stats_df`.
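
The same effect is easy to check by hand on a tiny table. This sketch uses a hypothetical two-row DataFrame with three stat columns, so the melted result should have three times as many rows.

```python
import pandas as pd

# Hypothetical two-row table with three stat columns
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Keep Name as the identifier and melt every other column into Stat/value
melted = pd.melt(toy, id_vars=["Name"], var_name="Stat")

print(toy.shape)     # (2, 4)
print(melted.shape)  # (6, 3)
```

The melted values land in a column called `value` by default, which is why the swarm plot below uses `y='value'`.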

Now we can make a swarm plot with `melted_df`.

But this time, we're going to set `x='Stat'` and `y='value'` so our swarms are separated by stat.
Then, we'll set `hue='Type 1'` to color our points by the Pokémon type.

In [None]:
# Swarmplot with melted_df
sns.swarmplot(x='Stat', y='value', data=melted_df, 
              hue='Type 1');

Finally, let's make a few final tweaks for a more readable chart:

* Enlarge the plot.
* Separate points by hue using the argument `dodge=True`.
* Use our custom Pokémon color palette.
* Adjust the y-axis limits to start at 0.
* Place the legend to the right.

In [None]:
# 1. Enlarge the plot
plt.figure(figsize=(14,10))
 
sns.swarmplot(x='Stat', 
              y='value', 
              data=melted_df, 
              hue='Type 1', 
              dodge=True, # 2. Separate points by hue
              palette=pkmn_type_colors) # 3. Use Pokemon palette
 
# 4. Adjust the y-axis
plt.ylim(0, 260)
 
# 5. Place legend to the right
plt.legend(bbox_to_anchor=(1, 1), loc=2);

## Chart gallery

We're going to conclude this tutorial with a few quick-fire data visualizations, just to give you a sense of what's possible with Seaborn.

### Heatmap

Heatmaps help you visualize matrix-like data.

In [None]:
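# Calculate correlations between the numeric stat columns
# (numeric_only=True needs pandas 1.5+; it skips Name, Type 1 and Type 2)
corr = stats_df.corr(numeric_only=True)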
 
# Heatmap
sns.heatmap(corr, annot=True);
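
Correlations are only defined for numeric columns, so text columns have to be left out one way or another. Here is a sketch that does this explicitly with `select_dtypes` on a hypothetical `toy` table. The `vmin`, `vmax` and `cmap` settings are an optional choice that centers the color scale on 0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical stand-in with one text column and two numeric columns
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "HP": [45, 60, 80, 39],
    "Attack": [49, 62, 82, 52],
})

# Keep numeric columns only, then compute pairwise correlations
corr = toy.select_dtypes(include="number").corr()

# Fix the color scale to the full correlation range [-1, 1]
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```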

### Histogram

Histograms allow you to plot the distributions of numeric variables.

In [None]:
# Histogram with a density curve
# (histplot replaces distplot, which was deprecated in seaborn 0.11)
sns.histplot(df.Attack, kde=True);

### Bar Plot

Bar plots help you visualize the distributions of categorical variables.

In [None]:
# Count Plot (a.k.a. Bar Plot)
sns.countplot(x='Type 1', data=df, palette=pkmn_type_colors)
 
# Rotate x-labels
plt.xticks(rotation=-45);

### Category Plot

Category plots make it easy to separate plots by categorical classes.

In [None]:
# Category plot (catplot was called factorplot before seaborn 0.9)
g = sns.catplot(x='Type 1', 
                y='Attack', 
                data=df, 
                hue='Stage',  # Color by stage
                col='Stage',  # Separate by stage
                kind='box')   # Draw box plots
 
# Rotate x-axis labels
g.set_xticklabels(rotation=-45);

### Density Plot

Density plots display the joint distribution of two variables.

>Tip: Consider overlaying this with a scatter plot.

In [None]:
# Density Plot (recent seaborn versions need x= and y= as keywords)
sns.kdeplot(x=df.Attack, y=df.Defense);

### Joint Distribution Plot

Joint distribution plots combine information from scatter plots and histograms to give you detailed information for bi-variate distributions.

In [None]:
# Joint Distribution Plot
# jointplot creates its own figure, so set its size with `height` (in inches)
sns.jointplot(x='Attack', y='Defense', data=df, height=8);

### Pair Plot

Pair plots draw the pairwise relationships between the numeric columns of a DataFrame.

In [None]:
# pairplot only supports numerical variables
only_nums = df.select_dtypes(exclude=['object', 'bool'])

sns.pairplot(only_nums, hue="Stage");
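
To check which columns survive a dtype filter before plotting, here is a Pandas-only sketch on a hypothetical two-row table. It uses `include="number"` as an alternative way to state the same intent: keep numeric columns and drop text and boolean ones.

```python
import pandas as pd

# Hypothetical stand-in with text, numeric and boolean columns
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Mewtwo"],
    "Attack": [49, 110],
    "Stage": [1, 1],
    "Legendary": [False, True],
})

# Boolean columns do not count as "number", so Legendary is dropped too
only_nums_toy = toy.select_dtypes(include="number")
print(list(only_nums_toy.columns))  # ['Attack', 'Stage']
```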