# Seaborn
   - Content taken from *Jake VanderPlas. Python data science handbook: Essential tools for working with data. O'Reilly Media, 2016.*   
   
Matplotlib has proven to be an incredibly useful and popular visualization tool, but even avid users will admit it often leaves much to be desired. There are several valid complaints about Matplotlib that often come up:
- Prior to version 2.0, Matplotlib’s defaults were not exactly the best choices. They were based on MATLAB circa 1999, and it often showed.
- Matplotlib’s API is relatively low level. Doing sophisticated statistical visualization is possible, but often requires a lot of boilerplate code. 
- Matplotlib predated Pandas by more than a decade, and thus is not designed for use with Pandas DataFrames. In order to visualize data from a Pandas DataFrame, you must extract each Series and often concatenate them together into the right format. It would be nicer to have a plotting library that can intelligently use the DataFrame labels in a plot.

An answer to these problems is Seaborn. Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames.

To be fair, the Matplotlib team has addressed some of this: it added the `plt.style` tools and handles Pandas data more seamlessly, and the 2.0 release introduced a new default stylesheet that improved on the earlier defaults. But for all the reasons just discussed, Seaborn remains an extremely useful add-on.
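
As a small illustration of the `plt.style` tools mentioned above, the following sketch lists the bundled stylesheets and applies one temporarily. The choice of the `ggplot` stylesheet and the toy data are assumptions made for the example only.

```python
import matplotlib.pyplot as plt

# Names of the stylesheets bundled with the installed Matplotlib
available = plt.style.available
print(available[:5])

# Apply a stylesheet temporarily; rcParams revert when the block exits
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_title("ggplot style, applied only inside the context")
```

Using a context manager rather than `plt.style.use` keeps the change local, which is convenient when a notebook mixes several styles.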

## Seaborn Versus Matplotlib
Here is an example of a simple random-walk plot in Matplotlib, using its classic plot formatting and colors. Although the result contains all the information we’d like it to convey, it does so in a way that is not all that aesthetically pleasing, and even looks a bit old-fashioned in the context of 21st-century data visualization.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
 
# Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)
    
# Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

Now let’s take a look at how it works with Seaborn. As we will see, Seaborn has many of its own high-level plotting routines, but it can also override Matplotlib’s default parameters so that even simple Matplotlib scripts produce much better-looking output. By convention, Seaborn is imported as `sns`, and we can set the style by calling its `set()` function (an alias of `set_theme()` in recent versions). Rerunning the same plotting code then gives a much cleaner result.

In [None]:
import seaborn as sns
sns.set()

# same plotting code as above!
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');
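
Setting the theme this way is global: it changes Matplotlib's `rcParams` for every later plot. As a hedged sketch of a more local alternative, `sns.axes_style` can be used as a context manager so that only the axes created inside the block pick up the Seaborn style. The `whitegrid` style name is just one of the built-in options, chosen here for illustration.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Same random-walk data as in the cells above
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)

# Record a setting before the context so we can confirm nothing leaks out of it
facecolor_before = mpl.rcParams["axes.facecolor"]

# Inside the context, newly created axes use Seaborn's "whitegrid" style
with sns.axes_style("whitegrid"):
    grid_inside = mpl.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.legend(list("ABCDEF"), ncol=2, loc="upper left")

# After the block, the previous settings are back in place
facecolor_after = mpl.rcParams["axes.facecolor"]
```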

## Exploring Seaborn Plots
The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.   

Let’s take a look at a few of the datasets and plot types available in Seaborn. Note that all of the following could be done using raw Matplotlib commands (this is, in fact, what Seaborn does under the hood), but the Seaborn API is much more convenient.

### Histograms, KDE, and Densities
Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables. We have seen that this is relatively straightforward in Matplotlib.

In [None]:
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    plt.hist(data[col], alpha=0.5)

The same plot can be made in Seaborn with `histplot`. In addition to the histogram, we can overlay a smooth estimate of the distribution, computed by kernel density estimation, by passing the `kde=True` option.

In [None]:
sns.histplot(data, kde=True);
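
To make the idea of kernel density estimation concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. It is not Seaborn's implementation; the helper name `gaussian_kde_1d` is hypothetical, and the bandwidth uses Scott's rule of thumb, which is an assumption about a reasonable default rather than a statement about what any particular library does.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    # Scaled distance from every grid point to every sample: shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian bump per sample, each integrating to 1
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The estimate is the average of the bumps
    return kernels.mean(axis=1)

rng = np.random.RandomState(0)
samples = rng.randn(2000)
grid = np.linspace(-6, 6, 601)

# Scott's rule of thumb for the bandwidth (an assumption; other choices are common)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.3f}, area={area:.3f}")
```

A smaller bandwidth follows the individual samples more closely and gives a bumpier curve; a larger one smooths more aggressively.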

If we pass both columns of the dataset to `kdeplot` as `x` and `y`, we get a two-dimensional visualization of the joint density.

In [None]:
sns.kdeplot(data=data, x='x', y='y');

### Pair Plots
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. These are very useful for exploring correlations in multidimensional data, when you’d like to plot all pairs of variables against each other.   

We’ll demo this with the well-known Iris dataset, which lists measurements of petals and sepals of three iris species. Visualizing the multidimensional relationships among the samples is as easy as calling `sns.pairplot`.

In [None]:
iris = sns.load_dataset("iris")
iris.head()

In [None]:
sns.pairplot(iris, hue='species', height=2.5);

### Faceted Histograms
Sometimes the best way to view data is via histograms of subsets. Seaborn’s `FacetGrid` makes this extremely simple. We’ll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data.

In [None]:
tips = sns.load_dataset('tips')
tips.head()

In [None]:
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));
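
Conceptually, `FacetGrid` splits the data by the row and column variables and draws the mapped plot once per subset. A rough Matplotlib-only sketch of that idea follows. The DataFrame is a hypothetical stand-in with random values (not the real tips data), and the loop is an illustration of the concept, not Seaborn's actual implementation.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 200

# Hypothetical stand-in for the tips data (values are random, not real)
df = pd.DataFrame({
    "sex": rng.choice(["Male", "Female"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "tip_pct": rng.uniform(5, 30, size=n),
})

row_levels = sorted(df["sex"].unique())
col_levels = sorted(df["time"].unique())
bins = np.linspace(0, 40, 15)

# One panel per (row level, column level) combination, with shared axes
fig, axes = plt.subplots(len(row_levels), len(col_levels),
                         sharex=True, sharey=True, squeeze=False)
for i, row_value in enumerate(row_levels):
    for j, col_value in enumerate(col_levels):
        subset = df[(df["sex"] == row_value) & (df["time"] == col_value)]
        axes[i, j].hist(subset["tip_pct"], bins=bins)
        axes[i, j].set_title(f"sex = {row_value} | time = {col_value}")
```

The `FacetGrid` version above replaces the nested loop, the subsetting, and the titles with two lines.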

### Cat Plots
Cat plots (categorical plots) can be useful for this kind of visualization as well. They allow you to view the distribution of one variable within bins defined by any other variable.

In [None]:
g = sns.catplot(x="day", y="total_bill", hue="sex", data=tips, kind="box")
g.set_axis_labels("Day", "Total Bill");

### Joint Distributions
Similar to the pair plot we saw earlier, we can use `sns.jointplot` to show the joint distribution of two variables, along with the associated marginal distributions.

In [None]:
sns.jointplot(x="total_bill", y="tip", data=tips, kind='hex');

The joint plot can also do some automatic kernel density estimation and regression, for example with `kind='reg'`.

In [None]:
sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg');
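
To read numbers off such a regression rather than just look at the line, the fit can be reproduced separately. This is a minimal sketch, assuming the line drawn is an ordinary least-squares fit of a straight line. The data are a hypothetical stand-in with a known slope and intercept (not the real tips data), which makes it easy to check that the fit recovers them.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-in data with a known linear trend plus noise
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(scale=0.5, size=200)

# Least-squares fit of a degree-1 polynomial: tip ~ slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```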

### Bar Plots
Counts over time can be plotted as bars with `sns.catplot` and `kind="count"`. Let us use the Planets data.

In [None]:
planets = sns.load_dataset('planets')
planets.head()

In [None]:
g = sns.catplot(x="year", data=planets, aspect=2, kind="count", color='steelblue')
g.set_xticklabels(step=5);
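
The height of each bar in a count plot is simply the number of rows with that value. The same numbers can be obtained in Pandas with `value_counts`, sketched here on a tiny hypothetical table (the years are made up for illustration and are not the Planets data).

```python
import pandas as pd

# Hypothetical stand-in table: one row per discovery
df = pd.DataFrame({"year": [2008, 2008, 2009, 2010, 2010, 2010, 2011]})

# Number of rows per year, ordered by year rather than by frequency
counts = df["year"].value_counts().sort_index()
print(counts)
```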

### Exercise
We want to learn more by looking at the method of discovery of each of these planets. Draw a bar plot with multiple bars per year for the years 2008 to 2015, where each bar shows the number of planets discovered with a given method in that year. Label the axes and add a legend and a title so that the plot is easy to read.