# Seaborn

Seaborn is a library that makes statistical plots easy to create.  
Seaborn works well with pandas DataFrames.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

#### Seaborn Data

Seaborn can easily load some classic/toy datasets to illustrate plotting features.  An internet connection is required to load the data.  

In [None]:
iris = sns.load_dataset('iris')
tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights') # not the same dataset we used in HW

In [None]:
iris.head()

In [None]:
tips.head()

In [None]:
flights.head()

# Distribution Plots

Let's discuss some plots that allow us to visualize the distribution of a data set. These plots are:

* histplot
* displot
* jointplot
* pairplot
* rugplot

In [None]:
sns.displot(data = iris, x = 'petal_length', kind = 'kde', rug = True)


In [None]:
sns.displot(data = iris, x = 'petal_length', kind = 'hist', rug = True, kde = True)

In [None]:
sns.histplot(data = iris, x = 'petal_length')

## jointplot

`jointplot()` combines a bivariate plot of two variables with a univariate plot of each variable in the margins. The **kind** parameter selects how the relationship is drawn:
* "scatter"
* "reg"
* "resid"
* "kde"
* "hist"
* "hex"

`sns.jointplot(x='',y='',data=,kind='')`

In [None]:
sns.jointplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species')

In [None]:
sns.jointplot(data = iris, x = 'petal_length', y = 'petal_width', kind = 'kde')

In [None]:
sns.jointplot(data = tips, x = 'total_bill', y = 'tip', kind = 'reg')

## pairplot

pairplot will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns). 

``sns.pairplot(data)``  
``sns.pairplot(data, hue= , palette="")``

In [None]:
sns.pairplot(data = iris, hue = 'species')

In [None]:
g = sns.PairGrid(data = iris, hue = 'species', vars = ['petal_length', 'sepal_length', 'petal_width'])
g.map_diag(sns.histplot)
g.map_upper(sns.scatterplot)
g.map_lower(sns.regplot)

## boxplot and violinplot

#### boxplot
A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

``sns.boxplot(x="", y="", data= ,palette="")`` 

``sns.boxplot(data= ,palette=' ',orient=' ')``

``sns.boxplot(x=" ", y=" ", hue=" ", data= , palette=" ")``
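The outlier rule mentioned above can be worked through by hand. With the default `whis = 1.5`, points farther than 1.5 times the inter-quartile range from the box are drawn as outliers, and the whiskers stop at the most extreme points inside those limits. The sketch below uses only NumPy and made-up numbers; it illustrates the rule and is not seaborn's internal code.

```python
import numpy as np

# Made-up sample with one obviously extreme value
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# Quartiles and inter-quartile range (the edges and height of the box)
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

whis = 1.5  # multiplier applied to the IQR
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Points beyond the fences are drawn individually as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers extend to the most extreme points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, q3, iqr)                # 3.0 7.0 4.0
print(lower_fence, upper_fence)   # -3.0 13.0
print(outliers)                   # [30.]
print(whisker_low, whisker_high)  # 1.0 8.0
```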

#### violinplot
A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.

`sns.violinplot(x="<cat>", y="<num>", data= , palette=' ')`

`sns.violinplot(x="<cat>", y="<num>", data= , hue=' ', palette=' ')`

`sns.violinplot(x="<cat>", y="<num>", data= , hue=' ', split=(boolean), palette=' ')`
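As a small sketch of the `split` option in the last template: when `hue` has two levels, `split = True` draws one half-violin per level so the two distributions share a single violin. The `demo` frame and its column values are made up for illustration (an assumption); with the `tips` data the same call could use `x = 'day'` and `hue = 'smoker'`.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in data with a two-level hue variable
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'day': np.repeat(['Sat', 'Sun'], 40),
    'smoker': np.tile(['Yes', 'No'], 40),
    'tip_pct': rng.normal(0.16, 0.04, 80),
})

fig, ax = plt.subplots()
# Each violin shows 'Yes' on one side and 'No' on the other
sns.violinplot(data = demo, x = 'day', y = 'tip_pct',
               hue = 'smoker', split = True, palette = 'Dark2', ax = ax)
```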

In [None]:
sns.boxplot(data = tips, y = 'sex', x = 'tip', orient = 'h')

In [None]:
tips['tip_pct'] = tips['tip'] / tips['total_bill']

In [None]:
sns.boxplot(data = tips, x = 'sex', y = 'tip_pct')

In [None]:
sns.violinplot(data = tips, x = 'sex', y = 'tip_pct',  hue = 'sex', palette = 'Dark2')

## stripplot 
The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.


``sns.stripplot(x="<cat>", y="<num>", data= , jitter=<bool>)``

``sns.stripplot(x="<cat>", y="<num>", data= ,jitter=<bool>,hue='<cat>',palette=' ',dodge=<bool>)``
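The text above notes that a strip plot complements a box plot. A minimal sketch of that combination draws both on the same axes, so the box summarizes each group while the strip shows every observation. The `bills` frame is synthetic (an assumption, so the cell runs offline); `tips` with the same column names would work the same way.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the tips data
rng = np.random.default_rng(2)
bills = pd.DataFrame({
    'day': np.repeat(['Thur', 'Fri', 'Sat'], 30),
    'total_bill': rng.gamma(shape = 4.0, scale = 5.0, size = 90),
})

fig, ax = plt.subplots()
# Summary of each group: quartiles and whiskers
sns.boxplot(data = bills, x = 'day', y = 'total_bill', color = 'lightgray', ax = ax)
# Individual observations drawn on top, jittered horizontally by default
sns.stripplot(data = bills, x = 'day', y = 'total_bill', color = 'black', size = 3, ax = ax)
```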

In [None]:
sns.stripplot(data = tips, x = 'day', y = 'total_bill', hue = 'tip_pct')

In [None]:
sns.stripplot(data = tips, x = 'day', y = 'total_bill', jitter = False)

In [None]:
sns.stripplot(data = flights, x = 'month', y = 'passengers')

# Categorical Data Plots

Now let's discuss using seaborn to plot categorical data! 

* barplot
* countplot

``sns.barplot(x='<cat>',y='<num>',data= , estimator= <default is mean>)``

``sns.countplot(x='<cat>',data= )``
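To make the `estimator` argument concrete, the sketch below plots the median tip per group instead of the default mean and computes the same summaries with pandas for comparison. The numbers are made up (an assumption) so that the mean and median clearly differ for one group.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up data: one large tip pulls the 'Female' mean above its median
small_tips = pd.DataFrame({
    'sex': ['Female', 'Female', 'Female', 'Male', 'Male', 'Male'],
    'tip': [1.0, 2.0, 9.0, 2.0, 3.0, 4.0],
})

fig, ax = plt.subplots()
# Bar height is the median of each group rather than the mean
sns.barplot(data = small_tips, x = 'sex', y = 'tip',
            estimator = np.median, order = ['Female', 'Male'], ax = ax)

# The same summaries computed directly with pandas
medians = small_tips.groupby('sex')['tip'].median()
means = small_tips.groupby('sex')['tip'].mean()
print(medians['Female'], means['Female'])  # 2.0 4.0
```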

In [None]:
sns.barplot(data = tips, x = 'sex', y = 'tip')

In [None]:
sns.countplot(data = tips, x = 'sex', hue = 'sex')

# Plots for "Matrix" Data

Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data.

`sns.heatmap(matrix)`

`sns.heatmap(matrix, cmap=' ', annot=<bool>)`

`sns.clustermap(matrix)`
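`heatmap` expects data that is already in matrix form, so long-form data such as `flights` (one row per year and month) usually needs to be pivoted first. The sketch below shows this on a tiny frame with the same column names; the values are illustrative only and the frame is an assumption, not the real dataset.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Illustrative long-form data shaped like flights: year, month, passengers
long_df = pd.DataFrame({
    'year': [1949, 1949, 1949, 1950, 1950, 1950],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'passengers': [112, 118, 132, 115, 126, 141],
})

# Reshape to a month-by-year matrix
matrix = long_df.pivot(index = 'month', columns = 'year', values = 'passengers')
# pivot sorts plain strings alphabetically, so restore calendar order
matrix = matrix.loc[['Jan', 'Feb', 'Mar']]

fig, ax = plt.subplots()
# annot writes the value in each cell; fmt = 'd' formats it as an integer
sns.heatmap(matrix, cmap = 'cubehelix', annot = True, fmt = 'd', ax = ax)
```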

In [None]:
from scipy.spatial.distance import pdist, squareform

In [None]:
iris_dists = squareform(pdist(iris.loc[:, iris.columns != 'species']))

In [None]:
sns.heatmap(iris_dists, cmap = 'cubehelix')

Seaborn colormaps: [colormaps](https://seaborn.pydata.org/tutorial/color_palettes.html)

# Grids

Grids are general types of plots that allow you to map plot types to the rows and columns of a grid. This helps you create similar plots separated by features.

In [None]:
# Map to upper,lower, and diagonal
g = sns.PairGrid(iris, hue = 'species')
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)

## pairplot

`pairplot` is a simpler, higher-level interface to `PairGrid`.

In [None]:
sns.pairplot(iris)

## Facet Grid

`FacetGrid` is the general way to create a grid of plots, with one panel for each value of a feature.

In [None]:
g = sns.FacetGrid(tips, col = "time",  row = "smoker")
g = g.map(plt.hist, "total_bill")

In [None]:
g = sns.FacetGrid(tips, col = "time",  row = "smoker", hue = 'sex')
# Notice how the column names come after the plotting function (plt.scatter) in the call to map
g = g.map(plt.scatter, "total_bill", "tip").add_legend()

# Regression Plots

Seaborn has many built-in capabilities for regression plots.

**lmplot** allows you to display linear models, but it also conveniently allows you to split up those plots based off of features, as well as coloring the hue based off of features.

Let's explore how this works:

`sns.lmplot(x='<num>',y='<num>', data= )`

`sns.lmplot(x='<num>',y='<num>',data= , hue='<cat>', palette=" ")`



In [None]:
sns.lmplot(data = tips, x = 'total_bill', y = 'tip')

## Using a Grid

`lmplot` can easily create facets. Just indicate this with the col or row arguments:

In [None]:
sns.lmplot(x = "total_bill", y = "tip", row = "sex", col = "time", data = tips)

# Aspect and Size

Seaborn figures can have their size and aspect ratio adjusted with the **height** (inches) and **aspect** (ratio of width to height) parameters. In seaborn versions before 0.9, **height** was called **size**.  

`sns.lmplot(x = ' ', y = ' ', data = , col = ' ', hue = ' ', palette = ' ', aspect = , height = )`

**height** specifies the height of each individual facet, so each facet is `height * aspect` inches wide.
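As a rough sketch of how **height** and **aspect** determine the figure size: with two column facets, `height = 3`, and `aspect = 1.5`, each facet is 3 inches tall and 4.5 inches wide, so the whole figure should be about 9 by 3 inches. The `demo` frame is synthetic (an assumption), and `g.figure` assumes a reasonably recent seaborn release.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    'total_bill': rng.uniform(5, 50, 60),
    'time': np.repeat(['Lunch', 'Dinner'], 30),
})
demo['tip'] = 0.15 * demo['total_bill'] + rng.normal(0, 1, 60)

# Two facets (one per value of 'time'), each 3 in tall and 3 * 1.5 = 4.5 in wide
g = sns.lmplot(data = demo, x = 'total_bill', y = 'tip', col = 'time',
               height = 3, aspect = 1.5, ci = None)

fig_width, fig_height = g.figure.get_size_inches()
print(fig_width, fig_height)
```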

In [None]:
sns.lmplot(data = iris, x = 'petal_length', y = 'sepal_length', col = 'species', 
           hue = 'species', palette = 'Dark2')

# Style and Context

`sns.set_style('whitegrid')` can take the following styles:  darkgrid, whitegrid, dark, white, or ticks.  Advanced users can customize further.

`sns.set_context('notebook', font_scale = 1)` can take the following contexts: notebook (default), paper, talk, poster.  The font scale can also be adjusted.  Advanced users can customize further.

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (12, 6))

sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species', ax = axes[0])
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'sepal_length', ax = axes[1])

In [None]:
sns.histplot(data = iris, x = 'petal_length', hue = 'species')