# Seaborn

https://seaborn.pydata.org/

### Installation

conda install seaborn
pip install seaborn

In [None]:
%pip install seaborn

### Import Seaborn library

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv(r"C:\Users\Purushotham\Desktop\deloitte\visualization\datasets\tips.csv")
df.head()

### Categorical Plots

### Bar Plots

In [None]:
sns.barplot(x='gender', y='total_bill', data=df)

You can set the estimator to any function that reduces a vector to a scalar, for example the standard deviation:

In [None]:
sns.barplot(x='gender', y='total_bill', data=df, estimator=np.std)
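
As a minimal sketch of a hand-written estimator, the cell below passes a hypothetical `value_range` function (maximum minus minimum) to barplot. The small `toy` frame is an invented stand-in for the tips data, not the real file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Invented stand-in for the tips data (assumption, not the real CSV)
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "total_bill": [10.0, 20.0, 30.0, 12.0, 14.0, 16.0],
})

def value_range(values):
    """Hypothetical estimator: reduce a vector to one scalar (max minus min)."""
    return np.max(values) - np.min(values)

# Each bar height is value_range applied to one gender's bills
ax = sns.barplot(x="gender", y="total_bill", data=toy, estimator=value_range)
bar_heights = [patch.get_height() for patch in ax.patches]
```

Any callable with this vector-in, scalar-out shape should work the same way, including NumPy reductions such as np.median.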

### Count Plot

This is essentially a bar plot whose estimator counts the number of occurrences in each category, which is why we pass only the x value:

In [None]:
sns.countplot(x='gender', data=df)
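
To make the "counting estimator" idea concrete, this sketch compares the bar heights drawn by countplot with pandas' own `value_counts()`. The `toy` frame is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented data: three "Male" rows and two "Female" rows
toy = pd.DataFrame({"gender": ["Male", "Female", "Male", "Male", "Female"]})

# What countplot computes under the hood: one count per category
counts = toy["gender"].value_counts()

# The bars drawn by countplot should have those same heights
ax = sns.countplot(x="gender", data=toy)
count_heights = [int(round(patch.get_height())) for patch in ax.patches]
```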

### Box Plot

Box plots and violin plots show the distribution of a quantitative variable across the levels of a categorical variable. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

In [None]:
sns.boxplot(x='gender', y='total_bill', data=df)
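
The outlier rule mentioned above can be sketched with NumPy alone. This assumes the common 1.5 × IQR convention, which matches the default `whis=1.5` of sns.boxplot; the `bills` values are invented.

```python
import numpy as np

# Invented bills with one obviously extreme value
bills = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# Quartiles define the box; their difference is the inter-quartile range
q1, q3 = np.percentile(bills, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5 and only the value 30 is flagged.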

In [None]:
sns.boxplot(x='day', y='total_bill', data=df)

In [None]:
sns.boxplot(data=df,palette='rainbow',orient='h')

In [None]:
sns.boxplot(x="day", y="total_bill", hue="gender", data=df, palette="coolwarm")

In [None]:
df.head()

### Violin Plot

In [None]:
sns.violinplot(x="day", y="total_bill", data=df,hue='gender',palette='Set1')

In [None]:
sns.violinplot(x="day", y="total_bill", data=df,hue='gender',split=True,palette='Set1')

### Strip Plot

The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

In [None]:
sns.stripplot(x="day", y="total_bill", data=df)

In [None]:
sns.stripplot(x="day", y="total_bill", data=df, jitter=True, hue='gender', palette='Set1')  # try dodge=True (called split in older seaborn releases)

### Swarm Plot

The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).

In [None]:
sns.swarmplot(x="day", y="total_bill", data=df)

In [None]:
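# dodge=True separates the hue levels (this parameter was called split in older seaborn releases)
sns.swarmplot(x="day", y="total_bill", hue='gender', data=df, palette="Set1", dodge=True)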

### Combining Plots

In [None]:
sns.violinplot(x="tip", y="day", data=df,palette='rainbow')


In [None]:
sns.swarmplot(x="tip", y="day", data=df,color='black',size=3)

In [None]:
sns.violinplot(x="tip", y="day", data=df,palette='rainbow')
sns.swarmplot(x="tip", y="day", data=df,color='black',size=3)

### Distribution Plots

### distplot

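The distplot shows the distribution of a univariate set of observations. Note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot, so recent releases emit a warning when it is called.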

In [None]:
sns.distplot(df['total_bill'])

In [None]:
sns.distplot(df['total_bill'], kde=False)

### jointplot

jointplot() pairs a bivariate plot of two variables with the univariate distribution of each one on the margins. The kind parameter selects how the two variables are compared:

“scatter”
“reg”
“resid”
“kde”
“hex”

In [None]:
sns.jointplot(x='total_bill',y='tip',data=df,kind='scatter')

### pairplot

pairplot will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns).

In [None]:
sns.pairplot(df)

In [None]:
sns.pairplot(df,hue='gender',palette='coolwarm')

### rugplot

Rug plots are a very simple concept: they draw a dash mark for every point in a univariate distribution. They are the building block of a KDE plot:

In [None]:
sns.rugplot(df['total_bill'])

### kdeplot

kdeplots are Kernel Density Estimation plots. A KDE plot replaces every single observation with a Gaussian (normal) distribution centered on that value and sums the results into one smooth density curve. For background on choosing the bandwidth, see:

https://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth

In [None]:
sns.kdeplot(df['total_bill'])

In [None]:
sns.kdeplot(df['total_bill'])
sns.rugplot(df['total_bill'])

In [None]:
# Don't worry about understanding this code!
# It's just for the diagram below
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

#Create dataset
dataset = np.random.randn(25)

# Create another rugplot
sns.rugplot(dataset);

# Set up the x-axis for the plot
x_min = dataset.min() - 2
x_max = dataset.max() + 2

# 100 equally spaced points from x_min to x_max
x_axis = np.linspace(x_min,x_max,100)

# Set up the bandwidth, for info on this:
url = 'http://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth'

bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2


# Create an empty kernel list
kernel_list = []

# Plot each basis function
for data_point in dataset:
    
    # Create a kernel for each point and append to list
    kernel = stats.norm(data_point,bandwidth).pdf(x_axis)
    kernel_list.append(kernel)
    
    #Scale for plotting
    kernel = kernel / kernel.max()
    kernel = kernel * .4
    plt.plot(x_axis,kernel,color = 'grey',alpha=0.5)

plt.ylim(0,1)
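
The cell above draws one Gaussian per observation but stops before combining them. As a minimal sketch of the final step, the block below sums the kernels and divides by the number of points, using plain NumPy and the same rule-of-thumb bandwidth. The helper `gaussian_pdf` is defined here for illustration, and the data is randomly generated.

```python
import numpy as np

# Randomly generated sample, seeded so the result is reproducible
rng = np.random.default_rng(42)
dataset = rng.standard_normal(25)

# Same rule-of-thumb bandwidth formula as in the cell above
bandwidth = ((4 * dataset.std() ** 5) / (3 * len(dataset))) ** 0.2

# Evaluation grid, padded so the kernel tails are fully covered
x_axis = np.linspace(dataset.min() - 5 * bandwidth,
                     dataset.max() + 5 * bandwidth, 500)

def gaussian_pdf(x, mean, std):
    """Normal density, written out to avoid a SciPy dependency."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# One kernel per observation, then average them to get the density estimate
kernels = np.array([gaussian_pdf(x_axis, point, bandwidth) for point in dataset])
density = kernels.sum(axis=0) / len(dataset)

# A valid density integrates to about 1 (simple Riemann sum)
step = x_axis[1] - x_axis[0]
area = density.sum() * step
```

Plotting `density` against `x_axis` gives a smooth curve of the same kind that sns.kdeplot draws, although seaborn chooses its own bandwidth.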

### Grid Plot

In [None]:
df = pd.read_csv(r"C:\Users\Purushotham\Desktop\deloitte\visualization\datasets\iris.csv")
df.head() 

In [None]:
# Just the Grid
sns.PairGrid(df)

In [None]:
# Then you map to the grid
g = sns.PairGrid(df)
g.map(plt.scatter)

In [None]:
# Map to upper,lower, and diagonal
g = sns.PairGrid(df)
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)

### pairplot

pairplot is a simpler version of PairGrid, and the one you'll use most often.

In [None]:
df.head()


In [None]:
#sns.pairplot(df)
sns.pairplot(df,hue='Species',palette='rainbow')

In [None]:
df = pd.read_csv(r"C:\Users\Purushotham\Desktop\deloitte\visualization\datasets\tips.csv")
df.head()

### FacetGrid

FacetGrid is the general way to create a grid of plots, with one panel for each level of a feature:

In [None]:
g = sns.FacetGrid(df, col="time",  row="smoker")
g = g.map(plt.hist, "total_bill")

### JointGrid

JointGrid is the general version of the jointplot() type of grid. A quick example:

In [None]:
g = sns.JointGrid(x="total_bill", y="tip", data=df)
# sns.distplot is deprecated, so sns.histplot draws the marginal distributions here
g = g.plot(sns.regplot, sns.histplot)

### Matrix Plots

Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data (later in the machine learning section we will learn how to formally cluster data).

### Heat Maps

In [None]:
df.head()

For a heatmap to work properly, your data should already be in matrix form; the sns.heatmap function basically just colors it in for you. For example:

In [None]:
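# numeric_only=True skips the text columns; pandas 2.0+ raises an error on them otherwise
df.corr(numeric_only=True)

In [None]:
sns.heatmap(df.corr(numeric_only=True))

In [None]:
sns.heatmap(df.corr(numeric_only=True), cmap='coolwarm', annot=True)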
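
To show what "matrix form" means here, this sketch builds a correlation matrix from an invented frame and passes it to heatmap. The numeric columns are selected with `select_dtypes`, one way to drop text columns that should also work on older pandas versions; fixing `vmin` and `vmax` keeps the color scale centered on zero.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented data: tip rises with the bill, party_size falls with it
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "party_size": [4, 3, 2, 1],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Square matrix: one row and one column per numeric variable
corr = toy.select_dtypes("number").corr()

# heatmap colors each cell; annot=True prints the value inside it
ax = sns.heatmap(corr, cmap="coolwarm", annot=True, vmin=-1, vmax=1)
```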

In [None]:
df = pd.read_csv(r"C:\Users\Purushotham\Desktop\deloitte\visualization\datasets\flights.csv")
df.head()

In [None]:
df.shape

In [None]:
# Matrix form for flights data
df.pivot_table(values='passengers',index='month',columns='year') 

In [None]:
sns.heatmap(df.pivot_table(values='passengers',index='month',columns='year') )
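
The pivot step is what turns long data (one row per observation) into the matrix that heatmap expects. A minimal sketch on invented passenger numbers, using the same column names as the flights data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented long-format data: one row per (year, month) pair
flights_long = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["January", "February", "January", "February"],
    "passengers": [100, 110, 120, 130],
})

# Wide format: months become rows, years become columns
matrix = flights_long.pivot_table(values="passengers", index="month", columns="year")

ax = sns.heatmap(matrix)
```

Note that pivot_table sorts a plain string index alphabetically, so the months will not appear in calendar order unless the column is an ordered categorical.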

### Regression Plot

Seaborn has many built-in capabilities for regression plots. However, we won't really discuss regression until the machine learning section of the course, so we will only cover the lmplot() function for now.

lmplot displays linear models, and it also conveniently lets you split those plots up by feature and color them by feature through the hue argument.

In [None]:
df = pd.read_csv(r"C:\Users\Purushotham\Desktop\deloitte\visualization\datasets\tips.csv")
df.head()

In [None]:
sns.lmplot(x='total_bill',y='tip',data=df)
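
By default, the straight line lmplot draws is an ordinary least-squares fit. As a rough sketch of that computation outside seaborn, `np.polyfit` with degree 1 recovers the slope and intercept. The data is invented so that the true line is known in advance.

```python
import numpy as np

# Invented data lying exactly on the line tip = 0.1 * total_bill + 1
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = 0.1 * total_bill + 1.0

# A degree-1 polynomial fit is a least-squares straight line
slope, intercept = np.polyfit(total_bill, tip, deg=1)
```

On real data the points scatter around the line, and lmplot additionally shades a confidence band around the fit.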

In [None]:
sns.lmplot(x='total_bill',y='tip',data=df,hue='gender')

In [None]:
# Using Markers
# http://matplotlib.org/api/markers_api.html
sns.lmplot(x='total_bill',y='tip',data=df,hue='gender',palette='coolwarm',
           markers=['o','v'],scatter_kws={'s':100})

### Using grids for lmplot

We can add more variable separation through columns and rows with the use of a grid. Just indicate this with the col or row arguments:

In [None]:
sns.lmplot(x='total_bill',y='tip',data=df,col='gender')

In [None]:
sns.lmplot(x="total_bill", y="tip", row="gender", col="time",data=df)

In [None]:
sns.lmplot(x='total_bill',y='tip',data=df,col='day',hue='gender',palette='coolwarm')

In [None]:
# aspect and height
sns.lmplot(x='total_bill',y='tip',data=df,col='day',hue='gender',palette='coolwarm',
          aspect=0.6,height=8)