# Advanced Plotting in Seaborn and How to Use It in EDA
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo23_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
iris = sns.load_dataset('iris')

### Distributions of numerical data

In [None]:
# Histograms for distributions
sns.histplot(data = iris, x = 'petal_width')
plt.show()

In [None]:
# Histograms accept a hue argument
sns.histplot(data = iris, x = 'petal_width', hue = 'species')
plt.show()

In [None]:
# Kernel Density Estimate (KDE) plot for smooth distributions
sns...(data = iris, x = 'petal_width')
plt.show()

In [None]:
# Shade your KDE plot
sns.kdeplot(data = iris, x = 'petal_width',...)
plt.show()

In [None]:
# More options with colors
sns.kdeplot(data = iris, x = 'petal_width', fill = True, hue = 'species')
plt.show()

In [None]:
# Can plot binned histograms and smooth distributions together
sns.histplot(data = iris, x = 'petal_width',..., hue = 'species')
plt.show()

In [None]:
# Easily transition between different types of distributions using displot
sns...(data = iris, x = 'petal_width', hue = 'species',...)
plt.show()

In [None]:
# displot with KDE
sns.displot(data = iris, x = 'petal_width', hue = 'species', kind=..., fill=True)
plt.show()

In [None]:
# displot makes faceting easier! Use the col argument to map a variable to separate columns of subplots
sns.displot(data = iris, x = 'petal_width', hue = 'species', kind='kde', fill=True, ...)
plt.show()

### Numerical data

In [None]:
# plot numerical data with regression plots
sns...(data = iris, x = 'petal_length',y='petal_width')
plt.show()

In [None]:
# plot data associations AND distributions in same plot
sns...(data = iris, x = 'petal_length',y='petal_width')
plt.show()

In [None]:
# jointplot has options such as hex (useful when many points overlap in the original scatter)
sns.jointplot(data = iris, x = 'petal_length',y='petal_width', ...)
plt.show()

### Categorical data

In [None]:
# swarm plot
plt.figure(figsize=(5,5))
sns...
plt.show()

In [None]:
# swarm plot (adjust point size)
plt.figure(figsize=(5,5))
sns.swarmplot(data = iris, x = 'species', y = 'petal_width',...)
plt.show()

In [None]:
# Violin plot
plt.figure(figsize=(5,5))
sns...(data = iris, x = 'species', y = 'petal_width')
plt.show()

In [None]:
# Point plot (best when there is some order to the categories)
plt.figure(figsize=(5,5))
sns...(data = iris, x = 'species', y = 'petal_width')
plt.show()

In [None]:
# Categorical plots (catplot) allow you to seamlessly transition between different types of categorical plots
sns...(data = iris, x = 'species', y = None, ...)
plt.show()

In [None]:
# another version of catplot
sns.catplot(data = iris, x = 'species',y = 'petal_width',kind = ...)
plt.show()

In [None]:
# Another version of catplot
sns.catplot(data = iris, x = 'species',y = 'petal_width',kind = ...)
plt.show()

In [None]:
# Cat plot takes the same arguments as the original type of plot
sns.catplot(data = iris, x = 'species',y = 'petal_width',...)
plt.show()

In [None]:
# Violin with catplot
sns.catplot(data = iris, x = 'species',y = 'petal_width',...)
plt.show()

In [None]:
# Point plot with catplot
sns.catplot(data = iris, x = 'species',y = 'petal_width',...)
plt.show()

### Subplots with Seaborn

In [None]:
# sometimes we want to visualize the relationship between all variables and 1 specific variable
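One way to get a row of subplots that each compare a single variable against one chosen variable is `sns.pairplot` with its `x_vars` and `y_vars` arguments. The sketch below is only an illustration: the `toy` DataFrame and its column names are invented stand-ins, and the same call should work on `iris` with its own columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: two predictors and one variable of interest
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'a': rng.normal(size=30),
    'b': rng.normal(size=30),
})
toy['target'] = 2 * toy['a'] + rng.normal(scale=0.5, size=30)

# One row (target on the y-axis), one column per variable in x_vars
g = sns.pairplot(data=toy, x_vars=['a', 'b'], y_vars=['target'])
plt.show()
```

The returned object is a `PairGrid`; its `axes` attribute holds the individual subplots if you want to adjust titles or labels afterwards.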


## Group Activity: Using plotting methods in an EDA: Titanic Data

In [None]:
# set defined plot sizes and styles
plt.rcParams['figure.figsize'] = [6,3] # figures will be 6 inches wide and 3 inches tall
plt.rcParams['figure.dpi'] = 80 # default is 72 in webpages; we want a higher resolution

In [None]:
# titanic data set is part of Seaborn
titanic = sns.load_dataset('titanic')

In [None]:
# look at the first few lines
titanic.head()

In [None]:
# describe() summarizes only the numerical columns by default; for example, sex, class, embark_town, etc. are not included
titanic.describe()
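If you also want a summary of the non-numerical columns, pandas lets you pass `include` to `describe`. A minimal sketch on an invented three-row DataFrame (`toy` is a stand-in, not the titanic data):

```python
import pandas as pd

# Hypothetical stand-in with one numerical and one text column
toy = pd.DataFrame({
    'age': [22.0, 38.0, 26.0],
    'sex': ['male', 'female', 'female'],
})

# Default: numerical columns only
numeric_summary = toy.describe()

# include='all' adds count/unique/top/freq rows for non-numerical columns
full_summary = toy.describe(include='all')
print(full_summary)
```

Rows that do not apply to a column (for example `mean` for `sex`) show up as `NaN` in the combined summary.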

In [None]:
titanic.info()

### Task: 
Use a heatmap to visualize the null values in the dataset. 

In [None]:
plt.style.use('ggplot')
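One possible approach to the task above is to pass the True/False frame returned by `isnull()` to `sns.heatmap`. This is a sketch on a small invented DataFrame (`toy`, with made-up values), not the expected answer; swapping in `titanic` should follow the same pattern.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the titanic data (values invented for illustration)
toy = pd.DataFrame({
    'age':  [22.0, np.nan, 26.0, 35.0, np.nan],
    'fare': [7.25, 71.28, 7.92, 53.10, 8.05],
    'deck': [np.nan, 'C', np.nan, 'C', np.nan],
})

# isnull() returns a frame of the same shape holding True where a value is missing
null_mask = toy.isnull()

# One cell per value: rows are observations, columns are variables
ax = sns.heatmap(null_mask, cbar=False, yticklabels=False)
ax.set_title('Missing values by column')
plt.show()
```

To check the picture against numbers, `null_mask.sum()` gives the count of missing values in each column.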

### Discussion Question: 
What do you notice about the heatmap? What information does it show? What information does it not show?

### Task
Use a heatmap to visualize the correlation in the data. 
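A correlation heatmap can be sketched by computing `DataFrame.corr` and handing the result to `sns.heatmap`. The example below uses an invented `toy` DataFrame and assumes pandas 1.5 or newer, where `corr` accepts `numeric_only=True` to skip text columns; the color map and limits are just one reasonable choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the titanic data (values invented for illustration)
toy = pd.DataFrame({
    'survived': [0, 1, 1, 1, 0, 0],
    'pclass':   [3, 1, 3, 1, 3, 2],
    'fare':     [7.25, 71.28, 7.92, 53.10, 8.05, 13.00],
    'sex':      ['male', 'female', 'female', 'female', 'male', 'male'],
})

# Pairwise correlations between numerical columns only
corr = toy.corr(numeric_only=True)

# Fix the color scale to [-1, 1] so 0 sits in the middle of a diverging palette
ax = sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.show()
```

Reading along the `survived` row (or column) shows how each numerical variable correlates with survival.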

### Discussion question
Looking at the `survived` column, what do you notice? Is there anything else you notice about the heatmap overall?

### Task
Create subplots to show the count of each category for the variables `survived`, `pclass`, `sex`, `sibsp`, `parch`, `embark_town`, and `alone`. 
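One way to lay out several count plots is to create a grid with `plt.subplots` and pass each axis to `sns.countplot` through its `ax` argument. The sketch below uses an invented `toy` DataFrame with only three of the variables; it is an illustration of the pattern, not the expected answer.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the titanic data (values invented for illustration)
toy = pd.DataFrame({
    'survived': [0, 1, 1, 1, 0, 0],
    'pclass':   [3, 1, 3, 1, 3, 2],
    'sex':      ['male', 'female', 'female', 'female', 'male', 'male'],
})

cols = ['survived', 'pclass', 'sex']

# One subplot per variable, all in a single row
fig, axes = plt.subplots(1, len(cols), figsize=(9, 3))
for ax, col in zip(axes, cols):
    sns.countplot(data=toy, x=col, ax=ax)  # bar height = number of rows per category
    ax.set_title(col)

fig.tight_layout()
plt.show()
```

For the seven variables in the task, a two-row grid such as `plt.subplots(2, 4)` with `axes.flatten()` in the loop is one option; the unused eighth axis can be hidden with `ax.set_visible(False)`.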

### Discussion Question
What observations can you make from the plots you created?