# EDA and Intro to Seaborn
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo20_live.ipynb) instead. 
If you would rather follow along by just executing the cells, stay in this notebook.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings 
warnings.filterwarnings('ignore') 

## Seaborn basics

In [None]:
# Whenever we want to use seaborn for visualization
import seaborn as sns
sns.set_style("darkgrid")

In [None]:
iris = sns.load_dataset('iris')
iris

In [None]:
# To make a scatter plot
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width'); 

In [None]:
# Customizing your scatter plot: change the color of all the points to another color
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', color = 'r'); 

In [None]:
# Customizing your scatter plot: change the color of points to map to another variable (like an aesthetic map)
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species'); 

In [None]:
# Customizing your scatter plot: change the shape of points to map to another variable (like an aesthetic map)
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', style = 'species'); 

In [None]:
# adjusting axis scales
# sns.scatterplot returns a matplotlib Axes, so we name it ax and use Axes methods on it
ax = sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species')
ax.set_xlim(0,8);

In [None]:
# adjusting color scales
sns.scatterplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species', palette = 'colorblind');

In [None]:
# Faceting
# Create a facet grid (one panel per species)
g = sns.FacetGrid(iris, col="species", hue = 'species')
# map your scatter plots to the grid
g.map(sns.scatterplot, "petal_length", "petal_width");

In [None]:
# line plots (NOTE: THIS IS NOT A CASE WHERE YOU SHOULD USE A LINE PLOT)
sns.lineplot(data = iris, x = 'petal_length', y = 'petal_width', hue = 'species'); 

In [None]:
# barplots to look at averages of a numeric variable vs a categorical variable
sns.barplot(data = iris, x = 'species', y='petal_length', errorbar = None); 
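
By default, `sns.barplot` draws the mean of `y` within each `x` category, so the bar heights can be checked numerically with a pandas `groupby`. A minimal sketch, assuming a small made-up table (the name `toy` and its values are illustrative, not the iris data):

```python
import pandas as pd

# Hypothetical toy data standing in for iris (values are made up)
toy = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "petal_length": [1.0, 2.0, 4.0, 5.0, 6.0],
})

# Mean of the numeric variable within each category;
# these are the heights a default barplot would draw
group_means = toy.groupby("species")["petal_length"].mean()
print(group_means)
```

Swapping `toy` for `iris` should give the heights of the three bars in the plot above.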

In [None]:
# barplots to just look at counts
sns.countplot(data = iris, x = 'species'); 

In [None]:
# Histograms
sns.histplot(data = iris, x = 'petal_length');

## Visualizations for EDA

In [None]:
# Pairplots show the relationship between numeric variables
sns.pairplot(data = iris);

In [None]:
# Can map other variables to aesthetic properties
sns.pairplot(data = iris, hue='species');

In [None]:
# boxplots can be used to look at the distribution of data in different categories
sns.boxplot(data = iris);

In [None]:
# boxplots can also be used with a categorical x and a numerical y
sns.boxplot(data = iris, x = 'species',y='sepal_width');
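
It can help to compute the numbers behind a boxplot by hand. With the default settings, the box spans the first to third quartile, the line inside is the median, and points more than 1.5 times the interquartile range beyond the box are drawn as outliers. A rough sketch of that calculation, assuming a made-up series (`values` is hypothetical, not a column of iris):

```python
import pandas as pd

# Hypothetical values with one obvious outlier (made up for illustration)
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# Quartiles: the edges of the box and the line inside it
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])

# Interquartile range and the 1.5 * IQR fences used to flag outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences would be drawn as individual markers
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, median, q3, list(outliers))
```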

In [None]:
# heatmaps can be used to study correlations
corrmat = iris.corr(numeric_only = True) # make correlation matrix (numeric columns only; species is text)
sns.heatmap(data = corrmat, annot = True);
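
Pearson correlation is only defined for numeric columns, so a text column such as `species` has to be left out of the matrix. In recent pandas versions, `DataFrame.corr` raises an error on text columns unless `numeric_only=True` is passed. A minimal sketch on a made-up table (`toy` and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls with x, label is text
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
    "label": ["a", "b", "a", "b"],
})

# numeric_only=True skips the text column instead of failing on it
corr = toy.corr(numeric_only=True)
print(corr)
```

Values near 1 mean two variables rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1. This is the table a heatmap colors in.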

## Practice: Exploratory Data Analysis on Covid Data

In [None]:
# import covid data
covid = pd.read_csv('https://raw.githubusercontent.com/PacktPublishing/Python-Data-Cleaning-Cookbook/master/Chapter05/data/covidtotals.csv')
covid.head()

In [None]:
# begin exploring the dataset
covid_desc = covid.describe()
covid_desc

In [None]:
covid.info()

In [None]:
# Check for na values
covid.isna()

In [None]:
# Summarize number of na's by column
nas = covid.isna().sum()
nas

In [None]:
# drop columns with na values
covid = covid.dropna(axis=1)
covid
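
Dropping every column that contains a missing value is a blunt tool: a single NaN removes the whole column. The sketch below contrasts `axis=1` (drop columns) with `axis=0` (drop rows) on a made-up table. The name `toy` and its values are hypothetical, not the covid data:

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in the cases column
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "cases": [10.0, np.nan, 30.0],
    "deaths": [1.0, 2.0, 3.0],
})

# axis=1 drops any column that has at least one NaN (cases is lost)
dropped_cols = toy.dropna(axis=1)

# axis=0 drops any row that has at least one NaN (country B is lost)
dropped_rows = toy.dropna(axis=0)

print(dropped_cols.shape, dropped_rows.shape)
```

Which one is appropriate depends on whether the incomplete columns or the incomplete rows matter more for the question being asked.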

In [None]:
# check if there are any duplicate rows
covid.duplicated().sum()
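
`duplicated()` marks a row as `True` when an identical row appeared earlier in the table, so the sum is the number of repeated rows. If that count were nonzero, `drop_duplicates()` would remove the repeats. A small sketch on made-up data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical table where the third row repeats the first
toy = pd.DataFrame({
    "location": ["A", "B", "A", "C"],
    "total_cases": [10, 20, 10, 30],
})

# Count rows that are exact copies of an earlier row
n_dupes = toy.duplicated().sum()

# Keep only the first occurrence of each row
deduped = toy.drop_duplicates()

print(n_dupes, len(deduped))
```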

In [None]:
# begin visualizing for insights
sns.pairplot(data = covid);

In [None]:
# boxplots to look at distributions
plt.figure(figsize=(12,4))
sns.boxplot(data = covid, x = 'region', y = 'total_deaths_pm')
plt.xticks(rotation=45)
plt.show()

In [None]:
# heatmaps to look at correlations
corrmatc = covid.corr(numeric_only = True) # make correlation matrix (numeric columns only)
sns.heatmap(data = corrmatc, annot = True);

## Activity

1. What did the initial visualizations above reveal that was surprising or interesting? Create additional Seaborn visualizations to explore this further.