# 🌊 Seaborn Mastery: Statistical Data Visualization

<img src='https://seaborn.pydata.org/_static/logo-wide-lightbg.svg' width='400' alt='Seaborn Logo'>

## 🎨 Beautiful Statistical Graphics Made Easy

**Seaborn** is built on Matplotlib but makes creating beautiful statistical visualizations incredibly simple. Think of it as Matplotlib with superpowers!

### 🎯 Why Seaborn is Amazing:
- **Statistical Focus** - Built-in statistical computations
- **Beautiful by Default** - Gorgeous plots with minimal code
- **Dataset-Oriented** - Works seamlessly with DataFrames
- **Smart Defaults** - Automatic color palettes and styles
- **Complex Plots Made Simple** - Pair plots, heatmaps, distributions

### 📊 What We'll Master Today:
1. **Distribution Plots** - Histograms, KDE, ECDF, rug plots
2. **Categorical Plots** - Box, violin, swarm, strip, bar plots
3. **Relationship Plots** - Scatter, line, regression plots
4. **Matrix Plots** - Heatmaps, cluster maps
5. **Multi-plot Grids** - FacetGrid, PairGrid
6. **Styling & Themes** - Color palettes, contexts
7. **Statistical Estimation** - Confidence intervals, error bars
8. **Real-World Projects** - Complete analyses

---

## 🚀 Let's Create Statistical Art!

In [None]:
# Import essential libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Set Seaborn style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

# Check versions
import matplotlib as mpl
print(f"🌊 Seaborn Version: {sns.__version__}")
print(f"📊 Matplotlib Version: {mpl.__version__}")

# Load built-in datasets
print("\n📦 Loading sample datasets...")
tips = sns.load_dataset('tips')
iris = sns.load_dataset('iris')
titanic = sns.load_dataset('titanic')
flights = sns.load_dataset('flights')

print("✅ Ready to create beautiful statistical visualizations!")

---

## 📌 Section 1: Understanding Seaborn

### 🔍 First Look at Seaborn

In [None]:
# 1.1 Exploring the Datasets
print("📊 Sample Datasets Overview\n" + "="*40)

print("Tips Dataset (Restaurant tips):")
print(tips.head())
print(f"\nShape: {tips.shape}")
print(f"Columns: {tips.columns.tolist()}")

print("\n" + "="*40)
print("\nIris Dataset (Flower measurements):")
print(iris.head())
print(f"\nSpecies: {iris['species'].unique()}")

print("\n" + "="*40)
print("\nTitanic Dataset (Passenger survival):")
print(titanic.head())
print(f"\nSurvival rate: {titanic['survived'].mean():.1%}")

In [None]:
# 1.2 Seaborn vs Matplotlib Comparison
print("🎨 Seaborn vs Matplotlib\n" + "="*40)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Matplotlib version
ax1.scatter(tips['total_bill'], tips['tip'])
ax1.set_xlabel('Total Bill')
ax1.set_ylabel('Tip')
ax1.set_title('Matplotlib: Basic Scatter')

# Seaborn version
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='time', size='size', style='sex', ax=ax2)
ax2.set_title('Seaborn: Enhanced Scatter')

plt.tight_layout()
plt.show()

print("Given hue, size, and style columns, Seaborn automatically:")
print("✅ Adds colors for categorical variables")
print("✅ Sizes points by numeric variables")
print("✅ Uses different markers for categories")
print("✅ Creates informative legends")

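Because Seaborn is built on Matplotlib, its axes-level functions (like `sns.scatterplot`) return the Matplotlib `Axes` they drew on, so you can keep customizing with plain Matplotlib calls. This sketch uses a small synthetic DataFrame (made-up values, column names borrowed from `tips`) so it runs without downloading a dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# Small synthetic DataFrame (made-up values) so the cell runs without downloads
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 100),
    "time": rng.choice(["Lunch", "Dinner"], 100),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(6, 4))
returned = sns.scatterplot(data=df, x="total_bill", y="tip", hue="time", ax=ax)

# Axes-level Seaborn functions hand back the Matplotlib Axes they drew on,
# so ordinary Matplotlib calls can finish the plot
returned.set_xlabel("Total bill ($)")
returned.axhline(df["tip"].mean(), color="gray", linestyle="--")
print(isinstance(returned, Axes), returned is ax)
plt.show()
```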
---

## 📌 Section 2: Distribution Plots

### 📊 Visualizing Data Distributions

In [None]:
# 2.1 Histograms and KDE
print("📊 Distribution Plots\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Basic histogram
sns.histplot(data=tips, x='total_bill', ax=axes[0, 0])
axes[0, 0].set_title('Histogram')

# KDE plot
sns.kdeplot(data=tips, x='total_bill', ax=axes[0, 1])
axes[0, 1].set_title('KDE (Kernel Density Estimate)')

# Combined histogram + KDE
sns.histplot(data=tips, x='total_bill', kde=True, ax=axes[0, 2])
axes[0, 2].set_title('Histogram + KDE')

# Multiple distributions
sns.histplot(data=tips, x='total_bill', hue='time', ax=axes[1, 0])
axes[1, 0].set_title('Multiple Distributions')

# Stacked distributions
sns.histplot(data=tips, x='total_bill', hue='day', multiple='stack', ax=axes[1, 1])
axes[1, 1].set_title('Stacked Distributions')

# 2D distribution
sns.histplot(data=tips, x='total_bill', y='tip', ax=axes[1, 2])
axes[1, 2].set_title('2D Distribution')

plt.tight_layout()
plt.show()

In [None]:
# 2.2 Advanced Distribution Plots
print("🎯 Advanced Distribution Analysis\n" + "="*40)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# ECDF (Empirical Cumulative Distribution)
sns.ecdfplot(data=tips, x='total_bill', hue='time', ax=axes[0, 0])
axes[0, 0].set_title('ECDF Plot')

# Rug plot with KDE
sns.kdeplot(data=tips, x='total_bill', ax=axes[0, 1])
sns.rugplot(data=tips, x='total_bill', ax=axes[0, 1], color='red', alpha=0.5)
axes[0, 1].set_title('KDE with Rug Plot')

# Joint distribution
# Note: jointplot creates its own figure, so we'll describe it
axes[1, 0].text(0.5, 0.5, 'Joint Plot Example:\nsns.jointplot(x="total_bill", y="tip", data=tips)',
               ha='center', va='center', fontsize=12)
axes[1, 0].set_title('Joint Distribution')

# Q-Q plot (manual)
from scipy import stats
stats.probplot(tips['total_bill'], dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot (Normal Distribution Check)')

plt.tight_layout()
plt.show()

In [None]:
# 2.3 Comparing Distributions
print("⚖️ Comparing Distributions\n" + "="*40)

# Create figure
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Overlapping KDEs
for species in iris['species'].unique():
    subset = iris[iris['species'] == species]
    sns.kdeplot(data=subset, x='sepal_length', ax=axes[0], label=species)
axes[0].set_title('Overlapping KDE Plots')
axes[0].legend()

# Filled, overlapping KDEs (a true ridge/joy plot would also offset each curve vertically)
sns.kdeplot(data=iris, x='sepal_length', hue='species', ax=axes[1],
            fill=True, alpha=0.5, linewidth=2)
axes[1].set_title('Filled KDE Plots')

# Violin comparison
sns.violinplot(data=iris, x='species', y='sepal_length', ax=axes[2])
axes[2].set_title('Violin Plot Comparison')

plt.tight_layout()
plt.show()

### 🏋️ Exercise 1: Distribution Analysis

Create distribution plots for the Titanic dataset:
1. Compare age distributions between survived and not survived
2. Show fare distribution by passenger class
3. Create a 2D distribution of age vs fare

In [None]:
# Your solution here:

# Solution:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Age by survival
sns.histplot(data=titanic, x='age', hue='survived', kde=True, ax=axes[0])
axes[0].set_title('Age Distribution by Survival')

# Fare by class
sns.violinplot(data=titanic, x='class', y='fare', ax=axes[1])
axes[1].set_title('Fare Distribution by Class')

# 2D distribution
sns.histplot(data=titanic, x='age', y='fare', bins=20, ax=axes[2])
axes[2].set_title('Age vs Fare Distribution')

plt.tight_layout()
plt.show()

---

## 📌 Section 3: Categorical Plots

### 📦 Visualizing Categories

In [None]:
# 3.1 Box and Violin Plots
print("📦 Categorical Distribution Plots\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Box plot
sns.boxplot(data=tips, x='day', y='total_bill', ax=axes[0, 0])
axes[0, 0].set_title('Box Plot')

# Box plot with hue
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex', ax=axes[0, 1])
axes[0, 1].set_title('Box Plot with Hue')

# Violin plot
sns.violinplot(data=tips, x='day', y='total_bill', ax=axes[0, 2])
axes[0, 2].set_title('Violin Plot')

# Split violin
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex', split=True, ax=axes[1, 0])
axes[1, 0].set_title('Split Violin Plot')

# Boxen plot (enhanced box plot)
sns.boxenplot(data=tips, x='day', y='total_bill', ax=axes[1, 1])
axes[1, 1].set_title('Boxen Plot (Letter-value)')

# Violin with inner points
sns.violinplot(data=tips, x='day', y='total_bill', inner='point', ax=axes[1, 2])
axes[1, 2].set_title('Violin with Points')

plt.tight_layout()
plt.show()

In [None]:
# 3.2 Strip and Swarm Plots
print("🔵 Point-based Categorical Plots\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Strip plot
sns.stripplot(data=tips, x='day', y='total_bill', ax=axes[0, 0])
axes[0, 0].set_title('Strip Plot')

# Strip plot split by hue (dodge=True places each group side by side)
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex', dodge=True, ax=axes[0, 1])
axes[0, 1].set_title('Strip Plot with Dodge')

# Swarm plot
sns.swarmplot(data=tips, x='day', y='total_bill', ax=axes[0, 2])
axes[0, 2].set_title('Swarm Plot')

# Combination: violin + swarm
sns.violinplot(data=tips, x='day', y='total_bill', ax=axes[1, 0], inner=None)
sns.swarmplot(data=tips, x='day', y='total_bill', color='white', ax=axes[1, 0], size=3)
axes[1, 0].set_title('Violin + Swarm')

# Combination: box + strip
sns.boxplot(data=tips, x='day', y='total_bill', ax=axes[1, 1])
sns.stripplot(data=tips, x='day', y='total_bill', color='red', ax=axes[1, 1], alpha=0.3)
axes[1, 1].set_title('Box + Strip')

# Point plot (shows mean and CI)
sns.pointplot(data=tips, x='day', y='total_bill', hue='sex', ax=axes[1, 2])
axes[1, 2].set_title('Point Plot (Mean with CI)')

plt.tight_layout()
plt.show()

In [None]:
# 3.3 Bar and Count Plots
print("📊 Bar and Count Plots\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Bar plot (shows mean)
sns.barplot(data=tips, x='day', y='total_bill', ax=axes[0, 0])
axes[0, 0].set_title('Bar Plot (Mean with CI)')

# Bar plot with hue
sns.barplot(data=tips, x='day', y='total_bill', hue='sex', ax=axes[0, 1])
axes[0, 1].set_title('Grouped Bar Plot')

# Count plot
sns.countplot(data=tips, x='day', ax=axes[0, 2])
axes[0, 2].set_title('Count Plot')

# Count plot with hue
sns.countplot(data=tips, x='day', hue='time', ax=axes[1, 0])
axes[1, 0].set_title('Grouped Count Plot')

# Horizontal bar plot
sns.barplot(data=tips, y='day', x='total_bill', orient='h', ax=axes[1, 1])
axes[1, 1].set_title('Horizontal Bar Plot')

# Custom estimator
sns.barplot(data=tips, x='day', y='total_bill', estimator=np.median, ax=axes[1, 2])
axes[1, 2].set_title('Bar Plot with Median')

plt.tight_layout()
plt.show()

---

## 📌 Section 4: Relationship Plots

### 🔗 Exploring Relationships

In [None]:
# 4.1 Scatter and Line Plots
print("🔗 Relationship Plots\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Basic scatter
sns.scatterplot(data=tips, x='total_bill', y='tip', ax=axes[0, 0])
axes[0, 0].set_title('Basic Scatter Plot')

# Scatter with hue and size
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='time', size='size', ax=axes[0, 1])
axes[0, 1].set_title('Scatter with Multiple Variables')

# Scatter with style
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='day', style='sex', ax=axes[0, 2])
axes[0, 2].set_title('Scatter with Style')

# Line plot
fmri = sns.load_dataset('fmri')
sns.lineplot(data=fmri, x='timepoint', y='signal', ax=axes[1, 0])
axes[1, 0].set_title('Line Plot with CI')

# Multiple lines
sns.lineplot(data=fmri, x='timepoint', y='signal', 
             hue='event', ax=axes[1, 1])
axes[1, 1].set_title('Multiple Line Plot')

# Line with style
sns.lineplot(data=fmri, x='timepoint', y='signal', 
             hue='region', style='event', ax=axes[1, 2])
axes[1, 2].set_title('Line Plot with Style')

plt.tight_layout()
plt.show()

In [None]:
# 4.2 Regression Plots
print("📈 Regression Analysis\n" + "="*40)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Linear regression
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0, 0])
axes[0, 0].set_title('Linear Regression')

# Polynomial regression
sns.regplot(data=tips, x='total_bill', y='tip', order=2, ax=axes[0, 1])
axes[0, 1].set_title('Polynomial Regression (order=2)')

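# Robust regression (the robust, logistic, and lowess options require statsmodels)
sns.regplot(data=tips, x='total_bill', y='tip', robust=True, ax=axes[0, 2])
axes[0, 2].set_title('Robust Regression')

# Logistic regression needs a binary (0/1) outcome; 'size' is a count,
# so use an indicator for "tip above the median" instead
big_tip = (tips['tip'] > tips['tip'].median()).astype(int)
sns.regplot(x=tips['total_bill'], y=big_tip, logistic=True, ax=axes[1, 0])
axes[1, 0].set_ylabel('P(tip above median)')
axes[1, 0].set_title('Logistic Regression')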

# Lowess smoothing
sns.regplot(data=tips, x='total_bill', y='tip', 
            lowess=True, ax=axes[1, 1])
axes[1, 1].set_title('LOWESS Smoothing')

# Residual plot
sns.residplot(data=tips, x='total_bill', y='tip', ax=axes[1, 2])
axes[1, 2].set_title('Residual Plot')
axes[1, 2].axhline(y=0, color='red', linestyle='--')

plt.tight_layout()
plt.show()

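`regplot` draws the fit but does not return its coefficients. For a linear fit the drawn line is an ordinary least-squares regression, so `np.polyfit` with `deg=1` recovers the same slope and intercept if you need the numbers. A sketch on synthetic data (true slope 0.15, chosen for illustration):

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data with a known linear trend (made-up values)
rng = np.random.default_rng(2)
x = rng.uniform(5, 50, 200)
y = 0.15 * x + 0.5 + rng.normal(0, 1, 200)

slope, intercept = np.polyfit(x, y, deg=1)  # ordinary least squares, degree 1

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ci=None, ax=ax)  # ci=None: draw only the fitted line
ax.set_title(f"OLS fit: y = {slope:.3f}x + {intercept:.2f}")

# The regression line is the only Line2D on the Axes (the points are a collection)
line_x, line_y = ax.lines[0].get_data()
drawn_slope = (line_y[-1] - line_y[0]) / (line_x[-1] - line_x[0])
print(f"polyfit slope: {slope:.4f}, drawn slope: {drawn_slope:.4f}")
plt.show()
```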
---

## 📌 Section 5: Matrix Plots

### 🔥 Heatmaps and Correlations

In [None]:
# 5.1 Heatmaps
print("🔥 Heatmap Visualizations\n" + "="*40)

fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Correlation heatmap
corr = tips.select_dtypes(include=[np.number]).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, ax=axes[0, 0])
axes[0, 0].set_title('Correlation Heatmap')

# Pivot table heatmap
pivot = tips.pivot_table(values='tip', index='day', columns='time')
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='YlOrRd', ax=axes[0, 1])
axes[0, 1].set_title('Pivot Table Heatmap')

# Flights dataset heatmap
flights_pivot = flights.pivot_table(values='passengers', 
                                    index='month', 
                                    columns='year')
sns.heatmap(flights_pivot, cmap='Blues', ax=axes[1, 0])
axes[1, 0].set_title('Time Series Heatmap')

# Mask upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, cmap='RdBu_r', 
            center=0, ax=axes[1, 1])
axes[1, 1].set_title('Triangular Correlation Matrix')

plt.tight_layout()
plt.show()

In [None]:
# 5.2 Cluster Maps
print("🌳 Hierarchical Clustering\n" + "="*40)

# Prepare data
iris_data = iris.select_dtypes(include=[np.number])

# Create clustermap (creates its own figure)
g = sns.clustermap(iris_data.T, cmap='viridis', figsize=(10, 8))
g.fig.suptitle('Hierarchical Clustering of Iris Features', y=1.02)
plt.show()

print("\nClustermap features:")
print("✅ Automatically clusters rows and columns")
print("✅ Shows dendrograms for hierarchical relationships")
print("✅ Reorders data to show patterns")

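When the rows of a clustermap are measured on very different scales, the largest-valued feature dominates both the colors and the clustering. `clustermap` can standardize first: `z_score=0` converts each row to z-scores (`z_score=1` does columns). The feature table below is synthetic, with rows as features like `iris_data.T` above.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up feature table: 4 features on very different scales, 30 samples
rng = np.random.default_rng(3)
features = pd.DataFrame({
    "length_mm": rng.normal(50, 5, 30),
    "width_mm": rng.normal(30, 3, 30),
    "weight_g": rng.normal(2000, 300, 30),
    "ratio": rng.normal(1.5, 0.1, 30),
})

# Rows = features; z_score=0 standardizes each row before clustering
g = sns.clustermap(features.T, z_score=0, cmap="vlag", center=0, figsize=(8, 5))
g.figure.suptitle("Clustermap with per-feature z-scores", y=1.02)
plt.show()

order = [features.columns[i] for i in g.dendrogram_row.reordered_ind]
print("Row order after clustering:", order)
```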
---

## 📌 Section 6: Multi-plot Grids

### 🎯 FacetGrid and PairGrid

In [None]:
# 6.1 FacetGrid
print("📊 FacetGrid - Multiple Subplots\n" + "="*40)

# Basic FacetGrid
g = sns.FacetGrid(tips, col='time', row='sex', height=4, aspect=1)
g.map(sns.scatterplot, 'total_bill', 'tip')
g.fig.suptitle('Tips by Time and Sex', y=1.02)
plt.show()

# FacetGrid with histogram
g = sns.FacetGrid(tips, col='day', col_wrap=2, height=3)
g.map(sns.histplot, 'total_bill')
g.fig.suptitle('Bill Distribution by Day', y=1.02)
plt.show()

# FacetGrid with hue
g = sns.FacetGrid(tips, col='time', hue='sex', height=5)
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()
plt.show()

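For the common "scatter plot per facet" case, `sns.relplot` combines `FacetGrid` and `scatterplot` in one figure-level call and returns the `FacetGrid`, so methods like `set_titles` or `savefig` still apply. A sketch on synthetic data reusing the `tips` column names:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data (made-up values) with tips-like columns
rng = np.random.default_rng(4)
n = 120
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# relplot = FacetGrid + scatterplot in one call; it returns the FacetGrid
g = sns.relplot(data=df, x="total_bill", y="tip", col="time", row="sex", height=3)
g.figure.suptitle("relplot: one panel per time and sex", y=1.02)
plt.show()
```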
In [None]:
# 6.2 PairGrid and PairPlot
print("🔗 PairGrid - All Pairwise Relationships\n" + "="*40)

# Basic pairplot
g = sns.pairplot(iris, hue='species', height=2.5)
g.fig.suptitle('Iris Dataset Pairplot', y=1.02)
plt.show()

# Custom PairGrid
iris_subset = iris[['sepal_length', 'sepal_width', 'petal_length', 'species']]
g = sns.PairGrid(iris_subset, hue='species', height=3)
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)
g.add_legend()
g.fig.suptitle('Custom PairGrid', y=1.02)
plt.show()

---

## 📌 Section 7: Styling and Themes

### 🎨 Make It Beautiful

In [None]:
# 7.1 Seaborn Styles
print("🎨 Seaborn Styles\n" + "="*40)

styles = ['whitegrid', 'darkgrid', 'white', 'dark', 'ticks']
fig = plt.figure(figsize=(15, 10))

for idx, style in enumerate(styles, start=1):
    # axes_style only affects Axes created inside the block,
    # so each subplot is added within the context
    with sns.axes_style(style):
        ax = fig.add_subplot(2, 3, idx)
        sns.boxplot(data=tips, x='day', y='total_bill', ax=ax)
        ax.set_title(f'Style: {style}')

plt.tight_layout()
plt.show()

In [None]:
# 7.2 Color Palettes
print("🌈 Color Palettes\n" + "="*40)

# Show different palettes (passing palette= per plot leaves the global default untouched;
# hue='day' with legend=False colors each bar and needs seaborn >= 0.13)
palettes = ['deep', 'pastel', 'bright', 'dark', 'colorblind', 'husl']

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for ax, palette in zip(axes, palettes):
    sns.barplot(data=tips, x='day', y='total_bill', hue='day',
                palette=palette, legend=False, errorbar=None, ax=ax)
    ax.set_title(f'Palette: {palette}')

plt.tight_layout()
plt.show()

# Custom color palette (set globally: applies to every later plot until reset)
custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4']
sns.set_palette(custom_colors)

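`sns.set_palette` changes the default for everything that follows. `sns.color_palette` builds a palette object you can inspect (for example with `.as_hex()`) and, used as a context manager, applies it only to plots created inside the `with` block. The four hex colors are the custom ones from the cell above; the line plot is just a placeholder.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns

custom = sns.color_palette(["#FF6B6B", "#4ECDC4", "#45B7D1", "#96CEB4"])
print(custom.as_hex())

# Built-in palettes can be sized explicitly
colorblind4 = sns.color_palette("colorblind", n_colors=4)
print(len(colorblind4))

# As a context manager, the palette applies only to Axes created inside the block
with sns.color_palette(custom):
    fig, ax = plt.subplots()
    for i in range(len(custom)):
        ax.plot([0, 1], [i, i + 1], linewidth=3)
plt.show()

print([mcolors.to_hex(line.get_color()) for line in ax.lines])
```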
In [None]:
# 7.3 Context Settings
print("📏 Context Settings (Size Control)\n" + "="*40)

contexts = ['paper', 'notebook', 'talk', 'poster']
fig = plt.figure(figsize=(14, 10))

for idx, context in enumerate(contexts, start=1):
    # Create each Axes inside the context so every element,
    # including tick labels, picks up the context's sizes
    with sns.plotting_context(context):
        ax = fig.add_subplot(2, 2, idx)
        sns.lineplot(data=fmri, x='timepoint', y='signal', 
                    hue='event', ax=ax)
        ax.set_title(f'Context: {context}')

plt.tight_layout()
plt.show()

print("\nContext effects:")
print("📄 paper: Small, for papers")
print("📓 notebook: Default, for notebooks")
print("🎤 talk: Larger, for presentations")
print("🖼️ poster: Largest, for posters")

---

## 📌 Section 8: Statistical Estimation

### 📊 Built-in Statistical Features

In [None]:
# 8.1 Confidence Intervals
print("📊 Statistical Estimation\n" + "="*40)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Default error bar: 95% bootstrap CI, i.e. errorbar=('ci', 95)
sns.barplot(data=tips, x='day', y='total_bill', ax=axes[0, 0])
axes[0, 0].set_title('Bar Plot with 95% CI')

# Custom CI level (errorbar= replaces the ci= parameter deprecated in seaborn 0.12)
sns.barplot(data=tips, x='day', y='total_bill', errorbar=('ci', 68), ax=axes[0, 1])
axes[0, 1].set_title('Bar Plot with 68% CI')

# Bootstrap CI
sns.lineplot(data=fmri, x='timepoint', y='signal', 
             errorbar=('ci', 95), ax=axes[1, 0])
axes[1, 0].set_title('Line Plot with Bootstrap CI')

# Standard deviation band
sns.lineplot(data=fmri, x='timepoint', y='signal', 
             errorbar='sd', ax=axes[1, 1])
axes[1, 1].set_title('Line Plot with Standard Deviation')

plt.tight_layout()
plt.show()

---

## 🎯 Section 9: Real-World Projects

### Project 1: Customer Analytics Dashboard

In [None]:
# Project 1: Restaurant Analytics
print("🍽️ RESTAURANT ANALYTICS DASHBOARD\n" + "="*50)

# Set style
sns.set_theme(style="whitegrid")

# Create comprehensive dashboard
fig = plt.figure(figsize=(18, 12))
fig.suptitle('Restaurant Business Analytics', fontsize=20, fontweight='bold')

# 1. Revenue by day and time
ax1 = plt.subplot(3, 3, 1)
daily_revenue = tips.groupby(['day', 'time'], observed=True)['total_bill'].sum().reset_index()
pivot = daily_revenue.pivot(index='day', columns='time', values='total_bill')
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlGn', ax=ax1)
ax1.set_title('Revenue Heatmap')

# 2. Tip percentage analysis
ax2 = plt.subplot(3, 3, 2)
tips['tip_pct'] = tips['tip'] / tips['total_bill'] * 100
sns.violinplot(data=tips, x='day', y='tip_pct', ax=ax2)
ax2.set_title('Tip % Distribution by Day')
ax2.set_ylabel('Tip Percentage')

# 3. Customer segments
ax3 = plt.subplot(3, 3, 3)
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='size', size='size', ax=ax3)
ax3.set_title('Bill vs Tip by Party Size')

# 4. Time patterns
ax4 = plt.subplot(3, 3, 4)
sns.countplot(data=tips, x='day', hue='time', ax=ax4)
ax4.set_title('Customer Count by Day/Time')

# 5. Gender analysis
ax5 = plt.subplot(3, 3, 5)
sns.boxplot(data=tips, x='sex', y='total_bill', hue='time', ax=ax5)
ax5.set_title('Spending by Gender and Time')

# 6. Smoker analysis
ax6 = plt.subplot(3, 3, 6)
sns.barplot(data=tips, x='smoker', y='total_bill', hue='sex', ax=ax6)
ax6.set_title('Average Bill: Smokers vs Non-smokers')

# 7. Regression analysis
ax7 = plt.subplot(3, 3, 7)
sns.regplot(data=tips, x='total_bill', y='tip', ax=ax7, color='coral')
ax7.set_title('Tip Prediction Model')

# 8. Day comparison
ax8 = plt.subplot(3, 3, 8)
sns.barplot(data=tips, x='day', y='total_bill', ax=ax8, color='skyblue')
ax8.set_title('Average Bill by Day')

# 9. Distribution comparison
ax9 = plt.subplot(3, 3, 9)
for day in ['Thur', 'Fri', 'Sat', 'Sun']:
    subset = tips[tips['day'] == day]
    sns.kdeplot(data=subset, x='total_bill', ax=ax9, label=day)
ax9.set_title('Bill Distribution by Day')
ax9.legend()

plt.tight_layout()
plt.show()

# Print insights
print("\n📊 KEY INSIGHTS")
print("=" * 50)
print(f"Average tip percentage: {tips['tip_pct'].mean():.1f}%")
print(f"Best day for revenue: {tips.groupby('day')['total_bill'].sum().idxmax()}")
print(f"Dinner-to-lunch bill count ratio: {len(tips[tips['time']=='Dinner'])/len(tips[tips['time']=='Lunch']):.1f}:1")
print(f"Average party size: {tips['size'].mean():.1f} people")

### Project 2: Titanic Survival Analysis

In [None]:
# Project 2: Titanic Survival Analysis
print("🚢 TITANIC SURVIVAL ANALYSIS\n" + "="*50)

# Create analysis dashboard
fig = plt.figure(figsize=(18, 10))
fig.suptitle('Titanic Passenger Survival Analysis', fontsize=20, fontweight='bold')

# 1. Survival by class
ax1 = plt.subplot(2, 4, 1)
sns.barplot(data=titanic, x='class', y='survived', ax=ax1)
ax1.set_title('Survival Rate by Class')
ax1.set_ylabel('Survival Rate')

# 2. Age distribution
ax2 = plt.subplot(2, 4, 2)
sns.histplot(data=titanic, x='age', hue='survived', kde=True, ax=ax2)
ax2.set_title('Age Distribution by Survival')

# 3. Gender survival
ax3 = plt.subplot(2, 4, 3)
sns.barplot(data=titanic, x='sex', y='survived', hue='class', ax=ax3)
ax3.set_title('Survival by Gender and Class')

# 4. Fare analysis
ax4 = plt.subplot(2, 4, 4)
sns.boxplot(data=titanic, x='survived', y='fare', ax=ax4)
ax4.set_title('Fare Distribution by Survival')
ax4.set_yscale('log')

# 5. Family size impact
ax5 = plt.subplot(2, 4, 5)
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1
family_survival = titanic.groupby('family_size')['survived'].mean().reset_index()
sns.barplot(data=family_survival, x='family_size', y='survived', ax=ax5)
ax5.set_title('Survival by Family Size')

# 6. Embarkation port
ax6 = plt.subplot(2, 4, 6)
sns.countplot(data=titanic, x='embarked', hue='survived', ax=ax6)
ax6.set_title('Survival by Embarkation Port')

# 7. Children vs Adults
ax7 = plt.subplot(2, 4, 7)
titanic['age_group'] = pd.cut(titanic['age'], bins=[0, 12, 18, 60, 100], 
                              labels=['Child', 'Teen', 'Adult', 'Senior'])
sns.barplot(data=titanic, x='age_group', y='survived', ax=ax7)
ax7.set_title('Survival by Age Group')

# 8. Correlation heatmap
ax8 = plt.subplot(2, 4, 8)
corr_data = titanic[['survived', 'pclass', 'age', 'sibsp', 'parch', 'fare']].corr()
sns.heatmap(corr_data, annot=True, cmap='coolwarm', center=0, ax=ax8)
ax8.set_title('Feature Correlations')

plt.tight_layout()
plt.show()

# Calculate and print statistics
print("\n📊 SURVIVAL STATISTICS")
print("=" * 50)
print(f"Overall survival rate: {titanic['survived'].mean():.1%}")
print(f"Female survival rate: {titanic[titanic['sex']=='female']['survived'].mean():.1%}")
print(f"Male survival rate: {titanic[titanic['sex']=='male']['survived'].mean():.1%}")
print(f"First class survival: {titanic[titanic['class']=='First']['survived'].mean():.1%}")
print(f"Third class survival: {titanic[titanic['class']=='Third']['survived'].mean():.1%}")
print(f"Children (<=12) survival: {titanic[titanic['age']<=12]['survived'].mean():.1%}")

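The age groups in Project 2 come from `pd.cut`, whose bins are right-closed by default (`right=True`): an age of exactly 12 lands in "Child", and a value equal to the lowest edge (0) falls outside every bin and becomes NaN. A quick check on hand-picked ages:

```python
import pandas as pd

bins = [0, 12, 18, 60, 100]
labels = ["Child", "Teen", "Adult", "Senior"]
ages = pd.Series([0, 0.42, 12, 12.5, 18, 61])

# Default right=True: intervals are (0, 12], (12, 18], (18, 60], (60, 100]
groups = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({"age": ages, "age_group": groups}))
```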
---

## 🏆 Final Challenge: Complete EDA Pipeline

### Build a Complete Exploratory Data Analysis

In [None]:
# Final Project: Complete EDA Pipeline
print("🎯 COMPLETE EDA PIPELINE\n" + "="*50)

class SeabornEDA:
    def __init__(self, data, name="Dataset"):
        self.data = data
        self.name = name
        self.numeric_cols = data.select_dtypes(include=[np.number]).columns
        self.categorical_cols = data.select_dtypes(include=['object', 'category']).columns
        
    def data_overview(self):
        """Print data overview"""
        print(f"\n📊 {self.name} Overview")
        print("=" * 50)
        print(f"Shape: {self.data.shape}")
        print(f"Numeric columns: {len(self.numeric_cols)}")
        print(f"Categorical columns: {len(self.categorical_cols)}")
        print(f"Missing values: {self.data.isnull().sum().sum()}")
        
    def distribution_analysis(self):
        """Analyze distributions"""
        if len(self.numeric_cols) == 0:
            return
            
        n_cols = min(4, len(self.numeric_cols))
        # squeeze=False keeps axes 2-D even when there is only one numeric column
        fig, axes = plt.subplots(2, n_cols, figsize=(n_cols*4, 8), squeeze=False)
        fig.suptitle(f'{self.name}: Distribution Analysis', fontsize=16)
        
        for idx, col in enumerate(self.numeric_cols[:n_cols]):
            # Histogram
            sns.histplot(data=self.data, x=col, kde=True, ax=axes[0, idx])
            axes[0, idx].set_title(f'{col} Distribution')
            
            # Box plot
            sns.boxplot(data=self.data, y=col, ax=axes[1, idx])
            axes[1, idx].set_title(f'{col} Box Plot')
            
        plt.tight_layout()
        plt.show()
        
    def correlation_analysis(self):
        """Analyze correlations"""
        if len(self.numeric_cols) < 2:
            return
            
        fig, ax = plt.subplots(figsize=(10, 8))
        corr = self.data[self.numeric_cols].corr()
        mask = np.triu(np.ones_like(corr, dtype=bool))
        sns.heatmap(corr, mask=mask, annot=True, cmap='coolwarm', 
                   center=0, ax=ax)
        ax.set_title(f'{self.name}: Correlation Matrix')
        plt.show()
        
    def categorical_analysis(self):
        """Analyze categorical variables"""
        if len(self.categorical_cols) == 0:
            return
            
        n_cols = min(3, len(self.categorical_cols))
        fig, axes = plt.subplots(1, n_cols, figsize=(n_cols*5, 5))
        if n_cols == 1:
            axes = [axes]
        fig.suptitle(f'{self.name}: Categorical Analysis', fontsize=16)
        
        for idx, col in enumerate(self.categorical_cols[:n_cols]):
            value_counts = self.data[col].value_counts()[:10]
            sns.barplot(x=value_counts.values, y=value_counts.index, ax=axes[idx])
            axes[idx].set_title(f'{col} Distribution')
            
        plt.tight_layout()
        plt.show()
        
    def run_complete_eda(self):
        """Run complete EDA"""
        print(f"\n🔍 RUNNING COMPLETE EDA FOR {self.name.upper()}")
        print("=" * 60)
        
        self.data_overview()
        self.distribution_analysis()
        self.correlation_analysis()
        self.categorical_analysis()
        
        print("\n✅ EDA Complete!")

# Run EDA on Tips dataset
eda = SeabornEDA(tips, "Restaurant Tips")
eda.run_complete_eda()

# Summary statistics
print("\n📊 SUMMARY STATISTICS")
print("=" * 50)
print(tips.describe())

---

## 🎯 Summary & Next Steps

### 🏆 What You've Mastered:

✅ **Distribution Plots**
- Histograms, KDE, ECDF
- Joint and marginal distributions

✅ **Categorical Plots**
- Box, violin, swarm plots
- Bar and count plots

✅ **Relationship Plots**
- Scatter and line plots
- Regression analysis

✅ **Matrix Plots**
- Heatmaps
- Cluster maps

✅ **Multi-plot Grids**
- FacetGrid
- PairGrid

✅ **Statistical Features**
- Confidence intervals
- Built-in statistics

✅ **Styling**
- Themes and contexts
- Color palettes

### 🚀 Next Steps:

1. **Practice with real data**: Kaggle datasets
2. **Learn Plotly**: Interactive visualizations
3. **Combine with ML**: Feature engineering
4. **Create reports**: Jupyter + Seaborn
5. **Build dashboards**: Streamlit/Dash

### 💡 Pro Tips:

- **Start with pairplot** for quick overview
- **Use hue parameter** for grouping
- **Combine plots** for insights
- **Set theme early** for consistency
- **Export high-res** for presentations

### 📚 Resources:

- Official Docs: seaborn.pydata.org
- Gallery: seaborn.pydata.org/examples
- Tutorial: seaborn.pydata.org/tutorial
- Book: "Data Visualization with Python"

---

## 🎉 Congratulations!

You've mastered Seaborn - statistical visualization made beautiful!

You can now:
- **Create statistical plots** 📊
- **Analyze distributions** 📈
- **Explore relationships** 🔗
- **Build beautiful dashboards** 🎨
- **Tell data stories** 📖

**Keep exploring, keep visualizing, and keep finding insights!** 🌊

In [None]:
# 🎊 Course Complete!
print("🎊" * 20)
print("\n    🏆 SEABORN MASTERY ACHIEVED! 🏆")
print("\n    You're now ready to:")
print("    → Create statistical visualizations")
print("    → Perform visual EDA")
print("    → Build analysis dashboards")
print("    → Find patterns in data")
print("\n    Next: Plotly for interactive plots! ✨")
print("\n" + "🎊" * 20)