# Data Visualization with Seaborn

Seaborn is a powerful Python data visualization library built on top of Matplotlib that provides a high-level interface for creating attractive and informative statistical graphics. It simplifies the process of creating complex visualizations and comes with several built-in themes and color palettes to make your plots more aesthetically pleasing.

## 1. Introduction to Seaborn

Seaborn is a statistical data visualization library that:
- Provides beautiful default styles and color palettes
- Works seamlessly with Pandas DataFrames
- Offers specialized statistical plot types
- Simplifies complex visualization tasks
- Builds on Matplotlib, so you can use both together

**Key Features:**
- Dataset-oriented API for examining relationships between variables
- Specialized support for categorical variables
- Tools for visualizing univariate and bivariate distributions
- Automatic estimation and plotting of statistical models
- High-level abstractions for multi-plot grids

## 2. Why Seaborn over Matplotlib?

| Feature | Matplotlib | Seaborn |
|---------|-----------|----------|
| Ease of Use | Requires more code | More concise, fewer lines |
| Default Style | Basic, requires customization | Beautiful defaults |
| Statistical Plots | Manual implementation | Built-in statistical functions |
| DataFrame Integration | Manual data handling | Direct DataFrame support |
| Color Palettes | Limited defaults | Rich, curated palettes |
| Multi-plot Grids | Complex setup | Simple grid systems |

**When to use Seaborn:**
- Creating statistical visualizations
- Working with DataFrames
- Producing good-looking plots quickly
- Exploratory data analysis

**When to use Matplotlib:**
- Controlling every element of a figure precisely
- Creating custom, non-standard visualizations
- Working with low-level plotting primitives

## 3. Installing and Importing Seaborn

In [None]:
# Installation (run in terminal or command prompt)
# pip install seaborn

# Standard imports for data visualization
import seaborn as sns  # Seaborn for statistical plots
import matplotlib.pyplot as plt  # Matplotlib for additional customization
import pandas as pd  # Pandas for data manipulation
import numpy as np  # NumPy for numerical operations

# Check Seaborn version
print(f"Seaborn version: {sns.__version__}")

# Set default style for better-looking plots
sns.set_theme()  # Apply Seaborn's default theme

## 4. Seaborn Datasets

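Seaborn provides access to several example datasets that are well suited to learning and experimentation. They are not bundled with the package: `get_dataset_names` and `load_dataset` fetch them from an online repository, so an internet connection is needed (by default, `load_dataset` caches each file locally after the first download).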

### 4.1 Built-in Datasets

In [None]:
# View all available datasets
available_datasets = sns.get_dataset_names()
print("Available datasets:")
print(available_datasets)
print(f"\nTotal datasets: {len(available_datasets)}")

### 4.2 Loading Sample Data

In [None]:
# Load popular datasets
tips = sns.load_dataset('tips')  # Restaurant tipping data
iris = sns.load_dataset('iris')  # Classic iris flower dataset
titanic = sns.load_dataset('titanic')  # Titanic passengers data
planets = sns.load_dataset('planets')  # Exoplanet discoveries

# Explore the tips dataset
print("Tips Dataset:")
print(tips.head())
print(f"\nShape: {tips.shape}")
print(f"\nColumns: {tips.columns.tolist()}")

In [None]:
# Explore the iris dataset
print("Iris Dataset:")
print(iris.head())
print(f"\nShape: {iris.shape}")
print(f"\nSpecies: {iris['species'].unique()}")
print(f"\nDataset info:")
print(iris.info())
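
Because the loaders above need network access, it can help to have a small offline stand-in for experiments. The sketch below builds one by hand with the same column names as `tips`. The rows are made-up illustrative values, not records from the real dataset, and `tips_demo` is a hypothetical name.

```python
import pandas as pd

# Made-up rows for illustration only; column names mirror the real `tips` dataset
tips_demo = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.5, 3.0, 4.0, 8.0],
    'sex': ['Female', 'Male', 'Male', 'Female'],
    'smoker': ['No', 'No', 'Yes', 'Yes'],
    'day': ['Thur', 'Fri', 'Sat', 'Sun'],
    'time': ['Lunch', 'Lunch', 'Dinner', 'Dinner'],
    'size': [2, 2, 3, 4],
})

# The real dataset stores these columns as categoricals, so do the same here
for col in ['sex', 'smoker', 'day', 'time']:
    tips_demo[col] = tips_demo[col].astype('category')

print(tips_demo.dtypes)
```

Any plotting call in this notebook that takes `data=tips` should accept `data=tips_demo` as well, although four rows are too few for meaningful statistics.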

## 5. Statistical Plots

Seaborn excels at creating statistical visualizations that help understand data distributions and relationships.

### 5.1 Scatter Plots with Regression

In [None]:
# regplot: Simple scatter plot with regression line
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
# Basic regression plot
sns.regplot(x='total_bill', y='tip', data=tips)
plt.title('Regression Plot: Bill vs Tip')

plt.subplot(1, 2, 2)
# Regression plot with customization
sns.regplot(x='total_bill', y='tip', data=tips, 
            scatter_kws={'alpha': 0.5, 's': 50},  # Scatter point properties
            line_kws={'color': 'red', 'linewidth': 2})  # Line properties
plt.title('Customized Regression Plot')

plt.tight_layout()
plt.show()

In [None]:
# lmplot: More powerful regression plot with additional features
# Can create multiple plots based on categories
sns.lmplot(x='total_bill', y='tip', data=tips, 
           hue='sex',  # Different colors for male/female
           markers=['o', 's'],  # Different markers
           palette='Set1',  # Color palette
           height=5, aspect=1.5)  # Figure size
plt.title('LM Plot with Categories')
plt.show()

In [None]:
# lmplot with col parameter: separate plots for each category
sns.lmplot(x='total_bill', y='tip', data=tips, 
           col='time',  # Separate plot for Lunch and Dinner
           hue='sex',  # Color by gender
           height=4, aspect=1)
plt.show()
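
`regplot` and `lmplot` draw the same kind of plot but differ in how they relate to Matplotlib: `regplot` is axes-level and draws into an Axes you supply, while `lmplot` is figure-level and creates its own figure. The sketch below shows the difference through the objects each one returns. It uses a small made-up DataFrame (`demo`, a hypothetical name) so it runs without downloading anything, and `ci=None` only to skip the bootstrap band and keep it fast.

```python
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative data (not the real tips dataset)
demo = pd.DataFrame({
    'total_bill': [10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0],
    'tip': [1.5, 2.0, 3.5, 3.0, 4.5, 5.0, 7.0, 6.0],
    'time': ['Lunch', 'Dinner'] * 4,
})

# Axes-level: draws into the Axes passed via ax= and returns that same Axes
fig, ax = plt.subplots(figsize=(5, 4))
returned_ax = sns.regplot(data=demo, x='total_bill', y='tip', ci=None, ax=ax)
returned_ax.set_title('regplot on a supplied Axes')

# Figure-level: creates a new figure and returns a FacetGrid that manages it
g = sns.lmplot(data=demo, x='total_bill', y='tip', col='time', ci=None, height=3)
g.figure.suptitle('lmplot owns its figure', y=1.03)

print(type(returned_ax).__name__, type(g).__name__)
```

A practical consequence is that figure-level functions cannot be placed into an existing `plt.subplot` layout; sizes are set with `height` and `aspect` instead of `figsize`.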

### 5.2 Distribution Plots

In [None]:
# histplot: Histogram with automatic bin selection
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Basic histogram
sns.histplot(data=tips, x='total_bill', bins=20)
plt.title('Basic Histogram')

plt.subplot(1, 3, 2)
# Histogram with KDE (Kernel Density Estimate)
sns.histplot(data=tips, x='total_bill', kde=True, color='green')
plt.title('Histogram with KDE')

plt.subplot(1, 3, 3)
# Histogram with multiple categories
sns.histplot(data=tips, x='total_bill', hue='time', 
             multiple='stack',  # Stack the categories
             palette='viridis')
plt.title('Stacked Histogram by Time')

plt.tight_layout()
plt.show()

In [None]:
# kdeplot: Kernel Density Estimate plot (smooth distribution)
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Single variable KDE
sns.kdeplot(data=tips, x='total_bill', fill=True)
plt.title('KDE Plot')

plt.subplot(1, 3, 2)
# Multiple categories KDE
sns.kdeplot(data=tips, x='total_bill', hue='time', 
            fill=True, alpha=0.5)
plt.title('KDE Plot with Categories')

plt.subplot(1, 3, 3)
# 2D KDE (bivariate)
sns.kdeplot(data=tips, x='total_bill', y='tip', 
            fill=True, cmap='Blues', thresh=0)
plt.title('2D KDE Plot')

plt.tight_layout()
plt.show()
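
A KDE curve can be thought of as a sum of smooth kernels, one centred on each observation. The sketch below uses `scipy.stats.gaussian_kde` on synthetic data to illustrate the idea. It is a conceptual illustration under that assumption, not necessarily the exact computation `kdeplot` performs, since bandwidth handling and grid limits may differ.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(loc=20.0, scale=8.0, size=200)  # Synthetic stand-in for a numeric column

kde = gaussian_kde(samples)  # Gaussian kernels; bandwidth chosen by Scott's rule by default
grid = np.linspace(samples.min() - 30, samples.max() + 30, 2000)
density = kde(grid)  # Estimated density at each grid point

# A density should integrate to roughly 1 over a wide enough grid (simple Riemann sum)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"Area under the KDE curve: {area:.3f}")
```

Because the curve is a density, its height depends on the units of the x-axis; only the area under it is a probability.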

In [None]:
# displot: Figure-level distribution plot (more flexible)
# Can create histograms, KDE plots, or both
sns.displot(data=tips, x='total_bill', 
            kde=True,  # Add KDE overlay
            height=5, aspect=1.5)
plt.title('Distribution Plot')
plt.show()

In [None]:
# displot with multiple subplots
sns.displot(data=tips, x='total_bill', 
            col='time',  # Separate plots for each time
            row='sex',   # Separate rows for each sex
            kde=True, height=3, aspect=1.2)
plt.show()

### 5.3 Box Plots and Violin Plots

In [None]:
# boxplot: Shows distribution quartiles and outliers
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Basic box plot
sns.boxplot(data=tips, x='day', y='total_bill')
plt.title('Box Plot by Day')

plt.subplot(1, 3, 2)
# Box plot with hue (additional category)
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex')
plt.title('Box Plot by Day and Gender')

plt.subplot(1, 3, 3)
# Box plot with customization
sns.boxplot(data=tips, x='day', y='total_bill', 
            palette='Set2', linewidth=2.5)
plt.title('Customized Box Plot')

plt.tight_layout()
plt.show()
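
To read a box plot it helps to know which numbers it encodes. The sketch below computes them with NumPy for a small illustrative array, assuming the common 1.5 × IQR whisker rule (the default `whis=1.5`): the box spans the first to third quartile, whiskers reach the most extreme points inside the fences, and anything beyond is drawn as an outlier.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)  # Illustrative data

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the fences are plotted individually
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```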

In [None]:
# violinplot: Combination of box plot and KDE
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Basic violin plot
sns.violinplot(data=tips, x='day', y='total_bill')
plt.title('Violin Plot by Day')

plt.subplot(1, 3, 2)
# Violin plot with split (comparing two categories)
sns.violinplot(data=tips, x='day', y='total_bill', 
               hue='sex', split=True, palette='muted')
plt.title('Split Violin Plot')

plt.subplot(1, 3, 3)
# Violin plot with inner quartiles
sns.violinplot(data=tips, x='day', y='total_bill', 
               inner='quartile',  # Show quartile lines
               palette='pastel')
plt.title('Violin Plot with Quartiles')

plt.tight_layout()
plt.show()

### 5.4 Pair Plots

In [None]:
# pairplot: Matrix of scatter plots for all numeric variables
# Shows relationships between all pairs of variables
sns.pairplot(iris)
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()

In [None]:
# pairplot with hue: color by category
sns.pairplot(iris, hue='species', 
             palette='husl',  # Color palette
             markers=['o', 's', 'D'],  # Different markers for each species
             diag_kind='kde')  # Use KDE for diagonal plots
plt.suptitle('Pair Plot with Species Classification', y=1.02)
plt.show()

In [None]:
# pairplot with selected columns
sns.pairplot(iris, 
             vars=['sepal_length', 'sepal_width'],  # Only these columns
             hue='species', 
             height=3)
plt.show()

### 5.5 Joint Plots

In [None]:
# jointplot: Combines scatter plot with marginal distributions
# Shows relationship between two variables plus their distributions

# Scatter joint plot
sns.jointplot(data=tips, x='total_bill', y='tip', 
              kind='scatter',  # Type of plot
              height=6)
plt.suptitle('Joint Scatter Plot', y=1.02)
plt.show()

In [None]:
# Joint plot with regression
sns.jointplot(data=tips, x='total_bill', y='tip', 
              kind='reg',  # Regression plot
              height=6)
plt.suptitle('Joint Regression Plot', y=1.02)
plt.show()

In [None]:
# Joint plot with hexagonal bins (good for large datasets)
sns.jointplot(data=tips, x='total_bill', y='tip', 
              kind='hex',  # Hexagonal binning
              height=6)
plt.suptitle('Joint Hexbin Plot', y=1.02)
plt.show()

In [None]:
# Joint plot with KDE
sns.jointplot(data=tips, x='total_bill', y='tip', 
              kind='kde',  # Kernel density estimate
              fill=True, cmap='YlOrRd',
              height=6)
plt.suptitle('Joint KDE Plot', y=1.02)
plt.show()

## 6. Categorical Plots

Seaborn provides specialized plots for visualizing categorical data.

### 6.1 Bar Plots

In [None]:
# barplot: Shows mean values with confidence intervals
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Basic bar plot (shows mean by default)
sns.barplot(data=tips, x='day', y='total_bill')
plt.title('Average Bill by Day')

plt.subplot(1, 3, 2)
# Bar plot with hue
sns.barplot(data=tips, x='day', y='total_bill', hue='sex')
plt.title('Average Bill by Day and Gender')

plt.subplot(1, 3, 3)
# Bar plot with custom estimator (median instead of mean)
sns.barplot(data=tips, x='day', y='total_bill', 
            estimator=np.median,  # Use median
            palette='coolwarm')
plt.title('Median Bill by Day')

plt.tight_layout()
plt.show()
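
The error bars on a `barplot` are, by default, 95% confidence intervals obtained by bootstrapping. The sketch below reproduces the idea for a single bar with NumPy, using made-up values. It is a simplified percentile bootstrap written for illustration; Seaborn's internal resampling may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(42)
bills = np.array([12.5, 18.0, 22.3, 9.8, 30.1, 15.6, 27.4, 19.9])  # Illustrative values for one category

bar_height = bills.mean()  # What the bar shows with the default (mean) estimator

# Percentile bootstrap: resample with replacement and recompute the mean each time
n_boot = 1000
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(n_boot)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean={bar_height:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Because the interval comes from random resampling, it changes slightly between runs unless the random state is fixed.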

In [None]:
# countplot: Shows count of observations in each category
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Basic count plot
sns.countplot(data=tips, x='day')
plt.title('Count of Observations by Day')

plt.subplot(1, 3, 2)
# Count plot with hue
sns.countplot(data=tips, x='day', hue='sex', palette='Set2')
plt.title('Count by Day and Gender')

plt.subplot(1, 3, 3)
# Horizontal count plot
sns.countplot(data=tips, y='day', palette='viridis')
plt.title('Horizontal Count Plot')

plt.tight_layout()
plt.show()

### 6.2 Point Plots

In [None]:
# pointplot: Shows point estimates with confidence intervals
# Useful for showing how a variable changes across categories
plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
# Basic point plot
sns.pointplot(data=tips, x='day', y='total_bill')
plt.title('Point Plot: Bill by Day')

plt.subplot(1, 2, 2)
# Point plot with hue
sns.pointplot(data=tips, x='day', y='total_bill', hue='sex',
              markers=['o', 's'],  # Different markers
              linestyles=['-', '--'])  # Different line styles
plt.title('Point Plot with Gender Comparison')

plt.tight_layout()
plt.show()

### 6.3 Strip Plots and Swarm Plots

In [None]:
# stripplot: Shows all individual data points
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Strip plot without jitter (points overlap on a single line per category)
sns.stripplot(data=tips, x='day', y='total_bill', jitter=False)
plt.title('Strip Plot without Jitter')

plt.subplot(1, 3, 2)
# Strip plot with jitter (the default) and transparency to reduce overlap
sns.stripplot(data=tips, x='day', y='total_bill', 
              jitter=True, alpha=0.5)
plt.title('Strip Plot with Jitter')

plt.subplot(1, 3, 3)
# Strip plot with hue
sns.stripplot(data=tips, x='day', y='total_bill', 
              hue='sex', jitter=True, alpha=0.5)
plt.title('Strip Plot by Gender')

plt.tight_layout()
plt.show()

In [None]:
# swarmplot: Similar to strip plot but adjusts points to avoid overlap
# Better for smaller datasets
plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
# Basic swarm plot
sns.swarmplot(data=tips, x='day', y='total_bill')
plt.title('Swarm Plot')

plt.subplot(1, 2, 2)
# Swarm plot with hue
sns.swarmplot(data=tips, x='day', y='total_bill', 
              hue='sex', palette='Set1')
plt.title('Swarm Plot by Gender')

plt.tight_layout()
plt.show()

### 6.4 Combining Plots

In [None]:
# Combining box plot with strip plot for better visualization
plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
# Box plot with strip plot overlay
sns.boxplot(data=tips, x='day', y='total_bill', palette='pastel')
sns.stripplot(data=tips, x='day', y='total_bill', 
              color='black', alpha=0.3, size=3)
plt.title('Box Plot with Strip Plot Overlay')

plt.subplot(1, 2, 2)
# Violin plot with swarm plot overlay
sns.violinplot(data=tips, x='day', y='total_bill', 
               palette='muted', inner=None)  # Remove inner box
sns.swarmplot(data=tips, x='day', y='total_bill', 
              color='white', edgecolor='black', 
              linewidth=0.5, size=3)  # edgecolor is only visible with a non-zero linewidth
plt.title('Violin Plot with Swarm Plot Overlay')

plt.tight_layout()
plt.show()

## 7. Matrix Plots

Matrix plots are useful for visualizing relationships between multiple variables, especially correlation matrices.

### 7.1 Heatmaps

In [None]:
# Create a sample matrix for demonstration
# Random data matrix
data_matrix = np.random.rand(10, 12)

# Basic heatmap
plt.figure(figsize=(12, 6))
sns.heatmap(data_matrix, 
            annot=False,  # Don't show values
            cmap='viridis')  # Color scheme
plt.title('Basic Heatmap')
plt.show()

In [None]:
# Heatmap with annotations
small_matrix = np.random.rand(5, 5)

plt.figure(figsize=(8, 6))
sns.heatmap(small_matrix, 
            annot=True,  # Show values in cells
            fmt='.2f',  # Format to 2 decimal places
            cmap='coolwarm',  # Blue-white-red color scheme
            linewidths=0.5,  # Lines between cells
            cbar_kws={'label': 'Values'})  # Colorbar label
plt.title('Annotated Heatmap')
plt.show()

### 7.2 Correlation Matrices

In [None]:
# Calculate correlation matrix for iris dataset
iris_numeric = iris.select_dtypes(include=[np.number])  # Select only numeric columns
correlation_matrix = iris_numeric.corr()  # Calculate correlations

print("Correlation Matrix:")
print(correlation_matrix)

In [None]:
# Visualize correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, 
            annot=True,  # Show correlation values
            fmt='.2f',  # Two decimal places
            cmap='coolwarm',  # Color scheme
            center=0,  # Center colormap at 0
            square=True,  # Square cells
            linewidths=1,  # Cell borders
            cbar_kws={'shrink': 0.8})  # Smaller colorbar
plt.title('Correlation Matrix - Iris Dataset', fontsize=14, pad=20)
plt.show()

In [None]:
# Correlation matrix with mask (show only lower triangle)
# Create mask for upper triangle
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, 
            mask=mask,  # Apply mask
            annot=True, 
            fmt='.2f',
            cmap='RdYlGn',  # Red-Yellow-Green color scheme
            center=0,
            square=True,
            linewidths=2,
            vmin=-1, vmax=1)  # Set color range from -1 to 1
plt.title('Lower Triangle Correlation Matrix', fontsize=14, pad=20)
plt.show()
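
The mask passed to `heatmap` is a boolean array with the same shape as the data, and cells where it is `True` are hidden. The sketch below prints the mask for a small illustrative 3×3 matrix, and also shows the `k=1` variant of `np.triu`, which keeps the diagonal visible.

```python
import numpy as np

corr_demo = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])  # Illustrative symmetric matrix

# True marks cells to hide: the upper triangle including the diagonal
mask = np.triu(np.ones_like(corr_demo, dtype=bool))

# k=1 starts one step above the diagonal, so the diagonal stays visible
mask_keep_diag = np.triu(np.ones_like(corr_demo, dtype=bool), k=1)

print(mask)
print(mask_keep_diag)
```

Since a correlation matrix is symmetric, hiding one triangle removes only redundant cells.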

### 7.3 Clustermaps

In [None]:
# clustermap: Hierarchically clustered heatmap
# Automatically reorders rows and columns to show patterns

# Create sample data
iris_sample = iris_numeric.sample(30, random_state=42)  # Sample 30 rows

# Basic clustermap
sns.clustermap(iris_sample, 
               cmap='viridis',
               figsize=(10, 10))
plt.suptitle('Clustermap of Iris Features', y=0.98)
plt.show()

In [None]:
# Clustermap with standardized data and customization
from scipy.stats import zscore

# Standardize the data (z-score normalization)
iris_standardized = iris_sample.apply(zscore)

# Create clustermap with more options
sns.clustermap(iris_standardized, 
               cmap='coolwarm',
               standard_scale=None,  # Data already standardized
               figsize=(10, 10),
               row_cluster=True,  # Cluster rows
               col_cluster=True,  # Cluster columns
               linewidths=0.5,
               cbar_kws={'label': 'Z-score'})
plt.suptitle('Standardized Clustermap', y=0.98)
plt.show()
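
Standardizing matters for clustering because features on larger scales would otherwise dominate the distances. The sketch below spells out the z-score arithmetic with pandas on made-up measurements (the column names are hypothetical): subtract each column's mean and divide by its standard deviation, using `ddof=0` to match the default of `scipy.stats.zscore`.

```python
import pandas as pd

features = pd.DataFrame({
    'length_cm': [5.1, 4.9, 6.3, 5.8, 7.1],
    'width_cm': [3.5, 3.0, 3.3, 2.7, 3.0],
})  # Illustrative measurements

# ddof=0 (population standard deviation) matches scipy.stats.zscore's default
standardized = (features - features.mean()) / features.std(ddof=0)

print(standardized.round(2))
```

After this step every column has mean 0 and standard deviation 1, so each contributes comparably to the clustering.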

## 8. Multi-plot Grids

Seaborn provides powerful grid systems for creating complex multi-plot figures.

### 8.1 FacetGrid

In [None]:
# FacetGrid: Create a grid of plots based on categorical variables
# Useful for examining subsets of data

# Basic FacetGrid with one dimension
g = sns.FacetGrid(tips, col='time',  # Separate plots by time
                  height=4, aspect=1)
g.map(sns.histplot, 'total_bill')  # Apply histogram to each subplot
g.add_legend()
plt.show()

In [None]:
# FacetGrid with two dimensions (row and col)
g = sns.FacetGrid(tips, 
                  row='sex',  # Rows by gender
                  col='time',  # Columns by time
                  hue='smoker',  # Color by smoker status
                  height=4, aspect=1.2)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.7)
g.add_legend()
plt.show()

In [None]:
# FacetGrid with multiple plot types
g = sns.FacetGrid(tips, col='day', col_wrap=2,  # Wrap after 2 columns
                  height=4, aspect=1.2)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.6)
g.map(sns.regplot, 'total_bill', 'tip', 
      scatter=False,  # Don't show scatter points (already shown)
      color='red')  # Regression line in red
plt.show()
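
With a grid of subplots, `plt.title` only reaches one Axes, so titles and labels are usually set through the grid object itself. The sketch below shows one way to do that with `set_axis_labels`, `set_titles`, and a figure-wide `suptitle`. It uses a small made-up DataFrame (`demo`, a hypothetical name) so it runs offline.

```python
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up illustrative data (not the real tips dataset)
demo = pd.DataFrame({
    'total_bill': [10.0, 15.0, 20.0, 25.0, 30.0, 35.0],
    'tip': [1.5, 2.0, 3.5, 3.0, 4.5, 5.0],
    'time': ['Lunch', 'Dinner'] * 3,
})

g = sns.FacetGrid(demo, col='time', col_order=['Lunch', 'Dinner'], height=3)
g.map_dataframe(sns.scatterplot, x='total_bill', y='tip')

g.set_axis_labels('Total bill', 'Tip')           # Shared axis labels for the grid
g.set_titles(col_template='{col_name}')          # One title per facet, from the column value
g.figure.suptitle('Tips by time of day', y=1.05)  # Title for the whole figure

print([ax.get_title() for ax in g.axes.flat])
```

The same methods are available on the grids returned by figure-level functions such as `lmplot` and `displot`.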

### 8.2 PairGrid

In [None]:
# PairGrid: More control over pairplot
# Can specify different plot types for different parts of the grid

# Create PairGrid
g = sns.PairGrid(iris, hue='species', height=2.5)

# Map different plot types to different parts
g.map_upper(sns.scatterplot)  # Upper triangle: scatter plots
g.map_lower(sns.kdeplot)  # Lower triangle: KDE plots
g.map_diag(sns.histplot)  # Diagonal: histograms
g.add_legend()
plt.show()

In [None]:
# PairGrid with custom plot types
g = sns.PairGrid(iris, height=2.5)

# Different plots for each section
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.regplot, scatter_kws={'alpha': 0.3})
g.map_diag(sns.kdeplot, fill=True)
plt.show()

### 8.3 JointGrid

In [None]:
# JointGrid: More control over joint plots
# Create custom combinations of plots

# Create JointGrid
g = sns.JointGrid(data=tips, x='total_bill', y='tip', height=6)

# Map different plot types
g.plot_joint(sns.scatterplot, alpha=0.5)  # Center: scatter plot
g.plot_marginals(sns.histplot, kde=True)  # Margins: histograms with KDE
plt.show()

In [None]:
# JointGrid with different plot combinations
g = sns.JointGrid(data=tips, x='total_bill', y='tip', height=6)

# Hexbin for center, KDE for margins
g.plot_joint(plt.hexbin, cmap='YlOrRd', gridsize=20)
g.plot_marginals(sns.kdeplot, fill=True)
plt.show()

## 9. Customization

Seaborn provides extensive customization options for colors, styles, and aesthetics.

### 9.1 Color Palettes

In [None]:
# View available color palettes
print("Built-in color palettes:")
print("Qualitative: deep, muted, bright, pastel, dark, colorblind")
print("Sequential: rocket, mako, flare, crest")
print("Diverging: vlag, icefire")
print("Matplotlib: viridis, plasma, inferno, magma, cividis")

In [None]:
# Display color palettes
palettes = ['deep', 'muted', 'pastel', 'dark', 'colorblind']

# palplot creates a new figure on every call, so it cannot be placed in plt.subplot slots
for palette in palettes:
    sns.palplot(sns.color_palette(palette))
    plt.title(f'{palette.capitalize()} Palette', loc='left')
plt.show()

In [None]:
# Using different color palettes in plots
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
sns.barplot(data=tips, x='day', y='total_bill', palette='deep')
plt.title('Deep Palette')

plt.subplot(1, 3, 2)
sns.barplot(data=tips, x='day', y='total_bill', palette='pastel')
plt.title('Pastel Palette')

plt.subplot(1, 3, 3)
sns.barplot(data=tips, x='day', y='total_bill', palette='colorblind')
plt.title('Colorblind-Friendly Palette')

plt.tight_layout()
plt.show()

In [None]:
# Custom color palettes
# Create custom palette with specific colors
custom_palette = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']

plt.figure(figsize=(10, 5))
sns.barplot(data=tips, x='day', y='total_bill', palette=custom_palette)
plt.title('Custom Color Palette')
plt.show()
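
A palette is ultimately just a sequence of colors, which can be inspected and reused directly. The sketch below retrieves four colors from the `colorblind` palette, converts them to hex strings, and builds a category-to-color dictionary. The `day_colors` name is hypothetical; passing such a dictionary as `palette=` together with a matching `hue` variable is one way to keep colors fixed across related plots.

```python
import seaborn as sns

palette = sns.color_palette('colorblind', n_colors=4)  # List-like of RGB tuples
hex_codes = palette.as_hex()  # Same colors as hex strings

print(hex_codes)

# Map colors to categories explicitly so each category keeps its color in every plot
days = ['Thur', 'Fri', 'Sat', 'Sun']
day_colors = dict(zip(days, palette))

for day, rgb in day_colors.items():
    print(day, tuple(round(channel, 2) for channel in rgb))
```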

In [None]:
# Sequential color palettes (for continuous data)
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Light to dark blue
sns.kdeplot(data=tips, x='total_bill', y='tip', 
            fill=True, cmap='Blues', thresh=0)
plt.title('Blues Sequential Palette')

plt.subplot(1, 3, 2)
# Rocket palette (dark to light)
sns.kdeplot(data=tips, x='total_bill', y='tip', 
            fill=True, cmap='rocket', thresh=0)
plt.title('Rocket Sequential Palette')

plt.subplot(1, 3, 3)
# Viridis palette (perceptually uniform)
sns.kdeplot(data=tips, x='total_bill', y='tip', 
            fill=True, cmap='viridis', thresh=0)
plt.title('Viridis Sequential Palette')

plt.tight_layout()
plt.show()

### 9.2 Themes and Styles

In [None]:
# Available Seaborn styles
styles = ['darkgrid', 'whitegrid', 'dark', 'white', 'ticks']

plt.figure(figsize=(15, 12))

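for i, style in enumerate(styles, 1):
    # A style only affects axes created while it is active, so create the subplot inside the context
    with sns.axes_style(style):
        ax = plt.subplot(3, 2, i)
    sns.lineplot(data=tips, x='total_bill', y='tip', ax=ax)
    ax.set_title(f'Style: {style}')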

plt.tight_layout()
plt.show()

# Reset to default
sns.set_style('darkgrid')

In [None]:
# Customizing specific style elements
# Modify grid, ticks, etc.

custom_style = {
    'axes.facecolor': '#EAEAF2',  # Background color
    'axes.edgecolor': 'white',
    'axes.grid': True,
    'grid.color': 'white',
    'grid.linestyle': '-',
    'grid.linewidth': 1.5
}

sns.set_style('darkgrid', custom_style)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time')
plt.title('Plot with Custom Style')
plt.show()

# Reset style
sns.set_style('darkgrid')

### 9.3 Figure Aesthetics

In [None]:
# despine: Remove top and right spines for cleaner look
plt.figure(figsize=(15, 5))

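# Without ax=, despine() acts on every Axes in the current figure,
# so each call below targets one subplot explicitly
ax1 = plt.subplot(1, 3, 1)
sns.boxplot(data=tips, x='day', y='total_bill', ax=ax1)
ax1.set_title('Default Spines')

ax2 = plt.subplot(1, 3, 2)
sns.boxplot(data=tips, x='day', y='total_bill', ax=ax2)
sns.despine(ax=ax2)  # Remove top and right spines on this Axes only
ax2.set_title('After despine()')

ax3 = plt.subplot(1, 3, 3)
sns.boxplot(data=tips, x='day', y='total_bill', ax=ax3)
sns.despine(ax=ax3, left=True, bottom=True)  # Remove all four spines on this Axes only
ax3.set_title('Minimal Spines')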

plt.tight_layout()
plt.show()

### 9.4 Context Settings

In [None]:
# Context affects the scale of plot elements
# Useful for different output formats (paper, notebook, talk, poster)

contexts = ['paper', 'notebook', 'talk', 'poster']

plt.figure(figsize=(15, 12))

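for i, context in enumerate(contexts, 1):
    # Create and draw the subplot inside the context; a context set after
    # the axes already exist is only partly picked up
    with sns.plotting_context(context):
        ax = plt.subplot(2, 2, i)
        sns.lineplot(data=tips, x='total_bill', y='tip', ax=ax)
        ax.set_title(f'Context: {context}')  # No explicit fontsize, so the context's title size is used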

plt.tight_layout()
plt.show()

# Reset to default
sns.set_context('notebook')

In [None]:
# Custom context with scaling
sns.set_context('notebook', 
                font_scale=1.5,  # Increase font size by 50%
                rc={'lines.linewidth': 2.5})  # Thicker lines

plt.figure(figsize=(10, 6))
sns.lineplot(data=tips, x='total_bill', y='tip', hue='time')
plt.title('Plot with Scaled Context')
plt.show()

# Reset
sns.set_context('notebook', font_scale=1)
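
A context is a set of Matplotlib rc parameters that scale fonts and line widths together. The sketch below inspects those values with `sns.plotting_context`, which returns the parameters for a named context without changing any global settings.

```python
import seaborn as sns

# plotting_context() only returns the parameters; nothing is applied globally here
for name in ['paper', 'notebook', 'talk', 'poster']:
    params = sns.plotting_context(name)
    print(f"{name:>8}: font.size={params['font.size']}, "
          f"lines.linewidth={params['lines.linewidth']}")

notebook_params = sns.plotting_context('notebook')
talk_params = sns.plotting_context('talk')
```

Comparing the printed values shows why `talk` and `poster` suit slides and posters: fonts and lines grow together, so the plot stays balanced.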

## 10. Integration with Pandas

Seaborn is designed to work seamlessly with Pandas DataFrames, making it easy to visualize structured data.

In [None]:
# Direct DataFrame plotting with Seaborn
# Seaborn automatically handles DataFrame columns

# Create a sample DataFrame
df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'D'] * 25,
    'value1': np.random.randn(100),
    'value2': np.random.randn(100) * 2 + 1,
    'group': np.random.choice(['Group 1', 'Group 2'], 100)
})

print("Sample DataFrame:")
print(df.head())
print(f"\nShape: {df.shape}")

In [None]:
# Using DataFrame with Seaborn plots
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
# Column names as parameters
sns.boxplot(data=df, x='category', y='value1')
plt.title('Box Plot from DataFrame')

plt.subplot(1, 3, 2)
# Multiple groupings
sns.violinplot(data=df, x='category', y='value1', hue='group')
plt.title('Violin Plot with Hue')

plt.subplot(1, 3, 3)
# Scatter plot with multiple variables
sns.scatterplot(data=df, x='value1', y='value2', 
                hue='group', style='group')
plt.title('Scatter Plot with Groups')

plt.tight_layout()
plt.show()

In [None]:
# Working with time series data
# Create time series DataFrame
dates = pd.date_range('2024-01-01', periods=100, freq='D')
ts_df = pd.DataFrame({
    'date': dates,
    'value': np.cumsum(np.random.randn(100)) + 100,
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

plt.figure(figsize=(15, 5))
sns.lineplot(data=ts_df, x='date', y='value', hue='category')
plt.title('Time Series Plot from DataFrame')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Using Pandas groupby with Seaborn
# Calculate statistics and visualize

# Group by category and calculate mean
grouped_data = df.groupby(['category', 'group'])['value1'].mean().reset_index()

print("Grouped Data:")
print(grouped_data)

plt.figure(figsize=(10, 6))
sns.barplot(data=grouped_data, x='category', y='value1', hue='group')
plt.title('Mean Values by Category and Group')
plt.ylabel('Mean Value')
plt.show()

## 11. Comparison: Matplotlib vs Seaborn

Let's compare how the same visualization looks in both libraries.

In [None]:
# Example 1: Scatter plot with regression line
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Matplotlib version (more code)
axes[0].scatter(tips['total_bill'], tips['tip'], alpha=0.5)
# Calculate regression line manually
z = np.polyfit(tips['total_bill'], tips['tip'], 1)
p = np.poly1d(z)
axes[0].plot(tips['total_bill'], p(tips['total_bill']), "r-", linewidth=2)
axes[0].set_xlabel('Total Bill')
axes[0].set_ylabel('Tip')
axes[0].set_title('Matplotlib: Scatter + Regression')
axes[0].grid(True, alpha=0.3)

# Seaborn version (simpler)
sns.regplot(ax=axes[1], data=tips, x='total_bill', y='tip')
axes[1].set_title('Seaborn: regplot()')

plt.tight_layout()
plt.show()

In [None]:
# Example 2: Box plot with multiple categories
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Matplotlib version (complex data preparation)
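days = tips['day'].cat.categories  # Category order (Thur, Fri, Sat, Sun), matching the Seaborn panel
data_to_plot = [tips[tips['day'] == day]['total_bill'].values for day in days]
axes[0].boxplot(data_to_plot)
axes[0].set_xticklabels(days)  # Set labels separately; newer Matplotlib deprecates boxplot's `labels` keyword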
axes[0].set_xlabel('Day')
axes[0].set_ylabel('Total Bill')
axes[0].set_title('Matplotlib: Box Plot')
axes[0].grid(True, alpha=0.3, axis='y')

# Seaborn version (one line)
sns.boxplot(ax=axes[1], data=tips, x='day', y='total_bill')
axes[1].set_title('Seaborn: boxplot()')

plt.tight_layout()
plt.show()

In [None]:
# Example 3: Heatmap
# Prepare correlation matrix
corr = tips[['total_bill', 'tip', 'size']].corr()

fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Matplotlib version (manual setup)
im = axes[0].imshow(corr, cmap='coolwarm', aspect='auto', vmin=-1, vmax=1)
axes[0].set_xticks(range(len(corr.columns)))
axes[0].set_yticks(range(len(corr.columns)))
axes[0].set_xticklabels(corr.columns)
axes[0].set_yticklabels(corr.columns)
plt.colorbar(im, ax=axes[0])
# Add text annotations manually
for i in range(len(corr)):
    for j in range(len(corr)):
        axes[0].text(j, i, f'{corr.iloc[i, j]:.2f}',
                    ha='center', va='center')
axes[0].set_title('Matplotlib: Heatmap')

# Seaborn version (automatic formatting)
sns.heatmap(corr, ax=axes[1], annot=True, fmt='.2f', 
            cmap='coolwarm', center=0, square=True)
axes[1].set_title('Seaborn: heatmap()')

plt.tight_layout()
plt.show()

## 12. Practical Examples

Real-world examples demonstrating common data analysis tasks.

### 12.1 Exploratory Data Analysis (EDA)

In [None]:
# Complete EDA workflow for tips dataset
print("=" * 50)
print("EXPLORATORY DATA ANALYSIS - Tips Dataset")
print("=" * 50)

# 1. Dataset overview
print("\n1. Dataset Information:")
print(f"   Shape: {tips.shape}")
print(f"   Columns: {tips.columns.tolist()}")
print(f"\n   First few rows:")
print(tips.head())

# 2. Statistical summary
print("\n2. Statistical Summary:")
print(tips.describe())

In [None]:
# 3. Distribution analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Distribution of total bill
sns.histplot(data=tips, x='total_bill', kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Distribution of Total Bill')

# Distribution of tips
sns.histplot(data=tips, x='tip', kde=True, ax=axes[0, 1], color='green')
axes[0, 1].set_title('Distribution of Tips')

# Distribution by day
sns.countplot(data=tips, x='day', ax=axes[1, 0])
axes[1, 0].set_title('Count by Day')

# Distribution by time
sns.countplot(data=tips, x='time', hue='sex', ax=axes[1, 1])
axes[1, 1].set_title('Count by Time and Gender')

plt.tight_layout()
plt.show()

In [None]:
# 4. Relationship analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Bill vs Tip
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='time', style='sex', ax=axes[0, 0])
axes[0, 0].set_title('Bill vs Tip by Time and Gender')

# Regression analysis
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0, 1])
axes[0, 1].set_title('Bill vs Tip with Regression')

# Tip percentage by day
tips['tip_percent'] = (tips['tip'] / tips['total_bill']) * 100
sns.boxplot(data=tips, x='day', y='tip_percent', ax=axes[1, 0])
axes[1, 0].set_title('Tip Percentage by Day')

# Party size analysis
sns.violinplot(data=tips, x='size', y='total_bill', ax=axes[1, 1])
axes[1, 1].set_title('Bill Amount by Party Size')

plt.tight_layout()
plt.show()
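
The cell above adds `tip_percent` to the shared `tips` DataFrame in place, so the column also shows up in every later cell (for example, in the correlation heatmap). When that side effect is not wanted, `DataFrame.assign` offers an alternative that returns a new DataFrame. The sketch below shows the same arithmetic on made-up values; `bills` and `with_percent` are hypothetical names.

```python
import pandas as pd

bills = pd.DataFrame({
    'total_bill': [20.0, 50.0, 80.0],
    'tip': [3.0, 10.0, 12.0],
})  # Illustrative values

# assign() returns a new DataFrame, so `bills` keeps its original columns
with_percent = bills.assign(
    tip_percent=lambda d: d['tip'] / d['total_bill'] * 100
)

print(with_percent)
print("Columns of the original:", bills.columns.tolist())
```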

In [None]:
# 5. Correlation analysis
plt.figure(figsize=(10, 8))

# Select numeric columns
numeric_cols = tips.select_dtypes(include=[np.number])
correlation = numeric_cols.corr()

# Create heatmap
sns.heatmap(correlation, annot=True, fmt='.3f', 
            cmap='RdYlGn', center=0, square=True,
            linewidths=1, cbar_kws={'label': 'Correlation'})
plt.title('Correlation Matrix - Tips Dataset', fontsize=14, pad=20)
plt.show()

### 12.2 Statistical Visualizations

In [None]:
# Example: Analyzing iris dataset with multiple statistical views
print("Statistical Analysis - Iris Dataset")
print("=" * 50)

# Overview by species
print("\nMean values by species:")
print(iris.groupby('species').mean())

In [None]:
# Comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Distribution comparison
sns.violinplot(data=iris, x='species', y='sepal_length', 
               ax=axes[0, 0], palette='Set2')
axes[0, 0].set_title('Sepal Length Distribution by Species')

# 2. Relationship with regression
# lmplot is figure-level: it ignores the 2x2 grid above and draws in its own figure
g = sns.lmplot(data=iris, x='sepal_length', y='sepal_width', 
               hue='species', height=5, aspect=1.5)
g.ax.set_title('Sepal Dimensions with Regression Lines')

# 3. Multiple distributions
iris_melt = iris.melt(id_vars='species', var_name='measurement', value_name='value')
sns.boxplot(data=iris_melt, x='species', y='value', 
            hue='measurement', ax=axes[0, 1])
axes[0, 1].set_title('All Measurements by Species')
axes[0, 1].legend(loc='upper right', fontsize=8)

# 4. Density plots
for species in iris['species'].unique():
    subset = iris[iris['species'] == species]
    sns.kdeplot(data=subset, x='petal_length', ax=axes[1, 0], 
                label=species, fill=True, alpha=0.5)
axes[1, 0].set_title('Petal Length Density by Species')
axes[1, 0].legend()

# 5. Strip plot with box plot
sns.boxplot(data=iris, x='species', y='petal_width', 
            ax=axes[1, 1], palette='pastel')
sns.stripplot(data=iris, x='species', y='petal_width', 
              ax=axes[1, 1], color='black', alpha=0.3, size=3)
axes[1, 1].set_title('Petal Width: Box Plot + Strip Plot')

fig.tight_layout()  # target the 2x2 grid explicitly; lmplot made its own figure current
plt.show()

In [None]:
# Joint plot for detailed relationship analysis
g = sns.jointplot(data=iris, x='petal_length', y='petal_width', 
                  hue='species', height=8, ratio=4,
                  marginal_kws={'alpha': 0.5})
g.figure.suptitle('Petal Dimensions by Species', y=1.02)  # `figure` is preferred over the older `fig` alias
plt.show()

## 13. Best Practices

Guidelines for effective data visualization with Seaborn.

### Best Practices for Seaborn Visualizations

**1. Choose the Right Plot Type**
- **Distribution**: Use `histplot`, `kdeplot`, or `displot`
- **Relationships**: Use `scatterplot`, `lineplot`, or `regplot`
- **Categorical**: Use `barplot`, `boxplot`, `violinplot`, or `countplot`
- **Comparisons**: Use `boxplot`, `violinplot`, or `stripplot`
- **Correlations**: Use `heatmap` or `pairplot`

**2. Color Usage**
- Use colorblind-friendly palettes (e.g. `palette='colorblind'`)
- Avoid too many colors (max 5-7 distinct colors)
- Use sequential palettes for ordered data
- Use diverging palettes for data with a meaningful center
- Maintain consistent color schemes across related plots
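
The palette guidance above can be sketched as follows. This is a minimal illustration, not a prescribed setup: the palette names (`colorblind`, `Blues`, `vlag`) are built into Seaborn/Matplotlib, while the variable names and the `levels` list are hypothetical.

```python
import seaborn as sns

# Qualitative palette for unordered categories (colorblind-friendly, 5 colors).
categorical = sns.color_palette('colorblind', n_colors=5)

# Sequential palette for ordered data (light-to-dark ramp).
sequential = sns.color_palette('Blues', n_colors=5)

# Diverging colormap for data with a meaningful center (e.g. correlations around 0).
diverging = sns.color_palette('vlag', as_cmap=True)

# Fix each category to one color so related plots stay consistent;
# this dict can be passed as `palette=` wherever the same hue variable is used.
levels = ['Lunch', 'Dinner']  # hypothetical hue levels
color_map = dict(zip(levels, categorical))

print(color_map)
```

The `diverging` colormap can be passed as `cmap=` to `sns.heatmap` together with `center=0`.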

**3. Plot Clarity**
- Always include clear titles and axis labels
- Use appropriate figure sizes (larger for more data)
- Include a legend (with a descriptive title) when using the `hue` parameter
- Avoid cluttered plots (use faceting for many categories)
- Use `sns.despine()` to remove unnecessary borders
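
One way to apply the faceting advice is sketched below. It uses synthetic data with hypothetical column names (`group`, `x`, `y`) so it runs without downloading a dataset; the same pattern applies to any DataFrame with a many-level categorical column.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (hypothetical columns) with six categories.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': np.repeat(list('ABCDEF'), 50),
    'x': rng.normal(size=300),
})
df['y'] = 2 * df['x'] + rng.normal(size=300)

# One small panel per category instead of six hues crowded into one axes.
g = sns.relplot(data=df, x='x', y='y', col='group', col_wrap=3, height=2.5)
g.set_axis_labels('x value', 'y value')
g.set_titles(col_template='Group {col_name}')

plt.show()
```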

**4. Data Preparation**
- Clean data before plotting (handle missing values)
- Use appropriate data types (categorical for categories)
- Consider data transformations for skewed distributions
- Aggregate data when appropriate (avoid overplotting)
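
The preparation steps above might look like this in Pandas. The DataFrame and its column names (`size`, `income`) are made up for illustration, and dropping incomplete rows is only one of several reasonable ways to handle missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: missing values, string categories, and a skewed column.
raw = pd.DataFrame({
    'size': ['small', 'large', 'medium', 'small', None, 'large'],
    'income': [30_000, 250_000, 48_000, np.nan, 52_000, 1_200_000],
})

# 1. Handle missing values explicitly (here: drop incomplete rows).
clean = raw.dropna().copy()

# 2. Use an ordered categorical so plots sort small < medium < large.
clean['size'] = pd.Categorical(clean['size'],
                               categories=['small', 'medium', 'large'],
                               ordered=True)

# 3. Log-transform the right-skewed column before plotting its distribution.
clean['log_income'] = np.log10(clean['income'])

print(clean)
```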

**5. Customization**
- Set theme and context at the beginning
- Use consistent styling across all plots
- Customize when needed but keep it simple
- Save settings in variables for reuse
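
A minimal sketch of keeping style settings in one place: the `THEME` dictionary is a hypothetical name, while `sns.set_theme` and `sns.axes_style` are standard Seaborn functions.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical shared settings, defined once and reused across plots.
THEME = {
    'style': 'whitegrid',
    'context': 'notebook',
    'palette': 'colorblind',
    'font_scale': 1.1,
}

# Apply at the top of the notebook, before any plotting call.
sns.set_theme(**THEME)

# Temporary override for a single figure; the global theme is restored afterwards.
with sns.axes_style('white'):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.set_title('Figure with a one-off style')
```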

**6. Performance**
- Use sampling for very large datasets
- Choose efficient plot types for dense data (e.g. `jointplot(kind='hex')` or a two-variable `histplot`)
- Avoid unnecessary calculations in plot functions

**7. Interpretability**
- Include statistical information when relevant
- Show uncertainty (confidence intervals, error bars)
- Use annotations to highlight important points
- Order categories meaningfully (not just alphabetically)

In [None]:
# Example: Good vs Bad Visualization
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# BAD: Too much information, cluttered
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='day', style='time', size='size',
                sizes=(50, 400), ax=axes[0])
axes[0].set_title('Bad: Too Many Visual Encodings')

# GOOD: Clear, focused message
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[1],
            scatter_kws={'alpha': 0.5})
axes[1].set_title('Good: Clear Relationship Visualization')
axes[1].set_xlabel('Total Bill ($)')
axes[1].set_ylabel('Tip ($)')

plt.tight_layout()
plt.show()

In [None]:
# Example: Proper use of color and labels
# Set up proper styling
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.2)

plt.figure(figsize=(12, 6))

# Create informative plot
ax = sns.barplot(data=tips, x='day', y='total_bill', 
                 hue='time', palette='colorblind',
                 errorbar='ci')  # Show confidence intervals

# Proper labeling
plt.title('Average Bill Amount by Day and Time\n(with 95% Confidence Intervals)', 
          fontsize=14, pad=20)
plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Average Bill Amount ($)', fontsize=12)
plt.legend(title='Time of Day', fontsize=10)

# Add grid for easier reading
ax.grid(axis='y', alpha=0.3)

# Remove top and right spines
sns.despine()

plt.tight_layout()
plt.show()

# Reset to defaults
sns.set_style('darkgrid')
sns.set_context('notebook', font_scale=1)

## 14. Summary

### Key Takeaways

**What is Seaborn?**
- High-level statistical data visualization library built on Matplotlib
- Provides beautiful default styles and color palettes
- Designed to work seamlessly with Pandas DataFrames
- Simplifies complex statistical visualizations

**Main Plot Categories:**

1. **Statistical Plots**
   - `regplot`, `lmplot`: Scatter plots with regression
   - `histplot`, `kdeplot`, `displot`: Distribution visualizations
   - `boxplot`, `violinplot`: Distribution summaries
   - `pairplot`, `jointplot`: Multi-variable relationships

2. **Categorical Plots**
   - `barplot`, `countplot`: Aggregated category values
   - `pointplot`: Trends across categories
   - `stripplot`, `swarmplot`: Individual data points
   - Category-specific box/violin plots

3. **Matrix Plots**
   - `heatmap`: Visualize matrices and correlations
   - `clustermap`: Hierarchically clustered heatmaps

4. **Multi-plot Grids**
   - `FacetGrid`: Create grids of subplots by categories
   - `PairGrid`: Customize pair plots
   - `JointGrid`: Customize joint plots

**Customization Options:**
- **Color Palettes**: Built-in palettes (deep, pastel, colorblind) and custom colors
- **Styles**: darkgrid, whitegrid, dark, white, ticks
- **Contexts**: paper, notebook, talk, poster for different output formats
- **Themes**: Overall aesthetic settings with `set_theme()`

**Advantages of Seaborn:**
- Less code for complex visualizations
- Automatic statistical calculations
- Beautiful defaults that work well for publications
- Easy integration with Pandas workflows
- Consistent API across plot types

**When to Use Seaborn:**
- Exploratory data analysis
- Statistical visualizations
- Working with structured DataFrame data
- Need for quick, publication-ready plots
- Analyzing relationships between variables

**Best Practices:**
- Choose appropriate plot types for your data
- Use colorblind-friendly palettes
- Always label axes and add titles
- Keep plots simple and focused
- Use faceting for complex comparisons
- Show uncertainty when relevant
- Clean and prepare data before plotting

**Integration Tips:**
- Seaborn works alongside Matplotlib (not instead of it)
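- Use Matplotlib functions for fine-tuning
- Pass the `ax` parameter to axes-level functions (e.g. `boxplot`, `scatterplot`) to draw into Matplotlib subplots; figure-level functions (e.g. `lmplot`, `pairplot`) create their own figure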
- Use Pandas methods to prepare data for Seaborn

### Next Steps

- Practice with different datasets to understand plot types
- Experiment with color palettes and styles
- Combine multiple plot types for comprehensive analysis
- Learn to interpret statistical visualizations
- Explore advanced customization options
- Study effective data visualization principles

### Resources for Further Learning

- Official Seaborn Documentation: https://seaborn.pydata.org/
- Seaborn Tutorial: https://seaborn.pydata.org/tutorial.html
- Example Gallery: https://seaborn.pydata.org/examples/
- Data Visualization Best Practices
- Color Theory for Data Visualization

---

## Practice Exercises

**Exercise 1:** Load the 'diamonds' dataset and create a distribution plot of diamond prices.

**Exercise 2:** Create a correlation heatmap for all numeric variables in the 'mpg' dataset.

**Exercise 3:** Use FacetGrid to create scatter plots of sepal length vs width for each iris species.

**Exercise 4:** Using the 'tips' dataset, create a violin plot comparing tip amounts by day, split by the `sex` column.

**Exercise 5:** Build a comprehensive EDA visualization for the 'titanic' dataset showing survival patterns.

**Exercise 6:** Create a custom color palette and apply it to a barplot showing average fare by passenger class in the 'titanic' dataset.

**Exercise 7:** Use jointplot to explore the relationship between two variables with different plot kinds (scatter, hex, kde).

**Exercise 8:** Create a clustermap showing the correlation between different features in a dataset of your choice.

---

*Remember: The best way to learn Seaborn is through practice. Start with simple plots and gradually build up to more complex visualizations. Focus on telling clear stories with your data!*