# Statistical Data Analysis with Seaborn

**Author:** RSK World  
**Website:** https://rskworld.in  
**Email:** help@rskworld.in  
**Phone:** +91 93305 39277  
**Project:** Statistical Data Analysis with Seaborn  

This notebook demonstrates statistical visualization techniques with Seaborn on a synthetic dataset, covering correlation heatmaps, distribution and categorical plots, regression diagnostics, and summary statistics.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set style for better-looking plots
sns.set_style("whitegrid")
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("Libraries imported successfully!")


## 1. Data Generation and Preparation


In [None]:
# Generate synthetic dataset for demonstration
np.random.seed(42)
n_samples = 1000

# Create a dataset with multiple variables
data = {
    'Age': np.random.normal(35, 10, n_samples),
    'Income': np.random.normal(50000, 15000, n_samples),
    'Spending': np.random.normal(3000, 800, n_samples),
    'Satisfaction': np.random.normal(7.5, 1.5, n_samples),
    'Experience': np.random.normal(5, 3, n_samples),
    'Category': np.random.choice(['A', 'B', 'C'], n_samples)
}

# Add some correlations
data['Income'] = data['Income'] + data['Age'] * 500
data['Spending'] = data['Spending'] + data['Income'] * 0.05 + np.random.normal(0, 200, n_samples)
data['Satisfaction'] = np.clip(data['Satisfaction'] + (data['Income'] - 50000) / 20000, 1, 10)

df = pd.DataFrame(data)

# Ensure positive values (np.abs mirrors any negative draws onto the positive side)
df['Age'] = np.abs(df['Age'])
df['Income'] = np.abs(df['Income'])
df['Spending'] = np.abs(df['Spending'])
df['Experience'] = np.abs(df['Experience'])

print("Dataset created successfully!")
print(f"Shape: {df.shape}")
print("\nFirst few rows:")
print(df.head())
print("\nDataset Info:")
print(df.info())
print("\nStatistical Summary:")
print(df.describe())
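
The induced Age-Income correlation can be checked against what the construction implies. The sketch below is an illustration only: it regenerates the two columns with its own generator (the names `age`, `income`, and `rho_theory` are not part of the notebook state) and assumes the `np.abs` step has a negligible effect, which seems reasonable because negative draws are rare at these means. For `Income = base + 500 * Age`, the Pearson correlation should be `500 * sd(Age) / sqrt(sd(base)**2 + (500 * sd(Age))**2)`.

```python
import numpy as np

# Illustrative re-creation of the Age/Income construction (not the notebook's df).
rng = np.random.default_rng(0)
n = 200_000  # large sample so the empirical value sits close to theory

age = rng.normal(35, 10, n)
income = rng.normal(50_000, 15_000, n) + 500 * age

# Pearson correlation implied by the linear construction
sigma_age, sigma_base, slope = 10.0, 15_000.0, 500.0
rho_theory = slope * sigma_age / np.sqrt(sigma_base**2 + (slope * sigma_age) ** 2)

# Empirical correlation from the simulated columns
rho_empirical = np.corrcoef(age, income)[0, 1]
print(f"theory: {rho_theory:.4f}, empirical: {rho_empirical:.4f}")
```

With the notebook's parameters the theoretical value is roughly 0.32, so an Age-Income entry near that value in the correlation matrix would be consistent with the construction.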


## 2. Correlation Matrix Heatmap


In [None]:
# Calculate correlation matrix
numeric_cols = df.select_dtypes(include=[np.number]).columns
correlation_matrix = df[numeric_cols].corr()

# Create heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, 
            annot=True, 
            cmap='coolwarm', 
            center=0,
            square=True,
            linewidths=1,
            cbar_kws={"shrink": 0.8},
            fmt='.2f')
plt.title('Correlation Matrix Heatmap\nStatistical Data Analysis with Seaborn - RSK World', 
          fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.savefig('correlation_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()
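
A correlation matrix is symmetric, so half of the heatmap above repeats the other half. One common option, shown here as a sketch on a small stand-in frame (`demo` and its column names are made up for illustration), is to build a boolean mask for the diagonal and upper triangle and pass it through the `mask` parameter of `sns.heatmap`.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; in the notebook this role is played by df[numeric_cols].
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
corr = demo.corr()

# True marks cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))

# Entries that would remain visible (strictly lower triangle)
visible = corr.where(~mask)
print(visible.round(2))

# In the notebook, a mask built from correlation_matrix could be passed as:
# sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='coolwarm', center=0)
```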


## 3. Distribution and Density Plots


In [None]:
# Distribution plots for numeric variables
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Distribution and Density Plots\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables):
    # Histogram with KDE
    sns.histplot(data=df, x=var, kde=True, ax=axes_flat[idx], bins=30)
    axes_flat[idx].set_title(f'Distribution of {var}', fontweight='bold')
    axes_flat[idx].set_xlabel(var)
    axes_flat[idx].set_ylabel('Frequency')
    axes_flat[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('distribution_plots.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Density plots by category
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Density Plots by Category\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables):
    for category in df['Category'].unique():
        subset = df[df['Category'] == category]
        sns.kdeplot(data=subset, x=var, label=category, ax=axes_flat[idx], fill=True, alpha=0.6)
    axes_flat[idx].set_title(f'Density Plot of {var} by Category', fontweight='bold')
    axes_flat[idx].set_xlabel(var)
    axes_flat[idx].set_ylabel('Density')
    axes_flat[idx].legend(title='Category')
    axes_flat[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('density_plots.png', dpi=300, bbox_inches='tight')
plt.show()


## 4. Box Plots


In [None]:
# Box plots for numeric variables by category
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Box Plots by Category\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables):
    sns.boxplot(data=df, x='Category', y=var, ax=axes_flat[idx])
    axes_flat[idx].set_title(f'Box Plot of {var} by Category', fontweight='bold')
    axes_flat[idx].set_xlabel('Category')
    axes_flat[idx].set_ylabel(var)
    axes_flat[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('box_plots.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Horizontal box plot for all numeric variables
plt.figure(figsize=(12, 8))
numeric_data = df[numeric_cols].melt()
sns.boxplot(data=numeric_data, y='variable', x='value', orient='h')
plt.title('Box Plots for All Numeric Variables\nStatistical Data Analysis with Seaborn - RSK World', 
          fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Value')
plt.ylabel('Variable')
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('box_plots_all.png', dpi=300, bbox_inches='tight')
plt.show()


## 5. Violin Plots


In [None]:
# Violin plots for numeric variables by category
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Violin Plots by Category\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables):
    sns.violinplot(data=df, x='Category', y=var, ax=axes_flat[idx], inner='box')
    axes_flat[idx].set_title(f'Violin Plot of {var} by Category', fontweight='bold')
    axes_flat[idx].set_xlabel('Category')
    axes_flat[idx].set_ylabel(var)
    axes_flat[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('violin_plots.png', dpi=300, bbox_inches='tight')
plt.show()


## 6. Pair Plots for Multivariate Analysis


In [None]:
# Pair plot with hue by category
pair_plot_vars = ['Age', 'Income', 'Spending', 'Satisfaction']
pair_plot = sns.pairplot(df[pair_plot_vars + ['Category']], 
                         hue='Category',
                         diag_kind='kde',
                         plot_kws={'alpha': 0.6},
                         height=2.5)
pair_plot.fig.suptitle('Pair Plot for Multivariate Analysis\nStatistical Data Analysis with Seaborn - RSK World', 
                       fontsize=16, fontweight='bold', y=1.02)
plt.savefig('pair_plot.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Pair plot with regression lines fitted in each off-diagonal panel
pair_plot_corr = sns.pairplot(df[pair_plot_vars], 
                              kind='reg',
                              diag_kind='kde',
                              plot_kws={'scatter_kws': {'alpha': 0.5}},
                              height=2.5)
pair_plot_corr.fig.suptitle('Pair Plot with Regression Lines\nStatistical Data Analysis with Seaborn - RSK World', 
                            fontsize=16, fontweight='bold', y=1.02)
plt.savefig('pair_plot_regression.png', dpi=300, bbox_inches='tight')
plt.show()


## 7. Statistical Summary Visualizations


In [None]:
# Statistical summary bar plot
summary_stats = df[numeric_cols].describe().T
summary_stats = summary_stats[['mean', 'std', 'min', 'max']]

fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Statistical Summary Visualizations\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

stats_to_plot = ['mean', 'std', 'min', 'max']
axes_flat = axes.flatten()

for idx, stat in enumerate(stats_to_plot):
    sns.barplot(data=summary_stats.reset_index(), 
                x='index', 
                y=stat, 
                ax=axes_flat[idx])
    axes_flat[idx].set_title(f'{stat.upper()} by Variable', fontweight='bold')
    axes_flat[idx].set_xlabel('Variable')
    axes_flat[idx].set_ylabel(stat.upper())
    axes_flat[idx].tick_params(axis='x', rotation=45)
    axes_flat[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('statistical_summary.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Heatmap of statistical summary
plt.figure(figsize=(10, 6))
summary_heatmap = df[numeric_cols].describe().T[['mean', 'std', 'min', 'max', '25%', '50%', '75%']]
sns.heatmap(summary_heatmap, 
            annot=True, 
            cmap='YlOrRd', 
            fmt='.2f',
            cbar_kws={"label": "Value"})
plt.title('Statistical Summary Heatmap\nStatistical Data Analysis with Seaborn - RSK World', 
          fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Statistical Measure')
plt.ylabel('Variable')
plt.tight_layout()
plt.savefig('summary_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()


## 8. Advanced Statistical Visualizations


In [None]:
# Joint plot for bivariate analysis
fig, axes = plt.subplots(2, 2, figsize=(16, 14))
fig.suptitle('Joint Plots for Bivariate Analysis\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=0.995)

# Create joint plots manually using subplots
from scipy.stats import pearsonr

pairs = [('Age', 'Income'), ('Income', 'Spending'), ('Spending', 'Satisfaction'), ('Age', 'Satisfaction')]
axes_flat = axes.flatten()

for idx, (x_var, y_var) in enumerate(pairs):
    # Scatter plot
    axes_flat[idx].scatter(df[x_var], df[y_var], alpha=0.5, s=20)
    
    # Add regression line
    z = np.polyfit(df[x_var], df[y_var], 1)
    p = np.poly1d(z)
    axes_flat[idx].plot(df[x_var], p(df[x_var]), "r--", alpha=0.8, linewidth=2)
    
    # Calculate correlation
    corr, p_value = pearsonr(df[x_var], df[y_var])
    axes_flat[idx].set_title(f'{x_var} vs {y_var}\nCorrelation: {corr:.3f}', fontweight='bold')
    axes_flat[idx].set_xlabel(x_var)
    axes_flat[idx].set_ylabel(y_var)
    axes_flat[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('joint_plots.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Strip plot with swarm overlay
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Strip and Swarm Plots\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables):
    sns.stripplot(data=df, x='Category', y=var, ax=axes_flat[idx], alpha=0.5, size=3)
    sns.swarmplot(data=df, x='Category', y=var, ax=axes_flat[idx], color='black', size=2, alpha=0.3)
    axes_flat[idx].set_title(f'Strip Plot of {var} by Category', fontweight='bold')
    axes_flat[idx].set_xlabel('Category')
    axes_flat[idx].set_ylabel(var)
    axes_flat[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('strip_swarm_plots.png', dpi=300, bbox_inches='tight')
plt.show()


## 9. Statistical Tests and Insights


In [None]:
# Print statistical insights
print("=" * 60)
print("STATISTICAL INSIGHTS")
print("=" * 60)
print("\n1. CORRELATION ANALYSIS:")
print(correlation_matrix)

print("\n\n2. DESCRIPTIVE STATISTICS:")
print(df[numeric_cols].describe())

print("\n\n3. CATEGORY-WISE STATISTICS:")
for category in df['Category'].unique():
    print(f"\nCategory {category}:")
    print(df[df['Category'] == category][numeric_cols].describe())

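print("\n\n4. SKEWNESS AND KURTOSIS:")
for col in numeric_cols:
    skewness = stats.skew(df[col])
    # stats.kurtosis defaults to Fisher's definition (excess kurtosis, 0 for a normal distribution)
    kurtosis = stats.kurtosis(df[col])
    print(f"{col}: Skewness = {skewness:.3f}, Excess Kurtosis = {kurtosis:.3f}")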

print("\n" + "=" * 60)
print("Analysis completed successfully!")
print("=" * 60)
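
The cell above prints descriptive statistics only. If a formal comparison between categories is wanted, a one-way ANOVA is one possible test. The sketch below uses `scipy.stats.f_oneway` on hypothetical groups rather than the notebook's `df`; the group names and the 10,000 shift are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical groups standing in for Income split by Category.
# In the notebook, Category is drawn independently of the numeric columns,
# so no real group difference is built into the data there.
group_a = rng.normal(50_000, 15_000, 300)
group_b = rng.normal(50_000, 15_000, 300)
group_c = rng.normal(50_000, 15_000, 300)

# Case 1: all groups share the same population mean
f_null, p_null = stats.f_oneway(group_a, group_b, group_c)
print(f"same means:    F = {f_null:.3f}, p = {p_null:.3g}")

# Case 2: one group is shifted upward, so the test should detect a difference
group_c_shifted = group_c + 10_000
f_alt, p_alt = stats.f_oneway(group_a, group_b, group_c_shifted)
print(f"shifted group: F = {f_alt:.3f}, p = {p_alt:.3g}")
```

On the notebook's data, the equivalent call would pass one array per category, for example the `Income` values of each `Category` subset.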


## 10. Loading Example Dataset


In [None]:
# Load example dataset
try:
    example_df = pd.read_csv('example_data.csv', parse_dates=['Date'])
    print("Example dataset loaded successfully!")
    print(f"Shape: {example_df.shape}")
    print("\nColumns:", list(example_df.columns))
    print("\nFirst few rows:")
    print(example_df.head())
    print("\nDataset Info:")
    print(example_df.info())
except FileNotFoundError:
    print("Example dataset not found. Using generated data.")
    example_df = df.copy()


## 11. Categorical Plots


In [None]:
# Count plots for categorical variables
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Categorical Count Plots\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

categorical_vars = ['Gender', 'Region', 'Education', 'Category']
axes_flat = axes.flatten()

for idx, var in enumerate(categorical_vars):
    # Prefer example_df; fall back to the generated df (which only has 'Category')
    if var in example_df.columns:
        source = example_df
    elif var in df.columns:
        source = df
    else:
        axes_flat[idx].set_visible(False)  # hide panels with nothing to plot
        continue
    sns.countplot(data=source, x=var, ax=axes_flat[idx], palette='Set2')
    axes_flat[idx].set_title(f'Count Plot of {var}', fontweight='bold')
    axes_flat[idx].set_xlabel(var)
    axes_flat[idx].set_ylabel('Count')
    axes_flat[idx].tick_params(axis='x', rotation=45)
    axes_flat[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('categorical_count_plots.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Bar plots with error bars
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Bar Plots with Error Bars\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

# errorbar='sd' replaces the deprecated ci='sd' (requires seaborn >= 0.12)
if 'Gender' in example_df.columns:
    # Average income by gender
    sns.barplot(data=example_df, x='Gender', y='Income', ax=axes[0, 0], palette='viridis', errorbar='sd')
    axes[0, 0].set_title('Average Income by Gender', fontweight='bold')
    axes[0, 0].set_ylabel('Income')
    axes[0, 0].grid(True, alpha=0.3, axis='y')
    
    # Average spending by region
    sns.barplot(data=example_df, x='Region', y='Spending', ax=axes[0, 1], palette='mako', errorbar='sd')
    axes[0, 1].set_title('Average Spending by Region', fontweight='bold')
    axes[0, 1].set_ylabel('Spending')
    axes[0, 1].grid(True, alpha=0.3, axis='y')
    
    # Average satisfaction by education
    sns.barplot(data=example_df, x='Education', y='Satisfaction', ax=axes[1, 0], palette='rocket', errorbar='sd')
    axes[1, 0].set_title('Average Satisfaction by Education', fontweight='bold')
    axes[1, 0].set_ylabel('Satisfaction')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(True, alpha=0.3, axis='y')
    
    # Average credit score by category
    sns.barplot(data=example_df, x='Category', y='Credit_Score', ax=axes[1, 1], palette='flare', errorbar='sd')
    axes[1, 1].set_title('Average Credit Score by Category', fontweight='bold')
    axes[1, 1].set_ylabel('Credit Score')
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].grid(True, alpha=0.3, axis='y')
else:
    # Use original df; only one panel applies, so hide the unused ones
    sns.barplot(data=df, x='Category', y='Income', ax=axes[0, 0], palette='viridis', errorbar='sd')
    axes[0, 0].set_title('Average Income by Category', fontweight='bold')
    axes[0, 0].grid(True, alpha=0.3, axis='y')
    for unused_ax in axes.flatten()[1:]:
        unused_ax.set_visible(False)

plt.tight_layout()
plt.savefig('bar_plots_error_bars.png', dpi=300, bbox_inches='tight')
plt.show()
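
The bars above show group means with standard-deviation error bars, which describe the spread of observations rather than the uncertainty of the mean. A small sketch with made-up numbers (the `toy` frame is hypothetical) shows how the standard deviation and the standard error of the mean differ for the same groups.

```python
import pandas as pd

# Hypothetical values chosen so the statistics are easy to verify by hand.
toy = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B", "B"],
    "Income": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 8.0],
})

# std is the sample standard deviation (ddof=1); sem is std / sqrt(count)
summary = toy.groupby("Category")["Income"].agg(["mean", "std", "sem", "count"])
print(summary)
```

The standard error shrinks as the group size grows while the standard deviation does not, so the two choices can give very different impressions of the same data.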


## 12. Regression Plots with Confidence Intervals


In [None]:
# Regression plots with confidence intervals
fig, axes = plt.subplots(2, 2, figsize=(16, 14))
fig.suptitle('Regression Plots with Confidence Intervals\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=0.995)

data_to_use = example_df if 'Gender' in example_df.columns else df

# Age vs Income
sns.regplot(data=data_to_use, x='Age', y='Income', ax=axes[0, 0], 
           scatter_kws={'alpha': 0.5, 's': 20}, 
           line_kws={'color': 'red', 'linewidth': 2})
axes[0, 0].set_title('Age vs Income (with 95% CI)', fontweight='bold')
axes[0, 0].grid(True, alpha=0.3)

# Income vs Spending
sns.regplot(data=data_to_use, x='Income', y='Spending', ax=axes[0, 1],
           scatter_kws={'alpha': 0.5, 's': 20},
           line_kws={'color': 'red', 'linewidth': 2})
axes[0, 1].set_title('Income vs Spending (with 95% CI)', fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

# Spending vs Satisfaction
sns.regplot(data=data_to_use, x='Spending', y='Satisfaction', ax=axes[1, 0],
           scatter_kws={'alpha': 0.5, 's': 20},
           line_kws={'color': 'red', 'linewidth': 2})
axes[1, 0].set_title('Spending vs Satisfaction (with 95% CI)', fontweight='bold')
axes[1, 0].grid(True, alpha=0.3)

# Experience vs Income
if 'Experience' in data_to_use.columns:
    sns.regplot(data=data_to_use, x='Experience', y='Income', ax=axes[1, 1],
               scatter_kws={'alpha': 0.5, 's': 20},
               line_kws={'color': 'red', 'linewidth': 2})
    axes[1, 1].set_title('Experience vs Income (with 95% CI)', fontweight='bold')
    axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('regression_plots.png', dpi=300, bbox_inches='tight')
plt.show()


## 13. Facet Grid Visualizations


In [None]:
# Facet grid - Distribution plots by category
data_to_use = example_df if 'Gender' in example_df.columns else df
hue_var = 'Gender' if 'Gender' in data_to_use.columns else 'Category'

g = sns.FacetGrid(data_to_use, col=hue_var, col_wrap=3, height=4, aspect=1.2)
g.map(sns.histplot, 'Income', kde=True, bins=30)
g.fig.suptitle(f'Income Distribution by {hue_var} (Facet Grid)\nStatistical Data Analysis with Seaborn - RSK World', 
               fontsize=16, fontweight='bold', y=1.02)
g.set_axis_labels('Income', 'Frequency')
plt.tight_layout()
plt.savefig('facet_grid_distribution.png', dpi=300, bbox_inches='tight')
plt.show()


In [None]:
# Facet grid - Scatter plots
g2 = sns.FacetGrid(data_to_use, col=hue_var, row='Region' if 'Region' in data_to_use.columns else None, 
                   height=4, aspect=1.2, margin_titles=True)
g2.map(sns.scatterplot, 'Age', 'Income', alpha=0.6)
g2.fig.suptitle(f'Age vs Income by {hue_var} (Facet Grid)\nStatistical Data Analysis with Seaborn - RSK World', 
                fontsize=16, fontweight='bold', y=1.02)
g2.set_axis_labels('Age', 'Income')
plt.tight_layout()
plt.savefig('facet_grid_scatter.png', dpi=300, bbox_inches='tight')
plt.show()


## 14. Q-Q Plots for Normality Testing


In [None]:
# Q-Q plots for normality testing
from scipy.stats import probplot

fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Q-Q Plots for Normality Testing\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=1.02)

variables_qq = ['Age', 'Income', 'Spending', 'Satisfaction']
axes_flat = axes.flatten()

for idx, var in enumerate(variables_qq):
    if var in data_to_use.columns:
        probplot(data_to_use[var].dropna(), dist="norm", plot=axes_flat[idx])
        axes_flat[idx].set_title(f'Q-Q Plot: {var}', fontweight='bold')
        axes_flat[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('qq_plots.png', dpi=300, bbox_inches='tight')
plt.show()
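
Q-Q plots give a visual check of normality. A formal test can complement them; the sketch below applies the Shapiro-Wilk test (`scipy.stats.shapiro`) to two synthetic samples. The samples and their names are assumptions for illustration and are not taken from the notebook's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One sample drawn from a normal distribution, one from a clearly skewed one
samples = {
    "normal": rng.normal(35, 10, 500),
    "exponential": rng.exponential(scale=10, size=500),
}

# shapiro returns the W statistic and a p-value; small p-values
# are evidence against the hypothesis that the sample is normal
results = {}
for name, sample in samples.items():
    w_stat, p_value = stats.shapiro(sample)
    results[name] = (w_stat, p_value)
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.3g}")
```

With large samples the test can flag deviations too small to matter in practice, so it is usually read alongside the Q-Q plot rather than instead of it.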


## 15. Clustermap for Hierarchical Clustering


In [None]:
# Clustermap for hierarchical clustering
numeric_cols_cluster = data_to_use.select_dtypes(include=[np.number]).columns.tolist()
if 'ID' in numeric_cols_cluster:
    numeric_cols_cluster.remove('ID')

# Sample data for clustermap (too many rows can be slow)
sample_data = data_to_use[numeric_cols_cluster].sample(min(100, len(data_to_use)), random_state=42)

# Standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_data = pd.DataFrame(scaler.fit_transform(sample_data), 
                          columns=numeric_cols_cluster, 
                          index=sample_data.index)

# Create clustermap
g = sns.clustermap(scaled_data, 
                   cmap='coolwarm', 
                   center=0,
                   figsize=(12, 10),
                   cbar_kws={"label": "Standardized Value"})
g.fig.suptitle('Clustermap - Hierarchical Clustering\nStatistical Data Analysis with Seaborn - RSK World', 
               fontsize=16, fontweight='bold', y=1.02)
plt.savefig('clustermap.png', dpi=300, bbox_inches='tight')
plt.show()


## 16. Residual Analysis


In [None]:
# Residual analysis for regression
from scipy.stats import probplot  # imported here so this cell does not depend on the Q-Q plot cell
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

fig, axes = plt.subplots(2, 2, figsize=(16, 14))
fig.suptitle('Residual Analysis Plots\nStatistical Data Analysis with Seaborn - RSK World', 
             fontsize=16, fontweight='bold', y=0.995)

# Fit linear regression: Income ~ Age
X = data_to_use[['Age']].values
y = data_to_use['Income'].values
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
residuals = y - y_pred

# Residuals vs Fitted
axes[0, 0].scatter(y_pred, residuals, alpha=0.5)
axes[0, 0].axhline(y=0, color='r', linestyle='--', linewidth=2)
axes[0, 0].set_xlabel('Fitted Values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted Values', fontweight='bold')
axes[0, 0].grid(True, alpha=0.3)

# Q-Q plot of residuals
probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot of Residuals', fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

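# Scale-Location plot
# Residuals are scaled by their overall standard deviation; this is an approximation
# that does not adjust each residual for its leverage.
standardized_residuals = residuals / np.std(residuals)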
axes[1, 0].scatter(y_pred, np.sqrt(np.abs(standardized_residuals)), alpha=0.5)
axes[1, 0].set_xlabel('Fitted Values')
axes[1, 0].set_ylabel('√|Standardized Residuals|')
axes[1, 0].set_title('Scale-Location Plot', fontweight='bold')
axes[1, 0].grid(True, alpha=0.3)

# Residuals vs observation order (leverage is not computed in this cell)
axes[1, 1].scatter(range(len(residuals)), residuals, alpha=0.5)
axes[1, 1].axhline(y=0, color='r', linestyle='--', linewidth=2)
axes[1, 1].set_xlabel('Observation')
axes[1, 1].set_ylabel('Residuals')
axes[1, 1].set_title('Residuals vs Observation Order', fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('residual_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print(f"R² Score: {r2_score(y, y_pred):.4f}")
print(f"Mean Residual: {np.mean(residuals):.4f}")
print(f"Std Residual: {np.std(residuals):.4f}")
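
The fourth panel above plots residuals against observation order. A residuals-versus-leverage view would need leverage values, which for a one-predictor regression have the closed form `h_i = 1/n + (x_i - mean(x))**2 / sum((x_j - mean(x))**2)`. The sketch below computes them with NumPy on synthetic stand-ins for Age and Income (all variable names here are illustrative, not the notebook's), along with internally studentized residuals.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.normal(35, 10, n)                         # stand-in for Age
y = 50_000 + 500 * x + rng.normal(0, 15_000, n)   # stand-in for Income

# Ordinary least squares fit of y on x (polyfit returns slope first)
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# Leverage of each observation in simple linear regression
sxx = np.sum((x - x.mean()) ** 2)
leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx

# Residual standard error with n - 2 degrees of freedom
s = np.sqrt(np.sum(resid**2) / (n - 2))

# Internally studentized residuals adjust each residual for its leverage
studentized = resid / (s * np.sqrt(1.0 - leverage))

print(f"sum of leverages: {leverage.sum():.3f}")
print(f"max leverage: {leverage.max():.4f}")
```

The leverages sum to the number of fitted parameters (two here: intercept and slope), which gives a quick sanity check on the computation.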


## 17. Ridge Plots (Joy Plots)


In [None]:
# Ridge plots (Joy plots) - overlapping density plots
try:
    from scipy import stats as scipy_stats
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Ridge Plots (Joy Plots)\nStatistical Data Analysis with Seaborn - RSK World', 
                 fontsize=16, fontweight='bold', y=1.02)
    
    variables_ridge = ['Income', 'Spending', 'Satisfaction', 'Age']
    category_var = 'Category' if 'Category' in data_to_use.columns else None
    
    if category_var and category_var in data_to_use.columns:
        categories = data_to_use[category_var].unique()
        colors = plt.cm.viridis(np.linspace(0, 1, len(categories)))
        
        for idx, var in enumerate(variables_ridge):
            if var in data_to_use.columns:
                ax = axes.flatten()[idx]
                for i, cat in enumerate(categories):
                    subset = data_to_use[data_to_use[category_var] == cat][var].dropna()
                    if len(subset) > 0:
                        density = scipy_stats.gaussian_kde(subset)
                        xs = np.linspace(subset.min(), subset.max(), 200)
                        ys = density(xs)
                        ax.fill_between(xs, ys, alpha=0.5, label=cat, color=colors[i])
                        ax.plot(xs, ys, color=colors[i], linewidth=2)
                ax.set_title(f'Ridge Plot: {var}', fontweight='bold')
                ax.set_xlabel(var)
                ax.set_ylabel('Density')
                ax.legend(title=category_var)
                ax.grid(True, alpha=0.3)
    else:
        # Simple ridge plot without categories
        for idx, var in enumerate(variables_ridge[:2]):
            if var in data_to_use.columns:
                ax = axes.flatten()[idx]
                density = scipy_stats.gaussian_kde(data_to_use[var].dropna())
                xs = np.linspace(data_to_use[var].min(), data_to_use[var].max(), 200)
                ys = density(xs)
                ax.fill_between(xs, ys, alpha=0.6)
                ax.plot(xs, ys, linewidth=2)
                ax.set_title(f'Density Plot: {var}', fontweight='bold')
                ax.set_xlabel(var)
                ax.set_ylabel('Density')
                ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('ridge_plots.png', dpi=300, bbox_inches='tight')
    plt.show()
except Exception as e:
    print(f"Ridge plot creation skipped: {e}")


## 18. Time Series Analysis (if Date column exists)


In [None]:
# Time series analysis
if 'Date' in example_df.columns:
    # Aggregate data by month
    example_df['YearMonth'] = example_df['Date'].dt.to_period('M')
    monthly_data = example_df.groupby('YearMonth').agg({
        'Income': 'mean',
        'Spending': 'mean',
        'Satisfaction': 'mean'
    }).reset_index()
    monthly_data['YearMonth'] = monthly_data['YearMonth'].astype(str)
    
    fig, axes = plt.subplots(3, 1, figsize=(15, 12))
    fig.suptitle('Time Series Analysis\nStatistical Data Analysis with Seaborn - RSK World', 
                 fontsize=16, fontweight='bold', y=0.995)
    
    variables_ts = ['Income', 'Spending', 'Satisfaction']
    
    for idx, var in enumerate(variables_ts):
        axes[idx].plot(monthly_data['YearMonth'], monthly_data[var], marker='o', linewidth=2, markersize=6)
        axes[idx].set_title(f'{var} Over Time', fontweight='bold')
        axes[idx].set_xlabel('Month')
        axes[idx].set_ylabel(var)
        axes[idx].tick_params(axis='x', rotation=45)
        axes[idx].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('time_series_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
else:
    print("Date column not found. Skipping time series analysis.")


## 19. Heatmap with Annotations


In [None]:
# Advanced heatmap with custom annotations
if 'Gender' in example_df.columns and 'Region' in example_df.columns:
    # Create pivot table for heatmap
    pivot_income = example_df.pivot_table(values='Income', index='Region', columns='Gender', aggfunc='mean')
    
    plt.figure(figsize=(10, 6))
    sns.heatmap(pivot_income, 
                annot=True, 
                fmt='.0f',
                cmap='YlOrRd',
                cbar_kws={"label": "Average Income"},
                linewidths=1,
                linecolor='white')
    plt.title('Average Income Heatmap: Region vs Gender\nStatistical Data Analysis with Seaborn - RSK World', 
              fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Gender')
    plt.ylabel('Region')
    plt.tight_layout()
    plt.savefig('heatmap_pivot.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Another heatmap for satisfaction
    pivot_satisfaction = example_df.pivot_table(values='Satisfaction', index='Education', columns='Category', aggfunc='mean')
    
    plt.figure(figsize=(10, 6))
    sns.heatmap(pivot_satisfaction, 
                annot=True, 
                fmt='.2f',
                cmap='RdYlBu_r',
                cbar_kws={"label": "Average Satisfaction"},
                linewidths=1,
                linecolor='white')
    plt.title('Average Satisfaction Heatmap: Education vs Category\nStatistical Data Analysis with Seaborn - RSK World', 
              fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Category')
    plt.ylabel('Education')
    plt.tight_layout()
    plt.savefig('heatmap_satisfaction.png', dpi=300, bbox_inches='tight')
    plt.show()
else:
    print("Required columns not found for pivot heatmaps.")


## Summary

This notebook demonstrates comprehensive statistical visualization techniques using Seaborn:

1. **Correlation Matrix Heatmaps** - Understanding relationships between variables
2. **Distribution and Density Plots** - Analyzing data distributions
3. **Box Plots** - Identifying quartiles and outliers
4. **Violin Plots** - Combining distribution and box plot information
5. **Pair Plots** - Multivariate analysis across all variable pairs
6. **Statistical Summary Visualizations** - Comprehensive statistical overview
7. **Advanced Visualizations** - Joint plots, strip plots, and swarm plots
8. **Categorical Plots** - Count plots and bar plots with error bars
9. **Regression Plots** - With confidence intervals
10. **Facet Grids** - Multi-panel visualizations
11. **Q-Q Plots** - Normality testing
12. **Clustermap** - Hierarchical clustering visualization
13. **Residual Analysis** - Regression diagnostics
14. **Ridge Plots** - Overlapping density distributions
15. **Time Series Analysis** - Temporal data visualization
16. **Advanced Heatmaps** - Pivot table visualizations

---

**Project by:** RSK World  
**Website:** https://rskworld.in  
**Email:** help@rskworld.in  
**Phone:** +91 93305 39277


