# Chart Gallery - Data Analysis Platform

A collection of chart types and visualization patterns for data analysis, built with matplotlib, seaborn, and Plotly on synthetic sample data.

## Quick Reference Guide
- **Statistical Charts**: Distribution analysis, correlations, statistical tests
- **Business Charts**: KPIs, performance metrics, financial analysis
- **Comparison Charts**: Rankings, comparisons, benchmarking
- **Time Series**: Trends, forecasting, seasonal analysis
- **Geospatial**: Maps, regional analysis, location-based insights

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('default')
sns.set_palette("Set2")

# Generate sample data
np.random.seed(42)
n = 500

data = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C', 'D', 'E'], n),
    'value': np.random.normal(100, 20, n),
    'date': pd.date_range('2023-01-01', periods=n, freq='D'),
    'group': np.random.choice(['Group1', 'Group2', 'Group3'], n),
    'x_coord': np.random.uniform(-5, 5, n),
    'y_coord': np.random.uniform(-5, 5, n),
    'size': np.random.exponential(2, n)
})

print("📊 Chart Gallery Data Loaded")
data.head()

## 1. Distribution Charts

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Distribution Analysis Charts', fontsize=16, fontweight='bold')

# Histogram
axes[0,0].hist(data['value'], bins=30, alpha=0.7, color='skyblue', edgecolor='black')
axes[0,0].set_title('Histogram')
axes[0,0].set_xlabel('Value')
axes[0,0].set_ylabel('Frequency')

# Box Plot
sns.boxplot(data=data, x='category', y='value', ax=axes[0,1])
axes[0,1].set_title('Box Plot by Category')

# Violin Plot
sns.violinplot(data=data, x='category', y='value', ax=axes[0,2])
axes[0,2].set_title('Violin Plot by Category')

# Density Plot
# Kernel density estimates give smooth curves that stay readable when groups overlap
for group in data['group'].unique():
    subset = data[data['group'] == group]
    sns.kdeplot(x=subset['value'], fill=True, alpha=0.5, label=group, ax=axes[1,0])
axes[1,0].set_title('Density Plot by Group')
axes[1,0].legend()

# Q-Q Plot
from scipy.stats import probplot
probplot(data['value'], dist="norm", plot=axes[1,1])
axes[1,1].set_title('Q-Q Plot (Normal Distribution)')

# ECDF (Empirical Cumulative Distribution)
sorted_data = np.sort(data['value'])
cumulative = np.arange(1, len(sorted_data) + 1) / len(sorted_data)
axes[1,2].plot(sorted_data, cumulative, marker='.', linestyle='none')
axes[1,2].set_title('Empirical CDF')
axes[1,2].set_xlabel('Value')
axes[1,2].set_ylabel('Cumulative Probability')

plt.tight_layout()
plt.show()

## 2. Comparison Charts

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Comparison Charts', fontsize=16, fontweight='bold')

# Bar Chart
category_means = data.groupby('category')['value'].mean()
bars = axes[0,0].bar(category_means.index, category_means.values, color='lightcoral')
axes[0,0].set_title('Bar Chart - Average Values')
axes[0,0].set_ylabel('Average Value')

# Add value labels
for bar in bars:
    height = bar.get_height()
    axes[0,0].text(bar.get_x() + bar.get_width()/2., height,
                   f'{height:.1f}', ha='center', va='bottom')

# Horizontal Bar Chart
group_counts = data['group'].value_counts()
axes[0,1].barh(group_counts.index, group_counts.values, color='lightgreen')
axes[0,1].set_title('Horizontal Bar Chart - Group Counts')
axes[0,1].set_xlabel('Count')

# Stacked Bar Chart
pivot_data = data.pivot_table(values='value', index='category', columns='group', aggfunc='count', fill_value=0)
pivot_data.plot(kind='bar', stacked=True, ax=axes[0,2])
axes[0,2].set_title('Stacked Bar Chart')
axes[0,2].tick_params(axis='x', rotation=45)

# Grouped Bar Chart
pivot_means = data.pivot_table(values='value', index='category', columns='group', aggfunc='mean')
pivot_means.plot(kind='bar', ax=axes[1,0])
axes[1,0].set_title('Grouped Bar Chart')
axes[1,0].tick_params(axis='x', rotation=45)

# Pie Chart
category_counts = data['category'].value_counts()
axes[1,1].pie(category_counts.values, labels=category_counts.index, autopct='%1.1f%%', startangle=90)
axes[1,1].set_title('Pie Chart - Category Distribution')

# Donut Chart
wedges, texts, autotexts = axes[1,2].pie(category_counts.values, labels=category_counts.index, 
                                          autopct='%1.1f%%', startangle=90, 
                                          wedgeprops=dict(width=0.5))
axes[1,2].set_title('Donut Chart - Category Distribution')

plt.tight_layout()
plt.show()

## 3. Relationship Charts

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Relationship Analysis Charts', fontsize=16, fontweight='bold')

# Scatter Plot
scatter = axes[0,0].scatter(data['x_coord'], data['y_coord'], 
                           c=data['value'], s=data['size']*10, 
                           alpha=0.6, cmap='viridis')
axes[0,0].set_title('Scatter Plot with Size & Color')
axes[0,0].set_xlabel('X Coordinate')
axes[0,0].set_ylabel('Y Coordinate')
plt.colorbar(scatter, ax=axes[0,0], label='Value')

# Regression Plot
sns.regplot(data=data, x='x_coord', y='value', ax=axes[0,1], scatter_kws={'alpha':0.5})
axes[0,1].set_title('Regression Plot')

# Correlation Heatmap
numeric_cols = ['value', 'x_coord', 'y_coord', 'size']
corr_matrix = data[numeric_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, ax=axes[0,2])
axes[0,2].set_title('Correlation Heatmap')

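# Pair Plot (subset)
# One scatter panel per variable pair; a full scatter_matrix would need its own n x n figure.
sample_data = data[numeric_cols].sample(100)  # Sample for performance
pairs = [('x_coord', 'value'), ('y_coord', 'value'), ('size', 'value')]
for ax, (x_col, y_col) in zip(axes[1, :], pairs):
    ax.scatter(sample_data[x_col], sample_data[y_col], alpha=0.6)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(f'Pair Plot: {y_col} vs {x_col}')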

plt.tight_layout()
plt.show()

## 4. Time Series Charts

In [None]:
# Generate time series data
dates = pd.date_range('2023-01-01', periods=365, freq='D')
trend = np.linspace(100, 120, 365)
seasonal = 10 * np.sin(2 * np.pi * np.arange(365) / 365.25 * 4)  # Quarterly seasonality
noise = np.random.normal(0, 5, 365)
ts_data = pd.DataFrame({
    'date': dates,
    'value': trend + seasonal + noise,
    'category': np.random.choice(['Product A', 'Product B', 'Product C'], 365)
})

fig, axes = plt.subplots(3, 2, figsize=(16, 14))
fig.suptitle('Time Series Analysis Charts', fontsize=16, fontweight='bold')

# Line Chart
axes[0,0].plot(ts_data['date'], ts_data['value'], linewidth=2, color='blue')
axes[0,0].set_title('Basic Time Series Line Chart')
axes[0,0].tick_params(axis='x', rotation=45)

# Multiple Line Chart
for category in ts_data['category'].unique():
    subset = ts_data[ts_data['category'] == category]
    monthly_avg = subset.groupby(subset['date'].dt.to_period('M'))['value'].mean()
    axes[0,1].plot(monthly_avg.index.to_timestamp(), monthly_avg.values, 
                   marker='o', label=category, linewidth=2)
axes[0,1].set_title('Multiple Time Series (Monthly Averages)')
axes[0,1].legend()
axes[0,1].tick_params(axis='x', rotation=45)

# Area Chart
monthly_data = ts_data.groupby([ts_data['date'].dt.to_period('M'), 'category'])['value'].sum().unstack(fill_value=0)
monthly_data.plot.area(ax=axes[1,0], alpha=0.7)
axes[1,0].set_title('Stacked Area Chart')
axes[1,0].tick_params(axis='x', rotation=45)

# Moving Average
ts_data['ma_7'] = ts_data['value'].rolling(window=7).mean()
ts_data['ma_30'] = ts_data['value'].rolling(window=30).mean()
axes[1,1].plot(ts_data['date'], ts_data['value'], alpha=0.3, label='Original', color='gray')
axes[1,1].plot(ts_data['date'], ts_data['ma_7'], label='7-day MA', linewidth=2)
axes[1,1].plot(ts_data['date'], ts_data['ma_30'], label='30-day MA', linewidth=2)
axes[1,1].set_title('Moving Averages')
axes[1,1].legend()
axes[1,1].tick_params(axis='x', rotation=45)

# Seasonal Decomposition (simplified)
monthly_avg = ts_data.groupby(ts_data['date'].dt.month)['value'].mean()
axes[2,0].bar(monthly_avg.index, monthly_avg.values, color='lightblue')
axes[2,0].set_title('Seasonal Pattern (Monthly Averages)')
axes[2,0].set_xlabel('Month')
axes[2,0].set_xticks(range(1, 13))

# Cumulative Sum
ts_data['cumsum'] = ts_data['value'].cumsum()
axes[2,1].plot(ts_data['date'], ts_data['cumsum'], linewidth=2, color='red')
axes[2,1].set_title('Cumulative Sum')
axes[2,1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

## 5. Interactive Plotly Charts

In [None]:
# Interactive Scatter Plot
fig1 = px.scatter(data, x='x_coord', y='y_coord', 
                 color='category', size='size',
                 hover_data=['value', 'group'],
                 title='Interactive Scatter Plot')
fig1.show()

# Interactive Time Series
fig2 = px.line(ts_data, x='date', y='value', 
              color='category',
              title='Interactive Time Series')
fig2.update_layout(xaxis_title='Date', yaxis_title='Value')
fig2.show()

# Interactive Bar Chart with Animation
monthly_category = ts_data.groupby([ts_data['date'].dt.to_period('M'), 'category'])['value'].sum().reset_index()
monthly_category['date'] = monthly_category['date'].astype(str)

fig3 = px.bar(monthly_category, x='category', y='value', 
             animation_frame='date',
             title='Animated Bar Chart by Month')
fig3.show()

print("🎯 Interactive features available:")
print("- Hover for details")
print("- Zoom and pan")
print("- Click legend to filter")
print("- Animation controls")

## 6. Statistical Charts

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Statistical Analysis Charts', fontsize=16, fontweight='bold')

# Error Bars
category_stats = data.groupby('category')['value'].agg(['mean', 'std']).reset_index()
axes[0,0].bar(category_stats['category'], category_stats['mean'], 
              yerr=category_stats['std'], capsize=5, alpha=0.7)
axes[0,0].set_title('Bar Chart with Error Bars')
axes[0,0].set_ylabel('Value (Mean ± Std)')

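# Confidence Intervals
from scipy import stats
x_vals = np.linspace(data['x_coord'].min(), data['x_coord'].max(), 100)
slope, intercept, r_value, p_value, std_err = stats.linregress(data['x_coord'], data['value'])
y_pred = slope * x_vals + intercept
# 95% confidence band for the fitted mean, based on the residual standard error
n_obs = len(data)
x_dev = data['x_coord'] - data['x_coord'].mean()
resid_se = np.sqrt(((data['value'] - (slope * data['x_coord'] + intercept))**2).sum() / (n_obs - 2))
band = stats.t.ppf(0.975, n_obs - 2) * resid_se * np.sqrt(1/n_obs + (x_vals - data['x_coord'].mean())**2 / (x_dev**2).sum())
axes[0,1].scatter(data['x_coord'], data['value'], alpha=0.5)
axes[0,1].plot(x_vals, y_pred, 'r-', linewidth=2, label=f'R² = {r_value**2:.3f}')
axes[0,1].fill_between(x_vals, y_pred - band, y_pred + band, color='red', alpha=0.2, label='95% CI')
axes[0,1].set_title('Regression with Confidence Band')
axes[0,1].legend()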

# Residual Plot
y_pred_actual = slope * data['x_coord'] + intercept
residuals = data['value'] - y_pred_actual
axes[0,2].scatter(y_pred_actual, residuals, alpha=0.5)
axes[0,2].axhline(y=0, color='r', linestyle='--')
axes[0,2].set_title('Residual Plot')
axes[0,2].set_xlabel('Predicted Values')
axes[0,2].set_ylabel('Residuals')

# Strip Plot
sns.stripplot(data=data, x='category', y='value', size=4, alpha=0.7, ax=axes[1,0])
axes[1,0].set_title('Strip Plot')

# Swarm Plot
sample_data = data.sample(200)  # Sample for performance
sns.swarmplot(data=sample_data, x='category', y='value', ax=axes[1,1])
axes[1,1].set_title('Swarm Plot (Sample)')

# Ridge Plot (approximated with multiple density plots)
categories = sorted(data['category'].unique())
for i, cat in enumerate(categories):
    cat_data = data[data['category'] == cat]['value']
    kde = stats.gaussian_kde(cat_data)
    x_range = np.linspace(cat_data.min(), cat_data.max(), 100)
    density = kde(x_range)
    # Raw densities peak near 0.02 for this data, so rescale each ridge to a visible height
    density = density / density.max()
    axes[1,2].fill_between(x_range, i, i + density * 0.8, alpha=0.7, label=cat)
axes[1,2].set_title('Ridge Plot (Density by Category)')
axes[1,2].set_xlabel('Value')
axes[1,2].set_ylabel('Category')
axes[1,2].set_yticks(range(len(categories)))
axes[1,2].set_yticklabels(categories)

plt.tight_layout()
plt.show()

## 7. Advanced Visualization Patterns

In [None]:
# Subplot with different chart types
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Histogram', 'Box Plot', 'Scatter Plot', 'Line Chart'),
    specs=[[{"type": "histogram"}, {"type": "box"}],
           [{"type": "scatter"}, {"type": "scatter"}]]
)

# Add histogram
fig.add_trace(
    go.Histogram(x=data['value'], name='Value Distribution'),
    row=1, col=1
)

# Add box plot
for category in data['category'].unique():
    cat_data = data[data['category'] == category]
    fig.add_trace(
        go.Box(y=cat_data['value'], name=category),
        row=1, col=2
    )

# Add scatter plot
fig.add_trace(
    go.Scatter(x=data['x_coord'], y=data['y_coord'], 
              mode='markers',
              marker=dict(size=data['size']*3, 
                         color=data['value'],
                         colorscale='Viridis',
                         showscale=True),
              name='Scatter'),
    row=2, col=1
)

# Add time series
monthly_ts = ts_data.groupby(ts_data['date'].dt.to_period('M'))['value'].mean()
fig.add_trace(
    go.Scatter(x=monthly_ts.index.to_timestamp(), y=monthly_ts.values,
              mode='lines+markers', name='Monthly Trend'),
    row=2, col=2
)

fig.update_layout(height=600, showlegend=True, 
                 title_text="Multi-Chart Dashboard")
fig.show()

print("\n📊 Chart Gallery Summary")
print("=" * 30)
print("✅ Distribution Charts: Histogram, Box, Violin, Density")
print("✅ Comparison Charts: Bar, Pie, Donut, Stacked")
print("✅ Relationship Charts: Scatter, Regression, Correlation")
print("✅ Time Series Charts: Line, Area, Moving Average")
print("✅ Interactive Charts: Plotly with hover and zoom")
print("✅ Statistical Charts: Error bars, Confidence intervals")
print("✅ Advanced Patterns: Multi-chart dashboards")

## Chart Selection Guide

### When to Use Each Chart Type:

#### **Distribution Analysis**
- **Histogram**: Show frequency distribution of a single variable
- **Box Plot**: Compare distributions across categories, identify outliers
- **Violin Plot**: Show distribution shape and density
- **Density Plot**: Smooth distribution curves, good for overlapping groups
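
To make "identify outliers" concrete, the sketch below applies the 1.5 × IQR rule that matplotlib and seaborn box plots use by default for their whiskers. The helper name `iqr_outliers` and the sample values are illustrative assumptions, not part of the gallery code.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return the values lying more than whis * IQR beyond the quartiles (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]

# Made-up sample with one obviously extreme point
sample = np.array([98, 102, 95, 110, 101, 99, 250])
outliers = iqr_outliers(sample)
print(outliers)
```

Points returned by a check like this are the ones a box plot would draw as individual markers beyond the whiskers.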

#### **Comparisons**
- **Bar Chart**: Compare values across categories
- **Horizontal Bar**: When category names are long
- **Stacked Bar**: Show part-to-whole relationships
- **Pie Chart**: Show proportions (use sparingly, max 5-6 categories)
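
One possible way to respect the 5-6 category limit for pie charts is to fold the smallest categories into a single "Other" slice before plotting. This is a minimal sketch; the function `collapse_small_slices`, its parameters, and the counts are assumptions made for illustration.

```python
import pandas as pd

def collapse_small_slices(counts, max_slices=5, other_label="Other"):
    """Keep the largest max_slices - 1 categories and sum the rest into one slice."""
    counts = counts.sort_values(ascending=False)
    if len(counts) <= max_slices:
        return counts
    head = counts.iloc[:max_slices - 1]
    other = pd.Series({other_label: counts.iloc[max_slices - 1:].sum()})
    return pd.concat([head, other])

# Made-up counts for eight categories
raw_counts = pd.Series({"A": 40, "B": 25, "C": 15, "D": 8, "E": 5, "F": 4, "G": 2, "H": 1})
pie_counts = collapse_small_slices(raw_counts, max_slices=5)
print(pie_counts)
```

The result can then be passed to a pie call in the same way as `category_counts` in the comparison charts above, for example `ax.pie(pie_counts.values, labels=pie_counts.index)`.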

#### **Relationships**
- **Scatter Plot**: Show correlation between two variables
- **Regression Plot**: Show trend line and confidence intervals
- **Correlation Heatmap**: Show relationships between multiple variables
- **Bubble Chart**: Add third dimension with bubble size
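
For bubble charts, note that the `s` argument of matplotlib's `scatter` is the marker area in points squared, not the diameter. The sketch below maps a third variable linearly onto marker areas within a readable range; `bubble_areas` and its default limits are hypothetical choices rather than anything used in the gallery.

```python
import numpy as np

def bubble_areas(values, max_area=300.0, min_area=10.0):
    """Map non-negative values linearly onto marker areas (points^2) for scatter(s=...)."""
    values = np.asarray(values, dtype=float)
    top = values.max()
    if top <= 0:
        # All-zero input: fall back to the smallest visible marker
        return np.full(values.shape, min_area)
    return min_area + (max_area - min_area) * values / top

sizes = bubble_areas([0.0, 1.0, 2.0, 4.0])
print(sizes)
```

A call such as `ax.scatter(x, y, s=bubble_areas(third_variable))` would then keep the largest bubble at a fixed area regardless of the data's units.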

#### **Time Series**
- **Line Chart**: Show trends over time
- **Area Chart**: Show cumulative values or stacked time series
- **Moving Average**: Smooth out noise in time series
- **Seasonal Plot**: Identify recurring patterns
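
As a small illustration of how a moving average smooths a series, the sketch below compares a trailing window with a centered one using pandas `rolling`. The five values are made up; the trailing version leaves the first `window - 1` points empty, while `center=True` with `min_periods=1` fills the edges from whatever neighbours exist.

```python
import pandas as pd

# Made-up daily series
series = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 14.0],
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# Trailing window: each point averages itself and the two previous days
trailing = series.rolling(window=3).mean()

# Centered window: each point averages itself and one neighbour on each side
centered = series.rolling(window=3, center=True, min_periods=1).mean()

print(pd.DataFrame({"value": series, "trailing": trailing, "centered": centered}))
```

A trailing window lags behind turning points, so a centered window is often easier to read on historical data, whereas a trailing one only uses information available up to each date.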

#### **Best Practices**
1. **Choose the right chart** for your data type and message
2. **Keep it simple** - avoid chart junk
3. **Use consistent colors** and styling
4. **Add clear labels** and titles
5. **Consider your audience** - interactive for exploration, static for reports
6. **Test on different devices** if web-based
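
One way to keep colors consistent across charts (practice 3) is to build a single category-to-color mapping and reuse it everywhere. The sketch below is an assumption-laden example: `build_color_map` is a hypothetical helper and the hex codes are placeholders.

```python
from itertools import cycle

def build_color_map(categories, palette):
    """Assign each category a fixed color so it looks the same in every chart."""
    colors = cycle(palette)  # palette repeats if there are more categories than colors
    return {category: next(colors) for category in sorted(set(categories))}

# Illustrative hex codes; any palette list works
palette = ["#66c2a5", "#fc8d62", "#8da0cb"]
color_map = build_color_map(["B", "A", "C", "A", "D"], palette)
print(color_map)
```

The mapping can be passed to matplotlib as `color=[color_map[c] for c in labels]`, and seaborn functions that take a `palette` argument also accept a dictionary keyed by category. With more categories than colors the palette repeats, so a longer palette is the safer choice.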