# PlotSmith: Complete Showcase 🎨

**PlotSmith** is a production-ready, layered plotting library for ML models with strict architectural boundaries between its layers. This notebook walks through its main chart types, its composition and styling helpers, and the layered API underneath them.

## Why PlotSmith?

✨ **4-Layer Architecture** - Clean separation of concerns  
🎯 **Minimalist Styling** - Publication-ready plots out of the box  
🔒 **Type-Safe** - Full type hints and immutable data structures  
📊 **20+ Chart Types** - From time series to specialized visualizations  
🤖 **ML-Focused** - Built for model evaluation and analysis  
⚡ **Production Ready** - Comprehensive tests, robust error handling, complete docs

Let's explore what makes PlotSmith special!


In [None]:
# Import everything we need
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# PlotSmith - The complete plotting toolkit
from plotsmith import (
    # Time Series & ML Analysis
    plot_timeseries,
    plot_backtest,
    plot_residuals,
    plot_model_comparison,
    plot_forecast_comparison,
    
    # Statistical Plots
    plot_histogram,
    plot_box,
    plot_violin,
    plot_scatter,
    
    # Categorical Charts
    plot_bar,
    plot_lollipop,
    plot_dumbbell,
    plot_range,
    
    # Specialized Charts
    plot_waterfall,
    plot_waffle,
    plot_slope,
    plot_metric,
    
    # Correlation & Heatmaps
    plot_heatmap,
    plot_correlation,
    
    # Composition
    figure,
    small_multiples,
    
    # Styling Helpers
    minimal_axes,
    event_line,
    direct_label,
    ACCENT,
)

# Set random seed for reproducibility
np.random.seed(42)

print("🚀 PlotSmith loaded successfully!")
print(f"📦 NumPy: {np.__version__}")
print(f"🐼 Pandas: {pd.__version__}")
print("\n✨ Ready to create beautiful, publication-ready plots!")


## 1. Time Series with Confidence Bands 📈

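PlotSmith draws a time series and its shaded bands in a single `plot_timeseries` call, with clean, minimalist styling applied by default. The bands in this example are approximated as 1.96 rolling standard deviations on either side of each observation.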


In [None]:
# Create realistic time series data
dates = pd.date_range("2020-01-01", periods=365, freq="D")
trend = np.linspace(100, 150, 365)
seasonal = 10 * np.sin(2 * np.pi * np.arange(365) / 365.25)
noise = np.random.randn(365) * 5
values = trend + seasonal + noise

# Create series
sales = pd.Series(values, index=dates, name="Daily Sales")

# Calculate confidence bands
rolling_std = sales.rolling(window=30).std()
lower = sales - 1.96 * rolling_std
upper = sales + 1.96 * rolling_std

# Plot with confidence bands
fig, ax = plot_timeseries(
    sales,
    bands={"95% Confidence Interval": (lower, upper)},
    title="Sales Trend with Confidence Bands",
    xlabel="Date",
    ylabel="Sales ($)",
    figsize=(12, 6)
)

plt.tight_layout()
plt.show()
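
The band in the cell above comes from a 30-day rolling standard deviation, so it is undefined until the first window fills. A minimal pandas-only sketch of that behavior on a small synthetic series follows; the names `demo`, `band_lower`, and `band_upper` are illustrative and not part of PlotSmith, and `min_periods` is shown only as one possible way to shorten the gap.

```python
import numpy as np
import pandas as pd

# Small synthetic series (illustrative data, not the notebook's sales series)
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=60, freq="D")
demo = pd.Series(100 + rng.normal(0, 5, size=60), index=idx)

window = 30
rolling_std = demo.rolling(window=window).std()
band_lower = demo - 1.96 * rolling_std
band_upper = demo + 1.96 * rolling_std

# By default a full window is required, so the first window - 1 entries are NaN
n_missing = int(rolling_std.isna().sum())
print(n_missing)

# With min_periods=2 the standard deviation is defined from the second point on
relaxed_std = demo.rolling(window=window, min_periods=2).std()
n_missing_relaxed = int(relaxed_std.isna().sum())
print(n_missing_relaxed)
```

Because the band is centered on each observation rather than on a model estimate, it is better read as a volatility envelope than as a formal confidence interval.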


## 2. Model Evaluation Suite 🤖

PlotSmith excels at ML model evaluation. Let's see backtest results, residuals, and model comparisons.


In [None]:
# Simulate backtest results with cross-validation folds
n_samples = 200
y_true = np.random.randn(n_samples) * 10 + 100
y_pred = y_true + np.random.randn(n_samples) * 3  # Some prediction error
fold_ids = np.repeat([0, 1, 2, 3, 4], n_samples // 5)

results_df = pd.DataFrame({
    'y_true': y_true,
    'y_pred': y_pred,
    'fold_id': fold_ids
})

# Backtest visualization with fold separation
fig, ax = plot_backtest(
    results_df,
    fold_id_col='fold_id',
    title="Model Performance Across CV Folds",
    xlabel="True Values",
    ylabel="Predicted Values"
)

plt.tight_layout()
plt.show()
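
A backtest plot is easier to interpret next to per-fold numbers. The sketch below rebuilds a frame with the same `y_true`, `y_pred`, and `fold_id` columns and aggregates the errors per fold with plain pandas; it does not call PlotSmith, and the column names `abs_err`, `sq_err`, `mae`, and `rmse` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic backtest results with the same column layout as results_df
rng = np.random.default_rng(42)
n_samples = 200
y_true = rng.normal(100, 10, size=n_samples)
y_pred = y_true + rng.normal(0, 3, size=n_samples)
fold_ids = np.repeat(np.arange(5), n_samples // 5)

results_df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "fold_id": fold_ids})

# Row-level absolute and squared errors
diff = results_df["y_true"] - results_df["y_pred"]
errors = results_df.assign(abs_err=diff.abs(), sq_err=diff ** 2)

# Aggregate within each fold: sample count, MAE, and RMSE
fold_metrics = errors.groupby("fold_id").agg(
    n=("abs_err", "size"),
    mae=("abs_err", "mean"),
    rmse=("sq_err", lambda s: float(np.sqrt(s.mean()))),
)
print(fold_metrics.round(3))
```

A fold whose MAE or RMSE stands well apart from the others is usually the first thing to look for in the plot.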


In [None]:
# Residual analysis - two views
actual = np.random.randn(100) * 10 + 50
predicted = actual + np.random.randn(100) * 2

# Scatter plot: predicted vs actual
fig1, ax1 = plot_residuals(
    actual, predicted,
    plot_type='scatter',
    title="Residual Analysis: Predicted vs Actual",
    xlabel="Actual Values",
    ylabel="Predicted Values"
)
plt.tight_layout()
plt.show()

# Time series: residuals over time
fig2, ax2 = plot_residuals(
    actual, predicted,
    x=pd.date_range("2024-01-01", periods=100, freq="D"),
    plot_type='series',
    title="Residuals Over Time",
    xlabel="Date",
    ylabel="Residual"
)
plt.tight_layout()
plt.show()
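
The two residual views above can be complemented by a few summary numbers. This NumPy sketch assumes the common convention residual = actual - predicted; the notebook does not state which sign `plot_residuals` uses, so treat that as an assumption. The names `bias`, `spread`, and `within_2sd` are illustrative.

```python
import numpy as np

# Synthetic actual/predicted pairs, similar in scale to the cell above
rng = np.random.default_rng(1)
actual = rng.normal(50, 10, size=100)
predicted = actual + rng.normal(0, 2, size=100)

# Assumed convention: residual = actual - predicted
residuals = actual - predicted

bias = residuals.mean()          # systematic over- or under-prediction
spread = residuals.std(ddof=1)   # typical size of the error
# Share of residuals within two sample standard deviations of the mean
within_2sd = float(np.mean(np.abs(residuals - bias) <= 2 * spread))

print(f"bias={bias:.3f}, spread={spread:.3f}, within 2 SD={within_2sd:.0%}")
```

A bias far from zero suggests a systematic offset, while a spread that grows along the x-axis of the series view suggests the error is not constant over time.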


In [None]:
# Model comparison - multiple forecasting models
dates = pd.date_range("2023-01-01", periods=100, freq="D")
actual = pd.Series(
    100 + np.cumsum(np.random.randn(100) * 0.5),
    index=dates,
    name="Actual"
)

# Simulate different model forecasts
forecasts = {
    'LSTM': pd.Series(
        actual.values + np.random.randn(100) * 2,
        index=dates
    ),
    'XGBoost': pd.Series(
        actual.values + np.random.randn(100) * 1.5,
        index=dates
    ),
    'Prophet': pd.Series(
        actual.values + np.random.randn(100) * 2.5,
        index=dates
    ),
}

# Add confidence intervals for one model
intervals = {
    'LSTM': (
        pd.Series(forecasts['LSTM'].values - 3, index=dates),
        pd.Series(forecasts['LSTM'].values + 3, index=dates),
    )
}

fig, ax = plot_forecast_comparison(
    actual,
    forecasts,
    intervals=intervals,
    title="Forecast Model Comparison",
    xlabel="Date",
    ylabel="Value"
)

plt.tight_layout()
plt.show()


## 3. Statistical Visualizations 📊

Beautiful statistical plots for data exploration and analysis.


In [None]:
# Create data for statistical plots
np.random.seed(42)
categories = ['Model A', 'Model B', 'Model C', 'Model D']
data_dict = {
    'Model A': np.random.randn(100) + 0,
    'Model B': np.random.randn(100) + 2,
    'Model C': np.random.randn(100) + 4,
    'Model D': np.random.randn(100) + 1,
}

# Convert to long format for box/violin plots
df_stats = pd.DataFrame([
    {'category': cat, 'value': val}
    for cat, values in data_dict.items()
    for val in values
])

# Box plot
fig1, ax1 = plot_box(
    df_stats,
    x='category',
    y='value',
    show_means=True,
    title="Model Performance Distribution (Box Plot)",
    xlabel="Model",
    ylabel="Score"
)
plt.tight_layout()
plt.show()

# Violin plot
fig2, ax2 = plot_violin(
    df_stats,
    x='category',
    y='value',
    show_means=True,
    title="Model Performance Distribution (Violin Plot)",
    xlabel="Model",
    ylabel="Score"
)
plt.tight_layout()
plt.show()


In [None]:
# Histogram with multiple distributions
normal_data = np.random.randn(1000)
skewed_data = np.random.exponential(2, 1000)

fig, ax = plot_histogram(
    [normal_data, skewed_data],
    labels=['Normal Distribution', 'Exponential Distribution'],
    colors=['#2E86AB', '#A23B72'],
    alpha=0.7,
    bins=50,
    title="Distribution Comparison",
    xlabel="Value",
    ylabel="Frequency"
)
plt.tight_layout()
plt.show()


In [None]:
# Enhanced scatter plot with color and size mapping
n = 300
scatter_df = pd.DataFrame({
    'x': np.random.randn(n),
    'y': np.random.randn(n),
    'category': np.random.choice(['A', 'B', 'C'], n),
    'size': np.random.uniform(10, 200, n),
})

fig, ax = plot_scatter(
    scatter_df,
    x='x',
    y='y',
    color='category',
    size='size',
    title="Scatter Plot with Color and Size Mapping",
    xlabel="Feature 1",
    ylabel="Feature 2"
)
plt.tight_layout()
plt.show()


## 4. Specialized Business Charts 💼

PlotSmith includes powerful specialized charts for business and financial analysis.


In [None]:
# Waterfall Chart - Financial breakdown
waterfall_df = pd.DataFrame({
    'Category': [
        'Starting Balance',
        'Revenue',
        'Cost of Goods',
        'Operating Expenses',
        'Taxes',
        'Net Profit'
    ],
    'Value': [100000, 50000, -30000, -15000, -5000, 0],
    'measure': ['absolute', 'relative', 'relative', 'relative', 'relative', 'total']
})

fig, ax = plot_waterfall(
    waterfall_df,
    categories_col='Category',
    values_col='Value',
    measure_col='measure',
    title="Financial Waterfall Chart",
    figsize=(10, 6)
)
plt.tight_layout()
plt.show()
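
The `measure` column in the waterfall data distinguishes `absolute`, `relative`, and `total` rows. How PlotSmith interprets them internally is not shown in this notebook; the sketch below assumes the usual waterfall convention (an absolute row resets the running sum, a relative row adds to it, and a total row displays the running sum and ignores its own value). The `bars` list and its tuple layout are hypothetical.

```python
categories = [
    "Starting Balance", "Revenue", "Cost of Goods",
    "Operating Expenses", "Taxes", "Net Profit",
]
values = [100000, 50000, -30000, -15000, -5000, 0]
measures = ["absolute", "relative", "relative", "relative", "relative", "total"]

running = 0
bars = []  # (category, bar_start, bar_end)
for category, value, measure in zip(categories, values, measures):
    if measure == "absolute":
        # Reset the running sum to this value; the bar rises from zero
        start, running = 0, value
    elif measure == "relative":
        # Floating bar from the previous running sum to the new one
        start, running = running, running + value
    elif measure == "total":
        # Bar from zero to the running sum; the row's own value is ignored
        start = 0
    else:
        raise ValueError(f"unknown measure: {measure!r}")
    bars.append((category, start, running))

for bar in bars:
    print(bar)
```

Under this convention the `0` entered for Net Profit is only a placeholder: the bar height comes from the accumulated sum of the preceding rows.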


In [None]:
# Waffle Chart - Composition visualization
waffle_df = pd.DataFrame({
    'Party': ['Democrats', 'Republicans', 'Independents'],
    'Seats': [45, 40, 15]
})

fig, ax = plot_waffle(
    waffle_df,
    category_col='Party',
    value_col='Seats',
    rows=10,
    columns=10,
    title="Election Results (Waffle Chart)",
    figsize=(10, 6)
)
plt.tight_layout()
plt.show()
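
The seat counts above sum to exactly 100, so each seat maps to one cell of the 10 x 10 grid. When totals do not match the grid size, the values have to be rounded to whole cells. The sketch below uses largest-remainder rounding as one reasonable approach; `allocate_cells` is a hypothetical helper, and whether `plot_waffle` rounds this way is not stated in this notebook.

```python
def allocate_cells(values, total_cells):
    """Split total_cells among non-negative values using largest-remainder rounding."""
    total = sum(values)
    exact = [v * total_cells / total for v in values]
    cells = [int(e) for e in exact]  # floor, since values are non-negative
    leftover = total_cells - sum(cells)
    # Give the remaining cells to the largest fractional parts (ties keep input order)
    order = sorted(range(len(values)), key=lambda i: exact[i] - cells[i], reverse=True)
    for i in order[:leftover]:
        cells[i] += 1
    return cells


print(allocate_cells([45, 40, 15], 100))
print(allocate_cells([1, 1, 1], 100))
```

The allocation always sums to the grid size, which avoids the empty or overflowing cells that independent rounding of each category can produce.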


In [None]:
# Metric Display - KPI visualization
fig, ax = plot_metric(
    title="Monthly Revenue",
    value=1234567,
    delta=123456,
    prefix="$",
    title_size=16,
    value_size=24
)
plt.tight_layout()
plt.show()


## 5. Comparison Charts 📊

Compare values across categories with elegant comparison charts.


In [None]:
# Dumbbell Chart - Before/After comparison
dumbbell_df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'Brazil', 'Argentina'],
    '2020': [67.5, 68.2, 65.8, 64.5, 66.1],
    '2024': [69.2, 69.8, 67.1, 65.9, 67.5],
})

fig, ax = plot_dumbbell(
    dumbbell_df,
    categories_col='Country',
    values1_col='2020',
    values2_col='2024',
    title="Life Expectancy Change (2020 → 2024)",
    xlabel="Life Expectancy (years)",
    ylabel="Country"
)
plt.tight_layout()
plt.show()


In [None]:
# Range Chart - Show ranges/uncertainty
range_df = pd.DataFrame({
    'Metric': ['Accuracy', 'Precision', 'Recall', 'F1-Score'],
    'Min': [0.85, 0.82, 0.88, 0.84],
    'Max': [0.92, 0.89, 0.95, 0.91],
})

fig, ax = plot_range(
    range_df,
    categories_col='Metric',
    values1_col='Min',
    values2_col='Max',
    title="Model Performance Ranges",
    xlabel="Score",
    ylabel="Metric"
)
plt.tight_layout()
plt.show()


In [None]:
# Lollipop Chart - Clean bar chart alternative
lollipop_df = pd.DataFrame({
    'Feature': ['Feature A', 'Feature B', 'Feature C', 'Feature D', 'Feature E'],
    'Importance': [0.85, 0.72, 0.68, 0.55, 0.42],
})

fig, ax = plot_lollipop(
    lollipop_df,
    categories_col='Feature',
    values_col='Importance',
    title="Feature Importance (Lollipop Chart)",
    xlabel="Importance Score",
    ylabel="Feature"
)
plt.tight_layout()
plt.show()


In [None]:
# Slope Chart - Show change over time
slope_df = pd.DataFrame({
    'Year': [2020, 2020, 2020, 2023, 2023, 2023],
    'Country': ['USA', 'Canada', 'Mexico', 'USA', 'Canada', 'Mexico'],
    'GDP': [20.94, 1.64, 1.07, 22.99, 1.99, 1.27],
})

fig, ax = plot_slope(
    data_frame=slope_df,
    x='Year',
    y='GDP',
    group='Country',
    title="GDP Growth (2020 → 2023)",
    xlabel="Year",
    ylabel="GDP (Trillion USD)"
)
plt.tight_layout()
plt.show()
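
The charts in this section take two table shapes: `plot_dumbbell` and `plot_range` read wide data (one column per endpoint), while `plot_slope` reads long data (one row per observation). A short pandas sketch of converting the long slope data to wide form and computing the change per country follows; the `wide` frame and its `change` column are illustrative names.

```python
import pandas as pd

# Long format, as passed to plot_slope above
slope_long = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2023, 2023, 2023],
    "Country": ["USA", "Canada", "Mexico", "USA", "Canada", "Mexico"],
    "GDP": [20.94, 1.64, 1.07, 22.99, 1.99, 1.27],
})

# Wide format: one row per country, one column per year
wide = slope_long.pivot(index="Country", columns="Year", values="GDP")

# Absolute change between the two years
wide["change"] = wide[2023] - wide[2020]
print(wide.round(2))
```

The wide frame has the layout a dumbbell chart expects, so the same data can feed either chart type after one `pivot` call.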


## 6. Correlation Analysis 🔗

Powerful correlation visualization for feature analysis.


In [None]:
# Create correlated dataset: Feature_2, Feature_3, and Target share a common
# base signal with Feature_1; Feature_4 is independent noise
np.random.seed(42)
n = 500
base = np.random.randn(n)
features_df = pd.DataFrame({
    'Feature_1': base,
    'Feature_2': base * 0.8 + 0.5 * np.random.randn(n),
    'Feature_3': base * 0.6 + 0.3 * np.random.randn(n),
    'Feature_4': np.random.randn(n),
    'Target': base * 0.7 + 0.4 * np.random.randn(n),
})

# Correlation heatmap
fig, ax = plot_correlation(
    features_df,
    method='pearson',
    annotate=True,
    title="Feature Correlation Matrix",
    figsize=(8, 6)
)
plt.tight_layout()
plt.show()
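
A heatmap is easier to trust when the underlying numbers are checked directly. The sketch below computes a Pearson correlation matrix with pandas' `DataFrame.corr` on a small synthetic frame in which one column shares a signal with another and a third is independent; the column names `signal`, `noisy_copy`, and `independent` are invented for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
base = rng.normal(size=n)

demo_df = pd.DataFrame({
    "signal": base,
    # Shares the base signal, plus independent noise
    "noisy_copy": 0.8 * base + 0.5 * rng.normal(size=n),
    # Drawn separately, so it should be close to uncorrelated
    "independent": rng.normal(size=n),
})

# Pearson correlation matrix: symmetric, with ones on the diagonal
corr = demo_df.corr(method="pearson")
print(corr.round(2))
```

Columns only correlate when they are built from a shared component; two separate calls to a random generator produce values that are uncorrelated up to sampling noise.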


## 7. Small Multiples - Composition Power 🎨

Create beautiful multi-panel figures with consistent styling.


In [None]:
# Create small multiples: one panel per metric
# Here the grid comes from plt.subplots, and each panel is drawn with
# PlotSmith's task and primitive layers.
# In a real dashboard, you'd use the small_multiples helper for the grid.

metrics = [
    ('Revenue', np.random.randn(100).cumsum() + 100),
    ('Users', np.random.randn(100).cumsum() + 1000),
    ('Engagement', np.random.randn(100).cumsum() + 50),
    ('Retention', np.random.randn(100).cumsum() + 80),
    ('Satisfaction', np.random.randn(100).cumsum() + 4),
    ('Growth', np.random.randn(100).cumsum() + 10),
]

dates = pd.date_range("2024-01-01", periods=100, freq="D")

# Import the task and primitives once, outside the loop
from plotsmith.tasks import TimeseriesPlotTask
from plotsmith.primitives import draw_series, minimal_axes, apply_axes_style

# Create a grid of plots
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for ax, (title, values) in zip(axes, metrics):
    series = pd.Series(values, index=dates)
    # Build the views and spec, then draw them onto this panel
    task = TimeseriesPlotTask(data=series, title=title)
    views, spec = task.execute()
    minimal_axes(ax)
    for view in views:
        draw_series(ax, view)
    apply_axes_style(ax, spec)

plt.suptitle("Business Metrics Dashboard", fontsize=16, y=0.995)
plt.tight_layout()
plt.show()


## 8. Real-World ML Workflow Example 🚀

Let's put it all together in a complete ML evaluation workflow.


In [None]:
# Complete ML Model Evaluation Workflow
print("=" * 60)
print("COMPLETE ML MODEL EVALUATION WORKFLOW")
print("=" * 60)

# 1. Generate synthetic ML dataset
np.random.seed(42)
n_train, n_test = 500, 200

# Training data
X_train = np.random.randn(n_train, 5)
y_train = X_train[:, 0] * 2 + X_train[:, 1] * 1.5 + np.random.randn(n_train) * 0.5

# Test data
X_test = np.random.randn(n_test, 5)
y_test = X_test[:, 0] * 2 + X_test[:, 1] * 1.5 + np.random.randn(n_test) * 0.5

# Simulate predictions from different models
predictions = {
    'Linear Regression': y_test + np.random.randn(n_test) * 1.0,
    'Random Forest': y_test + np.random.randn(n_test) * 0.8,
    'XGBoost': y_test + np.random.randn(n_test) * 0.6,
}

print(f"\n✅ Generated dataset: {n_train} training, {n_test} test samples")
print(f"✅ Simulated {len(predictions)} model predictions")


In [None]:
# 2. Backtest visualization with cross-validation
cv_results = []
for fold in range(5):
    fold_size = n_test // 5
    start_idx = fold * fold_size
    end_idx = start_idx + fold_size
    
    cv_results.append({
        'y_true': y_test[start_idx:end_idx],
        'y_pred': predictions['XGBoost'][start_idx:end_idx],
        'fold_id': fold
    })

backtest_df = pd.DataFrame({
    'y_true': np.concatenate([r['y_true'] for r in cv_results]),
    'y_pred': np.concatenate([r['y_pred'] for r in cv_results]),
    'fold_id': np.concatenate([[r['fold_id']] * len(r['y_true']) for r in cv_results])
})

fig, ax = plot_backtest(
    backtest_df,
    fold_id_col='fold_id',
    title="Cross-Validation Performance",
    figsize=(8, 8)
)
plt.tight_layout()
plt.show()


In [None]:
# 3. Residual analysis - two separate plots
# Scatter: predicted vs actual
fig1, ax1 = plot_residuals(
    y_test,
    predictions['XGBoost'],
    plot_type='scatter',
    title="Residuals: Predicted vs Actual",
    figsize=(7, 5)
)
plt.tight_layout()
plt.show()

# Series: residuals over index
fig2, ax2 = plot_residuals(
    y_test,
    predictions['XGBoost'],
    x=np.arange(len(y_test)),
    plot_type='series',
    title="Residuals Over Test Samples",
    figsize=(7, 5)
)
plt.tight_layout()
plt.show()


In [None]:
# 4. Model comparison
dates = pd.date_range("2024-01-01", periods=n_test, freq="D")
actual_series = pd.Series(y_test, index=dates)

# plot_model_comparison expects arrays or Series; the predictions are already
# NumPy arrays, so a shallow copy of the dict is enough
model_predictions = dict(predictions)

fig, ax = plot_model_comparison(
    actual_series,
    models=model_predictions,
    test_start_idx=0,
    title="Model Comparison: All Models",
    xlabel="Date",
    ylabel="Target Value"
)
plt.tight_layout()
plt.show()


In [None]:
# 5. Feature importance and correlation
feature_names = [f'Feature_{i+1}' for i in range(5)]
feature_df = pd.DataFrame(X_test, columns=feature_names)
feature_df['Target'] = y_test

# Correlation matrix
fig, ax = plot_correlation(
    feature_df,
    annotate=True,
    title="Feature Correlation Analysis",
    figsize=(8, 6)
)
plt.tight_layout()
plt.show()


In [None]:
# 6. Performance metrics dashboard
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

metrics_data = []
for name, pred in predictions.items():
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    
    metrics_data.append({
        'Model': name,
        'MAE': mae,
        'RMSE': rmse,
        'R²': r2
    })

metrics_df = pd.DataFrame(metrics_data)

# Create comparison visualizations - three separate plots
# Bar chart for MAE
fig1, ax1 = plot_bar(
    metrics_df['Model'],
    metrics_df['MAE'],
    title="Mean Absolute Error",
    ylabel="MAE",
    figsize=(6, 5)
)
plt.tight_layout()
plt.show()

# Bar chart for RMSE
fig2, ax2 = plot_bar(
    metrics_df['Model'],
    metrics_df['RMSE'],
    title="Root Mean Squared Error",
    ylabel="RMSE",
    figsize=(6, 5)
)
plt.tight_layout()
plt.show()

# Bar chart for R²
fig3, ax3 = plot_bar(
    metrics_df['Model'],
    metrics_df['R²'],
    title="R² Score",
    ylabel="R²",
    figsize=(6, 5)
)
plt.tight_layout()
plt.show()

print("\n📊 Model Performance Summary:")
print(metrics_df.to_string(index=False))
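
The three metrics in the dashboard come from scikit-learn. For readers who want to see what they measure, the sketch below writes out the standard definitions in NumPy; `regression_metrics` is a hypothetical helper written for this note, not a PlotSmith or scikit-learn function.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 computed from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))                       # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                # root mean squared error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

    return {"MAE": float(mae), "RMSE": float(rmse), "R2": float(r2)}


metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(metrics)
```

Lower is better for MAE and RMSE while higher is better for R², which is worth keeping in mind when reading the three bar charts side by side.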


## 9. The PlotSmith Architecture Advantage 🏗️

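PlotSmith's 4-layer architecture keeps data objects separate from drawing code. The cell below exercises the first two layers directly: immutable view objects (Layer 1) and matplotlib drawing primitives (Layer 2).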


In [None]:
# Layer 1: Objects - Immutable data structures (no matplotlib!)
from plotsmith.objects import SeriesView, FigureSpec

# Create a view object (Layer 1)
view = SeriesView(
    x=np.array([1, 2, 3, 4, 5]),
    y=np.array([10, 20, 15, 25, 30]),
    label="Sample Series"
)

# Layer 2: Primitives - Drawing functions (all matplotlib here)
from plotsmith.primitives import draw_series, minimal_axes, apply_axes_style

fig, ax = plt.subplots(figsize=(8, 5))
minimal_axes(ax)  # Apply minimalist styling
draw_series(ax, view)  # Draw the view
apply_axes_style(ax, FigureSpec(title="Layer 1 → Layer 2 Example", xlabel="X", ylabel="Y"))

plt.tight_layout()
plt.show()

print("✅ Layer 1 (Objects): Pure data, no matplotlib dependencies")
print("✅ Layer 2 (Primitives): All matplotlib calls isolated here")
print("✅ Clean separation of concerns!")
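
Layer 1 is described as immutable, but the notebook does not show how that is enforced. One common way in Python is a frozen dataclass, sketched below with a hypothetical `PointSeries` class; this is a stand-in for illustration and not PlotSmith's actual `SeriesView` implementation.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class PointSeries:
    """Hypothetical immutable view object holding pure data, no plotting code."""
    x: Tuple[float, ...]
    y: Tuple[float, ...]
    label: Optional[str] = None

    def __post_init__(self):
        # Validate once at construction; the object cannot change afterwards
        if len(self.x) != len(self.y):
            raise ValueError("x and y must have the same length")


view = PointSeries(x=(1, 2, 3), y=(10, 20, 15), label="Sample Series")

# Attempting to reassign a field on a frozen dataclass raises an error
try:
    view.label = "Renamed"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)
```

Because such an object cannot change after construction, a drawing function can accept it without worrying that another part of the program will modify the data mid-render.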


## 10. Styling Helpers & Annotations 🎨

PlotSmith includes powerful styling helpers for professional annotations.


In [None]:
# Create a plot with annotations
dates = pd.date_range("2020-01-01", periods=100, freq="D")
values = 100 + np.cumsum(np.random.randn(100) * 0.5)
series = pd.Series(values, index=dates)

fig, ax = plot_timeseries(series, title="Annotated Time Series", figsize=(12, 6))

# Add event markers (event_line, direct_label, and ACCENT were imported in the first cell)
event_line(ax, x=dates[30], text="Policy Change", color=ACCENT)
event_line(ax, x=dates[70], text="Market Shift", color=ACCENT)

# Add a direct label at the actual maximum of the series
peak_idx = int(np.argmax(values))
direct_label(ax, x=dates[peak_idx], y=values[peak_idx], text="Peak Value", use_accent=True)

plt.tight_layout()
plt.show()


## Summary: Why PlotSmith? ✨

### Key Advantages

1. **🏗️ Clean Architecture** - 4-layer design ensures maintainability
2. **🎨 Beautiful by Default** - Publication-ready plots out of the box
3. **🔒 Type-Safe** - Full type hints prevent errors
4. **📊 Comprehensive** - 20+ chart types for every need
5. **🤖 ML-Focused** - Built specifically for model evaluation
6. **⚡ Production Ready** - Comprehensive tests, robust error handling
7. **📚 Well Documented** - Complete docs with examples
8. **🚀 Easy to Use** - Simple API, powerful results

### Perfect For

- ✅ ML model evaluation and comparison
- ✅ Time series analysis with confidence bands
- ✅ Statistical data exploration
- ✅ Business intelligence dashboards
- ✅ Research publication figures
- ✅ Data science workflows

### Get Started

```python
from plotsmith import plot_timeseries, plot_backtest, plot_correlation
# That's it! Start plotting immediately.
```

**PlotSmith** - Where elegance meets functionality. 🎨✨
