# Example 2: Hyperparameter Tuning Grid Search

**Phase 4 Features Showcased:**
- ✅ Performance Optimization (200+ panels, loading states)
- ✅ Multi-Range Filtering (cv_score > 0.8 AND fit_time < 60s)
- ✅ Multi-Column Sorting (score DESC, then time ASC)
- ✅ Label Configuration (show only critical metrics)
- ✅ Views (save "Production Candidates")
- ✅ Keyboard Navigation (quick browsing)
- ✅ Export (top 10 configurations)

## Use Case

Visualize results from a hyperparameter grid search across Random Forest, XGBoost, and LightGBM models. Each panel shows learning curves and performance metrics for one hyperparameter combination. The scores in this example are simulated; no models are actually trained.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from trelliscope import Display
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

## 1. Generate Hyperparameter Grid Search Results

In [None]:
def create_learning_curve_plot(train_scores, val_scores, params_str, cv_score, model_type):
    """Create learning curve visualization."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    iterations = np.arange(len(train_scores))
    
    # Learning curves
    ax1.plot(iterations, train_scores, label='Train', color='#27AE60', linewidth=2.5, alpha=0.9)
    ax1.plot(iterations, val_scores, label='Validation', color='#E74C3C', linewidth=2.5, alpha=0.9)
    ax1.axhline(y=cv_score, color='#3498DB', linestyle='--', linewidth=2, label='Final CV Score', alpha=0.7)
    ax1.set_xlabel('Iteration', fontsize=11)
    ax1.set_ylabel('Score (R²)', fontsize=11)
    ax1.set_title(f'{model_type} Learning Curve', fontsize=12, fontweight='bold')
    ax1.legend(loc='lower right', fontsize=10)
    ax1.grid(True, alpha=0.25, linestyle=':')
    ax1.spines['top'].set_visible(False)
    ax1.spines['right'].set_visible(False)
    
    # Overfitting analysis
    gap = train_scores - val_scores
    ax2.plot(iterations, gap, color='#9B59B6', linewidth=2.5, alpha=0.9)
    ax2.axhline(y=0, color='black', linestyle='-', linewidth=1, alpha=0.3)
    ax2.fill_between(iterations, 0, gap, where=(gap>0), color='#9B59B6', alpha=0.2, label='Overfitting')
    ax2.set_xlabel('Iteration', fontsize=11)
    ax2.set_ylabel('Train - Val Score', fontsize=11)
    ax2.set_title('Overfitting Analysis', fontsize=12, fontweight='bold')
    ax2.legend(loc='upper right', fontsize=10)
    ax2.grid(True, alpha=0.25, linestyle=':')
    ax2.spines['top'].set_visible(False)
    ax2.spines['right'].set_visible(False)
    
    fig.suptitle(f'Hyperparameters: {params_str}\nCV Score: {cv_score:.4f}', 
                 fontsize=11, y=1.02)
    
    plt.tight_layout()
    return fig

# Define hyperparameter grids
param_grids = {
    'RandomForest': {
        'n_estimators': [50, 100, 200, 500],
        'max_depth': [5, 10, 15, 20, None],
        'min_samples_split': [2, 5, 10]
    },
    'XGBoost': {
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 7, 10],
        'learning_rate': [0.01, 0.05, 0.1, 0.3]
    },
    'LightGBM': {
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 7, 10],
        'learning_rate': [0.01, 0.05, 0.1],
        'num_leaves': [15, 31, 63]
    }
}

print("Generating hyperparameter tuning results...")
print(f"RandomForest: 4 x 5 x 3 = {4*5*3} combinations")
print(f"XGBoost: 3 x 4 x 4 = {3*4*4} combinations")
print(f"LightGBM: 3 x 4 x 3 x 3 = {3*4*3*3} combinations")
print(f"Total: 60 + 48 + 108 = {60+48+108} panels\n")
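
The counts printed above are hard-coded. As a small sketch, the same numbers can be derived from the grids themselves, so they stay correct if a grid changes. The grids are repeated here only so the snippet runs on its own; `grid_sizes` and `total_panels` are illustrative names, not part of the original notebook.

```python
import math

# Same grids as in the cell above, repeated so this snippet is self-contained.
param_grids = {
    'RandomForest': {
        'n_estimators': [50, 100, 200, 500],
        'max_depth': [5, 10, 15, 20, None],
        'min_samples_split': [2, 5, 10],
    },
    'XGBoost': {
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 7, 10],
        'learning_rate': [0.01, 0.05, 0.1, 0.3],
    },
    'LightGBM': {
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 7, 10],
        'learning_rate': [0.01, 0.05, 0.1],
        'num_leaves': [15, 31, 63],
    },
}

# Grid size = product of the number of candidate values per hyperparameter.
grid_sizes = {
    model: math.prod(len(values) for values in grid.values())
    for model, grid in param_grids.items()
}
total_panels = sum(grid_sizes.values())

for model, size in grid_sizes.items():
    print(f"{model}: {size} combinations")
print(f"Total: {total_panels} panels")
```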

In [None]:
import itertools

data_rows = []
panel_count = 0

for model_type, param_grid in param_grids.items():
    # Generate all parameter combinations
    param_names = list(param_grid.keys())
    param_values = [param_grid[name] for name in param_names]
    
    for param_combo in itertools.product(*param_values):
        params = dict(zip(param_names, param_combo))
        
        # Simulate model performance based on parameters
        # More estimators + moderate depth = better performance
        base_score = 0.70
        n_est_bonus = min(params.get('n_estimators', 100) / 500, 0.15)
        depth_factor = params.get('max_depth', 10)
        if depth_factor is None:
            depth_factor = 20
        depth_bonus = min(depth_factor / 100, 0.10) if depth_factor < 15 else max(0, 0.10 - (depth_factor - 15) * 0.01)
        
        lr = params.get('learning_rate', 0.1)
        lr_bonus = 0.05 if 0.05 <= lr <= 0.1 else -0.02
        
        cv_score = base_score + n_est_bonus + depth_bonus + lr_bonus + np.random.normal(0, 0.03)
        cv_score = np.clip(cv_score, 0.60, 0.96)
        
        # Training score (always >= cv_score)
        train_score = cv_score + np.random.uniform(0.02, 0.12)
        train_score = np.clip(train_score, cv_score, 0.99)
        
        # Fit time based on complexity
        # (depth_factor is always a number here: max_depth=None was mapped to 20 above)
        fit_time = params.get('n_estimators', 100) * 0.05
        fit_time *= depth_factor / 10
        fit_time *= (1 + np.random.uniform(-0.2, 0.2))
        
        # Generate learning curves
        n_iterations = params.get('n_estimators', 100)
        train_scores = np.linspace(0.5, train_score, n_iterations) + np.random.normal(0, 0.02, n_iterations).cumsum() * 0.01
        val_scores = np.linspace(0.5, cv_score, n_iterations) + np.random.normal(0, 0.015, n_iterations).cumsum() * 0.01
        train_scores = np.clip(train_scores, 0.5, 0.99)
        val_scores = np.clip(val_scores, 0.5, cv_score)
        
        # Create params string
        params_str = ', '.join([f"{k}={v}" for k, v in params.items()])
        
        # Create visualization
        fig = create_learning_curve_plot(train_scores, val_scores, params_str, cv_score, model_type)
        
        # 'is_best' starts as 'no' for every row; Section 2 marks the best row per model
        
        data_rows.append({
            'panel': fig,
            'model_type': model_type,
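            'n_estimators': params.get('n_estimators', 100),
            # 999 stands in for max_depth=None (unlimited) so the column stays numeric
            'max_depth': 999 if params.get('max_depth') is None else params['max_depth'],
            # RandomForest has no learning_rate; 0.1 is recorded as a placeholder
            'learning_rate': params.get('learning_rate', 0.1),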
            'cv_score': cv_score,
            'train_score': train_score,
            'fit_time': fit_time,
            'overfitting_gap': train_score - cv_score,
            'params_str': params_str,
            'is_best': 'no'  # Will update
        })
        
        panel_count += 1
        if panel_count % 50 == 0:
            print(f"  Generated {panel_count}/216 panels...")
        
        plt.close(fig)

print(f"\n✓ Generated {panel_count} hyperparameter configurations")
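
To make the simulation easier to reason about, here is a sketch of the noise-free part of the score formula used in the loop above. `expected_cv_score` is a hypothetical helper, not part of the notebook; it drops the `np.random.normal` term and keeps the same bonuses and clip range. Note that the best combinations reach a raw value of about 1.00, so they are capped at 0.96.

```python
def expected_cv_score(n_estimators, max_depth, learning_rate=0.1):
    """Noise-free part of the simulated CV score (mirrors the loop above)."""
    depth = 20 if max_depth is None else max_depth   # None is treated as depth 20
    n_est_bonus = min(n_estimators / 500, 0.15)
    if depth < 15:
        depth_bonus = min(depth / 100, 0.10)
    else:
        depth_bonus = max(0, 0.10 - (depth - 15) * 0.01)
    lr_bonus = 0.05 if 0.05 <= learning_rate <= 0.1 else -0.02
    raw = 0.70 + n_est_bonus + depth_bonus + lr_bonus
    return min(max(raw, 0.60), 0.96)                 # same clip range as the simulation

# raw = 0.70 + 0.15 + 0.10 + 0.05, which exceeds the cap
print(expected_cv_score(200, 10, 0.1))
# raw = 0.70 + 0.10 + 0.03 - 0.02, inside the clip range
print(expected_cv_score(50, 3, 0.01))
```

Because many configurations sit at or near the cap before noise is added, ties at 0.96 are common in the generated data.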

## 2. Mark Best Parameters

In [None]:
# Create DataFrame
df = pd.DataFrame(data_rows)

# Mark best params per model
for model in df['model_type'].unique():
    mask = df['model_type'] == model
    best_idx = df.loc[mask, 'cv_score'].idxmax()
    df.loc[best_idx, 'is_best'] = 'yes'

print(f"\nDataFrame shape: {df.shape}")
print(f"\nBest parameters per model:")
for model in df['model_type'].unique():
    best = df[(df['model_type'] == model) & (df['is_best'] == 'yes')].iloc[0]
    print(f"  {model}: CV={best['cv_score']:.4f}, Time={best['fit_time']:.1f}s")
    print(f"    {best['params_str']}")
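
The per-model loop above can also be expressed with a single `groupby`. This is a sketch on a toy DataFrame with made-up values, not the tutorial's data. As with the loop, `idxmax` returns the first row when several rows tie for the top score.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (values are made up).
toy = pd.DataFrame({
    "model_type": ["RandomForest", "RandomForest", "XGBoost", "XGBoost"],
    "cv_score":   [0.91, 0.94, 0.89, 0.87],
})

# idxmax per group gives the row label of the highest cv_score in each model.
best_idx = toy.groupby("model_type")["cv_score"].idxmax()

toy["is_best"] = "no"
toy.loc[best_idx.to_numpy(), "is_best"] = "yes"
print(toy)
```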

## 3. Create Trelliscope Display

In [None]:
display = (
    Display(df, name="hyperparameter_tuning_grid_search", description="Hyperparameter grid search results for RandomForest, XGBoost, and LightGBM with 216 parameter combinations")
    .set_panel_column("panel")
    .infer_metas()
    .set_default_layout(ncol=4, nrow=3)  # 12 panels per page for large dataset
    .set_default_labels(["model_type", "cv_score", "fit_time", "is_best"])
    # Sorting (best score first, then fastest time) is applied interactively in the viewer
    .write()
)

print("\n✓ Trelliscope display created successfully!")

## 4. Launch Interactive Viewer

In [None]:
from trelliscope.dash_viewer import create_dash_app

app = create_dash_app(display)

print("\n" + "="*70)
print("🚀 LAUNCHING INTERACTIVE VIEWER - LARGE DATASET DEMO")
print("="*70)
print(f"\n📊 Display: {display.name}")
print(f"📈 Total Panels: {len(df)} (LARGE DATASET)")
print(f"🤖 Models: {', '.join(df['model_type'].unique())}")
print(f"\n⚡ Performance Features:")
print(f"  - Loading states on filter/sort operations")
print(f"  - Efficient rendering for 200+ panels")
print(f"  - Cached operations")
print("\n🌐 Serving on http://localhost:8053 (open this URL in a browser)...\n")

app.run(debug=False, host='127.0.0.1', port=8053)

## 5. Feature Testing Guide

### ✅ Performance Optimization (Feature 3) - PRIMARY FOCUS

**Dataset**: 216 panels (large enough to see loading states)

**Try This**:
1. **Initial Load**:
   - Observe loading spinner during initial render
   - Check browser DevTools → Performance tab
   - Should load in < 3 seconds

2. **Filter Operations**:
   - Apply cv_score range: min=0.80, max=1.0
   - Observe loading state appears
   - Operation should complete in < 400ms (the single-filter target in Section 6)

3. **Multi-Filter Stress Test**:
   - Add: cv_score > 0.85
   - Add: fit_time < 50
   - Add: overfitting_gap < 0.10
   - Observe loading states on each

4. **Sort Large Dataset**:
   - Sort by cv_score descending
   - Observe loading state
   - Add second sort: fit_time ascending
   - Observe re-sorting loading state

5. **Page Navigation**:
   - Use → arrow key to navigate pages quickly
   - Should smoothly load next 12 panels
   - No lag or freeze

**Expected**: Loading spinners visible, each operation within its Section 6 target (400-600ms for filters and sorts), no browser freeze

---

### ✅ Multi-Range Filtering

**Try This**:
1. **Production-Ready Filter**:
   - cv_score: min=0.85 (high accuracy)
   - fit_time: max=60 (fast training)
   - overfitting_gap: max=0.08 (low overfitting)
   - Observe: only panels meeting all three conditions remain (the exact count depends on the simulated scores)

2. **Best-of-Best Filter**:
   - is_best: "yes"
   - Observe: Only 3 panels (one per model)

3. **Model-Specific Filter**:
   - model_type: "XGBoost"
   - n_estimators: 200
   - learning_rate: 0.1
   - Observe: Specific configuration subset

**Expected**: Complex filters combine correctly, empty state if no matches
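
For reference, the "Production-Ready" filter corresponds to an AND of three range conditions. The sketch below reproduces it in pandas on a toy DataFrame with made-up values. `production_ready` is a hypothetical helper, and treating the bounds as inclusive is an assumption; the viewer may handle the boundaries differently.

```python
import pandas as pd

# Toy rows with made-up values; column names match the tutorial's DataFrame.
toy = pd.DataFrame({
    "cv_score":        [0.92, 0.88, 0.80, 0.95],
    "fit_time":        [12.0, 75.0, 5.0, 40.0],
    "overfitting_gap": [0.03, 0.02, 0.04, 0.11],
})

def production_ready(df, min_score=0.85, max_time=60, max_gap=0.08):
    """Keep rows that satisfy all three range conditions at once."""
    mask = (
        (df["cv_score"] >= min_score)
        & (df["fit_time"] <= max_time)
        & (df["overfitting_gap"] <= max_gap)
    )
    return df[mask]

subset = production_ready(toy)
print(subset)

# A filter that nothing satisfies yields an empty frame (the "empty state").
print(production_ready(toy, min_score=0.99).empty)
```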

---

### ✅ Multi-Column Sorting

**Try This**:
1. **Primary: Best Score, Secondary: Fastest**:
   - Sort 1: cv_score (descending)
   - Sort 2: fit_time (ascending)
   - Result: Best performing models first, ties broken by speed

2. **Primary: Model, Secondary: Score**:
   - Sort 1: model_type (ascending) - alphabetical
   - Sort 2: cv_score (descending)
   - Result: Models grouped, best within each model

3. **Three-Level Sort**:
   - Sort 1: model_type (ascending)
   - Sort 2: n_estimators (descending)
   - Sort 3: cv_score (descending)
   - Result: Complex hierarchical ordering

**Expected**: Sorts apply in order, lower priority breaks ties
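
The tie-breaking behaviour can be checked outside the viewer with pandas. A minimal sketch on made-up data, where two configurations share the top score and the faster one should come first:

```python
import pandas as pd

toy = pd.DataFrame({
    "config":   ["a", "b", "c", "d"],
    "cv_score": [0.96, 0.90, 0.96, 0.93],
    "fit_time": [30.0, 10.0, 12.0, 20.0],
})

# Primary key: cv_score descending. Secondary key: fit_time ascending.
ranked = toy.sort_values(["cv_score", "fit_time"], ascending=[False, True])
print(ranked["config"].tolist())  # ['c', 'a', 'd', 'b']
```

Configurations `a` and `c` tie on `cv_score`, so `fit_time` decides their order.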

---

### ✅ Views for Complex States

**Try This**:
1. **Create "Production Candidates" View**:
   - Filter: cv_score > 0.85, fit_time < 60, overfitting_gap < 0.08
   - Sort: cv_score DESC, fit_time ASC
   - Labels: model_type, cv_score, fit_time, is_best
   - Save as "Production Candidates"

2. **Create "Fast Models" View**:
   - Filter: fit_time < 30
   - Sort: fit_time ASC
   - Save as "Fast Models"

3. **Switch Between Views**:
   - Load "Production Candidates" → see high-quality configs
   - Load "Fast Models" → see speedy configs
   - Clear all → see full dataset

**Expected**: Views restore complete state including multi-filter/sort
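
Conceptually, a view is a named bundle of filters, sorts, and labels that can be saved and re-applied. The sketch below illustrates that idea with a plain dictionary round-tripped through JSON. The dictionary layout and `apply_view` are assumptions made for illustration; they are not the viewer's actual storage format or API.

```python
import json
import pandas as pd

# Hypothetical view layout; NOT the viewer's real JSON schema.
view = {
    "name": "Production Candidates",
    "filters": {"cv_score": {"min": 0.85}, "fit_time": {"max": 60}},
    "sorts": [["cv_score", "desc"], ["fit_time", "asc"]],
    "labels": ["model_type", "cv_score", "fit_time", "is_best"],
}

def apply_view(df, view):
    """Apply the view's range filters, then its sorts, in order."""
    out = df
    for col, bounds in view["filters"].items():
        if "min" in bounds:
            out = out[out[col] >= bounds["min"]]
        if "max" in bounds:
            out = out[out[col] <= bounds["max"]]
    if view["sorts"]:
        cols = [col for col, _ in view["sorts"]]
        ascending = [direction == "asc" for _, direction in view["sorts"]]
        out = out.sort_values(cols, ascending=ascending)
    return out

toy = pd.DataFrame({
    "model_type": ["XGBoost", "LightGBM", "RandomForest"],
    "cv_score":   [0.90, 0.95, 0.80],
    "fit_time":   [20.0, 35.0, 5.0],
})

restored = json.loads(json.dumps(view))  # simulate save, then load
print(apply_view(toy, restored)["model_type"].tolist())
```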

---

### ✅ Keyboard Navigation (Feature 4)

**With 216 Panels**:
- **→ / ←**: Navigate through 18 pages (12 panels each)
- **/**: Quick search for specific model or params
- **Esc**: Clear search
- **⌨️ button**: View all shortcuts

**Try This**:
1. Press → 5 times quickly
2. Observe smooth page transitions
3. Press ← to go back
4. Press / and search "XGBoost"
5. Press Esc to clear

**Expected**: Responsive keyboard controls, no lag
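
The page count quoted above follows from the layout chosen in Section 3. A quick arithmetic check:

```python
import math

n_panels = 216
ncol, nrow = 4, 3                      # layout used in set_default_layout
per_page = ncol * nrow
n_pages = math.ceil(n_panels / per_page)
print(per_page, n_pages)               # 12 18
```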

---

### ✅ Export Top Configurations

**Try This**:
1. **Export Top 10**:
   - Sort: cv_score DESC
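   - Filter: raise the cv_score minimum (for example 0.88) to keep only high-scoring configurations; the simulated scores cluster near the 0.96 cap, so a threshold alone will not isolate exactly 10
   - Click "Export Data (CSV)"
   - Open CSV: should contain only the rows that passed the filter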

2. **Export Production View**:
   - Load "Production Candidates" view
   - Export Data (CSV) - filtered set only
   - Export View (JSON) - save filter specification

3. **Share Configuration**:
   - Export Config (JSON)
   - Contains full display metadata
   - Can be used to recreate display

**Expected**: CSV has filtered data, JSON has complete state
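
If an exact top-N list is needed regardless of score thresholds, it can also be produced directly from the DataFrame. The sketch below uses a toy frame with made-up values; `export_top` is a hypothetical helper and is unrelated to the viewer's export buttons. The `panel` column is dropped because figure objects do not belong in a CSV.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "model_type": ["RandomForest", "XGBoost", "LightGBM", "XGBoost"],
    "cv_score":   [0.91, 0.95, 0.93, 0.84],
    "fit_time":   [40.0, 12.0, 8.0, 3.0],
    "panel":      [None, None, None, None],   # placeholder for figure objects
})

def export_top(df, n, buffer):
    """Write the n highest-scoring rows (without the panel column) as CSV."""
    top = df.nlargest(n, "cv_score").drop(columns=["panel"])
    top.to_csv(buffer, index=False)
    return top

buf = io.StringIO()                      # a file path would work the same way
top2 = export_top(toy, 2, buf)
print(buf.getvalue())
```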

---

## 6. Performance Benchmarks

**Expected Timings** (216 panels):
- Initial load: < 3s
- Single filter: < 400ms
- Multi-filter (3+): < 600ms
- Sort operation: < 400ms
- Multi-sort (3 levels): < 600ms
- Search: < 200ms
- Page navigation: < 300ms

**How to Measure**:
1. Open DevTools (F12)
2. Go to Performance tab
3. Click Record
4. Perform operation
5. Stop recording
6. Check duration
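
The DevTools steps measure the full round trip in the browser. As a rough complement, the data-side cost of a filter or sort on 216 rows can be timed in Python. This sketch uses random stand-in data and a hypothetical `time_ms` helper; it measures pandas operations only, not the viewer's rendering or callbacks, so the numbers are not comparable to the targets above.

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "cv_score": rng.uniform(0.6, 0.96, 216),   # stand-in values
    "fit_time": rng.uniform(1, 60, 216),
})

def time_ms(fn, repeats=20):
    """Median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return float(np.median(samples))

filter_ms = time_ms(lambda: toy[(toy["cv_score"] > 0.85) & (toy["fit_time"] < 50)])
sort_ms = time_ms(
    lambda: toy.sort_values(["cv_score", "fit_time"], ascending=[False, True])
)
print(f"filter: {filter_ms:.3f} ms, sort: {sort_ms:.3f} ms")
```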

---

## 7. Key Insights to Discover

1. **Best Overall Configuration**:
   - Filter: is_best = "yes"
   - Sort: cv_score DESC
   - Answer: Which model + params win?

2. **Speed vs Accuracy Trade-off**:
   - Think of it as a scatter plot of cv_score against fit_time
   - Filter: cv_score > 0.85, fit_time < 40
   - Answer: Sweet spot configurations

3. **Overfitting Analysis**:
   - Sort: overfitting_gap DESC
   - See which configs overfit most
   - Filter: overfitting_gap < 0.05
   - See well-generalized configs

4. **Model Comparison**:
   - Group by model_type
   - Compare best of each
   - Answer: Which model family performs best?
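
The model comparison can also be summarized numerically. A minimal sketch with pandas on made-up values; the output column names (`best_cv`, `mean_cv`, and so on) are illustrative choices:

```python
import pandas as pd

toy = pd.DataFrame({
    "model_type":      ["RandomForest", "RandomForest", "XGBoost", "XGBoost"],
    "cv_score":        [0.90, 0.94, 0.92, 0.88],
    "fit_time":        [30.0, 50.0, 10.0, 6.0],
    "overfitting_gap": [0.05, 0.03, 0.04, 0.09],
})

# One row per model family: best and average score, typical fit time, average gap.
summary = toy.groupby("model_type").agg(
    best_cv=("cv_score", "max"),
    mean_cv=("cv_score", "mean"),
    median_fit_time=("fit_time", "median"),
    mean_gap=("overfitting_gap", "mean"),
).sort_values("best_cv", ascending=False)
print(summary)
```

Comparing `best_cv` alongside `median_fit_time` gives a first look at the speed versus accuracy trade-off per model family.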

---

## Summary

This example demonstrates:
- ✅ Performance optimization for large datasets (200+ panels)
- ✅ Complex multi-range filtering
- ✅ Multi-column hierarchical sorting
- ✅ Views for saving complex analysis states
- ✅ Keyboard navigation for efficient browsing
- ✅ Export for sharing top configurations

**Next**: Try Example 3 (CV Fold Analysis) for modal navigation features!