# Hyperparameter Tuning - Data Science Koans

Welcome to Notebook 13: Hyperparameter Tuning!

## What You Will Learn
- Grid Search for exhaustive parameter exploration
- Random Search for efficient parameter sampling  
- Bayesian Optimization for intelligent search
- Early Stopping techniques for efficiency
- Nested Cross-Validation for unbiased evaluation
- AutoML basics for automated model selection

## Why This Matters
Hyperparameter tuning can dramatically improve model performance:
- **Performance Gains**: Tuned models often beat default settings, though the size of the gain depends on the model and dataset
- **Generalization**: Better validation performance through proper tuning
- **Efficiency**: Automated search saves time and finds better solutions
- **Robustness**: Reduces sensitivity to parameter choices
- **Competition Edge**: Essential for winning ML competitions

## Key Concepts
- **Hyperparameters**: Model configuration settings (not learned from data)
- **Search Space**: Range of possible parameter values to explore
- **Cross-Validation**: Robust evaluation during parameter search
- **Overfitting Risk**: Tuning on test data leads to overoptimistic results
- **Computational Trade-offs**: Balancing search thoroughness vs. time
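
To make the first concept concrete, here is a minimal sketch of the difference between a hyperparameter and a learned parameter. The synthetic dataset and the names `X_demo`, `y_demo`, and `chosen_c` are illustrative assumptions, not part of the koans.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# C is a hyperparameter: we choose it before training
model = LogisticRegression(C=0.5, max_iter=1000)
chosen_c = model.get_params()["C"]

# coef_ holds learned parameters: it only exists after fit()
has_coef_before = hasattr(model, "coef_")
model.fit(X_demo, y_demo)
has_coef_after = hasattr(model, "coef_")

print(f"C (set by us): {chosen_c}")
print(f"coef_ exists before fit: {has_coef_before}, after fit: {has_coef_after}")
```

Tuning searches over values like `C`; training fills in values like `coef_`.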

## Prerequisites
- Ensemble Methods (Notebook 12)
- Understanding of cross-validation
- Experience with scikit-learn model evaluation

## How to Use
1. Understand each search strategy's strengths and weaknesses
2. Implement the TODO sections with proper parameter definitions
3. Run validations to verify search implementations
4. Compare different tuning approaches
5. Learn to avoid common pitfalls like data leakage

Ready to optimize your models to their full potential? 🎯

In [None]:
# Setup - Run first!
import sys
sys.path.append('../..')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer, load_digits, make_classification
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV, 
                                     train_test_split, cross_val_score,
                                     validation_curve, learning_curve)
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint, uniform
import time

# Optional advanced libraries
try:
    from sklearn.experimental import enable_halving_search_cv
    from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV
    HALVING_AVAILABLE = True
    print("✓ Halving search methods available")
except ImportError:
    HALVING_AVAILABLE = False
    print("⚠️ Halving search not available (sklearn < 0.24)")

try:
    from skopt import BayesSearchCV
    from skopt.space import Real, Integer, Categorical
    BAYESIAN_AVAILABLE = True
    print("✓ Bayesian optimization available")
except ImportError:
    BAYESIAN_AVAILABLE = False
    print("⚠️ Bayesian optimization not available (install: pip install scikit-optimize)")

from koans.core.validator import KoanValidator
from koans.core.progress import ProgressTracker

validator = KoanValidator("13_hyperparameter_tuning")
tracker = ProgressTracker()

print("Setup complete!")
print(f"Current progress: {tracker.get_notebook_progress('13_hyperparameter_tuning')}%")

## KOAN 13.1: Grid Search - Exhaustive Parameter Search
**Objective**: Use GridSearchCV to find optimal Random Forest parameters  
**Difficulty**: Advanced

Grid Search evaluates every combination of specified parameter values. It's thorough but can be computationally expensive with large parameter spaces.

**Key Concept**: Grid Search guarantees finding the best combination within your specified parameter grid, but grows exponentially with the number of parameters.
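
Before running a grid search, it can help to count how many fits it will need. The sketch below uses scikit-learn's `ParameterGrid` on the grid suggested in this koan; the extra `max_features` entry is an assumption added only to show how quickly the count grows.

```python
from sklearn.model_selection import ParameterGrid

grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Number of combinations is the product of the list lengths: 3 * 4 * 3 * 3
n_combinations = len(ParameterGrid(grid))

# With 5-fold cross-validation, every combination is fitted 5 times
n_fits = n_combinations * 5

# Adding one more parameter with 3 values triples the cost
bigger_grid = dict(grid, max_features=["sqrt", "log2", None])
n_combinations_bigger = len(ParameterGrid(bigger_grid))

print(n_combinations, n_fits, n_combinations_bigger)  # 108 540 324
```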

In [None]:
def optimize_random_forest_with_grid_search():
    """
    Use Grid Search to optimize Random Forest hyperparameters.
    
    We'll tune: n_estimators, max_depth, min_samples_split, min_samples_leaf
    
    Returns:
        dict: Contains best parameters, best score, and search results
    """
    # Load and prepare data
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # TODO: Define parameter grid for Random Forest
    param_grid = {
        'n_estimators': None,      # [50, 100, 200]
        'max_depth': None,         # [None, 10, 20, 30]  
        'min_samples_split': None, # [2, 5, 10]
        'min_samples_leaf': None   # [1, 2, 4]
    }
    
    # TODO: Create base RandomForestClassifier
    rf = None  # RandomForestClassifier(random_state=42)
    
    # TODO: Create GridSearchCV with 5-fold cross-validation
    # Hint: GridSearchCV(rf, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    grid_search = None
    
    # Time the search
    start_time = time.time()
    
    # TODO: Fit the grid search
    # grid_search.fit(X_train, y_train)
    
    search_time = time.time() - start_time
    
    # TODO: Get best parameters and score
    best_params = None      # grid_search.best_params_
    best_cv_score = None    # grid_search.best_score_
    
    # Test the best model
    best_model = grid_search.best_estimator_
    test_score = accuracy_score(y_test, best_model.predict(X_test))
    
    return {
        'best_params': best_params,
        'best_cv_score': best_cv_score,
        'test_score': test_score,
        'search_time': search_time,
        'total_combinations': len(grid_search.cv_results_['params']) if grid_search else None
    }

@validator.koan(1, "Grid Search - Exhaustive Parameter Search", difficulty="Advanced")
def validate():
    results = optimize_random_forest_with_grid_search()
    
    assert results['best_params'] is not None, "Best parameters is None"
    assert results['best_cv_score'] is not None, "Best CV score is None"  
    assert results['test_score'] is not None, "Test score is None"
    assert results['total_combinations'] is not None, "Total combinations is None"
    
    # Check that we have reasonable results
    assert 0.8 <= results['best_cv_score'] <= 1.0, f"CV score should be reasonable, got {results['best_cv_score']:.3f}"
    assert 0.8 <= results['test_score'] <= 1.0, f"Test score should be reasonable, got {results['test_score']:.3f}"
    
    # Check that we tested multiple combinations
    # The suggested grid has 3 * 4 * 3 * 3 = 108 combinations; smaller grids pass with at least 36
    min_combinations = 36
    assert results['total_combinations'] >= min_combinations, f"Should test at least {min_combinations} combinations, got {results['total_combinations']}"
    
    print("✓ Grid Search optimization complete!")
    print(f"  - Search time: {results['search_time']:.2f} seconds")
    print(f"  - Total combinations tested: {results['total_combinations']}")
    print(f"  - Best CV score: {results['best_cv_score']:.4f}")
    print(f"  - Test score: {results['test_score']:.4f}")
    
    print(f"\n  🏆 Best Parameters Found:")
    for param, value in results['best_params'].items():
        print(f"    {param}: {value}")
    
    # Performance comparison
    cv_test_diff = abs(results['best_cv_score'] - results['test_score'])
    if cv_test_diff < 0.02:
        print(f"\n  ✓ Good generalization (CV-Test diff: {cv_test_diff:.3f})")
    else:
        print(f"\n  ⚠️ Possible overfitting (CV-Test diff: {cv_test_diff:.3f})")
        
    print(f"\n  💡 Grid Search Insights:")
    print(f"    • Exhaustive: Tests all parameter combinations")  
    print(f"    • Deterministic: Same results every run")
    print(f"    • Expensive: Time grows exponentially with parameters")

validate()

## KOAN 13.2: Random Search - Efficient Parameter Sampling
**Objective**: Use RandomizedSearchCV for faster parameter exploration  
**Difficulty**: Advanced

Random Search samples parameter combinations randomly from specified distributions. It's often more efficient than Grid Search, especially with many parameters.

**Key Concept**: With the same budget, Random Search tries more distinct values of each parameter than Grid Search, so it often finds good settings faster when only a few parameters strongly affect performance.
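
One way to see why random search can be efficient is to count how many distinct values of a single parameter each strategy tries under the same budget. This is a toy sketch on a two-parameter unit square; the budget of 9 and the variable names are assumptions chosen for illustration.

```python
import numpy as np

budget = 9  # total number of configurations each strategy may evaluate

# Grid search: a 3 x 3 grid spends the budget on only 3 values per parameter
grid_axis = np.linspace(0.0, 1.0, 3)
grid_points = np.array([(a, b) for a in grid_axis for b in grid_axis])

# Random search: 9 independent draws give 9 different values per parameter
rng = np.random.default_rng(0)
random_points = rng.uniform(0.0, 1.0, size=(budget, 2))

grid_distinct = len(np.unique(grid_points[:, 0]))
random_distinct = len(np.unique(random_points[:, 0]))

print(f"Distinct values of parameter 1: grid={grid_distinct}, random={random_distinct}")
```

If only the first parameter matters, the grid has effectively tested 3 settings while the random sample has tested 9.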

In [None]:
def compare_grid_vs_random_search():
    """
    Compare Grid Search vs Random Search on SVM hyperparameter tuning.
    
    Returns:
        dict: Comparison results including timing and performance
    """
    # Load digits dataset (more complex for demonstrating search differences)
    digits = load_digits()
    X, y = digits.data, digits.target
    
    # Use subset for faster demo
    X_subset = X[:1000]
    y_subset = y[:1000]
    
    X_train, X_test, y_train, y_test = train_test_split(
        X_subset, y_subset, test_size=0.2, random_state=42, stratify=y_subset
    )
    
    # Scale features for SVM
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # TODO: Define parameter distributions for Random Search
    param_distributions = {
        'C': None,        # uniform(0.1, 100) - continuous; scipy's uniform(loc, scale) samples from [loc, loc + scale]
        'gamma': None,    # uniform(0.001, 1) - continuous; samples from [0.001, 1.001]
        'kernel': None    # ['rbf', 'linear'] - categorical choice
    }
    
    # TODO: Create SVC model
    svm = None  # SVC(random_state=42)
    
    # Grid Search (smaller grid for comparison)
    grid_params = {'C': [0.1, 1, 10], 'gamma': [0.001, 0.01, 0.1], 'kernel': ['rbf', 'linear']}
    grid_search = GridSearchCV(svm, grid_params, cv=3, n_jobs=-1)
    
    # TODO: Create RandomizedSearchCV with 20 iterations
    # Hint: RandomizedSearchCV(svm, param_distributions, n_iter=20, cv=3, random_state=42, n_jobs=-1)
    random_search = None
    
    # Time both searches
    start = time.time()
    grid_search.fit(X_train_scaled, y_train)
    grid_time = time.time() - start
    
    start = time.time()  
    # TODO: Fit random search
    # random_search.fit(X_train_scaled, y_train)
    random_time = time.time() - start
    
    # Compare results
    grid_score = grid_search.score(X_test_scaled, y_test)
    random_score = random_search.score(X_test_scaled, y_test) if random_search else 0
    
    return {
        'grid_time': grid_time,
        'random_time': random_time,
        'grid_score': grid_score,
        'random_score': random_score,
        'grid_combinations': len(grid_search.cv_results_['params']),
        'random_combinations': 20,
        'grid_best_params': grid_search.best_params_,
        'random_best_params': random_search.best_params_ if random_search else None
    }

@validator.koan(2, "Random Search - Efficient Parameter Sampling", difficulty="Advanced")
def validate():
    results = compare_grid_vs_random_search()
    
    assert results['random_score'] is not None and results['random_score'] > 0, "Random search not implemented"
    assert results['random_best_params'] is not None, "Random search best params is None"
    assert 0.7 <= results['grid_score'] <= 1.0, f"Grid score should be reasonable, got {results['grid_score']:.3f}"
    assert 0.7 <= results['random_score'] <= 1.0, f"Random score should be reasonable, got {results['random_score']:.3f}"
    
    print("✓ Random Search vs Grid Search comparison complete!")
    print(f"\n  ⏱️  Timing Comparison:")
    print(f"    Grid Search: {results['grid_time']:.2f}s ({results['grid_combinations']} combinations)")
    print(f"    Random Search: {results['random_time']:.2f}s ({results['random_combinations']} combinations)")
    
    speedup = results['grid_time'] / results['random_time'] if results['random_time'] > 0 else 1
    print(f"    Grid time / Random time: {speedup:.1f}x (above 1 means Random Search was faster)")
    
    print(f"\n  🎯 Performance Comparison:")
    print(f"    Grid Search: {results['grid_score']:.4f}")
    print(f"    Random Search: {results['random_score']:.4f}")
    
    score_diff = results['random_score'] - results['grid_score']
    if abs(score_diff) < 0.02:
        print(f"    Similar performance (diff: {score_diff:+.3f})")
    elif score_diff > 0:
        print(f"    Random Search wins! (+{score_diff:.3f})")
    else:
        print(f"    Grid Search wins! ({score_diff:.3f})")
    
    print(f"\n  💡 Random Search Benefits:")
    print(f"    • Faster with many parameters")
    print(f"    • Can find good solutions quickly")
    print(f"    • Explores diverse parameter regions")
    print(f"    • Easy to parallelize")

validate()

## KOAN 13.3: Parameter Distributions - Defining Search Spaces
**Objective**: Learn to define appropriate parameter distributions  
**Difficulty**: Advanced

Choosing good parameter distributions is crucial for effective random search. Different parameters need different distribution types (uniform, log-uniform, discrete, etc.).

**Key Concept**: Parameter distributions should reflect prior knowledge about reasonable parameter ranges and scales.
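
For parameters that span several orders of magnitude (such as an SVM's `C`), the choice between a uniform and a log-uniform distribution changes where the samples land. The sketch below compares the two with `scipy.stats`; the range from 0.001 to 100 is an assumption picked for illustration.

```python
from scipy.stats import loguniform, uniform

n_samples = 10_000

# uniform(loc, scale) samples from [loc, loc + scale], here [0.001, 100.001]
uniform_c = uniform(loc=0.001, scale=100).rvs(size=n_samples, random_state=0)

# loguniform(a, b) samples from [a, b] with equal probability per decade
log_c = loguniform(1e-3, 1e2).rvs(size=n_samples, random_state=0)

# Fraction of samples that fall below 1.0
frac_uniform_small = (uniform_c < 1.0).mean()
frac_log_small = (log_c < 1.0).mean()

print(f"Uniform:     {frac_uniform_small:.1%} of samples below 1.0")
print(f"Log-uniform: {frac_log_small:.1%} of samples below 1.0")
```

The uniform draw puts roughly 1% of its samples below 1.0, while the log-uniform draw puts roughly 60% there (three of the five decades), so small values actually get explored.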

In [None]:
def design_parameter_distributions():
    """
    Design appropriate parameter distributions for Random Forest tuning.
    
    Returns:
        dict: Parameter distributions and tuning results
    """
    # Load breast cancer dataset
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # TODO: Define smart parameter distributions
    param_distributions = {
        # Number of trees: discrete integers in reasonable range
        'n_estimators': None,           # randint(50, 301) - random integers from 50 to 300
        
        # Max depth: include None (unlimited) and reasonable depths
        'max_depth': None,              # [None] + list(range(5, 31, 5)) - None plus [5,10,15,20,25,30]
        
        # Minimum samples to split: small integers
        'min_samples_split': None,      # randint(2, 21) - integers from 2 to 20
        
        # Minimum samples in leaf: small integers  
        'min_samples_leaf': None,       # randint(1, 11) - integers from 1 to 10
        
        # Maximum features: different strategies
        'max_features': None,           # ['sqrt', 'log2', None, 0.5, 0.7, 0.9]
        
        # Bootstrap: boolean choice
        'bootstrap': None               # [True, False]
    }
    
    # TODO: Create RandomForestClassifier
    rf = None  # RandomForestClassifier(random_state=42)
    
    # TODO: Create RandomizedSearchCV with 100 iterations
    # Hint: RandomizedSearchCV(rf, param_distributions, n_iter=100, cv=5, random_state=42, n_jobs=-1, scoring='accuracy')
    random_search = None
    
    # TODO: Fit the search
    # random_search.fit(X_train, y_train)
    
    # Analyze the search results
    best_params = random_search.best_params_ if random_search else {}
    best_score = random_search.best_score_ if random_search else 0
    test_score = random_search.score(X_test, y_test) if random_search else 0
    
    return {
        'param_distributions': param_distributions,
        'best_params': best_params,
        'best_cv_score': best_score,
        'test_score': test_score,
        'n_iter': 100
    }

@validator.koan(3, "Parameter Distributions - Defining Search Spaces", difficulty="Advanced")
def validate():
    results = design_parameter_distributions()
    
    assert results['best_params'] is not None and len(results['best_params']) > 0, "Best params not found"
    assert results['best_cv_score'] > 0, "Best CV score not calculated" 
    assert results['test_score'] > 0, "Test score not calculated"
    assert 0.8 <= results['best_cv_score'] <= 1.0, f"CV score should be reasonable, got {results['best_cv_score']:.3f}"
    
    print("✓ Parameter distribution design complete!")
    print(f"  - Search iterations: {results['n_iter']}")
    print(f"  - Best CV score: {results['best_cv_score']:.4f}")
    print(f"  - Test score: {results['test_score']:.4f}")
    
    print(f"\n  🏆 Optimized Parameters:")
    for param, value in results['best_params'].items():
        print(f"    {param}: {value}")
    
    print(f"\n  📊 Distribution Design Insights:")
    print(f"    • Use randint() for integer parameters")
    print(f"    • Use uniform() for continuous parameters") 
    print(f"    • Include None/default values where appropriate")
    print(f"    • Consider log-scale for parameters spanning orders of magnitude")
    print(f"    • Mix discrete choices with continuous ranges")
    
    # Check if we got sensible parameter values
    n_est = results['best_params'].get('n_estimators', 0)
    if 50 <= n_est <= 300:
        print(f"    ✓ Reasonable n_estimators: {n_est}")
    
    max_depth = results['best_params'].get('max_depth')
    if max_depth is None or (isinstance(max_depth, int) and 5 <= max_depth <= 30):
        print(f"    ✓ Reasonable max_depth: {max_depth}")

validate()

## KOAN 13.4: Bayesian Optimization - Smart Parameter Search  
**Objective**: Use Bayesian optimization for intelligent hyperparameter search  
**Difficulty**: Advanced

Bayesian optimization uses previous evaluation results to intelligently choose the next parameters to try. It's especially effective for expensive model training.

**Key Concept**: Unlike random search, Bayesian optimization learns from past evaluations to focus on promising regions of parameter space.
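
The loop behind this idea can be sketched by hand on a one-dimensional toy problem. This is not how `BayesSearchCV` is implemented internally; it is a simplified illustration that assumes a Gaussian process surrogate, an expected-improvement acquisition function, and a made-up objective `toy_objective` standing in for a cross-validation error.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def toy_objective(x):
    """Hypothetical stand-in for a validation error; minimum at x = 2."""
    return (x - 2.0) ** 2

def expected_improvement(mu, sigma, best_y):
    """Expected improvement for minimization, given surrogate mean and std."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    improvement = best_y - mu
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 5.0, 201).reshape(-1, 1)

# Start from a few random evaluations
X_seen = rng.uniform(0.0, 5.0, size=(4, 1))
y_seen = toy_objective(X_seen).ravel()

for _ in range(12):
    # 1. Fit the surrogate model to everything evaluated so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                  normalize_y=True, random_state=0)
    gp.fit(X_seen, y_seen)

    # 2. Score each candidate by how much improvement it is expected to give
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, y_seen.min())

    # 3. Evaluate the most promising candidate and record the result
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X_seen = np.vstack([X_seen, x_next])
    y_seen = np.append(y_seen, toy_objective(x_next).ravel())

best_x = float(X_seen[np.argmin(y_seen), 0])
print(f"Best x found: {best_x:.3f} after {len(y_seen)} evaluations")
```

Each iteration trades off exploring uncertain regions (large `sigma`) against exploiting regions the surrogate already predicts to be good (low `mu`).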

In [None]:
def bayesian_hyperparameter_optimization():
    """
    Use Bayesian optimization for intelligent hyperparameter search.
    Falls back to RandomizedSearchCV if BayesSearchCV not available.
    
    Returns:
        dict: Bayesian optimization results and comparison
    """
    # Load breast cancer dataset
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    if BAYESIAN_AVAILABLE:
        # TODO: Define search space using skopt
        search_space = {
            'n_estimators': None,      # Integer(50, 300)
            'max_depth': None,         # Integer(1, 30) 
            'min_samples_split': None, # Integer(2, 20)
            'min_samples_leaf': None,  # Integer(1, 10)
            'max_features': None       # Categorical(['sqrt', 'log2'])
        }
        
        # TODO: Create RandomForestClassifier
        rf = None  # RandomForestClassifier(random_state=42)
        
        # TODO: Create BayesSearchCV
        # Hint: BayesSearchCV(rf, search_space, n_iter=50, cv=5, random_state=42, n_jobs=-1)
        bayes_search = None
        
        # TODO: Fit the Bayesian search
        # bayes_search.fit(X_train, y_train)
        
        best_params = bayes_search.best_params_ if bayes_search else {}
        best_score = bayes_search.best_score_ if bayes_search else 0
        test_score = bayes_search.score(X_test, y_test) if bayes_search else 0
        search_type = "Bayesian"
        
    else:
        # Fallback to RandomizedSearchCV
        print("Using RandomizedSearchCV fallback...")
        param_distributions = {
            'n_estimators': randint(50, 301),
            'max_depth': randint(1, 31),
            'min_samples_split': randint(2, 21), 
            'min_samples_leaf': randint(1, 11),
            'max_features': ['sqrt', 'log2']
        }
        
        rf = RandomForestClassifier(random_state=42)
        bayes_search = RandomizedSearchCV(rf, param_distributions, n_iter=50, cv=5, random_state=42, n_jobs=-1)
        bayes_search.fit(X_train, y_train)
        
        best_params = bayes_search.best_params_
        best_score = bayes_search.best_score_
        test_score = bayes_search.score(X_test, y_test)
        search_type = "Randomized (fallback)"
    
    return {
        'search_type': search_type,
        'best_params': best_params,
        'best_cv_score': best_score,
        'test_score': test_score,
        'bayesian_available': BAYESIAN_AVAILABLE
    }

@validator.koan(4, "Bayesian Optimization - Smart Parameter Search", difficulty="Advanced")
def validate():
    results = bayesian_hyperparameter_optimization()
    
    assert results['best_params'] is not None, "Best parameters is None"
    assert results['best_cv_score'] > 0, "Best CV score not found"
    assert results['test_score'] > 0, "Test score not found"
    assert 0.8 <= results['best_cv_score'] <= 1.0, f"CV score should be reasonable, got {results['best_cv_score']:.3f}"
    
    print(f"✓ {results['search_type']} optimization complete!")
    print(f"  - Method: {results['search_type']}")
    print(f"  - Best CV score: {results['best_cv_score']:.4f}")
    print(f"  - Test score: {results['test_score']:.4f}")
    
    print(f"\n  🏆 Best Parameters:")
    for param, value in results['best_params'].items():
        print(f"    {param}: {value}")
    
    if results['bayesian_available']:
        print(f"\n  🧠 Bayesian Optimization Benefits:")
        print(f"    • Learns from previous evaluations")
        print(f"    • Focuses on promising parameter regions")
        print(f"    • Often needs fewer evaluations than random search")
        print(f"    • Balances exploration vs exploitation")
        print(f"    • Great for expensive model training")
    else:
        print(f"\n  📦 To use Bayesian optimization:")
        print(f"    pip install scikit-optimize")
        print(f"    • Uses past results to guide the search, unlike random search")
        print(f"    • Most useful when each evaluation is expensive")
        
    print(f"\n  💡 When to use Bayesian optimization:")
    print(f"    • Model training is expensive (>1 minute per iteration)")
    print(f"    • Complex parameter interactions exist") 
    print(f"    • You have limited evaluation budget")

validate()

## KOAN 13.5: Early Stopping - Training Efficiency
**Objective**: Implement early stopping to prevent overfitting and save time  
**Difficulty**: Advanced

Early stopping monitors validation performance during training and stops when performance plateaus or degrades, preventing overfitting and saving computation time.

**Key Concept**: Early stopping is crucial for iterative algorithms like gradient boosting, neural networks, and any model trained with validation monitoring.
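
The stopping rule itself is simple enough to sketch in plain Python. The function below is an illustrative patience rule, not scikit-learn's exact implementation; `early_stopping_round` and the example losses are hypothetical.

```python
def early_stopping_round(val_losses, patience=3, tol=0.0):
    """Return (round where training stops, round with the best loss).

    Training stops once `patience` consecutive rounds fail to improve
    the best validation loss by more than `tol`.
    """
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss - tol:
            # New best: remember it and reset the patience window
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            return i, best_round
    # Never ran out of patience: trained for every round
    return len(val_losses) - 1, best_round

# Validation loss improves, then plateaus for three rounds
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopped_at, best_round = early_stopping_round(losses, patience=3)
print(stopped_at, best_round)  # 5 2
```

Note that the rule stops at round 5 and never sees the later improvement to 0.5, which is the trade-off a small patience value makes.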

In [None]:
def implement_early_stopping():
    """
    Demonstrate early stopping with Gradient Boosting.
    
    Returns:
        dict: Results with and without early stopping
    """
    # Create a larger dataset to demonstrate early stopping benefits
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, 
                               n_redundant=10, random_state=42)
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # Further split training data for validation
    X_train_split, X_val, y_train_split, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
    )
    
    # TODO: Train model WITHOUT early stopping
    gb_no_early = None  # GradientBoostingClassifier(n_estimators=200, random_state=42, verbose=0)
    
    # TODO: Fit the model
    # gb_no_early.fit(X_train_split, y_train_split)
    
    # TODO: Train model WITH early stopping
    gb_early = None  # GradientBoostingClassifier(n_estimators=200, validation_fraction=0.2, 
                    #                             n_iter_no_change=10, random_state=42, verbose=0)
    
    # TODO: Fit with early stopping
    # gb_early.fit(X_train, y_train)  # Uses internal validation split
    
    # Calculate performance metrics
    no_early_score = gb_no_early.score(X_test, y_test) if gb_no_early else 0
    early_score = gb_early.score(X_test, y_test) if gb_early else 0
    
    # Get number of estimators actually used
    no_early_n_est = gb_no_early.n_estimators if gb_no_early else 0
    early_n_est = gb_early.n_estimators_ if gb_early and hasattr(gb_early, 'n_estimators_') else (gb_early.n_estimators if gb_early else 0)
    
    return {
        'no_early_stopping_score': no_early_score,
        'early_stopping_score': early_score,
        'no_early_n_estimators': no_early_n_est,
        'early_n_estimators': early_n_est,
        'estimators_saved': no_early_n_est - early_n_est if early_n_est < no_early_n_est else 0
    }

@validator.koan(5, "Early Stopping - Training Efficiency", difficulty="Advanced")
def validate():
    results = implement_early_stopping()
    
    assert results['no_early_stopping_score'] > 0, "No early stopping model not trained"
    assert results['early_stopping_score'] > 0, "Early stopping model not trained"
    assert 0.7 <= results['no_early_stopping_score'] <= 1.0, "No early stopping score should be reasonable"
    assert 0.7 <= results['early_stopping_score'] <= 1.0, "Early stopping score should be reasonable"
    
    print("✓ Early stopping comparison complete!")
    print(f"\n  📊 Performance Comparison:")
    print(f"    Without early stopping: {results['no_early_stopping_score']:.4f} ({results['no_early_n_estimators']} trees)")
    print(f"    With early stopping: {results['early_stopping_score']:.4f} ({results['early_n_estimators']} trees)")
    
    if results['estimators_saved'] > 0:
        saved_pct = (results['estimators_saved'] / results['no_early_n_estimators']) * 100
        print(f"    Trees saved: {results['estimators_saved']} ({saved_pct:.1f}% reduction)")
    
    score_diff = results['early_stopping_score'] - results['no_early_stopping_score']
    if score_diff >= 0:
        print(f"    Early stopping performed equally well or better!")
    else:
        print(f"    Small performance trade-off: {score_diff:.4f}")
    
    print(f"\n  ⚡ Early Stopping Benefits:")
    print(f"    • Helps limit overfitting")
    print(f"    • Saves computational time")
    print(f"    • Reduces model complexity")
    print(f"    • Built into many algorithms (GBM, XGBoost, neural networks)")
    
    print(f"\n  ⚙️  Key Parameters:")
    print(f"    • n_iter_no_change: Stop after N iterations without improvement")
    print(f"    • validation_fraction: Fraction of training data for validation")
    print(f"    • tol: Minimum improvement threshold")
    
    print(f"\n  💡 Best Practices:")
    print(f"    • Consider it for any iterative algorithm")
    print(f"    • Monitor validation loss, not training loss") 
    print(f"    • Set reasonable patience (n_iter_no_change)")
    print(f"    • Consider restoring best weights")

validate()

## KOAN 13.6: Nested Cross-Validation - Unbiased Model Evaluation
**Objective**: Implement nested CV to avoid data leakage in hyperparameter tuning  
**Difficulty**: Advanced

Nested cross-validation provides unbiased performance estimates when doing hyperparameter tuning. The outer loop evaluates performance, the inner loop tunes parameters.

**Key Concept**: Using the same data for hyperparameter tuning and performance evaluation leads to overoptimistic results. Nested CV separates these processes.
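
Writing the two loops out explicitly can make the separation easier to see. This is a minimal sketch that assumes a decision tree and a tiny grid to keep it fast; passing a `GridSearchCV` object to `cross_val_score` performs the same nesting in one call.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = []

for train_idx, test_idx in outer_cv.split(X, y):
    # Inner loop: tune hyperparameters using only the outer training fold
    inner_search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8]},
        cv=3,
        scoring="accuracy",
    )
    inner_search.fit(X[train_idx], y[train_idx])

    # Outer loop: score the tuned model on data the tuning never saw
    outer_scores.append(inner_search.score(X[test_idx], y[test_idx]))

print(f"Nested CV accuracy: {np.mean(outer_scores):.3f} (+/- {np.std(outer_scores):.3f})")
```

Each outer fold may pick different hyperparameters; the nested score estimates the performance of the whole tuning procedure, not of one fixed configuration.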

In [None]:
def nested_cross_validation():
    """
    Implement nested cross-validation for unbiased hyperparameter tuning evaluation.
    
    Returns:
        dict: Nested CV results and comparison with simple CV
    """
    # Load breast cancer dataset
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target
    
    # TODO: Define parameter grid for tuning
    param_grid = {
        'n_estimators': None,      # [50, 100, 200]
        'max_depth': None,         # [None, 10, 20]
        'min_samples_split': None  # [2, 5, 10]
    }
    
    # TODO: Create base model
    rf = None  # RandomForestClassifier(random_state=42)
    
    # Inner CV: For hyperparameter tuning (3-fold)  
    # TODO: Create GridSearchCV for inner loop
    inner_cv = None  # GridSearchCV(rf, param_grid, cv=3, scoring='accuracy')
    
    # Outer CV: For performance evaluation (5-fold)
    # TODO: Calculate nested cross-validation scores
    # Hint: Use cross_val_score(inner_cv, X, y, cv=5)
    nested_scores = None
    
    # Compare with non-nested (biased) approach
    # This reuses the same data for tuning and evaluation - BAD!
    simple_cv = GridSearchCV(rf, param_grid, cv=5, scoring='accuracy')
    simple_cv.fit(X, y)
    biased_score = simple_cv.best_score_
    
    # Calculate statistics
    nested_mean = np.mean(nested_scores) if nested_scores is not None else 0
    nested_std = np.std(nested_scores) if nested_scores is not None else 0
    
    return {
        'nested_scores': nested_scores,
        'nested_mean': nested_mean,
        'nested_std': nested_std,
        'biased_score': biased_score,
        'bias': biased_score - nested_mean,
        'best_params': simple_cv.best_params_
    }

@validator.koan(6, "Nested Cross-Validation - Unbiased Model Evaluation", difficulty="Advanced")
def validate():
    results = nested_cross_validation()
    
    assert results['nested_scores'] is not None, "Nested CV scores not calculated"
    assert len(results['nested_scores']) == 5, "Should have 5 outer CV scores"
    assert 0.7 <= results['nested_mean'] <= 1.0, f"Nested mean should be reasonable, got {results['nested_mean']:.3f}"
    assert 0.7 <= results['biased_score'] <= 1.0, f"Biased score should be reasonable, got {results['biased_score']:.3f}"
    
    print("✓ Nested cross-validation analysis complete!")
    print(f"\n  🔄 Nested CV Results (Unbiased):")
    print(f"    Mean accuracy: {results['nested_mean']:.4f} (±{results['nested_std']:.4f})")
    print(f"    Individual scores: {[f'{s:.3f}' for s in results['nested_scores']]}")
    
    print(f"\n  ⚠️  Simple CV Results (Biased):")
    print(f"    Mean accuracy: {results['biased_score']:.4f}")
    print(f"    Optimistic bias: {results['bias']:+.4f}")
    
    if results['bias'] > 0.01:
        print(f"    📈 Significant overestimation detected!")
    else:
        print(f"    ✓ Minimal bias (dataset may be easy)")
    
    print(f"\n  🏆 Best Parameters Found:")
    for param, value in results['best_params'].items():
        print(f"    {param}: {value}")
    
    print(f"\n  💡 Why Nested CV Matters:")
    print(f"    • Prevents data leakage during hyperparameter tuning")
    print(f"    • Provides unbiased performance estimates")
    print(f"    • Essential for model comparison and selection")
    print(f"    • Required for reliable performance reporting")
    
    print(f"\n  📋 Nested CV Process:")
    print(f"    1. Outer loop: Split data for performance evaluation")
    print(f"    2. Inner loop: Tune hyperparameters on training portion")
    print(f"    3. Evaluate tuned model on validation portion")
    print(f"    4. Repeat for all outer folds")
    print(f"    5. Average performance across outer folds")
    
    print(f"\n  ⚠️  Common Mistake to Avoid:")
    print(f"    • Never use the same data for tuning AND evaluation")
    print(f"    • This leads to overoptimistic performance estimates")
    print(f"    • Use nested CV for honest performance reporting")

validate()

## KOAN 13.7: AutoML Basics - Automated Model Selection
**Objective**: Understand automated machine learning concepts and implementation  
**Difficulty**: Advanced  

AutoML automates the machine learning pipeline: feature preprocessing, algorithm selection, hyperparameter tuning, and ensemble creation. It lowers the barrier to applying ML and can match or outperform manual tuning on some tasks.

**Key Concept**: AutoML systems combine multiple techniques (meta-learning, Bayesian optimization, genetic algorithms) to automate the entire ML workflow.
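
When preprocessing is part of the automated workflow, one common way to keep it inside cross-validation is to tune a `Pipeline` rather than a bare model. The sketch below assumes an SVM with a small grid; the step names `scaler` and `clf` are arbitrary labels chosen for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The scaler is refit on the training portion of every CV fold,
# so validation folds never influence the scaling statistics
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", SVC(random_state=42)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", "auto"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(f"Best params: {search.best_params_}")
print(f"Test accuracy: {test_accuracy:.4f}")
```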

In [None]:
def simulate_automl_approach():
    """
    Simulate an AutoML approach by automatically trying multiple algorithms
    and selecting the best one through automated hyperparameter tuning.
    
    Returns:
        dict: AutoML results including best model and comparison
    """
    # Load breast cancer dataset
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
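    # Scale features for algorithms that need it
    # (Note: fitting the scaler before cross-validation lets validation folds
    # influence the scaling; wrapping the scaler in a Pipeline avoids this)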
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # TODO: Define candidate algorithms with their parameter grids
    algorithms = {
        'RandomForest': {
            'model': None,  # RandomForestClassifier(random_state=42)
            'params': None, # {'n_estimators': [50, 100], 'max_depth': [None, 10, 20]}
            'scaled': False  # Doesn't need scaling
        },
        'SVM': {
            'model': None,  # SVC(random_state=42)
            'params': None, # {'C': [0.1, 1, 10], 'gamma': ['scale', 'auto']}
            'scaled': True   # Needs scaling
        },
        'LogisticRegression': {
            'model': None,  # LogisticRegression(random_state=42, max_iter=1000)
            'params': None, # {'C': [0.1, 1, 10], 'solver': ['liblinear', 'lbfgs']}
            'scaled': True   # Needs scaling
        }
    }
    
    # Automated model selection and tuning
    results = {}
    best_cv_score = 0   # selection criterion: cross-validated score on training data
    best_score = 0      # test score of the selected model, used for reporting only
    best_model_name = None
    best_model = None
    
    for name, config in algorithms.items():
        if config['model'] is not None and config['params'] is not None:
            # Choose appropriate data (scaled or not)
            X_train_use = X_train_scaled if config['scaled'] else X_train
            X_test_use = X_test_scaled if config['scaled'] else X_test
            
            # TODO: Perform grid search for this algorithm
            grid_search = None  # GridSearchCV(config['model'], config['params'], cv=5, scoring='accuracy')
            
            # TODO: Fit grid search
            # grid_search.fit(X_train_use, y_train)
            
            # Skip this algorithm until the TODOs above are completed
            if grid_search is None:
                continue
            
            cv_score = grid_search.best_score_
            # Evaluate on the test set for reporting only, never for selection
            test_score = grid_search.score(X_test_use, y_test)
            
            results[name] = {
                'best_params': grid_search.best_params_,
                'cv_score': cv_score,
                'test_score': test_score,
                'model': grid_search.best_estimator_
            }
            
            # Select the winner by CV score; selecting by test score would leak test data
            if cv_score > best_cv_score:
                best_cv_score = cv_score
                best_score = test_score
                best_model_name = name
                best_model = grid_search.best_estimator_
    
    return {
        'algorithm_results': results,
        'best_model_name': best_model_name,
        'best_model': best_model,
        'best_score': best_score,
        'algorithms_tested': len(results)  # only algorithms that were actually searched
    }

@validator.koan(7, "AutoML Basics - Automated Model Selection", difficulty="Advanced")
def validate():
    results = simulate_automl_approach()
    
    assert results['algorithm_results'] is not None, "Algorithm results not generated"
    assert results['best_model_name'] is not None, "Best model not selected"
    assert results['best_score'] > 0, "Best score not calculated"
    assert 0.8 <= results['best_score'] <= 1.0, f"Best score should be reasonable, got {results['best_score']:.3f}"
    
    print("✓ AutoML simulation complete!")
    print(f"  - Algorithms tested: {results['algorithms_tested']}")
    print(f"  - Best algorithm: {results['best_model_name']}")
    print(f"  - Best test score: {results['best_score']:.4f}")
    
    print(f"\n  📊 Algorithm Comparison:")
    for name, result in results['algorithm_results'].items():
        if result['test_score'] > 0:
            print(f"    {name}:")
            print(f"      CV Score: {result['cv_score']:.4f}")
            print(f"      Test Score: {result['test_score']:.4f}")
            print(f"      Best Params: {result['best_params']}")
    
    print(f"\n  🏆 Winning Model: {results['best_model_name']}")
    
    print(f"\n  🤖 Real AutoML Systems:")
    print(f"    • Auto-sklearn: Automated sklearn pipeline")
    print(f"    • TPOT: Genetic programming for ML pipelines")
    print(f"    • H2O AutoML: Distributed AutoML platform")
    print(f"    • Google AutoML: Cloud-based AutoML")
    print(f"    • Azure AutoML: Microsoft's AutoML service")
    
    print(f"\n  💡 AutoML Benefits:")
    print(f"    • Democratizes machine learning")
    print(f"    • Finds good models quickly")
    print(f"    • Handles algorithm selection automatically")
    print(f"    • Can discover unexpectedly good models")
    print(f"    • Saves expert time for harder problems")
    
    print(f"\n  ⚠️  AutoML Limitations:")
    print(f"    • May not capture domain expertise")
    print(f"    • Can be computationally expensive")
    print(f"    • Less interpretable model selection")
    print(f"    • May overfit to validation set with extensive search")

validate()

## 🎉 Congratulations!

You have mastered hyperparameter tuning - the key to unlocking your models' full potential!

### What You've Mastered
- ✅ **Grid Search**: Exhaustive parameter space exploration
- ✅ **Random Search**: Efficient parameter sampling strategies  
- ✅ **Parameter Distributions**: Smart search space design
- ✅ **Bayesian Optimization**: Intelligent, adaptive parameter search
- ✅ **Early Stopping**: Preventing overfitting and saving computation
- ✅ **Nested Cross-Validation**: Unbiased model evaluation methodology
- ✅ **AutoML Concepts**: Understanding automated machine learning

### Key Insights Gained
1. **Search Strategy Matters**: Different approaches for different scenarios
2. **Computational Trade-offs**: Balance thoroughness vs. efficiency
3. **Avoid Data Leakage**: Separate tuning from evaluation
4. **Smart Distributions**: Prior knowledge improves search efficiency
5. **Automation Potential**: AutoML can find surprisingly good solutions
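
As a reminder of what the "Smart Distributions" insight looks like in practice, here is a minimal sketch of a random search that samples `C` and `gamma` on a log scale. It assumes an RBF `SVC` on the breast cancer data; the ranges and `n_iter=20` are illustrative choices, not tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Prior knowledge: C and gamma matter across orders of magnitude, so a
# log-uniform distribution spreads samples evenly over each decade.
param_distributions = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_distributions,
    n_iter=20,          # fixed budget, independent of how wide the ranges are
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print(f"Best params: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.4f}")
```

A uniform distribution over the same range would spend almost all of its samples on values between 10 and 100, leaving the small values of `C` nearly unexplored.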

### Performance Impact
- 🎯 **Up to 10-30% accuracy gains** from proper tuning, depending on the model and dataset
- ⚡ **Up to 10-100x speedup** over exhaustive grid search with smart search strategies  
- 🛡️ **Robust evaluation** through nested cross-validation
- 🤖 **Automated workflows** reducing manual effort

### Next Steps
- **Notebook 14**: Model Selection and Pipelines (production workflows!)
- **Advanced**: Neural architecture search, multi-objective optimization
- **Practice**: Apply to your own models and datasets

### Real-World Applications
- **Competitions**: Essential for Kaggle and ML contests
- **Production**: Automated model improvement and monitoring
- **Research**: Reproducible and fair model comparisons
- **Business**: Maximizing ROI from ML investments

You now have the tools to systematically optimize any machine learning model! 🚀

In [None]:
# Final Progress Check
progress = tracker.get_notebook_progress('13_hyperparameter_tuning')
print(f"\n📊 Your Progress: {progress}% complete!")

if progress == 100:
    print("🎉 Exceptional! You've mastered all hyperparameter tuning techniques!")
    print("🎯 Ready for Notebook 14: Model Selection and Pipelines")
elif progress >= 75:
    print("🌟 Outstanding progress! Almost finished with tuning mastery.")
elif progress >= 50:
    print("💪 Great work! You're building powerful optimization skills.")
else:
    print("🚀 Keep going! Each technique builds sophisticated tuning expertise.")

print(f"\n📈 Overall course progress:")
total_notebooks = 15
completed_notebooks = len([nb for nb in range(1, 14) if tracker.get_notebook_progress(f'{nb:02d}_*') == 100])
print(f"   Completed notebooks: {completed_notebooks}/{total_notebooks}")
print(f"   Course progress: {(completed_notebooks/total_notebooks)*100:.1f}%")

print(f"\n🎯 Hyperparameter Tuning Mastery Achieved!")
print(f"   Your models will never be the same! ⚡")