# ⚙️ Hyperparameter Tuning

**Author**: Data Science Master System  
**Difficulty**: ⭐⭐ Intermediate  
**Time**: 45 minutes  
**Prerequisites**: 08_feature_engineering

## Learning Objectives
- Grid Search vs Random Search
- Bayesian Optimization with Optuna
- Cross-validation strategies
- Early stopping

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

## 1. Load Data

In [None]:
X, y = load_breast_cancer(return_X_y=True)
print(f"Data: {X.shape}")
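
The searches below report a cross-validated score on the same data used to pick the hyperparameters, so that score tends to be optimistic. A minimal sketch of one alternative, assuming a stratified 80/20 hold-out split and shuffled stratified folds; the names `X_train`, `X_test`, and `cv` are illustrative and are not used elsewhere in this notebook:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Shuffled, stratified folds for use inside the search (hypothetical choice)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
for fold, (train_idx, valid_idx) in enumerate(cv.split(X_train, y_train)):
    fold_sizes.append(len(valid_idx))
    # Share of the positive class in each validation fold
    print(f"Fold {fold}: {len(valid_idx)} rows, positive rate {y_train[valid_idx].mean():.3f}")

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
```

Passing `cv=cv` and fitting on `X_train, y_train` in the searches below would leave `X_test` untouched for a final check with `best_estimator_.score(X_test, y_test)`.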

## 2. Grid Search

In [None]:
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5]
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)
grid_search.fit(X, y)

print(f"✅ Best params: {grid_search.best_params_}")
print(f"✅ Best score: {grid_search.best_score_:.4f}")
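
To see why grid search gets slow, it helps to count the model fits. The sketch below enumerates the same grid with the standard library only; `cv_folds`, `candidates`, and `total_fits` are names introduced here for illustration:

```python
from itertools import product

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5]
}
cv_folds = 5

# Every combination of the listed values is one candidate
candidates = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]
n_candidates = len(candidates)

# Each candidate is fitted once per fold
total_fits = n_candidates * cv_folds

print(f"Candidates: {n_candidates}")      # 3 * 3 * 2 = 18
print(f"Cross-validation fits: {total_fits}")  # 18 * 5 = 90
```

Adding one more parameter with three values would triple both numbers, which is why exhaustive grids only suit small spaces.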

## 3. Random Search

In [None]:
from scipy.stats import randint

# scipy's randint samples integers from [low, high), so the upper bound is excluded
param_dist = {
    'n_estimators': randint(50, 300),
    'max_depth': randint(3, 20),
    'min_samples_split': randint(2, 10)
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=20,  # number of sampled candidates, independent of the space size
    cv=5,
    scoring='accuracy',
    random_state=42,  # make the sampled candidates reproducible
    n_jobs=-1
)
random_search.fit(X, y)

print(f"✅ Best params: {random_search.best_params_}")
print(f"✅ Best score: {random_search.best_score_:.4f}")
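
One detail that is easy to miss: `scipy.stats.randint(low, high)` draws from a half-open range, so `high` itself is never sampled. A small sketch that checks this empirically; the variable names are illustrative:

```python
from scipy.stats import randint

# Same distribution as 'n_estimators' in the random search above
n_estimators_dist = randint(50, 300)

# Draw many samples with a fixed seed and inspect the observed range
samples = n_estimators_dist.rvs(size=10_000, random_state=42)
observed_min = int(samples.min())
observed_max = int(samples.max())

print(f"Observed range: {observed_min} to {observed_max}")
print(f"Upper bound 300 sampled: {300 in samples}")
```

To allow 300 as a value, the distribution would need to be `randint(50, 301)`.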

## 4. Optuna (Bayesian Optimization)

In [None]:
try:
    import optuna
    optuna.logging.set_verbosity(optuna.logging.WARNING)

    def objective(trial):
        # Search space mirrors the random search; suggest_int includes both bounds
        params = {
            'n_estimators': trial.suggest_int('n_estimators', 50, 300),
            'max_depth': trial.suggest_int('max_depth', 3, 20),
            'min_samples_split': trial.suggest_int('min_samples_split', 2, 10)
        }
        model = RandomForestClassifier(**params, random_state=42)
        # Mean 5-fold accuracy, the same metric the grid and random searches use
        return cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()

    # Seed the sampler so repeated runs suggest the same trials
    study = optuna.create_study(
        direction='maximize',
        sampler=optuna.samplers.TPESampler(seed=42)
    )
    study.optimize(objective, n_trials=20, show_progress_bar=True)

    print(f"✅ Best params: {study.best_params}")
    print(f"✅ Best score: {study.best_value:.4f}")
except ImportError:
    print("Install optuna: pip install optuna")
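
Early stopping is listed in the learning objectives, and in Optuna it is usually expressed as pruning. The following is a sketch under assumptions, not the notebook's own method: it reports the running mean accuracy after each fold and lets a `MedianPruner` stop trials that lag behind earlier ones. The names `objective_with_pruning` and `pruned_study`, the 10-trial budget, and the pruner settings are choices made here for illustration. Scores from the first folds are noisy, so pruning this early can discard good candidates.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

try:
    import optuna
    optuna.logging.set_verbosity(optuna.logging.WARNING)

    def objective_with_pruning(trial):
        params = {
            'n_estimators': trial.suggest_int('n_estimators', 50, 300),
            'max_depth': trial.suggest_int('max_depth', 3, 20),
            'min_samples_split': trial.suggest_int('min_samples_split', 2, 10)
        }
        fold_scores = []
        for step, (train_idx, valid_idx) in enumerate(cv.split(X, y)):
            model = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
            model.fit(X[train_idx], y[train_idx])
            fold_scores.append(model.score(X[valid_idx], y[valid_idx]))
            # Report the running mean so the pruner can compare trials fold by fold
            trial.report(float(np.mean(fold_scores)), step)
            if trial.should_prune():
                raise optuna.TrialPruned()
        return float(np.mean(fold_scores))

    pruned_study = optuna.create_study(
        direction='maximize',
        sampler=optuna.samplers.TPESampler(seed=42),
        # Never prune the first 5 trials, and never before the second fold
        pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=1)
    )
    pruned_study.optimize(objective_with_pruning, n_trials=10)

    n_pruned = sum(
        t.state == optuna.trial.TrialState.PRUNED for t in pruned_study.trials
    )
    print(f"Pruned trials: {n_pruned} of {len(pruned_study.trials)}")
    print(f"Best score: {pruned_study.best_value:.4f}")
except ImportError:
    pruned_study = None
    print("Install optuna: pip install optuna")
```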

## 5. Comparison

In [None]:
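from IPython.display import display  # explicit import so the cell also runs outside Jupyter

comparison = pd.DataFrame({
    'Method': ['Grid Search', 'Random Search', 'Bayesian (Optuna)'],
    'Pros': ['Tries every listed combination', 'Fixed budget, handles large spaces', 'Uses past trials to pick the next'],
    'Cons': ['Cost multiplies with each parameter', 'May miss the optimum', 'More setup, extra dependency'],
    'Best For': ['Small grids', 'Large spaces', 'Costly evaluations, production tuning']
})
display(comparison)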
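
The coverage argument for random search can be shown with a toy budget. In the sketch below, a 3x3 grid and 9 random points both cost 9 evaluations, but the grid only ever tries 3 distinct values per parameter. The unit-square search space and all names are assumptions made for illustration:

```python
import random

rng = random.Random(42)
budget = 9

# Grid: 3 values per axis gives 3 * 3 = 9 points
grid_axis = [0.0, 0.5, 1.0]
grid_points = [(a, b) for a in grid_axis for b in grid_axis]

# Random: 9 points drawn uniformly from the unit square
random_points = [(rng.random(), rng.random()) for _ in range(budget)]

def distinct_per_axis(points):
    """Count how many different values were tried for each parameter."""
    return [len({point[axis] for point in points}) for axis in range(2)]

grid_coverage = distinct_per_axis(grid_points)
random_coverage = distinct_per_axis(random_points)

print(f"Grid:   {len(grid_points)} evaluations, distinct values per axis: {grid_coverage}")
print(f"Random: {len(random_points)} evaluations, distinct values per axis: {random_coverage}")
```

If only one of the two parameters matters for the score, the grid has effectively tested 3 settings while random search has tested 9.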

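## 🎯 Key Takeaways
- Grid: exhaustive over the listed values, practical only for small spaces
- Random: fixed budget, tries more distinct values per parameter than a grid of the same cost
- Bayesian: uses earlier trials to choose the next one, worth the setup when each fit is expensive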

**Next**: 10_model_evaluation.ipynb