# X3: Hyperparameter Tuning & AutoML

Systematic optimization of model hyperparameters for maximum performance.

## Table of Contents

1. Grid Search
2. Random Search
3. Bayesian Optimization
4. Hyperband
5. AutoML Tools
6. Best Practices

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')
```

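## 1. Grid Search

**Exhaustive search over a parameter grid**

- **Pros:** Guaranteed to find the best combination *within the grid*.
- **Cons:** The number of combinations is the product of the number of values per hyperparameter, so the cost grows exponentially with the number of hyperparameters.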
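A quick way to see this cost is to count the combinations before running anything. This minimal sketch uses scikit-learn's `ParameterGrid` on the same grid as the example that follows; the extra `max_features` values are added only for illustration. Note that `GridSearchCV` also performs one final refit on the full training set when `refit=True` (the default), which is not counted here.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the Grid Search example: 3 * 4 * 3 combinations
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7, None],
    'min_samples_split': [2, 5, 10],
}
n_combos = len(ParameterGrid(param_grid))
cv_folds = 3
print(f'{n_combos} combinations x {cv_folds} folds = {n_combos * cv_folds} fits')

# Illustrative only: one more hyperparameter with 3 values triples the work
param_grid['max_features'] = ['sqrt', 'log2', None]
print(f'With max_features: {len(ParameterGrid(param_grid))} combinations')
```

Counting first makes it easy to decide whether a grid fits the compute budget or should be replaced by random search.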

```python
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3 * 4 * 3 = 36 combinations, each evaluated with 3-fold CV
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7, None],
    'min_samples_split': [2, 5, 10]
}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3, n_jobs=-1)
grid.fit(X_train, y_train)

print(f'Best params: {grid.best_params_}')
print(f'Best CV score: {grid.best_score_:.3f}')
# The test set is used only once, after the search, as a held-out check
print(f'Test accuracy: {accuracy_score(y_test, grid.predict(X_test)):.3f}')
```

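## 2. Random Search

**Sample random parameter combinations**

- **Pros:** For the same number of evaluations, it often matches or beats grid search, because it tries more distinct values of each hyperparameter.
- **Cons:** Can miss the optimal configuration; results depend on `n_iter` and the random seed.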
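A minimal sketch of why random search tends to be more sample-efficient, under a hypothetical setup where only the first of two hyperparameters (both scaled to [0, 1]) affects the score. With the same budget of 9 evaluations, a 3 x 3 grid tries only 3 distinct values of the important hyperparameter, while random sampling tries 9.

```python
import numpy as np

rng = np.random.default_rng(0)
budget = 9  # same number of evaluations for both strategies

# Grid search: 3 x 3 grid over two hyperparameters in [0, 1]
axis = np.linspace(0.0, 1.0, 3)
grid_points = np.array([(a, b) for a in axis for b in axis])

# Random search: 9 independent uniform samples over the same square
random_points = rng.uniform(0.0, 1.0, size=(budget, 2))

# If only the first hyperparameter matters, what counts is how many
# distinct values of it each strategy actually tried
grid_distinct = len(np.unique(grid_points[:, 0]))
random_distinct = len(np.unique(random_points[:, 0]))
print(f'Grid search:   {grid_distinct} distinct values of the important parameter')
print(f'Random search: {random_distinct} distinct values of the important parameter')
```

The grid spends two thirds of its budget repeating values of the important hyperparameter while varying one that does not matter.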

```python
from scipy.stats import randint, uniform

param_dist = {
    'n_estimators': randint(50, 200),     # integers in [50, 200)
    'max_depth': [3, 5, 7, 10, None],     # lists are sampled uniformly
    'min_samples_split': randint(2, 20),  # integers in [2, 20)
    'max_features': uniform(0.5, 0.5)     # uniform(loc, scale) -> floats in [0.5, 1.0]
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=20,  # 20 sampled configurations vs. 36 in the grid above
    cv=3,
    random_state=42,
    n_jobs=-1
)
random_search.fit(X_train, y_train)

print(f'Best params: {random_search.best_params_}')
print(f'Best CV score: {random_search.best_score_:.3f}')
```
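To inspect what these distributions actually produce without fitting any model, scikit-learn's `ParameterSampler` (which `RandomizedSearchCV` uses to draw its candidates) can be called directly. A minimal sketch, redefining `param_dist` so the block runs on its own:

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    'n_estimators': randint(50, 200),
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': randint(2, 20),
    'max_features': uniform(0.5, 0.5),
}

# Draw the same number of candidates as n_iter=20 in the search above
samples = list(ParameterSampler(param_dist, n_iter=20, random_state=42))
for params in samples[:3]:
    print(params)

# Check the sampled ranges empirically
print('n_estimators range:', min(p['n_estimators'] for p in samples),
      'to', max(p['n_estimators'] for p in samples))
print('max_features range:', round(min(p['max_features'] for p in samples), 3),
      'to', round(max(p['max_features'] for p in samples), 3))
```

This is a cheap sanity check that, for example, `uniform(0.5, 0.5)` covers [0.5, 1.0] rather than a degenerate interval.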

## Best Practices

1. **Start with Random Search** for initial exploration.
2. **Use Grid Search** to refine around the best region.
3. **Use cross-validation** (k=5 or more when compute allows).
4. **Use nested CV** for an unbiased estimate of generalization performance.
5. **Consider the compute budget** (time vs. accuracy trade-off).

**Hyperparameter Importance:**
- Tune the most influential hyperparameters first (e.g., `learning_rate` in gradient boosting, `n_estimators`).
- Give less influential hyperparameters fewer candidate values.

**Remember:** More data often beats better hyperparameters!
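Nested CV wraps the whole search in an outer cross-validation loop: each outer fold re-runs the search on its own training portion and scores the selected model on data the search never saw. A minimal sketch with scikit-learn, passing a `GridSearchCV` object to `cross_val_score`; the small grid and `n_estimators=50` are assumptions chosen to keep it fast, not tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: hyperparameter search (small illustrative grid)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid={'max_depth': [3, None], 'min_samples_split': [2, 10]},
    cv=inner_cv,
)

# Outer loop: each fold fits the full search on its training part,
# then scores the refit best model on the held-out outer fold
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(f'Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}')
```

The cost multiplies: here 5 outer folds x (4 combinations x 3 inner folds + 1 refit) = 65 model fits, which is why nested CV is usually reserved for the final performance estimate.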
