# Day 16 – Hyperparameter Tuning using GridSearchCV and RandomizedSearchCV

## What is Hyperparameter Tuning?

Hyperparameters are configuration values that you set before training; they are **not learned from the data** (e.g., `n_estimators`, `max_depth`, `C`, `learning_rate`).

Tuning these hyperparameters is crucial to:
- Improve model performance
- Avoid overfitting/underfitting
- Optimize training time and generalization
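
To make the distinction concrete, here is a minimal sketch that contrasts hyperparameters (set in the constructor) with what the model learns during `fit()`. The values `n_estimators=50` and `max_depth=3` are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators and max_depth are hyperparameters: we choose them before fit().
rf = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=42)
hyperparams = rf.get_params()
print("Hyperparameters:", hyperparams["n_estimators"], hyperparams["max_depth"])

# The fitted trees are learned from the data and only exist after fit().
fitted_before = hasattr(rf, "estimators_")
rf.fit(X, y)
fitted_after = hasattr(rf, "estimators_")
print("Has fitted trees before/after fit:", fitted_before, fitted_after)
print("Number of fitted trees:", len(rf.estimators_))
```

Training never changes the values returned by `get_params()`; tuning means trying different constructor values and comparing the resulting models.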

---

## Two Common Methods in Scikit-learn

### 1. GridSearchCV
- Performs an **exhaustive search** over a specified parameter grid.
- Tries **every combination** of hyperparameters.
- Uses **cross-validation** to evaluate performance.

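**Pros**: Guaranteed to find the best combination within the grid, as measured by the cross-validated score  
**Cons**: Computationally expensive, because the number of combinations multiplies with every parameter added to the grid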
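
The cost of an exhaustive search can be estimated before running it. The sketch below counts the combinations in the grid used in this notebook with `ParameterGrid` and multiplies by the number of cross-validation folds (5 here, matching `cv=5`).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the GridSearchCV example in this notebook
param_grid = {
    "n_estimators": [50, 100, 150],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
}

n_combinations = len(ParameterGrid(param_grid))  # 3 * 4 * 3
cv_folds = 5
total_fits = n_combinations * cv_folds  # model fits during the search itself

print(n_combinations, total_fits)  # 36 180
```

With the default `refit=True`, `GridSearchCV` also trains one extra model on the full training set using the best combination.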




### 2. RandomizedSearchCV

`RandomizedSearchCV` tunes hyperparameters by randomly sampling a **fixed number of parameter combinations** (set by `n_iter`) from the specified lists or distributions.

Unlike `GridSearchCV` (which tries all combinations), `RandomizedSearchCV`:
- Samples combinations randomly
- Is faster for large hyperparameter spaces
- Can find a good combination with fewer iterations
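
The sampling step can be previewed on its own with `ParameterSampler`, the utility scikit-learn provides for drawing random parameter combinations. This is a sketch under assumptions: the ranges mirror the ones used in this notebook, but `scipy.stats.randint` distributions are swapped in for the `np.arange` lists to show that continuous-style sampling is also possible.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

max_depth_options = [None, 3, 5, 10, 15]
param_dist = {
    "n_estimators": randint(50, 200),      # integers in [50, 200)
    "max_depth": max_depth_options,        # lists are sampled uniformly
    "min_samples_split": randint(2, 11),   # integers in [2, 11)
}

# Draw 5 random combinations; random_state makes the draw reproducible.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for params in samples:
    print(params)
```

Passing the same dictionary as `param_distributions` to `RandomizedSearchCV` would evaluate combinations drawn this way with cross-validation.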

---

## When to Use RandomizedSearchCV

- When the hyperparameter space is **very large**
- When **training time is expensive**
- When **approximate solutions** are acceptable
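
A rough way to judge whether a space is "very large" is to count it. The sketch below uses the randomized search space from this notebook and compares the number of model fits an exhaustive search would need against a budget of `n_iter=10` sampled candidates with 5-fold cross-validation.

```python
import math

import numpy as np

# Same search space as the RandomizedSearchCV example in this notebook
param_dist = {
    "n_estimators": np.arange(50, 200, 10),   # 15 values
    "max_depth": [None, 3, 5, 10, 15],        # 5 values
    "min_samples_split": np.arange(2, 11),    # 9 values
}

cv_folds = 5
n_iter = 10

total_combinations = math.prod(len(values) for values in param_dist.values())
exhaustive_fits = total_combinations * cv_folds
randomized_fits = n_iter * cv_folds

print(total_combinations, exhaustive_fits, randomized_fits)  # 675 3375 50
```

Here the randomized search evaluates about 1.5% of the possible combinations, which is the trade-off behind accepting an approximate solution.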

---



```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base model shared by both searches
rf = RandomForestClassifier(random_state=42)

# GridSearchCV: exhaustive search over 3 * 4 * 3 = 36 combinations
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 3, 5, 10],
    'min_samples_split': [2, 5, 10]
}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best Parameters from GridSearchCV:")
print(grid_search.best_params_)

# Evaluate the refitted best model on the test set
y_pred_grid = grid_search.predict(X_test)
print("\nClassification Report (GridSearchCV):")
print(classification_report(y_test, y_pred_grid))

# RandomizedSearchCV: evaluate n_iter=10 randomly sampled combinations
param_dist = {
    'n_estimators': np.arange(50, 200, 10),
    'max_depth': [None, 3, 5, 10, 15],
    'min_samples_split': np.arange(2, 11)
}
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist,
                                   n_iter=10, cv=5, random_state=42, scoring='accuracy')
random_search.fit(X_train, y_train)

print("\nBest Parameters from RandomizedSearchCV:")
print(random_search.best_params_)

# Evaluate the refitted best model on the test set
y_pred_random = random_search.predict(X_test)
print("\nClassification Report (RandomizedSearchCV):")
print(classification_report(y_test, y_pred_random))
```


```text
Best Parameters from GridSearchCV:
{'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}

Classification Report (GridSearchCV):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Best Parameters from RandomizedSearchCV:
{'n_estimators': np.int64(110), 'min_samples_split': np.int64(10), 'max_depth': 5}

Classification Report (RandomizedSearchCV):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
```
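
When test-set scores are identical, the cross-validated scores stored on the search object are often more informative. The sketch below shows one way to read them from `cv_results_`; the deliberately small grid (4 candidates) is an assumption made to keep the example quick, not the grid used above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Small illustrative grid: 2 * 2 = 4 candidates
small_grid = {"n_estimators": [50, 100], "max_depth": [None, 3]}
search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per candidate.
results = search.cv_results_
for i in np.argsort(results["rank_test_score"]):
    print(results["rank_test_score"][i],
          f"{results['mean_test_score'][i]:.3f} +/- {results['std_test_score'][i]:.3f}",
          results["params"][i])

print("Best mean CV accuracy:", search.best_score_)
```

`best_score_` is the mean cross-validated score of the top-ranked candidate, so it is computed on the training data only and does not touch the test set.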

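## Final Observations

- Both tuning methods reached 100% accuracy on the 45-sample test set.
- This suggests that the Iris classes are well separated and that Random Forest handles the task easily, although a test set this small cannot distinguish between near-equivalent models.
- While both approaches gave perfect scores, `RandomizedSearchCV` selected a more regularized model (`max_depth=5`, `min_samples_split=10`), which might generalize better. It also evaluated only 10 sampled candidates instead of the 36 combinations in the grid.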
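
One way to probe the "might generalize better" claim is to cross-validate the two selected configurations side by side. This is a rough sketch, not a rigorous comparison: it reuses the full Iris dataset (including samples already used for tuning), so the scores are somewhat optimistic, and the choice of 10 folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# The two configurations reported as best_params_ by the searches above
candidates = {
    "grid_search_best": RandomForestClassifier(
        n_estimators=100, max_depth=None, min_samples_split=2, random_state=42),
    "random_search_best": RandomForestClassifier(
        n_estimators=110, max_depth=5, min_samples_split=10, random_state=42),
}

scores = {}
for name, model in candidates.items():
    # 10-fold cross-validated accuracy for each configuration
    scores[name] = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean={scores[name].mean():.3f}, std={scores[name].std():.3f}")
```

If the two means differ by less than the fold-to-fold standard deviation, the data gives no real basis for preferring one configuration over the other.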