
# 🔍 Hyperparameter Tuning in Neural Networks

This notebook explains what **hyperparameter tuning** is, how it works, why it's important, and demonstrates three common techniques:

1. Grid Search
2. Random Search
3. Bayesian Optimization

---



## 💡 What is Hyperparameter Tuning?

**Hyperparameters** are parameters that are set **before training** and are not learned from the data. Examples include:
- Learning rate
- Batch size
- Number of hidden layers or neurons
- Regularization strength

**Hyperparameter tuning** is the process of finding the best combination of these values to **optimize model performance**.
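
To make the distinction concrete, here is a minimal sketch in plain Python. The toy data and the names `fit_slope`, `learning_rate`, and `n_epochs` are illustrative assumptions, not part of this notebook's MNIST code. The learning rate and epoch count are fixed before training, while the weight `w` is learned from the data.

```python
def fit_slope(xs, ys, learning_rate, n_epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # learned parameter: updated from the data during training
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

# Toy data generated from y = 2x (hypothetical example data)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hyperparameters: chosen by us before training starts
w_good = fit_slope(xs, ys, learning_rate=0.05, n_epochs=100)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_epochs=100)

print(f"learning_rate=0.05   -> w = {w_good:.4f}")
print(f"learning_rate=0.0001 -> w = {w_slow:.4f}")
```

With the same data and the same number of epochs, only the larger learning rate gets close to the true slope of 2, which is the kind of difference tuning is meant to find.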

---



## 🌟 Why is Hyperparameter Tuning Important?

Choosing the right hyperparameters can:
- Increase accuracy
- Reduce training time
- Improve generalization (reduce overfitting or underfitting)

Manual tuning is time-consuming, so we use automated strategies.

---



## 🧪 Grid Search

**Grid Search** exhaustively tries every combination of hyperparameters from a predefined grid.

### 🔁 How It Works:
- Define a set of values for each hyperparameter.
- Try **every possible combination** of these values.
- Evaluate each using cross-validation or a validation set.

📌 **Limitation**: Computationally expensive, because the number of combinations grows exponentially with the number of hyperparameters.
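
The enumeration step can be sketched with the standard library alone. The grid mirrors the one used in the code cell that follows; the extra `learning_rate_init` entry is an illustrative addition (it is a real `MLPClassifier` argument, but this notebook does not tune it).

```python
from itertools import product

# Same grid as the GridSearchCV example in this notebook
param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (100, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001],
}

# The Cartesian product of all value lists gives every candidate setting
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(candidates))  # 3 * 2 * 2 combinations
print(candidates[0])

# Illustrative only: one more hyperparameter with 3 values triples the work
bigger_grid = {**param_grid, 'learning_rate_init': [0.001, 0.01, 0.1]}
n_bigger = 1
for values in bigger_grid.values():
    n_bigger *= len(values)
print(n_bigger)
```

Each candidate is then trained once per cross-validation fold, so 12 candidates with 3 folds already means 36 model fits.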

---


In [1]:

import tensorflow as tf
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# Load the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Flatten the images for dense input
x_train = x_train.reshape((-1, 28 * 28))
x_test = x_test.reshape((-1, 28 * 28))

# Hold out the last 10,000 training images as a validation set
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]


# MLP and hyperparameter grid
param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (100, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001]
}

# max_iter=20 keeps each fit short; scikit-learn may emit a ConvergenceWarning
mlp = MLPClassifier(max_iter=20, random_state=42)
grid_search = GridSearchCV(mlp, param_grid, cv=3, verbose=2, n_jobs=-1)
grid_search.fit(x_train, y_train)

# Evaluate the best model (refit on all of x_train) on the held-out validation set
print("Best Parameters:", grid_search.best_params_)
y_pred = grid_search.predict(x_val)
print("Classification Report:")
print(classification_report(y_val, y_pred))



Fitting 3 folds for each of 12 candidates, totalling 36 fits
Best Parameters: {'activation': 'relu', 'alpha': 0.0001, 'hidden_layer_sizes': (100, 50)}
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       991
           1       0.98      0.99      0.99      1064
           2       0.97      0.98      0.98       990
           3       0.98      0.96      0.97      1030
           4       0.98      0.97      0.97       983
           5       0.96      0.96      0.96       915
           6       0.96      0.99      0.98       967
           7       0.99      0.98      0.98      1090
           8       0.97      0.96      0.97      1009
           9       0.96      0.97      0.96       961

    accuracy                           0.98     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.98      0.98      0.98     10000





## 📘 Understanding the Parameters of `GridSearchCV`

| Parameter                   | Description                                                                                                                               |
| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| **`estimator=mlp`**         | The machine learning model to be optimized. In this case, `mlp` is an instance of `MLPClassifier`.                                        |
| **`param_grid=param_grid`** | A dictionary specifying the grid of hyperparameters to search. Each key is a parameter name, and each value is a list of settings to try. |
| **`cv=3`**                  | Number of cross-validation folds. The training data is split into 3 parts; each part serves once as the validation fold while the other two are used for training. |
| **`verbose=2`**             | Controls the level of verbosity. Higher values print more detail; `2` logs each individual fit (one per candidate and fold).              |
| **`n_jobs=-1`**             | Number of jobs to run in parallel. `-1` uses all available processors for faster computation.                                             |
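
To illustrate what `cv=3` means, the following is a simplified sketch of k-fold splitting in plain Python. The helper `k_fold_indices` is hypothetical and is not scikit-learn's implementation; for classifiers, scikit-learn uses stratified folds that also preserve class proportions, which this sketch does not attempt.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

splits = list(k_fold_indices(n_samples=10, n_folds=3))
for i, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {i}: train={train_idx} val={val_idx}")
```

Every sample lands in exactly one validation fold, and each candidate's score is the average over the three folds.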



## 🎲 Random Search

**Random Search** samples random combinations of hyperparameters from predefined lists or distributions.

### 🔁 How It Works:
- Define a distribution or list of possible values.
- Randomly select a fixed number of combinations.
- Evaluate those combinations.

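📌 **Advantage**: The cost is set by the number of sampled combinations rather than the size of the search space, so it is usually much cheaper than grid search and often finds a good result early.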
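
The sampling step can be sketched with the standard library. The search space matches the one in the code cell that follows; drawing with `random.Random.sample` is an assumption made for illustration, not scikit-learn's internal sampler (although scikit-learn likewise samples without replacement when every entry is a list).

```python
import random
from itertools import product

# Same search space as the RandomizedSearchCV example in this notebook
param_dist = {
    'hidden_layer_sizes': [(50,), (100,), (100, 50), (50, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001, 0.01],
}

# Enumerate the full grid only to show how small the sample is by comparison
names = list(param_dist)
all_candidates = [dict(zip(names, values)) for values in product(*param_dist.values())]

rng = random.Random(42)  # fixed seed so the draw is repeatable
n_iter = 10
sampled = rng.sample(all_candidates, n_iter)  # draw without replacement

print(f"{n_iter} of {len(all_candidates)} candidates sampled")
for candidate in sampled:
    print(candidate)
```

Only 10 of the 24 possible combinations are trained, so the budget stays fixed even if more values are added to the lists.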

---


In [2]:

# Reuses x_train, y_train, x_val, and y_val from the Grid Search cell above
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import classification_report


# Define search space
param_dist = {
    'hidden_layer_sizes': [(50,), (100,), (100, 50), (50, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001, 0.01]
}

mlp = MLPClassifier(max_iter=20, random_state=42)
random_search = RandomizedSearchCV(mlp, param_distributions=param_dist, n_iter=10, cv=3, verbose=2, n_jobs=-1, random_state=42)
random_search.fit(x_train, y_train)

# Evaluate the best model
print("Best Parameters:", random_search.best_params_)
y_pred = random_search.predict(x_val)
print("Classification Report:")
print(classification_report(y_val, y_pred))



Fitting 3 folds for each of 10 candidates, totalling 30 fits
Best Parameters: {'hidden_layer_sizes': (100, 50), 'alpha': 0.001, 'activation': 'tanh'}
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       991
           1       0.98      0.99      0.99      1064
           2       0.98      0.98      0.98       990
           3       0.97      0.97      0.97      1030
           4       0.98      0.98      0.98       983
           5       0.98      0.95      0.96       915
           6       0.98      0.99      0.98       967
           7       0.98      0.98      0.98      1090
           8       0.97      0.98      0.97      1009
           9       0.97      0.96      0.97       961

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000





## 🧩 Parameters of `RandomizedSearchCV`

| Parameter                 | Description                                                                                                                |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **`estimator=mlp`**       | The model to be tuned. In this case, it's an instance of `MLPClassifier`.                                                  |
| **`param_distributions`** | A dictionary mapping each hyperparameter name to a list of values or a distribution to sample from.                         |
| **`n_iter=10`**           | The number of combinations to sample and try. The search is **not exhaustive**: only 10 random combinations are tested.    |
| **`cv=3`**                | Number of folds for **cross-validation**. The model is evaluated on 3 different train/validation splits.                   |
| **`verbose=2`**           | Controls the level of output verbosity. `2` logs each individual fit (one per candidate and fold).                         |
| **`n_jobs=-1`**           | Number of jobs to run in parallel. `-1` means use **all available CPU cores**.                                             |
| **`random_state=42`**     | Sets a **random seed** for reproducibility of the sampling and results.                                                    |
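
The values in `param_distributions` do not have to be lists. scikit-learn also accepts objects that provide an `rvs` method, such as the distributions in `scipy.stats`. As a rough, standard-library stand-in for that idea, the hypothetical helper `sample_log_uniform` below draws `alpha` on a log scale, so small and large orders of magnitude are explored equally often. The range 1e-4 to 1e-2 is taken from the list used above; everything else is an assumption for illustration.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)  # fixed seed, mirroring random_state=42
alphas = [sample_log_uniform(rng, 1e-4, 1e-2) for _ in range(10)]

for alpha in alphas:
    print(f"{alpha:.6f}")
```

Sampling from a continuous range lets the search try values between the fixed points of a list, which a grid of the same size cannot do.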

