<a href="https://colab.research.google.com/github/shuvad23/Deep-learning-with-PyTorch/blob/main/Hyperparameter_Tuning_the_ANN_using_Optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

"Hyperparameter tuning is the process of finding the best configuration of hyperparameters (the settings you choose before training a model) to maximize performance in machine learning and deep learning."

🔥 What Are Hyperparameters?
- Hyperparameters are external settings that control how a model learns.
They are not learned from data; you pick them manually or let a search algorithm choose them.

✅ Examples in Machine Learning:

  - Learning rate (η)

  - Number of trees in Random Forest

  - Maximum depth of a decision tree

  - Number of neighbors (K) in KNN

  - Regularization strength (C) in SVM or Logistic Regression

✅ Examples in Deep Learning:

  - Learning rate

  - Number of layers (depth)

  - Number of neurons per layer

  - Activation functions

  - Batch size

  - Dropout rate

  - Optimizer (Adam, SGD, RMSprop, etc.)

---

🔧 What Is Hyperparameter Tuning?

- Hyperparameter tuning means:

    - Trying different combinations of hyperparameters to find the one that gives the best accuracy, loss, or other metric on validation data.

  - It's like adjusting the knobs of the model until it performs best.


🔍 Why Is Hyperparameter Tuning Important?

- Because wrong hyperparameters → bad results, even if the model architecture is good.

  - Good tuning can:

  - Increase accuracy

  - Reduce overfitting

  - Speed up training

  - Improve model stability

🧪 Common Hyperparameter Tuning Methods

⭐ 1. Grid Search

  - Try every combination of the listed values.

  - Pro: Finds best among listed options.

  - Con: Very slow for large search spaces.
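
As a minimal sketch of grid search, the snippet below enumerates every combination with `itertools.product`. The search space and the `score` function are hypothetical stand-ins for "train a model and measure validation accuracy"; they are not part of this notebook's code.

```python
from itertools import product

# Hypothetical search space (values chosen only for illustration).
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_dim": [32, 64],
    "dropout_rate": [0.0, 0.5],
}

def score(config):
    """Stand-in for 'train a model and return validation accuracy'."""
    # Toy formula: prefers learning_rate=0.01, larger hidden_dim, lower dropout.
    return (1.0
            - abs(config["learning_rate"] - 0.01)
            + config["hidden_dim"] / 1000
            - config["dropout_rate"] / 10)

names = list(grid)
# One dict per combination: 3 * 2 * 2 = 12 configurations in total.
configs = [dict(zip(names, values)) for values in product(*grid.values())]
best = max(configs, key=score)

print(len(configs))
print(best)
```

The number of configurations is the product of the list lengths, which is why grid search becomes slow as more hyperparameters are added.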

⭐ 2. Random Search

  - Randomly sample combinations.

  - Pro: Covers a large search space with far fewer trials than grid search.

  - Con: Might skip good combinations.
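
A minimal random-search sketch, assuming a hypothetical `score` function in place of real training. The learning rate is drawn on a log scale (sample the exponent, then exponentiate), which is a common choice for parameters that span several orders of magnitude.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_config():
    """Draw one random configuration from a hypothetical search space."""
    return {
        # Log-uniform: sample the exponent uniformly, then exponentiate.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "hidden_dim": rng.randint(16, 256),
        "dropout_rate": rng.uniform(0.0, 0.5),
    }

def score(config):
    """Stand-in for 'train a model and return validation accuracy'."""
    # Toy formula: prefers learning rates near 1e-3 and low dropout.
    distance = abs(math.log10(config["learning_rate"]) + 3)
    return 1.0 - distance / 10 - config["dropout_rate"] / 10

# A fixed budget of 20 trials, regardless of how large the space is.
trials = [sample_config() for _ in range(20)]
best = max(trials, key=score)
print(best)
```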

⭐ 3. Bayesian Optimization

  - Builds a probabilistic model of past results to choose the next hyperparameters to try.

  - Pro: Very efficient

  - Con: Harder to implement

⭐ 4. Hyperband / ASHA (Deep Learning)

  - Early-stops bad models and saves training time.
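
Both methods build on successive halving. The sketch below is a simplified, synchronous version: it evaluates all configurations on a small budget, keeps the top fraction, and repeats with a larger budget. The `partial_score` function and the `quality` field are hypothetical stand-ins for partially training a real model.

```python
import random

rng = random.Random(0)

# Hypothetical configurations: each has a hidden "quality" that training reveals.
configs = [{"id": i, "quality": rng.random()} for i in range(9)]

def partial_score(config, budget):
    """Stand-in for validation accuracy after `budget` epochs.

    The noise shrinks as the budget grows, so longer training ranks
    configurations more reliably.
    """
    return config["quality"] + rng.gauss(0, 0.3 / budget)

def successive_halving(configs, min_budget=1, eta=3):
    """Keep the top 1/eta of configurations each round, with eta times the budget."""
    budget = min_budget
    survivors = list(configs)
    history = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: partial_score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], history

winner, history = successive_halving(configs)
print(winner["id"], history)
```

Only three of the nine configurations receive the larger budget, which is where the saving in training time comes from.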

⭐ 5. Genetic Algorithms / Evolutionary Search

  - Search based on mutation & selection.
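
A minimal evolutionary-search sketch with selection and mutation only (no crossover). The `fitness` function is a hypothetical stand-in for validation accuracy, and the population size and mutation ranges are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)

def fitness(config):
    """Stand-in for validation accuracy; peaks at hidden_dim=128, dropout_rate=0.2."""
    return -abs(config["hidden_dim"] - 128) / 256 - abs(config["dropout_rate"] - 0.2)

def mutate(config):
    """Return a slightly perturbed copy, clipped to the search space."""
    return {
        "hidden_dim": min(256, max(16, config["hidden_dim"] + rng.randint(-16, 16))),
        "dropout_rate": min(0.5, max(0.0, config["dropout_rate"] + rng.uniform(-0.05, 0.05))),
    }

population = [
    {"hidden_dim": rng.randint(16, 256), "dropout_rate": rng.uniform(0.0, 0.5)}
    for _ in range(10)
]
initial_best = max(fitness(c) for c in population)

for generation in range(20):
    # Selection: keep the better half as parents.
    parents = sorted(population, key=fitness, reverse=True)[:5]
    # Mutation: each parent produces one child; parents also survive (elitism).
    population = parents + [mutate(p) for p in parents]

best = max(population, key=fitness)
print(best, fitness(best))
```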

---
🔥 Hyperparameter Tuning an ANN Using Optuna (PyTorch Example)

- Optuna is an open-source hyperparameter optimization framework.
It automatically searches for good values of the learning rate, hidden units, optimizer, dropout, etc.

## ✅ Step-by-Step Code: ANN + Optuna Tuning

In [26]:
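# Install Optuna
!pip install optuna==4.6.0
# Note: this sympy pin conflicts with torch 2.9.0, which requires sympy>=1.13.3 (see the pip output below)
!pip install sympy==1.12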

Collecting sympy==1.12
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
Installing collected packages: sympy
  Attempting uninstall: sympy
    Found existing installation: sympy 1.14.0
    Uninstalling sympy-1.14.0:
      Successfully uninstalled sympy-1.14.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.9.0+cpu requires sympy>=1.13.3, but you have sympy 1.12 which is incompatible.
Successfully installed sympy-1.12


🧠 1. Import Libraries and Create a Dummy Dataset

In [14]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import optuna

In [15]:
# Example dataset (dummy); the labels are random, so there is no real signal to learn
x = torch.randn(1000,20) # 1000 samples, 20 features
y = torch.randint(0,2,(1000,)) # 1000 binary labels

In [16]:
x.shape

torch.Size([1000, 20])

In [17]:
x

tensor([[ 0.3852, -0.2024,  0.6418,  ...,  0.5914,  0.9515, -1.0156],
        [ 0.7890, -0.2004, -0.9029,  ...,  1.0663, -0.3850,  0.1282],
        [ 0.4612,  0.0124, -0.2938,  ..., -0.2692, -0.2672,  0.0660],
        ...,
        [-0.1010,  1.3794,  0.9487,  ..., -0.4104,  0.3701,  0.7955],
        [-0.8198, -0.3324,  0.8307,  ..., -0.7330, -0.8682,  1.4792],
        [ 0.9425, -0.6863,  1.8670,  ...,  1.0880,  0.8200,  1.7518]])

In [18]:
y.shape

torch.Size([1000])

In [19]:
y

tensor([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
        1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
        1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
        1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1,
        0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1,
        1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
        1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
        0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
        1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0,
        0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1,
        1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1,
        0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1,
        0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,

In [20]:
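dataset = TensorDataset(x,y)
train_loader = DataLoader(dataset,batch_size=32,shuffle=True)
# Note: this loader wraps the same samples as train_loader, so accuracy measured on it is training accuracy
test_loader = DataLoader(dataset,batch_size=32,shuffle=False)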
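
Both loaders above read from the same `dataset`. One way to hold out separate validation data is `torch.utils.data.random_split`; the sketch below assumes an 800/200 split and a seeded generator, and the names `train_set`, `val_set`, and `val_loader` are illustrative rather than part of this notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Dummy data with the same shapes as above.
x = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(x, y)

# Hold out 20% of the samples; the seeded generator makes the split repeatable.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(dataset, [800, 200], generator=generator)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)

print(len(train_set), len(val_set))
```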

🏗 2. Define the ANN model

In [21]:
class ANN(nn.Module):
    def __init__(self,input_dim,hidden_dim,output_dim,dropout_rate):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim,hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_dim,output_dim)
        )
    def forward(self,x):
        return self.net(x)
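
A quick sanity check can confirm the output shape before tuning starts. The sketch below repeats the class so that it runs on its own; the sizes (`hidden_dim=64`, a batch of 8) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

class ANN(nn.Module):
    """Same layout as the class above, repeated so this snippet is self-contained."""
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_rate):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)

model = ANN(input_dim=20, hidden_dim=64, output_dim=2, dropout_rate=0.2)
model.eval()  # disable dropout and use BatchNorm running statistics
with torch.no_grad():
    logits = model(torch.randn(8, 20))

# One row of logits per sample, one column per class.
print(logits.shape)

# Learnable parameters: weights and biases of both Linear layers plus BatchNorm scale/shift.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```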


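🎯 3. The Optuna Objective Function

- In each trial, Optuna will:

  - suggest learning rate

  - suggest hidden units

  - suggest dropout

  - suggest batch size

  - pick optimizer

- The objective function then trains the model and returns the accuracy, which Optuna tries to maximize.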

In [22]:
def objective(trial):

    # hyperparameters to tune
    input_dim = x.shape[1]  # number of input features
    output_dim = 2  # two logits (one per class) for CrossEntropyLoss
    hidden_dim = trial.suggest_int('hidden_dim',16,256)
    dropout_rate = trial.suggest_float('dropout_rate',0.0,0.5)
    learning_rate = trial.suggest_float('learning_rate',1e-5,1e-1,log=True)
    batch_size = trial.suggest_categorical('batch_size',[32,64,128])
    optimizer_name = trial.suggest_categorical('optimizer',['Adam','RMSprop','SGD'])

    # build the training loader here so the suggested batch size is actually used
    train_loader = DataLoader(dataset,batch_size=batch_size,shuffle=True)

    # model and loss
    model = ANN(input_dim=input_dim,hidden_dim=hidden_dim,output_dim=output_dim,dropout_rate=dropout_rate)
    criterion = nn.CrossEntropyLoss()

    # optimizer: look up the class by name in torch.optim
    optimizer = getattr(optim,optimizer_name)(model.parameters(),lr=learning_rate)

    # training loop (10 epochs)
    model.train()
    for epoch in range(10):
        for data,target in train_loader:
            optimizer.zero_grad()
            preds = model(data)
            loss = criterion(preds,target)
            loss.backward()
            optimizer.step()

    # evaluate accuracy (test_loader holds the same samples used for training)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data,target in test_loader:
            preds = model(data)
            predicted = preds.argmax(dim=1)  # predicted class = index of the largest logit
            correct += (predicted == target).sum().item()
            total += target.size(0)

    accuracy = correct/total
    return accuracy

🚀 4. Run Optuna Study

In [23]:
# Create a study that maximizes the objective (accuracy) and run 20 trials
study = optuna.create_study(direction='maximize')
study.optimize(objective,n_trials=20)


[I 2025-12-13 20:15:20,450] A new study created in memory with name: no-name-ec633a4e-1815-4888-af77-f531edfb0776
[I 2025-12-13 20:15:21,111] Trial 0 finished with value: 0.576 and parameters: {'hidden_dim': 148, 'dropout_rate': 0.4964464565881205, 'learning_rate': 0.00020050410940550896, 'batch_size': 128, 'optimizer': 'Adam'}. Best is trial 0 with value: 0.576.
[I 2025-12-13 20:15:21,752] Trial 1 finished with value: 0.518 and parameters: {'hidden_dim': 151, 'dropout_rate': 0.4928859208300117, 'learning_rate': 1.2561779356883488e-05, 'batch_size': 64, 'optimizer': 'Adam'}. Best is trial 0 with value: 0.576.
[I 2025-12-13 20:15:22,238] Trial 2 finished with value: 0.568 and parameters: {'hidden_dim': 255, 'dropout_rate': 0.2318087242976828, 'learning_rate': 0.012738910186494258, 'batch_size': 32, 'optimizer': 'SGD'}. Best is trial 0 with value: 0.576.
[I 2025-12-13 20:15:22,737] Trial 3 finished with value: 0.506 and parameters: {'hidden_dim': 199, 'dropout_rate': 0.17784263391728494,

🏆 5. Print Best Hyperparameters

In [24]:
# Print the best trial's hyperparameters and its accuracy
print("Best Hyperparameters:", study.best_params)
for idx,(key, value) in enumerate(study.best_params.items()):
    print(f"\t{idx+1}- {key}: {value}")
print("Best Accuracy:",study.best_value)

Best Hyperparameters: {'hidden_dim': 193, 'dropout_rate': 0.10529330995375707, 'learning_rate': 0.0026656472362355187, 'batch_size': 128, 'optimizer': 'Adam'}
	1- hidden_dim: 193
	2- dropout_rate: 0.10529330995375707
	3- learning_rate: 0.0026656472362355187
	4- batch_size: 128
	5- optimizer: Adam
Best Accuracy: 0.762


## Hyperparameter tuning for an Artificial Neural Network using Optuna and PyTorch

1. Flexible ANN Architecture

    - Variable number of layers (1-4)
    - Variable neurons per layer (16-256)
    - Configurable dropout rates

2. Hyperparameters Being Tuned:

    - Number of hidden layers
    - Number of neurons in each layer
    - Number of training epochs
    - Dropout rate
    - Learning rate
    - Batch size
    - Optimizer type (Adam, SGD, RMSprop)
    - Weight decay (L2 regularization)

3. Optuna Features:

    - MedianPruner: Stops unpromising trials early to save computation
    - Intermediate reporting: Monitors accuracy during training
    - Log-scale suggestions: For learning rate and other exponential parameters
    - Categorical choices: For optimizer and batch size
      
4. Complete Workflow:

    - Data generation and preprocessing
    - Study creation and optimization
    - Results summary
    - Final model training with best parameters
    - Optional visualization of optimization history
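
The idea behind median pruning can be sketched in a few lines. This is a simplified illustration of the rule, not Optuna's implementation: it ignores settings such as `n_startup_trials` and `n_warmup_steps`, and the `history` values are made up.

```python
from statistics import median

# Hypothetical accuracies reported by earlier trials at each step (epoch).
history = {
    0: [0.60, 0.65, 0.70],
    10: [0.72, 0.78, 0.80],
}

def should_prune(step, value, history, direction="maximize"):
    """Median rule: prune when the value is worse than the median at the same step."""
    previous = history.get(step, [])
    if not previous:
        return False  # nothing to compare against yet
    m = median(previous)
    return value < m if direction == "maximize" else value > m

print(should_prune(10, 0.70, history))  # True: 0.70 is below the median 0.78
print(should_prune(10, 0.79, history))  # False: 0.79 is above the median 0.78
print(should_prune(20, 0.10, history))  # False: no earlier reports at step 20
```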

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import optuna
import numpy as np
from torch.utils.data import DataLoader, TensorDataset
from optuna.trial import Trial
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
from optuna.visualization import plot_optimization_history, plot_param_importances


In [2]:
# Generate synthetic dataset
x,y = make_classification(
    n_samples = 1000,
    n_features = 20,
    n_informative = 15,
    n_redundant = 5,
    random_state = 42
)

In [3]:
# split and scale data
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,random_state=42)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

In [4]:
# convert to PyTorch tensors (float32 features, int64 class labels)
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

In [5]:
# create datasets
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

In [9]:
# ANN class
class FlexibleANN(nn.Module):
    """Flexible ANN architecture with variable layers and neurons."""
    def __init__(self,input_size,hidden_sizes,output_size,dropout_rate):
        super(FlexibleANN,self).__init__()

        layers = []
        prev_size = input_size

        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size,hidden_size))
            layers.append(nn.BatchNorm1d(hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            prev_size = hidden_size

        # output_layer
        layers.append(nn.Linear(prev_size,output_size))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

In [10]:
# objective trial function
def objective(trial: Trial):
    """Optuna objective function for hyperparameter optimization."""

    # Suggest hyperparameters
    input_size = x_train_tensor.shape[1]
    output_size = len(torch.unique(y_train_tensor))
    n_layers = trial.suggest_int('n_layers',1,4)
    hidden_sizes = [
        trial.suggest_int(f'n_units_l{i}',16,256, log = True)
        for i in range(n_layers)
    ]
    n_epochs = trial.suggest_int('n_epochs',10,50)
    dropout_rate = trial.suggest_float('dropout_rate',0.1,0.5)
    learning_rate = trial.suggest_float('lr', 1e-5,1e-1,log = True)
    batch_size = trial.suggest_categorical('batch_size',[16,32,64,128])
    optimizer_name = trial.suggest_categorical('optimizer',['Adam','RMSprop','SGD'])
    weight_decay = trial.suggest_float('weight_decay',1e-5,1e-2,log=True)

    # create data loaders
    train_loader = DataLoader(train_dataset,batch_size=batch_size,shuffle = True)
    test_loader = DataLoader(test_dataset,batch_size=batch_size,shuffle = False)


    # Initialize model
    model = FlexibleANN(
        input_size = input_size,
        hidden_sizes = hidden_sizes,
        output_size = output_size,
        dropout_rate = dropout_rate
    )

    # initialize optimizer
    # (explicit branches are used instead of getattr(optim, optimizer_name)
    # so that SGD can be given a momentum term)
    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=weight_decay)
    elif optimizer_name == 'RMSprop':
        optimizer = optim.RMSprop(model.parameters(),lr=learning_rate,weight_decay=weight_decay)
    elif optimizer_name == 'SGD':
        optimizer = optim.SGD(model.parameters(),lr=learning_rate,weight_decay=weight_decay,momentum=0.9)
    else:
        raise ValueError(f"Unknown optimizer: {optimizer_name}")


    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

    # Training Loop
    for epoch in range(n_epochs):
        model.train()
        for batch_x,batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)  # logits of shape (batch, n_classes)
            loss = criterion(outputs,batch_y)
            loss.backward()
            optimizer.step()

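        # report intermediate accuracy every 10 epochs so the pruner can stop weak trials
        # (this scores on the test set; a separate validation split would keep the test set untouched)
        if epoch % 10 == 0:
            model.eval()
            correct = 0
            total = 0

            with torch.no_grad():
                for batch_x, batch_y in test_loader:
                    outputs = model(batch_x)
                    predicted = outputs.argmax(dim=1)  # predicted class = index of the largest logit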
                    total += batch_y.size(0)
                    correct += (predicted == batch_y).sum().item()

            accuracy = correct / total
            trial.report(accuracy,epoch)

            if trial.should_prune():
                raise optuna.exceptions.TrialPruned()

    # final evaluation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_x, batch_y in test_loader:
            outputs = model(batch_x)
            predicted = outputs.argmax(dim=1)  # predicted class = index of the largest logit
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()

    accuracy = correct / total
    return accuracy
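
The accuracy loop appears more than once in the objective. One option is to factor it into a helper; the sketch below uses a hypothetical name, `evaluate_accuracy`, and checks it against a toy model (`AlwaysOne`, also hypothetical) whose prediction is known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_x, batch_y in loader:
            predicted = model(batch_x).argmax(dim=1)
            correct += (predicted == batch_y).sum().item()
            total += batch_y.size(0)
    return correct / total

class AlwaysOne(nn.Module):
    """Toy model whose logits always favour class 1."""
    def forward(self, x):
        logits = torch.zeros(x.shape[0], 2)
        logits[:, 1] = 1.0
        return logits

# Three of the four labels are class 1, so the toy model scores 3/4.
labels = torch.tensor([1, 1, 0, 1])
loader = DataLoader(TensorDataset(torch.randn(4, 20), labels), batch_size=2)
print(evaluate_accuracy(AlwaysOne(), loader))  # 0.75
```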

In [11]:
# Create and run study
if __name__ == "__main__":
    # Create study with pruning
    study = optuna.create_study(
        direction = 'maximize',
        pruner = optuna.pruners.MedianPruner(n_startup_trials=5,n_warmup_steps=10)
    )
    print("Starting hyperparameter optimization...")
    study.optimize(objective,n_trials=50,show_progress_bar=True)
    print("Hyperparameter optimization completed.")


    # Print results
    print("\n" + "="*50)
    print("Optimization Results")
    print("="*50)
    print(f"\nBest trial accuracy: {study.best_trial.value:.4f}")
    print("\nBest hyperparameters:")
    for key, value in study.best_trial.params.items():
        print(f"  {key}: {value}")


    # Retrain a final model with the best hyperparameters
    best_params = study.best_trial.params
    hidden_sizes = [
        best_params[f'n_units_l{i}']
        for i in range(best_params['n_layers'])
    ]

    final_model = FlexibleANN(
        input_size = x_train_tensor.shape[1],
        hidden_sizes = hidden_sizes,
        output_size = len(torch.unique(y_train_tensor)),
        dropout_rate = best_params['dropout_rate']
    )

    train_loader = DataLoader(train_dataset, batch_size = best_params['batch_size'],shuffle = True)

    if best_params['optimizer'] == 'Adam':
        optimizer = optim.Adam(
            final_model.parameters(),
            lr=best_params['lr'],
            weight_decay=best_params['weight_decay']
        )
    elif best_params['optimizer'] == 'SGD':
        optimizer = optim.SGD(
            final_model.parameters(),
            lr=best_params['lr'],
            weight_decay=best_params['weight_decay'],
            momentum=0.9
        )
    else:
        optimizer = optim.RMSprop(
            final_model.parameters(),
            lr=best_params['lr'],
            weight_decay=best_params['weight_decay']
        )

    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

    for epoch in range(best_params['n_epochs']):
        final_model.train()
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = final_model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

    # Final test accuracy
    final_model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_x, batch_y in DataLoader(test_dataset, batch_size=64):
            outputs = final_model(batch_x)
            predicted = outputs.argmax(dim=1)  # predicted class = index of the largest logit
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()

    final_accuracy = correct / total
    print(f"\nFinal model test accuracy: {final_accuracy:.4f}")



[I 2025-12-13 20:06:51,616] A new study created in memory with name: no-name-e11ce891-51a3-47aa-b971-b4bbc552621c


Starting hyperparameter optimization...


  0%|          | 0/50 [00:00<?, ?it/s]

[I 2025-12-13 20:06:52,874] Trial 0 finished with value: 0.95 and parameters: {'n_layers': 1, 'n_units_l0': 218, 'n_epochs': 11, 'dropout_rate': 0.2820784709491412, 'lr': 0.0007926122274782242, 'batch_size': 16, 'optimizer': 'Adam', 'weight_decay': 0.0011339680192357262}. Best is trial 0 with value: 0.95.
[I 2025-12-13 20:06:56,183] Trial 1 finished with value: 0.83 and parameters: {'n_layers': 2, 'n_units_l0': 204, 'n_units_l1': 45, 'n_epochs': 34, 'dropout_rate': 0.48221576013948275, 'lr': 7.758419447256699e-05, 'batch_size': 16, 'optimizer': 'SGD', 'weight_decay': 2.1433412717605534e-05}. Best is trial 0 with value: 0.95.
[I 2025-12-13 20:06:57,807] Trial 2 finished with value: 0.825 and parameters: {'n_layers': 3, 'n_units_l0': 54, 'n_units_l1': 170, 'n_units_l2': 64, 'n_epochs': 37, 'dropout_rate': 0.26824384358861475, 'lr': 9.313165362578166e-05, 'batch_size': 64, 'optimizer': 'SGD', 'weight_decay': 0.002950680995996447}. Best is trial 0 with value: 0.95.
[I 2025-12-13 20:06:59,5



In [13]:
# visualize optimization history
try:
    fig1 = plot_optimization_history(study)
    fig1.show()

    fig2 = plot_param_importances(study)
    fig2.show()

    print("\nVisualization plots generated!")
except ImportError:
    print("\nInstall matplotlib and plotly for visualizations:")
    print("pip install matplotlib plotly")


Visualization plots generated!
