<div style="text-align: center; font-size: 30px; font-weight: bold; margin-bottom: 20px;">
    Program 3
</div>


### **Aim**
Understanding Artificial Neural Networks (ANN) and the role of activation functions in influencing accuracy and data transformation.

### **Theory**

#### Multilayer Perceptron (MLP)

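A Multilayer Perceptron is a feedforward neural network composed of fully connected (dense) layers, where each hidden layer is followed by a non-linear activation function such as ReLU. Without these activations, a stack of linear layers would collapse into a single linear map, so the activations are what allow an MLP to learn complex non-linear patterns. The weights are learned iteratively: backpropagation computes the gradient of the loss with respect to every weight, and an optimizer uses these gradients to update them. Through its dense connections and activations, the network transforms raw input data into increasingly useful internal representations.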
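The following is a minimal pure-Python sketch of a two-layer MLP forward pass, illustrating the role of the activation function. It is not the notebook's PyTorch model. The weights are hand-picked and hypothetical. With ReLU, the network computes |x1 - x2|, which is a non-linear function. With the activation removed, the two linear layers collapse into a single linear map (here, the zero map).

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def dense(x, W, b, act=None):
    """Fully connected layer: output j is sum_i W[j][i] * x[i] + b[j],
    optionally passed through an activation function."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    return [act(v) for v in z] if act else z


def mlp_forward(x, params, act=relu):
    """Two-layer MLP: a hidden layer with activation, then a linear output
    layer that returns raw scores (logits)."""
    (W1, b1), (W2, b2) = params
    hidden = dense(x, W1, b1, act)  # non-linear hidden representation
    return dense(hidden, W2, b2)    # no activation on the output layer


# Hypothetical hand-picked weights: the hidden units compute x1 - x2 and x2 - x1.
PARAMS = (
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
)

print(mlp_forward([3.0, 1.0], PARAMS))            # [2.0]  (= |3 - 1| with ReLU)
print(mlp_forward([3.0, 1.0], PARAMS, act=None))  # [0.0]  (linear layers cancel out)
```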

#### Data Augmentation

Data augmentation artificially expands the training dataset by applying random, label-preserving transformations such as rotations, translations, flips, and intensity changes. The transformations must keep the label valid: a horizontal flip suits many natural images, for example, but can turn a handwritten digit into an invalid one. For image tasks, augmentation increases data variability and helps the model generalize better to unseen samples. It reduces overfitting by making it harder for the network to memorize individual training examples, and it exposes the model to more diverse inputs without collecting new data.
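Below is a minimal pure-Python sketch of one such transformation: a random translation of a 2D pixel grid with zero fill. It only approximates the idea behind the `transforms.RandomAffine(0, translate=(0.1, 0.1))` call used later. The integer-shift sampling and the helper names `translate` and `random_translate` are assumptions for illustration, not torchvision's implementation. The class label stays the same, which is what makes the shifted image a valid extra training sample.

```python
import random


def translate(img, dx, dy, fill=0):
    """Shift a 2D grid right by dx and down by dy; vacated pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def random_translate(img, max_frac, rng=random):
    """Translate by a random integer offset of at most max_frac of width/height."""
    h, w = len(img), len(img[0])
    max_dx, max_dy = int(max_frac * w), int(max_frac * h)
    dx = rng.randint(-max_dx, max_dx)
    dy = rng.randint(-max_dy, max_dy)
    return translate(img, dx, dy)


digit = [[0, 9, 0],
         [0, 9, 0],
         [0, 9, 0]]
print(translate(digit, 1, 0))  # [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
```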

#### Optimizers

Optimizers control how model weights are updated during training.

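* **SGD (Stochastic Gradient Descent)** updates parameters using gradients computed on mini-batches. It is simple and effective but may converge slowly. Adding momentum (as done with `momentum=0.9` in the code below) accumulates a decaying sum of past gradients, which speeds up progress and damps oscillations.
* **Adam** adapts the learning rate for each parameter using running estimates of the first and second moments (mean and uncentered variance) of the gradients. This often gives faster and more stable convergence.
* **RMSProp** maintains a moving average of squared gradients and divides each update by its square root, adapting the step size per parameter. This makes it well suited to non-stationary objectives.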

Different optimizers influence training speed, stability, and final accuracy depending on the dataset and architecture.
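The three update rules can be compared on a toy problem. The sketch below applies scalar versions of them to minimize f(x) = x² (gradient 2x). It follows the update formulas of PyTorch's `SGD` (with momentum and no dampening), `RMSprop` and `Adam`, using their default smoothing constants. The learning rates and step counts are assumptions chosen for this toy objective, not the values tuned in the notebook.

```python
import math


def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


def run_sgd_momentum(x, lr=0.1, momentum=0.9, steps=200):
    v = 0.0  # velocity: decayed running sum of past gradients
    for _ in range(steps):
        v = momentum * v + grad(x)
        x -= lr * v
    return x


def run_rmsprop(x, lr=0.1, alpha=0.99, eps=1e-8, steps=200):
    sq_avg = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(x)
        sq_avg = alpha * sq_avg + (1 - alpha) * g * g
        x -= lr * g / (math.sqrt(sq_avg) + eps)  # step scaled by gradient magnitude
    return x


def run_adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = v = 0.0  # first- and second-moment estimates
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


for name, run in [("SGD+momentum", run_sgd_momentum),
                  ("RMSProp", run_rmsprop),
                  ("Adam", run_adam)]:
    print(f"{name:>12}: x after 200 steps = {run(5.0):.6f}")
```

All three runs should end close to the minimum at x = 0. Relative speed on a 1-D quadratic says little about behaviour on MNIST, though, which is why the notebook compares the optimizers empirically.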

#### Dropout

During training, dropout randomly zeroes each neuron's output with a specified probability p. This prevents neurons from co-adapting (relying too heavily on the presence of specific other neurons) and thereby reduces overfitting. As a result, the network learns more robust feature representations. During inference, dropout is turned off and all neurons are used. In PyTorch's `nn.Dropout`, the surviving activations are already scaled by 1/(1 - p) during training, so no extra scaling is needed at inference time.
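Below is a minimal pure-Python sketch of inverted dropout, the variant `nn.Dropout` uses. During training, each activation is zeroed with probability p and the survivors are scaled by 1/(1 - p). This keeps the expected activation unchanged, so the function is the identity at inference. The function name, the list-based interface, and the restriction p < 1 are simplifying assumptions.

```python
import random


def dropout(x, p, training, rng=random):
    """Inverted dropout over a list of activations (sketch; requires 0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # inference: identity, no scaling
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in x]


acts = [1.0] * 100_000
dropped = dropout(acts, p=0.3, training=True, rng=random.Random(0))
print(sum(v == 0.0 for v in dropped) / len(dropped))  # roughly 0.3 of units dropped
print(sum(dropped) / len(dropped))                    # roughly 1.0 (expectation preserved)
print(dropout([1.0, 2.0], p=0.3, training=False))     # [1.0, 2.0]
```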

#### Training Dynamics

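Combining data augmentation, an appropriate optimizer, and dropout can substantially improve MLP performance. Augmentation increases input diversity, dropout reduces overfitting, and the optimizer determines how efficiently and stably the weights converge. Together, these components help build robust, well-generalizing neural network models. How much each one helps depends on the dataset and architecture, which is why the program below selects the optimizer, learning rate, dropout rate and layer sizes on a validation set.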

### **Source Code**

#### Importing Dependencies

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
```

#### Data augmentation

```python
# Data augmentation for training
train_transform = transforms.Compose([
    transforms.RandomRotation(10),                     # random rotation in [-10, 10] degrees
    transforms.RandomAffine(0, translate=(0.1, 0.1)),  # no rotation; random shift up to 10% of width/height
    transforms.ToTensor(),                             # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,))               # (x - 0.5) / 0.5 -> values in [-1, 1]
])

# No augmentation for the test set
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
```
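`ToTensor` scales 8-bit pixel values to [0, 1] by dividing by 255. `Normalize((0.5,), (0.5,))` then computes (x - mean) / std per channel. The quick arithmetic check below, in plain Python, shows why this maps pixels to [-1, 1] and how to undo the mapping (for example, to visualize augmented samples). The helper names are hypothetical.

```python
def to_unit(pixel):
    """Mimic ToTensor for one 8-bit grayscale pixel: 0..255 -> 0.0..1.0."""
    return pixel / 255.0


def normalize(x, mean=0.5, std=0.5):
    """Mimic Normalize for one channel value: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse of normalize, e.g. to display an augmented sample."""
    return y * std + mean


for p in (0, 128, 255):
    print(p, round(normalize(to_unit(p)), 4))
# 0 -1.0
# 128 0.0039
# 255 1.0
```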

#### Loading Dataset

```python
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=train_transform)
test_dataset = datasets.MNIST(root="./data", train=False, download=True, transform=test_transform)

# Split the 60,000 training images into 50,000 for training and 10,000 for validation.
# Note: both subsets share train_dataset's transform, so validation images are augmented too.
train_subset, val_subset = random_split(train_dataset, [50000, 10000])

train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=64)
test_loader = DataLoader(test_dataset, batch_size=64)
```
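One caveat in this split: `random_split` returns two `Subset`s of the same `train_dataset`, so the validation images also go through `train_transform` and are randomly rotated and shifted. This may partly explain why the validation accuracies reported below (0.9511 after the last epoch) stay under the final test accuracy (0.97), although that is not verified here.

One way to keep validation un-augmented is to build two `datasets.MNIST` objects over the same files: one with `train_transform` and one with `test_transform`. You then draw a single seeded permutation of indices and index both datasets with `torch.utils.data.Subset`. The helper `train_val_indices` below is hypothetical and written in plain Python so the index logic can be checked in isolation. The `Subset` wiring is shown only in comments.

```python
import random


def train_val_indices(n_total, n_val, seed=0):
    """Return disjoint (train_idx, val_idx) lists from one seeded shuffle."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return idx[n_val:], idx[:n_val]


train_idx, val_idx = train_val_indices(60000, 10000, seed=42)
print(len(train_idx), len(val_idx))  # 50000 10000

# Hypothetical wiring (not executed here):
# train_aug = datasets.MNIST("./data", train=True, download=True, transform=train_transform)
# train_plain = datasets.MNIST("./data", train=True, download=True, transform=test_transform)
# train_subset = Subset(train_aug, train_idx)  # augmented
# val_subset = Subset(train_plain, val_idx)    # not augmented
```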



#### MLP Definition

```python
class MLP(nn.Module):
    def __init__(self, hidden1=256, hidden2=128, dropout=0.3):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),                 # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(28 * 28, hidden1),
            nn.ReLU(),
            nn.Dropout(dropout),

            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Dropout(dropout),

            nn.Linear(hidden2, 10)        # raw logits; CrossEntropyLoss applies log-softmax
        )

    def forward(self, x):
        return self.model(x)
```
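The size of each candidate network in the search can be worked out by hand. An `nn.Linear(in, out)` layer holds in·out weights plus out biases, while `Flatten`, `ReLU` and `Dropout` add no parameters. The plain-Python helper below (hypothetical, not part of the notebook) computes the total for the hidden sizes in the grid. For a built model, the same number should come from `sum(p.numel() for p in model.parameters())`.

```python
from itertools import product


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(hidden1, hidden2, n_in=28 * 28, n_classes=10):
    """Parameter count of Linear(784, h1) -> Linear(h1, h2) -> Linear(h2, 10)."""
    return (linear_params(n_in, hidden1)
            + linear_params(hidden1, hidden2)
            + linear_params(hidden2, n_classes))


for h1, h2 in product([128, 256], [64, 128]):
    print(f"h1={h1}, h2={h2}: {mlp_params(h1, h2):,} parameters")
```

For the default `MLP()` (256, 128), this gives 235,146 parameters, of which 200,960 sit in the first layer.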

#### Training and evaluation loops

```python
def train(model, loader, criterion, optimizer, device):
    """Run one training epoch; return (mean batch loss, accuracy)."""
    model.train()  # enable dropout
    total_loss = total_correct = 0

    for data, target in loader:
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()             # clear gradients from the previous step
        output = model(data)
        loss = criterion(output, target)

        loss.backward()                   # backpropagation
        optimizer.step()                  # weight update

        total_loss += loss.item()
        total_correct += (output.argmax(1) == target).sum().item()

    return total_loss / len(loader), total_correct / len(loader.dataset)


def evaluate(model, loader, criterion, device):
    """Evaluate without updating weights; return (mean batch loss, accuracy)."""
    model.eval()  # disable dropout
    total_loss = total_correct = 0

    with torch.no_grad():  # no gradient tracking needed for evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)

            output = model(data)
            loss = criterion(output, target)

            total_loss += loss.item()
            total_correct += (output.argmax(1) == target).sum().item()

    return total_loss / len(loader), total_correct / len(loader.dataset)
```
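Both loops return `total_loss / len(loader)`, which is the mean of per-batch mean losses, while accuracy is divided by the number of samples. With 50,000 training images and `batch_size=64`, the last batch holds only 16 images (781 × 64 = 49,984). That small batch still counts as much as a full batch in the loss average. The difference is usually small. A sample-weighted average weights each batch by its size instead, as sketched below in plain Python with made-up batch values. In the PyTorch loops, this would mean accumulating `loss.item() * data.size(0)` and dividing by `len(loader.dataset)`.

```python
def batch_mean(batch_losses):
    """Average of per-batch mean losses (what total_loss / len(loader) computes)."""
    return sum(batch_losses) / len(batch_losses)


def sample_mean(batch_losses, batch_sizes):
    """Per-sample average: weight each batch's mean loss by its size."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical example: two full batches of 64 and a final batch of 16.
losses, sizes = [0.30, 0.20, 1.00], [64, 64, 16]
print(round(batch_mean(losses), 4))          # 0.5
print(round(sample_mean(losses, sizes), 4))  # 0.3333
```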

#### Hyperparameter tuning

```python
device = "cuda" if torch.cuda.is_available() else "cpu"

param_grid = {
    "hidden1": [128, 256],
    "hidden2": [64, 128],
    "dropout": [0.2, 0.3],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "lr": [0.001, 0.0005]
}

def create_optimizer(name, params, lr):
    """Build the optimizer named in the grid for the given model parameters."""
    if name == "adam":
        return optim.Adam(params, lr=lr)
    if name == "sgd":
        return optim.SGD(params, lr=lr, momentum=0.9)
    if name == "rmsprop":
        return optim.RMSprop(params, lr=lr)
    raise ValueError(f"Unknown optimizer: {name}")
```
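`itertools.product` enumerates every combination, so the grid above amounts to 2 × 2 × 2 × 3 × 2 = 48 configurations. Each is trained for 3 epochs in the search loop, which means 144 passes over the 50,000 training images before the final 10-epoch run. The check below reuses the same grid definition in plain Python. The printed search log that follows is truncated and shows only some of these configurations.

```python
from itertools import product
from math import prod

param_grid = {
    "hidden1": [128, 256],
    "hidden2": [64, 128],
    "dropout": [0.2, 0.3],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "lr": [0.001, 0.0005],
}

# Same key order as the search loop: hidden1, hidden2, dropout, optimizer, lr
configs = list(product(*param_grid.values()))
n_configs = prod(len(v) for v in param_grid.values())
print(len(configs), n_configs)  # 48 48
print(configs[0])               # (128, 64, 0.2, 'adam', 0.001)
print(3 * n_configs, "training epochs during the search")  # 144 training epochs during the search
```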

```python
best_acc = 0
best_params = None

for hidden1, hidden2, dropout, opt_name, lr in itertools.product(
        param_grid["hidden1"],
        param_grid["hidden2"],
        param_grid["dropout"],
        param_grid["optimizer"],
        param_grid["lr"]):

    print(f"Testing: h1={hidden1}, h2={hidden2}, drop={dropout}, opt={opt_name}, lr={lr}")

    model = MLP(hidden1=hidden1, hidden2=hidden2, dropout=dropout).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = create_optimizer(opt_name, model.parameters(), lr)

    # Train for only 3 epochs per configuration to keep the search affordable
    for _ in range(3):
        train(model, train_loader, criterion, optimizer, device)

    # Select hyperparameters on the validation set, never on the test set
    _, val_acc = evaluate(model, val_loader, criterion, device)
    print(f"Val Acc: {val_acc:.4f}")

    if val_acc > best_acc:
        best_acc = val_acc
        best_params = (hidden1, hidden2, dropout, opt_name, lr)

print("Best Hyperparameters:", best_params)
print("Best Validation Accuracy:", best_acc)
```

```
Testing: h1=128, h2=64, drop=0.2, opt=adam, lr=0.001
Val Acc: 0.8881
Testing: h1=128, h2=64, drop=0.2, opt=adam, lr=0.0005
Val Acc: 0.8921
Testing: h1=128, h2=64, drop=0.2, opt=sgd, lr=0.001
Val Acc: 0.7125
Testing: h1=128, h2=64, drop=0.2, opt=sgd, lr=0.0005
Val Acc: 0.6178
Testing: h1=128, h2=64, drop=0.2, opt=rmsprop, lr=0.001
Val Acc: 0.8673
Testing: h1=128, h2=64, drop=0.2, opt=rmsprop, lr=0.0005
Val Acc: 0.8585
Testing: h1=128, h2=64, drop=0.3, opt=adam, lr=0.001
Val Acc: 0.8806
Testing: h1=128, h2=64, drop=0.3, opt=adam, lr=0.0005
Val Acc: 0.8759
Testing: h1=128, h2=64, drop=0.3, opt=sgd, lr=0.001
Val Acc: 0.7127
Testing: h1=128, h2=64, drop=0.3, opt=sgd, lr=0.0005
Val Acc: 0.5810
Testing: h1=128, h2=64, drop=0.3, opt=rmsprop, lr=0.001
Val Acc: 0.8590
Testing: h1=128, h2=64, drop=0.3, opt=rmsprop, lr=0.0005
Val Acc: 0.8503
Testing: h1=128, h2=128, drop=0.2, opt=adam, lr=0.001
Val Acc: 0.9019
Testing: h1=128, h2=128, drop=0.2, opt=adam, lr=0.0005
Val Acc: 0.8992
Testing: h1=128, 
```

#### Training with the best hyperparameters and optimizer

```python
h1, h2, drop, opt_name, lr = best_params
model = MLP(hidden1=h1, hidden2=h2, dropout=drop).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = create_optimizer(opt_name, model.parameters(), lr)

for epoch in range(10):
    train_loss, train_acc = train(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    print(f"Epoch {epoch+1}: Train Acc={train_acc:.4f}, Val Acc={val_acc:.4f}")
```


```
Epoch 1: Train Acc=0.6908, Val Acc=0.8681
Epoch 2: Train Acc=0.8483, Val Acc=0.9006
Epoch 3: Train Acc=0.8731, Val Acc=0.9211
Epoch 4: Train Acc=0.8885, Val Acc=0.9350
Epoch 5: Train Acc=0.8944, Val Acc=0.9370
Epoch 6: Train Acc=0.9019, Val Acc=0.9415
Epoch 7: Train Acc=0.9066, Val Acc=0.9397
Epoch 8: Train Acc=0.9107, Val Acc=0.9436
Epoch 9: Train Acc=0.9130, Val Acc=0.9509
Epoch 10: Train Acc=0.9133, Val Acc=0.9511
```


```python
test_loss, test_acc = evaluate(model, test_loader, criterion, device)
print("Final Test Accuracy:", test_acc)
```

```
Final Test Accuracy: 0.97
```
