<div style="text-align: center; font-size: 30px; font-weight: bold; margin-bottom: 20px;">
    Program 4
</div>


### **Aim**
To understand Convolutional Neural Networks (CNNs) and how data augmentation, optimizers, and dropout influence accuracy and generalization.

### **Theory**

#### Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are specialized deep learning models designed for processing grid-like data, such as images. Unlike fully connected networks, CNNs use convolutional layers that learn spatial features through trainable filters. These filters detect patterns like edges, textures, and shapes by sliding over the input image. The hierarchical nature of CNNs allows shallow layers to learn low-level features and deeper layers to learn high-level representations.
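
To make the sliding-filter idea concrete, here is a minimal pure-Python sketch. It is not part of the lab code: the function name `conv2d_valid`, the toy 4x6 image, and the hand-picked vertical-edge kernel are all illustrative assumptions, and in a real CNN the kernel values are learned rather than chosen by hand. Like deep learning frameworks, the sketch computes cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked filter that responds where brightness rises from left to right.
kernel = [[-1, 0, 1] for _ in range(3)]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)
```

The response is non-zero only at window positions that straddle the dark-to-bright boundary, which is the sense in which a filter "detects" an edge.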

A typical CNN architecture includes convolutional layers, activation functions (commonly ReLU), pooling layers for spatial downsampling, and fully connected layers for final classification. CNNs excel in image recognition tasks due to their ability to exploit spatial locality and reduce the number of trainable parameters compared to dense networks.
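
The parameter saving can be checked with simple arithmetic. The sketch below assumes a 3x3 convolution from 3 to 32 channels applied to a 32x32 RGB image, and compares it with a hypothetical dense layer that produces an output of the same size. The helper names are illustrative, not library functions.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel; independent of image size."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def dense_params(in_features, out_features):
    """One weight per input-output pair plus one bias per output."""
    return (in_features + 1) * out_features


# 3x3 convolution, 3 -> 32 channels.
conv = conv_params(3, 32, 3)

# Dense layer mapping a flattened 3x32x32 image to a 32x32x32 output.
dense = dense_params(3 * 32 * 32, 32 * 32 * 32)

print(f"conv:  {conv:,} parameters")
print(f"dense: {dense:,} parameters")
```

Under these assumptions the convolution needs 896 parameters, while the dense layer needs roughly 100 million, because the convolution reuses the same small filter at every spatial position.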

#### Data Augmentation

Data augmentation increases dataset variability by applying controlled random transformations such as random cropping, horizontal flipping, rotation, and color jitter. Normalization usually sits in the same pipeline, but it is a deterministic preprocessing step rather than an augmentation. Augmentation reduces overfitting and improves model robustness by enabling the CNN to learn features that are invariant to slight visual distortions. It acts as a form of regularization, making the model generalize better to unseen images.
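
As a rough illustration of how two of these transformations work, the sketch below applies a padded random crop and a random horizontal flip to a tiny 4x4 grayscale "image" stored as nested lists. The function names and the toy image are assumptions made for this example; real pipelines would use a library such as torchvision rather than hand-written code.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image (list of rows)."""
    return [row[::-1] for row in image]


def random_crop(image, size, padding, rng):
    """Zero-pad every side, then cut out a random size x size window."""
    width = len(image[0])
    blank = [0] * (width + 2 * padding)
    padded = ([blank[:] for _ in range(padding)]
              + [[0] * padding + list(row) + [0] * padding for row in image]
              + [blank[:] for _ in range(padding)])
    top = rng.randint(0, len(padded) - size)
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]


def augment(image, rng, flip_prob=0.5):
    """Padded random crop back to the original size, then an optional flip."""
    out = random_crop(image, size=len(image), padding=1, rng=rng)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return out


rng = random.Random(0)  # fixed seed so repeated runs give the same views
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

# Each call yields a slightly shifted and possibly mirrored view.
views = [augment(image, rng) for _ in range(5)]
for row in views[0]:
    print(row)
```

Every view keeps the original 4x4 size, so the network sees the same label with small shifts and mirrorings of the content.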

#### Optimizers

Optimizers determine how the network's parameters are updated during training.

* **Adam** uses adaptive learning rates and momentum to converge quickly and stably, making it widely used in CNN training.
* **SGD with momentum** can yield strong generalization but often requires careful tuning.

Learning rate schedulers such as StepLR adjust the learning rate over epochs. StepLR multiplies the learning rate by a fixed factor after every fixed number of epochs, which helps avoid plateaus and improves convergence.
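
The step schedule can be written in closed form. The sketch below is a plain-Python approximation of the rule that `StepLR` follows, not the library implementation. The helper name `step_lr` is hypothetical, and the values (base rate 0.001, step size 10, decay factor 0.5) are the ones used later in this program.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate in effect during `epoch` (0-indexed) under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


BASE_LR, STEP_SIZE, GAMMA = 0.001, 10, 0.5

schedule = [step_lr(BASE_LR, epoch, STEP_SIZE, GAMMA) for epoch in range(20)]
for epoch in (0, 9, 10, 19):
    print(f"epoch {epoch + 1:2d}: lr = {schedule[epoch]:.6f}")
```

With these settings the rate stays at its base value for the first ten epochs and is halved for the next ten.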

#### Dropout

Dropout is a regularization technique applied primarily in fully connected layers. During training, a randomly chosen fraction of neurons is disabled on each forward pass, preventing the model from relying too heavily on specific activations. This encourages redundancy in feature learning, reduces overfitting, and improves generalization. At evaluation time dropout is switched off, so every neuron contributes. Dropout is especially useful in deep CNNs where dense layers can easily overfit due to their large number of parameters.
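
The mechanism can be sketched in a few lines of plain Python. This is an illustrative version of "inverted" dropout, in which surviving activations are scaled by 1 / (1 - p) during training so that their expected value is unchanged; `nn.Dropout` in PyTorch applies the same scaling. The function name and the toy activations are assumptions made for this example.

```python
import random


def dropout(activations, p, training, rng):
    """Zero each activation with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)  # evaluation: identity
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
acts = [1.0] * 10000

train_out = dropout(acts, p=0.3, training=True, rng=rng)
eval_out = dropout(acts, p=0.3, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)

print(f"dropped fraction (train): {dropped_fraction:.3f}")
print(f"mean activation (train):  {mean_train:.3f}")
print(f"unchanged in eval mode:   {eval_out == acts}")
```

About 30% of the activations are zeroed in training mode, while the mean stays close to 1.0 because of the rescaling. In evaluation mode the input passes through untouched.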

#### Importance of These Techniques

Combining CNN architectures with data augmentation, dropout, and well-selected optimizers creates a robust and high-performing image classification system. Augmentation increases effective data size, dropout reduces overfitting, and optimizers ensure stable, efficient learning. Together, they significantly enhance accuracy and generalization across real-world image tasks.

### **Source Code**

#### Dependency Imports

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

#### Data Augmentation

In [2]:
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5),
                         (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5),
                         (0.5, 0.5, 0.5))
])
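
Both pipelines end with `Normalize` using a mean and standard deviation of 0.5 per channel. Since `ToTensor` scales 8-bit pixel values to the range [0, 1], the normalized values land in [-1, 1]. The arithmetic can be sketched as follows; the helper names are hypothetical and operate on a single pixel value rather than a tensor.

```python
def to_unit_range(pixel_value):
    """Scale an 8-bit pixel (0..255) to [0, 1], as ToTensor does for uint8 images."""
    return pixel_value / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization: subtract the mean, divide by the std."""
    return (value - mean) / std


for raw in (0, 128, 255):
    print(f"pixel {raw:3d} -> {normalize(to_unit_range(raw)):+.4f}")
```

Black maps to -1, white maps to +1, and mid-gray lands near 0, which centers the inputs around zero.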

#### Loading Dataset

In [3]:
train_dataset = datasets.CIFAR10("./data", train=True, download=True,
                                 transform=train_transform)
test_dataset = datasets.CIFAR10("./data", train=False, download=True,
                                transform=test_transform)

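# Hold out 5,000 of the 50,000 training images for validation.
# Note: val_subset wraps train_dataset, so it is also augmented by train_transform.
train_subset, val_subset = random_split(train_dataset,
                                        [45000, 5000])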

train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader   = DataLoader(val_subset, batch_size=64)
test_loader  = DataLoader(test_dataset, batch_size=64)
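
As a quick sanity check on the loaders above, the number of batches per epoch follows from the split sizes and the batch size. The sketch below assumes the default `DataLoader` behavior of keeping the final, smaller batch (`drop_last=False`); the helper name `num_batches` is hypothetical.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Batches per epoch; the final partial batch is kept unless drop_last is set."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 64
sizes = {"train": 45000, "val": 5000, "test": 10000}

batches = {name: num_batches(n, BATCH_SIZE) for name, n in sizes.items()}
last_train_batch = sizes["train"] - (batches["train"] - 1) * BATCH_SIZE

for name, count in batches.items():
    print(f"{name}: {count} batches")
print(f"last training batch holds {last_train_batch} images")
```

Because 45,000 is not a multiple of 64, the last training batch is much smaller than the others.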



#### CNN Architecture

In [4]:
class CNN(nn.Module):
    def __init__(self, dropout=0.3):
        super(CNN, self).__init__()

        self.features = nn.Sequential(
            # Block 1: 3x32x32 input -> 32x16x16 after pooling
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(dropout),

            # Block 2: 32x16x16 -> 64x8x8 after pooling
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(dropout),
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 64 channels x 8 x 8 spatial positions = 4096 input features
            nn.Linear(64 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
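
The `64 * 8 * 8` input size of the first linear layer follows from how each layer changes the spatial size of a 32x32 input. The sketch below traces that arithmetic in plain Python. The helper names are hypothetical, and the formulas assume stride-1 convolutions and a pooling stride equal to the pooling window, which matches the layers defined above.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling with stride equal to the window."""
    return size // kernel


size = 32                                  # CIFAR-10 images are 32x32
for block in range(2):                     # two conv-conv-pool blocks
    size = conv_out(size, kernel=3, padding=1)
    size = conv_out(size, kernel=3, padding=1)
    size = pool_out(size, kernel=2)
    print(f"after block {block + 1}: {size}x{size}")

channels = 64                              # output channels of the last conv layer
flattened = channels * size * size
print(f"flattened features: {flattened}")
```

Padding of 1 with a 3x3 kernel preserves the spatial size, so only the two pooling layers shrink the feature maps, from 32 to 16 to 8.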


#### Optimizers and Scheduler

In [5]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = CNN(dropout=0.3).to(device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer,
                                      step_size=10,
                                      gamma=0.5)

#### Training and Evaluation Loops

In [6]:
def train(model, loader, criterion, optimizer, device):
    """Run one training epoch; return (mean batch loss, accuracy)."""
    model.train()  # enable dropout
    total_loss, total_correct = 0, 0

    for x, y in loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        outputs = model(x)
        loss = criterion(outputs, y)

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_correct += (outputs.argmax(1) == y).sum().item()

    return total_loss / len(loader), total_correct / len(loader.dataset)


def evaluate(model, loader, criterion, device):
    """Score the model without updating it; return (mean batch loss, accuracy)."""
    model.eval()  # disable dropout
    total_loss, total_correct = 0, 0

    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)

            outputs = model(x)
            loss = criterion(outputs, y)

            total_loss += loss.item()
            total_correct += (outputs.argmax(1) == y).sum().item()

    return total_loss / len(loader), total_correct / len(loader.dataset)
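
One detail of the loops above: `nn.CrossEntropyLoss` returns the mean loss of each batch by default, so `total_loss / len(loader)` is an average of batch means. When the final batch is smaller than the rest, this differs slightly from the true per-sample mean. The sketch below shows the difference on made-up numbers; the function names and loss values are illustrative assumptions, not results from this program.

```python
def mean_of_batch_means(batch_losses):
    """What the loops above report: every batch counts equally."""
    return sum(batch_losses) / len(batch_losses)


def sample_weighted_mean(batch_losses, batch_sizes):
    """Per-sample mean: each batch is weighted by how many samples it holds."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Made-up example: two full batches and one small, unusually hard final batch.
batch_losses = [1.0, 1.0, 3.0]
batch_sizes = [64, 64, 8]

unweighted = mean_of_batch_means(batch_losses)
weighted = sample_weighted_mean(batch_losses, batch_sizes)

print(f"mean of batch means:  {unweighted:.4f}")
print(f"sample-weighted mean: {weighted:.4f}")
```

With hundreds of batches per epoch the gap is usually negligible, and the accuracy figures are unaffected because they already divide by the number of samples.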


In [7]:
EPOCHS = 20

for epoch in range(EPOCHS):
    train_loss, train_acc = train(model, train_loader,
                                  criterion, optimizer, device)
    val_loss, val_acc = evaluate(model, val_loader,
                                 criterion, device)

    scheduler.step()

    print(f"Epoch {epoch+1}/{EPOCHS} | "
          f"Train Acc: {train_acc:.4f} | Val Acc: {val_acc:.4f}")

Epoch 1/20 | Train Acc: 0.3568 | Val Acc: 0.4764
Epoch 2/20 | Train Acc: 0.4925 | Val Acc: 0.5310
Epoch 3/20 | Train Acc: 0.5600 | Val Acc: 0.6036
Epoch 4/20 | Train Acc: 0.6012 | Val Acc: 0.6274
Epoch 5/20 | Train Acc: 0.6308 | Val Acc: 0.6602
Epoch 6/20 | Train Acc: 0.6508 | Val Acc: 0.6938
Epoch 7/20 | Train Acc: 0.6710 | Val Acc: 0.7128
Epoch 8/20 | Train Acc: 0.6782 | Val Acc: 0.7116
Epoch 9/20 | Train Acc: 0.6858 | Val Acc: 0.7176
Epoch 10/20 | Train Acc: 0.6984 | Val Acc: 0.7116
Epoch 11/20 | Train Acc: 0.7158 | Val Acc: 0.7474
Epoch 12/20 | Train Acc: 0.7248 | Val Acc: 0.7474
Epoch 13/20 | Train Acc: 0.7250 | Val Acc: 0.7480
Epoch 14/20 | Train Acc: 0.7284 | Val Acc: 0.7430
Epoch 15/20 | Train Acc: 0.7306 | Val Acc: 0.7402
Epoch 16/20 | Train Acc: 0.7327 | Val Acc: 0.7538
Epoch 17/20 | Train Acc: 0.7379 | Val Acc: 0.7654
Epoch 18/20 | Train Acc: 0.7413 | Val Acc: 0.7716
Epoch 19/20 | Train Acc: 0.7452 | Val Acc: 0.7642
Epoch 20/20 | Train Acc: 0.7484 | Val Acc: 0.7682


In [8]:
test_loss, test_acc = evaluate(model, test_loader,
                               criterion, device)

print("Final Test Accuracy:", test_acc)

Final Test Accuracy: 0.7919
