<div style="text-align: center; font-size: 30px; font-weight: bold; margin-bottom: 20px;">
    Program 5
</div>


### **Aim**
Apply transfer learning with pretrained models (ResNet, VGG-16) to the MNIST dataset.

### **Theory**

#### Transfer Learning

Transfer learning is a deep learning approach in which knowledge from a model pretrained on a large dataset is reused for a new, typically smaller task. Instead of training a model from scratch, layers learned on a large dataset (such as ImageNet, with over a million images) are repurposed as feature extractors. This usually reduces training time and can improve accuracy, especially when the target dataset is small or simple, as MNIST is. Because the pretrained layers already encode low-level features (edges, textures) and high-level features (shapes, object parts), the model can generalize well even with limited training data.

#### Pretrained Models: ResNet and VGG-16

ResNet and VGG-16 are two widely used convolutional neural network architectures trained on ImageNet.

* **ResNet** introduces residual (skip) connections that let gradients flow more easily through deep networks, mitigating the vanishing gradient problem. These connections make very deep architectures trainable while maintaining stability and performance.
* **VGG-16** uses a simple and uniform architecture composed of stacked 3×3 convolutions. Despite its depth, VGG-16 is easy to understand and serves as a strong feature extractor due to its large capacity.

In transfer learning, the convolutional layers of these pretrained models are used as fixed feature extractors. Only the final fully connected classifier is replaced and retrained to match the number of target classes (10 for MNIST).

#### Feature Extraction and Fine-Tuning

Transfer learning typically begins with **feature extraction**, where the pretrained convolutional layers are frozen and only the new classifier layer is trained. This lets the model use the learned filters without altering them. If higher performance is needed, **fine-tuning** can be applied by unfreezing some of the deeper layers, allowing the model to adapt the pretrained features more specifically to the MNIST digit images. Fine-tuning is usually done with a lower learning rate to avoid overwriting useful pretrained knowledge.

#### Advantages for MNIST

Since MNIST images are simple, small, and grayscale, training a deep CNN from scratch may be unnecessary. Transfer learning with models like ResNet and VGG-16 provides several benefits:

* Faster convergence due to rich initial representations
* Strong accuracy with only the classifier trained, thanks to pretrained visual features
* Reduced need for large amounts of data
* Improved generalization on the test set

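By resizing MNIST digits to 224×224 and converting them to 3-channel images, pretrained architectures can be applied to digit classification with little extra code, making transfer learning an efficient method for this task. In the run recorded below, a frozen ResNet18 with a retrained classifier reaches about 94% test accuracy.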

### **Source Code**

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, random_split

In [3]:
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),  # convert 1→3 channels
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

#### Loading dataset

In [4]:
train_dataset = datasets.MNIST(root="./data", train=True, download=True,
                               transform=train_transform)
test_dataset  = datasets.MNIST(root="./data", train=False, download=True,
                               transform=test_transform)

train_subset, val_subset = random_split(train_dataset, [55000, 5000])

train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader   = DataLoader(val_subset, batch_size=64)
test_loader  = DataLoader(test_dataset, batch_size=64)
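
Because `random_split` returns views of a single dataset, the validation subset above inherits `train_transform`, including `RandomRotation`. If validation on unaugmented images is preferred, one option is to wrap the same data in two datasets that differ only in their transform and share one shuffled index list. The helper `split_two_views` below is a hypothetical sketch of that idea, demonstrated on tiny stand-in datasets.

```python
import torch
from torch.utils.data import Subset, TensorDataset


def split_two_views(train_view, eval_view, val_size, seed=0):
    """Split one underlying dataset into train/val using two transform views.

    train_view and eval_view must wrap the same samples in the same order
    and differ only in their transforms.
    """
    assert len(train_view) == len(eval_view)
    generator = torch.Generator().manual_seed(seed)   # reproducible shuffle
    indices = torch.randperm(len(train_view), generator=generator).tolist()
    val_idx, train_idx = indices[:val_size], indices[val_size:]
    return Subset(train_view, train_idx), Subset(eval_view, val_idx)


# Tiny stand-in datasets so the sketch runs without downloading MNIST.
data = torch.arange(10).float().unsqueeze(1)
train_view = TensorDataset(data)   # would carry train_transform
eval_view = TensorDataset(data)    # would carry test_transform
demo_train, demo_val = split_two_views(train_view, eval_view, val_size=3)
print(len(demo_train), len(demo_val))  # 7 3
```

For MNIST, the two views would be `datasets.MNIST(..., train=True, transform=train_transform)` and `datasets.MNIST(..., train=True, transform=test_transform)`, with `val_size=5000`.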

#### Loading pretrained ResNet18

In [5]:
device = "cuda" if torch.cuda.is_available() else "cpu"

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in resnet.parameters():
    param.requires_grad = False

resnet.fc = nn.Linear(resnet.fc.in_features, 10)

resnet = resnet.to(device)

#### Training and Evaluation

In [6]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet.fc.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
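
With `step_size=5` and `gamma=0.5`, `StepLR` halves the learning rate every five epochs. The small pure-Python sketch below reproduces that arithmetic, assuming `scheduler.step()` is called once at the end of each epoch. The function name `step_lr` is illustrative.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate in effect during a 0-indexed epoch under a step decay."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.001, 0.5, 5, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr}")
```

Epochs 1 to 5 therefore train at 0.001 and epochs 6 to 10 at 0.0005.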

In [7]:
def train(model, loader, criterion, optimizer, device):
    """Run one training epoch; return (mean batch loss, accuracy)."""
    model.train()
    total_loss, correct = 0, 0

    for x, y in loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        out = model(x)
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        correct += (out.argmax(1) == y).sum().item()

    return total_loss / len(loader), correct / len(loader.dataset)


def evaluate(model, loader, criterion, device):
    """Evaluate without gradient tracking; return (mean batch loss, accuracy)."""
    model.eval()
    total_loss, correct = 0, 0

    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            loss = criterion(out, y)

            total_loss += loss.item()
            correct += (out.argmax(1) == y).sum().item()

    return total_loss / len(loader), correct / len(loader.dataset)
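
One detail worth checking in a feature-extraction setup: `model.train()` also switches the frozen backbone's BatchNorm layers to batch statistics and updates their running averages, even when their weights have `requires_grad=False`. A possible alternative, sketched below on a tiny stand-in network, keeps the backbone in eval mode and puts only the new head in train mode. The helper name `set_feature_extraction_mode` and the toy network are assumptions for illustration.

```python
import torch
import torch.nn as nn


def set_feature_extraction_mode(model, head):
    """Keep the frozen backbone in eval mode; put only the head in train mode."""
    model.eval()     # BatchNorm uses stored statistics, Dropout is disabled
    head.train()     # the trainable classifier behaves as in normal training
    return model


# Tiny stand-in network with one BatchNorm layer and a linear head.
backbone = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.Flatten())
head = nn.Linear(4 * 6 * 6, 10)
net = nn.Sequential(backbone, head)

bn = backbone[1]
before = bn.running_mean.clone()
set_feature_extraction_mode(net, head)
net(torch.randn(8, 3, 8, 8))                  # forward pass during "training"
print(torch.equal(before, bn.running_mean))   # True: statistics untouched
```

In the training function this would correspond to calling `set_feature_extraction_mode(resnet, resnet.fc)` in place of `model.train()`. Whether it changes the accuracy figures reported here was not tested.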


In [8]:
EPOCHS = 10

for epoch in range(EPOCHS):
    train_loss, train_acc = train(resnet, train_loader, criterion, optimizer, device)
    val_loss, val_acc     = evaluate(resnet, val_loader, criterion, device)

    scheduler.step()

    print(f"Epoch {epoch+1}/{EPOCHS} | Train Acc: {train_acc:.4f} | Val Acc: {val_acc:.4f}")


Epoch 1/10 | Train Acc: 0.8636 | Val Acc: 0.9250
Epoch 2/10 | Train Acc: 0.9235 | Val Acc: 0.9354
Epoch 3/10 | Train Acc: 0.9318 | Val Acc: 0.9372
Epoch 4/10 | Train Acc: 0.9379 | Val Acc: 0.9422
Epoch 5/10 | Train Acc: 0.9391 | Val Acc: 0.9466
Epoch 6/10 | Train Acc: 0.9453 | Val Acc: 0.9466
Epoch 7/10 | Train Acc: 0.9432 | Val Acc: 0.9508
Epoch 8/10 | Train Acc: 0.9470 | Val Acc: 0.9478
Epoch 9/10 | Train Acc: 0.9478 | Val Acc: 0.9492
Epoch 10/10 | Train Acc: 0.9470 | Val Acc: 0.9528


In [9]:
test_loss, test_acc = evaluate(resnet, test_loader, criterion, device)
print("ResNet18 Test Accuracy:", test_acc)

ResNet18 Test Accuracy: 0.9414
