## Mounting the asset directory containing the dataset

In [None]:
import os

local_assets_b = False

if local_assets_b:
  assets_dir = "/content/assets/P3/"

  if not os.path.isdir(assets_dir):
    assert os.path.isfile("assets.zip")
    os.system("unzip assets.zip")
else:
  from google.colab import drive
  drive.mount('/content/drive')
  assets_dir = '/content/drive/MyDrive/CV-1/assets/P3/'

Mounted at /content/drive


# Transfer Learning Introduction

Transfer learning is a deep learning technique in which a model pre-trained on a large-scale dataset is reused to solve a new task with limited labeled data. The pre-trained model has already learned rich, general-purpose features on a source task, and we fine-tune it on the target task.

### What is the dataset being used?
The dataset used here is CIFAR-10, a widely used benchmark in computer vision and machine learning. It consists of 60,000 small color images (32x32 pixels) in 10 classes, with 6,000 images per class, split into a training set of 50,000 images and a test set of 10,000 images. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. This makes CIFAR-10 a good dataset for evaluating and benchmarking image classification models.

VGG is often used as a backbone model for transfer learning. Its pre-trained weights, learned on the large-scale ImageNet dataset, capture generic features such as edges, textures, and shapes that are useful for many visual recognition tasks. The early layers of VGG respond to low-level features such as edges and corners, while deeper layers combine them into more complex features. Because these representations transfer well to new tasks, we can fine-tune a pre-trained VGG on CIFAR-10 for image classification even with limited labeled data.

By fine-tuning VGG on CIFAR-10, we use the pre-trained weights as a starting point and adapt them to task-specific features. This approach tends to work best when the target task shares a domain or visual characteristics with the source task on which VGG was pre-trained. CIFAR-10 and ImageNet both contain natural photographs of everyday objects, so the knowledge learned from ImageNet can help VGG perform better on the smaller CIFAR-10 dataset than a model trained from scratch.

## Transfer Learning (VGG, ResNet) vs Building a model from Scratch

Below we develop three models to show the benefits of transfer learning: a fine-tuned VGG-16, a fine-tuned ResNet-50, and a small CNN trained from scratch. Transfer learning saves both time and computational cost, because starting from pre-trained weights requires less training to reach good accuracy.

### Using VGG for Transfer Learning


### What is VGG?
VGG (named after the Visual Geometry Group at the University of Oxford, which developed it) is a popular deep convolutional neural network (CNN) architecture. VGGNet is known for its simplicity and effectiveness in image classification: it stacks small 3x3 convolutional layers, interleaved with max-pooling layers, followed by fully connected layers. The most common variant, VGG-16, has 16 weight layers: 13 convolutional layers and 3 fully connected layers. VGGNet achieved strong results on image classification benchmarks, including the ImageNet (ILSVRC 2014) challenge.
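The layer count and the effect of VGG's five max-pooling stages can be checked with a little arithmetic. This pure-Python sketch encodes the VGG-16 convolutional configuration (configuration "D" in the VGG paper) as a list; the list and helper are written for illustration and are not part of torchvision. It shows that a 32x32 CIFAR-10 image shrinks to 1x1 after the feature extractor. torchvision's VGG implementation applies an `AdaptiveAvgPool2d((7, 7))` before the classifier, which is why the classifier's 25088 (512 x 7 x 7) input features still match for such small images.

```python
# VGG-16 convolutional configuration ("D"):
# integers are 3x3 conv output channels, "M" is a 2x2 max-pool with stride 2
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

num_conv = sum(1 for v in VGG16_CFG if v != "M")
num_pool = VGG16_CFG.count("M")
num_fc = 3  # 25088 -> 4096 -> 4096 -> num_classes

def spatial_size_after_features(side: int) -> int:
    """3x3 convs with padding=1 keep the size; each max-pool halves it (floor)."""
    for _ in range(num_pool):
        side //= 2
    return side

print(num_conv + num_fc)                  # weight layers in VGG-16
print(spatial_size_after_features(224))   # ImageNet-sized input
print(spatial_size_after_features(32))    # CIFAR-10 input
```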


### What are Residual Networks (ResNet)?

ResNet, short for Residual Network, is a family of convolutional neural networks introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their paper “Deep Residual Learning for Image Recognition”. ResNet models were extremely successful, winning 1st place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57%. The key idea is the skip (shortcut) connection: a block of stacked layers learns a residual function F(x), and the block's input x is added to its output, giving F(x) + x. These shortcuts give gradients a direct path through the network, which makes it possible to train much deeper networks without the accuracy degradation that plain deep networks suffer from. ResNet has several variants built on the same concept with different depths; ResNet-50, used below, has 50 weight layers. Residual networks have significantly improved the performance of very deep networks and have been used in many domains, including computer vision and biomedical imaging.
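A residual block adds its input back to the output of a small stack of layers, so the stack only has to learn the residual F(x) rather than the full mapping. The sketch below is a minimal PyTorch block in the style of the "basic block" of ResNet-18/34. It is illustrative only: ResNet-50, used below, is built from three-layer "bottleneck" blocks (1x1, 3x3, 1x1 convolutions), and real ResNets use a projection shortcut when a block changes the shape. The class name `BasicResidualBlock` is hypothetical.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: out = ReLU(F(x) + x), with F = conv-BN-ReLU-conv-BN."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                               # shortcut path
        out = self.relu(self.bn1(self.conv1(x)))   # residual path F(x)
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)           # add the input back

block = BasicResidualBlock(channels=64)
x = torch.randn(2, 64, 8, 8)
y = block(x)
print(y.shape)  # same shape as the input, so the addition is valid
```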

### Imports

In [None]:
# define the imports
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torchvision.models import vgg16
import numpy as np

In [None]:
# Define a transform to preprocess the data
transform = transforms.Compose([
    transforms.ToTensor()
])

trainset = torchvision.datasets.CIFAR10(root = './data', train = True, download = True, transform=transform)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:01<00:00, 105398039.04it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data


The snippet below computes the per-channel mean and standard deviation of the training images, scaled to the [0, 1] range.

In [None]:
# Calculate the mean and standard deviation
mean = np.mean(trainset.data, axis=(0, 1, 2)) / 255.0
std = np.std(trainset.data, axis=(0, 1, 2)) / 255.0

print('Mean:', mean)
print('Standard Deviation:', std)

Mean: [0.49139968 0.48215841 0.44653091]
Standard Deviation: [0.24703223 0.24348513 0.26158784]


We use those values to normalize the dataset.
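`transforms.ToTensor()` converts an H x W x C uint8 image into a C x H x W float tensor in [0, 1], and `transforms.Normalize(mean, std)` then computes `(x - mean) / std` for each channel. The numpy sketch below reproduces that arithmetic on random stand-in images (an assumption, in place of `trainset.data`) and checks that every normalized channel ends up with mean close to 0 and standard deviation close to 1. It keeps the H x W x C layout for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for trainset.data: N x H x W x C uint8 images
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Per-channel statistics on the [0, 1] scale, as in the cell above
scaled = images / 255.0
mean = scaled.mean(axis=(0, 1, 2))
std = scaled.std(axis=(0, 1, 2))

# What ToTensor + Normalize do per channel: (x / 255 - mean) / std
normalized = (scaled - mean) / std

print(normalized.mean(axis=(0, 1, 2)))  # approximately 0 per channel
print(normalized.std(axis=(0, 1, 2)))   # approximately 1 per channel
```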

### Image Transformation
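We apply the following transformations to the training images:

*   Random crop (pad each side by 4 pixels, then crop back to 32x32)
*   Random horizontal flip
*   Normalization with the per-channel mean and standard deviation computed above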

In [None]:
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding = 4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2434, 0.2616))
])
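For intuition, here is a numpy sketch of what `RandomCrop(32, padding=4)` and `RandomHorizontalFlip()` do to a single H x W x C image. It assumes zero padding (torchvision's defaults are `fill=0` and `padding_mode='constant'`); the helper names are hypothetical, and torchvision itself applies these operations to PIL images or tensors rather than numpy arrays.

```python
import numpy as np

def random_crop_with_padding(img, size=32, padding=4, rng=None):
    """Zero-pad an H x W x C image on every side, then cut out a random size x size window."""
    if rng is None:
        rng = np.random.default_rng()
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    top = rng.integers(0, padded.shape[0] - size + 1)
    left = rng.integers(0, padded.shape[1] - size + 1)
    return padded[top:top + size, left:left + size]

def random_horizontal_flip(img, p=0.5, rng=None):
    """Mirror the image left-to-right (reverse the width axis) with probability p."""
    if rng is None:
        rng = np.random.default_rng()
    return img[:, ::-1] if rng.random() < p else img

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
out = random_horizontal_flip(random_crop_with_padding(img, rng=rng), rng=rng)
print(out.shape)  # the augmented image keeps the original 32x32x3 shape
```

Because each epoch draws new crop offsets and flips, the model rarely sees exactly the same training image twice, which acts as a regularizer.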

Test-time transformations (no random augmentation, so evaluation is deterministic):
* Normalization

In [None]:
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2434, 0.2616))
])

### Dataset

Datasets are collections of training, validation, and test data, consisting of input samples and their corresponding target labels (for supervised learning). In PyTorch, a dataset is a class inheriting from `torch.utils.data.Dataset`. `torchvision.datasets.CIFAR10`, used below, is a ready-made example: it downloads the data and applies the given transform to each image when it is accessed.
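If your data is not available as a ready-made torchvision dataset, a custom `Dataset` only needs `__len__` and `__getitem__`. A minimal sketch; the `ArrayDataset` class and the random tensors are hypothetical stand-ins for real data:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Hypothetical minimal Dataset wrapping in-memory images and labels."""

    def __init__(self, images: torch.Tensor, labels: torch.Tensor, transform=None):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx: int):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)   # e.g. augmentation or normalization
        return image, self.labels[idx]

images = torch.randn(10, 3, 32, 32)
labels = torch.randint(0, 10, (10,))
ds = ArrayDataset(images, labels)
loader = DataLoader(ds, batch_size=4, shuffle=False)
batches = [x.shape[0] for x, _ in loader]
print(len(ds), batches)  # 10 samples split into batches of 4, 4 and 2
```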

In [None]:
## Preparing dataset
trainset = torchvision.datasets.CIFAR10(root = './data', train = True, download = True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 128, shuffle = True, num_workers = 2)

Files already downloaded and verified


In [None]:
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

### Dataloaders

DataLoaders wrap a dataset and iterate over it in batches during training. They handle shuffling, batching, and parallel data loading (`num_workers`), which keeps the data pipeline efficient.

In [None]:
testset = torchvision.datasets.CIFAR10(root = './data', train = False, download = True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size = 256, shuffle = False, num_workers = 2)  # no need to shuffle for evaluation

Files already downloaded and verified
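With the default `drop_last=False`, a DataLoader yields ceil(N / batch_size) batches, and the last batch may be smaller. This explains the `390 391` and `39 40` prefixes in the training logs below: the training and validation functions print the last `batch_idx` (0-based) followed by `len(loader)`. A quick check of the arithmetic:

```python
import math

train_size, test_size = 50_000, 10_000
train_batches = math.ceil(train_size / 128)   # len(trainloader)
test_batches = math.ceil(test_size / 256)     # len(testloader)
last_train_batch = train_size - (train_batches - 1) * 128  # size of the final, smaller batch

print(train_batches, test_batches, last_train_batch)
```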


### Learning Rate

The learning rate is a hyperparameter that controls how much the model's parameters should be updated during training.
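To make the role of the learning rate concrete, here is a toy sketch (not part of the notebook's training) that runs plain gradient descent on f(w) = (w - 3)^2 with three step sizes. A moderate learning rate converges toward the minimum at w = 3, a very small one is still far from it after 50 steps, and one that is too large overshoots further on every step and diverges.

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 0.0) -> float:
    """Minimize f(w) = (w - 3)^2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)   # df/dw
        w -= lr * grad       # step size is scaled by the learning rate
    return w

print(gradient_descent(lr=0.1))    # converges close to 3
print(gradient_descent(lr=0.01))   # too small: still noticeably below 3
print(gradient_descent(lr=1.1))    # too large: overshoots and diverges
```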

In [None]:
lr = 0.001

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

### Loss Function

Loss functions measure the difference between the predicted output and the actual target values. Common loss functions include Cross-Entropy Loss for classification tasks and Mean Squared Error for regression tasks.
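`nn.CrossEntropyLoss` takes raw, unnormalized logits and integer class indices; it applies log-softmax internally, so the model's last layer should not apply softmax itself. A numpy sketch of the same computation, averaged over the batch (the criterion's default reduction); the helper name is illustrative:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy from raw logits (N x C) and integer class targets (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    # Negative log-probability of the correct class, averaged over the batch
    return float(-log_probs[np.arange(len(targets)), targets].mean())

logits = np.array([[2.0, 1.0, 0.0],
                   [0.5, 2.5, 0.0]])
targets = np.array([0, 1])
print(cross_entropy(logits, targets))
```

A useful sanity check: with uniform logits over 10 classes the loss is log(10), about 2.303, which is roughly where an untrained CIFAR-10 classifier starts.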

In [None]:
criterion = nn.CrossEntropyLoss()

### Load the Pretrained Model

The final layer of VGG-16 has 1000 outputs because it was trained on ImageNet, which contains 1000 classes. We replace it with a new fully connected layer that maps the 4096 features of the second-to-last layer to the number of classes we have here, i.e. 10.

In [None]:
from torchvision.models import VGG16_Weights

# Load the pre-trained VGG-16 model (equivalent to the deprecated `pretrained=True`)
vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)

# Modify the last layer of VGG: 10 CIFAR-10 classes instead of the 1000 ImageNet classes
vgg.classifier[6] = nn.Linear(in_features=4096, out_features=len(classes))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)
vgg.to(device)

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|██████████| 528M/528M [00:02<00:00, 276MB/s]


device:  cuda


VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

### Optimizer

Optimizers are algorithms that adjust the model's parameters during training to minimize the loss function. Common optimizers include SGD (Stochastic Gradient Descent), Adam, and RMSprop.

In [None]:
vgg_optimizer = optim.SGD(vgg.parameters(), lr = 1e-3, momentum=0.9, weight_decay = 5e-4)
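For reference, the update that `optim.SGD` performs with `momentum=0.9` and `weight_decay=5e-4` (and the defaults `dampening=0`, `nesterov=False`) fits in a few lines. This pure-Python sketch follows the update rule described in the PyTorch `SGD` documentation for a single scalar parameter; the helper `sgd_momentum_step` is ours, not part of PyTorch, and the constant gradient is only for illustration.

```python
def sgd_momentum_step(param, grad, buf, lr=1e-3, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay (dampening=0, nesterov=False).

    `buf` is the momentum buffer; pass None on the first step.
    Returns the updated (param, buf).
    """
    g = grad + weight_decay * param                 # weight decay is added to the gradient
    buf = g if buf is None else momentum * buf + g  # running (momentum) direction
    return param - lr * buf, buf

p, buf = 1.0, None
for _ in range(2):
    p, buf = sgd_momentum_step(p, grad=0.5, buf=buf)
print(p)
```

On the second step the buffer already carries 0.9 of the first step's direction, which is how momentum smooths and accelerates updates along consistent gradient directions.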

### Scheduler

A scheduler adjusts the learning rate during training according to a predefined schedule.

Cosine Annealing: The learning rate starts at the optimizer's initial value and is annealed down to a minimum value (`eta_min`, 0 by default) following a cosine curve. The larger learning rate early on lets the model explore the loss landscape broadly, and the smaller learning rate later lets it refine its parameters as it converges.

T_max: The number of scheduler steps it takes for the learning rate to decay from its initial value to its minimum (half of a cosine period). Because `scheduler.step()` is called once per epoch in the training loops below, T_max is measured in epochs here.

Here's a conceptual explanation:

* At the start of training, the learning rate is at its initial (highest) value, allowing the model to explore a larger area of the loss landscape.
* As training progresses over the T_max steps, the learning rate decreases following a cosine curve.
* When T_max steps are completed, the learning rate reaches its minimum.
* `CosineAnnealingLR` does not implement warm restarts: if training continues past T_max, the learning rate gradually climbs back up along the same cosine curve instead of jumping back to its initial value. Periodic restarts (as in SGDR) are provided by `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

This approach often helps models converge more efficiently by first exploring broadly and then refining their parameters. Note that with `T_max = 200` and only 10 training epochs in this notebook, the learning rate decreases by less than 1%, so the schedule has little effect on these runs.

In [None]:
vgg_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(vgg_optimizer, T_max = 200)
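When the learning rate is set only by this scheduler, the PyTorch `CosineAnnealingLR` documentation gives its closed form as eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2. A small pure-Python sketch that evaluates this formula for the settings used here (base learning rate 1e-3, `T_max = 200`, `eta_min = 0`) without creating a real optimizer; the function name is illustrative:

```python
import math

def cosine_annealing_lr(step: int, base_lr: float = 1e-3, t_max: int = 200,
                        eta_min: float = 0.0) -> float:
    """Closed-form cosine-annealed learning rate after `step` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

for step in (0, 10, 100, 200):
    print(step, f"{cosine_annealing_lr(step):.6e}")
```

After the 10 epochs trained in this notebook the rate is still about 9.94e-4, over 99% of its starting value; it only reaches half the initial rate at step 100 and the minimum at step 200.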

Set the model to training mode:

In [None]:
vgg.train()


In [None]:
vgg_model = vgg.to(device)

In [None]:
def train_batch(epoch, model, optimizer):
    """Train `model` for one full epoch over `trainloader` and print loss and accuracy."""
    print("epoch ", epoch)
    model.train()
    train_loss = 0
    correct = 0
    total = 0

    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()               # clear gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()                     # backpropagate
        optimizer.step()                    # update the weights

        train_loss += loss.item()
        _, predicted = outputs.max(1)       # index of the highest logit per sample
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    print(batch_idx, len(trainloader), 'Train Loss: %.3f | Acc: %.3f%% (%d/%d)'
          % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))


In [None]:
def validate_batch(epoch, model):
    """Evaluate `model` on `testloader` and print loss and accuracy."""
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    print(batch_idx, len(testloader), 'Validation Loss: %.3f | Acc: %.3f%% (%d/%d)'
                 % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

### Training the Model

Note: the training cell below takes around 10 minutes to run. If you would like to experiment without waiting, you can load the saved model instead (see "Load your already saved model").

In [None]:
start_epoch = 0
for epoch in range(start_epoch, start_epoch+10):
    train_batch(epoch, vgg_model, vgg_optimizer)
    validate_batch(epoch, vgg_model)
    vgg_scheduler.step()

390 391 Loss: 0.384 | Acc: 86.760% (43380/50000)
epoch  0
39 40 Loss: 0.406 | Acc: 85.780% (8578/10000)
390 391 Loss: 0.357 | Acc: 87.650% (43825/50000)
epoch  1
39 40 Loss: 0.391 | Acc: 86.660% (8666/10000)
390 391 Loss: 0.338 | Acc: 88.344% (44172/50000)
epoch  2
39 40 Loss: 0.373 | Acc: 87.460% (8746/10000)
390 391 Loss: 0.317 | Acc: 88.962% (44481/50000)
epoch  3
39 40 Loss: 0.373 | Acc: 87.590% (8759/10000)
390 391 Loss: 0.296 | Acc: 89.810% (44905/50000)
epoch  4
39 40 Loss: 0.372 | Acc: 87.470% (8747/10000)
390 391 Loss: 0.281 | Acc: 90.182% (45091/50000)
epoch  5
39 40 Loss: 0.364 | Acc: 88.190% (8819/10000)
390 391 Loss: 0.267 | Acc: 90.596% (45298/50000)
epoch  6
39 40 Loss: 0.347 | Acc: 88.080% (8808/10000)
390 391 Loss: 0.253 | Acc: 91.128% (45564/50000)
epoch  7
39 40 Loss: 0.370 | Acc: 88.250% (8825/10000)
390 391 Loss: 0.239 | Acc: 91.750% (45875/50000)
epoch  8
39 40 Loss: 0.341 | Acc: 88.690% (8869/10000)
390 391 Loss: 0.229 | Acc: 92.084% (46042/50000)
epoch  9
39 40 

### Save your Model

In [None]:
# Save the model
state_dict = vgg_model.state_dict()
torch.save(state_dict, assets_dir + "vgg_model_state_dict.pt")
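Saving the `state_dict` stores only the learned tensors, so loading requires rebuilding the same architecture first. A small sketch of the round trip, using a tiny stand-in model and an in-memory buffer (both assumptions made so the example runs anywhere), with `map_location` so that a checkpoint saved on a GPU can also be loaded on a CPU-only machine:

```python
import io
import torch
import torch.nn as nn

# Tiny stand-in model; the same pattern applies to vgg_model, cnn_model and resnet_model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

buffer = io.BytesIO()                   # in-memory file instead of a path under assets_dir
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the saved weights into it
restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()

x = torch.randn(3, 4)
print(torch.equal(model(x), restored(x)))  # identical weights give identical outputs
```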

### Load your already saved model

In [None]:
from torchvision.models import vgg16
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

# Load the saved model in case training takes too long
file_path = assets_dir + 'vgg_model_state_dict.pt'

# Build the same architecture; ImageNet weights are not needed because the saved state dict overwrites them
loaded_vgg_model = vgg16(weights=None)
loaded_vgg_model.classifier[6] = nn.Linear(in_features=4096, out_features=len(classes))

# Load the saved state dictionary (map_location allows loading a GPU checkpoint on a CPU-only machine)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
saved_state_dict = torch.load(file_path, map_location=device)

# Load the state dictionary into the model
loaded_vgg_model.load_state_dict(saved_state_dict)

# Move the model to the device and set it to evaluation mode
loaded_vgg_model = loaded_vgg_model.to(device)
loaded_vgg_model.eval()


### Developing a model from Scratch

### CNN Model Architecture

The `CNN` class below is a small architecture that we build and train from scratch, as a baseline for comparing against the transfer-learning models on image classification.

In [None]:
# Define model architecture
class CNN(nn.Module):
    """
    Convolutional Neural Network (CNN) for image classification.

    This CNN consists of two convolutional layers followed by max-pooling layers,
    and two fully connected layers. The input images are expected to have three channels (RGB).

    Args:
        num_classes (int): The number of classes in the classification task.

    Attributes:
        conv1 (nn.Conv2d): The first convolutional layer with 16 output channels and a kernel size of 3x3.
        relu1 (nn.ReLU): The ReLU activation function applied after the first convolutional layer.
        pool1 (nn.MaxPool2d): The max-pooling layer with a kernel size of 2x2 after the first convolutional layer.
        conv2 (nn.Conv2d): The second convolutional layer with 32 output channels and a kernel size of 3x3.
        relu2 (nn.ReLU): The ReLU activation function applied after the second convolutional layer.
        pool2 (nn.MaxPool2d): The max-pooling layer with a kernel size of 2x2 after the second convolutional layer.
        fc1 (nn.Linear): The first fully connected layer with 64 units.
        relu3 (nn.ReLU): The ReLU activation function applied after the first fully connected layer.
        fc2 (nn.Linear): The second fully connected layer with `num_classes` units for classification.

    Methods:
        forward(x): Performs a forward pass through the network given an input tensor x.
                    Returns the output tensor after passing through the fully connected layers.
    """

    def __init__(self, num_classes):
        super(CNN, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(in_features=32 * 8 * 8, out_features=64)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return x
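The `32 * 8 * 8` in `fc1` comes from the spatial size after the two conv/pool stages: the 3x3 convolutions with `padding=1` keep the size, and each 2x2 max-pool halves it, so 32 becomes 16 and then 8. A pure-Python check of that arithmetic, plus a parameter count using the standard formulas for convolutional and linear layers (the helper names are illustrative):

```python
def conv_out(side, kernel=3, stride=1, padding=1):
    """Output side length of a square convolution."""
    return (side + 2 * padding - kernel) // stride + 1

def pool_out(side, kernel=2, stride=2):
    """Output side length of a square max-pool without padding."""
    return (side - kernel) // stride + 1

side = 32
side = pool_out(conv_out(side))   # conv1 keeps 32, pool1 -> 16
side = pool_out(conv_out(side))   # conv2 keeps 16, pool2 -> 8
flat_features = 32 * side * side  # 32 channels after conv2

def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out   # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out           # weights + biases

total = (conv_params(3, 16) + conv_params(16, 32)
         + linear_params(flat_features, 64) + linear_params(64, 10))
print(side, flat_features, total)
```

This gives 2048 input features for `fc1`, matching the printed model below, and 136,874 parameters in total, far fewer than the roughly 134 million of VGG-16 with a 10-class head.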

### Loss Criterion and Optimizer

In [None]:
# Create the model
cnn_model = CNN(num_classes=len(classes)).to(device)

In [None]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()

In [None]:
cnn_optimizer = optim.SGD(params=cnn_model.parameters(), lr=0.001, momentum=0.9)

In [None]:
cnn_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(cnn_optimizer, T_max = 200)

### Training the model and Evaluation

In [None]:
start_epoch = 0
for epoch in range(start_epoch, start_epoch+10):
    train_batch(epoch, cnn_model, cnn_optimizer)
    validate_batch(epoch, cnn_model)
    cnn_scheduler.step()

390 391 Loss: 2.205 | Acc: 20.302% (10151/50000)
epoch  0
39 40 Loss: 2.011 | Acc: 29.210% (2921/10000)
390 391 Loss: 1.923 | Acc: 31.176% (15588/50000)
epoch  1
39 40 Loss: 1.803 | Acc: 36.150% (3615/10000)
390 391 Loss: 1.754 | Acc: 36.854% (18427/50000)
epoch  2
39 40 Loss: 1.639 | Acc: 41.220% (4122/10000)
390 391 Loss: 1.636 | Acc: 40.864% (20432/50000)
epoch  3
39 40 Loss: 1.520 | Acc: 45.050% (4505/10000)
390 391 Loss: 1.565 | Acc: 43.062% (21531/50000)
epoch  4
39 40 Loss: 1.454 | Acc: 47.570% (4757/10000)
390 391 Loss: 1.515 | Acc: 45.288% (22644/50000)
epoch  5
39 40 Loss: 1.417 | Acc: 48.640% (4864/10000)
390 391 Loss: 1.476 | Acc: 46.732% (23366/50000)
epoch  6
39 40 Loss: 1.395 | Acc: 48.780% (4878/10000)
390 391 Loss: 1.443 | Acc: 48.088% (24044/50000)
epoch  7
39 40 Loss: 1.351 | Acc: 50.810% (5081/10000)
390 391 Loss: 1.410 | Acc: 49.098% (24549/50000)
epoch  8
39 40 Loss: 1.316 | Acc: 53.440% (5344/10000)
390 391 Loss: 1.383 | Acc: 50.386% (25193/50000)
epoch  9
39 40 

In [None]:
# Save the CNN model
state_dict = cnn_model.state_dict()
torch.save(state_dict, assets_dir + "cnn_model_state_dict.pt")

In [None]:
# Load the CNN model
file_path = assets_dir + 'cnn_model_state_dict.pt'

# Build a fresh CNN with the same architecture
loaded_cnn_model = CNN(num_classes=len(classes))

# Load the saved state dictionary (map_location allows loading a GPU checkpoint on a CPU-only machine)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
saved_state_dict = torch.load(file_path, map_location=device)

# Load the state dictionary into the model
loaded_cnn_model.load_state_dict(saved_state_dict)

# Move the model to the device and set it to evaluation mode
loaded_cnn_model = loaded_cnn_model.to(device)
loaded_cnn_model.eval()

CNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=2048, out_features=64, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)

### Load the pretrained ResNet Model

In [None]:
from torchvision.models import resnet50, ResNet50_Weights

# Load the pre-trained ResNet-50 model (equivalent to the deprecated `pretrained=True`)
resnet_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer (2048 features -> 1000 ImageNet classes)
# with a new layer mapping the 2048 features to our classes
num_features = resnet_model.fc.in_features

resnet_model.fc = nn.Linear(num_features, len(classes))  # Output layer for the 10 CIFAR-10 classes

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 236MB/s]


In [None]:
criterion = nn.CrossEntropyLoss()

In [None]:
resnet_optimizer = optim.SGD(resnet_model.parameters(), lr = 1e-3, momentum=0.9, weight_decay = 5e-4)

In [None]:
resnet_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(resnet_optimizer, T_max = 200)

In [None]:
resnet_model.train()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [None]:
resnet_model = resnet_model.to(device)

In [None]:
start_epoch = 0
for epoch in range(start_epoch, start_epoch+10):
    train_batch(epoch, resnet_model, resnet_optimizer)
    validate_batch(epoch, resnet_model)
    resnet_scheduler.step()

390 391 Loss: 0.415 | Acc: 85.280% (42640/50000)
epoch  0
39 40 Loss: 0.478 | Acc: 84.060% (8406/10000)
390 391 Loss: 0.376 | Acc: 86.782% (43391/50000)
epoch  1
39 40 Loss: 0.462 | Acc: 84.320% (8432/10000)
390 391 Loss: 0.350 | Acc: 87.826% (43913/50000)
epoch  2
39 40 Loss: 0.454 | Acc: 84.860% (8486/10000)
390 391 Loss: 0.329 | Acc: 88.260% (44130/50000)
epoch  3
39 40 Loss: 0.439 | Acc: 85.190% (8519/10000)
390 391 Loss: 0.308 | Acc: 89.144% (44572/50000)
epoch  4
39 40 Loss: 0.445 | Acc: 85.410% (8541/10000)
390 391 Loss: 0.288 | Acc: 89.902% (44951/50000)
epoch  5
39 40 Loss: 0.449 | Acc: 85.270% (8527/10000)
390 391 Loss: 0.268 | Acc: 90.576% (45288/50000)
epoch  6
39 40 Loss: 0.437 | Acc: 85.850% (8585/10000)
390 391 Loss: 0.255 | Acc: 91.098% (45549/50000)
epoch  7
39 40 Loss: 0.459 | Acc: 85.930% (8593/10000)
390 391 Loss: 0.235 | Acc: 91.594% (45797/50000)
epoch  8
39 40 Loss: 0.464 | Acc: 85.900% (8590/10000)
390 391 Loss: 0.221 | Acc: 92.088% (46044/50000)
epoch  9
39 40 

### Save the Model

In [None]:
# Save the model
state_dict = resnet_model.state_dict()
torch.save(state_dict, assets_dir + "resnet_model_state_dict.pt")

In [None]:
from torchvision.models import resnet50
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

### Load the ResNet model

In [None]:
# Load the saved model in case training takes too long
file_path = assets_dir + 'resnet_model_state_dict.pt'

# Build the same ResNet-50 architecture; ImageNet weights are not needed because the saved state dict overwrites them
loaded_resnet_model = resnet50(weights=None)

# Replace the classifier layer to match the 10 CIFAR-10 classes used during training
loaded_resnet_model.fc = nn.Linear(in_features=2048, out_features=len(classes))

# Load the saved state dictionary (map_location allows loading a GPU checkpoint on a CPU-only machine)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
saved_state_dict = torch.load(file_path, map_location=device)

# Load the state dictionary into the model
loaded_resnet_model.load_state_dict(saved_state_dict)

# Move the model to the device and set it to evaluation mode
loaded_resnet_model = loaded_resnet_model.to(device)
loaded_resnet_model.eval()




In [None]:
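from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

def evaluate(model, data_loader):
    """Return accuracy, weighted precision/recall/F1 and the confusion matrix of `model` on `data_loader`."""
    model.eval()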
    predictions = []
    true_labels = []

    with torch.no_grad():
        for images, labels in data_loader:
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            outputs = model(images)
            # Get the predicted class with the highest probability for each image
            _, predicted_labels = torch.max(outputs, 1)

            # Append the predictions
            predictions.extend(predicted_labels.cpu().numpy())
            # Append the true labels
            true_labels.extend(labels.cpu().numpy())

    # Compute the metrics; 'weighted' averages the per-class scores, weighting each class by its number of true samples
    accuracy = accuracy_score(y_true=true_labels, y_pred=predictions)
    precision = precision_score(y_true=true_labels, y_pred=predictions, average='weighted')
    recall = recall_score(y_true=true_labels, y_pred=predictions, average='weighted')
    f1 = f1_score(y_true=true_labels, y_pred=predictions, average='weighted')
    cf_matrix = confusion_matrix(y_true=true_labels, y_pred=predictions)

    return accuracy, precision, recall, f1, cf_matrix
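The `'weighted'` average computes each metric per class and averages the results weighted by each class's support (its number of true samples). For recall this weighting cancels out exactly: sum over c of (n_c / N) * (TP_c / n_c) = sum over c of TP_c / N, which is the accuracy. That is why `Test Recall` equals `Test Accuracy` in every evaluation below. A numpy sketch on a small made-up confusion matrix (the helper name is illustrative):

```python
import numpy as np

def weighted_metrics(cm: np.ndarray):
    """Accuracy and support-weighted precision/recall from a confusion matrix.

    Rows are true classes and columns are predicted classes (scikit-learn's convention).
    """
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predictions per class
    # Precision is defined as 0 for classes that are never predicted
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = tp / support
    weights = support / support.sum()
    accuracy = tp.sum() / cm.sum()
    return float(accuracy), float((weights * precision).sum()), float((weights * recall).sum())

cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 2, 6]])
acc, prec_w, rec_w = weighted_metrics(cm)
print(acc, prec_w, rec_w)  # weighted recall equals accuracy
```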

### CNN Model Evaluation

In [None]:
test_accuracy, test_precision, test_recall, test_f1, cf_matrix = evaluate(loaded_cnn_model, testloader)

# Print final test set performance
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test Precision: {test_precision:.4f}")
print(f"Test Recall: {test_recall:.4f}")
print(f"Test F1 Score: {test_f1:.4f}")
print(f"Test Confusion Matrix:")
print(cf_matrix)

Test Accuracy: 0.5427
Test Precision: 0.5476
Test Recall: 0.5427
Test F1 Score: 0.5305
Test Confusion Matrix:
[[584  62  71  20   5   5  33  33 135  52]
 [ 29 771   4  12   2   5  25  20  32 100]
 [ 67  27 432  76  43  90 130  94  22  19]
 [ 17  26  71 351  31 188 156 123  12  25]
 [ 33  16 205  62 211  72 171 213  11   6]
 [ 10  13  79 176  20 448  65 166  13  10]
 [  3  17  64  56  18  31 742  48   4  17]
 [ 14  13  32  53  30  68  36 716   3  35]
 [114 115  14  18   5  11  12  18 633  60]
 [ 39 251   8  18   4   6  34  59  42 539]]


### VGG Model Evaluation

In [None]:
test_accuracy, test_precision, test_recall, test_f1, cf_matrix = evaluate(loaded_vgg_model, testloader)

# Print final test set performance
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test Precision: {test_precision:.4f}")
print(f"Test Recall: {test_recall:.4f}")
print(f"Test F1 Score: {test_f1:.4f}")
print(f"Test Confusion Matrix:")
print(cf_matrix)

Test Accuracy: 0.8868
Test Precision: 0.8892
Test Recall: 0.8868
Test F1 Score: 0.8869
Test Confusion Matrix:
[[926   7   9   5  15   1   3   4  29   1]
 [ 14 934   0   0   0   0   0   2  11  39]
 [ 26   1 786  35  72  23  41  11   4   1]
 [  8   2  15 817  32  83  19  15   4   5]
 [  3   1  11  17 908  13  17  27   1   2]
 [  1   0   8 139  31 798   5  17   0   1]
 [  5   1   6  32  22   6 923   2   2   1]
 [  9   2   5  25  25  27   2 904   1   0]
 [ 32   5   2   5   3   0   1   0 948   4]
 [ 17  29   0   3   2   0   0   2  23 924]]
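In these confusion matrices, rows are true classes and columns are predicted classes (scikit-learn's convention). A short numpy sketch that reads the per-class accuracy and the most frequent confusion off the VGG-16 matrix printed above:

```python
import numpy as np

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# VGG-16 test confusion matrix from the output above (rows: true class, columns: predicted)
vgg_cm = np.array([
    [926,   7,   9,   5,  15,   1,   3,   4,  29,   1],
    [ 14, 934,   0,   0,   0,   0,   0,   2,  11,  39],
    [ 26,   1, 786,  35,  72,  23,  41,  11,   4,   1],
    [  8,   2,  15, 817,  32,  83,  19,  15,   4,   5],
    [  3,   1,  11,  17, 908,  13,  17,  27,   1,   2],
    [  1,   0,   8, 139,  31, 798,   5,  17,   0,   1],
    [  5,   1,   6,  32,  22,   6, 923,   2,   2,   1],
    [  9,   2,   5,  25,  25,  27,   2, 904,   1,   0],
    [ 32,   5,   2,   5,   3,   0,   1,   0, 948,   4],
    [ 17,  29,   0,   3,   2,   0,   0,   2,  23, 924],
])

per_class_acc = np.diag(vgg_cm) / vgg_cm.sum(axis=1)   # recall of each class
worst = int(np.argmin(per_class_acc))

off_diag = vgg_cm.copy()
np.fill_diagonal(off_diag, 0)                           # keep only the errors
true_idx, pred_idx = np.unravel_index(np.argmax(off_diag), off_diag.shape)

print(f"weakest class: {classes[worst]} ({per_class_acc[worst]:.1%})")
print(f"most common confusion: {classes[true_idx]} -> {classes[pred_idx]} "
      f"({off_diag[true_idx, pred_idx]} images)")
```

On this matrix it reports bird (78.6%) as the weakest class and dogs predicted as cats (139 images) as the most frequent error.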


### ResNet Model Evaluation

In [None]:
test_accuracy, test_precision, test_recall, test_f1, cf_matrix = evaluate(loaded_resnet_model, testloader)
# Print final test set performance
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test Precision: {test_precision:.4f}")
print(f"Test Recall: {test_recall:.4f}")
print(f"Test F1 Score: {test_f1:.4f}")
print(f"Test Confusion Matrix:")
print(cf_matrix)

Test Accuracy: 0.8632
Test Precision: 0.8644
Test Recall: 0.8632
Test F1 Score: 0.8631
Test Confusion Matrix:
[[906  11  24   7  11   1   4   4  22  10]
 [ 12 915   2   3   1   4   3   1  16  43]
 [ 31   0 828  29  54  15  32   9   0   2]
 [ 12   4  45 745  46  82  37  18   5   6]
 [  6   1  32  18 889   9  22  20   3   0]
 [  7   0  23 153  36 746  13  18   2   2]
 [  5   0  15  30  12   7 924   5   0   2]
 [  8   2  13  21  39  27   4 880   0   6]
 [ 65   7   4   7   5   0   2   3 895  12]
 [ 23  52   2   5   2   2   1   2   7 904]]
