In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms

Part 1

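Resize all images to a uniform size of (224, 224), the input resolution VGG16 was trained on. Randomly
flip images horizontally as a simple augmentation. Convert the PIL image objects into tensors, then
normalize the tensor values with the per-channel RGB mean and standard deviation of ImageNet, which the
pretrained weights expect. Note that the same pipeline, including the random flip, is also applied to the
test set below: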

In [None]:
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])
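
To make the normalization step concrete, here is a minimal sketch of the arithmetic that transforms.Normalize applies to each channel, assuming pixel values have already been scaled to [0, 1] by ToTensor. The helper name normalize_pixel is made up for this illustration and is not part of torchvision.

```python
# Per-channel normalization as applied by transforms.Normalize:
# output = (input - mean) / std, computed independently for R, G and B.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose values are already in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))

# A pure white pixel lands roughly 2.2 to 2.6 standard deviations above the mean.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

After this step the inputs are roughly zero-centered with unit scale per channel, which is the range the pretrained convolutional layers saw during their original training.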

Create an object of torchvision.datasets.CIFAR100 to get the training and testing set:

In [None]:
dataset_root = 'cifar100_data'
trainset = datasets.CIFAR100(
    dataset_root,
    train=True,
    transform=data_transforms,
    download=True
)

testset = datasets.CIFAR100(
    dataset_root,
    train=False,
    transform=data_transforms,
    download=True
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to cifar100_data/cifar-100-python.tar.gz


100%|██████████| 169001437/169001437 [00:03<00:00, 48565640.74it/s]


Extracting cifar100_data/cifar-100-python.tar.gz to cifar100_data
Files already downloaded and verified


Create data loaders for the training and test sets:

In [None]:
training_loader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)
# Shuffling is not needed for evaluation; the accuracy does not depend on the order.
testing_loader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
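
As a quick sanity check on these batch sizes, the sketch below works out how many mini-batches each loader yields per pass. It relies on CIFAR-100 containing 50,000 training and 10,000 test images; the helper batches_per_epoch is a name introduced only for this example.

```python
import math

# CIFAR-100 ships with 50,000 training images and 10,000 test images.
NUM_TRAIN, NUM_TEST = 50_000, 10_000

def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of mini-batches a loader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(NUM_TRAIN, 128)
test_batches = batches_per_epoch(NUM_TEST, 32)
print(train_batches, test_batches)

# Size of the final, partially filled batch in each loader.
print(NUM_TRAIN % 128, NUM_TEST % 32)
```

These counts are the 391 and 313 steps reported by the progress bars in the training and testing output.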

Load a VGG16 network with pretrained ImageNet weights:

In [None]:
# Newer torchvision releases (0.13+) prefer weights=models.VGG16_Weights.IMAGENET1K_V1.
model = models.vgg16(pretrained=True)


Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|██████████| 528M/528M [00:06<00:00, 80.2MB/s]


Extract the number of input features for the last fully connected layer of the model:

In [None]:
num_in_ftrs = model.classifier[6].in_features


Replace the last fully connected layer with a new layer. The new layer has the same number of input
features as the original network but the number of outputs should be equal to the number of classes in
the CIFAR100 dataset.

In [None]:
num_cls = 100
model.classifier[6] = nn.Linear(num_in_ftrs, num_cls) # num_cls is the number of classes.

We are using weights pretrained on the ImageNet dataset, and only the last layer of VGG16 has been
replaced to fit our dataset (CIFAR100). All layers except this new last layer therefore need to be frozen,
meaning that their weights will not be updated during training.

In [None]:
for param in model.parameters():  # freeze all the layers
    param.requires_grad = False
for param in model.classifier[6].parameters():  # unfreeze the last linear layer
    param.requires_grad = True
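
One way to confirm that freezing worked is to count trainable and frozen parameters. The sketch below does this on a tiny stand-in network rather than VGG16, so that it runs instantly; the layer sizes are arbitrary assumptions chosen only for the illustration.

```python
import torch.nn as nn

# Tiny stand-in for a pretrained network: a "feature" layer plus a head.
# The layer sizes are arbitrary and only keep the example small.
toy_model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),  # original head
)

# Swap the head for a new one with a different number of outputs.
toy_in_ftrs = toy_model[2].in_features
toy_model[2] = nn.Linear(toy_in_ftrs, 3)

# Freeze everything, then unfreeze only the new head.
for param in toy_model.parameters():
    param.requires_grad = False
for param in toy_model[2].parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in toy_model.parameters() if not p.requires_grad)
print(f'trainable: {trainable}, frozen: {frozen}')
```

The same two sums can be run on the real model. For VGG16 with a 100-class head, the trainable part is the new layer's 4096 * 100 weights plus 100 biases, that is 409,700 parameters.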

Set the number of epochs:

In [None]:
num_epochs = 10

Move the model to GPU (if available):

In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

Define the loss function that is minimized during training:

In [None]:
criterion = nn.CrossEntropyLoss()
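
For intuition about the numbers this loss produces, here is a small plain-Python sketch of cross-entropy for a single sample, computed as the negative log of the softmax probability of the true class. nn.CrossEntropyLoss performs the same computation on raw logits and, by default, averages it over the batch. The function below is illustrative only.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the maximum logit for numerical stability (log-sum-exp trick).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# With 100 equal logits every class gets probability 1/100,
# so the loss is ln(100), about 4.6052.
uniform_loss = cross_entropy([0.0] * 100, target=7)
print(round(uniform_loss, 4))

# A confident, correct prediction gives a loss close to zero.
confident = [0.0] * 100
confident[7] = 20.0
confident_loss = cross_entropy(confident, target=7)
print(confident_loss)
```

A freshly initialized 100-class layer typically starts near the uniform case, so a training loss around 4.6 in the first steps is expected rather than a sign of a bug.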

Create an optimizer with an initial learning rate and momentum:

In [None]:
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
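
To show what the momentum term does, the following sketch applies the update rule for a single scalar parameter, assuming the plain form of SGD with momentum (no dampening, no weight decay, no Nesterov): the velocity accumulates gradients and the parameter moves against the velocity. The constant gradient of 2.0 is a made-up value for the illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One scalar SGD-with-momentum update (no dampening, decay or Nesterov)."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

param, velocity = 1.0, 0.0
for step in range(3):
    grad = 2.0  # hypothetical constant gradient
    param, velocity = sgd_momentum_step(param, grad, velocity)
    print(step, velocity, param)
```

With a steady gradient the velocity grows from step to step, so the effective step size increases; with gradients that keep changing sign it shrinks instead, which damps oscillation.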

Create a scheduler that controls how the learning rate changes during training. Here the learning rate is
multiplied by 0.1 every 7 epochs:

In [None]:
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
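
The resulting schedule can be worked out by hand. The sketch below uses the closed form of a step schedule, base_lr * gamma ** (epoch // step_size), with the values from this notebook; step_lr is a helper written for this example and not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Closed-form step schedule: multiply by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate in effect during each of the 10 epochs (0-based index).
schedule = [step_lr(0.001, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(epoch, f'{lr:.6f}')
```

With 10 epochs and step_size=7, the first seven epochs run at 0.001 and the last three at 0.0001, so the decay happens only once.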

Iterate over the epochs. In every iteration, we get a mini-batch of images and their corresponding
labels. Use zero_grad() to reset the gradients left over from the previous step, run the current model on
the batch to get predictions, backpropagate the prediction loss, and call optimizer.step() to update the
weights. After iterating over all batches of an epoch, run scheduler.step() to update the scheduler
status. The best model would be the one with the highest accuracy across epochs; since no accuracy is
measured during training here, the weights after the final epoch are saved as best_model.pth.

In [None]:
from tqdm import tqdm

for epoch in range(num_epochs):
    model.train()
    for images, labels in tqdm(training_loader):
        optimizer.zero_grad()
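        # Move the batch to the same device as the model (this also works on CPU).
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)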
        loss.backward()
        optimizer.step()
    scheduler.step()

torch.save(model.state_dict(), 'best_model.pth')



100%|██████████| 391/391 [05:31<00:00,  1.18it/s]
100%|██████████| 391/391 [05:25<00:00,  1.20it/s]
100%|██████████| 391/391 [05:24<00:00,  1.20it/s]
100%|██████████| 391/391 [05:25<00:00,  1.20it/s]
100%|██████████| 391/391 [05:28<00:00,  1.19it/s]
100%|██████████| 391/391 [05:26<00:00,  1.20it/s]
100%|██████████| 391/391 [05:26<00:00,  1.20it/s]
100%|██████████| 391/391 [05:24<00:00,  1.20it/s]
100%|██████████| 391/391 [05:25<00:00,  1.20it/s]
100%|██████████| 391/391 [05:24<00:00,  1.21it/s]
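
The loop above saves the weights from the final epoch. If a separate validation split were available, keeping the best epoch instead could look like the following sketch. Everything in it is hypothetical: the per-epoch snapshots are plain dictionaries standing in for model.state_dict(), and the accuracies are invented values.

```python
import copy

def select_best(epoch_results):
    """Return the (state, accuracy) pair with the highest accuracy.

    epoch_results yields one (state, accuracy) pair per epoch; in a real
    loop, state would be model.state_dict() and accuracy a validation score.
    """
    best_state, best_acc = None, float('-inf')
    for state, acc in epoch_results:
        if acc > best_acc:
            # Deep-copy so that later training steps cannot alter the snapshot.
            best_state, best_acc = copy.deepcopy(state), acc
    return best_state, best_acc

# Hypothetical per-epoch snapshots and validation accuracies.
history = [({'w': 0.1}, 0.41), ({'w': 0.2}, 0.58), ({'w': 0.3}, 0.55)]
best_state, best_acc = select_best(history)
print(best_state, best_acc)
```

The deep copy matters with real models because a state dict refers to the live parameter tensors, which keep changing as training continues. Selecting the best epoch on the test set itself would bias the reported accuracy, which is why a separate validation split is assumed here.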


The testing process is very similar to the training process, except that there is no need to backpropagate
the loss. To test the model, first prepare it in the same way as for training (the same architecture with
the replaced last layer), then load the weights that we saved during the training process.

In [None]:
model.load_state_dict(torch.load('best_model.pth', map_location=device))

<All keys matched successfully>

After loading the model weights, set the model to evaluation mode. Then go through the test set, predict
the category of each image, and compute the number of correctly classified images and the accuracy.

In [None]:
model.eval()
total = 0
correct = 0
with torch.no_grad():
    for images, labels in tqdm(testing_loader):
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the largest logit per image
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total

print(f'Epoch [{epoch+1}/{num_epochs}]: Accuracy = {accuracy * 100:.2f}%')

100%|██████████| 313/313 [01:03<00:00,  4.93it/s]

Epoch [10/10]: Accuracy = 59.38%
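
The accuracy computed above is top-1 accuracy: a prediction counts as correct when the class with the largest logit matches the label. A plain-Python sketch of the same bookkeeping on a made-up batch of four samples and three classes:

```python
def top1_accuracy(logits_batch, labels):
    """Fraction of rows whose largest logit sits at the labelled index."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)

# Made-up logits for 4 samples and 3 classes.
logits_batch = [
    [0.1, 2.3, -1.0],  # predicts class 1
    [1.5, 0.2, 0.3],   # predicts class 0
    [0.0, 0.1, 0.4],   # predicts class 2
    [0.9, 0.8, 0.7],   # predicts class 0
]
labels = [1, 0, 1, 0]
toy_accuracy = top1_accuracy(logits_batch, labels)
print(toy_accuracy)
```

Three of the four toy predictions match their labels. On the 10,000 CIFAR-100 test images, the reported 59.38% corresponds to 5,938 correctly classified images.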



