# Fine-Tuning an Image Classifier in PyTorch

In this practical, we will fine-tune a pretrained ResNet-50 model on a custom image dataset. The task is to classify photos according to which of 10 global cities they were taken in.


In [1]:
# @title # Run the following cell to download the necessary files for this practical. { display-mode: "form" }
# @markdown Don't worry about what's in this collapsed cell

import urllib.request
import os
import zipfile


# Download dataset.py
# [This file has been modified from the original to work on Windows]
# if not os.path.exists('dataset.py'):
#     print('Downloading dataset.py...')
#     urllib.request.urlretrieve(
#         'https://s3-eu-west-1.amazonaws.com/aicore-portal-public-prod-307050600709/practicals_files/41982379-5961-4188-91d7-22fcb7f1c6ef/dataset.py', 'dataset.py')

# Download images.zip
if not os.path.exists('images.zip') and not os.path.exists('images'):
    print('Downloading images.zip...')
    urllib.request.urlretrieve(
        'https://s3-eu-west-1.amazonaws.com/aicore-portal-public-prod-307050600709/practicals_files/41982379-5961-4188-91d7-22fcb7f1c6ef/images.zip', 'images.zip')

    print("Extracting images...")
    with zipfile.ZipFile('images.zip', 'r') as zip_ref:
        zip_ref.extractall('.')
    os.remove('images.zip')


Run the cell below to import the necessary dependencies.


In [2]:
import numpy as np
import torch
import torch.nn.functional as F
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader, random_split
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from torchvision.models import resnet50, ResNet50_Weights

from dataset import CitiesDataset

%load_ext tensorboard


In the cell below, we define our network. We create a class that inherits from `torch.nn.Module` and call `super().__init__()` to initialise the parent class. The `ResNet50` model is loaded with its pretrained weights.

- Add code to set the `requires_grad` attribute to `False` for all the `ResNet` parameters.
- Define a set of three linear layers for the model, assigning them to the variable `linear_layers`.
  - The `ResNet50` feature extractor outputs 2048 features, so the first layer should have an input size of 2048 and an output size of 256.
  - The middle layer should have an input size of 256 and an output size of 128.
  - The output layer should have an output size equal to the number of classes in the dataset.
- Define the forward method.


In [3]:
class TransferLearning(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = resnet50(weights=ResNet50_Weights.DEFAULT)

        # Set requires_grad = False for all layers of the ResNet50.
        for param in self.layers.parameters():
            param.requires_grad = False

        # Define a three-layer network, with activation functions between the layers, and assign it to a variable called "linear_layers".
        linear_layers = torch.nn.Sequential(
            torch.nn.Linear(2048, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),
        )
        self.layers.fc = linear_layers

    def forward(self, x):
        return self.layers(x)
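
To check that the freezing step behaves as intended, one option is to count trainable versus frozen parameters. The sketch below uses a small stand-in module (`TinyBackbone`, a hypothetical class, not ResNet-50) so that it runs without downloading any weights; the same counting loop should apply to the `TransferLearning` model above.

```python
import torch


class TinyBackbone(torch.nn.Module):
    """Hypothetical stand-in for a pretrained backbone with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = torch.nn.Linear(8, 4)
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


backbone = TinyBackbone()

# Freeze every existing parameter, as is done for the ResNet layers.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the head; newly created layers are trainable by default.
backbone.fc = torch.nn.Linear(4, 10)

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters() if not p.requires_grad)
print("trainable:", trainable, "frozen:", frozen)
```

Because the new head is created after the freezing loop, only its weights and biases receive gradients. A common variation is to pass only the trainable parameters to the optimiser, for example `filter(lambda p: p.requires_grad, model.parameters())`.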


Next we define a transform, and split the data between train, test and validation sets.

- Split the data into train, validation and test sets. The training set should be 70% of dataset length, with the validation and test sets 15% each.
- Create a dataloader for each split. Set the batch size to 32, and make sure to set shuffle=True for the training set loader.


In [4]:
size = 128
transform = transforms.Compose([
    transforms.Resize(size),
    transforms.RandomCrop((size, size), pad_if_needed=True),
    transforms.ToTensor(),
])

dataset = CitiesDataset(transform=transform)
model = TransferLearning()

# Split the dataset into train, validation and test sets. The train set should be 70% of dataset length, and the validation and test sets 15% each.
train_size = int(0.7 * len(dataset))
val_size = int(0.15 * len(dataset))
test_size = len(dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

# Create dataloaders for train, validation and test sets. Set batch size to 32, and make sure shuffle=True in the train loader.
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=32)
test_dataloader = DataLoader(test_dataset, batch_size=32)
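
The split sizes above rely on `int()` truncating towards zero, with the test set absorbing whatever remains so that the three sizes always sum to the dataset length. A small worked example, using a hypothetical dataset length of 463 chosen only for illustration:

```python
n = 463  # hypothetical dataset length, not the real size of the cities dataset

train_size = int(0.7 * n)    # 324.1 truncates to 324
val_size = int(0.15 * n)     # 69.45 truncates to 69
test_size = n - train_size - val_size  # remainder goes to the test set

print(train_size, val_size, test_size)
```

Computing the test size as a remainder avoids the error `random_split` raises when the requested lengths do not sum to `len(dataset)`. If a reproducible split is wanted, `random_split` also accepts a `generator` argument such as `torch.Generator().manual_seed(42)`.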



Prior to training, it will be interesting to see how the classifier performs straight out of the box. In the box below, pass a single example from the test set to the model, with the model in evaluation mode. Get the prediction and compare it to the real label. How did the model do?


In [9]:
features, label = test_dataset[1]
features = features.unsqueeze(0)

# Set model to evaluation mode.
model.eval()

# Pass the features to the model, and assign to a variable called 'outputs'.
outputs = model(features)
print(outputs.shape)

# Get a prediction from the output logits, and assign to a variable called 'prediction'
prediction = torch.argmax(outputs, dim=1)

print("Prediction label: ", prediction.item())
class_label = dataset.idx_to_city_name[prediction.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)


torch.Size([1, 10])
Prediction label:  2
Prediction category:  Hong Kong, Chian
target label: 9
target city  Toronto, Canada
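
The model outputs raw logits, one per class. Taking `argmax` gives the predicted class directly, while applying a softmax first turns the logits into probabilities that can be read as a rough confidence. The sketch below uses made-up logits (not real model output) to show that both routes pick the same class:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for one image over 10 classes (not real model output).
logits = torch.tensor([[0.1, -0.3, 0.9, 0.0, 0.2, -0.1, 0.4, -0.5, 0.3, 0.6]])

probs = F.softmax(logits, dim=1)         # each row sums to 1
prediction = torch.argmax(probs, dim=1)  # same index as argmax over the logits
confidence = probs[0, prediction.item()].item()

print("Predicted class index:", prediction.item())
print("Confidence:", round(confidence, 3))
```

For an untrained head, the probabilities are typically close to uniform, so a wrong prediction at this stage is expected.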


Define the train loop. Some of this has been filled in for you already, but manually fill in the sections corresponding to the `TODO` instructions. This training loop also utilises a learning rate scheduler, which steps the learning rate after a given number of epochs. You don't need to worry about this, but it's good to understand why this can be useful in terms of gradient descent.
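
To make the scheduler's effect concrete, the sketch below reproduces the closed-form behaviour of `MultiStepLR` in plain Python: the learning rate is multiplied by `gamma` once for every milestone that has been reached. The helper name `multistep_lr` is hypothetical, and the values mirror the settings used in this practical (`milestones=[5, 15]`, `gamma=0.1`, a base learning rate of 0.0001, 15 epochs).

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch`: decay by `gamma` for each milestone reached."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


schedule = [multistep_lr(1e-4, [5, 15], 0.1, epoch) for epoch in range(15)]
for epoch, lr in enumerate(schedule):
    print(epoch, f"{lr:.1e}")
```

Under these assumptions, epochs 0 to 4 use 1e-4 and epochs 5 to 14 use 1e-5; the second milestone at epoch 15 is never reached in a 15-epoch run. Larger steps early on move quickly towards a good region, and smaller steps later reduce the chance of overshooting a minimum.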


In [10]:
def train(
    model,
    train_loader,
    val_loader,
    test_loader,
    lr=0.1,
    epochs=15,
    optimiser=torch.optim.SGD
):
 
    writer = SummaryWriter()
    # initialise an optimiser
    optimiser = optimiser(model.parameters(), lr=lr, weight_decay=0.001)
    scheduler = lr_scheduler.MultiStepLR(optimiser, milestones=[5,15], gamma=0.1,verbose=True) # learning rate scheduler drops the LR after n epochs, given by "milestones"
    batch_idx = 0

    for epoch in range(epochs):  # for each epoch
        
        print('Epoch:', epoch, 'LR:', scheduler.get_last_lr())
        
        for features, labels in train_loader:  # for each batch in the dataloader
            # Make a prediction by passing the features to the model
            prediction = model(features)

            # Calculate the loss 
            loss = F.cross_entropy(prediction, labels)

            # Calculate the gradient of the loss with respect to each model parameter
            loss.backward()
            
            # Use the optimiser to update the model parameters using those gradients
            optimiser.step()

            print("Epoch:", epoch, "Batch:", batch_idx,
                  "Loss:", loss.item())  # log the loss

            # Zero the grad
            optimiser.zero_grad()

            writer.add_scalar("Loss/Train", loss.item(), batch_idx)
            batch_idx += 1
                   
        scheduler.step() # step the learning rate scheduler

        print('Evaluating on validation set')
        val_loss, val_acc = evaluate(model, val_loader)
        writer.add_scalar("Loss/Val", val_loss, batch_idx)
        writer.add_scalar("Accuracy/Val", val_acc, batch_idx)
    
    
    print('Evaluating on test set')
    test_loss, test_acc = evaluate(model, test_loader)  # evaluate returns (loss, accuracy)
    model.test_loss = test_loss
    
    return model   # return trained model
    

def evaluate(model, dataloader):
    losses = []
    correct = 0
    n_examples = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for features, labels in dataloader:
            prediction = model(features)
            loss = F.cross_entropy(prediction, labels)
            losses.append(loss.item())
            # count the predictions in this batch that match the labels
            correct += torch.sum(torch.argmax(prediction, dim=1) == labels).item()
            n_examples += len(labels)
    avg_loss = np.mean(losses)
    accuracy = correct / n_examples
    print("Loss:", avg_loss, "Accuracy:", accuracy)
    return avg_loss, accuracy
 

Run the code block below to train the model, and leave it running until it completes.


In [11]:
trained_model = train(
    model,
    train_dataloader,
    val_dataloader,
    test_dataloader,
    epochs=15,
    lr=0.0001,
    optimiser=torch.optim.AdamW
)


Adjusting learning rate of group 0 to 1.0000e-04.
Epoch: 0 LR: [0.0001]




Epoch: 0 Batch: 0 Loss: 2.3099112510681152
Epoch: 0 Batch: 1 Loss: 2.2967610359191895
Epoch: 0 Batch: 2 Loss: 2.309739112854004
Epoch: 0 Batch: 3 Loss: 2.2913978099823
Epoch: 0 Batch: 4 Loss: 2.3150579929351807
Epoch: 0 Batch: 5 Loss: 2.3212692737579346
Epoch: 0 Batch: 6 Loss: 2.2942349910736084
Epoch: 0 Batch: 7 Loss: 2.3242757320404053
Epoch: 0 Batch: 8 Loss: 2.277378797531128
Epoch: 0 Batch: 9 Loss: 2.2708990573883057
Epoch: 0 Batch: 10 Loss: 2.3419251441955566
Adjusting learning rate of group 0 to 1.0000e-04.
Evaluating on validation set
Loss: 2.2755375 Accuracy: 0.11594203
Epoch: 1 LR: [0.0001]
Epoch: 1 Batch: 11 Loss: 2.243016481399536
Epoch: 1 Batch: 12 Loss: 2.257474899291992
Epoch: 1 Batch: 13 Loss: 2.267336845397949
Epoch: 1 Batch: 14 Loss: 2.261023759841919
Epoch: 1 Batch: 15 Loss: 2.254497528076172
Epoch: 1 Batch: 16 Loss: 2.2692744731903076
Epoch: 1 Batch: 17 Loss: 2.265752077102661
Epoch: 1 Batch: 18 Loss: 2.256601095199585
Epoch: 1 Batch: 19 Loss: 2.2356553077697754
...
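
The early losses of roughly 2.3 and the validation accuracy of about 0.12 are close to what a model guessing uniformly across 10 classes would be expected to produce. A quick check of those chance-level baselines:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction over 10 classes: -ln(1/10) = ln(10).
chance_loss = math.log(num_classes)
chance_accuracy = 1 / num_classes

print(round(chance_loss, 4))  # 2.3026
print(chance_accuracy)        # 0.1
```

A loss that falls clearly below this value, and an accuracy that rises clearly above 0.1, suggest the new head is learning something beyond chance.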

Now, let's view training performance in `tensorboard`. Run the cell below to open a `tensorboard` instance, then select `time series` from the dropdown, and press refresh to view the training loss.


In [None]:
%tensorboard --logdir runs


Finally, let's re-run our prediction code to see how the trained model performs.


In [12]:
features, label = test_dataset[1]
features = features.unsqueeze(0)
model.eval()
outputs = model(features)

dummy, pred = torch.max(outputs, 1)
print("Prediction label: ", pred.item())
class_label = dataset.idx_to_city_name[pred.item()]
print("Prediction category: ", class_label)

print("target label:", label)
target_label = dataset.idx_to_city_name[label]
print("target city ", target_label)


Prediction label:  9
Prediction category:  Toronto, Canada
target label: 9
target city  Toronto, Canada
