# Transfer Learning with VGG16

In [None]:
import torch
import torch.nn.functional as F
from torch import nn, optim
from torchvision import datasets, models
import torchvision.transforms.v2 as transforms

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
import numpy as np

In [None]:
from tqdm import trange

In [None]:
import copy

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
import os
from functools import partial

## VGG16

VGG16 is a CNN for image classification and was the runner-up in the classification task of the [ImageNet]() competition (ILSVRC) in 2014. A pre-trained VGG16 model can easily be downloaded from `torchvision.models` with the `DEFAULT` pre-trained weights.

In [None]:
model = models.vgg16(weights='DEFAULT')

Printing the model will display the model architecture:

In [None]:
print(model)

*Have a look at the model. You should be able to recognise almost all the layers of this CNN. What does `AdaptiveAvgPool2d` do? Why is it useful?*

We can see that the model is built from two `nn.Sequential` sub-modules, `features` and `classifier`, joined by an `avgpool` layer. The `features` sub-module is made of convolutional, ReLU and max-pooling layers and acts as a feature extractor. The `classifier` sub-module is made of linear layers and acts as the classifier.

With transfer learning, we want to adapt this pre-trained model to the classification of flowers.

## Data Set

For this exercise, we use the [tf_flowers](https://www.tensorflow.org/datasets/catalog/tf_flowers) data set. Extracting the data will take a while...

In [None]:
%%bash
if [ -d ./data/flower_photos ]; then
    echo "Directory exists"
else
    wget http://download.tensorflow.org/example_images/flower_photos.tgz -P ./data/
    cd data/ && tar xfz flower_photos.tgz
    rm -f flower_photos/LICENSE.txt
fi

The data set consists of 3670 images of 5 flower classes: daisy, dandelion, roses, sunflowers, and tulips.

*Check whether any of these classes were already present in the original ImageNet data set on which VGG-16 was trained.*

### Data Transformation

*Read the [PyTorch documentation for VGG16]() and determine the mean and standard deviation needed to normalise the images, as well as the size of the input images. Define a transformation that applies `RandomResizedCrop` to bring the input images to the correct size and then normalises them. Use the `ToImage()` transform to convert images to tensors, and `ToDtype(torch.float32, scale=True)` to convert them to `float32` (with values scaled to [0, 1]) before normalisation.* `RandomResizedCrop` crops a random region of each image and resizes it to the required input size, which also adds a bit of data augmentation.

In [None]:
# TODO
mean =
std =


transform = transforms.Compose([
    transforms.ToImage(),
    # RandomResizedCrop to the correct size for VGG-16
    # TODO
    
    transforms.ToDtype(torch.float32, scale=True),
    # Normalize input
    # TODO
    
])

### Loading the Data Set

The data set is structured in sub-folders named after the classes of flowers.

In [None]:
dataroot = f"{os.getcwd()}/data/flower_photos"
print(dataroot)

In [None]:
! ls -l data/flower_photos

The [`ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder) class makes it easy to load a data set organised this way:

In [None]:
dataset = datasets.ImageFolder(dataroot, transform=transform)

This data set is fairly small (a few thousand images), which makes transfer learning a particularly convenient technique:

In [None]:
print(f"Number of images: {len(dataset)}")

We now split the data set into a training set and a validation set. For simplicity, we skip the creation of a separate test set (something you shouldn't do!).

In [None]:
n = len(dataset)
idx_train, idx_valid = train_test_split(np.arange(n), test_size=0.2, random_state=42)

train_sampler = torch.utils.data.sampler.SubsetRandomSampler(idx_train)
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(idx_valid)

In [None]:
batch_size = 64

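# Pass the samplers so that each loader only draws from its own subset of indices
# (the samplers already shuffle, and DataLoader does not accept shuffle=True together with a sampler)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
validloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler)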
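The role of the samplers can be checked on a toy data set whose samples are simply their own indices. The data set size and batch size here are made up for illustration; the split uses the same `train_test_split` call as above, with renamed variables so the real indices are not overwritten.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy data set in which each sample is its own index
n_toy = 10
toy_ds = TensorDataset(torch.arange(n_toy))

toy_train_idx, toy_valid_idx = train_test_split(np.arange(n_toy), test_size=0.2, random_state=42)

# Each sampler restricts its loader to one subset and reshuffles it every epoch
toy_train_loader = DataLoader(toy_ds, batch_size=4, sampler=SubsetRandomSampler(toy_train_idx))
toy_valid_loader = DataLoader(toy_ds, batch_size=4, sampler=SubsetRandomSampler(toy_valid_idx))

# Each batch is a one-element list holding a tensor of indices
seen_train = sorted(int(i) for (batch,) in toy_train_loader for i in batch)
seen_valid = sorted(int(i) for (batch,) in toy_valid_loader for i in batch)
print(seen_train, seen_valid)  # 8 and 2 disjoint indices that together cover 0..9
```

Without the samplers, both loaders would iterate over all ten samples, so the validation accuracy would be measured on training images.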

*Create a function doing all of the above, for convenient re-use.*

In [None]:
def create_dataset(batch_size, dataroot="data/flower_photos"):
    # Define mean and standard deviation for normalisation
    # TODO
    mean =
    std =

    # Define the transform
    # TODO
    transform = transforms.Compose([
        
    ])
    
    dataset = datasets.ImageFolder(dataroot, transform=transform)
    
    n = len(dataset)
    idx_train, idx_valid = train_test_split(np.arange(n), test_size=0.2, random_state=42)

    train_sampler = torch.utils.data.sampler.SubsetRandomSampler(idx_train)
    valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(idx_valid)
    
    # Use the samplers so the two loaders draw from disjoint subsets
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
    validloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler)
    
    return trainloader, validloader

#### Visualizing Images and Labels

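First, let's define a dictionary mapping labels (numbers from 0 to 4 denoting one of the flower classes) to the actual class names. `ImageFolder` assigns labels in alphabetical order of the sub-folder names, so the list below follows that order (the same list is also available as `dataset.classes`):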

In [None]:
classes = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

label_to_name = { 
    i : name 
    for i, name in enumerate(classes) 
}

print(label_to_name)

Then we can visualise a batch of images. PyTorch stores images in the `C x H x W` convention (where C is the number of channels, H the image height and W the image width), while `matplotlib` uses the `H x W x C` convention. This means that we have to transpose our tensor from `C x H x W` to `H x W x C`. We also undo the normalisation, so that the pixel values are back in the [0, 1] range that `imshow` expects for floating-point images.
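Both steps can be written in vectorised form with broadcasting. The sketch below uses placeholder statistics and a random image (assumptions for illustration only); the plotting cell performs the same un-normalisation with an explicit loop over channels.

```python
import torch

# Placeholder per-channel statistics, for illustration only
demo_mean = torch.tensor([0.5, 0.5, 0.5])
demo_std = torch.tensor([0.25, 0.25, 0.25])

# A random image with values in [0, 1), stored as C x H x W (3 x 4 x 5)
original = torch.rand(3, 4, 5)
# What Normalize does: (x - mean) / std, with the statistics broadcast over H and W
normalised = (original - demo_mean[:, None, None]) / demo_std[:, None, None]

# Undo the normalisation channel-wise
restored = normalised * demo_std[:, None, None] + demo_mean[:, None, None]

# C x H x W -> H x W x C for matplotlib, clipping any round-off outside [0, 1]
hwc = restored.permute(1, 2, 0).clamp(0, 1)
print(hwc.shape)  # torch.Size([4, 5, 3])
```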

In [None]:
trainiter = iter(trainloader)
images, labels = next(trainiter)

fig = plt.figure(figsize=(12, 12))
# Show at most 64 images (the last batch can be smaller)
for idx in range(min(64, len(images))):
    ax = fig.add_subplot(8, 8, idx + 1, xticks=[], yticks=[])

    img = images[idx].numpy()

    # Undo the normalisation channel by channel (C x H x W)
    for i in range(3):
        img[i, :, :] = img[i, :, :] * std[i] + mean[i]

    # Transpose C x H x W -> H x W x C and clip round-off outside [0, 1]
    ax.imshow(np.clip(np.transpose(img, (1, 2, 0)), 0, 1))

    name = label_to_name[labels[idx].item()]

    ax.set_title(name, fontdict={"fontsize": 12})

plt.tight_layout()
plt.show()

*Run the previous cell a few times. What does the data set look like?*

## Adapting Pre-Trained VGG-16

We already downloaded the VGG16 model from `torchvision.models` above, with `DEFAULT` pre-trained weights. In order to use this model for classification of flowers, we need two things:
* Freeze the model parameters of the layers we do not want to train
* Replace the last fully connected layer with a layer with the correct number of output classes (`5`)
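The two steps above can be sketched on a tiny stand-in model. The architecture below is hypothetical and much smaller than VGG16, but the pattern is the same: freeze first, then replace, so that the newly created layer stays trainable. In VGG16 the layers to replace live inside `model.classifier`.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pre-trained network (NOT VGG16)
toy_net = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 10),  # pretend this is the original 10-class output layer
)

# Step 1: freeze every existing parameter
for param in toy_net.parameters():
    param.requires_grad = False

# Step 2: replace the output layer; new layers have requires_grad=True by default
toy_net[2] = nn.Linear(16, 5)  # one output per flower class

trainable = [name for name, p in toy_net.named_parameters() if p.requires_grad]
print(trainable)                         # ['2.weight', '2.bias']
print(toy_net(torch.randn(3, 8)).shape)  # torch.Size([3, 5])
```

Doing the steps in the opposite order would freeze the new layer too, leaving nothing to train.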

The VGG16 architecture is the following:

In [None]:
print(model)

*Create a function taking an integer `layer_size` that:*
* _Defines a VGG16 model with pre-trained weights_
* _Freezes the weights of all layers_
* _Substitutes the last two pre-trained linear layers with new (untrained) linear layers_

*The new last linear layer must output one value per class, while its input size (i.e. the output size of the new second-to-last linear layer) should be parametrized by `layer_size`.*

In [None]:
def create_model(layer_size):
    model = models.vgg16(weights='DEFAULT')
    
    # Freeze all model parameters
    # TODO
    
    # Substitute the last two linear layers 
    # TODO
    
    return model

Let's check that the `create_model` function works as expected:

In [None]:
create_model(1024)

### Training

*Complete the training function defined below, assuming that the `params` dictionary contains the following entries:*
* `layer_size`
* `batch_size`
* `lr` _(learning rate)_
* `n_epochs`

Finally, we can train our network as usual. Setting `requires_grad=False` on the parameters of the frozen layers means that no gradients are computed for them, so the optimiser does not change their weights and biases. Effectively, only the new linear layers we added will be trained.
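To see what `requires_grad=False` does in practice, the sketch below runs one standard training step (zero gradients, forward pass, loss, backward pass, optimiser step) on a hypothetical toy model with a frozen first layer. The model, data and learning rate are made up for illustration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy model: the first layer is frozen, the last one is trainable
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
toy_model[0].requires_grad_(False)

frozen_before = toy_model[0].weight.clone()
head_before = toy_model[2].weight.clone()

# Only trainable parameters are handed to the optimiser (optional, but explicit)
toy_optimizer = optim.Adam([p for p in toy_model.parameters() if p.requires_grad], lr=0.01)
toy_loss_fn = nn.CrossEntropyLoss()

toy_x = torch.randn(16, 4)
toy_y = torch.randint(0, 3, (16,))

# One training step
toy_optimizer.zero_grad()
toy_loss = toy_loss_fn(toy_model(toy_x), toy_y)
toy_loss.backward()
toy_optimizer.step()

print(toy_model[0].weight.grad)                         # None: no gradient computed
print(torch.equal(toy_model[0].weight, frozen_before))  # True: frozen layer unchanged
print(torch.equal(toy_model[2].weight, head_before))    # False: trainable layer updated
```

Because frozen parameters never receive a gradient, Adam would also skip them if all parameters were passed to it; filtering them out simply makes the intent visible.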

In [None]:
def train_fn(params, dataroot="data/flower_photos"):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    # Create a model using the "create_model" function
    # TODO
    model =
     
    model.to(device)
    
    # Create data loaders using the "create_dataset" function
    # TODO
    trainloader, validloader =
    
    # Define an appropriate loss function
    # TODO
    loss_function =
    
    # Define an Adam optimizer
    # TODO
    optimizer =
    
    best_valid_loss = np.inf
    best_accuracy = 0.0  # higher accuracy is better, so start from the lowest possible value
    
    pbar = trange(params["n_epochs"], desc='Training', leave=True)
    for epoch in pbar:
        epoch_loss = 0

        # Ensure model is in training mode
        # TODO
        
        for images, labels in trainloader:

            # Move data to the selected device (GPU if available)
            # TODO
            images, labels =
        
            # Initialize optimizer gradients to zero
            # TODO
            
            # Perform forward pass
            # TODO
            
            # Compute the loss
            # TODO
            
            # Perform backpropagation
            # TODO
            
            # Update model weights
            # TODO
            
            epoch_loss += loss.item()
            
        valid_loss = 0
        accuracy = 0

        with torch.no_grad():

            # Ensure model is in evaluation mode
            # TODO

            for images, labels in validloader:

                # Move data to the selected device (GPU if available)
                # TODO
                images, labels =
                    
                # Perform forward pass
                # TODO
                    
                # Compute the loss
                valid_loss += loss_function(output, labels).item()
                    
                # Compute class probabilities
                # TODO
                p =
                
                # Compute accuracy
                top_p, top_c = p.topk(1, dim=1) # Top prediction
                equals = (top_c == labels.view(*top_c.shape)).type(torch.FloatTensor)
                accuracy += torch.mean(equals)
                    
        t_loss = epoch_loss/len(trainloader)
        v_loss = valid_loss/len(validloader)
        acc = accuracy.item()/len(validloader)*100
        
        # Store best model (perform a deep copy of the state dictionary)
        # Store best accuracy in "best_accuracy"
        # TODO 
            
        pbar.set_postfix({"Accuracy": acc})
            
    # Load best model at the end of training
    # TODO
    model.load_state_dict(best_state_dict)
            
    return model.eval(), best_accuracy
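The accuracy computation inside `train_fn` can be checked on hand-made numbers. The logits and labels below are invented for illustration, and softmax is used to turn the logits into probabilities, which is one common choice. Because softmax is monotonic, `topk` on the raw logits would select the same classes.

```python
import torch
import torch.nn.functional as F

# Invented logits for a batch of 4 samples and 3 classes, plus the true labels
toy_logits = torch.tensor([[2.0, 0.1, 0.3],
                           [0.2, 1.5, 0.1],
                           [0.1, 0.2, 3.0],
                           [1.0, 2.0, 0.5]])
toy_labels = torch.tensor([0, 1, 2, 0])

toy_p = F.softmax(toy_logits, dim=1)  # class probabilities; each row sums to 1
top_p, top_c = toy_p.topk(1, dim=1)   # best probability and best class, shape (4, 1)
equals = (top_c == toy_labels.view(*top_c.shape)).float()
batch_accuracy = torch.mean(equals).item()
print(batch_accuracy)  # 0.75 (the last sample is misclassified)
```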

Let's train with the following parameters:

In [None]:
params = {
    "layer_size": 1024,
    "lr": 0.005,
    "batch_size": 128,
    "n_epochs": 10
}

In [None]:
model, acc = train_fn(params)

*If you finish early, you can try to re-run the notebook with fewer frozen parameters and study the impact on accuracy and training time. If time allows, play around with the hyperparameters of the model and see if you can obtain a better accuracy on the validation set.*

*This notebook has been structured so that the training function is self-contained, which makes it easy to plug into a framework for hyperparameter tuning.*
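As a rough illustration of that idea, the sketch below runs a plain grid search over a `params` dictionary like the one used above. `grid_search` and `fake_train_fn` are hypothetical helpers written for this example. `fake_train_fn` only mimics the `(model, accuracy)` return value of the real training function, so the code runs without data. `functools.partial` (imported at the top of the notebook) binds the `dataroot` argument, which can be handy when a tuning framework calls the function with the `params` dictionary only.

```python
import itertools
from functools import partial


def grid_search(train_fn, grid):
    """Try every combination in `grid`; return (best_params, best_accuracy).

    `train_fn(params)` is assumed to return `(model, accuracy)`.
    """
    keys = list(grid)
    best_params, best_acc = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        _, acc = train_fn(params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


# Hypothetical stand-in for the real training function (no data or GPU needed)
def fake_train_fn(params, dataroot="data/flower_photos"):
    return None, 100.0 - abs(params["lr"] - 0.001) * 1000 - params["layer_size"] / 1000


grid = {"layer_size": [512, 1024], "lr": [0.001, 0.005], "batch_size": [64], "n_epochs": [1]}
best_params, best_acc = grid_search(partial(fake_train_fn, dataroot="data/flower_photos"), grid)
print(best_params, best_acc)
```

With the real function, the call would look like `grid_search(partial(train_fn, dataroot=dataroot), grid)`. Every combination trains a full model, so the grid should be kept small.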