# (Continued) ML that can See: Supervised Learning with Images 

Let's load in any libraries we will use in this notebook. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

#import torch which has many of the functions to build deep learning models and to train them
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

#import torchvision, which has lots of functions for loading and working with image data
import torchvision
import torchvision.transforms as transforms

#this is a nice progress bar representation that will be good to measure progress during training
import tqdm

# Ingredient 1: The Data

We're going to use the same dataset as last week, with images from 20 different dog breeds we want to classify.

As we have learnt to do in the past couple of weeks, we will:
1. Initialise a transformation for the dataset that:
    1. [transforms.ToTensor()](https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html) -- this converts a PIL image or numpy array to a tensor while scaling the pixel values to the range [0, 1].
    2. [transforms.Resize()](https://pytorch.org/vision/stable/generated/torchvision.transforms.Resize.html) -- this resizes an input image to the specified size (height, width).
    Resize is important as it ensures the dimensions remain compatible throughout the network, allowing proper operations at each layer and maintaining the required dimensions for the final fully connected layers in the network.
    3. [transforms.Normalize()](https://pytorch.org/vision/stable/generated/torchvision.transforms.Normalize.html) -- this standardizes the pixel values of a tensor image by subtracting the mean and dividing by the standard deviation along the input channels.
2. We will then load the datasets with [torchvision.datasets.ImageFolder](https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html) -- this loads image datasets from folders, assigning labels automatically based on subdirectories.
3. Create a [torch.utils.data.DataLoader()](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)
    1. First argument is the dataset.
    2. Optional argument *batch_size* is the batch size, i.e. how many data points the model processes in parallel in each batch.
    3. Optional argument *shuffle* controls whether data is randomly shuffled before taking from the dataset.
    4. Optional argument *num_workers* is how many subprocesses are used to load data from the dataset -- it can make loading the data faster.
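
To make the DataLoader arguments concrete, here is a minimal sketch that batches a small stand-in dataset. The `toy_*` names and the random tensors are assumptions for illustration only; the notebook itself loads real images with ImageFolder.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 40 random "images" (3 x 32 x 32) with labels in 0..19
toy_images = torch.rand(40, 3, 32, 32)
toy_labels = torch.randint(0, 20, (40,))
toy_dataset = TensorDataset(toy_images, toy_labels)

# batch_size controls how many samples are stacked into each batch;
# shuffle=True draws the samples in a new random order every epoch
toy_loader = DataLoader(toy_dataset, batch_size=16, shuffle=True)

batch_sizes = []
for images, labels in toy_loader:
    batch_sizes.append(images.shape[0])
    print(images.shape, labels.shape)
```

With 40 samples and a batch size of 16, the last batch is smaller than the others, which is why accuracy is best computed from running counts of correct and total samples rather than by averaging per-batch accuracies.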


In [None]:
imagenet_means = (0.485, 0.456, 0.406)
imagenet_stds = (0.229, 0.224, 0.225)

transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Resize((224, 224)), 
     transforms.Normalize(imagenet_means, imagenet_stds)])

train_dataset = torchvision.datasets.ImageFolder('./stanford_dogs_subset/train', transform = transform)
val_dataset = torchvision.datasets.ImageFolder('./stanford_dogs_subset/val', transform = transform)

batch_size = 16

trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                          shuffle=True, num_workers = 1)
valloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size,
                                          shuffle=False, num_workers = 1)
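
As a small worked example of what Normalize does, the sketch below applies the same per-channel arithmetic by hand and then inverts it. It uses a random tensor as a stand-in image (an assumption for illustration); the inverse step is useful whenever a normalized tensor needs to be displayed.

```python
import torch

# Per-channel statistics, reshaped to (3, 1, 1) so they broadcast over height and width
imagenet_means = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
imagenet_stds = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

torch.manual_seed(0)
image = torch.rand(3, 4, 4)  # stand-in image with pixel values in [0, 1]

# Normalize: subtract the channel mean, then divide by the channel standard deviation
normalized = (image - imagenet_means) / imagenet_stds

# Un-normalize: multiply by the standard deviation, then add the mean back
restored = normalized * imagenet_stds + imagenet_means

print(normalized.min().item(), normalized.max().item())
```

Note that normalized values are no longer confined to [0, 1], so they should be un-normalized before being passed to a plotting function.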

Last week we saw that adding random data transformations during training can help prevent overfitting.

Below is a list of some common augmentations. The code cell that follows uses only a RandomHorizontalFlip augmentation.

1. [transforms.RandomResizedCrop](https://pytorch.org/vision/stable/generated/torchvision.transforms.RandomResizedCrop.html) -- this function randomly grabs a portion of the image (crops) and then resizes it to the desired image size. By default, the crop can be anywhere between 8% and 100% of the original image area -- this is quite aggressive, so I would suggest choosing between 50% and 100% of the image area (`scale=(0.5, 1.0)`).
2. [transforms.RandomHorizontalFlip](https://pytorch.org/vision/stable/generated/torchvision.transforms.RandomHorizontalFlip.html) -- randomly flips an image horizontally. Useful for tasks where horizontal orientation doesn't change the meaning.
3. [transforms.RandomVerticalFlip](https://pytorch.org/vision/stable/generated/torchvision.transforms.RandomVerticalFlip.html) -- randomly flips an image vertically. Useful for tasks where vertical orientation doesn't change the meaning.
4. [transforms.RandomRotation](https://pytorch.org/vision/stable/generated/torchvision.transforms.RandomRotation.html) -- randomly rotates an image by a specified angle. Can simulate variations in viewpoint.
5. [transforms.ColorJitter](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) -- randomly changes brightness, contrast, saturation, and hue of an image. Helps the model to be robust to different lighting conditions.


Each of the 'Random' transformations is applied in sequence to an input image. The random component is either whether the transformation is applied at all (e.g. a flip) or how severe it is (e.g. the rotation angle or the crop size). When we chain together multiple different types of 'Random' transformations, we can end up with a huge variety of different images from our training dataset.

**In your own time, experiment with adding different augmentations and observe how performance changes.**


In [None]:
### Add more transforms in here
train_transform = transforms.Compose(
    [transforms.RandomHorizontalFlip(),
    ])

class_labels = train_dataset.classes
#visualise the train dataset with these transforms
data = next(iter(trainloader))
fig, ax = plt.subplots(1, 5)
for idx in range(5):
    im = data[0][idx]
    lbl = data[1][idx]
    im = train_transform(im)
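    #undo the ImageNet normalization (x * std + mean) so the colours display correctly
    train_image = im.numpy() * np.array(imagenet_stds)[:, None, None] + np.array(imagenet_means)[:, None, None]
    train_image = np.clip(train_image, 0, 1)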
    label = class_labels[lbl]
    train_image = np.moveaxis(train_image, 0, 2)
    ax[idx].imshow(train_image)
    ax[idx].set_axis_off()
    ax[idx].set_title(label.split('-')[-1])
plt.tight_layout()
plt.show()


# Ingredient 2: The Model

This week we will again use a ResNet18 pretrained on ImageNet, but this time we will freeze certain parameters in the model so that their weights do not update. We do this to try to prevent the model from overfitting to the new, small dataset.

You can see all the models built into torchvision [here](https://pytorch.org/vision/stable/models.html#classification).

When we create a model for transfer learning, we should follow these steps:
1. Initialise the model with pretrained weights.
2. Adapt the architecture for the new number of classes in our new dataset by changing the final linear layer.
3. If necessary, freeze any weights.
4. Move the model to the GPU if available.



In [None]:
def setup_model(num_classes, freeze_backbone = False):
    #### Step 1: Initialise the model with pretrained weights.
    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)

    #### Step 2: Adapt the architecture for the new number of classes.
    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, num_classes)

    #### Step 3: If necessary, freeze any weights.
    if freeze_backbone: 
        for param in model.parameters():
            param.requires_grad = False
        
        # Unfreeze the parameters of the last fully connected layer
        for param in model.fc.parameters():
            param.requires_grad = True

    #### Step 4: Move the model to the GPU if available
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #this line checks if we have a GPU available
    model = model.to(device)

    return model

resnet_frozen = setup_model(20, True)
print(resnet_frozen)
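
A quick way to check that freezing worked is to count how many parameters still require gradients. The sketch below does this on a tiny stand-in network (`TinyNet` and `count_parameters` are hypothetical names introduced here so the example runs without downloading weights); the same helper could be called on `resnet_frozen`.

```python
import torch.nn as nn

def count_parameters(model):
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

# Hypothetical stand-in for a backbone followed by a final linear layer
class TinyNet(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.backbone(x))

toy = TinyNet()

# Same freezing pattern as setup_model: freeze everything, then unfreeze the head
for param in toy.parameters():
    param.requires_grad = False
for param in toy.fc.parameters():
    param.requires_grad = True

trainable, total = count_parameters(toy)
print(f'{trainable} of {total} parameters are trainable')
```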

# Training Time: Transfer Learning with a frozen backbone



Now that we've initialised our model and adapted its architecture for our new training dataset, we can set up the other elements of training (a stochastic gradient descent optimizer and a cross-entropy loss) and start to train our model.

We're going to use a fine-tuning approach, where the model's parameters are adjusted slightly to adapt its learned features to the specific nuances of the new task or domain. We're going to adjust the parameters in only the final linear layer of the network (i.e. all other layers are frozen).

In the lecture, we reviewed that we can train our model by doing the following:

1. Initialise the model.
2. Define a loss function (or cost function or objective).
3. Initialise the SGD optimizer.
4. For n epochs (e.g. until the loss converges/stops changing):
    1. Put the model in "train" mode with model.train() 
    2. Training loop: For all batches in the training dataset:
        1. Apply any training data transformations to input.
        2. Perform a forward pass to find a prediction.
        3. Calculate the loss + accuracy.
        4. Perform a backward pass to calculate loss gradients with respect to the parameters.
        5. Update the parameters with SGD.
    3. Put the model in "eval" mode with model.eval()
    4. Validation loop: For all batches in the validation dataset:
        1. Perform a forward pass to find a prediction.
        2. Calculate the loss + accuracy.

Below, we've done steps 1-3 and 4A-B. 

**Your turn:** Implement 4C-D.

In [None]:
torch.manual_seed(0)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') #this line checks if we have a GPU available

#any hyperparameters
lr = 0.001
total_epochs = 10

#Step 1: Initialise the model.
resnet_frozen = setup_model(20, True)
# Step 2: Define a loss function
criterion = nn.CrossEntropyLoss()
# Step 3: Initialise the SGD optimizer.
optimizer = optim.SGD(resnet_frozen.parameters(), lr=lr, momentum=0.9)

#Step 4: For n epochs (e.g. loss converged/stops changing)
total_train_loss = []
total_train_acc = []
best_acc = 0
for epoch in range(total_epochs):    
    #Step 4A: Put the model in "train" mode
    resnet_frozen.train() 

    #Step 4B: Training loop: For all batches in the training dataset
    train_loss = []
    correct = 0
    total = 0
    for i, data in  tqdm.tqdm(enumerate(trainloader, 0), total = len(trainloader), desc = f'Epoch {epoch+1} - training phase'):
        inputs, labels = data

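        #note: when applied to a whole batch, the same random flip decision is shared by every image in that batch
        inputs = train_transform(inputs)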
        
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        outputs = resnet_frozen(inputs)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()

        train_loss += [loss.cpu().item()]
        
        predicted = torch.argmax(outputs, axis = 1)
        
        correct += torch.sum(predicted == labels).cpu().item()
        total += len(labels)

    mean_train_loss = np.mean(train_loss)
    train_accuracy = correct/total

    total_train_loss += [mean_train_loss]
    total_train_acc += [train_accuracy]
    
    #Step 4C: Put the model in "eval" mode
    

    #Step 4D: Validation loop: For all batches in the validation dataset


plt.plot(total_train_loss, label = 'Train')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()

plt.plot(total_train_acc, label = 'Train')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
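
If you get stuck on steps 4C-D, the sketch below shows one possible shape for a validation pass. It is not the only way to do it: the `evaluate` helper and the `toy_*` stand-ins are hypothetical names, used here so the example runs on its own. The key differences from the training loop are model.eval(), torch.no_grad(), and the absence of a backward pass and optimizer step.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Return (mean loss, accuracy) over all batches in loader."""
    model.eval()  # Step 4C: switch layers such as batch norm to eval behaviour
    losses, correct, total = [], 0, 0
    with torch.no_grad():  # no gradients are needed for validation
        for inputs, labels in loader:  # Step 4D: loop over validation batches
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            losses.append(criterion(outputs, labels).item())
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return sum(losses) / len(losses), correct / total

# Toy stand-ins so the sketch is self-contained
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(32, 10), torch.randint(0, 20, (32,)))
toy_loader = DataLoader(toy_data, batch_size=8)
toy_model = nn.Linear(10, 20)

val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device('cpu'))
print(val_loss, val_acc)
```

In the notebook, the equivalent call would use `resnet_frozen`, `valloader`, `criterion` and `device`, and the returned values could be appended to validation lists and plotted alongside the training curves.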

## Accounting for class imbalance

When we're training our model, every image in the batch is treated as equally important for the learning process.

So what happens when we have some classes with very little data, and others with **loads** of data? Potentially our model will overly focus on performing well on the class with lots of data, and neglect the class with less data.

We can account for this using PyTorch's [WeightedRandomSampler](https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler), which can be used in the DataLoader class.

To do this, we need to choose weights for the sampler (these dictate how often certain samples are used in a batch) based on our class imbalance, create the WeightedRandomSampler, and recreate our DataLoader with it.

In [None]:
train_labels = train_dataset.targets

lbls, counts = np.unique(train_labels, return_counts = True)

weighting = torch.DoubleTensor([1/x for x in counts])
sample_weights = weighting[train_dataset.targets]

sampler = torch.utils.data.WeightedRandomSampler(sample_weights, len(train_dataset))
balanced_trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                          sampler = sampler)
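
To see why inverse-frequency weights balance the classes, here is the arithmetic on a toy two-class example (the 90/10 split is an assumption for illustration). Each sample gets weight 1/count of its class, so each class ends up with the same total probability of being drawn.

```python
import numpy as np

# Toy imbalanced labels: 90 samples of class 0 and 10 samples of class 1
toy_targets = np.array([0] * 90 + [1] * 10)

_, counts = np.unique(toy_targets, return_counts=True)

# One weight per sample: 1/90 for class 0 samples, 1/10 for class 1 samples
per_sample_weight = (1.0 / counts)[toy_targets]

# Normalise to probabilities and sum them within each class
sample_prob = per_sample_weight / per_sample_weight.sum()
class_prob = np.array([sample_prob[toy_targets == c].sum() for c in range(len(counts))])

print(class_prob)
```

Because WeightedRandomSampler samples with replacement by default, balancing is achieved by showing images from the smaller classes more than once per epoch, not by adding new data.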

The code below visualises the distribution of samples for each class in the training dataset, when used as is, or when using the WeightedRandomSampler to create the dataloader.

In [None]:
train_classes = train_dataset.targets
class_names = train_dataset.classes

balanced_classes = []
for data in balanced_trainloader:
    ims, tgts = data
    balanced_classes += tgts.tolist()


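#one bin per class, centred on the integer class index so the bars line up with the tick labels
bins = np.arange(len(class_names) + 1) - 0.5
plt.hist([train_classes, balanced_classes], bins = bins, label = ['Train', 'Balanced'])
plt.xticks(range(len(class_names)), class_names, rotation = 90)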
plt.xlabel('Class label')
plt.ylabel('Count')
plt.legend()
plt.show()

**Try re-training the model with the new balanced_trainloader and see how the performance changes.**

# Leveraging Foundation Models for Image Classification

## A new model!

We are going to use the foundation model DINOv2 as a feature extractor -- this means we will provide the model with images, collect the features, and then use some other form of machine learning or deep learning to classify these features into different classes!

Firstly, we can load the model below from the [torch hub](https://pytorch.org/hub/).

You can see the list of DINOv2 architectures at [this link](https://github.com/facebookresearch/dinov2/blob/main/MODEL_CARD.md). We're going to use one of the smaller DINOv2 architectures that still gives great performance but is not too computationally expensive to run.

In [None]:
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dino.eval()
dino.to(device)

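Although DINOv2 is a Vision Transformer rather than a CNN, it expects its input in the same form as our earlier models -- the data should be a Tensor, should be sized 224x224 (the height and width must be multiples of the 14-pixel patch size), and should be normalized.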

This means that we can continue using our existing dataloader from earlier in the tutorial!

Let's pass some data through DINOv2 and see what comes out.

In [None]:
for data in trainloader:
    inputs, labels = data

    inputs = inputs.to(device)
    
    #we only want the features here, so skip gradient tracking to save memory
    with torch.no_grad():
        feature = dino(inputs)

    print(feature.size())
    break

## Linear Classification from DINOv2 Features

Adapt the linear classifier below so that it can take DINOv2 features as input, and return a set of class scores.

In [None]:
class LinearClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(..., ...)
       
        
    def forward(self, x):
        y = self.fc(x)
       
        return y

Now, create a training loop where we learn the parameters for the linear classifier from the DINOv2 features. 

**You can adapt from the above training/val code, BUT make sure that you make the following changes:**
* create a LinearClassifier
* link the SGD with the LinearClassifier
* everywhere you previously passed images through ResNet, change to (1) use DINOv2 to reduce the image to a feature, and then (2) pass the feature through the linear classifier
* save the weights of the LinearClassifier

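We will also keep the linear classifier on the CPU, as it is very small and cheap to train. This means the DINOv2 features need to be moved to the CPU (e.g. with `.cpu()`) before they are passed to the classifier.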
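
The sketch below shows the pattern for a single training step, under some loud assumptions: a small random network stands in for DINOv2 (so nothing is downloaded), the images are random tensors, and a plain nn.Linear stands in for your LinearClassifier. The feature size is read from the feature tensor rather than hard-coded, which is one way to work out the input size your own classifier needs.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for the frozen DINOv2 feature extractor
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32))
feature_extractor.eval()

# Stand-in batch of 4 small "images" and their labels
inputs = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 20, (4,))

# (1) Reduce images to features without tracking gradients, then move to the CPU
with torch.no_grad():
    features = feature_extractor(inputs).cpu()

# (2) Only the classifier is trained, so only its parameters go to the optimizer
classifier = nn.Linear(features.shape[1], 20)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(classifier.parameters(), lr=0.001, momentum=0.9)

before = classifier.weight.detach().clone()

optimizer.zero_grad()
loss = criterion(classifier(features), labels)
loss.backward()
optimizer.step()

# The classifier weights can then be saved, e.g. torch.save(classifier.state_dict(), path)
print(features.shape, loss.item())
```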

**CUDA out of memory?**
* If you get a CUDA out of memory error, you will need to restart your kernel and run the notebook again. Make sure to save your progress.
* You can stop this from happening by reducing the batch size in your data loader -- i.e. how many images are moved onto the GPU at once -- or by using smaller models.
* If this does not work, you may need to train on the CPU.

In [None]:
batch_size = 4
balanced_trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                          sampler = sampler, num_workers = 1)
valloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size,
                                          shuffle=False, num_workers = 1)

**Food for thought:** 
* Could you build a KNN classifier using DINOv2 features as the input? Why might this be preferable to a deep learning model?
* Could you build a neural network using DINOv2 features as the input? Why might this be preferable to a linear classifier?
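
For the first question, here is a rough sketch of what a KNN classifier on pre-extracted features could look like, using scikit-learn's KNeighborsClassifier. The features here are synthetic clusters (an assumption for illustration, not real DINOv2 outputs); in practice you would first collect the DINOv2 features and labels for the whole training and validation sets into NumPy arrays.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for extracted features: two well-separated classes in 16 dimensions
train_features = np.concatenate([rng.normal(0, 1, (50, 16)), rng.normal(5, 1, (50, 16))])
train_labels = np.array([0] * 50 + [1] * 50)
val_features = np.concatenate([rng.normal(0, 1, (10, 16)), rng.normal(5, 1, (10, 16))])
val_labels = np.array([0] * 10 + [1] * 10)

# "Training" a KNN simply stores the features; there is no gradient descent
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_features, train_labels)

knn_accuracy = knn.score(val_features, val_labels)
print(knn_accuracy)
```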