<a href="https://colab.research.google.com/github/sadeelmu/deeplearning/blob/main/Transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<style>
r { color: Red }
o { color: Orange }
g { color: Green }
b { color: Blue }
l { color: lightblue }
</style>


<html>
<body>
<table style="border: 0; rules=none; font-size:28px">
<tr>
<th rowspan=5><img width="200px" height="70px" src="https://raw.githubusercontent.com/camma-public/multibypass140/master/static/camma_logo_tr.png"/></th>
<td colspan=2 style="font-size:16px; color:blue; font-weight:bold"><h1><b>Deep Learning for Computer Vision</b></h1></td>
<th rowspan=5><img width="200px" height="130px" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2019/10/283545_NeuralNetwork_R_blue.png"/></th>
</tr>
<tr><td>Instructor:</td><td>Dr. Chinedu Nwoye</td></tr>
<tr><td colspan=2>(c) Research Group CAMMA</td></tr>
<tr><td colspan=2>University of Strasbourg</td></tr>
<tr><td>Website:</td><td><g>http://camma.u-strasbg.fr</g></td></tr>
<tr><td colspan=4 style="text-align:center; background-color:black; font-weight:bold"><center><h3><o>Training by Transfer Learning</o></h3></center></td></tr>
</table>
</body>
</html>



------------

### Instructions

- In this lab session we will experiment with two transfer learning scenarios in deep learning.
  -  **Finetuning the convnet**: Instead of random initialization, we initialize the network with pretrained weights, such as those of a network trained on the ImageNet 1000-class dataset.
  -  **ConvNet as fixed feature extractor**: Here, we freeze the weights of the whole network except those of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained.
- Read all the descriptions and code.
- Run the pre-completed cells.
- You will be required to complete all the TODO tasks in this exercise. Ask the instructor if you get stuck.
- You also have an additional `Bonus` exercise if you finish very quickly.

### GPU activation

- Make sure CUDA is enabled on your machine.
- We can also use a Tensor Processing Unit (TPU) today.



In [None]:
# run this cell in order to have pytorch TPU acceleration
!pip install cloud-tpu-client==0.10 torch==1.12.0 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.12-cp37-cp37m-linux_x86_64.whl

### Imports

- Every experiment starts with importing the required libraries.
- Check which of these libraries you are not yet familiar with.

In [None]:
# License: BSD
# Author: Sasank Chilamkurthy

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
# from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import random
import copy
import urllib
from zipfile import ZipFile

%matplotlib inline
plt.ion()   # interactive mode


Section 1: Dataset
---------

- We are going to finetune a model on a hymenoptera dataset to classify **ants** and **bees**.
- This is a small subset of the ImageNet dataset.
- The dataset statistics are as follows:
  <html>
    <table>
      <tr>
        <td>Split</td> <td># ants</td> <td># bees</td> <td># Total</td>
      </tr>
      <tr>
        <td>Train</td> <td>100</td> <td>100</td> <td>200</td>
      </tr>
      <tr>
        <td>Val</td> <td>25</td> <td>30</td> <td>55</td>
      </tr>
      <tr>
        <td>Test</td> <td>50</td> <td>55</td> <td>105</td>
      </tr>    
      <tr>
        <td>Total</td> <td>175</td> <td>185</td> <td>360</td>
      </tr>
    </table>
  </html>

- This dataset is too small to generalize well if a model is trained on it from scratch.
- Since we are using transfer learning, we should be able to generalize reasonably well.


**[1.1] Instructions**
- Download and extract the dataset (https://seafile.unistra.fr/f/79ef71e90b5b4696b702/?dl=1) in your Google Drive.


In [None]:
# Download ants and bees dataset  (https://seafile.unistra.fr/f/79ef71e90b5b4696b702/?dl=1)

!wget "https://seafile.unistra.fr/f/79ef71e90b5b4696b702/?dl=1" --content-disposition && unzip -oq hymenoptera_data.zip -d hymenoptera_data



**[1.2] Mount Drive**
- If your drive is not visible, mount Google Drive, so we can see the data.




In [None]:
from google.colab import drive
drive.mount('/content/drive')

**[1.3] Data preparation**
- Use the torchvision and torch.utils.data packages to build your data loader.
- Here, we will treat the preprocessing of our training and validation data separately.
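
`torchvision.datasets.ImageFolder` infers labels from subfolder names: each class gets its own directory, and class indices follow the sorted folder names. The sketch below mimics that rule with the standard library only. The `ants`/`bees` layout under each split is an assumption about how the archive unpacks, and `list_classes` is a hypothetical helper written for illustration, not a torchvision function.

```python
import os
import tempfile

def list_classes(split_dir):
    """Return class names and a name -> index map, using sorted subfolder names."""
    classes = sorted(d for d in os.listdir(split_dir)
                     if os.path.isdir(os.path.join(split_dir, d)))
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx

# Build a throwaway directory tree that imitates the assumed dataset layout.
root = tempfile.mkdtemp()
for split in ["train", "val", "test"]:
    for cls in ["bees", "ants"]:
        os.makedirs(os.path.join(root, split, cls))

classes, class_to_idx = list_classes(os.path.join(root, "train"))
print(classes, class_to_idx)
```

This is why `class_names` can later be indexed directly with the integer labels returned by the data loader.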

In [None]:
# The model was pretrained on ImageNet, so we normalize with the ImageNet channel statistics
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

# Training data preprocessing: (Some data augmentation and normalization)
train_transform = torchvision.transforms.Compose([
        torchvision.transforms.RandomResizedCrop(224),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(imagenet_mean, imagenet_std)
    ])

# Validation data preprocessing: just normalization
eval_transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(imagenet_mean, imagenet_std)
    ])

# combine them
data_transforms = {
    'train': train_transform,
    'val': eval_transform,
    'test': eval_transform,
}

data_dir = 'hymenoptera_data'
image_datasets = {x: torchvision.datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=32, shuffle=True, num_workers=2) for x in ['train', 'val', 'test']}


dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'test']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("device is {}".format(device))

**TODO [1.4]: Data visualization**
- Visualize a few training images to understand the data augmentations.
- We need a function to denormalize the images for display.
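
Normalization subtracts the per-channel mean and divides by the per-channel standard deviation, so denormalization multiplies by the standard deviation and adds the mean back. A minimal round-trip sketch with NumPy, using a random fake batch in channels-last layout (the batch is made up for illustration):

```python
import numpy as np

imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

# A fake batch of 2 RGB images of size 4x4 (channels last), values in [0, 1).
rng = np.random.default_rng(0)
images = rng.random((2, 4, 4, 3))

# Normalization, applied per channel: (x - mean) / std
normalized = (images - imagenet_mean) / imagenet_std

# Denormalization inverts it: x = std * x_norm + mean
restored = imagenet_std * normalized + imagenet_mean
restored = np.clip(restored, 0, 1)  # guard against rounding outside [0, 1]

print(np.allclose(restored, images))
```

The arrays broadcast over the last axis, which is why the images must be in channels-last layout before this step.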



In [None]:
def display_images(images, titles=None):
    """Imshow for Tensor."""
    images = images.data.permute(0,2,3,1).cpu().numpy()
    images = imagenet_std.reshape([1,1,1,3]) * images + imagenet_mean.reshape([1,1,1,3])
    images = np.clip(images, 0, 1)
    num_imgs = len(images)
    if titles is None: titles = [""]*len(images)
    plt.figure(figsize=(12,80))
    num_rows = (num_imgs + 1) // 2    # round up so an odd-sized batch still fits
    for i in range(num_imgs):
        plt.subplot(num_rows, 2, i+1)    # grid with 2 columns and num_rows rows
        plt.imshow(images[i])
        plt.title(titles[i])
        plt.axis('off')
    plt.show()




# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Display images and labels
display_images(inputs, titles=[class_names[x] for x in classes])

Section 2: Experiment Run Functions
----------------------

**TODO [2.1] Training function**
- Let's write a function to train our model
- It should include loss optimization and performance evaluation.
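
A note on the bookkeeping: `nn.CrossEntropyLoss` returns the mean loss over a batch by default, so the batch loss is multiplied by the batch size before it is accumulated. Dividing the running sum by the dataset size then gives a correct per-sample average even when the last batch is smaller. The loss values below are made up for illustration.

```python
# Hypothetical per-batch mean losses and batch sizes (200 samples, batch size 32).
batch_mean_losses = [0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40]
batch_sizes = [32, 32, 32, 32, 32, 32, 8]

# Weighted accumulation: each batch contributes mean_loss * batch_size.
running_loss = sum(l * n for l, n in zip(batch_mean_losses, batch_sizes))
epoch_loss = running_loss / sum(batch_sizes)

# A naive average of batch means over-weights the small last batch.
naive_loss = sum(batch_mean_losses) / len(batch_mean_losses)

print("weighted: {:.4f} | naive: {:.4f}".format(epoch_loss, naive_loss))
```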

In [None]:
#TODO find and add one line of code that is missing in the train_step function

def train_step(inputs, labels, model, criterion, optimizer):
    model.train() # set to training mode
    inputs = inputs.to(device)
    labels = labels.to(device)
    optimizer.zero_grad() # zero the parameter gradients
    _, preds = torch.max(outputs, 1) # get predicted labels
    loss = criterion(outputs, labels) # compute loss
    loss.backward() # backprops
    optimizer.step() # optimization
    # performance
    batch_loss = loss.item() * inputs.size(0)
    batch_accs = torch.sum(preds == labels.data)
    return batch_loss, batch_accs

**TODO [2.2] Evaluation function**
- Let's write an evaluation function to evaluate our model.
- It should do just inference and performance evaluation.


In [None]:
# TODO: find and add one line of code that is missing in the function

def eval_step(inputs, labels, model):
    with torch.no_grad(): # stop gradient computation
      inputs = inputs.to(device)
      labels = labels.to(device)
      outputs = model(inputs) # inference
      preds = torch.argmax(outputs, axis=1)# get predicted labels
      batch_accs = torch.sum(preds == labels.data) # performance
    return batch_accs


**[2.3] Train and validate**
- We run a loop of training and evaluation on each epoch.
- We include learning rate scheduling, and saving the best model after validation.
- The parameter ``scheduler`` is an LR scheduler object from ``torch.optim.lr_scheduler``.
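
Saving the best model relies on `copy.deepcopy`: the tensors in `model.state_dict()` share memory with the live parameters, so a plain reference would silently follow later updates. A small sketch on a toy `nn.Linear` layer (a stand-in for the real network, chosen only for illustration):

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(2, 1)  # toy stand-in for the real network

shallow = model.state_dict()               # tensors share memory with the model
deep = copy.deepcopy(model.state_dict())   # independent snapshot

# Simulate further training by changing the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

shallow_tracks_model = torch.equal(shallow["weight"], model.weight)
deep_differs = not torch.equal(deep["weight"], model.weight)
print(shallow_tracks_model, deep_differs)
```

Only the deep copy still holds the weights as they were at snapshot time.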

In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    start = time.time()

    # Let's keep track of best performing weights
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        # train loop
        phase = 'train'
        running_loss = 0.0
        running_accs = 0
        for inputs, labels in dataloaders[phase]: # Iterate over data.
            loss, acc = train_step(inputs, labels, model, criterion, optimizer)
            running_loss += loss
            running_accs += acc
        scheduler.step() # decay learning rate
        train_epoch_loss = running_loss / dataset_sizes[phase]
        train_epoch_acc = running_accs.double() / dataset_sizes[phase]

        # validation loop
        phase = 'val'
        running_accs = 0
        for inputs, labels in dataloaders[phase]: # Iterate over data.
            acc = eval_step(inputs, labels, model)
            running_accs += acc
        val_epoch_acc = running_accs.double() / dataset_sizes[phase]

        print('Epoch {}/{} >> TRAIN Loss: {:.4f} Acc: {:.4f}  | VAL Acc: {:.4f}'.format(
                epoch, num_epochs-1, train_epoch_loss, train_epoch_acc, val_epoch_acc))
        print('-' * 10)

        # keep a copy of the weights only if the validation accuracy improves
        if val_epoch_acc > best_acc:
            best_acc = val_epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - start
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('With best val accuracy: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

**[2.4] Model prediction visualization**

- A generic function to display predictions for a few images.
- It reuses `display_images`, which denormalizes the images for display.

In [None]:
def visualize_preds(model, dataloader, choice):
    iterloader = iter(dataloader)
    if choice >= len(iterloader):
        choice = 0
    for _ in range(choice):    # skip the first `choice` batches
        next(iterloader)
    model.eval()
    with torch.no_grad():
        inputs, labels = next(iterloader)
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs) # inference
        _, preds = torch.max(outputs, 1)
    titles = ['Label: {} | Predicted: {}'.format(class_names[gt], class_names[pd]) for gt, pd in zip(labels, preds)]
    display_images(inputs, titles) # display
    return None


Section 3: Finetuning the convnet
----------------------

**TODO [3.1] Model creation**
- Load a pretrained model (resnet-18)
- Create a new final fully connected layer to suit the dataset classes.


In [None]:
# ResNet-18 model pretrained on ImageNet.
model_ft = torchvision.models.resnet18(pretrained=True)

# ImageNet has 1000 classes, so the last layer of the model has 1000 outputs, which does not fit our 2-class data
num_input_filters = model_ft.fc.in_features
num_out_filters = model_ft.fc.out_features
print("Pretrained model dim = ({}, {}) ".format(num_input_filters, num_out_filters) )

# TODO: Create a linear layer with appropriate dim for your data
custom_fc = ...

# TODO: Replace the final layer of the pretrained model with your new final layer
model_ft...

# Set to device
model_ft = model_ft.to(device)


# provide "summary" with the model and an input data size of (3, 224, 224)
from torchsummary import summary
summary(model_ft, (3, 224, 224))


**TODO [3.2] Training properties**
- Define the loss function.
- Define the optimizer.
- Define the learning rate decay schedule
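
With `step_size=7` and `gamma=0.1`, `StepLR` multiplies the learning rate by 0.1 after every 7 calls to `scheduler.step()`. The closed form below reproduces that schedule in plain Python for the 25 epochs and base rate of 0.001 used in this lab. `step_lr` is a hypothetical helper written for illustration, not the PyTorch class.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under a step decay."""
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.001, epoch) for epoch in range(25)]
for epoch in (0, 6, 7, 13, 14, 20, 21, 24):
    print("epoch {:2d}: lr = {:.0e}".format(epoch, schedule[epoch]))
```

By the last epochs the rate has dropped by three orders of magnitude, so late updates change the weights very little.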

In [None]:
# Loss function
criterion = nn.CrossEntropyLoss()

# TODO: Create the optimizer (use learning rate of 0.001) to optimize all the parameters of the model. What does this mean?
optimizer_ft = ...

# Scheduler: Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

**TODO [3.3] Train and validate**

- It should take around 15-25 min on CPU. On GPU though, it takes less than a minute.


In [None]:
# TODO: train for 25 epochs

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=...)

**TODO [3.4] Test your model on the train and test sets**

In [None]:
# Testing on the train data
# Find and add 2 missing lines in this code
phase = 'train'
for inputs, labels in dataloaders[phase]: # Iterate over data.
    running_accs += acc
train_epoch_acc = running_accs.double() / dataset_sizes[phase]
print('TRAIN DATA Acc: {:.4f}'.format( train_epoch_acc))
print('-' * 10)




# Testing on the test data
# Find and add 1 missing line in this code
phase = 'test'
running_accs = 0
for inputs, labels in dataloaders: # Iterate over data.
    acc = eval_step(inputs, labels, model_ft)
    running_accs += acc
test_epoch_acc = running_accs.double() / dataset_sizes[phase]
print('TEST DATA Acc: {:.4f}'.format( test_epoch_acc))
print('-' * 10)

**TODO [3.5] Visualize some predictions of your model**

In [None]:
# TODO: Visualize the images and prediction, manually find the images that your model failed on.

iterloader = iter(dataloaders['test'])
N = len(dataloaders['test'])
choice = random.choice(list(range(N)))
print("Choosing batch {} out of {} batches".format(choice, N))

visualize_preds(model_ft, dataloaders['test'], choice)

Section 4: ConvNet as fixed feature extractor
----------------------------------

- Here, we freeze the whole network except the final layer, so the pretrained layers are not re-trained.

- You can read more about this in the documentation [here](https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward).

**TODO [4.1] Creating the model**
- We freeze the parameters by setting ``requires_grad = False``, so that their gradients are not computed in ``backward()``.
- We create a new final layer that is not frozen, so only this layer will be trained.
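
To see what freezing does to the parameter count, here is a minimal sketch on a toy network. The two-layer `nn.Sequential` model and its sizes are assumptions chosen for illustration and stand in for the pretrained network.

```python
import torch.nn as nn

# Toy stand-in for a pretrained network: a "backbone" layer and a 1000-way head.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1000))

# Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# Swap the head for a fresh 2-class layer; new layers are trainable by default.
model[2] = nn.Linear(4, 2)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print("trainable:", trainable, "| frozen:", frozen)
```

Only the weights and biases of the new head count as trainable; everything that was frozen before the swap stays frozen.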

In [None]:
# Again, we create a ResNet-18 model pretrained on ImageNet.
model_conv = torchvision.models.resnet18(pretrained=True)

# TODO: Freeze all the model parameters
for param in model_conv.parameters():
    ...

# ImageNet has 1000 classes, so the last layer of the model has 1000 outputs, which does not fit our 2-class data
num_input_filters = model_conv.fc.in_features
num_out_filters = model_conv.fc.out_features
print("Pretrained model dim = ({}, {}) ".format(num_input_filters, num_out_filters) )

# TODO: Create a linear layer with appropriate dim for your data
custom_fc = ...

# Requires_grad=True by default, check it out
print("Is the new layer trainable ? :",  custom_fc.weight.requires_grad)

# TODO: Replace the final layer of the pretrained model with your new final layer
model_conv.fc = ...


# Set to device
model_conv = model_conv.to(device)


# provide "summary" with the model and an input data size of (3, 224, 224)
from torchsummary import summary
summary(model_conv, (3, 224, 224))



**TODO [4.2] Training properties**
- Define the loss function.
- Define the optimizer.
- Define the learning rate decay schedule

In [None]:
# loss function
criterion = nn.CrossEntropyLoss()

# TODO: Create optimizer to optimize only the parameters of final layer that you created as opposed to before.
optimizer_conv = optim.SGD(..., lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

**[4.3] Train and validate**

- On CPU this will take about half the time compared to the previous scenario.
- This is expected as gradients don't need to be computed for most of the network.
- However, forward does need to be computed.
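
The point about gradients can be checked directly: after `backward()`, frozen parameters have no gradient at all, while the trainable layer does. A minimal sketch with toy layers (the `backbone`/`head` names and sizes are assumptions for illustration, not the ResNet-18 used in this lab):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 4)   # toy stand-in for the frozen layers
head = nn.Linear(4, 2)       # toy stand-in for the new final layer
for param in backbone.parameters():
    param.requires_grad = False

x = torch.randn(5, 8)
labels = torch.tensor([0, 1, 0, 1, 1])

# The forward pass still runs through both parts.
outputs = head(backbone(x))
loss = nn.CrossEntropyLoss()(outputs, labels)
loss.backward()

# Only the head receives gradients; the frozen backbone has none.
print("backbone grad:", backbone.weight.grad)
print("head grad shape:", tuple(head.weight.grad.shape))
```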

In [None]:
# TODO: provide all the input arguments and train the model

model_conv = train_model(...)

**[4.4] Test on the train and test sets**

In [None]:
# Testing on the train data
phase = 'train'
running_accs = 0
for inputs, labels in dataloaders[phase]: # Iterate over data.
    acc = eval_step(inputs, labels, model_conv)
    running_accs += acc
train_epoch_acc = running_accs.double() / dataset_sizes[phase]
print('TRAIN DATA Acc: {:.4f}'.format( train_epoch_acc))
print('-' * 10)


# Testing on the test data
phase = 'test'
running_accs = 0
for inputs, labels in dataloaders[phase]: # Iterate over data.
    acc = eval_step(inputs, labels, model_conv)
    running_accs += acc
test_epoch_acc = running_accs.double() / dataset_sizes[phase]
print('TEST DATA Acc: {:.4f}'.format( test_epoch_acc))
print('-' * 10)

**TODO [4.5] Visualize your predictions**

In [None]:
# TODO: Visualize the images and prediction, manually find the images that your model failed on.

iterloader = iter(dataloaders['test'])
N = len(dataloaders['test'])
choice = random.choice(list(range(N)))
print("Choosing batch {} out of {} batches".format(choice, N))

visualize_preds(model_conv, dataloaders['test'], choice)