In [None]:
%matplotlib inline

Run the previous cell first so that the figures in this notebook render correctly.


# Transfer Learning for Computer Vision Tutorial
**Author**: Fernando Fernández-Martínez (based on a previous work by [Sasank Chilamkurthy](https://chsasank.github.io))

In this tutorial, we will learn how to train a convolutional neural network (ConvNet or CNN) for image classification using transfer learning (TL). You can read more about transfer
learning in the [cs231n notes](https://cs231n.github.io/transfer-learning/).

Quoting these notes,

    In practice, very few people train an entire Convolutional Network
    from scratch (with random initialization), because it is relatively
    rare to have a dataset of sufficient size. Instead, it is common to
    pretrain a ConvNet on a very large dataset (e.g. ImageNet, which
    contains 1.2 million images with 1000 categories), and then use the
    ConvNet either as an initialization or a fixed feature extractor for
    the task of interest.

These two major transfer learning scenarios look as follows:

-  **Finetuning the convnet**: Instead of random initialization, we
   initialize the network with a pretrained network, such as one
   trained on the [ImageNet](https://www.image-net.org/update-mar-11-2021.php) 1000-class dataset. The rest of the training proceeds as
   usual.
-  **ConvNet as fixed feature extractor**: Here, we freeze the weights
   of the whole network except those of the final fully connected
   layer. This last fully connected layer is replaced with a new one
   with random weights, and only this layer is trained.


# Step 0) Computer vision libraries in PyTorch

In [None]:
# License: BSD
# Author: Fernando Fernandez-Martinez (based on previous work by Sasank Chilamkurthy)

from __future__ import print_function, division

# Import PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np

# torchvision library: it contains datasets, model architectures and image transformations often used for computer vision problems.
import torchvision

# torchvision.datasets: here you'll find many example computer vision datasets for a range of problems from image classification,
# object detection, image captioning, video classification and more. It also contains a series of base classes for making custom datasets.

# torchvision.models: this module contains well-performing and commonly used computer vision model architectures implemented in PyTorch,
# you can use these with your own problems.

# torchvision.transforms: often images need to be transformed (turned into numbers/processed/augmented) before being used with a model,
# common image transformations are found here.
from torchvision import datasets, models, transforms

# Import matplotlib for visualization
import matplotlib.pyplot as plt
import time
import os
import copy

# Colab file helper, used later to download the saved model checkpoints
from google.colab import files

cudnn.benchmark = True
plt.ion()   # interactive mode

# Step 1) Connect to your Google Drive and unzip required data
This notebook first requests access to your Google Drive files. Grant it so that the drive can be mounted and its contents read. Be aware that granting access also lets the code running in the notebook modify the files in your Google Drive (this is mandatory to be able to download the models that will result from the training process).

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

The first run of this notebook requires you to unzip the dataset that we use as a baseline to become familiar with the training process.

Subsequent runs can skip this step: comment out the `unzip` line by adding a preceding `#`.

**NOTE:** You can download the data directly from
[here](https://download.pytorch.org/tutorial/hymenoptera_data.zip).
Then move the .zip file to a folder in your Google Drive named:

> **INSE_SmartRecycling/data/**

The next cell extracts its contents into a `data/` folder under the current directory.
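
As an alternative to commenting the `unzip` line out on later runs, extraction can be guarded so that it only happens when the target folder is missing. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `extract_once` and its arguments are hypothetical and not part of the original notebook.

```python
import os
import zipfile


def extract_once(zip_path, dest_dir, marker):
    """Extract zip_path into dest_dir unless dest_dir/marker already exists.

    Returns True if extraction ran, False if it was skipped.
    `marker` is the folder expected to exist after extraction.
    """
    if os.path.isdir(os.path.join(dest_dir, marker)):
        return False  # already extracted on a previous run
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return True


# Example call with the paths used in this notebook (only valid once Drive is mounted):
# extract_once('/content/gdrive/MyDrive/INSE_SmartRecycling/data/hymenoptera_data.zip',
#              'data/', 'hymenoptera_data')
```

Calling the helper a second time returns `False` without touching the files, so the cell can stay uncommented.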


In [None]:
!unzip /content/gdrive/MyDrive/INSE_SmartRecycling/data/hymenoptera_data.zip -d data/

# Step 2) Getting data ready

We will use the `torchvision` and `torch.utils.data` packages to load the
data.

First, we will train a model to classify
**ants** and **bees**. We have about 120 training images for each class
and 75 validation images for each class. A dataset this small is usually
too little to generalize from when training from scratch. Since we
are using transfer learning, we should be able to generalize reasonably
well.

This dataset is a very small subset of ImageNet.
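
`datasets.ImageFolder`, used later in this notebook, expects one sub-folder per class inside each split folder (for example `train/ants/` and `train/bees/`). A quick way to check the layout and the number of images per class before training is sketched below with the standard library only; the helper name `count_images` and the list of extensions are assumptions made for illustration.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of image types


def count_images(split_dir):
    """Return {class_folder_name: number_of_image_files} for one split."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Example (after the dataset has been unzipped):
# print(count_images('data/hymenoptera_data/train'))
```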




In [None]:
# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(256),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### 2.1)  Getting the dataset
DO NOT FORGET TO CHANGE `notebook_run` ACCORDINGLY!!!

Set `notebook_run = 0` for the baseline run on the ants and bees data, or to any other value (e.g. `1`) to use your own dataset.

In [None]:
notebook_run = 1
if notebook_run == 0:
  # Baseline example to familiarize with all the stuff
  data_dir = 'data/hymenoptera_data'
else:
  # After completing a first run we will check the effect of using our own data
  data_dir = '/content/gdrive/MyDrive/INSE_SmartRecycling/EXERCISE/'

In [None]:
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}

### 2.2) Prepare DataLoader

Now we've got a dataset ready to go.
The next step is to prepare it with a torch.utils.data.DataLoader or DataLoader for short. The DataLoader does what you think it might do: **it helps load data into a model**, for training and for inference.

It turns a large Dataset into a Python iterable of smaller chunks.
These smaller chunks are called **batches** or **mini-batches** and can be set by the **batch_size** parameter.

Why do this? Because it's more computationally efficient. 
In an ideal world you could do the forward pass and backward pass across all of your data at once.
But once you start using really large datasets, unless you've got infinite computing power, it's easier to break them up into batches. 
It also gives your model more opportunities to improve.
With mini-batches (small portions of the data), gradient descent is performed more often per epoch (once per mini-batch rather than once per epoch).

What's a good batch size?

32 is a good place to start for a fair number of problems.
But since this is a value you can set (a hyperparameter), you can try all different kinds of values, though generally powers of 2 are used most often (e.g. 32, 64, 128, 256, 512). The cell below uses a much smaller batch size of 4.
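
To make the trade-off concrete, the number of parameter updates per epoch equals the number of mini-batches, i.e. the dataset size divided by the batch size, rounded up. The sketch below assumes a training set of 240 images (roughly the "about 120 per class" quoted earlier); that figure is an illustrative assumption, not the exact dataset size, and `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of mini-batches (and optimizer steps) in one pass over the data."""
    return math.ceil(num_samples / batch_size)


num_train = 240  # assumed size, for illustration only
for batch_size in (4, 32, 240):
    print(batch_size, batches_per_epoch(num_train, batch_size))
```

A batch size equal to the dataset size gives a single update per epoch, which is the "all of your data at once" case described above.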

In [None]:
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], 
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=4) for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

### 2.3) Define the problem classes
DO NOT FORGET TO CHANGE `class_names` ACCORDINGLY!!!
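
`ImageFolder` assigns label indices to the class folders in sorted order, so a hand-written `class_names` list has to follow that same order or the displayed labels will not match the predictions. The check below is a sketch of that rule in plain Python; `folder_names` stands in for the sub-folders of your own `train/` directory and `check_class_names` is a hypothetical helper.

```python
def check_class_names(folder_names, class_names):
    """Return True if class_names matches the sorted folder order used for labels."""
    return sorted(folder_names) == list(class_names)


folder_names = ['rhino', 'elephant', 'stegosaurus', 'brontosaurus']  # e.g. from os.listdir
print(check_class_names(folder_names,
                        ['brontosaurus', 'elephant', 'rhino', 'stegosaurus']))
print(check_class_names(folder_names,
                        ['elephant', 'rhino', 'brontosaurus', 'stegosaurus']))
```

The first call reports a matching order and the second a mismatch.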


In [None]:
if notebook_run == 0:
  class_names = image_datasets['train'].classes
else:
  class_names = ['brontosaurus', 'elephant', 'rhino', 'stegosaurus']
  #class_names = ['bottle', 'mouse', 'pencilcase', 'raspberry']

### 2.4) Visualize a few images
Let's visualize a few training images so as to understand the data
augmentations.
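
`transforms.Normalize` subtracts a per-channel mean and divides by a per-channel standard deviation, so displaying an image requires the inverse operation, which is what the `imshow` helper in the next cell applies. A small NumPy sketch of the round trip, using a made-up 2x2 RGB image:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Made-up image in height x width x channel layout with values in [0, 1]
rng = np.random.default_rng(0)
image = rng.random((2, 2, 3))

normalized = (image - mean) / std   # what Normalize does per channel
restored = std * normalized + mean  # what imshow does before plotting

print(np.allclose(restored, image))
```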



In [None]:
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])

# Step 3) Functionizing training and test loops

## 3.1) Training the model

Now, let's write a general function to train a model. Here, we will
illustrate:

-  Scheduling the learning rate
-  Saving the best model

In the following, parameter ``scheduler`` is an LR scheduler object from
``torch.optim.lr_scheduler``.
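
For intuition, a step schedule multiplies the learning rate by `gamma` once every `step_size` epochs. The sketch below reproduces that rule in plain Python with the values used later in this notebook (`lr=0.001`, `step_size=7`, `gamma=0.1`); the function name `step_lr` is hypothetical, and this is the closed-form rule rather than the scheduler object itself.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 6, 7, 14, 21):
    print(epoch, step_lr(0.001, epoch, step_size=7, gamma=0.1))
```

With 25 epochs, the rate therefore drops three times, after epochs 6, 13 and 20.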



In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    
    train_losses, test_losses = [], []
    
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            if phase == 'train': 
              train_losses.append(epoch_loss)
            else:
              test_losses.append(epoch_loss)

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, train_losses, test_losses

## 3.2) Visualizing the model predictions

Generic function to display predictions for a few images




In [None]:
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])
                print(f'predicted: {class_names[preds[j]]} ({preds[j]})')

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

# Step 4) Finetuning the convnet

Load a pretrained model and reset final fully connected layer.
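
The replacement step can be tried on a toy network without downloading any weights. In this sketch `TinyNet` is a made-up stand-in for a pretrained model whose last layer is exposed as `fc` (as it is in ResNet); only the head-swapping lines mirror what this notebook does.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Made-up stand-in for a pretrained classifier with a 1000-class head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyNet()
num_classes = 2  # e.g. ants and bees

# Keep the input width of the old head, change only the number of outputs
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)

batch = torch.randn(4, 3, 8, 8)  # 4 random "images"
print(model(batch).shape)
```

The output has one row per image and one column per class, which is the shape `nn.CrossEntropyLoss` expects.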




In [None]:
if notebook_run == 0:
  model_ft = models.resnet18(pretrained=True)
else:
  model_ft = models.mobilenet_v2(pretrained=True)

print(model_ft)

In [None]:
if notebook_run == 0:
  # First run: baseline Resnet model
  # We are about to modify the last layer from:
  # (fc): Linear(in_features=512, out_features=1000, bias=True)
  # to:
  # (fc): Linear(in_features=512, out_features=2, bias=True)
  # since our classification problem exactly involves 2 different classes (ants & bees)
  num_ftrs = model_ft.fc.in_features
  model_ft.fc = nn.Linear(num_ftrs, len(class_names))
else:
  # Second run of the notebook: we adapt the classification layer to match MobileNet
  # We are about to modify the last layer from:
  # (classifier): Sequential(
  #  (0): Dropout(p=0.2, inplace=False)
  # -> (1): Linear(in_features=1280, out_features=1000, bias=True) <-
  # to:
  # (1): Linear(in_features=1280, out_features=4, bias=True)
  # since our new custom classification problem exactly involves 4 different classes (the ones you've chosen)
  num_ftrs = model_ft.classifier[1].in_features
  model_ft.classifier[1] = nn.Linear(num_ftrs, len(class_names)) 

model_ft = model_ft.to(device)

# Setup loss function and optimizer
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

## 4.1) Train and evaluate

It should take around 15-25 min on CPU. On GPU though, it takes less than a
minute.




In [None]:
model_ft, train_losses, val_losses = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)

## 4.2) Make and evaluate random predictions with best model

In [None]:
visualize_model(model_ft)

## 4.3) Visualizing training and validation losses

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.legend(frameon=False)


## 4.4) Saving the model

In [None]:
if notebook_run == 0:
  model_filename = 'model_ft.pth'
else:
  model_filename = 'custom_model_ft.pth'

torch.save(model_ft.state_dict(), model_filename)

# download checkpoint file
files.download(model_filename)

# Step 5) ConvNet as fixed feature extractor

Here, we need to freeze all the network except the final layer. We need
to set ``requires_grad = False`` to freeze the parameters so that the
gradients are not computed in ``backward()``.

You can read more about this in the documentation
[here](https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward).
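
The effect of freezing can be checked by counting the parameters that still require gradients. The following sketch uses a small made-up `nn.Sequential` model in place of the pretrained network; `count_trainable` is a hypothetical helper.

```python
import torch.nn as nn


def count_trainable(model):
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Made-up stand-in for a pretrained network: a body followed by a head
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1000))
before = count_trainable(model)

# Freeze everything ...
for param in model.parameters():
    param.requires_grad = False

# ... then attach a new head; new modules have requires_grad=True by default
model[2] = nn.Linear(4, 3)
after = count_trainable(model)

print(before, after)
```

Only the weights and biases of the new head remain trainable, which is why the optimizer in this scenario is given just the final layer's parameters.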




In [None]:
if notebook_run == 0:
  model_conv = models.resnet18(pretrained=True)
else:
  model_conv = torchvision.models.mobilenet_v2(pretrained=True)

print(model_conv)

for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default

if notebook_run == 0:
  # First run: baseline Resnet model
  num_ftrs = model_conv.fc.in_features
  model_conv.fc = nn.Linear(num_ftrs, len(class_names))
else:
  # Second run of the notebook: we adapt the classification layer to match MobileNet
  num_ftrs = model_conv.classifier[1].in_features
  model_conv.classifier[1] = nn.Linear(num_ftrs, len(class_names)) 

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as opposed to before.
if notebook_run == 0:
  optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
else:
  optimizer_conv = optim.SGD(model_conv.classifier[1].parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

## 5.1) Train and evaluate

On CPU, this will take about half the time of the previous scenario.
This is expected, as gradients don't need to be computed for most of the
network. However, the forward pass still has to be computed.




In [None]:
model_conv, train_losses, val_losses = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=25)

## 5.2) Make and evaluate random predictions with best model

In [None]:
visualize_model(model_conv)

plt.ioff()
plt.show()

## 5.3) Visualizing training and validation losses

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.legend(frameon=False)

## 5.4) Saving the model

In [None]:
if notebook_run == 0:
  model_filename = 'model_conv.pth'
else:
  model_filename = 'custom_model_conv.pth'

torch.save(model_conv.state_dict(), model_filename)

# download checkpoint file
files.download(model_filename)

# Step 6) Training a custom model

Go back to Step 2), arrange your own custom dataset and re-run the notebook with `notebook_run = 1` to train your Raspberry Pi models. 

# Step 7) Further Learning

If you would like to learn more about the applications of transfer learning,
check out the [Quantized Transfer Learning for Computer Vision Tutorial](https://pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html).



