# Model training via transfer learning

In this notebook we will briefly see how the vision models have been trained with the BirdPics dataset. 

⚠️ It is **necessary** that you have already **downloaded** the dataset to your personal Drive. You can do it via the notebook '*colab_notebooks/download_gdrive.ipynb*'.

In this project the Pytorch package has been used for the Machine Learning protocol. We are using *tranfer learning*, which means that we are loading large models that have already been trained for computer vision and retraining them with the BirdPics dataset.

## Prerequisites

We first load the GitHub repository to be able to use the built in functions found in the *utils* folder.

In [3]:
!git clone https://github.com/jgbeni/BirdPics.git
!mv BirdPics/utils .
!rm -rf BirdPics

fatal: destination path 'BirdPics' already exists and is not an empty directory.


We import the required packages and mount the Drive. Ideally, the code should be run with a GPU for a significant speedup.

In [None]:
import utils.data_preprocessing as dp #This is the package found in the GitHub repository
import numpy as np
import h5py
import torch
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import os
from tqdm import tqdm
from google.colab import drive

drive.mount('/content/drive',force_remount=True)
device = "cuda" if torch.cuda.is_available() else "cpu"

## Loading and preparing the dataset

We create a *models* folder to save the results

In [5]:
dir = '/content/drive/MyDrive/BirdPics'
os.makedirs(dir+'/models',exist_ok=True)

We load training and validation images (*X*) and labels (*Y*)

In [6]:
# Run this cell only once to avoid running our of RAM memory
f = h5py.File(dir+'/data/bird_data.hdf5', "r")

In [7]:
X_train,Y_train = f['train']['X'],f['train']['Y']
X_val,Y_val = f['val']['X'],f['val']['Y']

It is also important to encode the labels from strings to integers. In this notebook we have chosen the following encoding:


*   swallow = 0
*   swift = 1
*   martin = 2





In [8]:
Y_train = dp.prepare_labels(Y_train)
Y_val = dp.prepare_labels(Y_val)

print('Sample of encoded training labels:',Y_train[0:8])

We load the dataset into a custom Pytorch Dataset class. It has been written to apply **data augmentation** (check notebook '*N2_image_preprocessing.ipynb*') for training data (*train = True*) and to leave the data intact otherwise.

In [9]:
train_dataset = dp.HDF5Dataset(X_train,Y_train,train=True)
val_dataset = dp.HDF5Dataset(X_val,Y_val)

And we create Pytorch dataloaders that will load the dataset into **minibatches** of size 64. This batch size can be changed depending on the machine capabilities: a larger batch size will fasten the training process at the expense of a higher memory cost.

In [17]:
batch_size = 64

train_loader = DataLoader(train_dataset, num_workers=2, batch_size=batch_size, pin_memory=True,
                                                shuffle=True)
val_loader = DataLoader(val_dataset, num_workers=2, batch_size=batch_size, pin_memory=True,
                                                shuffle=True)

## Choosing the model

In this example, we will use a pre-trained VGG16 model (https://www.kaggle.com/code/blurredmachine/vggnet-16-architecture-a-complete-guide) and fine-tune it for our bird species classification task.

In [None]:
# Load pre-trained VGG16 model
model = models.vgg16(weights='DEFAULT')
# Other models can be loaded similarly, these are other examples that can be found in the repository:
# model = models.vgg19(weights='DEFAULT') # VGG19 model
# model = models.resnet50(weights='DEFAULT') # ResNet50 model

model_path = 'models/vgg16_retrained.pth' # Path to save the retrained model
loss_acc_path = 'models/vgg16_retrained.npz' # Path to save loss and accuracy data
# Print the model architecture
print(model)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

It is important to rewrite the last linear layer to match the number of output features needed for our task (i.e., three output features):

In [None]:
model.classifier[-1] = nn.Linear(in_features=4096, out_features=3) # Modify the last layer for 3 output features
model = model.to(device) # Move the model to the appropriate device (ideally a GPU)

Note that we are not *freezing* any of the model's layers, as is usual in most transfer-learning/fine-tuning protocols to avoir overfitting. In our case, as we are implementing **data augmentation** on the training dataset, we can safely optimize every model layer without finding any overfitting. 

## Training and saving the results

Now we need to define the model hyperparameters:

In [None]:
# Define the loss function
criterion = nn.CrossEntropyLoss().to(device)

# Define the optimizer
optimizer = optim.AdamW(model.parameters(), lr=0.00005 ,weight_decay=1e-3)

# Exponential learning rate decay
decayRate = 0.96
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=decayRate)

Some insights and coments on the choices:
- The learning rate (*lr*) has been set to a quite small value, $l_{r} = 5 \cdot 10^{-5}$. This ensures the model parameters do not diverge too much from the default ones, which were already optimized for image classification. This way the model is just being fine-tuned for our concrete task.
- The AdamW optimizer is chosen so the weight decay is decoupled from the learning rate, which improves generalization on image classification tasks (https://arxiv.org/abs/1711.05101).
- An exponential learning rate decay is implemented to improve the optimization as the model gets closer to convergence.

Now we end up with the training loop code, where we are saving the model every time the validation accuracy increases among epochs.

In [None]:
num_epochs = 20 # number of epochs for training

max_val = 0.0 # variable to store the maximum validation accuracy achieved during training (starting at 0) 
PATH = os.path.join(dir,model_path) # full path to save the model

# Arrays to store loss and accuracy values
train_loss,val_loss = np.zeros(num_epochs,dtype=np.float32),np.zeros(num_epochs,dtype=np.float32)
train_acc,val_acc = np.zeros(num_epochs,dtype=np.float32),np.zeros(num_epochs,dtype=np.float32)

# Training loop
for epoch in range(num_epochs):
    print('Epoch %d/%d' %(epoch+1,num_epochs))
    train_correct,train_samples = 0,0 # variables to count correct predictions and total samples in training
    val_correct,val_samples = 0,0 # variables to count correct predictions and total samples in validation
    for i, (images, labels) in enumerate(tqdm(train_loader)): # loop over training batches
        images = images.to(device) # move images to GPU
        labels = labels.type(torch.LongTensor) # ensure labels are of type LongTensor
        labels = labels.to(device) # move labels to GPU

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        train_loss[epoch] += loss.item()/len(train_loader) # accumulate training loss

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        _, predicted = torch.max(outputs, 1) # get the index of the max log-probability
        train_samples += labels.size(0) # update total number of training samples
        train_correct += (predicted == labels).sum().item() # update number of correct predictions

    train_acc[epoch] = 100.0 * train_correct / train_samples # calculate training accuracy
    print('train loss %.4f - train acc. %.2f' %(train_loss[epoch],train_acc[epoch]))


    for val_images, val_labels in val_loader: # loop over validation batches
        val_images = val_images.to(device) # move images to GPU
        val_labels = val_labels.type(torch.LongTensor) # ensure labels are of type LongTensor
        val_labels = val_labels.to(device) # move labels to GPU

        outputs = model(val_images)
        _, predicted = torch.max(outputs, 1) # get the index of the max log-probability
        val_samples += val_labels.size(0) # update total number of validation samples
        val_correct += (predicted == val_labels).sum().item() # update number of correct predictions

        val_loss[epoch] += criterion(outputs, val_labels).item()/len(val_loader) # accumulate validation loss
    val_acc[epoch] = 100.0 * val_correct / val_samples # calculate validation accuracy
    if val_acc[epoch] > max_val: # save model if current validation accuracy is greater than the previous maximum
        max_val = val_acc[epoch]
        torch.save(model.state_dict(), PATH)
    print('val loss %.4f - val acc. %.2f' %(val_loss[epoch],val_acc[epoch]))

    lr_scheduler.step() # update learning rate

print('Finished Training')

100%|██████████| 312/312 [04:26<00:00,  1.17it/s]

train loss 1.2603 - train acc. 46.30





val loss 0.9744 - val acc. 61.62


100%|██████████| 312/312 [04:43<00:00,  1.10it/s]

train loss 0.9377 - train acc. 52.58





val loss 0.8637 - val acc. 67.01


100%|██████████| 312/312 [04:43<00:00,  1.10it/s]

train loss 0.8732 - train acc. 56.12





val loss 0.7700 - val acc. 74.46


100%|██████████| 312/312 [04:43<00:00,  1.10it/s]

train loss 0.8066 - train acc. 59.61





val loss 0.7350 - val acc. 76.67


100%|██████████| 312/312 [04:43<00:00,  1.10it/s]

train loss 0.7697 - train acc. 61.63





val loss 0.7193 - val acc. 78.38


100%|██████████| 312/312 [04:43<00:00,  1.10it/s]

train loss 0.7309 - train acc. 63.77





val loss 0.6518 - val acc. 78.82


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.6987 - train acc. 65.06





val loss 0.6676 - val acc. 79.95


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.6839 - train acc. 66.21





val loss 0.5823 - val acc. 83.43


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.6642 - train acc. 66.64





val loss 0.5422 - val acc. 84.80


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.6501 - train acc. 67.21





val loss 0.5248 - val acc. 85.83


100%|██████████| 312/312 [04:32<00:00,  1.14it/s]

train loss 0.6340 - train acc. 68.06





val loss 0.5176 - val acc. 84.36


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.6163 - train acc. 68.95





val loss 0.4924 - val acc. 87.16


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.6113 - train acc. 69.06





val loss 0.4986 - val acc. 84.51


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.6091 - train acc. 69.38





val loss 0.4803 - val acc. 87.30


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.5963 - train acc. 69.73





val loss 0.4663 - val acc. 86.57


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.5876 - train acc. 70.07





val loss 0.4302 - val acc. 87.45


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.5831 - train acc. 70.31





val loss 0.4324 - val acc. 86.08


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.5766 - train acc. 70.51





val loss 0.4005 - val acc. 87.60


100%|██████████| 312/312 [04:32<00:00,  1.15it/s]

train loss 0.5637 - train acc. 71.33





val loss 0.4129 - val acc. 87.50


100%|██████████| 312/312 [04:31<00:00,  1.15it/s]

train loss 0.5690 - train acc. 70.80





val loss 0.4374 - val acc. 88.77
Finished Training


And we finally save the loss and accuracy trajectories in an .npz file for further evaluation.

In [22]:
np.savez(os.path.join(dir,loss_acc_path),train_loss=train_loss,val_loss=val_loss,train_acc=train_acc,val_acc=val_acc)