# Transfer learning

In this lab we will use pretrained models to boost performance on smaller datasets. For this experiment, we will work with an AlexNet model pretrained on the ImageNet dataset, with the goal of reaching a good accuracy score on the Caltech 101 dataset.

### Prerequisites

1. In order to perform the experiments, please download in advance the Caltech 101 dataset from https://drive.google.com/file/d/137RyRjvTBkBiIfeYBNZBtViDHQ6_Ewsp/view
2. In the working directory, create a folder named 'dataset' and a subfolder named 'caltech101' within it. Extract the dataset into the subfolder. The overall folder structure should look as follows: dataset/caltech101/101_ObjectCategories.
3. Install the torchvision module using 'conda install torchvision' if you have not done so already.
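
Before running the notebook, it can help to confirm that the folder layout from step 2 is in place. The following is a minimal sketch using only the standard library; the helper name `check_caltech101_layout` is hypothetical and not part of the lab.

```python
from pathlib import Path


def check_caltech101_layout(root: str = "dataset") -> bool:
    """Return True if <root>/caltech101/101_ObjectCategories exists
    and contains at least one category subfolder."""
    categories = Path(root) / "caltech101" / "101_ObjectCategories"
    if not categories.is_dir():
        return False
    # Each category is expected to be a subfolder holding its images.
    return any(child.is_dir() for child in categories.iterdir())


# Prints False until the dataset has been extracted as described above.
print("layout ok:", check_caltech101_layout("dataset"))
```

If this prints `False`, the archive was probably extracted one level too high or too low.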

In [73]:
from tqdm import tqdm
import numpy as np
import numpy.random as random
import torch
import torchvision
import warnings
import matplotlib.pyplot as plt
import typing as t
from torch import Tensor
from torch.utils.data import random_split
from torch.utils.data import DataLoader, Dataset
from torchvision.models import AlexNet_Weights


warnings.filterwarnings('ignore')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
seed = 42

torchvision.set_image_backend('PIL')
gen = torch.Generator()
gen.manual_seed(seed)
random.seed(seed)

In [75]:
torch.cuda.is_available()

False

Firstly, we will load the AlexNet model architecture using torchvision. All available models with their respective parameters can be found at: https://pytorch.org/vision/stable/models.html

In [19]:
model = torchvision.models.alexnet()

In the first run we will just load the model architecture, without the pretrained weights. We can visualize the model architecture as follows:

In [20]:
model

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

Next, we will load the Caltech 101 dataset and apply the necessary transformations to it. Afterwards, we will split the dataset into train, validation and test subsets.

In this block of code, define the dataloaders for train, validation and test and try to iterate through the data. What happens? Try to fix the problem using a lambda transform: https://pytorch.org/vision/stable/transforms.html#generic-transforms
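
One likely failure when batching is a channel mismatch: some Caltech 101 images are grayscale, so they have one channel, while others have three. The sketch below reproduces the problem on dummy tensors (random data, not real images) and shows a channel-repeat fix that could be wrapped in a lambda transform. The helper name `to_three_channels` is an assumption made for illustration.

```python
import torch

# Dummy stand-ins for one grayscale and one RGB image, laid out as (C, H, W).
gray = torch.rand(1, 8, 8)
rgb = torch.rand(3, 8, 8)

# Stacking tensors with different channel counts raises a RuntimeError.
try:
    torch.stack([gray, rgb])
    stack_failed = False
except RuntimeError as err:
    stack_failed = True
    print("stack failed:", err)


def to_three_channels(x: torch.Tensor) -> torch.Tensor:
    """Repeat a single-channel image along the channel axis; leave others untouched."""
    return x.repeat(3, 1, 1) if x.shape[0] == 1 else x


# After the fix, both images share the same shape and can be batched.
batch = torch.stack([to_three_channels(gray), to_three_channels(rgb)])
print(batch.shape)
```

Checking for exactly one channel, rather than "anything other than three", avoids accidentally tripling an image that carries an extra channel.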

In [71]:
from torchvision.transforms.v2 import Compose, ToImage, ToDtype, Resize, Normalize, Lambda, Grayscale
from torchvision.models import AlexNet_Weights
from torchvision.transforms.v2 import Transform

# Use original transformations of AlexNet
weights = AlexNet_Weights.DEFAULT
preprocess: Transform = weights.transforms()
transform = Compose([
    ToImage(),
    ToDtype(dtype=torch.float32, scale=True),
    Resize((224, 224)),
    Lambda(lambda x: x.repeat(3, 1, 1) if x.shape[0] != 3 else x),
])

def collate_fn(batch):
    images = []
    labels = []
    for X, y in batch:
        images.append(transform(X))
        labels.append(y)
    return torch.stack(images), torch.tensor(labels)

# Load the raw dataset; the transforms are applied per batch inside collate_fn
dataset = torchvision.datasets.Caltech101('./dataset')

# Split datasets (pass the seeded generator so the split is reproducible)
batch_size = 16
n_samples = len(dataset)
train_ds, val_ds, test_ds = random_split(dataset, [0.8, 0.1, 0.1], generator=gen)

# Define dataloaders for train, validation and test
# Iterate through the dataloaders
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True, generator=gen, collate_fn=collate_fn, pin_memory=True, pin_memory_device=device.type)
valid_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=True, generator=gen, collate_fn=collate_fn, pin_memory=True, pin_memory_device=device.type)
test_dl = DataLoader(test_ds, batch_size=batch_size, shuffle=True, generator=gen, collate_fn=collate_fn, pin_memory=True, pin_memory_device=device.type)

In [72]:
next(iter(train_dl))

NotImplementedError: Could not run 'aten::_pin_memory' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_pin_memory' is only available for these backends: [BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

BackendSelect: registered at aten\src\ATen\RegisterBackendSelect.cpp:742 [kernel]
Python: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at ..\aten\src\ATen\FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at ..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ..\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ..\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at ..\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradCPU: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradCUDA: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradHIP: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradXLA: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradMPS: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradIPU: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradXPU: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradHPU: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradVE: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradLazy: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradMTIA: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradPrivateUse1: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradPrivateUse2: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradPrivateUse3: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradMeta: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
AutogradNestedTensor: registered at ..\torch\csrc\autograd\generated\VariableType_0.cpp:16790 [autograd kernel]
Tracer: registered at ..\torch\csrc\autograd\generated\TraceType_0.cpp:16725 [kernel]
AutocastCPU: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:382 [backend fallback]
AutocastCUDA: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:249 [backend fallback]
FuncTorchBatched: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:710 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ..\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ..\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ..\aten\src\ATen\functorch\TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:157 [backend fallback]
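
The traceback above appears to come from requesting pinned memory (`pin_memory=True` together with `pin_memory_device=device.type`) on a machine where CUDA is unavailable, as the earlier `torch.cuda.is_available()` cell suggests. A minimal sketch of one way around it is to enable pinning only when a GPU is present. The toy dataset below is a hypothetical stand-in for the Caltech 101 subsets.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data standing in for the real image subsets.
toy_ds = TensorDataset(torch.rand(32, 3, 8, 8), torch.randint(0, 101, (32,)))

# Pinned host memory only helps when batches are later copied to a CUDA device,
# so request it only if CUDA is actually available.
use_pinned = torch.cuda.is_available()

toy_dl = DataLoader(toy_ds, batch_size=16, shuffle=True, pin_memory=use_pinned)

X, y = next(iter(toy_dl))
print(X.shape, y.shape)
```

The same conditional can be applied to the three dataloaders defined above, dropping the `pin_memory_device` argument.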


With the dataset ready, it is now time to adapt the model architecture in order to fit our needs. Define a new classifier for the AlexNet model having the same structure, changing only the number of output neurons to 101.
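
As a quick sanity check on the sizes involved: the classifier's input width of 9216 comes from the 256 feature maps of size 6 x 6 produced by the adaptive average pooling layer (256 * 6 * 6 = 9216). The sketch below builds a classifier of the same shape and confirms that it emits one logit per class. The factory name `make_classifier` is an assumption made for illustration.

```python
import torch
from torch import nn


def make_classifier(num_classes: int) -> nn.Sequential:
    """Build an AlexNet-style classifier head with a configurable output size."""
    return nn.Sequential(
        nn.Dropout(p=0.5),
        nn.Linear(256 * 6 * 6, 4096),  # 9216 input features
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5),
        nn.Linear(4096, 4096),
        nn.ReLU(inplace=True),
        nn.Linear(4096, num_classes),
    )


head = make_classifier(num_classes=101).eval()

# Random features standing in for the flattened output of the convolutional part.
features = torch.rand(2, 256 * 6 * 6)
with torch.no_grad():
    logits = head(features)
print(logits.shape)
```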

In [65]:
model.classifier

Sequential(
  (0): Dropout(p=0.5, inplace=False)
  (1): Linear(in_features=9216, out_features=4096, bias=True)
  (2): ReLU(inplace=True)
  (3): Dropout(p=0.5, inplace=False)
  (4): Linear(in_features=4096, out_features=4096, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=4096, out_features=101, bias=True)
)

In [66]:
import torch.nn as nn
from torch.nn import Dropout, Linear, ReLU


# Create a new classifier similar to AlexNet
model.classifier = torch.nn.Sequential(
    Dropout(p=0.5, inplace=False),
    Linear(in_features=9216, out_features=4096, bias=True),
    ReLU(inplace=True),
    Dropout(p=0.5, inplace=False),
    Linear(in_features=4096, out_features=4096, bias=True),
    ReLU(inplace=True),
    Linear(in_features=4096, out_features=101, bias=True)
)

### Training the model

Define an Adam optimizer with a learning rate of 1e-4 and a cross-entropy loss. Afterwards, train the model for 2 epochs. Note the results.
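
A useful reference point when reading the loss values is the chance level: a model that assigns equal probability to all 101 classes has a cross-entropy loss of ln(101), roughly 4.615. The sketch below checks this with all-zero logits. The tensors are made up for illustration.

```python
import math

import torch
from torch import nn

num_classes = 101
loss_fn = nn.CrossEntropyLoss()

# All-zero logits correspond to a uniform distribution over the classes.
logits = torch.zeros(4, num_classes)
targets = torch.tensor([0, 1, 2, 3])

chance_loss = loss_fn(logits, targets).item()
print(chance_loss, math.log(num_classes))
```

A training loss that stays close to this value suggests the model has learned little beyond guessing.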

In [67]:

class Metrics(t.TypedDict):
    accuracy: t.List[float]
    loss: t.List[float]


class TrainHistory(t.TypedDict):
    train: Metrics
    valid: Metrics


def train_validate(model: nn.Module,
                   train_dl: DataLoader,
                   valid_dl: DataLoader,
                   epochs: int,
                   loss_fn: nn.Module,
                   optim: torch.optim.Optimizer) -> TrainHistory:
    # Track history
    history: TrainHistory = {
        'train': {
            'accuracy': [],
            'loss': [],
        },
        'valid': {
            'accuracy': [],
            'loss': [],
        }
    }

    # Run training and validation
    for epoch in range(epochs):
        print('Epoch [%d/%d]' % (epoch + 1, epochs), end=' - ')

        ### Training ###
        model.train(True)
        model.requires_grad_(True)

        # Track across a single epoch
        train_loss = []
        train_accuracy = []

        for b, (X, y) in enumerate(train_dl):
            X, y = X.to(device), y.to(device)

            # Prevent grad accumulation
            optim.zero_grad()

            # Forward pass
            logits = model.forward(X)
            loss: Tensor = loss_fn(logits, y)
            y_pred: Tensor = logits.argmax(dim=1).detach()

            # Backward pass
            loss.backward()
            optim.step()

            # Track metrics
            train_loss.append(loss.detach().cpu().item())
            train_accuracy.extend((y_pred == y).detach().cpu().tolist())

        # Aggregate training results
        history['train']['loss'].append(torch.mean(torch.tensor(train_loss)).item())
        history['train']['accuracy'].append((torch.sum(torch.tensor(train_accuracy)) / len(train_accuracy)).item())

        ### Validation ###
        model.train(False)
        model.requires_grad_(False)

        # Track across a single epoch
        valid_loss = []
        valid_accuracy = []

        for b, (X, y) in enumerate(valid_dl):
            X, y = X.to(device), y.to(device)

            # Forward pass
            logits = model.forward(X)
            loss: Tensor = loss_fn(logits, y)
            y_pred: Tensor = logits.argmax(dim=1)

            # Track metrics
            valid_loss.append(loss.detach().cpu().item())
            valid_accuracy.extend((y_pred == y).detach().cpu().tolist())

        # Aggregate validation results
        history['valid']['loss'].append(torch.mean(torch.tensor(valid_loss)).item())
        history['valid']['accuracy'].append((torch.sum(torch.tensor(valid_accuracy)) / len(valid_accuracy)).item())

        # Inform regarding current metrics
        print('t_loss: %f, t_acc: %f, v_loss: %f, v_acc: %f'
              % (history['train']['loss'][-1], history['train']['accuracy'][-1], history['valid']['loss'][-1], history['valid']['accuracy'][-1]))

    # Output the obtained results so far
    return history
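
The function above disables gradients during validation by flipping `requires_grad` on every parameter. An alternative sketch, assuming the same (inputs, labels) batch format, is to wrap evaluation in `torch.no_grad()`, which leaves the parameter flags untouched and can also be reused for the test set. The name `evaluate` and the toy model are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model: nn.Module, dl: DataLoader, loss_fn: nn.Module,
             device: torch.device) -> tuple:
    """Return (mean of per-batch losses, accuracy) without tracking gradients."""
    model.eval()
    losses, correct, total = [], 0, 0
    with torch.no_grad():
        for X, y in dl:
            X, y = X.to(device), y.to(device)
            logits = model(X)
            losses.append(loss_fn(logits, y).item())
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.numel()
    return sum(losses) / len(losses), correct / total


# Toy usage on random data with a tiny linear model.
toy_model = nn.Linear(10, 3)
toy_dl = DataLoader(
    TensorDataset(torch.rand(20, 10), torch.randint(0, 3, (20,))),
    batch_size=5,
)
val_loss, val_acc = evaluate(toy_model, toy_dl, nn.CrossEntropyLoss(), torch.device("cpu"))
print(val_loss, val_acc)
```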

In [68]:
# Q: Train the model for 2 epochs using a cross-entropy loss and an Adam optimizer with a lr of 1e-4
# Prepare training settings
epochs = 2
lr_rate = 1e-4
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=lr_rate)

# Move the model to the selected device (GPU if available, otherwise CPU)
model = model.to(device)

# Start training
history = train_validate(
    model=model,
    train_dl=train_dl,
    valid_dl=valid_dl,
    epochs=epochs,
    loss_fn=loss_fn,
    optim=optim,
)

Epoch [1/2] - 

KeyboardInterrupt: 

## Experiments:

1. Rerun training (restart the kernel and run all cells), but this time, when loading the model in the first block of code, specify 'pretrained = True' in order to make use of the weights pretrained on ImageNet. In recent torchvision versions the equivalent is the 'weights' argument, for example 'weights=AlexNet_Weights.DEFAULT'.
2. Rerun the code using the pretrained model, but this time use a learning rate of 1e-3. What happens?
3. Rerun using the pretrained model and a learning rate of 1e-4, but this time replace only the last layer of the model instead of the entire classifier.
4. Rerun the code using the pretrained model and a learning rate of 1e-4. This time, freeze the pretrained layers and update only the new layers for the first few epochs. Afterwards, proceed to update the entire model. You can freeze parameters by setting 'requires_grad = False'.
5. Rerun experiment 4, but gradually unfreeze layers instead of unfreezing the entire model at once.
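
The last-layer replacement and the freeze-then-unfreeze experiments can be sketched as follows. To keep the example light, it uses a tiny hypothetical stand-in (`TinyNet`) that only mimics AlexNet's `features` / `classifier` layout. The helper `set_trainable` is also an assumption, not part of the lab. Note that `train_validate` as written calls `model.requires_grad_(True)` at the start of every epoch, which would undo a freeze, so that line would need to be removed for these experiments.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same two-part layout as AlexNet."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1000))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def set_trainable(module: nn.Module, flag: bool) -> None:
    """Freeze (flag=False) or unfreeze (flag=True) every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = flag


net = TinyNet()

# Replace only the last layer, reusing its input width.
net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, 101)

# Freeze the feature extractor and optimize only what is still trainable.
set_trainable(net.features, False)
optim = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=1e-4)

# One training step on random data: frozen weights must stay unchanged.
before = net.features[0].weight.clone()
loss = nn.CrossEntropyLoss()(net(torch.rand(2, 3, 8, 8)), torch.tensor([0, 1]))
loss.backward()
optim.step()
frozen_unchanged = torch.equal(before, net.features[0].weight)
print("frozen weights unchanged:", frozen_unchanged)

# Later: unfreeze everything and rebuild the optimizer over all parameters.
set_trainable(net.features, True)
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
```

For gradual unfreezing, the same helper could be applied to one sub-layer at a time (for example `net.features[0]`) instead of the whole feature extractor.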

### Experiment 1

Rerun training (restart the kernel and run all cells), but this time, when loading the model in the first block of code, specify 'pretrained = True' in order to make use of the weights pretrained on ImageNet. The code below uses the newer 'weights' argument for the same purpose.

In [None]:
from torchvision.models import AlexNet_Weights
from torchvision.transforms.v2 import Transform


# Use original transformations of AlexNet
weights = AlexNet_Weights.DEFAULT
preprocess: Transform = weights.transforms()

# Preprocess the dataset using those transforms
dataset = torchvision.datasets.Caltech101(
    './dataset',
    transform = Compose([
        ToImage(),
        Lambda(lambda x: x.repeat(3, 1, 1) if x.shape[0] != 3 else x),
        Lambda(lambda x: preprocess(x)),
    ])
)

# Redefine subsets & dataloaders
train_ds, val_ds, test_ds = random_split(dataset, [0.8, 0.1, 0.1], gen)
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True, generator=gen)
valid_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=True, generator=gen)
test_dl = DataLoader(test_ds, batch_size=batch_size, shuffle=True, generator=gen)

In [None]:
# Use pretrained model
model = torchvision.models.alexnet(weights=weights)

# Create a new classifier similar to AlexNet
model.classifier = torch.nn.Sequential(
    Dropout(p=0.5, inplace=False),
    Linear(in_features=9216, out_features=4096, bias=True),
    ReLU(inplace=True),
    Dropout(p=0.5, inplace=False),
    Linear(in_features=4096, out_features=4096, bias=True),
    ReLU(inplace=True),
    Linear(in_features=4096, out_features=101, bias=True)
)

# Prepare training settings
epochs = 2
lr_rate = 1e-4
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=lr_rate)

# Move the model to the selected device (GPU if available, otherwise CPU)
model = model.to(device)

# Start training
history = train_validate(
    model=model,
    train_dl=train_dl,
    valid_dl=valid_dl,
    epochs=epochs,
    loss_fn=loss_fn,
    optim=optim,
)

Epoch [1/2] - t_loss: 1.841016, t_acc: 0.582685, v_loss: 0.844043, v_acc: 0.775346
Epoch [2/2] - t_loss: 0.528327, t_acc: 0.857678, v_loss: 0.602315, v_acc: 0.839862


### Experiment 2

Rerun the code using the pretrained model, but this time use a learning rate of 1e-3. What happens?

In [None]:
# Use pretrained model
model = torchvision.models.alexnet(weights=weights)

# Create a new classifier similar to AlexNet
model.classifier = torch.nn.Sequential(
    Dropout(p=0.5, inplace=False),
    Linear(in_features=9216, out_features=4096, bias=True),
    ReLU(inplace=True),
    Dropout(p=0.5, inplace=False),
    Linear(in_features=4096, out_features=4096, bias=True),
    ReLU(inplace=True),
    Linear(in_features=4096, out_features=101, bias=True)
)

# Prepare training settings
epochs = 2
lr_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=lr_rate)

# Move the model to the selected device (GPU if available, otherwise CPU)
model = model.to(device)

# Start training
history = train_validate(
    model=model,
    train_dl=train_dl,
    valid_dl=valid_dl,
    epochs=epochs,
    loss_fn=loss_fn,
    optim=optim,
)

Epoch [1/2] - t_loss: 4.285237, t_acc: 0.085710, v_loss: 4.198857, v_acc: 0.092166
Epoch [2/2] - t_loss: 4.217043, t_acc: 0.086719, v_loss: 4.187852, v_acc: 0.100230
