## PyTorch Transfer Learning

* Transfer learning = taking the parameters one model has learnt on another dataset and applying them to our own problem

* Pretrained models = foundation models




In [1]:
import torch
import torchvision

In [2]:
torchvision.__version__ # need 0.13+

'0.16.0+cu121'
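The `0.13+` requirement can also be checked in code rather than by eye. A minimal sketch in plain Python, assuming the version string looks like the one printed above (`major.minor.patch` with an optional `+build` tag); `meets_minimum` is a hypothetical helper name, not part of torchvision:

```python
def meets_minimum(version: str, minimum: tuple = (0, 13)) -> bool:
    """Return True if a 'major.minor...' version string is at least `minimum`."""
    # Drop the local build tag (e.g. "+cu121") before parsing
    release = version.split("+")[0]
    # Compare major and minor as integers, not strings ("0.9" < "0.13" numerically)
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= minimum

print(meets_minimum("0.16.0+cu121"))  # version string copied from the output above
```

Comparing integers matters here: a plain string comparison would rank `"0.9"` above `"0.13"`.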

In [3]:
torch.__version__

'2.1.0+cu121'

Import the code we wrote in previous sections (especially the `going_modular` scripts) from GitHub, plus `torchinfo`.

In [4]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

[INFO] Couldn't find torchinfo... installing it.
[INFO] Couldn't find going_modular scripts... downloading them from GitHub.
Cloning into 'pytorch-deep-learning'...
remote: Enumerating objects: 4056, done.
remote: Counting objects: 100% (1244/1244), done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 4056 (delta 1079), reused 1090 (delta 1002), pack-reused 2812
Receiving objects: 100% (4056/4056), 651.13 MiB | 34.02 MiB/s, done.
Resolving deltas: 100% (2372/2372), done.
Updating files: 100% (248/248), done.


In [5]:
# device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

## Get data of pizza, steak, sushi

In [6]:
import os
import zipfile

from pathlib import Path

import requests

# Set up data path
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi" # images from a subset of classes from Food101 Dataset

# If image folder does not exist, download
if image_path.is_dir():
  print(f"{image_path} directory exists, skipping redownload...")
else:
  print(f"Did not find {image_path}, downloading it...")
  image_path.mkdir(parents=True, exist_ok=True)

  # Download data
  with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading pizza, steak, sushi data...")
    f.write(request.content)

  # Unzip data
  with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak, sushi data...")
    zip_ref.extractall(image_path)

  # Remove .zip file
  os.remove(data_path / "pizza_steak_sushi.zip")

Did not find data/pizza_steak_sushi, downloading it...
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...


In [7]:
# Set up directory path
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

## Create Datasets and DataLoaders

Now that we've got some data, we want to turn it into PyTorch DataLoaders.

We can use the `create_dataloaders()` function from `data_setup.py`, which we made in the going_modular section.

We have to think about how to **transform** the data.

With `torchvision` 0.13+ there are two ways to create transforms:
1. Manually created transforms - you define what transforms you want your data to go through

2. Automatically created transforms - the transforms for your data are defined by the model you would like to use

When using a pretrained model, it's important that the data you pass through it (including your custom data) is **transformed** in the same way as the data the model was trained on, otherwise performance can degrade.

## Creating a transform for `torchvision.models` manually

`torchvision.models` contains the pretrained models.

In [8]:
from torchvision import transforms
# From the documentation (older release) of torchvision.models
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

manual_transforms = transforms.Compose([
    transforms.Resize((224, 224)), # resize image to 224 x 224
    transforms.ToTensor(),
    normalize # give images the same distribution as ImageNet, which the pretrained model was trained on
])
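To make the normalization step concrete, here is a small worked example in plain Python of what `transforms.Normalize` computes per channel, `(value - mean) / std`. It is a sketch only: it handles a single RGB pixel already scaled to `[0, 1]` (as after `ToTensor()`), and `normalize_pixel` is a hypothetical helper, not a torchvision function.

```python
# Per-channel statistics used in the manual transform above
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]

# A mid-grey pixel ends up slightly above zero in every channel
mid_grey = normalize_pixel([0.5, 0.5, 0.5])
print([round(v, 3) for v in mid_grey])

# A pixel equal to the mean maps to exactly zero
print(normalize_pixel(IMAGENET_MEAN))
```

If the custom data skipped this step, the model would see values on a different scale from the one it was trained on.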


In [9]:
from going_modular.going_modular import data_setup
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               transform=manual_transforms,
                                                                               batch_size=32)

train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7eab17cdce20>,
 <torch.utils.data.dataloader.DataLoader at 0x7eab17cdcb80>,
 ['pizza', 'steak', 'sushi'])

## Auto creation of transform for `torchvision.models`

As of torchvision 0.13+, data transforms can be created automatically from the pretrained weights we are using.

In [10]:
# Get a set of pretrained model weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # DEFAULT = best available weights; here that is IMAGENET1K_V1 (the model was trained on ImageNet-1k)
weights

EfficientNet_B0_Weights.IMAGENET1K_V1

In [11]:
# Get the transforms used to create our pretrained weights

auto_transforms = weights.transforms()
auto_transforms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)
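Note that the automatic transform resizes to 256 and then crops to 224, whereas the manual transform above squashes every image straight to 224 x 224. A rough sketch of that geometry in plain Python, assuming a single-value `resize_size` scales the shorter side while keeping the aspect ratio and that the crop is centred; `resize_then_center_crop` is a hypothetical helper, not a torchvision function:

```python
def resize_then_center_crop(width, height, resize_size=256, crop_size=224):
    """Return the resized (width, height) and the centre crop box (left, top, right, bottom)."""
    # Resize so the shorter side equals resize_size, keeping the aspect ratio
    if width <= height:
        new_w, new_h = resize_size, int(resize_size * height / width)
    else:
        new_w, new_h = int(resize_size * width / height), resize_size
    # Take a crop_size x crop_size window from the middle of the resized image
    left = int(round((new_w - crop_size) / 2))
    top = int(round((new_h - crop_size) / 2))
    return (new_w, new_h), (left, top, left + crop_size, top + crop_size)

resized, box = resize_then_center_crop(width=600, height=400)
print(resized)  # (384, 256)
print(box)      # (80, 16, 304, 240)
```

Under these assumptions the image content is not stretched, but the left and right edges of a wide photo are cut off.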

In [12]:
# Create dataloaders using automatic transforms
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               transform=auto_transforms,
                                                                               batch_size=32)

train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7eab17cdfbe0>,
 <torch.utils.data.dataloader.DataLoader at 0x7eab17cddd50>,
 ['pizza', 'steak', 'sushi'])

## Where to get pretrained models
1. PyTorch domain libraries
2. Libraries like `timm`
3. HuggingFace Hub
4. Paperswithcode

## Which pretrained model should you use?

You need to experiment with various models.

* Take a well-performing model from a problem space similar to your own

Three things to consider:
1. Speed - how fast does it run
2. Size - how big is the model
3. Performance - how well does it perform on your chosen problem

Where does the model live?

Is it on device? (like a self-driving car)

Or does it live on a server?

E.g. for FoodVisionMini we want a model small enough to deploy on a mobile phone (using the phone's own computing power) that still performs well. EffNetB0 has high accuracy for a low number of parameters --> performance vs size

If we had infinite compute, we would choose the biggest model with the most parameters and the highest accuracy.
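The "size" consideration can be estimated from the parameter count alone. A minimal sketch, assuming float32 weights (4 bytes each) and taking the EfficientNet-B0 parameter count of 5,288,548 from the torchvision model documentation (an assumption, not something computed in this notebook); `estimate_size_mib` is a hypothetical helper:

```python
def estimate_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights in MiB, ignoring buffers and file overhead."""
    return num_params * bytes_per_param / 1024**2

# Assumption: parameter count as listed in the torchvision docs for EfficientNet-B0
effnet_b0_params = 5_288_548
effnet_b0_mib = estimate_size_mib(effnet_b0_params)
print(f"{effnet_b0_mib:.1f} MiB")  # 20.2 MiB
```

For a model that is already loaded, the count can be obtained with `sum(p.numel() for p in model.parameters())`. The estimate is slightly below the size of a saved checkpoint, since the file also stores non-parameter buffers such as batch norm statistics.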

## Setting up a pretrained model

We want to create an instance of EffNetB0.

In [13]:
# Creating a pretrained model (torchvision v0.13+)

# Workaround for a bug in the torchvision version on Google Colab at the time of writing
# (start of fix)
from torchvision.models._api import WeightsEnum
from torch.hub import load_state_dict_from_url

def get_state_dict(self, *args, **kwargs):
    kwargs.pop("check_hash", None) # drop the hash check; the default avoids a KeyError if the key is absent
    return load_state_dict_from_url(self.url, *args, **kwargs)
WeightsEnum.get_state_dict = get_state_dict
# (end of fix)

# Creating a pretrained model
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

model

Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-3dd342df.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 69.9MB/s]


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [14]:
model.classifier # 1000 out_features because the model was trained on ImageNet (1000 classes)

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

The model has three parts:
1. Features (extracts features from images)
2. Avgpool (turns the features into a feature vector by taking the average)
3. Classifier (turns the feature vector into prediction logits; its `out_features` can be adjusted to the number of classes you have)
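Adjusting `out_features` also changes how many parameters the classifier holds. A small worked example in plain Python, using the standard count for a fully connected layer (one weight per input/output pair plus one bias per output); `linear_param_count` is a hypothetical helper:

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Original ImageNet head: 1280 features -> 1000 classes
print(linear_param_count(1280, 1000))  # 1281000

# Head for three classes (pizza, steak, sushi)
print(linear_param_count(1280, 3))     # 3843
```

A three-class head is therefore tiny compared with the rest of the network, which is part of why training only the head is fast.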

Feature extraction means keeping the feature extraction layers of the model backbone frozen and adjusting only the final (classifier) layer, so only the input dataset and the output shape change.

If you have lots of data, you can go further and fine-tune the model, which means also adjusting the individual feature extraction layers.

A typical order: start with a pretrained foundation model, try feature extraction first, then move on to fine-tuning.
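The step from feature extraction to fine-tuning can be sketched as freezing the whole backbone and then unfreezing only its last few blocks. The code below is an illustration under assumptions: `TinyNet` is a hypothetical stand-in that only mimics the `features` / `avgpool` / `classifier` layout, and `unfreeze_last_blocks` is a hypothetical helper, not a torchvision function.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with a features / avgpool / classifier layout."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU()),
        )
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.avgpool(self.features(x))
        return self.classifier(torch.flatten(x, 1))

def unfreeze_last_blocks(model: nn.Module, num_blocks: int) -> nn.Module:
    # Feature extraction: freeze every parameter in the backbone
    for param in model.features.parameters():
        param.requires_grad = False
    # Fine-tuning: make only the last `num_blocks` backbone blocks trainable again
    if num_blocks > 0:
        for block in list(model.features.children())[-num_blocks:]:
            for param in block.parameters():
                param.requires_grad = True
    return model

tiny_model = unfreeze_last_blocks(TinyNet(), num_blocks=1)
trainable = [name for name, p in tiny_model.named_parameters() if p.requires_grad]
print(trainable)
```

With `num_blocks=0` this reduces to plain feature extraction, where only the classifier is trained.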


## Getting a summary of our model with `torchinfo.summary()`

In [15]:
# print a summary with torchinfo
from torchinfo import summary

summary(model=model,
        input_size=(1, 3, 224, 224), # example of batch_size, colour_channels, height, width
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [1, 3, 224, 224]     [1, 1000]            --                   True
├─Sequential (features)                                      [1, 3, 224, 224]     [1, 1280, 7, 7]      --                   True
│    └─Conv2dNormActivation (0)                              [1, 3, 224, 224]     [1, 32, 112, 112]    --                   True
│    │    └─Conv2d (0)                                       [1, 3, 224, 224]     [1, 32, 112, 112]    864                  True
│    │    └─BatchNorm2d (1)                                  [1, 32, 112, 112]    [1, 32, 112, 112]    64                   True
│    │    └─SiLU (2)                                         [1, 32, 112, 112]    [1, 32, 112, 112]    --                   --
│    └─Sequential (1)                                        [1, 32, 112, 112]    [1, 16, 112,

## Freezing the base model and changing the output layer to suit our needs



In [16]:
model.features

Sequential(
  (0): Conv2dNormActivation(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): SiLU(inplace=True)
  )
  (1): Sequential(
    (0): MBConv(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): SiLU(inplace=True)
        )
        (1): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
          (activation): SiLU(inplace=True)
          (scale_activation): Sigmoid()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), 

In [17]:
# Freeze all the base layers in EffNetB0
for param in model.features.parameters():
  # print(param)
  param.requires_grad = False # Freeze all the parameters

In [18]:
# Update the classifier head of the model to 3 classes
from torch import nn
torch.manual_seed(42)
torch.cuda.manual_seed(42)

model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True), # during training, dropout randomly zeroes 20% of the feature vector's elements so the rest learn more generalisable patterns
    nn.Linear(in_features=1280, # size of the feature vector coming out of the frozen base model
              out_features=len(class_names), bias=True) # change out_features to our number of classes
).to(device)

In [19]:
summary(model=model,
        input_size=(1, 3, 224, 224), # example of batch_size, colour_channels, height, width
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [1, 3, 224, 224]     [1, 3]               --                   Partial
├─Sequential (features)                                      [1, 3, 224, 224]     [1, 1280, 7, 7]      --                   False
│    └─Conv2dNormActivation (0)                              [1, 3, 224, 224]     [1, 32, 112, 112]    --                   False
│    │    └─Conv2d (0)                                       [1, 3, 224, 224]     [1, 32, 112, 112]    (864)                False
│    │    └─BatchNorm2d (1)                                  [1, 32, 112, 112]    [1, 32, 112, 112]    (64)                 False
│    │    └─SiLU (2)                                         [1, 32, 112, 112]    [1, 32, 112, 112]    --                   --
│    └─Sequential (1)                                        [1, 32, 112, 112]    [1, 1

## Train model


In [20]:
# Loss fn and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(),
                             lr=0.001)

In [None]:
# Import train function
from going_modular.going_modular import engine

# Set seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Start timer
from timeit import default_timer as timer
start_time = timer()

# Setup training and save results
results = engine.train(model=model,
                       train_dataloader=train_dataloader,
                       test_dataloader=test_dataloader,
                       optimizer=optimizer,
                       loss_fn=loss_fn,
                       epochs=5,
                       device=device)

# End timer and print time
end_time = timer()
print(f"[INFO] Total training time: {end_time-start_time:.3f} seconds")

  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0924 | train_acc: 0.3984 | test_loss: 0.9133 | test_acc: 0.5398


In [None]:
device # will have lower training time if cuda

In [None]:
results

In [None]:
## Evaluate model by plotting loss curves
try:
  from helper_functions import plot_loss_curves
except ImportError:
  print("[INFO] Couldn't find helper_functions.py, downloading...")
  import requests
  with open("helper_functions.py", "wb") as f:
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    f.write(request.content)
  # Import only after the file has been fully written and closed
  from helper_functions import plot_loss_curves

# Plot loss curves of model
plot_loss_curves(results)

## Make predictions on images from test set

Ensure that the test/custom data is:
* Same shape - images need to be the same shape as the ones the model was trained on
* Same datatype - custom data should be in the same datatype as the training data
* Same device - custom data should be on the same device as the model
* Same transform - if the training data was transformed, the test data and custom data should ideally be transformed the same way

To do all this automatically, let's create a function called `pred_and_plot_image`:

1. Take in a trained model, a list of class names, a filepath to a target image, an image size, a transform and target device

2. Open the image with `PIL.Image.open()`

3. Create a transform if one doesn't exist

4. Make sure the model is on the target device

5. Turn model to `model.eval()` mode to get ready for inference (will turn off nn.Dropout etc)

6. Transform the target image and make sure its dimensionality is suited for the model (unsqueeze for batch size)

7. Make a prediction by passing image to model

8. Convert logits to pred probs using torch.softmax

9. Convert pred probs to pred labels using torch.argmax

10. Plot image with `matplotlib` and set title to pred label and probability
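Steps 8 and 9 can be illustrated with a small worked example in plain Python. This is a sketch of what `torch.softmax` and `torch.argmax` compute for a single image, not the function used in this notebook; the logits are made-up values and `example_class_names` simply mirrors the class names used here.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_class_names = ["pizza", "steak", "sushi"]
example_logits = [2.0, 0.5, -1.0]  # hypothetical model output for one image

# Step 8: logits -> prediction probabilities
probs = softmax(example_logits)
# Step 9: probabilities -> predicted label (index of the largest probability)
pred_index = max(range(len(probs)), key=probs.__getitem__)

print(f"Pred: {example_class_names[pred_index]} | Prob: {probs[pred_index]:.3f}")
```

Softmax does not change which class has the highest score, so the argmax of the logits gives the same label; the probabilities are mainly useful for the plot title.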

In [None]:
from typing import List, Tuple

from PIL import Image

from torchvision import transforms

import matplotlib.pyplot as plt

# Take in a trained model
def pred_and_plot_image(model: torch.nn.Module,
                        image_path: str,
                        class_names: List[str],
                        image_size: Tuple[int, int] = (224, 224),
                        transform: torchvision.transforms = None,
                        device: torch.device=device):

  # Open the image with PIL
  img = Image.open(image_path)

  # Create a transform if one doesn't exist
  if transform is not None:
    image_transform = transform
  else:
    image_transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

  ### Predict on image ###
  # Model on target device
  model.to(device)

  # Inference mode and eval()
  model.eval()
  with torch.inference_mode():
    # Transform the image and add extra batch dimension
    transformed_image = image_transform(img).unsqueeze(dim=0) # [batch_size, colour_channels, height, width]

    # Make a prediction on transformed image and ensure image on target device
    target_image_pred = model(transformed_image.to(device))

    # Pred logits to pred probs
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

    # Convert probs to pred labels
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)

    # Plot the image (pass from PIL)
    plt.figure()
    plt.imshow(img)
    plt.title(f"Pred: {class_names[target_image_pred_label]} | Prob: {target_image_pred_probs.max().item():.3f}")
    plt.axis(False);


In [None]:
class_names

In [None]:
# Get a random list of image paths from the test set
import random
num_images_to_plot = 3
test_image_path_list = list(Path(test_dir).glob("*/*.jpg"))
test_image_path_sample = random.sample(population=test_image_path_list,
                                       k=num_images_to_plot)

# Make predictions on and plot the images
for image_path in test_image_path_sample:
  pred_and_plot_image(model=model,
                      image_path=image_path,
                      class_names=class_names,
                      image_size=(224,224))

## Making predictions on a custom image

In [None]:
# Download image
import requests

# Setup custom image path
custom_image_path = data_path / "04-pizza-dad.jpeg"

# Download image if needed
if not custom_image_path.is_file(): # check for file
  with open(custom_image_path, "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/blob/main/data/04-pizza-dad.jpeg?raw=true")
    print(f"Downloading {custom_image_path}...")
    f.write(request.content)

else:
  print(f"{custom_image_path} already exists, skipping download...")

In [None]:
# Predict on custom image
pred_and_plot_image(model=model,
                    image_path=custom_image_path,
                    class_names=class_names)