## PyTorch Experiment Tracking

Machine learning is highly experimental.

To figure out which experiments are worth pursuing, you need **experiment tracking**: it helps you see what doesn't work so you can find out what does.

In this notebook, we're going to see an example of tracking experiments programmatically.

In [1]:
!nvidia-smi

Thu May  9 20:10:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.79                 Driver Version: 537.79       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA T1200 Laptop GPU      WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   64C    P8               3W /  35W |    145MiB /  4096MiB |     19%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
import torch 
import torchvision 

print(torch.__version__)
print(torchvision.__version__)

2.3.0+cu118
0.18.0+cu118


In [3]:
import matplotlib.pyplot as plt
import os 
import numpy as np
import pandas as pd

from torch import nn 
from torchvision import transforms

In [4]:
# torchinfo is needed for model summaries; install it if it's missing
try: 
    import torchinfo 
except ImportError: 
    print("[INFO] we don't have torchinfo, installing it....")
    !pip install torchinfo
    import torchinfo

In [5]:
# going_modular scripts (cloned from GitHub if they aren't available locally)
try: 
    from going_modular.going_modular import data_setup, engine 
except ImportError:
    print("[INFO] couldn't find the going_modular scripts, cloning them from github....")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    # move the going_modular folder next to this notebook (the next cell shows the Windows `move` variant)
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

In [6]:
# !git clone https://github.com/mrdbourke/pytorch-deep-learning.git

# !move pytorch-deep-learning/going_modular .

In [7]:
# Setting device agnostic code 
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Target Device: {device}")

Target Device: cuda


In [8]:
# Set Seeds 
def set_seeds(seed: int = 42):
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations
    torch.cuda.manual_seed(seed)

In [9]:
set_seeds(42)
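
A quick way to check what seeding buys us is to draw random numbers twice with the same seed. The sketch below repeats `set_seeds()` from the cell above so it runs on its own; the tensor size is an arbitrary choice for illustration.

```python
import torch


def set_seeds(seed: int = 42) -> None:
    """Seed the CPU and CUDA random number generators (same logic as the cell above)."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)  # silently ignored when CUDA isn't available


set_seeds(42)
first = torch.rand(3)

set_seeds(42)
second = torch.rand(3)

# Re-seeding restarts the random stream, so both draws are identical
print(torch.equal(first, second))
```

This is why the notebook calls `set_seeds()` again right before creating new layers or training: the results depend on where in the random stream those operations start.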

### 1. Getting Dataset


In [10]:
import os 
import zipfile 

from pathlib import Path

import requests


def download_data(source_url: str, 
                  destination_path: str,
                  remove_zipfile: bool = True) -> Path:
    """ 
    Downloads a zipped dataset from source URL and unzips it to dataset/<destination_path>. 
    Optionally removes the zip file after extraction.

    Args: 
        source_url (str): URL of the zipped dataset
        destination_path (str): Folder name (relative to dataset/) to extract the dataset into
        remove_zipfile (bool): Flag to remove the zip file after extraction (default is True)

    Returns:
        Path: Path to the extracted dataset
    """
    # Setup path to data folder and image folder (destination)
    data_path = Path('dataset/')
    image_path = data_path / Path(destination_path)

    # Create destination directory if it doesn't exist
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory already exists....")
    else: 
        print(f"[INFO] {image_path} directory doesn't exist, creating one....")
        image_path.mkdir(parents=True, exist_ok=True)

    # Download the dataset from source
    print(f"[INFO] Downloading dataset from {source_url}....")
    response = requests.get(source_url)
    response.raise_for_status()  # fail early on HTTP errors instead of saving an error page as a zip
    with open(data_path / 'dataset.zip', 'wb') as f: 
        f.write(response.content)

    # Unzip the dataset
    with zipfile.ZipFile(data_path / 'dataset.zip', 'r') as zip_ref:
        print(f"[INFO] Extracting dataset to {image_path}....")
        zip_ref.extractall(image_path)
    
    # Remove the zip file
    if remove_zipfile: 
        print(f"[INFO] Removing the dataset zip file....")
        os.remove(data_path / 'dataset.zip')

    return image_path
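
As written, `download_data()` downloads and extracts the archive even when the destination directory already exists. A minimal sketch of a guard that could avoid the repeated download is shown below; `needs_download()` is a hypothetical helper, not part of the notebook or of `going_modular`.

```python
from pathlib import Path


def needs_download(image_path: Path) -> bool:
    """Return True when image_path is missing or empty (hypothetical helper)."""
    # A directory that exists and contains at least one entry is treated as already downloaded
    return not (image_path.is_dir() and any(image_path.iterdir()))


print(needs_download(Path("dataset") / "pizza_steak_sushi"))
```

One way to use it would be to call `download_data()` only when `needs_download(image_path)` returns `True`, and otherwise reuse the existing folder.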

In [11]:
# download_data(source_url="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip", 
#               destination_path="pizza_steak_sushi", 
#               remove_zipfile=True)

In [12]:
image_path = Path('dataset/pizza_steak_sushi')

In [13]:
image_path

WindowsPath('dataset/pizza_steak_sushi')

### 2. Create Dataset and Dataloaders

We can create transforms manually or get them automatically from the pretrained weights (the newer approach in torchvision).

Here I will create the transforms automatically!

* The goal with transforms is to ensure your custom data is formatted in a reproducible way and in the same way as the data the pretrained model was trained on.

In [14]:
# Setup directories 
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(WindowsPath('dataset/pizza_steak_sushi/train'),
 WindowsPath('dataset/pizza_steak_sushi/test'))

In [15]:
# Creating a data loader using Automatic transform

# Setup pretrained weights (plenty of these weights are available in torchvision)
import torchvision

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available weights

# Get the preprocessing transforms bundled with the weights (the ones the pretrained model expects at inference)
efficientnet_b0_transforms = weights.transforms()

In [16]:
efficientnet_b0_transforms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

In [17]:
from going_modular.going_modular import data_setup

# create Dataloaders
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                  test_dir=test_dir,
                                                                  transform=efficientnet_b0_transforms,
                                                                  batch_size=32)

train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x1bb4157a920>,
 <torch.utils.data.dataloader.DataLoader at 0x1bb4157b880>,
 ['pizza', 'steak', 'sushi'])

### 3. Get a pretrained model, freeze the base layers and add a classifier head (for our task)

In [18]:
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# Download the pretrained weights for EfficientNetB0
effnet_b0_weights = EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available weights

# Create the model and send it to device (cuda)
model = efficientnet_b0(weights=effnet_b0_weights).to(device)

In [19]:
model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [20]:
# base layers (feature extractor)
model.features

Sequential(
  (0): Conv2dNormActivation(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): SiLU(inplace=True)
  )
  (1): Sequential(
    (0): MBConv(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): SiLU(inplace=True)
        )
        (1): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
          (activation): SiLU(inplace=True)
          (scale_activation): Sigmoid()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), 

In [21]:
# average pooling layer
model.avgpool

AdaptiveAvgPool2d(output_size=1)

In [22]:
# classifier layer 
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [23]:
from torchinfo import summary

summary(
    model=model,
    input_size=(1, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"]
)

Layer (type:depth-idx)                                  Input Shape               Output Shape              Param #                   Trainable
EfficientNet                                            [1, 3, 224, 224]          [1, 1000]                 --                        True
├─Sequential: 1-1                                       [1, 3, 224, 224]          [1, 1280, 7, 7]           --                        True
│    └─Conv2dNormActivation: 2-1                        [1, 3, 224, 224]          [1, 32, 112, 112]         --                        True
│    │    └─Conv2d: 3-1                                 [1, 3, 224, 224]          [1, 32, 112, 112]         864                       True
│    │    └─BatchNorm2d: 3-2                            [1, 32, 112, 112]         [1, 32, 112, 112]         64                        True
│    │    └─SiLU: 3-3                                   [1, 32, 112, 112]         [1, 32, 112, 112]         --                        --
│    └─Sequential: 2-2  

In [24]:
# Freeze the base layers (feature extractor) by setting their requires_grad attribute to False
for param in model.features.parameters():
    param.requires_grad = False
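
To see what freezing does without loading EfficientNet, here is a minimal sketch on a tiny stand-in model. `ToyNet` and `count_parameters()` are hypothetical names used only for illustration; the `features`/`classifier` attribute names mirror the ones on the real model.

```python
from torch import nn


class ToyNet(nn.Module):
    """Tiny stand-in with the same attribute names as EfficientNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(4, 8)    # plays the role of the backbone
        self.classifier = nn.Linear(8, 3)  # plays the role of the classifier head

    def forward(self, x):
        return self.classifier(self.features(x))


def count_parameters(model: nn.Module):
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


toy = ToyNet()

# Freeze the backbone exactly as in the cell above
for param in toy.features.parameters():
    param.requires_grad = False

total, trainable = count_parameters(toy)
print(f"total: {total} | trainable: {trainable}")
```

Only the classifier's parameters remain trainable, which is the same effect the `Trainable` column of `summary()` reports for the real model.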

In [25]:
summary(
    model=model,
    input_size=(1, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"]
)

Layer (type:depth-idx)                                  Input Shape               Output Shape              Param #                   Trainable
EfficientNet                                            [1, 3, 224, 224]          [1, 1000]                 --                        Partial
├─Sequential: 1-1                                       [1, 3, 224, 224]          [1, 1280, 7, 7]           --                        False
│    └─Conv2dNormActivation: 2-1                        [1, 3, 224, 224]          [1, 32, 112, 112]         --                        False
│    │    └─Conv2d: 3-1                                 [1, 3, 224, 224]          [1, 32, 112, 112]         (864)                     False
│    │    └─BatchNorm2d: 3-2                            [1, 32, 112, 112]         [1, 32, 112, 112]         (64)                      False
│    │    └─SiLU: 3-3                                   [1, 32, 112, 112]         [1, 32, 112, 112]         --                        --
│    └─Sequential

In [26]:
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [27]:
len(class_names)

3

In [28]:
# Added a new classifier layer to the model according to our task 
# our task is to classify between pizza, steak and sushi (3 classes)

# to maintain reproducibility, we need to set the seeds 
set_seeds(42)

model.classifier = nn.Sequential(
        nn.Dropout(p=0.2, inplace=True),
        nn.Linear(in_features=1280, out_features=len(class_names), bias=True)
).to(device)

In [29]:
summary(
    model=model,
    input_size=(32, 3, 224, 224), # [batch_size, color_channels, height, width]
    col_names=["input_size", "output_size", "num_params", "trainable"],
    verbose=0,
    col_width=20,
    row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

Now we only have 3,843 trainable parameters: the new classifier head's 1280 × 3 weights plus 3 biases.
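
The number of trainable parameters can be checked by hand, since only the new linear layer is left unfrozen. A small sketch, assuming a hypothetical helper `linear_param_count()`:

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable parameters in a fully connected layer."""
    # One weight per (input, output) pair, plus one bias per output if enabled
    return in_features * out_features + (out_features if bias else 0)


print(linear_param_count(1280, 3))     # new 3-class head
print(linear_param_count(1280, 1000))  # original 1000-class ImageNet head
```

Swapping the 1000-class head for a 3-class head is what shrinks the trainable part of the model so much.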

### 4. Train a single model and track results

In [30]:
# Define loss function and optimizer 
loss_fn = nn.CrossEntropyLoss() 
optimizer = torch.optim.Adam(params=model.parameters(), 
                             lr=0.001)

To track experiments, we're going to use TensorBoard.

In [31]:
# Setup TensorBoard for logging training results
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

In [32]:
from going_modular.going_modular.engine import train_step, test_step

from tqdm.auto import tqdm
from typing import Dict, List, Tuple

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch models through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]} 
    For example if training for epochs=2: 
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }
    
    # Make sure model on target device
    model.to(device)

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        # model evaluation
        test_loss, test_acc = test_step(model=model,
          dataloader=test_dataloader,
          loss_fn=loss_fn,
          device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ## New: Experiment Tracking (uses the global `writer` created above)
        writer.add_scalars(main_tag="Loss", 
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)

        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

    # Log the model graph once (it doesn't change between epochs)
    writer.add_graph(model=model, 
                     input_to_model=torch.randn(32, 3, 224, 224).to(device))

    # Close the writer after the last epoch so pending events are flushed
    writer.close()

    # Return the filled results at the end of the epochs
    return results

In [33]:
# Train model 

# setting random seed
set_seeds()

# # saving results
# results = train(model=model,
#       train_dataloader=train_dataloader,
#       test_dataloader=test_dataloader,
#       optimizer=optimizer,
#       loss_fn=loss_fn,
#       epochs=5,
#       device=device)

In [34]:
from torch.utils.tensorboard import SummaryWriter
 

### Launching TensorBoard in the notebook

In [35]:
# %load_ext tensorboard

# %tensorboard --logdir runs

### 6. Create a function to prepare a `SummaryWriter()` instance 

By default, `SummaryWriter()` saves to the directory given by its `log_dir` parameter (`runs/CURRENT_DATETIME_HOSTNAME` if none is passed).

What if we wanted to save different experiments to different folders?

In essence, one experiment = one folder.

For example, we'd like to track:

* Experiment date/timestamp
* Experiment name
* Model name
* Extra: anything else that should be tracked

Let's create a function that builds a `SummaryWriter()` instance taking all of these into account.

Ideally, we end up tracking experiments to a directory like:

`runs/YYYY-MM-DD/experiment_name/model_name/extra`


In [36]:
from datetime import datetime

datetime.now()

datetime.now().strftime('%Y-%m-%d')

'2024-05-09'

In [37]:
from datetime import datetime
from typing import Optional
from torch.utils.tensorboard import SummaryWriter
import os


def create_writer(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None) -> SummaryWriter:
    """Creates a torch.utils.tensorboard.writer.SummaryWriter instance tracking to a specific directory.

    The directory has the form runs/YYYY-MM-DD/experiment_name/model_name[/extra].
    """

    # Get the current date in YYYY-MM-DD format (year first, so folders sort chronologically)
    time_stamp = datetime.now().strftime("%Y-%m-%d")

    # Build the log directory path, appending `extra` only if it was given
    if extra: 
        log_dir = os.path.join("runs", time_stamp, experiment_name, model_name, extra)
    else: 
        log_dir = os.path.join("runs", time_stamp, experiment_name, model_name)

    print(f"[INFO] Created SummaryWriter saving to {log_dir}")
    return SummaryWriter(log_dir=log_dir)
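
Because the timestamp only has day resolution, two runs on the same day with the same names end up in the same folder. The sketch below isolates the path-building logic and adds an optional time component; `build_log_dir()`, `include_time` and `now` are hypothetical names, and forward slashes are used regardless of operating system to keep the output predictable.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def build_log_dir(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None,
                  now: Optional[datetime] = None,
                  include_time: bool = False) -> str:
    """Build runs/<timestamp>/<experiment_name>/<model_name>[/<extra>] (hypothetical helper)."""
    now = now or datetime.now()
    # Day resolution by default; optionally add the time so repeated runs don't collide
    fmt = "%Y-%m-%d_%H-%M-%S" if include_time else "%Y-%m-%d"
    parts = ["runs", now.strftime(fmt), experiment_name, model_name]
    if extra:
        parts.append(extra)
    return str(PurePosixPath(*parts))


fixed_time = datetime(2024, 5, 9, 20, 10, 15)
print(build_log_dir("data_10_percent", "efficientnet_b0", "5_epochs", now=fixed_time))
print(build_log_dir("data_10_percent", "efficientnet_b0", now=fixed_time, include_time=True))
```

Passing `now` explicitly keeps the function easy to test; in a notebook you would normally leave it out and let it default to the current time.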

In [38]:
example_writer = create_writer(
    experiment_name="data_10_percent",
    model_name="efficientnet_b0",
    extra="5_epochs"
)

example_writer

[INFO] Created SummaryWriter saving to runs\2024-05-09\data_10_percent\efficientnet_b0\5_epochs


<torch.utils.tensorboard.writer.SummaryWriter at 0x1bba3e3bcd0>

#### 6.1 Update `train()` to include a `writer` parameter to track our experiments

In [39]:
from going_modular.going_modular.engine import train_step, test_step
from torch.utils.tensorboard import SummaryWriter
from tqdm.auto import tqdm
from typing import Dict, List, Optional, Tuple

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: Optional[SummaryWriter] = None) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch models through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").
    writer: An optional SummaryWriter instance to log results to. If None, nothing is logged.

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]} 
    For example if training for epochs=2: 
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }
    
    # Make sure model on target device
    model.to(device)

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        # model evaluation
        test_loss, test_acc = test_step(model=model,
          dataloader=test_dataloader,
          loss_fn=loss_fn,
          device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        if writer:
            ## Experiment Tracking
            writer.add_scalars(main_tag="Loss", 
                               tag_scalar_dict={"train_loss": train_loss,
                                                "test_loss": test_loss},
                               global_step=epoch)

            writer.add_scalars(main_tag="Accuracy",
                               tag_scalar_dict={"train_acc": train_acc,
                                                "test_acc": test_acc},
                               global_step=epoch)

    if writer:
        # Log the model graph once (it doesn't change between epochs)
        writer.add_graph(model=model, 
                         input_to_model=torch.randn(32, 3, 224, 224).to(device))

        # Close the writer after the last epoch so pending events are flushed
        writer.close()

    # Return the filled results at the end of the epochs
    return results

In [40]:
# creating a writer 
trail_1_writer = create_writer(
    experiment_name="trail_1", 
    model_name="effinet_b0",
    extra="3_epochs"
)

[INFO] Created SummaryWriter saving to runs\2024-05-09\trail_1\effinet_b0\3_epochs


In [41]:
train_results = train(model=model,
                     train_dataloader=train_dataloader,
                     test_dataloader=test_dataloader,
                     optimizer=optimizer,
                     loss_fn=loss_fn,
                     epochs=3,
                     device=device,
                     writer=trail_1_writer)

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0916 | train_acc: 0.3828 | test_loss: 0.9098 | test_acc: 0.5909
Epoch: 2 | train_loss: 0.8992 | train_acc: 0.6445 | test_loss: 0.7881 | test_acc: 0.8561
Epoch: 3 | train_loss: 0.8069 | train_acc: 0.7422 | test_loss: 0.6774 | test_acc: 0.8864
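
Besides TensorBoard, the dictionary returned by `train()` can be inspected directly. A minimal sketch, using the values printed above and a hypothetical helper `best_epoch()`:

```python
# Values copied from the training log above
results = {
    "train_loss": [1.0916, 0.8992, 0.8069],
    "train_acc": [0.3828, 0.6445, 0.7422],
    "test_loss": [0.9098, 0.7881, 0.6774],
    "test_acc": [0.5909, 0.8561, 0.8864],
}


def best_epoch(results, metric="test_acc", higher_is_better=True):
    """Return (1-based epoch, value) of the best entry for a metric (hypothetical helper)."""
    values = results[metric]
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


print(best_epoch(results))                                      # best test accuracy
print(best_epoch(results, "test_loss", higher_is_better=False)) # lowest test loss
```

Here both metrics are still improving at the last epoch, which hints that training for longer is one of the experiments worth trying.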


### 7. Setting up a series of modelling experiments

The number of machine learning experiments you can run is, like the number of different models you can build, almost limitless.

However, you can't test everything...

So what should you test?

* Change the number of epochs
* Change the number of hidden layers/units
* Change the amount of data (right now we're using 10% of the Food101 dataset for pizza, steak, sushi)
* Change the learning rate
* Try different kinds of data augmentation
* Choose a different model architecture


This is why transfer learning is so powerful: it gives you a working model that you can apply to your own problem.

#### 7.2 What experiments are we going to run?

We're going to turn three dials: 

1. **Model Size**: EffNetB0 vs EffNetB2 (in terms of number of parameters).
2. **Dataset Size**: 10% of the dataset vs 20% of the dataset (generally more data = better results). 
3. **Training Time**: 5 epochs vs 10 epochs (generally longer training time = better results, up to a point).

To begin, we're still keeping things relatively small so that our experiments run quickly.

**Our goal:** a model that is well performing but still small enough to run on a mobile device or web browser, so FoodVision Mini can come to life.

If you had infinite compute + time, you should basically always choose the biggest model and biggest dataset you can. See: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Experiments that we are going to perform: 

| Experiment number | Training dataset        | Model (pretrained on ImageNet) | Number of epochs |
| :---------------- | :---------------------- | :----------------------------- | :--------------- |
| 1                 | Pizza, Steak, Sushi 10% | EfficientNetB0                 | 5                |
| 2                 | Pizza, Steak, Sushi 10% | EfficientNetB2                 | 5                |
| 3                 | Pizza, Steak, Sushi 10% | EfficientNetB0                 | 10               |
| 4                 | Pizza, Steak, Sushi 10% | EfficientNetB2                 | 10               |
| 5                 | Pizza, Steak, Sushi 20% | EfficientNetB0                 | 5                |
| 6                 | Pizza, Steak, Sushi 20% | EfficientNetB2                 | 5                |
| 7                 | Pizza, Steak, Sushi 20% | EfficientNetB0                 | 10               |
| 8                 | Pizza, Steak, Sushi 20% | EfficientNetB2                 | 10               |
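
The eight experiments are simply every combination of the three dials, so the grid can be generated rather than typed out. A minimal sketch, assuming plain string labels for the datasets and models (the variable names are illustrative):

```python
from itertools import product

dataset_sizes = ["10%", "20%"]
epoch_counts = [5, 10]
model_names = ["EfficientNetB0", "EfficientNetB2"]

# product() varies the last iterable fastest, which reproduces the order of the table
experiments = [
    {"number": number, "dataset": dataset, "model": model, "epochs": epochs}
    for number, (dataset, epochs, model) in enumerate(
        product(dataset_sizes, epoch_counts, model_names), start=1
    )
]

for experiment in experiments:
    print(experiment)
```

Adding a value to any of the three lists grows the grid multiplicatively, which is a good reminder of why we keep each dial to two settings for now.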


#### 7.3 Download different datasets (10% vs 20%)

We want two datasets: 
1. Pizza-Steak-Sushi (10%)
2. Pizza-Steak-Sushi (20%)

In [42]:
# Download 10 percent and 20 percent data
data_10_percent_path = download_data(source_url="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip", 
                                     destination_path="pizza_steak_sushi_10_percent")

data_20_percent_path = download_data(source_url="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip",
                                     destination_path="pizza_steak_sushi_20_percent")

[INFO] dataset\pizza_steak_sushi_10_percent directory already exists....
[INFO] Downloading dataset from https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip....
[INFO] Extracting dataset to dataset\pizza_steak_sushi_10_percent....
[INFO] Removing the dataset zip file....
[INFO] dataset\pizza_steak_sushi_20_percent directory already exists....
[INFO] Downloading dataset from https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip....
[INFO] Extracting dataset to dataset\pizza_steak_sushi_20_percent....
[INFO] Removing the dataset zip file....


#### 7.4 Transform Dataset and Create Dataloaders

We need to transform our data in a few ways: 

1. Resize the images (the EffNetB0 transforms resize to 256 and center-crop to 224 × 224).
2. Make sure image tensor values are in the range [0, 1].
3. Normalize the images so they have the same data distribution as ImageNet.
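
The normalization step is just per-channel arithmetic: subtract the channel mean and divide by the channel standard deviation. A minimal pure-Python sketch on a single pixel, using the mean and std values reported by the EfficientNet transforms (`normalize_pixel()` is a hypothetical helper for illustration; the real transforms operate on whole image tensors):

```python
# ImageNet channel statistics, as reported by the EfficientNet transforms
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]


print(normalize_pixel([0.485, 0.456, 0.406]))  # a pixel equal to the mean maps to zeros
print(normalize_pixel([1.0, 1.0, 1.0]))        # a white pixel maps to positive values
```

After this step the values are no longer confined to [0, 1], which is expected: the model was trained on inputs centered around zero.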

In [43]:
# Setup training directories

train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup testing directories (only one is enough)
test_dir = data_10_percent_path / "test"

train_dir_10_percent, train_dir_20_percent, test_dir

(WindowsPath('dataset/pizza_steak_sushi_10_percent/train'),
 WindowsPath('dataset/pizza_steak_sushi_20_percent/train'),
 WindowsPath('dataset/pizza_steak_sushi_10_percent/test'))

In [44]:
# extracting transform from EfficientNetB0
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

EffNetB0_transfroms = weights.transforms()

EffNetB0_transfroms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

In [45]:
# extracting transform from EfficientNetB2 
weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT

EffNetB2_transfroms = weights.transforms()

EffNetB2_transfroms

ImageClassification(
    crop_size=[288]
    resize_size=[288]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

In [46]:
# Creating dataloaders for 10 percent and 20 percent data

BATCH_SIZE = 32

# creating dataloaders for 10 percent data
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=EffNetB0_transfroms,
                                                                                          batch_size=BATCH_SIZE)


# creating dataloaders for 20 percent data (same test set, so test_dataloader is simply recreated)
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=EffNetB0_transfroms,
                                                                                          batch_size=BATCH_SIZE)


print(f'Number of batches of size {BATCH_SIZE} in 10% train dataset: {len(train_dataloader_10_percent)}')
print(f'Number of batches of size {BATCH_SIZE} in 20% train dataset: {len(train_dataloader_20_percent)}')
print(f'Number of batches of size {BATCH_SIZE} in test dataset: {len(test_dataloader)}')
print(f'Number of classes: {len(class_names)}')
train_dataloader_10_percent, train_dataloader_20_percent, test_dataloader, class_names

Number of batches of size 32 in 10% train dataset: 8
Number of batches of size 32 in 20% train dataset: 15
Number of batches of size 32 in test dataset: 3
Number of classes: 3


(<torch.utils.data.dataloader.DataLoader at 0x1bbe484cd90>,
 <torch.utils.data.dataloader.DataLoader at 0x1bbe484e110>,
 <torch.utils.data.dataloader.DataLoader at 0x1bbe484fd30>,
 ['pizza', 'steak', 'sushi'])
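
The batch counts printed above follow from the dataset size and the batch size: with the default `drop_last=False`, a `DataLoader` yields `ceil(num_samples / batch_size)` batches. The sketch below reproduces that arithmetic. The dataset sizes in it are assumptions chosen to be consistent with the printed counts, not values read from the data folders.

```python
import math


def expected_num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for a dataset of num_samples items."""
    if drop_last:
        # The final, incomplete batch is discarded
        return num_samples // batch_size
    # The final, incomplete batch is kept
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 32

# Hypothetical dataset sizes (assumption), consistent with the counts printed above
assumed_sizes = {"10% train": 225, "20% train": 450, "test": 75}

batch_counts = {name: expected_num_batches(n, BATCH_SIZE)
                for name, n in assumed_sizes.items()}
print(batch_counts)
```

Doubling the training data roughly doubles the number of batches per epoch, so each epoch on the 20% split should take about twice as long as on the 10% split.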

### 7.5 Create Feature Extractor Models

We want two functions:

1. One that creates a `torchvision.models.efficientnet_b0()` feature extractor with a frozen backbone (base layers) and a custom classifier head (EffNetB0).
2. One that creates a `torchvision.models.efficientnet_b2()` feature extractor with a frozen backbone (base layers) and a custom classifier head (EffNetB2).
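
Both functions follow the same pattern: freeze every existing parameter, then attach a fresh classifier head whose parameters are trainable by default. Here is a minimal sketch of that pattern on a tiny stand-in model, so it runs without downloading any weights. `TinyBackboneModel` and `make_feature_extractor` are hypothetical names used only for illustration.

```python
import torch
from torch import nn


class TinyBackboneModel(nn.Module):
    """Stand-in for a pretrained model with `features` and `classifier` parts."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, 1000)

    def forward(self, x):
        return self.classifier(self.features(x))


def make_feature_extractor(model, in_features, num_classes):
    # Freeze everything that exists so far
    for param in model.parameters():
        param.requires_grad = False
    # A newly created head has requires_grad=True by default
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(in_features=in_features, out_features=num_classes),
    )
    return model


model = make_feature_extractor(TinyBackboneModel(), in_features=8, num_classes=3)
output = model(torch.randn(4, 3, 32, 32))
print(output.shape)
```

Only the new head receives gradient updates during training, which is what makes the model a feature extractor rather than a fully fine-tuned network.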

In [48]:
torchvision.__version__

'0.18.0+cu118'

In [50]:
import torchvision 

# extract the EfficientNetB0 weights
effnet_b0_weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available weights
# create an EfficientNetB0 model
effnet_b0 = torchvision.models.efficientnet_b0(weights=effnet_b0_weights)

# extract the EfficientNetB2 weights
effnet_b2_weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT # "DEFAULT" = best available weights

# create an EfficientNetB2 model
effnet_b2 = torchvision.models.efficientnet_b2(weights=effnet_b2_weights)

Downloading: "https://download.pytorch.org/models/efficientnet_b2_rwightman-c35c1473.pth" to C:\Users\karthik.kolluri/.cache\torch\hub\checkpoints\efficientnet_b2_rwightman-c35c1473.pth
100%|██████████| 35.2M/35.2M [00:13<00:00, 2.70MB/s]


Inspecting EfficientNet B0 model architecture

In [51]:
effnet_b0

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [53]:
from torchinfo import summary

summary(model=effnet_b0,
        input_size=(32, 3, 224, 224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=16,
        row_settings=["var_names"],
        verbose=0)

Layer (type (var_name))                                      Input Shape      Output Shape     Param #          Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224] [32, 1000]       --               True
├─Sequential (features)                                      [32, 3, 224, 224] [32, 1280, 7, 7] --               True
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224] [32, 32, 112, 112] --               True
│    │    └─Conv2d (0)                                       [32, 3, 224, 224] [32, 32, 112, 112] 864              True
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112] [32, 32, 112, 112] 64               True
│    │    └─SiLU (2)                                         [32, 32, 112, 112] [32, 32, 112, 112] --               --
│    └─Sequential (1)                                        [32, 32, 112, 112] [32, 16, 112, 112] --               True
│    │    └─MBConv (0)                   

Inspecting EfficientNet B2 Model

In [52]:
effnet_b2

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [54]:
summary(model=effnet_b2,
        input_size=(32, 3, 224, 224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=16,
        row_settings=["var_names"],
        verbose=0)

Layer (type (var_name))                                      Input Shape      Output Shape     Param #          Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224] [32, 1000]       --               True
├─Sequential (features)                                      [32, 3, 224, 224] [32, 1408, 7, 7] --               True
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224] [32, 32, 112, 112] --               True
│    │    └─Conv2d (0)                                       [32, 3, 224, 224] [32, 32, 112, 112] 864              True
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112] [32, 32, 112, 112] 64               True
│    │    └─SiLU (2)                                         [32, 32, 112, 112] [32, 32, 112, 112] --               --
│    └─Sequential (1)                                        [32, 32, 112, 112] [32, 16, 112, 112] --               True
│    │    └─MBConv (0)                   

In [60]:
import torchvision
from torch import nn

OUT_FEATURES = len(class_names)

# Creates an EffNetB0 feature extractor
def create_effnetb0() -> nn.Module:
    """
    Creates an EfficientNetB0 model with a custom classifier head.
    Freezes the Base layer (Feature extractor) and changes classifier head.
    """
    # Get the weights for the model
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "DEFAULT" = best available weights
    # Create the model
    model = torchvision.models.efficientnet_b0(weights=weights).to(device=device)

    # Freeze the base layers (feature extractor) by setting their requires_grad attribute to False
    for param in model.parameters():
        param.requires_grad = False

    # Change Classifier head 
    set_seeds()
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2, inplace=True),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES, bias=True)
    ).to(device=device)

    # Give the model a name
    model.name = "effnet-b0"
    print(f'[INFO] Created {model.name} model...!')
    return model


# Creates an EffNetB2 model (frozen feature extractor and custom classifier)
def create_effnetb2() -> nn.Module: 
    """
    Creates an EfficientNetB2 model with a custom classifier head.
    Freezes the Base layer (Feature extractor) and changes classifier head.
    """
    # Get the weights for the model 
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT # "DEFAULT" = best available weights
    # Create the model
    model = torchvision.models.efficientnet_b2(weights=weights).to(device=device)

    # Freeze the base layers (feature extractor) by setting their requires_grad attribute to False
    for param in model.parameters():
        param.requires_grad = False

    # Change Classifier head (seed first and move to the target device, as in create_effnetb0)
    set_seeds()
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3, inplace=True),
        nn.Linear(in_features=1408, out_features=OUT_FEATURES, bias=True)
    ).to(device=device)

    # Give the model a name
    model.name = "effnet-b2"
    print(f'[INFO] Created {model.name} model...!')
    return model


In [57]:
created_model_test = create_effnetb0()

created_model_test

[INFO] Created effnet-b0 model...!


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [58]:
summary(model=created_model_test,
        input_size=(32, 3, 224, 224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=16,
        row_settings=["var_names"],
        verbose=0)

Layer (type (var_name))                                      Input Shape      Output Shape     Param #          Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224] [32, 3]          --               Partial
├─Sequential (features)                                      [32, 3, 224, 224] [32, 1280, 7, 7] --               False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224] [32, 32, 112, 112] --               False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224] [32, 32, 112, 112] (864)            False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112] [32, 32, 112, 112] (64)             False
│    │    └─SiLU (2)                                         [32, 32, 112, 112] [32, 32, 112, 112] --               --
│    └─Sequential (1)                                        [32, 32, 112, 112] [32, 16, 112, 112] --               False
│    │    └─MBConv (0)           
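
The `Partial` entry in the Trainable column means only some parameters still require gradients. A quick way to confirm this without reading the whole summary is to count trainable and total parameters directly. The sketch below uses a small stand-in model rather than the real EfficientNet, and `count_parameters` is a hypothetical helper for illustration.

```python
from torch import nn


def count_parameters(model: nn.Module):
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Stand-in: a frozen "backbone" layer followed by a trainable 1280 -> 3 head
stand_in = nn.Sequential(
    nn.Linear(10, 1280),  # plays the role of the frozen base layers
    nn.Linear(1280, 3),   # same shape as the EffNetB0 classifier head above
)
for param in stand_in[0].parameters():
    param.requires_grad = False

trainable, total = count_parameters(stand_in)
print(f"Trainable: {trainable} | Total: {total}")
```

Here the trainable count is 1280 * 3 weights plus 3 biases, which is the size of a head with 1280 input features and 3 output classes.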