# 07 PyTorch Experiment Tracking

Machine Learning is very experimental. 

That's where **experiment tracking** comes in: it helps you figure out what doesn't work so you can focus on what **does**, and decide which experiments are worth pursuing.

In this notebook, we're going to see an example of programmatically tracking experiments.

Resources:
* Book version of notebook: https://www.learnpytorch.io/07_pytorch_experiment_tracking/
* Ask a question: https://github.com/mrdbourke/pytorch-deep-learning/discussions
* Extra-curriculum: https://madewithml.com/courses/mlops/experiment-tracking/

In [1]:
import torch
import torchvision

print(torch.__version__)
print(torchvision.__version__)

2.0.1+cpu
0.15.2+cpu


In [2]:
# # For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
# try:
#     import torch
#     import torchvision
#     assert int(torch.__version__.split(".")[1]) >= 12, "torch version should be 1.12+"
#     assert int(torchvision.__version__.split(".")[1]) >= 13, "torchvision version should be 0.13+"
#     print(f"torch version: {torch.__version__}")
#     print(f"torchvision version: {torchvision.__version__}")
# except:
#     print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
#     !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
#     import torch
#     import torchvision
#     print(f"torch version: {torch.__version__}")
#     print(f"torchvision version: {torchvision.__version__}")

In [3]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

In [4]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [5]:
# Set seeds
def set_seeds(seed: int=42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

In [6]:
set_seeds()
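As a quick check that seeding behaves as expected, here is a minimal sketch (an illustration, not part of the original notebook). It redefines the same 'set_seeds()' helper so it runs on its own, then confirms that two random draws made after the same seed are identical.

```python
import torch


def set_seeds(seed: int = 42) -> None:
    """Seeds torch's CPU and CUDA random number generators."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)  # silently ignored when CUDA is unavailable


set_seeds(42)
first_draw = torch.rand(3)

set_seeds(42)
second_draw = torch.rand(3)

# Same seed, same sequence of random numbers
draws_match = torch.equal(first_draw, second_draw)
print(f"Draws match: {draws_match}")
```

Seeding before creating a classifier head or starting a training run is what makes separate experiments comparable.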

## 1. Get data

We want pizza, steak and sushi images so we can run experiments building FoodVision Mini and see which model performs best.

In [7]:
import os
import zipfile

from pathlib import Path

import requests

def download_data(source: str,
                  destination: str,
                  remove_source: bool = True) -> Path:
    """
    Downloads a zipped dataset from source and unzips to destination.
    """
    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist, create it and download the data
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory already exists, skipping download.")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one..")
        image_path.mkdir(parents=True, exist_ok=True)

        # Download the target data
        target_file = Path(source).name
        with open(data_path / target_file, "wb") as f:
            print(f"[INFO] Downloading {target_file} from {source}...")
            # Use the requests library (the original line referenced an undefined name)
            response = requests.get(source)
            f.write(response.content)

        # Unzip target file
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data..")
            zip_ref.extractall(image_path)

        # Remove .zip file if needed
        if remove_source:
            os.remove(data_path / target_file)

    return image_path

In [8]:
image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination = "pizza_steak_sushi")
image_path

[INFO] data\pizza_steak_sushi directory already exists, skipping download.


WindowsPath('data/pizza_steak_sushi')
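'download_data()' names the local zip file after the last component of the source URL. The sketch below shows that step in isolation. It uses 'urllib.parse.urlparse' and 'PurePosixPath' rather than the 'Path(source).name' call in the function above, which is an assumption made here so the result is the same on every operating system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

source = "https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip"

# Take the last component of the URL path as the local file name
target_file = PurePosixPath(urlparse(source).path).name

# Drop the ".zip" suffix to get a folder-style name
folder_name = PurePosixPath(target_file).stem

print(target_file)
print(folder_name)
```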

## 2. Create datasets and dataloaders

### 2.1 Create dataloaders with manual transforms

The goal of transforms is to ensure your custom data is formatted reproducibly and in a way that suits pretrained models.

In [9]:
# Setup directories
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(WindowsPath('data/pizza_steak_sushi/train'),
 WindowsPath('data/pizza_steak_sushi/test'))

In [10]:
# Setup ImageNet normalization levels
# See here: https://pytorch.org/vision/0.12/models.html
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Create transform pipeline manually
from torchvision import transforms
manual_transforms = transforms.Compose([ 
                                       transforms.Resize((224,224)),
                                       transforms.ToTensor(),
                                       normalize])

print(f"Manually created transforms: {manual_transforms}")

# Create DataLoaders
from going_modular.going_modular import data_setup
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir= train_dir,
                                                                  test_dir= test_dir,
                                                                  transform= manual_transforms,
                                                                  batch_size= 32)
train_dataloader, test_dataloader, class_names

Manually created transforms: Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


(<torch.utils.data.dataloader.DataLoader at 0x208c9201db0>,
 <torch.utils.data.dataloader.DataLoader at 0x208c9201d50>,
 ['pizza', 'steak', 'sushi'])
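To make the normalization step concrete, here is a small sketch of the arithmetic behind 'transforms.Normalize()': each channel has its mean subtracted and is then divided by its standard deviation. The dummy image (every pixel set to 0.5) is a made-up input for illustration only.

```python
import torch

# ImageNet channel statistics, reshaped to broadcast over [C, H, W] images
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# A dummy 2x2 RGB image whose values are all 0.5 (as if already scaled to [0, 1])
image = torch.full((3, 2, 2), 0.5)

# Per-channel normalization: (value - mean) / std
normalized = (image - mean) / std
print(normalized[:, 0, 0])
```

Each channel ends up with a different value because each channel has its own mean and standard deviation.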

### 2.2 Create DataLoaders using automatically created transforms

The same principle applies for automatic transforms: we want our custom data in the same format our pretrained model was trained on.

https://pytorch.org/vision/main/models.html#using-the-pre-trained-models

In [11]:
# Setup dirs 
train_dir = image_path / "train"
test_dir = image_path / "test"

# Setup pretrained weights (plenty of these weights are available in torchvision v0.13+)
import torchvision 
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # "Default" = best available

# Get transforms from weights (these are the transforms that were used to obtain this particular set of weights)
automatic_transforms = weights.transforms()
print(f"Automatically created transforms: {automatic_transforms}")

# Create dataloaders
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir= train_dir,
                                                                               test_dir= test_dir,
                                                                               transform= automatic_transforms,
                                                                               batch_size= 32)
train_dataloader, test_dataloader, class_names

Automatically created transforms: ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)


(<torch.utils.data.dataloader.DataLoader at 0x208c9203ee0>,
 <torch.utils.data.dataloader.DataLoader at 0x208c9201660>,
 ['pizza', 'steak', 'sushi'])

## 3. Get a pretrained model, freeze the base layers and change the classifier head

In [12]:
# Note: This is how a pretrained model would be created prior to torchvision v0.13
# model = torchvision.models.efficientnet_b0(pretrained= True).to(device)
# model

# New method
# Download the pretrained weights for EfficientNet_B0
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

# Setup the model with the pretrained weights and send it to the target device
model = torchvision.models.efficientnet_b0(weights= weights).to(device)
model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [13]:
# Freeze all base layers by setting their requires_grad attribute to False
for param in model.features.parameters():
    param.requires_grad = False
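To see what freezing does to the parameter counts, here is a minimal sketch on a toy model. The toy 'features' and 'classifier' blocks are hypothetical stand-ins for the EfficientNet sections of the same name, used so the example runs without downloading any weights.

```python
from torch import nn

# Hypothetical toy model with the same attribute names as EfficientNet
toy_model = nn.Sequential()
toy_model.add_module("features", nn.Sequential(nn.Linear(8, 4), nn.ReLU()))
toy_model.add_module("classifier", nn.Linear(4, 3))

# Freeze the base layers, exactly as done for the real model
for param in toy_model.features.parameters():
    param.requires_grad = False

# Count parameters that will and won't receive gradient updates
trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in toy_model.parameters() if not p.requires_grad)
print(f"Trainable: {trainable} | Frozen: {frozen}")
```

Only the classifier's parameters remain trainable, which is the behaviour we want from a feature extractor.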

In [14]:
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [15]:
# Adjust the classifier head
model.classifier = nn.Sequential( 
                                 nn.Dropout(p = 0.2, inplace= True),
                                 nn.Linear(in_features= 1280, out_features= len(class_names))).to(device)

In [16]:
model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [17]:
from torchinfo import summary

summary(model, 
        input_size = (32,3,224,224),
        verbose = 0,
        col_names = ["input_size","output_size","num_params","trainable"],
        col_width= 20,
        row_settings= ["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

## 4. Train a single model and track results

In [18]:
# Define a loss function and an optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr= 0.001)

To track experiments, we're going to use TensorBoard: https://www.tensorflow.org/tensorboard/

And to interact with TensorBoard, we can use PyTorch's SummaryWriter - https://pytorch.org/docs/stable/tensorboard.html
* Also see here: https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter
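Before wiring a writer into the training loop, it can help to see the smallest possible use of it. The sketch below assumes the 'tensorboard' package is installed, logs three made-up loss values to a temporary directory with 'add_scalar()', and checks that an event file was written. The tag name and values are illustrative only.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

with tempfile.TemporaryDirectory() as log_dir:
    writer = SummaryWriter(log_dir=log_dir)

    # Log one made-up loss value per step
    for step, loss in enumerate([1.0, 0.5, 0.25]):
        writer.add_scalar("Loss/train", loss, global_step=step)

    # Flush pending events and release the file handle
    writer.close()

    # TensorBoard reads the event files the writer leaves in log_dir
    event_files = [name for name in os.listdir(log_dir)
                   if name.startswith("events.out.tfevents")]

print(f"Event files written: {len(event_files)}")
```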

In [19]:
# Setup a SummaryWriter
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
writer

<torch.utils.tensorboard.writer.SummaryWriter at 0x208cb62d960>

In [20]:
from tqdm.auto import tqdm
from typing import Dict, List, Tuple

from going_modular.going_modular.engine import train_step, test_step

def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,) -> Dict[str, List[float]]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]} 
    For example if training for epochs=2: 
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)


        ### New: Experiment tracking ###
        # See SummaryWriter documentation
        writer.add_scalars(main_tag= "loss",
                           tag_scalar_dict= {"train_loss": train_loss,
                                             "test_loss": test_loss},
                           global_step= epoch)
        
        writer.add_scalars(main_tag= "Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step= epoch)
        
        writer.add_graph(model= model,
                         input_to_model= torch.randn(32,3,224,224).to(device))
        
    # close the writer
    writer.close()
    ### End new ###

    # Return the filled results at the end of the epochs
    return results

In [21]:
# Train model
# Note: not using engine.train(), since we updated the train() function above
set_seeds()
results = train(model = model,
                train_dataloader= train_dataloader,
                test_dataloader= test_dataloader,
                optimizer= optimizer,
                loss_fn= loss_fn,
                epochs=5,
                device=device)

  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0853 | train_acc: 0.4219 | test_loss: 0.8532 | test_acc: 0.7737
Epoch: 2 | train_loss: 0.9059 | train_acc: 0.6758 | test_loss: 0.8085 | test_acc: 0.7121
Epoch: 3 | train_loss: 0.7440 | train_acc: 0.7773 | test_loss: 0.6371 | test_acc: 0.9062
Epoch: 4 | train_loss: 0.6754 | train_acc: 0.7812 | test_loss: 0.6129 | test_acc: 0.8655
Epoch: 5 | train_loss: 0.6366 | train_acc: 0.8047 | test_loss: 0.6095 | test_acc: 0.8352


In [22]:
results

{'train_loss': [1.0853236243128777,
  0.9059076905250549,
  0.7440202981233597,
  0.6754261776804924,
  0.636649377644062],
 'train_acc': [0.421875, 0.67578125, 0.77734375, 0.78125, 0.8046875],
 'test_loss': [0.8531814416249593,
  0.8084683418273926,
  0.6371185580889384,
  0.6128527124722799,
  0.6095381279786428],
 'test_acc': [0.7736742424242425,
  0.712121212121212,
  0.90625,
  0.8655303030303031,
  0.8352272727272728]}

## 5. View our model's results with TensorBoard

There are a few ways to view TensorBoard results, see them here: https://www.learnpytorch.io/07_pytorch_experiment_tracking/#5-view-our-models-results-in-tensorboard

In [23]:
# Let's view our experiments from within the notebook
%load_ext tensorboard
%tensorboard --logdir runs

Reusing TensorBoard on port 6006 (pid 5336), started 7 days, 23:49:24 ago. (Use '!kill 5336' to kill it.)

## 6. Create a function to prepare a 'SummaryWriter()' instance

By default, 'SummaryWriter()' saves to the directory given by its 'log_dir' parameter, which falls back to 'runs/CURRENT_DATETIME_HOSTNAME' when unset.

How about if we wanted to save different experiments to different folders?

In essence, one experiment = one folder.

For example, we'd like to track:
* Experiment date/time
* Experiment name
* Model name
* Extra - is there anything else that should be tracked?

Let's create a function to create a 'SummaryWriter()' instance to take all these things into account.

So ideally we end up tracking experiments to a directory:

'runs/YYYY-MM-DD/experiment_name/model_name/extra'
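The directory scheme above can be sketched with the standard library alone. The helper name 'build_log_dir' and its 'now' parameter are hypothetical, introduced here only so the date can be fixed for a stable example.

```python
import os
from datetime import datetime
from typing import Optional


def build_log_dir(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None,
                  now: Optional[datetime] = None) -> str:
    """Builds a path of the form runs/YYYY-MM-DD/experiment_name/model_name[/extra]."""
    now = now or datetime.now()
    parts = ["runs", now.strftime("%Y-%m-%d"), experiment_name, model_name]
    if extra:
        parts.append(extra)
    return os.path.join(*parts)


# A fixed date keeps the example output stable
example_dir = build_log_dir("data_10_percent", "effnetb0", "5_epochs",
                            now=datetime(2023, 12, 7))
print(example_dir)
```

Putting the date first means runs sort chronologically on disk, and each experiment/model pair gets its own folder for TensorBoard to pick up.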

In [24]:
from torch.utils.tensorboard import SummaryWriter
def create_writer(experiment_name: str,
                  model_name: str,
                  extra: str = None):
    """Creates a torch.utils.tensorboard.writer.SummaryWriter() instance tracking to a specific directory"""
    from datetime import datetime
    import os
    
    # Get timestamp of current date in YYYY-MM-DD format (so folders sort chronologically)
    timestamp = datetime.now().strftime("%Y-%m-%d")
    
    if extra:
        # Create log directory path
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name, extra)
    else:
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name)
    
    print(f"[INFO] Created Summary Writer saving to {log_dir}")    
    return SummaryWriter(log_dir = log_dir)

In [25]:
example_writer = create_writer(experiment_name= "data_10_percent",
                              model_name = 'effnetb0',
                              extra="5_epochs")
example_writer

[INFO] Created Summary Writer saving to runs\2023-12-07\data_10_percent\effnetb0\5_epochs


<torch.utils.tensorboard.writer.SummaryWriter at 0x208cd872770>

### 6.1 Update the 'train()' function to include a 'writer' parameter

In [26]:
def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: torch.utils.tensorboard.writer.SummaryWriter) -> Dict[str, List[float]]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").
    writer: A SummaryWriter() instance to log model results to (pass None to skip logging).

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for 
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]} 
    For example if training for epochs=2: 
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)


        ### New: Experiment tracking ###
        # Only log results if a writer was provided
        if writer:
            # See SummaryWriter documentation
            writer.add_scalars(main_tag= "loss",
                               tag_scalar_dict= {"train_loss": train_loss,
                                                 "test_loss": test_loss},
                               global_step= epoch)

            writer.add_scalars(main_tag= "Accuracy",
                               tag_scalar_dict= {"train_acc": train_acc,
                                                 "test_acc": test_acc},
                               global_step= epoch)

            writer.add_graph(model= model,
                             input_to_model= torch.randn(32,3,224,224).to(device))

    # Close the writer once, after the final epoch
    if writer:
        writer.close()
    ### End new ###

    # Return the filled results at the end of the epochs
    return results
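The optional-writer pattern can be illustrated without PyTorch. In the sketch below, 'RecordingWriter' is a hypothetical in-memory stand-in for 'SummaryWriter()' and 'log_epochs' is a made-up helper. Together they show that logging happens only when a writer is passed in and that the writer is closed once at the end.

```python
class RecordingWriter:
    """Hypothetical in-memory stand-in for SummaryWriter (illustration only)."""

    def __init__(self):
        self.records = []
        self.closed = False

    def add_scalars(self, main_tag, tag_scalar_dict, global_step):
        self.records.append((main_tag, dict(tag_scalar_dict), global_step))

    def close(self):
        self.closed = True


def log_epochs(losses, writer=None):
    """Logs one loss value per epoch if a writer is given, otherwise does nothing."""
    for epoch, loss in enumerate(losses):
        if writer:
            writer.add_scalars(main_tag="loss",
                               tag_scalar_dict={"train_loss": loss},
                               global_step=epoch)
    # Close once, after the final epoch
    if writer:
        writer.close()


recording_writer = RecordingWriter()
log_epochs([1.08, 0.91], writer=recording_writer)
log_epochs([1.08, 0.91], writer=None)  # no writer: logging is skipped

print(recording_writer.records)
print(recording_writer.closed)
```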

## 7. Setting up a series of modelling experiments

* Setup 2x modelling experiments with effnetb0, pizza, steak, sushi data and train one model for 5 epochs and another for 10 epochs

### 7.1 What kind of experiments should you run?

The number of machine learning experiments you can run is, like the number of different models you can build, almost limitless.

However, you can't test everything.

So what should you test?
* Change the number of epochs 
* Change the number of hidden layers/units 
* Change the amount of data (right now we're using 10% of the Food101 dataset for pizza, steak, sushi)
* Change the learning rate 
* Try different kinds of data augmentation
* Choose a different model architecture 

That's why transfer learning is so powerful: it gives you a working model that you can apply to your own problem.

### 7.2 What experiments are we going to run?

We're going to turn three dials:
1. Model size - EffNetB0 vs EffNetB2 (in terms of number of parameters)
2. Dataset size - 10% of pizza, steak, sushi images vs 20% (generally more data = better results)
3. Training time - 5 epochs vs 10 epochs (generally longer training time = better results, up to a point)

To begin, we're still keeping things relatively small so that our experiments run quickly.

**Our goal:** a model that is well performing but still small enough to run on a mobile device or web browser, so FoodVision Mini can come to life.

If you had infinite compute and time, you should basically always choose the biggest model and biggest dataset you can. See Richard Sutton's essay "The Bitter Lesson".
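Turning the three dials through every combination gives 2 x 2 x 2 = 8 experiments. A minimal sketch of enumerating them with 'itertools.product' follows. The short names used for the models and datasets are labels chosen here for illustration.

```python
from itertools import product

model_names = ["effnetb0", "effnetb2"]
dataset_names = ["data_10_percent", "data_20_percent"]
epoch_options = [5, 10]

# One dictionary per combination of dataset, epoch count and model
experiments = [
    {"model": model, "data": data, "epochs": epochs}
    for data, epochs, model in product(dataset_names, epoch_options, model_names)
]

for number, experiment in enumerate(experiments, start=1):
    print(f"Experiment {number}: {experiment}")
```

Listing the combinations up front makes it easy to see how quickly the experiment count grows as dials are added.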

### 7.3 Download different datasets

We want two datasets:

1. Pizza, steak, sushi 10% - https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip
2. Pizza, steak, sushi 20% - https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip

They were created with: https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/04_custom_data_creation.ipynb

In [27]:
# Download 10 percent and 20 percent datasets
data_10_percent_path = download_data(source = 'https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip',
                                destination = "pizza_steak_sushi")

data_20_percent_path = download_data(source = 'https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip',
                                destination = "pizza_steak_sushi_20_percent")

[INFO] data\pizza_steak_sushi directory already exists, skipping download.
[INFO] data\pizza_steak_sushi_20_percent directory already exists, skipping download.


### 7.4 Transform Datasets and Create DataLoaders

We'll transform our data in a few ways:
1. Resize the images to (224,224)
2. Make sure the image values are in the range [0, 1]
3. Normalize the images so they have the same data distribution as ImageNet

In [28]:
# Setup training directory paths
train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup test directory 
test_dir = data_10_percent_path / "test" 

train_dir_10_percent, train_dir_20_percent, test_dir

(WindowsPath('data/pizza_steak_sushi/train'),
 WindowsPath('data/pizza_steak_sushi_20_percent/train'),
 WindowsPath('data/pizza_steak_sushi/test'))

In [29]:
from torchvision import transforms

# Setup ImageNet normalization levels
# See here: https://pytorch.org/vision/0.12/models.html
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Compose transforms into a pipeline
simple_transform = transforms.Compose([ 
                                       transforms.Resize((224, 224)),
                                       transforms.ToTensor(),
                                       normalize
                                       ])

In [30]:
BATCH_SIZE = 32

# Create 10% training and test DataLoaders
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir= train_dir_10_percent,
                                                                                          test_dir= test_dir,
                                                                                          transform= simple_transform,
                                                                                          batch_size= BATCH_SIZE)

# Create 20% training and test DataLoaders
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir= train_dir_20_percent,
                                                                                          test_dir= test_dir,
                                                                                          transform= simple_transform,
                                                                                          batch_size= BATCH_SIZE)

print(f"Number of batches of size {BATCH_SIZE} in 10% train data: {len(train_dataloader_10_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 20% train data: {len(train_dataloader_20_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 10% test data: {len(test_dataloader)}")
print(f"Class names: {class_names}")

Number of batches of size 32 in 10% train data: 8
Number of batches of size 32 in 20% train data: 15
Number of batches of size 32 in 10% test data: 3
Class names: ['pizza', 'steak', 'sushi']
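The batch counts above follow from rounding up the number of images divided by the batch size, assuming the DataLoaders keep the last partial batch (the 'DataLoader' default of 'drop_last=False'). The image counts in this sketch are hypothetical: they are not stated in this notebook and were chosen only to be consistent with the printed batch counts.

```python
import math

BATCH_SIZE = 32

# Hypothetical image counts, chosen to be consistent with the batch counts printed above
hypothetical_counts = {"10% train": 225, "20% train": 450, "test": 75}

# A partial final batch still counts as a batch, hence the ceiling
batches = {name: math.ceil(count / BATCH_SIZE)
           for name, count in hypothetical_counts.items()}
print(batches)
```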


### 7.5 Create feature extractor models

We want two functions:
1. One that creates a 'torchvision.models.efficientnet_b0()' feature extractor with a frozen backbone/base layers and a custom classifier head (EffNetB0).
2. One that creates a 'torchvision.models.efficientnet_b2()' feature extractor with a frozen backbone/base layers and a custom classifier head (EffNetB2).

In [31]:
import torchvision 

# Create an EffNetB2
effnetb2_weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
effnetb2 = torchvision.models.efficientnet_b2(weights= effnetb2_weights)

effnetb2

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [32]:
summary(model= effnetb2, 
        input_size = (32,3,224,224),
        verbose = 0,
        col_names = ["input_size","output_size","num_params","trainable"],
        col_width= 20,
        row_settings= ["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 1000]           --                   True
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1408, 7, 7]     --                   True
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   True
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   864                  True
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   64                   True
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 16, 112

In [33]:
import torchvision
from torch import nn

OUT_FEATURES = len(class_names)

# Create an EffNetB0 feature extractor
def create_effnetb0():
    # Get the weights and setup a model
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
    model = torchvision.models.efficientnet_b0(weights= weights).to(device)

    # Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # Change the classifier head
    set_seeds()
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2, inplace=True),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES)
    ).to(device)

    # Give the model a name
    model.name = "effnetb0"
    print(f"[INFO] Created new {model.name} model...")
    return model

# Create an EffNetB2 feature extractor
def create_effnetb2():
    # Get the weights and setup a model
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
    model = torchvision.models.efficientnet_b2(weights= weights).to(device)

    # Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # Change the classifier head
    set_seeds()
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3, inplace=True),
        nn.Linear(in_features=1408, out_features=OUT_FEATURES)
    ).to(device)

    # Give the model a name
    model.name = "effnetb2"
    print(f"[INFO] Created new {model.name} model...")
    return model

In [34]:
created_model_test_effnet_b2= create_effnetb2()
created_model_test_effnet_b0= create_effnetb0()

[INFO] Created new effnetb2 model...
[INFO] Created new effnetb0 model...


In [35]:
summary(model= created_model_test_effnet_b2, 
        input_size = (32,3,224,224),
        verbose = 0,
        col_names = ["input_size","output_size","num_params","trainable"],
        col_width= 20,
        row_settings= ["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1408, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

In [36]:
summary(model= created_model_test_effnet_b0, 
        input_size = (32,3,224,224),
        verbose = 0,
        col_names = ["input_size","output_size","num_params","trainable"],
        col_width= 20,
        row_settings= ["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 
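
The `Partial` entry in the Trainable column above reflects that only the new classifier head still has `requires_grad=True`. One quick way to check this numerically is to count parameters by their `requires_grad` flag. The sketch below is not part of the notebook: `TinyNet` is a hypothetical stand-in model (used so the example runs without downloading pretrained weights) and `count_parameters` is a hypothetical helper. The same helper should work on the models returned by `create_effnetb0()` and `create_effnetb2()`.

```python
from typing import Tuple

from torch import nn


def count_parameters(model: nn.Module) -> Tuple[int, int]:
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same `features`/`classifier` layout as EfficientNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, 1000)

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()

# Freeze the base layers, as in create_effnetb0() / create_effnetb2()
for param in model.features.parameters():
    param.requires_grad = False

# Swap in a new 3-class head; its parameters are trainable by default
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2),
    nn.Linear(in_features=8, out_features=3),
)

trainable, total = count_parameters(model)
print(f"Trainable parameters: {trainable} of {total}")
```

If the freeze worked, the trainable count equals the size of the new head alone (weights plus biases of the final `nn.Linear`).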

### 7.6 Create experiments and setup training code

In [37]:
# Create epoch list
num_epoch = [5, 10]

# Create models list (need to create a new model for each experiment)
models = ["effnetb0","effnetb2"]

# Create a dataloaders dictionary
train_dataloaders = {"data_10_percent": train_dataloader_10_percent,
                     "data_20_percent": train_dataloader_20_percent}

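The lists above define a grid of 2 dataloaders × 2 epoch settings × 2 models = 8 experiments. As a rough sketch (not part of the notebook), the same grid can be enumerated up front with `itertools.product`, which is a handy way to check the number and order of runs before starting a long training loop. The `dataloader_names` list and the `experiments` structure are assumptions for illustration; strings stand in for the actual `DataLoader` objects.

```python
from itertools import product

num_epoch = [5, 10]
models = ["effnetb0", "effnetb2"]
# Stand-in for the keys of the train_dataloaders dictionary
dataloader_names = ["data_10_percent", "data_20_percent"]

# product() varies the last argument fastest, so this matches three nested loops
# ordered dataloader -> epochs -> model
experiments = [
    {"number": number, "dataloader": dataloader_name, "epochs": epochs, "model": model_name}
    for number, (dataloader_name, epochs, model_name) in enumerate(
        product(dataloader_names, num_epoch, models), start=1
    )
]

for experiment in experiments:
    print(experiment)
```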
In [38]:
%%time 
from going_modular.going_modular.utils import save_model

# Set the seeds
set_seeds(42)

# Keep track of experiment numbers
experiment_number = 0

# Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():
    # Loop through the epochs
    for epochs in num_epoch:
        # Loop through each model name and create a new model instance
        for model_name in models:
            
            # Print out info
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")
            
            # Select and create the model
            if model_name == "effnetb0":
                model = create_effnetb0()
            else:
                model = create_effnetb2()
            
            # Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params= model.parameters(), lr= 0.001)
            
            # Train target model with target dataloader and track experiments
            train(model= model,
                  train_dataloader= train_dataloader,
                  test_dataloader= test_dataloader,
                  optimizer= optimizer,
                  loss_fn= loss_fn,
                  epochs= epochs,
                  device= device,
                  writer= create_writer(experiment_name= dataloader_name,
                                        model_name= model_name,
                                        extra= f"{epochs}_epochs"))
            
            # Save the model to a file so that we can import it later if need be
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model= model,
                       target_dir= "models",
                       model_name= save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetb0
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created new effnetb0 model...
[INFO] Created Summary Writer saving to runs\2023-12-07\data_10_percent\effnetb0\5_epochs


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0564 | train_acc: 0.4688 | test_loss: 0.9015 | test_acc: 0.4782
Epoch: 2 | train_loss: 0.9633 | train_acc: 0.5469 | test_loss: 0.7998 | test_acc: 0.7235
Epoch: 3 | train_loss: 0.7965 | train_acc: 0.7305 | test_loss: 0.7020 | test_acc: 0.8352
Epoch: 4 | train_loss: 0.7192 | train_acc: 0.7344 | test_loss: 0.5854 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6150 | train_acc: 0.8711 | test_loss: 0.5719 | test_acc: 0.8968
[INFO] Saving model to: models\07_effnetb0_data_10_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 2
[INFO] Model: effnetb2
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created new effnetb2 model...
[INFO] Created Summary Writer saving to runs\2023-12-07\data_10_percent\effnetb2\5_epochs


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0892 | train_acc: 0.3359 | test_loss: 0.9453 | test_acc: 0.7102
Epoch: 2 | train_loss: 0.9260 | train_acc: 0.6562 | test_loss: 0.8759 | test_acc: 0.7642
Epoch: 3 | train_loss: 0.8395 | train_acc: 0.7031 | test_loss: 0.7522 | test_acc: 0.9176
Epoch: 4 | train_loss: 0.7210 | train_acc: 0.7266 | test_loss: 0.7177 | test_acc: 0.8769
Epoch: 5 | train_loss: 0.6548 | train_acc: 0.8086 | test_loss: 0.7193 | test_acc: 0.8665
[INFO] Saving model to: models\07_effnetb2_data_10_percent_5_epochs.pth
--------------------------------------------------

[INFO] Experiment number: 3
[INFO] Model: effnetb0
[INFO] DataLoader: data_10_percent
[INFO] Number of epochs: 10
[INFO] Created new effnetb0 model...
[INFO] Created Summary Writer saving to runs\2023-12-07\data_10_percent\effnetb0\10_epochs


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0564 | train_acc: 0.4688 | test_loss: 0.9015 | test_acc: 0.4782
Epoch: 2 | train_loss: 0.9633 | train_acc: 0.5469 | test_loss: 0.7998 | test_acc: 0.7235
Epoch: 3 | train_loss: 0.7965 | train_acc: 0.7305 | test_loss: 0.7020 | test_acc: 0.8352
Epoch: 4 | train_loss: 0.7192 | train_acc: 0.7344 | test_loss: 0.5854 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6150 | train_acc: 0.8711 | test_loss: 0.5719 | test_acc: 0.8968
Epoch: 6 | train_loss: 0.6346 | train_acc: 0.7461 | test_loss: 0.5812 | test_acc: 0.8968
Epoch: 7 | train_loss: 0.5905 | train_acc: 0.7852 | test_loss: 0.5006 | test_acc: 0.8864
Epoch: 8 | train_loss: 0.5185 | train_acc: 0.9297 | test_loss: 0.4996 | test_acc: 0.8968
Epoch: 9 | train_loss: 0.4913 | train_acc: 0.9141 | test_loss: 0.5367 | test_acc: 0.8655
Epoch: 10 | train_loss: 0.4684 | train_acc: 0.8984 | test_loss: 0.4973 | test_acc: 0.8864
[INFO] Saving model to: models\07_effnetb0_data_10_percent_10_epochs.pth
------------------------------------

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0892 | train_acc: 0.3359 | test_loss: 0.9453 | test_acc: 0.7102
Epoch: 2 | train_loss: 0.9260 | train_acc: 0.6562 | test_loss: 0.8759 | test_acc: 0.7642
Epoch: 3 | train_loss: 0.8395 | train_acc: 0.7031 | test_loss: 0.7522 | test_acc: 0.9176
Epoch: 4 | train_loss: 0.7210 | train_acc: 0.7266 | test_loss: 0.7177 | test_acc: 0.8769
Epoch: 5 | train_loss: 0.6548 | train_acc: 0.8086 | test_loss: 0.7193 | test_acc: 0.8665
Epoch: 6 | train_loss: 0.6023 | train_acc: 0.8008 | test_loss: 0.6890 | test_acc: 0.8873
Epoch: 7 | train_loss: 0.5797 | train_acc: 0.8047 | test_loss: 0.6349 | test_acc: 0.8466
Epoch: 8 | train_loss: 0.5147 | train_acc: 0.9414 | test_loss: 0.5900 | test_acc: 0.8977
Epoch: 9 | train_loss: 0.5287 | train_acc: 0.8008 | test_loss: 0.5763 | test_acc: 0.8873
Epoch: 10 | train_loss: 0.5211 | train_acc: 0.8164 | test_loss: 0.5747 | test_acc: 0.8873
[INFO] Saving model to: models\07_effnetb2_data_10_percent_10_epochs.pth
------------------------------------

  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9652 | train_acc: 0.5625 | test_loss: 0.6631 | test_acc: 0.8655
Epoch: 2 | train_loss: 0.7256 | train_acc: 0.7688 | test_loss: 0.5930 | test_acc: 0.8873
Epoch: 3 | train_loss: 0.5615 | train_acc: 0.8333 | test_loss: 0.4716 | test_acc: 0.8968
Epoch: 4 | train_loss: 0.5029 | train_acc: 0.8396 | test_loss: 0.4645 | test_acc: 0.9176
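
Because every epoch line printed above follows the same `name: value` pattern, the final metrics of finished runs can also be compared in plain Python, which may be useful when TensorBoard isn't available. The sketch below is not part of the notebook: `parse_epoch_line` is a hypothetical helper, and the two strings are the final-epoch lines of experiments 1 and 2 copied from the output above.

```python
import re

# Final-epoch lines copied from experiments 1 and 2 in the output above
final_lines = {
    "effnetb0_data_10_percent_5_epochs": "Epoch: 5 | train_loss: 0.6150 | train_acc: 0.8711 | test_loss: 0.5719 | test_acc: 0.8968",
    "effnetb2_data_10_percent_5_epochs": "Epoch: 5 | train_loss: 0.6548 | train_acc: 0.8086 | test_loss: 0.7193 | test_acc: 0.8665",
}


def parse_epoch_line(line: str) -> dict:
    """Turn one 'Epoch: N | name: value | ...' log line into a dictionary."""
    epoch = int(re.search(r"Epoch:\s*(\d+)", line).group(1))
    # Only the metric values contain a decimal point, so this skips the epoch number
    metrics = {name: float(value) for name, value in re.findall(r"(\w+): (\d+\.\d+)", line)}
    return {"epoch": epoch, **metrics}


results = {name: parse_epoch_line(line) for name, line in final_lines.items()}

# Pick the run with the highest test accuracy among the parsed lines
best = max(results, key=lambda name: results[name]["test_acc"])
print(f"Best of the parsed runs: {best} (test_acc={results[best]['test_acc']})")
```

This only compares the lines passed in; the TensorBoard view remains the fuller picture across all runs and epochs.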


## 8. View our experiments in TensorBoard

Now let's *visualize, visualize, visualize* the results our experiments logged to the `runs` directory.

In [None]:
# Let's view our experiments in TensorBoard from within the notebook
%load_ext tensorboard
%tensorboard --logdir runs

The best performing model was:
* Model: EffNetB2
* Dataset: Pizza, steak, sushi 20%
* Epochs: 10

The overall trend across all the results was that more data, a bigger model and a longer training time generally led to better results.

## 9. Load the best model and make predictions with it

This is our best model filepath: ''