<a href="https://colab.research.google.com/github/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/07_pytorch_experiment_tracking_exercise_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 07. PyTorch Experiment Tracking Exercise Template

Welcome to the 07. PyTorch Experiment Tracking exercise template notebook.

> **Note:** There may be more than one solution to each of the exercises. This notebook only shows one possible example.

## Resources

1. These exercises/solutions are based on [section 07. PyTorch Transfer Learning](https://www.learnpytorch.io/07_pytorch_experiment_tracking/) of the Learn PyTorch for Deep Learning course by Zero to Mastery.
2. See a live [walkthrough of the solutions (errors and all) on YouTube](https://youtu.be/cO_r2FYcAjU).
3. See [other solutions on the course GitHub](https://github.com/mrdbourke/pytorch-deep-learning/tree/main/extras/solutions).

> **Note:** The first section of this notebook is dedicated to getting various helper functions and datasets used for the exercises. The exercises start at the heading "Exercise 1: ...".

### Get various imports and helper functions

We'll need to make sure we have `torch` v.1.12+ and `torchvision` v0.13+.

In [1]:
# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
import torch
import torchvision

#try:
#    import torch
#    import torchvision
#    assert int(torch.__version__.split(".")[1]) >= 12, "torch version should be 1.12+"
#    assert int(torchvision.__version__.split(".")[1]) >= 13, "torchvision version should be 0.13+"
#    print(f"torch version: {torch.__version__}")
#    print(f"torchvision version: {torchvision.__version__}")
#except:
#    print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
#    !pip3 install -U --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu113
#    import torch
#    import torchvision
#    print(f"torch version: {torch.__version__}")
#    print(f"torchvision version: {torchvision.__version__}")

In [2]:
 # Make sure we have a GPU
 device = "cuda" if torch.cuda.is_available() else "cpu"
 device

'cuda'

In [3]:
# Get regular imports 
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

from torchinfo import summary

from going_modular.sergio import data_setup, engine, helper_functions

# Try to get torchinfo, install it if it doesn't work
#try:
#    from torchinfo import summary
#except:
#    print("[INFO] Couldn't find torchinfo... installing it.")
#    !pip install -q torchinfo
#    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
#try:
#    from going_modular.going_modular import data_setup, engine
#except:
#    # Get the going_modular scripts
#    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
#    !git clone https://github.com/mrdbourke/pytorch-deep-learning
#    !mv pytorch-deep-learning/going_modular .
#    !rm -rf pytorch-deep-learning
#    from going_modular.going_modular import data_setup, engine

In [4]:
# Set seeds
def set_seeds(seed: int=42):
    """Sets random sets for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

In [5]:
import os
import zipfile

from pathlib import Path

import requests

def download_data(source: str, 
                  destination: str,
                  remove_source: bool = True) -> Path:
    """Downloads a zipped dataset from source and unzips to destination.

    Args:
        source (str): A link to a zipped file containing data.
        destination (str): A target directory to unzip data to.
        remove_source (bool): Whether to remove the source after downloading and extracting.
    
    Returns:
        pathlib.Path to downloaded data.
    
    Example usage:
        download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                      destination="pizza_steak_sushi")
    """
    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist, download it and prepare it... 
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory exists, skipping download.")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one...")
        image_path.mkdir(parents=True, exist_ok=True)
        
        # Download pizza, steak, sushi data
        target_file = Path(source).name
        with open(data_path / target_file, "wb") as f:
            request = requests.get(source)
            print(f"[INFO] Downloading {target_file} from {source}...")
            f.write(request.content)

        # Unzip pizza, steak, sushi data
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data...") 
            zip_ref.extractall(image_path)

        # Remove .zip file
        if remove_source:
            os.remove(data_path / target_file)
    
    return image_path

image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination="pizza_steak_sushi")
image_path

[INFO] data\pizza_steak_sushi directory exists, skipping download.


WindowsPath('data/pizza_steak_sushi')

In [6]:
from torch.utils.tensorboard import SummaryWriter
def create_writer(experiment_name: str, 
                  model_name: str, 
                  extra: str=None):
    """Creates a torch.utils.tensorboard.writer.SummaryWriter() instance saving to a specific log_dir.

    log_dir is a combination of runs/timestamp/experiment_name/model_name/extra.

    Where timestamp is the current date in YYYY-MM-DD format.

    Args:
        experiment_name (str): Name of experiment.
        model_name (str): Name of model.
        extra (str, optional): Anything extra to add to the directory. Defaults to None.

    Returns:
        torch.utils.tensorboard.writer.SummaryWriter(): Instance of a writer saving to log_dir.

    Example usage:
        # Create a writer saving to "runs/2022-06-04/data_10_percent/effnetb2/5_epochs/"
        writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb2",
                               extra="5_epochs")
        # The above is the same as:
        writer = SummaryWriter(log_dir="runs/2022-06-04/data_10_percent/effnetb2/5_epochs/")
    """
    from datetime import datetime
    import os

    # Get timestamp of current date (all experiments on certain day live in same folder)
    timestamp = datetime.now().strftime("%Y-%m-%d") # returns current date in YYYY-MM-DD format

    if extra:
        # Create log directory path
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name, extra)
    else:
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name)
        
    print(f"[INFO] Created SummaryWriter, saving to: {log_dir}...")
    return SummaryWriter(log_dir=log_dir)

In [7]:
# Create a test writer
#from going_modular.sergio.engine import create_writer

writer = engine.create_writer(experiment_name="test_experiment_name",
                       model_name="this_is_the_model_name",
                       extra="add_a_little_extra_if_you_want")

[INFO] Created SummaryWriter, saving to: runs\2024-11-15\test_experiment_name\this_is_the_model_name\add_a_little_extra_if_you_want...


In [7]:
from typing import Dict, List
from tqdm.auto import tqdm

#from going_modular.going_modular.engine import train_step, test_step

# Add writer parameter to train()
#def train(model: torch.nn.Module, 
#          train_dataloader: torch.utils.data.DataLoader, 
#          test_dataloader: torch.utils.data.DataLoader, 
#          optimizer: torch.optim.Optimizer,
#          loss_fn: torch.nn.Module,
#          epochs: int,
#          device: torch.device, 
#          writer: torch.utils.tensorboard.writer.SummaryWriter # new parameter to take in a writer
#          ) -> Dict[str, List]:
#    """Trains and tests a PyTorch model.
#
#    Passes a target PyTorch models through train_step() and test_step()
#    functions for a number of epochs, training and testing the model
#    in the same epoch loop.

#    Calculates, prints and stores evaluation metrics throughout.

#    Stores metrics to specified writer log_dir if present.

#    Args:
#      model: A PyTorch model to be trained and tested.
#      train_dataloader: A DataLoader instance for the model to be trained on.
#      test_dataloader: A DataLoader instance for the model to be tested on.
#      optimizer: A PyTorch optimizer to help minimize the loss function.
#      loss_fn: A PyTorch loss function to calculate loss on both datasets.
#      epochs: An integer indicating how many epochs to train for.
#      device: A target device to compute on (e.g. "cuda" or "cpu").
#      writer: A SummaryWriter() instance to log model results to.

#    Returns:
#      A dictionary of training and testing loss as well as training and
#      testing accuracy metrics. Each metric has a value in a list for 
#      each epoch.
#      In the form: {train_loss: [...],
#                train_acc: [...],
#                test_loss: [...],
#                test_acc: [...]} 
#      For example if training for epochs=2: 
#              {train_loss: [2.0616, 1.0537],
#                train_acc: [0.3945, 0.3945],
#                test_loss: [1.2641, 1.5706],
#                test_acc: [0.3400, 0.2973]} 
#    """
    # Create empty results dictionary
#    results = {"train_loss": [],
#               "train_acc": [],
#               "test_loss": [],
#               "test_acc": []
#    }

    # Loop through training and testing steps for a number of epochs
#    for epoch in tqdm(range(epochs)):
#        train_loss, train_acc = train_step(model=model,
#                                          dataloader=train_dataloader,
#                                          loss_fn=loss_fn,
#                                          optimizer=optimizer,
#                                          device=device)
#        test_loss, test_acc = test_step(model=model,
#          dataloader=test_dataloader,
#          loss_fn=loss_fn,
#          device=device)

        # Print out what's happening
#        print(
#          f"Epoch: {epoch+1} | "
#          f"train_loss: {train_loss:.4f} | "
#          f"train_acc: {train_acc:.4f} | "
#          f"test_loss: {test_loss:.4f} | "
#          f"test_acc: {test_acc:.4f}"
#        )

        # Update results dictionary
#        results["train_loss"].append(train_loss)
#        results["train_acc"].append(train_acc)
#        results["test_loss"].append(test_loss)
#        results["test_acc"].append(test_acc)


        ### New: Use the writer parameter to track experiments ###
        # See if there's a writer, if so, log to it
#        if writer:
#            # Add results to SummaryWriter
#            writer.add_scalars(main_tag="Loss", 
#                               tag_scalar_dict={"train_loss": train_loss,
#                                                "test_loss": test_loss},
#                               global_step=epoch)
#            writer.add_scalars(main_tag="Accuracy", 
#                               tag_scalar_dict={"train_acc": train_acc,
#                                                "test_acc": test_acc}, 
#                               global_step=epoch)#

            # Close the writer
#            writer.close()
#        else:
#            pass
    ### End new ###

    # Return the filled results at the end of the epochs
#    return results

### Download data

Using the same data from https://www.learnpytorch.io/07_pytorch_experiment_tracking/

In [8]:
# Download 10 percent and 20 percent training data (if necessary)
data_10_percent_path = helper_functions.download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                                     destination="pizza_steak_sushi")

data_20_percent_path = helper_functions.download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip",
                                     destination="pizza_steak_sushi_20_percent")

[INFO] data\pizza_steak_sushi directory exists, skipping download.
[INFO] data\pizza_steak_sushi_20_percent directory exists, skipping download.


In [9]:
# Setup training directory paths
train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup testing directory paths (note: use the same test dataset for both to compare the results)
test_dir = data_10_percent_path / "test"

# Check the directories
print(f"Training directory 10%: {train_dir_10_percent}")
print(f"Training directory 20%: {train_dir_20_percent}")
print(f"Testing directory: {test_dir}")

Training directory 10%: data\pizza_steak_sushi\train
Training directory 20%: data\pizza_steak_sushi_20_percent\train
Testing directory: data\pizza_steak_sushi\test


In [10]:
from torchvision import transforms

# Create a transform to normalize data distribution to be inline with ImageNet
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], # values per colour channel [red, green, blue]
                                 std=[0.229, 0.224, 0.225])

# Create a transform pipeline
simple_transform = transforms.Compose([
                                       transforms.Resize((224, 224)),
                                       transforms.ToTensor(), # get image values between 0 & 1
                                       normalize
])

In [11]:
from torchvision.transforms import v2

normalize = v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
simple_transform = v2.Compose([
    v2.Resize((224, 224)),
    v2.ToImage(), v2.ToDtype(torch.float32, scale=True),
    normalize
])

### Turn data into DataLoaders 

In [12]:
BATCH_SIZE = 32

# Create 10% training and test DataLoaders
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=simple_transform,
                                                                                          batch_size=BATCH_SIZE)

# Create 20% training and test DataLoaders
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=simple_transform,
                                                                                          batch_size=BATCH_SIZE)

# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)
print(f"Number of batches of size {BATCH_SIZE} in 10 percent training data: {len(train_dataloader_10_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 20 percent training data: {len(train_dataloader_20_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in testing data: {len(train_dataloader_10_percent)} (all experiments will use the same test set)")
print(f"Number of classes: {len(class_names)}, class names: {class_names}")

Number of batches of size 32 in 10 percent training data: 8
Number of batches of size 32 in 20 percent training data: 15
Number of batches of size 32 in testing data: 8 (all experiments will use the same test set)
Number of classes: 3, class names: ['pizza', 'steak', 'sushi']


## Exercise 1: Pick a larger model from [`torchvision.models`](https://pytorch.org/vision/main/models.html) to add to the list of experiments (for example, EffNetB3 or higher)

* How does it perform compared to our existing models?
* **Hint:** You'll need to set up an exerpiment similar to [07. PyTorch Experiment Tracking section 7.6](https://www.learnpytorch.io/07_pytorch_experiment_tracking/#76-create-experiments-and-set-up-training-code).

In [15]:
weights = torchvision.models.EfficientNet_B3_Weights.DEFAULT
model = torchvision.models.efficientnet_b3(weights=weights).to(device)

model

EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 40, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(40, 40, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=40, bias=False)
            (1): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(40, 10, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(10, 40, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActiv

In [None]:
import torchvision
from torch import nn

# Get num out features (one for each class pizza, steak, sushi)
OUT_FEATURES = len(class_names)

# Create an EffNetB0 feature extractor
def create_effnetb0():
    # 1. Get the base mdoel with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
    model = torchvision.models.efficientnet_b0(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb0"
    print(f"[INFO] Created new {model.name} model.")
    return model

# Create an EffNetB2 feature extractor
def create_effnetb2():
    # 1. Get the base model with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
    model = torchvision.models.efficientnet_b2(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3),
        nn.Linear(in_features=1408, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb2"
    print(f"[INFO] Created new {model.name} model.")
    return model


# Create an EffNetB3 feature extractor
def create_effnetb3():
    # 1. Get the base model with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_B3_Weights.DEFAULT
    model = torchvision.models.efficientnet_b3(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3),
        nn.Linear(in_features=1536, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb3"
    print(f"[INFO] Created new {model.name} model.")
    return model

effnetb3 = create_effnetb3() 

# Get an output summary of the layers in our EffNetB0 feature extractor model (uncomment to view full output)
#summary(model=effnetb3, 
#        input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
#        # col_names=["input_size"], # uncomment for smaller output
#        col_names=["input_size", "output_size", "num_params", "trainable"],
#        col_width=20,
#        row_settings=["var_names"]
#)

def create_effnetv2s():
    # 1. Get the base model with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_V2_S_Weights.DEFAULT
    model = torchvision.models.efficientnet_v2_s(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnet_v2_s"
    print(f"[INFO] Created new {model.name} model.")

    return model

[INFO] Created new effnetb3 model.


In [None]:
# 1. Create epochs list
num_epochs = [10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetb0", "effnetb2", "effnetb3"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_20_percent": train_dataloader_20_percent}

In [18]:
#from going_modular.going_modular.utils import save_model
from going_modular.sergio.helper_functions import save_model

# 1. Set the random seeds
set_seeds(seed=42)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs: 

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information print outs
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")  

            # 7. Select the model
            if model_name == "effnetb0":
                model = create_effnetb0() # creates a new model each time (important because we want each experiment to start from scratch)
            else:
                if model_name == "effnetb2":
                    model = create_effnetb2() # creates a new model each time (important because we want each experiment to start from scratch)
                else:
                    model = create_effnetb3() # creates a new model each time (important because we want each experiment to start from scratch
            
            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

            # 9. Train target model with target dataloaders and track experiments
            engine.train(model=model,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader, 
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=epochs,
                        device=device,
                        writer=create_writer(experiment_name=dataloader_name,
                                            model_name=model_name,
                                            extra=f"{epochs}_epochs"))
            
            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetb0
[INFO] DataLoader: data_20_percent
[INFO] Number of epochs: 10
[INFO] Created new effnetb0 model.
[INFO] Created SummaryWriter, saving to: runs\2024-11-15\data_20_percent\effnetb0\10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9587 | train_acc: 0.6125 | test_loss: 0.6551 | test_acc: 0.8655
Epoch: 2 | train_loss: 0.6894 | train_acc: 0.8521 | test_loss: 0.5800 | test_acc: 0.8873
Epoch: 3 | train_loss: 0.5805 | train_acc: 0.8604 | test_loss: 0.4576 | test_acc: 0.9176
Epoch: 4 | train_loss: 0.4937 | train_acc: 0.8646 | test_loss: 0.4454 | test_acc: 0.9176
Epoch: 5 | train_loss: 0.4886 | train_acc: 0.8500 | test_loss: 0.3914 | test_acc: 0.9176
Epoch: 6 | train_loss: 0.3708 | train_acc: 0.8833 | test_loss: 0.3565 | test_acc: 0.9072
Epoch: 7 | train_loss: 0.3558 | train_acc: 0.9208 | test_loss: 0.3182 | test_acc: 0.9072
Epoch: 8 | train_loss: 0.3739 | train_acc: 0.8938 | test_loss: 0.3346 | test_acc: 0.8977
Epoch: 9 | train_loss: 0.2976 | train_acc: 0.9375 | test_loss: 0.3087 | test_acc: 0.9280
Epoch: 10 | train_loss: 0.3625 | train_acc: 0.8479 | test_loss: 0.2771 | test_acc: 0.9072
[INFO] Saving model to: models\07_effnetb0_data_20_percent_10_epochs.pth
------------------------------------

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9819 | train_acc: 0.5604 | test_loss: 0.7775 | test_acc: 0.8049
Epoch: 2 | train_loss: 0.7297 | train_acc: 0.8021 | test_loss: 0.6664 | test_acc: 0.8873
Epoch: 3 | train_loss: 0.6011 | train_acc: 0.8479 | test_loss: 0.5633 | test_acc: 0.9280
Epoch: 4 | train_loss: 0.5429 | train_acc: 0.8354 | test_loss: 0.5685 | test_acc: 0.8977
Epoch: 5 | train_loss: 0.4410 | train_acc: 0.8708 | test_loss: 0.4469 | test_acc: 0.9280
Epoch: 6 | train_loss: 0.3876 | train_acc: 0.9146 | test_loss: 0.4564 | test_acc: 0.8977
Epoch: 7 | train_loss: 0.3480 | train_acc: 0.9292 | test_loss: 0.4226 | test_acc: 0.9384
Epoch: 8 | train_loss: 0.3861 | train_acc: 0.8792 | test_loss: 0.4350 | test_acc: 0.9280
Epoch: 9 | train_loss: 0.3315 | train_acc: 0.8979 | test_loss: 0.4253 | test_acc: 0.9081
Epoch: 10 | train_loss: 0.3382 | train_acc: 0.8979 | test_loss: 0.3908 | test_acc: 0.9384
[INFO] Saving model to: models\07_effnetb2_data_20_percent_10_epochs.pth
------------------------------------

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9828 | train_acc: 0.5729 | test_loss: 0.8723 | test_acc: 0.8144
Epoch: 2 | train_loss: 0.7654 | train_acc: 0.7792 | test_loss: 0.6766 | test_acc: 0.8248
Epoch: 3 | train_loss: 0.5752 | train_acc: 0.8875 | test_loss: 0.5858 | test_acc: 0.8759
Epoch: 4 | train_loss: 0.5342 | train_acc: 0.8417 | test_loss: 0.5413 | test_acc: 0.8551
Epoch: 5 | train_loss: 0.4505 | train_acc: 0.8854 | test_loss: 0.4895 | test_acc: 0.8248
Epoch: 6 | train_loss: 0.4345 | train_acc: 0.8875 | test_loss: 0.4969 | test_acc: 0.8049
Epoch: 7 | train_loss: 0.3849 | train_acc: 0.9042 | test_loss: 0.4520 | test_acc: 0.8248
Epoch: 8 | train_loss: 0.3965 | train_acc: 0.8854 | test_loss: 0.4641 | test_acc: 0.7945
Epoch: 9 | train_loss: 0.3518 | train_acc: 0.9021 | test_loss: 0.4131 | test_acc: 0.8759
Epoch: 10 | train_loss: 0.3376 | train_acc: 0.9229 | test_loss: 0.4390 | test_acc: 0.8049
[INFO] Saving model to: models\07_effnetv2s_data_20_percent_10_epochs.pth
-----------------------------------

In [22]:
# 1. Create epochs list
num_epochs = [10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetv2s"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_20_percent": train_dataloader_20_percent}

In [23]:
# 1. Set the random seeds
set_seeds(seed=42)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs: 

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information print outs
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")  

            # 7. Select the model
            if model_name == "effnetb0":
                model = create_effnetb0() # creates a new model each time (important because we want each experiment to start from scratch)
            else:
                if model_name == "effnetb2":
                    model = create_effnetb2() # creates a new model each time (important because we want each experiment to start from scratch)
                else:
                    if model_name == "effnetb3":
                        model = create_effnetb3() # creates a new model each time (important because we want each experiment to start from scratch
                    else:
                        model = create_effnetv2s() # creates a new model each time (important because we want each experiment to start from scratch
            
            
            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

            # 9. Train target model with target dataloaders and track experiments
            engine.train(model=model,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader, 
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=epochs,
                        device=device,
                        writer=create_writer(experiment_name=dataloader_name,
                                            model_name=model_name,
                                            extra=f"{epochs}_epochs"))
            
            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetv2s
[INFO] DataLoader: data_20_percent
[INFO] Number of epochs: 10
[INFO] Created new effnet_v2_s model.
[INFO] Created SummaryWriter, saving to: runs\2024-11-15\data_20_percent\effnetv2s\10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9843 | train_acc: 0.5750 | test_loss: 0.7440 | test_acc: 0.8665
Epoch: 2 | train_loss: 0.7295 | train_acc: 0.7792 | test_loss: 0.6413 | test_acc: 0.8258
Epoch: 3 | train_loss: 0.6193 | train_acc: 0.8167 | test_loss: 0.5199 | test_acc: 0.8759
Epoch: 4 | train_loss: 0.5584 | train_acc: 0.8229 | test_loss: 0.4902 | test_acc: 0.8561
Epoch: 5 | train_loss: 0.5464 | train_acc: 0.8000 | test_loss: 0.4352 | test_acc: 0.8769
Epoch: 6 | train_loss: 0.4449 | train_acc: 0.8542 | test_loss: 0.4155 | test_acc: 0.9072
Epoch: 7 | train_loss: 0.4324 | train_acc: 0.8458 | test_loss: 0.3856 | test_acc: 0.8769
Epoch: 8 | train_loss: 0.4077 | train_acc: 0.8667 | test_loss: 0.4068 | test_acc: 0.8769
Epoch: 9 | train_loss: 0.3640 | train_acc: 0.9000 | test_loss: 0.4022 | test_acc: 0.8873
Epoch: 10 | train_loss: 0.3907 | train_acc: 0.8667 | test_loss: 0.3456 | test_acc: 0.9176
[INFO] Saving model to: models\07_effnetv2s_data_20_percent_10_epochs.pth
-----------------------------------

In [None]:
%load_ext tensorboard

It seems that EffnetB3 overfits: better results on training but worse on test compared to B0 and B2

## Exercise 2. Introduce data augmentation to the list of experiments using the 20% pizza, steak, sushi training and test datasets, does this change anything?
    
* For example, you could have one training DataLoader that uses data augmentation (e.g. `train_dataloader_20_percent_aug` and `train_dataloader_20_percent_no_aug`) and then compare the results of two of the same model types training on these two DataLoaders.
* **Note:** You may need to alter the `create_dataloaders()` function to be able to take a transform for the training data and the testing data (because you don't need to perform data augmentation on the test data). See [04. PyTorch Custom Datasets section 6](https://www.learnpytorch.io/04_pytorch_custom_datasets/#6-other-forms-of-transforms-data-augmentation) for examples of using data augmentation or the script below for an example:

```python
# Note: Data augmentation transform like this should only be performed on training data
train_transform_data_aug = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor(),
    normalize
])

# Create a helper function to visualize different augmented (and not augmented) images
def view_dataloader_images(dataloader, n=10):
    if n > 10:
        print(f"Having n higher than 10 will create messy plots, lowering to 10.")
        n = 10
    imgs, labels = next(iter(dataloader))
    plt.figure(figsize=(16, 8))
    for i in range(n):
        # Min max scale the image for display purposes
        targ_image = imgs[i]
        sample_min, sample_max = targ_image.min(), targ_image.max()
        sample_scaled = (targ_image - sample_min)/(sample_max - sample_min)

        # Plot images with appropriate axes information
        plt.subplot(1, 10, i+1)
        plt.imshow(sample_scaled.permute(1, 2, 0)) # resize for Matplotlib requirements
        plt.title(class_names[labels[i]])
        plt.axis(False)

# Have to update `create_dataloaders()` to handle different augmentations
import os
from torch.utils.data import DataLoader
from torchvision import datasets

NUM_WORKERS = os.cpu_count() # use maximum number of CPUs for workers to load data 

# Note: this is an update version of data_setup.create_dataloaders to handle
# differnt train and test transforms.
def create_dataloaders(
    train_dir, 
    test_dir, 
    train_transform, # add parameter for train transform (transforms on train dataset)
    test_transform,  # add parameter for test transform (transforms on test dataset)
    batch_size=32, num_workers=NUM_WORKERS
):
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=train_transform)
    test_data = datasets.ImageFolder(test_dir, transform=test_transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names
```

In [None]:
# Data augmentation
normalize = v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform_data_aug = v2.Compose([
    v2.Resize((224, 224)),
    v2.TrivialAugmentWide(),
    v2.ToImage(), v2.ToDtype(torch.float32, scale=True),
    normalize
])

# Create a helper function to visualize different augmented (and not augmented) images
def view_dataloader_images(dataloader, n=10):
    if n > 10:
        print(f"Having n higher than 10 will create messy plots, lowering to 10.")
        n = 10
    imgs, labels = next(iter(dataloader))
    plt.figure(figsize=(16, 8))
    for i in range(n):
        # Min max scale the image for display purposes
        targ_image = imgs[i]
        sample_min, sample_max = targ_image.min(), targ_image.max()
        sample_scaled = (targ_image - sample_min)/(sample_max - sample_min)

        # Plot images with appropriate axes information
        plt.subplot(1, 10, i+1)
        plt.imshow(sample_scaled.permute(1, 2, 0)) # resize for Matplotlib requirements
        plt.title(class_names[labels[i]])
        plt.axis(False)

# Have to update `create_dataloaders()` to handle different augmentations
import os
from torch.utils.data import DataLoader
from torchvision import datasets

NUM_WORKERS = os.cpu_count() # use maximum number of CPUs for workers to load data 

# Note: this is an update version of data_setup.create_dataloaders to handle
# differnt train and test transforms.
def create_dataloaders(
    train_dir, 
    test_dir, 
    train_transform, # add parameter for train transform (transforms on train dataset)
    test_transform,  # add parameter for test transform (transforms on test dataset)
    batch_size=32, num_workers=NUM_WORKERS
):
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=train_transform)
    test_data = datasets.ImageFolder(test_dir, transform=test_transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names

In [None]:
BATCH_SIZE = 32

# Create 20% training and test DataLoaders with augmentation
train_dataloader_20_percent_aug, test_dataloader, class_names = create_dataloaders(train_dir=train_dir_20_percent,
                                                                                    test_dir=test_dir,
                                                                                    train_transform=transform_data_aug,
                                                                                    test_transform=simple_transform,
                                                                                    batch_size=BATCH_SIZE)

# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)
print(f"Number of batches of size {BATCH_SIZE} in 20 percent training data aug: {len(train_dataloader_20_percent_aug)}")
print(f"Number of batches of size {BATCH_SIZE} in testing data: {len(train_dataloader_20_percent)} (all experiments will use the same test set)")
print(f"Number of classes: {len(class_names)}, class names: {class_names}")

Number of batches of size 32 in 20 percent training data aug: 15
Number of batches of size 32 in testing data: 15 (all experiments will use the same test set)
Number of classes: 3, class names: ['pizza', 'steak', 'sushi']


In [17]:
# 1. Create epochs list
num_epochs = [10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetb2"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_20_percent_aug": train_dataloader_20_percent_aug}

In [None]:
# 1. Set the random seeds
set_seeds(seed=42)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs: 

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information print outs
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")  

            # 7. Select the model
            if model_name == "effnetb0":
                model = create_effnetb0() # creates a new model each time (important because we want each experiment to start from scratch)
            else:
                if model_name == "effnetb2":
                    model = create_effnetb2() # creates a new model each time (important because we want each experiment to start from scratch)
                else:
                    model = create_effnetb3() # creates a new model each time (important because we want each experiment to start from scratch
            
            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

            # 9. Train target model with target dataloaders and track experiments
            engine.train(model=model,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader, 
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=epochs,
                        device=device,
                        writer=create_writer(experiment_name=dataloader_name,
                                            model_name=model_name,
                                            extra=f"{epochs}_epochs"))
            
            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetb2
[INFO] DataLoader: data_20_percent_aug
[INFO] Number of epochs: 10
[INFO] Created new effnetb2 model.
[INFO] Created SummaryWriter, saving to: runs\2024-11-15\data_20_percent_aug\effnetb2\10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

## Exercise 3. Scale up the dataset to turn FoodVision Mini into FoodVision Big using the entire [Food101 dataset from `torchvision.models`](https://pytorch.org/vision/stable/generated/torchvision.datasets.Food101.html#torchvision.datasets.Food101)
    
* You could take the best performing model from your various experiments or even the EffNetB2 feature extractor we created in this notebook and see how it goes fitting for 5 epochs on all of Food101.
* If you try more than one model, it would be good to have the model's results tracked.
* If you load the Food101 dataset from `torchvision.models`, you'll have to create PyTorch DataLoaders to use it in training.
* **Note:** Due to the larger amount of data in Food101 compared to our pizza, steak, sushi dataset, this model will take longer to train.

In [13]:
# Get Food101 Dataset

# Create a transform to normalize data distribution to be inline with ImageNet
normalize = v2.Normalize(mean=[0.485, 0.456, 0.406], # values per colour channel [red, green, blue]
                         std=[0.229, 0.224, 0.225])

# Create a transform pipeline
simple_transform = v2.Compose([
    transforms.Resize((224, 224)),
    v2.ToImage(), v2.ToDtype(torch.float32, scale=True),
    normalize
])

# Download and transform Food101 data (note: this may take ~5 minutes in Google Colab)
train_data = torchvision.datasets.Food101(root="data",
                                          split="train",
                                          transform=simple_transform,
                                          download=True)

test_data = torchvision.datasets.Food101(root="data",
                                        split="test",
                                        transform=simple_transform,
                                        download=True)

In [14]:
import os
BATCH_SIZE = 32 # use a big batch size to get through all the images (100,000+ in Food101)

train_dataloader_big = torch.utils.data.DataLoader(train_data,
                                                   shuffle=True,
                                                   batch_size=BATCH_SIZE,
                                                   num_workers=os.cpu_count(),
                                                   pin_memory=True) # avoid copies of the data into and out of memory, where possible (for speed ups)

test_dataloader_big = torch.utils.data.DataLoader(test_data,
                                                  shuffle=False,
                                                  batch_size=BATCH_SIZE,
                                                  num_workers=os.cpu_count(),
                                                  pin_memory=True)

In [15]:
# Check sample numbers
len(train_data), len(test_data)

(75750, 25250)

In [16]:
# 1. Create epochs list
num_epochs = [10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetv2s"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_20_percent_big": train_dataloader_big}

In [17]:
# 1. Set the random seeds
set_seeds(seed=42)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs: 

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information print outs
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")  

            # 7. Create the model      
            #weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
            #model = torchvision.models.efficientnet_b2(weights=weights).to(device)            
            weights = torchvision.models.EfficientNet_V2_S_Weights.DEFAULT
            model = torchvision.models.efficientnet_v2_s(weights=weights).to(device)            
            for param in model.features.parameters():
                param.requires_grad = False
            set_seeds()          
            #model.classifier = nn.Sequential(
            #    nn.Dropout(p=0.3),
            #    nn.Linear(in_features=1408, out_features=101)
            #).to(device)
            model.classifier = nn.Sequential(
                nn.Dropout(p=0.2),
                nn.Linear(in_features=1280, out_features=101)
            ).to(device)
                        
            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.00001)

            # 9. Train target model with target dataloaders and track experiments
            engine.train(model=model,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader, 
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=epochs,
                        device=device,
                        writer=create_writer(experiment_name=dataloader_name,
                                            model_name=model_name,
                                            extra=f"{epochs}_epochs"))
            
            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs_big.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")

[INFO] Experiment number: 1
[INFO] Model: effnetv2s
[INFO] DataLoader: data_20_percent_big
[INFO] Number of epochs: 10
[INFO] Created SummaryWriter, saving to: runs\2024-11-15\data_20_percent_big\effnetv2s\10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 4.5378 | train_acc: 0.0343 | test_loss: 4.6126 | test_acc: 0.0104
Epoch: 2 | train_loss: 4.3578 | train_acc: 0.1144 | test_loss: 4.5911 | test_acc: 0.0000
Epoch: 3 | train_loss: 4.2007 | train_acc: 0.1916 | test_loss: 4.5712 | test_acc: 0.0000
Epoch: 4 | train_loss: 4.0563 | train_acc: 0.2377 | test_loss: 4.5547 | test_acc: 0.0104
Epoch: 5 | train_loss: 3.9232 | train_acc: 0.2660 | test_loss: 4.5318 | test_acc: 0.0208
Epoch: 6 | train_loss: 3.8024 | train_acc: 0.2856 | test_loss: 4.5725 | test_acc: 0.0208


KeyboardInterrupt: 