<a href="https://colab.research.google.com/github/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/07_pytorch_experiment_tracking_exercise_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 07. PyTorch Experiment Tracking Exercise Template

Welcome to the 07. PyTorch Experiment Tracking exercise template notebook.

> **Note:** There may be more than one solution to each of the exercises. This notebook only shows one possible example.

## Resources

1. These exercises/solutions are based on [section 07. PyTorch Experiment Tracking](https://www.learnpytorch.io/07_pytorch_experiment_tracking/) of the Learn PyTorch for Deep Learning course by Zero to Mastery.
2. See a live [walkthrough of the solutions (errors and all) on YouTube](https://youtu.be/cO_r2FYcAjU).
3. See [other solutions on the course GitHub](https://github.com/mrdbourke/pytorch-deep-learning/tree/main/extras/solutions).

> **Note:** The first section of this notebook is dedicated to getting various helper functions and datasets used for the exercises. The exercises start at the heading "Exercise 1: ...".

### Get various imports and helper functions

We'll need to make sure we have `torch` v1.12+ and `torchvision` v0.13+.

In [1]:
# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
# try:
#     import torch
#     import torchvision
#     assert int(torch.__version__.split(".")[1]) >= 12, "torch version should be 1.12+"
#     assert int(torchvision.__version__.split(".")[1]) >= 13, "torchvision version should be 0.13+"
#     print(f"torch version: {torch.__version__}")
#     print(f"torchvision version: {torchvision.__version__}")
# except:
#     print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
#     !pip3 install -U --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu113
#     import torch
#     import torchvision
#     print(f"torch version: {torch.__version__}")
#     print(f"torchvision version: {torchvision.__version__}")
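
The commented-out check above compares only the minor version number, so it would reject torch 2.x (minor version 0, 1, ...) even though that satisfies "1.12+". A minimal sketch of a comparison on `(major, minor)` tuples follows. It assumes version strings of the form `major.minor.patch` with an optional local suffix such as `+cu118`. `parse_major_minor` and `meets_minimum` are hypothetical helper names, not part of torch or torchvision.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    # Drop any local build suffix after '+', then keep the first two components
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Check whether `version` is at least the (major, minor) `minimum`."""
    # Tuples compare element by element, so (2, 1) >= (1, 12) is True
    return parse_major_minor(version) >= minimum


# Hypothetical version strings used only for illustration
for candidate in ["1.11.0", "1.12.1", "2.1.0+cu118"]:
    print(candidate, meets_minimum(candidate, (1, 12)))
```

In the notebook, the same helper could be called with `torch.__version__` and `torchvision.__version__` in place of the hard-coded strings.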

In [2]:
import torch
import torchvision

In [3]:
# Use a GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [4]:
# Get regular imports 
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

In [5]:
# Set seeds
def set_seeds(seed: int=42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

In [6]:
import os
import zipfile

from pathlib import Path

import requests

def download_data(source: str, 
                  destination: str,
                  remove_source: bool = True) -> Path:
    """Downloads a zipped dataset from source and unzips to destination.

    Args:
        source (str): A link to a zipped file containing data.
        destination (str): A target directory to unzip data to.
        remove_source (bool): Whether to remove the source after downloading and extracting.
    
    Returns:
        pathlib.Path to downloaded data.
    
    Example usage:
        download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                      destination="pizza_steak_sushi")
    """
    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist, download it and prepare it... 
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory exists, skipping download.")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one...")
        image_path.mkdir(parents=True, exist_ok=True)
        
        # Download pizza, steak, sushi data
        target_file = Path(source).name
        with open(data_path / target_file, "wb") as f:
            request = requests.get(source)
            print(f"[INFO] Downloading {target_file} from {source}...")
            f.write(request.content)

        # Unzip pizza, steak, sushi data
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data...") 
            zip_ref.extractall(image_path)

        # Remove .zip file
        if remove_source:
            os.remove(data_path / target_file)
    
    return image_path

image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination="pizza_steak_sushi")
image_path

[INFO] data/pizza_steak_sushi directory exists, skipping download.


PosixPath('data/pizza_steak_sushi')
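
The unzip-and-clean-up half of `download_data()` can be tried without a network connection. The sketch below is a minimal illustration, assuming a tiny archive built on the fly in a temporary directory. `extract_and_remove` is a hypothetical name, and the archive contents are placeholders rather than real images.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_and_remove(zip_path: Path, destination: Path, remove_source: bool = True) -> Path:
    """Extract `zip_path` into `destination` and optionally delete the archive."""
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(destination)
    if remove_source:
        zip_path.unlink()
    return destination


# Build a tiny placeholder archive so the sketch runs offline
tmp_dir = Path(tempfile.mkdtemp())
archive = tmp_dir / "example.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("train/pizza/placeholder.txt", "not a real image")

extracted = extract_and_remove(archive, tmp_dir / "example")
print(sorted(p.relative_to(extracted).as_posix() for p in extracted.rglob("*") if p.is_file()))
print(f"Archive still present: {archive.exists()}")
```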

In [7]:
from torch.utils.tensorboard import SummaryWriter
def create_writer(experiment_name: str, 
                  model_name: str, 
                  extra: str=None):
    """Creates a torch.utils.tensorboard.writer.SummaryWriter() instance saving to a specific log_dir.

    log_dir is a combination of runs/timestamp/experiment_name/model_name/extra.

    Where timestamp is the current date in YYYY-MM-DD format.

    Args:
        experiment_name (str): Name of experiment.
        model_name (str): Name of model.
        extra (str, optional): Anything extra to add to the directory. Defaults to None.

    Returns:
        torch.utils.tensorboard.writer.SummaryWriter(): Instance of a writer saving to log_dir.

    Example usage:
        # Create a writer saving to "runs/2022-06-04/data_10_percent/effnetb2/5_epochs/"
        writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb2",
                               extra="5_epochs")
        # The above is the same as:
        writer = SummaryWriter(log_dir="runs/2022-06-04/data_10_percent/effnetb2/5_epochs/")
    """
    from datetime import datetime
    import os

    # Get timestamp of current date (all experiments on certain day live in same folder)
    timestamp = datetime.now().strftime("%Y-%m-%d") # returns current date in YYYY-MM-DD format

    if extra:
        # Create log directory path
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name, extra)
    else:
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name)
        
    print(f"[INFO] Created SummaryWriter, saving to: {log_dir}...")
    return SummaryWriter(log_dir=log_dir)

In [8]:
# Create a test writer
writer = create_writer(experiment_name="test_experiment_name",
                       model_name="this_is_the_model_name",
                       extra="add_a_little_extra_if_you_want")

[INFO] Created SummaryWriter, saving to: runs/2024-09-14/test_experiment_name/this_is_the_model_name/add_a_little_extra_if_you_want...
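
To see how `create_writer()` assembles `log_dir` without creating a `SummaryWriter`, the path logic can be isolated. This is a minimal sketch: `build_log_dir` is a hypothetical helper, and its `now` parameter is an assumption added only so the example produces the same path on any day.

```python
import os
from datetime import datetime
from typing import Optional


def build_log_dir(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None,
                  now: Optional[datetime] = None) -> str:
    """Mirror the path logic of create_writer() without touching TensorBoard."""
    # `now` is injectable purely to make the example reproducible
    now = now or datetime.now()
    timestamp = now.strftime("%Y-%m-%d")  # YYYY-MM-DD, one folder per day
    parts = ["runs", timestamp, experiment_name, model_name]
    if extra:
        parts.append(extra)
    return os.path.join(*parts)


log_dir = build_log_dir("data_10_percent", "effnetb2", "5_epochs", now=datetime(2022, 6, 4))
print(log_dir)
```

Because the timestamp only has day resolution, two runs with the same experiment name, model name and `extra` on the same day write to the same directory.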


In [9]:
from typing import Dict, List
from tqdm.auto import tqdm

from going_modular.going_modular.engine import train_step, test_step

# Add writer parameter to train()
def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device, 
          writer: torch.utils.tensorboard.writer.SummaryWriter # new parameter to take in a writer
          ) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Stores metrics to specified writer log_dir if present.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").
      writer: A SummaryWriter() instance to log model results to.

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for 
      each epoch.
      In the form: {train_loss: [...],
                train_acc: [...],
                test_loss: [...],
                test_acc: [...]} 
      For example if training for epochs=2: 
              {train_loss: [2.0616, 1.0537],
                train_acc: [0.3945, 0.3945],
                test_loss: [1.2641, 1.5706],
                test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
          dataloader=test_dataloader,
          loss_fn=loss_fn,
          device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)


        ### New: Use the writer parameter to track experiments ###
        # See if there's a writer, if so, log to it
        if writer:
            # Add results to SummaryWriter
            writer.add_scalars(main_tag="Loss", 
                               tag_scalar_dict={"train_loss": train_loss,
                                                "test_loss": test_loss},
                               global_step=epoch)
            writer.add_scalars(main_tag="Accuracy", 
                               tag_scalar_dict={"train_acc": train_acc,
                                                "test_acc": test_acc}, 
                               global_step=epoch)

    # Close the writer once, after the final epoch, rather than after every epoch
    if writer:
        writer.close()
    ### End new ###

    # Return the filled results at the end of the epochs
    return results
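
The dictionary returned by `train()` holds one value per epoch for each metric, so picking out the best epoch is a short lookup. The sketch below is illustrative only: `best_epoch` is a hypothetical helper, and the numbers are the example values from the docstring above rather than real training results.

```python
from typing import Dict, List, Tuple


def best_epoch(results: Dict[str, List[float]],
               metric: str = "test_acc",
               higher_is_better: bool = True) -> Tuple[int, float]:
    """Return (1-based epoch number, value) of the best entry for `metric`."""
    values = results[metric]
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=lambda i: values[i])
    return index + 1, values[index]


# Example values taken from the train() docstring (epochs=2)
example_results = {"train_loss": [2.0616, 1.0537],
                   "train_acc": [0.3945, 0.3945],
                   "test_loss": [1.2641, 1.5706],
                   "test_acc": [0.3400, 0.2973]}

print(best_epoch(example_results))
print(best_epoch(example_results, metric="train_loss", higher_is_better=False))
```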

### Download data

Using the same data from https://www.learnpytorch.io/07_pytorch_experiment_tracking/

In [10]:
# Download 10 percent and 20 percent training data (if necessary)
data_10_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                                     destination="pizza_steak_sushi")

data_20_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip",
                                     destination="pizza_steak_sushi_20_percent")

[INFO] data/pizza_steak_sushi directory exists, skipping download.
[INFO] data/pizza_steak_sushi_20_percent directory exists, skipping download.


In [11]:
# Setup training directory paths
train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup testing directory paths (note: use the same test dataset for both to compare the results)
test_dir = data_10_percent_path / "test"

# Check the directories
print(f"Training directory 10%: {train_dir_10_percent}")
print(f"Training directory 20%: {train_dir_20_percent}")
print(f"Testing directory: {test_dir}")

Training directory 10%: data/pizza_steak_sushi/train
Training directory 20%: data/pizza_steak_sushi_20_percent/train
Testing directory: data/pizza_steak_sushi/test


In [12]:
from torchvision import transforms

# Create a transform to normalize data distribution to be in line with ImageNet
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], # values per colour channel [red, green, blue]
                                 std=[0.229, 0.224, 0.225])

# Create a transform pipeline
simple_transform = transforms.Compose([
                                       transforms.Resize((224, 224)),
                                       transforms.ToTensor(), # get image values between 0 & 1
                                       normalize
])
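
`transforms.Normalize` applies `(value - mean) / std` to each colour channel. The arithmetic can be checked by hand on a single pixel. This is a plain-Python sketch, with `normalize_pixel` as a hypothetical helper and the pixel values chosen only for illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]  # per channel: red, green, blue
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))
# A pure white pixel ends up a little over two standard deviations above the mean
print(normalize_pixel([1.0, 1.0, 1.0]))
```

After normalization, values are no longer confined to the 0 to 1 range, which is why images need rescaling before they are displayed.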

### Turn data into DataLoaders 

In [13]:
BATCH_SIZE = 32

# Create 10% training and test DataLoaders
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=simple_transform,
                                                                                          batch_size=BATCH_SIZE)

# Create 20% training and test DataLoaders
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,
                                                                                          test_dir=test_dir,
                                                                                          transform=simple_transform,
                                                                                          batch_size=BATCH_SIZE)

# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)
print(f"Number of batches of size {BATCH_SIZE} in 10 percent training data: {len(train_dataloader_10_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 20 percent training data: {len(train_dataloader_20_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in testing data: {len(test_dataloader)} (all experiments will use the same test set)")
print(f"Number of classes: {len(class_names)}, class names: {class_names}")

Number of batches of size 32 in 10 percent training data: 8
Number of batches of size 32 in 20 percent training data: 15
Number of batches of size 32 in testing data: 3 (all experiments will use the same test set)
Number of classes: 3, class names: ['pizza', 'steak', 'sushi']
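
The batch counts above follow from `ceil(num_samples / batch_size)`, assuming the DataLoaders keep the final partial batch (PyTorch's default of `drop_last=False`). The sketch below inverts that relationship to show which dataset sizes are consistent with the printed counts. The helper names are hypothetical, and the exact image counts are not stated in this notebook.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def sample_count_range(batches: int, batch_size: int):
    """Smallest and largest dataset sizes that yield exactly `batches` batches."""
    return (batches - 1) * batch_size + 1, batches * batch_size


BATCH_SIZE = 32
for name, batches in [("10% train", 8), ("20% train", 15), ("test", 3)]:
    low, high = sample_count_range(batches, BATCH_SIZE)
    print(f"{name}: {batches} batches -> between {low} and {high} samples")
```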


## Exercise 1: Pick a larger model from [`torchvision.models`](https://pytorch.org/vision/main/models.html) to add to the list of experiments (for example, EffNetB3 or higher)

* How does it perform compared to our existing models?
* **Hint:** You'll need to set up an experiment similar to [07. PyTorch Experiment Tracking section 7.6](https://www.learnpytorch.io/07_pytorch_experiment_tracking/#76-create-experiments-and-set-up-training-code).

In [14]:
# weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
# model = torchvision.models.efficientnet_b0(weights=weights).to(device)

# for p in model.features.parameters():
#     p.requires_grad = False

# model.classifier = nn.Sequential(
#     nn.Dropout(p=0.2, inplace=True),
#     nn.Linear(1280, 3)
# ).to(device)

# summary(model, input_size=(BATCH_SIZE, 3, 224, 224), col_width=20, row_settings=["var_names"],
#         col_names=["input_size", "output_size", "num_params", "trainable"]), model.classifier

In [15]:
def create_effnetb0():
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
    model = torchvision.models.efficientnet_b0(weights=weights).to(device)
    
    for p in model.features.parameters():
        p.requires_grad = False
    
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2, inplace=True),
        nn.Linear(1280, 3)
    ).to(device)
    
    return model

In [16]:
# weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
# model = torchvision.models.efficientnet_b2(weights=weights).to(device)

# for p in model.features.parameters():
#     p.requires_grad = False

# model.classifier = nn.Sequential(
#     nn.Dropout(p=0.3, inplace=True),
#     nn.Linear(1408, 3)
# ).to(device)

# summary(model, input_size=(BATCH_SIZE, 3, 224, 224), col_width=20, row_settings=["var_names"],
#         col_names=["input_size", "output_size", "num_params", "trainable"]), model.classifier

In [17]:
def create_effnetb2():
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
    model = torchvision.models.efficientnet_b2(weights=weights).to(device)
    
    for p in model.features.parameters():
        p.requires_grad = False
    
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3, inplace=True),
        nn.Linear(1408, 3)
    ).to(device)
    
    return model

In [18]:
# weights = torchvision.models.EfficientNet_B3_Weights.DEFAULT
# model = torchvision.models.efficientnet_b3(weights=weights).to(device)

# for p in model.features.parameters():
#     p.requires_grad = False

# model.classifier = nn.Sequential(
#     nn.Dropout(p=0.3, inplace=True),
#     nn.Linear(1536, 3)
# ).to(device)

# summary(model, input_size=(BATCH_SIZE, 3, 224, 224), col_width=20, row_settings=["var_names"],
#         col_names=["input_size", "output_size", "num_params", "trainable"]), model.classifier

In [19]:
def create_effnetb3():
    weights = torchvision.models.EfficientNet_B3_Weights.DEFAULT
    model = torchvision.models.efficientnet_b3(weights=weights).to(device)

    for p in model.features.parameters():
        p.requires_grad = False

    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3, inplace=True),
        nn.Linear(1536, 3)
    ).to(device)
    
    return model
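
The three `create_effnetb*()` functions differ only in the weights they load, the dropout probability and the classifier's input width. One way to keep those differences in a single place is a lookup table. This is a sketch rather than the course's approach: `EFFNET_HEAD_CONFIG` and `head_config` are hypothetical names, and the values are copied from the functions above.

```python
# Classifier head settings copied from create_effnetb0/b2/b3 above
EFFNET_HEAD_CONFIG = {
    "effnetb0": {"dropout": 0.2, "in_features": 1280},
    "effnetb2": {"dropout": 0.3, "in_features": 1408},
    "effnetb3": {"dropout": 0.3, "in_features": 1536},
}


def head_config(model_name: str, num_classes: int = 3) -> dict:
    """Return the classifier head settings for a known model name."""
    if model_name not in EFFNET_HEAD_CONFIG:
        raise ValueError(f"Unknown model name: {model_name!r}")
    config = dict(EFFNET_HEAD_CONFIG[model_name])  # copy so the table stays unchanged
    config["out_features"] = num_classes
    return config


for name in EFFNET_HEAD_CONFIG:
    print(name, head_config(name))
```

A single factory function could then pass `config["dropout"]` to `nn.Dropout` and `config["in_features"]`, `config["out_features"]` to `nn.Linear` instead of hard-coding them three times.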

In [20]:
dataloaders = [("data_10_percent", train_dataloader_10_percent), ("data_20_percent", train_dataloader_20_percent)]
epochs_count = [5, 10]
model_names = ["effnetb0", "effnetb2", "effnetb3"]
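
Two datasets, two epoch counts and three models give 2 x 2 x 3 = 12 experiments. The grid can be enumerated up front with `itertools.product`, which yields combinations in the same order as nested `for` loops over data, then epochs, then model. The sketch below uses only the dataset names, not the DataLoader objects, so it runs on its own. `data_names` and `experiment_grid` are names introduced here for illustration.

```python
from itertools import product

data_names = ["data_10_percent", "data_20_percent"]
epochs_count = [5, 10]
model_names = ["effnetb0", "effnetb2", "effnetb3"]

# The rightmost iterable varies fastest, matching loops nested data -> epochs -> model
experiment_grid = list(product(data_names, epochs_count, model_names))

print(f"Total experiments: {len(experiment_grid)}")
for number, (data_name, epochs, model_name) in enumerate(experiment_grid, start=1):
    print(f"{number:>2}: {model_name} | {data_name} | {epochs} epochs")
```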

In [21]:
%%time
# TODO: your code
experiment_number = 0

for data_name, data in dataloaders:
    for epochs in epochs_count:
        for model_name in model_names:
            experiment_number += 1
            print(f"[INFO] Experiment numer: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] Data: {data_name}")
            print(f"[INFO] Number of epochs: {epochs}")
            
            if model_name == "effnetb0":
                model = create_effnetb0()
            elif model_name == "effnetb2":
                model = create_effnetb2()
            else:
                model = create_effnetb3()
    
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)
    
            train(model=model,
                  train_dataloader=data,
                  test_dataloader=test_dataloader, 
                  optimizer=optimizer,
                  loss_fn=loss_fn,
                  epochs=epochs,
                  device=device,
                  writer=create_writer(experiment_name=data_name,
                                       model_name=model_name,
                                       extra=f"{epochs}_epochs"))

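            # Make sure the target directory exists before saving the weights
            Path("models").mkdir(parents=True, exist_ok=True)
            torch.save(model.state_dict(), f"models/{model_name}_{data_name}_{epochs}_epochs")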
            print("-"*50 + '\n')

[INFO] Experiment numer: 1
[INFO] Model: effnetb0
[INFO] Data: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_10_percent/effnetb0/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0108 | train_acc: 0.5234 | test_loss: 0.8805 | test_acc: 0.6723
Epoch: 2 | train_loss: 0.9525 | train_acc: 0.4961 | test_loss: 0.8830 | test_acc: 0.5379
Epoch: 3 | train_loss: 0.7442 | train_acc: 0.7930 | test_loss: 0.6465 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.7187 | train_acc: 0.7109 | test_loss: 0.5875 | test_acc: 0.8343
Epoch: 5 | train_loss: 0.6304 | train_acc: 0.8867 | test_loss: 0.6030 | test_acc: 0.8864
--------------------------------------------------

[INFO] Experiment numer: 2
[INFO] Model: effnetb2
[INFO] Data: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_10_percent/effnetb2/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.1084 | train_acc: 0.3047 | test_loss: 0.9680 | test_acc: 0.6705
Epoch: 2 | train_loss: 0.8746 | train_acc: 0.7969 | test_loss: 0.8485 | test_acc: 0.7850
Epoch: 3 | train_loss: 0.8409 | train_acc: 0.7422 | test_loss: 0.7846 | test_acc: 0.8258
Epoch: 4 | train_loss: 0.7511 | train_acc: 0.7383 | test_loss: 0.6660 | test_acc: 0.8873
Epoch: 5 | train_loss: 0.6010 | train_acc: 0.9102 | test_loss: 0.6454 | test_acc: 0.8561
--------------------------------------------------

[INFO] Experiment numer: 3
[INFO] Model: effnetb3
[INFO] Data: data_10_percent
[INFO] Number of epochs: 5
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_10_percent/effnetb3/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0859 | train_acc: 0.3555 | test_loss: 0.9673 | test_acc: 0.7434
Epoch: 2 | train_loss: 0.9456 | train_acc: 0.6172 | test_loss: 0.8262 | test_acc: 0.8438
Epoch: 3 | train_loss: 0.8497 | train_acc: 0.6836 | test_loss: 0.8040 | test_acc: 0.7850
Epoch: 4 | train_loss: 0.7281 | train_acc: 0.7656 | test_loss: 0.7783 | test_acc: 0.7850
Epoch: 5 | train_loss: 0.6434 | train_acc: 0.9023 | test_loss: 0.7179 | test_acc: 0.7955
--------------------------------------------------

[INFO] Experiment numer: 4
[INFO] Model: effnetb0
[INFO] Data: data_10_percent
[INFO] Number of epochs: 10
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_10_percent/effnetb0/10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0614 | train_acc: 0.4062 | test_loss: 0.9488 | test_acc: 0.4962
Epoch: 2 | train_loss: 0.9128 | train_acc: 0.5820 | test_loss: 0.7870 | test_acc: 0.7320
Epoch: 3 | train_loss: 0.7612 | train_acc: 0.8828 | test_loss: 0.6915 | test_acc: 0.8551
Epoch: 4 | train_loss: 0.6993 | train_acc: 0.7695 | test_loss: 0.6648 | test_acc: 0.8665
Epoch: 5 | train_loss: 0.6813 | train_acc: 0.7422 | test_loss: 0.6255 | test_acc: 0.8759
Epoch: 6 | train_loss: 0.5800 | train_acc: 0.9141 | test_loss: 0.5140 | test_acc: 0.8864
Epoch: 7 | train_loss: 0.5574 | train_acc: 0.8828 | test_loss: 0.4839 | test_acc: 0.8655
Epoch: 8 | train_loss: 0.5740 | train_acc: 0.7461 | test_loss: 0.4702 | test_acc: 0.8655
Epoch: 9 | train_loss: 0.4660 | train_acc: 0.8945 | test_loss: 0.4831 | test_acc: 0.9072
Epoch: 10 | train_loss: 0.5759 | train_acc: 0.7656 | test_loss: 0.5137 | test_acc: 0.8968
--------------------------------------------------

[INFO] Experiment numer: 5
[INFO] Model: effnetb2
[INFO] 

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9969 | train_acc: 0.5938 | test_loss: 0.9300 | test_acc: 0.7131
Epoch: 2 | train_loss: 0.8888 | train_acc: 0.6328 | test_loss: 0.8635 | test_acc: 0.7027
Epoch: 3 | train_loss: 0.7751 | train_acc: 0.7344 | test_loss: 0.7867 | test_acc: 0.8049
Epoch: 4 | train_loss: 0.7155 | train_acc: 0.7773 | test_loss: 0.6709 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.6367 | train_acc: 0.7695 | test_loss: 0.6599 | test_acc: 0.8466
Epoch: 6 | train_loss: 0.6543 | train_acc: 0.7695 | test_loss: 0.6109 | test_acc: 0.8864
Epoch: 7 | train_loss: 0.5508 | train_acc: 0.7969 | test_loss: 0.5880 | test_acc: 0.9176
Epoch: 8 | train_loss: 0.5442 | train_acc: 0.8008 | test_loss: 0.5694 | test_acc: 0.8977
Epoch: 9 | train_loss: 0.4752 | train_acc: 0.9492 | test_loss: 0.5684 | test_acc: 0.8977
Epoch: 10 | train_loss: 0.5139 | train_acc: 0.8047 | test_loss: 0.5492 | test_acc: 0.9072
--------------------------------------------------

[INFO] Experiment numer: 6
[INFO] Model: effnetb3
[INFO] 

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0876 | train_acc: 0.3750 | test_loss: 1.0211 | test_acc: 0.6098
Epoch: 2 | train_loss: 0.9306 | train_acc: 0.6602 | test_loss: 0.9176 | test_acc: 0.6411
Epoch: 3 | train_loss: 0.8124 | train_acc: 0.7109 | test_loss: 0.7670 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.7401 | train_acc: 0.7227 | test_loss: 0.7217 | test_acc: 0.8561
Epoch: 5 | train_loss: 0.6992 | train_acc: 0.7852 | test_loss: 0.6490 | test_acc: 0.8958
Epoch: 6 | train_loss: 0.6001 | train_acc: 0.8984 | test_loss: 0.6447 | test_acc: 0.8561
Epoch: 7 | train_loss: 0.5729 | train_acc: 0.8516 | test_loss: 0.6518 | test_acc: 0.7850
Epoch: 8 | train_loss: 0.5505 | train_acc: 0.8516 | test_loss: 0.6255 | test_acc: 0.8258
Epoch: 9 | train_loss: 0.5179 | train_acc: 0.8633 | test_loss: 0.6005 | test_acc: 0.8258
Epoch: 10 | train_loss: 0.5640 | train_acc: 0.7695 | test_loss: 0.5707 | test_acc: 0.8561
--------------------------------------------------

[INFO] Experiment numer: 7
[INFO] Model: effnetb0
[INFO] 

  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9855 | train_acc: 0.5417 | test_loss: 0.7065 | test_acc: 0.8362
Epoch: 2 | train_loss: 0.6801 | train_acc: 0.8438 | test_loss: 0.5875 | test_acc: 0.8977
Epoch: 3 | train_loss: 0.5614 | train_acc: 0.8854 | test_loss: 0.4828 | test_acc: 0.8561
Epoch: 4 | train_loss: 0.4643 | train_acc: 0.9000 | test_loss: 0.4444 | test_acc: 0.8977
Epoch: 5 | train_loss: 0.4535 | train_acc: 0.8792 | test_loss: 0.3975 | test_acc: 0.9081
--------------------------------------------------

[INFO] Experiment numer: 8
[INFO] Model: effnetb2
[INFO] Data: data_20_percent
[INFO] Number of epochs: 5
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_20_percent/effnetb2/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0056 | train_acc: 0.5083 | test_loss: 0.8464 | test_acc: 0.7453
Epoch: 2 | train_loss: 0.7423 | train_acc: 0.8063 | test_loss: 0.7171 | test_acc: 0.8873
Epoch: 3 | train_loss: 0.5745 | train_acc: 0.8625 | test_loss: 0.6168 | test_acc: 0.8977
Epoch: 4 | train_loss: 0.5125 | train_acc: 0.8833 | test_loss: 0.5351 | test_acc: 0.8977
Epoch: 5 | train_loss: 0.4655 | train_acc: 0.8688 | test_loss: 0.5318 | test_acc: 0.8977
--------------------------------------------------

[INFO] Experiment numer: 9
[INFO] Model: effnetb3
[INFO] Data: data_20_percent
[INFO] Number of epochs: 5
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_20_percent/effnetb3/5_epochs...


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9698 | train_acc: 0.5479 | test_loss: 0.7940 | test_acc: 0.7955
Epoch: 2 | train_loss: 0.7510 | train_acc: 0.7958 | test_loss: 0.6631 | test_acc: 0.8362
Epoch: 3 | train_loss: 0.5741 | train_acc: 0.8792 | test_loss: 0.5623 | test_acc: 0.8968
Epoch: 4 | train_loss: 0.5236 | train_acc: 0.8542 | test_loss: 0.5068 | test_acc: 0.8665
Epoch: 5 | train_loss: 0.4358 | train_acc: 0.9104 | test_loss: 0.4601 | test_acc: 0.8759
--------------------------------------------------

[INFO] Experiment numer: 10
[INFO] Model: effnetb0
[INFO] Data: data_20_percent
[INFO] Number of epochs: 10
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_20_percent/effnetb0/10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9422 | train_acc: 0.6188 | test_loss: 0.6703 | test_acc: 0.9072
Epoch: 2 | train_loss: 0.6634 | train_acc: 0.8583 | test_loss: 0.5885 | test_acc: 0.8873
Epoch: 3 | train_loss: 0.5510 | train_acc: 0.8833 | test_loss: 0.4699 | test_acc: 0.9280
Epoch: 4 | train_loss: 0.4476 | train_acc: 0.9104 | test_loss: 0.4144 | test_acc: 0.9176
Epoch: 5 | train_loss: 0.4184 | train_acc: 0.9021 | test_loss: 0.4010 | test_acc: 0.8977
Epoch: 6 | train_loss: 0.3909 | train_acc: 0.8938 | test_loss: 0.3495 | test_acc: 0.9176
Epoch: 7 | train_loss: 0.3253 | train_acc: 0.9313 | test_loss: 0.3666 | test_acc: 0.9081
Epoch: 8 | train_loss: 0.3464 | train_acc: 0.8729 | test_loss: 0.3053 | test_acc: 0.9489
Epoch: 9 | train_loss: 0.2974 | train_acc: 0.9292 | test_loss: 0.3100 | test_acc: 0.9072
Epoch: 10 | train_loss: 0.3567 | train_acc: 0.8625 | test_loss: 0.2886 | test_acc: 0.9384
--------------------------------------------------

[INFO] Experiment numer: 11
[INFO] Model: effnetb2
[INFO]

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9487 | train_acc: 0.6604 | test_loss: 0.8240 | test_acc: 0.7746
Epoch: 2 | train_loss: 0.7086 | train_acc: 0.7937 | test_loss: 0.6652 | test_acc: 0.9176
Epoch: 3 | train_loss: 0.5875 | train_acc: 0.8708 | test_loss: 0.6078 | test_acc: 0.8873
Epoch: 4 | train_loss: 0.4774 | train_acc: 0.8938 | test_loss: 0.5441 | test_acc: 0.8864
Epoch: 5 | train_loss: 0.4414 | train_acc: 0.8833 | test_loss: 0.5105 | test_acc: 0.8665
Epoch: 6 | train_loss: 0.4053 | train_acc: 0.9062 | test_loss: 0.4962 | test_acc: 0.8968
Epoch: 7 | train_loss: 0.4010 | train_acc: 0.8750 | test_loss: 0.4649 | test_acc: 0.8977
Epoch: 8 | train_loss: 0.3487 | train_acc: 0.9021 | test_loss: 0.4036 | test_acc: 0.9280
Epoch: 9 | train_loss: 0.3105 | train_acc: 0.9229 | test_loss: 0.4872 | test_acc: 0.8674
Epoch: 10 | train_loss: 0.3427 | train_acc: 0.9313 | test_loss: 0.4503 | test_acc: 0.8873
--------------------------------------------------

[INFO] Experiment numer: 12
[INFO] Model: effnetb3
[INFO]

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9800 | train_acc: 0.5521 | test_loss: 0.8764 | test_acc: 0.7850
Epoch: 2 | train_loss: 0.7185 | train_acc: 0.8438 | test_loss: 0.6610 | test_acc: 0.8456
Epoch: 3 | train_loss: 0.5985 | train_acc: 0.8688 | test_loss: 0.6072 | test_acc: 0.8456
Epoch: 4 | train_loss: 0.4889 | train_acc: 0.8896 | test_loss: 0.5167 | test_acc: 0.8655
Epoch: 5 | train_loss: 0.4531 | train_acc: 0.8792 | test_loss: 0.5242 | test_acc: 0.8049
Epoch: 6 | train_loss: 0.4569 | train_acc: 0.8688 | test_loss: 0.4649 | test_acc: 0.8352
Epoch: 7 | train_loss: 0.4685 | train_acc: 0.8708 | test_loss: 0.4908 | test_acc: 0.8456
Epoch: 8 | train_loss: 0.4009 | train_acc: 0.8771 | test_loss: 0.4142 | test_acc: 0.8864
Epoch: 9 | train_loss: 0.3868 | train_acc: 0.9042 | test_loss: 0.4575 | test_acc: 0.8153
Epoch: 10 | train_loss: 0.3253 | train_acc: 0.9125 | test_loss: 0.4472 | test_acc: 0.8049
--------------------------------------------------

CPU times: user 1min 15s, sys: 44.2 s, total: 1min 59s

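effnetb0 worked the best: with 20% of the data and 10 epochs it finished at a test accuracy of 0.9384, against 0.8873 for effnetb2 and 0.8049 for effnetb3. This is most likely because the bigger models were fed 224x224 images and so couldn't leverage their ability to use larger inputs, and also because we didn't train for long. Data augmentation may also help, which I will investigate in the next section.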
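
The comparison can be made concrete by collecting the final-epoch test accuracy of each run from the logs above. This is a sketch of one way to summarise them. Several log headers are cut off in the output, so the data and epoch labels for those runs are inferred from the loop order (data, then epochs, then model), and a single final-epoch number per run is a rough summary given how much the accuracy fluctuates between epochs.

```python
# Final-epoch test accuracy per run, copied from the training logs above.
# Keys are (data, epochs, model); labels for truncated headers follow the loop order.
final_test_acc = {
    ("data_10_percent", 5, "effnetb0"): 0.8864,
    ("data_10_percent", 5, "effnetb2"): 0.8561,
    ("data_10_percent", 5, "effnetb3"): 0.7955,
    ("data_10_percent", 10, "effnetb0"): 0.8968,
    ("data_10_percent", 10, "effnetb2"): 0.9072,
    ("data_10_percent", 10, "effnetb3"): 0.8561,
    ("data_20_percent", 5, "effnetb0"): 0.9081,
    ("data_20_percent", 5, "effnetb2"): 0.8977,
    ("data_20_percent", 5, "effnetb3"): 0.8759,
    ("data_20_percent", 10, "effnetb0"): 0.9384,
    ("data_20_percent", 10, "effnetb2"): 0.8873,
    ("data_20_percent", 10, "effnetb3"): 0.8049,
}

# Single best run by final test accuracy
best_run = max(final_test_acc, key=final_test_acc.get)
print(f"Best run: {best_run} -> {final_test_acc[best_run]:.4f}")

# Mean final test accuracy per model across its four runs
accs_by_model = {}
for (_, _, model_name), acc in final_test_acc.items():
    accs_by_model.setdefault(model_name, []).append(acc)
mean_by_model = {name: sum(accs) / len(accs) for name, accs in accs_by_model.items()}

for name, mean_acc in sorted(mean_by_model.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean final test_acc = {mean_acc:.4f}")
```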

## Exercise 2: Introduce data augmentation to the list of experiments using the 20% pizza, steak, sushi training and test datasets. Does this change anything?
    
* For example, you could have one training DataLoader that uses data augmentation and one that doesn't (e.g. `train_dataloader_20_percent_aug` and `train_dataloader_20_percent_no_aug`), then compare the results of the same model type trained on each of these two DataLoaders.
* **Note:** You may need to alter the `create_dataloaders()` function to be able to take a transform for the training data and the testing data (because you don't need to perform data augmentation on the test data). See [04. PyTorch Custom Datasets section 6](https://www.learnpytorch.io/04_pytorch_custom_datasets/#6-other-forms-of-transforms-data-augmentation) for examples of using data augmentation or the script below for an example:

```python
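from torchvision import transforms

# ImageNet normalization statistics (the same values the pretrained weights' transforms use)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Note: Data augmentation transforms like this should only be performed on training data
train_transform_data_aug = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor(),
    normalize
])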

# Create a helper function to visualize different augmented (and not augmented) images
import matplotlib.pyplot as plt

def view_dataloader_images(dataloader, n=10):
    if n > 10:
        print("Having n higher than 10 will create messy plots, lowering to 10.")
        n = 10
    imgs, labels = next(iter(dataloader))
    n = min(n, len(imgs)) # don't index past the end of a small batch
    plt.figure(figsize=(16, 8))
    for i in range(n):
        # Min max scale the image for display purposes
        targ_image = imgs[i]
        sample_min, sample_max = targ_image.min(), targ_image.max()
        sample_scaled = (targ_image - sample_min)/(sample_max - sample_min)

        # Plot images with appropriate axes information
        plt.subplot(1, 10, i+1)
        plt.imshow(sample_scaled.permute(1, 2, 0)) # rearrange (C, H, W) -> (H, W, C) for Matplotlib
        plt.title(class_names[labels[i]]) # assumes `class_names` is defined globally (e.g. returned by `create_dataloaders()`)
        plt.axis(False)

# Have to update `create_dataloaders()` to handle different augmentations
import os
from torch.utils.data import DataLoader
from torchvision import datasets

NUM_WORKERS = os.cpu_count() # use maximum number of CPUs for workers to load data 

# Note: this is an updated version of data_setup.create_dataloaders to handle
# different train and test transforms.
def create_dataloaders(
    train_dir, 
    test_dir, 
    train_transform, # add parameter for train transform (transforms on train dataset)
    test_transform,  # add parameter for test transform (transforms on test dataset)
    batch_size=32, num_workers=NUM_WORKERS
):
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=train_transform)
    test_data = datasets.ImageFolder(test_dir, transform=test_transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=False, # no need to shuffle the test data
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names
```

In [17]:
# Have to update `create_dataloaders()` to handle different augmentations
import os
from torch.utils.data import DataLoader
from torchvision import datasets

NUM_WORKERS = os.cpu_count() # use maximum number of CPUs for workers to load data 

# Note: this is an updated version of data_setup.create_dataloaders to handle
# different train and test transforms.
def create_dataloaders(
    train_dir, 
    test_dir, 
    train_transform, # add parameter for train transform (transforms on train dataset)
    test_transform,  # add parameter for test transform (transforms on test dataset)
    batch_size=32, num_workers=NUM_WORKERS
):
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=train_transform)
    test_data = datasets.ImageFolder(test_dir, transform=test_transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names

#### I will try all three models with their default input sizes and data augmentation to see how good they can get

In [23]:
# TODO: your code
test_transform0 = torchvision.models.EfficientNet_B0_Weights.DEFAULT.transforms()
test_transform2 = torchvision.models.EfficientNet_B2_Weights.DEFAULT.transforms()
test_transform3 = torchvision.models.EfficientNet_B3_Weights.DEFAULT.transforms()
test_transform0, test_transform2, test_transform3

(ImageClassification(
     crop_size=[224]
     resize_size=[256]
     mean=[0.485, 0.456, 0.406]
     std=[0.229, 0.224, 0.225]
     interpolation=InterpolationMode.BICUBIC
 ),
 ImageClassification(
     crop_size=[288]
     resize_size=[288]
     mean=[0.485, 0.456, 0.406]
     std=[0.229, 0.224, 0.225]
     interpolation=InterpolationMode.BICUBIC
 ),
 ImageClassification(
     crop_size=[300]
     resize_size=[320]
     mean=[0.485, 0.456, 0.406]
     std=[0.229, 0.224, 0.225]
     interpolation=InterpolationMode.BICUBIC
 ))

In [24]:
batch = torch.rand(32, 3, 1000, 1000)
test_transform0(batch).shape, test_transform2(batch).shape, test_transform3(batch).shape

(torch.Size([32, 3, 224, 224]),
 torch.Size([32, 3, 288, 288]),
 torch.Size([32, 3, 300, 300]))
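
The output shapes above follow from the `resize_size` and `crop_size` values printed for each set of weights: the preset resizes the image and then center-crops it. Below is a minimal pure-Python sketch of that arithmetic. The helper `preset_output_shape` and the `PRESETS` dictionary are hypothetical names for illustration, not part of torchvision, and the sketch assumes the resize matches the shorter image side to `resize_size` while preserving the aspect ratio.

```python
# Hypothetical helper for illustration only (not a torchvision function).
# (resize_size, crop_size) pairs copied from the printed transforms above.
PRESETS = {
    "effnetb0": (256, 224),
    "effnetb2": (288, 288),
    "effnetb3": (320, 300),
}

def preset_output_shape(height, width, resize_size, crop_size):
    """Return ((resized_h, resized_w), (cropped_h, cropped_w)).

    Assumption: the shorter side is resized to `resize_size` with the
    aspect ratio preserved, then a square center crop of `crop_size` is taken.
    """
    if height <= width:
        resized = (resize_size, int(resize_size * width / height))
    else:
        resized = (int(resize_size * height / width), resize_size)
    cropped = (crop_size, crop_size)
    return resized, cropped

# Same 1000x1000 input size as the random batch used above
for name, (resize_size, crop_size) in PRESETS.items():
    resized, cropped = preset_output_shape(1000, 1000, resize_size, crop_size)
    print(f"{name}: resized to {resized}, cropped to {cropped}")
```

Note that the training transforms in the next cell resize straight to the crop size instead of resizing and then cropping, so train and test images are framed slightly differently.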

In [25]:
from torchvision.transforms import v2

transform0 = v2.Compose([
    v2.TrivialAugmentWide(),
    v2.Resize((224, 224)),
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], 
                 std=[0.229, 0.224, 0.225])
])

transform2 = v2.Compose([
    v2.TrivialAugmentWide(),
    v2.Resize((288, 288)),
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], 
                 std=[0.229, 0.224, 0.225])
])

transform3 = v2.Compose([
    v2.TrivialAugmentWide(),
    v2.Resize((300, 300)),
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], 
                 std=[0.229, 0.224, 0.225])
])

In [26]:
train_dl0, test_dl0, class_names = create_dataloaders(train_dir=train_dir_20_percent,
                                                     test_dir=test_dir,
                                                     train_transform=transform0,
                                                     test_transform=test_transform0,
                                                     batch_size=BATCH_SIZE)

train_dl2, test_dl2, _ = create_dataloaders(train_dir=train_dir_20_percent,
                                                     test_dir=test_dir,
                                                     train_transform=transform2,
                                                     test_transform=test_transform2,
                                                     batch_size=BATCH_SIZE)

train_dl3, test_dl3, _ = create_dataloaders(train_dir=train_dir_20_percent,
                                                     test_dir=test_dir,
                                                     train_transform=transform3,
                                                     test_transform=test_transform3,
                                                     batch_size=BATCH_SIZE)

In [27]:
epochs_count = [10, 20]
model_names = ["effnetb0", "effnetb2", "effnetb3"]

In [28]:
%%time
experiment_number = 0

for epochs in epochs_count:
    for model_name in model_names:
        experiment_number += 1
        print(f"[INFO] Experiment numer: {experiment_number}")
        print(f"[INFO] Model: {model_name}")
        print(f"[INFO] Number of epochs: {epochs}")
        
        if model_name == "effnetb0":
            model = create_effnetb0()
            train_dataloader = train_dl0
            test_dataloader = test_dl0
        elif model_name == "effnetb2":
            model = create_effnetb2()
            train_dataloader = train_dl2
            test_dataloader = test_dl2
        else:
            model = create_effnetb3()
            train_dataloader = train_dl3
            test_dataloader = test_dl3

        loss_fn = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

        train(model=model,
              train_dataloader=train_dataloader,
              test_dataloader=test_dataloader,
              optimizer=optimizer,
              loss_fn=loss_fn,
              epochs=epochs,
              device=device,
              writer=create_writer(experiment_name="data_augmented",
                                   model_name=model_name,
                                   extra=f"{epochs}_epochs"))

        torch.save(model.state_dict(), f"models/{model_name}_augmented_20_data_{epochs}_epochs")
        print("-"*50 + '\n')

[INFO] Experiment numer: 1
[INFO] Model: effnetb0
[INFO] Number of epochs: 10
[INFO] Created SummaryWriter, saving to: runs/2024-09-14/data_augmented/effnetb0/10_epochs...


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9880 | train_acc: 0.5479 | test_loss: 0.7592 | test_acc: 0.8049
Epoch: 2 | train_loss: 0.7355 | train_acc: 0.7646 | test_loss: 0.5830 | test_acc: 0.8759
Epoch: 3 | train_loss: 0.5908 | train_acc: 0.8479 | test_loss: 0.4884 | test_acc: 0.9167
Epoch: 4 | train_loss: 0.5481 | train_acc: 0.8125 | test_loss: 0.4681 | test_acc: 0.8561
Epoch: 5 | train_loss: 0.5102 | train_acc: 0.8604 | test_loss: 0.3967 | test_acc: 0.8968
Epoch: 6 | train_loss: 0.4326 | train_acc: 0.8792 | test_loss: 0.3922 | test_acc: 0.8665
Epoch: 7 | train_loss: 0.4787 | train_acc: 0.8313 | test_loss: 0.3548 | test_acc: 0.8864
Epoch: 8 | train_loss: 0.4576 | train_acc: 0.8354 | test_loss: 0.3545 | test_acc: 0.8759
Epoch: 9 | train_loss: 0.4717 | train_acc: 0.8229 | test_loss: 0.3603 | test_acc: 0.8665
Epoch: 10 | train_loss: 0.4028 | train_acc: 0.8542 | test_loss: 0.3100 | test_acc: 0.8665
--------------------------------------------------

[INFO] Experiment numer: 2
[INFO] Model: effnetb2
[INFO] 

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9906 | train_acc: 0.5417 | test_loss: 0.7694 | test_acc: 0.8864
Epoch: 2 | train_loss: 0.7538 | train_acc: 0.7708 | test_loss: 0.6616 | test_acc: 0.8561
Epoch: 3 | train_loss: 0.6527 | train_acc: 0.8313 | test_loss: 0.5771 | test_acc: 0.8665
Epoch: 4 | train_loss: 0.5713 | train_acc: 0.8479 | test_loss: 0.5190 | test_acc: 0.8769
Epoch: 5 | train_loss: 0.4750 | train_acc: 0.8833 | test_loss: 0.4923 | test_acc: 0.8873
Epoch: 6 | train_loss: 0.4326 | train_acc: 0.8854 | test_loss: 0.4356 | test_acc: 0.8769
Epoch: 7 | train_loss: 0.4470 | train_acc: 0.8729 | test_loss: 0.4911 | test_acc: 0.8570
Epoch: 8 | train_loss: 0.4411 | train_acc: 0.8833 | test_loss: 0.4339 | test_acc: 0.8873
Epoch: 9 | train_loss: 0.4165 | train_acc: 0.8958 | test_loss: 0.4096 | test_acc: 0.8873
Epoch: 10 | train_loss: 0.4593 | train_acc: 0.7958 | test_loss: 0.4062 | test_acc: 0.8769
--------------------------------------------------

[INFO] Experiment numer: 3
[INFO] Model: effnetb3
[INFO] 

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0000 | train_acc: 0.5854 | test_loss: 0.8971 | test_acc: 0.7652
Epoch: 2 | train_loss: 0.7361 | train_acc: 0.8646 | test_loss: 0.6612 | test_acc: 0.8267
Epoch: 3 | train_loss: 0.6036 | train_acc: 0.8771 | test_loss: 0.5338 | test_acc: 0.8873
Epoch: 4 | train_loss: 0.5167 | train_acc: 0.8708 | test_loss: 0.4910 | test_acc: 0.9081
Epoch: 5 | train_loss: 0.5057 | train_acc: 0.8396 | test_loss: 0.4047 | test_acc: 0.9176
Epoch: 6 | train_loss: 0.4185 | train_acc: 0.9083 | test_loss: 0.4002 | test_acc: 0.8977
Epoch: 7 | train_loss: 0.4131 | train_acc: 0.9021 | test_loss: 0.3810 | test_acc: 0.8977
Epoch: 8 | train_loss: 0.4416 | train_acc: 0.8542 | test_loss: 0.3668 | test_acc: 0.9384
Epoch: 9 | train_loss: 0.3707 | train_acc: 0.8979 | test_loss: 0.3044 | test_acc: 0.9280
Epoch: 10 | train_loss: 0.3527 | train_acc: 0.8979 | test_loss: 0.2804 | test_acc: 0.9583
--------------------------------------------------

[INFO] Experiment numer: 4
[INFO] Model: effnetb0
[INFO] 

  0%|          | 0/20 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9777 | train_acc: 0.5729 | test_loss: 0.7809 | test_acc: 0.8352
Epoch: 2 | train_loss: 0.7189 | train_acc: 0.7917 | test_loss: 0.5754 | test_acc: 0.8854
Epoch: 3 | train_loss: 0.6673 | train_acc: 0.7625 | test_loss: 0.4888 | test_acc: 0.9062
Epoch: 4 | train_loss: 0.5098 | train_acc: 0.8625 | test_loss: 0.4813 | test_acc: 0.8769
Epoch: 5 | train_loss: 0.4774 | train_acc: 0.8313 | test_loss: 0.3934 | test_acc: 0.8864
Epoch: 6 | train_loss: 0.5075 | train_acc: 0.7937 | test_loss: 0.4084 | test_acc: 0.9072
Epoch: 7 | train_loss: 0.4824 | train_acc: 0.8083 | test_loss: 0.3481 | test_acc: 0.8968
Epoch: 8 | train_loss: 0.4213 | train_acc: 0.8396 | test_loss: 0.4061 | test_acc: 0.8873
Epoch: 9 | train_loss: 0.3928 | train_acc: 0.8708 | test_loss: 0.3216 | test_acc: 0.8968
Epoch: 10 | train_loss: 0.4191 | train_acc: 0.8500 | test_loss: 0.3243 | test_acc: 0.8864
Epoch: 11 | train_loss: 0.4380 | train_acc: 0.8417 | test_loss: 0.3394 | test_acc: 0.8873
Epoch: 12 | train_l

  0%|          | 0/20 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9609 | train_acc: 0.5813 | test_loss: 0.8185 | test_acc: 0.7140
Epoch: 2 | train_loss: 0.7438 | train_acc: 0.7937 | test_loss: 0.6204 | test_acc: 0.9072
Epoch: 3 | train_loss: 0.6124 | train_acc: 0.8208 | test_loss: 0.5785 | test_acc: 0.9072
Epoch: 4 | train_loss: 0.5026 | train_acc: 0.8854 | test_loss: 0.4841 | test_acc: 0.9176
Epoch: 5 | train_loss: 0.4950 | train_acc: 0.8875 | test_loss: 0.4721 | test_acc: 0.9176
Epoch: 6 | train_loss: 0.4528 | train_acc: 0.8750 | test_loss: 0.4543 | test_acc: 0.9176
Epoch: 7 | train_loss: 0.4304 | train_acc: 0.8417 | test_loss: 0.4265 | test_acc: 0.9280
Epoch: 8 | train_loss: 0.3789 | train_acc: 0.9062 | test_loss: 0.4301 | test_acc: 0.9176
Epoch: 9 | train_loss: 0.3922 | train_acc: 0.8583 | test_loss: 0.3877 | test_acc: 0.8873
Epoch: 10 | train_loss: 0.3502 | train_acc: 0.8771 | test_loss: 0.3423 | test_acc: 0.9176
Epoch: 11 | train_loss: 0.3971 | train_acc: 0.8812 | test_loss: 0.4091 | test_acc: 0.9072
Epoch: 12 | train_l

  0%|          | 0/20 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 0.9847 | train_acc: 0.6042 | test_loss: 0.7729 | test_acc: 0.8873
Epoch: 2 | train_loss: 0.7594 | train_acc: 0.7979 | test_loss: 0.6441 | test_acc: 0.9176
Epoch: 3 | train_loss: 0.6535 | train_acc: 0.8562 | test_loss: 0.5740 | test_acc: 0.8561
Epoch: 4 | train_loss: 0.5447 | train_acc: 0.8542 | test_loss: 0.4881 | test_acc: 0.8873
Epoch: 5 | train_loss: 0.5170 | train_acc: 0.8875 | test_loss: 0.4455 | test_acc: 0.8977
Epoch: 6 | train_loss: 0.4397 | train_acc: 0.8958 | test_loss: 0.4528 | test_acc: 0.8769
Epoch: 7 | train_loss: 0.3818 | train_acc: 0.9104 | test_loss: 0.3971 | test_acc: 0.8873
Epoch: 8 | train_loss: 0.3976 | train_acc: 0.8854 | test_loss: 0.3584 | test_acc: 0.8873
Epoch: 9 | train_loss: 0.3716 | train_acc: 0.9167 | test_loss: 0.4005 | test_acc: 0.8769
Epoch: 10 | train_loss: 0.3380 | train_acc: 0.9104 | test_loss: 0.3192 | test_acc: 0.8769
Epoch: 11 | train_loss: 0.3435 | train_acc: 0.9187 | test_loss: 0.3273 | test_acc: 0.8873
Epoch: 12 | train_l

The augmentation didn't make a noticeable difference, and training for 20 epochs instead of 10 also seems pointless.
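
Comparing runs by eye across long logs is error-prone, so here is a minimal sketch of how the printed epoch lines could be parsed and summarized. The helpers `parse_epoch_lines` and `summarize` are hypothetical names, not part of the course code. The two excerpts are copied from the effnetb3 10-epoch logs above (epochs 8 to 10 only). These two runs also differ in input image size, so the numbers are not an isolated measure of augmentation.

```python
import re

# Matches lines such as:
# "Epoch: 8 | train_loss: 0.4009 | train_acc: 0.8771 | test_loss: 0.4142 | test_acc: 0.8864"
EPOCH_LINE = re.compile(
    r"Epoch: (\d+) \| train_loss: ([\d.]+) \| train_acc: ([\d.]+) "
    r"\| test_loss: ([\d.]+) \| test_acc: ([\d.]+)"
)

def parse_epoch_lines(log_text):
    """Turn printed epoch lines into a list of dicts (hypothetical helper)."""
    rows = []
    for match in EPOCH_LINE.finditer(log_text):
        epoch, train_loss, train_acc, test_loss, test_acc = match.groups()
        rows.append({
            "epoch": int(epoch),
            "train_loss": float(train_loss),
            "train_acc": float(train_acc),
            "test_loss": float(test_loss),
            "test_acc": float(test_acc),
        })
    return rows

def summarize(rows):
    """Report the best and the final test accuracy of one run."""
    best = max(rows, key=lambda row: row["test_acc"])
    return {
        "best_epoch": best["epoch"],
        "best_test_acc": best["test_acc"],
        "final_test_acc": rows[-1]["test_acc"],
    }

# Excerpts copied from the effnetb3 logs above (10-epoch runs, epochs 8-10)
NO_AUG_LOG = """
Epoch: 8 | train_loss: 0.4009 | train_acc: 0.8771 | test_loss: 0.4142 | test_acc: 0.8864
Epoch: 9 | train_loss: 0.3868 | train_acc: 0.9042 | test_loss: 0.4575 | test_acc: 0.8153
Epoch: 10 | train_loss: 0.3253 | train_acc: 0.9125 | test_loss: 0.4472 | test_acc: 0.8049
"""
AUG_LOG = """
Epoch: 8 | train_loss: 0.4416 | train_acc: 0.8542 | test_loss: 0.3668 | test_acc: 0.9384
Epoch: 9 | train_loss: 0.3707 | train_acc: 0.8979 | test_loss: 0.3044 | test_acc: 0.9280
Epoch: 10 | train_loss: 0.3527 | train_acc: 0.8979 | test_loss: 0.2804 | test_acc: 0.9583
"""

for label, log in [("no augmentation", NO_AUG_LOG), ("augmentation", AUG_LOG)]:
    print(label, summarize(parse_epoch_lines(log)))
```

Test accuracy moves by several points from one epoch to the next in these logs, so a single final-epoch number is a noisy basis for comparison. Looking at the TensorBoard curves written by `create_writer()` is likely the more reliable check.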

## Exercise 3. Scale up the dataset to turn FoodVision Mini into FoodVision Big using the entire [Food101 dataset from `torchvision.datasets`](https://pytorch.org/vision/stable/generated/torchvision.datasets.Food101.html#torchvision.datasets.Food101)
    
* You could take the best performing model from your various experiments or even the EffNetB2 feature extractor we created in this notebook and see how it goes fitting for 5 epochs on all of Food101.
* If you try more than one model, it would be good to have the model's results tracked.
* If you load the Food101 dataset from `torchvision.datasets`, you'll have to create PyTorch DataLoaders to use it in training.
* **Note:** Due to the larger amount of data in Food101 compared to our pizza, steak, sushi dataset, this model will take longer to train.

In [1]:
#TODO

**I have done this (and much more) here:** https://github.com/noNScop/sem2.5/tree/main/image_classification_on_food101