<a href="https://colab.research.google.com/github/gauthiermartin/pytorch-deep-learning-course/blob/main/07_pytorch_experiment_tracking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 07. PyTorch Experiment Tracking

Machine learning is very experimental.

To figure out which experiments are worth pursuing, we need to keep a record of what we tried and how it went. That's where **experiment tracking** comes in.

In this notebook we will do this programmatically.


- Book Version - https://www.learnpytorch.io/07_pytorch_experiment_tracking/
- Made with ML - https://madewithml.com/courses/mlops/experiment-tracking/

In [1]:
import torch
import torchvision

print(torch.__version__)
print(torchvision.__version__)

2.0.1+cu118
0.15.2+cu118


In [2]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

[INFO] Couldn't find torchinfo... installing it.
[INFO] Couldn't find going_modular scripts... downloading them from GitHub.
Cloning into 'pytorch-deep-learning'...
remote: Enumerating objects: 4028, done.
remote: Counting objects: 100% (1216/1216), done.
remote: Compressing objects: 100% (216/216), done.
remote: Total 4028 (delta 1065), reused 1095 (delta 997), pack-reused 2812
Receiving objects: 100% (4028/4028), 651.38 MiB | 15.97 MiB/s, done.
Resolving deltas: 100% (2358/2358), done.
Updating files: 100% (248/248), done.


In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [4]:
# Set seeds
def set_seeds(seed: int = 42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

In [5]:
set_seeds()


## 1. Get Data

We want to get pizza, steak and sushi images so we can run experiments building a FoodMiniVision model on them.

In [6]:
import os
import zipfile
import requests

from pathlib import Path

def download_data(
    source: str,
    destination: str,
    remove_source: bool = True
) -> Path:
  """
  Downloads a zipped dataset from source and unzips to destination.

  Args:
      source (str): A link to a zipped file containing data.
      destination (str): A target directory to unzip data to.
      remove_source (bool): Whether to remove the source after downloading and extracting.

  Returns:
      pathlib.Path to downloaded data.

  Example usage:
      download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                    destination="pizza_steak_sushi")
  """

  # Setup path to data folder
  data_path = Path("data/")
  image_path = data_path / destination

  # If the image folder already exists, skip the download; otherwise create it and fetch the data
  if image_path.is_dir():
    print(f"[INFO] {destination} folder already exists, skipping download.")
  else:
    print(f"[INFO] Creating {destination} folder.")
    image_path.mkdir(parents=True, exist_ok=True)

    # Download target data
    target_file = Path(source).name
    print(f"[INFO] Downloading {target_file}...")
    response = requests.get(source)
    response.raise_for_status()  # Stop early if the download failed

    with open(data_path / target_file, "wb") as f:
      f.write(response.content)

    # Unzip target file
    with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
      print(f"[INFO] Unzipping {target_file} data...")
      zip_ref.extractall(image_path)

    # Remove source file
    if remove_source:
      print(f"[INFO] Removing {target_file}...")
      os.remove(data_path / target_file)

  return image_path


In [7]:
image_path = download_data(
              source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
              destination="pizza_steak_sushi"
)
image_path

[INFO] Creating pizza_steak_sushi folder.
[INFO] Downloading pizza_steak_sushi.zip...
[INFO] Unzipping pizza_steak_sushi.zip data...
[INFO] Removing pizza_steak_sushi.zip...


PosixPath('data/pizza_steak_sushi')

## 2. Creating Datasets and DataLoaders

### 2.1 Create DataLoaders with manual transforms

The goal of transforms is to ensure our custom data is formatted in a reproducible way that matches the requirements of the pretrained model.



In [8]:
# Setup the directories

train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

In [9]:
from torchvision import transforms

# Set up ImageNet normalization (per-channel mean and standard deviation)
normalize = transforms.Normalize(
  mean=[0.485, 0.456, 0.406],
  std=[0.229, 0.224, 0.225]
)

# Create transform pipeline manually
manual_transform = transforms.Compose([
  transforms.Resize((224, 224)),
  transforms.ToTensor(),
  normalize
])
print(f"Manually created transform: {manual_transform}")


# Create DataLoader
from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
  train_dir=train_dir,
  test_dir=test_dir,
  transform=manual_transform,
  batch_size=32,
)

train_dataloader, test_dataloader, class_names

Manually created transform: Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


(<torch.utils.data.dataloader.DataLoader at 0x78e9ce444850>,
 <torch.utils.data.dataloader.DataLoader at 0x78e9ce444760>,
 ['pizza', 'steak', 'sushi'])
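
To make the `Normalize` step less abstract, here is a minimal sketch of the arithmetic it applies to each channel, `(value - mean) / std`, written in plain Python rather than with torchvision. The helper name `normalize_pixel` is hypothetical, and the sketch works on a single RGB pixel whose values are assumed to be already scaled to `[0, 1]` (which is what `ToTensor()` does).

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1].

    Hypothetical helper for illustration only; torchvision applies the
    same formula to every pixel of a whole image tensor.
    """
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# A pixel sitting exactly at the channel means maps to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))

# A pure white pixel ends up a little above 2 in every channel
print([round(v, 3) for v in normalize_pixel([1.0, 1.0, 1.0])])
```

After normalization the inputs are centered around zero with a spread close to one, which is the value range the pretrained weights were trained on.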


### 2.2 Create DataLoaders using automatically created transforms

The same principle applies to automatically created transforms: we want our data to go through the transformations that our model expects.





In [10]:
# Setup pretrained weights (plenty of these weights available in torchvision 0.13+)

import torchvision

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

automatic_transforms = weights.transforms()
print(f"Automatically created transforms: {automatic_transforms}")

# Create DataLoaders
from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
  train_dir=train_dir,
  test_dir=test_dir,
  transform=automatic_transforms,
  batch_size=32,
)

train_dataloader, test_dataloader, class_names

Automatically created transforms: ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)


(<torch.utils.data.dataloader.DataLoader at 0x78e9ce445e40>,
 <torch.utils.data.dataloader.DataLoader at 0x78e9ce444f70>,
 ['pizza', 'steak', 'sushi'])

## 3. Getting a pretrained model, freezing the base layers and changing the classifier head

In [11]:
# Note: This is how a pretrained model would be created prior to torchvision 0.13
# model = torchvision.models.efficientnet_b0(pretrained=True).to(device)

# Download the pretrained weights for EfficientNet_B0
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # DEFAULT = best available weights

# Create the model
model = torchvision.models.efficientnet_b0(weights=weights).to(device)


# Freeze all base layers by setting their `requires_grad` to False
for param in model.parameters():
  param.requires_grad = False

Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-3dd342df.pth
100%|██████████| 20.5M/20.5M [00:01<00:00, 21.2MB/s]


In [12]:
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [13]:
set_seeds()
# Change the classifier head
model.classifier = torch.nn.Sequential(
  torch.nn.Dropout(p=0.2, inplace=True),
  torch.nn.Linear(in_features=1280, out_features=len(class_names))
).to(device)
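
As a quick sanity check on what swapping the head changes, the sketch below works out the parameter count of a fully connected layer by hand: one weight per input/output pair plus one bias per output. The helper `linear_param_count` is hypothetical (it is not part of PyTorch), and the new head is assumed to have three outputs, matching the three entries in `class_names`.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable values in a fully connected layer (hypothetical helper)."""
    return in_features * out_features + (out_features if bias else 0)

# Original ImageNet head: Linear(in_features=1280, out_features=1000)
original_head = linear_param_count(1280, 1000)

# Replacement head: Linear(in_features=1280, out_features=3) for pizza, steak, sushi
new_head = linear_param_count(1280, 3)

print(f"Original head parameters: {original_head:,}")
print(f"New head parameters:      {new_head:,}")
```

Because every other parameter was frozen before the head was replaced, these few thousand values in the new `Linear` layer are the only ones the optimizer will update.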

In [14]:
from torchinfo import summary

summary(
    model,
    input_size=(32, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"],
    col_width=20,
    row_settings=["var_names"]
)



Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

## 4. Train a single model and track results

In [15]:
# Define loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

To track experiments, we are going to use TensorBoard: https://www.tensorflow.org/tensorboard

To interact with TensorBoard, we can use PyTorch's SummaryWriter: https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter
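
By default, `SummaryWriter()` writes to a timestamped folder under `runs/`. When several experiments are compared, it can help to pick a descriptive folder per run instead and pass it through the `log_dir` argument. The sketch below only builds such a path with the standard library; the helper name `build_log_dir`, the folder layout and the example experiment names are all assumptions made for illustration, not something the notebook defines.

```python
import os
from datetime import datetime
from typing import Optional

def build_log_dir(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None,
                  root: str = "runs",
                  date: Optional[str] = None) -> str:
    """Build a path like runs/<date>/<experiment_name>/<model_name>/<extra>.

    Hypothetical helper: the layout is one possible convention, not a
    TensorBoard requirement.
    """
    date = date or datetime.now().strftime("%Y-%m-%d")
    parts = [root, date, experiment_name, model_name]
    if extra:
        parts.append(extra)
    return os.path.join(*parts)

# Example names are made up for illustration
log_dir = build_log_dir("pizza_steak_sushi", "efficientnet_b0", extra="5_epochs")
print(log_dir)

# With PyTorch and TensorBoard installed, the path can be handed to SummaryWriter:
# from torch.utils.tensorboard import SummaryWriter
# writer = SummaryWriter(log_dir=log_dir)
```

Keeping every run under the same root folder means a single `--logdir=runs` still picks them all up, while the subfolder names make the runs easy to tell apart in the TensorBoard UI.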

In [16]:
# Setup a SummaryWriter
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

In [17]:
from tqdm.auto import tqdm
from typing import Dict, List, Tuple
from torch.utils.tensorboard import SummaryWriter

from going_modular.going_modular.engine import train_step, test_step

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: SummaryWriter) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
    model: A PyTorch model to be trained and tested.
    train_dataloader: A DataLoader instance for the model to be trained on.
    test_dataloader: A DataLoader instance for the model to be tested on.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    loss_fn: A PyTorch loss function to calculate loss on both datasets.
    epochs: An integer indicating how many epochs to train for.
    device: A target device to compute on (e.g. "cuda" or "cpu").
    writer: A SummaryWriter instance to log metrics and the model graph to.

    Returns:
    A dictionary of training and testing loss as well as training and
    testing accuracy metrics. Each metric has a value in a list for
    each epoch.
    In the form: {train_loss: [...],
              train_acc: [...],
              test_loss: [...],
              test_acc: [...]}
    For example if training for epochs=2:
             {train_loss: [2.0616, 1.0537],
              train_acc: [0.3945, 0.3945],
              test_loss: [1.2641, 1.5706],
              test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Make sure model on target device
    model.to(device)

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        # Experiment tracking: log this epoch's metrics if a writer was provided
        if writer:
            writer.add_scalars(
                main_tag="Loss",
                tag_scalar_dict={"train_loss": train_loss,
                                 "test_loss": test_loss},
                global_step=epoch
            )
            writer.add_scalars(
                main_tag="Accuracy",
                tag_scalar_dict={"train_acc": train_acc,
                                 "test_acc": test_acc},
                global_step=epoch
            )

    if writer:
        # Log the model architecture once, then close the writer after the last epoch
        writer.add_graph(model, input_to_model=torch.randn(32, 3, 224, 224).to(device))
        writer.close()

    # Return the filled results at the end of the epochs
    return results




In [18]:
# Train model
set_seeds()

results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=optimizer,
                loss_fn=loss_fn,
                epochs=5,
                device=device,
                writer=writer)

  0%|          | 0/5 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 1.0924 | train_acc: 0.3984 | test_loss: 0.9133 | test_acc: 0.5398
Epoch: 2 | train_loss: 0.8975 | train_acc: 0.6562 | test_loss: 0.7838 | test_acc: 0.8561
Epoch: 3 | train_loss: 0.8038 | train_acc: 0.7461 | test_loss: 0.6723 | test_acc: 0.8864
Epoch: 4 | train_loss: 0.6770 | train_acc: 0.8516 | test_loss: 0.6699 | test_acc: 0.8049
Epoch: 5 | train_loss: 0.7066 | train_acc: 0.7188 | test_loss: 0.6747 | test_acc: 0.7737
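
The dictionary returned by `train()` can also be inspected directly, without TensorBoard. As a small sketch, the code below copies the values printed above into a dictionary of the same shape and looks up the best epoch for a given metric. The helper `best_epoch` is hypothetical and only meant to show one way of reading the results.

```python
# Values copied from the training output above (one entry per epoch)
results = {
    "train_loss": [1.0924, 0.8975, 0.8038, 0.6770, 0.7066],
    "train_acc": [0.3984, 0.6562, 0.7461, 0.8516, 0.7188],
    "test_loss": [0.9133, 0.7838, 0.6723, 0.6699, 0.6747],
    "test_acc": [0.5398, 0.8561, 0.8864, 0.8049, 0.7737],
}

def best_epoch(results, metric="test_acc", higher_is_better=True):
    """Return (epoch number starting at 1, value) for the best entry of a metric."""
    values = results[metric]
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

print(best_epoch(results))                                              # (3, 0.8864)
print(best_epoch(results, metric="test_loss", higher_is_better=False))  # (4, 0.6699)
```

In this run the test accuracy peaks at epoch 3 and drops afterwards, which is the kind of pattern that is easier to spot once the curves are plotted side by side.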


## 5. View our model results with TensorBoard

In [19]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [24]:
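# Kill the process with PID 1888 (presumably an earlier TensorBoard instance; the PID is specific to this session)
!kill 1888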
%tensorboard --logdir=runs

<IPython.core.display.Javascript object>