<a href="https://colab.research.google.com/github/ioannis-toumpoglou/pytorch-repo/blob/main/07_pytorch_experiment_tracking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 07. PyTorch Experiment Tracking

Machine Learning is very experimental.

In order to figure out which experiments are worth pursuing, the use of **experiment tracking** helps to figure out what doesn't work, leading to figuring out what **does** work.

In [3]:
import torch
import torchvision

print(torch.__version__)
print(torchvision.__version__)

2.0.1+cu118
0.15.2+cu118


In [4]:
try:
    import torch
    import torchvision
    assert int(torch.__version__.split(".")[1]) >= 12, "torch version should be 1.12+"
    assert int(torchvision.__version__.split(".")[1]) >= 13, "torchvision version should be 0.13+"
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")
except:
    print(f"[INFO] torch/torchvision versions not as required, installing nightly versions.")
    !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    import torch
    import torchvision
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")

[INFO] torch/torchvision versions not as required, installing nightly versions.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cu113
torch version: 2.0.1+cu118
torchvision version: 0.15.2+cu118


In [5]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

[INFO] Couldn't find torchinfo... installing it.
[INFO] Couldn't find going_modular scripts... downloading them from GitHub.
Cloning into 'pytorch-deep-learning'...
remote: Enumerating objects: 3830, done.[K
remote: Counting objects: 100% (473/473), done.[K
remote: Compressing objects: 100% (263/263), done.[K
remote: Total 3830 (delta 248), reused 401 (delta 203), pack-reused 3357[K
Receiving objects: 100% (3830/3830), 649.88 MiB | 39.00 MiB/s, done.
Resolving deltas: 100% (2204/2204), done.
Updating files: 100% (248/248), done.


In [6]:
# Setup device-agnostic code
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [7]:
# Set seeds
def set_seeds(seed: int=42):
    """Sets random sets for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)

In [8]:
set_seeds()

## 1. Get data

In [9]:
import os
import zipfile
from pathlib import Path
import requests

def download_data(source: str,
                  destination: str,
                  remove_source: bool=True) -> Path:
  """
    Downloads a zipped dataset from a source and unzips to destination.
  """
  # Setup a path to the data folder
  data_path = Path('data/')
  image_path = data_path / destination

  if image_path.is_dir():
    print(f'[INFO] {image_path} already exists, skipping download...')
  else:
    print(f'[INFO] Unable to find {image_path}, creating one...')
    image_path.mkdir(parents=True, exist_ok=True)
    # Download the target data
    target_file = Path(source).name

    with open(data_path / target_file, 'wb') as f:
      request = requests.get(source)
      print(f'[INFO] Downloading target file from source...')
      f.write(request.content)

    # Unzip target file
    with zipfile.ZipFile(data_path / target_file, 'r') as zip_ref:
      print(f'[INFO] Unzipping {target_file} data...')
      zip_ref.extractall(image_path)

    # Remove zip file
    if remove_source:
      os.remove(data_path / target_file)

  return image_path

In [12]:
image_path = download_data(source='https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip',
                           destination='pizza_steak_sushi')
image_path

[INFO] Unable to fine data/pizza_steak_sushi, creating one...
[INFO] Downloading target file from source...
[INFO] Unzipping pizza_steak_sushi.zip data...


PosixPath('data/pizza_steak_sushi')

## 2. Creating Datasets and DataLoaders

### 2.1 Create DataLoaders with manual transforms

The goal with tranforms is to ensure that the custom data is formatted in a reproducible way, as well as a way that will suit the pretrained models.

In [13]:
# Setup the directories
train_dir = image_path / 'train'
test_dir = image_path / 'test'

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

In [14]:
# Setup ImageNet normalization levels
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
# Create transform pipeline manually
from torchvision import transforms

manual_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])
print(f'Manually created transforms: {manual_transforms}')

# Create DataLoaders
from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               transform=manual_transforms,
                                                                               batch_size=32)
train_dataloader, test_dataloader, class_names

Manually created transforms: Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


(<torch.utils.data.dataloader.DataLoader at 0x7f17a9830550>,
 <torch.utils.data.dataloader.DataLoader at 0x7f17a9832d40>,
 ['pizza', 'steak', 'sushi'])