# Lab 2: Training Engine

In this lab, we'll create `engine.py` - the heart of our training pipeline. This script contains the functions that actually train and evaluate our model.

We'll build:
- **`train_step()`**: Train the model for one epoch
- **`test_step()`**: Evaluate the model for one epoch
- **`train()`**: Combine both into a full training loop

By the end of this lab, you'll have a reusable training engine that can train any PyTorch model!

## Install Dependencies

In [1]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install matplotlib requests tqdm

## Import Libraries

In [2]:
import os
import requests
import zipfile
from pathlib import Path
from typing import Dict, List, Tuple

import torch
from torch import nn
from torchvision import transforms
from tqdm.auto import tqdm

print(f"PyTorch version: {torch.__version__}")

## 1. Setup: Get Data and Use Previous Scripts

Let's use the scripts we created in Lab 1. First, we need to set up our data and model.

In [3]:
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)
    
    # Download the archive; raise on an HTTP error instead of writing an error page to disk
    print("Downloading pizza, steak, sushi data...")
    response = requests.get("https://raw.githubusercontent.com/poridhioss/Introduction-to-Deep-Learning-with-Pytorch-Resources/main/Going-module/pizza_steak_sushi.zip", timeout=60)
    response.raise_for_status()
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        f.write(response.content)

    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...")
        zip_ref.extractall(image_path)

    os.remove(data_path / "pizza_steak_sushi.zip")
    print("Download complete!")

train_dir = image_path / "train"
test_dir = image_path / "test"

In [4]:
going_modular_path = Path("going_modular")
going_modular_path.mkdir(parents=True, exist_ok=True)

### Create data_setup.py (from Lab 1)

In [5]:
%%writefile going_modular/data_setup.py
import os

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

NUM_WORKERS = os.cpu_count() or 0  # os.cpu_count() can return None; fall back to 0 workers

def create_dataloaders(
    train_dir: str, 
    test_dir: str, 
    transform: transforms.Compose, 
    batch_size: int, 
    num_workers: int = NUM_WORKERS
):
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)
    class_names = train_data.classes

    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names

### Create model_builder.py (from Lab 1)

In [6]:
%%writefile going_modular/model_builder.py
import torch
from torch import nn

class TinyVGG(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, 
                      out_channels=hidden_units, 
                      kernel_size=3, 
                      stride=1, 
                      padding=0),  
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, 
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*13*13, out_features=output_shape)
        )
    
    def forward(self, x: torch.Tensor):
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))

### Create DataLoaders and Model

In [7]:
from going_modular import data_setup, model_builder

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

data_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=data_transform,
    batch_size=32,
    num_workers=0
)

print(f"Class names: {class_names}")
print(f"Train batches: {len(train_dataloader)}")
print(f"Test batches: {len(test_dataloader)}")

In [8]:
torch.manual_seed(42)
model = model_builder.TinyVGG(
    input_shape=3,
    hidden_units=10,
    output_shape=len(class_names)
).to(device)

print(model)
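
The classifier's `in_features=hidden_units*13*13` depends on the 64x64 input size: each unpadded 3x3 convolution trims 2 pixels per side dimension and each max-pool halves it. One way to check this is a dummy forward pass. The sketch below rebuilds the two convolutional blocks inline instead of importing `model_builder`, so that it stands alone; `conv_stack` and `dummy_batch` are illustrative names, not part of the lab's scripts.

```python
import torch
from torch import nn

hidden_units = 10  # same value passed to TinyVGG above

# Stand-in for conv_block_1 followed by conv_block_2 (illustrative copy)
conv_stack = nn.Sequential(
    nn.Conv2d(3, hidden_units, kernel_size=3, padding=0),             # 64 -> 62
    nn.ReLU(),
    nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),  # 62 -> 60
    nn.ReLU(),
    nn.MaxPool2d(2),                                                  # 60 -> 30
    nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),  # 30 -> 28
    nn.ReLU(),
    nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=0),  # 28 -> 26
    nn.ReLU(),
    nn.MaxPool2d(2),                                                  # 26 -> 13
)

dummy_batch = torch.rand(1, 3, 64, 64)  # one fake 64x64 RGB image
with torch.inference_mode():
    feature_map = conv_stack(dummy_batch)

print(feature_map.shape)  # torch.Size([1, 10, 13, 13])

# Number of features the nn.Linear layer receives after nn.Flatten()
flattened_features = feature_map.flatten(start_dim=1).shape[1]
print(flattened_features)  # 1690
```

If you change the `Resize` dimensions in the transform, this number changes too, and the `nn.Linear` layer would need to be updated to match.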

## 2. Understanding the Training Loop

Before we create `engine.py`, let's understand what happens in a typical training loop.

### Training Step (for one epoch):
1. Set model to training mode (`model.train()`)
2. Loop through batches:
   - Forward pass: Get predictions
   - Calculate loss
   - Zero gradients
   - Backward pass: Calculate gradients
   - Optimizer step: Update weights
   - Track metrics (loss, accuracy)

### Testing Step (for one epoch):
1. Set model to evaluation mode (`model.eval()`)
2. Disable gradient tracking with the `torch.inference_mode()` context manager
3. Loop through batches:
   - Forward pass: Get predictions
   - Calculate loss
   - Track metrics
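
To see what the mode switches in the steps above actually change, here is a small sketch using a standalone `nn.Dropout` layer and an `nn.Linear` layer (both chosen only for illustration). The TinyVGG defined in this lab contains no dropout or batch-norm layers, so `model.train()` and `model.eval()` do not alter its outputs; the calls matter once the engine is reused with models that do contain such layers.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: each element is either zeroed or scaled by 1 / (1 - p) = 2
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))  # True

# inference_mode: outputs are not tracked by autograd
layer = nn.Linear(8, 2)
with torch.inference_mode():
    untracked_out = layer(x)
tracked_out = layer(x)

print(untracked_out.requires_grad)  # False
print(tracked_out.requires_grad)    # True
```

Note that the two switches are independent: `model.eval()` changes layer behavior, while `torch.inference_mode()` only stops autograd from recording operations.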

## 3. Create `train_step()` Function

Let's first create the training step function in the notebook, then we'll add it to our script.

In [9]:
def train_step(
    model: torch.nn.Module, 
    dataloader: torch.utils.data.DataLoader, 
    loss_fn: torch.nn.Module, 
    optimizer: torch.optim.Optimizer,
    device: torch.device
) -> Tuple[float, float]:

    model.train()
    train_loss, train_acc = 0, 0
    
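    # Loop through the batches in the DataLoader
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the target device
        X, y = X.to(device), y.to(device)

        # Forward pass and loss calculation
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # Zero old gradients, backpropagate, then update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()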

        # Calculate and accumulate accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch 
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc

### Test `train_step()`

In [10]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Run one training step
train_loss, train_acc = train_step(
    model=model,
    dataloader=train_dataloader,
    loss_fn=loss_fn,
    optimizer=optimizer,
    device=device
)

print(f"Train loss: {train_loss:.4f}")
print(f"Train accuracy: {train_acc:.4f}")
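
The accuracy line in `train_step()` applies `softmax` before `argmax`. Softmax preserves the ordering of the logits, so taking `argmax` directly on the logits gives the same predicted classes. A minimal sketch with made-up logits and labels (the values are illustrative only):

```python
import torch

# Hypothetical logits for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1])

via_softmax = torch.argmax(torch.softmax(logits, dim=1), dim=1)
via_logits = logits.argmax(dim=1)

print(via_softmax)                           # tensor([0, 2])
print(torch.equal(via_softmax, via_logits))  # True

# Same per-batch accuracy formula as in train_step()
batch_acc = (via_logits == labels).sum().item() / len(labels)
print(batch_acc)  # 0.5
```

Either form works; the softmax version just makes the "logits to probabilities to labels" path explicit.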

## 4. Create `test_step()` Function

Now let's create the testing step function.

In [11]:
def test_step(
    model: torch.nn.Module, 
    dataloader: torch.utils.data.DataLoader, 
    loss_fn: torch.nn.Module,
    device: torch.device
) -> Tuple[float, float]:

    model.eval() 
    test_loss, test_acc = 0, 0
    
    # Disable gradient tracking during evaluation
    with torch.inference_mode():
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)

            # Forward pass and loss calculation
            test_pred_logits = model(X)
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()

            # The predicted class is the index of the largest logit
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += (test_pred_labels == y).sum().item() / len(test_pred_labels)
            
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc

### Test `test_step()`

In [12]:
# Run one test step
test_loss, test_acc = test_step(
    model=model,
    dataloader=test_dataloader,
    loss_fn=loss_fn,
    device=device
)

print(f"Test loss: {test_loss:.4f}")
print(f"Test accuracy: {test_acc:.4f}")
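
Both step functions report the mean of the per-batch accuracies. When the last batch is smaller than the others, this can differ slightly from the fraction of correctly classified samples. The arithmetic below uses hypothetical batch sizes and correct counts to show the gap; none of these numbers come from the lab's dataset.

```python
# Hypothetical evaluation run: two full batches and one smaller final batch
batch_sizes = [32, 32, 11]
correct_counts = [16, 16, 11]

# What test_step() computes: average of per-batch accuracies
per_batch_mean = sum(c / n for c, n in zip(correct_counts, batch_sizes)) / len(batch_sizes)

# Alternative: total correct predictions divided by total samples
per_sample = sum(correct_counts) / sum(batch_sizes)

print(round(per_batch_mean, 4))  # 0.6667
print(round(per_sample, 4))      # 0.5733
```

For comparing epochs of the same run the per-batch average is usually adequate, since the batch layout is the same every epoch; keep the difference in mind when comparing against numbers computed another way.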

## 5. Create `train()` Function

Now let's combine both steps into a full training function that runs for multiple epochs.

The `train()` function trains and tests a PyTorch model. Its arguments and return value are:
```
    Args:
        model: A PyTorch model to be trained and tested.
        train_dataloader: A DataLoader instance for the model to be trained on.
        test_dataloader: A DataLoader instance for the model to be tested on.
        optimizer: A PyTorch optimizer to help minimize the loss function.
        loss_fn: A PyTorch loss function to calculate loss on both datasets.
        epochs: An integer indicating how many epochs to train for.
        device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
        A dictionary of training and testing loss as well as training and
        testing accuracy metrics. In the form:
        {train_loss: [...], train_acc: [...], test_loss: [...], test_acc: [...]}
```

In [13]:
def train(
    model: torch.nn.Module, 
    train_dataloader: torch.utils.data.DataLoader, 
    test_dataloader: torch.utils.data.DataLoader, 
    optimizer: torch.optim.Optimizer,
    loss_fn: torch.nn.Module,
    epochs: int,
    device: torch.device
) -> Dict[str, List]:

    results = {
        "train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }
    
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(
            model=model,
            dataloader=train_dataloader,
            loss_fn=loss_fn,
            optimizer=optimizer,
            device=device
        )
        test_loss, test_acc = test_step(
            model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn,
            device=device
        )
        
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    return results

### Test the Full Training Loop

In [14]:
torch.manual_seed(42)
model = model_builder.TinyVGG(
    input_shape=3,
    hidden_units=10,
    output_shape=len(class_names)
).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train for 5 epochs
results = train(
    model=model,
    train_dataloader=train_dataloader,
    test_dataloader=test_dataloader,
    optimizer=optimizer,
    loss_fn=loss_fn,
    epochs=5,
    device=device
)

In [15]:
print("\nTraining Results:")
print(f"Final train loss: {results['train_loss'][-1]:.4f}")
print(f"Final train accuracy: {results['train_acc'][-1]:.4f}")
print(f"Final test loss: {results['test_loss'][-1]:.4f}")
print(f"Final test accuracy: {results['test_acc'][-1]:.4f}")
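
Because `train()` returns one list per metric with one entry per epoch, the history is easy to inspect after the run. As a sketch, the snippet below finds the epoch with the lowest test loss; the `results` values here are invented for illustration and are not output from this lab.

```python
# Hypothetical history in the same format that train() returns
results = {
    "train_loss": [1.10, 1.05, 0.98],
    "train_acc": [0.30, 0.41, 0.52],
    "test_loss": [1.09, 1.02, 1.04],
    "test_acc": [0.33, 0.45, 0.43],
}

test_losses = results["test_loss"]
best_index = test_losses.index(min(test_losses))  # 0-based list position
best_epoch = best_index + 1                       # epochs are printed 1-based

print(f"Lowest test loss {test_losses[best_index]:.4f} at epoch {best_epoch}")
# Lowest test loss 1.0200 at epoch 2
print(f"Test accuracy at that epoch: {results['test_acc'][best_index]:.4f}")
# Test accuracy at that epoch: 0.4500
```

The same lists can be passed directly to a plotting call to draw loss and accuracy curves.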

## 6. Create `engine.py`

Now let's save all our training functions to `engine.py`!

In [16]:
%%writefile going_modular/engine.py
import torch

from tqdm.auto import tqdm
from typing import Dict, List, Tuple


def train_step(
    model: torch.nn.Module, 
    dataloader: torch.utils.data.DataLoader, 
    loss_fn: torch.nn.Module, 
    optimizer: torch.optim.Optimizer,
    device: torch.device
) -> Tuple[float, float]:
    """Train the model for one epoch; return the per-batch average loss and accuracy."""

    model.train()
    train_loss, train_acc = 0, 0
    
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() 
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)

    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc


def test_step(
    model: torch.nn.Module, 
    dataloader: torch.utils.data.DataLoader, 
    loss_fn: torch.nn.Module,
    device: torch.device
) -> Tuple[float, float]:
    """Evaluate the model for one epoch; return the per-batch average loss and accuracy."""

    model.eval()
    test_loss, test_acc = 0, 0
    
    with torch.inference_mode():
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)
            test_pred_logits = model(X)
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()
            
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += (test_pred_labels == y).sum().item() / len(test_pred_labels)
            
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc


def train(
    model: torch.nn.Module, 
    train_dataloader: torch.utils.data.DataLoader, 
    test_dataloader: torch.utils.data.DataLoader, 
    optimizer: torch.optim.Optimizer,
    loss_fn: torch.nn.Module,
    epochs: int,
    device: torch.device
) -> Dict[str, List]:
    """Run train_step() and test_step() for `epochs` epochs; return the metric history."""

    results = {
        "train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }
    
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(
            model=model,
            dataloader=train_dataloader,
            loss_fn=loss_fn,
            optimizer=optimizer,
            device=device
        )
        test_loss, test_acc = test_step(
            model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn,
            device=device
        )
        
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    return results

### Testing `engine.py`

In [17]:
from going_modular import engine

torch.manual_seed(42)
model = model_builder.TinyVGG(
    input_shape=3,
    hidden_units=10,
    output_shape=len(class_names)
).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

results = engine.train(
    model=model,
    train_dataloader=train_dataloader,
    test_dataloader=test_dataloader,
    optimizer=optimizer,
    loss_fn=loss_fn,
    epochs=5,
    device=device
)

## 7. Check Our Directory Structure

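Let's verify that our `going_modular` directory now contains all three scripts. A `__pycache__` folder may also appear in the listing, since we have imported the modules.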

In [18]:
print("Files in going_modular/:")
for file in os.listdir("going_modular"):
    print(f"  - {file}")

## Summary

In this lab, we created `engine.py` - the training engine for our modular PyTorch project:

1. **`train_step()`** - Trains the model for one epoch
   - Sets model to train mode
   - Performs forward pass, loss calculation, backward pass, optimizer step
   - Returns average loss and accuracy

2. **`test_step()`** - Evaluates the model for one epoch
   - Sets model to eval mode
   - Uses `torch.inference_mode()` for efficiency
   - Returns average loss and accuracy

3. **`train()`** - Full training loop
   - Combines `train_step()` and `test_step()`
   - Runs for specified number of epochs
   - Returns results dictionary with metrics history

Our directory structure now looks like:
```
going_modular/
    ├── data_setup.py      # DataLoader creation
    ├── model_builder.py   # TinyVGG model
    └── engine.py          # Training functions
```

In the next lab, we'll create `utils.py` and `train.py` to complete our modular pipeline!

## Exercises

1. **Add learning rate scheduling** - Modify the `train()` function to accept an optional learning rate scheduler and step it after each epoch.

2. **Add early stopping** - Implement early stopping in the `train()` function that stops training if test loss doesn't improve for N epochs.

3. **Add gradient clipping** - Modify `train_step()` to include gradient clipping to prevent exploding gradients.
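
As a starting point for these exercises, the sketch below shows where the relevant PyTorch calls typically sit in a loop. It uses a toy `nn.Linear` model and random data rather than the lab's engine, and the choices of `StepLR`, `max_norm=1.0`, `patience = 2`, and the `hypothetical_test_losses` list are assumptions made for illustration. Wiring these into `train()` and `train_step()` is left to you.

```python
import torch
from torch import nn

torch.manual_seed(42)
model = nn.Linear(4, 2)  # toy model, stands in for TinyVGG
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Exercise 1: a scheduler wraps the optimizer; StepLR here halves the LR every epoch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

X = torch.rand(8, 4)
y = torch.randint(0, 2, (8,))

for epoch in range(3):
    loss = loss_fn(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    # Exercise 3: clip gradients after backward() and before optimizer.step()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # once per epoch, after the optimizer has stepped
    print(epoch + 1, scheduler.get_last_lr())

# Exercise 2: early-stopping bookkeeping on made-up test losses
hypothetical_test_losses = [1.00, 0.90, 0.95, 0.93, 0.94, 0.80]
patience = 2
best_loss = float("inf")
epochs_without_improvement = 0
stopped_epoch = None

for epoch, test_loss in enumerate(hypothetical_test_losses, start=1):
    if test_loss < best_loss:
        best_loss = test_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= patience:
        stopped_epoch = epoch
        break

print(stopped_epoch, best_loss)  # 4 0.9
```

In the early-stopping part, training stops at epoch 4 even though a later epoch would have improved; that trade-off is controlled by `patience`.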