# Lab 02: Training & Evaluating a Transfer Learning Model

In Lab 01, we set up our transfer learning pipeline: downloaded a pretrained EfficientNet_B0 model, froze the base layers, and modified the classifier. Now it's time to train and see the power of transfer learning!

**Our goal**: Train the model on pizza/steak/sushi images and achieve high accuracy with minimal training.

![Our Problem](https://raw.githubusercontent.com/poridhiEng/lab-asset/3cf35c4bc9e49c2beebb77f8f30429b9aecfb753/tensorcode/Deep-learning-with-pytorch/Transfer-learning-with-pytorch/Lab_02/images/infra-3.svg)

## Install Dependencies

First, let's install the required libraries.

In [None]:
!pip install requests torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install torchinfo matplotlib tqdm

## Setup from Lab 01

Before we start training, we need to recreate everything from Lab 01:
- Download the dataset
- Create DataLoaders with proper transforms
- Load pretrained EfficientNet_B0
- Freeze base layers and modify classifier

Run this cell to set everything up:

In [None]:
import torch
import torchvision
from torch import nn
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
from torchinfo import summary
from tqdm.auto import tqdm

import matplotlib.pyplot as plt
import os
import zipfile
import requests
from pathlib import Path
from typing import List, Tuple
from PIL import Image
import random

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")

# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

In [None]:
# Download and setup data
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print("Downloading data...")
    # Use raw.githubusercontent.com to get the actual file
    request = requests.get("https://raw.githubusercontent.com/poridhioss/Introduction-to-Deep-Learning-with-Pytorch-Resources/main/Transfer-learning/pizza_steak_sushi.zip")
    request.raise_for_status()  # Fail early so a bad download never leaves an empty folder behind
    image_path.mkdir(parents=True, exist_ok=True)
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        f.write(request.content)
    # The zip contains train/ and test/ folders directly, so extract to image_path
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        zip_ref.extractall(image_path)  # Extract to data/pizza_steak_sushi/
    os.remove(data_path / "pizza_steak_sushi.zip")
    print("Done!")

train_dir = image_path / "train"
test_dir = image_path / "test"

In [None]:
# Get pretrained weights and transforms
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
auto_transforms = weights.transforms()

# Create datasets
train_dataset = datasets.ImageFolder(root=train_dir, transform=auto_transforms)
test_dataset = datasets.ImageFolder(root=test_dir, transform=auto_transforms)

class_names = train_dataset.classes
print(f"Class names: {class_names}")

# Create DataLoaders
BATCH_SIZE = 32
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")

In [None]:
# Load pretrained model
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

# Freeze base layers
for param in model.features.parameters():
    param.requires_grad = False

# Modify classifier
torch.manual_seed(42)
torch.cuda.manual_seed(42)

model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=len(class_names), bias=True)
).to(device)

print("Model setup complete!")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
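
As a sanity check on the count printed above, the size of the new head can be worked out by hand: a linear layer has one weight per input-output pair plus one bias per output. The sketch below is plain Python. `linear_param_count` is a hypothetical helper, and it assumes three classes (pizza, steak, sushi) and the 1280 input features used in the classifier above.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable values in a fully connected layer."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases

# Dropout has no learnable parameters, so the head is just the Linear layer.
head_params = linear_param_count(in_features=1280, out_features=3)
print(f"Trainable parameters in the new head: {head_params:,}")  # 3,843
```

If the number printed by the setup cell is much larger than this, the base layers were probably not frozen.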

## 1. Create Loss Function and Optimizer

For multiclass classification, we use:
- **CrossEntropyLoss**: Standard loss for classification (it applies log-softmax internally, so the model should output raw logits)
- **Adam optimizer**: Adaptive learning rate optimizer, works well with transfer learning

We use `lr=0.001` as a starting learning rate.
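
To make the point about softmax being applied inside the loss concrete, here is a framework-free sketch of what cross-entropy computes for a single sample: softmax over the raw logits, then the negative log of the probability assigned to the true class. The logits are made-up numbers, and `cross_entropy_from_logits` is an illustrative helper, not a PyTorch function.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Converts raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits: List[float], target: int) -> float:
    """Negative log-probability of the true class for one sample."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 0.5, -1.0]  # hypothetical scores for pizza, steak, sushi
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(f"Loss if the true class is pizza: {cross_entropy_from_logits(logits, 0):.3f}")
print(f"Loss if the true class is sushi: {cross_entropy_from_logits(logits, 2):.3f}")
```

The loss is small when the true class already has the highest score and large when it has the lowest, which is the signal the optimizer uses to adjust the classifier weights.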

In [None]:
# Define loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

print(f"Loss function: {loss_fn}")
print(f"Optimizer: {optimizer}")

## 2. Create Training and Testing Step Functions

Let's create reusable functions for training and testing. The diagram below shows the training loop process:

![Training Loop](https://raw.githubusercontent.com/poridhiEng/lab-asset/3cf35c4bc9e49c2beebb77f8f30429b9aecfb753/tensorcode/Deep-learning-with-pytorch/Transfer-learning-with-pytorch/Lab_02/images/infra-2.svg)

For each epoch, we iterate through batches of data and perform these steps: zero the gradients (`optimizer.zero_grad()`), run the forward pass (`model(features)`), compute the loss, run the backward pass (`loss.backward()`), and update weights with the optimizer (`optimizer.step()`). This cycle repeats for every batch until all epochs are complete.

### Training Step

In [None]:
def train_step(model: nn.Module,
               dataloader: DataLoader,
               loss_fn: nn.Module,
               optimizer: torch.optim.Optimizer,
               device: torch.device) -> Tuple[float, float]:
    """Performs a training step.
    
    Returns:
        Tuple of (train_loss, train_accuracy)
    """
    model.train()
    train_loss, train_acc = 0, 0
    
    for batch, (X, y) in enumerate(dataloader):
        # Send data to device
        X, y = X.to(device), y.to(device)
        
        # 1. Zero gradients
        optimizer.zero_grad()
        
        # 2. Forward pass
        y_pred = model(X)
        
        # 3. Calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()
        
        # 4. Backward pass
        loss.backward()
        
        # 5. Optimizer step
        optimizer.step()
        
        # Calculate accuracy
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y)
    
    # Average loss and accuracy
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    
    return train_loss, train_acc

### Testing Step

Similar to training, but:
- Use `model.eval()` mode
- Use `torch.inference_mode()` context (no gradient tracking)
- No optimizer step (we're just evaluating)

In [None]:
def test_step(model: nn.Module,
              dataloader: DataLoader,
              loss_fn: nn.Module,
              device: torch.device) -> Tuple[float, float]:
    """Performs a testing step.
    
    Returns:
        Tuple of (test_loss, test_accuracy)
    """
    model.eval()
    test_loss, test_acc = 0, 0
    
    with torch.inference_mode():
        for batch, (X, y) in enumerate(dataloader):
            # Send data to device
            X, y = X.to(device), y.to(device)
            
            # Forward pass
            y_pred = model(X)
            
            # Calculate loss
            loss = loss_fn(y_pred, y)
            test_loss += loss.item()
            
            # Calculate accuracy
            y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
            test_acc += (y_pred_class == y).sum().item() / len(y)
    
    # Average loss and accuracy
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    
    return test_loss, test_acc
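
One detail worth knowing about both step functions: they average the per-batch accuracies, which weights every batch equally. When the last batch is smaller than the rest, that average can differ from the accuracy over all samples. The sketch below uses hypothetical batch sizes and correct counts to show the gap; none of these numbers come from the lab's dataset.

```python
# (correct predictions, batch size) per batch; the last batch is smaller.
batches = [(28, 32), (30, 32), (5, 11)]  # hypothetical values

# What train_step/test_step report: the mean of per-batch accuracies.
mean_of_batch_acc = sum(c / n for c, n in batches) / len(batches)

# Accuracy over every sample, regardless of how it was batched.
total_correct = sum(c for c, _ in batches)
total_samples = sum(n for _, n in batches)
sample_acc = total_correct / total_samples

print(f"Mean of per-batch accuracies: {mean_of_batch_acc:.4f}")
print(f"Per-sample accuracy:          {sample_acc:.4f}")
```

For tracking progress across epochs the per-batch average is usually good enough. To report an exact figure, one option is to accumulate the number of correct predictions and divide by `len(dataloader.dataset)`.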

## 3. Train the Model

Now let's train our model! We'll train for **5 epochs** and track:
- Training loss and accuracy
- Test loss and accuracy

We'll store results in a dictionary for easy plotting later.

In [None]:
from timeit import default_timer as timer

# Set random seeds for reproducibility
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Create results dictionary
results = {
    "train_loss": [],
    "train_acc": [],
    "test_loss": [],
    "test_acc": []
}

# Start timer
start_time = timer()

# Training loop
for epoch in tqdm(range(NUM_EPOCHS)):
    # Training step
    train_loss, train_acc = train_step(
        model=model,
        dataloader=train_dataloader,
        loss_fn=loss_fn,
        optimizer=optimizer,
        device=device
    )
    
    # Testing step
    test_loss, test_acc = test_step(
        model=model,
        dataloader=test_dataloader,
        loss_fn=loss_fn,
        device=device
    )
    
    # Store results
    results["train_loss"].append(train_loss)
    results["train_acc"].append(train_acc)
    results["test_loss"].append(test_loss)
    results["test_acc"].append(test_acc)
    
    # Print progress
    print(
        f"Epoch: {epoch+1} | "
        f"Train Loss: {train_loss:.4f} | "
        f"Train Acc: {train_acc:.4f} | "
        f"Test Loss: {test_loss:.4f} | "
        f"Test Acc: {test_acc:.4f}"
    )

# End timer
end_time = timer()
print(f"\n[INFO] Total training time: {end_time - start_time:.3f} seconds")

### Training Results

Look at those results! In just 5 epochs and a short training run, the model should reach **roughly 85% test accuracy or higher**. Exact numbers vary a little between runs and hardware.

That's the power of transfer learning!

## 4. Plot Loss Curves

Let's visualize the training progress with loss curves. This helps us understand:
- Is the model learning? (loss decreasing)
- Is it overfitting? (train loss low, test loss high)
- Is it underfitting? (both losses high)

In [None]:
def plot_loss_curves(results: dict):
    """Plots training and test loss/accuracy curves."""
    epochs = range(1, len(results["train_loss"]) + 1)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot loss
    axes[0].plot(epochs, results["train_loss"], label="Train Loss")
    axes[0].plot(epochs, results["test_loss"], label="Test Loss")
    axes[0].set_xlabel("Epoch")
    axes[0].set_ylabel("Loss")
    axes[0].set_title("Loss Curves")
    axes[0].legend()
    axes[0].grid(True)
    
    # Plot accuracy
    axes[1].plot(epochs, results["train_acc"], label="Train Accuracy")
    axes[1].plot(epochs, results["test_acc"], label="Test Accuracy")
    axes[1].set_xlabel("Epoch")
    axes[1].set_ylabel("Accuracy")
    axes[1].set_title("Accuracy Curves")
    axes[1].legend()
    axes[1].grid(True)
    
    plt.tight_layout()
    plt.show()

# Plot the loss curves
plot_loss_curves(results)

### Analyzing the Loss Curves

In a typical run, the plots show:

1. **Both losses are decreasing**: The model is learning!
2. **The losses stay close**: Train and test loss track each other, so there is no sign of major overfitting
3. **Accuracy is increasing**: Both train and test accuracy trend upward
4. **Fast convergence**: Good results after just 5 epochs

This is the ideal scenario for transfer learning — the pretrained features provide such a good starting point that we converge quickly.
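
The three questions from the start of this section (is the model learning, overfitting, or underfitting?) can also be asked of the `results` dictionary directly. The helper below is a rough sketch, not a standard diagnostic: `diagnose_curves`, its thresholds (`gap_tol`, `high_loss`), and the sample loss histories are all assumptions chosen for illustration.

```python
from typing import Dict, List

def diagnose_curves(results: Dict[str, List[float]],
                    gap_tol: float = 0.3,
                    high_loss: float = 1.0) -> str:
    """Labels a run using the first/last losses (heuristic thresholds)."""
    train_first = results["train_loss"][0]
    train_last = results["train_loss"][-1]
    test_last = results["test_loss"][-1]

    if train_last >= train_first:
        return "not learning"            # training loss never went down
    if train_last > high_loss and test_last > high_loss:
        return "possibly underfitting"   # both losses are still high
    if test_last - train_last > gap_tol:
        return "possibly overfitting"    # test loss well above train loss
    return "looks healthy"

# Hypothetical loss histories, not measured values from this lab
healthy = {"train_loss": [1.05, 0.80, 0.65, 0.55, 0.48],
           "test_loss":  [0.90, 0.72, 0.62, 0.56, 0.52]}
overfit = {"train_loss": [1.00, 0.50, 0.20, 0.08, 0.03],
           "test_loss":  [0.95, 0.80, 0.85, 0.95, 1.10]}

print(diagnose_curves(healthy))
print(diagnose_curves(overfit))
```

A check like this is only a first pass. The plots remain the better tool, because they show the whole trajectory rather than two endpoints.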

## 5. Make Predictions on Test Set

Let's visualize our model's predictions on some test images. This helps us qualitatively assess:
- How confident is the model?
- What kinds of images does it get right/wrong?

### Create Prediction Function

In [None]:
def pred_and_plot_image(model: nn.Module,
                        image_path: str,
                        class_names: List[str],
                        image_size: Tuple[int, int] = (224, 224),
                        transform=None,  # any callable mapping a PIL image to a tensor
                        device: torch.device = device):
    """
    Makes a prediction on an image and plots it with the prediction.
    """
    # Open image
    img = Image.open(image_path).convert("RGB")  # force 3 channels (handles RGBA or grayscale files)
    
    # Create transform if not provided
    if transform is not None:
        image_transform = transform
    else:
        image_transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
    
    # Make prediction
    model.to(device)
    model.eval()
    
    with torch.inference_mode():
        # Transform and add batch dimension
        transformed_image = image_transform(img).unsqueeze(dim=0)
        
        # Make prediction
        target_image_pred = model(transformed_image.to(device))
    
    # Convert logits to probabilities
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)
    
    # Get predicted label
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
    
    # Plot image with prediction
    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.title(f"Pred: {class_names[target_image_pred_label]} | Prob: {target_image_pred_probs.max():.3f}")
    plt.axis(False)
    plt.show()
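
When no `transform` is passed, the function above falls back to resizing plus ImageNet normalization. Per channel, normalization is simply `(value - mean) / std`, applied to pixel values already scaled to [0, 1]. Here is a plain-Python sketch with a made-up pixel; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
from typing import List, Sequence

IMAGENET_MEAN = (0.485, 0.456, 0.406)  # per-channel means (R, G, B)
IMAGENET_STD = (0.229, 0.224, 0.225)   # per-channel standard deviations

def normalize_pixel(rgb: Sequence[float],
                    mean: Sequence[float] = IMAGENET_MEAN,
                    std: Sequence[float] = IMAGENET_STD) -> List[float]:
    """Applies (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

mid_gray = [0.5, 0.5, 0.5]  # hypothetical pixel
print([round(v, 3) for v in normalize_pixel(mid_gray)])

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))
```

Passing `transform=auto_transforms`, as the following cells do, avoids keeping these constants in sync by hand. The preset bundled with the weights may also resize and crop differently from this fallback.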

### Predict on Random Test Images

Let's select some random images from the test set and see how our model performs.

In [None]:
# Get a list of all test image paths
test_image_paths = list(Path(test_dir).glob("*/*.jpg"))
print(f"Total test images: {len(test_image_paths)}")

# Randomly select 3 images
random.seed(42)
sample_image_paths = random.sample(test_image_paths, k=3)

# Make predictions on each
for sample_path in sample_image_paths:  # avoid overwriting the global `image_path` (the dataset folder)
    pred_and_plot_image(
        model=model,
        image_path=sample_path,
        class_names=class_names,
        transform=auto_transforms
    )

### Visualize Multiple Predictions

Let's create a grid of predictions to see more results at once.

In [None]:
def predict_and_plot_grid(model: nn.Module,
                          test_dir: Path,
                          class_names: List[str],
                          transform,
                          n_images: int = 9,
                          device: torch.device = device):
    """Predicts on multiple images and plots them in a grid."""
    
    # Get random image paths
    test_image_paths = list(test_dir.glob("*/*.jpg"))
    sample_paths = random.sample(test_image_paths, k=n_images)
    
    # Setup plot
    n_cols = 3
    n_rows = (n_images + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, n_rows * 4))
    axes = axes.flatten()
    
    model.eval()
    
    for i, image_path in enumerate(sample_paths):
        # Load and transform image
        img = Image.open(image_path).convert("RGB")  # force 3 channels
        img_transformed = transform(img).unsqueeze(0).to(device)
        
        # Predict
        with torch.inference_mode():
            pred_logits = model(img_transformed)
            pred_probs = torch.softmax(pred_logits, dim=1)
            pred_label = torch.argmax(pred_probs, dim=1).item()
            pred_prob = pred_probs.max().item()
        
        # Get true label from path
        true_label = image_path.parent.name
        
        # Plot
        axes[i].imshow(img)
        
        # Color title based on correctness
        pred_class = class_names[pred_label]
        color = "green" if pred_class == true_label else "red"
        axes[i].set_title(f"Pred: {pred_class} ({pred_prob:.2f})\nTrue: {true_label}", color=color)
        axes[i].axis("off")
    
    plt.tight_layout()
    plt.show()

# Plot predictions grid
random.seed(123)  # Different seed for variety
predict_and_plot_grid(
    model=model,
    test_dir=test_dir,
    class_names=class_names,
    transform=auto_transforms,
    n_images=9
)

Green titles indicate correct predictions, red titles indicate incorrect predictions. The model should get most of these right!
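
The correctness check relies on a folder convention: each image's parent directory is its class name, and `ImageFolder` assigns class indices in sorted folder order. The sketch below reproduces that mapping with `pathlib` only. The file paths and the `demo_` names are invented for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical test files laid out as <root>/<class_name>/<image>.jpg
demo_paths = [
    PurePosixPath("data/pizza_steak_sushi/test/sushi/001.jpg"),
    PurePosixPath("data/pizza_steak_sushi/test/pizza/002.jpg"),
    PurePosixPath("data/pizza_steak_sushi/test/steak/003.jpg"),
    PurePosixPath("data/pizza_steak_sushi/test/pizza/004.jpg"),
]

# Class names are the distinct parent folders, sorted alphabetically.
demo_class_names = sorted({p.parent.name for p in demo_paths})
demo_class_to_idx = {name: idx for idx, name in enumerate(demo_class_names)}

for p in demo_paths:
    true_label = p.parent.name
    print(f"{p.name}: {true_label} -> index {demo_class_to_idx[true_label]}")
```

This is why comparing `class_names[pred_label]` with `image_path.parent.name` works: both sides refer to the same folder names.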

## 6. Make Predictions on Custom Images

The real test of a model is predicting on completely new data — images it's never seen before.

Let's download a custom image and test our model on it.

In [None]:
# Download a custom image
custom_image_path = data_path / "pizza-dad.jpeg"

if not custom_image_path.is_file():
    print(f"Downloading {custom_image_path}...")
    # The source file is a PNG; PIL detects the format from the file contents, not the extension
    request = requests.get("https://github.com/poridhiEng/lab-asset/blob/main/tensorcode/Deep-learning-with-pytorch/Transfer-learning-with-pytorch/Lab_02/images/image-1.png?raw=true")
    request.raise_for_status()  # Avoid saving an error page as the image
    with open(custom_image_path, "wb") as f:
        f.write(request.content)
else:
    print(f"{custom_image_path} already exists.")

# Predict on custom image
pred_and_plot_image(
    model=model,
    image_path=custom_image_path,
    class_names=class_names,
    transform=auto_transforms
)

The model should correctly identify the pizza in the image, and with high confidence. That is a clear improvement over our TinyVGG model, which was less confident in its predictions.

## 7. Save the Trained Model

Let's save our trained model so we can use it later without retraining.

In [None]:
# Create models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# Define model save path
MODEL_NAME = "efficientnet_b0_pizza_steak_sushi.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Save the model state dict
torch.save(obj=model.state_dict(), f=MODEL_SAVE_PATH)

print(f"Model saved to: {MODEL_SAVE_PATH}")

### Load the Model (Verification)

Let's verify we can load the model back correctly.

In [None]:
# Create a new model instance
loaded_model = torchvision.models.efficientnet_b0(weights=None)  # No pretrained weights

# Modify the classifier (same as before)
loaded_model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=len(class_names), bias=True)
)

# Load the saved state dict
# map_location lets a checkpoint saved on GPU load on a CPU-only machine
loaded_model.load_state_dict(torch.load(MODEL_SAVE_PATH, map_location=device))
loaded_model = loaded_model.to(device)

print("Model loaded successfully!")

# Verify by making a prediction
loaded_model.eval()
with torch.inference_mode():
    img = Image.open(custom_image_path).convert("RGB")  # force 3 channels
    img_transformed = auto_transforms(img).unsqueeze(0).to(device)
    pred = loaded_model(img_transformed)
    pred_class = class_names[torch.argmax(pred, dim=1).item()]
    
print(f"Loaded model prediction: {pred_class}")

## 8. Compare Results: Transfer Learning vs From Scratch

Let's summarize the comparison between training from scratch and transfer learning:

In [None]:
# Final summary
print("="*60)
print("COMPARISON: Transfer Learning vs Training From Scratch")
print("="*60)
print(f"\nDataset: Pizza, Steak, Sushi ({len(train_dataset)} train, {len(test_dataset)} test)")
print(f"Epochs: {NUM_EPOCHS}")
print()
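# Count parameters directly from the model rather than hard-coding them
trained_params = f"{sum(p.numel() for p in model.parameters() if p.requires_grad):,}"
total_params = f"{sum(p.numel() for p in model.parameters()):,}"
test_acc_pct = f"~{int(results['test_acc'][-1] * 100)}%"

# TinyVGG figures are reference values from the earlier TinyVGG experiment
border = "+" + "-" * 60 + "+"
print(border)
print(f"| {'Metric':<25} | {'TinyVGG':<12} | {'EfficientNet_B0':<15} |")
print(border)
print(f"| {'Test Accuracy':<25} | {'~40%':<12} | {test_acc_pct:<15} |")
print(f"| {'Parameters Trained':<25} | {'8,083':<12} | {trained_params:<15} |")
print(f"| {'Total Parameters':<25} | {'8,083':<12} | {total_params:<15} |")
print(f"| {'Pretrained':<25} | {'No':<12} | {'Yes (ImageNet)':<15} |")
print(border)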
print()
print("Key Insight: Transfer learning achieves 2x+ accuracy while")
print("training fewer parameters, thanks to pretrained features!")

## Summary

In this lab, we:

1. **Set up training components** — CrossEntropyLoss and Adam optimizer
2. **Created training and testing functions** — Reusable step functions
3. **Trained for 5 epochs** — Reached roughly 85% test accuracy or better
4. **Plotted loss curves** — Visualized training progress
5. **Made predictions on test images** — Model works well on unseen data
6. **Predicted on custom images** — Works on real-world photos
7. **Saved the trained model** — For future use

### Key Takeaways

1. **Transfer learning is incredibly powerful**: roughly 85% test accuracy or better vs ~40% from scratch
2. **Less training required**: Only 3,843 parameters trained, fast convergence
3. **Works with small datasets**: The pretrained features generalize well
4. **Easy to implement**: Just freeze layers and modify classifier

### What's Next?

You can extend this work by:
- Training for more epochs
- Using a larger EfficientNet (B1, B2, etc.)
- Fine-tuning more of the model (unfreezing some base layers)
- Adding data augmentation
- Trying on your own custom dataset