# Lab 01: Transfer Learning Fundamentals

In this notebook, we'll explore **transfer learning** — a powerful technique that allows us to leverage pretrained models for our own problems. Instead of building a CNN from scratch, we'll use a model that's already learned from millions of images.

![Transfer Learning](https://raw.githubusercontent.com/poridhiEng/lab-asset/3cf35c4bc9e49c2beebb77f8f30429b9aecfb753/tensorcode/Deep-learning-with-pytorch/Transfer-learning-with-pytorch/Lab_01/images/infra-1.svg)

In the diagram above, **Task 1** shows a model trained on a large dataset (like ImageNet with 1000 classes). With **transfer learning**, we take this trained model and apply it to **Task 2** — keeping the same Model 1 but replacing Head 1 with a new Head 2 for our specific problem (3 classes: pizza, steak, sushi).

**Our goal**: Download a pretrained EfficientNet_B0 model, explore its architecture, freeze the base layers, and customize the classifier for our 3-class food classification problem.

## Install Dependencies

First, let's install the required libraries. We need:
- `torch` and `torchvision`: Core PyTorch libraries
- `torchinfo`: To print a layer-by-layer summary of our model architecture

In [None]:
!pip install requests torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install torchinfo matplotlib

## Import Libraries

We need:
- `torch`: Core PyTorch library for tensors
- `torchvision`: For pretrained models and transforms
- `torchinfo`: To summarize the model architecture layer by layer
- `matplotlib`: For visualization
- `os`, `zipfile`, `requests`, `pathlib`: For downloading and managing data

In [None]:
import torch
import torchvision
from torch import nn
from torchvision import transforms
from torchinfo import summary

import matplotlib.pyplot as plt
import os
import zipfile
import requests
from pathlib import Path

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")

## Setup Device

Let's setup device-agnostic code. We'll use GPU if available (faster), otherwise CPU.

In [None]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

## 1. Get Data

Before we can use transfer learning, we need a dataset. We'll use a food classification dataset with **3 classes**:

![Food Dataset](https://raw.githubusercontent.com/poridhiEng/lab-asset/3cf35c4bc9e49c2beebb77f8f30429b9aecfb753/tensorcode/Deep-learning-with-pytorch/Transfer-learning-with-pytorch/Lab_01/images/infra-3.svg)

This is the "FoodVision Mini" problem: classifying images of food into pizza, steak, and sushi categories. Let's download the dataset from Poridhi's GitHub repository and unzip it.

In [None]:
# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)
    
    # Download pizza, steak, sushi data
    # Use raw.githubusercontent.com to get the actual file, not the HTML page
    print("Downloading pizza, steak, sushi data...")
    response = requests.get("https://raw.githubusercontent.com/poridhioss/Introduction-to-Deep-Learning-with-Pytorch-Resources/main/Transfer-learning/pizza_steak_sushi.zip", timeout=60)
    response.raise_for_status()  # Stop here if the download failed (e.g. 404)
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        f.write(response.content)

    # Unzip pizza, steak, sushi data
    # The zip contains train/ and test/ folders directly, so extract to image_path
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...") 
        zip_ref.extractall(image_path)  # Extract to data/pizza_steak_sushi/

    # Remove .zip file
    os.remove(data_path / "pizza_steak_sushi.zip")
    print("Done!")

### Setup Directory Paths

Now let's create paths to our training and test directories. The data is organized in the standard image classification format:

```
data/
└── pizza_steak_sushi/
    ├── train/
    │   ├── pizza/
    │   ├── steak/
    │   └── sushi/
    └── test/
        ├── pizza/
        ├── steak/
        └── sushi/
```

In [None]:
# Setup train and test directories
train_dir = image_path / "train"
test_dir = image_path / "test"

print(f"Training directory: {train_dir}")
print(f"Testing directory: {test_dir}")

## 2. Create Transforms for Pretrained Model

When using a pretrained model, it's crucial that **your data is prepared the same way as the original training data**.

For models pretrained on ImageNet:
- Images should be **224x224** pixels (minimum)
- Values should be in range **[0, 1]** (done by ToTensor)
- Normalized with **mean=[0.485, 0.456, 0.406]** and **std=[0.229, 0.224, 0.225]**

These mean and std values were calculated from ImageNet data.
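
To make the normalization step concrete, here is a minimal sketch of the per-channel arithmetic, `(x - mean) / std`. It uses plain `torch` broadcasting as a stand-in for `transforms.Normalize`, and the tensor `img` is a made-up placeholder for an image that has already been scaled to [0, 1].

```python
import torch

# ImageNet channel statistics quoted above, shaped (3, 1, 1) to broadcast over H and W
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical stand-in for an image after ToTensor(): shape (C, H, W), values in [0, 1]
img = torch.full((3, 224, 224), 0.5)

# Per-channel normalization: subtract the mean, then divide by the std
normalized = (img - mean) / std

# Inverting the operation recovers the original pixel values
restored = normalized * std + mean

print(normalized[:, 0, 0])            # one normalized value per channel
print(torch.allclose(restored, img))  # the round trip gives back the input
```

The inverse step (`normalized * std + mean`) is the same idea used later in this lab to denormalize images before plotting them.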

### Method 1: Manual Transform Creation

We can manually create the transforms using `torchvision.transforms`:

In [None]:
# Create a transforms pipeline manually
manual_transforms = transforms.Compose([
    transforms.Resize((224, 224)),  # 1. Resize to 224x224
    transforms.ToTensor(),  # 2. Convert to tensor (scales to 0-1)
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # 3. Normalize with ImageNet stats
                         std=[0.229, 0.224, 0.225])
])

print("Manual transforms created:")
print(manual_transforms)

### Method 2: Automatic Transform Creation (Recommended)

As of `torchvision` v0.13, we can automatically get the transforms associated with a set of pretrained weights. This ensures we're using the **same preprocessing** the pretrained model expects.

We access transforms through the model weights:

In [None]:
# Get a set of pretrained model weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT  # .DEFAULT = best available weights
print(f"Weights: {weights}")

# Get the transforms used to create our pretrained weights
auto_transforms = weights.transforms()
print(f"\nAutomatic transforms:")
print(auto_transforms)

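Notice that `auto_transforms` is similar to `manual_transforms` but not necessarily identical. For these weights, the printed output typically shows a resize to 256 pixels with bicubic interpolation followed by a center crop to 224x224, rather than a direct resize to 224x224. The benefit of automatic creation is that you're guaranteed to use the preprocessing that belongs to the pretrained weights.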

We'll use `auto_transforms` for our DataLoaders.

## 3. Create Datasets and DataLoaders

Now let's create our training and test datasets using `ImageFolder` and wrap them in DataLoaders.

`ImageFolder` automatically:
- Loads images from a directory structure
- Assigns labels based on subdirectory names
- Applies the specified transforms

In [None]:
from torchvision import datasets
from torch.utils.data import DataLoader

# Create training dataset
train_dataset = datasets.ImageFolder(
    root=train_dir,
    transform=auto_transforms
)

# Create test dataset
test_dataset = datasets.ImageFolder(
    root=test_dir,
    transform=auto_transforms
)

# Get class names
class_names = train_dataset.classes
print(f"Class names: {class_names}")
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of test samples: {len(test_dataset)}")
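
The labeling rule can be illustrated without any images. The sketch below is a simplified re-implementation, not torchvision's code: the helper `find_classes` is written here for illustration, and it assumes the rule that subdirectory names are sorted alphabetically and numbered from 0.

```python
import tempfile
from pathlib import Path

def find_classes(root):
    """Mimic the labeling rule: sorted subdirectory names -> integer labels."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx

with tempfile.TemporaryDirectory() as tmp:
    # Create the class folders in a deliberately unsorted order
    for name in ["sushi", "pizza", "steak"]:
        (Path(tmp) / name).mkdir()
    classes, class_to_idx = find_classes(tmp)

print(classes)       # ['pizza', 'steak', 'sushi']
print(class_to_idx)  # {'pizza': 0, 'steak': 1, 'sushi': 2}
```

On the real dataset, the same mapping can be inspected with `train_dataset.class_to_idx`.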

In [None]:
# Create DataLoaders
BATCH_SIZE = 32

train_dataloader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=0  # Set to 0 for compatibility
)

test_dataloader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=0
)

print(f"Number of training batches: {len(train_dataloader)}")
print(f"Number of test batches: {len(test_dataloader)}")
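
The number of batches follows directly from the dataset size and the batch size. Here is a small sketch using a hypothetical dataset of 100 fake samples (not the food data) to show that `len(dataloader)` equals `ceil(num_samples / batch_size)` and that the last batch may be smaller than the rest.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 100 fake "images" with integer labels (not the real food data)
num_samples, batch_size = 100, 32
fake_dataset = TensorDataset(
    torch.zeros(num_samples, 3, 8, 8),
    torch.zeros(num_samples, dtype=torch.long),
)
loader = DataLoader(fake_dataset, batch_size=batch_size, shuffle=False)

# With drop_last=False (the default), the final partial batch is kept
expected_batches = math.ceil(num_samples / batch_size)
batch_sizes = [images.shape[0] for images, _ in loader]

print(len(loader), expected_batches)  # 4 4
print(batch_sizes)                    # [32, 32, 32, 4]
```

This is worth keeping in mind when checking output shapes: only full batches have exactly `BATCH_SIZE` samples.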

### Visualize Sample Images

Let's visualize a few images from our dataset to understand what we're working with.

In [None]:
# Get a batch of images and labels
images, labels = next(iter(train_dataloader))
print(f"Image batch shape: {images.shape}")
print(f"Label batch shape: {labels.shape}")

# Plot some images
fig, axes = plt.subplots(1, 4, figsize=(12, 4))
for i, ax in enumerate(axes):
    # Denormalize the image for display
    img = images[i].permute(1, 2, 0).numpy()
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    img = img * std + mean  # Denormalize
    img = img.clip(0, 1)  # Clip values to valid range
    
    ax.imshow(img)
    ax.set_title(class_names[labels[i]])
    ax.axis('off')
plt.tight_layout()
plt.show()

## 4. Get and Explore a Pretrained Model

Now comes the exciting part — downloading a pretrained model!

We'll use **EfficientNet_B0** from `torchvision.models`. This model:
- Has been trained on ImageNet (1.2 million images, 1000 classes)
- Achieves ~77.7% top-1 accuracy on ImageNet
- Has about 5.3 million parameters
- Is a good balance of performance and efficiency

### Loading the Pretrained Model

In [None]:
# Setup the model with pretrained weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights)

# Send model to device
model = model.to(device)

print("EfficientNet_B0 loaded successfully!")
print(f"Model is on: {next(model.parameters()).device}")

### Understanding the Model Architecture

Let's print the model to see its structure. EfficientNet_B0 has three main parts:

1. **features**: The convolutional backbone (feature extractor)
2. **avgpool**: Adaptive average pooling layer
3. **classifier**: The final classification layers

Let's look at just the high-level structure first:

In [None]:
# Print high-level structure (just the main components)
print("EfficientNet_B0 Main Components:")
print("=" * 50)
for name, module in model.named_children():
    print(f"\n{name}:")
    if name == "features":
        print(f"  Contains {len(list(module.children()))} sub-modules (convolutional blocks)")
    elif name == "avgpool":
        print(f"  {module}")
    elif name == "classifier":
        print(f"  {module}")
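
To see how the three parts connect, here is a rough sketch of the data flow using stand-in layers instead of the real model. It assumes that the feature extractor produces 1280 feature maps of size 7x7 for a 224x224 input, and that the pooled output is flattened before reaching the classifier; the random tensor `feature_maps` is a placeholder for the real `features` output.

```python
import torch
from torch import nn

# Placeholder for the output of the "features" block on one 224x224 image
# (assumption: 1280 channels, 7x7 spatial size)
feature_maps = torch.randn(1, 1280, 7, 7)

# Stand-ins for the "avgpool" and "classifier" blocks
avgpool = nn.AdaptiveAvgPool2d(output_size=1)  # collapses each 7x7 map to a single value
classifier = nn.Sequential(
    nn.Dropout(p=0.2),
    nn.Linear(in_features=1280, out_features=1000),
)

pooled = avgpool(feature_maps)             # (1, 1280, 1, 1)
flat = torch.flatten(pooled, start_dim=1)  # (1, 1280)
logits = classifier(flat)                  # (1, 1000)

print(pooled.shape, flat.shape, logits.shape)
```

The flattened size of 1280 is why any replacement classifier needs `in_features=1280`.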

### Getting a Detailed Model Summary

Let's use `torchinfo.summary()` to get a detailed view of the model. This shows:
- Input and output shapes at each layer
- Number of parameters
- Which layers are trainable

In [None]:
# Get a detailed summary of the model BEFORE freezing
print("Model Summary (BEFORE freezing and modifying):")
print("=" * 80)
summary(model=model,
        input_size=(1, 3, 224, 224),  # (batch_size, color_channels, height, width)
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

### Key Observations

From the summary, notice:

1. **Total parameters**: ~5.3 million (much more than our TinyVGG model!)
2. **All layers are trainable**: The "Trainable" column shows `True` for all layers
3. **Output shape**: The classifier outputs `[batch_size, 1000]` (for ImageNet's 1000 classes)

We need to:
1. **Freeze** the feature extractor layers (keep pretrained weights)
2. **Modify** the classifier to output 3 classes instead of 1000

## 5. Freeze the Base Model Layers

**Freezing** layers means keeping them unchanged during training. We want to:
- Keep the learned patterns in the `features` section (they're already great at extracting image features)
- Only train the `classifier` section (to learn our specific classes)

To freeze layers, we set `requires_grad=False` for their parameters.

In [None]:
# Freeze all base layers in the "features" section
for param in model.features.parameters():
    param.requires_grad = False

print("Base layers frozen!")
print(f"\nChecking requires_grad for first layer in features:")
print(f"  requires_grad = {next(model.features.parameters()).requires_grad}")

Now the `features` layers won't be updated during training. This has two benefits:

1. **Preserves learned patterns**: The pretrained weights stay intact
2. **Faster training**: No gradients are computed for the frozen weights, and the optimizer has far fewer parameters to update
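
The effect of freezing can be checked on a toy model. The sketch below uses a tiny two-layer network as a hypothetical stand-in for the pretrained model (the names `toy`, `features`, and `classifier` are chosen here for illustration) and runs one optimizer step on random data.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for a pretrained network: a "features" part and a "classifier" part
toy = nn.Sequential()
toy.add_module("features", nn.Linear(4, 8))
toy.add_module("classifier", nn.Linear(8, 3))

# Freeze the feature extractor
for param in toy.features.parameters():
    param.requires_grad = False

frozen_before = toy.features.weight.clone()
head_before = toy.classifier.weight.clone()

# Hand the optimizer only the parameters that still require gradients
trainable_params = [p for p in toy.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.1)

# One training step on random inputs and random labels
inputs, targets = torch.randn(16, 4), torch.randint(0, 3, (16,))
loss = nn.functional.cross_entropy(toy(inputs), targets)
loss.backward()
optimizer.step()

print(toy.features.weight.grad)                         # None: no gradient for frozen weights
print(torch.equal(toy.features.weight, frozen_before))  # True: frozen weights unchanged
print(torch.equal(toy.classifier.weight, head_before))  # False: the head was updated
```

Passing only the trainable parameters to the optimizer is optional, since frozen parameters receive no gradients either way, but it makes the intent explicit.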

## 6. Modify the Classifier Layer

The original classifier outputs 1000 classes (for ImageNet). We only have 3 classes (pizza, steak, sushi).

Let's look at the current classifier:

In [None]:
# View the current classifier
print("Current classifier:")
print(model.classifier)
print(f"\nOutput features: {model.classifier[1].out_features}")

The classifier has:
- **Dropout(p=0.2)**: Regularization to prevent overfitting
- **Linear(1280, 1000)**: Maps from 1280 features to 1000 classes

We'll replace this with a new classifier that outputs 3 classes:

In [None]:
# Set random seeds for reproducibility
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Get the number of output classes
output_shape = len(class_names)
print(f"Number of output classes: {output_shape}")
print(f"Class names: {class_names}")

# Create a new classifier layer
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280,  # Must match the output of avgpool
              out_features=output_shape,  # 3 classes for our problem
              bias=True)
).to(device)

print(f"\nNew classifier:")
print(model.classifier)

## 7. Verify the Modified Model

Let's get another summary to see what changed. Pay attention to:
- The **Trainable** column (most layers should be `False` now)
- The **classifier output** (should be 3 instead of 1000)
- The number of **trainable parameters** (should be much smaller)

In [None]:
# Get a detailed summary of the model AFTER freezing and modifying
print("Model Summary (AFTER freezing and modifying):")
print("=" * 80)
summary(model=model,
        input_size=(1, 3, 224, 224),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

### Key Changes

Compare before and after:

| Aspect | Before | After |
|--------|--------|-------|
| Features layers | Trainable | **Frozen** |
| Classifier output | 1000 | **3** |
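| Trainable params | ~5.3M | **3,843** |
| Non-trainable params | 0 | **~4.0M** |

We went from training 5.3 million parameters to just 3,843! The total parameter count also drops from ~5.3M to ~4.0M, because the new 3-class head is far smaller than the original 1000-class head. This means: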
- **Faster training** (fewer gradients to compute)
- **Less memory** required
- **Less risk of overfitting** (fewer parameters to tune)
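
The figure of 3,843 comes from the new linear layer alone: 1280 weights per class plus one bias per class. As a quick check, here is a sketch that counts the parameters of two standalone `nn.Linear` layers with the same shapes as the old and new heads (the helper `count_params` is defined here for illustration).

```python
from torch import nn

def count_params(layer):
    """Total number of scalar parameters in a layer."""
    return sum(p.numel() for p in layer.parameters())

old_head = nn.Linear(in_features=1280, out_features=1000)  # same shape as the ImageNet head
new_head = nn.Linear(in_features=1280, out_features=3)     # pizza, steak, sushi

print(count_params(old_head))  # 1280 * 1000 + 1000 = 1,281,000
print(count_params(new_head))  # 1280 * 3 + 3 = 3,843
```

Dropout has no learnable parameters, so it does not affect either count.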

## 8. Test the Model Architecture

Let's verify our model works correctly by passing a sample batch through it.

In [None]:
# Get a batch of images
images, labels = next(iter(train_dataloader))
images = images.to(device)

# Put model in evaluation mode
model.eval()

# Make predictions (without computing gradients)
with torch.inference_mode():
    outputs = model(images)

print(f"Input shape: {images.shape}")
print(f"Output shape: {outputs.shape}")
print(f"\nExpected output shape: [batch_size, num_classes] = [{BATCH_SIZE}, {len(class_names)}]")
print(f"Actual output shape matches expected: {outputs.shape == torch.Size([BATCH_SIZE, len(class_names)])}")

In [None]:
# Look at the raw outputs (logits) for first 3 samples
print("Raw outputs (logits) for first 3 samples:")
print(outputs[:3])

# Convert to probabilities using softmax
probs = torch.softmax(outputs[:3], dim=1)
print(f"\nProbabilities for first 3 samples:")
print(probs)

# Get predicted classes
pred_classes = torch.argmax(probs, dim=1)
print(f"\nPredicted classes: {pred_classes.tolist()}")
print(f"Predicted labels: {[class_names[i] for i in pred_classes.tolist()]}")
print(f"Actual labels: {[class_names[i] for i in labels[:3].tolist()]}")

The model is outputting predictions, but they're essentially random since we haven't trained the classifier yet. The pretrained features are providing representations, but the classifier doesn't know how to map them to our classes.
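
For reference, here is a small sketch of what the softmax step does to logits. The logit values are made up for illustration; the first row is close to zero, roughly what an untrained head might produce, so its probabilities come out near uniform.

```python
import torch

# Hypothetical logits for two samples over three classes (values made up)
logits = torch.tensor([[0.2, -0.1, 0.05],
                       [1.5, 0.3, -0.7]])

# Softmax by hand: exponentiate, then divide by the row sum
exp_logits = torch.exp(logits)
manual_probs = exp_logits / exp_logits.sum(dim=1, keepdim=True)
probs = torch.softmax(logits, dim=1)

print(torch.allclose(manual_probs, probs))  # the two computations agree
print(probs.sum(dim=1))                     # each row sums to 1
print(probs[0])                             # near-uniform for near-zero logits

# Softmax preserves ordering, so argmax on logits and on probabilities agree
print(torch.equal(probs.argmax(dim=1), logits.argmax(dim=1)))
```

Because softmax preserves the ordering of the logits, the predicted class can be read from the logits directly; softmax is only needed when the probabilities themselves are of interest.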

**In Lab 02**, we'll train the classifier and see the accuracy improve dramatically!

## 9. Count Trainable vs Non-Trainable Parameters

Let's explicitly count how many parameters are trainable vs frozen.

In [None]:
# Count trainable and non-trainable parameters
def count_parameters(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    non_trainable = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    total = trainable + non_trainable
    return trainable, non_trainable, total

trainable, non_trainable, total = count_parameters(model)

print("Parameter Count Summary")
print("=" * 40)
print(f"Trainable parameters:     {trainable:,}")
print(f"Non-trainable parameters: {non_trainable:,}")
print(f"Total parameters:         {total:,}")
print(f"\nPercentage trainable: {100 * trainable / total:.2f}%")

Only about **0.1%** of the parameters are trainable! This is the power of transfer learning: we get the benefit of about 4 million pretrained parameters in the frozen feature extractor while training only 3,843.

## Summary

In this lab, we:

1. **Learned about transfer learning** — using pretrained models for our own problems
2. **Downloaded a dataset** — pizza, steak, sushi images
3. **Created transforms** — both manual and automatic methods
4. **Loaded a pretrained EfficientNet_B0** — ~77.7% accuracy on ImageNet
5. **Explored the model architecture** — features, avgpool, classifier
6. **Froze the base layers** — set `requires_grad=False`
7. **Modified the classifier** — changed output from 1000 to 3 classes
8. **Verified the setup** — tested that the model produces correct output shapes

### Key Takeaways

- **Transfer learning** lets us leverage pretrained models for new problems
- **Freezing layers** preserves learned patterns and speeds up training
- **Only 3,843 trainable parameters** out of about 4 million total
- **Data must match** the format used to train the pretrained model

### Next Steps

In **Lab 02**, we'll:
- Train our modified model on the pizza/steak/sushi dataset
- Plot loss curves to evaluate training
- Make predictions on test images and custom images
- Compare results to our previous TinyVGG model