# 📘 Lesson 12 — Transfer Learning: Using Pre-trained Models

---

### 🎯 Why this lesson matters
Training from scratch requires large datasets and substantial compute.  
**Transfer learning** uses pre-trained models (e.g., trained on ImageNet) as a starting point.  

👉 Save time and resources, and often reach high accuracy with little data.  
Common in vision (ResNet) and NLP (BERT).  

We’ll adapt a pre-trained CNN for a new task and see WHY it works.


In [1]:
# Setup
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
torch.manual_seed(42)


## 1) What is Transfer Learning?

- Use model trained on large dataset (source) for new task (target).
- **Feature extraction**: Freeze base, train classifier.
- **Fine-tuning**: Unfreeze some layers, train with low LR.

👉 WHY? Base layers learn general features (edges, shapes).
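
To make the two modes concrete, here is a minimal sketch on a toy network. The `base`/`head` split and the layer sizes are hypothetical stand-ins for a real pre-trained model; the only point is how `requires_grad` changes what gets trained.

```python
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network: a "base" plus a "head".
base = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 3)
model = nn.Sequential(base, head)

def count_trainable(module):
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

total = count_trainable(model)  # everything is trainable by default

# Feature extraction: freeze the base, train only the head.
for p in base.parameters():
    p.requires_grad = False
extraction = count_trainable(model)

# Fine-tuning: unfreeze the base again (normally paired with a low LR).
for p in base.parameters():
    p.requires_grad = True
fine_tuning = count_trainable(model)

print(total, extraction, fine_tuning)  # 195 51 195
```

With the base frozen, only the 51 head parameters remain trainable, which is why feature extraction is cheap and hard to overfit on small data.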


## 2) Pre-trained Models in PyTorch

- torchvision.models: ResNet, VGG, etc.
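- Load with the weights argument, e.g. weights=models.ResNet18_Weights.DEFAULT (the older pretrained=True flag is deprecated).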

👉 WHY pre-trained? Millions of parameters are already optimized on a large dataset.


In [2]:
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print(resnet)  # View architecture


ResNet(
  ...
)  # Truncated for brevity


## 3) Feature Extraction vs Fine-tuning

- Extraction: Replace final layer, freeze others.
- Fine-tuning: Unfreeze, use small LR.

👉 WHY choose? Extraction suits small target datasets that resemble the source data; fine-tuning suits larger datasets or targets that differ more from the source.


In [3]:
class TransferModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        for param in self.base.parameters():
            param.requires_grad = False  # Freeze
        self.base.fc = nn.Linear(self.base.fc.in_features, num_classes)  # New classifier

    def forward(self, x):
        return self.base(x)


## 4) Domain Adaptation

- Needed when the source and target domains differ (e.g., photos to sketches).
- Technique: fine-tune more layers, starting from the last block and moving toward the input.

👉 WHY? Align features to new domain.


## 5) Practice: Use ResNet on Custom Dataset

- CIFAR10 as example.
- Train classifier only.


In [4]:
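# Normalize with ImageNet statistics: the pre-trained base expects inputs scaled this way
transform = transforms.Compose([
    transforms.Resize(224),  # upscale 32x32 CIFAR images to the ImageNet input size
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])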
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=32, shuffle=True)

model = TransferModel(num_classes=10)
# Only the new classifier is trainable, so hand the optimizer just those parameters
optimizer = optim.SGD(model.base.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for images, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}/3, Loss: {loss.item():.3f}")


Epoch 1/3, Loss: 1.234
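
The printed loss is that of the last batch only, so it is worth measuring accuracy separately. The `evaluate` function below is a minimal sketch, not part of the original lesson; it is demonstrated on toy tensors so that it runs without downloading CIFAR10.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return top-1 accuracy of `model` over `loader`."""
    model.eval()           # use running BatchNorm statistics, disable dropout
    correct, seen = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            seen += labels.size(0)
    return correct / seen

# Toy check: the "model" passes logits through unchanged, so accuracy is known in advance.
toy_logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.8]])
toy_labels = torch.tensor([0, 1, 1, 1])
toy_loader = DataLoader(TensorDataset(toy_logits, toy_labels), batch_size=2)
accuracy = evaluate(nn.Identity(), toy_loader)
print(accuracy)  # 0.75
```

For the lesson's model, the same function could be called with a loader built from `CIFAR10(..., train=False)`; remember to switch back with `model.train()` before training further.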


## 6) Practice Exercises

- Fine-tune by unfreezing layers.
- Use for custom images (e.g., cats/dogs).


In [5]:
# Practice: Fine-tuning
for param in model.base.layer4.parameters():  # Unfreeze last block
    param.requires_grad = True
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001)


## 📚 Summary

✅ What we learned:
- Transfer learning concepts.
- Pre-trained models.
- Extraction vs fine-tuning.
- Domain adaptation.

🚀 Next Lesson: **Attention Mechanism** — base for Transformers.
