# 📘 Lesson 7 — Convolutional Neural Networks (CNN): Image Recognition Foundations

---

### 🎯 Why this lesson matters
So far, we’ve worked with fully connected layers (MLP) that treat data as flat vectors.  
But images have **spatial structure**: nearby pixels are strongly related.  

👉 CNNs use **convolutions** to detect local features such as edges and textures.  
They’re the backbone of computer vision (e.g., object detection, self-driving cars).  

In this lesson, we’ll build a CNN from scratch using nn.Module,  
and see WHY convolutions make models efficient for images.


In [1]:
# Setup
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
torch.manual_seed(42)


## 1) What is a CNN?

- A typical CNN = convolutional layers + pooling layers + fully connected layers.
- Key idea: **local patterns** (small filters slide over the image and respond to a feature wherever it appears).

👉 WHY not MLP for images?  
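An MLP flattens the image, so it has no built-in notion of which pixels are neighbors and learns a separate weight for every pixel; a CNN keeps the 2D layout and reuses the same small filter at every position (far fewer params).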
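
To put a rough number on "fewer params", the sketch below counts weights and biases by hand. The helper names `linear_params` and `conv2d_params` are made up for illustration, and the layer sizes (784 inputs to 128 hidden units for the MLP layer, 16 filters of size 3x3 for the conv layer) are assumptions chosen to match MNIST-sized inputs. Treat it as a back-of-the-envelope comparison, not a like-for-like benchmark.

```python
# Hypothetical helpers (not part of PyTorch): count learnable parameters by hand.

def linear_params(in_features: int, out_features: int) -> int:
    """One weight per (input, output) pair, plus one bias per output."""
    return in_features * out_features + out_features


def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """One square filter per (input, output) channel pair, plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed sizes: a flattened 28x28 image feeding 128 hidden units,
# versus 16 filters of size 3x3 over a 1-channel image.
mlp_layer = linear_params(28 * 28, 128)
conv_layer = conv2d_params(1, 16, 3)

print("Fully connected layer:", mlp_layer)   # 100480
print("Convolutional layer:  ", conv_layer)  # 160
```

The conv layer's count does not depend on the image size at all, because the same filters are reused at every position.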


## 2) Convolution Operation — The Core

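- A **filter** (kernel) slides over the image; at each position it computes a dot product between its weights and the patch of pixels under it.
- Output: a feature map that shows where the filter's pattern (an edge, a color blob, etc.) occurs.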
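
The sliding dot product can be written out in plain Python to see what happens at each position. This is only a sketch: the function name `conv2d_single` is hypothetical, the vertical-edge kernel and the toy image are assumptions chosen for illustration, and it handles a single channel with a square kernel, stride 1 and no padding. Like `nn.Conv2d`, it does not flip the kernel (strictly speaking, this is cross-correlation).

```python
def conv2d_single(image, kernel):
    """Slide a square `kernel` over `image` (both lists of lists), stride 1, no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the k x k patch whose top-left corner is (i, j)
            total = 0
            for a in range(k):
                for b in range(k):
                    total += kernel[a][b] * image[i + a][j + b]
            row.append(total)
        output.append(row)
    return output


# Toy 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Assumed example kernel that responds to dark-to-bright vertical edges
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_single(image, vertical_edge)
for row in feature_map:
    print(row)
# [0, 3, 3]
# [0, 3, 3]
# [0, 3, 3]
```

The feature map is zero over the flat dark region and positive wherever the window straddles the edge.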

👉 Params: kernel_size (filter height and width), stride (step size between positions), padding (pixels added around the border, zeros by default).
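
Together these parameters determine the output size. Along one spatial dimension with input size n, the standard formula (assuming the default dilation of 1) is `out = floor((n + 2 * padding - kernel_size) / stride) + 1`. Below is a small sketch; `conv_output_size` is a hypothetical helper, not a PyTorch function.

```python
def conv_output_size(n: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension (dilation 1 assumed)."""
    return (n + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(5, kernel_size=3))               # 3  (5x5 image, 3x3 kernel)
print(conv_output_size(28, kernel_size=3, padding=1))   # 28 (padding=1 keeps the size)
print(conv_output_size(28, kernel_size=2, stride=2))    # 14 (2x2 window with stride 2 halves it)
```

The same arithmetic applies to pooling layers, which is why a 2x2 max pool with stride 2 turns 28x28 into 14x14.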


In [2]:
# Simple convolution demo
image = torch.arange(25.0).view(1, 1, 5, 5)  # 1 batch, 1 channel, 5x5 image
conv = nn.Conv2d(1, 1, kernel_size=3)  # 1 in_channel, 1 out_channel, 3x3 kernel

output = conv(image)
print("Input shape:", image.shape)
print("Output shape:", output.shape)
print("Feature map:", output.detach())


Input shape: torch.Size([1, 1, 5, 5])
Output shape: torch.Size([1, 1, 3, 3])
Feature map: a 3x3 tensor (its values depend on the kernel's randomly initialized weights and bias, so they are not reproduced here)


## 3) Pooling — Downsampling

- Reduces feature map size (e.g., max pooling keeps the maximum value in each window).
- WHY? Makes the model more tolerant of small shifts and reduces computation in later layers.
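
As a sanity check on what max pooling computes, here is a plain-Python sketch. The helper `max_pool2d_single` is hypothetical and handles only one channel with non-overlapping square windows; the 4x4 input holds the values 0 to 15 row by row, the same values as `torch.arange(16.0).view(4, 4)`.

```python
def max_pool2d_single(image, size=2):
    """Max over non-overlapping size x size windows (stride equals size)."""
    pooled = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# 4x4 grid holding 0..15, row by row
grid = [[4 * r + c for c in range(4)] for r in range(4)]
print(max_pool2d_single(grid))  # [[5, 7], [13, 15]]
```

Each output keeps only the largest value in its window, so moving a strong activation by one pixel inside the same window leaves the output unchanged.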


In [3]:
pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(torch.arange(16.0).view(1, 1, 4, 4))
print("Pooled:", pooled)


Pooled: tensor([[[[ 5.,  7.],
          [13., 15.]]]])


## 4) Building a CNN with nn.Module

- Stack conv + activation + pool.
- End with fully connected for classification.

👉 WHY activation (ReLU)? It adds non-linearity: without it, stacked conv and linear layers collapse into a single linear map.
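
One way to see why the non-linearity matters: composing two affine functions gives another affine function, so stacking layers without an activation adds no expressive power. The scalar functions `f` and `g` below are arbitrary stand-ins for two layers (an assumption for illustration), and the midpoint check is just one simple property that every affine function satisfies.

```python
def relu(x):
    return max(0.0, x)

def f(x):  # stand-in for "layer 1"
    return 2.0 * x + 1.0

def g(x):  # stand-in for "layer 2"
    return -3.0 * x + 4.0

def stacked_linear(x):
    return g(f(x))          # equals -6x + 1: still a straight line

def stacked_relu(x):
    return g(relu(f(x)))    # ReLU in between bends the line

def passes_midpoint_test(h, a, b):
    """Affine functions satisfy h((a + b) / 2) == (h(a) + h(b)) / 2."""
    return h((a + b) / 2) == (h(a) + h(b)) / 2

print(passes_midpoint_test(stacked_linear, -2.0, 1.0))  # True
print(passes_midpoint_test(stacked_relu, -2.0, 1.0))    # False
```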


In [4]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # MNIST is grayscale (1 channel)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 14 * 14, 10)  # 28x28 input → after pool: 14x14

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = torch.flatten(x, 1)  # Flatten everything except the batch dimension: (N, 16*14*14)
        x = self.fc(x)
        return x
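
Where the `16 * 14 * 14` comes from can be traced step by step. The sketch below applies the usual output-size formula (dilation 1 assumed) and counts parameters by hand; `out_size` is a hypothetical helper, and the numbers mirror the layer settings in SimpleCNN above.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial size after a conv or pooling layer (dilation 1 assumed)."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 28                                          # MNIST images are 28x28
size = out_size(size, kernel_size=3, padding=1)    # conv1 -> 28 (padding keeps the size)
channels = 16                                      # conv1 produces 16 feature maps
size = out_size(size, kernel_size=2, stride=2)     # pool  -> 14
flat_features = channels * size * size             # 16 * 14 * 14

conv1_params = 1 * 16 * 3 * 3 + 16                 # filter weights + biases
fc_params = flat_features * 10 + 10                # weights + biases of the final layer

print("Flattened features:", flat_features)              # 3136
print("Total parameters:  ", conv1_params + fc_params)   # 31530
```

Almost all of the parameters sit in the final fully connected layer; the convolution itself needs only 160.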


## 5) Training on MNIST

- Load data with DataLoader.
- Use CrossEntropyLoss for multi-class.
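
For intuition about what this loss measures, here is cross-entropy for a single example computed by hand: softmax turns the raw scores (logits) into probabilities, and the loss is the negative log of the probability given to the correct class. `nn.CrossEntropyLoss` applies the log-softmax itself, so the model should output raw logits. The function name `cross_entropy_single` and the example logits are made up for illustration.

```python
import math

def cross_entropy_single(logits, target):
    """Negative log-probability of the `target` class under softmax(logits)."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Three classes with equal scores: probability 1/3 each, so the loss is ln(3)
print(cross_entropy_single([0.0, 0.0, 0.0], target=1))   # about 1.0986

# A confident, correct prediction gives a small loss ...
print(cross_entropy_single([5.0, 0.0, 0.0], target=0))
# ... and a confident, wrong one gives a large loss
print(cross_entropy_single([5.0, 0.0, 0.0], target=1))
```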

👉 WHY DataLoader? It handles batching and shuffling (a new sample order each epoch when shuffle=True).
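
Conceptually, batching and shuffling amount to the following. This is a simplified sketch of the idea, not how `DataLoader` is implemented; `iter_batches` and its `seed` argument are hypothetical. With MNIST's 60,000 training images and a batch size of 64, it yields 938 batches, the last one holding the remaining 32 samples.

```python
import random

def iter_batches(n_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)   # new order; every sample still appears exactly once
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iter_batches(60_000, 64))
print(len(batches))        # 938
print(len(batches[0]))     # 64
print(len(batches[-1]))    # 32 (the last batch is smaller)
```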


In [5]:
# MNIST data
transform = transforms.ToTensor()
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

model = SimpleCNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

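for epoch in range(3):  # Short training for demo
    running_loss = 0.0
    for images, labels in trainloader:
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass: logits of shape (batch, 10)
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # SGD parameter update
        running_loss += loss.item()
    # Report the mean loss over all batches, not just the last batch
    print(f"Epoch {epoch+1}/3, Loss: {running_loss / len(trainloader)}")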


(Prints one line per epoch in the form `Epoch 1/3, Loss: <value>`; the actual loss values are not reproduced here.)


## 6) Practice Exercises

- Add another conv layer.
- Visualize feature maps (hint: plot conv output).
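
If a plotting library is not at hand, a feature map can also be inspected as text. The sketch below is one dependency-free option, not the only way to do the exercise: `to_ascii` and its character ramp are hypothetical choices, and it expects a 2D list of numbers, which in the notebook could come from something like `F.relu(model.conv1(images))[0, 0].detach().tolist()`.

```python
def to_ascii(feature_map, ramp=" .:-=+*#%@"):
    """Render a 2D list of numbers as text: low values are blank, high values are dense."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                    # avoid dividing by zero on a constant map
    lines = []
    for row in feature_map:
        chars = [ramp[int((v - lo) / span * (len(ramp) - 1))] for v in row]
        lines.append("".join(chars))
    return "\n".join(lines)


# Toy feature map (made-up values): a bright diagonal on a dark background
demo = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
print(to_ascii(demo))
```

Values are min-max scaled per map, so the rendering shows where a filter responds most strongly, not the absolute size of the activations.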


In [6]:
# Practice: Add conv2
class ExtendedCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(32 * 7 * 7, 10)  # After two pools: 28→14→7

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = torch.flatten(x, 1)  # Flatten everything except the batch dimension: (N, 32*7*7)
        x = self.fc(x)
        return x


## 📚 Summary

✅ What we learned:
- Convolution for feature detection.
- Pooling for downsampling.
- Building CNN with nn.Module.
- Training on images (MNIST).

🚀 Next Lesson: **Recurrent Neural Networks (RNN)** — handling sequences like text.
