# Exercise 01: Train a Real CNN (Minimal, Honest)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shang-vikas/series1-coding-exercises/blob/main/exercises/blog-04/exercise-01.ipynb)

## Setup

In [2]:
# Install required packages using the kernel's Python interpreter
import sys
import subprocess
import importlib

def install_if_missing(package, import_name=None):
    """Install package if it's not already installed."""
    if import_name is None:
        import_name = package

    try:
        importlib.import_module(import_name)
        print(f"✓ {package} is already installed")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✓ {package} installed successfully")

# Install required packages
install_if_missing("torch")
install_if_missing("torchvision")

✓ torch is already installed
✓ torchvision is already installed


## 🧪 Final Exercise: Train a Real CNN (Minimal, Honest)

We'll use:

**Fashion MNIST**
(Real dataset, harder than MNIST digits)

**Why?**

- Small
- Real images
- Clear spatial patterns
- Good demo for CNN vs MLP

### Part 1: Show the Architecture First

Before training.

In [3]:
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()

        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3),   # 28x28 → 26x26
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 26x26 → 13x13

            nn.Conv2d(16, 32, kernel_size=3),  # 13x13 → 11x11
            nn.ReLU(),
            nn.MaxPool2d(2)                    # 11x11 → 5x5
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 5 * 5, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
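
The size comments in the class above follow from the standard output-size rule for unpadded convolutions and pooling, `out = (in + 2*padding - kernel) // stride + 1`. The sketch below re-derives them in plain Python; the helper name `conv_out` is made up for illustration and is not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size along one dimension after a convolution or pooling step."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                          # Fashion MNIST images are 28x28
trace = [size]
for name, kernel, stride in [
    ("conv1", 3, 1),               # nn.Conv2d(1, 16, kernel_size=3)
    ("pool1", 2, 2),               # nn.MaxPool2d(2)
    ("conv2", 3, 1),               # nn.Conv2d(16, 32, kernel_size=3)
    ("pool2", 2, 2),               # nn.MaxPool2d(2)
]:
    size = conv_out(size, kernel, stride)
    trace.append(size)
    print(f"{name}: {size}x{size}")

flat_features = 32 * size * size   # 32 channels leave the second conv block
print("Flattened features:", flat_features)
```

The final value is what the first `nn.Linear` layer needs as `in_features`, which is why the classifier is written as `32 * 5 * 5`.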

**Print Architecture**

In [4]:
model = SimpleCNN()
print(model)

SimpleCNN(
  (features): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=800, out_features=64, bias=True)
    (2): ReLU()
    (3): Linear(in_features=64, out_features=10, bias=True)
  )
)


Have readers inspect:

- Conv → ReLU → Pool
- Conv → ReLU → Pool
- Dense layers

Then:

In [5]:
total_params = sum(p.numel() for p in model.parameters())
print("Total parameters:", total_params)

Total parameters: 56714


Now they see the model's actual size: 56,714 parameters.
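
To see where those parameters live, the total can be rebuilt layer by layer. This is a plain-Python sketch; `conv2d_params` and `linear_params` are hypothetical helpers that apply the usual weights-plus-biases counting rule.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """One kernel x kernel filter per (in, out) channel pair, plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """One weight per (in, out) pair, plus one bias per output."""
    return in_features * out_features + out_features

layers = {
    "features.0 (Conv2d 1->16)": conv2d_params(1, 16, 3),
    "features.3 (Conv2d 16->32)": conv2d_params(16, 32, 3),
    "classifier.1 (Linear 800->64)": linear_params(800, 64),
    "classifier.3 (Linear 64->10)": linear_params(64, 10),
}

total = sum(layers.values())
for name, count in layers.items():
    print(f"{name:32s} {count:>7,d}  ({count / total:.1%})")
print("Total:", total)
```

Most of the budget sits in the first dense layer, not in the convolutions.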

### Part 2 ‚Äî Load Real Data

In [6]:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor()
])

train_data = datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=transform
)

test_data = datasets.FashionMNIST(
    root="./data", train=False, download=True, transform=transform
)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64)

100%|██████████| 26.4M/26.4M [00:02<00:00, 12.9MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 202kB/s]
100%|██████████| 4.42M/4.42M [00:01<00:00, 3.79MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 14.7MB/s]


### Part 3: Train It

In [7]:
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5):
    model.train()
    total_loss = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1} | Loss: {total_loss/len(train_loader):.4f}")

Epoch 1 | Loss: 0.5946
Epoch 2 | Loss: 0.3837
Epoch 3 | Loss: 0.3395
Epoch 4 | Loss: 0.3075
Epoch 5 | Loss: 0.2882
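
To put these loss values in context, it helps to know what softmax cross-entropy gives for a model that knows nothing. The sketch below is a plain-Python version of the per-example quantity that `nn.CrossEntropyLoss` averages over a batch; the function name and the toy logits are illustrative assumptions.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Ten equal logits: the model spreads probability evenly over the 10 classes.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# One logit clearly larger than the rest, and it is the correct class.
confident_loss = cross_entropy([0.0] * 9 + [5.0], target=9)

print(f"uniform guess:   {uniform_loss:.4f}")   # equals ln(10)
print(f"confident guess: {confident_loss:.4f}")

# A mean loss L corresponds to a geometric-mean probability exp(-L) on the true class.
print(f"exp(-0.2882) = {math.exp(-0.2882):.3f}")
```

Anything well below `ln(10)` means the network has moved away from guessing.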


### Part 4: Evaluate

In [8]:
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print("Test Accuracy:", correct / total)

Test Accuracy: 0.8846


**Expected: roughly 88–92% test accuracy with this very simple CNN.**

## 🔥 Now Do the Important Comparison

Replace the CNN with:

In [9]:
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.model(x)

In [1]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
    

In [None]:
mlp_model = SimpleMLP().to(device)
print("MLP Parameters:", count_parameters(mlp_model))


In [None]:
def train_model(model, train_loader, test_loader, epochs=5):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    for epoch in range(epochs):
        model.train()
        total_loss = 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

        print(f"Epoch {epoch+1} | Loss: {total_loss/len(train_loader):.4f}")

    # Evaluation
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print("Test Accuracy:", correct / total)


In [None]:
print("\nTraining MLP")
train_model(mlp_model, train_loader, test_loader)
print(f"MLP has {count_parameters(mlp_model):,} parameters")


Train it the same way.

**Compare:**

- Parameter count
- Accuracy
- Training speed

Engineers will typically see:

**The CNN wins with fewer parameters** (56,714 versus 101,770 for the MLP).

Because structure matters.
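
The notebook does not measure the training-speed item in the comparison list. One simple way to do it is a small wall-clock wrapper. This is a sketch: `timed` is a hypothetical helper, and the workload is a stand-in so the block runs on its own.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload. In the notebook this would be something like:
#   _, seconds = timed(train_model, mlp_model, train_loader, test_loader)
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, took {seconds:.4f}s")
```

On a GPU, kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock gives a fairer number. Time both models on the same device with the same number of epochs.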

## 💡 Architecture Diagram

```
Input (1x28x28)
      ↓
Conv(3x3,16)
      ↓
ReLU
      ↓
MaxPool(2x2)
      ↓
Conv(3x3,32)
      ↓
ReLU
      ↓
MaxPool(2x2)
      ↓
Flatten
      ↓
Dense(64)
      ↓
Dense(10)
```

## Why This Is Powerful

They now see:

- CNN enforces locality
- Weight sharing reduces parameters
- Pooling builds invariance
- Hierarchy builds abstraction
- Architecture improves optimization

Not philosophy.

Not hype.

Structure.
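
The weight-sharing point can be made with arithmetic alone. The sketch below compares the first conv layer with a hypothetical dense layer that maps the same 28x28 input to the same 16x26x26 output; that dense layer is not part of the notebook and exists only for comparison.

```python
in_h = in_w = 28
out_channels, kernel = 16, 3
out_h = out_w = in_h - kernel + 1            # 26: no padding, stride 1

# Convolution: one 3x3 kernel per output channel, reused at every position.
conv_params = out_channels * (kernel * kernel * 1) + out_channels

# Hypothetical dense layer: every output unit gets its own weight for every pixel.
dense_outputs = out_channels * out_h * out_w
dense_params = (in_h * in_w) * dense_outputs + dense_outputs

print(f"conv:  {conv_params:,}")
print(f"dense: {dense_params:,}")
print(f"ratio: {dense_params / conv_params:,.0f}x")
```

Same input, same output shape, and tens of thousands of times fewer weights.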

## 🔍 Four Upgrades: Visualize What the Network Learned

We'll add four upgrades:

1. Visualize first-layer filters
2. Show intermediate feature maps
3. Plot misclassified examples
4. Show a simple adversarial attack

All clean. All practical.


### 1️⃣ Visualize First-Layer Filters

After training:

In [None]:
import matplotlib.pyplot as plt

# Get first conv layer weights
filters = model.features[0].weight.data.cpu()

num_filters = filters.shape[0]

fig, axes = plt.subplots(1, min(num_filters, 8), figsize=(15, 3))

for i in range(min(num_filters, 8)):
    axes[i].imshow(filters[i][0], cmap='gray')
    axes[i].set_title(f"Filter {i}")
    axes[i].axis('off')

plt.show()

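**What readers will typically see:**

- Edge detectors
- Directional gradients
- Texture patterns

With only 3x3 kernels the patterns are coarse, so expect light-to-dark transitions rather than crisp shapes.

You can say:

**The network wasn't told to detect edges.**
**Optimization discovered they reduce loss.**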
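
To make "edge detector" concrete, here is what a 3x3 filter does to an image with a single vertical edge. This is a plain-Python sketch: `correlate2d` is a hypothetical helper for the sliding-window operation a conv layer applies per channel, and the Sobel-like kernel is hand-written rather than learned.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation on lists of rows (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 5x5 toy image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel; a trained filter is learned, not chosen.
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

response = correlate2d(image, kernel)
for row in response:
    print(row)
```

The response is large where the window straddles the edge and zero over the flat bright region. A learned filter that looks like a light-to-dark gradient behaves the same way.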

### 2️⃣ Visualize Feature Maps

Pick one image.

In [None]:
model.eval()

image, label = test_data[0]
image = image.unsqueeze(0).to(device)

# Forward manually through first conv
with torch.no_grad():
    conv1_output = model.features[0](image)
    relu_output = model.features[1](conv1_output)

feature_maps = relu_output.cpu()

fig, axes = plt.subplots(1, 8, figsize=(15,3))

for i in range(8):
    axes[i].imshow(feature_maps[0][i], cmap='gray')
    axes[i].axis('off')

plt.show()

Now readers see:

- Different filters responding to different parts of the image.
- Some maps light up strongly.
- Others remain quiet.

This makes the first level of the hierarchy visible.
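
"Light up" versus "quiet" can also be put into numbers by ranking maps by their mean activation. The sketch uses small hand-made lists as stand-ins for the post-ReLU maps; the values are hypothetical. On the real tensor, the same quantity should be available as `feature_maps[0].mean(dim=(1, 2))`.

```python
def mean_activation(feature_map):
    """Average value of one 2D feature map given as a list of rows."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

# Toy post-ReLU maps (hypothetical values) standing in for feature_maps[0].
toy_maps = [
    [[0.0, 0.0], [0.0, 0.1]],   # mostly quiet
    [[0.9, 1.2], [0.7, 1.0]],   # strongly active
    [[0.0, 0.4], [0.3, 0.0]],   # partially active
]

scores = [mean_activation(m) for m in toy_maps]
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print("maps ranked by mean activation:", ranking)
```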

### 3️⃣ Plot Misclassified Examples

Very important for honesty.

In [None]:
misclassified = []

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)

        for i in range(len(images)):
            if predicted[i] != labels[i]:
                # Store plain ints so plot titles read "P:3 T:6" instead of "P:tensor(3) T:tensor(6)"
                misclassified.append((images[i].cpu(), predicted[i].item(), labels[i].item()))

            if len(misclassified) >= 6:
                break
        if len(misclassified) >= 6:
            break

Plot them:

In [None]:
fig, axes = plt.subplots(1, 6, figsize=(15,3))

for i in range(6):
    img, pred, true = misclassified[i]
    axes[i].imshow(img.squeeze(), cmap='gray')
    axes[i].set_title(f"P:{pred} T:{true}")
    axes[i].axis('off')

plt.show()

These examples show:

- Confusion between similar classes
- Ambiguous clothing
- Texture bias

This reinforces the limitations section.
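
Six plotted images only hint at the pattern. Counting which class pairs get mixed up most often makes the confusion explicit. In this sketch, `top_confusions` is a hypothetical helper and the label lists are made-up illustration data, not results from the trained model. The class names follow the label order torchvision lists for Fashion MNIST.

```python
from collections import Counter

# Class names in label order (index 0 to 9) for Fashion MNIST.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def top_confusions(true_labels, predicted_labels, k=3):
    """Return the k most frequent (true, predicted) name pairs among the errors."""
    pairs = Counter(
        (CLASS_NAMES[t], CLASS_NAMES[p])
        for t, p in zip(true_labels, predicted_labels)
        if t != p
    )
    return pairs.most_common(k)

# Hypothetical labels for illustration only.
true_labels      = [6, 6, 0, 4, 2, 6, 9, 7]
predicted_labels = [0, 0, 0, 2, 2, 6, 9, 5]

for (true_name, pred_name), count in top_confusions(true_labels, predicted_labels):
    print(f"{true_name} -> {pred_name}: {count}")
```

In the notebook, the two lists would be collected inside the evaluation loop using `.item()` on each prediction and label.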

### 4️⃣ Simple Adversarial Attack (FGSM)

This is the upgrade that sets your blog apart.

We use:

**Fast Gradient Sign Method.**

It nudges every pixel by a small step `epsilon` in the direction that increases the loss.

Minimal code.

**Step 1: Enable gradient on input**

In [None]:
image, label = test_data[0]
image = image.unsqueeze(0).to(device)
label = torch.tensor([label]).to(device)

image.requires_grad = True

output = model(image)
loss = criterion(output, label)

model.zero_grad()
loss.backward()

**Step 2: Create Adversarial Example**

In [None]:
epsilon = 0.1

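data_grad = image.grad.detach()  # gradient of the loss w.r.t. the input pixels
perturbed_image = (image + epsilon * data_grad.sign()).detach()
perturbed_image = torch.clamp(perturbed_image, 0, 1)  # keep pixels in the valid [0, 1] range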

**Step 3: Test Prediction**

In [None]:
output_adv = model(perturbed_image)
_, predicted_adv = torch.max(output_adv, 1)

print("True label:", label.item())
print("Adversarial Prediction:", predicted_adv.item())

**Step 4: Show Both Images**

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(8,4))

axes[0].imshow(image.detach().cpu().squeeze(), cmap='gray')
axes[0].set_title("Original")

axes[1].imshow(perturbed_image.detach().cpu().squeeze(), cmap='gray')
axes[1].set_title("Adversarial")

for ax in axes:
    ax.axis('off')

plt.show()

Often:

- The image looks nearly identical to a human.
- The model's prediction flips.

Now your limitations section becomes undeniable.
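
Why such a small step can flip a prediction is easiest to see on a model simple enough to compute by hand. The sketch below applies the same FGSM update to a hypothetical 4-input logistic classifier in plain Python. The weights and input are invented for illustration. The input gradient uses the closed form for binary cross-entropy, `(sigmoid(z) - y) * w`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical linear classifier: predicts class 1 when w.x + b > 0.
w = [2.0, -3.0, 1.5, -1.0]
b = 0.0
x = [0.6, 0.3, 0.5, 0.4]   # inputs in [0, 1], like ToTensor() output
y = 1                      # true label

def logit(inp):
    return sum(wi * xi for wi, xi in zip(w, inp)) + b

# Gradient of the binary cross-entropy loss with respect to each input.
p = sigmoid(logit(x))
grad = [(p - y) * wi for wi in w]

# FGSM step: move every input by epsilon in the loss-increasing direction, then clamp.
epsilon = 0.1
x_adv = [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]

print("clean logit:      ", logit(x))
print("adversarial logit:", logit(x_adv))
print("largest change per input:", max(abs(a - c) for a, c in zip(x_adv, x)))
```

No input moves by more than `epsilon`, yet every move pushes the logit the same way, so the small shifts add up across inputs. With 784 pixels instead of 4, the same stacking effect is much stronger.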

## What This Teaches Visually

- **Filters** → low-level patterns
- **Feature maps** → hierarchical response
- **Misclassifications** → brittle boundaries
- **Adversarial** → local evidence stacking weakness

This connects perfectly to your earlier line:

**CNNs are microscopes, not minds.**