# Computer Vision Mini Exercises — Answer Key

This companion notebook provides worked solutions for the prompts in the *Computer Vision Mini Exercises* lesson. Each section mirrors the original activities and includes runnable code, suggested parameter choices, and short commentary you can use when reviewing answers with students.

## 1. Environment Setup

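We rely on the same libraries as the student notebook. Fixing the seeds makes each framework's runs repeatable, which keeps our comparisons stable from one session to the next. PyTorch and TensorFlow use separate random number generators, so the same seed will not produce matching numbers across the two.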

In [None]:
import random
import numpy as np
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

import tensorflow as tf
from tensorflow import keras

import matplotlib.pyplot as plt

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
tf.random.set_seed(SEED)

print(f"PyTorch version: {torch.__version__}")
print(f"TensorFlow version: {tf.__version__}")
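
As a quick sanity check on what seeding buys you, the sketch below reseeds Python's `random` module and NumPy and confirms that the draws repeat. The helper name `draw_samples` is hypothetical. It leaves out PyTorch and TensorFlow on purpose so it runs without either framework. Some GPU kernels can still introduce nondeterminism that seeding alone does not remove.

```python
import random

import numpy as np


def draw_samples(seed, n=3):
    """Reseed both generators, then draw n values from each."""
    random.seed(seed)
    np.random.seed(seed)
    py_draws = [random.random() for _ in range(n)]
    np_draws = np.random.rand(n).tolist()
    return py_draws, np_draws


first = draw_samples(42)
second = draw_samples(42)
different = draw_samples(7)

print("same seed repeats:", first == second)
print("different seed differs:", first != different)
```

If students report run-to-run differences, asking whether they re-executed the seeding cell before retraining is usually a good first question.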

## 2. Dataset Loading with Augmentation (Exercise 1 Solution)

To demonstrate a concrete augmentation pipeline, we chain a gentle random rotation and a horizontal flip before normalizing the Fashion-MNIST images. Rotations of up to ±15° preserve class identity while still encouraging some rotational robustness. Because the transforms are attached to the dataset, the augmented data plugs into the existing dataloaders without additional code changes. The test set is only converted and normalized, never augmented.

In [None]:
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]

torch_transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=torch_transform
)
test_dataset = datasets.FashionMNIST(
    root="./data", train=False, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])
)

batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
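
To make the last two transforms concrete, here is a minimal NumPy sketch of what `ToTensor` followed by `Normalize((0.5,), (0.5,))` does to pixel values, plus a horizontal flip. The function names are illustrative, not torchvision APIs, and the 2x3 array is a made-up stand-in for a real image.

```python
import numpy as np


def to_unit_range(image_uint8):
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0


def normalize(image, mean=0.5, std=0.5):
    """Apply (x - mean) / std; with 0.5 and 0.5 this maps [0, 1] to [-1, 1]."""
    return (image - mean) / std


def horizontal_flip(image):
    """Mirror the image left to right by reversing the column order."""
    return image[:, ::-1]


toy = np.array([[0, 128, 255],
                [255, 0, 64]], dtype=np.uint8)

normalized = normalize(to_unit_range(toy))
flipped = horizontal_flip(toy)

print(normalized.min(), normalized.max())
print(flipped)
```

The [-1, 1] range matters later in the notebook: any perturbation added to the PyTorch tensors lives on this scale, not on the original [0, 1] pixel scale.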

### Visualizing Augmented Samples

Plotting random training items makes the augmentation effects tangible and gives students an immediate sense of how rotations and flips manifest in Fashion-MNIST.

In [None]:
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ax in axes.ravel():
    image, label = train_dataset[random.randint(0, len(train_dataset) - 1)]
    ax.imshow(image.squeeze(), cmap="gray")
    ax.set_title(class_names[label])
    ax.axis("off")
plt.tight_layout()
plt.show()

## 3. PyTorch Model and Optimizer Swap (Exercise 2 Solution)

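Switching from `Adam` to `SGD` with momentum highlights the difference between adaptive and non-adaptive optimizers. Both are first-order methods, but Adam rescales each parameter's step using running gradient statistics, while SGD applies one global learning rate. Using a larger learning rate (0.05) than is typical for Adam, together with momentum (0.9), keeps convergence competitive while clearly illustrating the slower warm-up period that students observed in class.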

In [None]:
class TorchCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

model_torch = TorchCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_torch.parameters(), lr=0.05, momentum=0.9)
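
Students often ask where `64 * 7 * 7` in the first `nn.Linear` comes from. The sketch below traces the spatial size through the feature extractor using the standard output-size formula for convolution and pooling layers. The helper `conv_output_size` is a hypothetical name introduced only for this illustration.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                            # Fashion-MNIST images are 28x28
size = conv_output_size(size, kernel=3, padding=1)   # first Conv2d keeps 28
size = conv_output_size(size, kernel=2, stride=2)    # first MaxPool2d(2) halves to 14
size = conv_output_size(size, kernel=3, padding=1)   # second Conv2d keeps 14
size = conv_output_size(size, kernel=2, stride=2)    # second MaxPool2d(2) halves to 7

channels = 64                                        # output channels of the second Conv2d
flattened = channels * size * size

print(size, flattened)
```

The same arithmetic applies to the Keras model in Section 4, where `padding="same"` plays the role of `padding=1` for a 3x3 kernel.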

In [None]:
def train_one_epoch(model, dataloader, optimizer, criterion):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        preds = outputs.argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    return running_loss / total, correct / total


def evaluate(model, dataloader, criterion):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in dataloader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    return running_loss / total, correct / total

EPOCHS = 3
for epoch in range(EPOCHS):
    train_loss, train_acc = train_one_epoch(model_torch, train_loader, optimizer, criterion)
    test_loss, test_acc = evaluate(model_torch, test_loader, criterion)
    print(f"Epoch {epoch + 1}: train_loss={train_loss:.4f}, train_acc={train_acc:.3f}, test_loss={test_loss:.4f}, test_acc={test_acc:.3f}")
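
Both helpers multiply each batch loss by `images.size(0)` before summing. The sketch below shows why: when the final batch is smaller, averaging the per-batch means directly gives that batch too much weight. In this notebook, for example, the 10,000-image test set with a batch size of 64 leaves 16 images in the last batch. The loss values here are invented for illustration, and `weighted_mean_loss` is a hypothetical helper.

```python
def weighted_mean_loss(batch_losses, batch_sizes):
    """Per-sample mean loss, weighting each batch mean by its batch size."""
    total_samples = sum(batch_sizes)
    weighted_sum = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return weighted_sum / total_samples


# Two full batches and one short final batch with a much higher mean loss.
batch_losses = [0.5, 0.5, 2.0]
batch_sizes = [64, 64, 16]

naive = sum(batch_losses) / len(batch_losses)             # treats all batches equally
weighted = weighted_mean_loss(batch_losses, batch_sizes)  # matches the per-sample mean

print(f"naive={naive:.4f}, weighted={weighted:.4f}")
```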

**Instructor note:** Expect SGD to converge slightly slower than Adam in the first epoch, but by epoch three it typically reaches within ~1–2% accuracy of the adaptive baseline. Encourage students to discuss why the momentum term helps close the gap.
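
For the momentum discussion, a toy example can help. The sketch below minimizes the one-dimensional quadratic f(w) = 0.5 * curvature * w² with the same update form that `optim.SGD` uses when dampening and Nesterov are off (velocity = momentum * velocity + grad, then w = w - lr * velocity). The function name, learning rate, and step count are assumptions chosen for illustration, not the settings from the training cell.

```python
def run_sgd(w0, lr, momentum, steps, curvature=1.0):
    """Minimize f(w) = 0.5 * curvature * w**2 with SGD plus momentum."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = curvature * w                      # derivative of the quadratic
        velocity = momentum * velocity + grad     # accumulate past gradients
        w = w - lr * velocity                     # step along the accumulated direction
    return w


plain = run_sgd(w0=1.0, lr=0.01, momentum=0.0, steps=50)
with_momentum = run_sgd(w0=1.0, lr=0.01, momentum=0.9, steps=50)

print(f"distance from optimum without momentum: {abs(plain):.4f}")
print(f"distance from optimum with momentum:    {abs(with_momentum):.4f}")
```

With the same learning rate, the momentum run ends much closer to the optimum because consecutive gradients point the same way and accumulate. That is one intuition students can offer for why momentum helps SGD narrow the gap to Adam.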

## 4. TensorFlow Model with Learning Rate Tuning (Exercise 3 Solution)

A practical way to illustrate learning rate effects is to train models that differ only in their optimizer settings. The cells below build and train one such model with a moderately aggressive learning rate (0.005) that still trains stably, then plot its metrics so students can compare them against their default 0.001 runs.

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = (x_train / 255.0).astype("float32")[..., np.newaxis]
x_test = (x_test / 255.0).astype("float32")[..., np.newaxis]

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000, seed=SEED).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)

In [None]:
def build_tf_model():
    model = keras.Sequential([
        keras.layers.Conv2D(32, 3, padding="same", activation="relu", input_shape=(28, 28, 1)),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(10)
    ])
    return model

lr_tuned_model = build_tf_model()
lr_tuned_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.005),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)
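
The final `Dense(10)` layer has no activation, so the model outputs raw logits and the loss is configured with `from_logits=True`. As a rough illustration of what that loss computes, here is a NumPy sketch of sparse categorical cross-entropy with a numerically stable log-softmax. The function name and example inputs are hypothetical; this is not Keras's internal implementation.

```python
import numpy as np


def sparse_cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits."""
    # Subtract the row maximum so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to each true class.
    return -log_probs[np.arange(len(labels)), labels].mean()


labels = np.array([0, 3, 5, 9])

# An untrained model that scores all ten classes equally.
uniform_logits = np.zeros((4, 10))
uniform_loss = sparse_cross_entropy_from_logits(uniform_logits, labels)

# A model that strongly favors the correct class.
confident_logits = np.full((4, 10), -5.0)
confident_logits[np.arange(4), labels] = 5.0
confident_loss = sparse_cross_entropy_from_logits(confident_logits, labels)

print(f"uniform={uniform_loss:.4f}, confident={confident_loss:.4f}")
```

The uniform case equals ln(10), which is a handy reference point: a first-epoch loss far above that value suggests something is wrong with the labels or the loss configuration.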

In [None]:
history = lr_tuned_model.fit(train_ds, validation_data=test_ds, epochs=3)

plt.figure(figsize=(8, 3))
plt.subplot(1, 2, 1)
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="val")
plt.title("Loss")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="val")
plt.title("Accuracy")
plt.legend()
plt.tight_layout()
plt.show()

**Instructor note:** If the higher learning rate destabilizes training on some machines, drop it to 0.003. This is also a good moment to point out that Keras callbacks (for example, `keras.callbacks.TensorBoard`) let students monitor these trade-offs live.
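
A toy example can make "destabilizes" concrete. The sketch below runs plain gradient descent on the quadratic f(w) = 0.5 * curvature * w², where each step multiplies w by (1 - lr * curvature), so the iterates diverge once the learning rate exceeds 2 / curvature. The curvature value and learning rates are assumptions picked for illustration. Adam on the CNN behaves differently, so the thresholds here do not transfer to the real model.

```python
def gradient_descent(lr, steps=20, w0=1.0, curvature=100.0):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2."""
    w = w0
    for _ in range(steps):
        w = w - lr * curvature * w   # same as w *= (1 - lr * curvature)
    return w


# For curvature=100 the stability limit is lr < 2 / 100 = 0.02.
results = {lr: abs(gradient_descent(lr)) for lr in (0.001, 0.005, 0.03)}

for lr, distance in results.items():
    print(f"lr={lr}: distance from optimum after 20 steps = {distance:.3g}")
```

The middle rate converges fastest, the smallest converges slowly, and the largest blows up. That is the same qualitative pattern students should look for in the loss curves.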

## 5. Cross-Framework Prediction Comparison (Exercise 4 Solution)

To evaluate robustness, we add Gaussian noise to the test images, which both frameworks load in the same order, and compare how each model's predictions change. The helper below reports accuracy before and after perturbation and plots a few representative examples for discussion.

In [None]:
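def torch_infer(model, dataset):
    """Put the model in eval mode and return all dataset images and labels as two tensors."""
    model.eval()
    all_images = []
    all_labels = []
    # No forward pass happens here; the loader only applies the dataset transforms.
    for images, labels in DataLoader(dataset, batch_size=batch_size):
        all_images.append(images)
        all_labels.append(labels)
    return torch.cat(all_images), torch.cat(all_labels)


def evaluate_with_noise(torch_model, keras_model, noise_std=0.25, n_samples=5):
    torch_images, torch_labels = torch_infer(torch_model, test_dataset)

    # The PyTorch images are normalized to [-1, 1], twice the width of the [0, 1]
    # range used for Keras, so the noise is doubled to keep the perturbation equivalent.
    noisy_images = torch_images + 2.0 * noise_std * torch.randn_like(torch_images)
    noisy_images = torch.clamp(noisy_images, -1.0, 1.0)

    # Inference only: skip gradient tracking to save memory on the full test set.
    with torch.no_grad():
        baseline_logits = torch_model(torch_images)
        noisy_logits = torch_model(noisy_images)
    baseline_acc = (baseline_logits.argmax(dim=1) == torch_labels).float().mean().item()
    noisy_acc = (noisy_logits.argmax(dim=1) == torch_labels).float().mean().item()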

    tf_images = x_test[:len(torch_images)]
    tf_logits = keras_model.predict(tf_images, verbose=0)
    tf_preds = tf_logits.argmax(axis=1)
    tf_acc = (tf_preds == y_test[:len(torch_images)]).mean()

    tf_noisy = np.clip(tf_images + noise_std * np.random.normal(size=tf_images.shape), 0.0, 1.0)
    tf_noisy_logits = keras_model.predict(tf_noisy, verbose=0)
    tf_noisy_preds = tf_noisy_logits.argmax(axis=1)
    tf_noisy_acc = (tf_noisy_preds == y_test[:len(torch_images)]).mean()

    print(f"Torch accuracy: clean={baseline_acc:.3f}, noisy={noisy_acc:.3f}")
    print(f"TensorFlow accuracy: clean={tf_acc:.3f}, noisy={tf_noisy_acc:.3f}")

    idxs = np.random.choice(len(torch_images), size=n_samples, replace=False)
    fig, axes = plt.subplots(n_samples, 3, figsize=(10, 2 * n_samples))
    for row, idx in enumerate(idxs):
        axes[row, 0].imshow(torch_images[idx].squeeze(), cmap="gray")
        axes[row, 0].set_title(f"Label: {class_names[torch_labels[idx]]}")
        axes[row, 0].axis("off")

        axes[row, 1].imshow(noisy_images[idx].squeeze(), cmap="gray")
        axes[row, 1].set_title("Torch noisy")
        axes[row, 1].axis("off")

        axes[row, 2].imshow(tf_noisy[idx].squeeze(), cmap="gray")
        axes[row, 2].set_title("TF noisy")
        axes[row, 2].axis("off")

    plt.tight_layout()
    plt.show()

# Run once both models have been trained
# evaluate_with_noise(model_torch, lr_tuned_model)
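
One subtlety worth raising in discussion: the PyTorch images are normalized to [-1, 1] while the Keras images stay in [0, 1], so a noise level specified on one scale has to be rescaled for the other. The NumPy sketch below checks that relationship on random stand-in images (not Fashion-MNIST data). The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((2, 28, 28)).astype(np.float32)           # stand-in images in [0, 1]
noise = rng.standard_normal(pixels.shape).astype(np.float32)  # shared unit-variance noise
noise_std = 0.25

# Keras-style pipeline: add noise on the [0, 1] scale, then clip.
noisy_unit = np.clip(pixels + noise_std * noise, 0.0, 1.0)

# PyTorch-style pipeline: normalize to [-1, 1], add noise with doubled std, then clip.
normalized = (pixels - 0.5) / 0.5
noisy_normalized = np.clip(normalized + 2.0 * noise_std * noise, -1.0, 1.0)

# Mapping the normalized result back to [0, 1] recovers the Keras-style image.
recovered = noisy_normalized * 0.5 + 0.5
pipelines_match = np.allclose(recovered, noisy_unit, atol=1e-5)

print("pipelines match:", pipelines_match)
```

If the same `noise_std` were applied on both scales without the factor of two, the PyTorch model would see a perturbation half as strong relative to the image contrast, which would make it look more robust than it is.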

### Wrap-Up Notes

* Reinforce that similar architectural choices across frameworks yield comparable performance once hyperparameters are tuned appropriately.
* Encourage learners to iterate on the augmentation parameters and learning rates to explore the robustness/accuracy trade-off further.