# Computer Vision Mini Exercises

Welcome to this mini set of computer vision exercises! The notebook is designed for quick demonstrations in a classroom setting and focuses on **image classification** using both **PyTorch** and **TensorFlow**.

You will:

1. Install the required libraries (PyTorch, TorchVision, and TensorFlow).
2. Explore the Fashion-MNIST dataset.
3. Train and evaluate a compact convolutional neural network (CNN) in PyTorch.
4. Repeat a similar experiment with TensorFlow/Keras.
5. Compare the predictions from both frameworks.

> **Tip for Colab:** Runtime → Change runtime type → Hardware accelerator → GPU.


## 1. Environment setup

This mini-lab is optimized for Google Colab. If you are running it there and a package is missing, uncomment the `pip install` line in the next code cell and run it **once** to install PyTorch, TorchVision, and TensorFlow. The install can take a couple of minutes, and afterwards Colab may prompt you to restart the runtime so the fresh packages are picked up.


In [None]:
# If running on Google Colab, uncomment the next line to install the dependencies.
# !pip install --quiet torch torchvision tensorflow
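Colab images typically ship with these packages already installed, so it can be worth checking before reinstalling. A minimal sketch using only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`); the helper name `check_packages` is hypothetical:

```python
import importlib.metadata
import importlib.util

def check_packages(names):
    """Hypothetical helper: map each package to its version, or None if not importable."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None
            continue
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            status[name] = "unknown"
    return status

for package, version in check_packages(["torch", "torchvision", "tensorflow"]).items():
    print(f"{package:>12}: {version or 'not installed'}")
```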

## 2. Imports and utility helpers

We pull in the core scientific Python stack, both deep-learning frameworks, and a small `show_images` helper. The helper simply arranges Fashion-MNIST samples in a grid so you can quickly compare model predictions with the ground-truth labels during demos.


In [None]:
import math
from typing import List

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

import tensorflow as tf

CLASS_NAMES = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

def show_images(images: np.ndarray, labels: List[int], predictions: List[int] | None = None, framework: str = ""):
    """Display a grid of Fashion-MNIST images with optional predictions."""
    num_images = len(images)
    cols = 5
    rows = math.ceil(num_images / cols)
    plt.figure(figsize=(cols * 2.2, rows * 2.2))
    for idx, (image, label) in enumerate(zip(images, labels)):
        plt.subplot(rows, cols, idx + 1)
        if image.ndim == 3 and image.shape[-1] == 1:
            image = image.squeeze(-1)
        plt.imshow(image, cmap="gray")
        title = CLASS_NAMES[label]
        if predictions is not None:
            pred_name = CLASS_NAMES[predictions[idx]]
            # "\n" puts ground truth and prediction on separate title lines.
            title = f"GT: {title}\nPred: {pred_name}"
        if framework:
            title = f"{framework}\n" + title
        plt.title(title, fontsize=9)
        plt.axis("off")
    plt.tight_layout()
    plt.show()
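`show_images` fills the grid row by row: with `cols = 5`, image `idx` goes to 1-based subplot slot `idx + 1` (zero-based row `idx // cols`, column `idx % cols`), and the figure needs `ceil(n / cols)` rows. A small pure-Python sketch of that layout arithmetic and of the stacked title; the helper names are hypothetical, not part of the notebook:

```python
import math

CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def grid_shape(num_images, cols=5):
    """Rows and columns needed to show num_images, filled row by row."""
    return math.ceil(num_images / cols), cols

def cell_position(idx, cols=5):
    """Zero-based (row, col) of the idx-th image; plt.subplot uses slot idx + 1."""
    return idx // cols, idx % cols

def make_title(label, prediction=None, framework=""):
    """Rebuild the multi-line title that show_images draws above each image."""
    title = CLASS_NAMES[label]
    if prediction is not None:
        title = f"GT: {title}\nPred: {CLASS_NAMES[prediction]}"
    if framework:
        title = f"{framework}\n" + title
    return title

print(grid_shape(6))       # six misclassified images -> (2, 5)
print(cell_position(7))    # eighth image -> (1, 2)
print(make_title(6, 0, "PyTorch"))
```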


## 3. PyTorch workflow

The PyTorch portion walks through the traditional computer-vision loop: preparing data, defining a convolutional neural network (CNN), training the model, and reflecting on evaluation metrics. Each step mirrors what your students will see in production-scale projects, just with a smaller dataset so it runs quickly in class.


### 3.1 Load the dataset

Fashion-MNIST provides 28×28 grayscale images of 10 clothing categories. `ToTensor` scales pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then maps them to [-1, 1]; this zero-centered range helps gradient descent behave well. We also cap the training subset at 5,000 examples to keep training short on free Colab GPUs or CPUs. A quick visualization helps students connect pixel values to the semantic labels.


In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

full_train = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)

# Use a subset (5,000 samples) for faster training in demos.
subset_size = 5_000
train_subset, _ = random_split(full_train, [subset_size, len(full_train) - subset_size])

batch_size = 64
train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

# Peek at a few sample images.
sample_images, sample_labels = next(iter(train_loader))
show_images(sample_images[:10].numpy().transpose(0, 2, 3, 1), sample_labels[:10].tolist(), framework="PyTorch")
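The preprocessing above is simple arithmetic: `ToTensor` divides 8-bit pixels by 255, and `Normalize((0.5,), (0.5,))` computes `(x - 0.5) / 0.5`. The plain-Python sketch below reproduces both steps for single pixel values, plus the split sizes handed to `random_split` (Fashion-MNIST has 60,000 training images). The function names are illustrative only:

```python
def to_tensor_scale(pixel):
    """Mimic ToTensor for one uint8 pixel: map 0..255 to 0.0..1.0."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize((0.5,), (0.5,)) for a single-channel value."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Invert normalize, e.g. to recover [0, 1] values for display."""
    return value * std + mean

for pixel in (0, 128, 255):
    print(pixel, round(normalize(to_tensor_scale(pixel)), 4))

# Split sizes passed to random_split: 5,000 kept, the remainder discarded.
full_size, subset_size = 60_000, 5_000
print([subset_size, full_size - subset_size])  # [5000, 55000]
```

`plt.imshow` with `cmap="gray"` rescales to the data range by default, which is why the [-1, 1] tensors still display correctly in the preview grid.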


### 3.2 Define a compact CNN

This lightweight CNN stacks two convolution + pooling blocks followed by a fully connected classifier. Highlight to students how:

* Small 3×3 kernels act like edge and texture detectors.
* Each 2×2 max-pooling layer halves the spatial resolution (28 → 14 → 7, hence the `64 * 7 * 7` inputs to the first linear layer), forcing the network to focus on the most salient activations.
* Dropout after the convolutional blocks and in the dense layer mitigates overfitting, which matters on a training set this small.


In [None]:
class FashionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, len(CLASS_NAMES)),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FashionCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
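The `64 * 7 * 7` input size of the first `nn.Linear` follows from the standard output-size formula for convolution and pooling windows, `out = floor((in + 2 * padding - kernel) / stride) + 1`. A plain-Python sketch tracing one 28×28 image through the layer stack (`conv_out` is a hypothetical helper; `MaxPool2d(2)` uses stride 2 by default):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=3, padding=1)  # Conv2d(1, 32, 3, padding=1) -> 28
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2) -> 14
size = conv_out(size, kernel=3, padding=1)  # Conv2d(32, 64, 3, padding=1) -> 14
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2) -> 7

flattened = 64 * size * size
print(size, flattened)  # 7 3136
```

Dropping `padding=1` would shrink each convolution output by 2 pixels (28 → 26 → 13 → 11 → 5), and the linear layer would then need `64 * 5 * 5` inputs instead.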


### 3.3 Train the PyTorch model

We loop over the training loader for three epochs and track loss and accuracy per epoch. Emphasize that every training step follows the familiar pattern: reset gradients → forward pass → loss → backward pass → parameter update. The recorded training history is plotted in the next cell to show convergence visually.


In [None]:
def train_torch_model(model, data_loader, criterion, optimizer, epochs=3):
    history = []
    model.train()
    for epoch in range(1, epochs + 1):
        running_loss = 0.0
        running_correct = 0
        total_samples = 0
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            running_correct += (preds == labels).sum().item()
            total_samples += images.size(0)

        epoch_loss = running_loss / total_samples
        epoch_acc = running_correct / total_samples
        history.append((epoch_loss, epoch_acc))
        print(f"Epoch {epoch}: loss={epoch_loss:.4f}, accuracy={epoch_acc:.4f}")
    return history


torch_history = train_torch_model(model, train_loader, criterion, optimizer, epochs=3)
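`train_torch_model` multiplies each batch's mean loss by `images.size(0)` before summing because the batches are not all the same size: 5,000 samples split into 78 full batches of 64 plus a final batch of 8. The sketch below uses made-up per-sample losses, purely for illustration, to show that the weighted sum recovers the true per-sample mean while a plain average of batch means does not:

```python
def batches(values, batch_size):
    """Split a list into consecutive batches; the last one may be shorter."""
    return [values[i:i + batch_size] for i in range(0, len(values), batch_size)]

# Illustrative per-sample losses (not real training output): the last 8 are harder.
losses = [1.0] * 4_992 + [5.0] * 8
loss_batches = batches(losses, 64)

batch_means = [sum(b) / len(b) for b in loss_batches]
# Weight each batch mean by its size, as loss.item() * images.size(0) does.
weighted = sum(m * len(b) for m, b in zip(batch_means, loss_batches)) / len(losses)
naive = sum(batch_means) / len(batch_means)

print(len(loss_batches), len(loss_batches[-1]))  # 79 8
print(round(weighted, 4), round(naive, 4))       # 1.0064 1.0506
```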


#### Visualize PyTorch training progress

Plotting the epoch-wise loss and accuracy helps learners see whether the model is still improving or has plateaued. This loop tracks training metrics only, so invite students to describe what a diverging validation curve would look like if we also held out a validation split, and why that divergence signals overfitting.


In [None]:
torch_losses, torch_accs = zip(*torch_history)
epochs = range(1, len(torch_losses) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 3))
ax_loss.plot(epochs, torch_losses, marker='o', color='#1f77b4')
ax_loss.set_title('PyTorch loss')
ax_loss.set_xlabel('Epoch')
ax_loss.set_ylabel('Cross-entropy')
ax_loss.grid(alpha=0.3)

ax_acc.plot(epochs, torch_accs, marker='o', color='#ff7f0e')
ax_acc.set_title('PyTorch accuracy')
ax_acc.set_xlabel('Epoch')
ax_acc.set_ylabel('Accuracy')
ax_acc.set_ylim(0, 1)
ax_acc.grid(alpha=0.3)

plt.tight_layout()
plt.show()
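Once a held-out split is tracked, one simple rule of thumb for spotting divergence is to flag the first epoch where validation loss rises while training loss keeps falling. A minimal sketch of that check on invented loss values (the function name is hypothetical, and real curves are noisier, so treat a single uptick with caution):

```python
def first_divergence_epoch(train_losses, val_losses):
    """Return the 1-based epoch where val loss first rises while train loss falls, else None."""
    for i in range(1, len(train_losses)):
        train_falling = train_losses[i] < train_losses[i - 1]
        val_rising = val_losses[i] > val_losses[i - 1]
        if train_falling and val_rising:
            return i + 1  # convert the 0-based list index to a 1-based epoch number
    return None

# Invented curves for illustration, not results from this notebook.
train = [0.90, 0.60, 0.45, 0.35, 0.28]
val = [0.85, 0.62, 0.55, 0.58, 0.64]
print(first_divergence_epoch(train, val))  # 4
```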


### 3.4 Evaluate the PyTorch model

During evaluation we disable gradient tracking and compute:

* Overall test accuracy for a single headline number.
* A per-class breakdown via a confusion matrix so students can spot classes that lag.
* Example predictions (including misclassifications) to connect numbers back to images.

Encourage learners to reason **why** certain items are harder—for example, shirts vs. T-shirts look visually similar.


In [None]:
model.eval()
correct = 0
total = 0
confusion_matrix = torch.zeros(len(CLASS_NAMES), len(CLASS_NAMES), dtype=torch.int64)
all_images = []
all_labels = []
all_preds = []
misclassified_images = []
misclassified_labels = []
misclassified_preds = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        preds = outputs.argmax(dim=1)

        correct += (preds == labels).sum().item()
        total += labels.size(0)

        for true_label, pred_label in zip(labels.view(-1), preds.view(-1)):
            confusion_matrix[int(true_label), int(pred_label)] += 1

        if len(all_images) < 10:
            all_images.extend(images.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds.cpu().numpy())

        if len(misclassified_images) < 6:
            mismatch = (preds != labels).nonzero(as_tuple=False).flatten()
            for idx in mismatch:
                if len(misclassified_images) >= 6:
                    break
                misclassified_images.append(images[idx].cpu().numpy())
                misclassified_labels.append(int(labels[idx].cpu()))
                misclassified_preds.append(int(preds[idx].cpu()))

test_accuracy = correct / total
print(f"Test accuracy: {test_accuracy:.4f}")

per_class_accuracy = confusion_matrix.diag().float() / confusion_matrix.sum(dim=1).clamp(min=1)
print("Per-class accuracy (PyTorch):")
for class_name, acc, hits, total_count in zip(
    CLASS_NAMES,
    per_class_accuracy.tolist(),
    confusion_matrix.diag().tolist(),
    confusion_matrix.sum(dim=1).tolist(),
):
    print(f"  {class_name:>12}: {acc:.3f} ({int(hits)}/{int(total_count)})")

if all_images:
    images_to_show = np.array(all_images[:10]).transpose(0, 2, 3, 1)
    show_images(images_to_show, all_labels[:10], all_preds[:10], framework="PyTorch")

if misclassified_images:
    mis_to_show = np.array(misclassified_images).transpose(0, 2, 3, 1)
    show_images(
        mis_to_show,
        misclassified_labels,
        misclassified_preds,
        framework="PyTorch misclassified",
    )
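The evaluation cell fills `confusion_matrix[true, pred]` one sample at a time, then divides the diagonal by the row sums, clamping each row sum to at least 1 so a class with no test samples cannot cause a division by zero. The same bookkeeping in plain Python on a toy three-class example (labels invented for illustration):

```python
def confusion_counts(true_labels, pred_labels, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Diagonal over row sums, with each row sum clamped to at least 1."""
    return [row[i] / max(sum(row), 1) for i, row in enumerate(matrix)]

# Toy labels: class 2 never appears in the ground truth.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0, 2]

cm = confusion_counts(y_true, y_pred, num_classes=3)
print(cm)                      # [[2, 1, 0], [1, 2, 1], [0, 0, 0]]
print(per_class_accuracy(cm))  # [0.6666666666666666, 0.5, 0.0]
```

Overall accuracy is the diagonal sum over the total count (here 4/7), which is exactly what `correct / total` computes in the cell above.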


#### Interpreting the PyTorch results

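* The headline accuracy provides a quick health check. On this 5,000-image subset, expect roughly 80–90% after three epochs; exact numbers vary between runs because the subset and the weight initialization are random.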
* Scan the per-class table for imbalances. Lower scores on `Shirt` or `Coat` usually stem from overlapping silhouettes.
* Review the misclassified image grid and ask: *What visual cues misled the network?* Encourage students to articulate hypotheses (e.g., low contrast, cropped hems) and propose data or architecture tweaks to address them.
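To go beyond per-class accuracy, you can rank the off-diagonal cells of the confusion matrix and see which specific pairs (for example `Shirt` predicted as `T-shirt/top`) account for most errors. A plain-Python sketch; the helper name and the tiny matrix are illustrative, not output from this notebook. On the real run you could pass `confusion_matrix.tolist()` and `CLASS_NAMES`:

```python
def most_confused(matrix, class_names, top_k=3):
    """Return the top_k (count, true_name, predicted_name) off-diagonal entries."""
    pairs = []
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > 0:
                pairs.append((count, class_names[t], class_names[p]))
    pairs.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep row order
    return pairs[:top_k]

# Invented 3-class counts for illustration.
names = ["T-shirt/top", "Shirt", "Coat"]
toy_matrix = [
    [90, 8, 2],
    [15, 70, 15],
    [1, 12, 87],
]
for count, true_name, pred_name in most_confused(toy_matrix, names):
    print(f"{true_name} -> {pred_name}: {count}")
```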


## 4. TensorFlow/Keras workflow

The TensorFlow section mirrors the PyTorch steps with Keras' high-level APIs. This gives students a point-by-point comparison between writing the training loop by hand (PyTorch) and delegating it to `compile`/`fit` (Keras), while the underlying computer-vision principles stay identical.


### 4.1 Load and preprocess data

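Keras bundles a Fashion-MNIST loader (`tf.keras.datasets.fashion_mnist`). We scale pixel intensities to [0, 1], add a trailing channel dimension so the convolution layers accept the input, and again trim the training set to 5,000 examples. The preview grid shows that both workflows draw on the same underlying images, though the preprocessing is not identical: the PyTorch run uses a random 5,000-image subset scaled to [-1, 1], whereas this one takes the first 5,000 images scaled to [0, 1].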


In [None]:
(tf_train_images, tf_train_labels), (tf_test_images, tf_test_labels) = tf.keras.datasets.fashion_mnist.load_data()

# Normalize to [0, 1] and add channel dimension for CNN compatibility.
tf_train_images = tf_train_images.astype("float32") / 255.0
tf_test_images = tf_test_images.astype("float32") / 255.0
tf_train_images = tf_train_images[..., np.newaxis]
tf_test_images = tf_test_images[..., np.newaxis]

# Use a subset for faster demos, matching the PyTorch sample size.
subset_size = 5_000
tf_train_images = tf_train_images[:subset_size]
tf_train_labels = tf_train_labels[:subset_size]

show_images(tf_train_images[:10], tf_train_labels[:10].tolist(), framework="TensorFlow")
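The two pipelines also store images in different layouts: PyTorch batches are channels-first `(N, C, H, W)`, which is why the PyTorch preview calls `.transpose(0, 2, 3, 1)`, while the Keras arrays here are channels-last `(N, H, W, C)` after `[..., np.newaxis]`. Their value ranges, [-1, 1] versus [0, 1], are related by `y = 2x - 1`. A shape-only sketch in plain Python (helper names are hypothetical):

```python
def permute_shape(shape, axes):
    """Shape after a numpy-style transpose(axes)."""
    return tuple(shape[a] for a in axes)

def add_channel_last(shape):
    """Shape after arr[..., np.newaxis]."""
    return tuple(shape) + (1,)

def keras_to_torch_range(x):
    """Map a [0, 1] Keras pixel value to the [-1, 1] range the PyTorch model sees."""
    return 2 * x - 1

torch_batch = (64, 1, 28, 28)                     # DataLoader batch: N, C, H, W
print(permute_shape(torch_batch, (0, 2, 3, 1)))   # (64, 28, 28, 1)

keras_train = (60_000, 28, 28)                    # load_data() train images: N, H, W
print(add_channel_last(keras_train))              # (60000, 28, 28, 1)

print(keras_to_torch_range(0.0), keras_to_torch_range(1.0))  # -1.0 1.0
```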


### 4.2 Build the TensorFlow model

The architecture intentionally mirrors the PyTorch CNN: convolution/pooling pairs followed by a dropout-regularized dense classifier. Point out that in both frameworks each layer object owns its parameters; the difference is how the layers are wired together. Keras' `Sequential` chains them implicitly in list order, whereas in PyTorch we wrote `forward` to call the `features` and `classifier` blocks in order.
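One real difference between the two models is where the softmax lives. The PyTorch network returns raw logits and `nn.CrossEntropyLoss` applies log-softmax internally, whereas this Keras model ends in a `softmax` activation and `sparse_categorical_crossentropy` takes the negative log of the predicted probability. For the same scores and label both give the same loss, as this plain-Python check with made-up logits illustrates:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy_from_logits(logits, label):
    """Per-sample loss in the style of nn.CrossEntropyLoss (logits in)."""
    return -log_softmax(logits)[label]

def sparse_ce_from_probs(probs, label):
    """Per-sample sparse categorical cross-entropy (probabilities in)."""
    return -math.log(probs[label])

logits = [2.0, -1.0, 0.5]  # made-up scores for three classes
label = 0
probs = [math.exp(v) for v in log_softmax(logits)]  # what a softmax layer would output

print(cross_entropy_from_logits(logits, label))
print(sparse_ce_from_probs(probs, label))
```

Keras can also follow the PyTorch convention: drop the final softmax and compile with `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`.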


In [None]:
tf_model = tf.keras.Sequential([
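    # Declare the input shape with an Input object; passing input_shape= to the first layer is deprecated in recent Keras.
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),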
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax"),
])

tf_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

tf_model.summary()
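The parameter totals printed by `tf_model.summary()` can be checked by hand. A convolution has `out_channels * in_channels * k * k` weights plus one bias per output channel, a dense layer has `inputs * outputs` weights plus `outputs` biases, and pooling, dropout and flatten layers have none. Because both models use the same layer sizes, the count should match the PyTorch `FashionCNN` as well. A plain-Python sketch:

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out

layers = {
    "conv1": conv_params(1, 32, 3),
    "conv2": conv_params(32, 64, 3),
    "dense1": dense_params(64 * 7 * 7, 128),
    "dense2": dense_params(128, 10),
}
for name, count in layers.items():
    print(f"{name:>6}: {count:,}")
print(f" total: {sum(layers.values()):,}")  # total: 421,642
```

On the PyTorch side, `sum(p.numel() for p in model.parameters())` should report the same total.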


### 4.3 Train the TensorFlow model

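`model.fit` handles the epoch loop, but each step underneath is the same forward → loss → backward → update sequence. We also pass `validation_split=0.1`, which holds out 10% of the 5,000 training images (500) for validation, so you can compare training and validation curves and discuss under- and overfitting signals.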


In [None]:
tf_history = tf_model.fit(
    tf_train_images,
    tf_train_labels,
    validation_split=0.1,
    epochs=3,
    batch_size=64,
    verbose=2,
)
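`validation_split` does not draw a random sample: Keras documents that the validation data is taken from the *last* samples of the arrays, before shuffling. With the 5,000-image subset, the model therefore trains on the first 4,500 images (500 fewer than the PyTorch run) and validates on indices 4,500 to 4,999. A plain-Python sketch of that index arithmetic; the helper is hypothetical, and Keras' internal rounding may differ for sizes that do not divide evenly:

```python
def keras_style_split(num_samples, validation_split):
    """Hypothetical helper: train/validation index ranges, with validation as the tail."""
    split_at = int(num_samples * (1.0 - validation_split))
    return range(0, split_at), range(split_at, num_samples)

train_idx, val_idx = keras_style_split(5_000, 0.1)
print(len(train_idx), len(val_idx))  # 4500 500
print(val_idx[0], val_idx[-1])       # 4500 4999
```

If you want a random validation set instead, shuffle images and labels with a shared permutation before calling `fit`, or pass an explicit `validation_data=(x_val, y_val)` tuple.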


#### Visualize TensorFlow training progress

Keras returns a `History` object whose `history` dict holds the per-epoch metrics, including the validation values. Comparing these plots with the PyTorch curves above makes it easy to discuss optimization dynamics across frameworks.


In [None]:
tf_epochs = range(1, len(tf_history.history['loss']) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 3))
ax_loss.plot(tf_epochs, tf_history.history['loss'], marker='o', label='train', color='#1f77b4')
ax_loss.plot(tf_epochs, tf_history.history['val_loss'], marker='o', label='val', color='#9467bd')
ax_loss.set_title('TensorFlow loss')
ax_loss.set_xlabel('Epoch')
ax_loss.set_ylabel('Cross-entropy')
ax_loss.grid(alpha=0.3)
ax_loss.legend()

ax_acc.plot(tf_epochs, tf_history.history['accuracy'], marker='o', label='train', color='#ff7f0e')
ax_acc.plot(tf_epochs, tf_history.history['val_accuracy'], marker='o', label='val', color='#2ca02c')
ax_acc.set_title('TensorFlow accuracy')
ax_acc.set_xlabel('Epoch')
ax_acc.set_ylabel('Accuracy')
ax_acc.set_ylim(0, 1)
ax_acc.grid(alpha=0.3)
ax_acc.legend()

plt.tight_layout()
plt.show()
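The PyTorch loop stores `torch_history` as a list of `(loss, accuracy)` tuples, whereas Keras keeps `tf_history.history` as a dict of per-epoch lists. Converting one format into the other lets a single plotting routine draw both frameworks on shared axes. A sketch with placeholder numbers standing in for real histories (`to_keras_style` is a hypothetical helper):

```python
def to_keras_style(history):
    """Turn [(loss, acc), ...] into {'loss': [...], 'accuracy': [...]}."""
    losses, accs = zip(*history) if history else ((), ())
    return {"loss": list(losses), "accuracy": list(accs)}

# Placeholder values, not results from this notebook.
torch_like = [(0.85, 0.69), (0.55, 0.80), (0.47, 0.83)]
keras_like = {"loss": [0.95, 0.60, 0.50], "accuracy": [0.66, 0.78, 0.82]}

curves = {"PyTorch": to_keras_style(torch_like), "TensorFlow": keras_like}
for framework, hist in curves.items():
    final_loss, final_acc = hist["loss"][-1], hist["accuracy"][-1]
    print(f"{framework:>10}: {len(hist['loss'])} epochs, "
          f"final loss {final_loss:.2f}, final acc {final_acc:.2f}")
```

With real data you would build `curves` from `to_keras_style(torch_history)` and `tf_history.history`, then call `ax.plot` once per framework on the same axes.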


### 4.4 Evaluate the TensorFlow model

After training we compute the same accuracy and confusion-matrix diagnostics, then inspect predictions and misclassifications. Ask learners to compare the TensorFlow and PyTorch outputs: do they stumble on the same categories, or do the results differ because of random weight initialization, different training subsets and input scaling, or the smaller training set left after the validation split?


In [None]:
tf_test_loss, tf_test_accuracy = tf_model.evaluate(tf_test_images, tf_test_labels, verbose=0)
print(f"Test accuracy: {tf_test_accuracy:.4f}")

tf_probs = tf_model.predict(tf_test_images, batch_size=256, verbose=0)
tf_preds = tf_probs.argmax(axis=1)

confusion_matrix = tf.math.confusion_matrix(tf_test_labels, tf_preds, num_classes=len(CLASS_NAMES)).numpy()
per_class_accuracy = confusion_matrix.diagonal() / np.maximum(confusion_matrix.sum(axis=1), 1)

print("Per-class accuracy (TensorFlow):")
for class_name, acc, hits, total_count in zip(
    CLASS_NAMES,
    per_class_accuracy.tolist(),
    confusion_matrix.diagonal().tolist(),
    confusion_matrix.sum(axis=1).tolist(),
):
    print(f"  {class_name:>12}: {acc:.3f} ({int(hits)}/{int(total_count)})")

show_images(tf_test_images[:10], tf_test_labels[:10].tolist(), tf_preds[:10].tolist(), framework="TensorFlow")

mis_idx = np.where(tf_preds != tf_test_labels)[0][:6]
if mis_idx.size:
    show_images(
        tf_test_images[mis_idx],
        tf_test_labels[mis_idx].tolist(),
        tf_preds[mis_idx].tolist(),
        framework="TensorFlow misclassified",
    )
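To compare the two models prediction by prediction, you can measure how often they agree on the same test image and how often they make the same mistake. This assumes both loaders return the 10,000 test images in the same order (they read the same standard test files, but comparing the two label sequences is a cheap check), and it requires collecting *all* PyTorch predictions rather than only the first batch kept in `all_preds`. A plain-Python sketch on invented labels (the helper name is hypothetical):

```python
def compare_predictions(labels, preds_a, preds_b):
    """Agreement rate between two models, plus counts of shared errors."""
    assert len(labels) == len(preds_a) == len(preds_b)
    agree = sum(a == b for a, b in zip(preds_a, preds_b))
    both_wrong = sum(a != y and b != y for y, a, b in zip(labels, preds_a, preds_b))
    # Both wrong AND wrong in the same way (same predicted class).
    same_mistake = sum(a == b != y for y, a, b in zip(labels, preds_a, preds_b))
    return {
        "agreement": agree / len(labels),
        "both_wrong": both_wrong,
        "same_wrong_class": same_mistake,
    }

# Invented example: 6 and 0 stand for Shirt and T-shirt/top.
labels = [6, 0, 6, 9, 2, 6]
torch_p = [0, 0, 6, 9, 4, 0]
keras_p = [0, 0, 2, 9, 4, 6]
print(compare_predictions(labels, torch_p, keras_p))
```

On the real run, `labels` would be `tf_test_labels.tolist()`, `keras_p` would be `tf_preds.tolist()`, and `torch_p` a full list of PyTorch test predictions.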


#### Interpreting the TensorFlow results

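* Compare these metrics with the PyTorch run. Close agreement in overall and per-class accuracy suggests both models rely on similar visual cues, although matching accuracy alone does not prove they make the same mistakes on the same images.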
* Validation-vs-training curves can reveal early signs of overfitting; if validation accuracy plateaus sooner, discuss early stopping or stronger regularization.
* Misclassified examples create space for qualitative analysis. Ask students to annotate what feature (texture, shape, context) would help the network fix the prediction.
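Keras implements early stopping as a callback, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`, passed to `fit` through `callbacks=[...]`. The core rule is simple enough to sketch in plain Python: stop once the monitored value has failed to improve for `patience` consecutive epochs, and remember the best epoch. This is a simplified illustration (it ignores options such as `min_delta`), run on invented validation losses:

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based, for a patience-based rule."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # rule never triggered: train to the end

# Invented validation losses, not results from this notebook.
print(early_stopping_epoch([0.62, 0.51, 0.47, 0.49, 0.50, 0.46], patience=2))  # (5, 3)
```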


## 5. Reflection prompts

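* Compare the training curves from PyTorch and TensorFlow. Which framework converged faster? Since the hyperparameters match, which setup differences (input scaling, training subset, validation split, default weight initialization) might explain the gap?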
* Look at the per-class accuracies. Which categories are consistently strong or weak across both models? How could you collect or augment data to close the gap?
* How would you modify the architecture if you needed higher accuracy without drastically increasing training time on Colab?
