# Computer Vision Mini Exercises

Welcome to this mini set of computer vision exercises! The notebook is designed for quick demonstrations in a classroom setting and focuses on **image classification** using both **PyTorch** and **TensorFlow**.

You will:

1. Install the required libraries (PyTorch, TorchVision, and TensorFlow).
2. Explore the Fashion-MNIST dataset.
3. Train and evaluate a compact convolutional neural network (CNN) in PyTorch.
4. Repeat a similar experiment with TensorFlow/Keras.
5. Compare the predictions from both frameworks.

> **Tip for Colab:** Runtime → Change runtime type → Hardware accelerator → GPU.


## 1. Environment setup

If you are running this notebook in Google Colab, uncomment the install line in the cell below and run it **once** to install the required dependencies.


In [None]:
# If running on Google Colab, uncomment the next line to install the dependencies.
# !pip install --quiet torch torchvision tensorflow
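
Before installing anything, it can help to check which of the three packages are already importable in the current runtime. The snippet below is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["torch", "torchvision", "tensorflow"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If the list is empty, the install line can stay commented out.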

## 2. Imports and utility helpers

This cell imports the libraries needed for both PyTorch and TensorFlow workflows and defines a helper for visualizing predictions.

In [None]:
import math
from typing import List

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

import tensorflow as tf

CLASS_NAMES = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

def show_images(images: np.ndarray, labels: List[int], predictions: List[int] | None = None, framework: str = ""):
    """Display a grid of Fashion-MNIST images with optional predictions."""
    num_images = len(images)
    cols = 5
    rows = math.ceil(num_images / cols)
    plt.figure(figsize=(cols * 2.2, rows * 2.2))
    for idx, (image, label) in enumerate(zip(images, labels)):
        plt.subplot(rows, cols, idx + 1)
        if image.ndim == 3 and image.shape[-1] == 1:
            image = image.squeeze(-1)
        plt.imshow(image, cmap="gray")
        title = CLASS_NAMES[label]
        if predictions is not None:
            pred_name = CLASS_NAMES[predictions[idx]]
            title = f"GT: {title}\nPred: {pred_name}"
        if framework:
            title = f"{framework}\n" + title
        plt.title(title, fontsize=9)
        plt.axis("off")
    plt.tight_layout()
    plt.show()
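
`show_images` expects channels-last arrays (height, width, channel), which is the layout Keras uses. PyTorch batches are laid out as (batch, channel, height, width), so their axes need reordering with `transpose(0, 2, 3, 1)` before they are passed in. The sketch below shows only the shape bookkeeping behind that call; `permute_shape` is a hypothetical helper written for illustration.

```python
def permute_shape(shape, axes):
    """Return the shape that results from reordering axes, as NumPy's transpose does."""
    return tuple(shape[axis] for axis in axes)


# A PyTorch-style batch of 10 grayscale 28x28 images: (batch, channel, height, width).
nchw_shape = (10, 1, 28, 28)

# Moving the channel axis to the end gives the channels-last layout.
nhwc_shape = permute_shape(nchw_shape, (0, 2, 3, 1))
print(nhwc_shape)  # (10, 28, 28, 1)
```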


## 3. PyTorch workflow

We start with PyTorch and TorchVision to create a lightweight CNN classifier for Fashion-MNIST. To keep the runtime short, we train on a subset of the training data.

### 3.1 Load the dataset

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

full_train = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)

# Use a subset (5,000 samples) for faster training in demos.
subset_size = 5_000
train_subset, _ = random_split(full_train, [subset_size, len(full_train) - subset_size])

batch_size = 64
train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

# Peek at a few sample images.
sample_images, sample_labels = next(iter(train_loader))
show_images(sample_images[:10].numpy().transpose(0, 2, 3, 1), sample_labels[:10].tolist(), framework="PyTorch")
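
`ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then subtracts the mean and divides by the standard deviation, which maps that range to [-1, 1]. A minimal pure-Python sketch of the same arithmetic for a single pixel (the function name `normalize_pixel` is hypothetical):

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for one 8-bit pixel value."""
    scaled = value / 255.0          # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std    # Normalize: (x - mean) / std


for raw in (0, 128, 255):
    print(raw, "->", round(normalize_pixel(raw), 4))
```

Black pixels end up at -1 and white pixels at 1, so the inputs are roughly centred around zero.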


### 3.2 Define a compact CNN

In [None]:
class FashionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, len(CLASS_NAMES)),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FashionCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
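
The `64 * 7 * 7` input size of the first linear layer follows from the layer shapes: each 3×3 convolution with `padding=1` keeps the spatial size unchanged, and each `MaxPool2d(2)` halves it. The sketch below reproduces that arithmetic and the resulting parameter count in plain Python; the helper names are hypothetical.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // window


def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


size = 28
size = pool_out(conv_out(size))   # 28 -> 28 -> 14
size = pool_out(conv_out(size))   # 14 -> 14 -> 7
flat_features = 64 * size * size  # 64 channels of 7x7 feature maps

total_params = (
    conv_params(1, 32)
    + conv_params(32, 64)
    + linear_params(flat_features, 128)
    + linear_params(128, 10)
)
print(f"flattened features: {flat_features}")
print(f"trainable parameters: {total_params}")
```

Most of the parameters sit in the first linear layer, which is typical for small CNNs that flatten their feature maps.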


### 3.3 Train the PyTorch model

In [None]:
def train_torch_model(model, data_loader, criterion, optimizer, epochs=3):
    history = []
    model.train()
    for epoch in range(1, epochs + 1):
        running_loss = 0.0
        running_correct = 0
        total_samples = 0
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * images.size(0)
            preds = outputs.argmax(dim=1)
            running_correct += (preds == labels).sum().item()
            total_samples += images.size(0)

        epoch_loss = running_loss / total_samples
        epoch_acc = running_correct / total_samples
        history.append((epoch_loss, epoch_acc))
        print(f"Epoch {epoch}: loss={epoch_loss:.4f}, accuracy={epoch_acc:.4f}")
    return history


torch_history = train_torch_model(model, train_loader, criterion, optimizer, epochs=3)
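
The training loop multiplies each batch loss by `images.size(0)` because `CrossEntropyLoss` returns a mean over the batch by default, and the final batch can be smaller than the rest (5,000 samples in batches of 64 leaves 8 in the last one). The sketch below, using made-up loss values, shows how a sample-weighted average differs from a plain mean over batches.

```python
def weighted_epoch_loss(batch_losses, batch_sizes):
    """Average per-batch mean losses, weighting each by its batch size."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Illustrative values only: two full batches and one small final batch.
batch_losses = [0.50, 0.40, 1.20]
batch_sizes = [64, 64, 8]

naive = sum(batch_losses) / len(batch_losses)
weighted = weighted_epoch_loss(batch_losses, batch_sizes)
print(f"naive mean over batches: {naive:.4f}")
print(f"sample-weighted mean:    {weighted:.4f}")
```

The small final batch pulls the naive mean up far more than its eight samples justify, which is why the loop weights by batch size.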


### 3.4 Evaluate the PyTorch model

In [None]:
model.eval()
correct = 0
total = 0
all_images = []
all_labels = []
all_preds = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        preds = outputs.argmax(dim=1)

        correct += (preds == labels).sum().item()
        total += labels.size(0)

        if len(all_images) < 10:
            all_images.extend(images.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds.cpu().numpy())

test_accuracy = correct / total
print(f"PyTorch test accuracy: {test_accuracy:.4f}")

# Visualize predictions for the first batch collected.
if all_images:
    images_to_show = np.array(all_images[:10]).transpose(0, 2, 3, 1)
    show_images(images_to_show, all_labels[:10], all_preds[:10], framework="PyTorch")
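
Overall accuracy can hide classes on which the model does poorly. As a sketch, the hypothetical helper below computes per-class accuracy from two lists of integer labels; the sample lists are made up and would be replaced by labels and predictions collected over the whole test set.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Map each class index to the fraction of its samples predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred in zip(labels, predictions):
        total[label] += 1
        if label == pred:
            correct[label] += 1
    return {cls: correct[cls] / total[cls] for cls in sorted(total)}


# Made-up labels and predictions for illustration.
labels = [0, 0, 6, 6, 6, 9]
predictions = [0, 6, 6, 0, 6, 9]
print(per_class_accuracy(labels, predictions))
```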


## 4. TensorFlow/Keras workflow

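We now replicate the experiment with TensorFlow and Keras on the same Fashion-MNIST dataset. The comparison is close but not exact: this pipeline scales pixels to [0, 1] rather than [-1, 1], takes the first 5,000 training images rather than a random subset, and holds out 10% of them for validation.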

### 4.1 Load and preprocess data

In [None]:
(tf_train_images, tf_train_labels), (tf_test_images, tf_test_labels) = tf.keras.datasets.fashion_mnist.load_data()

# Normalize to [0, 1] and add channel dimension for CNN compatibility.
tf_train_images = tf_train_images.astype("float32") / 255.0
tf_test_images = tf_test_images.astype("float32") / 255.0
tf_train_images = tf_train_images[..., np.newaxis]
tf_test_images = tf_test_images[..., np.newaxis]

# Use the first 5,000 samples for faster demos, matching the PyTorch subset size.
subset_size = 5_000
tf_train_images = tf_train_images[:subset_size]
tf_train_labels = tf_train_labels[:subset_size]

show_images(tf_train_images[:10], tf_train_labels[:10].tolist(), framework="TensorFlow")


### 4.2 Build the TensorFlow model

In [None]:
tf_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax"),
])

tf_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

tf_model.summary()
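
One difference from the PyTorch model is worth noting: the Keras model ends in a softmax layer and is trained with `sparse_categorical_crossentropy` on probabilities, while the PyTorch model outputs raw logits and `nn.CrossEntropyLoss` applies log-softmax internally. The plain-Python sketch below, with made-up logits, illustrates that the two formulations agree up to floating-point error; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_from_probs(probs, label):
    """Sparse categorical cross-entropy on probabilities (Keras-style)."""
    return -math.log(probs[label])


def loss_from_logits(logits, label):
    """Cross-entropy computed directly from logits via log-sum-exp (PyTorch-style)."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
label = 0
print(loss_from_probs(softmax(logits), label))
print(loss_from_logits(logits, label))
```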


### 4.3 Train the TensorFlow model

In [None]:
tf_history = tf_model.fit(
    tf_train_images,
    tf_train_labels,
    validation_split=0.1,
    epochs=3,
    batch_size=64,
    verbose=2,
)
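
With `validation_split=0.1`, Keras holds out the last 10% of the provided samples (taken before shuffling) for validation, so the model trains on fewer images than the PyTorch model does. The arithmetic, sketched in plain Python with illustrative variable names:

```python
import math

subset_size = 5_000
validation_split = 0.1
batch_size = 64

# Keras trains on the first 90% and validates on the remaining samples.
train_count = int(subset_size * (1 - validation_split))
val_count = subset_size - train_count

keras_steps = math.ceil(train_count / batch_size)   # batches per Keras epoch
torch_steps = math.ceil(subset_size / batch_size)   # batches per PyTorch epoch

print(f"Keras:   {train_count} train / {val_count} val, {keras_steps} steps per epoch")
print(f"PyTorch: {subset_size} train, {torch_steps} steps per epoch")
```

The per-epoch step count in the Keras training log should therefore be smaller than the number of batches the PyTorch loop iterates over.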


### 4.4 Evaluate the TensorFlow model

In [None]:
tf_test_loss, tf_test_accuracy = tf_model.evaluate(tf_test_images, tf_test_labels, verbose=0)
print(f"TensorFlow test accuracy: {tf_test_accuracy:.4f}")

tf_preds = np.argmax(tf_model.predict(tf_test_images[:10]), axis=1)
show_images(tf_test_images[:10], tf_test_labels[:10].tolist(), tf_preds.tolist(), framework="TensorFlow")
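
To compare the two frameworks' predictions directly, one option is to measure how often they agree on the same images. The helper below is a hypothetical sketch with made-up prediction lists; in the notebook it could be applied to `all_preds[:10]` and `tf_preds`, assuming both frameworks serve the test images in the same order.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of positions at which two prediction lists match."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        return 0.0
    matches = sum(int(a == b) for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


# Made-up predictions for ten images.
torch_preds = [9, 2, 1, 1, 6, 1, 4, 6, 5, 7]
keras_preds = [9, 2, 1, 1, 0, 1, 4, 6, 5, 7]
print(f"Agreement: {agreement_rate(torch_preds, keras_preds):.0%}")
```

Images on which the two models disagree are good candidates for class discussion.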


## 5. Reflection prompts

To encourage critical thinking, consider the following questions with your class:

- How do the training logs differ between PyTorch and TensorFlow?
- What is the impact of training on a subset versus the full dataset?
- Which model performed better on the test set? Why might that be the case?
- How would you adapt these pipelines for transfer learning on a different image dataset?

Happy teaching!
