# Computer Vision WS-24/25 Assignment 2.4: Image Classification


## Part 0.1: Setting Up the Environment

Before we tackle image classification, we need to set up our working environment.
This includes a few lines of code to set up the Jupyter environment and to verify our Python environment.

The cell below is optional, but makes for a more seamless debugging experience. The autoreload extension lets us edit and re-import our source files without having to restart the Jupyter notebook kernel after every change.

In [None]:
%load_ext autoreload
%autoreload 3

## Part 0.2: Google Colab Setup


In [None]:
import os

try:
    from google.colab import drive
    drive.mount('/content/drive')
except ImportError:
    # We are not running in Google Colab
    # We assume the data is in the same directory as the notebook
    FILE_PATH = 'data'
else:
    # We are running in Google Colab
    # Fill in the path to the directory where you uploaded the notebooks and data
    # For example if you have created a folder with the name 'CV2025' in your Google Drive and have stored the notebook there, set GOOGLE_DRIVE_PATH = 'CV2025'
    # GOOGLE_DRIVE_PATH = "CV2025"
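    GOOGLE_DRIVE_PATH = None
    if GOOGLE_DRIVE_PATH is None:
        # os.path.join would fail with a TypeError on None, so fail early with a clear message
        raise ValueError("Please set GOOGLE_DRIVE_PATH to the Google Drive folder that contains this notebook.")
    FILE_PATH = os.path.join('drive', 'My Drive', GOOGLE_DRIVE_PATH)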
    os.chdir(FILE_PATH)
    FILE_PATH = os.path.join(FILE_PATH, 'data')

## Part 0.3: Relevant Python modules

In the cell below we import all the Python libraries that we will use during this assignment. If you would like to import additional libraries for visualization or debugging purposes, please feel free to add them here!

In [None]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
import time

from tqdm.contrib.itertools import product
from rand import reset_seeds
%matplotlib inline

plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['font.size'] = 16  

###################################################
######## Add your preferred libraries here ########
###################################################

We prefer GPU acceleration when working with PyTorch, both on Colab and in your own local environment. The cell below checks whether a CUDA-capable GPU is available.


In [None]:
has_GPU = torch.cuda.is_available()

if has_GPU:
    print("GPU device found! We are good to go.")
else:
    print("No GPU found. On Colab, enable a GPU accelerator via Edit > Notebook settings.\nOtherwise we fall back to the CPU.")

DEVICE = 'cuda' if has_GPU else 'cpu'

## Part 0.4: Getting Started: Data Preparation

For the purpose of this assignment we will be dealing with CIFAR10, a 10-class image classification problem with $32\times32$ RGB images.

Below, we initialize our dataset class and call a function to preprocess the dataset.

The method `dataset.get_splits(device, ...)` takes the target device as its first argument (the keyword arguments in the call below select the bias column, the train/validation split and the data type) and returns six **torch.Tensor** instances:

- `X_train`: the training images, each flattened into a 1D vector (normalized to have $\mu=0$ and $\sigma=1$)
- `y_train`: the corresponding training labels (integers in $[0,9]$)
- `X_val`: the validation images, each flattened into a 1D vector (normalized with the training statistics $\mu_{\text{train}}, \sigma_{\text{train}}$)
- `y_val`: the corresponding validation labels
- `X_test`: the test images, each flattened into a 1D vector (normalized with the training statistics $\mu_{\text{train}}, \sigma_{\text{train}}$)
- `y_test`: the corresponding testing labels

*Note*: For the purpose of linear classifiers, we append a constant $1$ to each flattened image, so that the bias term is included implicitly in the weight matrix (the so-called *bias trick*). Each input therefore has $3072 + 1 = 3073$ entries.
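The sketch below illustrates both preprocessing steps on random stand-in data. All names are illustrative and not part of `CIFAR10Dataset`, and the per-feature statistics are an assumption: the dataset class may normalize per channel or globally instead. The final check confirms that, once the bias $b$ is stored as an extra column of $W$, the augmented product reproduces $XW^T + b$.

```python
import torch

torch.manual_seed(0)
# Random stand-ins for flattened 32x32x3 images with pixel values in [0, 255]
raw_train = 255 * torch.rand(100, 3 * 32 * 32, dtype=torch.float64)
raw_val = 255 * torch.rand(20, 3 * 32 * 32, dtype=torch.float64)

# Normalization statistics are computed on the training split only (per feature here)
mu = raw_train.mean(dim=0, keepdim=True)       # shape (1, 3072)
sigma = raw_train.std(dim=0, keepdim=True)     # shape (1, 3072)
toy_train = (raw_train - mu) / sigma
toy_val = (raw_val - mu) / sigma               # reuse the training statistics

def add_bias_column(X):
    """Bias trick: append a constant 1 to every row of X."""
    ones = torch.ones(X.shape[0], 1, dtype=X.dtype, device=X.device)
    return torch.cat([X, ones], dim=1)

toy_train_b = add_bias_column(toy_train)
toy_val_b = add_bias_column(toy_val)
print(toy_train_b.shape)                       # torch.Size([100, 3073])

# Storing the bias b as an extra column of W reproduces the affine scores X W^T + b
W = torch.randn(10, 3 * 32 * 32, dtype=torch.float64)
b = torch.randn(10, dtype=torch.float64)
W_aug = torch.cat([W, b.unsqueeze(1)], dim=1)  # shape (10, 3073)
print(torch.allclose(toy_train_b @ W_aug.T, toy_train @ W.T + b))  # True
```

Using only training statistics for all three splits keeps information from the validation and test sets out of the preprocessing.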

In [None]:
from dataset import CIFAR10Dataset
# Load the data
reset_seeds()
dataset = CIFAR10Dataset(data_dir=FILE_PATH)
X_train, y_train, X_val, y_val, X_test, y_test = dataset.get_splits(DEVICE, include_bias=True, trainval_split=0.8, X_dtype=torch.float32)


print(f"Train dataset containing {X_train.shape[0]} images")
print(f"Validation dataset containing {X_val.shape[0]} images")
print(f"Test dataset containing {X_test.shape[0]} images")

Our dataset class also has a neat helper function that lets us visualize a few samples of a given dataset!


In [None]:
# Visualize the data
reset_seeds()
dataset.visualize_samples(X_train, y_train, to_show=10)

## Part 0.5 (Optional): PyTorch basics

In this notebook and in future assignments we will rely on the PyTorch library.
For those of you who are not familiar with PyTorch, we highly recommend going over some of the basic tensor operations that PyTorch offers.

There is a plethora of available material on the internet. Some introductions for tensor operations can be found here:
- https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html
- https://www.youtube.com/watch?v=x9JiIFvlUwk&list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz


# Task 1: Linear Models - The basics

In this section of the assignment we will start our classification journey!

We will start with the most basic form of linear classifiers, the **Linear SVM** and the **Softmax Classifier**.

In the first part, you will warm up with the implementation of linear models and their basic functionalities.
You will implement:

- the **forward pass** of a linear model (0.5pts)
- a **naive SVM loss function** (1.5pts)
- a **fully vectorized version** of the above function (1.5pts)
- a **fully vectorized version** of the Cross Entropy loss function (1.5+1pts)
- the **inference** call to classify a given image (1pt)

In [None]:
# CIFAR10 contains 32x32 RGB images -> our models receive a 3072-dimensional vector (plus the bias entry) and output one score for each of the 10 classes
IMAGE_DIMS = 32**2 * 3
N_CLASSES = 10

## Part 1.1: Forward Pass (0.5pts)

To start off, we implement the forward pass of our linear model.
Please implement the forward pass of the `LinearBaseModel` in the file `model_factory.py`.
For a batch of $N$ inputs $X\in\mathbb{R}^{N\times D}$ and a weight matrix $W\in\mathbb{R}^{C\times D}$, the forward pass computes one score per class for each input, giving a score matrix $S\in\mathbb{R}^{N\times C}$:

$S=XW^T$.

Run the cell below to verify your implementation.
(For reference, you should see an output score of $0.0061$.)
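A standalone sketch of the shapes involved, using random tensors rather than the `LinearBaseModel` API: with the bias entry included, each input has $D = 3073$ entries, $W$ holds one row per class, and each entry $S_{ic}$ is the dot product of sample $i$ with the weight row of class $c$.

```python
import torch

torch.manual_seed(0)
N, D, C = 5, 3 * 32 * 32 + 1, 10                        # batch size, input dim incl. bias, classes
X = torch.randn(N, D, dtype=torch.float64)              # batch of flattened inputs
W = 0.0001 * torch.randn(C, D, dtype=torch.float64)     # small random weights, one row per class

S = X @ W.T                                             # (N, D) @ (D, C) -> (N, C)
print(S.shape)                                          # torch.Size([5, 10])

# Entry S[i, c] is the dot product of sample i with the weight row of class c
i, c = 2, 7
print(torch.allclose(S[i, c], X[i] @ W[c]))             # True
```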


In [None]:
from model_factory import LinearBaseModel

reset_seeds()
# Use a separate name so that the real test split X_test is not overwritten
X_dummy = torch.randn(1, 1, IMAGE_DIMS+1, device=DEVICE)
svm = LinearBaseModel(IMAGE_DIMS, 1, W_dtype=torch.float32, device=DEVICE)
score = svm(X_dummy)
print(f"Score: {score.item(): .4f}")

## Task 1.2: A naive SVM Loss (1.5pt)

Please implement the naive SVM loss in the function `svm_loss_naive` in the file `model_factory.py`. By *naive* we mean a traditional implementation that loops explicitly over the samples in the batch and over their class scores. All the vectorized operations that PyTorch offers are allowed in the following task.

To verify your implementation, you can run the below code block where we compare your implementation with the **PyTorch** implementation of the SVM loss.

*Note*: PyTorch's `F.multi_margin_loss` divides each sample's summed hinge terms by the number of classes $C$ (and then averages over the batch). For the purpose of this assignment, we also use this additional averaging, as it leads to a more numerically stable optimization in later stages.
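Written out, and assuming PyTorch's defaults (margin $\Delta = 1$, mean over the batch), the loss computed by `F.multi_margin_loss` for scores $s_{i,j}$ and labels $y_i$ is

$$L = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{C}\sum_{j\neq y_i}\max\left(0,\ s_{i,j} - s_{i,y_i} + \Delta\right).$$

The plain-Python sketch below mirrors this formula with explicit loops over samples and classes. It works on nested lists rather than tensors, and the function name `hinge_loss_loops` is hypothetical; it only illustrates the looping structure and is not a drop-in `svm_loss_naive`.

```python
def hinge_loss_loops(scores, labels, margin=1.0):
    """Multi-class hinge loss with explicit loops, averaged over classes and samples."""
    num_samples = len(scores)
    num_classes = len(scores[0])
    total = 0.0
    for i in range(num_samples):                 # loop over the batch
        correct_score = scores[i][labels[i]]
        for j in range(num_classes):             # loop over the class scores
            if j == labels[i]:
                continue                         # the true class is skipped
            total += max(0.0, scores[i][j] - correct_score + margin)
    return total / (num_samples * num_classes)   # 1/C per sample, then mean over N


# Two samples, three classes: only sample 0 violates the margin (5.1 - 3.2 + 1 = 2.9)
scores = [[3.2, 5.1, -1.7],
          [1.3, 4.9, 2.0]]
labels = [0, 1]
print(hinge_loss_loops(scores, labels))   # ≈ 2.9 / 6 ≈ 0.4833
```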

In [None]:
from model_factory import SVM

reset_seeds()
svm = SVM(IMAGE_DIMS, N_CLASSES, W_dtype=torch.float32, device=DEVICE)

# Sample a small subset of the training data into a training batch
X_batch = X_train[:64]
y_batch = y_train[:64]


# With return_scores=True, svm.loss returns the class scores as well as the loss
scores, loss = svm.loss(X_batch, y_batch, mode='naive', return_scores=True)

pytorch_loss = F.multi_margin_loss(scores, y_batch)

# Let's verify that the calculated loss is correct!
try:
    assert torch.allclose(loss, pytorch_loss)
except AssertionError:
    print("Looks like the Loss is wrong!")
else:
    print("Looks good! The Naive SVM loss is implemented correctly.")


## Task 1.3: Vectorization (1.5pt)

Now it's time to implement the vectorized version of the same loss! Please implement the function `svm_loss_vectorized` in the file `model_factory.py` and run the cell below to verify it. This function should contain **no loops**!
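One possible loop-free formulation, sketched under the same conventions (margin 1, division by $N \cdot C$): gather the true-class score of each row, broadcast it against all scores, clamp at zero, and mask out the true class. The function name `hinge_loss_vectorized` is hypothetical; the comparison against `F.multi_margin_loss` is the same check the notebook uses.

```python
import torch
import torch.nn.functional as F

def hinge_loss_vectorized(scores, labels, margin=1.0):
    """Loop-free multi-class hinge loss; scores: (N, C) floats, labels: (N,) int64."""
    N, C = scores.shape
    correct = scores.gather(1, labels.unsqueeze(1))          # (N, 1): s_{i, y_i}
    margins = (scores - correct + margin).clamp(min=0)       # (N, C): broadcast over classes
    true_class = F.one_hot(labels, num_classes=C).bool()     # (N, C): True where j == y_i
    margins = margins.masked_fill(true_class, 0.0)           # drop the j == y_i terms
    return margins.sum() / (N * C)


torch.manual_seed(0)
scores = torch.randn(64, 10)
labels = torch.randint(0, 10, (64,))
loss = hinge_loss_vectorized(scores, labels)
print(torch.allclose(loss, F.multi_margin_loss(scores, labels)))  # True
```

Without the mask, every sample would contribute an extra $\max(0, \Delta) = 1$ from its own class, which PyTorch's definition excludes.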


In [None]:
# Sample a small subset of the training data into a training batch
X_batch = X_train[:64]
y_batch = y_train[:64]


# With return_scores=True, svm.loss returns the class scores as well as the loss
scores, loss = svm.loss(X_batch, y_batch, mode='vectorized', return_scores=True)

pytorch_loss = F.multi_margin_loss(scores, y_batch)

# Let's verify that the calculated loss is correct!
try:
    assert torch.allclose(loss, pytorch_loss)
except AssertionError:
    print("Looks like the Loss is wrong!")
else:
    print("Looks good! The vectorized SVM loss is implemented correctly.")

By implementing the losses in vectorized form, we can use **PyTorch**'s optimized functions on the GPU and observe a speedup of anywhere between $15$ and $300$ times, depending on the hardware used.
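When timing GPU code, keep in mind that CUDA kernels are launched asynchronously, so the clock should only be read after `torch.cuda.synchronize()` has returned. A small helper along these lines (the name `time_ms` is hypothetical) also performs a warm-up call, since the first call often includes one-off setup costs, and averages over several runs:

```python
import time
import torch

def time_ms(fn, repeats=10, warmup=1):
    """Average wall-clock time of fn() in milliseconds, synchronizing CUDA if present."""
    def sync():
        if torch.cuda.is_available():
            torch.cuda.synchronize()   # wait for queued GPU work before reading the clock

    for _ in range(warmup):            # warm-up runs are not timed
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()
    return 1000.0 * (time.perf_counter() - start) / repeats


x = torch.randn(256, 256)
print(f"{time_ms(lambda: x @ x):.3f} ms per matmul")
```

In this notebook it could be used as, e.g., `time_ms(lambda: svm.loss(X_batch, y_batch, mode='vectorized', return_scores=False))`.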

In [None]:
# Sample a small subset of the training data into a training batch
X_batch = X_train[:64]
y_batch = y_train[:64]

def sync():
    # CUDA kernels run asynchronously: wait for them before reading the clock (no-op on CPU)
    if has_GPU:
        torch.cuda.synchronize()

sync()
start = time.perf_counter()
loss_naive = svm.loss(X_batch, y_batch, mode='naive', return_scores=False)
sync()
naive_runtime = 1000. * (time.perf_counter() - start)
print(f"Naive forward computed in {naive_runtime:.3f} ms")

sync()
start = time.perf_counter()
loss_vec = svm.loss(X_batch, y_batch, mode='vectorized', return_scores=False)
sync()
vectorized_runtime = 1000. * (time.perf_counter() - start)
print(f"Vectorized forward computed in {vectorized_runtime:.3f} ms")

print(f"Speedup Factor: {naive_runtime / vectorized_runtime:.3f}")

## Part 1.4: Softmax Classifier (1.5+1pts)

Another type of linear classifier is the **Softmax Classifier**.

The Softmax classifier takes its name from the **Softmax function**, which maps the output logits to non-negative values that sum to 1. The result can therefore be read as a distribution over the classes, although these values are not necessarily calibrated probabilities.

The Softmax function is defined as follows:
$\sigma(\mathbf{s})_i=\frac{e^{s_i}}{\sum_{j=1}^{C}e^{s_j}}$.


### Part 1.4.1: A vectorized softmax / Cross Entropy Loss (1.5pt)

A more conventional name for the loss function of the Softmax classifier is the **Cross Entropy loss**.
Please implement the function `cross_entropy_loss` in the file `model_factory.py`. As before, this function should contain **no loops**!

To verify that your implementation is correct, we will compare it with the **PyTorch** implementation of the Cross Entropy loss.
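For scores $s_{i,j}$ and labels $y_i$, `F.cross_entropy` (with its default mean reduction) computes

$$L = \frac{1}{N}\sum_{i=1}^{N} -\log\frac{e^{s_{i,y_i}}}{\sum_{j=1}^{C} e^{s_{i,j}}}.$$

A direct, loop-free translation of this formula is sketched below (the function name is hypothetical). It matches PyTorch for moderate scores, but it exponentiates the raw scores, which is exactly the weakness examined in Part 1.4.2.

```python
import torch
import torch.nn.functional as F

def cross_entropy_direct(scores, labels):
    """Literal softmax + negative log-likelihood; NOT numerically stable."""
    exp_scores = torch.exp(scores)                              # (N, C)
    probs = exp_scores / exp_scores.sum(dim=1, keepdim=True)    # softmax per row
    correct_probs = probs.gather(1, labels.unsqueeze(1))        # (N, 1): p_{i, y_i}
    return -torch.log(correct_probs).mean()


torch.manual_seed(0)
scores = torch.randn(64, 10)
labels = torch.randint(0, 10, (64,))
print(torch.allclose(cross_entropy_direct(scores, labels), F.cross_entropy(scores, labels)))  # True
```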

In [None]:
from model_factory import SoftmaxClassifier
reset_seeds()

smc = SoftmaxClassifier(IMAGE_DIMS, N_CLASSES, W_dtype=torch.float32, device=DEVICE)
# Sample a small subset of the training data into a training batch
X_batch = X_train[:64]
y_batch = y_train[:64]


# With return_scores=True, smc.loss returns the class scores as well as the loss
scores, loss = smc.loss(X_batch, y_batch, mode='vectorized', return_scores=True)

pytorch_loss = F.cross_entropy(scores, y_batch)
# Let's verify that the calculated loss is correct!
try:
    assert torch.allclose(loss, pytorch_loss)
except AssertionError:
    print("Looks like the Loss is wrong!")
else:
    print("Looks good! The vectorized Cross Entropy loss is implemented correctly.")

### Part 1.4.2: Numerical Stability (1pt)

The softmax function can suffer from numerical instabilities, depending on how it is implemented.

Let's check whether your implementation is numerically stable: the cell below sets all weights to one and recomputes the loss.
If the loss is not finite, can you think of some tricks to make it stable?

In [None]:
smc.W = torch.nn.Parameter(torch.ones_like(smc.W))

loss_vec = smc.loss(X_batch, y_batch, mode='vectorized', return_scores=False)

try:
    # isfinite catches both NaN and +/-inf (e.g. from log(0))
    assert torch.isfinite(loss_vec).all()
except AssertionError:
    print("Cross Entropy loss is not numerically stable!")
else:
    print("Cross Entropy loss is numerically stable!")


## Part 1.5: Inference (1pt)

Now that we have implemented the forward pass and the loss functions, we can use our models to classify images!
For that, we need to implement the function `predict` in the file `model_factory.py`.
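Prediction amounts to picking the class with the highest score in each row, i.e. an `argmax` over the class dimension; accuracy is then the fraction of predictions that match the labels. A minimal standalone sketch on hand-written scores (not the `predict` method itself):

```python
import torch

scores = torch.tensor([[0.1, 2.0, -1.0],     # highest score: class 1
                       [3.0, 0.5,  0.2],     # highest score: class 0
                       [0.0, 0.1,  0.4]])    # highest score: class 2
labels = torch.tensor([1, 0, 1])

predictions = scores.argmax(dim=1)           # tensor([1, 0, 2])
accuracy = 100.0 * (predictions == labels).float().mean().item()
print(f"{accuracy:.2f}%")                    # 66.67%
```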

In [None]:
smc = SoftmaxClassifier(IMAGE_DIMS, N_CLASSES, W_dtype=torch.float32, device=DEVICE)

# The freshly initialized model is untrained, so expect accuracies roughly at chance level (~10% for 10 classes)
y_train_predict = smc.predict(X_train)
train_acc = 100. * (y_train == y_train_predict).mean(dtype=torch.float64).item()

y_val_predict = smc.predict(X_val)
val_acc = 100. * (y_val == y_val_predict).mean(dtype=torch.float64).item()

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")


# Part 2: Training the Linear Model

After implementing all the building blocks, we can now improve the poor performance of our (so far untrained) linear models.

In this section, we will implement the training procedure for our linear models by optimizing them with **Stochastic Gradient Descent (SGD)**.

You will implement:
- creating **batches of training data** to feed to our models (1pt)
- the training loop to optimize a linear model with **Stochastic Gradient Descent (SGD)** and $L_2$ **regularization** (1pt)
- a **hyperparameter sweep** to figure out good parameters for **learning rate AND regularization strength** (1pt)


## Part 2.1: Batching (1pt)

In practice, computing the gradient over the entire dataset for every update is often too expensive.
Instead, we split the dataset into smaller batches and update our model on one batch at a time; this is called (minibatch) **Stochastic Gradient Descent (SGD)**.

Please implement the function `get_batches` in the file `model_factory.py` to split the dataset into smaller batches of a certain size.
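One common way to do this, sketched with a hypothetical generator `iterate_minibatches`: draw a random permutation of the sample indices once per pass over the data (an *epoch*) and slice it into consecutive chunks, so every sample is used exactly once per epoch. Sampling each batch independently with replacement is another valid choice; the assignment's `get_batches` may follow either convention.

```python
import torch

def iterate_minibatches(X, y, batch_size, shuffle=True, generator=None):
    """Yield (X_batch, y_batch) pairs covering the dataset once (one epoch)."""
    N = X.shape[0]
    if shuffle:
        order = torch.randperm(N, generator=generator)   # random order of sample indices (CPU)
    else:
        order = torch.arange(N)
    order = order.to(X.device)
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]            # the last batch may be smaller
        yield X[idx], y[idx]


X = torch.arange(10.0).unsqueeze(1)   # 10 samples with one feature each
y = torch.arange(10)
batches = list(iterate_minibatches(X, y, batch_size=4, generator=torch.Generator().manual_seed(0)))
print([len(yb) for _, yb in batches])   # [4, 4, 2]
```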

## Part 2.2: Training Loop (1pt)

Now that we have implemented the batching, we can start training our linear models!

In the function `train_loop` in the file `model_factory.py`, you will implement the training loop for our linear models.
The training loop should run for a fixed number of iterations (`n_iter`) and optimize the model with **Stochastic Gradient Descent (SGD)**.
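A training loop of this kind repeats four steps: sample a batch, compute the data loss plus the $L_2$ penalty, compute the gradient, and step against it. The sketch below assumes the penalty $R(W) = \lambda \lVert W \rVert_2^2$ (gradient $2\lambda W$), uses autograd for the gradient and runs a softmax loss on synthetic toy data; the function name is hypothetical, and the assignment's `train_loop`, its loss convention and whether it regularizes the bias column may differ.

```python
import torch
import torch.nn.functional as F

def sgd_train(X, y, num_classes, lr=0.5, reg=1e-3, n_iter=100, batch_size=64, seed=0):
    """Minibatch SGD on a linear softmax classifier with an L2 penalty reg * ||W||^2."""
    gen = torch.Generator().manual_seed(seed)
    W = (0.001 * torch.randn(num_classes, X.shape[1], generator=gen)).requires_grad_()
    history = []
    for _ in range(n_iter):
        idx = torch.randint(0, X.shape[0], (batch_size,), generator=gen)  # sample a batch
        scores = X[idx] @ W.T
        loss = F.cross_entropy(scores, y[idx]) + reg * (W ** 2).sum()     # data loss + L2 penalty
        loss.backward()                      # fills W.grad
        with torch.no_grad():
            W -= lr * W.grad                 # gradient step
            W.grad.zero_()                   # reset the gradient for the next iteration
        history.append(loss.item())
    return W.detach(), history


# Synthetic, linearly separable toy data (labels come from a random "true" weight matrix)
torch.manual_seed(0)
X = torch.randn(500, 5)
y = (X @ torch.randn(3, 5).T).argmax(dim=1)
W, history = sgd_train(X, y, num_classes=3)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```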

You can run the below cell to verify the functionality of your implementation.

In [None]:
reset_seeds()
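if has_GPU:
    torch.cuda.synchronize()  # make sure no GPU work is pending before starting the clock
start = time.time()
loss_history = svm.train_loop(X_train, y_train, lr=3e-9, reg=2.5e2, n_iter=1500, batch_size=200, verbose=True)
if has_GPU:
    torch.cuda.synchronize()  # wait for all GPU work to finish before stopping the clock
end = time.time()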

print(f"Train loop has finished in {end-start:.2f}s")

To find out whether our model is actually learning, we plot the loss value at every iteration.

In [None]:
plt.plot(loss_history)
plt.ylim(0, max(loss_history)+1)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

## Part 2.3: Hyperparameter Sweep (1pt)

Our current model has quite poor performance! We have also already seen the reason for this in the stagnating loss that we plotted above.

To solve this issue, it is time to determine better hyperparameters with a **hyperparameter sweep**.

Below, we have set up a grid search for the optimal learning rate and regularization strength (with the default values, $24 \times 15 = 360$ trained models).

Using the validation set, determine the optimal set of hyperparameters for our model and train it with these parameters.

Please do this step for **both** the **Softmax Classifier** and the **SVM**.

Also, please feel free to experiment with different values in the grid search! The current values are just a starting point.
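Stripped of the model details, the selection logic of a grid search looks roughly like this: train one model per (learning rate, regularization) pair, score it on the validation set only (never on the test set), and keep the best. The sketch is framework-agnostic; `train_fn` and `val_accuracy_fn` are hypothetical callables standing in for model construction plus `train_loop`, and for the validation accuracy computed with `predict`.

```python
from itertools import product

def grid_search(learning_rates, reg_strengths, train_fn, val_accuracy_fn):
    """Return (best_params, best_model, results) where results maps (lr, reg) -> val accuracy."""
    results = {}
    best_acc, best_params, best_model = float("-inf"), None, None
    for lr, reg in product(learning_rates, reg_strengths):
        model = train_fn(lr, reg)                 # train one model per grid point
        acc = val_accuracy_fn(model)              # evaluate on the validation set only
        results[(lr, reg)] = acc
        if acc > best_acc:                        # keep the best model seen so far
            best_acc, best_params, best_model = acc, (lr, reg), model
    return best_params, best_model, results


# Toy usage: the "model" is just its hyperparameters, and the made-up
# score peaks at lr=1e-3, reg=10
toy_params, toy_model, toy_results = grid_search(
    [1e-4, 1e-3, 1e-2], [1.0, 10.0, 100.0],
    train_fn=lambda lr, reg: (lr, reg),
    val_accuracy_fn=lambda m: -abs(m[0] - 1e-3) - abs(m[1] - 10.0),
)
print(toy_params)   # (0.001, 10.0)
```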

In [None]:
reset_seeds()
X_train, y_train, X_val, y_val, X_test, y_test = dataset.get_splits(DEVICE, include_bias=True, trainval_split=0.8, X_dtype=torch.float32)

results = {}
best_acc = -1
best_model = None
n_iter = 1500
iter_idx = 0
learning_rates = [1e-8 * (2.)**i for i in range(1,25)]
reg_strengths = [1e-7 * (6.)**i for i in range(0,15)]
num_models = len(learning_rates) * len(reg_strengths)

# model_type = SoftmaxClassifier
model_type = SVM

for i, (lr, reg) in enumerate(product(learning_rates, reg_strengths, desc=f"Training {num_models } Classifiers...")):
    model = model_type(IMAGE_DIMS, N_CLASSES, device=DEVICE, W_dtype=torch.float32)
    model.train_loop(X_train, y_train, lr=lr, reg=reg, n_iter=n_iter, batch_size=200, verbose=False)

    results[(lr, reg)] = None
    "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****"
    pass
    "# *****End OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****"
    
print(f"Best validation accuracy achieved during grid search: {best_acc}\n with the {str(best_model)}")

# Print out results into a file for debugging.
with open(f"{str(best_model)}_grid_search_losses.txt", "w") as f:
    for lr, reg in sorted(results):
        val_acc = results[(lr, reg)]
        f.write(f"lr {lr} reg {reg} val accuracy: {val_acc:.2f}\n")
best_model.save(f"best_{str(best_model)}.pth")

To visualize the results of the grid search, we plot the validation accuracy of every (learning rate, regularization strength) pair on logarithmic axes.


In [None]:
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]


marker_size = 100  # marker area in points^2
# color each point by its validation accuracy
colors = [results[x] for x in results]
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap='viridis')
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.gcf().set_size_inches(8, 5)
plt.show()

And finally, we evaluate our best models on the test set. To get full credit for the assignment, both of your best models (SVM and Softmax classifier) should reach an accuracy above 35%.
(As a reference, our best model achieved $\approx 40.88\%$ accuracy.)

In [None]:
best_svm = SVM(IMAGE_DIMS, N_CLASSES, W_dtype=torch.float32, device=DEVICE)
best_svm.load(f"best_{str(best_svm)}.pth")
y_test_pred = best_svm.predict(X_test)
test_accuracy = 100. * torch.mean((y_test == y_test_pred).double())
print(f"SVM final test set accuracy: {test_accuracy.item():.2f}%")


best_smc = SoftmaxClassifier(IMAGE_DIMS, N_CLASSES, W_dtype=torch.float32, device=DEVICE)
best_smc.load(f"best_{str(best_smc)}.pth")
y_test_pred = best_smc.predict(X_test)
test_accuracy = 100. * torch.mean((y_test == y_test_pred).double())
print(f"SoftmaxClassifier final test set accuracy: {test_accuracy.item():.2f}%")


# (Bonus) Part 3: Beyond Linear Classification

Learning a linear model for classification is a good learning exercise.
However, the semantics of an image are a highly non-linear function of its pixel values, so linear models are not expressive enough to learn them and classify images reliably.

We have seen that our linear models have a hard time classifying the CIFAR10 dataset. In this bonus exercise you will learn to go **beyond** the limitations of linearity.

In this bonus task you can experiment with two architectures:
- **Multi Layer Perceptron (MLP)** (1+0.5+0.5pts)
  - implement an MLP architecture including its forward pass (1pt)
  - train it with the previously implemented **Hinge Loss** (0.5pts)
  - train it with the previously implemented **Cross Entropy Loss** (0.5pts)
- **Convolutional Neural Network (CNN)** (1+0.5+0.5pts)
  - implement a CNN architecture with a classification head including its forward pass (1pt)
  - train it with the previously implemented **Hinge Loss** (0.5pts)
  - train it with the previously implemented **Cross Entropy Loss** (0.5pts)

For this bonus part of the assignment, we will give you a lot of freedom. You are allowed to use as many of the previously implemented functions as you like. 
The grading will be done based on the performance of your models on the test set and their implementation.
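As a possible starting point, here is a minimal sketch of both architectures built from standard `torch.nn` layers; the layer sizes are arbitrary choices, not recommendations. It assumes inputs *without* the bias column (presumably splits loaded with `include_bias=False`, since `nn.Linear` and `nn.Conv2d` carry their own biases) and, for the CNN, that each flattened image is stored channel-first ($3 \times 32 \times 32$); check how `CIFAR10Dataset` flattens images before relying on that reshape. Both models output raw class scores, so either the hinge loss or the cross-entropy loss can be applied to them.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two-layer perceptron: 3072 -> hidden -> 10 class scores."""
    def __init__(self, in_dim=3 * 32 * 32, hidden_dim=512, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),                                # the non-linearity a linear model lacks
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):                             # x: (N, 3072)
        return self.net(x)                            # (N, 10) raw scores


class SmallCNN(nn.Module):
    """Two conv blocks followed by a linear classification head."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.head = nn.Linear(64 * 8 * 8, n_classes)

    def forward(self, x):                             # x: (N, 3072), assumed channel-first
        x = x.reshape(-1, 3, 32, 32)
        x = self.features(x)
        return self.head(x.flatten(start_dim=1))      # (N, 10) raw scores


x = torch.randn(4, 3 * 32 * 32)
print(MLP()(x).shape, SmallCNN()(x).shape)   # torch.Size([4, 10]) torch.Size([4, 10])
```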


#### Get Creative!