# Problem Set 1 - Neural network implementation

As described in Section 3, "Neural network implementation", of Assignment 1, the goal is to build a shallow neural network from scratch using different approaches. To validate that your code works and that the network actually learns, use the MNIST classification task below. Finally, submit proof of the learning progress as described in the assignment.

## Imports

In [1]:
import random
import pandas as pd
import numpy as np
from sklearn import model_selection
import sklearn.datasets as sk_datasets
import torchvision.datasets as torch_datasets
from torchvision import transforms
import torch
import matplotlib.pyplot as plt

from scratch.network import Network
from scratch.res_network import ResNetwork
from scratch.utils import *
from pytorch.network import TorchNetwork



In [2]:
# Automatically load changes in imported modules
%load_ext autoreload
%autoreload 2

# Explicitly set seed for reproducibility
GLOBAL_RANDOM_STATE = 42

random.seed(GLOBAL_RANDOM_STATE)
np.random.seed(GLOBAL_RANDOM_STATE)

## A) Neural Network Classifier from Scratch

### Data

In [3]:
# Download MNIST dataset
x, y_cat = sk_datasets.fetch_openml('mnist_784', version=1, return_X_y=True, cache=True, as_frame=False)  # Fetch MNIST from OpenML via scikit-learn

# Preprocess dataset
x = (x / 255).astype('float32')  # Scale pixel values from [0, 255] to [0, 1]
y_cat = y_cat.astype(int)
# One-hot encode y
y = np.zeros((len(y_cat), 10))
for i, val in enumerate(y_cat):
    y[i, val] = 1

# Use only small subset of data for faster training
x = x[:1000]
y = y[:1000]

# Split data into train and validation set
x_train, x_val, y_train, y_val = model_selection.train_test_split(x, y, test_size=0.2, random_state=GLOBAL_RANDOM_STATE)
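
The explicit one-hot loop in the cell above can also be expressed with NumPy indexing. The following is a minimal sketch, not part of the assignment code; `one_hot` and `num_classes` are illustrative names introduced here.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[labels]

y_demo = one_hot([3, 0, 9])
print(y_demo.shape)  # (3, 10)
```

The result has the same shape and dtype (`float64`) as the array built by the loop, so it could be used as a drop-in replacement for `y`.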





### ML Model & Training

In [24]:
fnn = Network(sizes=[784, 128, 64, 10], learning_rate=0.1, epochs=30)
fnn.fit(x_train, y_train, x_val, y_val, cosine_annealing_lr=False)

Epoch: 1, Training Time: 0.33s, Training Accuracy: 58.00%, Validation Accuracy: 57.50%
Epoch: 2, Training Time: 0.66s, Training Accuracy: 80.12%, Validation Accuracy: 74.00%
Epoch: 3, Training Time: 1.00s, Training Accuracy: 85.88%, Validation Accuracy: 79.50%
Epoch: 4, Training Time: 1.35s, Training Accuracy: 89.25%, Validation Accuracy: 82.00%
Epoch: 5, Training Time: 1.67s, Training Accuracy: 91.88%, Validation Accuracy: 83.50%
Epoch: 6, Training Time: 2.00s, Training Accuracy: 94.38%, Validation Accuracy: 84.50%
Epoch: 7, Training Time: 2.35s, Training Accuracy: 95.50%, Validation Accuracy: 84.50%
Epoch: 8, Training Time: 2.70s, Training Accuracy: 96.00%, Validation Accuracy: 85.00%
Epoch: 9, Training Time: 3.03s, Training Accuracy: 97.50%, Validation Accuracy: 85.50%
Epoch: 10, Training Time: 3.35s, Training Accuracy: 98.38%, Validation Accuracy: 86.50%
Epoch: 11, Training Time: 3.71s, Training Accuracy: 98.62%, Validation Accuracy: 86.50%
Epoch: 12, Training Time: 4.04s, Training
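
The `Network` class is imported from `scratch.network` and its code is not shown in this notebook. For orientation, one stochastic gradient descent step of such a fully connected network could look roughly like the sketch below. It assumes sigmoid hidden layers, a softmax output, and cross-entropy loss; the actual `Network` class may differ on all of these, and every function name here is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_params(sizes, rng):
    # One (weights, bias) pair per layer, weights scaled by 1/sqrt(fan_in)
    return [(rng.normal(0.0, np.sqrt(1.0 / n_in), (n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Returns the activations of every layer, starting with the input itself
    activations = [x]
    for i, (W, b) in enumerate(params):
        z = W @ activations[-1] + b
        is_output = i == len(params) - 1
        activations.append(softmax(z) if is_output else sigmoid(z))
    return activations

def sgd_step(params, x, y, lr):
    # Updates `params` in place using a single (x, y) sample
    activations = forward(params, x)
    # Gradient of cross-entropy w.r.t. the softmax pre-activation
    delta = activations[-1] - y
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = np.outer(delta, activations[i])
        grad_b = delta
        if i > 0:
            a = activations[i]
            # Backpropagate through the old weights and the sigmoid derivative
            delta = (W.T @ delta) * a * (1.0 - a)
        params[i] = (W - lr * grad_W, b - lr * grad_b)

rng = np.random.default_rng(42)
params = init_params([784, 128, 64, 10], rng)
x_demo = rng.random(784)   # stand-in for one flattened MNIST image
y_demo = np.eye(10)[3]     # one-hot target for class 3
sgd_step(params, x_demo, y_demo, lr=0.1)
print(forward(params, x_demo)[-1].argmax())
```

Repeating `sgd_step` over all training samples for several epochs gives the kind of training loop whose progress is logged above.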

### Test cosine annealing scheduler

In [5]:
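# Note: this reuses the `fnn` instance trained above, so training appears to continue from the already-fitted weights
fnn.fit(x_train, y_train, x_val, y_val, cosine_annealing_lr=True)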

Epoch: 1, Training Time: 0.27s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 2, Training Time: 0.53s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 3, Training Time: 0.80s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 4, Training Time: 1.07s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 5, Training Time: 1.33s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 6, Training Time: 1.60s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 7, Training Time: 1.86s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 8, Training Time: 2.13s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 9, Training Time: 2.39s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 10, Training Time: 2.66s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 11, Training Time: 2.93s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 12, Training Time: 3.19
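
How `fit` implements `cosine_annealing_lr=True` is not shown in this notebook. A common closed form of cosine annealing without restarts is sketched below; the function name and the `lr_min` parameter are assumptions made for illustration.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along half a cosine wave."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

# Schedule for the settings used above (learning_rate=0.1, epochs=30)
schedule = [cosine_annealing_lr(e, 30, lr_max=0.1) for e in range(31)]
print(schedule[0], schedule[15], schedule[30])
```

The rate starts at `lr_max`, passes roughly the midpoint halfway through training, and reaches `lr_min` in the final epoch.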

### Test residual neural network

In [6]:
res_nn = ResNetwork(sizes=[784, 128, 128, 10], learning_rate=1, epochs=30)
res_nn.fit(x_train, y_train, x_val, y_val)

⚠️ Residual connection skipped at Layer 1 (shape mismatch)
 Residual connection applied at Layer 2
⚠️ Residua
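
The log above suggests that `ResNetwork` adds a skip connection only where a layer's input and output have the same shape. A minimal sketch of that idea follows; it omits biases and uses `tanh` as the activation, both of which are assumptions, and `residual_forward` is an illustrative name rather than the class's actual method.

```python
import numpy as np

def residual_forward(weights, x):
    """Forward pass that adds a skip connection wherever shapes allow it."""
    a = x
    applied = []  # per hidden layer: was the skip connection used?
    for W in weights[:-1]:
        out = np.tanh(W @ a)
        if out.shape == a.shape:
            out = out + a          # residual connection: F(a) + a
            applied.append(True)
        else:
            applied.append(False)  # e.g. 784 -> 128 cannot be added element-wise
        a = out
    logits = weights[-1] @ a       # output layer, no skip connection
    return logits, applied

rng = np.random.default_rng(42)
sizes = [784, 128, 128, 10]
weights = [rng.normal(0.0, 0.01, (n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
logits, applied = residual_forward(weights, rng.random(784))
print(applied)  # [False, True]
```

With `sizes=[784, 128, 128, 10]`, the first hidden layer changes the dimension from 784 to 128, so only the second hidden layer (128 to 128) can use the identity shortcut, which matches the messages in the log.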

## B) Neural Network Classifier using Torch

### Data

In [7]:
# Define data preprocessing steps
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

# Download MNIST dataset
train_set = torch_datasets.MNIST('data', train=True, download=True, transform=transform)
val_set = torch_datasets.MNIST('data', train=False, download=True, transform=transform)

# Use only small subset of data for faster training
train_set = torch.utils.data.Subset(train_set, range(1000))
val_set = torch.utils.data.Subset(val_set, range(1000))

# Use PyTorch DataLoader for simplified & harmonized loading of data
train_loader = torch.utils.data.DataLoader(train_set, batch_size=1)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=1)


### ML Model & Training

In [21]:
torch_nn = TorchNetwork(sizes=[784, 128, 64, 10], learning_rate=0.1, epochs=30, random_state=GLOBAL_RANDOM_STATE)
torch_nn.fit(train_loader, val_loader)

Epoch: 1, Training Time: 1.11s, Learning Rate: 0.1, Training Accuracy: 24.80%, Validation Accuracy: 23.30%
Epoch: 2, Training Time: 2.13s, Learning Rate: 0.1, Training Accuracy: 61.80%, Validation Accuracy: 54.90%
Epoch: 3, Training Time: 3.33s, Learning Rate: 0.1, Training Accuracy: 63.80%, Validation Accuracy: 54.40%
Epoch: 4, Training Time: 4.42s, Learning Rate: 0.1, Training Accuracy: 64.70%, Validation Accuracy: 55.50%
Epoch: 5, Training Time: 5.72s, Learning Rate: 0.1, Training Accuracy: 66.30%, Validation Accuracy: 56.80%
Epoch: 6, Training Time: 6.82s, Learning Rate: 0.1, Training Accuracy: 66.80%, Validation Accuracy: 60.40%
Epoch: 7, Training Time: 8.09s, Learning Rate: 0.1, Training Accuracy: 76.80%, Validation Accuracy: 69.30%
Epoch: 8, Training Time: 9.16s, Learning Rate: 0.1, Training Accuracy: 62.80%, Validation Accuracy: 55.00%
Epoch: 9, Training Time: 10.41s, Learning Rate: 0.1, Training Accuracy: 65.40%, Validation Accuracy: 55.30%
Epoch: 10, Training Time: 11.48s, Le

## C) Visualize accuracy & hyperparameter tuning

Here, you should compare the accuracy of all trained models. Optionally, you can also show the results of hyperparameter tuning and comment on which hyperparameters work best for this task.
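
One possible way to set up the comparison is sketched below. Whether `fit` returns or stores its accuracy history is not shown in this notebook, so the sketch assumes the per-epoch validation accuracies are collected by hand; the values are copied from the complete log lines printed above, and `best_epoch` and `plot_histories` are illustrative helper names.

```python
def best_epoch(val_acc):
    """Return (1-based epoch, accuracy) of the highest validation accuracy."""
    best = max(val_acc)
    return val_acc.index(best) + 1, best

def plot_histories(histories):
    """Plot one validation-accuracy curve per model (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    for name, val_acc in histories.items():
        ax.plot(range(1, len(val_acc) + 1), val_acc, marker="o", label=name)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Validation accuracy (%)")
    ax.legend()
    return fig

# Validation accuracies taken from the untruncated log lines above
histories = {
    "scratch": [57.50, 74.00, 79.50, 82.00, 83.50, 84.50, 84.50, 85.00, 85.50, 86.50, 86.50],
    "torch": [23.30, 54.90, 54.40, 55.50, 56.80, 60.40, 69.30, 55.00, 55.30],
}
for name, val_acc in histories.items():
    epoch, acc = best_epoch(val_acc)
    print(f"{name}: best validation accuracy {acc:.2f}% at epoch {epoch}")
```

Calling `plot_histories(histories)` in a notebook cell would then draw both curves on one set of axes, which makes differences in convergence speed and stability easier to see than in the raw logs.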