# Problem Set 1 - Neural network implementation

As described in section "3 Neural network implementation" of assignment 1, the goal is to build a shallow neural network from scratch using different approaches. To validate that your code works and that the network is actually learning, use the MNIST classification task below. Finally, submit proof of the learning progress as described in the assignment.

## Imports

In [None]:
import random
import pandas as pd
import numpy as np
from sklearn import model_selection
import sklearn.datasets as sk_datasets
import torchvision.datasets as torch_datasets
from torchvision import transforms
import torch
import matplotlib.pyplot as plt

from scratch.network import Network
from scratch.res_network import ResNetwork
from pytorch.network import TorchNetwork  # used in section B; module path as shown in the autoreload traceback there
from scratch.utils import *

In [6]:
# Automatically load changes in imported modules
%load_ext autoreload
%autoreload 2

# Explicitly set seed for reproducibility
GLOBAL_RANDOM_STATE = 42

random.seed(GLOBAL_RANDOM_STATE)
np.random.seed(GLOBAL_RANDOM_STATE)

## A) Neural Network Classifier from Scratch

### Data

In [7]:
# Download MNIST dataset
x, y_cat = sk_datasets.fetch_openml('mnist_784', version=1, return_X_y=True, cache=True, as_frame=False)  # Fetch MNIST from OpenML via scikit-learn

# Preprocess dataset
x = (x / 255).astype('float32')  # Divide each pixel value by 255 to rescale it to [0, 1]
y_cat = y_cat.astype(int)
# One-hot encode y
y = np.zeros((len(y_cat), 10))
for i, val in enumerate(y_cat):
    y[i, val] = 1

# Use only small subset of data for faster training
x = x[:1000]
y = y[:1000]

# Split data into train and validation set
x_train, x_val, y_train, y_val = model_selection.train_test_split(x, y, test_size=0.2, random_state=GLOBAL_RANDOM_STATE)
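
The loop-based one-hot encoding in the cell above can also be written as a single indexing operation. The following is a minimal sketch, assuming integer labels in the range 0 to 9; `labels` is a hypothetical stand-in for `y_cat`, not data from the notebook.

```python
import numpy as np

# Hypothetical stand-in for y_cat: integer class labels in the range 0..9
labels = np.array([5, 0, 4, 1, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for class i,
# so indexing with the label array yields one row per sample
one_hot = np.eye(10)[labels]

print(one_hot.shape)           # (5, 10)
print(one_hot.argmax(axis=1))  # [5 0 4 1 9]
```

Taking `argmax` along axis 1 recovers the original labels, which is a quick sanity check for either version of the encoding.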





### ML Model & Training

In [8]:
fnn = Network(sizes=[784, 128, 64, 10], learning_rate=0.1, epochs=30)
fnn.fit(x_train, y_train, x_val, y_val, cosine_annealing_lr=False)

Epoch: 1, Training Time: 0.28s, Training Accuracy: 58.00%, Validation Accuracy: 57.50%
Epoch: 2, Training Time: 0.54s, Training Accuracy: 80.12%, Validation Accuracy: 74.00%
Epoch: 3, Training Time: 0.81s, Training Accuracy: 85.88%, Validation Accuracy: 79.50%
Epoch: 4, Training Time: 1.08s, Training Accuracy: 89.25%, Validation Accuracy: 82.00%
Epoch: 5, Training Time: 1.34s, Training Accuracy: 91.88%, Validation Accuracy: 83.50%
Epoch: 6, Training Time: 1.61s, Training Accuracy: 94.38%, Validation Accuracy: 84.50%
Epoch: 7, Training Time: 1.89s, Training Accuracy: 95.50%, Validation Accuracy: 84.50%
Epoch: 8, Training Time: 2.15s, Training Accuracy: 96.00%, Validation Accuracy: 85.00%
Epoch: 9, Training Time: 2.42s, Training Accuracy: 97.50%, Validation Accuracy: 85.50%
Epoch: 10, Training Time: 2.71s, Training Accuracy: 98.38%, Validation Accuracy: 86.50%
Epoch: 11, Training Time: 3.03s, Training Accuracy: 98.62%, Validation Accuracy: 86.50%
Epoch: 12, Training Time: 3.32s, Training

### Test cosine annealing scheduler

In [9]:
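# fnn is reused from the previous cell, so training appears to continue from the
# already-trained weights (note the 100% training accuracy from epoch 1 below).
# Re-instantiate Network(...) first for a fair comparison of the two schedules.
fnn.fit(x_train, y_train, x_val, y_val, cosine_annealing_lr=True)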

Epoch: 1, Training Time: 0.27s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 2, Training Time: 0.54s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 3, Training Time: 0.81s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 4, Training Time: 1.08s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 5, Training Time: 1.35s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 6, Training Time: 1.62s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 7, Training Time: 1.88s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 8, Training Time: 2.16s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 9, Training Time: 2.43s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 10, Training Time: 2.71s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 11, Training Time: 2.97s, Training Accuracy: 100.00%, Validation Accuracy: 89.50%
Epoch: 12, Training Time: 3.24
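
The scheduler behind `cosine_annealing_lr=True` lives in `scratch.network` and is not shown in this notebook. As a rough sketch of the commonly used schedule, the learning rate at epoch t out of T can be computed as lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)). The function name `cosine_annealing` and the parameter `lr_min` are assumptions made for illustration.

```python
import math

def cosine_annealing(epoch, total_epochs, lr_max, lr_min=0.0):
    """Return the learning rate for a 0-indexed epoch (hypothetical helper)."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Settings taken from the Network(...) call earlier: learning_rate=0.1, epochs=30
schedule = [cosine_annealing(epoch, 30, 0.1) for epoch in range(30)]

print(f"first: {schedule[0]:.4f}, middle: {schedule[15]:.4f}, last: {schedule[-1]:.4f}")
```

The rate starts at `lr_max`, passes through the midpoint halfway through training, and decays smoothly towards `lr_min`, so late epochs take much smaller steps than a constant schedule would.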

### Test residual neural network

In [10]:
res_nn = ResNetwork(sizes=[784, 128, 128, 10], learning_rate=1, epochs=30)
res_nn.fit(x_train, y_train, x_val, y_val)

⚠️ Residual connection skipped at Layer 1 (shape mismatch)
 Residual connection applied at Layer 2
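
The log above is consistent with a skip connection that is only applied when a layer's input and output have the same shape: with `sizes=[784, 128, 128, 10]`, layer 1 maps 784 to 128 units (mismatch) and layer 2 maps 128 to 128 units (match). The `ResNetwork` source is not shown here, so the following is only a sketch under assumptions: a sigmoid activation, an identity shortcut, and a hypothetical helper named `residual_layer`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_layer(a_prev, weights):
    """Dense layer with an identity skip connection when shapes allow it.

    Hypothetical helper: returns the activation and whether the skip was applied.
    """
    a = sigmoid(weights @ a_prev)
    if a.shape == a_prev.shape:
        return a + a_prev, True   # identity shortcut added to the layer output
    return a, False               # shapes differ, so only the plain layer output is used

rng = np.random.default_rng(42)
sizes = [784, 128, 128, 10]
a = rng.random(sizes[0])          # random stand-in for one flattened image
applied = []

for layer, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    w = rng.standard_normal((n_out, n_in)) * np.sqrt(1.0 / n_in)
    a, used_skip = residual_layer(a, w)
    applied.append(used_skip)
    print(f"Layer {layer}: {n_in} -> {n_out}, residual applied: {used_skip}")
```

In this sketch the output layer (128 to 10) also skips the shortcut because of the shape mismatch. A projection matrix on the shortcut path would be one way to add residuals across layers of different widths.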

## B) Neural Network Classifier using Torch

### Data

In [24]:
# Define data preprocessing steps
transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.5,), (0.5,))
            ])

# Download MNIST dataset
train_set = torch_datasets.MNIST('data', train=True, download=True, transform=transform)
val_set = torch_datasets.MNIST('data', train=False, download=True, transform=transform)

# Use only small subset of data for faster training
train_set = torch.utils.data.Subset(train_set, range(1000))
val_set = torch.utils.data.Subset(val_set, range(1000))

# Utilize PyTorch DataLoader for simplified & harmonized loading of data
train_loader = torch.utils.data.DataLoader(train_set, batch_size=1)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=1)


[autoreload of pytorch.network failed: Traceback (most recent call last):
  File "/opt/anaconda3/envs/DL_2025/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 276, in check
    superreload(m, reload, self.old_objects)
  File "/opt/anaconda3/envs/DL_2025/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 475, in superreload
    module = reload(module)
  File "/opt/anaconda3/envs/DL_2025/lib/python3.10/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "<frozen importlib._bootstrap>", line 619, in _exec
  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1017, in get_code
  File "<frozen importlib._bootstrap_external>", line 947, in source_to_code
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/sattikiganguly/assignment-1-ps-1-e/assignment-1-ps-1-e/pytorch/network.py", line 61
    f'Learning Rate
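
In the preprocessing pipeline of the data cell above, `transforms.ToTensor()` scales pixel values to [0, 1], and `transforms.Normalize((0.5,), (0.5,))` then computes (x - mean) / std per channel. With mean 0.5 and std 0.5 the inputs end up in [-1, 1], which differs from the [0, 1] range used in section A. The sketch below reproduces only the arithmetic with NumPy on hypothetical sample values; it does not call torchvision.

```python
import numpy as np

# Raw 8-bit pixel intensities (hypothetical sample values)
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

scaled = pixels.astype(np.float32) / 255.0   # value range after ToTensor: [0, 1]
normalized = (scaled - 0.5) / 0.5            # Normalize with mean=0.5, std=0.5

print(normalized.min(), normalized.max())    # -1.0 1.0
```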

### ML Model & Training

In [25]:
torch_nn = TorchNetwork(sizes=[784, 128, 64, 10], learning_rate=0.2, epochs=50, random_state=GLOBAL_RANDOM_STATE)
torch_nn.fit(train_loader, val_loader)

TypeError: eq() received an invalid combination of arguments - got (NoneType, Tensor), but expected one of:
 * (Tensor input, Tensor other, *, Tensor out = None)
 * (Tensor input, Number other, *, Tensor out = None)


## C) Visualize accuracy & hyperparameter tuning

Here, you should compare the accuracy of all trained models. Optionally, you can also show the results of hyperparameter tuning and comment on which hyperparameters work best for this task.
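
One possible way to set up the comparison is a line plot of validation accuracy per epoch, with one line per model. The sketch below uses only the first eleven validation accuracies of the baseline run, copied by hand from the training log in section A. How `Network` exposes its accuracy history is not shown in this notebook, so the `histories` dictionary is a hypothetical container that would need one entry per trained model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt

# Validation accuracy (%) for epochs 1-11, copied from the training log in section A
histories = {
    "Scratch FNN (lr=0.1)": [57.5, 74.0, 79.5, 82.0, 83.5, 84.5, 84.5, 85.0, 85.5, 86.5, 86.5],
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, val_acc in histories.items():
    epochs = range(1, len(val_acc) + 1)
    ax.plot(epochs, val_acc, marker="o", label=name)

ax.set_xlabel("Epoch")
ax.set_ylabel("Validation accuracy (%)")
ax.set_title("Validation accuracy per model")
ax.legend()
fig.tight_layout()
```

Adding the residual and PyTorch runs as further dictionary entries puts all models on the same axes, which makes differences in convergence speed and final accuracy easy to read off.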

In [None]:
### BEGIN SOLUTION ###
 

### END SOLUTION ###