# ResNet-18 (MNIST) — built from this repo's CNN + DNN blocks

This notebook explains the **structure** of a ResNet-18 style network and how the provided implementation is assembled using:
- `CNN.layers.Layer_Conv2D_Im2Col` (convolution)
- `CNN.layers.Layer_MaxPool2D` / `Layer_Flatten` (basic CNN utilities)
- `DNN` modules (Dense, activations, loss, optimizers)

**Important constraint:** this implementation uses **NumPy only** — no `torch` imports.


## 1) Residual learning (the key idea)
A residual block computes:

\[ y = F(x) + x \] 

If shapes differ (e.g., downsampling), the shortcut becomes:

\[ y = F(x) + W_s x \] 

This makes deep networks easier to optimize: each block only needs to learn a *residual correction* `F(x)` on top of its input, and the shortcut gives gradients a direct path back to earlier layers.
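The two shortcut forms can be sketched in plain NumPy. This is a minimal illustration, not the repo's implementation: `residual_forward` is a hypothetical helper, the residual branch `F` is passed in as any callable, and the projection `W_s` is assumed to be a 1×1 channel-mixing matrix of shape `(C_out, C_in)`.

```python
import numpy as np

def residual_forward(x, F, W_s=None):
    """Compute y = F(x) + shortcut(x) for x of shape (N, C, H, W).

    Hypothetical helper for illustration only.
    """
    if W_s is None:
        shortcut = x  # identity shortcut: shapes already match
    else:
        # Projection shortcut: a 1x1 conv is a per-pixel channel mix,
        # (C_out, C_in) applied to (N, C_in, H, W) -> (N, C_out, H, W).
        shortcut = np.einsum("oc,nchw->nohw", W_s, x)
    return F(x) + shortcut

rng = np.random.default_rng(0)
x_res = rng.standard_normal((2, 4, 8, 8)).astype(np.float32)

# If the residual branch outputs zero, the block is exactly the identity.
y_identity = residual_forward(x_res, F=lambda t: np.zeros_like(t))

# Projection case: the branch changes the channel count from 4 to 6,
# so the shortcut must be projected to 6 channels as well.
W_F = rng.standard_normal((6, 4)).astype(np.float32)
W_s = rng.standard_normal((6, 4)).astype(np.float32)
y_proj = residual_forward(
    x_res,
    F=lambda t: np.einsum("oc,nchw->nohw", W_F, t),
    W_s=W_s,
)
```

With a zero residual branch the block passes its input through unchanged, which is why stacking more blocks does not have to make the function harder to represent.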

## 2) What a BasicBlock contains
In ResNet-18, a BasicBlock is:
- `conv3x3 -> bn -> relu`
- `conv3x3 -> bn`
- `+ shortcut`
- `relu`

In our code: `BasicBlock` is implemented in `ResNet/resnet18_numpy.py` and internally uses `Layer_Conv2D_Im2Col` + `Activation_ReLU` + a scratch `Layer_BatchNorm2D`.
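The order of operations listed above can be sketched with stand-in callables. This is only an illustration of the data flow: `basic_block_forward` and its arguments are hypothetical names, and the real `BasicBlock` may organize its layers differently.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0)

def basic_block_forward(x, conv1, bn1, conv2, bn2, shortcut=None):
    """conv -> bn -> relu -> conv -> bn -> (+ shortcut) -> relu.

    Hypothetical sketch; each layer is any callable on (N, C, H, W) arrays.
    """
    out = relu(bn1(conv1(x)))
    out = bn2(conv2(out))          # note: no ReLU before the addition
    skip = x if shortcut is None else shortcut(x)
    return relu(out + skip)        # final ReLU is applied after the sum

# Toy stand-ins so the sketch runs without the repo's layers.
passthrough = lambda t: t
rng = np.random.default_rng(0)
x_blk = rng.standard_normal((2, 16, 28, 28)).astype(np.float32)
y_blk = basic_block_forward(
    x_blk,
    conv1=lambda t: 0.5 * t, bn1=passthrough,
    conv2=lambda t: -t, bn2=passthrough,
)
```

The point to notice is that the second ReLU comes after the shortcut is added, so the block's output is always non-negative.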


In [1]:
import numpy as np
from ResNet.resnet18_numpy import BasicBlock

np.random.seed(0)
x = np.random.randn(2, 16, 28, 28).astype(np.float32)
blk = BasicBlock(in_ch=16, out_ch=16, stride=1)
blk.forward(x, training=True)
blk.output.shape

(2, 16, 28, 28)

## 3) BatchNorm2D (scratch)
For conv tensors `(N, C, H, W)`, BatchNorm2D normalizes per-channel over `(N, H, W)`.

Training uses batch mean/var; inference uses running mean/var.
We expose `weights`/`biases` for compatibility with the repo's optimizers (they correspond to `gamma`/`beta`).


In [2]:
from ResNet.resnet18_numpy import Layer_BatchNorm2D

bn = Layer_BatchNorm2D(n_channels=16)
bn.forward(x, training=True)
bn.output.mean(axis=(0,2,3))[:5], bn.output.var(axis=(0,2,3))[:5]

(array([-2.4328426e-09,  3.0410534e-09, -2.4328426e-09,  0.0000000e+00,
         4.8656852e-09], dtype=float32),
 array([0.99998975, 0.9999893 , 0.999989  , 0.9999898 , 0.9999895 ],
       dtype=float32))
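The per-channel normalization described in this section can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the code of `Layer_BatchNorm2D`: the function name, the `momentum` convention (weight on the old running value), and `eps=1e-5` are all assumptions chosen for illustration.

```python
import numpy as np

def batchnorm2d_forward(x, gamma, beta, running_mean, running_var,
                        training, momentum=0.9, eps=1e-5):
    """BatchNorm over (N, H, W) for each channel of x with shape (N, C, H, W).

    Hypothetical sketch; momentum and eps defaults are assumptions.
    """
    if training:
        mean = x.mean(axis=(0, 2, 3))   # shape (C,)
        var = x.var(axis=(0, 2, 3))     # biased variance, shape (C,)
        # Exponential moving averages, used later at inference time.
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mean, var = running_mean, running_var

    # Reshape (C,) -> (1, C, 1, 1) so the statistics broadcast per channel.
    shape = (1, -1, 1, 1)
    x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    out = gamma.reshape(shape) * x_hat + beta.reshape(shape)
    return out, running_mean, running_var

rng = np.random.default_rng(0)
x_bn = 3.0 * rng.standard_normal((8, 16, 28, 28)) + 2.0
gamma, beta = np.ones(16), np.zeros(16)
rm, rv = np.zeros(16), np.ones(16)

out_train, rm, rv = batchnorm2d_forward(x_bn, gamma, beta, rm, rv, training=True)
out_eval, _, _ = batchnorm2d_forward(x_bn, gamma, beta, rm, rv, training=False)
```

In training mode the normalized variance is `var / (var + eps)`, which is slightly below 1. The values of about 0.99999 in the cell output above are consistent with a small `eps` in the denominator when the batch variance is close to 1.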

## 4) The ResNet-18 style MNIST model
We adapt the stem for MNIST (28×28 inputs) by using a single 3×3 conv in place of the original 7×7 conv + maxpool.
The stem is followed by 4 stages of [2, 2, 2, 2] BasicBlocks with channel widths [16, 32, 64, 128].
Finally, global average pooling and a Dense layer map the features to 10 classes.
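To make the layout concrete, the feature-map shapes can be traced with simple arithmetic. The strides are an assumption here: the sketch uses the usual ResNet pattern (stride 1 in the first stage, stride 2 at the start of each later stage) with 3×3 convs and padding 1, which the text above does not state explicitly. The random head weights are placeholders, not trained parameters.

```python
import numpy as np

def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a conv layer (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

channels = [16, 32, 64, 128]
strides = [1, 2, 2, 2]  # ASSUMPTION: standard ResNet downsampling pattern

size = conv_out_size(28)  # stem: 3x3 conv, stride 1, padding 1 (assumed)
stage_shapes = []
for ch, s in zip(channels, strides):
    # Only the first conv of a stage downsamples; the rest keep the size.
    size = conv_out_size(size, stride=s)
    stage_shapes.append((ch, size, size))

# Global average pooling collapses H and W, leaving one value per channel.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, *stage_shapes[-1])).astype(np.float32)
pooled = feat.mean(axis=(2, 3))                      # (N, 128)

# Dense head to 10 classes (placeholder weights for shape checking only).
W_head = rng.standard_normal((128, 10)).astype(np.float32)
b_head = np.zeros(10, dtype=np.float32)
logits = pooled @ W_head + b_head                    # (N, 10)
```

Under these assumptions the stages produce `(16, 28, 28)`, `(32, 14, 14)`, `(64, 7, 7)` and `(128, 4, 4)` feature maps, and the pooled vector has 128 entries per sample before the Dense layer.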


In [3]:
from ResNet.resnet18_numpy import ResNet18MNIST

model = ResNet18MNIST(num_classes=10)
x0 = np.random.randn(4, 1, 28, 28).astype(np.float32)
model.forward(x0, training=True)
model.output.shape

(4, 10)

## 5) Training entrypoint
Use `ResNet/train_mnist.py`. It loads MNIST via a NumPy-only downloader (`mnist_data.py`) and trains with the repo's cross-entropy + Adam optimizer.

For a quick CPU run:
```bash
python3 -m ResNet.train_mnist --epochs 1 --subset 5000 --batch-size 64 --lr 1e-3
```
