# Sanity Checks — CNN-from-scratch


### What this notebook does
1. Verifies imports and environment.
2. Runs `im2col/col2im` adjoint check.
3. Builds a small model (LeNet for MNIST) and runs a forward pass.
4. Runs a single optimizer step and checks that the loss stays finite.
5. Tries to overfit a tiny subset for a few iterations (loss should go down fast).
6. Prints gradient norms to catch NaNs or explosions.
7. Optionally saves a checkpoint.


In [1]:
import sys, os, platform, numpy as np
print('Python:', sys.version)
print('Platform:', platform.platform())
print('CWD:', os.getcwd())
print('Sys.path[0]:', sys.path[0])

# Ensure project root (folder containing `src/`) is on sys.path
ROOT = os.path.abspath(os.getcwd())
if not os.path.isdir(os.path.join(ROOT, 'src')):
    # If the notebook sits in a subfolder, try one level up
    ROOT = os.path.abspath(os.path.join(os.getcwd(), '..'))
if os.path.isdir(os.path.join(ROOT, 'src')) and ROOT not in sys.path:
    sys.path.insert(0, ROOT)
print('Project root assumed:', ROOT)
print('Has src/?', os.path.isdir(os.path.join(ROOT, 'src')))


Python: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:42:04) [MSC v.1943 64 bit (AMD64)]
Platform: Windows-10-10.0.26100-SP0
CWD: c:\Users\arnov\Desktop\CNN-from-scratch\notebooks
Sys.path[0]: c:\Users\arnov\miniconda3\envs\cnn-from-scratch\python310.zip
Project root assumed: c:\Users\arnov\Desktop\CNN-from-scratch
Has src/? True
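
The cell above only looks in the current directory and one level up. If the notebook is ever moved deeper into the tree, a more general search could walk up through every ancestor. The sketch below is one way to do that; `find_project_root` and its `marker` argument are hypothetical names, not part of the project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains a `marker` directory."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # no ancestor contains the marker directory


root = find_project_root(Path.cwd())
print("Project root:", root if root is not None else "not found")
```

If a root is found, `str(root)` can be inserted into `sys.path` the same way `ROOT` is above.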


In [None]:
# Core imports from your project
from src.core.utils import im2col, col2im, one_hot, make_batches, set_seed
from src.core.losses import softmax_cross_entropy, softmax_cross_entropy_backward
from src.core.metrics import accuracy
from src.core.optim import Adam
from src.layers.conv2d import Conv2D
from src.layers.dense import Dense
from src.layers.activations import ReLU
from src.layers.pooling import MaxPool2D
from src.layers.batchnorm import BatchNorm2D
from src.layers.dropout import Dropout
from src.models.convnet_small import lenet_mnist
from src.models.sequential import Sequential
from src.data.mnist import load_mnist
print('Project imports OK')


✅ Project imports OK


## 1) `im2col/col2im` adjoint check
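We verify the identity `<im2col(x), C> == <x, col2im(C)>` for a random `x` and a random `C` with the same shape as `im2col(x)`. The identity says that `col2im` is the adjoint (transpose) of `im2col` as a linear map, which an im2col-based convolution relies on in its backward pass. It should hold for any input shape, kernel size, stride, and padding.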


In [None]:
set_seed(0)
x = np.random.randn(2, 3, 5, 5).astype(np.float64)
KH, KW, stride, pad = 3, 3, 1, 1
cols = im2col(x, (KH, KW), stride=stride, pad=pad).astype(np.float64)
C = np.random.randn(*cols.shape).astype(np.float64)
lhs = np.sum(cols * C)
rhs = np.sum(x * col2im(C, x.shape, (KH, KW), stride=stride, pad=pad))
print('Adjoint diff:', float(abs(lhs - rhs)))
assert np.allclose(lhs, rhs, rtol=1e-10, atol=1e-10)
print('im2col/col2im adjoint OK')


Adjoint diff: 1.4210854715202004e-14
✅ im2col/col2im adjoint OK
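
To make the identity concrete, here is a minimal loop-based sketch of a gather/scatter pair that satisfies it. `ref_im2col` and `ref_col2im` are hypothetical reference functions, not the project's `im2col`/`col2im`, and the column layout used here (one row per output position, `C*KH*KW` entries per row) is an assumption that may differ from the project's. Gathering patches and scatter-adding them back are transposes of each other, so the two inner products agree up to rounding.

```python
import numpy as np


def _out_size(size, k, stride, pad):
    # Standard convolution output size along one spatial axis.
    return (size + 2 * pad - k) // stride + 1


def ref_im2col(x, kh, kw, stride=1, pad=0):
    """Gather every kh x kw patch of x (N, C, H, W) into the rows of a 2-D matrix."""
    n, c, h, w = x.shape
    oh, ow = _out_size(h, kh, stride, pad), _out_size(w, kw, stride, pad)
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))  # zero padding
    cols = np.empty((n * oh * ow, c * kh * kw), dtype=x.dtype)
    row = 0
    for i in range(n):
        for r in range(oh):
            for s in range(ow):
                r0, s0 = r * stride, s * stride
                cols[row] = xp[i, :, r0:r0 + kh, s0:s0 + kw].ravel()
                row += 1
    return cols


def ref_col2im(cols, x_shape, kh, kw, stride=1, pad=0):
    """Scatter-add each row of cols back to its patch location, then crop the padding."""
    n, c, h, w = x_shape
    oh, ow = _out_size(h, kh, stride, pad), _out_size(w, kw, stride, pad)
    xp = np.zeros((n, c, h + 2 * pad, w + 2 * pad), dtype=cols.dtype)
    row = 0
    for i in range(n):
        for r in range(oh):
            for s in range(ow):
                r0, s0 = r * stride, s * stride
                xp[i, :, r0:r0 + kh, s0:s0 + kw] += cols[row].reshape(c, kh, kw)
                row += 1
    return xp[:, :, pad:pad + h, pad:pad + w]


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 5))
for kh, kw, stride, pad in [(3, 3, 1, 1), (2, 2, 2, 0), (3, 2, 2, 1)]:
    cols = ref_im2col(x, kh, kw, stride, pad)
    C = rng.standard_normal(cols.shape)
    lhs = np.sum(cols * C)
    rhs = np.sum(x * ref_col2im(C, x.shape, kh, kw, stride, pad))
    print(f"k=({kh},{kw}) stride={stride} pad={pad}: |lhs - rhs| = {abs(lhs - rhs):.2e}")
```

Note that `col2im` is an adjoint, not an inverse: where patches overlap, `ref_col2im(ref_im2col(x))` sums the overlapping contributions and does not return `x`.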


## 2) Build LeNet for MNIST and forward pass
We build the model, run a forward pass in training mode on random float32 inputs (float32 for speed), and check that the logits have the expected shape and contain only finite values.


In [None]:
import numpy as np
set_seed(42)
model = lenet_mnist(num_classes=10)
# Switch the model to training mode
model.train()
dummy = np.random.randn(8, 1, 28, 28).astype(np.float32)
logits = model.forward(dummy, training=True)
print('Logits shape:', logits.shape)
print('Finite check:', np.isfinite(logits).all())
assert logits.shape == (8, 10)
assert np.isfinite(logits).all()
print('Forward pass OK')


Logits shape: (8, 10)
Finite check: True
✅ Forward pass OK


## 3) Single optimizer step on random labels
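We compute the cross-entropy (CE) loss against random labels and take a single Adam step. The cell only asserts that both losses are finite. One step often lowers the loss slightly, but an increase, as in the recorded output, is not a failure on its own.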


In [None]:
y = np.random.randint(0, 10, size=(8,))
y1 = one_hot(y, 10)
loss0 = softmax_cross_entropy(logits, y1)
grad_logits = softmax_cross_entropy_backward(logits, y1)
dx = model.backward(grad_logits)
opt = Adam(lr=1e-3)
opt.step(model.params(), model.grads())
logits1 = model.forward(dummy, training=True)
loss1 = softmax_cross_entropy(logits1, y1)
print('Loss before:', float(loss0), 'after one step:', float(loss1))
print('Grad norm dX:', float(np.linalg.norm(dx)))
assert np.isfinite(loss0) and np.isfinite(loss1)
print('Single step OK')


Loss before: 4.136271525692004 after one step: 5.437370820656147
Grad norm dX: 0.9990653157659999
✅ Single step OK
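
One possible contributor to a loss increase after a single step is the size of the first Adam update. With the standard bias-corrected rule, the very first step moves every parameter by roughly `lr` regardless of how large or small its gradient is. The sketch below is a textbook Adam update, not the project's `Adam` class; whether the project applies bias correction is an assumption here, and `adam_step` is a hypothetical helper.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


param = np.zeros(3)
grad = np.array([1e-3, -1.0, 100.0])            # gradients spanning five orders of magnitude
new_param, m, v = adam_step(param, grad, m=np.zeros(3), v=np.zeros(3), t=1)

# At t=1, m_hat equals grad and sqrt(v_hat) equals |grad|, so each entry
# moves by about lr in the direction opposite to its gradient.
print("update:", new_param - param)
```

Because the step does not shrink with the gradient, comparing the loss before and after one update is a weak signal; the trend over several steps is more informative.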


## 4) Tiny overfit on MNIST subset (optional)
We try to overfit a tiny subset of MNIST (the first 256 training samples) with five full-batch Adam steps. Loss should go down quickly.

**Note:** This cell downloads MNIST into `data/mnist/`. If your network blocks the download, copy the files there manually first.


In [None]:
try:
    (Xtr, ytr), (Xval, yval), (_, _), num_classes = load_mnist()
    # take a tiny subset
    n_small = 256
    Xs, ys = Xtr[:n_small].astype(np.float32), ytr[:n_small]
    model = lenet_mnist(num_classes=num_classes)
    model.train()
    opt = Adam(lr=1e-3)
    for epoch in range(1, 6):
        # full-batch Adam update: the whole tiny set is one batch
        logits = model.forward(Xs, training=True)
        y1 = one_hot(ys, num_classes)
        loss = softmax_cross_entropy(logits, y1)
        grad = softmax_cross_entropy_backward(logits, y1)
        _ = model.backward(grad)
        opt.step(model.params(), model.grads())
        preds = np.argmax(logits, axis=1)
        acc = (preds == ys).mean()
        print(f"epoch {epoch}: loss={loss:.4f} acc={acc:.3f}")
    print('Tiny overfit completed (loss should trend down)')
except Exception as e:
    print('MNIST tiny overfit skipped due to error:', e)


epoch 1: loss=2.4667 acc=0.141
epoch 2: loss=2.2397 acc=0.188
epoch 3: loss=2.1194 acc=0.227
epoch 4: loss=2.0754 acc=0.246
epoch 5: loss=2.0133 acc=0.289
✅ Tiny overfit completed (loss should trend down)
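
A useful reference point for reading these numbers is chance level: a classifier that assigns equal probability to all 10 classes has cross-entropy ln(10) ≈ 2.303 and an expected accuracy of 0.1. The recorded run starts near that level and moves to about 2.01 after five steps, which shows a downward trend but not yet a full overfit. The sketch below computes the chance-level loss with a standalone, numerically stable cross-entropy. `reference_softmax_ce` is a hypothetical helper that takes integer labels and averages over the batch; the project's `softmax_cross_entropy` takes one-hot labels, and its reduction is assumed, not confirmed, to be a mean.

```python
import numpy as np


def reference_softmax_ce(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits), computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


num_classes = 10
labels = np.random.default_rng(0).integers(0, num_classes, size=256)
uniform_logits = np.zeros((256, num_classes))    # equal probability for every class
chance_loss = reference_softmax_ce(uniform_logits, labels)
print(f"chance-level loss: {chance_loss:.4f}")
print(f"ln(num_classes):   {np.log(num_classes):.4f}")
```

A first-epoch loss far above ln(10) usually points at large initial logits; a loss that stays at ln(10) suggests the parameters are not being updated.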


## 5) Gradient norm diagnostics
Print parameter and gradient norms to spot NaNs or exploding/vanishing gradients.


In [None]:
model = lenet_mnist(num_classes=10)
model.train()
xb = np.random.randn(32, 1, 28, 28).astype(np.float32)
yb = np.random.randint(0, 10, size=(32,))
logits = model.forward(xb, training=True)
y1 = one_hot(yb, 10)
loss = softmax_cross_entropy(logits, y1)
grad = softmax_cross_entropy_backward(logits, y1)
_ = model.backward(grad)

print('Loss:', float(loss))
for k, v in model.params().items():
    g = model.grads().get(k)
    g_norm = np.linalg.norm(g) if g is not None else float('nan')
    print(f"{k:40s} | param_norm={np.linalg.norm(v):.3e} | grad_norm={g_norm:.3e}")
print('Gradient norms printed')


Loss: 6.56899767290372
0.Conv2D.W                               | param_norm=3.333e+00 | grad_norm=3.135e+00
0.Conv2D.b                               | param_norm=0.000e+00 | grad_norm=1.483e+00
3.Conv2D.W                               | param_norm=5.664e+00 | grad_norm=1.487e+01
3.Conv2D.b                               | param_norm=0.000e+00 | grad_norm=8.505e-01
6.Dense.W                                | param_norm=1.547e+01 | grad_norm=2.094e+01
6.Dense.b                                | param_norm=0.000e+00 | grad_norm=4.924e-01
8.Dense.W                                | param_norm=1.313e+01 | grad_norm=9.745e+00
8.Dense.b                                | param_norm=0.000e+00 | grad_norm=4.216e-01
11.Dense.W                               | param_norm=4.446e+00 | grad_norm=8.279e+00
11.Dense.b                               | param_norm=0.000e+00 | grad_norm=3.502e-01
✅ Gradient norms printed
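
The table above is read by eye. For an automated check, the same information can be reduced to a global gradient norm plus lists of missing or non-finite gradients. The sketch below assumes, as the loop above suggests, that parameters and gradients are dicts of NumPy arrays keyed by name; `gradient_report` is a hypothetical helper and is shown on toy dicts, not on the model.

```python
import numpy as np


def gradient_report(params, grads):
    """Summarize gradient health for dicts of arrays keyed by parameter name."""
    missing = [k for k in params if grads.get(k) is None]
    non_finite = [k for k, g in grads.items()
                  if g is not None and not np.isfinite(g).all()]
    # Accumulate in float64 so float32 gradients do not lose precision.
    sq_sum = sum(float(np.sum(g.astype(np.float64) ** 2))
                 for g in grads.values() if g is not None)
    return {"global_grad_norm": float(np.sqrt(sq_sum)),
            "missing": missing,
            "non_finite": non_finite}


toy_params = {"w": np.ones((2, 2), dtype=np.float32), "b": np.zeros(2, dtype=np.float32)}
healthy = {"w": np.array([[3.0, 0.0], [0.0, 0.0]], dtype=np.float32),
           "b": np.array([4.0, 0.0], dtype=np.float32)}
broken = {"w": healthy["w"], "b": np.array([np.nan, 0.0], dtype=np.float32)}

print(gradient_report(toy_params, healthy))
print(gradient_report(toy_params, broken))
```

With the model, the call would be `gradient_report(model.params(), model.grads())`, and a non-empty `missing` or `non_finite` list is a reason to stop and debug.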


## 6) Optional: save a quick checkpoint
We save all current parameters to `checkpoints/sanity_check.npz` for later debugging.


In [8]:
import os, numpy as np
os.makedirs('checkpoints', exist_ok=True)
np.savez('checkpoints/sanity_check.npz', **model.params())
print('Saved to checkpoints/sanity_check.npz')


Saved to checkpoints/sanity_check.npz
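
A saved checkpoint is only useful if it loads back unchanged. The sketch below runs a save/load round trip on toy arrays in a temporary directory. `checkpoint_round_trip_ok` is a hypothetical helper, and the toy key names merely imitate the naming seen in the gradient table above.

```python
import os
import tempfile

import numpy as np


def checkpoint_round_trip_ok(params, path):
    """Save a dict of arrays with np.savez, reload it, and compare every entry."""
    np.savez(path, **params)
    with np.load(path) as loaded:   # the context manager closes the archive
        same_keys = set(loaded.files) == set(params)
        return same_keys and all(np.array_equal(loaded[k], params[k]) for k in params)


toy_params = {
    "0.Conv2D.W": np.random.default_rng(0).standard_normal((4, 1, 3, 3)),
    "0.Conv2D.b": np.zeros(4),
}
with tempfile.TemporaryDirectory() as tmp:
    ok = checkpoint_round_trip_ok(toy_params, os.path.join(tmp, "sanity_check.npz"))
print("Round trip OK:", ok)
```

Restoring into a model additionally requires copying each loaded array into the matching parameter, which depends on how the project exposes its parameters and is not shown here.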
