In [1]:
!nvidia-smi

Thu Sep  4 18:18:29 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   56C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                      

# MNIST — MLP warm-up

We’ll load MNIST, normalize it, define a small MLP, and run a quick
forward pass as a sanity check on tensor shapes before training.


In [2]:
# OPTIONAL: only run this if your torch/torchvision install is broken.
# For GPU on Kaggle (CUDA 12.1 wheels; swap cu121 for the tag matching your CUDA build, e.g. cu124):
# !pip install --upgrade --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CPU-only:
# !pip install --upgrade --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu


## Imports & device

We’ll autodetect CUDA and fall back to CPU. The code works either way.


In [3]:
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("device:", device)


torch: 2.6.0+cu124
torchvision: 0.21.0+cu124
device: cuda


## Dataset & transforms

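- `ToTensor()` → converts each image to a float tensor of shape [1, 28, 28] with pixels scaled to [0, 1].
- `Normalize((0.1307,), (0.3081,))` → subtracts the MNIST training-set mean (0.1307) and divides by its standard deviation (0.3081).
  (Note the trailing commas: each argument is a single-element tuple, one value per channel.)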
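
To make the arithmetic behind the two transforms concrete, here is a small sketch in plain Python (no torch required). The helper `normalize_pixel` is a hypothetical name used only for illustration; it assumes the input is a raw 8-bit pixel value, which is how MNIST images are stored.

```python
# Worked example of the ToTensor + Normalize arithmetic for a single pixel.
MEAN, STD = 0.1307, 0.3081  # MNIST statistics used in the transform


def normalize_pixel(raw: int) -> float:
    """Map a raw 8-bit pixel (0..255) to its normalized value."""
    scaled = raw / 255.0          # ToTensor(): uint8 -> float in [0, 1]
    return (scaled - MEAN) / STD  # Normalize(): subtract mean, divide by std


for raw in (0, 128, 255):
    print(raw, round(normalize_pixel(raw), 4))
```

Black background pixels (0) end up near -0.42 and fully white pixels (255) near 2.82, so the normalized inputs are no longer confined to [0, 1].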


In [4]:
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,))
])

train_mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform
)
test_mnist = torchvision.datasets.MNIST(
    root="./data", train=False, download=True, transform=transform
)

# quick peek
x0, y0 = train_mnist[0]
print("one sample:", x0.shape, y0)  # expect torch.Size([1, 28, 28]) and an integer label


100%|██████████| 9.91M/9.91M [00:00<00:00, 38.4MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.10MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 10.2MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 7.88MB/s]

one sample: torch.Size([1, 28, 28]) 5





## Model

A simple fully connected classifier:
- Flatten 28×28 → 784
- Hidden: two layers of 300 units each (784 → 300 → 300), each followed by LeakyReLU
- Output: 10 logits (no Softmax; CrossEntropyLoss expects raw logits and applies log-softmax internally)
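
Why the model stops at logits can be illustrated with a minimal plain-Python sketch of cross-entropy computed directly from logits. The function `cross_entropy_from_logits` is a hypothetical helper written for this note, not the PyTorch implementation; it only mirrors the log-softmax-then-negative-log-likelihood computation for a single sample.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of `target` under softmax(logits).

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 0.5, -1.0]  # illustrative raw scores for 3 classes
loss = cross_entropy_from_logits(logits, target=0)
print(round(loss, 4))
```

Because the softmax is already folded into the loss, adding a `Softmax` layer to the model would apply it twice and flatten the gradients.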


In [5]:
model = torch.nn.Sequential(
    torch.nn.Linear(28*28, 300),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(300, 300),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(300, 10)  # logits
).to(device)

sum_params = sum(p.numel() for p in model.parameters())
print("model params:", sum_params)


model params: 328810
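
The reported total can be checked by hand: each `Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. A short sketch in plain Python, where `linear_params` is a hypothetical helper name used for illustration:

```python
# Layer widths of the MLP defined above: input, two hidden layers, output.
layer_sizes = [28 * 28, 300, 300, 10]


def linear_params(n_in: int, n_out: int) -> int:
    """Parameter count of one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [235500, 90300, 3010]
print(sum(per_layer))  # 328810
```

The first layer accounts for roughly 72% of all parameters, which is typical when an MLP consumes flattened images.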
