# PyTorch - Model Building
These are short, practical notes I wrote while building a small classifier for FashionMNIST, meant as a quick reference for when I return to model code.

```python
import torch
from torch import nn
from torchvision import datasets, transforms
```

## Device
Check for an accelerator and keep the code device-agnostic, so the same notebook runs on a CPU-only machine or on a GPU.

```python
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else 'cpu'
print('Using', device)
```

```text
Using cuda
```
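`torch.accelerator` is only present in recent PyTorch releases. If these notes ever need to run on an older install, a fallback along the following lines should work. `pick_device` is a hypothetical helper name, and probing CUDA and Apple MPS explicitly is my assumption about which backends matter.

```python
import torch


def pick_device() -> torch.device:
    """Return the best available device, falling back to CPU (hypothetical helper)."""
    # Newer PyTorch: ask the generic accelerator API, if this install has it
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.device(torch.accelerator.current_accelerator().type)
    # Older PyTorch (assumed fallback): probe CUDA and Apple MPS explicitly
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
probe = torch.ones(2, 3).to(device)  # tensors are moved explicitly with .to()
print("Using", device, "| probe tensor on", probe.device)
```

Returning a `torch.device` instead of a string works the same with `.to(device)` and `device=` arguments.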


## Define the network
I subclass `nn.Module`, declare the layers in `__init__`, and implement `forward` to define how input flows through them. This pattern is easy to extend later.

```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```
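To make the data flow in `forward` concrete, this sketch rebuilds the same layers as standalone modules and pushes a random batch through them one layer at a time, recording each output shape. The batch size of 64 is an arbitrary choice, and the names `flatten`, `stack` and `shapes` are illustrative only.

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork, rebuilt so this snippet runs on its own
flatten = nn.Flatten()
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.rand(64, 28, 28)  # hypothetical batch of 64 grayscale 28x28 images
shapes = [tuple(x.shape)]
x = flatten(x)  # (64, 28, 28) -> (64, 784)
shapes.append(tuple(x.shape))
for layer in stack:  # nn.Sequential iterates in the order layers were added
    x = layer(x)
    shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
```

Only `Flatten` and the `Linear` layers change the shape; each `ReLU` keeps it as-is.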

## Quick instantiation and smoke test
Create the model, move it to the device, and run a forward pass on random input to check the output shape and prediction.

```python
model = NeuralNetwork().to(device)
print(model)
X = torch.rand(1, 28, 28, device=device)  # one random 28x28 "image", batch size 1
logits = model(X)                          # shape (1, 10): one raw score per class
pred_probab = nn.Softmax(dim=1)(logits)    # normalize each row into probabilities
y_pred = pred_probab.argmax(1)             # index of the most likely class
print('Predicted class:', y_pred)
```

```text
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)
Predicted class: tensor([1], device='cuda:0')
```
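The smoke test above uses a single sample. A batched variant, run in eval mode under `torch.no_grad()`, can also check that every row of probabilities sums to 1. The architecture here is a standalone `nn.Sequential` copy of `NeuralNetwork`, and the names `smoke_model`, `X_batch` and so on are hypothetical, chosen so they don't overwrite `model` or `X` from the earlier cells.

```python
import torch
from torch import nn

# Standalone copy of the notes' architecture so this cell runs by itself
smoke_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
smoke_model.eval()  # no dropout/batchnorm here, but a good habit before inference

X_batch = torch.rand(8, 28, 28)  # 8 random "images"
with torch.no_grad():  # skip autograd bookkeeping for a pure forward pass
    batch_logits = smoke_model(X_batch)
    batch_probs = torch.softmax(batch_logits, dim=1)

batch_pred = batch_probs.argmax(dim=1)
print(batch_logits.shape, batch_probs.sum(dim=1), batch_pred)
```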


## Layer notes (reminder)
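- `nn.Flatten`: flattens each 28x28 image into a 784-element vector, keeping the batch dimension (the default `start_dim=1`)
- `nn.Linear`: learnable affine transform, `y = x @ W.T + b`
- `nn.ReLU`: non-linear activation between the linear layers
- `nn.Sequential`: ordered container; input passes through the layers in the order they were added
- `nn.Softmax(dim=1)`: converts each row of logits into class probabilities at inference
- Model parameters: each `nn.Linear` has a `weight` tensor of shape `(out_features, in_features)` and a `bias` tensor; printing them helps debug initialization and shape issues.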
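A few toy tensors can confirm what each layer in the list above does. The sizes (a 4-to-3 `Linear`, hand-picked logits) are arbitrary illustration values, not part of the FashionMNIST model.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible random weights

# nn.Flatten keeps dim 0 (batch) and merges the rest: (2, 28, 28) -> (2, 784)
imgs = torch.rand(2, 28, 28)
flat = nn.Flatten()(imgs)

# nn.Linear computes x @ W.T + b, with W of shape (out_features, in_features)
lin = nn.Linear(4, 3)
v = torch.randn(5, 4)
manual = v @ lin.weight.T + lin.bias

# nn.ReLU zeroes negative entries and leaves the rest unchanged
relu_out = nn.ReLU()(torch.tensor([-1.5, 0.0, 2.0]))

# nn.Softmax maps each row to probabilities; it preserves ordering,
# so argmax over logits and over probabilities picks the same class
toy_logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]])
toy_probs = nn.Softmax(dim=1)(toy_logits)

print(flat.shape, torch.allclose(lin(v), manual), relu_out, toy_probs.sum(dim=1))
```

Because softmax preserves ordering, `logits.argmax(1)` alone is enough when only the predicted class is needed.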

```python
# Inspect parameters: shapes and a small sample of values
for name, param in model.named_parameters():
    print(name, tuple(param.size()))
    # show a tiny sample of values for quick inspection
    print('sample values:', param.view(-1)[:4].tolist())
```

```text
linear_relu_stack.0.weight (512, 784)
sample values: [-0.011036349460482597, 0.01149376854300499, 0.007524848449975252, -0.03124363161623478]
linear_relu_stack.0.bias (512,)
sample values: [-0.0004947952111251652, 0.022357787936925888, -0.03394099324941635, 0.007803768385201693]
linear_relu_stack.2.weight (256, 512)
sample values: [0.020519282668828964, -0.01405169628560543, 0.011832769960165024, -0.029117703437805176]
linear_relu_stack.2.bias (256,)
sample values: [-0.03232941776514053, 0.020669857040047646, -0.029854899272322655, 0.020646244287490845]
linear_relu_stack.4.weight (10, 256)
sample values: [0.02300216257572174, 0.02913334220647812, -0.016544237732887268, -0.018959976732730865]
linear_relu_stack.4.bias (10,)
sample values: [0.040477924048900604, 0.05393499135971069, 0.04905401170253754, 0.03238512575626373]
```
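Counting parameters is a quick companion check to printing their shapes. This sketch rebuilds the same layer sizes as a plain `nn.Sequential`, so the parameter names read `1.weight`, `3.bias` and so on rather than `linear_relu_stack.0.weight`; the name `net` is illustrative.

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork in these notes
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# numel() is the number of scalars in a tensor; sum it over weights and biases
per_tensor = {name: p.numel() for name, p in net.named_parameters()}
total = sum(per_tensor.values())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)

for name, n in per_tensor.items():
    print(f"{name:10s} {n:>8,d}")
print("total:", total, "trainable:", trainable)
```

For these layer sizes the total is 784·512 + 512 + 512·256 + 256 + 256·10 + 10 = 535,818, all of them trainable.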


## Practical tips I use
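- Keep the model and all input tensors on the same `device`; mixing devices raises a runtime error.
- Pass raw logits to `nn.CrossEntropyLoss`; it applies log-softmax internally, so a softmax beforehand would be applied twice.
- For quick debugging, train on a small subset and set `num_workers=0` in the `DataLoader` so loading runs in the main process and tracebacks stay readable.
- Print tensor shapes when a mismatch occurs.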
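The tips above can be combined into a tiny debug loop. Random tensors stand in for FashionMNIST here (an assumption so the snippet runs offline); the 128-sample subset, batch size and SGD learning rate are arbitrary, and `debug_model` is a hypothetical name that avoids overwriting `model`. The final check confirms that `nn.CrossEntropyLoss` on raw logits matches `nn.NLLLoss` on `log_softmax` output.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for FashionMNIST: random images and labels
images = torch.rand(1000, 28, 28)
labels = torch.randint(0, 10, (1000,))
full_ds = TensorDataset(images, labels)

# Debug on a small subset; num_workers=0 keeps loading in the main process
small_ds = Subset(full_ds, range(128))
loader = DataLoader(small_ds, batch_size=32, shuffle=True, num_workers=0)

debug_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
loss_fn = nn.CrossEntropyLoss()  # takes raw logits, applies log-softmax itself
optimizer = torch.optim.SGD(debug_model.parameters(), lr=1e-2)

debug_model.train()
losses = []
for xb, yb in loader:
    out = debug_model(xb)  # raw logits, no softmax
    loss = loss_fn(out, yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    # print shapes every step while debugging
    print("shapes:", tuple(xb.shape), tuple(out.shape), "loss:", round(loss.item(), 4))

# CrossEntropyLoss(logits) should equal NLLLoss(log_softmax(logits))
with torch.no_grad():
    xb, yb = next(iter(loader))
    out = debug_model(xb)
    ce = loss_fn(out, yb)
    nll = nn.NLLLoss()(torch.log_softmax(out, dim=1), yb)
print("CE:", ce.item(), "NLL of log-softmax:", nll.item())
```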