# 05. PyTorch Build the Model Tutorial

**Goal**: understand how a neural network is built in PyTorch.

**Source**: [PyTorch Docs Build the Neural Network](https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)

This notebook walks through building a neural network that classifies images in the FashionMNIST dataset.

## 1. Imports and Basic Configuration

In [1]:
import os
import torch
from torch import nn  # provides the building blocks needed to build a neural network
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Basic checking
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA is available: {torch.cuda.is_available()}")

# Establish a seed value to ensure reproducibility
torch.manual_seed(42)

PyTorch version: 2.9.0+cpu
CUDA is available: False


<torch._C.Generator at 0x7a7833fed6d0>
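
To make the effect of the seed concrete, here is a minimal sketch (the variable names `first`, `second`, and `third` are illustrative, not part of the original notebook). It assumes only the default global generator: re-seeding replays the same random sequence, while continuing without re-seeding does not.

```python
import torch

# Re-seeding the global generator replays the same sequence of random numbers.
torch.manual_seed(42)
first = torch.rand(3)

torch.manual_seed(42)
second = torch.rand(3)

# Without re-seeding, the generator has advanced, so this draw differs.
third = torch.rand(3)

print(torch.equal(first, second))  # same seed, same values
print(torch.equal(first, third))   # generator state has moved on
```

Note that a seed fixes the random number stream only. Some accelerator kernels may still be nondeterministic, so bitwise-identical training runs can require additional settings.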

## 2. Get Device for Training

In [2]:
# We want to be able to train our model on an accelerator. The main benefit is speed.
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else 'cpu'
print(f"Using {device} device")

Using cpu device
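
The `torch.accelerator` namespace is only present in recent PyTorch releases. As a hedged sketch for environments where it may be missing, the helper below (`pick_device` is a hypothetical name, not a PyTorch function) guards the lookup and falls back to an explicit CUDA check, then to the CPU.

```python
import torch


def pick_device() -> str:
    """Return the name of an available device, falling back to 'cpu'."""
    # torch.accelerator may not exist on older releases, so guard the lookup.
    accelerator = getattr(torch, "accelerator", None)
    if accelerator is not None and accelerator.is_available():
        return accelerator.current_accelerator().type
    # Fallback for older releases: check CUDA explicitly.
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Using {device} device")
```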


## 3. Define the Class

In [3]:
# Every module in PyTorch is a subclass of nn.Module
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten() # Flattens each 28x28 image into a 784-element vector (the batch dimension is kept)
        # Initializing the neural network layers
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512), # Applies an affine linear transformation to the incoming data
            nn.ReLU(),
            nn.Linear(512, 512), # second hidden layer
            nn.ReLU(),
            nn.Linear(512, 10) # output layer: one logit per class
        )
    # Every nn.Module subclass implements the operations on input data in the forward method
    def forward(self, x): # Do not call model.forward() directly!
        x = self.flatten(x)
        logits = self.linear_relu_stack(x) # logits: raw, unnormalized scores produced by the model before softmax is applied
        return logits

# Creating an instance of SimpleNN and moving it to the device
model = SimpleNN().to(device)
print(f"Model structure: \n{model}")

# Creating a random tensor 1x28x28 on 'device'
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits) # dim indicates the dimension along which the values must sum to 1
y_pred = pred_probab.argmax(1) 
print(f"Predicted class: {y_pred}")

Model structure: 
SimpleNN(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Predicted class: tensor([7])
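
The softmax step above can be checked in isolation. The sketch below uses random stand-in logits rather than the model's output (an assumption made so the snippet runs on its own) and shows two properties: each row of probabilities sums to 1, and the predicted class is the same whether `argmax` is taken over logits or probabilities.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(2, 10)         # stand-in for model output: 2 samples, 10 classes

probs = nn.Softmax(dim=1)(logits)   # normalize across the class dimension

# Each row is a probability distribution: non-negative and summing to 1.
print(probs.sum(dim=1))

# Softmax preserves ordering, so the top class is unchanged.
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))
```

Because the ordering is preserved, softmax is not strictly needed to pick a class; it matters when the probabilities themselves are of interest.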


## 4. Model Layers

In [6]:
# Taking a sample minibatch of 3 images of size 28x28
input_image = torch.rand(3,28,28)
print(f"\nInput Image Size: {input_image.size()}")

# Converting each 2D 28x28 image into a contiguous array of 784 pixel values
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(f"\nImage Size after 'nn.Flatten': {flat_image.size()}")

# Applying a linear transformation on the input using its stored weights and biases
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(f"\nImage Size after 'nn.Linear': {hidden1.size()}")

print(f"\nBefore ReLU: {hidden1}")
hidden1 = nn.ReLU()(hidden1)
print(f"\nAfter ReLU: {hidden1}")

# Sequential passes data through all modules in the order defined
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

# Scaling logits to values in [0, 1] that sum to 1 along dim=1 (one probability per class)
pred_probab = nn.Softmax(dim=1)(logits)


Input Image Size: torch.Size([3, 28, 28])

Image Size after 'nn.Flatten': torch.Size([3, 784])

Image Size after 'nn.Linear': torch.Size([3, 20])

Before ReLU: tensor([[-0.2457,  0.6051, -0.5019,  0.2278,  0.1204, -0.0164,  0.0470,  0.0397,
          0.1114,  0.3447, -0.0402,  0.0125,  0.3650,  0.0356,  0.5223, -0.3223,
          0.4374,  0.1148,  0.1676, -0.2845],
        [-0.0496,  0.4623, -0.2226,  0.4055,  0.1283,  0.3487,  0.1018, -0.2026,
          0.0830,  0.2022, -0.2077,  0.2010,  0.1024, -0.4571,  0.3354, -0.4658,
          0.4317,  0.2509, -0.1077, -0.0292],
        [ 0.0359,  0.3771, -0.3532,  0.0949, -0.0421,  0.5677,  0.0684, -0.1937,
          0.1316, -0.1935, -0.1906, -0.0190,  0.0708, -0.1198,  0.1613,  0.1432,
          0.4196, -0.1257, -0.3270, -0.3142]], grad_fn=<AddmmBackward0>)

After ReLU: tensor([[0.0000, 0.6051, 0.0000, 0.2278, 0.1204, 0.0000, 0.0470, 0.0397, 0.1114,
         0.3447, 0.0000, 0.0125, 0.3650, 0.0356, 0.5223, 0.0000, 0.4374, 0.1148,
         0.16
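
As a sanity check on what each layer computes, the following sketch compares the three modules used above against equivalent plain tensor operations. The variable names are illustrative and the input is random; it is a verification aid, not part of the original tutorial.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(3, 28, 28)                      # minibatch of 3 fake images

# nn.Flatten keeps dim 0 (the batch) and merges the rest: (3, 28, 28) -> (3, 784).
flat = nn.Flatten()(x)
same_as_reshape = torch.equal(flat, x.reshape(3, -1))

# nn.Linear computes x @ W.T + b, with W of shape (out_features, in_features).
layer = nn.Linear(28 * 28, 20)
manual = flat @ layer.weight.T + layer.bias
same_as_affine = torch.allclose(layer(flat), manual, atol=1e-5)

# nn.ReLU replaces negative entries with zero and leaves the rest unchanged.
hidden = layer(flat)
same_as_clamp = torch.equal(nn.ReLU()(hidden), hidden.clamp(min=0))

print(same_as_reshape, same_as_affine, same_as_clamp)
```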

## 5. Model Parameters

In [12]:
print(f"Model structure: {model}\n\n")
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param}\n")

Model structure: SimpleNN(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : Parameter containing:
tensor([[ 0.0273,  0.0296, -0.0084,  ..., -0.0142,  0.0093,  0.0135],
        [-0.0188, -0.0354,  0.0187,  ..., -0.0106, -0.0001,  0.0115],
        [-0.0008,  0.0017,  0.0045,  ..., -0.0127, -0.0188,  0.0059],
        ...,
        [-0.0084, -0.0058,  0.0228,  ...,  0.0293,  0.0206, -0.0119],
        [ 0.0009,  0.0123,  0.0233,  ..., -0.0127, -0.0286,  0.0204],
        [-0.0308,  0.0149, -0.0223,  ...,  0.0130, -0.0236, -0.0194]],
       requires_grad=True)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : Parameter containing:
tensor([-1.5505e
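
Printing full tensors is rarely practical for larger models. A lighter alternative, sketched here, is to report each parameter's shape and element count. The layer stack is rebuilt as a plain `nn.Sequential` (named `stack` for illustration) so the snippet runs on its own; it mirrors the layers of `SimpleNN`.

```python
import torch
from torch import nn

# Same layer sizes as SimpleNN, rebuilt here so the snippet is self-contained.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# ReLU has no parameters, so only the three Linear layers appear.
for name, param in stack.named_parameters():
    print(f"{name:10s} shape={tuple(param.shape)} count={param.numel()}")

total = sum(p.numel() for p in stack.parameters())
print(f"Total parameters: {total}")  # 669706
```

The total follows from the layer sizes: (784·512 + 512) + (512·512 + 512) + (512·10 + 10) = 669,706.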

## 6. Conclusion and notes

### We learned:
- How to define a basic neural network architecture for classifying FashionMNIST images using PyTorch.
- How to use layers from the `torch.nn` namespace to construct a model.
- Every PyTorch model subclasses `nn.Module`, and is typically composed of other modules (layers) in a nested structure.
- The network’s architecture is organized by defining layers in `__init__` and the forward computation in the `forward` method.
- Utilities like `nn.Sequential` can bundle layers for ease of use.

### Notes:
- For predictions, call the model as a function (do NOT call `forward()` directly).
- All parameters (weights, biases) of the model can be viewed using `.parameters()` or `.named_parameters()`.
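
One reason to call the model rather than `forward()` is that `nn.Module.__call__` runs registered hooks around the forward pass. The sketch below illustrates this with a small stand-in layer and a forward hook that records each invocation; the `calls` list is purely illustrative.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook runs after the forward pass when the module is invoked via __call__.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append("hook")
)

x = torch.rand(1, 4)
layer(x)            # goes through nn.Module.__call__, so the hook fires
layer.forward(x)    # bypasses __call__, so the hook is skipped

handle.remove()     # detach the hook once it is no longer needed
print(calls)        # ['hook']
```

Both calls return the same output here; the difference is only that the direct `forward()` call silently skips any hooks that tooling or the user has attached.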
