### Learn the Basics

### Source of this file
https://docs.pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

Most machine learning workflows involve:
1. working with data,
2. creating models,
3. optimizing model parameters, and
4. saving the trained models.

This tutorial introduces you to a complete ML workflow implemented in PyTorch, with links to learn more about each of these concepts.

We’ll use the FashionMNIST dataset to train a neural network that predicts which of the following classes an input image belongs to: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, or Ankle boot.

### Quickstart

This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.

### Working with data

PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.
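
To make the relationship between the two primitives concrete, here is a minimal sketch of a custom Dataset wrapped by a DataLoader. The class name `ToySquares` and its contents are made up for illustration and are not part of the tutorial.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToySquares(Dataset):
    """Hypothetical dataset: sample i is the number i, its label is i squared."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples; the DataLoader uses this to know when to stop.
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (sample, label) pair.
        return self.x[idx], self.y[idx]

toy = ToySquares(10)
loader = DataLoader(toy, batch_size=4)  # iterable over batches of up to 4 samples

batch_sizes = [len(xb) for xb, yb in loader]
print(batch_sizes)  # [4, 4, 2]
```

The Dataset only knows how to return one sample at a time; grouping samples into batches is the DataLoader's job.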

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The torchvision.datasets module contains Dataset objects for many real-world vision datasets, such as CIFAR and COCO. In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset accepts two arguments, transform and target_transform, which modify the samples and labels respectively.
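
The sketch below illustrates where the two callables are applied. `TinyImages`, `scale_to_unit`, and `one_hot` are hypothetical names invented for this example; this is not TorchVision's implementation, only the general pattern under that assumption.

```python
import torch
from torch.utils.data import Dataset

class TinyImages(Dataset):
    """Hypothetical stand-in for a vision dataset (not TorchVision code)."""

    def __init__(self, transform=None, target_transform=None):
        # Four fake 2x2 grayscale images with pixel values 0..255 and labels 0..3.
        self.images = torch.randint(0, 256, (4, 2, 2), dtype=torch.uint8)
        self.labels = [0, 1, 2, 3]
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image, label = self.images[idx], self.labels[idx]
        if self.transform is not None:
            image = self.transform(image)          # modifies the sample
        if self.target_transform is not None:
            label = self.target_transform(label)   # modifies the label
        return image, label

def scale_to_unit(image):
    # Similar in spirit to ToTensor's rescaling: uint8 [0, 255] -> float [0, 1].
    return image.float() / 255.0

def one_hot(label, num_classes=4):
    # Turn an integer label into a one-hot float vector.
    return torch.nn.functional.one_hot(torch.tensor(label), num_classes).float()

ds = TinyImages(transform=scale_to_unit, target_transform=one_hot)
image, label = ds[2]
print(image.dtype, label)
```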

In [3]:
# Load the training split. Uncomment download=True on the first run to fetch the data.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    # download=True,
    transform=ToTensor(),
)

# Load the test split. Uncomment download=True on the first run to fetch the data.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    # download=True,
    transform=ToTensor(),
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
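
As a small illustration of batching, the sketch below uses a fake dataset of 150 all-zero images (an assumption made to avoid any download) to show how a batch size of 64 splits the data, including the smaller final batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 fake grayscale images of size 28x28 with integer labels in [0, 10).
features = torch.zeros(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=64)
shapes = [tuple(X.shape) for X, y in loader]
print(len(loader))  # 3 batches: 64 + 64 + 22
print(shapes[-1])   # (22, 1, 28, 28)

# shuffle=True reorders samples each epoch; drop_last=True discards the smaller final batch.
loader_full = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)
print(len(loader_full))  # 2
```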

In [None]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    # print(X)
    # print(y)
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

### Creating Models

1. To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function.

2. To accelerate operations in the neural network, we move it to an accelerator such as CUDA, MPS, MTIA, or XPU. If an accelerator is available, we use it; otherwise, we use the CPU.

0. For the Build the Neural Network tutorial
https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

1. For accelerator
https://docs.pytorch.org/docs/stable/generated/torch.accelerator.current_accelerator.html#torch.accelerator.current_accelerator
2. For Module
https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
3. For Flatten
https://docs.pytorch.org/docs/stable/generated/torch.nn.Flatten.html#torch.nn.Flatten
4. For Sequential
https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential
5. For Linear
https://docs.pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
6. For ReLU
https://docs.pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU
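
Before the full model, a short sketch of two of the building blocks linked above may help: nn.Flatten reshapes each image into a vector, and calling an nn.Sequential container gives the same result as calling its modules one after another. The layer sizes and the random input are chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

flatten = nn.Flatten()
stack = nn.Sequential(nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))

x = torch.rand(3, 1, 28, 28)   # a fake batch of 3 images
flat = flatten(x)
print(flat.shape)              # torch.Size([3, 784])

# Calling the Sequential container...
logits = stack(flat)

# ...matches calling its modules one after another, in order.
out = flat
for layer in stack:
    out = layer(out)

print(logits.shape)            # torch.Size([3, 10])
```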

In [None]:
# device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator()
else:
    device = "cpu"

print(f"Using device: {device} ")

# Define model
# nn.Module is the base class for all PyTorch models.
class NeuralNetwork(nn.Module):
    # The constructor method, runs when you create a NeuralNetwork object.
    def __init__(self):
        # to call the constructor of the super class
        # super().__init__() → Calls the parent (nn.Module) constructor, so PyTorch can properly register layers, parameters, etc.
        super().__init__()
        # Creates a flattening layer that converts a multi-dimensional tensor (e.g., 28×28 pixels) into a single long vector (784 values).
        # Example: (batch_size, 1, 28, 28) → (batch_size, 784).
        self.flatten = nn.Flatten()
        # nn.Sequential → A container to stack layers in order.
        # Note: Activations (nn.ReLU) are not counted as separate trainable layers, even though they are separate processing steps.
        self.linear_relu_stack = nn.Sequential(
            # nn.Linear(in_features, out_features) → Fully connected layer.
            # First layer: 784 → 512 neurons.
            nn.Linear(28*28, 512),
            # nn.ReLU() → Activation function that keeps positive values as-is and turns negative values into 0. Adds non-linearity.
            nn.ReLU(),
            # Second layer: 512 → 512 neurons.
            nn.Linear(512, 512),
            nn.ReLU(),
            # Third layer: 512 → 10 neurons (one per FashionMNIST class, e.g., T-shirt/top, Trouser, ...).
            nn.Linear(512, 10)
        )
        '''
        So in architecture terms, you might hear:

            Input → Hidden Layer 1 → Hidden Layer 2 → Output Layer

            In your case:
            Hidden Layer 1: Linear(784 → 512) + ReLU
            Hidden Layer 2: Linear(512 → 512) + ReLU
            Output Layer: Linear(512 → 10)
        '''

    # forward → Defines how data flows through the network when you call model(input).
    def forward(self, x):
        # self.flatten(x) → Flattens the image into a 1D vector per example.
        x = self.flatten(x)
        # self.linear_relu_stack(x) → Passes the data through all the layers (linear + ReLU, repeated).
        logits = self.linear_relu_stack(x)
        '''
        nn.Sequential is basically a pipeline of layers.
        When you call it like a function — self.linear_relu_stack(input) — PyTorch:

            1. Takes the input (x here).
            2. Passes it into the first module (first nn.Linear).
            3. Takes the output and feeds it into the second module (nn.ReLU).
            4. Repeats for all the modules, in the order they were listed.
            5. Returns the final output.
        '''
        '''
        self.linear_relu_stack(x)
        is equivalent to manually writing:

        out = layer1(x)       # nn.Linear(28*28, 512)
        out = layer2(out)     # nn.ReLU()
        out = layer3(out)     # nn.Linear(512, 512)
        out = layer4(out)     # nn.ReLU()
        out = layer5(out)     # nn.Linear(512, 10)
        logits = out
        '''
        # logits → The raw output scores for each class before applying something like softmax.
        return logits

# Creates an instance of NeuralNetwork.
# .to(device) moves all model parameters and buffers to the target device (the CPU or an accelerator).
model = NeuralNetwork().to(device)
'''
logits = self.linear_relu_stack(x)
    Here, self.linear_relu_stack is an nn.Sequential object (which is an nn.Module).

All PyTorch nn.Module subclasses inherit a special method (simplified here; the real one also runs any registered hooks):
    def __call__(self, *args, **kwargs):
        # PyTorch magic
        return self.forward(*args, **kwargs)

So when you do:
    some_module(tensor)

PyTorch actually does:
    some_module.__call__(tensor)
    # which internally calls:
    some_module.forward(tensor)

'''
print(model)

Using device: cuda 
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
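
The printed structure can be sanity-checked by counting trainable parameters. The sketch below rebuilds only the layer stack with the same sizes and compares a hand calculation with the count PyTorch reports.

```python
from torch import nn

# Same layer sizes as the printed model.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases.
by_hand = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)
counted = sum(p.numel() for p in stack.parameters())
print(by_hand, counted)  # 669706 669706
```

The ReLU modules contribute nothing to the total because they have no parameters.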


### Optimizing the Model Parameters

1. loss function
https://pytorch.org/docs/stable/nn.html#loss-functions
2. optimizer
https://pytorch.org/docs/stable/optim.html

Why both are needed:
1. Loss function tells the model how wrong it is (scalar value).

2. Optimizer decides how to adjust the model weights to reduce that loss in the next step.

* forward pass → loss computation → backward pass → optimizer step


In [None]:
loss_fn = nn.CrossEntropyLoss()
'''
nn.CrossEntropyLoss is a built-in PyTorch loss for multi-class classification problems.
It combines LogSoftmax + NLLLoss in one step.
Input:
    Model output → raw scores (logits) of shape (batch_size, num_classes) (e.g., (64, 10) for FashionMNIST).
    Target labels → integers representing the correct class (e.g., 3 for Dress).

What it does:
    Applies softmax to logits → converts them to probabilities.
    Takes the negative log of the probability for the correct class.
    Averages over the batch.
'''
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
'''
SGD → Stochastic Gradient Descent.
model.parameters() → Gives the optimizer all trainable parameters of your model (weights, biases).
lr=1e-3 → Learning rate (0.001), controls step size for weight updates.

What the optimizer does each training step:
    1. Reads the gradient stored in each parameter's .grad attribute (computed via loss.backward()).
    2. Updates each parameter in optimizer.step():
        new_param = old_param - lr * gradient
    3. Gradients are then cleared with a separate call, optimizer.zero_grad(), before the next batch.
'''
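
The claim that nn.CrossEntropyLoss combines LogSoftmax and NLLLoss can be checked numerically. The sketch below uses made-up logits and targets and compares the built-in loss with the two-step version and with a fully manual calculation.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # fake batch: 4 samples, 10 classes
targets = torch.tensor([9, 0, 3, 3])   # fake class indices

combined = nn.CrossEntropyLoss()(logits, targets)

# Step by step: log-softmax, then negative log-likelihood averaged over the batch.
log_probs = F.log_softmax(logits, dim=1)
two_step = F.nll_loss(log_probs, targets)

# Fully by hand: pick the log-probability of the correct class for each sample.
by_hand = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(combined, two_step), torch.allclose(combined, by_hand))  # True True
```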

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

* forward pass → loss → backward pass → update
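
The four stages can be traced on a single two-element parameter. This is a toy sketch with a made-up loss (the sum of squares), chosen so that the gradient is easy to verify by hand against the update rule new_param = old_param - lr * gradient.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()   # forward pass and loss; the gradient of this loss is 2 * w
loss.backward()         # backward pass fills w.grad with [2.0, -4.0]

expected = w.detach() - 0.1 * w.grad   # old_param - lr * gradient
optimizer.step()                        # update

print(w.detach())  # approximately [0.8, -1.6]
```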

In [None]:
'''
dataloader → Supplies mini-batches of (X, y) from your dataset.

size → Total number of samples in the dataset (for progress printing).

model.train() → Sets the model into training mode.
    Important for layers like Dropout or BatchNorm that behave differently in training vs inference.
'''
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # .to(device) → Moves the data to the same CPU/GPU where the model lives.
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        '''
        Forward pass (prediction)
        '''
        pred = model(X)
        loss = loss_fn(pred, y)
        '''
        model(X) → Calls your model’s forward() method via __call__ (the PyTorch magic we discussed earlier).

        pred → Raw logits output from the model.

        loss_fn(pred, y) → Compares predictions with true labels to calculate how wrong the model is.
        '''

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        '''
        loss.backward()
            Traverses the computation graph from loss → all parameters.
            Calculates gradients (.grad attributes) for each parameter.

        optimizer.step()
            Uses those gradients to update the model weights (SGD, Adam, etc.).

        optimizer.zero_grad()
            Clears gradients from the last batch (important — PyTorch accumulates gradients by default).
        '''

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
        '''
        Every 100 batches, prints:

        loss.item() → Converts single-value tensor to a Python float.

        current → How many samples processed so far.
        '''

##### Full flow for one batch
1. Get data → (X, y) from DataLoader.

2. Forward pass → pred = model(X)

3. Loss computation → loss = loss_fn(pred, y)

4. Backward pass → loss.backward()

5. Optimizer step → optimizer.step()

6. Reset grads → optimizer.zero_grad()
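
Step 6 matters because PyTorch adds new gradients to whatever is already stored in .grad. A minimal sketch with a single scalar parameter shows the effect; the in-place `w.grad.zero_()` call stands in for what optimizer.zero_grad() does for every parameter it manages.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()
first = w.grad.item()          # d(w^2)/dw = 2 * 3 = 6

(w ** 2).backward()            # no reset in between
accumulated = w.grad.item()    # 6 + 6 = 12

w.grad.zero_()                 # clear the stored gradient
(w ** 2).backward()
after_reset = w.grad.item()    # back to 6

print(first, accumulated, after_reset)  # 6.0 12.0 6.0
```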

We also check the model’s performance against the test dataset to ensure it is learning.

In [None]:
'''
size → Total number of samples in the test dataset.

num_batches → How many mini-batches in the test loader.

model.eval() → Switches model to evaluation mode:
    Disables dropout and makes batch norm use its running statistics instead of updating them.
    Ensures consistent results during evaluation.
'''
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    
    test_loss, correct = 0, 0
    '''
    test_loss → Total loss across all batches (will be averaged later).
    correct → Count of correctly classified samples.
    '''
    
    with torch.no_grad():
        '''
        Disables gradient computation.
        Saves memory and speeds up inference because we don’t need .backward() during evaluation.
        '''
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            '''
            Moves inputs and labels to CPU/GPU.
            Gets predictions (pred) from the model.
            '''

            test_loss += loss_fn(pred, y).item()
            '''
            Computes loss for the batch and adds to test_loss.
            .item() → Converts the single-value tensor to a Python float for summing.
            '''

            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            '''
            pred.argmax(1) → For each sample, pick the class index with the highest logit (predicted class).
            Compare to y → Boolean tensor (True for correct predictions, False for wrong).
            Convert to float → True = 1.0, False = 0.0.
            .sum().item() → Count how many correct predictions in the batch and add to correct.
            '''
    test_loss /= num_batches
    correct /= size
    '''
    Average loss per batch.
    Fraction of correctly classified samples.
    '''

    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
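
The accuracy line in the function above can be followed on a tiny hand-made example. The logits and labels below are invented for illustration.

```python
import torch

pred = torch.tensor([[2.0, 0.1, -1.0],
                     [0.3, 0.9, 0.2],
                     [1.5, 1.0, 0.0]])   # made-up logits: 3 samples, 3 classes
y = torch.tensor([0, 1, 1])              # made-up true labels

predicted_classes = pred.argmax(1)        # class with the highest logit per row: [0, 1, 0]
matches = predicted_classes == y          # [True, True, False]
correct = matches.type(torch.float).sum().item()

print(correct, correct / len(y))          # 2 of the 3 samples are classified correctly
```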

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

##### Training loop controller

Big picture of one epoch:
1. Train phase → Learn from the training set (adjust weights).

2. Test phase → Check performance on unseen data (no weight changes).

3. Repeat until the specified number of epochs.

In [12]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303759  [   64/60000]
loss: 2.293863  [ 6464/60000]
loss: 2.276991  [12864/60000]
loss: 2.270653  [19264/60000]
loss: 2.244901  [25664/60000]
loss: 2.224506  [32064/60000]
loss: 2.228517  [38464/60000]
loss: 2.191536  [44864/60000]
loss: 2.192615  [51264/60000]
loss: 2.159570  [57664/60000]
Test Error: 
 Accuracy: 44.2%, Avg loss: 2.152559 

Epoch 2
-------------------------------
loss: 2.160088  [   64/60000]
loss: 2.153923  [ 6464/60000]
loss: 2.099620  [12864/60000]
loss: 2.120730  [19264/60000]
loss: 2.057470  [25664/60000]
loss: 2.005750  [32064/60000]
loss: 2.036290  [38464/60000]
loss: 1.949521  [44864/60000]
loss: 1.958984  [51264/60000]
loss: 1.896419  [57664/60000]
Test Error: 
 Accuracy: 56.2%, Avg loss: 1.887276 

Epoch 3
-------------------------------
loss: 1.914180  [   64/60000]
loss: 1.890138  [ 6464/60000]
loss: 1.773739  [12864/60000]
loss: 1.820970  [19264/60000]
loss: 1.699471  [25664/60000]
loss: 1.659669  [32064/600

0. Training your Model
https://docs.pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

### Saving Models

1. save()
https://docs.pytorch.org/docs/stable/generated/torch.save.html#torch.save

2. state_dict()
https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict

A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [None]:
torch.save(model.state_dict(), "model.pth")
'''
1. model.state_dict()
    Returns a Python dictionary containing all the model’s learnable parameters (weights, biases).

2. torch.save(obj, path)
    Serializes the object (state_dict) and writes it to a file.

3. "model.pth" → The file name where parameters will be stored.
    .pth is a common extension for PyTorch model files.
'''

print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
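
To see what a state dictionary contains, the sketch below builds a deliberately tiny model (the sizes are arbitrary) and lists each entry with its shape.

```python
from torch import nn

tiny = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
state = tiny.state_dict()

for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# 0.weight (3, 4)
# 0.bias (3,)
# 2.weight (2, 3)
# 2.bias (2,)
```

There is no entry for index 1 because ReLU has no parameters to store.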


### Loading Models

1. Neural Network
https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
2. load_state_dict()
https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict
3. torch.load()
https://docs.pytorch.org/docs/stable/generated/torch.load.html#torch.load

The process for loading a model includes re-creating the model structure and loading the state dictionary into it.
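
A minimal round trip, sketched with a small nn.Linear standing in for a trained model and an in-memory buffer instead of a file (both are assumptions made to keep the example self-contained):

```python
import io
import torch
from torch import nn

torch.manual_seed(0)
trained = nn.Linear(4, 2)              # stands in for a trained model

buffer = io.BytesIO()                  # in-memory file, so nothing is written to disk
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(4, 2)             # same architecture, fresh random weights
result = restored.load_state_dict(torch.load(buffer, weights_only=True))

same = all(
    torch.equal(a, b)
    for a, b in zip(trained.parameters(), restored.parameters())
)
print(same)  # True
```

The restored module must have the same architecture as the saved one, otherwise load_state_dict reports missing or unexpected keys.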

In [None]:
model = NeuralNetwork().to(device)
'''
You must recreate the exact same model class (NeuralNetwork) as the one you trained.
'''
model.load_state_dict(torch.load("model.pth", weights_only=True))
'''
1. torch.load("model.pth", ...) → Reads the file model.pth from disk and returns the Python object you saved earlier (in this case, a state_dict dictionary).

2. weights_only=True
    Restricts unpickling to tensors, primitive types, and plain containers such as dictionaries, which guards against running arbitrary code from an untrusted file. A state_dict fits within these limits.
    Older code often omits the argument:
    model.load_state_dict(torch.load("model.pth"))
    
3. model.load_state_dict(...) → Copies the saved parameter values (weights & biases) into your newly created model.
'''

<All keys matched successfully>

### This model can now be used to make predictions.

In [None]:
# These are the human-readable names for each of the 10 output classes in the FashionMNIST dataset.
# Index 0 → "T-shirt/top", index 1 → "Trouser", and so on.
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

# Sets the model to evaluation mode (turns off dropout, uses fixed batch norm, etc.).
model.eval()

x, y = test_data[0][0], test_data[0][1]
'''
test_data[0] gives the first sample in the dataset:
[0] → image tensor (x)
[1] → label index (y)
'''

# No gradient computation is needed for inference — this saves memory & speeds up execution.
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    '''
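    pred[0] → first (and only) sample in the batch; x has shape (1, 28, 28), so its leading dimension acts as a batch of size 1.
    .argmax(0) → index of the largest logit, i.e. the class the model considers most likely (the model’s guess).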
    classes[...] → convert from numeric index to label string.
    '''
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
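
The model returns logits rather than probabilities. If probabilities are wanted, softmax can be applied to the logits; the sketch below uses made-up scores for three classes to show that this does not change which class wins.

```python
import torch

logits = torch.tensor([1.0, 3.0, 0.5])      # made-up scores for 3 classes
probs = torch.softmax(logits, dim=0)         # non-negative values that sum to 1

print(probs.sum().item())                    # approximately 1.0
print(logits.argmax(0).item(), probs.argmax(0).item())  # 1 1
```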


#### More about saving and loading the model

https://docs.pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html