# PyTorch Data Loading

## Imports

First, we import PyTorch. `nn` provides `Module`, the base class for models and their submodules. `torchvision` ships a number of computer vision datasets (e.g., MNIST). `ToTensor` converts a PIL image or a `uint8` NumPy array into a float tensor with values scaled to the range 0.0 to 1.0; for other input types listed in its documentation, the tensor is returned without scaling.

In [20]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

## Loading the Datasets

Next, we load the Fashion-MNIST dataset, a set of 28 x 28 px grayscale images, each associated with a label from one of 10 classes.

- `root`: directory where the dataset is stored
- `train`: boolean selecting the training split (`True`) or the test split (`False`)
- `download`: boolean indicating whether the set should be downloaded from the internet if not found in `root`
- `transform`: transformation to be applied to the dataset samples

In [21]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

## DataLoader Iterable Class
Wrapping the datasets in a DataLoader iterable enables automatic batching, shuffling, and multiprocess data loading.

In [22]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape) # N = batch size, C = number of channels, H = height, W = width
    print("Shape of y: ", y.shape, y.dtype)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
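
Batching and shuffling can be seen in isolation on synthetic data. This is a minimal sketch: the 150 random samples and the `TensorDataset` wrapper are assumptions chosen so that the batch size of 64 does not divide the dataset evenly.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 150 random "images" with integer labels.
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(images, labels)

# shuffle=True draws a new sample order each epoch (the loaders in this
# notebook leave it at its default of False).
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = [len(X) for X, y in loader]
print(batch_sizes)                         # the last batch holds the remainder
print(len(loader) == math.ceil(150 / 64))  # number of batches is rounded up
```

The last batch is smaller than the rest unless `drop_last=True` is passed, which is why the training loop reads the batch size from `len(X)` rather than assuming 64.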


## Defining and Building a Model
The `NeuralNetwork` class inherits from `nn.Module`. The layers are defined in `__init__`, and `forward` specifies how data passes through the network. To speed up its operations, the model is moved to an accelerator such as a GPU when one is available; otherwise the CPU is used.

Supported accelerators include CUDA, MPS, MTIA, and XPU.
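
The `torch.accelerator` namespace is only present in recent PyTorch releases. For older installations, device selection could be sketched with explicit fallbacks as follows; `pick_device` is a hypothetical helper name, and the fallback order (CUDA, then MPS, then CPU) is an assumption rather than an official recommendation.

```python
import torch

def pick_device() -> str:
    """Return the name of an available accelerator, or "cpu"."""
    # Newer PyTorch releases expose a unified torch.accelerator namespace.
    accelerator = getattr(torch, "accelerator", None)
    if accelerator is not None and accelerator.is_available():
        return accelerator.current_accelerator().type
    # Fallback checks for builds without torch.accelerator.
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"Using {device} device")
```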

In [26]:
# Get cpu or gpu device for training; an accelerator is a device that can execute code, such as a GPU
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()  # Flattens each 1x28x28 image into a 784-element vector (rows laid end to end); the batch dimension is kept
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),  # 28*28 = 784; 512 is the number of neurons in the hidden layer; Linear applies an affine transformation to the input data
            nn.ReLU(),  # ReLU activation function, which is max(0, x)
            nn.Linear(512, 512),  # Another hidden layer; 512 is the number of neurons in the hidden layer
            nn.ReLU(),  # Another ReLU activation function
            nn.Linear(512, 10),  # Output layer; 10 is the number of classes
        )

    # The forward method defines the computation performed at every call
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
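
The printed summary above can be cross-checked by pushing a random batch through the same layer stack and counting parameters. This sketch rebuilds the layers as a plain `nn.Sequential` (named `net` here for illustration) so that it runs on its own on the CPU.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class above, rebuilt as a
# plain nn.Sequential so this snippet runs on its own (on the CPU).
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

X = torch.rand(3, 1, 28, 28)          # a random batch of 3 images
logits = net(X)                       # one raw score per class
probs = nn.Softmax(dim=1)(logits)     # scores rescaled to sum to 1 per row
print(logits.shape, probs.sum(dim=1))

# Each Linear layer holds in*out weights plus out biases.
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # (784*512 + 512) + (512*512 + 512) + (512*10 + 10)
```

The model outputs logits, not probabilities; a softmax is only needed when normalized class scores are wanted.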


## Optimizing the Model Parameters
To train a model, we need:
- A loss function
- An optimizer

In [27]:
loss_fn = nn.CrossEntropyLoss()  # This loss function is used for classification problems
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # Stochastic Gradient Descent optimizer
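
`nn.CrossEntropyLoss` takes raw logits and integer class indices. A small worked case, using made-up all-zero logits, shows what value to expect from a model with no preference among 10 classes: the natural log of 10, about 2.3026.

```python
import math
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()

# Three samples, ten classes, all logits equal: the model has no preference.
logits = torch.zeros(3, 10)
targets = torch.tensor([0, 4, 9])     # integer class indices, not one-hot

loss = loss_fn(logits, targets)
print(loss.item(), math.log(10))      # both are about 2.3026

# The same value computed by hand: mean of -log(softmax(logits)[target]).
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()
print(torch.isclose(loss, manual))
```

This is a useful sanity check: an untrained 10-class classifier should start with a loss close to this value.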

## Training
In the training function, the model is first set to training mode via `model.train()`, which affects the behavior of some layers (e.g., Dropout). Then, in each iteration of the training loop, the model makes a prediction for the input batch `X`, and the loss function measures the error between that prediction and the true labels `y`. The loss is backpropagated with `loss.backward()`, which computes a gradient for every model parameter; `optimizer.step()` uses those gradients to update the parameters; and `optimizer.zero_grad()` resets the gradients. The reset is needed because `.backward()` accumulates gradients rather than overwriting them, so without it the next batch's gradients would be added to the current ones.

In [29]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)  # Move the data to the device that is being used by the model; else an error will be thrown
        
        # Compute the prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
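
The accumulation behavior of `.backward()` and the effect of `optimizer.step()` can be observed on a single two-element parameter. This is a toy sketch: the tensor `w`, the function `3 * w`, and the learning rate of 0.1 are chosen only to keep the arithmetic easy to follow.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes with no reset in between: gradients add up.
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.clone()          # 3 + 3 per element

optimizer.zero_grad()                 # reset before the next pass
(3 * w).sum().backward()
fresh = w.grad.clone()                # 3 per element

optimizer.step()                      # plain SGD: w <- w - lr * grad
print(accumulated, fresh, w.detach())
```

After the step, each element of `w` has moved by `0.1 * 3 = 0.3`, from `[1.0, 2.0]` to `[0.7, 1.7]`.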

## Testing/Validation

In [32]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
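
The accuracy calculation in `test` takes the index of the largest logit in each row as the predicted class and compares it with the label. A hand-checkable sketch with made-up logits for 4 samples and 3 classes:

```python
import torch

# Hypothetical logits for 4 samples and 3 classes.
pred = torch.tensor([
    [2.0, 0.1, 0.3],   # highest score: class 0
    [0.2, 1.5, 0.1],   # class 1
    [0.9, 0.8, 0.7],   # class 0
    [0.1, 0.2, 3.0],   # class 2
])
y = torch.tensor([0, 1, 2, 2])        # true labels; the third prediction is wrong

with torch.no_grad():                 # no gradient tracking needed for evaluation
    predicted_class = pred.argmax(1)  # index of the largest logit in each row
    correct = (predicted_class == y).type(torch.float).sum().item()

accuracy = correct / len(y)
print(f"Accuracy: {100 * accuracy:>0.1f}%")
```

Three of the four predictions match, so the accuracy is 75.0%. Note that `test` divides the correct count by the dataset size but averages the loss over the number of batches.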

## Printing Results for Each Epoch

In [33]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303646 [   64/60000]
loss: 2.297573 [ 6464/60000]
loss: 2.282568 [12864/60000]
loss: 2.271780 [19264/60000]
loss: 2.257754 [25664/60000]
loss: 2.231398 [32064/60000]
loss: 2.237425 [38464/60000]
loss: 2.205606 [44864/60000]
loss: 2.200603 [51264/60000]
loss: 2.181205 [57664/60000]
Test Error: 
 Accuracy: 43.0%, Avg loss: 2.172919 

Epoch 2
-------------------------------
loss: 2.185383 [   64/60000]
loss: 2.177773 [ 6464/60000]
loss: 2.130175 [12864/60000]
loss: 2.135863 [19264/60000]
loss: 2.094116 [25664/60000]
loss: 2.039457 [32064/60000]
loss: 2.062903 [38464/60000]
loss: 1.992119 [44864/60000]
loss: 1.991786 [51264/60000]
loss: 1.931741 [57664/60000]
Test Error: 
 Accuracy: 55.5%, Avg loss: 1.926050 

Epoch 3
-------------------------------
loss: 1.962821 [   64/60000]
loss: 1.933204 [ 6464/60000]
loss: 1.826285 [12864/60000]
loss: 1.854220 [19264/60000]
loss: 1.748441 [25664/60000]
loss: 1.703393 [32064/60000]
loss: 1.725344 [38464/