# Multilayer Perceptron (MLP) with PyTorch on MNIST

This notebook trains a Multilayer Perceptron (MLP) on the MNIST dataset using PyTorch. After implementing an MLP from scratch, it is useful to reproduce the same model in PyTorch. This gives you (1) a correctness check against a widely used framework and (2) a baseline for future experiments (regularization, better optimizers, GPUs, etc.).

## Prerequisites

Install the required packages:

In [1]:
# pip install torch torchvision

## 1. Imports and Device Setup

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## 2. Load the MNIST Dataset with `torchvision`

MNIST images are 28×28 grayscale. For an MLP, we flatten each image into a 784-dimensional vector.
We will use `datasets` from `torchvision` to load the [MNIST](https://yann.lecun.com/exdb/mnist/) handwritten digits dataset. You can find the list of datasets available in torchvision [here](https://pytorch.org/vision/0.8/datasets.html). Now let's take a look at the parameters we set:


*   `root` sets the directory where the data is stored and loaded from.
*   `train` indicates whether we want the training dataset or the test dataset.
*   `download` downloads the data into `root` if it is not already there.
*   `transform` allows us to apply transformations to our data. Here we only convert the images to tensors so that they work with PyTorch; future notebooks use more complicated transformations.



In [3]:
transform = transforms.Compose([
    transforms.ToTensor()
])

train_dataset = datasets.MNIST(
    root='data', train=True, download=True, transform=transform
)

test_dataset = datasets.MNIST(
    root='data', train=False, download=True, transform=transform
)

print(f"Training data: {train_dataset}\n")
print(f"Test data: {test_dataset}")

Training data: Dataset MNIST
    Number of datapoints: 60000
    Root location: data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           )

Test data: Dataset MNIST
    Number of datapoints: 10000
    Root location: data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
           )


## Data Loaders

To make loading and working with the data easier, we are going to use `DataLoader` from `torch.utils.data`. The `DataLoader` takes a dataset and a `batch_size` parameter and lets us iterate over the dataset in mini-batches. We shuffle the training data at every epoch; the test data does not need shuffling, so it can also use a larger batch size.

In [4]:
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False)
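To see what a `DataLoader` yields, the sketch below pulls one batch and prints its shapes. It uses a hypothetical `fake_dataset` of random tensors with MNIST-like shapes, so it runs without the download; with the real `train_loader` the first batch should have the same shapes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST (an assumption for illustration):
# 1,000 random "images" of shape (1, 28, 28) with integer labels 0-9.
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
fake_dataset = TensorDataset(images, labels)

loader = DataLoader(fake_dataset, batch_size=128, shuffle=True)

# Pull a single batch to inspect its shapes.
x, y = next(iter(loader))
print(x.shape)  # torch.Size([128, 1, 28, 28])
print(y.shape)  # torch.Size([128])

# 1000 is not a multiple of 128, so the final batch is smaller.
batch_sizes = [xb.size(0) for xb, _ in loader]
print(len(batch_sizes), batch_sizes[-1])  # 8 104
```

The smaller final batch matters later when averaging the loss over an epoch.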

## 3. Define the MLP Model

This is a standard fully connected network: 784 → 256 → 128 → 10, with a ReLU after each hidden layer.
We do not apply softmax inside the model because `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally.

In [5]:
class MLP(nn.Module):
    def __init__(self, input_dim=28*28, hidden1=256, hidden2=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden1),
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, num_classes)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.net(x)

model = MLP().to(device)
model

MLP(
  (net): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=10, bias=True)
  )
)
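As a quick sanity check on the architecture, the following sketch rebuilds the same stack of layers, pushes a dummy batch through it, and counts the trainable parameters. The names `net` and `dummy_batch` are illustrative only and are not used elsewhere in the notebook.

```python
import torch
from torch import nn

# Same layer sizes as the MLP above, rebuilt so the snippet is self-contained.
net = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# A dummy batch of 4 "images" with the MNIST shape (N, 1, 28, 28).
dummy_batch = torch.rand(4, 1, 28, 28)

# Flatten every image to a 784-dimensional vector, as forward() does.
flat = dummy_batch.view(dummy_batch.size(0), -1)
logits = net(flat)
print(flat.shape)    # torch.Size([4, 784])
print(logits.shape)  # torch.Size([4, 10])

# Weights plus biases: (784*256 + 256) + (256*128 + 128) + (128*10 + 10)
num_params = sum(p.numel() for p in net.parameters())
print(num_params)    # 235146
```

Almost all of the parameters sit in the first layer, which is typical when a fully connected network takes flattened images as input.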

## 4. Loss Function and Optimizer

In [6]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
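The reason the model outputs raw logits can be checked numerically. The sketch below, on random logits rather than real model outputs, shows that `nn.CrossEntropyLoss` gives the same value as log-softmax followed by negative log-likelihood, and that applying softmax does not change the predicted class.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Random logits for a batch of 5 samples and 10 classes (illustrative data).
logits = torch.randn(5, 10)
targets = torch.randint(0, 10, (5,))

# CrossEntropyLoss on raw logits...
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# ...matches log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_manual))  # True

# Softmax is monotonic, so the argmax over logits and over
# probabilities picks the same class.
probs = F.softmax(logits, dim=1)
same_prediction = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_prediction)
```

This is also why the accuracy computation can call `argmax` directly on the logits.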

## 5. Training and Evaluation Functions

In [7]:
def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct, total = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * x.size(0)
        correct += (logits.argmax(1) == y).sum().item()
        total += y.size(0)
    return total_loss / total, correct / total

@torch.no_grad()
def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        loss = criterion(logits, y)
        total_loss += loss.item() * x.size(0)
        correct += (logits.argmax(1) == y).sum().item()
        total += y.size(0)
    return total_loss / total, correct / total
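Both functions multiply `loss.item()` by the batch size before accumulating. `nn.CrossEntropyLoss` returns the mean over a batch by default, and the last batch is usually smaller (60,000 training samples with a batch size of 128 leave a final batch of 96). The small arithmetic sketch below uses made-up per-batch losses to show how a plain average of batch means differs from the sample-weighted average.

```python
# Hypothetical per-batch mean losses for a 3-batch epoch (made-up numbers).
batch_sizes = [128, 128, 96]
batch_mean_losses = [0.50, 0.40, 0.10]

# Naive: average the batch means, treating every batch as equally large.
naive = sum(batch_mean_losses) / len(batch_mean_losses)

# Weighted: turn each batch mean back into a batch sum, then divide by
# the total number of samples. This is what the functions above compute.
weighted = sum(
    loss * n for loss, n in zip(batch_mean_losses, batch_sizes)
) / sum(batch_sizes)

print(f"{naive:.4f}")     # 0.3333
print(f"{weighted:.4f}")  # 0.3545
```

On MNIST the difference is tiny because only one batch out of several hundred is short, but the weighted form is exact for any batch size.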

## 6. Train the Model

In [8]:
epochs = 5
for epoch in range(1, epochs + 1):
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)
    print(f'Epoch {epoch:02d} | Train Acc: {train_acc:.4f} | Test Acc: {test_acc:.4f}')

Epoch 01 | Train Acc: 0.9035 | Test Acc: 0.9480
Epoch 02 | Train Acc: 0.9609 | Test Acc: 0.9661
Epoch 03 | Train Acc: 0.9737 | Test Acc: 0.9742
Epoch 04 | Train Acc: 0.9803 | Test Acc: 0.9758
Epoch 05 | Train Acc: 0.9847 | Test Acc: 0.9761
