
# Image Classification using MLP in PyTorch 
> In this post, we will implement an MLP in PyTorch using both the Functional API and the Sequential API to classify MNIST digits.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]

Let's first import the libraries we need:

In [None]:
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np




# Load Data

We will use MNIST, a dataset of handwritten digits. It contains 60,000 training and 10,000 test images, each a 28x28 grayscale image from one of 10 classes (the digits 0-9):

![](https://www.learnopencv.com/wp-content/uploads/2020/01/c3_w2_Mnist.png "Credit: learnopencv.com")

Before defining the data loaders, we need to choose a batch size. We set the training batch size to 64. When you are using a GPU, the maximum batch size is dictated by the memory on the GPU. We use the MNIST test split as our validation set, with a batch size twice as large as the training one. This is because validation does not need backpropagation and thus takes less memory (it doesn’t need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.

In [22]:
# Training set
train_dataset = datasets.MNIST('./data', 
                               train=True, 
                               download=True, 
                               transform=transforms.ToTensor())

# Validation dataset
validation_dataset = datasets.MNIST('./data', 
                                    train=False, 
                                    transform=transforms.ToTensor())

# Batch size : How many images are used to calculate the gradient
batch_size = 64

# Train DataLoader 
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size, 
                          shuffle=True)
# Validation DataLoader 
val_loader = DataLoader(dataset=validation_dataset, 
                        batch_size=2*batch_size, 
                        shuffle=False)
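
As a quick sanity check on the two batch sizes, the sketch below builds loaders in the same way over a small synthetic stand-in for MNIST and inspects the first batch from each. The random tensors, the `TensorDataset`, and the `fake_*` names are assumptions made here only to avoid the download; they are not part of the original notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption: random pixels and random labels)
fake_images = torch.rand(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))
fake_dataset = TensorDataset(fake_images, fake_labels)

batch_size = 64
fake_train_loader = DataLoader(fake_dataset, batch_size=batch_size, shuffle=True)
fake_val_loader = DataLoader(fake_dataset, batch_size=2 * batch_size, shuffle=False)

# Pull one batch from each loader and look at the shapes
x_train, y_train = next(iter(fake_train_loader))
x_val, y_val = next(iter(fake_val_loader))
print(x_train.shape, y_train.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(x_val.shape, y_val.shape)      # torch.Size([128, 1, 28, 28]) torch.Size([128])
```

The real loaders should yield batches of the same shape, except for a smaller final batch when the dataset size is not a multiple of the batch size.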

# Create the MLP

Here we define the multilayer perceptron. It has 2 hidden layers with 512 units each. Also note that the input layer has 28x28 = 784 nodes, which is the size of a flattened image. Given below is the schematic diagram of the network.


![](https://www.learnopencv.com/wp-content/uploads/2017/10/mlp-mnist-schematic.jpg "Credit: learnopencv.com")

## Using Functional API

In [23]:
class functionalMLP(nn.Module):
    def __init__(self, n_input, n_hid1, n_hid2, n_output):
        super().__init__()
        self.hidden1 = nn.Linear(n_input, n_hid1)
        self.hidden2 = nn.Linear(n_hid1, n_hid2)
        self.output = nn.Linear(n_hid2, n_output)
        self.relu = nn.ReLU()

    def forward(self, x):
        # x shape is [batch_size, 1, 28, 28]
        batch_size = x.shape[0]
        x = x.view(batch_size, -1)
        # x shape is [batch_size, 28*28]
        x = self.relu(self.hidden1(x))
        x = self.relu(self.hidden2(x))
        x = self.output(x)
        return x

In [24]:
model_functional = functionalMLP(n_input=28*28, n_hid1=512, n_hid2=512, n_output=10)
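
To get a feel for the size of this network, we can count its trainable parameters. The sketch below rebuilds only the three linear layers with the same sizes (the `layers` list is an illustrative stand-in, not the model object itself) and compares the count against the by-hand arithmetic.

```python
from torch import nn

# Same layer sizes as the model above: 784 -> 512 -> 512 -> 10
layers = [nn.Linear(28 * 28, 512), nn.Linear(512, 512), nn.Linear(512, 10)]

# Each nn.Linear(in, out) holds in*out weights plus out biases
n_params = sum(p.numel() for layer in layers for p in layer.parameters())
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)

print(n_params, expected)  # 669706 669706
```

The same count should come out of `sum(p.numel() for p in model_functional.parameters())`, since `nn.ReLU` has no parameters.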

### Loss function

Since this is a classification problem, we can use the cross-entropy loss, available in PyTorch as [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html). It expects raw logits, which is why the output layer of our model has no softmax.

In [25]:
loss_func = nn.CrossEntropyLoss()
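
To make it concrete what this loss computes, here is a minimal sketch that reproduces `nn.CrossEntropyLoss` by hand as a log-softmax followed by the mean negative log-likelihood of the true class. The logits and targets are made-up values used only for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw scores for 4 images
targets = torch.tensor([3, 0, 9, 1])   # hypothetical class labels

# Built-in loss, applied directly to the raw logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: log-softmax, then pick the true class and average
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(loss_ce, loss_manual))  # True
```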

### Optimizer

We use SGD as the optimizer, with a learning rate of 1e-3:

In [26]:
def get_opt(model, lr=0.001):
    opt = optim.SGD(model.parameters(), lr=lr)
    return opt

In [27]:
opt = get_opt(model_functional)
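
For intuition about what `opt.step()` does, the sketch below compares one step of `optim.SGD` (no momentum or weight decay, as above) against the update rule written by hand, `w <- w - lr * grad`. The tiny linear layer, the random data, and the MSE loss are assumptions chosen only to keep the example small.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
layer = nn.Linear(3, 1)                        # toy layer, not the MLP above
x, y = torch.randn(8, 3), torch.randn(8, 1)    # hypothetical data
lr = 0.001

loss = nn.functional.mse_loss(layer(x), y)
loss.backward()

# What plain SGD should produce for the weight matrix
expected_weight = layer.weight.detach() - lr * layer.weight.grad

sgd = optim.SGD(layer.parameters(), lr=lr)
sgd.step()       # updates the parameters in place
sgd.zero_grad()  # clears gradients before the next batch

print(torch.allclose(layer.weight.detach(), expected_weight))  # True
```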

### Training

We will define a `loss_batch` function that encapsulates the backpropagation step for training and the loss calculation for validation. It also returns the number of correct predictions in the batch, which we later turn into accuracy.

In [28]:
def loss_batch(logits, y_batch, loss_func, opt=None):
    # calculate loss
    loss = loss_func(logits, y_batch)
    # count correct predictions (divided by the dataset size later to get accuracy)
    preds = logits.argmax(dim=1)
    acc = (preds == y_batch).sum()

    # only if we are training then perform the weights update
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    
    return loss.item(), acc.item()

Next, we define a `fit` function that handles both training and validation:

In [29]:
def fit(epochs, train_loader, val_loader, model, loss_func, opt):
    for ep in range(epochs):
        # Training
        train_losses = []
        train_n_correct = 0
        model.train()
        for x_batch, y_batch in train_loader:
            logits = model(x_batch)
            loss, acc = loss_batch(logits, y_batch, loss_func, opt)
            train_losses.append(loss)
            train_n_correct += acc
        train_loss = np.sum(train_losses)/len(train_loader)
        train_acc = train_n_correct/len(train_loader.dataset)
        
        # Validation
        model.eval()
        val_losses = []
        val_n_correct = 0
        with torch.no_grad():
            for x_batch, y_batch in val_loader:
                logits = model(x_batch)
                loss, acc = loss_batch(logits, y_batch, loss_func)
                val_losses.append(loss)
                val_n_correct += acc
            val_loss = np.sum(val_losses)/len(val_loader)
            val_acc = val_n_correct/len(val_loader.dataset)

        print(f"Epoch {ep}:, TrainLoss: {train_loss:.2f}, TrainAcc: {train_acc:.2f}, ValLoss: {val_loss:.2f}, ValAcc: {val_acc:.2f} ")
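
One detail worth noting: `fit` averages the per-batch mean losses, so a smaller final batch gets the same weight as a full one. With 60,000 images and a batch size of 64 the difference is tiny, but weighting each batch by its size gives the exact per-sample mean. A minimal sketch with made-up numbers (the losses and sizes below are hypothetical):

```python
import numpy as np

# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller)
batch_losses = np.array([2.0, 2.0, 1.0])
batch_sizes = np.array([64, 64, 32])

# What fit() computes: every batch counts equally
mean_of_means = batch_losses.sum() / len(batch_losses)

# Sample-weighted alternative: every image counts equally
weighted_mean = (batch_losses * batch_sizes).sum() / batch_sizes.sum()

print(mean_of_means, weighted_mean)
```

In `fit`, this would amount to accumulating `loss * len(x_batch)` and dividing by `len(loader.dataset)`.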

Now we are ready to begin our model training! We will train for 5 epochs:

In [30]:
epochs=5
fit(epochs, train_loader, val_loader, model_functional, loss_func, opt)

Epoch 0:, TrainLoss: 2.28, TrainAcc: 0.32, ValLoss: 2.25, ValAcc: 0.55 
Epoch 1:, TrainLoss: 2.22, TrainAcc: 0.63, ValLoss: 2.18, ValAcc: 0.68 
Epoch 2:, TrainLoss: 2.14, TrainAcc: 0.69, ValLoss: 2.07, ValAcc: 0.71 
Epoch 3:, TrainLoss: 1.98, TrainAcc: 0.71, ValLoss: 1.86, ValAcc: 0.73 
Epoch 4:, TrainLoss: 1.73, TrainAcc: 0.73, ValLoss: 1.55, ValAcc: 0.75 


## Using Sequential API

nn.Sequential is a handy class we can use to simplify our code. A Sequential object runs each of the modules contained within it, in a sequential manner. This is a simpler way of writing our MLP.

To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn’t have a general-purpose view layer, so we create one for our network. `Lambda` wraps a function in a layer that we can then use when defining a network with `nn.Sequential`.

In [31]:
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

def preprocess(x):
    return x.view(x.shape[0], -1)
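
For this particular reshaping, recent PyTorch versions also provide the built-in `nn.Flatten` layer. The sketch below, using a made-up batch of random images, checks that it gives the same result as the `Lambda` approach; the `Lambda` class is repeated only so the snippet runs on its own.

```python
import torch
from torch import nn

class Lambda(nn.Module):
    """Wraps a plain function so it can be used as a layer."""
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

x = torch.rand(5, 1, 28, 28)  # hypothetical batch of 5 images

flat_lambda = Lambda(lambda t: t.view(t.shape[0], -1))(x)
flat_builtin = nn.Flatten()(x)  # flattens every dimension after the batch dimension

print(flat_lambda.shape, torch.equal(flat_lambda, flat_builtin))
```

`Lambda` remains the more general tool, since it can wrap any function, not just a flatten.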

Now we create the model using `nn.Sequential`:

In [32]:
model_sequential = nn.Sequential(
    Lambda(preprocess),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10))

And that's it! This model, defined using `nn.Sequential`, has the same architecture as the one we defined using the Functional API. We reuse the `fit` and `loss_batch` functions defined above to train it:

In [33]:
epochs=5
opt = get_opt(model_sequential)
fit(epochs, train_loader, val_loader, model_sequential, loss_func, opt)

Epoch 0:, TrainLoss: 2.28, TrainAcc: 0.25, ValLoss: 2.26, ValAcc: 0.39 
Epoch 1:, TrainLoss: 2.23, TrainAcc: 0.47, ValLoss: 2.20, ValAcc: 0.56 
Epoch 2:, TrainLoss: 2.16, TrainAcc: 0.58, ValLoss: 2.10, ValAcc: 0.63 
Epoch 3:, TrainLoss: 2.03, TrainAcc: 0.64, ValLoss: 1.93, ValAcc: 0.69 
Epoch 4:, TrainLoss: 1.82, TrainAcc: 0.69, ValLoss: 1.66, ValAcc: 0.72 


**Hope you enjoyed reading! Please leave questions or feedback in the comments!**