# The Multilayer Perceptron

The multilayer perceptron (MLP) is one of the fundamental architectures in deep learning and a building block of more complex neural networks.

At its core, an MLP consists of multiple layers of neurons, each layer fully connected to the next in a feedforward manner. These layers typically include an input layer, one or more hidden layers, and an output layer. Its expressive power comes from alternating these linear layers with nonlinear activation functions.
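To see why the activation functions matter, the sketch below stacks two linear layers without an activation and checks that they collapse into a single linear layer. The names `lin_a` and `lin_b` and the layer sizes are illustrative assumptions, not part of the model defined later.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between (illustrative sizes)
lin_a = nn.Linear(4, 3)
lin_b = nn.Linear(3, 2)

x = torch.randn(5, 4)  # batch of 5 synthetic inputs

with torch.no_grad():
    # The stack is equivalent to one linear layer with merged parameters:
    # W = W_b W_a,  b = W_b b_a + b_b
    W = lin_b.weight @ lin_a.weight             # (2, 4)
    b = lin_b.weight @ lin_a.bias + lin_b.bias  # (2,)

    stacked = lin_b(lin_a(x))
    merged = x @ W.T + b
    collapses = torch.allclose(stacked, merged, atol=1e-5)

    # With a ReLU in between, the merged layer no longer reproduces the output
    with_relu = lin_b(torch.relu(lin_a(x)))
    differs = not torch.allclose(with_relu, merged, atol=1e-5)

print(collapses, differs)
```

Without a nonlinearity, adding layers does not let the network represent anything a single linear layer could not.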

The following image shows a schematic representation of an MLP for classification tasks, where the final linear layer produces one score per class and a softmax turns those scores into class probabilities.
<div style="text-align:center;">
    <img src="imgs/mlp-classification.svg" alt="MLP for classification" width="600">
</div>

In [None]:
import torch
import torch.nn as nn

Let's consider a simple MLP model that classifies `(28, 28)` images into 10 classes.

In [None]:
# Batch of synthetic images
x = torch.randn((8, 28, 28))   # (batch_size, 28, 28)

In [None]:
# An example of MLP for an image classification problem
class MLPClassification(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # 784 = 28 * 28 input features, one per pixel
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 10)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 784)      # Flatten input images            (batch_size, 28, 28) -> (batch_size, 784)
        x = self.fc1(x)          # Apply Fully Connected layer 1   (batch_size, 784)    -> (batch_size, 128)
        x = self.relu1(x)        # Apply ReLU activation Function  (batch_size, 128)
        x = self.fc2(x)          # Apply Fully Connected layer 2   (batch_size, 128)    -> (batch_size, 64)
        x = self.relu2(x)        # Apply ReLU activation Function  (batch_size, 64)
        x = self.fc3(x)          # Apply Fully Connected layer 3   (batch_size, 64)     -> (batch_size, 10)
        x = self.softmax(x)      # Apply Softmax function          (batch_size, 10)
        return x

In [None]:
model = MLPClassification()
model

In [None]:
model(x)
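As a quick sanity check on the output, the sketch below confirms that each row holds 10 values that sum to 1, as expected after a softmax over `dim=1`. So that the snippet runs on its own, it uses an `nn.Sequential` stand-in with the same layer sizes as `MLPClassification`; the stand-in and the names `probs` and `pred` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for MLPClassification with the same layer sizes
model = nn.Sequential(
    nn.Flatten(),                     # (batch_size, 28, 28) -> (batch_size, 784)
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
    nn.Softmax(dim=1),
)

x = torch.randn((8, 28, 28))          # batch of synthetic images

with torch.no_grad():                 # no gradients needed for inspection
    probs = model(x)

print(probs.shape)                    # torch.Size([8, 10])
print(probs.sum(dim=1))               # every entry is 1 up to rounding
pred = probs.argmax(dim=1)            # most probable class index per image
print(pred.shape)                     # torch.Size([8])
```

The model is untrained, so the predicted classes carry no meaning yet. Note also that `nn.CrossEntropyLoss` expects raw scores rather than probabilities, so a model trained with that loss would typically omit the final softmax.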

### Operations in the model

In [None]:
# 1. 
# x = x.view(-1, 784)  # Flatten input images from (batch_size, 28, 28) to (batch_size, 784)
#
x_flat = x.view(-1, 784)
x_flat.shape

In [None]:
# 2.
# x = self.fc1(x)  # Apply Fully Connected layer 1   (batch_size, 784)   -> (batch_size, 128)
#
lin = nn.Linear(784, 128)

In [None]:
lin(x_flat).shape

Internally, a linear layer has two sets of parameters: the weight matrix $W$ and the bias vector $\vec{b}$. Applying the layer to a sample $\vec{x}$ performs the linear (affine) transformation $\vec{x}^* = W\vec{x} + \vec{b}$.

We can access the internals of the layer as follows:

In [None]:
lin.weight.shape

In [None]:
lin.bias.shape
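The shapes above also determine how many learnable parameters the layer has. A minimal sketch, assuming a freshly created layer of the same size (the name `n_params` is introduced here for illustration):

```python
import torch.nn as nn

lin = nn.Linear(784, 128)

# PyTorch stores the weight as (out_features, in_features)
print(lin.weight.shape)   # torch.Size([128, 784])
print(lin.bias.shape)     # torch.Size([128])

# Total learnable parameters: 128 * 784 weights + 128 biases
n_params = sum(p.numel() for p in lin.parameters())
print(n_params)           # 100480
```

Because the weight is stored as `(out_features, in_features)`, applying it to row-vector samples requires the transpose `lin.weight.T`.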

In [None]:
# the linear transformation `x^* = Wx + b` by hand
# for one sample in the batch: x_flat[0]
# x_flat[0] is a 1-D tensor, so `x @ W.T` gives the same values as `Wx`
#
x_flat_star = x_flat[0] @ lin.weight.T + lin.bias

x_flat_star.shape

In [None]:
# comparing with the linear layer (equal up to floating-point rounding)
torch.allclose(lin(x_flat[0]), x_flat_star, atol=1e-6)
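The same check extends from one sample to the whole batch: each row of the input is one sample, and the bias is broadcast across rows. The sketch below is self-contained, so it recreates `lin` and a synthetic `x_flat`; the names `by_hand` and `same` are introduced here for illustration.

```python
import torch
import torch.nn as nn

lin = nn.Linear(784, 128)
x_flat = torch.randn(8, 784)          # (batch_size, 784)

with torch.no_grad():
    # (8, 784) @ (784, 128) -> (8, 128); the (128,) bias broadcasts over rows
    by_hand = x_flat @ lin.weight.T + lin.bias
    same = torch.allclose(lin(x_flat), by_hand, atol=1e-5)

print(by_hand.shape)                  # torch.Size([8, 128])
print(same)
```

A small absolute tolerance is passed to `torch.allclose` because the two computations may accumulate floating-point rounding differently.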