# The Multilayer Perceptron

The multilayer perceptron (MLP) is one of the fundamental architectures in deep learning and a building block for more complex neural networks.

At its core, an MLP consists of multiple layers of neurons, each layer connected to the next in a feedforward manner. These layers typically include an input layer, one or more hidden layers, and an output layer. Each hidden layer applies a linear transformation followed by a nonlinear activation function; without those nonlinearities, the stacked layers would collapse into a single linear map.

The following image shows a schematic representation of an MLP for a regression task. Unlike classification models, which typically have a softmax activation function in the output layer to output probabilities for each class, a regression model ends in a plain linear layer that outputs an unbounded real value.
<div style="text-align:center;">
    <img src="imgs/mlp-regression.svg" alt="MLP for regression" width="600">
</div>

In [None]:
import torch
import torch.nn as nn

In [None]:
# Batch of inputs
x = torch.randn((8, 784))

In [None]:
# An example of an MLP for a regression problem
class MLPRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # in_features must match the input vector size (784)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.fc1(x)          # Apply Fully Connected layer 1   (batch_size, 784)    -> (batch_size, 128)
        x = self.relu1(x)        # Apply ReLU activation Function  (batch_size, 128)
        x = self.fc2(x)          # Apply Fully Connected layer 2   (batch_size, 128)    -> (batch_size, 64)
        x = self.relu2(x)        # Apply ReLU activation Function  (batch_size, 64)
        x = self.fc3(x)          # Apply Fully Connected layer 3   (batch_size, 64)     -> (batch_size, 1)
        return x

In [None]:
model = MLPRegression()
model

In [None]:
model(x)
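
As a quick sanity check on the shapes, the sketch below rebuilds the same stack of layers with `nn.Sequential` (a compact stand-in for the `MLPRegression` class, used here only to keep the example self-contained), runs a batch through it, and counts the trainable parameters. The names `mlp`, `out`, and `n_params` are illustrative and do not appear elsewhere in this notebook.

```python
import torch
import torch.nn as nn

# Same layer sizes as MLPRegression, written as a Sequential for brevity
mlp = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn((8, 784))   # batch of 8 inputs with 784 features each
out = mlp(x)
print(out.shape)            # one regression value per sample: torch.Size([8, 1])

# Each Linear layer holds in_features * out_features weights plus out_features biases
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)             # (784*128 + 128) + (128*64 + 64) + (64*1 + 1) = 108801
```

The ReLU layers contribute no parameters, so the count comes entirely from the three linear layers.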

### Operations in the model

In [None]:
# Apply Fully Connected layer 1   (batch_size, 784)   -> (batch_size, 128)
lin = nn.Linear(784, 128)

In [None]:
lin(x).shape

Internally, linear layers have two sets of parameters: the weight matrix $W$ and the biases $\vec{b}$. Applying a linear layer means computing the transformation $\vec{x}^* = W\vec{x} + \vec{b}$ (strictly speaking an affine map, because of the bias term).

We can access the internals of the layer as follows:

In [None]:
lin.weight.shape

In [None]:
lin.bias.shape

In [None]:
# the linear transformation `x^* = Wx + b` by hand
# for one sample in the batch: x[0]
# (x[0] is a 1-D row of 784 features, hence the product with the transposed weights)
x_star = x[0] @ lin.weight.T + lin.bias

x_star.shape

In [None]:
# comparing with the linear layer
torch.allclose(lin(x[0]), x_star) 
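
The same check extends to the whole batch. In the minimal sketch below, each row of `x` is one sample, so the transformation is written as $X^* = XW^\top + \vec{b}$, with the bias broadcast across the rows. The tolerance `atol=1e-5` is an assumption chosen to absorb float32 rounding differences, and `x_star_batch` is an illustrative name.

```python
import torch
import torch.nn as nn

lin = nn.Linear(784, 128)
x = torch.randn((8, 784))

# lin.weight has shape (128, 784), so it is transposed before the matrix product;
# lin.bias has shape (128,) and broadcasts over the 8 rows of the batch
x_star_batch = x @ lin.weight.T + lin.bias

print(x_star_batch.shape)   # torch.Size([8, 128])

# compare the by-hand result with the layer's own output
print(torch.allclose(lin(x), x_star_batch, atol=1e-5))
```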