# Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) is a neural network built from multiple perceptron (fully connected) layers stacked on top of each other, with a nonlinear activation function applied after each hidden layer. By combining linear transformations with nonlinear activations, an MLP can model nonlinear relationships between inputs and outputs that a single-layer perceptron cannot.
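
To see why the nonlinearity matters, here is a minimal sketch that stacks two linear layers with and without ReLU in between. The weights are small illustrative values assumed for this example. Without an activation, the two layers collapse algebraically into a single linear layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`; with ReLU they do not.

```python
import numpy as np

# Illustrative values (assumed for this sketch): 2 samples, 3 features
x = np.array([[1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
W1 = np.array([[1.0, 1.0, -1.0, 0.0]] * 3)     # (3, 4)
b1 = np.array([[-2.0, 2.0, 1.0, -1.0]])         # (1, 4)
W2 = np.array([[1.0], [-1.0], [-1.0], [1.0]])   # (4, 1)
b2 = np.zeros((1, 1))

# Two linear layers with no activation in between
stacked_linear = (x @ W1 + b1) @ W2 + b2

# The same map written as ONE linear layer
W = W1 @ W2          # (3, 1)
b = b1 @ W2 + b2     # (1, 1)
single_linear = x @ W + b

# Inserting ReLU between the layers breaks this equivalence
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2

print(np.allclose(stacked_linear, single_linear))  # True
print(stacked_linear.ravel())  # [-4. -5.]
print(with_relu.ravel())       # [-4. -3.]
```

The second sample differs between the two versions because ReLU zeroes out its negative hidden pre-activations, which a purely linear stack passes through unchanged.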

In [64]:
import numpy as np

# Activation function
def relu(x):
    return np.maximum(0, x)

# Sample input of shape (2, 3): 2 samples, 3 features each
x = np.array([[1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Weights and biases
# W1: (3, 4), b1: (1, 4)
# W2: (4, 1), b2: (1, 1)
W1 = np.array([[1, 1, -1, 0],
               [1, 1, -1, 0],
               [1, 1, -1, 0]])  # input -> hidden (3 -> 4)
b1 = np.array([[-2, 2, 1, -1]])

W2 = np.array([[1],
               [-1],
               [-1],
               [1]])  # hidden -> output (4 -> 1)
b2 = np.zeros((1, 1))

# Forward pass
hidden = relu(x @ W1 + b1)  # apply ReLU activation between layers
output = hidden @ W2 + b2

print(output)

[[-4.]
 [-3.]]
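
The two-layer forward pass above extends naturally to any number of layers. The following is a rough sketch of that idea, not a definitive implementation: `mlp_forward` is a hypothetical helper introduced here, and it assumes ReLU after every layer except the last.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def mlp_forward(x, layers):
    """Forward pass through a list of (W, b) pairs (hypothetical helper).

    ReLU is applied after every layer except the last one.
    """
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b                 # linear transformation
        if i < len(layers) - 1:       # no activation on the output layer
            h = relu(h)
    return h

# Same weights and input as the example above
layers = [
    (np.array([[1, 1, -1, 0]] * 3), np.array([[-2, 2, 1, -1]])),  # 3 -> 4
    (np.array([[1], [-1], [-1], [1]]), np.zeros((1, 1))),         # 4 -> 1
]
x = np.array([[1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

print(mlp_forward(x, layers))  # same values as above: -4 and -3
```

Adding a hidden layer then only requires appending another `(W, b)` pair whose input dimension matches the previous layer's output dimension.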


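The same architecture can be defined in PyTorch as follows. Note that `nn.Linear` initializes its weights and biases randomly, so the output differs from the NumPy result and changes from run to run unless a random seed is fixed: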

In [68]:
import torch
import torch.nn as nn

# Define the MLP
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 4)  # input -> hidden
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(4, 1)  # hidden -> output

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Same input tensor
x = torch.tensor([
    [1.0,  2.0, -1.0],
    [0.0, -1.0,  2.0]
])

# Run model
model = MLP()
output = model(x)

print(output)


tensor([[0.5160],
        [0.3791]], grad_fn=<AddmmBackward0>)
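
To reproduce the NumPy result in PyTorch, the hand-picked weights can be copied into the model. The sketch below assumes the same `MLP` layout as above (redefined compactly so the block is self-contained). `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the NumPy matrices are transposed before copying.

```python
import numpy as np
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 4)  # input -> hidden
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(4, 1)  # hidden -> output

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Weights from the NumPy example, as float32 to match PyTorch's default dtype
W1 = np.array([[1, 1, -1, 0]] * 3, dtype=np.float32)      # (3, 4)
b1 = np.array([-2, 2, 1, -1], dtype=np.float32)           # (4,)
W2 = np.array([[1], [-1], [-1], [1]], dtype=np.float32)   # (4, 1)
b2 = np.zeros(1, dtype=np.float32)                        # (1,)

x = torch.tensor([[1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])

model = MLP()
with torch.no_grad():  # plain value copies, no gradient tracking needed
    model.fc1.weight.copy_(torch.tensor(W1).T)  # (4, 3)
    model.fc1.bias.copy_(torch.tensor(b1))
    model.fc2.weight.copy_(torch.tensor(W2).T)  # (1, 4)
    model.fc2.bias.copy_(torch.tensor(b2))
    output = model(x)

print(output)  # values -4 and -3, matching the NumPy forward pass
```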


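An added benefit of using activation functions that are differentiable (ReLU is differentiable everywhere except at 0) is that gradients of a loss with respect to the weights can be computed by backpropagation, which makes it possible to learn the weights with gradient-based optimization methods such as gradient descent.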
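
As a small illustration of this, the sketch below runs the same forward pass with PyTorch tensors that track gradients, computes a mean squared error against hypothetical targets `y` (made up for this example), and takes one manual gradient descent step. The learning rate of `0.01` is likewise an arbitrary choice.

```python
import torch

x = torch.tensor([[1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
y = torch.tensor([[1.0], [0.0]])  # hypothetical targets, for illustration only

# Same weights as the NumPy example, now tracked by autograd
W1 = torch.tensor([[1.0, 1.0, -1.0, 0.0]] * 3, requires_grad=True)
b1 = torch.tensor([[-2.0, 2.0, 1.0, -1.0]], requires_grad=True)
W2 = torch.tensor([[1.0], [-1.0], [-1.0], [1.0]], requires_grad=True)
b2 = torch.zeros(1, 1, requires_grad=True)
params = [W1, b1, W2, b2]

def forward(x):
    return torch.relu(x @ W1 + b1) @ W2 + b2

# Mean squared error; backward() runs backpropagation and fills p.grad
loss = ((forward(x) - y) ** 2).mean()
loss.backward()

print(loss.item())  # 17.0
print(b2.grad)      # tensor([[-8.]])

# One step of plain gradient descent
lr = 0.01
with torch.no_grad():
    for p in params:
        p -= lr * p.grad

new_loss = ((forward(x) - y) ** 2).mean()
print(new_loss.item() < loss.item())  # True
```

In practice the manual update loop is usually replaced by an optimizer such as `torch.optim.SGD`, and gradients are reset between steps.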