# Multi-Layer Perceptron (MLP) from Scratch

In this notebook, we will build a simple Multi-Layer Perceptron (MLP) to solve the XOR problem. We will do this in two ways:
1.  **From Scratch (using NumPy)**: To understand the underlying mathematics (forward pass, backpropagation, gradient descent).
2.  **Using TensorFlow/Keras**: To see how it's done in a modern deep learning framework.

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

## 1. The XOR Dataset

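The XOR function is a classic problem that a single linear layer (perceptron) cannot solve, because its two classes are not linearly separable: no straight line in the input plane puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other. A hidden layer with a non-linear activation is needed to capture this.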

| Input 1 | Input 2 | Output |
| :---: | :---: | :---: |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
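
To make the linear-separability point concrete, the sketch below brute-forces a single threshold unit `step(w1*x1 + w2*x2 + b)` over a coarse grid of weights and biases and counts how many settings reproduce each truth table. The grid range and step, and the helper name `count_linear_solutions`, are illustrative assumptions that are not part of this notebook. A grid search is a demonstration, not a proof.

```python
import itertools
import numpy as np

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

def count_linear_solutions(y_target, grid):
    """Count (w1, w2, b) triples on the grid whose threshold unit matches y_target."""
    count = 0
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X_demo @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(pred, y_target):
            count += 1
    return count

grid = np.linspace(-2.0, 2.0, 17)  # hypothetical search range, step 0.25
solution_counts = {name: count_linear_solutions(t, grid) for name, t in targets.items()}
for name, n in solution_counts.items():
    print(f"{name}: {n} linear solutions found on the grid")
```

AND and OR should each have many solutions on this grid, while XOR should have none.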

In [None]:
# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

print("X shape:", X.shape)
print("y shape:", y.shape)

## 2. MLP from Scratch (NumPy)

We will build a simple network with:
-   **Input Layer**: 2 neurons (for the two inputs)
-   **Hidden Layer**: 2 neurons (the smallest hidden layer that can represent XOR, though training may still stall for some initializations)
-   **Output Layer**: 1 neuron (binary classification)
-   **Activation**: Sigmoid
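
One way to see why two hidden neurons are enough: the weights below are hand-picked (an illustrative assumption, not what training will necessarily find) so that one hidden unit approximates OR, the other approximates NAND, and the output unit approximates the AND of the two. All names ending in `_demo` are hypothetical and exist only in this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked parameters (hypothetical values chosen for illustration).
W1_demo = np.array([[20.0, -20.0],
                    [20.0, -20.0]])   # column 0: OR, column 1: NAND
b1_demo = np.array([[-10.0, 30.0]])
W2_demo = np.array([[20.0],
                    [20.0]])          # output: AND of the two hidden units
b2_demo = np.array([[-30.0]])

# Same forward pass structure as the 2-2-1 network described above.
hidden = sigmoid(X_demo @ W1_demo + b1_demo)
output = sigmoid(hidden @ W2_demo + b2_demo)
xor_pred = (output > 0.5).astype(int).ravel()
print(xor_pred)
```

The large weight magnitudes push the sigmoids close to 0 or 1, so each unit behaves almost like a logic gate.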

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(z), the activation output, not the pre-activation z.
    return s * (1 - s)
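
Note that `sigmoid_derivative` is written in terms of the sigmoid's output: for s = sigmoid(z), the derivative with respect to z is s(1 - s). The sketch below is a quick finite-difference check of that identity, using local copies of the two functions and an arbitrary set of test points (both are assumptions for this sketch).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(s):
    # s is assumed to be sigmoid(z), not z itself
    return s * (1 - s)

z = np.linspace(-4.0, 4.0, 9)   # arbitrary test points
eps = 1e-6

numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference
analytic = sigmoid_derivative(sigmoid(z))                    # pass the activation
wrong = sigmoid_derivative(z)                                # passing z is a common slip

max_abs_diff = np.max(np.abs(numeric - analytic))
print("max |numeric - analytic|:", max_abs_diff)
print("passing z instead matches:", np.allclose(numeric, wrong))
```

This is why the training loop passes `final_output` and `hidden_output` (activations) to the derivative rather than `final_input` and `hidden_input`.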

In [None]:
# Initialization
input_size = 2
hidden_size = 2
output_size = 1
learning_rate = 0.1
epochs = 10000

# Weights and Biases
# W1: weights between input and hidden layer
# b1: biases for hidden layer
# W2: weights between hidden and output layer
# b2: biases for output layer

np.random.seed(42)
W1 = np.random.uniform(size=(input_size, hidden_size))
b1 = np.random.uniform(size=(1, hidden_size))
W2 = np.random.uniform(size=(hidden_size, output_size))
b2 = np.random.uniform(size=(1, output_size))

print("W1 shape:", W1.shape)
print("W2 shape:", W2.shape)

In [None]:
# Training Loop
losses = []

for epoch in range(epochs):
    # --- Forward Pass ---
    # Layer 1 (Hidden)
    hidden_input = np.dot(X, W1) + b1
    hidden_output = sigmoid(hidden_input)
    
    # Layer 2 (Output)
    final_input = np.dot(hidden_output, W2) + b2
    final_output = sigmoid(final_input)
    
    # --- Loss (MSE) ---
    error = y - final_output
    loss = np.mean(np.square(error))
    losses.append(loss)
    
    # --- Backpropagation ---
    # error = y - final_output, so d_output is the *negative* gradient of
    # 0.5 * sum(error**2) with respect to final_input (chain rule through the sigmoid).
    d_output = error * sigmoid_derivative(final_output)
    
    # Error at hidden layer
    error_hidden_layer = d_output.dot(W2.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_output)
    
    # --- Update Weights (Gradient Descent) ---
    # The deltas are built from y - final_output, so they already point downhill: hence `+=`.
    W2 += hidden_output.T.dot(d_output) * learning_rate
    b2 += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    W1 += X.T.dot(d_hidden_layer) * learning_rate
    b1 += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
    
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

print(f"Final Loss: {loss:.4f}")
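
As a sanity check on the backpropagation step, the deltas used in the loop can be compared against finite differences. The sketch below assumes the objective is `0.5 * sum(error**2)` (which is what the `+=` updates descend; it differs from the logged MSE only by a constant factor), uses freshly drawn random weights with an arbitrary seed, and introduces helper names (`half_sse`, `numeric_grad`) that are not part of this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X_chk = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_chk = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)  # arbitrary seed for the check
params = {
    "W1": rng.uniform(size=(2, 2)), "b1": rng.uniform(size=(1, 2)),
    "W2": rng.uniform(size=(2, 1)), "b2": rng.uniform(size=(1, 1)),
}

def half_sse(p):
    """Objective assumed for this check: 0.5 * sum of squared errors."""
    h = sigmoid(X_chk @ p["W1"] + p["b1"])
    out = sigmoid(h @ p["W2"] + p["b2"])
    return 0.5 * np.sum((y_chk - out) ** 2)

def numeric_grad(p, name, eps=1e-6):
    """Central finite differences with respect to one parameter array."""
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(*p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = half_sse(p)
        p[name][idx] = old - eps
        minus = half_sse(p)
        p[name][idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# Same delta computations as in the training loop.
h = sigmoid(X_chk @ params["W1"] + params["b1"])
out = sigmoid(h @ params["W2"] + params["b2"])
d_out = (y_chk - out) * out * (1 - out)
d_hid = d_out.dot(params["W2"].T) * h * (1 - h)
updates = {
    "W2": h.T.dot(d_out), "b2": d_out.sum(axis=0, keepdims=True),
    "W1": X_chk.T.dot(d_hid), "b1": d_hid.sum(axis=0, keepdims=True),
}

# Each update direction should equal the negative numerical gradient.
max_err = max(np.max(np.abs(updates[k] + numeric_grad(params, k))) for k in updates)
print("largest mismatch:", max_err)
```

A mismatch near floating-point noise suggests the hand-written backward pass agrees with the assumed objective.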

In [None]:
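# Testing the NumPy Model
# Re-run the forward pass so the predictions reflect the final weight update.
hidden_output = sigmoid(np.dot(X, W1) + b1)
final_output = sigmoid(np.dot(hidden_output, W2) + b2)
print("Predictions:")
print(final_output)
print("\nRounded Predictions:")
print(np.round(final_output))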
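
Rounding works here, but an explicit threshold makes the decision rule visible and lets you compute accuracy directly. In the sketch below, `probs` is a made-up stand-in for the network's outputs (hypothetical values, not results from this notebook), and the 0.5 cutoff is an assumption.

```python
import numpy as np

y_true = np.array([[0], [1], [1], [0]])
probs = np.array([[0.06], [0.94], [0.95], [0.07]])  # hypothetical network outputs

threshold = 0.5                           # assumed decision cutoff
labels = (probs > threshold).astype(int)  # 1 where the output exceeds the cutoff
accuracy = np.mean(labels == y_true)      # fraction of matching labels

print(labels.ravel())
print("Accuracy:", accuracy)
```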

In [None]:
plt.plot(losses)
plt.title('Training Loss (NumPy)')
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()

## 3. MLP using TensorFlow/Keras

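Now let's build the same 2-2-1 sigmoid network using TensorFlow/Keras. Notice how much shorter the code is. The setup is close to the NumPy version but not identical: Keras initializes `Dense` kernels with Glorot uniform and biases with zeros by default, and its mean squared error averages over the batch, so with the same `learning_rate=0.1` each update is smaller than in the NumPy loop, which sums over the four samples. The results after 10000 epochs can therefore differ between the two versions.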

In [None]:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # Two input features (replaces the legacy input_dim argument)
    tf.keras.layers.Dense(2, activation='sigmoid'),  # Hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')   # Output layer
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss='mean_squared_error',
              metrics=['accuracy'])

history = model.fit(X, y, epochs=10000, verbose=0)

print("Final Loss:", history.history['loss'][-1])
print("Final Accuracy:", history.history['accuracy'][-1])

In [None]:
# Testing the TF Model
print(model.predict(X))