# 🔹 Phase 2: Multilayer Perceptron (MLP)

**Concepts to Cover**

- **Multilayer Perceptron (MLP)** – Extending from a single neuron to multiple layers.
- **Hidden Layers** – Why a hidden layer lets a network represent functions, such as XOR, that a single neuron cannot.
- **Backpropagation** – How gradients flow through layers to update weights.
- **Activation Functions** – Sigmoid and ReLU, and how each affects training.
- **Training a Neural Network** – Using forward propagation and gradient descent.
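
To make the two activation functions named above concrete, here is a minimal NumPy sketch of sigmoid and ReLU together with their derivatives. The helper names `sigmoid_grad`, `relu`, and `relu_grad` are illustrative and not part of the exercise; the exercise itself uses only the sigmoid.

```python
import numpy as np

def sigmoid(z):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid w.r.t. its input z; largest (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    """Pass positive inputs through unchanged and clamp negatives to 0."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """1 for positive inputs, 0 otherwise (the value at z = 0 is a convention)."""
    return (z > 0).astype(float)

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print("sigmoid      :", np.round(sigmoid(z), 4))
print("sigmoid grad :", np.round(sigmoid_grad(z), 4))
print("relu         :", relu(z))
print("relu grad    :", relu_grad(z))
```

The sigmoid gradient shrinks toward zero for inputs far from 0, while the ReLU gradient stays at 1 for every positive input. That difference is one reason the choice of activation affects how quickly a network trains.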

# Exercise 2: Implement a 3-Layer Neural Network (Using Only NumPy)

**🔹 Task**

- Implement a **3-layer neural network** (Input → Hidden → Output) in **pure NumPy**.
- Train it to learn the **XOR function** (which a single perceptron cannot solve!).
- Use the **sigmoid activation** for the hidden and output layers.
- Implement **forward propagation** and **backpropagation** to train it.
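
Before building the network, it can help to see the claim about the single perceptron fail in practice. The sketch below trains one threshold neuron on XOR with the classic perceptron learning rule. The rule, the zero initialization, and the 100-epoch budget are assumptions chosen for illustration, not code taken from an earlier phase.

```python
import numpy as np

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w = np.zeros(2)       # one weight per input feature
b = 0.0               # bias
best_accuracy = 0.0   # best accuracy seen at the end of any epoch

for epoch in range(100):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)  # step activation
        update = target - pred             # -1, 0, or +1
        w += update * xi                   # perceptron learning rule
        b += update
    accuracy = np.mean((X @ w + b > 0).astype(int) == y)
    best_accuracy = max(best_accuracy, accuracy)

print(f"Best accuracy over 100 epochs: {best_accuracy:.2f}")
```

Because no straight line separates the XOR classes, a single linear threshold unit can classify at most three of the four points correctly, however long it trains.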

import numpy as np

# Sigmoid activation function (used to introduce non-linearity)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # Computes sigmoid activation

# Derivative of the sigmoid, written in terms of the sigmoid's output
# (pass in a = sigmoid(z), such as A1 or A2, not the pre-activation z)
def sigmoid_derivative(x):
    return x * (1 - x)  # sigmoid'(z) = a * (1 - a), where x = a = sigmoid(z)

# XOR training data (XOR is a non-linearly separable problem)
X = np.array([[0, 0],  # Input: (0,0)
              [0, 1],  # Input: (0,1)
              [1, 0],  # Input: (1,0)
              [1, 1]]) # Input: (1,1)

y = np.array([[0],  # Expected XOR output for (0,0)
              [1],  # Expected XOR output for (0,1)
              [1],  # Expected XOR output for (1,0)
              [0]]) # Expected XOR output for (1,1)

# Initialize neural network parameters
np.random.seed(42)  # Ensures reproducibility
input_size = 2  # Number of input features
hidden_size = 4  # Number of neurons in the hidden layer
output_size = 1  # Number of neurons in the output layer

# Randomly initialize weights and biases
W1 = np.random.randn(input_size, hidden_size)  # Weights from Input Layer → Hidden Layer
b1 = np.random.randn(hidden_size)  # Bias for Hidden Layer
W2 = np.random.randn(hidden_size, output_size)  # Weights from Hidden Layer → Output Layer
b2 = np.random.randn(output_size)  # Bias for Output Layer

# Training hyperparameters
lr = 0.1  # Learning rate (controls step size for weight updates)
epochs = 10000  # Number of training iterations

# Training loop
for epoch in range(epochs):
    # Forward Propagation (Feedforward Pass)
    Z1 = np.dot(X, W1) + b1  # Compute weighted sum for the hidden layer
    A1 = sigmoid(Z1)  # Apply activation function (sigmoid) in hidden layer

    Z2 = np.dot(A1, W2) + b2  # Compute weighted sum for the output layer
    A2 = sigmoid(Z2)  # Apply activation function (sigmoid) in output layer

    # Compute Loss (Binary Cross-Entropy), reported only to monitor training;
    # the updates below follow the gradient of the squared error 0.5 * sum((A2 - y)**2)
    loss = -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))  # Monitoring metric

    # Backpropagation (Error Propagation)
    error_output = A2 - y  # Derivative of the squared error w.r.t. A2
    d_output = error_output * sigmoid_derivative(A2)  # Gradient w.r.t. Z2 (chain rule through sigmoid)

    error_hidden = np.dot(d_output, W2.T)  # Compute error at the hidden layer
    d_hidden = error_hidden * sigmoid_derivative(A1)  # Compute gradient for hidden layer

    # Update Weights and Biases using Gradient Descent
    W2 -= lr * np.dot(A1.T, d_output)  # Adjust weights for Hidden → Output
    b2 -= lr * np.sum(d_output, axis=0)  # Adjust bias for Output Layer

    W1 -= lr * np.dot(X.T, d_hidden)  # Adjust weights for Input → Hidden
    b1 -= lr * np.sum(d_hidden, axis=0)  # Adjust bias for Hidden Layer

    # Print loss every 1000 epochs for tracking progress
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Final Predictions after training
print("\nFinal Predictions:")
for i, x in enumerate(X):
    hidden_layer = sigmoid(np.dot(x, W1) + b1)  # Forward pass for hidden layer
    output_layer = sigmoid(np.dot(hidden_layer, W2) + b2)  # Forward pass for output layer
    print(f"Input: {x}, Predicted Output: {output_layer[0]:.4f}")  # Display results


Epoch 0, Loss: 1.7953
Epoch 1000, Loss: 0.6236
Epoch 2000, Loss: 0.3028
Epoch 3000, Loss: 0.1412
Epoch 4000, Loss: 0.0949
Epoch 5000, Loss: 0.0740
Epoch 6000, Loss: 0.0620
Epoch 7000, Loss: 0.0540
Epoch 8000, Loss: 0.0483
Epoch 9000, Loss: 0.0440

Final Predictions:
Input: [0 0], Predicted Output: 0.0335
Input: [0 1], Predicted Output: 0.9593
Input: [1 0], Predicted Output: 0.9607
Input: [1 1], Predicted Output: 0.0456
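
The network outputs probabilities rather than hard labels. A small sketch of the final step, assuming the common 0.5 decision threshold (the exercise does not specify one), turns the printed values into XOR predictions:

```python
import numpy as np

# Sigmoid outputs copied from the run above, in the same order as X
predicted = np.array([0.0335, 0.9593, 0.9607, 0.0456])
expected = np.array([0, 1, 1, 0])  # XOR truth table

labels = (predicted >= 0.5).astype(int)  # threshold at 0.5 (an assumed cutoff)
print("Labels:", labels)                 # Labels: [0 1 1 0]
print("All correct:", bool(np.array_equal(labels, expected)))  # All correct: True
```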


# 📌 Significance of This Exercise

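- **From Single Neuron to Multi-Layer Perceptron (MLP)** – A hidden layer lets the network combine several linear decision boundaries, which is what makes XOR learnable.
- **Activation Functions Matter** – The sigmoid introduces non-linearity; without a non-linear activation, stacked layers collapse into a single linear map and cannot solve XOR.
- **Backpropagation** – Backpropagation computes the gradient of the loss with respect to every weight; gradient descent then uses those gradients to update the weights.
- **Why Deep Learning Works** – The same loop of forward pass, backpropagation, and gradient descent is used to train deeper networks.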
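
One way to build confidence in a hand-written backpropagation step is a numerical gradient check. The sketch below is an illustration under stated assumptions. It reuses the exercise's delta formulas, which multiply the output error by the sigmoid derivative and therefore correspond to the squared error `0.5 * sum((A2 - y)**2)`, and compares the resulting gradient for `W1` against central finite differences. The helper names `forward` and `squared_error`, the seed, and the step size `eps` are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass; returns hidden and output activations."""
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return A1, A2

def squared_error(X, y, W1, b1, W2, b2):
    """Loss whose gradient the exercise's update rules follow."""
    _, A2 = forward(X, W1, b1, W2, b2)
    return 0.5 * np.sum((A2 - y) ** 2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

# Analytic gradient for W1 via backpropagation (same formulas as the exercise)
A1, A2 = forward(X, W1, b1, W2, b2)
d_output = (A2 - y) * A2 * (1 - A2)
d_hidden = (d_output @ W2.T) * A1 * (1 - A1)
grad_W1 = X.T @ d_hidden

# Numerical gradient for W1 via central differences
eps = 1e-6
num_grad_W1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_grad_W1[i, j] = (
            squared_error(X, y, W_plus, b1, W2, b2)
            - squared_error(X, y, W_minus, b1, W2, b2)
        ) / (2 * eps)

max_diff = np.max(np.abs(grad_W1 - num_grad_W1))
print(f"Largest difference between analytic and numerical gradient: {max_diff:.2e}")
```

A very small difference suggests the chain-rule steps are implemented consistently; a large one usually points to a wrong sign, a missing transpose, or a loss that does not match the deltas.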