Multi-Layer Perceptron for XOR Gate Prediction

This notebook trains a small network to predict the output of an XOR gate from the binary inputs in X.
X has 4 samples and 2 features.

1. Input layer = 1 layer with 2 neurons (one per feature)
2. Hidden layers = 2 layers with 4 neurons each
3. Output layer = 1 layer with 1 neuron


X is the input, a 4x2 matrix representing 4 samples with 2 input features each.

Y is the true output, a 4x1 column holding the XOR of the two features in each row of X.
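As a quick sanity check, the labels can be derived from the inputs instead of being typed by hand. The sketch below assumes the same four input pairs used in this notebook; `Y_check` is an illustrative name and not part of the original code.

```python
import numpy as np

# Same four binary input pairs used in this notebook
X = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])

# XOR is 1 exactly when the two input bits differ
Y_check = np.bitwise_xor(X[:, 0], X[:, 1]).reshape(-1, 1)

print(Y_check.ravel())  # [0 1 0 1]
```

Reshaping to a column keeps the labels in the (4, 1) layout that the network output uses.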

We use 2 hidden layers, each with 4 neurons and its own bias vector (b1 and b2). The output layer has 1 neuron with its own bias (b3).
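To make the layer sizes concrete, here is a minimal sketch that lists the weight and bias shapes and counts the trainable parameters. It assumes layer sizes of 2, 4, 4 and 1 and the (inputs, outputs) weight layout; `layer_sizes` and `total` are illustrative names.

```python
import numpy as np

# Assumed layer sizes: 2 inputs, two hidden layers of 4 neurons, 1 output
layer_sizes = [2, 4, 4, 1]

total = 0
for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:]), start=1):
    W = np.zeros((n_in, n_out))  # weight matrix for layer i
    b = np.zeros((1, n_out))     # bias row vector for layer i
    total += W.size + b.size
    print(f"W{i}: {W.shape}, b{i}: {b.shape}")

print("trainable parameters:", total)  # 37
```

The count breaks down as 8 + 4 for the first layer, 16 + 4 for the second, and 4 + 1 for the output layer.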

Gradient Calculation

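Loss gradient = dLoss/dA_out, from the cross-entropy derivative. With a sigmoid output and binary cross-entropy, this simplifies to dz = A_out - Y at the output layer.

For each hidden layer: dz = dLoss_from_next_layer * activation_derivative(current_layer_pre_activation)

Weight gradients: dw = np.dot(previous_layer_output.T, dz)

Bias gradients: db = sum of dz over the samples

Both dw and db are divided by the number of samples m, so each update uses the average gradient.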
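The output-layer shortcut (prediction minus label) can be checked numerically. The sketch below compares that closed form against central finite differences of the binary cross-entropy. The values of `z` and `y` are made up for illustration, and the loss is summed rather than averaged so that each entry of the gradient is exactly `sigmoid(z) - y`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, z):
    """Summed binary cross-entropy as a function of the pre-activation z."""
    a = sigmoid(z)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical pre-activations and labels, chosen only for illustration
z = np.array([[0.3], [-1.2], [2.0], [0.0]])
y = np.array([[0.0], [1.0], [0.0], [1.0]])

# Closed form: dLoss/dz = a - y
analytic = sigmoid(z) - y

# Central finite differences, one entry of z at a time
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    step = np.zeros_like(z)
    step[i, 0] = eps
    numeric[i, 0] = (bce(y, z + step) - bce(y, z - step)) / (2 * eps)

# Largest disagreement between the two gradients; should be tiny
print(np.max(np.abs(analytic - numeric)))
```

The same finite-difference pattern can be applied to any weight or bias to check the other gradients, at the cost of two forward passes per checked entry.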

import numpy as np
import matplotlib.pyplot as plt

X = np.array([[0,0], [1,0], [1,1], [0,1]])  # shape (4,2)
Y = np.array([[0], [1], [0], [1]])          # shape (4,1)

np.random.seed(42)  # reproducibility

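# Layer 1: 2 input features -> 4 hidden neurons
W1 = np.random.randn(2, 4)
b1 = np.random.randn(1, 4)

# Layer 2: 4 hidden neurons -> 4 hidden neurons
W2 = np.random.randn(4, 4)
b2 = np.zeros((1, 4))

# Output layer: 4 hidden neurons -> 1 output neuron
W3 = np.random.randn(4, 1)
b3 = np.zeros((1, 1))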

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def forward_propagation(X, W1, b1, W2, b2, W3, b3):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)

    Z2 = np.dot(A1, W2) + b2
    A2 = relu(Z2)

    Z3 = np.dot(A2, W3) + b3
    A3 = sigmoid(Z3)

    cache = (X, Z1, A1, Z2, A2, Z3, A3)
    return A3, cache

def compute_loss(Y, A3):
    m = Y.shape[0]
    # Clip predictions so np.log never receives exactly 0 or 1
    A3 = np.clip(A3, 1e-12, 1 - 1e-12)
    loss = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
    return loss

def backward_propagation(Y, cache, W2, W3, loss):
    X, Z1, A1, Z2, A2, Z3, A3 = cache
    m = Y.shape[0]

    # Sigmoid output + binary cross-entropy: dLoss/dZ3 simplifies to A3 - Y
    dZ3 = A3 - Y
    dW3 = np.dot(A2.T, dZ3) / m
    db3 = np.sum(dZ3, axis=0, keepdims=True) / m

    dZ2 = np.dot(dZ3, W3.T) * relu_derivative(Z2)
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dZ1 = np.dot(dZ2, W2.T) * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    gradients = (dW1, db1, dW2, db2, dW3, db3)
    return gradients

def update_parameters(W1, b1, W2, b2, W3, b3, gradients, lr):
    dW1, db1, dW2, db2, dW3, db3 = gradients
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    W3 -= lr * dW3
    b3 -= lr * db3
    return W1, b1, W2, b2, W3, b3

epochs = 10000
learning_rate = 0.1

for epoch in range(epochs):
    A3, cache = forward_propagation(X, W1, b1, W2, b2, W3, b3)
    loss = compute_loss(Y, A3)
    gradients = backward_propagation(Y, cache, W2, W3, loss)
    W1, b1, W2, b2, W3, b3 = update_parameters(W1, b1, W2, b2, W3, b3, gradients, learning_rate)

    if epoch % 1000 == 0:
        print(f"Epoch {epoch} Loss: {loss:.4f}")

Epoch 0 Loss: 0.7099
Epoch 1000 Loss: 0.0148
Epoch 2000 Loss: 0.0065
Epoch 3000 Loss: 0.0041
Epoch 4000 Loss: 0.0030
Epoch 5000 Loss: 0.0024
Epoch 6000 Loss: 0.0019
Epoch 7000 Loss: 0.0017
Epoch 8000 Loss: 0.0014
Epoch 9000 Loss: 0.0013
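
The loss log alone does not show the predicted labels. The following self-contained sketch repeats the same training setup in compact form and then thresholds the output probabilities. The `predict` helper and the 0.5 threshold are assumptions added for illustration and are not part of the original notebook.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, params):
    """Forward pass through the 2-4-4-1 network; returns all activations."""
    W1, b1, W2, b2, W3, b3 = params
    A1 = relu(X @ W1 + b1)
    A2 = relu(A1 @ W2 + b2)
    A3 = sigmoid(A2 @ W3 + b3)
    return A1, A2, A3

def bce(Y, A):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    A = np.clip(A, 1e-12, 1 - 1e-12)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def predict(X, params, threshold=0.5):
    """Hypothetical helper: turn output probabilities into 0/1 labels."""
    return (forward(X, params)[-1] >= threshold).astype(int)

X = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
Y = np.array([[0], [1], [0], [1]])

# Same seed and draw order as the notebook: W1, b1, W2, then W3
np.random.seed(42)
params = [np.random.randn(2, 4), np.random.randn(1, 4),
          np.random.randn(4, 4), np.zeros((1, 4)),
          np.random.randn(4, 1), np.zeros((1, 1))]

initial_loss = bce(Y, forward(X, params)[-1])
m, lr = X.shape[0], 0.1
for _ in range(10000):
    W1, b1, W2, b2, W3, b3 = params
    A1, A2, A3 = forward(X, params)
    dZ3 = A3 - Y                    # sigmoid + cross-entropy shortcut
    dZ2 = (dZ3 @ W3.T) * (A2 > 0)   # ReLU derivative is 1 where the unit is active
    dZ1 = (dZ2 @ W2.T) * (A1 > 0)
    grads = [X.T @ dZ1 / m, dZ1.sum(axis=0, keepdims=True) / m,
             A1.T @ dZ2 / m, dZ2.sum(axis=0, keepdims=True) / m,
             A2.T @ dZ3 / m, dZ3.sum(axis=0, keepdims=True) / m]
    params = [p - lr * g for p, g in zip(params, grads)]

final_loss = bce(Y, forward(X, params)[-1])
preds = predict(X, params)
print(preds.ravel(), f"loss {initial_loss:.4f} -> {final_loss:.4f}")
```

If the run follows the loss log above, the thresholded predictions should match Y. A different seed can leave a ReLU network stuck on XOR, so it is worth comparing `preds` with `Y` after each run.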
