# 🔥 Exercise 2: Implement a Simple 3-Layer Neural Network

💡 **Goal:** Implement a neural network with:

- ✔️ **3 layers**: Input → Hidden → Output
- ✔️ **Activation function**: Sigmoid
- ✔️ **Backpropagation**: Train using gradient descent
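
In symbols, the three ingredients above can be sketched as follows. The notation is an assumption chosen to mirror the variable names used in the code of this exercise ($W_1, b_1, W_2, b_2, A_1, A_2$), with $\sigma$ for the sigmoid, $L$ for the loss, and $\eta$ for the learning rate.

$$
\begin{aligned}
A_1 &= \sigma(X W_1 + b_1), \qquad A_2 = \sigma(A_1 W_2 + b_2), \\
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr), \\
\theta &\leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad \theta \in \{W_1, b_1, W_2, b_2\}.
\end{aligned}
$$

Because $\sigma'(z)$ can be written purely in terms of $\sigma(z)$, the backward pass can reuse the activations already computed in the forward pass.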

import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

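# Derivative of sigmoid, expressed in terms of the sigmoid output a = sigmoid(z).
# Pass the activation (e.g. A1 or A2), not the pre-activation Z.
def sigmoid_derivative(a):
    return a * (1 - a)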

# Training Data
X = np.array([[-2], [-1], [0], [1], [2]])  # Inputs
y = np.array([[0], [0], [1], [1], [1]])    # Labels

# Network Architecture: 1 input → 2 hidden → 1 output
input_size = 1
hidden_size = 2
output_size = 1

# Initialize weights and biases
np.random.seed(42)  # For reproducibility
W1 = np.random.randn(input_size, hidden_size)  # Input to hidden
b1 = np.random.randn(hidden_size)
W2 = np.random.randn(hidden_size, output_size)  # Hidden to output
b2 = np.random.randn(output_size)

learning_rate = 0.1
epochs = 1000

# Training Loop
for epoch in range(epochs):
    # Forward Pass
    Z1 = np.dot(X, W1) + b1  # Input to Hidden
    A1 = sigmoid(Z1)         # Activation function
    Z2 = np.dot(A1, W2) + b2 # Hidden to Output
    A2 = sigmoid(Z2)         # Final Prediction

    # Compute Loss (Mean Squared Error)
    loss = np.mean((A2 - y) ** 2)

    # Backpropagation
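    # Derivative of the squared error w.r.t. A2. The 1/N factor from the mean
    # in `loss` is omitted, which only rescales the effective learning rate.
    dL_dA2 = 2 * (A2 - y)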
    dA2_dZ2 = sigmoid_derivative(A2)
    dZ2_dW2 = A1
    dZ2_dB2 = 1

    dL_dW2 = np.dot(dZ2_dW2.T, dL_dA2 * dA2_dZ2)
    dL_dB2 = np.sum(dL_dA2 * dA2_dZ2, axis=0)

    dZ2_dA1 = W2
    dA1_dZ1 = sigmoid_derivative(A1)
    dL_dW1 = np.dot(X.T, np.dot(dL_dA2 * dA2_dZ2, dZ2_dA1.T) * dA1_dZ1)
    dL_dB1 = np.sum(np.dot(dL_dA2 * dA2_dZ2, dZ2_dA1.T) * dA1_dZ1, axis=0)

    # Update weights and biases
    W2 -= learning_rate * dL_dW2
    b2 -= learning_rate * dL_dB2
    W1 -= learning_rate * dL_dW1
    b1 -= learning_rate * dL_dB1

    # Print loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Final Output
print("\nFinal Predictions:")
output = sigmoid(np.dot(sigmoid(np.dot(X, W1) + b1), W2) + b2)
print(output)


Epoch 0, Loss: 0.2751
Epoch 100, Loss: 0.1410
Epoch 200, Loss: 0.0483
Epoch 300, Loss: 0.0210
Epoch 400, Loss: 0.0120
Epoch 500, Loss: 0.0080
Epoch 600, Loss: 0.0059
Epoch 700, Loss: 0.0046
Epoch 800, Loss: 0.0037
Epoch 900, Loss: 0.0031

Final Predictions:
[[0.02766161]
 [0.07528557]
 [0.92075152]
 [0.98371912]
 [0.98572941]]
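
One way to gain confidence in the backpropagation formulas used in the training loop is a finite-difference gradient check. The sketch below is a minimal, self-contained version: the helper names (`forward`, `analytic_gradients`, `numeric_gradient`) and the random initialization are assumptions for illustration, not part of the exercise code. It uses the summed squared error, since the `2 * (A2 - y)` term in the loop is the gradient of the sum rather than of the mean.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    # Same forward pass as the training loop: input -> hidden -> output.
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return A1, A2

def summed_squared_error(X, y, W1, b1, W2, b2):
    # Sum (not mean) of squared errors, matching the 2 * (A2 - y) gradient term.
    _, A2 = forward(X, W1, b1, W2, b2)
    return np.sum((A2 - y) ** 2)

def analytic_gradients(X, y, W1, b1, W2, b2):
    # Backpropagation with the chain rule, written with per-layer error terms.
    A1, A2 = forward(X, W1, b1, W2, b2)
    delta2 = 2 * (A2 - y) * A2 * (1 - A2)      # dL/dZ2
    delta1 = (delta2 @ W2.T) * A1 * (1 - A1)   # dL/dZ1
    return X.T @ delta1, delta1.sum(axis=0), A1.T @ delta2, delta2.sum(axis=0)

def numeric_gradient(f, param, eps=1e-6):
    # Central finite differences, perturbing one parameter entry at a time.
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        plus = f()
        param[idx] = original - eps
        minus = f()
        param[idx] = original  # restore the entry before moving on
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

X = np.array([[-2], [-1], [0], [1], [2]], dtype=float)
y = np.array([[0], [0], [1], [1], [1]], dtype=float)

# Assumed initialization for the check; any small random values would do.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((1, 2)), rng.standard_normal(2)
W2, b2 = rng.standard_normal((2, 1)), rng.standard_normal(1)
params = [W1, b1, W2, b2]

loss_fn = lambda: summed_squared_error(X, y, W1, b1, W2, b2)
analytic = analytic_gradients(X, y, W1, b1, W2, b2)
numeric = [numeric_gradient(loss_fn, p) for p in params]

max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"Largest analytic-vs-numeric difference: {max_diff:.2e}")
```

If the backpropagation formulas are right, the reported difference should be tiny compared with the gradients themselves; a large value usually points to a shape or sign error in one of the terms.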


## 📌 Analysis of Your Output

✅ Loss Reduction:

- Epoch 0: 0.2751
- Epoch 900: 0.0031 (close to zero, meaning the network fits these five training points closely)

✅ Final Predictions:

- For input -2 → 0.0277 (almost 0, ✅ correct)
- For input -1 → 0.0753 (close to 0, ✅ correct)
- For input 0 → 0.9208 (close to 1, ✅ correct)
- For input 1 → 0.9837 (almost 1, ✅ correct)
- For input 2 → 0.9857 (almost 1, ✅ correct)

🚀 Your network is successfully classifying all five training inputs!
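
To turn the sigmoid outputs into class labels, a decision threshold is needed. The sketch below assumes the common choice of 0.5 (the exercise itself does not state one) and reuses the predictions printed above.

```python
import numpy as np

# Final predictions copied from the printed output of the training run.
predictions = np.array([[0.02766161],
                        [0.07528557],
                        [0.92075152],
                        [0.98371912],
                        [0.98572941]])
y = np.array([[0], [0], [1], [1], [1]])  # Training labels

# Assumed decision rule: predict class 1 when the output is at least 0.5.
threshold = 0.5
predicted_labels = (predictions >= threshold).astype(int)
accuracy = np.mean(predicted_labels == y)

print(predicted_labels.ravel())              # → [0 0 1 1 1]
print(f"Training accuracy: {accuracy:.0%}")  # → Training accuracy: 100%
```

This accuracy is measured on the same five points used for training, so it says nothing about how the network behaves on unseen inputs.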