# 🔥 Exercise 2: Implement a Simple 3-Layer Neural Network

💡 **Goal:** Implement a neural network with:

- ✔️ **3 layers**: Input → Hidden → Output
- ✔️ **Activation function**: Sigmoid
- ✔️ **Backpropagation**: Train using gradient descent
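
The quantities that the training code in this exercise computes can be written out as follows. This is a sketch in matrix notation: the symbols $\delta_1$, $\delta_2$, $N$ (number of training samples) and $\odot$ (element-wise product) are notation introduced here for illustration and do not appear in the code.

Forward pass and loss:

$$
Z_1 = X W_1 + b_1, \qquad A_1 = \sigma(Z_1), \qquad Z_2 = A_1 W_2 + b_2, \qquad A_2 = \sigma(Z_2)
$$

$$
L = \frac{1}{N} \sum_{i=1}^{N} \left(A_{2,i} - y_i\right)^2, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
$$

Backward pass, applying the chain rule layer by layer (as in the code, the constant factor $1/N$ is left out of the gradients):

$$
\delta_2 = 2\,(A_2 - y) \odot A_2 \odot (1 - A_2), \qquad \nabla_{W_2} = A_1^{\top} \delta_2, \qquad \nabla_{b_2} = \sum_{i=1}^{N} \delta_{2,i}
$$

$$
\delta_1 = \left(\delta_2 W_2^{\top}\right) \odot A_1 \odot (1 - A_1), \qquad \nabla_{W_1} = X^{\top} \delta_1, \qquad \nabla_{b_1} = \sum_{i=1}^{N} \delta_{1,i}
$$

Each parameter $\theta$ is then updated by gradient descent as $\theta \leftarrow \theta - \eta\, \nabla_{\theta}$, where $\eta$ is the learning rate.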

import numpy as np

# Step 1: Define the Sigmoid activation function
def sigmoid(x):
    """
    Computes the sigmoid activation function.

    Parameters:
    x (float or ndarray): Input value(s).

    Returns:
    float or ndarray: Sigmoid activation result.
    """
    return 1 / (1 + np.exp(-x))

# Step 2: Define the derivative of the Sigmoid function
def sigmoid_derivative(a):
    """
    Computes the derivative of the sigmoid function from its output.

    Parameters:
    a (float or ndarray): Sigmoid output(s), i.e. a = sigmoid(z), not the raw input z.

    Returns:
    float or ndarray: Derivative of the sigmoid function with respect to z.
    """
    return a * (1 - a)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) = a * (1 - a)

# Step 3: Define Training Data (inputs and expected outputs)
X = np.array([[-2], [-1], [0], [1], [2]])  # Input values (features)
y = np.array([[0], [0], [1], [1], [1]])    # Expected outputs (labels)

# Step 4: Define the Neural Network Architecture
input_size = 1   # Number of input neurons
hidden_size = 2  # Number of neurons in the hidden layer
output_size = 1  # Number of output neurons

# Step 5: Initialize Weights and Biases
np.random.seed(42)  # Set seed for reproducibility
W1 = np.random.randn(input_size, hidden_size)  # Weights from input to hidden layer
b1 = np.random.randn(hidden_size)              # Biases for hidden layer
W2 = np.random.randn(hidden_size, output_size) # Weights from hidden to output layer
b2 = np.random.randn(output_size)              # Bias for output layer

# Step 6: Define Hyperparameters
learning_rate = 0.1  # Learning rate for weight updates
epochs = 1000        # Number of iterations for training

# Step 7: Training Loop
for epoch in range(epochs):
    # Forward Pass
    Z1 = np.dot(X, W1) + b1  # Linear transformation: Input → Hidden layer
    A1 = sigmoid(Z1)         # Apply activation function to hidden layer
    Z2 = np.dot(A1, W2) + b2 # Linear transformation: Hidden → Output layer
    A2 = sigmoid(Z2)         # Apply activation function to output layer (final prediction)

    # Compute Loss (Mean Squared Error)
    loss = np.mean((A2 - y) ** 2)  # Calculate the average squared error

    # Backpropagation - Compute Gradients
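    # Derivative of the squared error w.r.t. A2 (output). The 1/N factor from the
    # mean in `loss` is omitted, which only rescales the effective learning rate.
    dL_dA2 = 2 * (A2 - y)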
    dA2_dZ2 = sigmoid_derivative(A2)  # Derivative of sigmoid output w.r.t Z2
    dZ2_dW2 = A1  # Gradient of Z2 w.r.t W2 (Hidden activations)
    dZ2_dB2 = 1   # Gradient of Z2 w.r.t b2 (Bias derivative)

    # Compute gradients for output layer parameters
    dL_dW2 = np.dot(dZ2_dW2.T, dL_dA2 * dA2_dZ2)  # Weight gradient for W2
    dL_dB2 = np.sum(dL_dA2 * dA2_dZ2, axis=0)     # Bias gradient for b2

    # Compute gradients for hidden layer parameters
    dZ2_dA1 = W2  # Gradient of Z2 w.r.t A1
    dA1_dZ1 = sigmoid_derivative(A1)  # Gradient of hidden activation w.r.t Z1
    dL_dZ1 = np.dot(dL_dA2 * dA2_dZ2, dZ2_dA1.T) * dA1_dZ1  # Error signal at the hidden layer
    dL_dW1 = np.dot(X.T, dL_dZ1)     # Weight gradient for W1
    dL_dB1 = np.sum(dL_dZ1, axis=0)  # Bias gradient for b1

    # Update weights and biases using gradient descent
    W2 -= learning_rate * dL_dW2  # Update output layer weights
    b2 -= learning_rate * dL_dB2  # Update output layer bias
    W1 -= learning_rate * dL_dW1  # Update hidden layer weights
    b1 -= learning_rate * dL_dB1  # Update hidden layer bias

    # Print loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Step 8: Final Output Predictions
print("\nFinal Predictions:")
output = sigmoid(np.dot(sigmoid(np.dot(X, W1) + b1), W2) + b2)  # Compute final outputs
print(output)


Epoch 0, Loss: 0.2751
Epoch 100, Loss: 0.1410
Epoch 200, Loss: 0.0483
Epoch 300, Loss: 0.0210
Epoch 400, Loss: 0.0120
Epoch 500, Loss: 0.0080
Epoch 600, Loss: 0.0059
Epoch 700, Loss: 0.0046
Epoch 800, Loss: 0.0037
Epoch 900, Loss: 0.0031

Final Predictions:
[[0.02766161]
 [0.07528557]
 [0.92075152]
 [0.98371912]
 [0.98572941]]
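
One way to gain confidence in hand-written backpropagation is a finite-difference gradient check. The sketch below is not part of the original exercise: the helper names (`summed_squared_error`, `backprop_gradients`, `numerical_gradients`), the step size `eps` and the random seed are all assumptions chosen for illustration. Because the training loop uses `2 * (A2 - y)` without dividing by the number of samples, its gradients correspond to the summed squared error, so that is the quantity checked here.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def summed_squared_error(params, X, y):
    """Forward pass of the 1-2-1 network, returning the summed squared error."""
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return np.sum((A2 - y) ** 2)

def backprop_gradients(params, X, y):
    """Analytic gradients, using the same formulas as the training loop."""
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    delta2 = 2 * (A2 - y) * A2 * (1 - A2)        # error signal at the output layer
    delta1 = (delta2 @ W2.T) * A1 * (1 - A1)     # error signal at the hidden layer
    return [X.T @ delta1, delta1.sum(axis=0), A1.T @ delta2, delta2.sum(axis=0)]

def numerical_gradients(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            loss_plus = summed_squared_error(params, X, y)
            p[idx] = original - eps
            loss_minus = summed_squared_error(params, X, y)
            p[idx] = original                    # restore the parameter
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0], [1.0]])

rng = np.random.default_rng(0)  # arbitrary seed, assumed for this sketch only
params = [rng.standard_normal((1, 2)), rng.standard_normal(2),
          rng.standard_normal((2, 1)), rng.standard_normal(1)]

analytic = backprop_gradients(params, X, y)
numeric = numerical_gradients(params, X, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"Largest gradient mismatch: {max_diff:.2e}")
```

If the analytic and numerical gradients agree to within a small tolerance, the chain-rule formulas are very likely implemented correctly; a large mismatch usually points to a transposed matrix or a missing activation derivative.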


## 📌 Analysis of Your Output

✅ Loss Reduction:

- Epoch 0: 0.2751
- Epoch 900: 0.0031 (close to zero, so the network fits the five training points well)

✅ Final Predictions:

- For input -2 → 0.0277 (close to 0, ✅ correct)
- For input -1 → 0.0753 (close to 0, ✅ correct)
- For input 0 → 0.9208 (close to 1, ✅ correct)
- For input 1 → 0.9837 (close to 1, ✅ correct)
- For input 2 → 0.9857 (close to 1, ✅ correct)
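
The "correct" marks above implicitly round each prediction to the nearest class. A minimal sketch of that step, assuming a decision threshold of 0.5 (the exercise does not specify one) and using the prediction values copied from the output above:

```python
import numpy as np

# Final predictions copied from the training output above
predictions = np.array([[0.02766161],
                        [0.07528557],
                        [0.92075152],
                        [0.98371912],
                        [0.98572941]])
y = np.array([[0], [0], [1], [1], [1]])  # Expected labels from the exercise

threshold = 0.5  # Assumed decision threshold, not part of the original exercise
predicted_labels = (predictions >= threshold).astype(int)
accuracy = np.mean(predicted_labels == y)

print(predicted_labels.ravel())              # [0 0 1 1 1]
print(f"Training accuracy: {accuracy:.2f}")  # Training accuracy: 1.00
```

This accuracy is measured on the same five points used for training, so it says nothing yet about inputs the network has not seen.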

🚀 With a decision threshold of 0.5, your network classifies all five training inputs correctly!