# Introduction

This script demonstrates a simple worked example of **backpropagation** in a neural network with:

- 1 input neuron
- 1 hidden layer with 2 neurons (sigmoid activation)
- 1 output neuron (sigmoid)
- Mean Squared Error (MSE) loss, written with a factor of $\frac{1}{2}$ so that the 2 from differentiation cancels

We manually compute gradients using the chain rule.


In [1]:
import numpy as np
from scipy.special import expit  # Numerically stable sigmoid function

# Step 1: Initialize Parameters

We define:

- Input value: $x = 1$
- True label: $y_{\text{true}} = 0$
- Weights:
  - $w_1$, $w_2$: input to hidden layer
  - $w_3$, $w_4$: hidden to output layer

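All weights are set by hand rather than drawn at random. In the code, they are grouped into two arrays: `w1` holds $(w_1, w_2)$ and `w2` holds $(w_3, w_4)$.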

In [None]:
# Input and true label
x = np.array([[1.0]])          # Shape (1, 1)
y_true = np.array([[0.0]])     # Shape (1, 1)

# Initialize weights (the @ operator used in later cells is matrix multiplication)
w1 = np.array([[0.5, -0.5]])   # Input to hidden: shape (1, 2)
w2 = np.array([[0.3], [-0.3]]) # Hidden to output: shape (2, 1)

In [4]:
# Activation functions
def sigmoid(x):
    # expit(x) = 1 / (1 + exp(-x))
    return expit(x)

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Step 2: Forward Pass

We compute the outputs of the network layer by layer (in the code, `z1` holds $(z_1, z_2)$, `a1` holds $(h_1, h_2)$, and `z2` corresponds to $z_3$):

1. Compute hidden layer pre-activations:
   $z_1 = w_1 \cdot x$, $z_2 = w_2 \cdot x$
2. Apply sigmoid activation:
   $h_1 = \sigma(z_1)$, $h_2 = \sigma(z_2)$
3. Compute output pre-activation:
   $z_3 = w_3 \cdot h_1 + w_4 \cdot h_2$
4. Apply sigmoid again:
   $y = \sigma(z_3)$
5. Compute MSE:
   $L = \frac{1}{2}(y - y_{\text{true}})^2$
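
As a sanity check, the steps above can be traced by hand with the values from Step 1, reading $w_1 = 0.5$, $w_2 = -0.5$, $w_3 = 0.3$, $w_4 = -0.3$ off the weight arrays (this mapping of scalar names to array entries is an assumption). Values are rounded to five decimals:

$$
\begin{aligned}
z_1 &= 0.5 \cdot 1 = 0.5, & z_2 &= -0.5 \cdot 1 = -0.5 \\
h_1 &= \sigma(0.5) \approx 0.62246, & h_2 &= \sigma(-0.5) \approx 0.37754 \\
z_3 &= 0.3 \cdot 0.62246 + (-0.3) \cdot 0.37754 \approx 0.07348 \\
y &= \sigma(0.07348) \approx 0.51836 \\
L &= \tfrac{1}{2}(0.51836 - 0)^2 \approx 0.13435
\end{aligned}
$$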


In [None]:
# Forward pass
z1 = x @ w1              # Pre-activation: shape (1, 2)
a1 = sigmoid(z1)         # Hidden layer activations: shape (1, 2)
z2 = a1 @ w2             # Output layer pre-activation: shape (1, 1)
y_pred = sigmoid(z2)     # Output prediction

# Loss
loss = 0.5 * (y_pred - y_true) ** 2

# Step 3: Backward Pass

We apply the chain rule to compute gradients of the loss w.r.t. each weight:

- Output layer:
  $\frac{\partial L}{\partial w_3} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial z_3} \cdot \frac{\partial z_3}{\partial w_3}$
- Hidden layer:
  $\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial z_3} \cdot \frac{\partial z_3}{\partial h_1} \cdot \frac{\partial h_1}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$

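The remaining weights follow the same pattern: $w_4$ uses $h_2$ in place of $h_1$, and $w_2$ uses $w_4$, $h_2$, and $z_2$ in place of $w_3$, $h_1$, and $z_1$.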
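
Each factor in these products has a simple closed form, using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and the loss $L = \frac{1}{2}(y - y_{\text{true}})^2$. Written out (the shorthand $\delta$ is introduced here for convenience and does not appear elsewhere in the notebook):

$$
\begin{aligned}
\frac{\partial L}{\partial y} &= y - y_{\text{true}}, \qquad \frac{\partial y}{\partial z_3} = y(1 - y), \qquad \delta = (y - y_{\text{true}})\, y (1 - y) \\
\frac{\partial z_3}{\partial w_3} &= h_1 \quad\Rightarrow\quad \frac{\partial L}{\partial w_3} = \delta\, h_1 \\
\frac{\partial z_3}{\partial h_1} &= w_3, \qquad \frac{\partial h_1}{\partial z_1} = h_1(1 - h_1), \qquad \frac{\partial z_1}{\partial w_1} = x \\
\Rightarrow\quad \frac{\partial L}{\partial w_1} &= \delta\, w_3\, h_1 (1 - h_1)\, x
\end{aligned}
$$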

In [7]:
# Backward pass
dL_dy = y_pred - y_true              # ∂L/∂y
dy_dz2 = sigmoid_derivative(z2)      # ∂y/∂z2

# Gradients for w2
dz2_dw2 = a1.T                       # ∂z2/∂w2
dL_dw2 = dz2_dw2 @ (dL_dy * dy_dz2)  # Shape (2, 1)

# Gradients for w1
dz2_da1 = w2.T                       # ∂z2/∂a1
da1_dz1 = sigmoid_derivative(z1)     # ∂a1/∂z1
dz1_dw1 = x.T                        # ∂z1/∂w1
# The upstream term must include ∂y/∂z2, as in the chain rule above
dL_dw1 = dz1_dw1 @ (((dL_dy * dy_dz2) @ dz2_da1) * da1_dz1)  # Shape (1, 2)

# Step 4: Output

We print:

- Loss value
- Gradient of the loss with respect to:
  - $w_1$, $w_2$ (input to hidden layer, stored in `dL_dw1`)
  - $w_3$, $w_4$ (hidden to output layer, stored in `dL_dw2`)

These gradients can be used in **gradient descent** to update the weights.

In [8]:
print("Loss:", loss.item())
print("Gradient for w1 (input to hidden):", dL_dw1.flatten())
print("Gradient for w2 (hidden to output):", dL_dw2.flatten())


Loss: 0.13434887664395717
Gradient for w1 (input to hidden): [ 0.00912393 -0.00912393]
Gradient for w2 (hidden to output): [0.08055583 0.04885958]
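
A common way to validate hand-derived gradients is to compare them against central finite differences, and then confirm that one gradient descent step lowers the loss. The sketch below does both for this network. It is a minimal illustration rather than part of the original notebook: the helper names `forward_loss`, `analytic_grads`, and `numeric_grad`, the step size `eps`, and the learning rate `lr` are all assumptions chosen for the example, and the sigmoid is written in plain NumPy so the block runs without SciPy.

```python
import numpy as np

def sigmoid(z):
    # Plain NumPy sigmoid; adequate for the small values used here
    return 1.0 / (1.0 + np.exp(-z))

def forward_loss(w1, w2, x, y_true):
    """Half squared error of the 1-2-1 network (hypothetical helper)."""
    a1 = sigmoid(x @ w1)          # hidden activations, shape (1, 2)
    y_pred = sigmoid(a1 @ w2)     # prediction, shape (1, 1)
    return (0.5 * (y_pred - y_true) ** 2).item()

def analytic_grads(w1, w2, x, y_true):
    """Backpropagation via the chain rule (hypothetical helper)."""
    a1 = sigmoid(x @ w1)
    y_pred = sigmoid(a1 @ w2)
    delta2 = (y_pred - y_true) * y_pred * (1 - y_pred)  # dL/dz2, shape (1, 1)
    delta1 = (delta2 @ w2.T) * a1 * (1 - a1)            # dL/dz1, shape (1, 2)
    return x.T @ delta1, a1.T @ delta2                  # shapes (1, 2), (2, 1)

def numeric_grad(loss_fn, w, eps=1e-6):
    """Central finite differences, one weight entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad

# Same input, label, and weights as in Step 1
x = np.array([[1.0]])
y_true = np.array([[0.0]])
w1 = np.array([[0.5, -0.5]])
w2 = np.array([[0.3], [-0.3]])

# Gradient check: analytic vs. numeric
g1, g2 = analytic_grads(w1, w2, x, y_true)
n1 = numeric_grad(lambda w: forward_loss(w, w2, x, y_true), w1)
n2 = numeric_grad(lambda w: forward_loss(w1, w, x, y_true), w2)
max_err = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print("Max abs difference (analytic vs. numeric):", max_err)

# One gradient descent step with an assumed learning rate
lr = 0.1
loss_before = forward_loss(w1, w2, x, y_true)
loss_after = forward_loss(w1 - lr * g1, w2 - lr * g2, x, y_true)
print("Loss before:", loss_before)
print("Loss after one step:", loss_after)
```

If the analytic and numeric gradients disagree by more than roughly the square of `eps`, that usually points to a missing factor in the chain rule.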
