# Q6.2: Manual Backpropagation (Step-by-Step Calculation)

Work through the forward pass, the loss (MSE), the gradients, and one gradient-descent update step for a small 2-2-1 network with linear activations.

**Exam outputs:** intermediate values (z, a), gradients (dL/dW, dL/db), and updated weights showing loss reduction.

## Step 1: Define Network Architecture

### Simple 2-2-1 Network

In [1]:
import numpy as np

# Input
X = np.array([[0.5, 0.1]])

# Weights layer 1 (2 inputs → 2 hidden)
W1 = np.array([[0.15, 0.25],
               [0.20, 0.30]])
b1 = np.array([[0.35, 0.35]])

# Weights layer 2 (2 hidden → 1 output)
W2 = np.array([[0.40],
               [0.45]])
b2 = np.array([[0.60]])

# Target
y_true = np.array([[0.5]])

# Learning rate
lr = 0.5

print("Input X:", X)
print("\nWeights W1:\n", W1)
print("Bias b1:", b1)
print("\nWeights W2:\n", W2)
print("Bias b2:", b2)
print("\nTarget y:", y_true)

Input X: [[0.5 0.1]]

Weights W1:
 [[0.15 0.25]
 [0.2  0.3 ]]
Bias b1: [[0.35 0.35]]

Weights W2:
 [[0.4 ]
 [0.45]]
Bias b2: [[0.6]]

Target y: [[0.5]]


## Step 2: Forward Propagation (Linear Activation)

### Hidden Layer

In [None]:
# Hidden layer calculation
z1 = np.dot(X, W1) + b1
a1 = z1  # Linear activation (a = z)

print("Hidden layer z1:", z1)
print("Hidden layer a1 (linear):", a1)

Hidden layer z1: [[0.445 0.505]]
Hidden layer a1 (linear): [[0.445 0.505]]


### Output Layer

In [None]:
# Output layer calculation
z2 = np.dot(a1, W2) + b2
a2 = z2  # Linear activation

print("Output z2:", z2)
print("Output a2 (prediction):", a2)

Output z2: [[1.00525]]
Output a2 (prediction): [[1.00525]]
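
For a hand calculation, the forward pass expands as follows. This is only the arithmetic behind the two cells above, using the values from Step 1:

$$
\begin{aligned}
z_1 &= X W_1 + b_1 \\
    &= \begin{bmatrix} 0.5 \cdot 0.15 + 0.1 \cdot 0.20 + 0.35 & 0.5 \cdot 0.25 + 0.1 \cdot 0.30 + 0.35 \end{bmatrix} \\
    &= \begin{bmatrix} 0.445 & 0.505 \end{bmatrix} = a_1 \\
z_2 &= a_1 W_2 + b_2 \\
    &= 0.445 \cdot 0.40 + 0.505 \cdot 0.45 + 0.60 \\
    &= 0.178 + 0.22725 + 0.60 = 1.00525 = a_2
\end{aligned}
$$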


## Step 3: Calculate Loss (MSE)

In [None]:
# Squared-error loss for a single sample; the 0.5 factor cancels the 2 when differentiating
loss = 0.5 * (y_true - a2) ** 2

print(f"Loss (MSE): {loss[0,0]:.6f}")
print(f"Error: {(y_true - a2)[0,0]:.6f}")

Loss (MSE): 0.127639
Error: -0.505250


## Step 4: Backward Propagation - Output Layer

### Calculate Gradients for W2 and b2

In [None]:
# Gradient of loss w.r.t. output: d/da2 [0.5 * (y - a2)^2] = -(y - a2)
dL_da2 = -(y_true - a2)

# For linear activation: da2/dz2 = 1
dL_dz2 = dL_da2 * 1

# Gradient for W2: dL/dW2 = a1^T · dL/dz2
dL_dW2 = np.dot(a1.T, dL_dz2)

# Gradient for b2: dL/db2 = dL/dz2 (single sample; with a batch, sum over the rows)
dL_db2 = dL_dz2

print("dL/da2:", dL_da2)
print("dL/dz2:", dL_dz2)
print("\ndL/dW2:\n", dL_dW2)
print("dL/db2:", dL_db2)

dL/da2: [[0.50525]]
dL/dz2: [[0.50525]]

dL/dW2:
 [[0.22483625]
 [0.25515125]]
dL/db2: [[0.50525]]
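
Written out as a chain rule, the output-layer gradients above come from the following steps. The numbers are the ones already printed in Steps 2 and 3:

$$
\begin{aligned}
L &= \tfrac{1}{2}(y - a_2)^2 \\
\frac{\partial L}{\partial a_2} &= -(y - a_2) = -(0.5 - 1.00525) = 0.50525 \\
\frac{\partial L}{\partial z_2} &= \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} = 0.50525 \cdot 1 = 0.50525 \\
\frac{\partial L}{\partial W_2} &= a_1^\top \, \frac{\partial L}{\partial z_2}
  = \begin{bmatrix} 0.445 \\ 0.505 \end{bmatrix} \cdot 0.50525
  = \begin{bmatrix} 0.22483625 \\ 0.25515125 \end{bmatrix} \\
\frac{\partial L}{\partial b_2} &= \frac{\partial L}{\partial z_2} = 0.50525
\end{aligned}
$$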


## Step 5: Backward Propagation - Hidden Layer

### Calculate Gradients for W1 and b1

In [None]:
# Backpropagate error to hidden layer
dL_da1 = np.dot(dL_dz2, W2.T)

# For linear activation: da1/dz1 = 1
dL_dz1 = dL_da1 * 1

# Gradient for W1: dL/dW1 = X^T · dL/dz1
dL_dW1 = np.dot(X.T, dL_dz1)

# Gradient for b1: dL/db1 = dL/dz1 (single sample; with a batch, sum over the rows)
dL_db1 = dL_dz1

print("dL/da1:", dL_da1)
print("dL/dz1:", dL_dz1)
print("\ndL/dW1:\n", dL_dW1)
print("dL/db1:", dL_db1)

dL/da1: [[0.2021    0.2273625]]
dL/dz1: [[0.2021    0.2273625]]

dL/dW1:
 [[0.10105    0.11368125]
 [0.02021    0.02273625]]
dL/db1: [[0.2021    0.2273625]]
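
The hidden-layer gradients can be checked by hand in the same way. The error signal 0.50525 is first pushed back through W2, then combined with the input X:

$$
\begin{aligned}
\frac{\partial L}{\partial a_1} &= \frac{\partial L}{\partial z_2} \, W_2^\top
  = 0.50525 \cdot \begin{bmatrix} 0.40 & 0.45 \end{bmatrix}
  = \begin{bmatrix} 0.2021 & 0.2273625 \end{bmatrix} \\
\frac{\partial L}{\partial z_1} &= \frac{\partial L}{\partial a_1} \cdot 1
  = \begin{bmatrix} 0.2021 & 0.2273625 \end{bmatrix} \\
\frac{\partial L}{\partial W_1} &= X^\top \, \frac{\partial L}{\partial z_1}
  = \begin{bmatrix} 0.5 \\ 0.1 \end{bmatrix} \begin{bmatrix} 0.2021 & 0.2273625 \end{bmatrix}
  = \begin{bmatrix} 0.10105 & 0.11368125 \\ 0.02021 & 0.02273625 \end{bmatrix} \\
\frac{\partial L}{\partial b_1} &= \frac{\partial L}{\partial z_1}
  = \begin{bmatrix} 0.2021 & 0.2273625 \end{bmatrix}
\end{aligned}
$$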


## Step 6: Update Weights and Biases

In [None]:
print("BEFORE UPDATE:")
print(f"W1:\n{W1}")
print(f"b1: {b1}")
print(f"W2:\n{W2}")
print(f"b2: {b2}")

# Update rule: W = W - lr * dL/dW
W1_new = W1 - lr * dL_dW1
b1_new = b1 - lr * dL_db1
W2_new = W2 - lr * dL_dW2
b2_new = b2 - lr * dL_db2

print("\nAFTER UPDATE (lr=0.5):")
print(f"W1_new:\n{W1_new}")
print(f"b1_new: {b1_new}")
print(f"W2_new:\n{W2_new}")
print(f"b2_new: {b2_new}")

BEFORE UPDATE:
W1:
[[0.15 0.25]
 [0.2  0.3 ]]
b1: [[0.35 0.35]]
W2:
[[0.4 ]
 [0.45]]
b2: [[0.6]]

AFTER UPDATE (lr=0.5):
W1_new:
[[0.099475   0.19315937]
 [0.189895   0.28863187]]
b1_new: [[0.24895    0.23631875]]
W2_new:
[[0.28758188]
 [0.32242437]]
b2_new: [[0.347375]]
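
To show the update rule on individual entries, here are three of the nine parameters worked by hand with lr = 0.5. The printed values above are the same numbers rounded to eight decimals:

$$
\begin{aligned}
W_1[0,0] &\leftarrow 0.15 - 0.5 \cdot 0.10105 = 0.15 - 0.050525 = 0.099475 \\
W_2[0,0] &\leftarrow 0.40 - 0.5 \cdot 0.22483625 = 0.40 - 0.112418125 = 0.287581875 \\
b_2 &\leftarrow 0.60 - 0.5 \cdot 0.50525 = 0.60 - 0.252625 = 0.347375
\end{aligned}
$$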


## Step 7: Forward Pass with Updated Weights

In [None]:
# Forward pass with new weights
z1_new = np.dot(X, W1_new) + b1_new
a1_new = z1_new

z2_new = np.dot(a1_new, W2_new) + b2_new
a2_new = z2_new

loss_new = 0.5 * (y_true - a2_new) ** 2

print(f"Old Prediction: {a2[0,0]:.6f}")
print(f"New Prediction: {a2_new[0,0]:.6f}")
print(f"Target: {y_true[0,0]}")
print(f"\nOld Loss: {loss[0,0]:.6f}")
print(f"New Loss: {loss_new[0,0]:.6f}")
print(f"Loss Reduction: {(loss[0,0] - loss_new[0,0]):.6f}")

Old Prediction: 1.005250
New Prediction: 0.555374
Target: 0.5

Old Loss: 0.127639
New Loss: 0.001533
Loss Reduction: 0.126106


## Step 8: Summary Table

In [None]:
import pandas as pd

summary = pd.DataFrame({
    'Parameter': ['W1[0,0]', 'W1[0,1]', 'W1[1,0]', 'W1[1,1]', 
                  'b1[0]', 'b1[1]', 'W2[0,0]', 'W2[1,0]', 'b2[0]'],
    'Initial': [W1[0,0], W1[0,1], W1[1,0], W1[1,1], 
                b1[0,0], b1[0,1], W2[0,0], W2[1,0], b2[0,0]],
    'Gradient': [dL_dW1[0,0], dL_dW1[0,1], dL_dW1[1,0], dL_dW1[1,1],
                 dL_db1[0,0], dL_db1[0,1], dL_dW2[0,0], dL_dW2[1,0], dL_db2[0,0]],
    'Updated': [W1_new[0,0], W1_new[0,1], W1_new[1,0], W1_new[1,1],
                b1_new[0,0], b1_new[0,1], W2_new[0,0], W2_new[1,0], b2_new[0,0]]
})

print(summary.to_string(index=False))

Parameter  Initial  Gradient  Updated
  W1[0,0]     0.15  0.101050 0.099475
  W1[0,1]     0.25  0.113681 0.193159
  W1[1,0]     0.20  0.020210 0.189895
  W1[1,1]     0.30  0.022736 0.288632
    b1[0]     0.35  0.202100 0.248950
    b1[1]     0.35  0.227362 0.236319
  W2[0,0]     0.40  0.224836 0.287582
  W2[1,0]     0.45  0.255151 0.322424
    b2[0]     0.60  0.505250 0.347375
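
One way to gain confidence in the Gradient column is a finite-difference check. The sketch below is not part of the original walkthrough: the helper names `loss_fn` and `numerical_grad` and the step size `eps` are illustrative assumptions. It perturbs each parameter slightly and compares the resulting slope of the loss with the analytic gradients from Steps 4 and 5.

```python
import numpy as np

# Same values as Step 1, repeated so this cell runs on its own.
X = np.array([[0.5, 0.1]])
W1 = np.array([[0.15, 0.25], [0.20, 0.30]])
b1 = np.array([[0.35, 0.35]])
W2 = np.array([[0.40], [0.45]])
b2 = np.array([[0.60]])
y_true = np.array([[0.5]])


def loss_fn(W1, b1, W2, b2):
    """Forward pass with linear activations; returns the scalar loss."""
    a1 = X @ W1 + b1
    a2 = a1 @ W2 + b2
    return (0.5 * (y_true - a2) ** 2).item()


def numerical_grad(params, index, eps=1e-6):
    """Central-difference estimate of dL/d(params[index]), entry by entry.

    Hypothetical helper for illustration; eps is an assumed step size.
    """
    target = params[index]
    grad = np.zeros_like(target)
    for idx in np.ndindex(target.shape):
        original = target[idx]
        target[idx] = original + eps
        loss_plus = loss_fn(*params)
        target[idx] = original - eps
        loss_minus = loss_fn(*params)
        target[idx] = original  # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


params = [W1, b1, W2, b2]
numeric = [numerical_grad(params, i) for i in range(len(params))]

# Analytic gradients, using the same formulas as Steps 4 and 5.
a1 = X @ W1 + b1
a2 = a1 @ W2 + b2
dL_dz2 = -(y_true - a2)
dL_dz1 = dL_dz2 @ W2.T
analytic = [X.T @ dL_dz1, dL_dz1, a1.T @ dL_dz2, dL_dz2]

for name, num, ana in zip(["W1", "b1", "W2", "b2"], numeric, analytic):
    print(f"{name}: max |numeric - analytic| = {np.max(np.abs(num - ana)):.2e}")
```

If the backward pass is correct, the differences should be very small compared with the gradients themselves. A large gap for one parameter usually points to a transposition or sign error in that formula.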


## Summary

**Backpropagation Steps (Manual):**
1. Forward pass: Calculate z and a for each layer
2. Calculate loss (MSE)
3. Compute output layer gradients: dL/dW2, dL/db2
4. Backpropagate to hidden layer: dL/dW1, dL/db1
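5. Update weights and biases: W = W - lr × dL/dW, b = b - lr × dL/db
6. Re-run the forward pass with the updated parameters to confirm the loss decreased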

**Key Formulas:**
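- MSE loss (one sample, with the conventional 0.5 factor): L = 0.5 × (y - ŷ)²
- Linear activation derivative: da/dz = 1
- Weight gradient: dL/dW = a_in^T · dL/dz, where a_in is the layer's input (X for layer 1, a₁ for layer 2)
- Bias gradient: dL/db = dL/dz (single sample)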
- Backprop: dL/da₁ = dL/dz₂ · W₂^T
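
The steps and formulas above can be collected into one function. This is a minimal sketch rather than a general implementation: the name `train_step` is an assumption introduced here, and the code assumes a single sample, a single output, and linear activations, exactly as in this walkthrough.

```python
import numpy as np


def train_step(X, y_true, W1, b1, W2, b2, lr):
    """One forward/backward/update cycle for the 2-2-1 linear network.

    Hypothetical helper: assumes one sample and one output unit.
    Returns the loss before the update and the updated parameters.
    """
    # Forward pass (linear activation: a = z)
    a1 = X @ W1 + b1
    a2 = a1 @ W2 + b2
    loss = (0.5 * (y_true - a2) ** 2).item()

    # Backward pass (da/dz = 1 for both layers)
    dL_dz2 = -(y_true - a2)
    dL_dz1 = dL_dz2 @ W2.T
    dL_dW2 = a1.T @ dL_dz2
    dL_dW1 = X.T @ dL_dz1

    # Gradient-descent update
    new_params = (
        W1 - lr * dL_dW1,
        b1 - lr * dL_dz1,
        W2 - lr * dL_dW2,
        b2 - lr * dL_dz2,
    )
    return loss, new_params


# Same starting values as Step 1.
X = np.array([[0.5, 0.1]])
y_true = np.array([[0.5]])
params = (
    np.array([[0.15, 0.25], [0.20, 0.30]]),  # W1
    np.array([[0.35, 0.35]]),                # b1
    np.array([[0.40], [0.45]]),              # W2
    np.array([[0.60]]),                      # b2
)

loss_before, params = train_step(X, y_true, *params, lr=0.5)
loss_after, _ = train_step(X, y_true, *params, lr=0.5)
print(f"Loss before update: {loss_before:.6f}")
print(f"Loss after update:  {loss_after:.6f}")
```

The second call is used only to read off the loss at the updated parameters, so the two printed values should correspond to the old and new loss reported in Step 7.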