# **Day 4: Forward and Backpropagation**
## **Objective**
1. Understand the step-by-step process of forward propagation.
2. Learn how the chain rule is applied in backpropagation.
3. Implement manual forward and backward passes for a simple neural network.
---

## **1. What is Forward Propagation?**
Forward propagation is the process of passing the input data through the layers of a neural network to calculate the output.

### **Steps in Forward Propagation**
1. **Input Data**: Begin with input values (features of the data).
2. **Weighted Sum (Linear Transformation)**:  
   Compute `z = W·x + b` (a matrix product, written `np.dot(W, x)` or `W @ x` in NumPy), where:
   - `W` is the weight matrix,
   - `x` is the input vector,
   - `b` is the bias term.
3. **Activation Function**: Apply an activation function to introduce non-linearity:  
   `a = Activation(z)`
4. **Output**: Repeat steps 2 and 3 for each layer, feeding each layer's activation `a` in as the input `x` of the next layer. The activation of the final layer is the network's output.
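
The layer-by-layer loop described in the steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming every layer uses a sigmoid activation and inputs are column vectors; the `forward` helper and the 2-3-1 layer sizes with random weights are hypothetical choices, not part of the original example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Forward pass through a list of (W, b) pairs.

    Each layer computes z = W @ a + b and a = sigmoid(z); the
    activation of one layer becomes the input of the next.
    """
    a = x
    for W, b in layers:
        z = W @ a + b   # linear transformation
        a = sigmoid(z)  # non-linearity
    return a

# Hypothetical 2 -> 3 -> 1 network with random weights (illustration only)
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 2)), np.zeros((3, 1))),  # hidden layer: 3 neurons
    (rng.normal(size=(1, 3)), np.zeros((1, 1))),  # output layer: 1 neuron
]

x = np.array([[0.5], [1.0]])  # one example with 2 features, shape (2, 1)
y_pred = forward(x, layers)
print("Network output:", y_pred)
```

Passing a one-layer list such as `[(np.array([[0.2, 0.8]]), np.array([[0.5]]))]` reduces `forward` to the single-neuron example worked through in the sections that follow.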

---

## **2. What is Backward Propagation?**
Backward propagation calculates the gradients of the loss function with respect to the weights and biases using the **chain rule** of calculus. These gradients are used to update the weights and biases during training.

### **Steps in Backward Propagation**
1. **Compute Loss**: Measure the difference between predicted output and actual target using a loss function.
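2. **Calculate Gradients**: Use the chain rule to compute:
   - `∂L/∂a` (derivative of the loss with respect to the activation output),
   - `∂a/∂z` (derivative of the activation output with respect to the linear output `z`),
   - `∂z/∂W` and `∂z/∂b` (derivatives of `z` with respect to the weights and biases),
   - the product of these factors, which gives the parameter gradients: `∂L/∂W = ∂L/∂a · ∂a/∂z · ∂z/∂W` and `∂L/∂b = ∂L/∂a · ∂a/∂z · ∂z/∂b`.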
3. **Update Weights and Biases**: Move `W` and `b` a small step in the direction opposite to their gradients (gradient descent) to reduce the loss:
   - `W = W - learning_rate * ∂L/∂W`
   - `b = b - learning_rate * ∂L/∂b`
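
For a network with more than one layer, the same factors are chained through each layer in reverse order: the gradient reaching a layer's input, `W.T @ ∂L/∂z`, becomes the `∂L/∂a` of the previous layer. Below is a minimal sketch for a hypothetical two-layer sigmoid network, assuming the half squared-error loss `0.5 * (y_true - a)**2` (so that `∂L/∂a = -(y_true - a)` exactly). The sizes, random weights, and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
# Hypothetical 2 -> 3 -> 1 network (random weights, illustration only)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))
x = np.array([[0.5], [1.0]])   # one example, shape (2, 1)
y_true = np.array([[1.0]])

# --- Forward pass: keep every intermediate for the backward pass ---
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)
loss = 0.5 * np.sum((y_true - a2) ** 2)  # half squared error, so dL/da2 = -(y_true - a2)

# --- Backward pass: apply the chain rule from the output back to the input ---
dl_da2 = -(y_true - a2)
dl_dz2 = dl_da2 * a2 * (1 - a2)   # through the output sigmoid
dl_dW2 = dl_dz2 @ a1.T            # dz2/dW2 = a1.T -> shape (1, 3)
dl_db2 = dl_dz2                   # dz2/db2 = 1
dl_da1 = W2.T @ dl_dz2            # gradient handed back to the hidden layer, shape (3, 1)
dl_dz1 = dl_da1 * a1 * (1 - a1)   # through the hidden sigmoid
dl_dW1 = dl_dz1 @ x.T             # dz1/dW1 = x.T -> shape (3, 2)
dl_db1 = dl_dz1                   # dz1/db1 = 1

print("Loss:", loss)
print("dL/dW1 shape:", dl_dW1.shape, "dL/dW2 shape:", dl_dW2.shape)
```

Each gradient has the same shape as the parameter it updates (`dl_dW1` matches `W1`, and so on), which makes a shape check a useful first test of a hand-written backward pass.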

---

## **3. Forward Propagation Step-by-Step**
### **Step 1: Input Data**
The input is a vector or matrix that represents the features of your data.  
Example: `X = [[0.5], [1.0]]` (2 features for a single example, stored as a column vector of shape (2, 1))

### **Step 2: Weighted Sum**
Compute the linear combination of weights, inputs, and bias:  
`z = W·x + b`  
Example, with `W = [[0.2, 0.8]]` (weights for the 2 inputs) and `b = [[0.5]]` (bias term):  
`z = (0.2 * 0.5) + (0.8 * 1.0) + 0.5 = 0.1 + 0.8 + 0.5 = 1.4`

### **Step 3: Activation Function**
Apply an activation function to `z` to compute the output of the layer:  
`a = sigmoid(z)`, where `sigmoid(z) = 1 / (1 + e^(-z))`  
For the example above, `a = sigmoid(1.4) ≈ 0.802`.

### **Step 4: Output**
The activation of the last layer is the network's prediction. This example has a single layer, so its activation `a ≈ 0.802` is the prediction.

---

## **4. Backward Propagation Step-by-Step**
### **Step 1: Compute Loss**
The loss measures the difference between the predicted output and the target value.  
Example (Mean Squared Error loss over `n` predictions):  
`L = (1/n) * Σ(y_true - y_pred)^2`
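
For the single prediction in this notebook (`n = 1`, `y_true = 1`, and `y_pred = a ≈ 0.8022` from the forward pass), the sum has one term:

$$
L = (1 - 0.8022)^2 = 0.1978^2 \approx 0.0391
$$

This agrees with the loss printed by the backward-propagation code (about 0.0391).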

### **Step 2: Calculate Gradients**
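Use the **chain rule** to compute the partial derivatives needed for the weight and bias updates (shown for a single prediction, `n = 1`):  
1. `∂L/∂a` (loss with respect to activation):  
   `∂L/∂a = -(y_true - a)`  
   Differentiating `(y_true - a)^2` exactly gives `-2 * (y_true - a)`. This notebook drops the constant 2, which is the same as using `L = ½ * (y_true - a)^2`; the constant only rescales the gradient and can be absorbed into the learning rate.
2. `∂a/∂z` (activation with respect to `z`):  
   For sigmoid: `∂a/∂z = a * (1 - a)`
3. `∂z/∂W` and `∂z/∂b` (`z` with respect to the weights and bias):  
   `∂z/∂W = x.T` and `∂z/∂b = 1`
4. Multiply the factors (chain rule):  
   `∂L/∂z = ∂L/∂a * ∂a/∂z`, then `∂L/∂W = ∂L/∂z * x.T` and `∂L/∂b = ∂L/∂z`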
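
With the example values (`x = [[0.5], [1.0]]`, `a ≈ 0.8022`, `y_true = 1`) and the convention `∂L/∂a = -(y_true - a)` used in the code, the chain-rule factors work out as:

$$
\frac{\partial L}{\partial a} = -(1 - 0.8022) = -0.1978, \qquad
\frac{\partial a}{\partial z} = 0.8022 \times 0.1978 \approx 0.1587
$$

$$
\frac{\partial L}{\partial z} \approx -0.1978 \times 0.1587 \approx -0.0314, \qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial z}\, x^{T} \approx [-0.0157,\ -0.0314], \qquad
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z} \approx -0.0314
$$

These match the gradients printed by the backward-propagation code.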

### **Step 3: Update Weights and Biases**
Using gradient descent, with a small positive `learning_rate`:  
`W = W - learning_rate * ∂L/∂W`  
`b = b - learning_rate * ∂L/∂b`

---

## **5. Code: Forward Propagation**

```python
import numpy as np

# Input data
X = np.array([[0.5], [1.0]])  # Input vector with 2 features, shape (2, 1)

# Weights and Bias
W = np.array([[0.2, 0.8]])  # Weight matrix, shape (1, 2)
b = np.array([[0.5]])       # Bias term, shape (1, 1)

# Sigmoid Activation Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward Propagation
z = np.dot(W, X) + b  # Linear transformation
a = sigmoid(z)        # Activation output

print("Linear Output (z):", z)
print("Activation Output (a):", a)
```

Output:

```text
Linear Output (z): [[1.4]]
Activation Output (a): [[0.80218389]]
```
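
Step 1 notes that the input can also be a matrix. A minimal sketch, assuming examples are stacked as columns so that `X_batch` has shape (features, examples): the same `W` and `b` then process a whole batch in one matrix product. The second and third example columns are made-up values for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W = np.array([[0.2, 0.8]])  # same weights as the cell above, shape (1, 2)
b = np.array([[0.5]])       # same bias, shape (1, 1)

# Hypothetical batch of 3 examples, one per column: shape (2 features, 3 examples)
X_batch = np.array([[0.5, 0.0, -1.0],
                    [1.0, 2.0,  0.5]])

z = W @ X_batch + b   # b of shape (1, 1) broadcasts across all 3 columns -> shape (1, 3)
a = sigmoid(z)        # one prediction per example

print("Linear outputs (z):", z)
print("Activations (a):", a)
```

The first column reproduces the single-example result, and each additional column is simply another example passing through the same layer.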


## **6. Code: Backward Propagation**

```python
# Uses np, X (shape (2, 1)) and the activation `a` from the forward-propagation cell above

# Target value (Ground truth)
y_true = np.array([[1]])  # True label

# Loss function (Mean Squared Error)
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)

# Derivative of the loss with respect to activation.
# The exact derivative of (y_true - y_pred)**2 is -2 * (y_true - y_pred); the
# constant 2 is dropped here (equivalent to differentiating 0.5 * MSE), which
# only rescales the gradient.
def mse_loss_derivative(y_true, y_pred):
    return -(y_true - y_pred)

# Backward Propagation
loss = mse_loss(y_true, a)              # Compute loss
dl_da = mse_loss_derivative(y_true, a)  # Derivative of loss wrt activation
da_dz = a * (1 - a)                     # Derivative of sigmoid
dz_dw = X.T                             # Derivative of z wrt weights, shape (1, 2)

# Gradients using the chain rule
dl_dz = dl_da * da_dz  # Gradient wrt z, shape (1, 1)
dl_dw = dl_dz * dz_dw  # Gradient wrt weights; (1, 1) broadcasts against (1, 2)
dl_db = dl_dz          # Gradient wrt bias, since dz/db = 1

print("Loss:", loss)
print("Gradient wrt Weights (dl/dW):", dl_dw)
print("Gradient wrt Bias (dl/db):", dl_db)
```

Output:

```text
Loss: 0.03913121394580363
Gradient wrt Weights (dl/dW): [[-0.01569521 -0.03139043]]
Gradient wrt Bias (dl/db): [[-0.03139043]]
```
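
A common way to test hand-written backpropagation is a numerical gradient check: nudge each weight slightly up and down and compare the resulting change in loss with the analytic gradient. The sketch below does this for the same neuron; the `loss_fn` helper, the `eps` value, and the finite-difference loop are illustrative assumptions, not part of the original notebook. Because `mse_loss` keeps the square without a ½ factor while `mse_loss_derivative` drops the 2, the numerical gradient comes out about twice the analytic one. The direction is the same, so gradient descent still moves the weights the right way.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.5], [1.0]])
W = np.array([[0.2, 0.8]])
b = np.array([[0.5]])
y_true = np.array([[1.0]])

def loss_fn(W, b):
    """Squared error of the single neuron, L = mean((y_true - a)**2)."""
    a = sigmoid(W @ X + b)
    return np.mean((y_true - a) ** 2)

# Analytic gradient with the convention used above, dL/da = -(y_true - a)
a = sigmoid(W @ X + b)
dl_dz = -(y_true - a) * a * (1 - a)
analytic_dw = dl_dz * X.T

# Central finite differences: perturb each weight by +/- eps and measure the loss change
eps = 1e-6
numeric_dw = np.zeros_like(W)
for j in range(W.shape[1]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    numeric_dw[0, j] = (loss_fn(W_plus, b) - loss_fn(W_minus, b)) / (2 * eps)

print("analytic:", analytic_dw)
print("numeric: ", numeric_dw)
print("ratio:   ", numeric_dw / analytic_dw)  # close to 2: the dropped factor of 2
```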


## **7. Code: Updating Weights and Bias**

```python
import numpy as np

# Inputs (features)
X = np.array([[0.5, 1.5]])  # Shape (1, 2)

# Target value (Ground truth)
y_true = np.array([[1]])  # True label

# Initial weights and bias
W = np.array([[0.1, 0.2]])  # Shape (1, 2)
b = np.array([[0.3]])       # Shape (1, 1)

# Learning rate
learning_rate = 0.01

# Activation function (Sigmoid)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Loss function (Mean Squared Error)
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)

# Derivative of the loss with respect to activation
def mse_loss_derivative(y_true, y_pred):
    return -(y_true - y_pred)

# --- Forward Propagation ---
z = np.dot(W, X.T) + b  # Weighted sum (Shape: (1, 1))
a = sigmoid(z)          # Activation output (Shape: (1, 1))

# Compute loss
loss = mse_loss(y_true, a)

# --- Backward Propagation ---
dl_da = mse_loss_derivative(y_true, a)  # Derivative of loss wrt activation
da_dz = a * (1 - a)                     # Derivative of sigmoid
dl_dz = dl_da * da_dz                   # Derivative of loss wrt z
dz_dw = X.T                             # Derivative of z wrt weights

dl_dw = dl_dz * dz_dw.T                 # Gradient wrt weights (Shape: (1, 2))
dl_db = dl_dz                           # Gradient wrt bias (Shape: (1, 1))

# --- Update Weights and Bias ---
W = W - learning_rate * dl_dw  # Update weights
b = b - learning_rate * dl_db  # Update bias

# --- Results ---
print("Loss:", loss)
print("Updated Weights (W):", W)
print("Updated Bias (b):", b)
```

Output:

```text
Loss: 0.11764182271544733
Updated Weights (W): [[0.10038646 0.20115938]]
Updated Bias (b): [[0.30077292]]
```
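
The cell above performs a single update. A minimal sketch of repeating the same forward pass, backward pass, and update in a loop shows the loss shrinking as the parameters are adjusted again and again. The 1000-step count is an arbitrary assumption, and the gradient keeps the notebook's `∂L/∂a = -(y_true - a)` convention.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.5, 1.5]])   # shape (1, 2), as in the cell above
y_true = np.array([[1.0]])
W = np.array([[0.1, 0.2]])
b = np.array([[0.3]])
learning_rate = 0.01
num_steps = 1000             # hypothetical number of iterations

losses = []
for step in range(num_steps):
    # Forward pass
    a = sigmoid(W @ X.T + b)                # shape (1, 1)
    losses.append(np.mean((y_true - a) ** 2))
    # Backward pass (same convention as above: dL/da = -(y_true - a))
    dl_dz = -(y_true - a) * a * (1 - a)
    dl_dw = dl_dz * X                       # shape (1, 2)
    dl_db = dl_dz                           # shape (1, 1)
    # Gradient descent update
    W = W - learning_rate * dl_dw
    b = b - learning_rate * dl_db

print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

The first recorded loss equals the one printed by the single-step cell; with a small learning rate each later step lowers it a little further.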


### **Conclusion**

In this notebook, we explored the fundamental concepts of **forward and backward propagation**, the core processes that enable neural networks to learn from data. By working through these steps, we saw how a neural network calculates predictions, computes errors, and adjusts its internal parameters (weights and biases) to reduce the error. Repeating this cycle many times is how neural networks are trained.

We implemented a single neuron (one linear layer with one output) using a sigmoid activation function and learned how to:
1. Perform forward propagation to compute predictions.
2. Calculate the loss using the Mean Squared Error (MSE) loss function.
3. Use **backward propagation** and the **chain rule** to compute gradients.
4. Update the weights and biases using **Gradient Descent**.

This hands-on exercise carried out one complete update step. Repeating that step many times is how a neural network "learns": it gradually adjusts its parameters to reduce the error and improve predictions.

---

### **Key Learnings**

1. **Forward Propagation**:
   - Neural networks compute predictions by applying weights, biases, and activation functions to the input data.
   - The activation function introduces non-linearity, enabling the network to model complex patterns.

2. **Backward Propagation**:
   - Using the **chain rule**, gradients are propagated backward through the network to calculate how much each parameter contributes to the error.
   - The computed gradients guide the updates to the weights and biases.

3. **Gradient Descent**:
   - Gradients are used to iteratively adjust the weights and biases in the direction that reduces the loss.
   - The **learning rate** plays a crucial role in determining the step size for these updates.

4. **The Role of Activation Functions**:
   - The sigmoid activation function outputs values between 0 and 1, which makes it a common choice for the output layer in binary classification.
   - Its derivative, `a * (1 - a)`, is one of the factors in every gradient computed above, so the choice of activation directly affects the size of the parameter updates.

5. **The Chain Rule in Action**:
   - Backward propagation is an excellent example of the **chain rule** in calculus, which allows us to compute gradients for composite functions.

6. **Iterative Learning**:
   - Learning in neural networks is an iterative process where the parameters are updated repeatedly over multiple epochs until the model converges to a solution.

---

### **Why Is This Important?**

Understanding forward and backward propagation is critical for anyone working in deep learning because:
- It provides insight into the inner workings of neural networks.
- It helps you debug models by knowing what each step should compute, for example when gradients or array shapes look wrong.
- It sets the foundation for more advanced topics, such as optimization techniques, advanced architectures (e.g., CNNs, RNNs), and modern frameworks like TensorFlow or PyTorch.