<h2 style="text-align:center;">Forward and Backward Propagation</h2>

**Author:** Mubasshir Ahmed  
**Module:** Deep Learning - FSDS  
**Notebook:** 03_Forward_Backward_Propagation  
**Objective:** To understand the complete mechanism of how data flows forward through a neural network and how the network learns through backward propagation.

---

A Neural Network learns by performing two key steps repeatedly:

1. **Forward Propagation:** Generates predictions.  
2. **Backward Propagation:** Learns from mistakes by adjusting weights.

Together, these steps form the **core learning cycle** of Deep Learning.


### <h3 style="text-align:center;">1️⃣ What is Forward Propagation?</h3>

**Forward Propagation** is the process of moving data from the **input layer → hidden layers → output layer** to generate predictions.

At each layer, neurons perform a weighted computation and apply an activation function.

**Mathematical Flow (for one neuron):**
\[ z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b \]
\[ a = f(z) \]

Where:
- **xᵢ** = input features  
- **wᵢ** = weights  
- **b** = bias  
- **f** = activation function  
- **a** = activated output

The output of one layer becomes the input for the next, until we reach the final prediction \( \hat{y} \).
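
The single-neuron computation above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `sigmoid` and `neuron_forward`, the choice of sigmoid as the activation, and the input, weight, and bias values are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(x, w, b, activation=sigmoid):
    """Compute z = w . x + b, then a = f(z), for a single neuron."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    a = activation(z)
    return z, a

# Hypothetical inputs, weights, and bias chosen only for illustration.
x = [2.0, 1.0]
w = [0.5, -1.0]
b = 0.0

z, a = neuron_forward(x, w, b)
print(z, a)  # z = 0.5*2.0 + (-1.0)*1.0 + 0.0 = 0.0, so a = sigmoid(0) = 0.5
```

Passing the activation as a parameter mirrors the formula \( a = f(z) \): swapping in a different \( f \) changes the output without touching the weighted sum.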


### <h3 style="text-align:center;">2️⃣ Example: Feedforward Intuition</h3>

Let's imagine a simple network predicting whether a student passes (1) or fails (0) based on hours studied and sleep hours.

| Input | Hidden Layer | Output |
|--------|---------------|---------|
| Hours Studied, Sleep Hours | Weighted sums → Activation | Probability of Pass |

1. Input data (2 features) is fed to neurons.  
2. Each neuron computes a weighted sum of its inputs and adds a bias.  
3. Activation (ReLU/Sigmoid) determines its output.  
4. The final neuron predicts \( \hat{y} = 0.92 \), meaning a 92% chance of passing.


### <h3 style="text-align:center;">3️⃣ The Mathematics Behind the Forward Pass</h3>

For a 3-layer neural network (input layer, one hidden layer, output layer):

\[ a^{(1)} = X \]
\[ z^{(2)} = W^{(2)} a^{(1)} + b^{(2)} \]
\[ a^{(2)} = f(z^{(2)}) \]
\[ z^{(3)} = W^{(3)} a^{(2)} + b^{(3)} \]
\[ a^{(3)} = f(z^{(3)}) = \hat{y} \]

Here, \( \hat{y} \) is the predicted output.
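
The layer-by-layer equations above can be traced with a small sketch using the student example. All parameter values here are hypothetical, and using ReLU in the hidden layer and sigmoid at the output is an assumption for illustration; the equations themselves use a generic \( f \).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dense(W, a_prev, b, activation):
    """One layer: z = W a_prev + b, then a = f(z), computed row by row."""
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i
         for row, b_i in zip(W, b)]
    return [activation(z_i) for z_i in z]

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output.
W2 = [[0.5, 0.25], [-0.5, 0.25]]
b2 = [0.0, 0.0]
W3 = [[1.0, -1.0]]
b3 = [-3.0]

a1 = [4.0, 8.0]                   # a^(1) = X: hours studied, sleep hours
a2 = dense(W2, a1, b2, relu)      # a^(2) = f(z^(2)), here [4.0, 0.0]
a3 = dense(W3, a2, b3, sigmoid)   # a^(3) = y_hat = sigmoid(1.0), about 0.73
print(a2, a3)
```

Each call to `dense` consumes the previous layer's activations, which is exactly the "output of one layer becomes the input of the next" pattern.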


### <h3 style="text-align:center;">4️⃣ Loss Function: Measuring the Error</h3>

After obtaining predictions (\( \hat{y} \)), the model calculates **how far off** it was from the true output (\( y \)).

This difference is captured using a **Loss Function** (or Cost Function).

| Problem Type | Common Loss Function | Formula |
|---------------|----------------------|----------|
| Regression | Mean Squared Error (MSE) | \( L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \) |
| Binary Classification | Binary Cross-Entropy (BCE) | \( L = -[y \log(\hat{y}) + (1-y)\log(1-\hat{y})] \) |
| Multi-class Classification | Categorical Cross-Entropy | \( L = -\sum_{c} y_c \log(\hat{y}_c) \), the BCE formula extended to one term per class |

> Lower loss = better model performance.
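
The first two formulas in the table can be written out directly. This is a minimal sketch: the function names are illustrative, and the `eps` clamp in the cross-entropy is an added numerical safeguard against `log(0)`, not part of the formula itself.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over n samples."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """BCE for one sample; y is 0 or 1, y_hat is a predicted probability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # assumption: clamp to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

print(mse([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5

# The same prediction y_hat = 0.92 scored against both possible labels.
confident_right = binary_cross_entropy(1, 0.92)  # small loss, about 0.083
confident_wrong = binary_cross_entropy(0, 0.92)  # large loss, about 2.53
print(confident_right, confident_wrong)
```

The second pair of numbers shows why cross-entropy suits classification: a confident prediction that turns out wrong is penalized far more heavily than a confident prediction that is right.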


### <h3 style="text-align:center;">5️⃣ What is Backward Propagation?</h3>

**Backward Propagation (Backprop)** is the process of propagating the loss backward from the output layer toward the input layer to compute the gradient of the loss with respect to every weight. These gradients are then used to **update the weights** of the network.

It is based on the **chain rule** of calculus, which lets it compute how much each weight contributed to the total error.

**Intuition:**  
> Think of it as feedback: if the network made a mistake, it traces back to see which neurons caused it and adjusts them accordingly.


### <h3 style="text-align:center;">6️⃣ Steps in Backpropagation</h3>

1. **Compute Loss:** Compare predicted output \( \hat{y} \) with actual output \( y \).  
2. **Compute Gradient:** Calculate the partial derivatives \( \frac{\partial L}{\partial W} \) using the chain rule.  
3. **Update Weights:** Move weights in the direction that reduces error.  
4. **Repeat:** Do this for all training samples across multiple epochs.

**Weight Update Rule:**  
\[ W_{new} = W_{old} - \eta \frac{\partial L}{\partial W} \]

Where:  
- \( \eta \) = learning rate  
- \( \frac{\partial L}{\partial W} \) = gradient (rate of change of the loss with respect to the weight)
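
To make the update rule concrete, here is a small sketch on a hypothetical one-parameter loss \( L(w) = (w - 3)^2 \). The loss, the starting point, and the three learning rates are assumptions chosen only to show how the rule behaves.

```python
def loss(w):
    """Hypothetical one-parameter loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative dL/dw of the loss above."""
    return 2.0 * (w - 3.0)

def update(w, eta):
    """One application of W_new = W_old - eta * dL/dW."""
    return w - eta * grad(w)

w_old = 0.0
w_new = update(w_old, eta=0.1)
print(w_old, loss(w_old))   # 0.0 9.0
print(w_new, loss(w_new))   # w_new is about 0.6, loss about 5.76

# Effect of the learning rate over 20 repeated updates from w = 0.
final = {}
for eta in (0.001, 0.1, 1.1):
    w = 0.0
    for _ in range(20):
        w = update(w, eta)
    final[eta] = w
print(final)
```

With these values, `eta = 0.1` ends close to the minimum at 3, `eta = 0.001` has barely moved, and `eta = 1.1` overshoots further on every step and diverges.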


### <h3 style="text-align:center;">7️⃣ Chain Rule: The Core of Backpropagation</h3>

The **chain rule** connects derivatives across layers:

\[ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial W} \]

This means:
- If one neuron makes an error, the gradient tells us **how much** that neuron (and its weights) contributed.  
- The process continues **layer by layer backward** until all weights are adjusted.

Hence the name: **backward propagation of errors**.
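
The three-factor product can be checked numerically for a single sigmoid neuron. This sketch assumes a squared-error loss \( L = \tfrac{1}{2}(a - y)^2 \), with \( a = \sigma(z) \) and \( z = wx + b \); the loss choice and all values are illustrative assumptions. The chain-rule gradient is compared against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def grad_w(w, x, b, y):
    """dL/dw as the product of the three chain-rule factors."""
    a = sigmoid(w * x + b)
    dL_da = a - y             # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)     # derivative of sigmoid w.r.t. z
    dz_dw = x                 # derivative of w*x + b w.r.t. w
    return dL_da * da_dz * dz_dw

# Hypothetical values chosen so that z = 0 and a = 0.5.
w, x, b, y = 0.5, 2.0, -1.0, 1.0
analytic = grad_w(w, x, b, y)   # (-0.5) * 0.25 * 2.0 = -0.25

# Central finite difference as an independent check.
h = 1e-6
numeric = (loss(w + h, x, b, y) - loss(w - h, x, b, y)) / (2 * h)
print(analytic, numeric)
```

The negative sign says that increasing `w` would lower the loss, so a gradient descent step would push `w` upward.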


### <h3 style="text-align:center;">8️⃣ Gradient Descent: How Learning Happens</h3>

**Gradient Descent** is an optimization algorithm that minimizes loss by adjusting weights iteratively.

At each step, it moves the weights a **small step in the direction opposite** to the gradient (downhill on the loss surface).

**Update Formula:**  
\[ W := W - \eta \frac{\partial L}{\partial W} \]

| Type | Description |
|------|--------------|
| **Batch Gradient Descent** | Uses all data at once for one update |
| **Stochastic Gradient Descent (SGD)** | Updates weights after every sample |
| **Mini-batch GD** | Updates weights after a small batch (most common) |
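
The three variants in the table differ only in how many samples feed each update. The sketch below makes that explicit with a single `batch_size` parameter. The one-weight model \( y \approx wx \), the toy data, and the learning rates are assumptions for illustration, and shuffling between epochs (common in practice) is left out to keep the run deterministic.

```python
def train(xs, ys, batch_size, eta=0.01, epochs=1):
    """Fit y = w * x by gradient descent on MSE; return (w, number of updates)."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            xb = xs[start:start + batch_size]
            yb = ys[start:start + batch_size]
            # dL/dw for L = mean((w*x - y)^2) over the current batch
            grad = sum(2 * (w * x - y) * x for x, y in zip(xb, yb)) / len(xb)
            w -= eta * grad
            updates += 1
    return w, updates

# Hypothetical noise-free data generated with w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

_, batch_updates = train(xs, ys, batch_size=len(xs))  # batch GD: 1 update per epoch
_, sgd_updates = train(xs, ys, batch_size=1)          # SGD: 4 updates per epoch
_, mini_updates = train(xs, ys, batch_size=2)         # mini-batch: 2 updates per epoch
print(batch_updates, sgd_updates, mini_updates)       # 1 4 2

w_fit, _ = train(xs, ys, batch_size=2, eta=0.05, epochs=200)
print(w_fit)  # close to 2.0
```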


### <h3 style="text-align:center;">9️⃣ Learning Rate, Epochs & Iterations</h3>

- **Learning Rate (η):** Controls the step size of updates.  
  - Too high → overshoots the minimum (unstable)  
  - Too low → very slow convergence  

- **Epoch:** One full pass of the dataset through the network.  
- **Iteration:** One update step (per batch).

**Example:**  
With 1000 samples and a batch size of 100, one epoch takes 1000 / 100 = 10 iterations.
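
The same arithmetic as a small helper. The function names are illustrative, and the sketch assumes that a final partial batch still counts as one iteration (some training setups drop it instead).

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch, keeping a final partial batch."""
    return math.ceil(n_samples / batch_size)

def total_iterations(n_samples, batch_size, epochs):
    """Total weight updates across all epochs."""
    return epochs * iterations_per_epoch(n_samples, batch_size)

print(iterations_per_epoch(1000, 100))   # 10
print(iterations_per_epoch(1000, 64))    # 16: 15 full batches plus one of 40 samples
print(total_iterations(1000, 100, 5))    # 50
```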


### <h3 style="text-align:center;">🔟 Full Learning Cycle: Putting It All Together</h3>

1. Input data is fed into the network.  
2. **Forward Propagation** computes predictions.  
3. **Loss Function** measures the prediction error.  
4. **Backward Propagation** computes gradients.  
5. **Optimizer (e.g., Adam, SGD)** updates weights.  
6. Repeat across many epochs until convergence.

**Visualization:**  
> Forward pass = making predictions.  
> Backward pass = correcting mistakes.
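
The whole cycle fits in one short loop. The sketch below trains a single sigmoid neuron with binary cross-entropy and plain batch gradient descent; the toy pass/fail data, the learning rate, and the epoch count are assumptions for illustration, and plain gradient descent stands in for the optimizer step (Adam is not implemented here).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y, y_hat, eps=1e-12):
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # assumption: clamp to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# Hypothetical toy data: (hours studied, sleep hours) -> pass (1) / fail (0).
X = [[1.0, 4.0], [2.0, 5.0], [6.0, 7.0], [8.0, 8.0]]
Y = [0, 0, 1, 1]

w, b, eta = [0.0, 0.0], 0.0, 0.05
history = []  # mean loss per epoch, measured before that epoch's update

for epoch in range(200):
    grad_w, grad_b, total_loss = [0.0, 0.0], 0.0, 0.0
    for x, y in zip(X, Y):
        # Steps 1-2: feed the input forward to get a prediction.
        z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        y_hat = sigmoid(z)
        # Step 3: measure the error.
        total_loss += bce(y, y_hat)
        # Step 4: backward pass. For sigmoid + BCE, dL/dz simplifies to y_hat - y.
        dz = y_hat - y
        grad_w = [g + dz * x_i for g, x_i in zip(grad_w, x)]
        grad_b += dz
    n = len(X)
    # Step 5: gradient descent update using the batch-averaged gradients.
    w = [w_i - eta * g / n for w_i, g in zip(w, grad_w)]
    b -= eta * grad_b / n
    history.append(total_loss / n)

print(history[0], history[-1])  # loss starts at ln(2), about 0.693, and decreases
```

The first recorded loss is exactly \( \ln 2 \) because zero-initialized weights make every prediction 0.5; later values fall as the weights adapt.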


### <h3 style="text-align:center;">🧠 Summary & Key Takeaways</h3>

- Forward propagation generates predictions from input data.  
- Backpropagation corrects errors by adjusting weights.  
- The **loss function** quantifies how wrong the network is.  
- **Gradient Descent** minimizes the loss by updating parameters.  
- Repeating this process over epochs = model learning.

**Next:** Proceed to `04_Activation_Functions/` to understand how activations bring non-linearity into neural networks.
