### **Lecture: Forward Propagation | How a Neural Network Predicts Output**
*Lecture 10, Deep Learning playlist (CampusX)*

**1. Introduction & Context**
*   **Forward propagation** is the process by which a neural network makes predictions: input data moves **forward through the network**, layer by layer, to produce an output.
*   **Backpropagation** is the algorithm used to **train neural networks** by updating their weights and biases. Because it is often considered difficult, it helps to understand forward propagation first.
*   This lecture shows how to compute a prediction from a given input row and a given set of network weights.
*   **Linear algebra** (matrix multiplication and addition) expresses these calculations compactly, which makes even large networks easier to reason about.

**2. Understanding Neural Network Prediction**
*   A neural network predicts output by processing data through its layers.
*   The complexity of a neural network (number of nodes, layers) is managed behind the scenes by linear algebra.
*   **Goal:** Learn how a neural network makes a prediction in a simplified manner, despite potentially complex architectures.

**3. Example Neural Network Architecture**

<img src='https://i.ibb.co/twpX7JCM/image.png'>

*   **Input Columns:** CGPA, IQ, 10th marks, 12th marks (4 input nodes).
*   **Output Column:** Placement (binary: yes/no, 1 output node).
*   **Layers:**
    *   **Input Layer:** 4 nodes.
    *   **Hidden Layer 1:** 3 nodes.
    *   **Hidden Layer 2:** 2 nodes.
    *   **Output Layer:** 1 node.
*   **Trainable Parameters (Weights and Biases):**
    *   It's important to know the total number of trainable parameters.
    *   Between Input (4 nodes) and Hidden Layer 1 (3 nodes): (4 * 3) connections + 3 biases = 12 + 3 = **15 parameters**.
    *   Between Hidden Layer 1 (3 nodes) and Hidden Layer 2 (2 nodes): (3 * 2) connections + 2 biases = 6 + 2 = **8 parameters**.
    *   Between Hidden Layer 2 (2 nodes) and Output Layer (1 node): (2 * 1) connections + 1 bias = 2 + 1 = **3 parameters**.
    *   **Total Trainable Parameters:** 15 + 8 + 3 = **26 weights and biases** that will be trained.
*   Initially, weights and biases start with **random values**.
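The parameter arithmetic above generalises to any stack of fully connected layers: each consecutive pair of layers contributes `n_in * n_out` weights plus `n_out` biases. A minimal Python sketch (the helper name `count_trainable_params` is hypothetical, not from the lecture):

```python
def count_trainable_params(layer_sizes):
    """Count weights + biases in a fully connected network.

    Each pair of consecutive layers (n_in, n_out) contributes
    n_in * n_out weights plus n_out biases (one per destination node).
    """
    per_layer = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        per_layer.append(n_in * n_out + n_out)
    return per_layer, sum(per_layer)


# 4 inputs -> 3 hidden -> 2 hidden -> 1 output, as in the example network
per_layer, total = count_trainable_params([4, 3, 2, 1])
print(per_layer, total)  # [15, 8, 3] 26
```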

**4. The Prediction Process (Layer by Layer Calculation)**

*   **Core Formula for a Perceptron Output:** The output of any single perceptron is `sigmoid((weights · inputs) + bias)`.
    *   The weighted sum of the inputs plus the bias is passed through the **activation function**, which here is the sigmoid.
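A minimal sketch of the single-perceptron formula, assuming a plain-Python `sigmoid` helper; the weights, inputs and bias are hypothetical values chosen so that the weighted sum is exactly zero:

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_output(weights, inputs, bias):
    """sigmoid(sum_i w_i * x_i + b) for a single node."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# HYPOTHETICAL values, chosen only to illustrate the formula
weights = [0.5, -0.25]
inputs = [2.0, 4.0]
bias = 0.0
print(perceptron_output(weights, inputs, bias))  # 0.5, since z = 1.0 - 1.0 + 0.0 = 0
```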

*   **Notation for Activations** (the number in square brackets is the layer number):
    *   `A^[0]`: the input vector (the "activation" of layer 0, the input layer).
    *   `A^[1]`: output of the first hidden layer (activation of layer 1).
    *   `A^[2]`: output of the second hidden layer (activation of layer 2).
    *   `A^[3]`: final output of the network (activation of layer 3).
    *   Likewise, `W^[l]` and `B^[l]` are the weights and biases feeding into layer `l`.

*   **Step 1: Calculating Output for Hidden Layer 1 (`A^[1]`)**
    1.  **Input:** A single row of data (e.g., [7.2, 72, 60, 181]) is fed into the 4 input nodes. This forms the 4x1 vector `A^[0]`.
    2.  **Weights (`W^[1]`):** A 4x3 matrix holds the weights connecting the input layer to Hidden Layer 1.
        *   Each row corresponds to one input node and each column to one destination node in Layer 1; for example, the first row holds the weights from input 1 to nodes 1, 2 and 3 of Layer 1.
        *   This matrix is then **transposed** (to 3x4) so that it can be multiplied with the input vector.
    3.  **Dot Product:** The transposed weight matrix `(W^[1])^T` (3x4) is multiplied by the input vector `A^[0]` (4x1), giving a 3x1 matrix.
    4.  **Add Biases (`B^[1]`):** A 3x1 bias vector `B^[1]` (one bias per node in Hidden Layer 1) is added to the result of the dot product.
    5.  **Apply Sigmoid Activation:** The `sigmoid` function is applied element-wise to the resulting 3x1 matrix.
    6.  **Output `A^[1]`:** This 3x1 matrix is the output of the first hidden layer.
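Step 1 can be sketched in plain Python with lists as matrices. Only the input row comes from the lecture; the weights `w1` and biases `b1` are hypothetical stand-ins for the random initial values, and `transpose` / `matvec` are small illustrative helpers rather than a library API:

```python
import math


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply matrix m (r x c) by column vector v (length c) -> length r."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("inner dimensions must match")
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# A^[0]: one student row (CGPA, IQ, 10th marks, 12th marks), from the lecture
a0 = [7.2, 72, 60, 181]

# W^[1]: 4 x 3, row i = input node i, column j = hidden node j.
# HYPOTHETICAL values; a real network starts from random weights.
w1 = [
    [0.10, -0.20, 0.05],
    [0.01, 0.02, -0.01],
    [-0.03, 0.01, 0.02],
    [0.002, -0.004, 0.001],
]
b1 = [0.1, -0.1, 0.0]  # B^[1]: one bias per hidden node (HYPOTHETICAL)

w1_t = transpose(w1)                   # 3 x 4
z1 = matvec(w1_t, a0)                  # (3x4)(4x1) -> 3 values
z1 = [z + b for z, b in zip(z1, b1)]   # add B^[1]
a1 = [sigmoid(z) for z in z1]          # element-wise sigmoid -> A^[1]
print(len(w1_t), len(w1_t[0]), len(a1))  # 3 4 3
```

Each entry of `a1` equals the sigmoid of that node's own weighted sum plus bias; the matrix form simply computes all three nodes at once.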

*   **Step 2: Calculating Output for Hidden Layer 2 (`A^[2]`)**
    1.  **Input:** The output of the previous layer, `A^[1]` (3x1), now acts as the input for this layer.
    2.  **Weights (`W^[2]`):** A 3x2 weight matrix connects Hidden Layer 1 (3 nodes) to Hidden Layer 2 (2 nodes), using the same layout as before (rows = source nodes, columns = destination nodes). It is **transposed** to 2x3 for the dot product.
    3.  **Dot Product:** `(W^[2])^T` (2x3) is multiplied by `A^[1]` (3x1), giving a 2x1 matrix.
    4.  **Add Biases (`B^[2]`):** A 2x1 bias vector `B^[2]` is added.
    5.  **Apply Sigmoid Activation:** The `sigmoid` function is applied element-wise.
    6.  **Output `A^[2]`:** This 2x1 matrix is the output of the second hidden layer.
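The same pattern for Step 2, again as a plain-Python sketch. Here `a1` is a hypothetical Hidden Layer 1 output (not the result of any earlier calculation), and the weights and biases are made-up values; the point is the shapes:

```python
import math


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply matrix m (r x c) by column vector v (length c) -> length r."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("inner dimensions must match")
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


a1 = [0.53, 0.44, 0.74]    # A^[1]: HYPOTHETICAL output of Hidden Layer 1 (3 values)

w2 = [                     # W^[2]: 3 x 2 (row = source node, column = destination node), HYPOTHETICAL
    [0.4, -0.6],
    [0.2, 0.3],
    [-0.5, 0.8],
]
b2 = [0.05, -0.05]         # B^[2]: one bias per node of Hidden Layer 2, HYPOTHETICAL

w2_t = transpose(w2)       # 2 x 3
z2 = [z + b for z, b in zip(matvec(w2_t, a1), b2)]  # (2x3)(3x1) + (2x1) -> 2 values
a2 = [sigmoid(z) for z in z2]                         # A^[2]
print(len(w2_t), len(w2_t[0]), len(a2))  # 2 3 2
```

Calling `matvec(w2, a1)` without the transpose raises the `ValueError`, because a 3x2 matrix cannot multiply a 3-element vector; transposing is what makes the shapes line up.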

*   **Step 3: Calculating Final Output (`A^[3]`)**
    1.  **Input:** The output of the previous layer, `A^[2]` (2x1), is the input.
    2.  **Weights (`W^[3]`):** A 2x1 weight matrix connects Hidden Layer 2 (2 nodes) to the Output Layer (1 node). It is **transposed** to 1x2.
    3.  **Dot Product:** `(W^[3])^T` (1x2) is multiplied by `A^[2]` (2x1), giving a 1x1 matrix.
    4.  **Add Bias (`B^[3]`):** A 1x1 bias `B^[3]` is added.
    5.  **Apply Sigmoid Activation:** The `sigmoid` function is applied.
    6.  **Output `A^[3]`:** This 1x1 matrix (a single number between 0 and 1) is the **final prediction** of the neural network.
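Step 3 follows the same recipe and collapses to a single number. The inputs, weights and bias below are hypothetical, and reading the output as "placed" when it is at least 0.5 is an assumed convention for the yes/no label, not something stated in the lecture:

```python
import math


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply matrix m (r x c) by column vector v (length c) -> length r."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("inner dimensions must match")
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


a2 = [0.55, 0.62]          # A^[2]: HYPOTHETICAL output of Hidden Layer 2
w3 = [[0.9], [-0.4]]       # W^[3]: 2 x 1, HYPOTHETICAL
b3 = [0.1]                 # B^[3]: single bias, HYPOTHETICAL

w3_t = transpose(w3)       # 1 x 2
z3 = [z + b for z, b in zip(matvec(w3_t, a2), b3)]  # (1x2)(2x1) + (1x1) -> 1 value
a3 = sigmoid(z3[0])        # A^[3]: the final prediction, a single number in (0, 1)
placed = a3 >= 0.5         # ASSUMPTION: 0.5 threshold for the yes/no placement label
print(placed)  # True, since z3 = 0.495 - 0.248 + 0.1 = 0.347 > 0
```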

**5. Chained Operations & Simplicity**
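*   The entire process can be viewed as one chained expression in which the output of each layer becomes the input of the next (σ denotes the sigmoid):
    *   $A^{[1]} = \sigma\left((W^{[1]})^{T} A^{[0]} + B^{[1]}\right)$
    *   $A^{[2]} = \sigma\left((W^{[2]})^{T} A^{[1]} + B^{[2]}\right)$
    *   $A^{[3]} = \sigma\left((W^{[3]})^{T} A^{[2]} + B^{[3]}\right)$
*   Substituting each line into the next gives the whole prediction as a single nested expression:
    *   $A^{[3]} = \sigma\left((W^{[3]})^{T}\,\sigma\left((W^{[2]})^{T}\,\sigma\left((W^{[1]})^{T} A^{[0]} + B^{[1]}\right) + B^{[2]}\right) + B^{[3]}\right)$
*   This shows how the input $A^{[0]}$ is transformed into the final output $A^{[3]}$ through repeated matrix multiplication, bias addition, and activation.
*   Although the nested expression looks complex, the matrix formulation is a very **organised way of doing the mathematics**: the same three operations repeat at every layer, so the approach scales to much larger architectures without changing form.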
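Because every layer repeats the same three operations, the whole forward pass reduces to one loop. A minimal sketch, assuming uniform random initial weights in [-0.1, 0.1] (an arbitrary choice for illustration) and a fixed seed so the result is reproducible; `init_params` and `forward` are hypothetical helper names:

```python
import math
import random


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply matrix m (r x c) by column vector v (length c) -> length r."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("inner dimensions must match")
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(layer_sizes, seed=0):
    """Random W^[l] (n_prev x n_l) and B^[l] (n_l) for each layer l >= 1."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        b = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
        params.append((w, b))
    return params


def forward(a0, params):
    """A^[l] = sigmoid((W^[l])^T A^[l-1] + B^[l]), applied layer by layer."""
    a = a0
    for w, b in params:
        z = [zi + bi for zi, bi in zip(matvec(transpose(w), a), b)]
        a = [sigmoid(zi) for zi in z]
    return a


params = init_params([4, 3, 2, 1])
prediction = forward([7.2, 72, 60, 181], params)
print(prediction)  # a one-element list holding A^[3], a value in (0, 1)
```

Changing the architecture only means changing the list passed to `init_params`; `forward` itself stays the same, which is the practical payoff of the matrix formulation.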
