# MLP Notation
Lecture 8, CampusX

***

### Lecture Notes: Multi-layer Perceptron (MLP) Notation

**1. Introduction and Importance of Notation**
*   This video builds on the previous discussions of Multi-layer Perceptron (MLP) intuition, which covered how MLPs work and why they are effective.
*   The most challenging aspect of understanding MLPs is often the **training algorithm, known as Backpropagation**.
*   A common source of confusion when learning backpropagation is the large number of weights and biases in a neural network. Without a consistent notation, it becomes difficult to tell these parameters apart during the longer calculations.
*   **The primary goals of this video are**:
    1.  To learn how to **calculate the total number of trainable parameters** (weights and biases) in any given neural network architecture.
    2.  To establish a **standardised notation for weights, biases, and outputs** that is commonly followed in the industry, to avoid confusion during backpropagation.

**2. Neural Network Architecture Setup**

<img src='https://i.ibb.co/twpX7JCM/image.png'>

*   The lecture uses the neural network architecture shown above for demonstration.
*   This architecture consists of **four layers in total**:
    *   **Layer 0**: The Input Layer.
    *   **Layer 1**: The first Hidden Layer.
    *   **Layer 2**: The second Hidden Layer.
    *   **Layer 3**: The Output Layer.
*   The input data for this example is **four-dimensional**, meaning each input instance has four features or columns.

**3. Calculating Trainable Parameters (Weights and Biases)**
*   **Trainable parameters** are the values of weights and biases that the backpropagation algorithm will determine during the training process of the model.
*   It is crucial to be able to calculate these from any given architecture.
*   **For the demonstrated architecture**:
    *   **From Input Layer (Layer 0) to Hidden Layer 1 (Layer 1)**:
        *   Layer 0 has 4 nodes and Layer 1 has 3 nodes.
        *   **Weights**: 4 (nodes in Layer 0) × 3 (nodes in Layer 1) = **12 weights**.
        *   **Biases**: 3 (biases, one for each node in Layer 1).
        *   *Subtotal for this segment*: 12 weights + 3 biases = **15 parameters**.
    *   **From Hidden Layer 1 (Layer 1) to Hidden Layer 2 (Layer 2)**:
        *   Layer 1 has 3 nodes and Layer 2 has 2 nodes.
        *   **Weights**: 3 (nodes in Layer 1) × 2 (nodes in Layer 2) = **6 weights**.
        *   **Biases**: 2 (biases, one for each node in Layer 2).
        *   *Subtotal for this segment*: 6 weights + 2 biases = **8 parameters**.
    *   **From Hidden Layer 2 (Layer 2) to Output Layer (Layer 3)**:
        *   Layer 2 has 2 nodes and Layer 3 has 1 node.
        *   **Weights**: 2 (nodes in Layer 2) × 1 (node in Layer 3) = **2 weights**.
        *   **Biases**: 1 (bias, for the node in Layer 3).
        *   *Subtotal for this segment*: 2 weights + 1 bias = **3 parameters**.
    *   **Total Trainable Parameters for the entire network**: 15 + 8 + 3 = **26 parameters**. This means the backpropagation algorithm will find the values for these 26 weights and biases.
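
The layer-by-layer count above follows one rule for a fully connected network: each pair of adjacent layers contributes (nodes in the earlier layer × nodes in the later layer) weights plus one bias per node in the later layer. A minimal sketch of that rule is shown below; the function name `count_parameters` and the list-of-layer-sizes input are illustrative choices, not something taken from the lecture.

```python
def count_parameters(layer_sizes):
    """Count weights and biases for a fully connected MLP.

    layer_sizes lists the node count of every layer, input layer first.
    Returns (per_segment, total), where per_segment holds one
    (weights, biases) tuple for each pair of adjacent layers.
    """
    per_segment = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per (source node, destination node) pair
        biases = n_out          # one bias per destination node
        per_segment.append((weights, biases))
    total = sum(w + b for w, b in per_segment)
    return per_segment, total


# The 4-3-2-1 architecture used in these notes.
segments, total = count_parameters([4, 3, 2, 1])
print(segments)  # [(12, 3), (6, 2), (2, 1)]
print(total)     # 26
```

The same function applies to any other layer layout by changing the list passed in.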

**4. Notation for Biases (b)**
*   The notation for biases is straightforward and uses two indices.
*   **Standard Notation**: **`b_i^j`**
    *   **`i`**: Represents the **layer number**.
    *   **`j`**: Represents the **node number** within that layer.
*   **Examples**:
    *   **`b_1^1`**: Bias for the first node in Layer 1.
    *   **`b_1^2`**: Bias for the second node in Layer 1.
    *   **`b_2^1`**: Bias for the first node in Layer 2.
    *   **`b_3^1`**: Bias for the first (and only) node in Layer 3.
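
As a quick check on the bias notation, the sketch below lists every bias label for the 4-3-2-1 network from these notes. The helper name `bias_names` is hypothetical, and the labels are plain strings that mirror the `b_i^j` form used above.

```python
def bias_names(layer_sizes):
    """List bias labels b_<layer>^<node> for every non-input node."""
    names = []
    # Layer 0 is the input layer and has no biases, so start at layer 1.
    for layer in range(1, len(layer_sizes)):
        for node in range(1, layer_sizes[layer] + 1):
            names.append(f"b_{layer}^{node}")
    return names


print(bias_names([4, 3, 2, 1]))
# ['b_1^1', 'b_1^2', 'b_1^3', 'b_2^1', 'b_2^2', 'b_3^1']
```

The list has six entries, matching the 3 + 2 + 1 biases counted in Section 3.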

**5. Notation for Outputs (o)**
*   The notation for outputs is identical to that for biases.
*   **Standard Notation**: **`o_i^j`**
    *   **`i`**: Represents the **layer number**.
    *   **`j`**: Represents the **node number** within that layer.
*   Any output originating from a node will follow this notation.
*   **Examples**:
    *   **`o_1^1`**: Output from the first node in Layer 1.
    *   **`o_1^2`**: Output from the second node in Layer 1.
    *   **`o_2^1`**: Output from the first node in Layer 2.
    *   **`o_3^1`**: Output from the first (and only) node in Layer 3.

Example:
<img src='https://miro.medium.com/v2/resize:fit:1400/1*2vLiWsyesKLAfDcezIfBRQ.png'>
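
To show where each `o_i^j` value comes from, here is a small forward-pass sketch for the 4-3-2-1 network. Several things in it are assumptions made for illustration and are not specified in the lecture: the sigmoid activation, the random placeholder weights, the zero biases, and the sample input vector. Each node's output is computed as the activation of the weighted sum of the previous layer's outputs plus the node's bias.

```python
import math
import random


def sigmoid(z):
    # Assumed activation function; the notes do not name one.
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, biases):
    """Return a dict mapping 'o_<layer>^<node>' to that node's output."""
    outputs = {}
    previous = list(x)  # layer 0 passes the input features through unchanged
    for layer, (W, b) in enumerate(zip(weights, biases), start=1):
        current = []
        for node, (row, bias) in enumerate(zip(W, b), start=1):
            # Weighted sum of the previous layer's outputs, plus this node's bias.
            z = sum(w * o for w, o in zip(row, previous)) + bias
            value = sigmoid(z)
            outputs[f"o_{layer}^{node}"] = value
            current.append(value)
        previous = current
    return outputs


layer_sizes = [4, 3, 2, 1]
rng = random.Random(0)  # fixed seed so the placeholder weights are repeatable
# weights[k-1][j-1][i-1]: weight into node j of layer k from node i of layer k-1
weights = [
    [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

outputs = forward([0.5, -1.2, 3.0, 0.7], weights, biases)
print(sorted(outputs))
# ['o_1^1', 'o_1^2', 'o_1^3', 'o_2^1', 'o_2^2', 'o_3^1']
```

The final prediction of the network is `outputs["o_3^1"]`, the output of the only node in Layer 3.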

**6. Notation for Weights (W)**
*   The notation for weights is slightly more complex, requiring three indices.
*   **Standard Notation**: **`W_k_i^j`**
    *   **`k`**: Represents the **layer number that the weight is entering**. This is the layer containing the destination node.
    *   **`i`**: Represents the **node number in the previous layer from which the weight is originating**.
    *   **`j`**: Represents the **node number in the current layer (layer `k`) that the weight is entering**.
*   **Examples (referencing the network diagram in Section 2)**:
    *   **`W_1_1^1`**: Weight entering **Layer 1**, originating from the **1st node of the previous layer** (Layer 0), and entering the **1st node of Layer 1**.
    *   **`W_1_4^2`**: Weight entering **Layer 1**, originating from the **4th node of the previous layer** (Layer 0), and entering the **2nd node of Layer 1**.
    *   **`W_1_1^3`**: Weight entering **Layer 1**, originating from the **1st node of the previous layer** (Layer 0), and entering the **3rd node of Layer 1**.
    *   **`W_2_2^2`**: Weight entering **Layer 2**, originating from the **2nd node of the previous layer** (Layer 1), and entering the **2nd node of Layer 2**.
    *   **`W_3_1^1`**: Weight entering **Layer 3**, originating from the **1st node of the previous layer** (Layer 2), and entering the **1st node of Layer 3**.

Example:

<img src='https://miro.medium.com/v2/resize:fit:1200/1*n5YNnh_vG2exS-YnjDPoPA.png'>
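
The three-index weight labels can be related to how weights are often stored in code. The sketch below enumerates every `W_k_i^j` label for the 4-3-2-1 network and converts a label into a 0-based matrix position. The storage layout (one matrix per layer, with rows for destination nodes and columns for source nodes) is an assumed convention for illustration, and the helper names `weight_names` and `matrix_index` are hypothetical.

```python
def weight_names(layer_sizes):
    """List every weight label W_<k>_<i>^<j> in a fully connected network."""
    names = []
    for k in range(1, len(layer_sizes)):                  # layer the weight enters
        for i in range(1, layer_sizes[k - 1] + 1):        # source node in layer k-1
            for j in range(1, layer_sizes[k] + 1):        # destination node in layer k
                names.append(f"W_{k}_{i}^{j}")
    return names


def matrix_index(name):
    """Map 'W_k_i^j' to (k, row, col) in layer k's weight matrix.

    Assumes a 0-based matrix of shape (nodes in layer k, nodes in layer k-1),
    so row = j - 1 (destination node) and col = i - 1 (source node).
    """
    k_part, rest = name[2:].split("_")
    i_part, j_part = rest.split("^")
    return int(k_part), int(j_part) - 1, int(i_part) - 1


names = weight_names([4, 3, 2, 1])
print(len(names))               # 20
print(matrix_index("W_1_4^2"))  # (1, 1, 3)
```

The 20 labels match the 12 + 6 + 2 weights counted in Section 3. Under the assumed layout, `W_1_4^2` sits in row 1, column 3 of the Layer 1 matrix.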


***