# **Forward Pass of a Perceptron: PyTorch Style: One Sample Input at a Time (Batch Size = $1$):**

**`Forward propagation`** (often called the `"forward pass"`) is the process by which a perceptron, the simplest form of a neural network, takes an input and computes its output or prediction. It is the predictive step that comes before any learning or weight adjustment.

This process breaks down into two main steps: the **`Weighted Sum`** calculation and the application of the **`Activation Function`**.

#### **1. Weighted Sum Calculation (Pre-activation):**

The perceptron receives one or more **`input values`** (e.g., $x_1, x_2, x_3, \dots$). Each of these inputs is connected to the perceptron's core processing unit by a corresponding **`weight`** (e.g., $w_1, w_2, w_3, \dots$).

* **Multiplication:** The perceptron first multiplies each input value by its respective weight. This step assigns an **`importance`** to each input, as determined by the current values of the weights.

* **`Summation`:** All these weighted input values are then added together.

* **`Bias Addition`:** A single value called the **`bias` ($b$)** is added to this sum. The bias acts like an adjustable threshold: it shifts the decision boundary away from the origin, so the pre-activation can be non-zero even when every input is zero.

The result of this calculation is a single number, often referred to as the **`pre-activation`** or the **`weighted sum`**. Mathematically, this internal sum is represented as $z = (w_1x_1 + w_2x_2 + w_3x_3 + \dots) + b$.
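
The three operations above (multiply, sum, add bias) can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `weighted_sum` and the numeric values are assumptions chosen for the example, not values from this walkthrough.

```python
def weighted_sum(inputs, weights, bias):
    """Return z = w_1*x_1 + w_2*x_2 + ... + b for one sample."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by its weight, add the products, then add the bias.
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values only (assumed for this sketch).
x = [2.0, -1.0, 0.5]   # x_1, x_2, x_3
w = [0.5, 1.0, -2.0]   # w_1, w_2, w_3
b = 0.25               # bias

z = weighted_sum(x, w, b)
print(z)  # (1.0 - 1.0 - 1.0) + 0.25 = -0.75
```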

#### **2. Activation Function Application:**

The weighted sum ($z$) calculated in the first step is then passed through an **`activation function`**.

* **`Function's Role`:** The activation function's job is to decide whether the perceptron should `"fire"` or `"activate,"` and if so, what the final output should be.

* **`Output Generation`:** For a traditional perceptron, this function is typically a **`step function`**. Modern neural networks often use a **`Sigmoid`** function instead for binary classification.

    * **`Step Function`:** If the weighted sum ($z$) is greater than a certain threshold (e.g., $0$), the output is $\mathbf{1}$; otherwise, the output is $\mathbf{0}$ or $\mathbf{-1}$.

* The value produced by the activation function is the **`final output`** or **`prediction`** of the single perceptron for the given input data. This output is then compared to the actual correct answer to calculate the error, which is used in the next phase, learning (the perceptron update rule for a single perceptron, or backpropagation in multi-layer networks).
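
To make the two activation choices concrete, here is a small sketch of a step function and a sigmoid in plain Python. The function names and the default threshold of $0$ are assumptions for illustration. The step variant shown returns $0$ (not $-1$) below the threshold.

```python
import math


def step(z, threshold=0.0):
    """Step activation: 1 if z is greater than the threshold, otherwise 0."""
    return 1 if z > threshold else 0


def sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative pre-activation values (assumed for this sketch).
for z in (-2.0, 0.0, 3.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.4f}")
```

The step function gives a hard yes/no decision, while the sigmoid gives a graded value that can be read as a confidence between $0$ and $1$.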

---

**Dataset Representation:**    
We have $7$ samples with $3$ input features each:

```text
      Sample 1: [x₁₁, x₁₂, x₁₃]
      Sample 2: [x₂₁, x₂₂, x₂₃]
      Sample 3: [x₃₁, x₃₂, x₃₃]
      Sample 4: [x₄₁, x₄₂, x₄₃]
      Sample 5: [x₅₁, x₅₂, x₅₃]
      Sample 6: [x₆₁, x₆₂, x₆₃]
      Sample 7: [x₇₁, x₇₂, x₇₃]
```

**Network Architecture:**
   - **Input layer**: $3$ features
   - **Output layer**: $1$ perceptron with sign activation function
   - **Activation function**: $\operatorname{sign}(z) = +1$ if $z \geq 0$, otherwise $-1$

**Weight Representation:**

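Following PyTorch's `(out_features, in_features)` layout, the weights form a matrix with one row per output unit. Here the bias is folded into that matrix as an extra first column (PyTorch's `nn.Linear` itself keeps the bias in a separate tensor):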
- **Weight matrix W**: shape $(1, 4)$ = `(out_features, in_features_with_bias)`
- $W = [w₀, w₁, w₂, w₃]$
  - $w₀$ = bias term
  - $w₁, w₂, w₃$ = weights for the three input features
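
The sketch below checks that folding the bias into $W$ as $w_0$ gives the same pre-activation as keeping a separate bias term. Plain nested lists stand in for tensors, and all numeric values are assumptions chosen for the example.

```python
# W has shape (1, 4): one output row, bias w0 stored first (values assumed).
W = [[0.5, 1.0, -2.0, 3.0]]        # [w0, w1, w2, w3]

bias = W[0][0]                     # w0
feature_weights = W[0][1:]         # w1, w2, w3

x = [2.0, 1.0, 1.0]                # one sample with 3 features (assumed)
x_aug = [1.0] + x                  # prepend 1 so that w0 multiplies a constant

# Bias folded into the weight row: a single dot product over 4 entries.
z_folded = sum(w * xi for w, xi in zip(W[0], x_aug))

# Bias kept separate: dot product over 3 entries, then add the bias.
z_separate = sum(w * xi for w, xi in zip(feature_weights, x)) + bias

print((len(W), len(W[0])))         # shape of W
print(z_folded, z_separate)        # both are 3.5
```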

## **Forward Propagation - One Complete Epoch (Processing All 7 Samples):**

#### **Sample 1:**

**Step 1: Prepare Input (Row Vector)**
- Original input: $X₁ = [x₁₁, x₁₂, x₁₃]$ with shape $(1, 3)$

**Step 2: Bias Augmentation (Prepend 1)**
- Augmented input: $\mathbf{X}_{1_{\text{aug}}} = [1, x_{11}, x_{12}, x_{13}]$ with shape $(1, 4)$

**Step 3: Linear Transformation**
- Compute: $z₁ = \mathbf{X}_{1_{\text{aug}}} @ W^T$  

- $z₁ = [1, x₁₁, x₁₂, x₁₃] @ [w₀, w₁, w₂, w₃]^T$

- $z₁ = 1·w₀ + x₁₁·w₁ + x₁₂·w₂ + x₁₃·w₃$

- $z₁ = w₀ + w₁x₁₁ + w₂x₁₂ + w₃x₁₃$

**Step 4: Apply Activation Function:**
- $o_1 = \operatorname{sign}(z_1) = 
\begin{cases}
+1 & \text{if } z_1 \geq 0 \\
-1 & \text{if } z_1 < 0
\end{cases}$
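
The four steps for Sample 1 can be traced in code. This is a minimal sketch using nested Python lists in place of tensors; the helper names `sign` and `forward_one` and the numeric values for $W$ and the sample are assumptions for illustration.

```python
def sign(z):
    """Sign activation as defined above: +1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def forward_one(x, W):
    """Forward pass for one sample x (3 features) with W of shape (1, 4)."""
    x_row = [list(x)]                          # Step 1: row vector, shape (1, 3)
    x_aug = [[1.0] + x_row[0]]                 # Step 2: prepend 1, shape (1, 4)
    W_T = [[w] for w in W[0]]                  # W transposed, shape (4, 1)
    # Step 3: (1, 4) @ (4, 1) reduces to a single dot product.
    z = sum(x_aug[0][k] * W_T[k][0] for k in range(len(W_T)))
    return z, sign(z)                          # Step 4: apply the activation


# Illustrative values (assumed): W = [w0, w1, w2, w3], x1 = [x11, x12, x13].
W = [[-1.0, 0.5, 0.25, 2.0]]
x1 = [2.0, 4.0, -1.0]

z1, o1 = forward_one(x1, W)
print(z1, o1)  # z1 = -1 + 1 + 1 - 2 = -1.0, so o1 = -1
```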

#### **Sample 2:**

**Step 1: Prepare Input**
- $X₂ = [x₂₁, x₂₂, x₂₃]$

**Step 2: Bias Augmentation**
- $\mathbf{X}_{2_{\text{aug}}} = [1, x_{21}, x_{22}, x_{23}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(2)} = [1, x^{(2)}_1, x^{(2)}_2, x^{(2)}_3]^\top$

**Step 3: Linear Transformation**
- $z_2 = \mathbf{X}_{2_{\text{aug}}} @ \mathbf{W}^\top$
- $z_2 = w_0 + w_1 x_{21} + w_2 x_{22} + w_3 x_{23}$

**Step 4: Apply Activation**
- $o_2 = \operatorname{sign}(z_2)$

#### **Sample 3:**

**Step 1: Prepare Input**
- $\mathbf{X}_3 = [x_{31}, x_{32}, x_{33}]$

**Step 2: Bias Augmentation**
- $\mathbf{X}_{3_{\text{aug}}} = [1, x_{31}, x_{32}, x_{33}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(3)} = [1, x^{(3)}_1, x^{(3)}_2, x^{(3)}_3]^\top$

**Step 3: Linear Transformation**
- $z_3 = \mathbf{X}_{3_{\text{aug}}} @ \mathbf{W}^\top$
- $z_3 = w_0 + w_1 x_{31} + w_2 x_{32} + w_3 x_{33}$

**Step 4: Apply Activation**
- $o_3 = \operatorname{sign}(z_3)$

#### **Sample 4:**

**Step 1: Prepare Input:**
- $\mathbf{X}_4 = [x_{41}, x_{42}, x_{43}]$

**Step 2: Bias Augmentation:**
- $\mathbf{X}_{4_{\text{aug}}} = [1, x_{41}, x_{42}, x_{43}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(4)} = [1, x^{(4)}_1, x^{(4)}_2, x^{(4)}_3]^\top$

**Step 3: Linear Transformation:**
- $z_4 = \mathbf{X}_{4_{\text{aug}}} @ \mathbf{W}^\top$
- $z_4 = w_0 + w_1 x_{41} + w_2 x_{42} + w_3 x_{43}$

**Step 4: Apply Activation:**
- $o_4 = \text{sign}(z_4)$

#### **Sample 5:**

**Step 1: Prepare Input:**
- $\mathbf{X}_5 = [x_{51}, x_{52}, x_{53}]$

**Step 2: Bias Augmentation:**
- $\mathbf{X}_{5_{\text{aug}}} = [1, x_{51}, x_{52}, x_{53}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(5)} = [1, x^{(5)}_1, x^{(5)}_2, x^{(5)}_3]^\top$

**Step 3: Linear Transformation:**
- $z_5 = \mathbf{X}_{5_{\text{aug}}} @ \mathbf{W}^\top$
- $z_5 = w_0 + w_1 x_{51} + w_2 x_{52} + w_3 x_{53}$

**Step 4: Apply Activation:**
- $o_5 = \text{sign}(z_5)$

#### **Sample 6:**

**Step 1: Prepare Input**
- $X₆ = [x₆₁, x₆₂, x₆₃]$

**Step 2: Bias Augmentation**
- $\mathbf{X}_{6_{\text{aug}}} = [1, x_{61}, x_{62}, x_{63}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(6)} = [1, x^{(6)}_1, x^{(6)}_2, x^{(6)}_3]^\top$

**Step 3: Linear Transformation**
- $z₆ = \mathbf{X}_{6_{\text{aug}}} @ \mathbf{W}^\top$
- $z₆ = w₀ + w₁x₆₁ + w₂x₆₂ + w₃x₆₃$

**Step 4: Apply Activation**
- $o₆ = \text{sign}(z₆)$

#### **Sample 7:**

**Step 1: Prepare Input:**
- $X₇ = [x₇₁, x₇₂, x₇₃]$

**Step 2: Bias Augmentation:**
- $\mathbf{X}_{7_{\text{aug}}} = [1, x_{71}, x_{72}, x_{73}]$, or in column-vector notation, $\tilde{\mathbf{x}}^{(7)} = [1, x^{(7)}_1, x^{(7)}_2, x^{(7)}_3]^\top$

**Step 3: Linear Transformation:**
- $z₇ = \mathbf{X}_{7_{\text{aug}}} @ \mathbf{W}^\top$
- $z₇ = w₀ + w₁x₇₁ + w₂x₇₂ + w₃x₇₃$

**Step 4: Apply Activation:**
- $o₇ = \text{sign}(z₇)$

After processing all 7 samples sequentially, we have:

| Sample | Input | Weighted Sum ($z$) | Output ($o$) |
|--------|-------|------------------|------------|
| 1 | $[x₁₁, x₁₂, x₁₃]$ | $w₀ + w₁x₁₁ + w₂x₁₂ + w₃x₁₃$ | $o₁ = \text{sign}(z₁)$ |
| 2 | $[x₂₁, x₂₂, x₂₃]$ | $w₀ + w₁x₂₁ + w₂x₂₂ + w₃x₂₃$ | $o₂ = \text{sign}(z₂)$ |
| 3 | $[x₃₁, x₃₂, x₃₃]$ | $w₀ + w₁x₃₁ + w₂x₃₂ + w₃x₃₃$ | $o₃ = \text{sign}(z₃)$ |
| 4 | $[x₄₁, x₄₂, x₄₃]$ | $w₀ + w₁x₄₁ + w₂x₄₂ + w₃x₄₃$ | $o₄ = \text{sign}(z₄)$ |
| 5 | $[x₅₁, x₅₂, x₅₃]$ | $w₀ + w₁x₅₁ + w₂x₅₂ + w₃x₅₃$ | $o₅ = \text{sign}(z₅)$ |
| 6 | $[x₆₁, x₆₂, x₆₃]$ | $w₀ + w₁x₆₁ + w₂x₆₂ + w₃x₆₃$ | $o₆ = \text{sign}(z₆)$ |
| 7 | $[x₇₁, x₇₂, x₇₃]$ | $w₀ + w₁x₇₁ + w₂x₇₂ + w₃x₇₃$ | $o₇ = \text{sign}(z₇)$ |

**One epoch is complete!** The perceptron has made predictions for all 7 samples in the dataset.
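
The sample-by-sample pass summarized in the table can be sketched as a simple loop. The dataset `X`, the weight row `W`, and the helper names below are all assumptions made up for the example; the walkthrough itself keeps them symbolic.

```python
def sign(z):
    """Sign activation: +1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def forward(x, W_row):
    """Return (z, o) for one sample, where W_row = [w0, w1, w2, w3]."""
    z = W_row[0] + sum(w * xi for w, xi in zip(W_row[1:], x))
    return z, sign(z)


# Assumed weights and a made-up dataset of 7 samples with 3 features each.
W = [0.0, 1.0, -1.0, 0.5]
X = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [3.0, 1.0, 2.0],
]

outputs = []
for i, x in enumerate(X, start=1):     # one sample at a time (batch size 1)
    z, o = forward(x, W)
    outputs.append(o)
    print(f"Sample {i}: z = {z:+.2f}, o = {o:+d}")
```

Sample 4 in this made-up dataset lands exactly on $z = 0$, which the sign activation maps to $+1$.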

**Notes:** 

1. **`Sequential Processing`**: In one epoch, each sample is processed one at a time (sample-by-sample, as in "online" learning)

2. **`Row Vector Convention`**: Each input sample is a row vector, following PyTorch standards

3. **`Bias Augmentation`**: We prepend 1 to each input, allowing the bias term to be treated as a regular weight

4. **`Matrix Dimensions`**: 
   - Input after augmentation: $(1, 4)$
   - Weights transposed: $(4, 1)$
   - Result: $(1, 4) × (4, 1) = (1, 1)$, i.e. a single scalar output

5. **`Sign Activation`**: Creates a binary classifier outputting $+1$ or $-1$

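6. **`No Learning Yet`**: During this forward pass, weights remain constant. **`Learning`** (weight updates) would happen in a separate update step after the forward pass: the perceptron learning rule for a single perceptron, or **`backpropagation`** in multi-layer networks.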
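
The shape bookkeeping in note 4 and the "weights remain constant" point in note 6 can be checked with a tiny matrix-multiply sketch. The helpers `shape`, `transpose`, and `matmul` are hypothetical stand-ins written for this example (in PyTorch the same product would be written `x_aug @ W.T`), and the numeric values are assumed.

```python
def shape(m):
    """Return (rows, cols) of a nested-list matrix."""
    return (len(m), len(m[0]))


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product; inner dimensions must agree."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"shape mismatch: {shape(a)} @ {shape(b)}")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


x_aug = [[1.0, 2.0, 0.0, -1.0]]    # augmented input, shape (1, 4) (assumed values)
W = [[0.5, 1.0, 3.0, 2.0]]         # weights, shape (1, 4) (assumed values)
W_before = [row[:] for row in W]   # copy, to confirm the forward pass changes nothing

z = matmul(x_aug, transpose(W))    # (1, 4) @ (4, 1) -> (1, 1)

print(shape(x_aug), shape(transpose(W)), shape(z))
print(z[0][0])                     # 0.5 + 2.0 + 0.0 - 2.0 = 0.5
print(W == W_before)               # True: no weight update happened
```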