# **Dimensions of a Single Perceptron Unit:**

Think of a **perceptron** as a tiny decision-maker. It takes in several pieces of information, weighs their importance, and makes a single yes/no (or +1/-1) decision.


#### **1. Input ($x$):**

* The data we feed into the perceptron. For example, if we're predicting whether a house will sell above its asking price (a yes/no decision), the inputs could be: `[size_in_sqft, number_of_bedrooms, age_of_house]`.

*   **Dimension:** $(n, 1)$, often written simply as $n$.
    *   $n$ is the **`number of features`** in our input.
    
    *   In our example, $n = 3$. So our input is a **`vector`** (a list of numbers) with 3 elements.
    
    *   **Example:** $x = [x₁, x₂, x₃]^T$ (a column with 3 rows).

**Input is an $n$-dimensional vector.**
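
As a small illustration of the two ways of writing the input's dimension, here is a NumPy sketch. The feature values are made up for the house example, and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical values for [size_in_sqft, number_of_bedrooms, age_of_house]
x = np.array([1500.0, 3.0, 20.0])

n = x.shape[0]                # number of features
x_column = x.reshape(-1, 1)   # the same data as an explicit (n, 1) column

print(x.shape)         # (3,)
print(x_column.shape)  # (3, 1)
```

Both arrays hold the same three numbers; `(3,)` is the "just $n$" view and `(3, 1)` is the column-vector view.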

#### **2. Weight ($w$):**
* Each input feature has a corresponding **`weight`**. The weight determines how important that feature is for the decision. A higher absolute weight means the feature is more influential.

*   **Dimension:** $(n, 1)$ or $n$ — **exactly the same as the input**.
    * We need one weight per input feature. So if you have 3 input features, you need 3 weights.
    
    * **Example:** $w = [w₁, w₂, w₃]^T$ (a column with 3 rows).

* During calculation, the perceptron computes the **dot product**: $w₁*x₁ + w₂*x₂ + w₃*x₃$. For this to work, $w$ and $x$ must have the same number of elements.

**Weight is an $n$-dimensional vector. One weight per input.**
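
The dot product described above can be sketched in NumPy as follows. The input and weight values are hypothetical, chosen only to show that `np.dot` matches the hand-written sum.

```python
import numpy as np

x = np.array([1500.0, 3.0, 20.0])   # hypothetical inputs (3 features)
w = np.array([0.01, 2.0, -0.1])     # hypothetical weights, one per feature

# One weight per input: the shapes must match for the dot product to work
assert w.shape == x.shape

weighted_sum = np.dot(w, x)
manual_sum = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]

print(np.isclose(weighted_sum, manual_sum))  # True
```

If `w` had a different number of elements than `x`, `np.dot` would raise a `ValueError`, which is the code-level version of "one weight per input".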

#### **3. Bias ($b$):**
* An extra, tunable parameter that acts like a **`threshold`**. It allows the perceptron to shift its decision boundary away from the origin $(0,0,...)$. Think of it as the perceptron's built-in `"preference"` before seeing any input.

* **Dimension:** $(1, )$ or **a single number (a scalar)**.
    * There is only **`one bias`** per perceptron, regardless of how many inputs there are.

* **Key Operation:** The bias is simply added to the dot product: $w·x + b$.

**Bias is a single number.**
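
To see the bias acting as a built-in "preference", here is a minimal sketch with made-up numbers. When every input is zero, the weighted sum is zero and only the bias remains.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # hypothetical weights
b = 3.0                          # hypothetical bias (a single number)

# With no input signal at all, z is exactly the bias
x_zero = np.zeros(3)
z_zero = np.dot(w, x_zero) + b

# For any other input, the bias shifts the weighted sum by the same amount
x = np.array([2.0, 1.0, 4.0])
z_without_bias = np.dot(w, x)       # 2.0 - 2.0 + 2.0 = 2.0
z_with_bias = z_without_bias + b    # 5.0

print(z_zero, z_without_bias, z_with_bias)  # 3.0 2.0 5.0
```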

#### **4. Output ($ŷ$ or "y-hat"):**
* The final decision/prediction of the perceptron. A simple perceptron uses an **activation function** (like a step function) to convert the weighted sum into a clear output.

* **Dimension:** $(1, )$ or **a single number (a scalar)**.
    *   A single perceptron always gives **`one output`**. That output could be:
        *   Binary: `0` or `1`
        *   Binary: `-1` or `+1`
        *   In more advanced neurons, it could be a probability or any real number.

* **The Full Calculation:**
    1.  $z = (w₁*x₁ + w₂*x₂ + w₃*x₃) + b$, where $z$ is a single number.

    2.  $ŷ = \text{activation}(z)$, where $ŷ$ is also a single number.

**Output is a single number.**
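
Putting the two steps of the full calculation together, a single perceptron could be sketched like this. The function names `step` and `perceptron_forward` are illustrative, the numbers are made up, and the step function is assumed to output 1 when $z > 0$ and 0 otherwise.

```python
import numpy as np

def step(z):
    """Step activation: 1 if z > 0, else 0 (one common convention)."""
    return 1 if z > 0 else 0

def perceptron_forward(x, w, b):
    """Compute z = w . x + b, then apply the step activation."""
    z = np.dot(w, x) + b   # a single number
    return step(z)         # also a single number

x = np.array([2.0, 1.0, 4.0])    # hypothetical input
w = np.array([1.0, -2.0, 0.5])   # hypothetical weights
b = -3.0                         # hypothetical bias

y_hat = perceptron_forward(x, w, b)   # z = 2 - 2 + 2 - 3 = -1, so the output is 0
print(y_hat)  # 0
```

Raising the bias to `3.0` gives $z = 5$ and flips the output to `1`, with the same inputs and weights.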

This is the foundation. When you connect many perceptrons into a **`layer`**, these dimensions expand into matrices, but the core idea remains: **`weights connect inputs to neurons, and each neuron has one bias and gives one output`.**
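
As a rough preview of how the dimensions expand for a layer, the sketch below stacks the weight vectors of $k$ perceptrons as the columns of a matrix `W`. The sizes, the random values, and the `(n, k)` layout are assumptions made for illustration; some libraries store the transpose instead.

```python
import numpy as np

n, k = 3, 4                      # 3 input features, 4 perceptrons (hypothetical sizes)
rng = np.random.default_rng(0)   # arbitrary seed, random values for illustration only

x = rng.normal(size=n)           # one input sample, shape (n,)
W = rng.normal(size=(n, k))      # column j holds the n weights of perceptron j
b = rng.normal(size=k)           # one bias per perceptron, shape (k,)

z = x @ W + b                    # one weighted sum per perceptron, shape (k,)
y_hat = (z > 0).astype(int)      # one output per perceptron

print(W.shape, b.shape, y_hat.shape)  # (3, 4) (4,) (4,)
```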

---------

## **Batch Processing in a Perceptron:**

Now the input vector becomes a matrix.

When processing **$m$ samples** `(a batch)` with **$n$ features** each:

**1. Input ($X$) - Batch Version:**
   * Now we have **$m$ data samples**, each with **$n$ features**.

   * **Dimension:** $(m, n)$ - a **matrix:**
     * $m$: number of samples in the batch (batch size)
     * $n$: number of features per sample

* **Visual:**
  ```py 
                 Features (n)
               ↓    ↓         ↓
      X = [[x₁₁, x₁₂, ..., x₁ₙ],   ← Sample 1
           [x₂₁, x₂₂, ..., x₂ₙ],   ← Sample 2
           ...
           [xₘ₁, xₘ₂, ..., xₘₙ]]   ← Sample m
  ```
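
In NumPy, the batch matrix pictured above can be built by stacking one row per sample. The four house samples below are hypothetical.

```python
import numpy as np

# Hypothetical samples: [size_in_sqft, number_of_bedrooms, age_of_house]
samples = [
    [1500.0, 3.0, 20.0],
    [900.0, 2.0, 35.0],
    [2200.0, 4.0, 5.0],
    [1200.0, 3.0, 12.0],
]

X = np.array(samples)   # rows are samples, columns are features
m, n = X.shape          # m = batch size, n = number of features

print(X.shape)          # (4, 3)
print(X[0].shape)       # (3,)  one sample: all n features
print(X[:, 0].shape)    # (4,)  one feature across all m samples
```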

**2. Weight ($w$) - Stays the Same!**

   * **Dimension:** Still $(n, 1)$ or $n$ - a **`vector`**
   
   * The **`same weights`** are applied to **`all samples`** in the batch.
   
   * **Visual:** $w = [w₁, w₂, ..., wₙ]^T$

**3. Bias ($b$) - Still a Single Number:**  

   * **Dimension:** Still $(1, )$ - a **`scalar`**
   * **Important:** This same bias is added to **`every sample's calculation`**.

**4. Output ($Ŷ$) - Now a Vector:**
   
   * **Dimension:** $(m, 1)$ or $m$ - a **`vector`** with $m$ outputs
   
   * **One output per sample in the batch**

#### **The Mathematics: Batch Calculation:**   

**The Core Operation: Matrix-Vector Multiplication:**     
   1. For a single sample: $z = w·x + b$ (dot product + bias)   

   2. For a batch: $Z = X @ w + b$ (matrix multiplication + broadcasting)
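
The "$(m, 1)$ or $m$" wording shows up directly in code. In this small sketch (all values are placeholders), storing $w$ as a flat `(n,)` array gives a flat `(m,)` result, while storing it as an `(n, 1)` column gives an `(m, 1)` column.

```python
import numpy as np

m, n = 4, 3                  # hypothetical batch size and feature count
X = np.ones((m, n))          # placeholder batch, shape (m, n)
b = 0.5                      # placeholder bias

w_flat = np.ones(n)          # weights stored as shape (n,)
w_col = np.ones((n, 1))      # the same weights stored as shape (n, 1)

Z_flat = X @ w_flat + b      # shape (m,)
Z_col = X @ w_col + b        # shape (m, 1)

print(Z_flat.shape, Z_col.shape)  # (4,) (4, 1)
```

Either convention works as long as it is used consistently; mixing `(m,)` and `(m, 1)` arrays in later arithmetic can trigger unintended broadcasting.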

**Step-by-step:**

1. **Matrix Multiplication:** $X @ w$
   ```py 
   [x₁₁, x₁₂, ..., x₁ₙ]   [w₁]   [x₁₁*w₁ + x₁₂*w₂ + ... + x₁ₙ*wₙ]   [z₁]
   [x₂₁, x₂₂, ..., x₂ₙ] × [w₂] = [x₂₁*w₁ + x₂₂*w₂ + ... + x₂ₙ*wₙ] = [z₂]
   ...                    ...    ...                                ...
   [xₘ₁, xₘ₂, ..., xₘₙ]   [wₙ]   [xₘ₁*w₁ + xₘ₂*w₂ + ... + xₘₙ*wₙ]   [zₘ]
   ```
   Result: A vector $Z_{raw}$ of shape $(m, 1)$

2. **Add Bias (Broadcasting):** $Z = Z_{raw} + b$
   ```py 
         [z₁]   [b]   [z₁ + b]
         [z₂] + [b] = [z₂ + b]
         ...    ...   ...
         [zₘ]   [b]   [zₘ + b]
   ```
   The scalar $b$ is automatically `"broadcast"` to add to every element.

3. **Apply Activation Function:** $Ŷ = \text{activation}(Z)$
   ```py 
         [activation(z₁ + b)]
         [activation(z₂ + b)]
         ...
         [activation(zₘ + b)]
   ```
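
The three steps above can be checked in NumPy. This is a sketch with random placeholder data and a step activation (1 if $z > 0$, else 0); each step is compared against a more explicit version to show that they agree.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, random data for illustration only
m, n = 5, 3
X = rng.normal(size=(m, n))       # batch of m samples
w = rng.normal(size=n)            # one weight per feature
b = 0.5                           # single bias

# Step 1: matrix multiplication equals one dot product per row of X
Z_raw = X @ w
row_by_row = np.array([np.dot(X[i], w) for i in range(m)])

# Step 2: adding the scalar b is the same as adding a vector filled with b
Z = Z_raw + b
Z_explicit = Z_raw + np.full(m, b)

# Step 3: apply the step activation to every element at once
Y_hat = np.where(Z > 0, 1, 0)

print(np.allclose(Z_raw, row_by_row), np.allclose(Z, Z_explicit))  # True True
print(Y_hat.shape)  # (5,)
```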

## **Example:**

Let's use our `"go for a walk"` perceptron with a batch of $3$ people:

**Input Matrix X ($3$ people × $3$ features):**
   - Person 1: `[Weather=8, Time=1.5, Energy=6]`
   - Person 2: `[Weather=3, Time=2.0, Energy=4]`
   - Person 3: `[Weather=10, Time=0.5, Energy=9]`

```py 
                Weather  Time  Energy
         X = [[  8,      1.5,   6],    ← Person 1
              [  3,      2.0,   4],    ← Person 2
              [ 10,      0.5,   9]]    ← Person 3
         Dimensions: (3, 3)
```

**Weights (same as before):**
```py 
w = [2.0,    ← Weather weight
     0.5,    ← Time weight
     0.8]    ← Energy weight
Dimensions: (3, 1)
```

**Bias:**
```py 
      b = -10  ← Same threshold
```

**Calculation:**

1. **Matrix Multiplication:**
   ```py 
   Person 1: (8×2.0) + (1.5×0.5) + (6×0.8) = 16 + 0.75 + 4.8 = 21.55
   Person 2: (3×2.0) + (2.0×0.5) + (4×0.8) = 6 + 1.0 + 3.2 = 10.2
   Person 3: (10×2.0) + (0.5×0.5) + (9×0.8) = 20 + 0.25 + 7.2 = 27.45
   
   Z_raw = [21.55, 10.2, 27.45]^T
   ```

2. **Add Bias:**
   ```py 
         Z = [21.55 - 10, 10.2 - 10, 27.45 - 10]^T
           = [11.55, 0.2, 17.45]^T
   ```

3. **Apply Step Function (output 1 if z > 0, else 0):**
   ```py 
         Ŷ = [1, 1, 1]^T
   ```
   All three people should go for a walk!
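
The worked example can be reproduced in a few lines of NumPy. This sketch uses the same `X`, `w`, and `b` as above; floating-point results may differ from the hand calculation in the last decimal places.

```python
import numpy as np

# Rows: Person 1..3, columns: Weather, Time, Energy
X = np.array([[8.0, 1.5, 6.0],
              [3.0, 2.0, 4.0],
              [10.0, 0.5, 9.0]])
w = np.array([2.0, 0.5, 0.8])   # Weather, Time, Energy weights
b = -10.0                       # bias (threshold)

Z_raw = X @ w                   # approximately [21.55, 10.2, 27.45]
Z = Z_raw + b                   # approximately [11.55, 0.2, 17.45]
Y_hat = (Z > 0).astype(int)     # step function: 1 if z > 0, else 0

print(Y_hat)  # [1 1 1]
```

Person 2 clears the threshold by only about 0.2, so a slightly more negative bias would flip that one output to 0.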

#### **Why Batches Are Important:**

1. **`Computational Efficiency`:** Modern hardware (GPUs/TPUs) is optimized for parallel matrix operations.

2. **`Stable Learning`:** When updating weights during training, averaging over a batch gives a less noisy estimate of the true gradient than a single sample does.

3. **`Vectorization`:** One matrix operation processes all samples at once, much faster than looping through samples individually.
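
To make the vectorization point concrete, the sketch below computes the same weighted sums twice: once with a Python loop over samples and once with a single matrix operation. The batch is random placeholder data and the weights are reused from the example. Timings depend on the machine, so none are shown here.

```python
import time
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed, random data for illustration only
m, n = 10_000, 3
X = rng.normal(size=(m, n))
w = np.array([2.0, 0.5, 0.8])
b = -10.0

# Looping through samples one at a time
start = time.perf_counter()
Z_loop = np.array([np.dot(w, X[i]) + b for i in range(m)])
loop_seconds = time.perf_counter() - start

# One vectorized matrix operation for the whole batch
start = time.perf_counter()
Z_vec = X @ w + b
vec_seconds = time.perf_counter() - start

print(np.allclose(Z_loop, Z_vec))   # True: both approaches give the same numbers
print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.6f}s")
```
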

| Component | Symbol | Single Sample | Batch (m samples) |
|-----------|--------|---------------|-------------------|
| **Input** | `x` or `X` | `(n,)` vector | `(m, n)` matrix |
| **Weight** | `w` | `(n,)` vector | `(n,)` vector (unchanged) |
| **Bias** | `b` | `(1,)` scalar | `(1,)` scalar (unchanged) |
| **Output** | `ŷ` or `Ŷ` | `(1,)` scalar | `(m,)` vector |