```{contents}
```

## Mini-batch Stochastic Gradient Descent



* A compromise between **Gradient Descent (GD)** and **Stochastic Gradient Descent (SGD)**.
* Instead of using **all data points** (GD) or **one data point** (SGD), it uses a **small batch of data points** in each iteration.
* Introduces a **batch size** hyperparameter: batch size = dataset size recovers GD, and batch size = 1 recovers SGD.

---

### How it works

1. Split the dataset into **batches** (e.g., batch size = 1,000), typically after shuffling it.
2. For each batch:

   * Forward propagate all batch samples.
   * Compute **batch loss** (e.g., MSE for regression):
     $$
     \text{Cost} = \frac{1}{\text{batch size}}\sum_{i=1}^{\text{batch size}}(y_i - \hat{y}_i)^2
     $$
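   * Backpropagate and **update weights** using the optimizer; for plain mini-batch SGD with learning rate $\eta$, this is $w \leftarrow w - \eta \, \nabla_w \text{Cost}$.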
3. Repeat for all batches → **one epoch** is complete.
4. Repeat for multiple epochs until convergence.
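The steps above can be sketched in pure Python. This is a minimal illustration, not a reference implementation: it assumes a one-feature linear model $\hat{y} = w x + b$, the batch MSE cost defined above, a fixed learning rate `lr`, and a reshuffle at the start of every epoch. The names `minibatch_sgd` and `mse`, the synthetic data, and the hyperparameter values are hypothetical choices made for this example.

```python
import random


def minibatch_sgd(xs, ys, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on the batch MSE cost (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(xs)
    w, b = 0.0, 0.0
    indices = list(range(n))
    for _ in range(epochs):
        rng.shuffle(indices)  # assumption: reshuffle each epoch so batches differ
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            m = len(batch)  # the last batch may be smaller than batch_size
            # Forward pass + gradient of (1/m) * sum((y_i - y_hat_i)^2)
            grad_w = grad_b = 0.0
            for i in batch:
                err = ys[i] - (w * xs[i] + b)  # y_i - y_hat_i
                grad_w += -2.0 * err * xs[i] / m
                grad_b += -2.0 * err / m
            # One weight update per batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error over the full dataset."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noiseless data generated from y = 3x + 1
xs = [i / 50 - 1 for i in range(100)]
ys = [3 * x + 1 for x in xs]

for bs in (1, 16, len(xs)):  # SGD, mini-batch SGD, full-batch GD
    w, b = minibatch_sgd(xs, ys, batch_size=bs)
    print(bs, round(w, 3), round(b, 3), mse(xs, ys, w, b))
```

The same function covers all three optimizers in the summary table: only `batch_size` changes, which also changes how many updates happen per epoch.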

**Example:**

* Dataset = 100,000 samples
* Batch size = 1,000
* Iterations per epoch = 100 (100,000 ÷ 1,000)
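The arithmetic generalizes to dataset sizes that are not a multiple of the batch size. Assuming the smaller final batch is kept (some data loaders offer an option to drop it instead), the number of updates per epoch is a ceiling division. The helper name `iterations_per_epoch` is hypothetical.

```python
def iterations_per_epoch(n_samples, batch_size):
    """Weight updates per epoch; a smaller final batch still counts as one update."""
    return (n_samples + batch_size - 1) // batch_size  # integer ceiling division


print(iterations_per_epoch(100_000, 1_000))    # 100    (the example above)
print(iterations_per_epoch(100_000, 100_000))  # 1      (full-batch GD)
print(iterations_per_epoch(100_000, 1))        # 100000 (SGD)
print(iterations_per_epoch(100_500, 1_000))    # 101    (last batch has 500 samples)
```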

---

### Advantages

1. **Reduces noise** compared to SGD:

   * Updates are smoother because each batch averages the gradient over multiple samples.
2. **Faster convergence than SGD:**

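   * Gradient updates are more stable, and each batch can be processed as a single vectorized operation (e.g., on a GPU), so it typically reaches a good solution in less wall-clock time.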
3. **Efficient resource usage:**

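   * Only one batch's activations and gradients need to fit in memory (e.g., on the GPU) at a time, whereas full-batch GD processes the entire dataset for every update.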
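To see why averaging over a batch reduces noise, the sketch below draws many random batches from a fixed set of per-sample gradients and measures how much the batch-mean gradient varies. The per-sample gradients are synthetic (a hypothetical Gaussian sample for a single weight), and `batch_gradient_variance` is an illustrative helper. The variance shrinks roughly in proportion to 1 / batch size, but it only reaches zero when the batch is the whole dataset, i.e. full-batch GD.

```python
import random
import statistics


def batch_gradient_variance(per_sample_grads, batch_size, trials=1000, seed=0):
    """Empirical variance of the mini-batch mean gradient across random batches."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        batch = rng.sample(per_sample_grads, batch_size)  # sample without replacement
        estimates.append(sum(batch) / batch_size)
    return statistics.pvariance(estimates)


# Hypothetical per-sample gradients for one weight (synthetic, not real training data)
data_rng = random.Random(42)
grads = [data_rng.gauss(1.0, 2.0) for _ in range(500)]

for b in (1, 10, 100, len(grads)):  # SGD ... full batch
    print(b, batch_gradient_variance(grads, b))
```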

---

### Disadvantages

1. **Noise still exists:**

   * Updates are not as smooth as in full-batch GD, because each batch gradient is only an estimate of the full-dataset gradient.
2. **May still require time to converge:**

   * Especially for very large datasets or small batch sizes.

---

**Summary of Optimizer Evolution**

| Optimizer      | Data per update | Convergence | Noise   | Resource usage |
| -------------- | --------------- | ----------- | ------- | -------------- |
| GD             | All data        | Smooth      | Low     | High           |
| SGD            | 1 sample        | Slow        | High    | Low            |
| Mini-batch SGD | Small batch     | Moderate    | Reduced | Moderate       |

* **Mini-batch SGD** balances **speed, stability, and memory efficiency**.
* Next step: **SGD with momentum** → further reduces noise and smooths convergence.

