```{contents}
```

## Regularization

### ✅ Why Regularization Is Needed

Neural networks often:

* Have **many parameters**
* Can **memorize** training data
* Perform well on the training set but **fail on real-world data**

Regularization reduces this overfitting by **restricting or modifying the learning process**.

---

### ✅ Main Regularization Techniques

#### ✅ 1. L1 and L2 Regularization (Weight Penalties)

Add a penalty term to the loss function that discourages large weights.

##### ✔ L1 Regularization (Lasso)

* Adds the sum of the absolute values of the weights as a penalty
* Encourages **sparse models** (some weights are driven to exactly zero)

$$
\text{Loss} = \text{Loss}_{\text{original}} + \lambda \sum_i |w_i|
$$

##### ✔ L2 Regularization (Ridge / Weight Decay)

* Adds the sum of the squared weights as a penalty
* Shrinks all weights toward zero, but rarely makes any of them exactly zero

$$
\text{Loss} = \text{Loss}_{\text{original}} + \lambda \sum_i w_i^2
$$

🔹 λ (lambda) controls the strength of regularization: a larger λ penalizes large weights more heavily.
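
The two penalties, and the reason L2 is also called *weight decay*, can be sketched in plain Python. This is a minimal illustration, assuming the model's weights are available as a flat list of floats; the function names (`l1_penalty`, `l2_penalty`, `regularized_loss`, `sgd_step_with_l2`) and the numbers are hypothetical, not part of any library.

```python
def l1_penalty(weights, lam):
    """L1 term: lam * sum(|w_i|)."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term: lam * sum(w_i ** 2)."""
    return lam * sum(w * w for w in weights)

def regularized_loss(original_loss, weights, lam, kind="l2"):
    """Loss = Loss_original + penalty, with kind in {"l1", "l2"}."""
    if kind == "l1":
        return original_loss + l1_penalty(weights, lam)
    if kind == "l2":
        return original_loss + l2_penalty(weights, lam)
    raise ValueError(f"unknown kind: {kind!r}")

def sgd_step_with_l2(weights, grads, lr, lam):
    """One SGD step on Loss_original + lam * sum(w ** 2).

    `grads` are the gradients of Loss_original only. The penalty adds
    2 * lam * w to each gradient, so every weight is also pulled toward
    zero on each step ("weight decay").
    """
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]

weights = [0.5, -1.0, 2.0]   # illustrative values
data_loss = 0.3              # stand-in for Loss_original
print(regularized_loss(data_loss, weights, lam=0.01, kind="l1"))  # 0.3 + 0.01 * 3.5
print(regularized_loss(data_loss, weights, lam=0.01, kind="l2"))  # 0.3 + 0.01 * 5.25
```

Because the gradient of λw² is 2λw, each SGD step with learning rate η multiplies every weight by `(1 - 2ηλ)` before applying the data gradient; that steady shrinkage is where the name *weight decay* comes from.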

---

#### 2. Dropout

Randomly “drops” (sets to zero) a fraction of neurons at each training step (e.g., 20–50%); at inference time, all neurons are kept.

✔ Forces the network to not depend on specific neurons

✔ Reduces co-adaptation

✔ Acts like training an ensemble of many smaller sub-networks that share weights
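
A minimal sketch of dropout applied to one layer's activations, assuming they are a plain list of floats. It uses the common *inverted dropout* formulation (an assumption: survivors are scaled by 1 / (1 - p) during training, so nothing needs rescaling at inference); the function name, the `training` flag, and the `rng` parameter are illustrative.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training, each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected value of every unit
    is unchanged. At inference time the input passes through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep_prob = 1.0 - p
    # rng.random() is uniform in [0, 1), so each unit survives with probability 1 - p
    return [a / keep_prob if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)                    # seeded so the mask is reproducible
x = [1.0] * 10
print(dropout(x, p=0.5, rng=rng))         # each entry is 0.0 or 2.0
print(dropout(x, p=0.5, training=False))  # unchanged at inference
```

A fresh random mask is drawn on every call, so each training step effectively trains a different thinned sub-network.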

---

#### 3. Early Stopping

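Stop training when the validation loss stops improving, typically after it has failed to improve for a fixed number of epochs (the *patience*), and keep the weights from the best epoch.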

✔ Prevents the model from memorizing training data

✔ Saves time and resources
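
A sketch of the stopping rule as a small helper that a training loop calls once per epoch. The `EarlyStopping` class and its `patience` and `min_delta` parameters are hypothetical names chosen for illustration, and the validation losses in the usage lines are made-up numbers.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without a validation-loss improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0       # a real loop would checkpoint the weights here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
val_losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.66]  # made-up values
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"stop at epoch {epoch}; best epoch {stopper.best_epoch}")  # prints: stop at epoch 5; best epoch 2
        break
```

After stopping, the weights saved at `best_epoch` are the ones to keep, since the later epochs were already drifting toward overfitting.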

---

#### 4. Data Augmentation

Used mainly in computer vision and NLP to make the training data more diverse by applying transformations that keep the label unchanged.

Examples:

* Rotate, flip, crop, add noise to images
* Replace words with synonyms, mask tokens in text

✔ Helps the model learn **robust features** instead of memorizing specific samples.
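
A few of the transformations listed above, sketched in plain Python on a tiny grayscale image (a list of pixel rows) and a tokenized sentence. The function names, the noise level, the flip probability, and the `[MASK]` placeholder token are assumptions for illustration; real pipelines usually apply such transforms on the fly so that each epoch sees different variants.

```python
import random

def hflip(image):
    """Mirror a 2-D image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def add_gaussian_noise(image, std, rng):
    """Add zero-mean Gaussian noise with standard deviation `std` to every pixel."""
    return [[px + rng.gauss(0.0, std) for px in row] for row in image]

def mask_tokens(tokens, mask_prob, rng, mask_token="[MASK]"):
    """Replace each token with `mask_token` with probability `mask_prob`."""
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

def augment_image(image, rng, flip_prob=0.5, noise_std=0.05):
    """Randomly flip, then add noise; every call produces a new variant."""
    if rng.random() < flip_prob:
        image = hflip(image)
    return add_gaussian_noise(image, noise_std, rng)

rng = random.Random(0)
image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
print(hflip(image))  # [[1.0, 0.5, 0.0], [0.6, 0.4, 0.2]]
print(augment_image(image, rng))
print(mask_tokens("the cat sat on the mat".split(), 0.3, rng))
```

Only transforms that keep the label valid should be used: a horizontal flip is harmless for a photo of a cat, but it can change the meaning of some characters, such as "b" and "d".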

---

#### 5. Batch Normalization (Indirect Regularizer)

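Normalizes each layer's activations using the mean and variance of the current mini-batch, then applies a learnable scale and shift.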

✔ Stabilizes training

✔ Adds slight noise, because the statistics vary from one mini-batch to the next → can reduce overfitting
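
A sketch of the training-mode forward pass, assuming a mini-batch stored as a list of feature vectors. It normalizes each feature with the batch mean and (biased) variance, then applies the learnable scale `gamma` and shift `beta`; the function name is hypothetical, and the running statistics a real layer keeps for inference are omitted for brevity.

```python
import math

def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization.

    batch: list of samples, each a list of num_features floats.
    gamma, beta: per-feature scale and shift (learnable in a real layer).
    Returns (outputs, means, variances).
    """
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature statistics computed over the current mini-batch only
    means = [sum(x[j] for x in batch) / n for j in range(num_features)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n
                 for j in range(num_features)]
    # Normalize to zero mean / unit variance, then scale and shift
    outputs = [[gamma[j] * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
                for j in range(num_features)]
               for x in batch]
    return outputs, means, variances

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out, means, variances = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(means)  # [2.0, 20.0]

# The same input value 1.0 is normalized differently depending on the other
# samples in its mini-batch: this batch-to-batch variation is the "noise".
a, _, _ = batch_norm_train([[1.0], [2.0], [3.0]], gamma=[1.0], beta=[0.0])
b, _, _ = batch_norm_train([[1.0], [4.0], [4.0]], gamma=[1.0], beta=[0.0])
print(a[0][0], b[0][0])
```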

---

#### 6. Max Norm Constraints

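Limits how large weights can grow: the norm of each neuron's incoming weight vector is capped at a constant *c*, and any vector that exceeds it is rescaled back to norm *c*.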

✔ Prevents exploding weights

✔ Works well with dropout
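
A sketch of the constraint as a projection step run after each weight update, assuming each row of `weight_rows` holds one neuron's incoming weights. The function name and the default `c = 3.0` are illustrative choices, not prescribed values.

```python
import math

def apply_max_norm(weight_rows, c=3.0):
    """Project each neuron's incoming weight vector onto the ball ||w||_2 <= c."""
    constrained = []
    for row in weight_rows:
        norm = math.sqrt(sum(w * w for w in row))
        if norm > c:
            # Rescale while keeping the direction, so the norm becomes exactly c
            row = [w * (c / norm) for w in row]
        constrained.append(list(row))
    return constrained

# Typical use: call after every optimizer step
weights = [[3.0, 4.0],   # norm 5.0 -> rescaled to norm 1.0
           [0.3, 0.4]]   # norm 0.5 -> left unchanged
print(apply_max_norm(weights, c=1.0))
```

Unlike an L2 penalty, which pulls on every weight at every step, this constraint does nothing until a vector actually crosses the limit.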

---

#### 7. Label Smoothing

Instead of training on hard labels such as Cat = 1, Dog = 0, use softer targets such as Cat = 0.9, Dog = 0.1.

✔ Avoids overconfident predictions

✔ Helps generalization
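
The smoothed target for K classes is commonly written as (1 - ε)·y + ε/K; with K = 2 and ε = 0.2, this turns Cat = 1, Dog = 0 into 0.9 and 0.1. The sketch below applies that formula and compares cross-entropy losses; the function names and the predicted probabilities are made-up values chosen to show the effect.

```python
import math

def smooth_labels(one_hot, epsilon):
    """Label smoothing: (1 - epsilon) * y + epsilon / K for each of the K classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

def cross_entropy(target, probs):
    """-sum(target_i * log(probs_i)); zero-weight classes are skipped to avoid log(0)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

hard = [1.0, 0.0]                        # [cat, dog]
soft = smooth_labels(hard, epsilon=0.2)  # approximately [0.9, 0.1]

overconfident = [0.999, 0.001]           # made-up model outputs
calibrated = [0.9, 0.1]

for name, target in [("hard", hard), ("smoothed", soft)]:
    print(name,
          round(cross_entropy(target, overconfident), 4),
          round(cross_entropy(target, calibrated), 4))
```

With hard targets, the overconfident prediction has the lower loss; with smoothed targets, the calibrated one does, so training no longer rewards pushing probabilities all the way to 0 or 1.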

---

**Summary Table**

| Technique         | Purpose                 | Key Benefit            |
| ----------------- | ----------------------- | ---------------------- |
| L1/L2 Penalties   | Limit weight size       | Simpler models         |
| Dropout           | Randomly drop neurons   | Prevents co-adaptation |
| Early Stopping    | Stop before overfitting | Better generalization  |
| Data Augmentation | Add variation           | More robust learning   |
| Batch Norm        | Normalize activations   | Stable, regularized    |
| Max Norm          | Cap weight norms        | Avoid large weights    |
| Label Smoothing   | Soften labels           | Avoid overconfidence   |

