```{contents}
```


## Dropout Layer

### **Problem: Overfitting**

* Occurs when a model performs well on **training data** but poorly on **test data**.

  * Example: Training accuracy = 90%, Test accuracy = 60%.
* Overfitting happens when the network memorizes idiosyncrasies of the training data (including noise) instead of learning patterns that generalize to unseen data.

---

### **Analogy with Random Forest**

* Random forest reduces overfitting by:

  1. **Feature sampling** → each split considers only a random subset of features.
  2. **Row sampling** → each tree is trained on a bootstrap sample of the data points.
* Similarly, in neural networks, **dropout** randomly “removes” neurons during training.

---

### **Dropout in Neural Networks**

1. During **training**:

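   * Each neuron is **randomly deactivated** (its output is set to 0) with probability (p).
   * Example: If (p = 0.5) and a layer has 4 neurons → on average 2 neurons are turned off. The exact number varies from pass to pass because each neuron is dropped independently.
   * A new random mask is drawn **on every forward pass** (every mini-batch, and typically every example within it), not once per epoch.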
   * Purpose: prevents co-adaptation of neurons → forces network to learn **redundant, robust representations**.

2. During **backpropagation**:

   * Deactivated neurons receive **zero gradient**, so the weights connected to them get no update from that pass.
   * Active neurons update as usual.
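
The training-time behaviour described above can be sketched in plain Python without any framework. This is a minimal illustration, not a reference implementation: the function names `dropout_forward` and `dropout_backward` are hypothetical, and the sketch applies no rescaling, so it only shows how the mask is sampled and reused.

```python
import random

def dropout_forward(activations, p, rng):
    """Zero each activation independently with probability p (training only)."""
    # mask[i] is 1 if unit i is kept and 0 if it is dropped
    mask = [0 if rng.random() < p else 1 for _ in activations]
    dropped = [a * m for a, m in zip(activations, mask)]
    return dropped, mask

def dropout_backward(upstream_grads, mask):
    """Pass gradients only through units that were kept in the forward pass."""
    return [g * m for g, m in zip(upstream_grads, mask)]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [0.5, 1.2, 0.3, 2.0]

# A fresh mask is drawn on every forward pass, so the number of
# dropped units can differ from one step to the next.
for step in range(3):
    out, mask = dropout_forward(activations, p=0.5, rng=rng)
    grads = dropout_backward([1.0, 1.0, 1.0, 1.0], mask)
    print(step, mask, out, grads)
```

The backward function reuses the mask from the forward pass, which is why a dropped unit contributes nothing to the weight update for that step.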

---

### **During Testing / Inference**

* Dropout is **not applied**.
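* To account for the neurons that were missing during training, the weights are **scaled by the keep probability** (1 - p), not by the dropout probability (p).

  * Example: If (p = 0.5), the trained weights are multiplied by 0.5. If (p = 0.2), they are multiplied by 0.8.
* This keeps each neuron's expected input at inference equal to what it received during training.
* Many current frameworks (for example PyTorch's `nn.Dropout`) use **inverted dropout** instead: kept activations are divided by (1 - p) during training, so no scaling is needed at inference.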
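
As a rough sketch of the inference-time adjustment, the snippet below scales a list of weights for the classic scheme and computes the training-time factor for the inverted scheme. The function names are hypothetical. Note that the scaling factor is the keep probability 1 - p, which coincides with p only when p = 0.5.

```python
def classic_inference_weights(weights, p):
    """Classic dropout: multiply trained weights by the keep probability 1 - p."""
    keep = 1.0 - p
    return [w * keep for w in weights]

def inverted_dropout_scale(p):
    """Inverted dropout: factor applied to kept activations during training."""
    return 1.0 / (1.0 - p)

weights = [1.0, -2.0, 0.5]

# With p = 0.2 each unit is kept 80% of the time, so weights shrink by 0.8.
print(classic_inference_weights(weights, p=0.2))

# The inverted variant scales up during training instead (1 / 0.8 here),
# leaving the weights untouched at inference.
print(inverted_dropout_scale(0.2))
```

Both schemes aim for the same thing: the expected input to the next layer should match between training and inference.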

---

### **Key Points**

* Dropout probability (p) is a **hyperparameter** (typically 0.2–0.5).
* Reduces **overfitting** and improves **generalization**.
* Can be applied to:

  * Input layer → feature sampling.
  * Hidden layers → neuron sampling.
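
To illustrate where dropout can sit in a network, here is a small sketch of a one-hidden-layer forward pass with separate rates for the input and hidden layers. Everything in it is an assumption made for illustration: the helper names, the parameter values, the rates `p_input=0.2` and `p_hidden=0.5`, and the choice of the inverted variant (survivors divided by 1 - p during training, so inference needs no adjustment).

```python
import random

def dropout(values, p, rng, training):
    """Inverted dropout: drop with probability p, rescale survivors in training."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer followed by ReLU; weights[j] feeds output unit j."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params, rng, training, p_input=0.2, p_hidden=0.5):
    x = dropout(x, p_input, rng, training)    # input layer: feature sampling
    h = dense(x, params["w1"], params["b1"])
    h = dropout(h, p_hidden, rng, training)   # hidden layer: neuron sampling
    return sum(w * v for w, v in zip(params["w2"], h))

# Hypothetical parameters for a 3-input, 2-hidden-unit, 1-output network
params = {
    "w1": [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]],
    "b1": [0.0, 0.1],
    "w2": [1.0, -1.0],
}
rng = random.Random(0)
x = [1.0, 2.0, 3.0]
print("train:", forward(x, params, rng, training=True))
print("eval: ", forward(x, params, rng, training=False))
```

With `training=False` the output is deterministic, while repeated calls with `training=True` generally differ because a new mask is drawn each time.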

---

### **Summary**

| Phase             | Dropout Applied? | Weight Adjustment                             |
| ----------------- | ---------------- | --------------------------------------------- |
| Training          | Yes              | None; neurons are dropped with probability p  |
| Testing/Inference | No               | Multiply weights by the keep probability 1 - p |

---

Dropout works like “ensemble learning within a single network”: each forward pass trains a slightly different sub-network that shares weights with all the others, which prevents over-reliance on any single neuron.
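
The ensemble view can be made concrete for a single linear unit by enumerating every possible dropout mask. The sketch below is illustrative only: the weights, inputs, and the helper name `thinned_outputs` are made up. It compares the probability-weighted average over all thinned networks with one network whose weights are scaled by 1 - p. For a linear unit the two agree exactly; with nonlinear activations the scaled network is only an approximation of the ensemble average.

```python
from itertools import product

def thinned_outputs(weights, inputs, p):
    """Yield (probability, output) for every possible dropout mask on the inputs."""
    keep = 1.0 - p
    for mask in product([0, 1], repeat=len(inputs)):
        prob = 1.0
        for m in mask:
            prob *= keep if m else p  # kept with prob 1 - p, dropped with prob p
        out = sum(w * x * m for w, x, m in zip(weights, inputs, mask))
        yield prob, out

weights = [0.4, -1.0, 0.6, 2.0]
inputs = [1.0, 0.5, 2.0, 1.5]
p = 0.5

networks = list(thinned_outputs(weights, inputs, p))
ensemble_mean = sum(prob * out for prob, out in networks)
scaled_single = sum(w * (1.0 - p) * x for w, x in zip(weights, inputs))

print(len(networks))  # 2**4 = 16 thinned networks for 4 inputs
print(ensemble_mean, scaled_single)
```

The number of thinned networks doubles with every additional unit, which is why the average is approximated by a single scaled network at inference rather than computed explicitly.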

