```{contents}
```


## Weight Initialization Techniques

### **Why Weight Initialization Matters**

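1. **Small weights** → keep activations and gradients from growing layer after layer (exploding gradients). Weights that are *too* small cause the opposite problem (vanishing gradients).
2. **Different weights** → break symmetry: if all weights start equal, every neuron in a layer computes the same output and receives the same gradient, so the neurons never learn different features.
3. **Well-scaled variance** → keeps the variance of activations (forward pass) and of gradients (backward pass) roughly constant from layer to layer, so every layer learns at a useful rate.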
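The symmetry problem in point 2 can be seen in a few lines of NumPy. This is a minimal sketch under illustrative assumptions: a hypothetical layer with 3 inputs and 4 tanh units, no biases, a toy loss `L = sum(v * h)`, and output weights `v` that are also constant (as they would be under constant initialization).

```python
import numpy as np

# Hypothetical toy setup: 3 inputs -> 4 tanh hidden units -> loss L = sum(v * h).
x = np.array([0.5, -1.0, 2.0])

def hidden(W):
    """tanh activations of one dense layer without bias; W has shape (4, 3)."""
    return np.tanh(W @ x)

def grad_W(W, v):
    """Gradient of L = sum(v * tanh(W @ x)) with respect to W."""
    h = hidden(W)
    # dL/dW[i, j] = v[i] * (1 - h[i]**2) * x[j]
    return (v * (1.0 - h**2))[:, None] * x[None, :]

v = np.full(4, 0.1)                 # identical output weights

# Constant initialization: every row of W is the same.
W_const = np.full((4, 3), 0.1)
h_const = hidden(W_const)
g_const = grad_W(W_const, v)
W_next = W_const - 0.5 * g_const    # one gradient-descent step

# Small random initialization: rows differ.
rng = np.random.default_rng(0)
W_rand = rng.uniform(-0.5, 0.5, size=(4, 3))
h_rand = hidden(W_rand)

print(h_const)   # four identical values
print(g_const)   # four identical gradient rows
print(W_next)    # rows are still identical after the update
print(h_rand)    # generally four different values
```

Because every row receives the same gradient, the update preserves the symmetry indefinitely; random initialization is what lets the units diverge.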

---

### **1. Uniform Distribution Initialization**

* Weights are drawn from a **uniform distribution** whose range shrinks as the number of inputs to the neuron (*input*, also called fan-in) grows:
  $$
  W_{ij} \sim U\left(-\frac{1}{\sqrt{\text{input}}}, \frac{1}{\sqrt{\text{input}}}\right)
  $$
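* Example: with 3 input neurons, the range is $(-1/\sqrt{3}, 1/\sqrt{3}) \approx (-0.577, 0.577)$.
* Keeps weights small (the range narrows as the fan-in grows) and diverse (each weight is sampled independently).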
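A minimal NumPy sketch of this scheme. The function name `uniform_init` and the `(fan_out, fan_in)` weight-matrix shape are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def uniform_init(fan_in, fan_out, rng=None):
    """Sample a (fan_out, fan_in) matrix from U(-1/sqrt(fan_in), 1/sqrt(fan_in))."""
    rng = np.random.default_rng() if rng is None else rng
    limit = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(42)
W = uniform_init(fan_in=3, fan_out=4, rng=rng)
print(W.shape)          # (4, 3)
print(1 / np.sqrt(3))   # the bound for 3 inputs, about 0.577

# The variance of U(-a, a) is a**2 / 3, so here it is 1 / (3 * fan_in).
W_big = uniform_init(fan_in=100, fan_out=1000, rng=rng)
print(W_big.var(), 1 / 300)
```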

---

### **2. Xavier / Glorot Initialization**

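* Designed for **sigmoid/tanh activations**.
* Chooses the weight variance as a compromise between keeping the activation variance constant in the forward pass and the gradient variance constant in the backward pass, which helps avoid vanishing/exploding gradients. Here *output* is the number of outputs of the layer (fan-out).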
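A sketch of where $2/(\text{input}+\text{output})$ comes from, assuming zero-mean, independent weights and inputs and an activation that is roughly linear near zero (as tanh is). For a layer $y_j = \sum_{i=1}^{\text{input}} W_{ji} x_i$:

$$
\begin{aligned}
\operatorname{Var}(y_j) &= \text{input}\cdot\operatorname{Var}(W)\cdot\operatorname{Var}(x) &&\Rightarrow\ \text{forward pass wants } \operatorname{Var}(W) = \frac{1}{\text{input}},\\
\operatorname{Var}\!\left(\frac{\partial L}{\partial x_i}\right) &= \text{output}\cdot\operatorname{Var}(W)\cdot\operatorname{Var}\!\left(\frac{\partial L}{\partial y}\right) &&\Rightarrow\ \text{backward pass wants } \operatorname{Var}(W) = \frac{1}{\text{output}},\\
\operatorname{Var}(W) &= \frac{2}{\text{input} + \text{output}} &&\text{(harmonic mean of the two targets).}
\end{aligned}
$$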

**Two types:**

1. **Xavier Normal**
   $$
   W_{ij} \sim \mathcal{N}\left(0, \sigma^2\right), \quad \sigma^2 = \frac{2}{\text{input} + \text{output}}
   $$

2. **Xavier Uniform**
   $$
   W_{ij} \sim U\left(-\sqrt{\frac{6}{\text{input} + \text{output}}}, \sqrt{\frac{6}{\text{input} + \text{output}}}\right)
   $$

* Keeps variance of activations consistent across layers.
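A NumPy sketch of both variants. The function names, the `(fan_out, fan_in)` shape convention and the layer sizes are illustrative assumptions. Since $U(-a, a)$ has variance $a^2/3$, the uniform bound is $a = \sqrt{3}\,\sigma$, which is where the 6 in the Xavier uniform formula comes from; both variants therefore share the same variance.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal: N(0, sigma^2) with sigma^2 = 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128
target = 2.0 / (fan_in + fan_out)   # 2 / 384, about 0.0052
Wn = xavier_normal(fan_in, fan_out, rng)
Wu = xavier_uniform(fan_in, fan_out, rng)

# Both empirical variances should land close to the same target.
print(target, Wn.var(), Wu.var())
```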

---

### **3. He / Kaiming Initialization**

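* Designed for **ReLU and its variants**.
* Accounts for the fact that ReLU outputs zero for roughly half of its (zero-mean) inputs, which halves the mean square of the signal at each layer. To compensate, He initialization doubles the weight variance: $2/\text{input}$ instead of $1/\text{input}$.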
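A sketch of the factor of 2, assuming the pre-activation $z$ is zero-mean and symmetric, and the zero-mean weights are independent of the layer's input. Because $\mathrm{ReLU}(z) = 0$ for $z < 0$, only the positive half contributes to the second moment:

$$
\begin{aligned}
\mathbb{E}\!\left[\mathrm{ReLU}(z)^2\right] &= \tfrac{1}{2}\,\mathbb{E}[z^2] = \tfrac{1}{2}\operatorname{Var}(z),\\
\operatorname{Var}\big(z^{(l)}\big) &= \text{input}\cdot\operatorname{Var}(W)\cdot\mathbb{E}\!\left[\mathrm{ReLU}\big(z^{(l-1)}\big)^2\right] = \tfrac{1}{2}\,\text{input}\cdot\operatorname{Var}(W)\cdot\operatorname{Var}\big(z^{(l-1)}\big),\\
\operatorname{Var}\big(z^{(l)}\big) = \operatorname{Var}\big(z^{(l-1)}\big) &\iff \operatorname{Var}(W) = \frac{2}{\text{input}}.
\end{aligned}
$$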

**Two types:**

1. **He Normal**
   $$
   W_{ij} \sim \mathcal{N}\left(0, \sigma^2\right), \quad \sigma^2 = \frac{2}{\text{input}}
   $$

2. **He Uniform**
   $$
   W_{ij} \sim U\left(-\sqrt{\frac{6}{\text{input}}}, \sqrt{\frac{6}{\text{input}}}\right)
   $$

* Helps prevent vanishing/exploding activations and gradients in deep ReLU networks.
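A NumPy sketch of both He variants, mirroring the formulas above. The function names, shape convention and layer sizes are illustrative assumptions. As with Xavier, the uniform bound is $\sqrt{3}$ times the normal standard deviation, so both variants have variance $2/\text{input}$.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """He/Kaiming normal: N(0, sigma^2) with sigma^2 = 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

def he_uniform(fan_in, fan_out, rng):
    """He/Kaiming uniform: U(-a, a) with a = sqrt(6 / fan_in), i.e. variance 2 / fan_in."""
    a = np.sqrt(6.0 / fan_in)
    return rng.uniform(-a, a, size=(fan_out, fan_in))

rng = np.random.default_rng(1)
fan_in, fan_out = 512, 256
target = 2.0 / fan_in               # depends only on the fan-in
Wn = he_normal(fan_in, fan_out, rng)
Wu = he_uniform(fan_in, fan_out, rng)
print(target, Wn.var(), Wu.var())
```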

---

### **Summary**

| Technique       | Activation       | Distribution   | Range / Variance                                                                                              |
| --------------- | ---------------- | -------------- | ------------------------------------------------------------------------------------------------------------- |
| Uniform         | Any              | Uniform        | $(-1/\sqrt{\text{input}},\ 1/\sqrt{\text{input}})$                                                            |
| Xavier / Glorot | Sigmoid/Tanh     | Normal/Uniform | Normal: $\sigma^2 = 2/(\text{input}+\text{output})$<br>Uniform: $\pm\sqrt{6/(\text{input}+\text{output})}$ |
| He / Kaiming    | ReLU and variants | Normal/Uniform | Normal: $\sigma^2 = 2/\text{input}$<br>Uniform: $\pm\sqrt{6/\text{input}}$                                  |

* **Rule of thumb:** Use Xavier for sigmoid/tanh, He for ReLU, and uniform if unsure.
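* Modern frameworks provide these as built-in initializers (e.g. `torch.nn.init.xavier_uniform_` and `torch.nn.init.kaiming_normal_` in PyTorch, `glorot_uniform` and `he_normal` in Keras/TensorFlow), and layers are created with a default scheme already applied.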
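The rule of thumb can be checked with a small experiment: push random inputs through a deep stack of ReLU layers and compare how the activation scale evolves under Xavier and He. This is a sketch under illustrative assumptions (square layers of width 512, depth 30, no biases, normal-distributed weights, a fixed seed), not a benchmark.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def xavier_std(fan_in, fan_out):
    return np.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in, fan_out):
    return np.sqrt(2.0 / fan_in)

def final_rms(std_fn, depth=30, width=512, batch=200, seed=0):
    """Push random inputs through `depth` ReLU layers (no biases) and return
    the root-mean-square of the last layer's activations."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(batch, width))
    for _ in range(depth):
        W = rng.normal(0.0, std_fn(width, width), size=(width, width))
        h = relu(h @ W.T)
    return float(np.sqrt(np.mean(h**2)))

rms_xavier = final_rms(xavier_std)
rms_he = final_rms(he_std)
print(f"Xavier: {rms_xavier:.2e}")
print(f"He:     {rms_he:.2e}")
```

With square layers, Xavier gives $\operatorname{Var}(W) = 1/\text{input}$, so ReLU's halving of the mean square is never compensated and the activations shrink by roughly a factor of $\sqrt{2}$ per layer. He's doubled variance cancels the halving, so its activations should stay on the order of 1. Exact values depend on the seed.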

