# **Why Weight Initialization Matters**


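*   Poor initialization (weights too small or too large) → activations and gradients shrink or blow up from layer to layer (vanishing/exploding gradients), so training is slow or diverges.
*   Good initialization → signal variance stays roughly constant across layers, which gives faster convergence and often better final accuracy.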
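
The effect of weight scale can be checked with a few lines of NumPy. The sketch below is only an illustration under assumed settings (a 10-layer tanh network of width 256, Gaussian weights, and the helper name `activation_std_per_layer` are all choices made for this example). It pushes random inputs through the network and reports how the spread of the activations changes with depth.

```python
import numpy as np

def activation_std_per_layer(weight_std, n_layers=10, width=256, seed=0):
    """Push random inputs through a tanh MLP and record each layer's activation std."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((512, width))  # batch of unit-variance inputs
    stds = []
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * weight_std
        h = np.tanh(h @ W)
        stds.append(float(h.std()))
    return stds

# Three weight scales: hand-picked small, hand-picked large, and 1/sqrt(n_in).
for name, std in [("too small", 0.01), ("too large", 1.0), ("1/sqrt(n_in)", 1 / np.sqrt(256))]:
    stds = activation_std_per_layer(std)
    print(f"{name:>12}: first layer std = {stds[0]:.3g}, last layer std = {stds[-1]:.3g}")
```

With the small scale the activations collapse toward zero; with the large scale tanh saturates near ±1, where its derivative is close to zero. Both cases starve the early layers of gradient.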

# **What Happens with Zero Initialization?**



*   If all weights = 0 → every neuron in a layer computes the same output and receives the same gradient, so the neurons stay identical throughout training (no symmetry breaking).

*   Demonstrate this with a simple MLP and training logs.
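
A minimal sketch of such a demonstration follows. It assumes a 2-3-1 MLP with sigmoid activations, a toy dataset (the logical OR of two bits), and plain gradient descent; the function name `train_zero_init` and all hyperparameters are illustrative choices. After training, every hidden neuron ends up with exactly the same weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_zero_init(steps=100, lr=1.0, hidden=3):
    """Gradient descent on a tiny MLP whose weights and biases all start at zero."""
    # Toy data (an assumption for illustration): logical OR of two bits.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [1.]])

    W1, b1 = np.zeros((2, hidden)), np.zeros(hidden)
    W2, b2 = np.zeros((hidden, 1)), np.zeros(1)

    for _ in range(steps):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backward pass for mean binary cross-entropy
        d_out = (p - y) / len(X)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        gW2, gb2 = h.T @ d_out, d_out.sum(axis=0)
        gW1, gb1 = X.T @ d_hid, d_hid.sum(axis=0)
        # Parameter updates
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * gW1
        b1 -= lr * gb1
    return W1, W2

W1, W2 = train_zero_init()
print("W1 (one column per hidden neuron):\n", W1)
print("W2 (one row per hidden neuron):\n", W2)
```

The weights do move away from zero, but all columns of `W1` (and all rows of `W2`) remain equal, so the hidden layer behaves like a single neuron regardless of its width.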

# **Types of Initialization Techniques**

| Initialization        | Weight distribution (second argument of $\mathcal{N}$ is the variance)                                                       | Notes                                             |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- |
| **Random Uniform**    | $\mathcal{U}(-a, a)$                                                                                                         | Basic; $a$ is hand-picked, not optimal            |
| **Normal (Gaussian)** | $\mathcal{N}(0, \sigma^2)$                                                                                                   | Used for experimentation; $\sigma$ is hand-picked |
| **Xavier (Glorot)**   | $\mathcal{N}(0, \frac{2}{n_{in} + n_{out}})$ or $\mathcal{U}(-\sqrt{6 / (n_{in} + n_{out})}, \sqrt{6 / (n_{in} + n_{out})})$ | Designed for tanh and sigmoid                     |
| **He Initialization** | $\mathcal{N}(0, \frac{2}{n_{in}})$                                                                                           | Designed for ReLU and Leaky ReLU                  |
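
The variances in the table can be motivated by a short, standard argument. As a sketch, assume a linear layer $y_j = \sum_{i=1}^{n_{in}} w_{ij} x_i$ whose weights and inputs are independent and zero-mean (these are assumptions of the sketch, not statements from the table). Then:

$$
\mathrm{Var}(y_j) = n_{in} \, \mathrm{Var}(w) \, \mathrm{Var}(x)
$$

Keeping the forward signal at constant variance therefore needs $\mathrm{Var}(w) = 1 / n_{in}$, and the same argument applied to the backward pass gives $\mathrm{Var}(w) = 1 / n_{out}$. Xavier initialization compromises between the two:

$$
\mathrm{Var}(w) = \frac{2}{n_{in} + n_{out}}
$$

Since $\mathcal{U}(-a, a)$ has variance $a^2 / 3$, matching this variance gives the uniform bound $a = \sqrt{6 / (n_{in} + n_{out})}$. For ReLU, roughly half of the pre-activations are zeroed, which halves the second moment of the signal; He initialization compensates with the extra factor of 2 in $\mathrm{Var}(w) = 2 / n_{in}$.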


**Python Code Comparison**

import numpy as np

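# Xavier (Glorot) initialization, normal variant.
# size is the weight shape (n_in, n_out); std = sqrt(2 / (n_in + n_out)).
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = np.sqrt(2.0 / (in_dim + size[1]))
    return np.random.randn(*size) * xavier_stddev

# He initialization, intended for ReLU-family activations.
# Only the fan-in matters: std = sqrt(2 / n_in).
def he_init(size):
    in_dim = size[0]
    return np.random.randn(*size) * np.sqrt(2.0 / in_dim)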
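
The table also lists a uniform variant of Xavier initialization, which the functions above do not cover. One possible sketch is shown below; the name `xavier_uniform_init` and the optional `rng` parameter are hypothetical, introduced only for this example. It samples from $\mathcal{U}(-a, a)$ with $a = \sqrt{6 / (n_{in} + n_{out})}$ and checks that the empirical standard deviation is close to $\sqrt{2 / (n_{in} + n_{out})}$.

```python
import numpy as np

def xavier_uniform_init(size, rng=None):
    """Sample an (n_in, n_out) weight matrix from U(-a, a), a = sqrt(6 / (n_in + n_out))."""
    rng = np.random.default_rng() if rng is None else rng
    n_in, n_out = size
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=size)

# Usage: the uniform and normal variants should share the same standard deviation.
W = xavier_uniform_init((512, 256), rng=np.random.default_rng(0))
target_std = np.sqrt(2.0 / (512 + 256))
print(f"empirical std = {W.std():.4f}, target std = {target_std:.4f}")
```

Passing a seeded generator makes the draw reproducible, which helps when comparing initialization schemes side by side.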
