# He Initialization

He initialization (also called Kaiming initialization) is a weight initialization scheme designed for neural networks that use the **ReLU** (Rectified Linear Unit) activation function. Its main goal is to keep the variance of the activations roughly constant across layers during forward propagation, so that the signal neither vanishes nor explodes as the network gets deeper.

**He Normal:**

$$std = \sqrt{\frac{2}{n_{input}}}$$

- He initialization samples the weights of a layer from a normal distribution with a mean of $0$ and a standard deviation of $std$:
$$ w \sim \mathcal{N}(0, std^2)$$
- Here $n_{input}$ is the number of neurons in the previous layer, i.e. the number of inputs to each neuron in the current layer (the fan-in).
- This scaling factor accounts for the nature of the ReLU function, which sets negative pre-activations to zero. With zero-mean, symmetric pre-activations, roughly half of the units output zero for any given input.
- The value $2$ in the numerator compensates for this: zeroing half of the pre-activations halves their second moment, so doubling the weight variance keeps the variance of the activations stable from layer to layer.
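
The role of the factor $2$ can be checked numerically. The sketch below is illustrative only: the helper name `forward_second_moment`, the layer sizes, and the fixed seed are assumptions made here, not part of the original notebook. It pushes random inputs through a stack of ReLU layers and compares weights scaled with $\sqrt{2 / n_{input}}$ against weights scaled with $\sqrt{1 / n_{input}}$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption, for repeatability)

def forward_second_moment(numerator, n_layers=10, width=512, batch=256):
    """Return the mean squared activation after a stack of ReLU layers.

    Weights are drawn from N(0, numerator / width), so numerator=2 is He Normal.
    """
    x = rng.standard_normal((batch, width))      # unit-variance inputs
    for _ in range(n_layers):
        std = np.sqrt(numerator / width)         # width is the fan-in here
        w = rng.standard_normal((width, width)) * std
        x = np.maximum(x @ w, 0.0)               # linear layer followed by ReLU
    return float(np.mean(x ** 2))

he = forward_second_moment(2.0)     # He scaling
naive = forward_second_moment(1.0)  # same setup without the factor of 2

print("He scaling:   ", he)
print("Without the 2:", naive)
```

With the He scaling the mean squared activation should stay on the order of $1$, while without the factor of $2$ it should shrink by roughly half per layer.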

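- Prevents vanishing/exploding gradients: by maintaining a stable variance, He initialization helps keep the gradients from shrinking or growing uncontrollably during backpropagation, at least at the start of training.
- There is also a He *Uniform* variant, which samples the weights from a uniform distribution on $[-limit, limit]$ with $limit = \sqrt{6 / n_{input}}$. This gives the same weight variance of $2 / n_{input}$.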
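
A minimal sketch of the He Uniform variant, assuming the commonly used bound $limit = \sqrt{6 / n_{input}}$. The function name `he_uniform_init` and its `rng` parameter are hypothetical and chosen here for illustration. A uniform distribution on $[-limit, limit]$ has variance $limit^2 / 3 = 2 / n_{input}$, which matches He Normal.

```python
import numpy as np

def he_uniform_init(n_in=9, n_out=1, rng=None):
    """Sample an (n_out, n_in) weight matrix from U(-limit, limit)."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6 / n_in)  # chosen so that the variance is 2 / n_in
    return rng.uniform(-limit, limit, size=(n_out, n_in))

# A large matrix makes the empirical standard deviation easy to compare
weights = he_uniform_init(n_in=1000, n_out=1000, rng=np.random.default_rng(0))
print("empirical std:", weights.std())
print("target std:   ", np.sqrt(2 / 1000))
```

The empirical standard deviation should come out close to the He Normal target $\sqrt{2 / n_{input}}$.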


In [1]:
import numpy as np

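def He_init(n_in=9, n_out=1):
    # He Normal: zero-mean Gaussian with std = sqrt(2 / n_in), where n_in is the fan-in
    std_dev = np.sqrt(2 / n_in)
    # randn draws from N(0, 1); multiplying by std_dev rescales to N(0, std_dev^2)
    weights = np.random.randn(n_out, n_in) * std_dev

    return weights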

In [2]:
weights = He_init(n_in=9, n_out=1)
print("He Weights:\n", weights)

He Weights:
 [[ 0.13198133  0.32603549  0.37430198  0.12575892  0.56812518  0.38189032
   0.64551315  0.13331553 -0.74455469]]
