## Initialization

Initializing parameters properly is very important for good convergence of artificial neural networks (ANNs). Several schemes for doing so have been proposed in the literature.
In this notebook we will try out some of these initialization techniques.


### He et al. initialization
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf

In He initialization, weight parameters are drawn from a zero-mean normal distribution with standard deviation $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, a scaling intended for ReLU units.

Let's try He initialization below using Keras.

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
initializer = tf.keras.initializers.HeNormal(seed=42)
values = initializer(shape=(1000, 1000))

In [3]:
# Let's check the standard deviation of the values
print(f"Standard deviation of values={np.std(values):.4} Expected Standard deviation={np.sqrt(2/1000.0):.4}")

Standard deviation of values=0.04475 Expected Standard deviation=0.04472
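
As a cross-check that does not rely on Keras, the same scaling can be sketched directly in NumPy. This is only a minimal illustration: the helper name `he_normal` and its arguments are hypothetical, and it samples from a plain normal distribution, whereas the Keras initializer may differ in details such as truncation.

```python
import numpy as np

def he_normal(fan_in, fan_out, seed=0):
    """Sample a (fan_in, fan_out) weight matrix with std sqrt(2 / fan_in).

    Hypothetical helper for illustration; fan_in is the dimension of the
    previous layer, fan_out the dimension of the current layer.
    """
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

weights = he_normal(1000, 1000)
print(f"Empirical std={weights.std():.4f} Target std={np.sqrt(2 / 1000):.4f}")
```

Note that only the dimension of the previous layer enters the formula; the dimension of the current layer affects the shape of the matrix but not the scale of its entries.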


For initializing parameters from a uniform distribution, He et al. recommend drawing parameters from [-limit, limit], where limit = $\sqrt{\frac{6}{\text{dimension of the previous layer}}}$.

In [4]:
initializer = tf.keras.initializers.HeUniform(seed=42)
values = initializer(shape=(1000, 1000))

In [5]:
# Let's check the min and max
print(f"Min of values={np.min(values):.4},max of values={np.max(values):.4},expected min of values={-np.sqrt(6/1000):.4},expected max of values={np.sqrt(6/1000.0):.4}")

Min of values=-0.07746,max of values=0.07746,expected min of values=-0.07746,expected max of values=0.07746
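
The factor 6 in the uniform limit is not arbitrary: a uniform distribution on [-limit, limit] has standard deviation limit / $\sqrt{3}$, so limit = $\sqrt{6/n}$ gives the same standard deviation $\sqrt{2/n}$ as the normal variant. The following NumPy sketch checks this numerically; `he_uniform` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def he_uniform(fan_in, fan_out, seed=0):
    """Sample weights uniformly from [-limit, limit), limit = sqrt(6 / fan_in).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

fan_in = 1000
limit = np.sqrt(6.0 / fan_in)
weights_u = he_uniform(fan_in, 1000)

# A uniform distribution on [-limit, limit] has std limit / sqrt(3),
# which equals sqrt(2 / fan_in), the std of the normal variant.
print(f"Empirical std={weights_u.std():.4f}")
print(f"limit/sqrt(3)={limit / np.sqrt(3):.4f}")
print(f"sqrt(2/fan_in)={np.sqrt(2 / fan_in):.4f}")
```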


### Xavier Initialization

**Xavier initialization** (also called Glorot initialization) is an older method for initializing parameters.

http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

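Glorot and Bengio recommend initializing weights with standard deviation $\sqrt{\frac{2}{\text{dimension of previous layer} + \text{dimension of current layer}}}$. When both dimensions are equal, as in the cell that follows, this reduces to $\sqrt{\frac{1}{\text{dimension of the previous layer}}}$.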

In [6]:
initializer = tf.keras.initializers.GlorotNormal()
values = initializer(shape=(1000, 1000))

In [7]:
# Let's check the standard deviation of the values
print(f"Standard deviation of values={np.std(values):.4} Expected Standard deviation={np.sqrt(1/1000.0):.4}")

Standard deviation of values=0.03161 Expected Standard deviation=0.03162
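
With a square weight matrix, $\sqrt{1/n}$ and $\sqrt{2/(n_{in} + n_{out})}$ coincide, so the check above cannot tell them apart. A non-square shape separates the two. The NumPy sketch below assumes the $\sqrt{2/(n_{in} + n_{out})}$ form; `glorot_normal` is a hypothetical helper that samples from a plain (untruncated) normal distribution.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, seed=0):
    """Sample weights with std sqrt(2 / (fan_in + fan_out)).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

fan_in, fan_out = 1000, 2000
weights_g = glorot_normal(fan_in, fan_out)

print(f"Empirical std={weights_g.std():.4f}")
print(f"sqrt(2/(fan_in+fan_out))={np.sqrt(2 / (fan_in + fan_out)):.4f}")
print(f"sqrt(1/fan_in)={np.sqrt(1 / fan_in):.4f}")
```

Running the same comparison against `tf.keras.initializers.GlorotNormal` with `shape=(1000, 2000)` would be one way to confirm which formula the library follows.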


For a uniform distribution, **Xavier initialization** draws weights from [-limit, limit], where limit = $\sqrt{\frac{6}{\text{dimension of previous layer} + \text{dimension of current layer}}}$.

In [8]:
initializer = tf.keras.initializers.GlorotUniform()
values = initializer(shape=(1000, 2000))

In [9]:
# Let's check the min and max
print(f"Min of values={np.min(values):.4},max of values={np.max(values):.4},expected min of values={-np.sqrt(6/3000):.4},expected max of values={np.sqrt(6/3000.0):.4}")

Min of values=-0.04472,max of values=0.04472,expected min of values=-0.04472,expected max of values=0.04472
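
To illustrate why the choice of scale matters for ReLU units, the sketch below pushes random inputs through a stack of randomly initialized ReLU layers and reports the root-mean-square activation at the end. Everything here is an assumption made for illustration: the helper `final_rms`, the width, depth, and batch size, and the use of plain NumPy instead of Keras layers. With equal layer widths, the Xavier scale roughly halves the mean squared activation at each ReLU layer, while the He scale roughly preserves it.

```python
import numpy as np

def final_rms(std_fn, width=512, depth=30, batch=256, seed=0):
    """Propagate random inputs through `depth` ReLU layers and return
    the root-mean-square activation of the last layer.

    std_fn(fan_in, fan_out) returns the weight standard deviation.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        w = rng.normal(0.0, std_fn(width, width), size=(width, width))
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return np.sqrt(np.mean(x ** 2))

def he_std(fan_in, fan_out):
    return np.sqrt(2.0 / fan_in)

def glorot_std(fan_in, fan_out):
    return np.sqrt(2.0 / (fan_in + fan_out))

he_rms = final_rms(he_std)
glorot_rms = final_rms(glorot_std)
print(f"RMS activation after 30 ReLU layers: He={he_rms:.3g}, Xavier={glorot_rms:.3g}")
```

In this toy setting the activations under the Xavier scale shrink by several orders of magnitude, which is the kind of vanishing signal that the extra factor of 2 in He initialization is meant to avoid.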


### Everything works as expected