# Weight Initialization(가중치 초기화)
<img src='https://cdn.analyticsvidhya.com/wp-content/uploads/2021/05/291611_dmRbfOye2PcDMl2-bQazVg.jpeg'>


## - In neural network training, the result depends on how the initial weights are set

# Content

## - Random Initialization(랜덤 초기화)

## - Xavier Initialization(사비에르 초기화)

## - He Initialization(He 초기화)


In [21]:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def Relu(x):
    return np.maximum(0, x)

def tanh(x):
    return np.tanh(x)

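def Weight_Initialization(Initialization, activate = 'sigmoid', deviation = 1):
    """Plot the activation histogram of each hidden layer for a given weight initialization."""
    input_data = np.random.randn(1000, 100)  # 1000 samples with 100 features each
    node_num = 100       # number of nodes in each hidden layer
    hidden_layer = 5     # number of hidden layers
    activations = {}     # activation output of each layer, keyed by layer index
    
    x = input_data
    
    for i in range(hidden_layer):
        if i != 0:
            x = activations[i-1]  # the previous layer's output is this layer's input
        
        # Only the standard deviation of the weights differs between the methods.
        if Initialization == 'Random':
            w = np.random.randn(node_num, node_num) * deviation
        elif Initialization == 'Xavier':
            w = np.random.randn(node_num, node_num) * np.sqrt(1/node_num)
        elif Initialization == 'He':
            w = np.random.randn(node_num, node_num) * np.sqrt(2/node_num)
        else:
            raise ValueError(f"Unknown Initialization: {Initialization}")
        
        a = np.dot(x, w)
        
        if activate == 'sigmoid':
            z = sigmoid(a)
        elif activate == 'Relu':
            z = Relu(a)
        elif activate == 'tanh':
            z = tanh(a)
        else:
            raise ValueError(f"Unknown activate: {activate}")
        
        activations[i] = z
    
    # Draw one histogram per layer, side by side.
    for i, a in activations.items():
        plt.subplot(1, len(activations), i+1)
        plt.title(str(i + 1) + '-layer')
        if i != 0:
            plt.yticks([], [])
        if Initialization == 'He':
            plt.ylim(0, 7000)
        # The x-axis starts at 0.1, so the bins closest to 0 are cropped from view.
        plt.xlim(0.1, 1.0)
        # Values outside [0, 1] (possible with tanh or Relu) are not counted.
        plt.hist(a.flatten(), 30, range=(0, 1))
    
    plt.show()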


# Random Initialization(랜덤 초기화)
<img src='https://www.researchgate.net/profile/Aleksandra-Vuckovic/publication/3978633/figure/fig2/AS:394699887136770@1471115194381/Feed-forward-neural-network-with-sigmoid-activation-function-X-i-i-1P-input.png'>

- Best practice is to initialize the weights with random values and the biases with zero.

- The random values break the symmetry between neurons, so that each neuron performs a different computation.

- If the weights stay symmetric (for example, all equal), every neuron in a layer receives the same update, and training is severely hampered or even impossible.

* Disadvantage
    - If deviation == 1, the sigmoid outputs saturate near 0 and 1, where the gradient is almost zero. Because of this vanishing gradient problem, the network barely learns.
    - If deviation == 0.01, the activation values concentrate around 0.5, so the neurons output nearly the same value and the representational power of the network is limited.

reference link : https://www.coursera.org/lecture/machine-learning/random-initialization-drcBh
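
The symmetry argument above can be checked numerically. The following is a minimal sketch, assuming a tiny two-layer network with a squared-error loss; the helper `hidden_weight_gradient` and the layer sizes are hypothetical and are not part of this notebook. With constant weights, every hidden neuron receives exactly the same gradient, so the neurons can never become different from each other. With random weights, the gradients differ.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def hidden_weight_gradient(w1, w2, x, y):
    """Gradient of 0.5 * mean squared error w.r.t. the hidden weights w1
    for a tiny network: x -> sigmoid(x @ w1) -> h @ w2."""
    h = sigmoid(x @ w1)                 # hidden activations, shape (batch, hidden)
    y_hat = h @ w2                      # linear output, shape (batch, 1)
    d_out = (y_hat - y) / len(x)        # derivative of the loss w.r.t. y_hat
    d_h = (d_out @ w2.T) * h * (1 - h)  # backpropagate through the sigmoid
    return x.T @ d_h                    # shape (inputs, hidden)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
y = rng.standard_normal((8, 1))

# Constant initialization: every hidden neuron starts with the same weights.
w1_const = np.full((3, 4), 0.5)
w2_const = np.full((4, 1), 0.5)
grad_const = hidden_weight_gradient(w1_const, w2_const, x, y)

# Random initialization: every hidden neuron starts with different weights.
w1_rand = rng.standard_normal((3, 4))
w2_rand = rng.standard_normal((4, 1))
grad_rand = hidden_weight_gradient(w1_rand, w2_rand, x, y)

# Each column holds the gradient of one hidden neuron.
columns_identical_const = np.allclose(grad_const, grad_const[:, :1])
columns_identical_rand = np.allclose(grad_rand, grad_rand[:, :1])
print("constant init, identical gradients:", columns_identical_const)
print("random init, identical gradients:", columns_identical_rand)
```

Because identical gradients lead to identical updates, the constant-weight network behaves as if it had a single hidden neuron, no matter how many it was given.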

In [11]:
# Random Initialization
Weight_Initialization(Initialization='Random', activate='sigmoid')
# Deviation = 0.01
Weight_Initialization(Initialization='Random', activate='sigmoid', deviation=0.01)
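
The two histograms can also be summarized with numbers. This is a rough sketch that mirrors the setup of `Weight_Initialization` (1000 samples, 100 nodes, 5 sigmoid layers); the helper `last_layer_activations`, the fixed seed, and the 0.05 / 0.95 saturation thresholds are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def last_layer_activations(deviation, layers=5, nodes=100, seed=0):
    """Return the sigmoid activations of the last layer for weights ~ N(0, deviation^2)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((1000, nodes))
    for _ in range(layers):
        w = rng.standard_normal((nodes, nodes)) * deviation
        x = sigmoid(x @ w)
    return x

for deviation in (1, 0.01):
    z = last_layer_activations(deviation)
    # Fraction of activations pushed close to 0 or 1, where the sigmoid gradient vanishes.
    saturated = np.mean((z < 0.05) | (z > 0.95))
    print(f"deviation={deviation}: std={z.std():.3f}, saturated fraction={saturated:.2f}")
```

A large saturated fraction corresponds to the vanishing gradient case, while a very small standard deviation around 0.5 corresponds to the limited-expressiveness case.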

# Xavier Initialization(사비에르 초기화)
<img src='https://blog.kakaocdn.net/dn/bQn1My/btqB1cEniE0/to4hTdl9SzGF6zuv9MsIO1/img.png'>

- It is called Xavier initialization or Glorot initialization.
- It initializes the weights by taking into account the number of input nodes and the number of output nodes of each layer.
- It is often used with the tanh activation function.
- It also preserves the backpropagated signal, so the gradients keep a similar scale from layer to layer.

<b>Disadvantage</b>
   - It does not work well with the Relu activation function.

reference link : https://prateekvishnu.medium.com/xavier-and-he-normal-he-et-al-initialization-8e3d7a087528
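
As a minimal sketch of how the input and output node counts enter the formula, the commonly cited Gaussian form of Xavier (Glorot) initialization uses a standard deviation of sqrt(2 / (fan_in + fan_out)). The helper name `xavier_normal` below is hypothetical. When both counts are equal, as in this notebook (100 and 100), the value reduces to the `np.sqrt(1/node_num)` used in `Weight_Initialization`.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    """Hypothetical helper: Gaussian weights with std sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.standard_normal((fan_in, fan_out)) * std

rng = np.random.default_rng(0)

# Pass 1000 samples through 5 tanh layers and record the spread of each layer's output.
x = rng.standard_normal((1000, 100))
layer_stds = []
for _ in range(5):
    x = np.tanh(x @ xavier_normal(100, 100, rng))
    layer_stds.append(float(x.std()))

print([round(s, 3) for s in layer_stds])
```

If the standard deviations stay in a moderate range instead of collapsing toward 0 or saturating near 1, the signal is being passed through the layers in a usable form.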

In [23]:
# Xavier Initialization
Weight_Initialization(Initialization='Xavier', activate='sigmoid')
# activation Relu
Weight_Initialization(Initialization='Xavier', activate='Relu')

# He Initialization(He 초기화)
<img src='https://mblogthumb-phinf.pstatic.net/MjAxOTA3MzBfNjQg/MDAxNTY0NDcwNTU1MDg2.5-fh3tQMhy_9dBHH2URK2w1IUxoemhGGvi2LB5DQJ5kg.qSi4AGif9AqyjjQudKYs7DGzTKVxUuvSxF_AcStEDo0g.PNG.sohyunst/SE-b8654ce8-25af-4ff8-9c47-3f2fd5b2a884.png?type=w800'>

- It is mainly used with the Relu activation function.
- He initialization scales the weights by sqrt(2/n), where n is the number of input nodes. It is one of the methods you can choose to bring the variance of the layer outputs to approximately one.
- With He initialization, the bias (skew) in the distribution of the Relu activation values disappears.

In [22]:
# He Initialization
Weight_Initialization(Initialization='He', activate='Relu')
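
To compare the two scalings under Relu without reading histograms, the sketch below records the standard deviation of the activations at each layer. It assumes the same layer sizes as `Weight_Initialization`; the helper `relu_layer_stds` and the fixed seed are hypothetical. Both runs share the seed, so they use the same random draws and differ only in the weight scale.

```python
import numpy as np

def relu_layer_stds(scale, layers=5, nodes=100, seed=0):
    """Return the std of the Relu activations at each layer for weights ~ N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((1000, nodes))
    stds = []
    for _ in range(layers):
        w = rng.standard_normal((nodes, nodes)) * scale
        x = np.maximum(0, x @ w)  # Relu
        stds.append(float(x.std()))
    return stds

xavier_stds = relu_layer_stds(np.sqrt(1 / 100))  # Xavier scale for 100 nodes
he_stds = relu_layer_stds(np.sqrt(2 / 100))      # He scale for 100 nodes

for i, (sx, sh) in enumerate(zip(xavier_stds, he_stds), start=1):
    print(f"{i}-layer: Xavier std={sx:.3f}, He std={sh:.3f}")
```

Relu zeroes out roughly half of its inputs, which is why the Xavier scale tends to lose signal at every layer. The extra factor of 2 in the He scale compensates for that loss.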