## AI Design Using Big Data
# Backpropagation Algorithm

In [1]:
import numpy as np
import matplotlib.pyplot as plt

### Generalizing the Gradient Computation

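For a single-layer model with activation $f$, prediction $\hat{y}_i = f(u_i)$, and mean squared error loss over $N$ samples:

$$ u_i = \sum_j (w_j \cdot x_{ij}) + b \\
{\partial Loss \over \partial w_j} = -{2 \over N} \sum_i (y_i - \hat{y}_i) \cdot f'(u_i) \cdot  x_{ij} \\
{\partial Loss \over \partial b} = -{2 \over N} \sum_i (y_i - \hat{y}_i) \cdot f'(u_i) $$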
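
The gradient formulas can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's original code: it assumes a sigmoid activation for $f$ and a mean squared error loss, and the names `sigmoid`, `mse_loss`, and `gradients` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def sigmoid(u):
    """Example activation f(u); any differentiable f would work (assumption)."""
    return 1.0 / (1.0 + np.exp(-u))

def mse_loss(X, y, w, b):
    """Mean squared error between targets y and predictions f(u)."""
    return np.mean((y - sigmoid(X @ w + b)) ** 2)

def gradients(X, y, w, b):
    """Return (dLoss/dw, dLoss/db) for the MSE loss above."""
    n = len(y)
    u = X @ w + b                    # u_i = sum_j w_j * x_ij + b
    y_hat = sigmoid(u)               # y_hat_i = f(u_i)
    f_prime = y_hat * (1.0 - y_hat)  # f'(u_i) for the sigmoid
    delta = (y - y_hat) * f_prime    # (y_i - y_hat_i) * f'(u_i)
    grad_w = -(2.0 / n) * (X.T @ delta)
    grad_b = -(2.0 / n) * np.sum(delta)
    return grad_w, grad_b

# Toy data: 8 samples, 3 features (values are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.uniform(size=8)
w = rng.normal(size=3)
b = 0.1

grad_w, grad_b = gradients(X, y, w, b)
print(grad_w.shape, grad_b)
```

A quick way to gain confidence in such a function is to compare its output against a finite-difference estimate of `mse_loss`.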

### Backpropagation

<img src='역전파.jpg' />

### Backpropagation Algorithms
- https://en.wikipedia.org/wiki/Stochastic_gradient_descent

#### SGD (Stochastic Gradient Descent)
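- "Stochastic" means that samples are drawn at random.
- Basic gradient descent is applied to the drawn samples rather than to the full dataset.
- Momentum can optionally be added, where $\alpha$ is the momentum coefficient and $\Delta W(t-1)$ is the previous update.
> $ W(t) = W(t-1) - \eta \cdot \nabla_W Loss + \alpha \cdot \Delta W(t-1) $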
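
The momentum update can be sketched as below. This is an illustrative example only: the function name `sgd_momentum_step`, the hyperparameter values, and the toy loss $Loss(W) = \sum W^2$ are all assumptions, not part of the original notebook.

```python
import numpy as np

def sgd_momentum_step(w, grad, prev_delta, lr=0.1, momentum=0.9):
    """One update: delta = -lr * grad + momentum * prev_delta; w <- w + delta."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

# Toy problem (assumption): minimize Loss(W) = sum(W**2), whose gradient is 2 * W
w = np.array([1.0, -2.0])
delta = np.zeros_like(w)      # no previous update at the first step
for _ in range(300):
    grad = 2.0 * w
    w, delta = sgd_momentum_step(w, grad, delta)

print(w)  # should end up close to the minimum at [0, 0]
```

Setting `momentum=0.0` reduces this to plain gradient descent on the same samples.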

<img src='https://tensorflowkorea.files.wordpress.com/2017/03/ec8aa4ed81aceba6b0ec83b7-2017-03-21-ec98a4ed9b84-3-22-52.png?w=625' />

#### Adagrad
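- Uses a different learning rate for each weight (w).
- Weights that have changed a lot so far (large accumulated gradients) get a smaller learning rate; weights that have changed little get a larger one.
- The sum in the denominator runs element-wise over the squared gradients of all past steps.
> $ W(t) = W(t-1) - { \eta \over \sqrt{\sum (\nabla_W Loss)^2} } \cdot \nabla_W Loss $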
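
A minimal sketch of one Adagrad step follows, using the standard formulation in which the squared gradients are accumulated per weight. The function name `adagrad_step`, the small constant `eps` (added to avoid division by zero), and the toy loss are assumptions for illustration.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Accumulate squared gradients per weight and scale the step by their root."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy problem (assumption): Loss(W) = sum(W**2), gradient 2 * W
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
history = [w.copy()]
for _ in range(5):
    w, accum = adagrad_step(w, 2.0 * w, accum)
    history.append(w.copy())

steps = np.abs(np.diff(np.array(history), axis=0))
print(steps)  # per-weight step sizes shrink as squared gradients accumulate
```

Because `accum` only grows, the effective learning rate can only decrease over time, which is the behavior RMSprop was designed to correct.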

#### RMSprop
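- An algorithm that improves on a weakness of Adagrad.
- Corrects Adagrad's tendency for the learning rate to become too small as training proceeds, by using an exponentially decaying average of squared gradients instead of a running sum.<br>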
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/2964cc8dc82a134dd4f20e42094f56410b0d2d9c' />
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/fc46ae8619e71130c6c8212eec31560cb4891c0a' />
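
One RMSprop step can be sketched as below, following the commonly cited form with a decaying average of squared gradients. The name `rmsprop_step`, the decay rate `rho`, and the constant `eps` are assumptions for illustration and may differ from what a given library uses by default.

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.01, rho=0.9, eps=1e-8):
    """Keep a decaying average v of squared gradients and divide the step by sqrt(v)."""
    v = rho * v + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

# Toy problem (assumption): Loss(W) = sum(W**2), gradient 2 * W
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = rmsprop_step(w, 2.0 * w, v)

print(w)  # should have moved toward the minimum at [0, 0]
```

Unlike the Adagrad accumulator, `v` forgets old gradients, so the effective learning rate does not shrink monotonically.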

#### Adam
- Combines the strengths of momentum and RMSprop.<br>
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/e388b9155519b8769930b3764f4dadc20eb593b8' />
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/034d5652b502094ab7f58f95a383e0ec41de5b77' />
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/1625bff4ce904cc83c3cadad4bc1a2ff61422b02' />
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/7c5ea1207fc3574a51d439f84370a989deffa871' />
> <img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/abcd4c729bac933249992e086fa1ba7807e1cd09' />
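
The Adam update can be sketched as follows, using the widely published form with bias-corrected first and second moment estimates. The function name `adam_step` and the toy loss are assumptions; the hyperparameter values shown are the commonly quoted defaults, not values taken from this notebook.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad        # momentum-like first moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # RMSprop-like second moment
    m_hat = m / (1.0 - beta1 ** t)              # correct the bias toward zero
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem (assumption): Loss(W) = sum(W**2), gradient 2 * W
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)

print(w)
```

On the first step the bias correction makes `m_hat` equal to the gradient and `v_hat` equal to its square, so each weight moves by roughly `lr` regardless of the gradient's scale.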

<img src='https://t1.daumcdn.net/cfile/tistory/99399C355AA816740F' />
(Source: https://kolikim.tistory.com/47)

In [6]:
from keras.optimizers import SGD, Adagrad, RMSprop, Adam

In [7]:
help(SGD)

Help on class SGD in module keras.optimizers:

class SGD(Optimizer)
 |  Stochastic gradient descent optimizer.
 |  
 |  Includes support for momentum,
 |  learning rate decay, and Nesterov momentum.
 |  
 |  # Arguments
 |      lr: float >= 0. Learning rate.
 |      momentum: float >= 0. Parameter that accelerates SGD
 |          in the relevant direction and dampens oscillations.
 |      decay: float >= 0. Learning rate decay over each update.
 |      nesterov: boolean. Whether to apply Nesterov momentum.
 |  
 |  Method resolution order:
 |      SGD
 |      Optimizer
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, lr=0.01, momentum=0.0, decay=0.0, nesterov=False, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  get_config(self)
 |  
 |  get_updates(self, loss, params)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Optimizer:
 |  
 |  get_gradients(self, 