## Recurrent Neural Networks

#### Limitations of Vanilla Neural Networks
- Accept only a fixed-size vector as input and produce only a fixed-size vector as output
- Map inputs to outputs using a fixed number of computational steps (e.g., the number of layers)

#### RNNs accept sequences as inputs or outputs
 - Far more kinds of data than you might expect can be treated as sequences.

 ### RNN computation process

![image.png](attachment:image.png)

- The contents of an RNN's output vector are influenced not only by the input you just fed in,
but also by the entire history of inputs you have fed in before.

<code>class RNN:
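    <comment># W_hh, W_xh, W_hy and the initial hidden state self.h are assumed to be initialized elsewhere</comment>
    def step(self, x):
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        <comment># tanh is the non-linearity; it squashes each value into (-1, 1)</comment>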
        y = np.dot(self.W_hy, self.h)
        return y

rnn = RNN()
y = rnn.step(x)</code>

- Each call to step() updates the hidden state self.h

- Multiple RNNs can also be stacked

    y1 = rnn1.step(x) # the first RNN receives the input vectors
    y = rnn2.step(y1) # the second RNN receives the output of the first RNN
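
The step function and the stacking idea above can be tried end to end with a small NumPy sketch. The constructor, the layer sizes, the random initialization, and the `seed` argument are assumptions made only for illustration; the notes do not specify them.

```python
import numpy as np

class RNN:
    """Vanilla RNN cell. Sizes and random initialization are illustrative assumptions."""

    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
        self.W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
        self.h = np.zeros(hidden_size)  # hidden state starts at zero (assumption)

    def step(self, x):
        # Update the hidden state from the previous state and the current input.
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # Compute the output vector from the new hidden state.
        return np.dot(self.W_hy, self.h)

# Two stacked RNNs: the output size of rnn1 must match the input size of rnn2.
rnn1 = RNN(input_size=3, hidden_size=5, output_size=4, seed=0)
rnn2 = RNN(input_size=4, hidden_size=5, output_size=2, seed=1)

x = np.array([1.0, 0.0, -1.0])
y_first = rnn2.step(rnn1.step(x))
y_second = rnn2.step(rnn1.step(x))  # same input, but the hidden states have moved on
```

Feeding the same `x` twice gives two different outputs, because each call to `step()` changes `self.h` in both cells.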

![image.png](attachment:image.png) 

 - Given the input characters h, e, l, l, the model learns the next-character distributions p(char|'h'), p(char|'he'), p(char|'hel'), and p(char|'hell').
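
As a rough sketch of how those next-character distributions come out of an RNN, the code below runs the inputs h, e, l, l through an untrained cell and applies a softmax to each output. The four-letter vocabulary, the targets "ello", the hidden size, and the random weights are all assumptions for illustration. Since nothing is trained here, the probabilities stay close to uniform.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']          # assumed vocabulary
char_to_ix = {c: i for i, c in enumerate(vocab)}
hidden_size = 8                        # assumed size

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, len(vocab)))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(len(vocab), hidden_size))

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

h = np.zeros(hidden_size)
next_char_probs = []
loss = 0.0
for ch, target in zip("hell", "ello"):   # targets are an assumption
    h = np.tanh(W_hh @ h + W_xh @ one_hot(ch))
    p = softmax(W_hy @ h)                # p(char | prefix seen so far)
    next_char_probs.append(p)
    loss += -np.log(p[char_to_ix[target]])  # cross-entropy for the true next char
```

Training would adjust the three weight matrices to lower `loss`, which raises the probability assigned to the correct next character at each step.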

 #### The problem of Long-Term Dependencies

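- When the relevant past information lies far from the current step (a long-term dependency), a plain RNN struggles to preserve it. LSTM and GRU were designed to address this.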
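
One way to see this effect numerically is to change only the first input of a sequence and measure how far the hidden states drift apart at later steps. This is a sketch of one regime only: the sizes are made up, and `W_hh` is deliberately rescaled so its largest singular value is 0.5, which is an assumption chosen to make the decay easy to observe. With large recurrent weights the behavior can differ.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, steps = 16, 4, 30  # assumed sizes

W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hh *= 0.5 / np.linalg.norm(W_hh, 2)  # assumption: largest singular value set to 0.5

def run(xs):
    """Return the hidden state after every step of a vanilla RNN."""
    h = np.zeros(hidden_size)
    states = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
        states.append(h)
    return states

xs = rng.normal(size=(steps, input_size))
xs_changed = xs.copy()
xs_changed[0] += 1.0  # perturb only the very first input

# Distance between the two hidden-state trajectories at each step.
gaps = [np.linalg.norm(a - b) for a, b in zip(run(xs), run(xs_changed))]
```

In this setup `gaps` shrinks by at least half per step, so after 30 steps the hidden state carries almost no trace of the first input.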

### LSTM (Long Short-Term Memory networks)

![image.png](attachment:image.png)

 ![image.png](attachment:image.png)

 ![image.png](attachment:image.png)
 
 - 1st step 
  - A sigmoid layer called the "forget gate layer" decides what information to throw away from the previous cell state (C_t-1)

 ![image.png](attachment:image.png)
 
 - 2nd step
  1. A sigmoid layer called the “input gate layer” decides which values we’ll update (i_t)
  2. A tanh layer creates a vector of new candidate values, ~C_t, that could be added to the state

 ![image.png](attachment:image.png)
 
 - 3rd step
  - Update the previous cell state C_t-1 into the new cell state C_t: C_t = f_t * C_t-1 + i_t * ~C_t
  - f_t * C_t-1: the part of the previous cell state to keep
  - i_t * ~C_t: the part to add to the new state

 ![image.png](attachment:image.png)
 
 - final step
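  - Decide what to output based on the new cell state C_t: a sigmoid layer (the output gate, o_t) selects which parts to expose, giving h_t = o_t * tanh(C_t)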
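
The four steps above can be put together into a single LSTM step. This is a minimal NumPy sketch of the standard formulation, not a tuned implementation: the `params` dictionary, its key names, the sizes, and the random initialization are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    concat = np.concatenate([h_prev, x_t])                     # [h_t-1, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])      # 1st step: forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])      # 2nd step: input gate
    C_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])  # 2nd step: candidates ~C_t
    C_t = f_t * C_prev + i_t * C_tilde                         # 3rd step: new cell state
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])      # final step: output gate
    h_t = o_t * np.tanh(C_t)                                   # final step: new hidden state
    return h_t, C_t

hidden_size, input_size = 4, 3  # assumed sizes
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

# Run a short random sequence through the cell.
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, C = lstm_step(x_t, h, C, params)
```

Note that the cell state `C` is only ever scaled by `f_t` and added to, which is what lets information pass along many steps with little change when the forget gate stays close to 1.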

 ### GRU
 
 - Variation of LSTM
 - It combines the forget and input gates into a single "update gate"
 - Simpler than standard LSTM models
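
For comparison with the LSTM, a single GRU step can be sketched as follows. It follows one common convention for the update, in which z_t weights the new candidate; some libraries swap the roles of z_t and 1 - z_t. The reset gate r_t is part of the usual GRU formulation rather than something stated in these notes, and the `params` names, sizes, and random weights are assumptions. Biases are left out to keep the sketch short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    concat = np.concatenate([h_prev, x_t])   # [h_t-1, x_t]
    z_t = sigmoid(params["W_z"] @ concat)    # update gate (merged forget/input gates)
    r_t = sigmoid(params["W_r"] @ concat)    # reset gate
    # Candidate state, computed from the reset-scaled previous state and the input.
    h_tilde = np.tanh(params["W_h"] @ np.concatenate([r_t * h_prev, x_t]))
    # Interpolate between the old state and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

hidden_size, input_size = 4, 3  # assumed sizes
rng = np.random.default_rng(0)
params = {
    name: rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
    for name in ("W_z", "W_r", "W_h")
}

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, params)
```

Unlike the LSTM sketch, there is no separate cell state here: the GRU keeps a single hidden state and three weight matrices instead of four.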
 
 ![image.png](attachment:image.png)

 ### Transformer
 
 - Not an RNN: the Transformer drops recurrence entirely and models sequences with attention alone
 
 - **[Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)**
 ![image.png](attachment:image.png)