# RNN (Recurrent Neural Network)

## I. Understanding the Model

#### (1) Basic Structure - Shared Weights

<img src="../../shared/RNN_struct-shared.png" alt="Drawing" style="width: 800px;" align="left"/>
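
The weight sharing in the figure can be sketched in a few lines of plain Python. This is a minimal illustration, assuming scalar inputs and a scalar hidden state; the names `rnn_step`, `run_rnn`, `w_xh`, `w_hh` and `b_h` are hypothetical and do not come from the notebook.

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One vanilla RNN step for a scalar input and a scalar hidden state."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b_h)

def run_rnn(inputs, w_xh=0.5, w_hh=0.8, b_h=0.0, h0=0.0):
    """Unroll the RNN over a sequence, reusing the SAME weights at every step."""
    h = h0
    hidden_states = []
    for x_t in inputs:
        # w_xh, w_hh and b_h never change inside the loop: that is the weight sharing
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        hidden_states.append(h)
    return hidden_states

states = run_rnn([1.0, 0.0, -1.0])
print(states)
```

Because the parameters are shared, the number of weights does not grow with the sequence length; only the number of loop iterations does.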

#### (2) Choosing a Structure for the Task

<img src="../../shared/RNN_struct2.png" alt="Drawing" style="width: 800px;" align="left"/>

(1) One to Many - ex) Image Captioning <br>
(2) Many to One - ex) Sentiment Analysis <br>
(3) Many to Many 1 - ex) Translation <br>
(4) Many to Many 2 - ex) Language Modeling
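
One way to read the difference between the many-to-one and many-to-many structures is by which hidden states are passed to the output layer. The sketch below assumes the per-step hidden states are already computed; `select_outputs` is a hypothetical helper, and the one-to-many case (which feeds outputs back in as inputs) is not covered.

```python
def select_outputs(hidden_states, structure):
    """Pick which per-step hidden states feed the output layer."""
    if structure == "many_to_one":
        # e.g. sentiment analysis: only the last step summarizes the sequence
        return [hidden_states[-1]]
    if structure == "many_to_many":
        # e.g. language modeling: every step produces an output
        return list(hidden_states)
    raise ValueError(f"unknown structure: {structure}")

hidden_states = ["h1", "h2", "h3"]  # placeholders for real hidden vectors
print(select_outputs(hidden_states, "many_to_one"))
print(select_outputs(hidden_states, "many_to_many"))
```

The MNIST model in section III is a many-to-one case: it keeps only the last time step via `out[:, -1, :]`.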

#### (3) Problem - Vanishing and Exploding Gradients

<img src="../../shared/RNN_gradient-combined.png" alt="Drawing" style="width: 1000px;" align="left"/>
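
A rough numerical intuition, under the simplifying assumption that backpropagation multiplies the gradient by the same scalar factor at every time step (real RNNs multiply by Jacobian matrices, so this is only a sketch). `gradient_scale` is a hypothetical helper name.

```python
def gradient_scale(factor, num_steps):
    """Size of a gradient after being multiplied by `factor` at each time step."""
    return abs(factor) ** num_steps

for factor in (0.9, 1.0, 1.1):
    # factor < 1 shrinks toward 0 (vanishing), factor > 1 blows up (exploding)
    print(factor, gradient_scale(factor, 50))
```

Over 50 steps, a factor of 0.9 shrinks the gradient to roughly 0.005 of its original size, while a factor of 1.1 grows it more than a hundredfold.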

#### (4) Solution - Modified Recurrent Units - LSTM / GRU / ...

* LSTM

i. Key concept: the cell state and the gates that regulate it

<img src="../../shared/RNN_base-LSTM.png" alt="Drawing" style="width: 1000px;" align="left"/>

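* Cell State: the "conveyor belt" that carries the information accumulated up to the current time step forward; the gates act as valves on this flow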

<img src="../../shared/RNN_cell_state.png" alt="Drawing" style="width: 800px;" align="left"/>

ii. Order of Operations

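(1) Forget Gate Layer - decides which of the information accumulated so far to **discard** <br>
Example) If the text so far has been about 'he' and the current input is about 'she', the stored gender information 'he' is **discarded** so that the new gender information 'she' can be remembered
* Sigmoid activation function: outputs a value between 0 (discard everything) and 1 (keep everything)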

<img src="../../shared/RNN_LSTM1.png" alt="Drawing" style="width: 800px;" align="left"/>

(2) Input Gate Layer - decides which of the current information to **keep**: $i_t * \tilde{C}_t$ ($i_t$: how much of the current candidate information $\tilde{C}_t$ to **keep**)

<img src="../../shared/RNN_LSTM2.png" alt="Drawing" style="width: 800px;" align="left"/>

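(3) Cell State Update Layer - updates the previous state $C_{t-1}$ to produce the current state $C_t$: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$, where $f_t$ is the forget gate output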

<img src="../../shared/RNN_LSTM3.png" alt="Drawing" style="width: 800px;" align="left"/>

(4) Output Gate Layer - emits the output for the current time step, based on the updated cell state

<img src="../../shared/RNN_LSTM4.png" alt="Drawing" style="width: 800px;" align="left"/>
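
The four steps above can be sketched as a single function. This is a minimal illustration, assuming scalar input, hidden and cell states (real LSTMs use vectors and weight matrices); the `lstm_step` name and the `params` layout of `(w_x, w_h, b)` per gate are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. `params` maps a gate name to (w_x, w_h, b)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("forget"))          # (1) share of C_{t-1} to keep
    i_t = sigmoid(pre_activation("input"))           # (2) share of the candidate to admit
    c_tilde = math.tanh(pre_activation("candidate")) # (2) candidate information
    c_t = f_t * c_prev + i_t * c_tilde               # (3) cell state update
    o_t = sigmoid(pre_activation("output"))          # (4) share of the cell state to expose
    h_t = o_t * math.tanh(c_t)                       # (4) output / new hidden state
    return h_t, c_t

gate_names = ("forget", "input", "candidate", "output")
params = {name: (0.5, 0.5, 0.0) for name in gate_names}  # arbitrary toy weights

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Setting the forget gate bias to a large negative value drives $f_t$ toward 0, so the new cell state no longer depends on $C_{t-1}$; this is the "discard" behavior described in step (1).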

#### (5) Back Propagation Through Time (BPTT)

cf) [모두의 연구소 - 이찬우](https://www.youtube.com/watch?v=4jgHzgxBnGY)
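
As a small worked example of BPTT, the sketch below assumes a scalar RNN $h_t = \tanh(w h_{t-1} + u x_t)$ with $h_0 = 0$ and a squared-error loss on the final state only. These modeling choices and all function names are illustrative assumptions, not taken from the linked lecture. The gradient with respect to the shared weight `w` is accumulated backward over every time step and then compared against a finite-difference estimate.

```python
import math

def forward(w, u, xs):
    """Return [h_0, h_1, ..., h_T] for the scalar RNN."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, y):
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2

def bptt_grad_w(w, u, xs, y):
    """dLoss/dw, summed over all time steps because w is shared."""
    hs = forward(w, u, xs)
    grad_w = 0.0
    dh = hs[-1] - y                       # dLoss/dh_T
    for t in range(len(xs), 0, -1):       # walk backward through time
        da = dh * (1.0 - hs[t] ** 2)      # through tanh
        grad_w += da * hs[t - 1]          # contribution of step t
        dh = da * w                       # pass gradient to h_{t-1}
    return grad_w

xs, y, w, u = [0.5, -0.3, 0.8], 0.2, 0.7, 0.4
analytic = bptt_grad_w(w, u, xs, y)
eps = 1e-6
numeric = (loss(w + eps, u, xs, y) - loss(w - eps, u, xs, y)) / (2 * eps)
print(analytic, numeric)
```

The line `dh = da * w` is where the repeated multiplication from section (3) appears: each step back in time scales the gradient by `w` and by the tanh derivative.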

<hr>

## II. Understanding the Input Data (Embedding)

#### 1. Character Level Embedding

<img src="../../shared/Embedding_char-combined.png" alt="Drawing" style="width: 1000px;" align="left"/>
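
A minimal sketch of character-level one-hot encoding, assuming the vocabulary is simply the sorted set of characters in the text. The helper names `build_char_vocab` and `one_hot` are hypothetical.

```python
def build_char_vocab(text):
    """Map each distinct character to an integer index."""
    return {ch: idx for idx, ch in enumerate(sorted(set(text)))}

def one_hot(index, size):
    """Vector of zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

text = "hello"
vocab = build_char_vocab(text)
encoded = [one_hot(vocab[ch], len(vocab)) for ch in text]
print(vocab)       # {'e': 0, 'h': 1, 'l': 2, 'o': 3}
print(encoded[0])  # 'h' -> [0, 1, 0, 0]
```

Each character becomes one time step of the input sequence, so "hello" is a sequence of five 4-dimensional vectors.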

#### 2. Word Level Embedding

(1) One-hot

<img src="../../shared/Embedding_word-combined.png" alt="Drawing" style="width: 1000px;" align="left"/>
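
The same idea at the word level, as a sketch that assumes whitespace tokenization (`one_hot_words` and `dot` are hypothetical helpers). It also shows a known limitation of one-hot vectors: any two different words have a dot product of 0, so the encoding carries no notion of similarity.

```python
def one_hot_words(sentence):
    """Return {word: one-hot vector} for the distinct words in a sentence."""
    words = sentence.split()
    vocab = {word: idx for idx, word in enumerate(sorted(set(words)))}
    vectors = {}
    for word, idx in vocab.items():
        vec = [0] * len(vocab)
        vec[idx] = 1
        vectors[word] = vec
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

vectors = one_hot_words("the cat sat on the mat")
print(len(vectors))                        # 5 distinct words
print(dot(vectors["cat"], vectors["mat"])) # 0: different words never overlap
print(dot(vectors["cat"], vectors["cat"])) # 1
```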

(2) Word2Vec [[Link](http://turbomaze.github.io/word2vecjson/)]

<img src="../../shared/word2vec.png" alt="Drawing" style="width: 600px;" align="left"/>
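
Dense embeddings such as Word2Vec place related words close together, which is usually measured with cosine similarity. The sketch below uses hand-picked 3-dimensional vectors purely for illustration; they are hypothetical values, not real Word2Vec output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (made up for this example)
toy_embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_related = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_unrelated = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])
print(sim_related, sim_unrelated)
```

With these toy values, "king" and "queen" score much higher than "king" and "apple", which is the kind of structure one-hot vectors cannot express.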

<hr>

## III. Example with MNIST

#### (0) Define Hyper-parameters / Helper Function

In [1]:
import torch
import os

In [2]:
# Device Configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 2
num_classes = 10
batch_size = 100
num_epochs = 2
learning_rate = 0.01

#### (1) Load Data

In [3]:
import torchvision # To Download MNIST Datasets from Torch 
import torchvision.transforms as transforms # To Transform MNIST "Images" to "Tensor"

In [4]:
train_dataset = torchvision.datasets.MNIST(root='./datasets/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./datasets/',
                                          train=False, 
                                          transform=transforms.ToTensor())

#### (2) Define Dataloader

In [5]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

In [6]:
# cf) check how data_loader works
image, label = next(iter(train_loader))
print(image.size(), ": [Batch, Channel, Height, Width] Respectively")

torch.Size([100, 1, 28, 28]) : [Batch, Channel, Height, Width] Respectively


#### (3) Define Model

In [7]:
import torch.nn as nn
import torch.nn.functional as F

In [8]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        # Set initial hidden and cell states 
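        # Shape: (num_layers, batch_size, hidden_size); created on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device) # torch.Size([2, 100, 128])
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device) # torch.Size([2, 100, 128])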

        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
    
model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)

cf) [PyTorch official documentation - nn.LSTM](https://pytorch.org/docs/stable/nn.html#lstm)
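
To get a feel for the model size, the sketch below counts trainable parameters by hand. It assumes the layout documented for `nn.LSTM` (per layer: input-to-hidden weights, hidden-to-hidden weights and two bias vectors, each covering 4 gates) plus the final `nn.Linear`; the helper names are hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one LSTM layer: 4 gates, each with weights and two biases."""
    weights = 4 * hidden_size * (input_size + hidden_size)
    biases = 2 * 4 * hidden_size
    return weights + biases

def model_params(input_size, hidden_size, num_layers, num_classes):
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += lstm_layer_params(layer_input, hidden_size)
        layer_input = hidden_size  # deeper layers read the previous layer's hidden state
    total += hidden_size * num_classes + num_classes  # final fully connected layer
    return total

print(model_params(28, 128, 2, 10))  # 214282
```

With the hyper-parameters used here, the first LSTM layer has 80,896 parameters, the second 132,096 and the linear layer 1,290.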

#### (4) Set Loss & Optimizer

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
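
For reference, `nn.CrossEntropyLoss` applies a log-softmax to the raw logits and then takes the negative log-likelihood of the target class, averaged over the batch by default. The plain-Python sketch below computes the same quantity for a single sample; `cross_entropy` is a hypothetical helper, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]) for one sample, computed stably."""
    max_logit = max(logits)  # subtract the max to avoid overflow in exp
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

uniform_loss = cross_entropy([0.0] * 10, target=3)
confident_loss = cross_entropy([10.0, 0.0, 0.0], target=0)
print(uniform_loss, confident_loss)
```

A model that spreads probability evenly over the 10 digit classes gets a loss of $\ln 10 \approx 2.30$, which is a useful baseline when reading the training log.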

#### (5) Train / Test

In [10]:
# Load 'save_image' Function
from torchvision.utils import save_image

In [11]:
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
            
# Test the model
model.eval()  # evaluation mode (no effect on this LSTM without dropout, but good practice)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest logit per sample
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

Epoch [1/2], Step [100/600], Loss: 0.7913
Epoch [1/2], Step [200/600], Loss: 0.2830
Epoch [1/2], Step [300/600], Loss: 0.2365
Epoch [1/2], Step [400/600], Loss: 0.1481
Epoch [1/2], Step [500/600], Loss: 0.3656
Epoch [1/2], Step [600/600], Loss: 0.1510
Epoch [2/2], Step [100/600], Loss: 0.1074
Epoch [2/2], Step [200/600], Loss: 0.1880
Epoch [2/2], Step [300/600], Loss: 0.1019
Epoch [2/2], Step [400/600], Loss: 0.1774
Epoch [2/2], Step [500/600], Loss: 0.1297
Epoch [2/2], Step [600/600], Loss: 0.0489
Test Accuracy of the model on the 10000 test images: 97.6 %
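
The accuracy figure is computed by taking the class with the highest logit for each image and comparing it with the label. A plain-Python sketch of that calculation follows; the logits are made-up toy values and the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_percent(batch_logits, labels):
    """Percentage of samples whose highest-scoring class matches the label."""
    correct = sum(1 for logits, label in zip(batch_logits, labels)
                  if argmax(logits) == label)
    return 100 * correct / len(labels)

# Toy batch: 4 samples, 3 classes (made-up numbers)
batch_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 3.0], [0.9, 1.0, 0.8]]
labels = [1, 0, 2, 0]
print(accuracy_percent(batch_logits, labels))  # 75.0
```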


<hr>

## References

* LSTM - [[KR](https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr)] | [[EN](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)]
* [LSTM variants](https://excelsior-cjh.tistory.com/185)
* [word2vec explained simply](https://dreamgonfly.github.io/machine/learning,/natural/language/processing/2017/08/16/word2vec_explained.html)
* RNN - Sung Kim [[Theory](https://www.youtube.com/watch?v=ogZi5oIo4fI)] | [[With Code](https://www.youtube.com/watch?v=1vGOQAel2yU)]