# Lec_20_RNN_with IMDB dataset


<div align='right'> Hoe Sung Ryu ( 류 회 성 ) </div>
<div align='right'> Minsuk Sung ( 성 민 석) </div>
    
    
    
> Author: Hoe Sung Ryu, Minsuk Sung  <p>
> Tel: 010-6636-7275 / skainf23@gamil.com // 010-5134-3621 / mssung94@gmail.com  <p>
> This material is a tutoring resource for deep learning with PyTorch. Reproducing it without the authors' consent is prohibited.
    

---

Syllabus
    
|Lecture|Date|Topic|
|--:|:---:|:---|
|1 |July 27| Environment setting and Python basics|
|2 |July 28| PyTorch basics and custom data loading |
|3 |July 29| Traditional Machine Learning(1) |
|4 |July 30| Traditional Machine Learning(2) |
|5 |July 31| CNN(Convolutional Neural Network)(1)  |
|6 |Aug 03| CNN(Convolutional Neural Network)(2) |
|7 |Aug 04|  RNN(Recurrent Neural Networks)(1) |
|8 |Aug 05|  RNN(Recurrent Neural Networks)(2) |
|9 |Aug 06|  Transfer learning(VGG pretrained on ImageNet for CIFAR-10)| 
|10|Aug 07|**Mini_Kaggle**: Facial Expression Recognition on `AffectNet` | 
|11|Aug 08|`Awards` and `Closing`| 


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-Recurrent-Neural-Network-(RNN)?" data-toc-modified-id="What-is-Recurrent-Neural-Network-(RNN)?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is Recurrent Neural Network (RNN)?</a></span></li><li><span><a href="#Training" data-toc-modified-id="Training-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Training</a></span></li></ul></div>

## What is Recurrent Neural Network (RNN)?

A recurrent neural network (RNN) is a generalization of the feedforward neural network that has an internal memory.

An RNN is recurrent because it applies the same function to every element of the input sequence, while the output for the current input depends on the computation from the previous step. After an output is produced, it is copied and fed back into the network. To make a decision, the network considers both the current input and what it has carried over from the previous input.

Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. A feedforward network treats each input independently of the others, whereas an RNN relates each input to the ones that came before it.


<img src=https://miro.medium.com/max/1254/1*go8PHsPNbbV6qRiwpUQ5BQ.png>




1st, the current state is a function of the previous state and the current input:
$$h_t = f(h_{t-1}, x_t)$$

2nd, applying the activation function:
$$h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$$
where **$h_t$** is the hidden state vector at time step $t$, **$x_t$** is the input at time step $t$, **$W_{hh}$** is the weight applied to the previous hidden state, **$W_{xh}$** is the weight applied to the current input, and **$\tanh$** is the activation function, a nonlinearity that squashes the activations to the range $[-1, 1]$.

3rd, the output is:
$$y_t = W_{hy}h_t$$
where **$y_t$** is the output and **$W_{hy}$** is the weight applied to the hidden state to produce the output.
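
The three formulas can be traced by hand with a small example. The following is a minimal sketch in plain Python (not the `nn.RNN` implementation used later in this notebook); the toy sizes, the weight values, and the helper names `matvec` and `rnn_step` are assumptions chosen only for illustration.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (given as a list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_hh, W_xh, h_prev, x_t):
    """One recurrence step: h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    a = matvec(W_hh, h_prev)
    b = matvec(W_xh, x_t)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

# Toy sizes (assumed): hidden size 2, input size 3, output size 1
W_hh = [[0.5, -0.1], [0.2, 0.3]]
W_xh = [[0.1, 0.0, -0.2], [0.4, 0.1, 0.0]]
W_hy = [[1.0, -1.0]]

xs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]  # a sequence of 3 inputs
h = [0.0, 0.0]                                             # initial hidden state h_0
outputs = []
for x_t in xs:
    h = rnn_step(W_hh, W_xh, h, x_t)   # the same weights are reused at every step
    outputs.append(matvec(W_hy, h))    # y_t = W_hy h_t

print(outputs)
```

The loop reuses the same three weight matrices at every time step; only the hidden state `h` changes, which is what carries information from earlier inputs forward.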

In [18]:
import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
sequence_length = 28
input_size = 28
hidden_size = 128

num_layers = 2
num_classes = 10
batch_size = 64

num_epochs = 2
learning_rate = 0.01
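
With `sequence_length = 28` and `input_size = 28`, each 28x28 MNIST image is read as a sequence of 28 rows, where each row is a 28-dimensional input vector. The sketch below illustrates that view in plain Python; `flat_image` is a hypothetical stand-in for one flattened image, and the slicing only mimics what `images.reshape(-1, sequence_length, input_size)` does for a single sample.

```python
sequence_length = 28  # rows per image = time steps
input_size = 28       # pixels per row = features per time step

# Hypothetical stand-in for one flattened 28x28 image (values 0..783)
flat_image = list(range(sequence_length * input_size))

# Cut the flat pixel list into one row per time step
sequence = [flat_image[t * input_size:(t + 1) * input_size]
            for t in range(sequence_length)]

print(len(sequence), len(sequence[0]))  # 28 28
```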

In [19]:
# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./data/',
                                          train=False, 
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

In [24]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)  # Batchsize(N) x time_seq x features 
        self.fc = nn.Linear(hidden_size*sequence_length, num_classes)
        
        
    def forward(self,x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        
        # forward prop. 
        out, _ = self.rnn(x, h0)
        out = out.reshape(out.shape[0],-1)
        out = self.fc(out)
        return out
        

In [25]:
model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)


# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [30]:
# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
#         print(images.shape)
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
#         print(images.shape)
#         break
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

Epoch [1/2], Step [100/938], Loss: 9.3489
Epoch [1/2], Step [200/938], Loss: 12.2893
Epoch [1/2], Step [300/938], Loss: 8.0958
Epoch [1/2], Step [400/938], Loss: 19.3385
Epoch [1/2], Step [500/938], Loss: 13.4461
Epoch [1/2], Step [600/938], Loss: 10.1456


KeyboardInterrupt: 
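
The step total in the log (`938`) is `len(train_loader)`, the number of mini-batches per epoch. A small sketch of that arithmetic follows, assuming the 60,000-image MNIST training split and a loader that keeps the last partial batch (the `DataLoader` default); `steps_per_epoch` is a hypothetical helper, not a PyTorch function.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of mini-batches needed to cover the dataset once."""
    if drop_last:
        return num_samples // batch_size        # discard the final partial batch
    return math.ceil(num_samples / batch_size)  # keep the final partial batch

train_size = 60000  # size of the MNIST training split

print(steps_per_epoch(train_size, 64))   # 938
print(steps_per_epoch(train_size, 100))  # 600
```

Under the same assumption, a step total of 600 would correspond to a batch size of 100 rather than 64.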

In [None]:
# Save the model checkpoint
# torch.save(model.state_dict(), 'model.ckpt')

In [None]:
# Recurrent neural network (many-to-one)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device) 
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        
        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
        
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
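
The many-to-one model above feeds only the last time step into the linear layer (`out[:, -1, :]`), while the earlier `nn.RNN` model flattens every time step into one long vector. The sketch below contrasts the two selections on nested Python lists; the tiny sizes and the made-up values in `out` are assumptions for illustration only.

```python
batch_size, seq_length, hidden_size = 2, 3, 4

# Hypothetical recurrent outputs: out[n][t][k] = 100*n + 10*t + k
out = [[[100 * n + 10 * t + k for k in range(hidden_size)]
        for t in range(seq_length)]
       for n in range(batch_size)]

# Many-to-one: keep only the last time step of each sample, like out[:, -1, :]
last_step = [sample[-1] for sample in out]

# Flatten: concatenate all time steps of each sample, like out.reshape(batch, -1)
flattened = [[v for step in sample for v in step] for sample in out]

print(len(last_step[0]))   # 4  -> the linear layer needs hidden_size inputs
print(len(flattened[0]))   # 12 -> the linear layer needs hidden_size * seq_length inputs
```

This is why the two models define `self.fc` with different input sizes: `hidden_size` in the many-to-one case and `hidden_size*sequence_length` in the flattened case.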

In [2]:
model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)


# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

Epoch [1/2], Step [100/600], Loss: 1.1935
Epoch [1/2], Step [200/600], Loss: 0.3043
Epoch [1/2], Step [300/600], Loss: 0.3283


KeyboardInterrupt: 

In [34]:
input_size = 28
batch_size = 64 
sequence_length = 28
hidden_size = 256
num_classes = 10 
learning_rate= 0.001

num_layers = 2


In [46]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)  # Batchsize(N) x time_seq x features 
        self.fc = nn.Linear(hidden_size*sequence_length, num_classes)
        
        
    def forward(self,x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        
        # forward prop. 
        out, _ = self.rnn(x, h0)
        out = out.reshape(out.shape[0], -1)  # flatten all time steps: (N, sequence_length * hidden_size)
        out = self.fc(out)
        return out
        

In [47]:
model = RNN(input_size, hidden_size, num_layers, num_classes=num_classes).to(device)

In [48]:
total_params = sum(p.numel() for p in model.parameters())
print("Num of Total Parameter : ",total_params)
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("Num of Trainable Parameter :",trainable_params)

Num of Total Parameter :  276490
Num of Trainable Parameter : 276490
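
The total of 276490 can be reproduced by hand. The sketch below assumes that each `nn.RNN` layer holds an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors (the default when biases are enabled), and that the linear layer takes `hidden_size*sequence_length` inputs as defined in the class above; `rnn_layer_params` is a hypothetical helper.

```python
def rnn_layer_params(in_features, hidden_size):
    """Parameters of one RNN layer: W_ih, W_hh, and the two bias vectors."""
    return hidden_size * in_features + hidden_size * hidden_size + 2 * hidden_size

input_size, hidden_size, num_layers = 28, 256, 2
sequence_length, num_classes = 28, 10

# Layer 1 reads the input; every later layer reads the previous layer's hidden state
rnn_params = rnn_layer_params(input_size, hidden_size)
rnn_params += sum(rnn_layer_params(hidden_size, hidden_size)
                  for _ in range(num_layers - 1))

# Fully connected layer: weight matrix plus bias
fc_params = hidden_size * sequence_length * num_classes + num_classes

total = rnn_params + fc_params
print(rnn_params, fc_params, total)  # 204800 71690 276490
```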


In [45]:
### Set Loss and Optimizer 
import torch.nn as nn
loss_fn = nn.CrossEntropyLoss() 

import torch.optim as optim
optimizer = optim.Adam(model.parameters(),lr=0.01) # choose the optimization method used to learn the weights

## Training


In [46]:
def train(epoch):
    model.train()  # switch to training mode
    # Draw mini-batches from the DataLoader and train on each one.
    # One epoch is complete once every batch of batch_size samples has been seen.
    for data, targets in train_loader:
        data = data.squeeze(1).to(device)  # (N, 1, 28, 28) -> (N, 28, 28): rows become time steps
        targets = targets.to(device)  # keep the labels on the same device as the model
        optimizer.zero_grad()  # reset the gradients to zero at every step
        outputs = model(data)  # feed the data in and compute the outputs
        loss = loss_fn(outputs, targets)  # error between the outputs and the ground-truth labels
        loss.backward()  # backpropagate the error
        optimizer.step()  # update the weights with the backpropagated gradients
        
    print(f'[TRAIN] Epoch {epoch} \t Loss: {loss.item():1.5f}',end=' ')

In [49]:
def test(epoch):
    model.eval()  # switch to inference mode; this matters especially for layers such as Dropout and BatchNorm
    correct = 0
    
    # Draw batch_size samples at a time from the DataLoader and run inference
    with torch.no_grad():  # no gradients are needed for inference; disabling them reduces memory usage and speeds things up
        for data, targets in test_loader:
            data = data.to(device).squeeze(1)
            targets = targets.to(device)  # keep the labels on the same device as the predictions
            outputs = model(data)
            _, predicted = torch.max(outputs.data,1)  # label with the highest score
            correct += predicted.eq(targets.data.view_as(predicted)).sum().item()  # count the predictions that match the labels

    # Print the accuracy
    data_num = len(test_loader.dataset)  # total number of test samples
    print(f'| [TEST] Epoch {epoch} \t Accuracy: {correct}/{data_num} ({100.*correct/data_num :3.5f}%)')

In [50]:
Epochs = 3 

for epoch in range(Epochs):
    train(epoch+1)
    test(epoch+1)

[TRAIN] Epoch 1 	 Loss: 14.58662 [TEST] Epoch 1 	 Accuracy: 892/10000 (8.92000%)
[TRAIN] Epoch 2 	 Loss: 11.38239 [TEST] Epoch 2 	 Accuracy: 980/10000 (9.80000%)
[TRAIN] Epoch 3 	 Loss: 11.85425 [TEST] Epoch 3 	 Accuracy: 1028/10000 (10.28000%)
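
As a reference point for reading this log, the sketch below computes what a classifier that assigns equal probability to all 10 classes would score, assuming roughly balanced classes. The `reported_*` lists simply copy the numbers printed above; the comparison is an illustration, not a diagnosis of the cause.

```python
import math

num_classes = 10

# Cross-entropy of a uniform guess: -ln(1/10) = ln(10)
uniform_loss = -math.log(1.0 / num_classes)
# Accuracy expected from guessing among 10 balanced classes
chance_accuracy = 100.0 / num_classes

reported_loss = [14.58662, 11.38239, 11.85425]   # copied from the log above
reported_accuracy = [8.92, 9.80, 10.28]          # copied from the log above

print(f'uniform-guess loss: {uniform_loss:.4f}')       # 2.3026
print(f'chance accuracy:    {chance_accuracy:.1f}%')   # 10.0%
print(all(loss > uniform_loss for loss in reported_loss))
```

The reported accuracies sit close to the 10% chance level and the reported losses are well above $\ln 10$, which suggests the model did not learn in this run. One possible factor, not verified here, is the learning rate of 0.01 passed to the optimizer.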
