# 8. Implementing Recurrent Neural Networks (RNN) with PyTorch

## 1. About Recurrent Neural Networks (RNN)

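### 1.1 From Feedforward Neural Networks (FNN) to Recurrent Neural Networks (RNN)
- **An RNN is fundamentally an FNN** whose hidden layer is reused at every time step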

<img src = './images/10-01.png' width=90%>

<img src = './images/10-04.png' width=80%>

<img src = './images/10-05.png' width=80%>

<img src = './images/10-08.png' width=80%>

<img src = './images/10-11.png' width=80%>

<img src = './images/10-14.png' width=90%>
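As a minimal sketch of that idea, assuming a random input batch in place of MNIST images, the snippet below unrolls a single-layer ReLU `nn.RNN` by hand. The same input-to-hidden and hidden-to-hidden linear functions are applied at each of the 28 steps, and the result is compared with the module's own output. `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0` are PyTorch's parameter names; the batch size of 4 is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_len, batch = 28, 100, 28, 4

# One-layer ReLU RNN, configured like Model A's recurrent part
rnn = nn.RNN(input_dim, hidden_dim, num_layers=1, batch_first=True, nonlinearity='relu')

x = torch.randn(batch, seq_len, input_dim)   # random stand-in for 4 images
h0 = torch.zeros(1, batch, hidden_dim)

with torch.no_grad():
    out_rnn, hn = rnn(x, h0)

    # Manual unroll: the SAME two linear functions are reused at every step
    W_ih, b_ih = rnn.weight_ih_l0, rnn.bias_ih_l0   # input -> hidden
    W_hh, b_hh = rnn.weight_hh_l0, rnn.bias_hh_l0   # hidden -> hidden
    h = h0[0]
    steps = []
    for t in range(seq_len):
        h = torch.relu(x[:, t, :] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        steps.append(h)
    out_manual = torch.stack(steps, dim=1)       # (batch, seq_len, hidden_dim)

print(torch.allclose(out_rnn, out_manual, atol=1e-4))  # True
```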

# 2. Implementing an RNN with PyTorch

## Model A: 1 Hidden Layer (ReLU)

- Unroll 28 time steps
    - Each step input size: 28 x 1
    - Total per unroll: 28 x 28
        - Feedforward Neural Network input size: 28 x 28
- 1 hidden layer
- ReLU activation function

<img src = './images/10-17.png' width=65%>
- ... 28 times
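Concretely, the 28 time steps are the 28 pixel rows of an image. A small sketch of the `view(-1, seq_dim, input_dim)` reshape used later in Step 7, with a random tensor standing in for one `DataLoader` batch (an assumption; real batches come from MNIST):

```python
import torch

# Fake batch shaped like one MNIST DataLoader batch: (batch, channels, height, width)
images = torch.rand(100, 1, 28, 28)

seq_dim, input_dim = 28, 28
# Drop the channel axis: each of the 28 pixel rows becomes one time step
sequences = images.view(-1, seq_dim, input_dim)
print(sequences.shape)  # torch.Size([100, 28, 28])

# Time step t feeds row t of every image (28 pixel values) into the RNN
t = 5
assert torch.equal(sequences[:, t, :], images[:, 0, t, :])
```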

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Load MNIST Train Dataset
#### Images from 0 to 9

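```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
# Variable is kept for compatibility with older code; since PyTorch 0.4,
# Variable(t) simply returns a Tensor, so new code can use tensors directly
from torch.autograd import Variable
```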

```python
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())
```

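```python
# .data / .targets replace the deprecated .train_data / .train_labels
print(train_dataset.data.size())
```

```
torch.Size([60000, 28, 28])
```

```python
print(train_dataset.targets.size())
```

```
torch.Size([60000])
```

```python
# .data / .targets replace the deprecated .test_data / .test_labels
print(test_dataset.data.size())
```

```
torch.Size([10000, 28, 28])
```

```python
print(test_dataset.targets.size())
```

```
torch.Size([10000])
```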

### Step 2: Make Dataset Iterable

```python
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
```
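The epoch count follows from simple arithmetic. A quick check of the numbers used above, taking 60000 as the MNIST training set size printed in Step 1:

```python
# Values from Step 2; 60000 is the size of the MNIST training set
train_size = 60000
batch_size = 100
n_iters = 3000

iters_per_epoch = train_size // batch_size               # 600 mini-batches per pass
num_epochs = int(n_iters / (train_size / batch_size))    # 3000 / 600 = 5.0 -> 5

print(iters_per_epoch, num_epochs)  # 600 5
```

Because `int()` truncates, an `n_iters` that is not a multiple of 600 would run slightly fewer than `n_iters` iterations.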

### Step 3: Create Model Class

<img src="./images/10-20.png" width=70%>

```python
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, input_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')

        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        # (layer_dim, batch_size, hidden_dim)
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))

        out, hn = self.rnn(x, h0)

        # Index hidden state of last time step
        # out.size() --> 100, 28, 100    (batch_size, seq_dim, hidden_dim)
        # One forward pass returns the hidden states of all 28 steps
        # out[:, -1, :] --> just want last time step hidden states!
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out
```
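A quick shape sanity check of this class, sketched with random input instead of MNIST images. This re-sketch drops `Variable` and creates `h0` on the input's device and dtype, a small deviation from the class above rather than something it requires. It also checks that `out[:, -1, :]` equals the final hidden state `hn[-1]` for a one-directional RNN.

```python
import torch
import torch.nn as nn

class RNNModel(nn.Module):
    """Re-sketch of the Step 3 class, using plain tensors instead of Variable."""
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # h0: (layer_dim, batch_size, hidden_dim), same device/dtype as x
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device, dtype=x.dtype)
        out, hn = self.rnn(x, h0)
        return self.fc(out[:, -1, :])

model = RNNModel(input_dim=28, hidden_dim=100, layer_dim=1, output_dim=10)
x = torch.randn(100, 28, 28)            # (batch, seq_dim, input_dim)
logits = model(x)
print(logits.shape)                      # torch.Size([100, 10])

# The last time step of `out` is the last layer's final hidden state
out, hn = model.rnn(x, torch.zeros(1, 100, 100))
print(torch.allclose(out[:, -1, :], hn[-1]))  # True
```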

### Step 4: Instantiate Model Class

- 28 time steps
    - Each time step: input dimension = 28
- 1 hidden layer
- MNIST 0-9 digits -> output dimension = 10

```python
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10
```

```python
model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)
```

### Step 5: Instantiate Loss Class
- Recurrent Neural Network: **Cross Entropy Loss**
    - *Convolutional Neural Network*: **Cross Entropy Loss**
    - *Feedforward Neural Network*: **Cross Entropy Loss**
    - *Logistic Regression*: **Cross Entropy Loss**
    - *Linear Regression*: **MSE**

```python
criterion = nn.CrossEntropyLoss()
```
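`nn.CrossEntropyLoss` expects raw logits, which is why `forward` ends with the linear readout and applies no softmax. A sketch with random logits and labels (assumed, not model outputs) showing that it equals log-softmax followed by the averaged negative log-probability of the true class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(100, 10)            # raw scores for 10 digit classes
labels = torch.randint(0, 10, (100,))    # integer class indices 0-9

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Same value by hand: log-softmax -> pick the true class -> negate -> average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(100), labels].mean()

print(torch.allclose(loss, manual))      # True
```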

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_{\theta}$
        - $\theta$: parameters (our variables)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_{\theta}$: parameters' gradients
        
    - Even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - **At every iteration, we update our model's parameters**

```python
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
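The update rule above is what plain `torch.optim.SGD` (no momentum, no weight decay, both its defaults) applies in `step()`. A sketch using a small `nn.Linear` stand-in instead of the RNN, an assumption made only to keep it short:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(28, 10)                 # stand-in model
x, y = torch.randn(100, 28), torch.randint(0, 10, (100,))
learning_rate = 0.1

# Compute gradients once
loss = nn.CrossEntropyLoss()(layer(x), y)
loss.backward()

# parameters = parameters - learning_rate * parameters_gradients, computed by hand first
expected = [p.detach() - learning_rate * p.grad for p in layer.parameters()]

optimizer = torch.optim.SGD(layer.parameters(), lr=learning_rate)
optimizer.step()                          # updates the parameters in place

for p, e in zip(layer.parameters(), expected):
    print(torch.allclose(p.detach(), e))  # True (weight and bias)
```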

### Parameters In-Depth

```python
len(list(model.parameters()))
```

```
6
```

#### Parameters

- Input to Hidden Layer Linear Function
    - A1, B1
- Hidden Layer to Output Linear Function
    - A2, B2
- Hidden Layer to Hidden Layer Linear Function
    - A3, B3

<img src = './images/10-08.png' width=80%>

```python
# Input --> Hidden (A1)
list(model.parameters())[0].size()
```

```
torch.Size([100, 28])
```

```python
# Input --> Hidden BIAS (B1)
list(model.parameters())[2].size()
```

```
torch.Size([100])
```

```python
# Hidden -> Hidden (A3)
list(model.parameters())[1].size()
```

```
torch.Size([100, 100])
```

```python
# Hidden -> Hidden BIAS (B3)
list(model.parameters())[3].size()
```

```
torch.Size([100])
```

```python
# Hidden --> Output (A2)
list(model.parameters())[4].size()
```

```
torch.Size([10, 100])
```

```python
# Hidden --> Output BIAS (B2)
list(model.parameters())[5].size()
```

```
torch.Size([10])
```
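The same mapping can be read from `named_parameters()`, which prints PyTorch's own names next to the A/B labels used here. A sketch that rebuilds only the layers of Model A; `forward` is omitted because only the parameters are inspected:

```python
import torch.nn as nn

class RNNModel(nn.Module):
    # Layers only, matching Model A; no forward() since we just inspect parameters
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(28, 100, 1, batch_first=True, nonlinearity='relu')
        self.fc = nn.Linear(100, 10)

model = RNNModel()

# Labels follow the A/B notation above, in the order PyTorch returns parameters
notation = ['A1', 'A3', 'B1', 'B3', 'A2', 'B2']
for label, (name, p) in zip(notation, model.named_parameters()):
    print(f'{label}: {name:16s} {tuple(p.shape)}')
```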

### Step 7: Train Model

- Process
    1. **Convert inputs/labels to variables**
        - RNN Input: (1, 28) per time step, i.e. (1, 28, 28) for the full 28-step sequence
        - CNN Input: (1, 28, 28)
        - FNN Input: (1, 28\*28)
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT
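For step 1, the main difference between model families is the input shape each expects. A sketch reshaping one MNIST-sized tensor (random values, as an assumption) for each of them:

```python
import torch

# One MNIST image as produced by transforms.ToTensor(): (channels, height, width)
image = torch.rand(1, 28, 28)

rnn_input = image.view(1, 28, 28)     # (batch=1, seq_dim=28, input_dim=28): 28 steps of (1, 28)
cnn_input = image.view(1, 1, 28, 28)  # (batch=1, channels=1, height=28, width=28)
fnn_input = image.view(1, 28 * 28)    # (batch=1, 784): the whole image as one flat vector

for name, t in [('RNN', rnn_input), ('CNN', cnn_input), ('FNN', fnn_input)]:
    print(name, tuple(t.shape))

# A single RNN time step sees one (1, 28) row
print(tuple(rnn_input[:, 0, :].shape))
```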

```python
# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        # reshape images to (batch_size, seq_dim, input_dim)
        images = Variable(images.view(-1, seq_dim, input_dim))
        labels = Variable(labels)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * int(correct) / int(total)

            # Print Loss
            print(f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')
```

```
Iteration: 500, Loss: 1.0912423133850098, Accuracy: 55.66
Iteration: 1000, Loss: 0.7408096194267273, Accuracy: 77.45
Iteration: 1500, Loss: 0.6293580532073975, Accuracy: 80.3
Iteration: 2000, Loss: 0.2889735996723175, Accuracy: 88.31
Iteration: 2500, Loss: 0.17868612706661224, Accuracy: 93.1
Iteration: 3000, Loss: 0.25574353337287903, Accuracy: 92.77
```


## Model B: 2 Hidden Layers (ReLU)

- Unroll 28 time steps
    - Each step input size: 28 x 1
    - Total per unroll: 28 x 28
        - Feedforward Neural Network input size: 28 x 28
- **2 Hidden layers**
- ReLU Activation Function

<img src = './images/10-24.png'>
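In Model B, the second layer's input at each time step is the first layer's 100-dimensional hidden state rather than the 28 pixels, which is why its input-to-hidden matrix is [100, 100]. A sketch, on random input (an assumption), checking that a 2-layer `nn.RNN` behaves like two single-layer RNNs chained together once their weights are copied over:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 28, 28)   # (batch, seq_dim, input_dim)

# Model B's recurrent part: two stacked ReLU layers in one module
stacked = nn.RNN(28, 100, num_layers=2, batch_first=True, nonlinearity='relu')

# The same computation built from two single-layer RNNs
layer1 = nn.RNN(28, 100, num_layers=1, batch_first=True, nonlinearity='relu')
layer2 = nn.RNN(100, 100, num_layers=1, batch_first=True, nonlinearity='relu')
with torch.no_grad():
    for name in ['weight_ih', 'weight_hh', 'bias_ih', 'bias_hh']:
        getattr(layer1, f'{name}_l0').copy_(getattr(stacked, f'{name}_l0'))
        getattr(layer2, f'{name}_l0').copy_(getattr(stacked, f'{name}_l1'))

    out_stacked, _ = stacked(x)   # h0 defaults to zeros
    out1, _ = layer1(x)           # layer 1 reads the pixel rows
    out2, _ = layer2(out1)        # layer 2 reads layer 1's hidden states

print(tuple(stacked.weight_ih_l1.shape))                # (100, 100)
print(torch.allclose(out_stacked, out2, atol=1e-4))     # True
```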

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- **Step 4: Instantiate Model Class**
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')

        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))

        out, hn = self.rnn(x, h0)

        # Index hidden state of last time step
        # out.size() --> 100, 28, 100 -> just want last time step hidden states!
        # 28 time steps, -1 means the last time step
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28
hidden_dim = 100
layer_dim = 2  # ONLY CHANGE IS HERE FROM ONE LAYER TO TWO LAYER
output_dim = 10

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

# JUST PRINTING MODEL & PARAMETERS
print(model)
params = list(model.parameters())
print(len(params))
for param in params:
    print(param.size())

'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''

# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1, seq_dim, input_dim))
        labels = Variable(labels)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * int(correct) / int(total)

            # Print Loss
            print(f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')
```

```
RNNModel(
  (rnn): RNN(28, 100, num_layers=2, batch_first=True)
  (fc): Linear(in_features=100, out_features=10, bias=True)
)
10
torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
Iteration: 500, Loss: 1.0414378643035889, Accuracy: 61.92
Iteration: 1000, Loss: 0.7887875437736511, Accuracy: 75.99
Iteration: 1500, Loss: 0.6028361916542053, Accuracy: 87.92
Iteration: 2000, Loss: 0.2331782877445221, Accuracy: 92.96
Iteration: 2500, Loss: 0.14795342087745667, Accuracy: 95.41
Iteration: 3000, Loss: 0.14736664295196533, Accuracy: 95.72
```


- **10 sets of parameters**
- First hidden layer
    - $A_1$ = [100, 28]
    - $A_3$ = [100, 100]
    - $B_1$ = [100]
    - $B_3$ = [100]
- Second hidden layer
    - $A_2$ = [100, 100]
    - $A_5$ = [100, 100]
    - $B_2$ = [100]
    - $B_5$ = [100]
- Readout layer
    - $A_4$ = [10, 100]
    - $B_4$ = [10]

<img src = './images/10-14.png'>

## Model C: 2 Hidden Layers (Tanh)
- Unroll 28 time steps
    - Each step input size: 28 x 1
    - Total per unroll: 28 x 28
        - Feedforward Neural Network input size: 28 x 28
- 2 Hidden Layers
- **Tanh** Activation Function

<img src = './images/10-11.png' width=80%>
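The only code difference from Model B is `nonlinearity='tanh'`. A sketch on random input (an assumption) showing what that changes in the hidden states: ReLU values are non-negative and unbounded above, while tanh keeps them within [-1, 1] and allows negative values. These runs alone do not establish that this is why Model C scores higher.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 28, 28)   # random stand-in for 4 images

rnn_relu = nn.RNN(28, 100, num_layers=2, batch_first=True, nonlinearity='relu')
rnn_tanh = nn.RNN(28, 100, num_layers=2, batch_first=True, nonlinearity='tanh')

with torch.no_grad():
    out_relu, _ = rnn_relu(x)
    out_tanh, _ = rnn_tanh(x)

# ReLU: every hidden value is >= 0, with no upper bound
print(out_relu.min().item(), out_relu.max().item())
# tanh: every hidden value lies in [-1, 1]
print(out_tanh.min().item(), out_tanh.max().item())
```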

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- **Step 3: Create Model Class**
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        # ONLY CHANGE FROM MODEL B: nonlinearity='tanh' instead of 'relu'
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='tanh')

        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))

        out, hn = self.rnn(x, h0)

        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want the last time step hidden states!
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

input_dim = 28
hidden_dim = 100
layer_dim = 2  # same two layers as Model B
output_dim = 10

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

# JUST PRINTING MODEL & PARAMETERS
print(model)
params = list(model.parameters())
print(len(params))
for param in params:
    print(param.size())

'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''

# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1, seq_dim, input_dim))
        labels = Variable(labels)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * int(correct) / int(total)

            # Print Loss
            print(f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')
```

```
RNNModel(
  (rnn): RNN(28, 100, num_layers=2, batch_first=True)
  (fc): Linear(in_features=100, out_features=10, bias=True)
)
10
torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
Iteration: 500, Loss: 0.6221222281455994, Accuracy: 81.21
Iteration: 1000, Loss: 0.33941054344177246, Accuracy: 92.01
Iteration: 1500, Loss: 0.15524081885814667, Accuracy: 94.41
Iteration: 2000, Loss: 0.20867837965488434, Accuracy: 94.64
Iteration: 2500, Loss: 0.08282461017370224, Accuracy: 95.65
Iteration: 3000, Loss: 0.07223621755838394, Accuracy: 96.13
```


# Summary of Results

| |Model A|Model B|Model C|
|---|---|---|---|
|Activation|ReLU|ReLU|Tanh|
|Hidden layers|1|2|2|
|Hidden units per layer|100|100|100|
|Test accuracy (iteration 3000)|92.77%|95.72%|96.13%|

# Deep Learning
- 2 ways to expand a recurrent neural network
    - More non-linear activation units (neurons)
    - More hidden layers
- Cons
    - Need a larger dataset
        - Curse of dimensionality
    - Does not necessarily mean higher accuracy
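To make "more neurons" and "more layers" concrete, the parameter count of these models can be written down directly from the shapes printed earlier. With input size $I$, hidden size $H$, $L$ layers and $O$ classes, the first layer has $H(I + H + 2)$ parameters, each additional layer $H(2H + 2)$, and the readout $O(H + 1)$: 14,010 for Model A and 34,210 for Models B and C. A sketch that checks the formula against PyTorch modules:

```python
import torch.nn as nn

def rnn_param_count(input_dim, hidden_dim, layer_dim, output_dim):
    """Closed form: H(I+H+2) + (L-1)H(2H+2) + O(H+1)."""
    first_layer = hidden_dim * (input_dim + hidden_dim + 2)
    other_layers = (layer_dim - 1) * hidden_dim * (2 * hidden_dim + 2)
    readout = output_dim * (hidden_dim + 1)
    return first_layer + other_layers + readout

def torch_param_count(input_dim, hidden_dim, layer_dim, output_dim):
    """Same count, read off actual PyTorch modules."""
    rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True)
    fc = nn.Linear(hidden_dim, output_dim)
    return sum(p.numel() for m in (rnn, fc) for p in m.parameters())

for layers in (1, 2):
    print(layers, rnn_param_count(28, 100, layers, 10), torch_param_count(28, 100, layers, 10))
# 1 14010 14010
# 2 34210 34210

print(rnn_param_count(28, 200, 1, 10))  # 48010: doubling the hidden units of Model A
```

Both directions grow the count roughly quadratically in $H$, which is one way to read the "need a larger dataset" point above.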

# 3. Building a Recurrent Neural Network with PyTorch (GPU)

## Model C: 2 Hidden Layer (Tanh)

<img src = './images/10-11.png'>

GPU: 2 things must be on the GPU
- the model
- the variables (input and label tensors, plus the initial hidden state `h0`)
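A device-agnostic alternative to the `if torch.cuda.is_available(): ... .cuda()` branches in the code below is to pick a `torch.device` once and call `.to(device)`. A sketch with random data in place of MNIST (an assumption); it falls back to the CPU when no GPU is present:

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 1. The model's parameters move to the device
rnn = nn.RNN(28, 100, num_layers=2, batch_first=True, nonlinearity='tanh').to(device)
fc = nn.Linear(100, 10).to(device)

# 2. The data tensors move to the same device
images = torch.rand(100, 1, 28, 28).view(-1, 28, 28).to(device)
labels = torch.randint(0, 10, (100,)).to(device)

# The initial hidden state must also live on that device
h0 = torch.zeros(2, images.size(0), 100, device=device)
out, _ = rnn(images, h0)
logits = fc(out[:, -1, :])
print(logits.device, logits.shape)
```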

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- **Step 3: Create Model Class**
- **Step 4: Instantiate Model Class**
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- **Step 7: Train Model**

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='tanh')

        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).cuda())
        else:
            h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))

        # Run the RNN over all 28 time steps
        out, hn = self.rnn(x, h0)

        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states!
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28
hidden_dim = 100
layer_dim = 2  # same configuration as Model C on the CPU
output_dim = 10

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''

# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, seq_dim, input_dim).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, seq_dim, input_dim))
            labels = Variable(labels)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                if torch.cuda.is_available():
                    images = Variable(images.view(-1, seq_dim, input_dim).cuda())
                else:
                    images = Variable(images.view(-1, seq_dim, input_dim))

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()

            accuracy = 100 * int(correct) / int(total)

            # Print Loss
            print(f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')
```

```
Iteration: 500, Loss: 0.5127013921737671, Accuracy: 82.92
Iteration: 1000, Loss: 0.5317586660385132, Accuracy: 87.22
Iteration: 1500, Loss: 0.22399835288524628, Accuracy: 94.15
Iteration: 2000, Loss: 0.2254912406206131, Accuracy: 93.97
Iteration: 2500, Loss: 0.14875905215740204, Accuracy: 95.42
Iteration: 3000, Loss: 0.25676196813583374, Accuracy: 95.15
```


# Summary

- **Feedforward Neural Networks** Transition to Recurrent Neural Networks
- **RNN Models** in PyTorch
    - Model A: 1 Hidden Layer RNN (ReLU)
    - Model B: 2 Hidden Layer RNN (ReLU)
    - Model C: 2 Hidden Layer RNN (Tanh)
- Model Variations in **Code**
    - Model A to Model B: only Step 4 changes (`layer_dim` from 1 to 2)
    - Model B to Model C: only Step 3 changes (`nonlinearity='tanh'`)
- Ways to Expand Model's **Capacity**
    - More non-linear activation units (**neurons**)
    - More hidden **layers**
- **Cons** of Expanding Capacity
    - Need more **data**
    - Does not necessarily mean higher **accuracy**
- **GPU** Code
    - 2 things on GPU
        - **model**
        - **variables**
    - Modifying only **Steps 3, 4 and 7**
- **7-Step** Model Building Recap
    - Step 1: Load Dataset
    - Step 2: Make Dataset Iterable
    - **Step 3: Create Model Class**
    - **Step 4: Instantiate Model Class**
    - Step 5: Instantiate Loss Class
    - Step 6: Instantiate Optimizer Class
    - **Step 7: Train Model**