### PyTorch Tutorial - RNN & LSTM & GRU - Recurrent Neural Nets
https://www.youtube.com/watch?v=0_PgWWmauHk

(From Python Engineer's YouTube lectures)

The earlier tutorial was on RNNs:
<img src="images/rnn-unfolded.png">

In the previous tutorial, we developed our own RNN from scratch. Here we use PyTorch's built-in RNN module.

We can then easily switch from RNN to LSTM and GRU.
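To make the "easy switch" concrete, here is a minimal sketch showing that `nn.RNN`, `nn.GRU` and `nn.LSTM` accept the same constructor arguments and the same input. The sizes and the random input tensor are assumptions chosen for illustration. The one difference to watch for is that `nn.LSTM` returns its state as a tuple `(h_n, c_n)`, while the other two return only `h_n`.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions for this sketch)
batch, seq_len, n_in, n_hidden, n_layers = 4, 28, 28, 128, 2
x = torch.randn(batch, seq_len, n_in)  # (batch, seq, feature) because batch_first=True

shapes = {}
for name, cls in [("rnn", nn.RNN), ("gru", nn.GRU), ("lstm", nn.LSTM)]:
    layer = cls(n_in, n_hidden, n_layers, batch_first=True)
    out, state = layer(x)  # the initial state defaults to zeros when omitted
    # nn.LSTM returns a tuple (h_n, c_n); nn.RNN and nn.GRU return just h_n
    is_tuple = isinstance(state, tuple)
    h_n = state[0] if is_tuple else state
    shapes[name] = (tuple(out.shape), tuple(h_n.shape), is_tuple)
    print(name, shapes[name])
```

In all three cases `out` has shape `(batch, seq_len, hidden_size)` and `h_n` has shape `(num_layers, batch, hidden_size)`, so the code that consumes `out` does not need to change when the module is swapped.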

We start with our tutorial 13 code, which does image classification using an NN. Here, however, we will use an RNN to do the image classification. **Generally we never use an RNN for image classification; we do it here only to demonstrate the usage, treating the input as a sequence and setting up the correct shapes.**


We will see that we can achieve good accuracy using RNNs.
<img src="images/rnn-applications.png">

Here we will use the many-to-one setup: a whole sequence goes in and a single class prediction comes out.

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt


Earlier we used 28x28 = 784 as the input size: we flattened each image into a vector of size 784.
Here we instead treat the image as a sequence of 28 vectors of dimension 28 each (one row per time step), so we set input_size = 28 and sequence_length = 28.

We set hidden_size = 128 (any other size could be chosen).


num_layers = 2 means that we stack 2 RNNs together, so that the second RNN takes the first RNN's output as its input.
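The stacking described above can be checked with a small sketch: a 2-layer `nn.RNN` should give the same output as two 1-layer RNNs chained by hand, once the weights are copied over. The random batch stands in for MNIST images, and the names `stacked`, `first` and `second` are hypothetical, introduced only for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a small MNIST batch: (N, channels, height, width)
images = torch.randn(8, 1, 28, 28)
# Each image becomes a sequence of 28 rows with 28 features per row
seq = images.reshape(-1, 28, 28)

stacked = nn.RNN(28, 128, num_layers=2, batch_first=True)
first = nn.RNN(28, 128, num_layers=1, batch_first=True)
second = nn.RNN(128, 128, num_layers=1, batch_first=True)

# Copy layer 0 of the stacked RNN into `first` and layer 1 into `second`
with torch.no_grad():
    for kind in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(first, f"{kind}_l0").copy_(getattr(stacked, f"{kind}_l0"))
        getattr(second, f"{kind}_l0").copy_(getattr(stacked, f"{kind}_l1"))

out_stacked, _ = stacked(seq)
out_first, _ = first(seq)           # output of layer 0 at every time step
out_chained, _ = second(out_first)  # layer 1 consumes layer 0's outputs

# The two results are expected to agree up to floating-point error
matches = torch.allclose(out_stacked, out_chained, atol=1e-5)
print(seq.shape, out_stacked.shape, matches)
```

Note that `out` only exposes the top layer's hidden states, which is why the classifier later needs just `hidden_size` input features regardless of `num_layers`.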


In [2]:
# device config
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"device:{device}")

# hyperparameters
# 28 x 28 : image size - we treat each row as one time step (see input_size and sequence_length below)
#input_size = 784 
#hidden_size = 100
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

input_size = 28
sequence_length = 28
hidden_size = 128
num_layers = 2

# MNIST
train_dataset = torchvision.datasets.MNIST(root = "./data", 
                                           train = True,
                                           transform = transforms.ToTensor(),
                                           download = True)

test_dataset = torchvision.datasets.MNIST(root = "./data", 
                                          train = False,
                                          transform = transforms.ToTensor(),
                                          download= True)

# we shuffle the training data
train_loader = torch.utils.data.DataLoader(dataset = train_dataset, 
                                           batch_size = batch_size,
                                           shuffle = True)

# we don't shuffle the test data
test_loader = torch.utils.data.DataLoader(dataset = test_dataset, 
                                          batch_size = batch_size,
                                          shuffle = False)



device:cpu


In [3]:
class MYMODEL(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, model_type):
        super(MYMODEL, self).__init__()
        
        # remember which recurrent module was requested: "rnn", "gru" or "lstm"
        self.model_type = model_type
        
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # we create the built-in recurrent module
        # batch_first = True means the batch is the first dimension, so the input
        # x must have shape (batch_size, sequence_length, input_size)
        if model_type == "rnn":
            self.mod = nn.RNN(input_size, hidden_size, num_layers, batch_first = True)
        elif model_type == "gru":
            self.mod = nn.GRU(input_size, hidden_size, num_layers, batch_first = True)
        elif model_type == "lstm":
            self.mod = nn.LSTM(input_size, hidden_size, num_layers, batch_first = True)
        else:
            # fail early instead of leaving self.mod undefined
            raise ValueError(f"unknown model_type: {model_type}")

        # we classify from the last time step only, so the Linear layer
        # maps hidden_size -> num_classes
        self.fc = nn.Linear(hidden_size, num_classes)
        
        
    def forward(self, x):
        #print(f"x.size():{x.size()}")
        # initial hidden state: (num_layers, batch_size, hidden_size), on the same device as x
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        
        # call our recurrent module
        if self.model_type in ["rnn", "gru"]:
            out, _ = self.mod(x, h0)
        elif self.model_type == "lstm":
            # an LSTM also needs an initial cell state
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            out, _ = self.mod(x, (h0, c0))
            
        # out has size (batch_size, seq_length, hidden_size): (N, 28, 128)
        # we only keep the last time step
        out = out[:,-1, :] 
        # now out is (N, 128)
        out = self.fc(out)
        return out
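As a sanity check on the "last time step" slicing in `forward`, the sketch below (with an assumed random input and a standalone `nn.LSTM`, not the trained model) shows that for a unidirectional module `out[:, -1, :]` is the same tensor as the final hidden state of the top layer, `h_n[-1]`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(28, 128, 2, batch_first=True)
fc = nn.Linear(128, 10)

x = torch.randn(8, 28, 28)   # (batch_size, sequence_length, input_size)
out, (h_n, c_n) = lstm(x)    # out: (8, 28, 128), h_n and c_n: (2, 8, 128)

last_step = out[:, -1, :]    # hidden state of the top layer at the final time step
same = torch.allclose(last_step, h_n[-1])

logits = fc(last_step)       # one score per class: (8, 10)
print(last_step.shape, same, logits.shape)
```

So the classifier could equally be fed `h_n[-1]`; slicing `out` is used here because it works unchanged for all three module types.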

In [4]:
#model = MYMODEL(input_size, hidden_size, num_layers, num_classes, "rnn")
#model = MYMODEL(input_size, hidden_size, num_layers, num_classes, "gru")
model = MYMODEL(input_size, hidden_size, num_layers, num_classes, "lstm").to(device)


# loss and optimizer
criterion = nn.CrossEntropyLoss()  # applies softmax internally, so the model outputs raw scores
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

# training loop
n_total_steps = len(train_loader)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # original shape: [100, 1, 28, 28]
        # we need: [100, 28, 28]
        images = images.reshape(-1, sequence_length, input_size).to(device) # moves to GPU if available
        labels = labels.to(device)
        
        # forward
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # backward
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        # The tutorial says zero_grad() can be placed at different points in the loop,
        # as long as it is called before the next backward().
        # Reason: backward() adds to the stored gradients instead of overwriting them,
        # and optimizer.step() uses whatever gradients are stored at that moment.
        # So zero_grad() can go at the start of the iteration or after step() (as here),
        # but not between backward() and step(): that would discard the gradients
        # just computed, and step() would have nothing useful to apply.
        
        
        if (i + 1)% 100 == 0:
            print(f'epoch {epoch + 1}/{num_epochs}, step {i+1}/{n_total_steps} loss={loss.item():.4f}')
            
        
        


epoch 1/5, step 100/600 loss=1.0599
epoch 1/5, step 200/600 loss=0.4493
epoch 1/5, step 300/600 loss=0.2735
epoch 1/5, step 400/600 loss=0.1501
epoch 1/5, step 500/600 loss=0.1524
epoch 1/5, step 600/600 loss=0.1294
epoch 2/5, step 100/600 loss=0.1233
epoch 2/5, step 200/600 loss=0.0719
epoch 2/5, step 300/600 loss=0.1413
epoch 2/5, step 400/600 loss=0.1121
epoch 2/5, step 500/600 loss=0.0468
epoch 2/5, step 600/600 loss=0.1027
epoch 3/5, step 100/600 loss=0.0365
epoch 3/5, step 200/600 loss=0.1134
epoch 3/5, step 300/600 loss=0.0917
epoch 3/5, step 400/600 loss=0.0475
epoch 3/5, step 500/600 loss=0.0713
epoch 3/5, step 600/600 loss=0.1000
epoch 4/5, step 100/600 loss=0.0503
epoch 4/5, step 200/600 loss=0.0952
epoch 4/5, step 300/600 loss=0.1121
epoch 4/5, step 400/600 loss=0.0179
epoch 4/5, step 500/600 loss=0.0315
epoch 4/5, step 600/600 loss=0.0311
epoch 5/5, step 100/600 loss=0.0288
epoch 5/5, step 200/600 loss=0.0132
epoch 5/5, step 300/600 loss=0.0158
epoch 5/5, step 400/600 loss
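The note about `zero_grad()` ordering in the training loop can be checked on a toy problem. This is a minimal sketch with a single hypothetical parameter `w` and plain SGD (assumptions made for illustration, not the model above): it shows that gradients accumulate across `backward()` calls, and that clearing them between `backward()` and `step()` leaves the parameter unchanged.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# Gradients accumulate: two backward() calls without zero_grad() in between
(w * 2).sum().backward()
g1 = w.grad.clone()          # d(2w)/dw = 2
(w * 2).sum().backward()
g2 = w.grad.clone()          # 2 + 2 = 4, added to the previous gradient
opt.zero_grad()

# zero_grad() between backward() and step(): the update is lost
(w * 2).sum().backward()
opt.zero_grad()
before = w.detach().clone()
opt.step()
unchanged = torch.equal(w.detach(), before)

# Correct order: backward(), step(), then zero_grad()
(w * 2).sum().backward()
opt.step()                   # w <- 1.0 - 0.1 * 2
opt.zero_grad()
after = w.detach().clone()

print(g1.item(), g2.item(), unchanged, after.item())
```

This is why the loop above is safe with `zero_grad()` either first or last in the iteration, but not in the middle.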

In [6]:
# test
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device) # moves to GPU if available
        labels = labels.to(device)
        outputs = model(images)
        
        # torch.max returns (values, indices); the index is the predicted class
        _, predictions = torch.max(outputs, 1) # along dimension 1 (the class dimension)
        n_samples += labels.shape[0]
        n_correct += (predictions == labels).sum().item()
    
    acc = 100.0 * n_correct/n_samples
    print(f'accuracy = {acc}')

accuracy = 98.56
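The accuracy computation in the test cell can be traced by hand on a tiny example. The scores and labels below are made up for illustration; the steps mirror the cell above.

```python
import torch

# Made-up scores for 4 samples and 3 classes
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 3.0],
                        [2.0, 1.0, 0.0]])
labels = torch.tensor([1, 0, 2, 1])

# torch.max along dim 1 returns the largest score and its index per row
values, predictions = torch.max(outputs, 1)      # predictions: [1, 0, 2, 0]

n_samples = labels.shape[0]
n_correct = (predictions == labels).sum().item()  # the last sample is wrong
acc = 100.0 * n_correct / n_samples

print(predictions.tolist(), n_correct, acc)       # [1, 0, 2, 0] 3 75.0
```

Since only the index is used, `torch.argmax(outputs, dim=1)` would give the same predictions.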


So we get the following test accuracy for each model type:

* rnn = 96%

* gru = 98%

* lstm = 98.56%
