# Introduction to recurrent neural networks

Recurrent neural networks (RNNs) are neural networks in which connections between nodes form a directed graph along a temporal sequence. This allows them to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs.
Recurrent neural networks were based on David Rumelhart's work in 1986.

In [1]:
import torch
import torch.nn as nn 
import torch.optim as optim 
import torch.nn.functional as F  

from torch.utils.data import DataLoader

import torchvision.datasets as datasets
import torchvision.transforms as transforms

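Let's define the hyperparameters. Each 28×28 MNIST image is treated as a sequence of 28 rows (`sequence_length`) with 28 pixels per row (`input_size`):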

In [2]:
input_size = 28
hidden_size = 256
num_layers = 2
num_classes = 10
sequence_length = 28
learning_rate = 0.001
batch_size = 64
num_epochs = 3

Load data:

In [3]:
train_dataset = datasets.MNIST('', train=True,
                               transform=transforms.ToTensor(), 
                               download=True)


test_dataset = datasets.MNIST('', train=False,
                               transform=transforms.ToTensor(), 
                               download=True)


train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
# Shuffling is not needed for evaluation; accuracy does not depend on batch order
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
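
To make the data layout concrete, here is a minimal sketch of how one batch is turned into a batch of sequences. It uses a random tensor as a stand-in for a real batch from the loader (an assumption made so the example runs without downloading MNIST); the names `images` and `sequences` are illustrative only.

```python
import torch

batch_size, sequence_length, input_size = 64, 28, 28

# Random stand-in for one batch from the DataLoader:
# (batch, channel, height, width), as produced by transforms.ToTensor() on MNIST.
images = torch.rand(batch_size, 1, 28, 28)

# Dropping the single channel dimension leaves (batch, rows, pixels per row),
# i.e. a sequence of 28 steps with 28 features each.
sequences = images.squeeze(1)
print(sequences.shape)  # torch.Size([64, 28, 28])
```

This is the shape that a recurrent layer created with `batch_first=True` expects: `(batch, sequence_length, input_size)`.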


``nn.RNN`` applies a multi-layer Elman RNN with a tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

$$h_{t} = \tanh(W_{ih} x_{t} + b_{ih} + W_{hh} h_{t-1} + b_{hh})$$

where $h_{t}$ is the hidden state at time $t$, $x_{t}$ is the input at time $t$, and $h_{t-1}$ is the hidden state at time $t-1$, or the initial hidden state at time 0. If `nonlinearity` is `'relu'`, then ReLU is used instead of tanh.
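
As a sanity check on the recurrence, the sketch below applies the update by hand, step by step, and compares the result with what a single-layer `nn.RNN` returns. The sizes (4 input features, 3 hidden units, 5 time steps) are arbitrary values chosen for illustration and are not the ones used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small, arbitrary sizes chosen only for this illustration.
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1, batch_first=True)
x = torch.randn(2, 5, 4)   # (batch, sequence, features)
h = torch.zeros(2, 3)      # initial hidden state h_0

manual_steps = []
with torch.no_grad():
    for t in range(x.size(1)):
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h = torch.tanh(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )
        manual_steps.append(h)
    manual_out = torch.stack(manual_steps, dim=1)  # (batch, sequence, hidden)
    module_out, h_n = rnn(x)                       # h_0 defaults to zeros

matches = torch.allclose(manual_out, module_out, atol=1e-5)
print(matches)
```

The same hidden state `h` is fed back into every step, which is what gives the network its memory of earlier inputs.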

In [4]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
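        # Initial hidden state: (num_layers, batch, hidden_size), on the same device as x
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        # out holds the last layer's hidden state at every time step:
        # (batch, sequence_length, hidden_size)
        out, _ = self.rnn(x, h0)
        # Flatten all time steps into one feature vector per sample
        out = out.reshape(out.shape[0], -1)

        # Map the concatenated hidden states to class scores
        out = self.fc(out)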
        
        return out
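
The shapes involved in the forward pass can be checked with a short sketch. It feeds a random batch (a stand-in for real MNIST data) through an `nn.RNN` configured with this notebook's hyperparameters and shows why the linear layer takes `hidden_size * sequence_length` inputs. The `last_step` variant at the end is a common alternative, not what the model above does.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=28, hidden_size=256, num_layers=2, batch_first=True)
x = torch.rand(64, 28, 28)  # random stand-in batch: (batch, sequence, features)

out, h_n = rnn(x)
print(out.shape)   # torch.Size([64, 28, 256]): last layer's state at every step
print(h_n.shape)   # torch.Size([2, 64, 256]): final state of each layer

# Flattening keeps all 28 time steps: 28 * 256 = 7168 features per sample.
flat = out.reshape(out.shape[0], -1)
print(flat.shape)  # torch.Size([64, 7168])

# Alternative (not used above): keep only the last time step, 256 features.
last_step = out[:, -1, :]
print(last_step.shape)  # torch.Size([64, 256])
```

Using every time step gives the classifier more information at the cost of a much larger linear layer; using only the last step would require changing `nn.Linear` to take `hidden_size` inputs.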

Set the device:

In [5]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Initialize network:

In [6]:
model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)

Let's define the loss function and the optimizer:

In [7]:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

## Training

In [8]:
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Move the batch to the device and drop the channel dimension:
        # (N, 1, 28, 28) -> (N, 28, 28)
        data = data.to(device=device).squeeze(1)
        targets = targets.to(device=device)

        # forward
        scores = model(data)
        loss = loss_function(scores, targets)

        # backward
        optimizer.zero_grad()
        loss.backward()

        # gradient descent or adam step
        optimizer.step()
    # Loss on the last mini-batch of this epoch
    print(loss)

tensor(0.3566, grad_fn=<NllLossBackward>)
tensor(0.2039, grad_fn=<NllLossBackward>)
tensor(0.0368, grad_fn=<NllLossBackward>)
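
The training loop prints `loss` after the inner loop, so each printed value is the loss of the final mini-batch of that epoch, which can be noisy. A possible refinement is to report the mean loss over the whole epoch. The sketch below shows the idea on tiny synthetic data with a linear model; `toy_dataset`, `toy_loader`, `toy_model` and `epoch_losses` are hypothetical names introduced for this example only.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 256 samples, 10 features, 3 classes.
toy_dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 3, (256,)))
toy_loader = DataLoader(toy_dataset, batch_size=64, shuffle=True)

toy_model = nn.Linear(10, 3)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for data, targets in toy_loader:
        scores = toy_model(data)
        loss = loss_function(scores, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # .item() returns a plain Python float; weight by batch size so the
        # epoch mean is correct even if the last batch is smaller.
        running_loss += loss.item() * data.size(0)

    epoch_losses.append(running_loss / len(toy_dataset))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```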


Let's check the accuracy on the training and test sets to see how well the model performs:

In [9]:
def check_accuracy(loader, model):
    if loader.dataset.train:
        print("Accuracy on training data:")
    else:
        print("Accuracy on test data:")

    num_correct = 0
    num_samples = 0

    # Set model to eval
    model.eval()

    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device).squeeze(1)
            y = y.to(device=device)

            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)

        print(
            f"Got {num_correct} / {num_samples} with accuracy {float(num_correct)/float(num_samples)*100:.2f}"
        )
    # Set model back to train
    model.train()


check_accuracy(train_loader, model)
check_accuracy(test_loader, model)


Accuracy on training data:
Got 58326 / 60000 with accuracy 97.21
Accuracy on test data:
Got 9748 / 10000 with accuracy 97.48
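
To see what `scores.max(1)` does inside `check_accuracy`, here is a small worked example with hand-written scores for four samples and three classes. The numbers are made up for illustration.

```python
import torch

# One row of class scores per sample (made-up values).
scores = torch.tensor([
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.0, 1.0, 0.5],
])
targets = torch.tensor([1, 0, 2, 1])

# max(1) returns (largest value, index of largest value) along the class axis;
# the index is the predicted class.
_, predictions = scores.max(1)
num_correct = (predictions == targets).sum().item()
accuracy = num_correct / targets.size(0) * 100

print(predictions.tolist())  # [1, 0, 2, 0]
print(f"{accuracy:.2f}")     # 75.00
```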
