In this notebook, we use PyTorch to build a recurrent neural network (RNN) with a single layer consisting of a single neuron.

Import the necessary libraries:

In [26]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import os
import numpy as np
import torchvision
import torchvision.transforms as transforms

# RNN with a Single Neuron
***
Here is the implementation of a simple one-layer, single-neuron RNN. We initialize two weight matrices, `Wx` and `Wy`, with values drawn from a standard normal distribution. `Wx` contains the connection weights for the inputs of the current time step. `Wy` contains the connection weights for the outputs of the previous time step. The `forward` function computes two outputs, one for each time step.
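
Written as equations, and assuming the shapes used in this notebook (a batch of 4 instances, so $X_0$ and $X_1$ each have 4 rows), the two time steps described above are:

$$
\begin{aligned}
Y_0 &= \tanh\left(X_0 W_x + b\right) \\
Y_1 &= \tanh\left(Y_0 W_y + X_1 W_x + b\right)
\end{aligned}
$$

The same $W_x$, $W_y$ and $b$ are reused at both time steps; only $Y_1$ depends on a previous output.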

<img src="img/RNN_pic_1.png" alt="Alt text that describes the graphic" title="Title text" />

In [5]:
class SingleRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(SingleRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons)  # 4x1: weights for the current input
        self.Wy = torch.randn(n_neurons, n_neurons) # 1x1: weights for the previous output
        self.b = torch.zeros(1, n_neurons)          # 1x1: bias, broadcast over the batch
    def forward(self, X0, X1):
        self.Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b) # 4x1
        self.Y1 = torch.tanh(torch.mm(self.Y0, self.Wy) + torch.mm(X1, self.Wx) + self.b) # 4x1
        return self.Y0, self.Y1

<img src="img/single_layer_RNN_architecture.png" alt="Alt text that describes the graphic" title="Title text" />

For the input, we use a batch of 4 instances. Each instance is a sequence of two time steps, with 4 features per time step.

Here is an example of how to use the model:

<img src="img/data_feed_into_RNN.png" alt="Alt text that describes the graphic" title="Title text" />

In [6]:
X0_batch = torch.tensor([[0,1,2,0], [3,4,5,0],\
                        [6,7,8,0],[9,0,1,0]],\
                       dtype=torch.float) # t=0 -> 4x4
X1_batch = torch.tensor([[9,8,7,0],[0,0,0,0],\
                        [6,5,4,0],[3,2,1,0]],\
                       dtype=torch.float) # t=1 -> 4x4

N_INPUT = 4
N_NEURONS = 1
model = SingleRNN(N_INPUT, N_NEURONS)

Y0_val, Y1_val = model(X0_batch, X1_batch)

This produces the outputs `Y0` and `Y1`, one per time step. Each has size 4x1, where 4 is the batch size and 1 is the number of hidden units:

In [9]:
print(Y0_val)
print(Y1_val)

tensor([[0.5996],
        [0.9998],
        [1.0000],
        [1.0000]])
tensor([[ 1.0000],
        [-0.8045],
        [ 1.0000],
        [ 0.9758]])
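
Several of the printed values are exactly `1.0000`. A likely reason is that `tanh` saturates: with inputs as large as 9 and weights drawn from a standard normal distribution, the pre-activation values can easily land several units away from zero. The standalone sketch below uses illustrative values of `z` (they are not taken from the model above) to show how quickly `tanh` reaches 1 at four decimal places:

```python
import math

# Illustrative pre-activation values, not taken from the model above.
zs = [0.5, 1.0, 2.0, 4.0, 6.0, 10.0]

for z in zs:
    print(f"z = {z:5.1f}  tanh(z) = {math.tanh(z):.4f}")

# Values of z whose tanh prints as 1.0000 at four decimal places.
saturated = [z for z in zs if f"{math.tanh(z):.4f}" == "1.0000"]
print(saturated)
```

Once a unit is saturated, its gradient is close to zero, which is one reason inputs are usually scaled before being fed to an RNN.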


# RNN with Multiple Neurons
***
Now we can increase the number of neurons in the RNN layer. Because `n_inputs` and `n_neurons` are already arguments of the constructor, we don't have to change anything about our class except its name.

<img src="img/simple_RNN_n_neurons.png" alt="Alt text that describes the graphic" title="Title text" />

In [10]:
class BasicRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(BasicRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons)  # n_inputs x n_neurons
        self.Wy = torch.randn(n_neurons, n_neurons) # n_neurons x n_neurons
        self.b = torch.zeros(1, n_neurons)          # 1 x n_neurons
    def forward(self, X0, X1):
        self.Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b) # batch_size x n_neurons
        self.Y1 = torch.tanh(torch.mm(self.Y0, self.Wy) + torch.mm(X1, self.Wx) + self.b) # batch_size x n_neurons
        return self.Y0, self.Y1

In [11]:
N_INPUT = 3 # number of features in input
N_NEURONS = 5 # number of units in layer

X0_batch = torch.tensor([[0,1,2], [3,4,5],[6,7,8], [9,0,1]],dtype=torch.float) # t=0 -> 4x3
X1_batch = torch.tensor([[9,8,7], [0,0,0], [6,5,4], [3,2,1]], dtype=torch.float) # t=1 -> 4x3

model = BasicRNN(N_INPUT, N_NEURONS)

Y0_val, Y1_val = model(X0_batch, X1_batch)

In [12]:
print(Y0_val)
print(Y1_val)

tensor([[-0.8361,  0.4522,  0.9692, -0.0313, -0.4367],
        [ 0.3682, -0.6885,  0.9996, -0.7780,  0.7924],
        [ 0.9626, -0.9746,  1.0000, -0.9673,  0.9895],
        [ 0.9753, -1.0000, -1.0000, -0.9998,  1.0000]])
tensor([[ 1.0000, -0.9103,  1.0000, -0.9978,  1.0000],
        [ 0.0450,  0.4532,  0.6085,  0.8330, -0.9481],
        [ 0.9992, -0.9964,  0.9739, -0.3256,  0.5406],
        [ 0.9995, -1.0000, -0.9988,  0.3133, -0.9926]])
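
One caveat about `SingleRNN` and `BasicRNN` as written: tensors created with `torch.randn` and stored as plain attributes are not registered with the module, so `model.parameters()` is empty and an optimizer would have nothing to update. If you wanted to train such a layer, one option is to wrap the tensors in `nn.Parameter`. The class name `TrainableBasicRNN` below is a hypothetical one chosen for this sketch:

```python
import torch
import torch.nn as nn

class TrainableBasicRNN(nn.Module):  # hypothetical name, not part of the notebook
    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        # nn.Parameter registers each tensor so it shows up in model.parameters()
        self.Wx = nn.Parameter(torch.randn(n_inputs, n_neurons))
        self.Wy = nn.Parameter(torch.randn(n_neurons, n_neurons))
        self.b = nn.Parameter(torch.zeros(1, n_neurons))

    def forward(self, X0, X1):
        Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b)
        Y1 = torch.tanh(torch.mm(Y0, self.Wy) + torch.mm(X1, self.Wx) + self.b)
        return Y0, Y1

model = TrainableBasicRNN(3, 5)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 3*5 + 5*5 + 5 = 45

X0 = torch.zeros(4, 3)
X1 = torch.ones(4, 3)
Y0, Y1 = model(X0, X1)
print(Y0.shape, Y1.shape)
```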


# PyTorch Built-in RNN Cell
***
With the RNN implemented as above, we have to compute the output of every time step individually. For an architecture that supports very long input and output sequences, the number of lines of code needed to implement the computation graph would grow with the number of time steps.

Instead, we can implement this RNN architecture more efficiently and cleanly using the built-in `RNNCell` module:

In [14]:
rnn = nn.RNNCell(3,5) # n_inputs x n_neurons

X_batch = torch.tensor([[[0,1,2], [3,4,5], \
                         [6,7,8], [9,0,2]], \
                        [[9,8,7], [0,0,0], \
                         [6,5,4], [3,2,1]]], dtype=torch.float)

hx = torch.randn(4,5) # m x n_neurons
output = []

# for each time step
for i in range(2):
    hx = rnn(X_batch[i], hx)
    output.append(hx)
    
print(output)

[tensor([[-0.6990, -0.7297, -0.8801,  0.1854, -0.5004],
        [-0.9443, -0.9051, -0.5188,  0.9710,  0.9781],
        [-0.9989, -0.9988, -0.9422,  0.9994,  0.9997],
        [-0.2015, -0.9980,  0.8230,  0.7355,  0.6869]], grad_fn=<TanhBackward>), tensor([[-0.9923, -0.9990,  0.1942,  0.9999,  0.9998],
        [-0.1589,  0.2251, -0.0162, -0.5187, -0.9219],
        [-0.9569, -0.9754,  0.2113,  0.9927,  0.9418],
        [-0.7972, -0.9371,  0.0641,  0.5208, -0.0497]], grad_fn=<TanhBackward>)]


The above code is the same model as the `BasicRNN` we implemented from scratch, except that we no longer have to deal with the weights and biases individually; `RNNCell` abstracts them away for us. So now we can rewrite our class as:

In [18]:
class CleanBasicRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons):
        super(CleanBasicRNN, self).__init__()
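        # store the cell on the module so forward() uses it and its weights are registered
        self.rnn = nn.RNNCell(n_inputs, n_neurons)
        self.hx = torch.randn(batch_size, n_neurons)
        
    def forward(self, x):
        output = []
        
        # for each time step
        for i in range(2):
            self.hx = self.rnn(x[i], self.hx)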
            output.append(self.hx)
            
        return output, self.hx

In [20]:
FIXED_BATCH_SIZE = 4
N_INPUT = 3
N_NEURONS = 5

X0 = [[0,1,2], [3,4,5],\
      [6,7,8], [9,0,1]]
X1 = [[9,8,7], [0,0,0],\
      [6,5,4], [3,2,1]]

x_batch = torch.tensor([X0, X1], dtype=torch.float) # X0 and X1

model = CleanBasicRNN(FIXED_BATCH_SIZE, N_INPUT, N_NEURONS)
output_val, states_val = model(x_batch)
print(output_val) # all output for all timesteps
print(states_val) # values for final state or final timestep 

[tensor([[-0.8465, -0.2058, -0.6323,  0.1341,  0.0972],
        [-0.9175, -0.9723, -0.0428,  0.9554,  0.9582],
        [-0.9939, -0.9956, -0.6253,  0.9971,  0.9984],
        [ 0.5503, -0.9983,  0.5700,  0.9839,  0.8755]], grad_fn=<TanhBackward>), tensor([[-0.9966, -0.9989, -0.1796,  0.9999,  0.9998],
        [-0.3029,  0.0358, -0.0608, -0.6554, -0.9200],
        [-0.9664, -0.9807,  0.1707,  0.9905,  0.9439],
        [-0.7852, -0.9447, -0.0751,  0.7677, -0.1597]], grad_fn=<TanhBackward>)]
tensor([[-0.9966, -0.9989, -0.1796,  0.9999,  0.9998],
        [-0.3029,  0.0358, -0.0608, -0.6554, -0.9200],
        [-0.9664, -0.9807,  0.1707,  0.9905,  0.9439],
        [-0.7852, -0.9447, -0.0751,  0.7677, -0.1597]], grad_fn=<TanhBackward>)
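
To see what `nn.RNNCell` abstracts away, the sketch below recomputes a single step by hand from the cell's own parameters (`weight_ih`, `weight_hh`, `bias_ih`, `bias_hh`) and compares the result with the built-in call. Note that the cell keeps two bias vectors, whereas the from-scratch classes above use one. The random inputs and the seed are assumptions made for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this sketch

cell = nn.RNNCell(3, 5)   # input_size=3, hidden_size=5
x = torch.randn(4, 3)     # batch of 4 instances, 3 features each
h = torch.randn(4, 5)     # previous hidden state

# One step by hand: h' = tanh(x W_ih^T + b_ih + h W_hh^T + b_hh)
manual = torch.tanh(
    x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
)
builtin = cell(x, h)

print(cell.weight_ih.shape, cell.weight_hh.shape)
print(torch.allclose(manual, builtin, atol=1e-5))
```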


# RNN For MNIST Image Classification
***
Here are the parameters for the computation graph:

In [21]:
N_STEPS = 28
N_INPUTS = 28
N_NEURONS = 150
N_OUTPUTS = 10
N_EPOCHS = 10

Here, we show how to load and import the data using PyTorch libraries.

This code loads and prepares the dataset to be fed into the computation graph that we will build. We need to provide a `BATCH_SIZE` because `trainloader` and `testloader` are iterables that yield the dataset in minibatches, which makes it easier to iterate over the data while training our RNN model.

In [24]:
BATCH_SIZE = 64

# list the transformations
transform = transforms.Compose([transforms.ToTensor()])

# download and load training dataset
trainset = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True, num_workers=8)

# download and load testing dataset
testset = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False, num_workers=8)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
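
With `BATCH_SIZE = 64`, neither MNIST split divides evenly into batches, so the last batch from each loader is smaller than the rest. The pure-Python sketch below works out the batch counts, assuming the standard split sizes of 60,000 training and 10,000 test images; `batch_layout` is a hypothetical helper, not part of PyTorch:

```python
import math

BATCH_SIZE = 64
N_TRAIN, N_TEST = 60_000, 10_000  # sizes of the standard MNIST splits

def batch_layout(n_samples, batch_size):
    """Return (number of batches, size of the last batch) when no batch is dropped."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch

train_layout = batch_layout(N_TRAIN, BATCH_SIZE)
test_layout = batch_layout(N_TEST, BATCH_SIZE)
print(train_layout)
print(test_layout)
```

This matters later on: any per-batch statistic that assumes every batch holds exactly `BATCH_SIZE` samples is slightly off for the final batch.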


Here is the code for the model itself:

In [22]:
class ImageRNN(nn.Module):
    def __init__(self, batch_size, n_steps, n_inputs, n_neurons, n_outputs):
        super(ImageRNN, self).__init__()
        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_steps = n_steps
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        
        # declare a basic RNN layer
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
        # followed by a fully-connected layer
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)
        
    def init_hidden(self):
        """
        initializes the hidden state with zeros
        """
        return torch.zeros(1, self.batch_size, self.n_neurons)
    
    def forward(self, x):
        """
        run data through the RNN layer and then through the fully-connected layer
        returns the raw class scores (logits); no softmax is applied here
        """
        # transform x into shape: (n_steps x batch_size x n_inputs)
        x = x.permute(1, 0, 2)
        self.batch_size = x.size(1)
        self.hidden = self.init_hidden()
        
        # rnn_out holds the outputs of all time steps; self.hidden is the final state
        rnn_out, self.hidden = self.basic_rnn(x, self.hidden)
        
        out = self.FC(self.hidden)
        
        return out.view(-1, self.n_outputs) # (batch_size x n_output)
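
`ImageRNN.forward` relies on the shape conventions of `nn.RNN`: by default the layer expects input of shape (n_steps, batch_size, n_inputs) and returns both the outputs of all time steps and the final hidden state. The sketch below checks those shapes on random data standing in for a batch of images; the batch size of 4 and the seed are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this sketch

n_steps, batch_size, n_inputs, n_neurons = 28, 4, 28, 150
rnn = nn.RNN(n_inputs, n_neurons)  # batch_first=False by default

# Random stand-in for a batch of 28x28 images: (batch_size, n_steps, n_inputs)
images = torch.randn(batch_size, n_steps, n_inputs)
x = images.permute(1, 0, 2)                   # -> (n_steps, batch_size, n_inputs)
h0 = torch.zeros(1, batch_size, n_neurons)    # (num_layers, batch_size, n_neurons)

out, h_n = rnn(x, h0)
print(out.shape)   # outputs of every time step
print(h_n.shape)   # final hidden state only

# For a single-layer, unidirectional RNN the last output equals the final state.
print(torch.allclose(out[-1], h_n[0]))
```

This is why the model can feed the final hidden state straight into the fully-connected layer: for this configuration it carries the same values as the output of the last time step.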

# Test the Model with Some Samples
***
Always test the model with a portion of the dataset before actual training. This ensures that you have specified the correct dimensions and that the model produces the output you expect:

In [25]:
dataiter = iter(trainloader)
images, labels = next(dataiter)
model = ImageRNN(BATCH_SIZE, N_STEPS, N_INPUTS, N_NEURONS, N_OUTPUTS)
logits = model(images.view(-1, 28, 28))
print(logits[0:10])

tensor([[-0.0043, -0.1175,  0.0033, -0.0574, -0.0428,  0.0006,  0.0652, -0.0544,
         -0.0246, -0.0275],
        [-0.0187, -0.1309,  0.0095, -0.0603, -0.0512, -0.0056,  0.0719, -0.0444,
         -0.0294, -0.0033],
        [-0.0247, -0.1163, -0.0299, -0.0107, -0.0474, -0.0410,  0.0526, -0.0251,
         -0.0371, -0.0429],
        [-0.0109, -0.1150,  0.0015, -0.0586, -0.0421,  0.0002,  0.0617, -0.0536,
         -0.0239, -0.0301],
        [-0.0191, -0.1034, -0.0041, -0.0528, -0.0347, -0.0087,  0.0613, -0.0399,
         -0.0170, -0.0231],
        [-0.0055, -0.1216,  0.0005, -0.0496, -0.0299, -0.0015,  0.0612, -0.0459,
         -0.0222, -0.0230],
        [-0.0073, -0.1224,  0.0066, -0.0565, -0.0351,  0.0015,  0.0622, -0.0465,
         -0.0198, -0.0280],
        [-0.0128, -0.1174,  0.0104, -0.0594, -0.0425, -0.0050,  0.0639, -0.0501,
         -0.0189, -0.0249],
        [-0.0183, -0.1140,  0.0057, -0.0601, -0.0526, -0.0087,  0.0644, -0.0419,
         -0.0236, -0.0207],
        [-0.0060, -

# Training Loop
***
Before training a model in PyTorch, programmatically specify which device you want to use during training.
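
Defining `device` does not move anything by itself; the model and every input batch have to be transferred explicitly with `.to(device)`. A minimal sketch, using a small `nn.Linear` layer and random data as stand-ins for the real model and images:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-ins for illustration only: a tiny layer and a random batch.
layer = nn.Linear(28, 10).to(device)      # moves the layer's parameters
batch = torch.randn(64, 28).to(device)    # inputs must live on the same device

scores = layer(batch)
print(scores.device)
```

For `ImageRNN`, the tensor returned by `init_hidden` would also need to be created on the same device as the model.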

Then, create an instance of `ImageRNN` with the proper parameters. `criterion` specifies the loss function we are using. `nn.CrossEntropyLoss()` applies a log softmax followed by a negative log likelihood loss to the output of the model. To compute this loss, the function needs the raw scores (logits) produced by our model and the target labels.
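
The relationship between `nn.CrossEntropyLoss`, log softmax and the negative log likelihood loss can be checked numerically. The sketch below uses random logits and targets (assumed values, for illustration only) and compares the two routes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this sketch

logits = torch.randn(8, 10)              # 8 samples, 10 classes, raw scores
targets = torch.randint(0, 10, (8,))     # integer class labels

# Route 1: cross entropy applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Route 2: log softmax followed by negative log likelihood
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))
```

Because the log softmax is built into the loss, the model itself should return raw scores rather than probabilities.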

In addition, we use an optimization algorithm to update the weights based on the current loss. This is done with the `optim.Adam` optimizer, which requires the model parameters and the learning rate.

You could alternatively use `optim.SGD`.

In [27]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = ImageRNN(BATCH_SIZE, N_STEPS, N_INPUTS, N_NEURONS, N_OUTPUTS)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def get_accuracy(logit, target, batch_size):
    corrects = (torch.max(logit, 1)[1].view(target.size()).data == target.data).sum()
    accuracy = 100.0 * corrects / batch_size
    return accuracy.item()
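
`get_accuracy` divides by the `batch_size` it is given, so passing the constant `BATCH_SIZE` slightly underestimates accuracy on a final batch that holds fewer samples. One possible variant derives the denominator from the targets themselves; `batch_accuracy` is a hypothetical helper written for this sketch, and the logits and targets are made-up values:

```python
import torch

def batch_accuracy(logits, targets):  # hypothetical helper, not part of the notebook
    """Percentage of rows whose highest-scoring class matches the target."""
    preds = logits.argmax(dim=1)
    return 100.0 * (preds == targets).float().mean().item()

# Made-up scores for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.0, 0.8, 0.1]])
targets = torch.tensor([0, 1, 2, 1])

acc = batch_accuracy(logits, targets)
print(acc)  # 3 of 4 predictions are correct
```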

Now we write the loop:

In [33]:
for epoch in range(N_EPOCHS):
    train_running_loss = 0.0
    train_acc = 0.0
    model.train()
    
    for i, data in enumerate(trainloader):
        # set gradients to zero
        optimizer.zero_grad()
        # reset hidden states
        model.hidden = model.init_hidden()
        # get inputs
        inputs, labels = data
        inputs = inputs.view(-1,28,28)
        
        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        train_running_loss += loss.detach().item()
        train_acc += get_accuracy(outputs, labels, BATCH_SIZE)
    model.eval()
    print("Epoch: {0:d} | Loss: {1:.4f} | Train Accuracy: {2:.2f}".\
          format(epoch, train_running_loss / i, train_acc / i))

Epoch: 0 | Loss: 0.2320 | Train Accuracy: 92.99
Epoch: 1 | Loss: 0.2014 | Train Accuracy: 93.90
Epoch: 2 | Loss: 0.1712 | Train Accuracy: 94.74
Epoch: 3 | Loss: 0.1547 | Train Accuracy: 95.24
Epoch: 4 | Loss: 0.1436 | Train Accuracy: 95.58
Epoch: 5 | Loss: 0.1310 | Train Accuracy: 95.92
Epoch: 6 | Loss: 0.1268 | Train Accuracy: 96.04
Epoch: 7 | Loss: 0.1241 | Train Accuracy: 96.09
Epoch: 8 | Loss: 0.1151 | Train Accuracy: 96.27
Epoch: 9 | Loss: 0.1108 | Train Accuracy: 96.47


Now we can compute the accuracy on the test set to see how well our model generalizes:

In [34]:
test_acc = 0.0
model.eval()
with torch.no_grad():  # gradients are not needed for evaluation
    for i, data in enumerate(testloader, 0):
        inputs, labels = data
        inputs = inputs.view(-1,28,28)
        
        outputs = model(inputs)
        
        test_acc += get_accuracy(outputs, labels, BATCH_SIZE)

print("Test Accuracy: {0:.2f}".format(test_acc / i))

Test Accuracy: 96.13
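
An alternative way to organize evaluation is to count correct predictions over all samples and divide once at the end, so that a smaller final batch is weighted correctly. The sketch below is one way to do this under stated assumptions: `evaluate` is a hypothetical helper, and the linear layer and random dataset are toy stand-ins for the trained model and `testloader`:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):  # hypothetical helper, not part of the notebook
    """Return accuracy in percent over every sample yielded by the loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total, total

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this sketch

# Toy stand-ins: 10 random samples with 6 features and 3 classes.
toy_model = nn.Linear(6, 3)
toy_data = TensorDataset(torch.randn(10, 6), torch.randint(0, 3, (10,)))
toy_loader = DataLoader(toy_data, batch_size=4)  # batches of 4, 4 and 2 samples

acc, n_seen = evaluate(toy_model, toy_loader)
print(n_seen, acc)
```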
