<a href="https://colab.research.google.com/github/gvogiatzis/CS4740/blob/main/CS4740_Lab_Week_05b.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring the capabilities of Recurrent Neural Networks

In this lab we will carry out a few experiments with Recurrent Neural Networks (RNNs) in order to understand how these models operate and to see some interesting applications. RNNs are deep learning models that are particularly well suited to sequential data such as time series, text, speech and other audio signals.

We can think of an RNN as a *machine* that is fed a sequence of data points. After each data point is entered, the machine performs some calculations based on the information received and updates its internal state accordingly. It then produces an output before moving on to the next data point in the sequence.
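To make the *machine* picture concrete, here is a minimal sketch of a plain (Elman-style) RNN written with NumPy. It is not the GRU used later in this lab: the helper `rnn_step` and the weight names `W_xh`, `W_hh` and `W_hy` are illustrative choices, and the weights are random rather than trained. The point is the loop: the same weights are applied at every step, the state `h` is updated from the previous state and the new data point, and an output is produced before moving on.

```python
import numpy as np

def rnn_step(h, x, W_xh, W_hh, b_h, W_hy, b_y):
    """One step of a vanilla (Elman) RNN: update the state, then emit an output."""
    h_new = np.tanh(W_xh @ x + W_hh @ h + b_h)   # new inner state
    y = W_hy @ h_new + b_y                        # output for this step
    return h_new, y

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 2, 5, 3
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(size=(output_size, hidden_size))
b_y = np.zeros(output_size)

sequence = rng.normal(size=(4, input_size))   # 4 data points, each 2-dimensional
h = np.zeros(hidden_size)                     # initial inner state
outputs = []
for x in sequence:                            # the same weights are reused at every step
    h, y = rnn_step(h, x, W_xh, W_hh, b_h, W_hy, b_y)
    outputs.append(y)
outputs = np.stack(outputs)                   # one 3-element output per data point
print(outputs.shape)                          # (4, 3)
```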

We can use RNNs in a multitude of ways:

* We can use the last RNN output, after the whole sequence has been read, to make an inference about the sequence as a whole, e.g. classifying a piece of text into one of a number of categories.
* We can use one RNN to encode an input sequence into a code (typically its final hidden state) and pass that code to a second RNN that decodes it into an output sequence. This is the basis of sequence-to-sequence mapping, e.g. translating English text into French.
* We can use an RNN as a next-step predictor, e.g. predicting the next day's stock market price from its past history.
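As a rough sketch of how these three uses differ in PyTorch, assuming a batch-first `nn.GRU` and arbitrary sizes (8 input features, 16 hidden units, 3 classes): classification keeps only the last output, an encoder-decoder passes the encoder's final hidden state to a second RNN as its initial state, and next-step prediction uses the output at every step. The linear heads and the random stand-in tensors are illustrative placeholders, not a trained model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)     # 4 sequences, 20 steps each, 8 features per step
y, h = gru(x)                 # y: output at every step (4, 20, 16); h: final state (1, 4, 16)

# 1. Sequence classification: keep only the last output (3 classes, chosen arbitrarily)
classifier = nn.Linear(16, 3)
class_scores = classifier(y[:, -1])          # (4, 3)

# 2. Encoder-decoder: the encoder's final state h initialises a second (decoder) RNN
decoder = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
decoder_inputs = torch.randn(4, 7, 8)        # stand-in for a 7-step output sequence
dec_out, _ = decoder(decoder_inputs, h)      # (4, 7, 16)

# 3. Next-step prediction: map the output at every step to a guess for the next data point
predictor = nn.Linear(16, 8)
next_steps = predictor(y)                    # (4, 20, 8)
```

For a single-layer, unidirectional GRU the last output `y[:, -1]` is the same as the final hidden state `h[0]`, so either can be used for classification.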

In this lab we will see a few examples of RNNs: 

1. A very simple running-total calculator
2. A text generator
3. An *adder* that parses numerical expressions

First, let's import a few packages:

In [1]:
import re
import csv
from textblob import Word
import numpy as np

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from more_itertools import sliced
from torch.utils.tensorboard import SummaryWriter

from random import randint

%load_ext tensorboard

# A simple running total calculator

This is perhaps the simplest RNN we can come up with that still does something useful. The idea is simple: we feed in a sequence of numbers and the RNN calculates a running total of those numbers as they come in.

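First, let's see how the RNN layers work in practice. The two main choices are the GRU and the LSTM, which we covered in lectures. We will use the GRU throughout this lab, but the two are almost drop-in replacements for each other: the main difference in the interface is that the LSTM's hidden state is a tuple `(h, c)` (hidden state and cell state) rather than a single tensor.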
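A quick sketch of that interface difference, assuming the default (sequence-first) input layout explained below: the GRU returns a single hidden-state tensor, while the LSTM returns a tuple of hidden state and cell state, so any code that unpacks the hidden state has to change when swapping one layer for the other.

```python
import torch
import torch.nn as nn

x = torch.randn(100, 64, 2)   # sequence_size x batch_size x input_size (default layout)

gru = nn.GRU(input_size=2, hidden_size=5)
lstm = nn.LSTM(input_size=2, hidden_size=5)

y_gru, h_gru = gru(x)                 # GRU: one hidden-state tensor
y_lstm, (h_lstm, c_lstm) = lstm(x)    # LSTM: a tuple (hidden state, cell state)

print(y_gru.shape, y_lstm.shape)                # torch.Size([100, 64, 5]) for both
print(h_gru.shape, h_lstm.shape, c_lstm.shape)  # torch.Size([1, 64, 5]) for all three
```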

Let's define a simple GRU layer:

In [48]:
gru = nn.GRU(input_size=2, hidden_size=5)

This module expects to receive sequences of 2-dimensional data points. It processes them with a hidden state of size 5, which means that it outputs a sequence of 5-element vectors. The actual shapes of the input and output tensors are somewhat confusing because of batching: every NN layer in PyTorch must be able to handle batched input, so that standard (mini-batch) stochastic gradient descent can work.

So the input to RNN layers (by default) is of size 

 `sequence_size` x `batch_size` x `input_size`

Why is `batch_size` in the middle, I hear you ask? It makes sense if we want to iterate through the elements of the sequence: each element is then a tensor of shape `batch_size` x `input_size`, which is exactly what the inner (per-step) computation of the RNN expects when operating on batches!
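To see this iteration explicitly, the sketch below unrolls a GRU by hand using `nn.GRUCell`, which processes one `batch_size` x `input_size` slice at a time. It assumes a single-layer GRU and copies its parameters (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) into the cell, so the step-by-step loop should reproduce the layer's output up to floating-point rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=2, hidden_size=5)       # default layout: seq x batch x features
cell = nn.GRUCell(input_size=2, hidden_size=5)  # processes one sequence element at a time

# Copy the GRU's (single-layer) weights into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(100, 64, 2)      # sequence_size x batch_size x input_size
y, _ = gru(x)

h = torch.zeros(64, 5)           # one hidden state per sequence in the batch
outputs = []
for x_t in x:                    # iterating over dim 0: x_t has shape (64, 2)
    h = cell(x_t, h)
    outputs.append(h)
y_manual = torch.stack(outputs)  # back to (100, 64, 5)

print(torch.allclose(y, y_manual, atol=1e-5))
```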

Having said that, if it is more natural for you to think of the batch as the leftmost (first) index, as you would for a convolutional or linear layer, PyTorch allows you to do that by passing the `batch_first=True` flag to the RNN constructor. In fact, we will be using that flag throughout this lab, but bear in mind that PyTorch rearranges the dimensions internally, and that the flag only affects the input and output tensors: the returned hidden state `h` keeps the shape `num_layers` x `batch_size` x `hidden_size`.

So, to feed this layer a batch of 64 sequences, each consisting of 100 two-dimensional elements, we would write:

In [53]:
gru = nn.GRU(input_size=2, hidden_size=5)

x = torch.randn(100,64,2)
y,h = gru(x)

y.shape

torch.Size([100, 64, 5])

But if we use the `batch_first=True` flag, this becomes:

In [54]:
gru = nn.GRU(input_size=2, hidden_size=5, batch_first=True)

x = torch.randn(64,100,2)
y,h = gru(x)

y.shape

torch.Size([64, 100, 5])
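The flag only changes the layout, not the computation. In the sketch below (same sizes as above, weights shared between the two layers via `load_state_dict`), the batch-first layer applied to a batch-first tensor should match the default layer applied to the transposed tensor. The returned hidden state `h` has shape `num_layers` x `batch_size` x `hidden_size` in both cases, because `batch_first` does not apply to it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=2, hidden_size=5)                       # seq x batch x features
gru_bf = nn.GRU(input_size=2, hidden_size=5, batch_first=True)  # batch x seq x features
gru_bf.load_state_dict(gru.state_dict())   # give both layers identical weights

x = torch.randn(64, 100, 2)                # batch-first input
y_bf, h_bf = gru_bf(x)
y, h = gru(x.transpose(0, 1))              # the same data in the default layout

print(torch.allclose(y_bf, y.transpose(0, 1), atol=1e-5))  # outputs agree up to the transpose
print(h_bf.shape, h.shape)                 # torch.Size([1, 64, 5]) for both
```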

In [41]:
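class CounterNet(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        # The GRU reads one number per time step (input_size=1)
        self.rnn = nn.GRU(input_size=1,
                          hidden_size=hidden_dim,
                          batch_first=True)
        # Map the hidden state at every step to one number: the running total so far
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: batch_size x sequence_size x 1
        x, _ = self.rnn(x)   # outputs at every step: batch_size x sequence_size x hidden_dim
        x = self.fc(x)       # batch_size x sequence_size x 1
        return x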

In [45]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch_size = 100
num_of_epochs = 5000   # each "epoch" here is one freshly generated batch
net = CounterNet().to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss = nn.MSELoss()
for e in range(num_of_epochs):
    # 100 sequences, each of 50 random digits 0..9 (stored as floats)
    x = torch.randint(10, (batch_size, 50, 1), device=device, dtype=torch.float32)
    t = x.cumsum(dim=1)   # target: the running total at every step
    y = net(x)
    L = loss(y, t)

    optimizer.zero_grad()
    L.backward()
    optimizer.step()
    print(f"\rEpoch: {e}/{num_of_epochs} \tL={L.item()}", end="")

Epoch: 4999/5000 	L=0.01337356399744749

In [46]:
def evaluate_net(net, values):
    # Shape the list into a batch containing one sequence: 1 x sequence_size x 1
    x = torch.tensor(values, device=device, dtype=torch.float32).view(1, -1, 1)
    # Predicted running totals; ideally close to x.cumsum(dim=1)
    return net(x)

In [47]:
evaluate_net(net,[1,0,0,5])

tensor([[[0.9742],
         [0.9708],
         [1.0471],
         [5.9503]]], device='cuda:0', grad_fn=<AddBackward0>)
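For comparison, because the inputs in this task are non-negative digits, a running total can be computed *exactly* by a single-unit RNN with hand-set weights: with a ReLU activation, `h_t = relu(x_t + h_{t-1}) = h_{t-1} + x_t`. The sketch below uses `nn.RNN` with `nonlinearity='relu'` rather than a GRU. It only illustrates that the task needs a single number of memory; it is not how the trained `CounterNet` solves it.

```python
import torch
import torch.nn as nn

# A one-unit RNN with ReLU: h_t = relu(w_ih * x_t + b_ih + w_hh * h_{t-1} + b_hh)
rnn = nn.RNN(input_size=1, hidden_size=1, nonlinearity='relu', batch_first=True)
with torch.no_grad():
    rnn.weight_ih_l0.fill_(1.0)   # add the new input...
    rnn.weight_hh_l0.fill_(1.0)   # ...to the previous total
    rnn.bias_ih_l0.zero_()
    rnn.bias_hh_l0.zero_()

x = torch.tensor([1., 0., 0., 5.]).view(1, -1, 1)   # batch of 1, 4 steps, 1 feature
y, _ = rnn(x)
print(y.view(-1).tolist())   # [1.0, 1.0, 1.0, 6.0], i.e. x.cumsum(dim=1)
```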

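# An RNN adder

Next, we train an RNN to add two 3-digit numbers written as text. Each sample is a string such as `555+234=789`, fed to the network one character at a time. The target string is blank (spaces) under the question `555+234`, and from the `=` sign onwards it contains the answer followed by an end marker `.`, shifted one position to the left relative to the input. So at every position from the `=` sign onwards, the network is trained to predict the *next* character of the answer.

In [None]: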
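itoc = list('0123456789+=. ')                # index -> character (14 symbols)
ctoi = {c: i for i, c in enumerate(itoc)}    # character -> index

def create_strings(a, b):
    # Input:  '555+234=789'  (the question followed by the answer)
    # Target: '       789.'  (blank under the question, then the answer and an end marker '.')
    # From '=' onwards, each target character is the next character of the input.
    question = f'{a}+{b}'
    input_str = question + f'={a + b}'
    output_str = ' ' * len(question) + f'{a + b}.'
    return input_str, output_str

def create_addition_samples(num_samples, ndigits):
    x, t, pairs = [], [], []
    # Longest possible string: ndigits + '+' + ndigits + '=' + (ndigits+1)-digit sum
    max_len = 3 * ndigits + 3
    for _ in range(num_samples):
        a = randint(0, 10**ndigits - 1)
        b = randint(0, 10**ndigits - 1)
        input_str, output_str = create_strings(a, b)
        # Pad with spaces so that every sequence in the batch has the same length
        x.append([ctoi[c] for c in input_str.ljust(max_len)])
        t.append([ctoi[c] for c in output_str.ljust(max_len)])
        pairs.append((a, b))
    return x, t, pairs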
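The relationship between the input and target strings can be stated compactly: under the question the target is blank, and from the `=` sign onwards each target character is the *next* input character, ending with `.`. The hypothetical helper `target_from_input` (not part of the lab code) rebuilds the target from an input string in exactly this way, which makes the one-step shift explicit.

```python
def target_from_input(in_str):
    """Hypothetical helper: rebuild the training target from an input like '555+234=789'."""
    eq = in_str.index('=')
    # Blanks under the question, then the input shifted left by one, then the end marker
    return ' ' * eq + in_str[eq + 1:] + '.'

in_str = '555+234=789'
target = target_from_input(in_str)
print(repr(target))   # '       789.'

# From '=' onwards, target[k] is the character the network sees at step k+1
eq = in_str.index('=')
print(all(target[k] == in_str[k + 1] for k in range(eq, len(in_str) - 1)))   # True
```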

In [None]:
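class AdderNet(nn.Module):
    def __init__(self, charset=14, embed_size=15, hidden_dim=1024):
        super().__init__()
        # Each of the `charset` symbols is mapped to a learned embed_size-dimensional vector
        self.embedding = nn.Embedding(charset, embed_size)
        self.rnn = nn.GRU(input_size=embed_size,
                          hidden_size=hidden_dim,
                          batch_first=True)
        # Scores (logits) over the character set at every time step
        self.fc = nn.Linear(hidden_dim, charset)
        self.h = None   # hidden state carried over between calls when keep_memory=True

    def forward(self, x, keep_memory=False):
        x = self.embedding(x)
        # Continue from the stored hidden state, or start afresh from a zero state
        x, self.h = self.rnn(x, self.h if keep_memory else None)
        x = self.fc(x)
        return x

def eval_adder(net, input_str, max_len=10):
    # Encode a prompt such as '555+234=' as a batch containing one sequence of indices
    x = torch.tensor([[ctoi[c] for c in input_str]], device=device)
    # Read the whole prompt; the prediction at the last step is the first answer character
    x = net(x).argmax(dim=2)[:, -1:]
    output = [x.item()]
    # Greedy decoding: feed each predicted character back in, keeping the hidden state,
    # until the end marker '.' is produced or max_len further characters have been generated
    while output[-1] != ctoi['.'] and len(output) <= max_len:
        x = net(x, keep_memory=True).argmax(dim=2)
        output.append(x.item())
    return "".join(itoc[i] for i in output)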

In [None]:
training_data = set()   # every (a, b) pair seen during training

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch_size = 100
num_of_epochs = 5000    # one freshly generated batch per "epoch"
ndigits = 3
net = AdderNet(charset=len(itoc)).to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss = nn.CrossEntropyLoss()
for e in range(num_of_epochs):
    x, t, pairs = create_addition_samples(batch_size, ndigits=ndigits)
    training_data.update(pairs)
    x = torch.tensor(x, device=device)   # batch_size x (3*ndigits+3) character indices
    t = torch.tensor(t, device=device)
    y = net(x)                           # batch_size x (3*ndigits+3) x charset logits
    # CrossEntropyLoss expects (N, C) scores and (N,) targets: flatten batch and time
    L = loss(y.reshape(-1, y.shape[-1]), t.reshape(-1))

    optimizer.zero_grad()
    L.backward()
    optimizer.step()
    print(f"\rEpoch: {e}/{num_of_epochs} \tL={L.item()}", end="")

Epoch: 4999/5000 	L=0.018413158133625984
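`nn.CrossEntropyLoss` expects scores of shape `N` x `C` with targets of shape `N`, which is why the training loop flattens the batch and time dimensions. The loss also accepts inputs of shape `N` x `C` x `d1` with targets of shape `N` x `d1`, so moving the class dimension to position 1 should give the same value. The sketch below checks this with random stand-in tensors, assuming 12 time steps (`3*ndigits+3` for `ndigits=3`) and the 14-symbol character set; with the default mean reduction both versions average over all `batch_size * seq_len` positions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, charset = 100, 12, 14
y = torch.randn(batch_size, seq_len, charset)       # logits: batch x time x classes
t = torch.randint(charset, (batch_size, seq_len))   # target class index per time step

loss = nn.CrossEntropyLoss()
# Option 1 (as in the training loop): flatten batch and time into N = batch * time samples
L1 = loss(y.reshape(-1, charset), t.reshape(-1))
# Option 2: put the class dimension second, giving an (N, C, d1) input
L2 = loss(y.permute(0, 2, 1), t)

print(torch.allclose(L1, L2))
```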

In [None]:
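# All possible pairs of ndigits-digit numbers, minus those that appeared in training
all_pairs = set((a, b) for a in range(10**ndigits) for b in range(10**ndigits))
unseen_data = list(all_pairs.difference(training_data))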
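The size of `unseen_data` can be sanity-checked with a short calculation. Training draws `num_of_epochs * batch_size` = 500,000 pairs uniformly at random (with replacement) from the 10^6 possible pairs, so any given pair is never drawn with probability (1 - 10^-6)^500000 ≈ e^-0.5 ≈ 0.607. About 606,500 unseen pairs are therefore expected, consistent with the 606,628 test pairs reported by the evaluation below.

```python
import math

n_pairs = 1000 * 1000     # all (a, b) with 0 <= a, b <= 999
n_draws = 5000 * 100      # num_of_epochs * batch_size training samples
p_unseen = (1 - 1 / n_pairs) ** n_draws    # probability a given pair is never drawn
expected_unseen = n_pairs * p_unseen
print(round(p_unseen, 4), round(expected_unseen))   # 0.6065 606531
print(round(math.exp(-n_draws / n_pairs), 4))       # 0.6065, the e^-0.5 approximation
```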

In [None]:
errors = 0
with torch.no_grad():   # gradients are not needed for evaluation
    for i, (a, b) in enumerate(unseen_data):
        input_str, _ = create_strings(a, b)
        prompt = input_str[:input_str.find('=') + 1]   # e.g. '555+234='
        try:
            # The answer ends with '.', e.g. '789.', which float() parses as 789.0
            correct = float(eval_adder(net, prompt)) == a + b
        except ValueError:   # malformed answer, e.g. '78 9.' or '.'
            correct = False
        if not correct:
            errors += 1
            print(f"\r {i}/{len(unseen_data)} {errors/(i+1)*100}", end="")

print(f"Error ={errors/len(unseen_data)*100}%")

 606620/606628 6.8260195839240385Error =6.826094410412971%


In [None]:
eval_adder(net, "555+234=")

'789.'