<a href="https://colab.research.google.com/github/lianaling/neural-networks-practice/blob/main/RecurrentNeuralNetworks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recurrent Neural Networks

[A Beginner’s Guide on Recurrent Neural Networks with PyTorch](https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/)

In [44]:
import torch
from torch import nn

import numpy as np

## Create Dataset

In [45]:
# Create dataset
text = ['hey how are you', 'good i am fine', 'have a nice day']

# Join all sentences, extract unique characters
chars = set(''.join(text))

# Create dict to map int to char
int2char = dict(enumerate(chars))

# Create dict to map char to int
char2int = {char: ind for ind, char in int2char.items()}

In [46]:
print(char2int)

{'e': 0, 'i': 1, 'y': 2, ' ': 3, 'f': 4, 'w': 5, 'm': 6, 'v': 7, 'c': 8, 'r': 9, 'n': 10, 'a': 11, 'h': 12, 'u': 13, 'd': 14, 'g': 15, 'o': 16}


## Padding

In [47]:
# Pad input sentences so that all are of same length
# To perform batch training
# Each batch should be the same size

# Find longest string length
maxlen = len(max(text, key=len))

# Padding
# Add whitespace until length matches max length
for i in range(len(text)):
  while len(text[i]) < maxlen:
    text[i] += ' '

## Train-Test Split

Remove the last char for input and first char for target seq.

In [48]:
input_seq = []
target_seq = []

for i in range(len(text)):
  # Remove last char for input seq
  input_seq.append(text[i][:-1])

  # Remove first char for input seq
  target_seq.append(text[i][1:])

  print(f"Input sequence: {input_seq[i]}\n Target sequence: {target_seq[i]}")

Input sequence: hey how are yo
 Target sequence: ey how are you
Input sequence: good i am fine
 Target sequence: ood i am fine 
Input sequence: have a nice da
 Target sequence: ave a nice day


This is so that the target sequence is always one time-step ahead of the input data. Target sequence is the correct answer for the model at each time step.

## Convert to One-Hot Encoding

In [49]:
for i in range(len(text)):
  input_seq[i] = [char2int[char] for char in input_seq[i]]
  target_seq[i] = [char2int[char] for char in target_seq[i]]

In [50]:
dict_size = len(char2int) # Determine the one-hot vector size
seq_len = maxlen - 1 # Removed last char for input_seq
batch_size = len(text)

def one_hot_encode(sequence, dict_size, seq_len, batch_size):
  # Create multi-dim array of zeros with the desired output shape
  features = np.zeros((batch_size, seq_len, dict_size), dtype=np.float32)

  # Replace the 0 at the relevant char index with a 1 to represent that char
  for i in range(batch_size):
    for u in range(seq_len):
      features[i, u, sequence[i][u]] = 1

    return features

In [51]:
input_seq = one_hot_encode(input_seq, dict_size, seq_len, batch_size)
input_seq[0]

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
        0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        1.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
        0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0

In [52]:
print(f"Input shape: {input_seq.shape} --> (Batch Size, Sequence Length, One-Hot Encoding Size))")

Input shape: (3, 14, 17) --> (Batch Size, Sequence Length, One-Hot Encoding Size))


## Move Data From Numpy Arrays to Tensors

In [53]:
input_seq = torch.from_numpy(input_seq)
target_seq = torch.Tensor(target_seq)

## Build RNN

In [54]:
is_cuda = torch.cuda.is_available()
is_cuda

True

In [55]:
device = torch.device("cuda") if is_cuda else torch.device('cpu')

In [56]:
torch.cuda.get_device_name()

'NVIDIA GeForce RTX 3050 Ti Laptop GPU'

One layer of RNN and one fully-connected layer (FC)

[LSTM network inside a Sequential container](https://discuss.pytorch.org/t/lstm-network-inside-a-sequential-container/19304/2)
<br />
[How to flatten input in `nn.Sequential` in Pytorch](https://stackoverflow.com/questions/53953460/how-to-flatten-input-in-nn-sequential-in-pytorch)

In [57]:
class ContiguousView(nn.Module):
  def __init__(self, hidden_dim) -> None:
      super().__init__()
      self.hidden_dim = hidden_dim

  def forward(self, x):
      return x.contiguous().view(-1, self.hidden_dim)

In [58]:
# Own implementation of RNN using sequential

from collections import OrderedDict

class ModelSeq(nn.Module):
  def __init__(self, input_size, output_size, hidden_dim, n_layers) -> None:
      super(ModelSeq, self).__init__()

      self.hidden_dim = hidden_dim
      self.n_layers = n_layers

      self.net = nn.Sequential(
          OrderedDict([
                       ('rnn1', nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)),
                       ('flatten1', ContiguousView(hidden_dim)),
                       ('output', nn.Linear(hidden_dim, output_size))
          ])
      )

  def forward(self, x):
    batch_size = x.size(0)

    hidden = self.init_hidden(batch_size)
    
    out, hidden = self.net[0](x, hidden)
    # out = out.contiguous().view(-1, self.hidden_dim)
    out = self.net[1](out)
    out = self.net[2](out)

    return out, hidden
  
  def init_hidden(self, batch_size):
    '''Generates the first hidden state of zeros. Move tensor to device.'''
    hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)
    return hidden

In [59]:
class Model(nn.Module):
  def __init__(self, input_size, output_size, hidden_dim, n_layers) -> None:
      super(Model, self).__init__()

      self.hidden_dim = hidden_dim
      self.n_layers = n_layers

      self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
      self.fc = nn.Linear(hidden_dim, output_size)

  def forward(self, x):
    batch_size = x.size(0)

    # Init hidden state
    hidden = self.init_hidden(batch_size)

    out, hidden = self.rnn(x, hidden)

    # Reshape outputs to fit into FC
    out = out.contiguous().view(-1, self.hidden_dim)
    out = self.fc(out)

    return out, hidden
  
  def init_hidden(self, batch_size):
    '''Generates the first hidden state of zeros. Move tensor to device.'''
    hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)
    return hidden

In [60]:
class ModelAns(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(ModelAns, self).__init__()

        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Defining the layers
        # RNN Layer
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)   
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
    
    def forward(self, x):
        
        batch_size = x.size(0)

        #Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(batch_size)

        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.rnn(x, hidden)
        
        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc(out)
        
        return out, hidden
    
    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)
         # We'll send the tensor holding the hidden state to the device we specified earlier as well
        return hidden

## Train model

[Practical Guide to Hyperparameters Optimization for Deep Learning Models](https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/)

In [61]:
# Training hyperparameters
n_epochs = 100
learning_rate = 0.01

In [62]:
model_seq = ModelSeq(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)

In [63]:
model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)

In [64]:
model_ans = ModelAns(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)

In [65]:
model_seq.to(device)

ModelSeq(
  (net): Sequential(
    (rnn1): RNN(17, 12, batch_first=True)
    (flatten1): ContiguousView()
    (output): Linear(in_features=12, out_features=17, bias=True)
  )
)

In [66]:
model.to(device)

Model(
  (rnn): RNN(17, 12, batch_first=True)
  (fc): Linear(in_features=12, out_features=17, bias=True)
)

In [67]:
model_ans.to(device)

ModelAns(
  (rnn): RNN(17, 12, batch_first=True)
  (fc): Linear(in_features=12, out_features=17, bias=True)
)

In [68]:
# Define loss and optimiser
criterion = nn.CrossEntropyLoss()

In [69]:
opt_seq = torch.optim.Adam(model_seq.parameters(), lr=learning_rate)

In [70]:
opt = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [71]:
opt_ans = torch.optim.Adam(model_ans.parameters(), lr=learning_rate)

In [72]:
def fit(x, y, model, loss_fn, opt, epochs):
  for epoch in range(1, epochs + 1):
    opt.zero_grad()
    x.to(device)
    out, hidden = model(x)
    out = out.to(device)
    loss = criterion(out, y.view(-1).long())
    loss.backward()
    opt.step()

    if epoch % 10 == 0:
      print(f"Epoch: {epoch}/{epochs}..........", end=' ')
      print(f"Loss: {loss.item():.4f}")

In [73]:
print("Model Seq")
fit(input_seq.to(device), target_seq.to(device), model_seq, criterion, opt_seq, n_epochs)

Model Seq
Epoch: 10/100.......... Loss: 2.4128
Epoch: 20/100.......... Loss: 2.2086
Epoch: 30/100.......... Loss: 2.0215
Epoch: 40/100.......... Loss: 1.8393
Epoch: 50/100.......... Loss: 1.6381
Epoch: 60/100.......... Loss: 1.4364
Epoch: 70/100.......... Loss: 1.2793
Epoch: 80/100.......... Loss: 1.1368
Epoch: 90/100.......... Loss: 0.9657
Epoch: 100/100.......... Loss: 0.8465


In [74]:
print("Model")
fit(input_seq.to(device), target_seq.to(device), model, criterion, opt, n_epochs)

Model
Epoch: 10/100.......... Loss: 2.4944
Epoch: 20/100.......... Loss: 2.3336
Epoch: 30/100.......... Loss: 2.1228
Epoch: 40/100.......... Loss: 1.8805
Epoch: 50/100.......... Loss: 1.6488
Epoch: 60/100.......... Loss: 1.4605
Epoch: 70/100.......... Loss: 1.3153
Epoch: 80/100.......... Loss: 1.2016
Epoch: 90/100.......... Loss: 1.1340
Epoch: 100/100.......... Loss: 1.0518


In [75]:
print("Model Ans")
fit(input_seq.to(device), target_seq.to(device), model_ans, criterion, opt_ans, n_epochs)

Model Ans
Epoch: 10/100.......... Loss: 2.4355
Epoch: 20/100.......... Loss: 2.3118
Epoch: 30/100.......... Loss: 2.0968
Epoch: 40/100.......... Loss: 1.8517
Epoch: 50/100.......... Loss: 1.5921
Epoch: 60/100.......... Loss: 1.3519
Epoch: 70/100.......... Loss: 1.1456
Epoch: 80/100.......... Loss: 0.9885
Epoch: 90/100.......... Loss: 0.8795
Epoch: 100/100.......... Loss: 0.7784


## Predict Output

[Solution to input tensors found on CPU but hidden tensors on cuda:0](https://stackoverflow.com/questions/51605893/why-doesnt-my-simple-pytorch-network-work-on-gpu-device)

In [76]:
def predict(model, char):
  '''Takes in the model and char as arguments and returns the next char prediction and hidden state.'''
  char = np.array([[char2int[c] for c in char]])
  char = one_hot_encode(char, dict_size, char.shape[1], 1)
  char = torch.from_numpy(char)
  char = char.to(device) # chars.to(device) without assignment operator will not work.
  # torch.nn.Module.to() changes the variable itself (in-place operator)
  # torch.Tensornto() does not change inputs. Returns a copy of inputs that resides on device.

  out, hidden = model(char)
  out = out.to(device)

  prob = nn.functional.softmax(out[-1], dim=0).data
  # Take class with the highest probability score from the output
  char_ind = torch.max(prob, dim=0)[1].item()

  return int2char[char_ind], hidden

In [77]:
def sample(model, out_len, start='hey'):
  model.eval() # Eval mode
  start = start.lower()
  chars = [ch for ch in start]
  size = out_len - len(chars)

  for ii in range(size):
    char, h = predict(model, chars)
    chars.append(char)

  return ''.join(chars)

In [78]:
sample(model_seq, 15, 'hey')

'hey how are you'

In [79]:
sample(model_seq, 15, 'have')

'haveey how are '

In [80]:
sample(model_seq, 15, 'good')

'good are y how '

In [81]:
sample(model, 15, 'hey')

'hey how are you'

In [82]:
sample(model, 15, 'have')

'have  hououve a'

In [83]:
sample(model, 15, 'good')

'good a areow ar'

In [84]:
sample(model_ans, 15, 'hey')

'hey how are you'

In [85]:
sample(model_ans, 15, 'have')

'haveoue how are'

In [86]:
sample(model_ans, 15, 'good')

'gooday noue how'

Why doesn't model give good results? Model architecture is the same. Not sure what went wrong as compared to the tutorial.