# Assignment 13: Last but not least, RNNs

The idea behind recurrent neural networks (RNNs) is to make use of sequential information. A traditional neural network assumes that all inputs (and outputs) are independent of each other, but for many tasks that is a poor assumption: to predict the next word in a sentence, you need to know which words came before it. RNNs are called recurrent because they perform the same computation for every element of a sequence, with each output depending on the previous computations. Another way to think about RNNs is that they have a “memory” that captures information about what has been computed so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later). Here is what a typical vanilla RNN looks like:

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://miro.medium.com/max/4000/0*WdbXF_e8kZI1R5nQ.png", width=700)
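
To make the recurrence concrete, here is a minimal NumPy sketch of one common formulation of a vanilla RNN step, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The function name `rnn_forward`, the weight names and the toy sizes are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state h_0
    states = []
    for x in xs:                         # the same weights are reused at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_hidden, seq_len = 4, 3, 5        # toy sizes (assumed)
xs = rng.normal(size=(seq_len, n_in))
W_xh = rng.normal(size=(n_hidden, n_in)) * 0.1
W_hh = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b_h = np.zeros(n_hidden)

states = rnn_forward(xs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 3): one hidden state per time step
```

Each hidden state depends on the current input and on the previous hidden state, which is the "memory" described above.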

In this assignment we will not work on images, to keep things simple. Instead, let's generate text. One of the classic applications of an RNN is predicting the next value of a sequence given a certain number of previous ones. Compared with the previous networks, you feed in the word at step $t$ ($x_t$) and the network outputs a prediction of $x_{t+1}$ (labelled $h_t$ in the figure below).
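
The input/target relationship described above can be sketched in a few lines of plain Python. The toy string is an assumption chosen for illustration; the point is only that the targets are the inputs shifted by one position.

```python
text = "hello"                      # toy sequence (assumed for illustration)
inputs = list(text[:-1])            # x_1 ... x_{T-1}
targets = list(text[1:])            # x_2 ... x_T, i.e. the inputs shifted by one

for x_t, x_next in zip(inputs, targets):
    print(f"input {x_t!r} -> target {x_next!r}")
```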

In [2]:
Image(url= "https://iq.opengenus.org/content/images/2019/12/1_XvUt5wDQA8D3C0wAuxAvbA.png", width=700)

Could we train such a network to write something? Let's try to build a Siliconspears. The data folder now contains a shakespear folder with the complete works of William Shakespeare. Your task is to implement an RNN that learns to write like Shakespeare (it will only be a sham, of course).

Below you will find all the utility code you need. The `Corpus` class is a dataset, and you can fetch a batch together with its targets by calling `get_batch` on a dataset that has been passed through `batchify`.

In [0]:
import torch
import torch.nn as nn
import torch.autograd as autograd
import torch.cuda as cuda
import torch.optim as optim
import torch.nn.functional as F
import os
import tqdm
import numpy as np

In [0]:
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        
        # This is very english language specific
        # We will ingest only these characters:
        self.whitelist = [chr(i) for i in range(32, 127)]
        
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))

    def tokenize(self, path):
        """Tokenizes a text file."""
        assert os.path.exists(path)
        # Add words to the dictionary
        with open(path, 'r',  encoding="utf8") as f:
            tokens = 0
            for line in f:
                line = ''.join([c for c in line if c in self.whitelist])
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)

        # Tokenize file content
        with open(path, 'r',  encoding="utf8") as f:
            ids = torch.LongTensor(tokens)
            token = 0
            for line in f:
                line = ''.join([c for c in line if c in self.whitelist])
                words = line.split() + ['<eos>']
                for word in words:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1

        return ids
    
def batchify(data, batch_size):
    # Work out how many full rows fit into each of the batch_size columns.
    nbatch = data.size(0) // batch_size
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * batch_size)
    # Evenly divide the data across the batch_size columns.
    data = data.view(batch_size, -1).t().contiguous()
    if cuda.is_available():
        data = data.cuda()
    return data

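def get_batch(source, i, evaluation=False):
    # `bptt_size` (the sequence length) must be defined globally before calling this.
    seq_len = min(bptt_size, len(source) - 1 - i)
    # `Variable` and `volatile` are deprecated since PyTorch 0.4: plain tensors
    # are enough, and evaluation code should run under `torch.no_grad()` instead.
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].view(-1)
    if cuda.is_available():
        data = data.cuda()
        target = target.cuda()
    return data, target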
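
To see what layout `batchify` and `get_batch` produce, here is a small NumPy analogue on toy data. It mirrors the trim / reshape / transpose steps above but is only a sketch: the token ids, `batch_size` and `seq_len` values are assumptions, and NumPy arrays stand in for the torch tensors.

```python
import numpy as np

data = np.arange(10)                  # stand-in for a tensor of token ids
batch_size = 3                        # assumed toy value

nbatch = len(data) // batch_size      # 3 rows fit into each column
trimmed = data[:nbatch * batch_size]  # drop the remainder (token 9)
# After the transpose, each column is one contiguous stream of tokens
batched = trimmed.reshape(batch_size, -1).T
print(batched)
# [[0 3 6]
#  [1 4 7]
#  [2 5 8]]

i, seq_len = 0, 2                     # assumed toy values
inputs = batched[i:i + seq_len]
target = batched[i + 1:i + 1 + seq_len].reshape(-1)
print(target)                         # [1 4 7 2 5 8]
```

Reading down a column gives consecutive tokens, so the target for each position is simply the entry one row below it.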

In [0]:
#corpus = Corpus('./shakespear')

In [11]:
!wget "https://www.dropbox.com/s/st67040zaw8fs4e/train.txt?dl=0"

!ls

--2020-02-12 19:58:45--  https://www.dropbox.com/s/st67040zaw8fs4e/train.txt?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.7.1, 2620:100:601b:1::a27d:801
Connecting to www.dropbox.com (www.dropbox.com)|162.125.7.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/st67040zaw8fs4e/train.txt [following]
--2020-02-12 19:58:45--  https://www.dropbox.com/s/raw/st67040zaw8fs4e/train.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc50872e72e7927c307f604fc25c.dl.dropboxusercontent.com/cd/0/inline/Ax8A4PqqZqNyFwshCIxrWfJ2FZw8EsSNd5WaTY3rv5X3MpGUGN7629suAB-3nREVRN7UsaVTs82UCW_lNR92piT4ARt_b-Bqad0aEAgYqC1mtIgTuQx-yDJaEeSQYVH-Ync/file# [following]
--2020-02-12 19:58:45--  https://uc50872e72e7927c307f604fc25c.dl.dropboxusercontent.com/cd/0/inline/Ax8A4PqqZqNyFwshCIxrWfJ2FZw8EsSNd5WaTY3rv5X3MpGUGN7629suAB-3nREVRN7UsaVTs82UCW_lNR92piT4ARt_b-Bqad0aEAgYqC1mtIgTuQx-yD

In [7]:
"""vocab = len(corpus.dictionary)
print(vocab)
n_hidden = 100
n_layers = 2
batch_size = 100
"""

'vocab = len(corpus.dictionary)\nprint(vocab)\nn_hidden = 100\nn_layers = 2\nbatch_size = 100\n'

In [12]:
# Switched to a character-based RNN, 'cause we know how :D

path_to_file = './train.txt?dl=0'
text = open(path_to_file, encoding='utf-8').read()

print(text[0:2500])


THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed’st thy light’s flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world’s fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And, tender churl, mak’st waste in niggarding:
  Pity the world, or else this glutton be,
  To eat the world’s due, by the grave and thee.


                    2

When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty’s field,
Thy youth’s proud livery so gazed on now,
Will be a tattered weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eatin

In [25]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

90 unique characters


In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
text_as_int = np.array([char2idx[c] for c in text])

# Create a mapping from indices to characters
idx2char = np.array(vocab)
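
As a quick sanity check of this kind of mapping, the sketch below encodes a toy string and decodes it back. The names prefixed with `sample_` and the toy text are assumptions made for illustration, so they do not clash with the notebook's own `vocab`, `char2idx` and `idx2char`.

```python
sample_text = "to be or not to be"           # toy text (assumed)
sample_vocab = sorted(set(sample_text))      # unique characters, in sorted order
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}
sample_idx2char = dict(enumerate(sample_vocab))

encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(sample_vocab)   # [' ', 'b', 'e', 'n', 'o', 'r', 't']
print(encoded[:5])    # [6, 4, 0, 1, 2]
print(decoded == sample_text)
```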

In [0]:
def one_hot_encode(arr, n_labels):
    # Initialize the encoded array (arr.size works for any number of dimensions)
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)

    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.

    # Finally reshape it to get back to the original array
    one_hot = one_hot.reshape((*arr.shape, n_labels))

    return one_hot
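
For intuition about the result, here is an equivalent one-liner on a toy input. It uses identity-matrix indexing rather than the fill-and-reshape approach above, so treat it as an alternative sketch; the array and `n_labels` values are assumptions.

```python
import numpy as np

arr = np.array([[0, 2], [1, 0]])     # toy batch of character indices
n_labels = 3                         # toy vocabulary size (assumed)

# Indexing an identity matrix by `arr` picks one one-hot row per index
one_hot = np.eye(n_labels, dtype=np.float32)[arr]

print(one_hot.shape)   # (2, 2, 3)
print(one_hot[0, 1])   # [0. 0. 1.]
```

The encoded array gains one trailing dimension of size `n_labels`, which is the shape the RNN expects as input.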

In [0]:
def get_batches(arr, batch_size, seq_length):
    '''Create a generator that returns batches of size
       batch_size x seq_length from arr.

       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       seq_length: Number of encoded chars in a sequence
    '''

    batch_size_total = batch_size * seq_length
    # total number of batches we can make
    n_batches = len(arr) // batch_size_total

    # Keep only enough characters to make full batches
    arr = arr[:n_batches * batch_size_total]
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, -1))

    # iterate through the array, one sequence at a time
    for n in range(0, arr.shape[1], seq_length):
        # The features
        x = arr[:, n:n + seq_length]
        # The targets, shifted by one
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n + seq_length]
        except IndexError:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y
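
The shift-by-one logic is easier to see on a toy array. The compact variant below (the name `shifted_batches` is hypothetical) uses `np.roll` instead of the try/except above; as far as I can tell it yields the same windows, including the wrap-around of the very last target to the first column.

```python
import numpy as np

def shifted_batches(arr, batch_size, seq_length):
    """Yield (x, y) windows where y is x shifted one step to the left."""
    n_batches = len(arr) // (batch_size * seq_length)
    arr = arr[:n_batches * batch_size * seq_length].reshape(batch_size, -1)
    # Shift every row left by one; the last column wraps around to the first
    shifted = np.roll(arr, -1, axis=1)
    for n in range(0, arr.shape[1], seq_length):
        yield arr[:, n:n + seq_length], shifted[:, n:n + seq_length]

batches = list(shifted_batches(np.arange(20), batch_size=2, seq_length=3))
x, y = batches[0]
print(x.tolist())  # [[0, 1, 2], [9, 10, 11]]
print(y.tolist())  # [[1, 2, 3], [10, 11, 12]]
```

With 20 elements, `batch_size=2` and `seq_length=3`, only the first 18 elements are kept, which is why the second row starts at 9.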

In [29]:
train_on_gpu = torch.cuda.is_available()
print ('Training on GPU' if train_on_gpu else 'Training on CPU')

Training on GPU


In [0]:
class VanillaCharRNN(nn.Module):
    def __init__(self, vocab, n_hidden=256, n_layers=2,
                 drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        self.vocab = vocab
        
        '''TODO: define the layers you need for the model'''
        #self.input_layer = nn.Linear(n_hidden + len(vocab), n_hidden)
        self.rnn = nn.RNN(len(vocab), n_hidden, n_layers, batch_first=True)
        #self.hidden_layer = nn.Linear(n_hidden, len(vocab))
        #self.softmax = nn.LogSoftmax(dim=1)
        
        self.fc = nn.Linear(n_hidden, len(vocab))

    def forward(self, x, hidden):
        '''TODO: Forward pass through the network.
        x is the one-hot input of shape (batch, seq_length, vocab size) and
        `hidden` is the hidden state as a tuple of per-layer tensors
        (a vanilla RNN has no cell state).'''

        hidden_t = torch.stack(hidden) # convert tensor tuple to hidden tensor

        if (train_on_gpu):
            hidden_t = hidden_t.cuda()
            x = x.cuda()

        out, hidden_t = self.rnn(x, hidden_t)
        out = self.fc(out)

        # return the per-step outputs (logits) and the hidden state
        return out, hidden_t

    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        
        hidden = torch.zeros(self.n_layers, batch_size, self.n_hidden)

        return hidden
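
The tensor shapes are the most common source of errors here, so the following sketch runs dummy data through the same two layers. The sizes are the ones used elsewhere in this notebook; the all-zero input is just a placeholder with the shape of a one-hot batch.

```python
import torch
import torch.nn as nn

vocab_size, n_hidden, n_layers = 90, 256, 2   # sizes taken from this notebook
batch_size, seq_length = 10, 50

rnn = nn.RNN(vocab_size, n_hidden, n_layers, batch_first=True)
fc = nn.Linear(n_hidden, vocab_size)

x = torch.zeros(batch_size, seq_length, vocab_size)   # placeholder one-hot-shaped input
h0 = torch.zeros(n_layers, batch_size, n_hidden)      # hidden state is never batch-first

out, h_n = rnn(x, h0)
logits = fc(out)

print(out.shape)     # torch.Size([10, 50, 256])
print(h_n.shape)     # torch.Size([2, 10, 256])
print(logits.shape)  # torch.Size([10, 50, 90])
```

Note that `batch_first=True` only affects the input and output tensors; the hidden state keeps the layout `(n_layers, batch, n_hidden)`, which is what `init_hidden` returns.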

In [0]:
def train(model, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Training a network

        Arguments
        ---------

        model: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss

    '''
    model.train()

    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # create training and validation data
    val_idx = int(len(data) * (1 - val_frac))
    data, val_data = data[:val_idx], data[val_idx:]

    if (train_on_gpu):
        model.cuda()

    counter = 0
    n_vocab = len(model.vocab)
    for e in range(epochs):
        # initialize hidden state
        h = model.init_hidden(batch_size)
        #print(h.shape)
        
        '''TODO: use the get_batches function to generate sequences of the desired size'''
        dataset = get_batches(data, batch_size, seq_length)

        for x, y in dataset:
            counter += 1

            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_vocab)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

            if (train_on_gpu):
                inputs, targets = inputs.cuda(), targets.cuda()

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])
            
            #h_state = torch.tensor([h[0][0]], [h[1]])
            #print(x.shape)
            
            #if (train_on_gpu):
            #    h = (h.cuda())

            # zero accumulated gradients
            model.zero_grad()
            
            '''TODO: feed the current input into the model and generate output'''
            output, h = model(inputs, h)
            
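            # Flatten to (batch_size * seq_length,) and (batch_size * seq_length, n_vocab)
            # instead of hard-coding 500, so other batch sizes and sequence lengths work
            flat_targets = targets.reshape(-1).long()
            flat_output = output.reshape(-1, n_vocab)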
            
            #print(flat_targets.shape)
            #print(flat_output.shape)

            '''TODO: compute the loss!'''
            loss = criterion(flat_output, flat_targets)
            
            # perform backprop
            loss.backward()
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(model.parameters(), clip)
            opt.step()

            # loss stats
            if counter % print_every == 0:
                # Get validation loss
                val_h = model.init_hidden(batch_size)
                val_losses = []
                model.eval()
                for x, y in get_batches(val_data, batch_size, seq_length):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_vocab)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)

                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])

                    inputs, targets = x, y
                    if (train_on_gpu):
                        inputs, targets = inputs.cuda(), targets.cuda()

                    '''TODO: feed the current input into the model and generate output'''
                    output, val_h = model(inputs, val_h)
                    
                    # Same flattening as in the training step, without the hard-coded 500
                    flat_targets_val = targets.reshape(-1).long()
                    flat_output_val = output.reshape(-1, n_vocab)
                    
                    '''TODO: compute the validation loss!'''
                    val_loss = criterion(flat_output_val, flat_targets_val)

                    val_losses.append(val_loss.item())

                print("Epoch: {}/{}...".format(e + 1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))
                
                '''TODO: sample from the model to generate texts'''
                input_eval = 'E'
                print(sample(model, 1000, prime=input_eval, top_k=10))
                
                model.train()  # reset to train mode after iterating through validation data
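
The training loop clips gradients by their global norm. The NumPy sketch below illustrates the idea only; it is not PyTorch's implementation, and the function name `clip_by_global_norm` and the toy gradients are assumptions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))   # never scale gradients up
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]       # toy gradients, joint norm 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))

print(norm_before)            # 13.0
print(round(norm_after, 3))   # 5.0
```

All gradients are scaled by the same factor, so the update direction is preserved and only its length is limited.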

In [0]:
def predict(model, char, h=None, top_k=None):
    ''' Given a character, predict the next character.
        Returns the predicted character and the hidden state.
    '''
    #print('char: ' + str(char))
    #print('char_idx: ' + str(char2idx[char]))
    # tensor inputs
    
    x = np.array([[char2idx[char]]])
    x = one_hot_encode(x, len(model.vocab))
    inputs = torch.from_numpy(x)

    if (train_on_gpu):
        inputs = inputs.cuda()

    # detach hidden state from history
    h = tuple([each.data for each in h])
    '''TODO: feed the current input into the model and generate output'''
    output, h = model(inputs, h) # TODO

    # get the character probabilities
    p = F.softmax(output, dim=-1).data
    if (train_on_gpu):
        p = p.cpu()  # move to cpu

    # get top characters
    if top_k is None:
        top_ch = np.arange(len(model.vocab))
    else:
        p, top_ch = p.topk(top_k)
        top_ch = top_ch.numpy().squeeze()

    # select the likely next character with some element of randomness
    p = p.numpy().squeeze()
    char = np.random.choice(top_ch, p=p / p.sum())

    # return the encoded value of the predicted char and the hidden state
    return idx2char[char], h
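
The top-k step in `predict` can be isolated from the model. The sketch below applies the same idea (keep the k most likely entries, renormalise, sample) to a toy distribution; `sample_top_k` and the probabilities are assumptions for illustration.

```python
import numpy as np

def sample_top_k(probs, k, rng):
    """Sample an index from the k most probable entries of `probs`."""
    top_idx = np.argsort(probs)[-k:]                # indices of the k largest probabilities
    top_p = probs[top_idx] / probs[top_idx].sum()   # renormalise over the top k
    return int(rng.choice(top_idx, p=top_p))

rng = np.random.default_rng(0)
probs = np.array([0.05, 0.40, 0.05, 0.30, 0.20])    # toy next-character distribution
draws = [sample_top_k(probs, k=2, rng=rng) for _ in range(100)]

print(sorted(set(draws)))   # only indices 1 and 3 can ever be drawn
```

A small `top_k` keeps the generated text closer to the most likely characters, while `top_k=None` samples from the full distribution.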

In [0]:
def sample(model, size, prime='The', top_k=None):
    if (train_on_gpu):
        model.cuda()
    else:
        model.cpu()

    model.eval()  # eval mode

    # First off, run through the prime characters
    chars = [ch for ch in prime]
    h = model.init_hidden(1)
    for ch in prime:
        char, h = predict(model, ch, h, top_k=top_k)

    chars.append(char)

    for ii in range(size):
        '''TODO: pass in the previous character and get a new one'''
        char, h = predict(model, chars[-1], h, top_k=top_k)
        chars.append(char)

    model.train()
    return ''.join(chars)
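
The control flow of `sample` (prime the state, then feed each prediction back in) can be traced with a deterministic stand-in for the model. Here `toy_predict` and `toy_sample` are hypothetical helpers: the "model" simply cycles a -> b -> c -> a, and the state is a call counter.

```python
def toy_predict(char, state):
    """Hypothetical stand-in for `predict`: cycles a -> b -> c -> a and counts calls."""
    nxt = {"a": "b", "b": "c", "c": "a"}[char]
    return nxt, state + 1

def toy_sample(size, prime="ab"):
    chars = list(prime)
    state = 0
    # Warm up the state on the prime; only the last prediction is kept
    for ch in prime:
        char, state = toy_predict(ch, state)
    chars.append(char)
    # Feed each generated character back in as the next input
    for _ in range(size):
        char, state = toy_predict(chars[-1], state)
        chars.append(char)
    return "".join(chars)

print(toy_sample(4))  # abcabca
```

The result has `len(prime) + 1 + size` characters, because one character is appended after the priming loop before the main loop starts.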

In [37]:
'''TODO: Try changing the number of units in the network to see how it affects performance'''
n_hidden = 256
n_layers = 2

vanilla_model = VanillaCharRNN(vocab, n_hidden, n_layers)
print(vanilla_model)

VanillaCharRNN(
  (rnn): RNN(90, 256, num_layers=2, batch_first=True)
  (fc): Linear(in_features=256, out_features=90, bias=True)
)


In [0]:
'''TODO: Try changing the hyperparameters in the network to see how it affects performance'''
batch_size = 10
seq_length = 50
n_epochs = 5  # start smaller if you are just testing initial behavior

In [39]:
train(vanilla_model, text_as_int, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=1000)

Epoch: 1/5... Step: 1000... Loss: 1.9334... Val Loss: 2.0892
Exs, bertes this spear.
  ORTINSORD IO. Ind wall,
What be dice on wricce thy hish wit dowerer
    The as, tent a ware appere,
    Throw if son lough, and sigh all sime a dray, with thoo arath this changels and hit stricincands carned; in more, and buce;
    Tue thank ind the she seick.
  SIATROSANI CIUSTAN. I wis ditk amo,
    But der why paed in ord wo the hit so thinglerdes appear'd, inlent on pry this parit one this a this so pranty
    Wet the wan a trease,
    So to mare som stall a danger thy thime wrytire ill bust mikes, my terer worthelt to and har beant the man seling to throm core, me a stound thath
    And a mer sheast be ngalk.
                                                                                                                                                  [Oxiuris thou,
    This; and hank.
    TAxtun with sterent be mas athime,
    The bith nom he tall servite
Singene, and as frall insa tincont in 