# Assignment 13 The last but not the least : RNN

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea. If you want to predict the next word in a sentence you better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being depended on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later). Here is what a typical Vanilla RNN looks like:

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://miro.medium.com/max/4000/0*WdbXF_e8kZI1R5nQ.png", width=700)

In this, we will not work on images for more simplicity. Instead of that, let's be generative on text, no? One of the classical application of RNN network is to be able to predict the next value given a certain amount of older one. Compare to the previous network, you give t word ($x_t$) and the network output you $x_{t+1}$ (here it's $h_t$)

In [2]:
Image(url= "https://iq.opengenus.org/content/images/2019/12/1_XvUt5wDQA8D3C0wAuxAvbA.png", width=700)

By doing so, could it be able to train a network to be able to write something? Let's try to make a Siliconspears. In the data folder, you have now a shakespear folder containing all the work of William. Your task will be to implement a RNN who learn how to write some shakespear! (it will just sham of course). 

Bellow, you will find all the utilitary code to be able to perform it. The Corpus class is a dataset and you can call a batch with the target by applying get_batch to a batchifyed dataset

In [3]:
import torch
import torch.nn as nn
import torch.autograd as autograd
import torch.cuda as cuda
import torch.optim as optim
import os
import tqdm
import numpy as np

In [4]:
BPTT_SIZE = 20  # backpropagation through time

In [5]:
class Dictionary:
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus:
    def __init__(self, path):
        self.dictionary = Dictionary()
        
        # This is very english language specific
        # We will ingest only these characters:
        self.whitelist = [chr(i) for i in range(32, 127)]
        
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))

    def tokenize(self, path):
        """Tokenizes a text file."""
        assert os.path.exists(path)
        # Add words to the dictionary
        with open(path, 'r',  encoding="utf8") as f:
            tokens = 0
            for line in f:
                line = ''.join([c for c in line if c in self.whitelist])
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)

        # Tokenize file content
        with open(path, 'r',  encoding="utf8") as f:
            ids = torch.LongTensor(tokens)
            token = 0
            for line in f:
                line = ''.join([c for c in line if c in self.whitelist])
                words = line.split() + ['<eos>']
                for word in words:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1

        return ids
    
def batchify(data, batch_size):
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // batch_size
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * batch_size)
    # Evenly divide the data across the bsz batches.
    data = data.view(batch_size, -1).t().contiguous()
    if cuda.is_available():
        data = data.cuda()
    return data

def get_batch(source, i, evaluation=False):
    seq_len = min(BPTT_SIZE, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    if cuda.is_available():
        data = data.cuda()
        target = target.cuda()
    return data, target

In [6]:
corpus = Corpus('./shakespear')

In [7]:
vocab = len(corpus.dictionary)
print(vocab)
n_hidden = 100
n_layers = 2
batch_size = 100

74010


In [8]:
# NET
class RNN(nn.Module):
    def __init__(self, n_layers, n_hidden):
        super().__init__()
        
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        
        self.rnn = nn.RNN(vocab, n_hidden, n_layers, batch_first=True)
        self.fc = nn.Linear(n_hidden, vocab)
    
    def forward(self, x, hidden):
        out, hidden_t = self.rnn(x, hidden)
        out = self.fc(out)
        
        return out, hidden_t
    
    def init_hidden(self, batch_size):
        hidden = torch.zeros(2, self.n_layers, batch_size, self.n_hidden)
        return hidden

In [9]:
# BATCH DATA
data_train = batchify(corpus.train, batch_size)
data_val = batchify(corpus.valid, batch_size)
print(data_train.shape, data_val.shape)

data = (data_train, data_val)

torch.Size([10399, 100]) torch.Size([634, 100])


In [10]:
# TRAIN
def train(model, data, epochs=5, lr=0.001, print_every=10):
    model.train()
    
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    data_t = data[0]
    data_v = data[1]
    
    print_counter = 0
    for epoch in range(epochs):
        h = model.init_hidden(batch_size)
        
        dataset = get_batch(data_t, 10)
        print(dataset.shape)
        

    

In [11]:
net = RNN(n_layers, n_hidden)
train(net, data)

NameError: name 'Variable' is not defined

In [None]:
# TRAIN
def train(model, data, data_val, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Training a network

        Arguments
        ---------

        model: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss

    '''
    model.train()

    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # create training and validation data
    val_idx = int(len(data) * (1 - val_frac))
    data, val_data = data[:val_idx], data[val_idx:]

    if (train_on_gpu):
        model.cuda()

    counter = 0
    n_vocab = len(model.vocab)
    for e in range(epochs):
        # initialize hidden state
        h = model.init_hidden(batch_size)
        #print(h.shape)
        
        '''TODO: use the get_batches function to generate sequences of the desired size'''
        dataset = get_batches(data, batch_size, seq_length)

        for x, y in dataset:
            counter += 1

            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_vocab)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

            if (train_on_gpu):
                inputs, targets = inputs.cuda(), targets.cuda()

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])
            
            #h_state = torch.tensor([h[0][0]], [h[1]])
            #print(x.shape)
            
            #if (train_on_gpu):
            #    h = (h.cuda())

            # zero accumulated gradients
            model.zero_grad()
            
            '''TODO: feed the current input into the model and generate output'''
            output, h = model(inputs, h)
            
            #print(targets.shape)
            #print(output.shape)
            
            flat_targets = targets.flatten()
            flat_output = output.view(-1, len(vocab))
            
            #print(flat_targets.shape)
            #print(flat_output.shape)

            '''TODO: compute the loss!'''
            loss = criterion(flat_output, flat_targets)
            
            # perform backprop
            loss.backward()
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(model.parameters(), clip)
            opt.step()

            # loss stats
            if counter % print_every == 0:
                # Get validation loss
                val_h = model.init_hidden(batch_size)
                val_losses = []
                model.eval()
                for x, y in get_batches(val_data, batch_size, seq_length):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_vocab)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)

                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])

                    inputs, targets = x, y
                    if (train_on_gpu):
                        inputs, targets = inputs.cuda(), targets.cuda()

                    '''TODO: feed the current input into the model and generate output'''
                    output, val_h = model(inputs, val_h)
                    
                    flat_targets_val = targets.flatten()
                    flat_output_val = output.view(-1, len(vocab))
                    
                    '''TODO: compute the validation loss!'''
                    val_loss = criterion(flat_output_val, flat_targets_val)

                    val_losses.append(val_loss.item())

                print("Epoch: {}/{}...".format(e + 1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))
                
                '''TODO: sample from the model to generate texts'''
                input_eval = 'E'
                print(sample(model, 1000, prime=input_eval, top_k=10))
                
                model.train()  # reset to train mode after iterationg through validation data

In [None]:
# PARAMS
