# Lab 6: Sequence-to-sequence models

## Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

## There are two parts of this lab:
###  1.   Wiring up a basic sequence-to-sequence computation graph
###  2.   Implementing your own GRU cell.


Examples of my final samples are shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The tutorial linked below will help you build out the scaffolding code and understand how to work with sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [2]:
# Get the text files that will be used for training and testing and unzip them
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

--2019-10-18 02:20:53--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 52.2.48.133, 34.205.95.128, 3.214.17.10, ...
Connecting to piazza.com (piazza.com)|52.2.48.133|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://d1b10bmlvqabco.cloudfront.net/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2019-10-18 02:20:53--  https://d1b10bmlvqabco.cloudfront.net/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving d1b10bmlvqabco.cloudfront.net (d1b10bmlvqabco.cloudfront.net)... 13.224.12.211, 13.224.12.155, 13.224.12.150, ...
Connecting to d1b10bmlvqabco.cloudfront.net (d1b10bmlvqabco.cloudfront.net)|13.224.12.211|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2019-10-18 02:20:53 (33.0 M

In [3]:
chunk_len = 200
# Chunks are like one training example (similar to one image for a CNN)
def random_chunk():
  # randint is inclusive on both ends, so leave room for chunk_len + 1 characters
  start_index = random.randint(0, file_len - chunk_len - 1)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]
  
print(random_chunk())

d, sparkling 
with joy. And then he cast the leaves into the bowls of steaming water that 
were brought to him, and at once all hearts were lightened. For the 
fragrance that came to each was like a me


In [4]:
import torch
# Turn a string into a LongTensor of character indices
def char_tensor(text):
  tensor = torch.zeros(len(text)).long()
  for c in range(len(text)):
      tensor[c] = all_characters.index(text[c])
  return tensor  # no Variable wrapper needed since PyTorch 0.4

print(char_tensor('abcdDEF'))

tensor([10, 11, 12, 13, 39, 40, 41])
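
The index mapping can be checked without PyTorch. The following is a minimal sketch using plain lists; `encode` and `decode` are hypothetical helper names that are not part of the lab scaffold.

```python
import string

all_characters = string.printable

def encode(text):
    """Map each character to its position in string.printable."""
    return [all_characters.index(ch) for ch in text]

def decode(indices):
    """Invert encode(): map positions back to characters."""
    return "".join(all_characters[i] for i in indices)

indices = encode("abcdDEF")
print(indices)          # [10, 11, 12, 13, 39, 40, 41]
print(decode(indices))  # abcdDEF
```

Digits occupy indices 0 to 9, lowercase letters 10 to 35, and uppercase letters 36 to 61, which is why `'a'` maps to 10 and `'D'` to 39.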


---

## Part 4: Creating your own GRU cell 

**(Come back to this later - it's defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class using the same parameters as the built-in PyTorch class does.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!

**TODO:**

**DONE:**
* Create a custom GRU cell
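
For reference, one common way to write a GRU cell uses three linear maps over concatenated inputs. This is a sketch that mirrors the structure of the `W_r`, `W_z`, and `W_h` layers in the cell below. The bias symbols $b_r, b_z, b_h$ and the bracket notation for concatenation are notational assumptions, and $\odot$ denotes the elementwise (Hadamard) product. PyTorch's `nn.GRU` places the reset gate slightly differently: it multiplies $r_t$ into the hidden-to-hidden term after the linear map.

```latex
\begin{aligned}
r_t &= \sigma\left(W_r\,[x_t,\ h_{t-1}] + b_r\right) \\
z_t &= \sigma\left(W_z\,[x_t,\ h_{t-1}] + b_z\right) \\
\tilde{h}_t &= \tanh\left(W_h\,[x_t,\ r_t \odot h_{t-1}] + b_h\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```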


In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.parameter import Parameter

class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()

    self.input_size = input_size
    self.hidden_size = hidden_size
  
    self.sigmoid = nn.Sigmoid()
    self.tanh = nn.Tanh()

    # Three nn.Linear modules hold all of the GRU parameters. Each one acts on the
    # concatenation [x_t, h_(t-1)], so it covers both the input-to-hidden and the
    # hidden-to-hidden weight matrix of its gate (6 matrices in total), plus one bias per gate.
    self.W_r = nn.Linear(input_size + hidden_size, hidden_size)
    self.W_z = nn.Linear(input_size + hidden_size, hidden_size)
    self.W_h = nn.Linear(input_size + hidden_size, hidden_size)
    
  
  def forward(self, input_x, hidden):
    # This cell computes the following, where [a, b] is concatenation and * is the
    # Hadamard product (elementwise multiplication, not matrix multiplication):
    # r_t     = sigmoid(W_r [x_t, h_(t-1)] + b_r)
    # z_t     = sigmoid(W_z [x_t, h_(t-1)] + b_z)
    # h_tilde = tanh(W_h [x_t, r_t * h_(t-1)] + b_h)
    # h_t     = (1 - z_t) * h_tilde + z_t * h_(t-1)
    # Note: nn.GRU applies r_t after the hidden-to-hidden map, r_t * (W_hn h_(t-1) + b_hn),
    # so this variant is not numerically identical to the built-in layer.
    
    r_t = self.sigmoid(self.W_r(torch.cat((input_x, hidden), dim=2)))
    z_t = self.sigmoid(self.W_z(torch.cat((input_x, hidden), dim=2)))
    h_tilde = self.tanh(self.W_h(torch.cat((input_x, torch.mul(r_t, hidden)), dim=2)))
    h_t = torch.mul(z_t, hidden) + torch.mul((1-z_t), h_tilde)

    return h_t, h_t
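
The size of this cell can be compared with a built-in single-layer GRU by counting parameters. The arithmetic below is a sketch under two assumptions: that `input_size` and `hidden_size` are both 200 (the values used later in this notebook), and that the built-in layer keeps separate input-to-hidden and hidden-to-hidden bias vectors for each gate. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus bias of one nn.Linear-style layer."""
    return in_features * out_features + out_features

input_size = hidden_size = 200  # assumed values, taken from the training setup

# Custom cell: three Linear(input_size + hidden_size, hidden_size) layers
custom_total = 3 * linear_params(input_size + hidden_size, hidden_size)

# Built-in style: per gate, one input-to-hidden and one hidden-to-hidden map, each with a bias
builtin_total = 3 * (linear_params(input_size, hidden_size)
                     + linear_params(hidden_size, hidden_size))

print(custom_total, builtin_total)  # 240600 241200
```

The difference of 600 is exactly the second bias vector per gate (3 gates times 200 units), which the concatenated formulation folds into a single bias.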



---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from and try to imitate.

We now want to build an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.


**TODO:**

**DONE:**
* Create an RNN class that extends from nn.Module.


In [0]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    
    self.embedding = nn.Embedding(input_size, hidden_size)
    self.gru = GRU(hidden_size, hidden_size, num_layers = n_layers) # Custom-built GRU only has 1 layer even though it takes the parameter num_layers.
    # self.gru = nn.GRU(hidden_size, hidden_size, num_layers = n_layers)
    self.decoder = nn.Linear(hidden_size, output_size) # Takes GRU output and turns it into a character output

  def forward(self, input_char, hidden_state):
    """Uses the GRU architecture, combined with previous hidden states as the forward pass"""
    encoded_input = self.embedding(input_char).view(1,1,-1)
    output, hidden = self.gru(encoded_input, hidden_state)
    out_decoded = self.decoder(output)
    
    return out_decoded, hidden

  def init_hidden(self):
    return torch.zeros(self.n_layers, 1, self.hidden_size)

In [0]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
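
The input and target are the same chunk shifted by one character, so at every position the network is asked to predict the next character. A minimal sketch on a toy string (the string `"hello"` is a stand-in for one `random_chunk()` result):

```python
chunk = "hello"  # stand-in for one random_chunk()
inp, target = chunk[:-1], chunk[1:]

# Each input character is paired with the character that follows it
for current_char, next_char in zip(inp, target):
    print(repr(current_char), "->", repr(next_char))
```

This is also why `random_chunk` returns `chunk_len + 1` characters: after the shift, both `inp` and `target` have length `chunk_len`.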

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 

**TODO:**

**DONE:**
* Fill in the pieces.


In [0]:
def train(inp, target):
  # Reset gradients, initialize the hidden state, and start the loss at zero
  decoder_optimizer.zero_grad()
  hidden_state = decoder.init_hidden()
  loss = 0
  
  # Go through all the characters in the chunk of text and predict on them and compute the loss
  for i, input_char in enumerate(inp):
    output, hidden_state = decoder(input_char, hidden_state)
    loss += criterion(output.squeeze(1),target[i].unsqueeze(0))
  
  # Backpropagate and update the weights through the optimizer
  loss.backward()
  decoder_optimizer.step()
  
  return loss.item() / len(inp)
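
As a rough yardstick for the per-character loss that `train` returns: `nn.CrossEntropyLoss` uses the natural logarithm, so a model that guesses uniformly over the characters in `string.printable` would score the log of the alphabet size. The sketch below computes that baseline; `perplexity` is a hypothetical helper, not part of the lab scaffold.

```python
import math
import string

n_characters = len(string.printable)   # 100 symbols
uniform_loss = math.log(n_characters)  # cross-entropy of a uniform guess, in nats
print(round(uniform_loss, 3))          # 4.605

def perplexity(loss):
    """Effective number of equally likely characters implied by a loss in nats."""
    return math.exp(loss)

print(round(perplexity(uniform_loss)))  # 100
```

A reported loss well below 4.6 therefore means the model has narrowed its next-character guess to far fewer than 100 candidates.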


---

## Part 3: Sample text and Training information

---

At this point you can, if you choose, also write the training-loop boilerplate that samples random sequences and trains your RNN. It is helpful to have this working before writing your own GRU class.

Once training is finished, or at any point during training, you can sample from the network with the following function. If your RNN model is instantiated as `decoder`, then this will probabilistically sample a sequence of length `predict_len`.

**TODO:**

**DONE:**
* Fill out the evaluate function to generate text from a primed string


In [0]:
def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  #Initialize hidden state and prediction string that will be returned
  hidden_state = decoder.init_hidden()
  prediction = prime_str
  
  # Use all of the input string except the last character to update our hidden state
  for i in range(len(prime_str) - 1):
    hidden_state = decoder(char_tensor(prime_str[i]), hidden_state)[1]  

  for i in range(predict_len):
    # The input character is the last character of the prediction so far
    input_char = char_tensor(prediction[-1])
    out, hidden_state = decoder(input_char, hidden_state)
    # Turn the output vector into unnormalized probabilities and sample an index with torch.multinomial
    scaled_output = torch.exp(out/temperature)
    chosen_char_idx = torch.multinomial(scaled_output.squeeze(0).squeeze(0), 1, replacement=True).item()
    # Get the predicted character and add it to the prediction string
    next_character = all_characters[chosen_char_idx]
    prediction += next_character
  
  return prediction
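
The effect of `temperature` can be seen on a small example. The sketch below uses only the standard library and made-up scores for three characters; `temperature_probs` is a hypothetical helper. It normalizes explicitly, whereas `evaluate` passes unnormalized weights to `torch.multinomial`, which accepts weights that do not sum to one.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]

logits = [2.0, 1.0, 0.1]  # made-up scores for three characters
for temperature in (0.5, 1.0, 2.0):
    probs = temperature_probs(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

# Draw one index from the temperature-0.8 distribution
sampled_index = random.choices(range(len(logits)),
                               weights=temperature_probs(logits, 0.8), k=1)[0]
```

Lower temperatures concentrate probability on the highest-scoring character, giving safer but more repetitive text; higher temperatures flatten the distribution and produce more varied, more error-prone samples.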


---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---

Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like this. I trained on the “lotr.txt” dataset, using chunk_length=200 and hidden_size=100, for 2000 epochs.

**TODO:** 

**DONE:**
* Create some cool output


In [0]:
import time
n_epochs = 2000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 1
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [15]:
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[55.48689937591553 (200 10%) 2.0389]
Wh)ry' 
hat on the and whall laot ent thit hain 

Dand and the thron semun ong came dous waid oa and c 

[109.43169116973877 (400 20%) 2.0838]
Whow. 






'
The lighted that we lime am his to the he or a 
you his shad 
sond core we dind. He may 

[162.79693484306335 (600 30%) 1.7647]
Whcurso melt 
the storn went some were your could 
stoud they reach the Menmange we hid sein 
like wel 

[216.09594416618347 (800 40%) 1.7699]
Whrar, his worch the Riding theK up worly dearl, and the Shever. 'He ourreare mes 
of the stray bento  

[268.1515510082245 (1000 50%) 1.7462]
Whe seed the pory; but where fries mysome mones. They were all bot be wonewards of Blood. 'I his his d 

[320.13880801200867 (1200 60%) 1.7671]
Why! ' 

'You tam the encough tien it you of the Tarong things of the tood and 
round of the was a lik 

[372.27011156082153 (1400 70%) 1.6499]
When, That the road ingeth. All dismen gotto in the comirs 
lould, lbevone, and fell enside the with h 

In [16]:
# Generate some test examples
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(20):
  # Use a new name so the training start time stored in `start` is not overwritten
  primer = random.choice(start_strings)
  print("Primer String: ", primer)
  print(evaluate(primer, 200), '\n')

Primer String:   wh
 when a bright 




had gate look and been a beave of 
then?' 

Burcues leat is a brat has the broken the Butter went turpped and 
call they pless of then hould 
could your have new, and there all the se 

Primer String:   Th
 There was not oud we hould ruapted and 

all 




bode 
to hazes. Bere what upon they 
at them as, and 
a'sure as 
Flowed and they be the Menay orehing did nearth of a least of them the stood or the gre 

Primer String:   lo
 long all streep the would glead, them 
hand. All beforge the passed, out low to not fairs death and the riders and saw there in the great fleft wround far all the Ends. But stood has them 

and bent pat 

Primer String:   wh
 where feen the Lord may have in as was the them fire and soon was spaine and the Butter up it and came out, and side for a leave the whore 
brown a the for, and the burnen down deed. The down all the ha 

Primer String:   I 
 I heard. 'Yes, and they seemed as Araborn did now them be 
felt the ropain 

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**



In [17]:
# Pick a new data set -> alma.txt  (From the Book of Mormon)
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/alma.txt').read())
file_len = len(file)
print('file_len =', file_len)

file_len = 466656


In [0]:
import time
n_epochs = 2000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 1
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0


In [19]:
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[50.83067727088928 (200 10%) 2.1116]
Wha Ye unto and me for ite sthut bedem the com me Lorut kur the lupled I so behid; berburth camang the 

[101.10603857040405 (400 20%) 1.4683]
Whad in the to the preeding spomutess bight the jurder of the vent year unto unto this God of behold,  

[151.471097946167 (600 30%) 1.4549]
Whis many behold, our that shall the people.

 This sevile for the yelves time and the were soly house 

[201.62687373161316 (800 40%) 1.3712]
Whriss of the forth younstand.

 And not should both and to people ih the caunsed the live that they h 

[251.9292299747467 (1000 50%) 1.0266]
What ye to proched against and to sulfored common, and the ciest, and they sumber.

 Now we treit stal 

[302.2254202365875 (1200 60%) 1.3897]
Whis freed they may in the should for you the Nephites in ye did breth to is them twender of God of Go 

[352.5074257850647 (1400 70%) 1.2809]
Whi was are they people were overtancar the stroweth in the land me down sent that their in down and t 

[4

In [27]:
# Print test prime_str
for i in range(20):
  print(f"Example {i}: ", evaluate('And ', predict_len=100, temperature = 0.7))

Example 0:  And they did began to the enethren was all the wold of the word of God; for the people over the hands of
Example 1:  And their did with the pright unto they did of the church eveath and had brethren, behold, the word of m
Example 2:  And it be in the record a lardits, which was dist ye into the the seest behold, yea, and all the land no
Example 3:  And he caused by the seed upon the church--Ammon, they were became out of mented the destroyed preserver
Example 4:  And the words the would not sowards of the arms, and were men of their brethren your went of the Lamanit
Example 5:  And upon and destroused which were all these many more receir of ye was our did not before the Lramanite
Example 6:  And the Lord having command the our withed the church, and so destraties and dept king the Lord been des
Example 7:  And the people of souls of God, and they were brethren with and rearth to the resire and the sincerness 
Example 8:  And the church, and also day; for the prepent of did

Evaluation of my Char-RNN Model:

I decided to train my model on text from the Book of Alma in the Book of Mormon. I think my model did pretty well at learning words that are commonplace in the Book of Mormon. It produced names such as "Nephites", "Lamanites", and "Moroni", which are unique to the book. I think this is cool and a good indicator that my model is learning the structure and content of the training text.

I think where my model struggles is in producing coherent sentences. It can copy sentence structure by producing periods and spaces (and sometimes even opening and closing quotes, as seen when trained on LOTR). However, the sentences don't carry any real semantic meaning, which I think is the model's biggest weakness. Still, I am impressed that it learned sentence structure and produced words that would appear in the Book of Mormon, even if the output isn't completely coherent.
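
One way to put a number on the "learns real words" observation is to measure what fraction of the words in a sample also occur in the training text. This is only a sketch of one possible metric, not something the lab requires; `known_word_fraction` is a hypothetical helper, and the two strings below are toy stand-ins for the corpus and a generated sample.

```python
import re

def known_word_fraction(sample, corpus):
    """Fraction of words in `sample` that also occur in `corpus` (case-insensitive)."""
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    vocabulary = set(tokenize(corpus))
    words = tokenize(sample)
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)

corpus = "and the people of the church did go unto the land"  # toy stand-in
sample = "And the peeple of the church"                       # toy stand-in
print(known_word_fraction(sample, corpus))  # 5 of 6 words are known
```

A high score says nothing about coherence, so it complements rather than replaces reading the samples.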