<a 
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Lab 6: Sequence-to-sequence models

### Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

### Deliverable:
- Fill in the code for the RNN (using PyTorch's built-in GRU).
- Fill in the training loop
- Fill in the evaluation loop. In this loop, rather than using a validation set, you will sample text from the RNN.
- Implement your own GRU cell.
- Train your RNN on a new domain of text (Star Wars, political speeches, etc. - have fun!)

### Grading Standards:
- 20% Implementation of the RNN
- 20% Implementation of the training loop
- 20% Implementation of evaluation loop
- 20% Implementation of your own GRU cell
- 20% Training of your RNN on a domain of your choice

### Tips:
- Read through all the helper functions, run them, and make sure you understand what they are doing
- At each stage, ask yourself: What should the dimensions of this tensor be? Should its data type be float or int? (int is called `long` in PyTorch)
- Don't apply a softmax inside the RNN if you are using `nn.CrossEntropyLoss` (this module already applies a log-softmax to its input).
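
The last two tips can be checked in a few lines. The sketch below is illustrative only: the sizes and the class index are made up. It shows the dtypes and shapes `nn.CrossEntropyLoss` expects, that it matches a log-softmax followed by `F.nll_loss`, and that applying a softmax first changes the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

n_classes = 100                       # illustrative vocabulary size
logits = torch.randn(1, n_classes)    # raw scores: float tensor, shape (batch, classes)
target = torch.tensor([42])           # class index: long tensor, shape (batch,)

# CrossEntropyLoss is a log-softmax followed by negative log-likelihood.
loss_ce = nn.CrossEntropyLoss()(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Applying softmax before CrossEntropyLoss normalizes twice and gives a different loss.
loss_double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), target)

print(logits.dtype, target.dtype)
print(loss_ce.item(), loss_manual.item(), loss_double.item())
```

The first two printed losses agree and the third does not, which is why the network should return raw scores.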

### Example Output:
An example of my final samples is shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The tutorial below will help you build out the scaffolding code and understand how to work with sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) (You will be implementing the decoder, not the encoder, as we are not doing sequence-to-sequence translation.)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [1]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)


In [11]:
chunk_len = 200
 
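def random_chunk():
  # random.randint is inclusive on both ends, so subtract 1 to guarantee that
  # the slice always holds chunk_len + 1 characters (input plus shifted target).
  start_index = random.randint(0, file_len - chunk_len - 1)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]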
  
print(random_chunk())

In my just censure, in my true opinion!
Alack, for lesser knowledge! how accursed
In being so blest! There may be in the cup
A spider steep'd, and one may drink, depart,
And yet partake no venom, for h


In [12]:
import torch
# Turn string into list of longs
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])


---

## Part 4: Creating your own GRU cell 

**(Come back to this later - it's defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class using the same parameters as the built-in PyTorch class does.

Please do not look at the documentation's code for the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!
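
As a hedged reference for "the same parameters as the built-in class", the sketch below only exercises the public interface of `nn.GRU` and prints the tensor shapes a custom class would need to accept and return. The sizes are arbitrary illustrative values, not the ones used later in this lab.

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
seq_len, batch, input_size, hidden_size, num_layers = 5, 1, 8, 16, 2

gru = nn.GRU(input_size, hidden_size, num_layers)

x = torch.randn(seq_len, batch, input_size)        # one feature vector per time step
h0 = torch.zeros(num_layers, batch, hidden_size)   # one hidden state per layer

output, hn = gru(x, h0)

print(output.shape)  # top layer's hidden state at every time step
print(hn.shape)      # every layer's hidden state after the last time step
```

For a unidirectional GRU, the last time step of `output` equals the last layer of `hn`, which is a handy sanity check for a hand-written version.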

**TODO:**
* Create a custom GRU cell

**DONE:**



In [13]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class GRU(nn.Module):
    """Stacked GRU that follows the (seq_len, batch, features) interface of nn.GRU."""

    def __init__(self, input_size, hidden_size, num_layers):
        super(GRU, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Layer 0 reads the external input; deeper layers read the hidden state of the layer below.
        in_sizes = [input_size] + [hidden_size] * (num_layers - 1)

        self.W_ir_list = nn.ModuleList([nn.Linear(s, hidden_size) for s in in_sizes])
        self.W_iz_list = nn.ModuleList([nn.Linear(s, hidden_size) for s in in_sizes])
        self.W_in_list = nn.ModuleList([nn.Linear(s, hidden_size) for s in in_sizes])

        self.W_hr_list = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])
        self.W_hz_list = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])
        self.W_hn_list = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])

    def forward(self, inputs, hidden):
        # Each layer does the following (here * is the elementwise/Hadamard product,
        # not matrix multiplication):
        # r_t = sigmoid(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
        # z_t = sigmoid(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
        # n_t = tanh(W_in x_t + b_in + r_t * (W_hn h_(t-1) + b_hn))
        # h_t = (1 - z_t) * n_t + z_t * h_(t-1)
        #
        # inputs: (seq_len, batch, input_size); hidden: (num_layers, batch, hidden_size)
        h_prev = list(hidden.unbind(0))  # one hidden state per layer
        outputs = []
        for x_t in inputs:  # iterate over time steps
            layer_input = x_t
            for i in range(self.num_layers):
                h = h_prev[i]
                r_t = torch.sigmoid(self.W_ir_list[i](layer_input) + self.W_hr_list[i](h))
                z_t = torch.sigmoid(self.W_iz_list[i](layer_input) + self.W_hz_list[i](h))
                n_t = torch.tanh(self.W_in_list[i](layer_input) + r_t * self.W_hn_list[i](h))
                h_prev[i] = (1 - z_t) * n_t + z_t * h
                layer_input = h_prev[i]  # this layer's output feeds the next layer
            outputs.append(layer_input)

        # Top-layer output per time step, and the final hidden state of every layer.
        return torch.stack(outputs), torch.stack(h_prev)
  


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from, and therefore which text we try to imitate.

We now want to build out an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.


**TODO:**
* Create an RNN class that extends from nn.Module.

**DONE:**



In [14]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        
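        # Embed each input character, run it through the GRU, then project to character scores.
        # GRU is the custom class from Part 4; nn.GRU accepts the same three arguments.
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = GRU(hidden_size, hidden_size, n_layers)
        self.out = nn.Linear(hidden_size, output_size)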

    def forward(self, input_char, hidden):
        # by reviewing the documentation, construct a forward function that properly uses the output of the GRU
        output = self.embedding(input_char).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.out(output[0])

        return output, hidden
        

    def init_hidden(self):
        return torch.zeros(self.n_layers, 1, self.hidden_size)

In [15]:
def random_training_set():    
    chunk = random_chunk()
    inp = char_tensor(chunk[:-1])
    target = char_tensor(chunk[1:])
    return inp, target

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 

**TODO:**
* Fill in the pieces.
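
As a small illustration of what `train` receives: `random_training_set` returns the same chunk twice, offset by one character, so the target at each step is simply the next character. The string below is a made-up stand-in for a chunk of the corpus.

```python
chunk = "hello world"          # stand-in for one random chunk of the corpus

# Same slicing as random_training_set: drop the last char for the input,
# drop the first char for the target.
inp, target = chunk[:-1], chunk[1:]

# At step i the network reads inp[i] and is scored on predicting target[i].
pairs = list(zip(inp, target))
for seen, expected in pairs[:4]:
    print(repr(seen), "->", repr(expected))
```

With `chunk_len = 200`, each chunk holds 201 characters, so both the input and the target have 200 entries.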

**DONE:**




In [16]:
import torch.optim as optim

In [17]:
# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables in Part 5

def train(inp, target):
    # Reset gradients and start each chunk from a fresh hidden state.
    decoder_optimizer.zero_grad()
    hidden = decoder.init_hidden()
    target_length = target.size(0)
    loss = 0

    # Feed the chunk one character at a time; the true next character is the target at every step.
    for i, char in enumerate(inp):
        output, hidden = decoder(char, hidden)
        loss += criterion(output, target[i].unsqueeze(0))

    # Backpropagate through the whole chunk, then report the average per-character loss.
    loss.backward()
    decoder_optimizer.step()
    return loss.item() / target_length

---

## Part 3: Sample text and Training information

---

At this point you can also, if you choose, write out the training-loop boilerplate that samples random sequences and trains your RNN. It is helpful to have this working before writing your own GRU class.

To sample from the network, either during or after training, consider using the following function. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.

**TODO:**
* Fill out the evaluate function to generate text from a primed string
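
To make the role of `temperature` concrete, here is a minimal sketch with made-up scores for three characters. Because `torch.multinomial` treats its input as unnormalized weights, sampling from `exp(output / temperature)` is the same as sampling from `softmax(output / temperature)`, so the softmax shows the effective probabilities. The helper name `sampling_probs` is hypothetical and not part of the lab scaffold.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.0])   # illustrative unnormalized scores for 3 characters

def sampling_probs(logits, temperature):
    # Probabilities with which exp(logits / temperature) is sampled by torch.multinomial.
    return torch.softmax(logits / temperature, dim=0)

cold = sampling_probs(logits, 0.2)      # close to argmax
default = sampling_probs(logits, 0.8)   # the default used by evaluate
hot = sampling_probs(logits, 5.0)       # close to uniform

for name, p in [("T=0.2", cold), ("T=0.8", default), ("T=5.0", hot)]:
    print(name, [round(v, 3) for v in p.tolist()])
```

Lower temperatures concentrate probability on the highest-scoring character, while higher temperatures flatten the distribution.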

**DONE:**



In [18]:
def sample_outputs(output, temperature):
    """Takes in a vector of unnormalized probability weights and samples a character from the distribution"""
    # As temperature approaches 0, this sampling function becomes argmax (no randomness)
    # As temperature approaches infinity, this sampling function becomes a purely random choice
    return torch.multinomial(torch.exp(output / temperature), 1)

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
    prediction = list(prime_str)
    hidden = decoder.init_hidden()

    # No gradients are needed while sampling.
    with torch.no_grad():
        # Build up the hidden state from all but the last priming character,
        # so that the last one is not fed to the network twice.
        for char in prime_str[:-1]:
            _, hidden = decoder(char_tensor(char), hidden)

        # The last priming character is the first input of the generation loop.
        next_char_pred = prime_str[-1]

        for _ in range(predict_len):
            output, hidden = decoder(char_tensor(next_char_pred), hidden)
            last_char_index = sample_outputs(output, temperature).item()
            next_char_pred = all_characters[last_char_index]
            prediction.append(next_char_pred)

    return "".join(prediction)

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---


**TODO:** 
* Create some cool output

**DONE:**




Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output similar to the following. I trained on the “lotr.txt” dataset, using chunk_length=200 and hidden_size=100, for 2000 epochs. These are the results, along with the prime string:

---

 G:
 
 Gandalf was decrond. 
'All have lord you. Forward the road at least walk this is stuff, and 
went to the long grey housel-winding and kindled side was a sleep pleasuring, I do long 
row hrough. In  

 lo:
 
 lost death it. 
'The last of the gatherings and take you,' said Aragorn, shining out of the Gate. 
'Yes, as you there were remembaused to seen their pass, when? What 
said here, such seven an the sear 

 lo:
 
 low, and frod to keepn 
Came of their most. But here priced doubtless to an Sam up is 
masters; he left hor as they are looked. And he could now the long to stout in the right fro horseless of 
the like 

 I:
 
 I had been the 
in his eyes with the perushed to lest, if then only the ring and the legended 
of the less of the long they which as the 
enders of Orcovered and smood, and the p 

 I:
 
 I they were not the lord of the hoomes. 
Home already well from the Elves. And he sat strength, and we 
housed out of the good of the days to the mountains from his perith. 

'Yess! Where though as if  

 Th:
 
 There yarden 
you would guard the hoor might. Far and then may was 
croties, too began to see the drumbred many line 
and was then hoard walk and they heart, and the chair of the 
Ents of way, might was 

 G:
 
 Gandalf 
been lat of less the round of the stump; both and seemed to the trees and perished they 
lay are speered the less; and the wind the steep and have to she 
precious. There was in the oonly went 

 wh:
 
 which went out of the door. 
Hull the King and of the The days of his brodo 
stumbler of the windard was a thing there, then it been shining langing 
to him poor land. They hands; though they seemed ou 

 ra:
 
 rather,' have all the least deather 
down of the truven beginning to the house of sunk. 
'Nark shorts of the Eyes of the Gate your great nothing as Eret. 
'I wander trust horn, and there were not, it  

 I:
 
 I can have no mind 
together! Where don't may had one may little blung 
terrible to tales. And turn and Gandalf shall be not to as only the Cattring 
not stopped great the out them forms. On they she lo 

---


In [10]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [11]:
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[160.42864513397217 (200 4%) 2.1695]
Wh banl ind be her me bared of 
of wand. The same her nough to in the sore saod 



vhawe ste lloing f 

[319.6418743133545 (400 8%) 1.8735]
Whind not sull and rode said sester gonterr. 
But said But have waly not had watf lustorn. 'Houred was 

[478.13315892219543 (600 12%) 1.7835]
Whern: courseb't it, but there whrlave 

lirster of thoughty on that the, 
and down the grood 






M 

[636.7872338294983 (800 16%) 1.7616]
Whapen away alrong' he Grecoor a down, of (are Sang the pape and 
lizt that it then mirel can this tow 

[795.7131023406982 (1000 20%) 1.5979]
Wher and some spoken hornes a 

day lought and he pace mask. 

Alto supping of the quesening that they 

[953.9603943824768 (1200 24%) 1.7666]
Whind now. It insen me fough him. 

1now 
he mank the Belpane a laster gazed wall they bate of the dar 

[1111.605400800705 (1400 28%) 1.7430]
Whing iite op the green with the sside the time this too his 
spoke their with the blindle to the hear 

[12

In [12]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # Pick a random priming string (without overwriting the `start` timestamp).
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 Th
 The water flose with a night aprost the southem between that is squest 
the yigh only rest to the Ease in they leadin as not 
to upsent and a land to some was one 
and 
gather bone: see you should, and  

 he
 heds and bowing 
brought the call had sleeping. Thenwayed and foots and 
the light of the flashed and they came a burred to 
the light. 

'It you was side at did. But with a dark a wilenday and lender a 

 wh
 whictaning about the Dwarf. I will well behind when from the wight may forest dragolt. 

Then I would now did the Ring which to 
eyes hard and climbh. There have not men and I wood. To fall all trying o 

 ca
 carch for for my clo! Frodo. ' Frodo was much from him of the call 
as ' he said?' 

'I do you are years for shame and 

door. He live else from your will be we muside time and clast would samber in the 

 wh
 whouth. He light with here in the narrow here and had 
rust, and the stond a wind. Muider and Merry 
before the chair. I foot. They cool still the same the

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)
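
One hedged way to put a number on model performance is perplexity, the exponential of the average per-character cross-entropy. This assumes the loss is measured in nats, which is the case for `nn.CrossEntropyLoss`. The helper name `perplexity` is hypothetical, and the two loss values are copied from the training log in this part purely for illustration.

```python
import math

def perplexity(avg_cross_entropy):
    """Convert an average per-character cross-entropy (in nats) to perplexity."""
    return math.exp(avg_cross_entropy)

n_characters = 100                        # len(string.printable)
uniform_loss = math.log(n_characters)     # loss of a model that guesses uniformly

for label, loss in [("uniform guess", uniform_loss),
                    ("after 200 steps", 2.5962),
                    ("after 1400 steps", 1.8942)]:
    print(f"{label}: loss={loss:.4f}, perplexity={perplexity(loss):.1f}")
```

A perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k characters at each step, so lower is better.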

**DONE:**



In [10]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/tiny_shakespeare.txt').read())
file_len = len(file)
print('file_len =', file_len)


In [20]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 5
lr = 0.0005
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [21]:
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[266.93727374076843 (200 4%) 2.5962]
Whe coe,

GyRYRRENRCO:
Fat dusing thas. I bhamI brengr:
Ney on tho fown yor, brealintindi
Ancs foreetO 

[528.9766573905945 (400 8%) 2.1066]
Wh the crunce weends, my wem, west of have thou sild cront thy sit roud mod heach frumed wour bener th 

[791.072660446167 (600 12%) 2.0109]
Wh?
Whee 'toryer is puin, and I cundall then feppom contien beere, wilintss my preally coth your to th 

[1050.7609705924988 (800 16%) 2.0047]
Wher'd whold the mome, of as a? in thee dome hear goar traie
What thou dy hiuld hick for jome his cith 

[1311.8214690685272 (1000 20%) 2.0299]
Whary the the canour so co veagn-to word to sold
And love a inten us the rome the speathy denouly.

LU 

[1571.7708387374878 (1200 24%) 1.8768]
Whall no mice amp on the senst I was mided upolled in the good bacher to can pit refere
The pursel.

P 

[1830.417935848236 (1400 28%) 1.8942]
Whillea of with will to have lord, den
And Countly this eyech grofe;
He wangings---

LAMIOLA:
Reave a  

[2

In [22]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # Pick a random priming string (without overwriting the `start` timestamp).
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 wh
 whellon formy:
But her of new, giving in the starden, when this in the palumed,
That have stone
The death it to your consul I said,
The fight
Wister
Now.

VOLUMNIO:
This like stors, for the same and fal 

 ca
 carth of this
Marchip, in the rese-to the king,
And fived in
all the comish, in fire before feurge you may the worth,
A friend;
And your son by thy brother of my father bid in earth 'twill be do churden 

 ra
 ration what not parwing her sight in my flowed,
And there in my flain, God father gone and my since the good promish'd in this no more pasts she say in leak to Rome, so? the feal'd in the war, crown now 

 wh
 whell, but she lived with an answer it with, you sight us 'twish doth him to suloage of thrument of hour corn it men with the such sap your place!

MIRANDA:
I do shall pitch dest atter Hent so expite,
W 

 G
 Geat will be with me drunt--a that intrice of this dead her widio, this, and in thy proud Genton'd to Burning to see now, in his dead
Ofe shall of that is r

In [None]:
# It does very well at capturing the play structure: most of the time it produces a character name, a colon, and a new line.
# The English sentences are mediocre but not terrible.
# It puts an exclamation mark after the word 'mad'