<a
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Lab 6: Sequence-to-sequence models

### Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

### Deliverable:
- Fill in the code for the RNN (using PyTorch's built-in GRU).
- Fill in the training loop
- Fill in the evaluation loop. In this loop, rather than using a validation set, you will sample text from the RNN.
- Implement your own GRU cell.
- Train your RNN on a new domain of text (Star Wars, political speeches, etc. - have fun!)

### Grading Standards:
- 20% Implementation of the RNN
- 20% Implementation of the training loop
- 20% Implementation of evaluation loop
- 20% Implementation of your own GRU cell
- 20% Training of your RNN on a domain of your choice

### Tips:
- Read through all the helper functions, run them, and make sure you understand what they are doing
- At each stage, ask yourself: What should the dimensions of this tensor be? Should its data type be float or int? (int is called `long` in PyTorch)
- Don't apply a softmax inside the RNN if you are using `nn.CrossEntropyLoss` (this loss already applies a log-softmax to its input, so it expects raw, unnormalized scores).
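
The last tip can be checked numerically. The sketch below uses made-up logits over a hypothetical 5-character vocabulary (the sizes and values are illustrative assumptions, not part of the lab) and compares `nn.CrossEntropyLoss` on raw scores against an explicit log-softmax followed by negative log-likelihood. It also shows that applying a softmax first yields a different loss value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores (logits) for one time step over a 5-character vocabulary.
logits = torch.randn(1, 5)
target = torch.tensor([2])  # class indices must have dtype long

# nn.CrossEntropyLoss takes raw logits and applies log-softmax internally.
ce = nn.CrossEntropyLoss()(logits, target)

# Equivalent two-step computation: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Applying softmax first and then CrossEntropyLoss normalizes twice and changes the loss.
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), target)

print(ce.item(), manual.item(), double.item())
```

The first two printed values should agree, while the third differs, which is why the network's last layer should output unnormalized scores.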

### Example Output:
Examples of my final samples are shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The tutorial below will help you build out the scaffolding code and understand how to work with sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) (Note that you will not be implementing the encoder part of this tutorial.)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [120]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz'
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re

import pdb

all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

--2023-10-15 04:21:14--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 52.1.77.241, 3.223.24.172, 34.235.220.249, ...
Connecting to piazza.com (piazza.com)|52.1.77.241|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2023-10-15 04:21:15--  https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving cdn-uploads.piazza.com (cdn-uploads.piazza.com)... 108.138.246.45, 108.138.246.41, 108.138.246.58, ...
Connecting to cdn-uploads.piazza.com (cdn-uploads.piazza.com)|108.138.246.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2023-10-15 04:21:16 (1.66 MB/s) - ‘./text_files.tar.gz’ saved [

In [121]:
chunk_len = 200

def random_chunk():
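  # randint is inclusive on both ends; stop one short so every chunk has chunk_len + 1 characters
  start_index = random.randint(0, file_len - chunk_len - 1)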
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]

print(random_chunk())

 indeed more 
than 

enough, for it was not comfortable lore. Tom's words laid bare the hearts of 
trees and their thoughts, which were often dark and strange, and filled with 
a hatred of things that 


In [122]:
import torch
# Turn string into list of longs
def char_tensor(text):
  # the argument is named `text` so that it does not shadow the `string` module
  tensor = torch.zeros(len(text)).long()
  for c in range(len(text)):
      tensor[c] = all_characters.index(text[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])
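
To make the mapping concrete, here is a small round-trip sketch. `tensor_to_string` is a hypothetical helper introduced only for illustration (it is not part of the lab scaffold), and `char_tensor` is re-declared in a compact form so the snippet runs on its own.

```python
import string
import torch

all_characters = string.printable

def char_tensor(text):
    # Same mapping as the helper above: each character becomes its index in string.printable.
    return torch.tensor([all_characters.index(c) for c in text], dtype=torch.long)

def tensor_to_string(tensor):
    # Hypothetical inverse helper: map each index back to its character.
    return ''.join(all_characters[i] for i in tensor.tolist())

encoded = char_tensor('abcDEF')
decoded = tensor_to_string(encoded)
print(encoded)
print(decoded)
```

The indices follow the layout of `string.printable`: digits first, then lowercase letters, then uppercase letters, which is why `'a'` maps to 10 and `'D'` to 39.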


---

## Part 4: Creating your own GRU cell

**(Come back to this later - it's defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class using the same parameters as the built-in PyTorch class does.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!

**TODO:**
* Create a custom GRU cell

**DONE:**



In [123]:
import torch
import torch.nn as nn
import torch.nn.functional as F


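class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    # Layer 0 reads the input; each deeper layer reads the hidden state of the layer below it.
    in_sizes = [input_size] + [hidden_size] * (num_layers - 1)
    self.W_ir = nn.ModuleList([nn.Linear(n, hidden_size) for n in in_sizes])
    self.W_hr = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in in_sizes])
    self.W_iz = nn.ModuleList([nn.Linear(n, hidden_size) for n in in_sizes])
    self.W_hz = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in in_sizes])
    self.W_in = nn.ModuleList([nn.Linear(n, hidden_size) for n in in_sizes])
    self.W_hn = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in in_sizes])

    self.sigmoid = nn.Sigmoid()
    self.tanh = nn.Tanh()


  def forward(self, inputs, hidden):
    # inputs: (1, batch, input_size), a single time step
    # hidden: (num_layers, batch, hidden_size)
    # Each layer does the following:
    # r_t = sigmoid(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
    # z_t = sigmoid(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
    # n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_(t-1) + b_hn))
    # h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_(t-1)
    # where ⊙ is the Hadamard product (elementwise multiplication, not matrix multiplication)
    x = inputs[0]
    new_hidden = []
    for layer in range(self.num_layers):
      h_prev = hidden[layer]
      r_t = self.sigmoid(self.W_ir[layer](x) + self.W_hr[layer](h_prev))
      z_t = self.sigmoid(self.W_iz[layer](x) + self.W_hz[layer](h_prev))
      n_t = self.tanh(self.W_in[layer](x) + r_t * self.W_hn[layer](h_prev))
      h_t = (1 - z_t) * n_t + z_t * h_prev
      new_hidden.append(h_t)
      x = h_t  # the new hidden state is the input to the next layer

    # The output is the top layer's new hidden state (not the candidate n_t), as with nn.GRU.
    outputs = x.unsqueeze(0)
    hiddens = torch.stack(new_hidden)
    return outputs, hiddens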
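
One way to sanity-check a hand-written cell is to compare a single step against PyTorch's `nn.GRUCell` using the same weights. The sketch below is only an illustration under stated assumptions: the sizes are arbitrary, and it relies on `nn.GRUCell` storing its reset, update, and new-gate parameters stacked in that order inside `weight_ih`, `weight_hh`, `bias_ih`, and `bias_hh`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3  # small, arbitrary sizes for the check

ref = nn.GRUCell(input_size, hidden_size)
x = torch.randn(1, input_size)
h = torch.randn(1, hidden_size)

with torch.no_grad():
    # Split the stacked parameters into the reset (r), update (z), and new (n) gate blocks.
    W_ir, W_iz, W_in = ref.weight_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = ref.weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = ref.bias_ih.chunk(3)
    b_hr, b_hz, b_hn = ref.bias_hh.chunk(3)

    # The GRU equations written out by hand.
    r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)
    z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)
    n = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))
    h_manual = (1 - z) * n + z * h

    h_ref = ref(x, h)

print(torch.allclose(h_manual, h_ref, atol=1e-5))
```

If the two results disagree, common culprits are returning the candidate `n` instead of the new hidden state, or applying the reset gate before the hidden-to-hidden linear map rather than after it.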



---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can swap out the text file we read from, and therefore the text we are trying to imitate.

We now want to build an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.


**TODO:**
* Create an RNN class that extends from nn.Module.

**DONE:**



In [124]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers

    # more stuff here...
    self.embedding = nn.Embedding(input_size, hidden_size)
    self.gru = GRU(hidden_size, hidden_size, n_layers)
    # self.relu = nn.ReLU()
    self.out = nn.Linear(hidden_size, output_size)


  def forward(self, input_char, hidden):
    # by reviewing the documentation, construct a forward function that properly uses the output
    # of the GRU

    output = self.embedding(input_char).view(1, 1, -1)
    output, hidden = self.gru(output, hidden)
    output = self.out(output[0])
    return output, hidden

  def init_hidden(self):
    return torch.zeros(self.n_layers, 1, self.hidden_size)
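
As a quick shape check for the forward pass, the sketch below walks one character through the same embedding, GRU, and linear pipeline, using PyTorch's built-in `nn.GRU` in place of the custom class. The sizes are illustrative assumptions, not the values used for training.

```python
import torch
import torch.nn as nn

n_characters, hidden_size, n_layers = 100, 16, 2  # illustrative sizes

embedding = nn.Embedding(n_characters, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, n_layers)
out = nn.Linear(hidden_size, n_characters)

input_char = torch.tensor(10)                    # one character index, dtype long
hidden = torch.zeros(n_layers, 1, hidden_size)   # (num_layers, batch, hidden_size)

x = embedding(input_char).view(1, 1, -1)         # (seq_len=1, batch=1, hidden_size)
output, hidden = gru(x, hidden)                  # output: (1, 1, hidden_size)
logits = out(output[0])                          # (batch=1, n_characters)

print(x.shape, output.shape, hidden.shape, logits.shape)
```

The final `(1, n_characters)` tensor holds unnormalized scores, which is the shape `nn.CrossEntropyLoss` expects when paired with a target of shape `(1,)`.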

In [125]:
def random_training_set():
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
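
The input and target are the same chunk offset by one character, so at every position the network is asked to predict the next character. A tiny illustration with a made-up string (not taken from the dataset):

```python
text = "hello world"  # stand-in for a chunk of chunk_len + 1 characters

inp, target = text[:-1], text[1:]

# Each input character is paired with the character that follows it.
for a, b in zip(inp[:4], target[:4]):
    print(repr(a), '->', repr(b))
```

This is also why `random_chunk` slices `chunk_len + 1` characters: dropping one character from each end leaves `chunk_len` input/target pairs.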

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes.

**TODO:**
* Fill in the pieces.

**DONE:**




In [126]:
# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables
def train(inp, target):
  ## initialize hidden layers, set up gradient and loss
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0

  # feed the chunk one character at a time, accumulating the loss over the whole chunk
  for inp_char, target_char in zip(inp, target):
    output, hidden = decoder(inp_char, hidden)
    loss += criterion(output, target_char.unsqueeze(0))

  loss.backward()
  decoder_optimizer.step()

  # this is the cross-entropy summed (not averaged) over the chunk
  return loss.item()



---

## Part 3: Sample text and Training information

---

If you choose, you can also write your training-loop boilerplate at this point: it samples random sequences and trains your RNN. It is helpful to have this working before writing your own GRU class.

To sample from the network during or after training, consider using the following function. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.

**TODO:**
* Fill out the evaluate function to generate text from a primed string

**DONE:**



In [127]:
def sample_outputs(output, temperature):
    """Takes in a vector of unnormalized log-probabilities (logits) and samples a character index from the distribution"""
    return torch.multinomial(torch.exp(output / temperature), 1)

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  ## initialize hidden state, initialize other useful variables
  hidden = decoder.init_hidden()
  prime_input = char_tensor(prime_str)
  predicted = prime_str

  # no gradients are needed while sampling
  with torch.no_grad():
    # build up the hidden state from all but the last priming character
    for i in range(len(prime_str) - 1):
      _, hidden = decoder(prime_input[i], hidden)

    # last character as a start for generating
    inp = prime_input[-1]

    for p in range(predict_len):
      output, hidden = decoder(inp, hidden)
      out = sample_outputs(output.view(-1), temperature)[0]
      predicted_char = all_characters[out]
      predicted += predicted_char
      inp = char_tensor(predicted_char)

  return predicted
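
To see what the `temperature` argument does, the sketch below applies the same `exp(output / temperature)` weighting to a made-up score vector and normalizes it. The scores and the helper name `temperature_probs` are illustrative assumptions, not part of the scaffold.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.0])  # hypothetical scores for three characters

def temperature_probs(logits, temperature):
    # Same weighting as in evaluate(), normalized so the weights sum to one.
    weights = torch.exp(logits / temperature)
    return weights / weights.sum()

for t in (0.5, 1.0, 2.0):
    print(t, temperature_probs(logits, t))
```

Lower temperatures concentrate probability on the highest-scoring character (more conservative text), while higher temperatures flatten the distribution (more varied, more error-prone text). `torch.multinomial` does not require its weights to be normalized, which is why `evaluate` can pass the exponentiated scores directly.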

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---


**TODO:**
* Create some cool output

**DONE:**




Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like this. I trained on the "lotr.txt" dataset, using chunk_length=200 and hidden_size=100, for 2000 epochs. These are the results, along with the prime string:

---

 G:

 Gandalf was decrond.
'All have lord you. Forward the road at least walk this is stuff, and
went to the long grey housel-winding and kindled side was a sleep pleasuring, I do long
row hrough. In  

 lo:

 lost death it.
'The last of the gatherings and take you,' said Aragorn, shining out of the Gate.
'Yes, as you there were remembaused to seen their pass, when? What
said here, such seven an the sear

 lo:

 low, and frod to keepn
Came of their most. But here priced doubtless to an Sam up is
masters; he left hor as they are looked. And he could now the long to stout in the right fro horseless of
the like

 I:

 I had been the
in his eyes with the perushed to lest, if then only the ring and the legended
of the less of the long they which as the
enders of Orcovered and smood, and the p

 I:

 I they were not the lord of the hoomes.
Home already well from the Elves. And he sat strength, and we
housed out of the good of the days to the mountains from his perith.

'Yess! Where though as if  

 Th:

 There yarden
you would guard the hoor might. Far and then may was
croties, too began to see the drumbred many line
and was then hoard walk and they heart, and the chair of the
Ents of way, might was

 G:

 Gandalf
been lat of less the round of the stump; both and seemed to the trees and perished they
lay are speered the less; and the wind the steep and have to she
precious. There was in the oonly went

 wh:

 which went out of the door.
Hull the King and of the The days of his brodo
stumbler of the windard was a thing there, then it been shining langing
to him poor land. They hands; though they seemed ou

 ra:

 rather,' have all the least deather
down of the truven beginning to the house of sunk.
'Nark shorts of the Eyes of the Gate your great nothing as Eret.
'I wander trust horn, and there were not, it  

 I:

 I can have no mind
together! Where don't may had one may little blung
terrible to tales. And turn and Gandalf shall be not to as only the Cattring
not stopped great the out them forms. On they she lo

---


In [128]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001

decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

start = time.time()
all_losses = []
loss_avg = 0

In [131]:
n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[79.76727533340454 (200 10%) 406.3292]
Whow hamerneing of wis 
in ningher I nil to hat dandarneling 

where if to nor now horster las of e, h 

[137.54473686218262 (400 20%) 353.4745]
Wher a the foringed to 
sile to that he baming stawaser they now the a the shad at lear a knowh with t 

[194.69581770896912 (600 30%) 413.1023]
Whing and 
hake 
agatrank in past be as 
lose 
sten brreaked we deador, 
this 
hads while the busted a 

[251.9497730731964 (800 40%) 322.5259]
Whery 
now fame to come this look the wandor and said Githand. And mise at that he saipince the Mord a 

[308.868221282959 (1000 50%) 393.8513]
Whe said 



The winge of the shade you sas fill of 
the ker sbed not stay 


hobbitting the ptaldenth 

[365.97027015686035 (1200 60%) 305.5787]
Where we loffin. Kight If for the lave 
the Frodo was everes cawe, 
sten his he to sand in bence the 
 

[422.70978903770447 (1400 70%) 344.6206]
Whor bring 
Colver pearing couthald and coulder been fell shadren that he are the cadow? Th
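
The losses printed above are summed over all characters in a chunk, so they are easier to interpret after dividing by `chunk_len`. The following is a small worked calculation using the first printed value; the comparison against a uniform guess over the vocabulary is added here only as a reference point.

```python
import math
import string

chunk_len = 200                        # chunk length used in this notebook
n_characters = len(string.printable)   # vocabulary size used in this notebook

summed_loss = 406.3292                 # loss printed at step 200 of the lotr.txt run
per_char = summed_loss / chunk_len     # average cross-entropy per character, in nats
perplexity = math.exp(per_char)        # effective number of equally likely next characters

# Reference point: a model that guesses uniformly over the whole vocabulary.
uniform_per_char = math.log(n_characters)

print('per-character loss: %.4f' % per_char)
print('perplexity: %.2f' % perplexity)
print('uniform-guess per-character loss: %.4f' % uniform_per_char)
```

A per-character loss well below the uniform-guess value indicates that the model has learned something about character statistics, even when the samples are not yet coherent.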

In [132]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # pick a random priming string (named `prime` so it does not overwrite the `start` timestamp)
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 ca
 came 
or among and the Sam stepping. It there shragg a pale's in had shouenthing 
meabor, said Frodo were streapled to while to 
alls for may sadd, enemon of the lated and were and the Wasters in the 
p 

 wh
 when somen in stone somethers to fit of the gnow was was not very and our fway, and the donger for the Barussed the blage with to peating and 
it sunther 
were while on at the swetten in the king. His c 

 ra
 rast 
looks me with was foon, and goter back to notter was and 
enowers fally gone. 

'Yearning and the sew countion should go on 
no spents trouned of were Norning 
and by of the 
ever on the treet, an 

 ca
 cantoo the back and anwithour musted the. But was the day at he 
were should. 

'It someting and leather. We said Aragorn and greaddy but 
closed for 
to 
heard little shamber the gowly here, and night  

 lo
 loses from I last, men the chilling. 

Amout the ene 
stact is anter it swen part what far as was they cobeather. 

Streath.' 

'Yes, dows them. Will over 

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**



In [133]:
!pip install jovian opendatasets --upgrade --quiet
!pip install -q kaggle
import urllib.request
import opendatasets as od
import pandas

od.download(
    "https://www.kaggle.com/datasets/ishikajohari/taylor-swift-all-lyrics-30-albums/data")


Skipping, found downloaded files in "./taylor-swift-all-lyrics-30-albums" (use force=True to force download)


In [134]:
import unidecode
import string
import random
import re

import pdb

all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('/content/taylor-swift-all-lyrics-30-albums/data/Albums/SpeakNow_TaylorsVersion_/NeverGrowUp_TaylorsVersion_.txt').read())
file_len = len(file)
print('file_len =', file_len)

file_len = 2635


In [135]:
chunk_len = 200

def random_chunk():
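  # randint is inclusive on both ends; stop one short so every chunk has chunk_len + 1 characters
  start_index = random.randint(0, file_len - chunk_len - 1)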
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]

print(random_chunk())

me off
It's so much colder than I thought it would be
So I tuck myself in and turn my nightlight on
[Chorus]
Wish I'd never grown up
Wish I'd never grown up
Oh, I don't wanna grow up
Wish I'd never gro


In [136]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001

decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

start = time.time()
all_losses = []
loss_avg = 0

In [138]:
n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[75.14890360832214 (200 10%) 253.6933]
Whorlithtrit cout th nover thing wnn your I thild retled you moned likw thed day this hing'r ting you  

[128.45042324066162 (400 20%) 93.6432]
Whand I tuck moshitle Nevery honctures in your samy finger
And ever than ve burteen, thoo
And never gr 

[182.25535702705383 (600 30%) 49.3231]
Whad gets hand's wrapped around don't lose the way to the movies
And you're ever the burned you
Won't  

[235.3605306148529 (800 40%) 19.3677]
Whald gets homeday and call your own shots
You might also like[Pre-Chorus]
But don't make her drop you 

[288.7493121623993 (1000 50%) 17.5575]
Wh lotting ready for se Rcat in the world tonight
Your little eyelidg brite songe
So I tuck you in, tu 

[341.9437322616577 (1200 60%) 13.2269]
Whats
In a grow up

[Verse 2]
You're in the car, on the way to the mom's dropped me off
It's so much c 

[395.72551441192627 (1400 70%) 14.3039]
What

[Chorus]
Oh, darling, don't you ever grow up
Don't you ever grow up, it could stay this sim

In [139]:
start_strings = [" to", " da", " ne", "I ", " do", " u", " ju", " mo"]
for i in range(10):
  # pick a random priming string (named `prime` so it does not overwrite the `start` timestamp)
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')



 to
 to never grow up

[Post-Chorus]
And never grow up

[Verse 2]
You're in the car, on your favourite nightlight

[Pre-Chorus]
To you, everything's funny
You got nothing to regret
I'd give all I have, honey 

I 
I don't lose the way that you dance around
In your PJs getting ready for school

[Chorus]
Oh, darling, don't you ever grow up
Don't you ever grow up, it could stay this simple
I won't let nobody hurt yo 

 ju
 just realized everything I have is, someday, gonna grow up
Wish I'd never grow up)
Just never grow up
Oh (Never grow up)
Just never grow up
Oh (Never grow up)
Just never grow up
Oh (Never grow up)
Just  

 do
 don't lose the way that you dance around
In your PJs getting ready for school

[Chorus]
Oh, darling, don't you ever grow up
Don't you ever grow up, it could stay this simple
I won't let nobody hurt you
 

 ne
 never grow up

[Outro]
Oh, oh (Never grow up)
Just never grow up, it could stay this simple
I won't let nobody hurt you
Won't let no one break your heart
And

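On lotr.txt, the model produced plausible-looking words but not coherent sentences. The samples from the Taylor Swift song are more coherent, probably because the lyrics contain more repeated phrases and simpler words. The training file is also only 2,635 characters long, and several samples reproduce lines of the song almost verbatim, so the model has likely memorized much of the text rather than learned to generalize.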