<a href="https://colab.research.google.com/github/mhask94/cs474_labs_f2019/blob/master/DL_Lab6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6: Sequence-to-sequence models

## Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

## There are two parts of this lab:
###  1.   Wiring up a basic sequence-to-sequence computation graph
###  2.   Implementing your own GRU cell.


An example of my final samples is shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The tutorial below will help you build out the scaffolding code and understand how to work with sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [0]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
from IPython.core.debugger import set_trace
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

In [10]:
chunk_len = 200
 
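def random_chunk():
  # randint is inclusive at both ends, so the upper bound must leave room
  # for the chunk_len + 1 characters sliced below
  start_index = random.randint(0, file_len - chunk_len - 1)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]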
  
print(random_chunk())

istener; for 
Frodo showed no sign of weariness and made no attempt to change the subject, 
though actually he soon got rather lost among the strange names of people 
and places that he had never heard


In [11]:
import torch
from torch.autograd import Variable

# class TextDataset(Dataset):
#   def __init__(self, chunk_len=200, filename='data.txt'):
#     root = 'data/'
    
#     text_files = os.listdir(root)
#     self.training_file = text_files[text_files.index(filename)]
#     with open(os.path.join(root, self.training_file), encoding='utf-8') as file:
#       self.training_file = file.read()
#     self.segment_extractor = FileSegmentExtractor(self.training_file, chunk_len)
    
#   def extract_zip(self, zip_path):
#     print('Unzipping {}'.format(zip_path))
#     with zipfile.ZipFile(zip_path, "r") as zip_ref:
#       zip_ref.extractall(os.path.dirname(self.root))
  
#   def __len__(self):
#     return self.len

# Turn a string into a LongTensor of indices into all_characters
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])


---

## Part 4: Creating your own GRU cell 

**(Come back to this later. It is defined here so that the GRU class exists before it is used.)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class that takes the same parameters as the built-in PyTorch class.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!
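To make "the same parameters as the built-in class" concrete, here is a small sketch of the interface that `nn.GRU` exposes: the constructor arguments and the tensor shapes going in and coming out. The sizes are illustrative assumptions, not the values used later in this lab, and the sketch shows only the interface, not how the cell works inside.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the lab's own sizes are set elsewhere in the notebook.
input_size, hidden_size, num_layers = 8, 16, 2
reference = nn.GRU(input_size, hidden_size, num_layers)

x = torch.randn(1, 1, input_size)             # (seq_len, batch, input_size)
h0 = torch.zeros(num_layers, 1, hidden_size)  # (num_layers, batch, hidden_size)
output, h_n = reference(x, h0)

print(output.shape)  # (seq_len, batch, hidden_size): top layer at every time step
print(h_n.shape)     # (num_layers, batch, hidden_size): last step of every layer
```

A custom class that accepts the same arguments and returns tensors of the same shapes can be swapped in for the built-in layer without touching the rest of the model. With a single time step, `output[0]` is the same tensor of values as `h_n[-1]`.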

**TODO:**
* Create a custom GRU cell

**DONE:**



In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


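class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.n_layers = num_layers
    
    # nn.ModuleList registers each layer with the module, so its weights show
    # up in .parameters() and get trained. A plain Python list would hide them
    # from the optimizer.
    self.w_ir = nn.ModuleList()
    self.w_hr = nn.ModuleList()
    self.w_iz = nn.ModuleList()
    self.w_hz = nn.ModuleList()
    self.w_in = nn.ModuleList()
    self.w_hn = nn.ModuleList()
    
    for l in range(self.n_layers):
      # Layer 0 reads the input; every deeper layer reads the layer below it
      layer_input_size = input_size if l == 0 else hidden_size
      self.w_ir.append(nn.Linear(layer_input_size, hidden_size))
      self.w_hr.append(nn.Linear(hidden_size, hidden_size))
      self.w_iz.append(nn.Linear(layer_input_size, hidden_size))
      self.w_hz.append(nn.Linear(hidden_size, hidden_size))
      self.w_in.append(nn.Linear(layer_input_size, hidden_size))
      self.w_hn.append(nn.Linear(hidden_size, hidden_size))
    
    self.sig = nn.Sigmoid()
    self.tan = nn.Tanh()
     
  def forward(self, inputs, prev_hidden):
    # inputs: (1, batch, input_size), a single time step
    # prev_hidden: (num_layers, batch, hidden_size)
    # Each layer does the following:
    # r_t = sigmoid(W_ir*x_t + b_ir + W_hr*h_(t-1) + b_hr)
    # z_t = sigmoid(W_iz*x_t + b_iz + W_hz*h_(t-1) + b_hz)
    # n_t = tanh(W_in*x_t + b_in + r_t (.) (W_hn*h_(t-1) + b_hn))
    # h_t = (1 - z_t) (.) n_t + z_t (.) h_(t-1)
    # Where (.) is the Hadamard product (elementwise, not matrix, multiplication)
    
    layer_input = inputs[0]  # (batch, input_size)
    new_hidden = []
    for l in range(self.n_layers):
      h_prev = prev_hidden[l]
      r_t = self.sig(self.w_ir[l](layer_input) + self.w_hr[l](h_prev))
      z_t = self.sig(self.w_iz[l](layer_input) + self.w_hz[l](h_prev))
      # the reset gate scales the hidden-state contribution to the candidate
      n_t = self.tan(self.w_in[l](layer_input) + r_t * self.w_hn[l](h_prev))
      h_t = (1 - z_t) * n_t + z_t * h_prev
      new_hidden.append(h_t)
      layer_input = h_t  # feed this layer's state to the next layer
    hidden = torch.stack(new_hidden)  # (num_layers, batch, hidden_size)
    output = hidden[-1:]              # top layer's state, (1, batch, hidden_size)
    
    return output, hidden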
  

---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form, and we can switch which text file we read from and try to imitate.

We now want to build out an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.
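As a rough guide to how the built-in pieces fit together, the sketch below pushes one character index through an embedding, a GRU, and a linear decoder, and prints the shape at each stage. The sizes and variable names here are illustrative assumptions rather than the lab's required values.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, n_layers = 100, 16, 1    # illustrative sizes
embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, n_layers)
decode = nn.Linear(hidden_size, vocab_size)

char_index = torch.tensor(10)                     # a single character index
hidden = torch.zeros(n_layers, 1, hidden_size)

embedded = embedding(char_index).view(1, 1, -1)   # (seq_len=1, batch=1, hidden_size)
output, hidden = gru(embedded, hidden)            # output: (1, 1, hidden_size)
logits = decode(output)                           # (1, 1, vocab_size)
print(embedded.shape, output.shape, logits.shape)
```

The final tensor holds one unnormalized score per character in the vocabulary, which is the form `nn.CrossEntropyLoss` expects once the leading dimension is squeezed away.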


**TODO:**
* Create an RNN class that extends from nn.Module.

**DONE:**



In [0]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    
    self.embedding = nn.Embedding(self.input_size, self.hidden_size)
    self.relu = nn.ReLU()
    self.gru = GRU(input_size=hidden_size, hidden_size=hidden_size, 
                      num_layers=n_layers)
    self.decode = nn.Linear(self.hidden_size, self.output_size)
#     self.softmax = nn.LogSoftmax(dim=1)

  def forward(self, input_char, hidden):
    # by reviewing the documentation, construct a forward function that properly uses the output
    # of the GRU
    embed = self.embedding(input_char).view(1,1,-1)  # (seq_len=1, batch=1, hidden_size)
    output, hidden = self.gru(embed, hidden)
    # return raw logits: nn.CrossEntropyLoss applies log-softmax itself, and a
    # ReLU here would clamp every negative score to zero
    out_decoded = self.decode(output)
    
    return out_decoded, hidden

  def init_hidden(self):
#     return Variable(torch.zeros(self.n_layers, 1, self.hidden_size))
    return torch.randn(self.n_layers, 1, self.hidden_size) #changed from zeros

In [0]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
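
The input and target are the same chunk offset by one character, so at every position the network is asked to predict the character that comes next. A minimal illustration in plain Python, using a hypothetical five-character chunk instead of the 201-character chunks drawn above:

```python
chunk = "hello"            # hypothetical stand-in for random_chunk()
inp = chunk[:-1]           # every character except the last
target = chunk[1:]         # every character except the first

# Pair each input character with the character the model should predict
pairs = list(zip(inp, target))
for current_char, next_char in pairs:
    print(repr(current_char), "->", repr(next_char))
```

This is why `random_chunk` returns `chunk_len + 1` characters: dropping one character from each end leaves an input and a target of length `chunk_len`.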

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 
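
A rough yardstick helps when reading the loss values printed during training. The sketch below assumes the loss is the cross-entropy summed over all 200 characters of a chunk (which is what the `train` cell in this notebook returns) and asks what a model that guesses uniformly over `string.printable` would score.

```python
import math
import string

n_characters = len(string.printable)   # size of the character vocabulary
chunk_len = 200                        # characters per training chunk

# Cross-entropy of a uniform guess is ln(vocabulary size) per character
per_char_loss = math.log(n_characters)
summed_loss = chunk_len * per_char_loss

print("per-character loss: %.3f" % per_char_loss)
print("summed over a chunk: %.1f" % summed_loss)
```

A summed loss well below this baseline means the network has learned something about character statistics; dividing the reported loss by `chunk_len` gives a per-character figure that is easier to compare across chunk lengths.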

**TODO:**
* Fill in the pieces.

**DONE:**




In [0]:
def train(input_str, target_str):
  ## initialize hidden layers, set up gradient and loss 
    # your code here
  ## /
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0
  
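  for in_char, target_char in zip(input_str, target_str):
    # carry the hidden state forward so each prediction sees the preceding characters
    char_hat, hidden = decoder(in_char, hidden)
    target_char = target_char.unsqueeze(0)
    # char_hat is (1, 1, n_characters); squeeze to (1, n_characters) for the loss
    loss += criterion(char_hat.squeeze(0), target_char)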
    
  loss.backward()
  decoder_optimizer.step()
  
  return loss.item() #, len(input_str)

---

## Part 3: Sample text and Training information

---

If you choose, you can also write your training-loop boilerplate now: sample random sequences and train your RNN on them. It will be helpful to have this working before you write your own GRU class.

During or after training, you can sample from the network with the following function. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.
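
The role of a sampling temperature can be sketched in isolation. The example below assumes the network's scores are divided by the temperature and exponentiated before sampling, as in the `evaluate` cell of this notebook; the three scores are made up for illustration, and `temperature_probs` is a hypothetical helper, not part of the lab scaffold.

```python
import torch

def temperature_probs(logits, temperature):
    # exp(logit / T), normalized here only so the values read as probabilities
    weights = logits.div(temperature).exp()
    return weights / weights.sum()

logits = torch.tensor([2.0, 1.0, 0.0])     # hypothetical scores for three characters
cold = temperature_probs(logits, 0.5)      # low temperature: sharper distribution
warm = temperature_probs(logits, 2.0)      # high temperature: flatter distribution
print(cold, warm)

# torch.multinomial accepts unnormalized non-negative weights
torch.manual_seed(0)
sample = torch.multinomial(logits.div(0.8).exp(), 1)
print(sample.item())
```

Lower temperatures concentrate probability on the highest-scoring character, which gives more conservative text; higher temperatures spread it out, which gives more varied and more error-prone text.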

**TODO:**
* Fill out the evaluate function to generate text from a primed string

**DONE:**



In [0]:
def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  ## initialize hidden variable, initialize other useful variables 
    # your code here
  ## /
  hidden = decoder.init_hidden()
  prediction = prime_str + '' # copies prime_str values, not a ptr
  primer_input = char_tensor(prime_str)
  all_chars = string.printable 

  for char in primer_input[:-1]:
    _, hidden = decoder(char, hidden)
  
  in_char = primer_input[-1]
  
  for p in range(predict_len):
    out_char, hidden = decoder(in_char, hidden)
    out_dist = out_char.data.view(-1).div(temperature).exp()
    top_i = torch.multinomial(out_dist, 1)[0]
    
    char_decoded = all_chars[top_i]
    in_char = char_tensor(char_decoded)
    prediction += char_decoded
    
  return prediction

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---

Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like this. I trained on the “lotr.txt” dataset with chunk_length=200 and hidden_size=100 for 2000 epochs.

**TODO:** 
* Create some cool output

**DONE:**



In [0]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [71]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[129.53036069869995 (200 4%) 833.7415]
WhdlcdOz7C9av{&,v *f+Gb0-9e5sE.lURv`;ingdZ!U3NkLIJnf*J!b,A8U7	<"oFiP"9of-'/QmmZp hZ([d:eIsZN,ptpK2I 

[236.2918839454651 (400 8%) 725.3318]
Wheede the arellll>(ruW 

ng?;d weye wmer ouiar
a id`yM[z _y+neW isthaind the and the frtheirerer het 

[348.01480746269226 (600 12%) 569.2349]
Fqquuyis th t hos the ad  wasast hir timhear oupith to f atwe th et on 

[461.5968496799469 (800 16%) 583.6088]
Wh tahife hes t oator larone t athe t n o nd s theo roond s ang aring thee thehe t angsind ir hee mand 

[572.1966166496277 (1000 20%) 550.7297]
Whee s t hithians 
n ltis arsste halind o t hin uthe taton hathore nat t higr nd o th ator thid e that 

[681.9578876495361 (1200 24%) 579.3874]
Whee ast ad th hit the it as whe avor ithe re nd thaind obere hat ly outh ha atir n an t sthi st the a 

[791.8179275989532 (1400 28%) 532.3194]
Whe t ass coouro theing od the atin, bo f t at f the toor ast ato wha st hats ath ot he athe the ts th 

[901.4364812374115 (

In [72]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # pick a random primer (named so it does not overwrite the timer variable `start`)
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 ca
 caldo aty air a was han ong he thaind  the ande s was hornrs ouse rit. ' thes the oo the atheot the ork at hat haim tamy ag ou t hase ast hein bisas ofant he ar and tha whe ain ind ande 

as found olo c 

 ca
 casad oor o she d hens theag tamint ournd le ther s ald.  the ameny them utin y s thount the sth ond e than we ing arhee st of he the s thine ow she irt the to t heang dan ton hte wowan than loe fo and  

 ca
 cato sanothe thee y lloulke d any an  hinghitr co ange ronoucke ally od in and o thino athhears to ornd o Burr re  ald s thethe g ats ot us he wa mas ise athas t arle sthang as wthen calolt wos ss co of 

 lo
 lo than wang k lan itto wathear by atl helsst tha ngas t and theathe as alllored ad e halot s the athor te 
the tlole d ano wid wher as tan. he aind d tthe the d as they th ave re ad sailke athe meeve a 

 Th
 Thit hain anthe cathaind oond o wo thin tors thr thean, was thee ang the of ound the am bpand t ancore coront hit the t heom t ome d an nde ht wind s d s o

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**

