# Comp 599 - Assignment 3 report

This notebook completes the report section of Comp 599 Natural Language Understanding - Assignment 3.

In [1]:
from code import *

progress = {}

## Problems

### 1.3 Training (not autograded, needed for REPORT) (10 pts)
Some of the training code is provided. Fill in the blanks, specifically converting the output of our `sample_sequence` method back to natural language and printing it.
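As a minimal sketch of the decoding step, assume `sample_sequence` returns a sequence of integer character indices and that the dataloader keeps an index-to-character mapping. Both are assumptions about the assignment code, and `build_vocab` and `decode` are hypothetical helper names used only for illustration.

```python
def build_vocab(text):
    """Map each unique character to an index and back (hypothetical helper)."""
    unique_chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(unique_chars)}
    idx_to_char = {i: c for c, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def decode(indices, idx_to_char):
    """Turn a sequence of character indices back into a string.

    int() lets this accept plain ints as well as 0-dim tensors.
    """
    return "".join(idx_to_char[int(i)] for i in indices)


char_to_idx, idx_to_char = build_vocab("hello world")
# Stand-in for the output of sample_sequence (assumed to be a list of indices).
sampled = [char_to_idx[c] for c in "hold"]
print(decode(sampled, idx_to_char))
```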

**Notes:**
- Make sure any tensors you create are moved to the GPU with `.to(device)`, where `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`.
- Make sure you pass a new hidden state at each training iteration. Passing the same one will cause computation graph errors, because PyTorch tries to accumulate the losses from all previous batches.

The main loop of your training code should do the following, in this sequence:
1. Apply the RNN to the incoming sequence
2. Use the loss function to calculate the loss on the model’s output
3. Zero the gradients of the optimizer
4. Perform a backward pass (calling `.backward()`)
5. Step the weights of the model via the optimizer (`.step()`)
6. Add the current loss to the running loss
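The six steps above can be sketched as follows. This is an illustration under stated assumptions, not the assignment's `train` function: `TinyCharRNN` is a hypothetical stand-in model, the batches are built from a toy string, and the hidden state shape assumes a single-layer `nn.RNN` with batch size 1.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyCharRNN(nn.Module):
    """Hypothetical stand-in model; the assignment's CharRNN may differ."""

    def __init__(self, n_chars, embedding_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embed = nn.Embedding(n_chars, embedding_size)
        self.rnn = nn.RNN(embedding_size, hidden_size)
        self.out = nn.Linear(hidden_size, n_chars)

    def forward(self, seq, hidden):
        emb = self.embed(seq).unsqueeze(1)      # (seq_len, 1, embedding_size)
        output, hidden = self.rnn(emb, hidden)  # (seq_len, 1, hidden_size)
        return self.out(output.squeeze(1)), hidden  # logits: (seq_len, n_chars)


def train_one_epoch(model, batches, optimizer, loss_fn):
    running_loss = 0.0
    for in_seq, target_seq in batches:
        in_seq, target_seq = in_seq.to(device), target_seq.to(device)
        # Fresh hidden state every iteration, so no graph is carried over.
        hidden = torch.zeros(1, 1, model.hidden_size, device=device)
        logits, _ = model(in_seq, hidden)    # 1. apply the RNN
        loss = loss_fn(logits, target_seq)   # 2. compute the loss
        optimizer.zero_grad()                # 3. zero the gradients
        loss.backward()                      # 4. backward pass
        optimizer.step()                     # 5. update the weights
        running_loss += loss.item()          # 6. accumulate the running loss
    return running_loss / len(batches)


torch.manual_seed(0)
text = "hello world " * 20
chars = sorted(set(text))
idx = torch.tensor([chars.index(c) for c in text])
# Each target sequence is the input shifted one character to the right.
batches = [(idx[i:i + 20], idx[i + 1:i + 21]) for i in range(0, 200, 20)]

model = TinyCharRNN(len(chars), embedding_size=16, hidden_size=32).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
losses = [train_one_epoch(model, batches, optimizer, loss_fn) for _ in range(5)]
print(losses)
```

Using `loss.item()` in step 6 stores a plain float, so the running loss does not keep the computation graph alive.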

**After every epoch, you should:**
1. Sample a character with some randomness. How you do this is up to you: you could sample from the dataset, sample uniformly from the unique characters, sample only capital letters, or use any other variation.
2. Print the result of the sampling to the output so you can monitor training progress. Over time your generated text should become more and more “believable”. If not, something may be wrong with your implementation.
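The character-sampling options mentioned above could look roughly like this. `pick_start_char` and its `strategy` names are hypothetical, and the sketch works on a plain string rather than the assignment's dataloader.

```python
import random
import string


def pick_start_char(text, strategy="dataset", rng=random):
    """Pick a character to seed generation (hypothetical helper)."""
    unique_chars = sorted(set(text))
    if strategy == "dataset":
        # Sampling a position in the text weights characters by frequency.
        return rng.choice(text)
    if strategy == "uniform":
        # Every unique character is equally likely.
        return rng.choice(unique_chars)
    if strategy == "capital":
        capitals = [c for c in unique_chars if c in string.ascii_uppercase]
        # Fall back to all characters if the text has no capital letters.
        return rng.choice(capitals or unique_chars)
    raise ValueError(f"unknown strategy: {strategy}")


sample_text = "To Sherlock Holmes she is always the woman."
for strategy in ("dataset", "uniform", "capital"):
    print(strategy, repr(pick_start_char(sample_text, strategy)))
```

Seeding with a capital letter tends to start the sample at something resembling a sentence boundary, which can make early-epoch output easier to judge.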

**For the REPORT, please do the following:**
1. Train your CharRNN on the Sherlock Holmes dataset provided. Include 3-5 samples generated from the model once you are reasonably confident in its abilities. Show how the temperature parameter affects the output by giving samples at low, medium, and high temperatures. Try to get the best results possible from the training.
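To make the effect of temperature concrete, here is a small sketch of the common formulation, in which the logits are divided by the temperature before the softmax. Whether the assignment's `sample_sequence` applies temperature in exactly this way is an assumption; the function names below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Low temperatures concentrate probability on the most likely character, which gives repetitive but safe text. High temperatures flatten the distribution, which gives more varied text with more spelling errors.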

In [2]:
hidden_size = 512
embedding_size = 300
seq_len = 100
lr = 0.002
num_epochs = 100
epoch_size = 10  # one epoch is this # of examples
out_seq_len = 200
data_path = "./data/sherlock.txt"

# code to initialize dataloader, model
dataset = CharSeqDataloader(
    filepath=data_path, seq_len=seq_len, examples_per_epoch=epoch_size
)
model = CharRNN(
    n_chars=len(dataset.unique_chars),
    embedding_size=embedding_size,
    hidden_size=hidden_size,
)
model.to(device)

# Train the model
progress['Sherlock_CharRNN'] = train(
    model, dataset, lr=lr, out_seq_len=out_seq_len, num_epochs=num_epochs,
    sample_file='data/Sherlock_CharRNN.txt',
)

Starting model train..


100%|█████████████████████████████████████████| 100/100 [00:21<00:00,  4.55it/s]


2. Train your CharRNN on the Shakespeare dataset provided. Do the same as above.

In [3]:
hidden_size = 512
embedding_size = 300
seq_len = 100
lr = 0.002
num_epochs = 100
epoch_size = 10  # one epoch is this # of examples
out_seq_len = 200
data_path = "./data/shakespeare.txt"

# code to initialize dataloader, model
dataset = CharSeqDataloader(
    filepath=data_path, seq_len=seq_len, examples_per_epoch=epoch_size
)
model = CharRNN(
    n_chars=len(dataset.unique_chars),
    embedding_size=embedding_size,
    hidden_size=hidden_size,
)
model.to(device)

# Train the model
progress['Shakespeare_CharRNN'] = train(
    model, dataset, lr=lr, out_seq_len=out_seq_len, num_epochs=num_epochs,
    sample_file='data/Shakespeare_CharRNN.txt',
)

Starting model train..


100%|█████████████████████████████████████████| 100/100 [00:21<00:00,  4.61it/s]


**For the REPORT, please do the following:**
1. Train your CharLSTM on the Sherlock Holmes dataset provided. Include 3-5 samples generated from the model once you are reasonably confident in its abilities.

In [None]:
hidden_size = 512
embedding_size = 300
seq_len = 100
lr = 0.002
num_epochs = 100
epoch_size = 10  # one epoch is this # of examples
out_seq_len = 200
data_path = "./data/sherlock.txt"

# code to initialize dataloader, model
dataset = CharSeqDataloader(
    filepath=data_path, seq_len=seq_len, examples_per_epoch=epoch_size
)
model = CharLSTM(
    n_chars=len(dataset.unique_chars),
    embedding_size=embedding_size,
    hidden_size=hidden_size,
)
model.to(device)

# Train the model
progress['Sherlock_CharLSTM'] = train(
    model, dataset, lr=lr, out_seq_len=out_seq_len, num_epochs=num_epochs,
    sample_file='data/Sherlock_CharLSTM.txt',
)

Starting model train..


 61%|█████████████████████████▌                | 61/100 [08:13<04:42,  7.25s/it]

2. Train your CharLSTM on the Shakespeare dataset provided. Do the same as above.

In [None]:
hidden_size = 512
embedding_size = 300
seq_len = 100
lr = 0.002
num_epochs = 100
epoch_size = 10  # one epoch is this # of examples
out_seq_len = 200
data_path = "./data/shakespeare.txt"

# code to initialize dataloader, model
dataset = CharSeqDataloader(
    filepath=data_path, seq_len=seq_len, examples_per_epoch=epoch_size
)
model = CharLSTM(
    n_chars=len(dataset.unique_chars),
    embedding_size=embedding_size,
    hidden_size=hidden_size,
)
model.to(device)

# Train the model
progress['Shakespeare_CharLSTM'] = train(
    model, dataset, lr=lr, out_seq_len=out_seq_len, num_epochs=num_epochs,
    sample_file='data/Shakespeare_CharLSTM.txt',
)

In [None]:
import json
with open('data/progress.json', 'w') as f:
    json.dump(progress, f)

3. Note some observations regarding training your CharRNN vs. CharLSTM. Is training faster or slower? How does the training loss compare? Graph the loss. Is the final model better or worse at language modeling, and in what way? Any specific strengths or weaknesses you can observe for each model?
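One possible way to graph the loss is sketched below. It assumes each entry of `progress` is a list of per-epoch training losses, which depends on what `train` actually returns. `moving_average`, `plot_losses`, and the output path `data/loss_curves.png` are hypothetical names chosen for illustration.

```python
def moving_average(values, window=5):
    """Smooth a loss curve with a trailing window (shorter at the start)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def plot_losses(progress, out_path="data/loss_curves.png", window=5):
    """Plot one smoothed loss curve per model on shared axes.

    Assumes progress maps a run name to a list of per-epoch losses.
    matplotlib is imported here so moving_average works without it.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, losses in progress.items():
        ax.plot(moving_average(losses, window), label=name)
    ax.set_xlabel("epoch")
    ax.set_ylabel("training loss")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)


print(moving_average([1, 2, 3, 4], window=2))
```

With the dictionary saved earlier, this could be called as `plot_losses(json.load(open('data/progress.json')))`. Plotting the RNN and LSTM runs on the same axes makes the comparison of training loss direct.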