# Homework 6

In this homework you will be training and using a "char-RNN". This is the name given to a character-level recurrent neural network language model by [this famous blog post by Andrej Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Before you start on the rest of the homework, please give the blog post a read; it's quite good!

I don't expect you to implement the char-RNN from scratch. Andrej's original char-rnn is in Torch (the predecessor to PyTorch that is not commonly used anymore). Fortunately, there are many other implementations of this model available; for example, there is one (in both mxnet and pytorch) in chapters 8 and 9 of [the textbook](http://d2l.ai), and another pytorch one [here](https://github.com/spro/char-rnn.pytorch). **Please use one of these example implementations (or another one that you find) when completing this homework**.

For this homework, please complete the following steps:

1. Download and tokenize the [Shakespeare dataset](http://www.gutenberg.org/files/100/100-0.txt) at a character level. I recommend basing your solution on the following code:
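```Python
import re

# `text` is assumed to hold the downloaded file contents as a single string
# Remove non-alphabetical characters, lowercase, and replace whitespace with ' '
# (\s keeps newlines and tabs until split(), so words on adjacent lines are not merged)
raw_dataset = ' '.join(re.sub(r'[^A-Za-z\s]+', '', text).lower().split())
# Maps token index to character (sorted so the mapping is reproducible across runs)
idx_to_char = sorted(set(raw_dataset))
# Maps character to token index
char_to_idx = {char: i for i, char in enumerate(idx_to_char)}
# Tokenize the dataset
corpus_indices = [char_to_idx[char] for char in raw_dataset]
```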
1. Train a "vanilla" RNN (as described in chapter 9 of [the textbook](http://d2l.ai)) on the Shakespeare dataset. Report the training loss and generate some samples from the model at the end of training.
1. Train a GRU RNN (as described in chapter 10 of [the textbook](http://d2l.ai)) on the Shakespeare dataset. Is the final training loss higher or lower than the vanilla RNN? Are the samples from the model more or less realistic?
1. Find a smaller, simpler dataset than the Shakespeare data (you can find some ideas in Andrej's blog post, but feel free to get creative!) and train either the vanilla or GRU RNN on it instead. Is the final training loss higher or lower than it was for the Shakespeare data?

1) The data is tokenized at the character level inside the cells for steps 2-4 (see `read_file` and `char_tensor`).
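
For reference, the character-level tokenization recommended in the assignment can be wrapped in a small function and checked with a round trip. This is only a sketch: the function name `tokenize` and the sample sentence are made up for illustration, the vocabulary is sorted so the mapping is reproducible, and the regex keeps all whitespace (via `\s`) until the final `split()` so that words on adjacent lines are not glued together.

```python
import re

def tokenize(text):
    """Lowercase, keep only letters and single spaces, and map characters to indices."""
    # Drop everything except letters and whitespace, then collapse whitespace runs
    cleaned = " ".join(re.sub(r"[^A-Za-z\s]+", "", text).lower().split())
    # Sorted vocabulary gives the same index for a character on every run
    idx_to_char = sorted(set(cleaned))
    char_to_idx = {ch: i for i, ch in enumerate(idx_to_char)}
    indices = [char_to_idx[ch] for ch in cleaned]
    return cleaned, idx_to_char, char_to_idx, indices

# Illustrative input, not taken from the dataset file
cleaned, idx_to_char, char_to_idx, indices = tokenize(
    "To be, or not to be:\nthat is the question."
)
# Decoding the indices should reproduce the cleaned text
decoded = "".join(idx_to_char[i] for i in indices)
print(cleaned)
print(indices[:10])
```

Note that the cells below take a different route: `read_file` keeps every character in the file (including capitals, punctuation, and newlines), which is why the generated samples contain speaker names and line breaks.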

In [5]:
!pip install Unidecode



imports

In [6]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import unidecode
import string
import random
import time
import math
import os
import argparse
import requests
from tqdm import tqdm

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


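2) Vanilla RNN trained on Shakespeare ([source](https://github.com/vnikme/char-rnn.pytorch/tree/master)). Note that the configuration cell below sets `model = 'lstm'`, so the results in this section come from a 2-layer LSTM rather than a plain (tanh) RNN.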

In [13]:
import torch
import torch.nn as nn
from torch.autograd import Variable

class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, model="gru", n_layers=1):
        super(CharRNN, self).__init__()
        self.model = model.lower()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size)
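        if self.model == "gru":
            self.rnn = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        elif self.model == "lstm":
            self.rnn = nn.LSTM(hidden_size, hidden_size, n_layers, batch_first=True)
        else:
            # Fail early instead of leaving self.rnn undefined
            raise ValueError("Unsupported model type: %s (expected 'gru' or 'lstm')" % model)
        self.decoder = nn.Linear(hidden_size, output_size)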

    def forward(self, input, hidden):
        """
            input: shape=(batch_size, seq_size)
            output: shape=(batch_size, seq_size, output_size)
        """
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded, hidden)
        output = self.decoder(output)
        return output, hidden

    def init_hidden(self, batch_size, cuda):
        cuda_wrapper = lambda x: x.cuda() if cuda else x
        if self.model == "lstm":
            return (cuda_wrapper(Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size))),
                    cuda_wrapper(Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size))))
        return cuda_wrapper(Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size)))
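
The `CharRNN` class above only builds a GRU or an LSTM. A plain "vanilla" RNN could be obtained by swapping in `nn.RNN`, which takes the same size arguments and a single hidden-state tensor like the GRU. The following is a minimal sketch under that assumption; the class name `VanillaCharRNN` and the toy sizes are hypothetical and this variant was not used for any result reported in this notebook.

```python
import torch
import torch.nn as nn

class VanillaCharRNN(nn.Module):
    """Hypothetical variant of CharRNN that uses a plain tanh RNN (nn.RNN)."""

    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.encoder = nn.Embedding(input_size, hidden_size)
        # nn.RNN applies h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        self.rnn = nn.RNN(hidden_size, hidden_size, n_layers,
                          nonlinearity="tanh", batch_first=True)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        # input: (batch_size, seq_size) -> logits: (batch_size, seq_size, output_size)
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded, hidden)
        return self.decoder(output), hidden

    def init_hidden(self, batch_size):
        # A single hidden-state tensor, as for the GRU (no cell state)
        return torch.zeros(self.n_layers, batch_size, self.hidden_size)

# Toy shapes for a quick check: vocabulary of 30 characters, batch of 4, length 10
model = VanillaCharRNN(input_size=30, hidden_size=16, output_size=30, n_layers=2)
tokens = torch.randint(0, 30, (4, 10))
logits, hidden = model(tokens, model.init_hidden(4))
print(tuple(logits.shape), tuple(hidden.shape))
```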

In [15]:
import string
import random
import time
import math

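def read_file(filename):
    # Read the whole file; the context manager closes the handle afterwards
    with open(filename) as f:
        file = f.read()
    # Sorted so that the character-to-index mapping is the same on every run
    all_characters = sorted(set(file))
    return file, len(file), all_characters, len(all_characters)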

# Turning a string into a tensor

def char_tensor(string, all_characters):
    tensor = torch.zeros(len(string)).long()
    for c in range(len(string)):
        try:
            tensor[c] = all_characters.index(string[c])
        except ValueError:
            # Character not in the vocabulary: leave index 0 in place
            continue
    return tensor

# Readable time elapsed

def time_since(since):
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

In [16]:

import os
import argparse

def generate(decoder, all_characters, prime_str='A', predict_len=100, temperature=0.8, cuda=False):
    hidden = decoder.init_hidden(1, cuda)
    prime_input = Variable(char_tensor(prime_str, all_characters).unsqueeze(0))

    if cuda:
        prime_input = prime_input.cuda()
    predicted = prime_str

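    # Use all but the last priming character to "build up" the hidden state;
    # the last one is fed as the first input of the sampling loop below,
    # so it is not processed twice
    if prime_input.size(1) > 1:
        _, hidden = decoder(prime_input[:, :-1], hidden)
    inp = prime_input[0,-1].view(1, -1)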

    for p in range(predict_len):
        output, hidden = decoder(inp, hidden)

        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1)[0]

        # Add predicted character to string and use as next input
        predicted_char = all_characters[top_i]
        predicted += predicted_char
        inp = Variable(char_tensor(predicted_char, all_characters).unsqueeze(0))
        if cuda:
            inp = inp.cuda()

    return predicted
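
The `temperature` argument of `generate` divides the logits before they are exponentiated, which controls how peaked the sampling distribution is. A small pure-Python sketch of that effect follows; the function name `temperature_probs` and the example logits are made up for illustration, and the explicit normalization is added here only to make the probabilities readable (`torch.multinomial` accepts unnormalized weights).

```python
import math

def temperature_probs(logits, temperature):
    """Turn logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    # Subtracting the maximum does not change the result; it only avoids overflow
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative logits for a three-character vocabulary
logits = [2.0, 1.0, 0.0]
cold = temperature_probs(logits, 0.5)     # sharper: favors the top character
default = temperature_probs(logits, 0.8)  # the default used by generate
hot = temperature_probs(logits, 2.0)      # flatter: more varied samples

for name, probs in [("T=0.5", cold), ("T=0.8", default), ("T=2.0", hot)]:
    print(name, [round(p, 3) for p in probs])
```

Lower temperatures make the samples more conservative and repetitive, while higher temperatures produce more varied text with more spelling errors.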

In [19]:
filename = '/content/drive/MyDrive/CS2516/Homework/shakespeare.txt'
model = 'lstm'
n_epochs = 2000
print_every = 100
hidden_size = 128
n_layers = 2
learning_rate = 0.01
chunk_len = 200
batch_size = 128
shuffle = True
cuda = True

if cuda:
    print("Using CUDA")

file, file_len, all_characters, n_characters = read_file(filename)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1], all_characters)
        target[bi] = char_tensor(chunk[1:], all_characters)
    inp = Variable(inp)
    target = Variable(target)
    if cuda:
        inp = inp.cuda()
        target = target.cuda()
    return inp, target

def train(inp, target):
    """
        inp: (batch_size, seq_size)
        target: (batch_size, seq_size)
    """
    hidden = decoder.init_hidden(batch_size, cuda)
    decoder.zero_grad()

    output, hidden = decoder(inp, hidden)
    loss = criterion(output.view(-1, output.size(-1)), target.view(-1))

    loss.backward()
    decoder_optimizer.step()

    return loss.item()

def save():
    save_filename = os.path.splitext(os.path.basename(filename))[0] + '.pt'
    torch.save((all_characters, decoder), save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    n_characters,
    hidden_size,
    n_characters,
    model=model,
    n_layers=n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

if cuda:
    decoder.cuda()

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))
        loss_avg += loss

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, all_characters, 'Wh', 100, cuda=cuda), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Using CUDA
Training for 2000 epochs...


  5%|▌         | 100/2000 [00:25<08:24,  3.77it/s]

[0m 25s (100 5%) 1.7850]
Wh semple the maid, in.

MENIUS:
And his king of 'tide to verdel's grumpet,
But prick you, to parged i 



 10%|█         | 200/2000 [00:50<08:16,  3.63it/s]

[0m 50s (200 10%) 1.5559]
When, by that the male or meat:
The good:
let a kiships the ruins he made the might ture and earred bu 



 15%|█▌        | 300/2000 [01:15<07:22,  3.84it/s]

[1m 15s (300 15%) 1.4857]
Where, my bring out as where the sent was my antwart; but thou whom'd to her with a speak.

CORIOLANUS 



 20%|██        | 400/2000 [01:39<06:55,  3.85it/s]

[1m 39s (400 20%) 1.4015]
Where, go,--
And lose us uncle body is the sigod such execute.

LEONTES:
Is now! is haste thou art not 



 25%|██▌       | 500/2000 [02:04<06:36,  3.79it/s]

[2m 4s (500 25%) 1.3561]
Wherver in him. Can we shall finded thou hast
And his brother of time to see me prevered along for the 



 30%|███       | 600/2000 [02:30<06:14,  3.74it/s]

[2m 29s (600 30%) 1.3669]
Wherefore come beching great thy lovy my blood
Hath dip of his father knonge of this slands
Of the mis 



 35%|███▌      | 700/2000 [02:54<05:42,  3.80it/s]

[2m 54s (700 35%) 1.3398]
Where thee, by that effemies up an our mistress.

PAULINA:
My father fellow,
And then afed men hands s 



 40%|████      | 800/2000 [03:19<05:14,  3.81it/s]

[3m 19s (800 40%) 1.3436]
Where is dew the bloody of your grace.

KING HENRY VI:
And, most for this corry that thou shame with G 



 45%|████▌     | 900/2000 [03:45<04:58,  3.68it/s]

[3m 45s (900 45%) 1.2883]
Where suit and have appearth they ray:
Be assimy of the notes, here a
make with this hand of manteling 



 50%|█████     | 1000/2000 [04:10<04:37,  3.61it/s]

[4m 10s (1000 50%) 1.3118]
Where is in all the good de cousins bark.
If they do now do you swear
I says that talk of true, we we  



 55%|█████▌    | 1100/2000 [04:34<04:00,  3.74it/s]

[4m 34s (1100 55%) 1.2891]
Wheevein it say, how would I do once as will then cover
While you well then bold withal;
And tell me,  



 60%|██████    | 1200/2000 [05:00<03:46,  3.54it/s]

[4m 59s (1200 60%) 1.3051]
Wheretimes not warrant, Aufidius,
Do from my ears steal at this work.

GRUMIO:
Heavion thou wast mine  



 65%|██████▌   | 1300/2000 [05:25<03:10,  3.68it/s]

[5m 25s (1300 65%) 1.2936]
Where was a man with me and a complot:
Since thy house and puttiful with vain;
For they still fearful, 



 70%|███████   | 1400/2000 [05:50<02:37,  3.80it/s]

[5m 50s (1400 70%) 1.2679]
Where is to paper mayor else his hand.

ISABELLA:
Can past thou think thou holy and bear
The time in o 



 75%|███████▌  | 1500/2000 [06:15<02:16,  3.66it/s]

[6m 15s (1500 75%) 1.2806]
Where they are some own valiant that bearing him:
As I am no ashan they would be condain of the proud
 



 80%|████████  | 1600/2000 [06:40<01:47,  3.72it/s]

[6m 40s (1600 80%) 1.2508]
Wheel you stay to help to an incanch;
The only flouring and create my sovereign.
I, where is struck an 



 85%|████████▌ | 1700/2000 [07:05<01:17,  3.85it/s]

[7m 5s (1700 85%) 1.2557]
Where is my womb, and you have been thou hast
viscent to friend and the world,
Which that I maid battl 



 90%|█████████ | 1800/2000 [07:30<00:54,  3.69it/s]

[7m 30s (1800 90%) 1.2686]
Where is a merciful up.

SICINIUS:
No, boy!

Lord:
Let me be the gates to the bear
To slaughter'd hope 



 95%|█████████▌| 1900/2000 [07:55<00:28,  3.55it/s]

[7m 55s (1900 95%) 1.2576]
Where it be done: but that hitles may be executed,
And with him do will strive, become in it.

DUKE VI 



100%|██████████| 2000/2000 [08:20<00:00,  3.99it/s]

[8m 20s (2000 100%) 1.2537]
Where have I have no more before I
Was neglicians, and to my love the flierful devil
To the Thrance an 

Saving...
Saved as shakespeare.pt





Final loss:

`1.2537`

Final generated sequence:

`Where have I have no more before I
Was neglicians, and to my love the flierful devil
To the Thrance an`
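
Each training step above draws random chunks of `chunk_len + 1` characters and uses the chunk shifted by one position as the target, so the model learns to predict the next character at every position. The slicing can be sketched in pure Python as follows; the helper name `sample_chunk` and the sample text are hypothetical, while the index arithmetic mirrors `random_training_set`.

```python
import random

def sample_chunk(text, chunk_len, rng):
    """Pick a random window and return (input, target) shifted by one character."""
    # Same bounds as random_training_set: the window always fits inside the text
    start = rng.randint(0, len(text) - chunk_len - 1)
    chunk = text[start:start + chunk_len + 1]
    # Input is everything but the last character; target is everything but the first
    return chunk[:-1], chunk[1:]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
text = "to be or not to be that is the question"
inp, target = sample_chunk(text, 8, rng)
print(repr(inp))
print(repr(target))
```

At position `i`, the model sees `inp[:i + 1]` through its hidden state and is scored on `target[i]`, which is the character that follows `inp[i]` in the text.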

3) GRU RNN trained on Shakespeare ([source](https://github.com/vnikme/char-rnn.pytorch/tree/master))

In [17]:

filename = '/content/drive/MyDrive/CS2516/Homework/shakespeare.txt'
model = 'gru'
n_epochs = 2000
print_every = 100
hidden_size = 128
n_layers = 2
learning_rate = 0.01
chunk_len = 200
batch_size = 128
shuffle = True
cuda = True

if cuda:
    print("Using CUDA")

file, file_len, all_characters, n_characters = read_file(filename)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1], all_characters)
        target[bi] = char_tensor(chunk[1:], all_characters)
    inp = Variable(inp)
    target = Variable(target)
    if cuda:
        inp = inp.cuda()
        target = target.cuda()
    return inp, target

def train(inp, target):
    """
        inp: (batch_size, seq_size)
        target: (batch_size, seq_size)
    """
    hidden = decoder.init_hidden(batch_size, cuda)
    decoder.zero_grad()

    output, hidden = decoder(inp, hidden)
    loss = criterion(output.view(-1, output.size(-1)), target.view(-1))

    loss.backward()
    decoder_optimizer.step()

    return loss.item()

def save():
    save_filename = os.path.splitext(os.path.basename(filename))[0] + '.pt'
    torch.save((all_characters, decoder), save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    n_characters,
    hidden_size,
    n_characters,
    model=model,
    n_layers=n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

if cuda:
    decoder.cuda()

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))
        loss_avg += loss

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, all_characters, 'Wh', 100, cuda=cuda), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Using CUDA
Training for 2000 epochs...


  5%|▌         | 100/2000 [00:23<08:31,  3.72it/s]

[0m 23s (100 5%) 1.6169]
Whom Olar the blood
That your contleans of the at be the nerved let;
And sine, not son't the eath not  



 10%|█         | 200/2000 [00:48<07:57,  3.77it/s]

[0m 48s (200 10%) 1.4490]
Whought,
Braken thee him actions, thou discore stames may from me,
Looks and for bope into enemy, and  



 15%|█▌        | 300/2000 [01:12<07:14,  3.92it/s]

[1m 12s (300 15%) 1.3898]
Why way man;
And Raveration, and too throne?

PRINCE ELIZABETH:
She answelly ston of this have attend  



 20%|██        | 400/2000 [01:35<06:46,  3.93it/s]

[1m 35s (400 20%) 1.3627]
Why wife his thing dreed your honour.

KING EDWARD IV:
Worthy lord, my knief, this saded every breathe 



 25%|██▌       | 500/2000 [01:59<06:21,  3.93it/s]

[1m 59s (500 25%) 1.3864]
What with turns!
The pluck the prisoner than I have must whose realed the
commonier: but rebelling men 



 30%|███       | 600/2000 [02:24<05:51,  3.98it/s]

[2m 24s (600 30%) 1.3426]
Where's to be a man.
And by beation be condity and
I have been prophet a traitors, such a week thy
dis 



 35%|███▌      | 700/2000 [02:48<05:40,  3.82it/s]

[2m 48s (700 35%) 1.3140]
Whire troe's in your letters
With stefter my soul is exclassalaster how he hath
As on Henry and him to 



 40%|████      | 800/2000 [03:12<05:09,  3.87it/s]

[3m 12s (800 40%) 1.3163]
Whire as my life and me,
And so you to the myself infect of an all,
Come conveyore the sun and in the  



 45%|████▌     | 900/2000 [03:37<04:46,  3.85it/s]

[3m 37s (900 45%) 1.2978]
Whire order here;
And trust-pillions not with them; for who do
On such teft of rained end as you to do 



 50%|█████     | 1000/2000 [04:02<04:25,  3.77it/s]

[4m 2s (1000 50%) 1.3098]
Wherefore there lady;
What has a millops and the king as end
The poor war's subjects? the wepper bette 



 55%|█████▌    | 1100/2000 [04:26<04:15,  3.52it/s]

[4m 26s (1100 55%) 1.2825]
Whieve her, we would not have for a bough.

GLOUCESTER:
The unserve a word in this bastards.

PERDITA: 



 60%|██████    | 1200/2000 [04:51<03:26,  3.87it/s]

[4m 51s (1200 60%) 1.2897]
Whimself too can abuse as shall I be more
Can with them which any hutreads,
And many head for their co 



 65%|██████▌   | 1300/2000 [05:15<03:23,  3.44it/s]

[5m 15s (1300 65%) 1.2926]
Whrifechieft ready well;
I know the way of your all set no put the heart
As a daggers.

MENENIUS:
My l 



 70%|███████   | 1400/2000 [05:40<02:38,  3.78it/s]

[5m 40s (1400 70%) 1.2808]
Whiefled grown a speak speak.

MENENIUS:
Why, sir, I cannot not had to the house.

VIRGILIA:
I do more 



 75%|███████▌  | 1500/2000 [06:04<02:20,  3.55it/s]

[6m 4s (1500 75%) 1.2992]
Whate your pronounce thee hither,
Lovers of mine eyes with her of the commoning.
You may slaughters ar 



 80%|████████  | 1600/2000 [06:29<01:43,  3.88it/s]

[6m 29s (1600 80%) 1.2764]
Why, present lady-time,
Or out of hands with utter, my lord,
Of thy battle let the foul vows of flower 



 85%|████████▌ | 1700/2000 [06:53<01:16,  3.95it/s]

[6m 53s (1700 85%) 1.2734]
Whick thee byself.

Boy:
Trust as if he hath littles were to kill'd a prince to him.

DUKE VINCENTIO:
 



 90%|█████████ | 1800/2000 [07:17<00:51,  3.87it/s]

[7m 17s (1800 90%) 1.2644]
Why, Gaunt is a woman's blind fear,
Which makes an oaths to protect a deputy,
In privateous comfort th 



 95%|█████████▌| 1900/2000 [07:42<00:25,  3.96it/s]

[7m 42s (1900 95%) 1.3071]
Whose countrymen, and all, if the cords be gone.

KING LEWIS XI:
Take it those much you and for my mea 



100%|██████████| 2000/2000 [08:06<00:00,  4.11it/s]

[8m 6s (2000 100%) 1.2759]
Why forward, not there.

HERMIONE:
He should rid him not stay my sight in the king;
You must come beho 

Saving...
Saved as shakespeare.pt





Final loss:

`1.2759`

Final generated sequence:

`Why forward, not there.`

`HERMIONE:
He should rid him not stay my sight in the king;
You must come beho`

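The final loss is slightly higher than in the previous run (1.2759 vs. 1.2537), but the two are very close. Each reported value is the loss on a single random batch, and the logs fluctuate by a similar amount from one checkpoint to the next (for example, the GRU run reads 1.2644 at step 1800 and 1.3071 at step 1900). The generated sequences appear to have similar realism.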
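
Because the logged losses are noisy single-batch values, averaging the last few checkpoints gives a somewhat steadier comparison than the final number alone. The sketch below uses the values printed at steps 1600 to 2000 in the two Shakespeare runs above, where the first run is the one configured with `model = 'lstm'`. It is only a rough smoothing, since these are five samples taken every 100 steps rather than a true running average.

```python
from statistics import mean

# Single-batch losses logged at steps 1600, 1700, 1800, 1900, 2000 (copied from the outputs)
lstm_losses = [1.2508, 1.2557, 1.2686, 1.2576, 1.2537]
gru_losses = [1.2764, 1.2734, 1.2644, 1.3071, 1.2759]

lstm_mean = mean(lstm_losses)
gru_mean = mean(gru_losses)

print("first run (lstm) mean: %.4f" % lstm_mean)
print("GRU run mean:          %.4f" % gru_mean)
print("gap:                   %.4f" % (gru_mean - lstm_mean))
```

The averaged values point the same way as the final losses: the first run is slightly lower, by about 0.02.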

4) GRU RNN trained on dad jokes

In [18]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import argparse
import os

from tqdm import tqdm

filename = '/content/drive/MyDrive/CS2516/Homework/jokes.txt'
model = 'gru'
n_epochs = 2000
print_every = 100
hidden_size = 128
n_layers = 2
learning_rate = 0.01
chunk_len = 200
batch_size = 128
shuffle = True
cuda = True

if cuda:
    print("Using CUDA")

file, file_len, all_characters, n_characters = read_file(filename)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1], all_characters)
        target[bi] = char_tensor(chunk[1:], all_characters)
    inp = Variable(inp)
    target = Variable(target)
    if cuda:
        inp = inp.cuda()
        target = target.cuda()
    return inp, target

def train(inp, target):
    """
        inp: (batch_size, seq_size)
        target: (batch_size, seq_size)
    """
    hidden = decoder.init_hidden(batch_size, cuda)
    decoder.zero_grad()

    output, hidden = decoder(inp, hidden)
    loss = criterion(output.view(-1, output.size(-1)), target.view(-1))

    loss.backward()
    decoder_optimizer.step()

    return loss.item()

def save():
    save_filename = os.path.splitext(os.path.basename(filename))[0] + '.pt'
    torch.save((all_characters, decoder), save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    n_characters,
    hidden_size,
    n_characters,
    model=model,
    n_layers=n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

if cuda:
    decoder.cuda()

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))
        loss_avg += loss

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, all_characters, 'Wh', 100, cuda=cuda), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Using CUDA
Training for 2000 epochs...


  5%|▌         | 100/2000 [00:25<08:06,  3.91it/s]

[0m 24s (100 5%) 0.8921]
Whe was favorite no pression with a roman pizza tree?<> When he was the difference.
Have tay on the bo 



 10%|█         | 200/2000 [00:50<07:42,  3.89it/s]

[0m 50s (200 10%) 0.1851]
Whatical fow do you go to read a batile?<>It was always so jaded.
Why did the coffee secroson?<> They' 



 15%|█▌        | 300/2000 [01:14<07:14,  3.92it/s]

[1m 14s (300 15%) 0.1212]
What’s the drumman part of the car bey to tell you a fighting joke...<>but I forgot the punch line.
Wh 



 20%|██        | 400/2000 [01:39<07:04,  3.77it/s]

[1m 39s (400 20%) 0.0986]
Whate a computer’s favorite snack?<>Microchips!
Why was the robot so tired after his road trip?<>He ha 



 25%|██▌       | 500/2000 [02:04<06:35,  3.79it/s]

[2m 4s (500 25%) 0.0921]
Whe Shath his so popular?<>Because it has a lot of dates!
Why did Mickey Mouse take a trip into space? 



 30%|███       | 600/2000 [02:28<06:03,  3.85it/s]

[2m 28s (600 30%) 0.0845]
Whating the other day teaching me how to read maps backwards<>turns out it was just spam.
I'm reading  



 35%|███▌      | 700/2000 [02:52<05:34,  3.89it/s]

[2m 52s (700 35%) 0.0891]
When it called Cecor for?<>Because he couldn’t see that well.
My boss told me to have a good day...<>. 



 40%|████      | 800/2000 [03:17<05:37,  3.56it/s]

[3m 17s (800 40%) 0.0761]
Whicken sedans!
How do you make a Kleenex dance? <>Put a little boogie in it!
A termite walks into a b 



 45%|████▌     | 900/2000 [03:42<05:06,  3.59it/s]

[3m 42s (900 45%) 0.0831]
Who was lab partners with a door?<> When it's ajar.
I made a belt out of watches once...<> It was a wa 



 50%|█████     | 1000/2000 [04:06<04:18,  3.87it/s]

[4m 6s (1000 50%) 0.0800]
When it cross to soes? <> Sore arms.
Last night me and my girlfriend watched three Do jked only have t 



 55%|█████▌    | 1100/2000 [04:30<03:49,  3.93it/s]

[4m 30s (1100 55%) 0.0790]
Whis their wedding day?<>It was loaf at first sight.
Why do melons have weddings?<>Because they cantal 



 60%|██████    | 1200/2000 [04:55<03:29,  3.82it/s]

[4m 55s (1200 60%) 0.0745]
When it gets bad, I take something for it.
I used to be addicted to soap...<> but I'm clean now.
When  



 65%|██████▌   | 1300/2000 [05:20<03:10,  3.68it/s]

[5m 20s (1300 65%) 0.0907]
When it take to make an octopus laugh? <>Ten-tickles.
I’m only familiar with 25 letters in the English 



 70%|███████   | 1400/2000 [05:45<02:45,  3.63it/s]

[5m 45s (1400 70%) 0.0874]
When it's a nice gnawint to people eat beavers.
What’s the most patriotic states. <>I had to calm him  



 75%|███████▌  | 1500/2000 [06:10<02:16,  3.66it/s]

[6m 9s (1500 75%) 0.0778]
Whelcr ion.
Want to hear a joke about a piece of paper? Never mind... <>it's tearable.
I just watched  



 80%|████████  | 1600/2000 [06:34<01:43,  3.87it/s]

[6m 34s (1600 80%) 0.0733]
When it hit me.
I’ve been bored recently, so I decided to take up fencing.<> The neighbors keep demand 



 85%|████████▌ | 1700/2000 [06:58<01:17,  3.86it/s]

[6m 58s (1700 85%) 0.0751]
When it's ajar.
I made a belt out of watches once...<> It was a waist of time.
This furniture store ke 



 90%|█████████ | 1800/2000 [07:23<00:52,  3.79it/s]

[7m 23s (1800 90%) 0.0742]
When it gets bad, I take something for it.
I used to be addicted to soap...<> but I'm clean now.
When  



 95%|█████████▌| 1900/2000 [07:47<00:26,  3.80it/s]

[7m 47s (1900 95%) 0.0727]
When it's ajar.
I made a belt out of watches once...<> It was a waist of time.
This furniture store ke 



100%|██████████| 2000/2000 [08:11<00:00,  4.07it/s]

[8m 11s (2000 100%) 0.0767]
When it's ajar.
I made a belt out of watches once...<> It was a waist outates it saw the salad dressin 

Saving...
Saved as jokes.pt





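The final loss (0.0767) is much lower than for either model trained on Shakespeare (1.2537 and 1.2759). The samples from step 1200 onward largely reproduce the same jokes verbatim, which suggests the model is mostly memorizing this small dataset rather than generalizing.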

Final loss:

`0.0767`

Final generated sequence:

`When it's ajar.
I made a belt out of watches once...<> It was a waist outates it saw the salad dressin `
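
To make the three final losses easier to interpret, they can be converted to per-character perplexity. `nn.CrossEntropyLoss` with its default settings returns the mean negative log-likelihood in nats, so the perplexity is simply `exp(loss)`. The sketch below uses the final values reported above; the dictionary labels are just descriptive names chosen for this example.

```python
import math

# Final single-batch training losses reported in this notebook (nats per character)
final_losses = {
    "Shakespeare, first run (lstm)": 1.2537,
    "Shakespeare, GRU": 1.2759,
    "dad jokes, GRU": 0.0767,
}

# Perplexity: the effective number of equally likely next characters
perplexities = {name: math.exp(loss) for name, loss in final_losses.items()}

for name, ppl in perplexities.items():
    print("%-30s perplexity %.3f" % (name, ppl))
```

A perplexity close to 1 on the jokes data means the model is almost certain of the next character on its training text, which is consistent with the near-verbatim samples.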