# Tutorial - Generative Recurrent Neural Networks

Last time we discussed using recurrent neural networks to make predictions about sequences. In particular, we treated tweets as a **sequence** of words. Since tweets can have a variable number of words, we needed an architecture that can take variable-sized sequences as input.

This time, we will use recurrent neural networks to **generate** sequences.
Generating sequences is more involved comparing to making predictions about
sequences. However, it is a very interesting task, and many students chose
sequence-generation tasks for their projects.

Much of today's content is an adaptation of the "Practical PyTorch" github 
repository [1].

[1] https://github.com/spro/practical-pytorch/blob/master/char-rnn-generation/char-rnn-generation.ipynb

## Review

In recurrent neural networks the input sequence is broken down into tokens. We could choose whether to tokenize based on words, or based on characters. The representation of each token (GloVe or one-hot) is processed by the RNN one step at a time to update the hidden (or context) state.

In a predictive RNN, the value of the hidden states  is a representation of **all the text that was processed thus far**. Similarly, in a generative RNN, The value of the hidden state will be a representation of **all the text that still needs to be generated**. We will use this hidden state to produce the sequence, one token at a time.

Similar to the last tutorial we will break up the problem of generating text
to generating one token at a time.

We will do so with the help of two functions:

1. We need to be able to generate the *next* token, given the current 
   hidden state. In practice, we get a probability distribution over 
   the next token, and sample from that probability distribution.
2. We need to be able to update the hidden state somehow. To do so,
   we need two piece of information: the old hidden state, and the actual
   token that was generated in the previous step. The actual token generated
   will inform the subsequent tokens.

We will repeat both functions until a special "END OF SEQUENCE" token is
generated.

Note that there are several tricky things that we will have to figure out.
For example, how do we actually sample the actual token from the probability
distribution over tokens? What would we do during training, and how might 
that be different from during testing/evaluation? We will answer those
questions as we implement the RNN.

For now, let's start with our training data.

## Data: Donald Trump's Tweets from 2018

The training set we use is a collection of Donald Trump's tweets from 2018.
We will only use tweets that are 140 characters or shorter, and tweets
that contains more than just a URL.
Since tweets often contain creative spelling and numbers, and upper vs lower
case characters are read very differently, we will use a character-level RNN.

To start, let us load the trump.csv file to Google Colab and provide access to the drive. The file can be obtained from Quercus.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
import csv

# file location (make sure to use your file location)
file_dir = '/content/drive/My Drive/Colab Notebooks/Lab 6 Tutorial/'

tweets = list(line[0] for line in csv.reader(open(file_dir + 'trump.csv')))
len(tweets)

22402

There are over 20000 tweets in this collection.
Let's look at a few of them, just to get a sense of the kind of text
we're dealing with:

In [0]:
print(tweets[100])
print(tweets[1000])
print(tweets[10000])

God Bless the people of Venezuela!
It was my honor. THANK YOU! https://t.co/1LvqbRQ1bi
Nobody but Donald Trump will save Israel. You are wasting your time with these politicians and political clowns. Best! #SheldonAdelson


## Generating One Tweet

Normally, when we build a new machine learn model, we want to make sure
that our model can overfit. To that end, we will first build a neural network
that can generate _one_ tweet really well. We can choose any tweet (or any other text)
we want.  Let's choose to build an RNN that generates `tweet[100]`.

In [0]:
tweet = tweets[100]
print(tweet)
print(len(tweet))

God Bless the people of Venezuela!
34


First, we will need to encode this tweet using a one-hot encoding.
We'll build dictionary mappings
from the character to the index of that character (a unique integer identifier),
and from the index to the character. We'll use the same naming scheme that `torchtext`
uses (`stoi` and `itos`).

For simplicity, we'll work with a limited vocabulary containing
just the characters in `tweet[100]`, plus two special tokens:

- `<EOS>` represents "End of String", which we'll append to the end of our tweet.
  Since tweets are variable-length, this is a way for the RNN to signal
  that the entire sequence has been generated.
- `<BOS>` represents "Beginning of String", which we'll prepend to the beginning of 
  our tweet. This is the first token that we will feed into the RNN.

The way we use these special tokens will become more clear as we build the model.

In [0]:
vocab = list(set(tweet)) + ["<BOS>", "<EOS>"]
vocab_stoi = {s: i for i, s in enumerate(vocab)}
vocab_itos = {i: s for i, s in enumerate(vocab)}
vocab_size = len(vocab)

In [0]:
print(vocab)
print(vocab_stoi)
print(vocab_itos)
print(vocab_size)

['u', 'B', 't', 's', 'h', 'e', 'V', 'l', 'o', 'n', 'z', 'p', 'f', '!', 'G', ' ', 'a', 'd', '<BOS>', '<EOS>']
{'u': 0, 'B': 1, 't': 2, 's': 3, 'h': 4, 'e': 5, 'V': 6, 'l': 7, 'o': 8, 'n': 9, 'z': 10, 'p': 11, 'f': 12, '!': 13, 'G': 14, ' ': 15, 'a': 16, 'd': 17, '<BOS>': 18, '<EOS>': 19}
{0: 'u', 1: 'B', 2: 't', 3: 's', 4: 'h', 5: 'e', 6: 'V', 7: 'l', 8: 'o', 9: 'n', 10: 'z', 11: 'p', 12: 'f', 13: '!', 14: 'G', 15: ' ', 16: 'a', 17: 'd', 18: '<BOS>', 19: '<EOS>'}
20


Now that we have our vocabulary, we can build the PyTorch model
for this problem.
The actual model is not as complex as you might think. We actually
already learned about all the components that we need. (Using and training
the model is the hard part)

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [0]:


class TextGenerator(nn.Module):
    def __init__(self, vocab_size, hidden_size, n_layers=1):
        super(TextGenerator, self).__init__()

        # identiy matrix for generating one-hot vectors
        self.ident = torch.eye(vocab_size)

        # recurrent neural network
        self.rnn = nn.GRU(vocab_size, hidden_size, n_layers, batch_first=True)

        # a fully-connect layer that outputs a distribution over
        # the next token, given the RNN output
        self.decoder = nn.Linear(hidden_size, vocab_size)
    
    def forward(self, inp, hidden=None):
        inp = self.ident[inp]                  # generate one-hot vectors of input
        output, hidden = self.rnn(inp, hidden) # get the next output and hidden state
        output = self.decoder(output)          # predict distribution over next tokens
        return output, hidden

model = TextGenerator(vocab_size, 64)

## Training with Teacher Forcing

At a very high level, we want our RNN model to have a high probability
of generating the tweet. An RNN model generates text
one character at a time based on the hidden state value.
At each time step, we will check whether the mdoel generated the
correct character. That is, at each time step,
we are trying to select the correct next character out of all the 
characters in our vocabulary. Recall that this problem is a multi-class
classification problem, and we can use Cross-Entropy loss to train our
network to become better at this type of problem.

In [0]:
criterion = nn.CrossEntropyLoss()

However, we don't just have a single multi-class classification problem.
Instead, we have **one classification problem per time-step** (per token)!
So, how do we predict the first token in the sequence? 
How do we predict the second token in the sequence? 

To help you understand what happens durign RNN training, we'll start with a
inefficient training code that shows you what happens step-by-step. We'll
start with computing the loss for the first token generated, then the second token,
and so on.
Later on, we'll switch to a simpler and more performant version of the code.

So, let's start with the first classification problem: the problem of generating
the **first** token (`tweet[0]`).

To generate the first token, we'll feed the RNN network (with an initial, empty
hidden state) the "<BOS>" token. Then, the output

In [0]:
bos_input = torch.Tensor([vocab_stoi["<BOS>"]])
print(bos_input.shape, type(bos_input))
bos_input = bos_input.long()
print(bos_input.shape, type(bos_input))
bos_input = bos_input.unsqueeze(0)
print(bos_input.shape, type(bos_input))
output, hidden = model(bos_input, hidden=None)
output # distribution over the first token

torch.Size([1]) <class 'torch.Tensor'>
torch.Size([1]) <class 'torch.Tensor'>
torch.Size([1, 1]) <class 'torch.Tensor'>


tensor([[[ 4.4225e-02,  1.3240e-01, -2.3740e-02,  3.0483e-02, -7.1641e-02,
           1.0096e-01,  8.1199e-02,  1.2761e-01, -1.2205e-06,  3.1027e-03,
           1.6290e-02, -5.5477e-02, -1.9324e-02,  9.2466e-03,  2.4454e-02,
           1.1759e-01, -7.2113e-02, -1.5862e-02,  3.0889e-02, -9.1323e-02]]],
       grad_fn=<AddBackward0>)

In [0]:
bos_input

tensor([[18]])

We can compute the loss using `criterion`. Since the model is untrained,
the loss is expected to be high. (For now, we won't do anything
with this loss, and omit the backward pass.)

In [0]:
target = torch.Tensor([vocab_stoi[tweet[0]]]).long().unsqueeze(0)
criterion(output.reshape(-1, vocab_size), # reshape to 2D tensor
          target.reshape(-1))             # reshape to 1D tensor

tensor(3.0706, grad_fn=<NllLossBackward>)

In [0]:
print(target)
print(output)
print(output.reshape(-1, vocab_size))
print(target.reshape(-1))

tensor([[6]])
tensor([[[-0.0023,  0.1443,  0.0212,  0.0992,  0.1040,  0.0890, -0.0478,
           0.0194,  0.1120, -0.0436,  0.0201,  0.0630, -0.0489,  0.0486,
           0.1207, -0.0904, -0.0836,  0.0305, -0.0409, -0.0220]]],
       grad_fn=<AddBackward0>)
tensor([[-0.0023,  0.1443,  0.0212,  0.0992,  0.1040,  0.0890, -0.0478,  0.0194,
          0.1120, -0.0436,  0.0201,  0.0630, -0.0489,  0.0486,  0.1207, -0.0904,
         -0.0836,  0.0305, -0.0409, -0.0220]], grad_fn=<AsStridedBackward>)
tensor([6])


Now, we need to update the hidden state and generate a prediction
for the next token. To do so, we need to provide the current token to
the RNN. We already said that during test time, we'll need to sample
from the predicted probabilty over tokens that the neural network
just generated. 

Right now, we can do something better: we can **use the ground-truth,
actual target token**. This technique is called **teacher-forcing**, 
and generally speeds up training. The reason is that right now, 
since our model does not perform well, the predicted probability
distribution is pretty far from the ground truth. So, it is very,
very difficult for the neural network to get back on track given bad
input data.

In [0]:
# Use teacher-forcing: we pass in the ground truth `target`,
# rather than using the NN predicted distribution
output, hidden = model(target, hidden)
output # distribution over the second token

tensor([[[-0.0403,  0.1374,  0.0388,  0.0839,  0.1198,  0.0802, -0.0541,
           0.0267,  0.0691, -0.0618, -0.0191,  0.0998, -0.0151,  0.0469,
           0.1330, -0.0750, -0.0906,  0.0467, -0.0325, -0.0012]]],
       grad_fn=<AddBackward0>)

Similar to the first step, we can compute the loss, quantifying the
difference between the predicted distribution and the actual next
token. This loss can be used to adjust the weights of the neural
network (which we are not doing yet).

In [0]:
target = torch.Tensor([vocab_stoi[tweet[1]]]).long().unsqueeze(0)
criterion(output.reshape(-1, vocab_size), # reshape to 2D tensor
          target.reshape(-1))             # reshape to 1D tensor

tensor(2.8854, grad_fn=<NllLossBackward>)

We can continue this process of:

- feeding the previous ground-truth token to the RNN,
- obtaining the prediction distribution over the next token, and
- computing the loss,

for as many steps as there are tokens in the ground-truth tweet.

In [0]:
for i in range(2, len(tweet)):
    output, hidden = model(target, hidden)
    target = torch.Tensor([vocab_stoi[tweet[1]]]).long().unsqueeze(0)
    loss = criterion(output.reshape(-1, vocab_size), # reshape to 2D tensor
                     target.reshape(-1))             # reshape to 1D tensor
    print(i, output, loss)

2 tensor([[[-0.0321,  0.1427,  0.0448,  0.1036,  0.1191,  0.0974, -0.0968,
           0.0005,  0.0566, -0.0691, -0.0189,  0.1206, -0.0185, -0.0102,
           0.1500, -0.0580, -0.0986,  0.0828, -0.0334, -0.0036]]],
       grad_fn=<AddBackward0>) tensor(2.8801, grad_fn=<NllLossBackward>)
3 tensor([[[-0.0241,  0.1407,  0.0507,  0.1155,  0.1168,  0.1062, -0.1245,
          -0.0156,  0.0492, -0.0770, -0.0265,  0.1350, -0.0186, -0.0439,
           0.1625, -0.0474, -0.1015,  0.1005, -0.0278, -0.0061]]],
       grad_fn=<AddBackward0>) tensor(2.8819, grad_fn=<NllLossBackward>)
4 tensor([[[-0.0192,  0.1372,  0.0549,  0.1213,  0.1152,  0.1095, -0.1419,
          -0.0246,  0.0458, -0.0836, -0.0334,  0.1448, -0.0177, -0.0633,
           0.1709, -0.0411, -0.1021,  0.1098, -0.0227, -0.0083]]],
       grad_fn=<AddBackward0>) tensor(2.8851, grad_fn=<NllLossBackward>)
5 tensor([[[-0.0167,  0.1342,  0.0575,  0.1237,  0.1144,  0.1102, -0.1524,
          -0.0295,  0.0445, -0.0885, -0.0380,  0.1512, -0.016

Finally, with our final token, we should expect to output the "<EOS>"
token, so that our RNN learns when to stop generating characters.

In [0]:
output, hidden = model(target, hidden)
target = torch.Tensor([vocab_stoi["<EOS>"]]).long().unsqueeze(0)
loss = criterion(output.reshape(-1, vocab_size), # reshape to 2D tensor
                 target.reshape(-1))             # reshape to 1D tensor
print(i, output, loss)

33 tensor([[[-0.0140,  0.1282,  0.0606,  0.1244,  0.1135,  0.1090, -0.1657,
          -0.0355,  0.0457, -0.0969, -0.0423,  0.1600, -0.0139, -0.0868,
           0.1825, -0.0337, -0.1004,  0.1201, -0.0153, -0.0121]]],
       grad_fn=<AddBackward0>) tensor(3.0338, grad_fn=<NllLossBackward>)


In practice, we don't really need a loop. Recall that in a predictive RNN,
the `nn.RNN` module can take an entire sequence as input. We can do the
same thing here:

In [0]:
tweet_ch = ["<BOS>"] + list(tweet) + ["<EOS>"]
tweet_indices = [vocab_stoi[ch] for ch in tweet_ch]
tweet_tensor = torch.Tensor(tweet_indices).long().unsqueeze(0)

print(tweet_tensor.shape)

output, hidden = model(tweet_tensor[:,:-1]) # <EOS> is never an input token
target = tweet_tensor[:,1:]                 # <BOS> is never a target token
loss = criterion(output.reshape(-1, vocab_size), # reshape to 2D tensor
                 target.reshape(-1))             # reshape to 1D tensor

torch.Size([1, 36])


Here, the input to our neural network model is the *entire*
sequence of input tokens (everything from "<BOS>" to the
last character of the tweet). The neural network generates a prediction distribution
of the next token at each step. We can compare each of these  with the ground-truth
`target`.


Our training loop (for learning to generate the single `tweet`) will therefore
look something like this:

In [0]:
print(tweet_tensor[:,:-1])
print(target)

tensor([[18,  6,  1,  8, 15,  9, 17,  7,  3,  3, 15, 10,  2,  7, 15,  0,  7,  1,
          0, 17,  7, 15,  1,  4, 15,  5,  7, 16,  7, 11, 12,  7, 17, 13, 14]])
tensor([[ 6,  1,  8, 15,  9, 17,  7,  3,  3, 15, 10,  2,  7, 15,  0,  7,  1,  0,
         17,  7, 15,  1,  4, 15,  5,  7, 16,  7, 11, 12,  7, 17, 13, 14, 19]])


In [0]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
for it in range(500):
    optimizer.zero_grad()
    output, _ = model(tweet_tensor[:,:-1])
    loss = criterion(output.reshape(-1, vocab_size),
                 target.reshape(-1))
    loss.backward()
    optimizer.step()

    if (it+1) % 100 == 0:
        print("[Iter %d] Loss %f" % (it+1, float(loss)))

[Iter 100] Loss 1.840927
[Iter 200] Loss 0.157599
[Iter 300] Loss 0.030487
[Iter 400] Loss 0.013288
[Iter 500] Loss 0.007862


The training loss is decreasing with training, which is what we expect.

## Generating a Token

At this point, we want to see whether our model is actually learning
something. So, we need to talk about how to
actually use the RNN model to generate text. If we can 
generate text, we can make a qualitative asssessment of how well
our RNN is performing.

The main difference between training and test-time (generation time)
is that we don't have the ground-truth tokens to feed as inputs
to the RNN. Instead, we need to actually **sample** a token based
on the neural network's prediction distribution.

But how can we sample a token from a distribution?

On one extreme, we can always take
the token with the largest probability (argmax). This has been our
go-to technique in other classification tasks. However, this idea
will fail here. The reason is that in practice, 
**we want to be able to generate a variety of different sequences from
the same model**. An RNN that can only generate a single new Trump Tweet
is fairly useless.

In short, we want some randomness. We can do so by using the logit
outputs from our model to construct a multinomial distribution over
the tokens, then and sample a random token from that multinomial distribution.

One natural multinomial distribution we can choose is the 
distribution we get after applying the softmax on the outputs.
However, we will do one more thing: we will add a **temperature**
parameter to manipulate the softmax outputs. We can set a
**higher temperature** to make the probability of each token
**more even** (more random), or a **lower temperature** to assign
more probability to the tokens with a higher logit (output).
A **higher temperature** means that we will get a more diverse sample,
with potentially more mistakes. A **lower temperature** means that we
may see repetitions of the same high probability sequence.

In [0]:
def sample_sequence(model, max_len=100, temperature=0.8):
    generated_sequence = ""
   
    inp = torch.Tensor([vocab_stoi["<BOS>"]]).long()
    hidden = None
    for p in range(max_len):
        output, hidden = model(inp.unsqueeze(0), hidden)
        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = int(torch.multinomial(output_dist, 1)[0])
        # Add predicted character to string and use as next input
        predicted_char = vocab_itos[top_i]
        
        if predicted_char == "<EOS>":
            break
        generated_sequence += predicted_char       
        inp = torch.Tensor([top_i]).long()
    return generated_sequence

print(sample_sequence(model, temperature=0.8))
print(sample_sequence(model, temperature=1.0))
print(sample_sequence(model, temperature=1.5))
print(sample_sequence(model, temperature=2.0))
print(sample_sequence(model, temperature=5.0))

God Bless the people of Venezuela!
God Bless the people of Venezuela!
os Bless the peopleeof Venezuela!
God Bdess Bpeooeop e ofzuenezeela!
auodlflBdptfh oe!on!!lhlpfeVeGhfpup!f


Since we only trained the model on a single sequence, we won't see
the effect of the temperature parameter yet. 

For now, the output of the calls to the `sample_sequence` function
assures us that our training code looks reasonable, and we can
proceed to training on our full dataset!

## Training the Trump Tweet Generator

For the actual training, let's use `torchtext` so that we can use
the `BucketIterator` to make batches. Like in Lab 5, we'll create a 
`torchtext.data.Field` to use `torchtext` to read the CSV file, and convert
characters into indices. The object has convient parameters to specify
the BOS and EOS tokens.

In [0]:
import torchtext

text_field = torchtext.data.Field(sequential=True,      # text sequence
                                  tokenize=lambda x: x, # because are building a character-RNN
                                  include_lengths=True, # to track the length of sequences, for batching
                                  batch_first=True,
                                  use_vocab=True,       # to turn each character into an integer index
                                  init_token="<BOS>",   # BOS token
                                  eos_token="<EOS>")    # EOS token

fields = [('text', text_field), ('created_at', None), ('id_str', None)]
trump_tweets = torchtext.data.TabularDataset(file_dir + "trump.csv", "csv", fields)
len(trump_tweets) # should be >20,000 like before

22402

In [0]:
text_field.build_vocab(trump_tweets)
vocab_stoi = text_field.vocab.stoi # so we don't have to rewrite sample_sequence
vocab_itos = text_field.vocab.itos # so we don't have to rewrite sample_sequence
vocab_size = len(text_field.vocab.itos)
vocab_size

253

Let's just verify that the `BucketIterator` works as expected, but start with batch_size of 1.

In [0]:
data_iter = torchtext.data.BucketIterator(trump_tweets, 
                                          batch_size=1,
                                          sort_key=lambda x: len(x.text),
                                          sort_within_batch=True)
for (tweet, lengths), label in data_iter:
    print(label)   # should be None
    print(lengths) # contains the length of the tweet(s) in batch
    print(tweet.shape) # should be [1, max(length)]
    break

None
tensor([82])
torch.Size([1, 82])


To account for batching, our actual training code will change, but just a little bit.
In fact, our training code from before will work with a batch size larger than one!

In [0]:
def train(model, data, batch_size=1, num_epochs=1, lr=0.001, print_every=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    it = 0
    
    data_iter = torchtext.data.BucketIterator(data,
                                              batch_size=batch_size,
                                              sort_key=lambda x: len(x.text),
                                              sort_within_batch=True)
    for e in range(num_epochs):
        # get training set
        avg_loss = 0
        for (tweet, lengths), label in data_iter:
            target = tweet[:, 1:]
            inp = tweet[:, :-1]
            # cleanup
            optimizer.zero_grad()
            # forward pass
            output, _ = model(inp)
            loss = criterion(output.reshape(-1, vocab_size), target.reshape(-1))
            # backward pass
            loss.backward()
            optimizer.step()

            avg_loss += loss
            it += 1 # increment iteration count
            if it % print_every == 0:
                print("[Iter %d] Loss %f" % (it+1, float(avg_loss/print_every)))
                print("    " + sample_sequence(model, 140, 0.8))
                avg_loss = 0

model = TextGenerator(vocab_size, 64)

In [0]:
train(model, trump_tweets, batch_size=1, num_epochs=1, lr=0.004, print_every=100)
print(sample_sequence(model, temperature=0.8))
print(sample_sequence(model, temperature=0.8))
print(sample_sequence(model, temperature=1.0))
print(sample_sequence(model, temperature=1.0))
print(sample_sequence(model, temperature=1.5))
print(sample_sequence(model, temperature=1.5))
print(sample_sequence(model, temperature=2.0))
print(sample_sequence(model, temperature=2.0))
print(sample_sequence(model, temperature=5.0))
print(sample_sequence(model, temperature=5.0))

[Iter 101] Loss 3.767316
    2😇ele!Bt d  rtduxhn car:n ndinnt ahertann  ociiwadearicaaNnl.nawd dyogtenrtmeytrr iErire tiaennRse n ean0nara0u    a  d tinmre reri  oed nhd
[Iter 201] Loss 3.389243
    Pesan dhhamon 1l/euito o s w Uve r of. hunarthisn giNilZvem p hpl k ahurerumsane ci!d Mrog#: 
[Iter 301] Loss 3.136456
    @ufe!2 mos @r four mp:s @fre neey cE ory jin llll ItTmp fDy1tp fe ICearlrutiany 1tpsulanin TRlgd Kurrimp @Dorampr . t ppasE1/t thonalouf 1ro
[Iter 401] Loss 2.921844
    ho den /f jome hit! its ig . 1sentor Iruth intimp As tho/Fk0coum in wo//t.Fouscowtco////ttptprupave OForit. t..
[Iter 501] Loss 2.827049
    fu! jores wit ho/m://t.chaGmorgpimpe Creds mutho seay @S Dui I Trand At deall het y inThigh yewl bisttyegtot yrusterunadeatop ct enallicles 
[Iter 601] Loss 2.690218
    'te woure de perttisellen nored indere fburs Greni are inder ind ind henar on I Arrarser//t.co/W..
[Iter 701] Loss 2.547648
    ruliscees ine so dabanditeristand agt seid fof the thestou thang sors

In [0]:
train(model, trump_tweets, batch_size=32, num_epochs=1, lr=0.004, print_every=100)
print(sample_sequence(model, temperature=0.8))
print(sample_sequence(model, temperature=1.0))
print(sample_sequence(model, temperature=1.5))
print(sample_sequence(model, temperature=2.0))
print(sample_sequence(model, temperature=5.0))

[Iter 101] Loss 1.867335
    @theben_woshen: @realDonaldTrump 3rmesion funned Donal. - toundines really many!
[Iter 201] Loss 1.748948
    Thank you In Clinton was to so best a Doperyonase in they #Chicauses https://t.co/SSSVTvFT3Q
[Iter 301] Loss 1.719149
    Rig greats for the pallly with his crowd will be interviewed toons entrowe syfired. https://t.co/QcKqA7Ccj
[Iter 401] Loss 1.710168
    @aseghigr: @realDonaldTrump @chery Buss tonight. Trump : https://t.co/GcJCYPzFKBE
[Iter 501] Loss 1.699322
    @CNOWS @ITROUST become is it Mann' lachise we the believe Als famity be good on Enjoys meewing #MAGAINLEATD!Thooka American ITID! https://t.
[Iter 601] Loss 1.697793
    My ploote exolure fum and the voting goinal it week the GOP Elie The Best whink is best belogais to like that in leading to the decration th
[Iter 701] Loss 1.684912
    ...... https://t.co/jHdSikkLgk
Unicates trump for The American and so the will be poority under brow in the U.S. seeves for preside
@cSa7ttort: I waws m

## Generative RNN using GPU
Training a generative RNN can be a slow process. Here's a sample GPU implementation to speed up the training. The changes required to enable GPU are provided in the comments below.

In [0]:
# Generative Recurrent Neural Network Implementation with GPU

def sample_sequence_cuda(model, max_len=100, temperature=0.8):
    generated_sequence = ""
   
    inp = torch.Tensor([vocab_stoi["<BOS>"]]).long().cuda()    # <----- GPU
    hidden = None
    for p in range(max_len):
        output, hidden = model(inp.unsqueeze(0), hidden)
        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp().cpu()
        top_i = int(torch.multinomial(output_dist, 1)[0])
        # Add predicted character to string and use as next input
        predicted_char = vocab_itos[top_i]
        
        if predicted_char == "<EOS>":
            break
        generated_sequence += predicted_char       
        inp = torch.Tensor([top_i]).long().cuda()    # <----- GPU
    return generated_sequence


def train_cuda(model, data, batch_size=1, num_epochs=1, lr=0.001, print_every=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    it = 0
    data_iter = torchtext.data.BucketIterator(data,
                                              batch_size=batch_size,
                                              sort_key=lambda x: len(x.text),
                                              sort_within_batch=True)
    for e in range(num_epochs):
        # get training set
        avg_loss = 0
        for (tweet, lengths), label in data_iter:
            target = tweet[:, 1:].cuda()              # <------- GPU
            inp = tweet[:, :-1].cuda()                # <------- GPU
            # cleanup
            optimizer.zero_grad()
            # forward pass
            output, _ = model(inp)
            loss = criterion(output.reshape(-1, vocab_size), target.reshape(-1))
            # backward pass
            loss.backward()
            optimizer.step()

            avg_loss += loss
            it += 1 # increment iteration count
            if it % print_every == 0:
                print("[Iter %d] Loss %f" % (it+1, float(avg_loss/print_every)))
                print("    " + sample_sequence_cuda(model, 140, 0.8))
                avg_loss = 0

model = TextGenerator(vocab_size, 64)
model = model.cuda()
model.ident = model.ident.cuda()
train_cuda(model, trump_tweets, batch_size=32, num_epochs=1, lr=0.004, print_every=100)

[Iter 101] Loss 3.655993
    -  ibt1sae cet nla.heoonruiwesGeainoupAi/ :e hhy ioalad o GttothhaTt raam oos 3g yrS tsoefos lrd uw n/ vafp @ nsi@:oyl nthderanb ofa  edknTf
[Iter 201] Loss 3.166978
    Viere amen re'me hood f/ weugloit rore mlece the d/Coth tuldog mh' the ntou rorA tho ta ingorar trsitintu r ve lre t. h tot o. de ghes N re 
[Iter 301] Loss 2.838572
    Iups: ho ta: lDang.
[Iter 401] Loss 2.633453
    @RESTrump Trump @ras @I3 Doums @reingice Thatp te a gingoor whow ime imard Greant Hire thynon W.. HEA lobighe ar Arefonte tot tind last. Awa
[Iter 501] Loss 2.492021
    @lalDosellon le Dopps: @reackestint @shot longede the the an th ma ake greegrevene land bo it thtt iitt Tous iu the tend ad - hattban oa Ca 
[Iter 601] Loss 2.387830
    “Ving Trump cheston. Hucs tolild teristincid doted wstpach!
[Iter 701] Loss 2.315289
    @Dongacas: @raticalDonalDonaldTrump sithster @Matadighta sha not wall #Manatarsa courd Wankent The will exthp mis and you hn thard


In [0]:
train_cuda(model, trump_tweets, batch_size=32, num_epochs=10, lr=0.004, print_every=500)

[Iter 501] Loss 2.134008
    .@realDonaldTrump jol to the deabar in on to bal Mainer Mainue leal surf the is a cauntion. Great hes you reillenge amorly. #MAGAIG EATICO
[Iter 1001] Loss 1.145037
    @Jos: @realDonaldTrump 17 you niald poldon goid surch Bake https://t.co/YITLJ4PMPEl
[Iter 1501] Loss 0.357768
    @pbavinelerming: @realDonaldTrump @senityBesJ: “Mr ove eour president soon What from has a geting Honefry and excan is be never in about as 
[Iter 2001] Loss 1.812317
    New Lasten heaw succesing a great seught fan is he bay be air nit!!!!
[Iter 2501] Loss 1.410036
    @keBulpy: @realDonaldTrump is not with the polll ureal other all tole will for 2016 It am world be have mactic to sig actice http://t.co/1i1
[Iter 3001] Loss 0.683035
    Tod.M. Thank you respose nother the gett be diding strated oppoding for 4-win! https://t.co/7HTXQHWd70
[Iter 3501] Loss 1.744348
    Thank you! https://t.co/hr1ABeiFdh
[Iter 4001] Loss 1.706780
    @Dolitanenn: @realDonaldTrump is not are in I lo

In [0]:
train_cuda(model, trump_tweets, batch_size=32, num_epochs=10, lr=0.0001, print_every=500)

[Iter 501] Loss 1.645454
    @oridike_Je: If you a love a boight were that is for In Arep Hear htt:T will but look talker obmont in tonatib a dan like on hard!
[Iter 1001] Loss 0.983718
    Viecific watch to job to surnmenting a to the nice no imary of afford faulical my my with in Northeria in he would stigns parting the Haduse
[Iter 1501] Loss 0.321627
    Thank you on country! They now on on NYC ObamaCare you is about and now. ......
[Iter 2001] Loss 1.645432
    Join muss will be very hove to nice from fantas News Flake he is tho us think the conting Obama and beders
[Iter 2501] Loss 1.306979
    @cotristaniart: @realDonaldTrump Winnerman hower on me failing his a getton it of the should need!
[Iter 3001] Loss 0.642361
    Hillary and bad. Press &amp; at Trump I bost the stoodse fight http://t.co/ORtRAWn6Q1
[Iter 3501] Loss 1.646678
    The prosed to will be nice on on my great water it up the would roong! in about the show and blle cours. #NANINANGEATHS Trump:) Atter a meet
[Iter 

In [0]:
train_cuda(model, trump_tweets, batch_size=32, num_epochs=10, lr=0.0001, print_every=500)

[Iter 501] Loss 1.647989
    @Hamermays: @realDonaldTrump @realDonaldTrump Great Combery aldown of all on think stor rating night announcest. #Fox
[Iter 1001] Loss 0.985187
    A win a Fring in Ted Sanders!
[Iter 1501] Loss 0.322097
    Hady http://t.co/fr8bLYzaEw
[Iter 2001] Loss 1.647827
    @SackEklystey: @realDonaldTrump I will be say and has why have and with know what long on campaign of Good proster and new Trump you've the 
[Iter 2501] Loss 1.308835
    Will be my me who longend in Trump disgerted you love you made cyoring support thas leader heblotion. #TheAhNA🇺🇸https://t.co/jLYx83qfaW
[Iter 3001] Loss 0.643265
    And dumey to melion is a dicked he country is nilled for the so cantefteing @realDonaldTrump. Amazing on the USA lourester new this with you
[Iter 3501] Loss 1.648983
    @bricetpouca: @realDonaldTrump @realDonaldTrump @MarnewsPence Great no. I will making in the meeting this is night just was beautiful Genera
[Iter 4001] Loss 1.628593
    Thank you! #urClongain' Ar

Let's generate some results using different levels of temperature.

In [0]:
for i in range(5):
  print(sample_sequence_cuda(model, 140, 0.2))

@therersan: @realDonaldTrump @realDonaldTrump @realDonaldTrump http://t.co/ONUnPizznd
@toleenenen: @realDonaldTrump @realDonaldTrump @realDonaldTrump #Trump2016 https://t.co/DCg4wUvzb
@Markerylera: @realDonaldTrump @realDonaldTrump http://t.co/lknovUEvvv
@marnacka: @realDonaldTrump @realDonaldTrump http://t.co/1tGDNfFsGT
@Danoter: @realDonaldTrump @realDonaldTrump @realDonaldTrump http://t.co/v5Y54uD5Ky


In [0]:
for i in range(5):
  print(sample_sequence_cuda(model, 140, 0.6))

@Jennaineloe: @realDonaldTrump I was it will be failed out of the USA 2016 now expection start success from am States to how the be a set by
@DannyCaralo47: @realDonaldTrump  Anned to me hore wait states are the so on the persing the fire!
I lithers is suppic @realDonaldTrump vote to was so will congrectiver and meeting of North See report and the great the going to this would 
@tibbadlat231: @realDonaldTrump @FoxNews @JoberBrand    I will be leaders to is that with our president in the Humpinines and the me in the 
@Millutermann: Donald Terer @realDonaldTrump @ReadDonandike Thanks Beages - State be office and beautiful done there this commentinuess!


In [0]:
for i in range(5):
  print(sample_sequence_cuda(model, 140, 0.8))

@boolayda38: @realDonaldTrump @foxandfriends @realDonaldTrump Condrey13 months even on sime be interviewing for intervidance otter the bette
A now that do the @realDonaldTrump campicant to glable.
It is no she is a press us the seed it and our so great char great the siends.
@greila_nYarzah  @realDonaldTrump You nor dearmery in numberument Speitl All Opan and be a bad time of no campicales (stading and Governg ME
@joxBir61: @realDonaldTrump http://t.co/SIp6lJyVs"


In [0]:
for i in range(5):
  print(sample_sequence_cuda(model, 140, 1))

@Trold14: @realDonaldTrump has was as awa to time ase'm going... in puppais! #TOUTBS PUNTLO NATIOR!
Need been Both @realDonaldTrump incration? President Ociamo frand.
Hillary Clinton's Donald This's the interentirs about about. Thank you!
Law! Thanks: Trump: Flowly up “Pass thanks.
IT Presidention firee 8! #HietAme  hlinane! https://t.co/EP0WHcE1Do


In [0]:
for i in range(5):
  print(sample_sequence_cuda(model, 140, 1.5))

Frawing nypifn NERQ betoons 3/27:!bone!! https://t.co/rOomgd15
Histon pollory-andwing Trump.
@SpahL_AlISGLAAG: @flhIrnezoOsS THU POILT HAl542. @rearTurpht But stoe repricauleth hagieled https://t.co/BxCCa6R65l
S Amgrinnelweria scot'vue he's great two wept Jeb! Your- Sen.#CarenitCBue.TPretect!
@TSdekok7726 Existamy folcttk:///Royizaly!
