# Make America Recurrent Again

![Tiny Handed Orange Man Doing Dumb Things](https://pixel.nymag.com/imgs/daily/selectall/2017/05/16/MockingTrumpSponge.nocrop.w710.h2147483647.jpg)

Long Short Term Memory (LSTM) networks are neural networks with their own internal memory state. They can process sequences of information, remembering what they have seen at previous time steps and using it to influence decisions later on in the sequence.

For example, we could feed in a sequence of weather data and use an LSTM to predict future weather patterns. We can also utilise LSTMs to learn sequences of text and, by feeding a model's own predictions back into itself, get it to generate text.

In this notebook, we'll generate Trump quotes from a dataset of past campaign speeches using Keras.

In [41]:
import numpy as np
import keras
from keras.layers import LSTM, Dense, Dropout
from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint
import random
import sys

## Our Dataset

Fortunately, someone has done the job of extracting all of the Trump campaign speeches into a single corpus of text for us. We'll download this text file from GitHub and save it locally. We'll then load the file into our `text` variable, changing all the text to lower case as we go...

In [42]:
path = keras.utils.get_file('trump.txt', origin='https://raw.githubusercontent.com/ryanmcdermott/trump-speeches/master/speeches.txt')

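# Read the corpus as UTF-8 and close the file once we're done with it
with open(path, encoding='utf-8') as f:
    text = f.read().lower()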

print('Corpus length:', len(text))

Corpus length: 896270


## Our Training Data

We'll feed small sentences into our model, and retrieve a prediction of the next character. We'll then add this to our sequence and repeat for as long as necessary in order to generate a speech.

We'll need to extract a training set of inputs and outputs for our model to learn from. We'll create two lists to hold these: `sentences` and `next_chars`. `max_len` will determine our sentence length, and `step` will determine how far we move along the text each time we pick a new sentence.

In [43]:
max_len = 50
step = 3

sentences = []
next_chars = []

## Extracting the Data

We'll move through the corpus `step` characters at a time, stopping `max_len` characters before the end so that we don't go past the end of our document. At each position, we'll select a range from the current character to the current character plus the max length. We'll add this selection to our sentences list, and add the character that comes straight after it (the next character) to our next chars list. We repeat this until we reach the end of the document.

In [44]:
for i in range(0, len(text) - max_len, step):
    sentences.append(text[i:i + max_len])
    next_chars.append(text[i + max_len])
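
To see what this loop produces at a small scale, here is a minimal sketch that runs the same windowing over a short string. The names `toy_text`, `toy_max_len` and `toy_step` are made up for illustration and are not part of the notebook.

```python
# Toy stand-ins for the notebook's `text`, `max_len` and `step` (illustrative only)
toy_text = "make america recurrent"
toy_max_len = 5
toy_step = 3

toy_sentences = []
toy_next_chars = []

# Same loop as above: slide a fixed-size window along the text
for i in range(0, len(toy_text) - toy_max_len, toy_step):
    toy_sentences.append(toy_text[i:i + toy_max_len])  # input window
    toy_next_chars.append(toy_text[i + toy_max_len])   # character to predict

for sentence, next_char in zip(toy_sentences, toy_next_chars):
    print(repr(sentence), '->', repr(next_char))
```

Each window overlaps the previous one by `toy_max_len - toy_step` characters, which is why neighbouring entries in `sentences` look so similar.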

In [45]:
sentences

["\ufeffspeech 1\n\n\n...thank you so much.  that's so nice.",
 "eech 1\n\n\n...thank you so much.  that's so nice.  i",
 "h 1\n\n\n...thank you so much.  that's so nice.  isn'",
 "\n\n\n...thank you so much.  that's so nice.  isn't h",
 "...thank you so much.  that's so nice.  isn't he a",
 "thank you so much.  that's so nice.  isn't he a gr",
 "nk you so much.  that's so nice.  isn't he a great",
 "you so much.  that's so nice.  isn't he a great gu",
 " so much.  that's so nice.  isn't he a great guy. ",
 " much.  that's so nice.  isn't he a great guy.  he",
 "ch.  that's so nice.  isn't he a great guy.  he do",
 "  that's so nice.  isn't he a great guy.  he doesn",
 "hat's so nice.  isn't he a great guy.  he doesn't ",
 "'s so nice.  isn't he a great guy.  he doesn't get",
 "so nice.  isn't he a great guy.  he doesn't get a ",
 "nice.  isn't he a great guy.  he doesn't get a fai",
 "e.  isn't he a great guy.  he doesn't get a fair p",
 " isn't he a great guy.  he doesn't get a fair 

In [46]:
next_chars

[' ',
 's',
 't',
 'e',
 ' ',
 'e',
 ' ',
 'y',
 ' ',
 ' ',
 'e',
 "'",
 'g',
 ' ',
 'f',
 'r',
 'r',
 's',
 'h',
 'd',
 's',
 't',
 'e',
 'i',
 ' ',
 't',
 ' ',
 's',
 'n',
 ' ',
 'i',
 ' ',
 'n',
 'i',
 'a',
 ' ',
 ' ',
 'l',
 'y',
 ' ',
 'm',
 'e',
 ',',
 'n',
 'v',
 'y',
 't',
 'n',
 'y',
 'e',
 ',',
 'e',
 'u',
 ' ',
 'h',
 'e',
 'r',
 't',
 'e',
 'e',
 ' ',
 'r',
 't',
 'e',
 'i',
 ' ',
 'd',
 'a',
 ' ',
 'e',
 ' ',
 's',
 'c',
 'l',
 'e',
 's',
 'f',
 ' ',
 't',
 'e',
 ' ',
 'i',
 'd',
 'd',
 'i',
 'a',
 ' ',
 'e',
 'b',
 'y',
 'a',
 ' ',
 'e',
 'n',
 'u',
 'r',
 'e',
 ' ',
 'r',
 'h',
 't',
 ' ',
 'r',
 '.',
 'a',
 'o',
 'a',
 'o',
 'h',
 'p',
 'p',
 ' ',
 ' ',
 'w',
 ' ',
 'h',
 ' ',
 'v',
 's',
 'e',
 'i',
 ' ',
 ' ',
 'm',
 'n',
 ' ',
 'r',
 'w',
 'k',
 'g',
 'e',
 'l',
 ' ',
 'h',
 ' ',
 'n',
 't',
 'w',
 'k',
 't',
 'y',
 'a',
 ' ',
 ' ',
 'k',
 't',
 ' ',
 'u',
 'r',
 'g',
 'a',
 ' ',
 ' ',
 'v',
 't',
 ' ',
 'o',
 'e',
 'f',
 'o',
 '.',
 's',
 't',
 't',
 ' ',
 'e',
 'a'

## Character Dictionary

As our neural network can only handle numerical input, we'll use our corpus to generate a unique list of characters first. Then we'll turn this into a dictionary, mapping each character to its index in the list. We'll eventually use this to look up each character and convert it into a number.

In [47]:
chars = sorted(set(text))  # Sorted list of unique characters in the text

print('Unique characters:', len(chars))
chars

Unique characters: 67


['\n',
 ' ',
 '!',
 '"',
 '$',
 '%',
 '&',
 "'",
 '(',
 ')',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '=',
 '?',
 '@',
 '[',
 ']',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 'é',
 '–',
 '—',
 '‘',
 '’',
 '“',
 '”',
 '…',
 '\ufeff']

In [48]:
char_indices = {char: index for index, char in enumerate(chars)}  # Map each character to its position in `chars`
char_indices

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 '$': 4,
 '%': 5,
 '&': 6,
 "'": 7,
 '(': 8,
 ')': 9,
 ',': 10,
 '-': 11,
 '.': 12,
 '/': 13,
 '0': 14,
 '1': 15,
 '2': 16,
 '3': 17,
 '4': 18,
 '5': 19,
 '6': 20,
 '7': 21,
 '8': 22,
 '9': 23,
 ':': 24,
 ';': 25,
 '=': 26,
 '?': 27,
 '@': 28,
 '[': 29,
 ']': 30,
 '_': 31,
 'a': 32,
 'b': 33,
 'c': 34,
 'd': 35,
 'e': 36,
 'f': 37,
 'g': 38,
 'h': 39,
 'i': 40,
 'j': 41,
 'k': 42,
 'l': 43,
 'm': 44,
 'n': 45,
 'o': 46,
 'p': 47,
 'q': 48,
 'r': 49,
 's': 50,
 't': 51,
 'u': 52,
 'v': 53,
 'w': 54,
 'x': 55,
 'y': 56,
 'z': 57,
 'é': 58,
 '–': 59,
 '—': 60,
 '‘': 61,
 '’': 62,
 '“': 63,
 '”': 64,
 '…': 65,
 '\ufeff': 66}

## One Hot Encoding

As our model can't handle text directly, we'll use one hot encoding to turn each character into a numerical vector that it can work with.

A one hot vector has one column per category. Every column is 0 except the column corresponding to the category we're encoding, which is set to 1.

For example, to encode the letter `C` from the alphabet `['A', 'B', 'C', 'D']`, we would put a `1` in the column corresponding to `C` and leave all the other values as `0`. We would end up with `[0, 0, 1, 0]`, our one hot encoding of `C`.
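
The four-letter example can be sketched in a few lines of NumPy. The `alphabet`, `index_of` and `one_hot` names are hypothetical helpers for this illustration only; the notebook builds its own encoding further down.

```python
import numpy as np

# Illustrative alphabet from the example above, not the notebook's character set
alphabet = ['A', 'B', 'C', 'D']
index_of = {letter: i for i, letter in enumerate(alphabet)}

def one_hot(letter):
    """Return a vector of zeros with a single 1 at the letter's index."""
    vector = np.zeros(len(alphabet), dtype=int)
    vector[index_of[letter]] = 1
    return vector

print(one_hot('C'))  # [0 0 1 0]
```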

We'll use the unique character list we created earlier: its length is the size of each one hot vector, and the position matching the character we're encoding is set to 1.

We'll one hot encode our sentences by first creating a multidimensional array of the size:

`(total number of sentences, length of each sentence, length of the total character set)`

The last dimension holds the one hot vector for each character.

In [49]:
x = np.zeros((len(sentences), max_len, len(chars)), dtype=bool)
x

array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, Fal

## One Hot Encoding our Next Characters

We'll repeat the same process for our labels (the things we want to predict).

In [50]:
y = np.zeros((len(sentences), len(chars)), dtype=bool)
y

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

## Add the 1s to our One Hot Vectors

As we've only created empty arrays so far, we'll now loop through our sentences and, for every character, look it up in `char_indices` and set the corresponding index to 1. For example, if our character is `e`, `char_indices` maps it to 36, so we set column 36 equal to 1.

In [51]:
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
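
A quick way to convince yourself the encoding is lossless is to decode it again. The sketch below does this on a toy sentence; `toy_chars`, `toy_indices` and `toy_sentence` are illustrative names, and the same `argmax` trick should work on a row of `x`, assuming it was filled in as above.

```python
import numpy as np

# Toy character set and lookup, built the same way as `chars` and `char_indices`
toy_chars = sorted(set("hello world"))
toy_indices = {char: index for index, char in enumerate(toy_chars)}

# One hot encode a single toy sentence: one row per character
toy_sentence = "hello"
encoded = np.zeros((len(toy_sentence), len(toy_chars)), dtype=bool)
for t, char in enumerate(toy_sentence):
    encoded[t, toy_indices[char]] = 1

# Decode: the position of the single True in each row is the character's index
decoded = ''.join(toy_chars[i] for i in encoded.argmax(axis=1))
print(decoded)
```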

In [52]:
x

array([[[False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False,  True, False, ..., False, False, False],
        [False,  True, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False,  True, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, Fal

## Defining our Model

Our model could be more complex, but it's better to start simple and add complexity later. Plus I think the simplicity of the model could be an advantage in generating Trump campaign speeches.

We'll initialise a sequential model, which just means we'll be stacking layers in a linear fashion. Then we'll add an LSTM layer, specifying the input shape of a single sentence training example. Finally, we'll add a fully connected layer with a softmax activation function. This gives us a probability distribution over the next character, with one entry for each character in the unique list we created earlier. From this we could just pick the most likely next character, but we'll cover introducing stochasticity into our sampling later...

In [34]:
model = Sequential()
model.add(LSTM(128, input_shape=(max_len, len(chars))))
model.add(Dense(len(chars), activation='softmax'))
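
As a rough sanity check on how small this model is, we can count its trainable parameters by hand. This is a sketch that assumes a standard LSTM layer with biases (four gates, each with input weights, recurrent weights and a bias) and the 67 unique characters reported earlier; `units` and `vocab_size` are names chosen for the illustration.

```python
units = 128       # LSTM units, as in the model above
vocab_size = 67   # number of unique characters reported earlier

# Each of the 4 LSTM gates has input weights, recurrent weights and a bias
lstm_params = 4 * (units * (vocab_size + units) + units)

# Dense layer: one weight per (unit, character) pair plus one bias per character
dense_params = units * vocab_size + vocab_size

print(lstm_params, dense_params, lstm_params + dense_params)
```

If those assumptions hold, the total should agree with the figure `model.summary()` reports.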

Compile the model. As we're predicting categories (a particular letter), we'll use categorical cross entropy for our loss and the RMSprop optimizer.

In [35]:
model.compile(
    loss='categorical_crossentropy',
    optimizer=RMSprop(learning_rate=0.01),
    metrics=['acc']
)
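
To get a feel for what categorical cross entropy measures, here is a minimal NumPy sketch for a single example. It is a simplified version written for illustration: the Keras loss also clips predictions for numerical stability and averages over the batch. The `confident` and `unsure` predictions are made-up values.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    """Cross entropy for one example: minus the log probability of the true class."""
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0., 0., 1., 0.])             # one hot label: the third class
confident = np.array([0.05, 0.05, 0.85, 0.05])  # most of the mass on the right class
unsure = np.array([0.25, 0.25, 0.25, 0.25])     # uniform guess

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

The loss only looks at the probability assigned to the correct character, so the confident prediction scores a much lower loss than the uniform guess.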

## Sampling the next character

When sampling the next character in our sequence, we can't just pick the most likely character every time. This results in a predictable, repetitive sequence with very little meaning. Likewise, if we picked from a distribution where every character is equally likely, we wouldn't get anything interesting either, just gibberish. We need a way to control the randomness (stochasticity) in our probability distribution of next characters.

We'll achieve this by introducing a softmax temperature so that we can dial in exactly how much randomness we want: lower temperatures sharpen the distribution towards the most likely characters, and higher temperatures flatten it. We'll then draw the next character at random from the reweighted distribution. The advantage is that if the letter E ends up with a 0.3 probability, instead of never being picked, it gets picked about 30% of the time. This allows less likely characters to become part of the sequence too and makes our results more flexible and interesting.

In [36]:
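def sample(preds, temperature=0.5):
    # Reweight the predicted distribution by the temperature, then draw one index from it
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature  # Lower temperature sharpens, higher flattens
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # Renormalise so the probabilities sum to 1
    probas = np.random.multinomial(1, preds, 1)  # One random draw, returned as a one hot row
    return np.argmax(probas)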
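
To see what the temperature actually does, the sketch below applies the same reweighting steps as `sample` to a made-up three-character distribution, without the random draw. `reweight` and `toy_preds` are illustrative names, not part of the notebook.

```python
import numpy as np

def reweight(preds, temperature):
    """Apply the same log / divide / exp / renormalise steps as `sample`."""
    preds = np.asarray(preds, dtype='float64')
    scaled = np.exp(np.log(preds) / temperature)
    return scaled / np.sum(scaled)

toy_preds = [0.5, 0.3, 0.2]  # made-up probabilities for three characters

for temperature in (0.2, 0.5, 1.0, 2.0):
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 push more probability onto the most likely character, and values above 1.0 move the distribution closer to uniform.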

## Training and Predictions

A full pass over our dataset is called an epoch. We'll train our model for 20 epochs. We could train for a lot longer, but this is enough to generate semi-coherent speeches (a substantial improvement from actual speeches).

For each epoch, we'll train our model on the data, then select a random sentence from our corpus to use as our seed. For 500 iterations, we'll feed the current sentence to our model, sample the next character, append it to the sentence and drop the first character so that the input stays `max_len` characters long. Once 500 characters have been generated, we move on to the next epoch.
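
The window-sliding part of that loop is easy to get wrong, so here is a stripped-down sketch of it on its own. It swaps the model for a hypothetical stand-in, `fake_next_char`, which just returns a random character, so it only demonstrates how the seed window moves along, not real text generation.

```python
import random

window_len = 10       # plays the role of `max_len`
seed = "we're goin"   # exactly `window_len` characters

def fake_next_char(window):
    # Hypothetical stand-in for `model.predict` followed by `sample`
    return random.choice("abc ")

window = seed   # what would be fed to the model at each step
output = seed   # everything generated so far, seed included

for _ in range(20):
    next_char = fake_next_char(window)
    output += next_char
    # Append the new character and drop the oldest, keeping the window length fixed
    window = (window + next_char)[1:]

print(output)
```

Keeping the window at a fixed length matters because the model's input shape was fixed to `max_len` characters when the LSTM layer was defined.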

In [37]:
# Create the checkpoint once so that it remembers the best loss across epochs
checkpoint = ModelCheckpoint(
    filepath="trumpnet.h5",
    monitor='loss',  # Only save when the training loss improves
    save_best_only=True
)

for epoch in range(20):
    print("\n")
    print("Epoch: ", epoch)

    model.fit(
        x,
        y,
        batch_size=128,
        epochs=1,
        callbacks=[checkpoint]
    )
    
    # Select a random sentence from our corpus
    
    start_index = random.randint(0, len(text) - max_len - 1)
    print("Start Index: ", start_index)
    
    generated_text = text[start_index:start_index + max_len]
    
    print('--- Generating with seed: "' + generated_text + '"')
    print("\n")
    
    sys.stdout.write(generated_text)
        
    for i in range(500):
        # One hot encode the characters generated so far
        sampled = np.zeros((1, max_len, len(chars)))
        
        for t, char in enumerate(generated_text):
            sampled[0, t, char_indices[char]] = 1.
                
        # Predict next char and add it to sequence 
        preds = model.predict(sampled, verbose=0)[0]
        next_index = sample(preds, 0.5)
        next_char = chars[next_index]
        generated_text += next_char
        generated_text = generated_text[1:]
            
        sys.stdout.write(next_char)



Epoch:  0
Epoch 1/1
Start Index:  669427
--- Generating with seed: "announces because he’s so predictable "we are leav"


announces because he’s so predictable "we are leaving but it’s happen. i said, "the for probably be trump people of the renting the person. they work on the pace. we have to stat and we’re going to happen. and a terrible and them a lot of people the kinting the gone in the wonderfal and i think you know, i love the people and what happens. it’s a money it. you know when i say you’ve got them on the there this bad the langiment must president. we all strent plan. and we said, "what’s a presend that i got thought, the menting and the country and 

Epoch:  1
Epoch 1/1
Start Index:  478885
--- Generating with seed: "ommercial, "we’ll be right back with our great cha"


ommercial, "we’ll be right back with our great change it. i see and i want to take a lot of the united states and there is there in thank it terries that thank you and the way i want to have been there.


ls him like in the money in the world and president and they can’t get it in the money. we have to