# Generating a Donald Trump Speech With A Recurrent Neural Network






This notebook follows the same steps as _"Generating A Shakespeare Text With A Recurrent Neural Network"_, so I won't go into detail here. The data used in this notebook can be found at https://www.kaggle.com/arnavsharmaas/all-donald-trump-transcripts. Please download it and save it in the same folder as this notebook.

## Preprocessing

In [54]:
import numpy as np
import tensorflow as tf

In [74]:
with open('trump_3.6.txt', 'rb') as f: #the context manager closes the file once it has been read
    text = f.read().decode(encoding='utf-8') #the text has more than 3 million characters



In [75]:
vocabulary = sorted(set(text)) #sorted so the character-to-index mapping is the same on every run; print it to have a look at the set of characters in the text

char2idx = {u:i for i,u in enumerate(vocabulary)}


In [76]:
idx2char = np.array(vocabulary)


In [77]:
text_as_int = np.array([char2idx[c] for c in text])
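The mapping above can be checked on a toy string. This is a minimal sketch in plain Python; `sample_text` is a made-up example rather than a line from the dataset, and sorting the vocabulary is an assumption made here so that the indices are reproducible.

```python
# Toy illustration of the character <-> integer mapping.
sample_text = "hello world"  # hypothetical sample, not taken from the dataset

vocabulary = sorted(set(sample_text))  # unique characters in a stable order
char2idx = {ch: i for i, ch in enumerate(vocabulary)}  # character -> integer
idx2char = vocabulary  # integer -> character (list indexing)

encoded = [char2idx[ch] for ch in sample_text]  # text -> list of integers
decoded = "".join(idx2char[i] for i in encoded)  # list of integers -> text

print(vocabulary)
print(encoded)
print(decoded == sample_text)  # the round trip should give back the original text
```

Encoding followed by decoding returns the original string, which is the property `generate_text` relies on later when it turns predicted indices back into characters.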

## Data preparation for RNN


In [78]:
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
seq_length = 100 #length of each input sequence, in characters
sequences = char_dataset.batch(seq_length+1, drop_remainder=True) #each chunk holds seq_length+1 characters so that input and target can be offset by one character

In [79]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target) #https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map
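To see what `split_input_target` does, the same slicing can be applied to a plain string. This is only an illustration: the chunk `"Hello"` and the tiny `seq_length` are made up, whereas in the notebook the chunks are integer tensors of length 101.

```python
def split_input_target(chunk):
    # Same slicing as in the notebook, applied here to a plain string
    input_text = chunk[:-1]   # everything except the last character
    target_text = chunk[1:]   # everything except the first character
    return input_text, target_text

seq_length = 4
chunk = "Hello"  # hypothetical chunk of seq_length + 1 characters
input_text, target_text = split_input_target(chunk)

# At each position the model sees one character and must predict the next one
for inp, tgt in zip(input_text, target_text):
    print(f"input {inp!r} -> target {tgt!r}")
```

Both halves have length `seq_length`, which is why each chunk is built with `seq_length+1` characters.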



In [80]:
batch_size = 64

buffer_size = 10000

dataset = dataset.shuffle(buffer_size).batch(batch_size,drop_remainder=True)

print(dataset)

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int32, tf.int32)>
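Because both `batch` calls use `drop_remainder=True`, the number of training steps per epoch follows from simple integer division. The sketch below works this out in plain Python; `example_chars` is a hypothetical round number, not the exact length of the dataset.

```python
def batches_per_epoch(num_chars, seq_length=100, batch_size=64):
    # First batch(): chunks of seq_length + 1 characters, incomplete chunk dropped
    num_sequences = num_chars // (seq_length + 1)
    # Second batch(): groups of batch_size sequences, incomplete group dropped
    return num_sequences // batch_size

example_chars = 3_000_000  # hypothetical text length, for illustration only
print(batches_per_epoch(example_chars))
```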


## Building the RNN model


In [81]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding,LSTM,Dense

def build_model(vocab_size, embedding_dim,rnn_units,batch_size):
    
    model = Sequential()
    
    #First layer
    model.add(Embedding(input_dim = vocab_size,
                        output_dim = embedding_dim,
                        batch_input_shape = [batch_size,None])) 
    
    #Second layer
    
    model.add(LSTM(rnn_units, #number of neurons in the layer
                   return_sequences = True, #we specify that we want to predict the character following each of the input characters,
                                            #not only of the last one
                   stateful = True, #if True, the last state for each sample at index i in a batch will be used as initial
                                    #state for the sample of index i in the following batch.
                   recurrent_initializer = 'glorot_uniform' #indicates how internal weight matrices must be initialized
                    )) 
    
    #Third layer
    
    model.add(Dense(vocab_size)) 
    
    return model
    

In [84]:
vocab_size = len(vocabulary)
embedding_dim = 64 #arbitrary
rnn_units = 1024 

model = build_model(
    vocab_size = vocab_size,
    embedding_dim = embedding_dim,
    rnn_units = rnn_units,
    batch_size = batch_size)
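To get a feel for the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below uses the standard parameter counts for an embedding, an LSTM with bias, and a dense layer; `example_vocab_size` is a hypothetical value, since the real one depends on the characters found in the text.

```python
def count_parameters(vocab_size, embedding_dim, rnn_units):
    # Embedding: one vector of size embedding_dim per character
    embedding = vocab_size * embedding_dim
    # LSTM: 4 gates, each with an input kernel, a recurrent kernel and a bias
    lstm = 4 * (embedding_dim * rnn_units + rnn_units * rnn_units + rnn_units)
    # Dense: one weight per (unit, character) pair plus one bias per character
    dense = rnn_units * vocab_size + vocab_size
    return embedding, lstm, dense

example_vocab_size = 100  # hypothetical, the real value is len(vocabulary)
embedding, lstm, dense = count_parameters(example_vocab_size, 64, 1024)
print(embedding, lstm, dense, embedding + lstm + dense)
```

Almost all of the parameters sit in the LSTM layer, so `rnn_units` is the setting that most affects model size and training time.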



## Model Training



In [85]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

def loss(labels,logits): #logits are the raw, unnormalized scores from the Dense layer (one per character in the vocabulary), not probabilities
    return sparse_categorical_crossentropy(labels, logits, from_logits=True)
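For intuition, the loss for a single prediction can be computed by hand: apply a softmax to the logits and take the negative log of the probability given to the correct character. The following is a plain-Python sketch of that formula, not the Keras implementation, and the logits are made-up numbers.

```python
import math

def sparse_cross_entropy_from_logits(label, logits):
    # log-sum-exp with the maximum subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label]) = log_sum_exp - logits[label]
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for a 3-character vocabulary
loss_likely = sparse_cross_entropy_from_logits(0, logits)    # correct class has the highest score
loss_unlikely = sparse_cross_entropy_from_logits(2, logits)  # correct class has the lowest score
print(loss_likely, loss_unlikely)
```

The loss is small when the correct character receives a high score and large when it receives a low one. With all logits equal, it reduces to the log of the vocabulary size.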

In [86]:

model.compile(optimizer='adam', loss=loss)

import os 

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath = checkpoint_prefix,
        save_weights_only=True)


epochs= 50

history = model.fit(dataset,epochs=epochs,callbacks=[checkpoint_callback])


Epoch 1/50
...
Epoch 50/50


## Text Generator



In [87]:
model = build_model(
    vocab_size = vocab_size,
    embedding_dim = embedding_dim,
    rnn_units = rnn_units,
    batch_size = 1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1,None]))

In [256]:
def generate_text(model,start_string,char2idx, idx2char,num_generate = 500):
    
    #we convert the initial input to numerical representation
    
    input_eval = [char2idx[s] for s in start_string]
    
    input_eval = tf.expand_dims(input_eval,0) #expanded to match batch format shape
    
    text_generated = [] #generated text will be stored here
    
    temperature = 0.4 #must be > 0. Values close to 0 make the model very conservative; values near 1 or above make it more creative (but riskier)
    
    model.reset_states() #resets the internal state of the stateful LSTM layer so generation starts from a clean state
    
    for i in range(num_generate): #loop to generate characters
        predictions = model(input_eval) #generate prediction
        predictions = tf.squeeze(predictions,0) #remove batch format
        predictions = predictions / temperature #added to affect probability of next character
        predicted_id = tf.random.categorical(predictions,num_samples=1)[-1,0].numpy() #next character is selected following
                                                                                      #categorical distribution
        
        input_eval = tf.expand_dims([predicted_id],0) #predicted character passed as next input
        
        text_generated.append(idx2char[predicted_id]) #character is added to text in character format
    
    return (start_string + ''.join(text_generated))
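The effect of `temperature` is easier to see on a small example. Dividing the logits by the temperature before sampling is equivalent to sampling from a softmax over the scaled logits. The sketch below computes that distribution in plain Python for a few temperatures, using made-up logits.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale the logits, then apply a numerically stable softmax
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate characters
for t in (0.4, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate the probability on the highest-scoring character, so the text is more repetitive but safer. Higher temperatures flatten the distribution and produce more varied, more error-prone text.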

In [264]:
print(generate_text(model,char2idx=char2idx, idx2char = idx2char, start_string=u"Welcome"))

Welcome.
Mr. President, I cannot describe it. This is the greatest economy in history and now we have to come together and we have the greatest economy in history, and we will be there and they show with all of the things that we did that we have to look at it. They don’t want to talk about it. They don’t like him. So I said, “We have to do this.” And they said, “Well, what do you think?” I said, “You know, the way you want to see a lot of money on the moon and the United States will be the first nati


## Additional Resources


https://medium.com/towards-artificial-intelligence/create-your-first-text-generator-with-lstm-in-few-minutes-3b59ee139ca0