# Project 5: Text Generation with Recurrent Neural Networks, LSTM, GRU, and Hyperas

In this notebook we're going to play with a new type of deep learning: Recurrent Neural Networks. The intended output is a neural network capable of generating text in the style of whatever it was trained on. These networks obviously don't have thoughts of their own, so they don't form coherent thoughts, but they are eerily good at talking in the right style. Imagine an elementary school student making fun of their teacher saying, "Look at me, I'm Mr. S, math is important, stay in school, biology, mitochondria are the powerhouse of the cell, blah blah blah," and you'll have a good idea of the type of things these networks might produce.

These networks are specifically designed to deal with time series data. To understand the need for this, think about how you might design a traditional NN to process text.

First you'd need an input tensor of a particular shape. How would you determine the shape? How would you deal with different length sentences? What if I wanted to process a book?

Our second issue is how the NN would process the data. There's nothing in a NN architecture that looks at the order of the inputs. Let's say you built a NN that's predicting home prices. Sure, your NN has some idea of how square footage relates to school rating regarding home price, but it looks at both of these things at the same time. The order doesn't matter. But sometimes order does matter, and in those cases we need an RNN.

So we have a new type of NN that deals with time series data that can produce nonsense sentences. Who cares? 

The longer text generation that we're going to do is more fun than useful. But if you were to use it to just guess the next word, you can imagine where this might be used in auto-complete. You can also do interesting things with these networks like generate music.

## Ok, so what's an RNN?

An RNN is just a neural network that feeds its output back into itself as an input. Imagine a for loop that just feeds the output of each iteration into the next cycle of the loop...that's really about it. Here's the common diagram you'll see for these networks:

![alt text](https://machinelearningblogcom.files.wordpress.com/2018/02/bildschirmfoto-2018-02-21-um-10-30-04.png?w=1400)

As the equal sign implies, those are just different ways of representing the same network. At every time step, the input (X) is fed into the network (A) and you get an output (h). That h is both the output for this time step and, alongside the next X, an input to the network at the next time step.

And just to reiterate: h is fed back into the network. I found the lack of a label on the arrow feeding into the next time step confusing when I was first reading about RNNs.
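
The for-loop picture can be sketched in a few lines of NumPy. This is only an illustration under assumptions: a single tanh layer, made-up weight names (`W_x`, `W_h`, `b`), and random untrained values. It is not how Keras implements its layers, but it shows h being carried from one time step into the next.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, time_steps = 4, 3, 5

# Hypothetical weights for illustration only (random, untrained)
W_x = rng.normal(size=(hidden_size, input_size))   # applied to the input x
W_h = rng.normal(size=(hidden_size, hidden_size))  # applied to the previous output h
b = np.zeros(hidden_size)

# One input vector per time step
xs = rng.normal(size=(time_steps, input_size))

h = np.zeros(hidden_size)  # nothing to remember before the first step
outputs = []
for x in xs:
    # The same weights are reused at every step; only h changes
    h = np.tanh(W_x @ x + W_h @ h + b)
    outputs.append(h)

outputs = np.array(outputs)
print(outputs.shape)  # (5, 3): one hidden vector per time step
```

Notice that the sequence length only controls how many times the loop runs, which is why the same network can handle inputs of different lengths.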

A Deep RNN is an RNN that has multiple NNs at every time step. Honestly, you aren't going to see the term "Deep RNN" often. Most people just say RNN. But it is a common type that's used. It's going to look more like this:

![alt text](https://www.oreilly.com/library/view/neural-networks-and/9781492037354/assets/mlst_1412.png)

## What's going on in the network

Before addressing the technical details, let's talk about another problem that needs solving in our network: memory. Think back to the Q-Learning notebook. Remember how our DQNN would forget how to play the early stage of a game after it got so good that it never faltered for thousands of iterations? Well, RNNs have a similar problem. If every time step is only determined by the last, then you quickly lose track of whatever came a few steps earlier.

Imagine your network starts with "The boy picked up the". What's the next word? Well, if your network doesn't remember anything past "the", then you're in trouble. You'll probably get a noun back, but there's a slim chance that noun makes sense as something the boy would pick up. You can see where this would quickly diverge into nonsense.

We're going to solve this with a Long Short Term Memory (LSTM) cell. 

The actual meat of the processing is happening in the cell (the A in our diagram). There are a few cell types, but we're just going to focus on the common LSTM cell for now. Let's start with a scary picture and break it down:

![alt text](https://i.ytimg.com/vi/kMLl-TKaEnc/maxresdefault.jpg)

**Breaking the LSTM down**:

The first thing you'll want to note is that the LSTM outputs 2 states to the next cell (the two arrows). Each cell takes 2 inputs from the previous time step. 

We're going to call the top output arrow 'L' for 'long-term'. This isn't what it's officially called (the usual name is the cell state), but it's not labeled in the diagram and it'll help us keep things straight as we walk through what's happening here. We're going to keep 'x' for the input for this time step and 'h' as the output of the current time step.

Here are the conceptual pieces to get us going:

- The yellow boxes are neural network layers. They're labeled with their activation function. The σ symbol (the O-shape) is for sigmoid. Also notice that there is a tanh activation that's a pink circle.
    - Sigmoid activation functions output a value between 0 and 1. They center on 0.5 and approach 0 for large negative inputs and 1 for large positive inputs.
    - Tanh activation functions are like sigmoid, but they range from -1 to 1 and center on 0.
    
![alt text](https://cdn-images-1.medium.com/max/800/1*f9erByySVjTjohfFdNkJYQ.jpeg)

- The pink circles with either a '+' or 'x' are either adding or multiplying vectors. The multipliers are called the 'gates'.
    - Imagine what happens when an input comes into the bottom left sigmoid activation. Remember, sigmoids can output values from 0 to 1. Let's say this sigmoid zeros some values. The input coming into that multiplication gate in the top left will have those values zeroed. That multiplier is acting as a gate for what values will be carried into the cell.
    
- If sigmoid activations are used to determine what's not important, tanh activations decide what to do with what is deemed important. Values that don't matter can still go to zero here, but the -1 to 1 range implies that we're using these to determine what we're doing with the information we're keeping.

- The last conceptual piece you'll want to focus on is that L (the top line) is meant for remembering past data. This is what solves the memory problem from earlier.
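
To make the activation ranges and the idea of a gate concrete, here is a small sketch. The numbers are made up for illustration; the only point is that multiplying by sigmoid outputs erases, keeps, or scales each value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))  # close to 0, exactly 0.5, close to 1
print(np.tanh(z))  # close to -1, exactly 0, close to 1

# A "gate" is an element-wise multiplication by sigmoid outputs
memory = np.array([2.0, -1.5, 0.7])            # made-up values being carried along
gate = sigmoid(np.array([-10.0, 10.0, 0.0]))   # made-up sigmoid layer outputs
print(memory * gate)  # first value nearly erased, second kept, third halved
```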

**The Gates**

Our gates are actually named after their functions.

1. The top left gate is called the *forget gate*. Just like our example from earlier, the output (h) from the last cell and the input for this time step (x) are run through a sigmoid activation and the output of that activation is multiplied by the L from the previous LSTM cell. At this point, all of your values in L are either kept the same (multiplied by 1), forgotten (multiplied by zero), or tweaked (multiplied by something in between 0 and 1). 

2. The gate in the middle of the cell diagram is the *input gate*. We run two activations before hitting this gate: the sigmoid and the tanh. Both activations are fed by both h and x. So, we process the current inputs with the tanh, then decide what doesn't need to be remembered in the long term (this is the actual input gate - the sigmoid plus the multiplication), and then add what we want to remember to L. This is the L that's passed to the next LSTM.

3. The last gate, the *output gate*, is also fed by both a tanh and a sigmoid. We take the output from the last addition operation (the updated L) passed through a tanh, and another sigmoid activation of h and x, and multiply them to create our next h. In this step we're essentially combining our long term memory (L) with our current state (x and h sigmoid) to determine our next state. This is the h that's passed to the next LSTM and is the output for this time step.

Remember that the gates are just the multiplication operations. What we're really training when we're training this network are the tanh and sigmoid layers that feed the gates. The sigmoid for the forget gate needs totally different weights than the sigmoid that feeds the output gate.
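
The three gates can be put together into a single time step. The sketch below follows the standard LSTM equations, using the notebook's naming (L for the long-term state). The parameter names (`W_f`, `W_i`, `W_c`, `W_o` and their biases) and the random values are assumptions for illustration, not what Keras stores internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, L_prev, params):
    """One LSTM time step. Returns the new output h and long-term state L."""
    hx = np.concatenate([h_prev, x])  # every layer sees both h and x

    forget = sigmoid(params["W_f"] @ hx + params["b_f"])     # what to keep of the old L
    keep = sigmoid(params["W_i"] @ hx + params["b_i"])       # which new values to store
    candidate = np.tanh(params["W_c"] @ hx + params["b_c"])  # the new values themselves
    expose = sigmoid(params["W_o"] @ hx + params["b_o"])     # what to output

    L = forget * L_prev + keep * candidate  # forget gate, input gate, then the addition
    h = expose * np.tanh(L)                 # output gate
    return h, L

hidden, inputs = 3, 4
rng = np.random.default_rng(0)

# Hypothetical, untrained weights: one separate set per sigmoid/tanh layer
params = {}
for name in ("f", "i", "c", "o"):
    params[f"W_{name}"] = rng.normal(size=(hidden, hidden + inputs))
    params[f"b_{name}"] = np.zeros(hidden)

h, L = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):
    h, L = lstm_step(x, h, L, params)

print(h.shape, L.shape)  # (3,) (3,)
```

Each of the four layers has its own weights, which is the point made above: the multiplications themselves have nothing to train.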

## Working with text

### Tokenization 

Like any input into a neural network, we prefer our data to be one hot encoded or at least mapped to numbers in some way.

We can tokenize our data at a few levels. We have characters, words, and n-grams. Assuming the one hot method, characters and words just create a one hot encoding at that particular level. If we just have the English alphabet for characters, your vector is going to have one 1 and 25 0s. If our corpus of text has 20000 words, a word based tokenization will have one 1 and 19999 0s. 
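
As a quick sketch of character-level one hot encoding, assuming only the 26 lowercase English letters (the helper name `one_hot_char` is made up for this example):

```python
import string

alphabet = string.ascii_lowercase  # 26 letters

def one_hot_char(ch):
    # All zeros except a single 1 at the letter's position
    vector = [0] * len(alphabet)
    vector[alphabet.index(ch)] = 1
    return vector

encoded = [one_hot_char(ch) for ch in "boy"]
print(encoded[0])       # a 1 at index 1 ('b'), 0 everywhere else
print(len(encoded[0]))  # 26
print(sum(encoded[0]))  # 1
```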

N-grams, often used in a bag-of-words style model, are a little different. This isn't exactly going to give state of the art results, but it's common enough that you should know it. N-grams are overlapping chunks of sentences. If our entire dataset was just "The boy ran", we would break it down into "The", "The boy", "boy", "boy ran", and "ran". Obviously this has some drawbacks. Treated as a bag, it doesn't even preserve order beyond what's inside each chunk.
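
The breakdown above can be reproduced with a short helper. The function name `ngrams` and the `max_n` parameter are made up for this sketch; it simply collects every chunk of 1 to `max_n` consecutive words.

```python
def ngrams(text, max_n=2):
    words = text.split()
    grams = []
    for start in range(len(words)):
        for n in range(1, max_n + 1):
            # Only keep chunks that fit inside the sentence
            if start + n <= len(words):
                grams.append(" ".join(words[start:start + n]))
    return grams

print(ngrams("The boy ran"))
# ['The', 'The boy', 'boy', 'boy ran', 'ran']
```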

### Embeddings

Let's say we tokenize our corpus at the word level. We can train a neural network layer whose whole purpose is to just help interpret the meaning of the words that we're feeding in. This is the embedding layer.

Embedding layers take the tokenized vector that you feed in, let's say your 1/19999 vector, and instead turn it into a much smaller vector of real numbers (the values aren't restricted to any particular range). So your 20000 size vector might come out looking something like `[0.4, 0.38, 0.86, 0.34]`.
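
One way to picture this: multiplying a one hot vector by a weight matrix just picks out one row of that matrix, so an embedding layer can be thought of as a lookup table. The sizes and random weights below are placeholders for illustration; in a real model the weights are learned.

```python
import numpy as np

vocab_size, embedding_dim = 6, 4
rng = np.random.default_rng(0)

# Hypothetical embedding weights: one row per word in the vocabulary
embedding_weights = rng.normal(size=(vocab_size, embedding_dim))

word_index = 2
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1

via_matmul = one_hot @ embedding_weights     # the "neural network layer" view
via_lookup = embedding_weights[word_index]   # the "lookup table" view

print(np.allclose(via_matmul, via_lookup))  # True
```

This is why Keras lets us feed word indexes straight into the `Embedding` layer instead of building 20000-wide one hot inputs.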

The neat part about these vectors is that they can essentially represent meanings mathematically. The classic example is that you can take the embedding vector for "king", subtract the embedding vector for "man", and the result would land near a word like "royalty". You could then add the embedding for "woman", and the result will be close to the vector for "queen".
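
Here is a toy version of that arithmetic. The two-dimensional vectors are hand-made so that the first number means "royalty" and the second means "gender"; real embeddings are learned, have many more dimensions, and only approximately behave this way.

```python
import numpy as np

# Hand-made 2-D "embeddings": [royalty, gender]. Purely illustrative.
toy = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def cosine(a, b):
    # Cosine similarity: 1 means same direction, -1 means opposite
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = toy["king"] - toy["man"] + toy["woman"]
closest = max(toy, key=lambda word: cosine(toy[word], target))
print(closest)  # queen
```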

You can see where having an embedding layer in front of your RNN would be useful.

### You ready to code yet?

Time to get to the project part. As stated earlier, we'll be generating text. For our dataset I've decided to use President Trump's tweets.

I'd first like to say that I'm genuinely choosing Trump because I think he's both an objectively good and fun option. I don't mean this as political support or disapproval.

President Trump seems like an interesting subject for a few reasons. First, these types of RNNs aren't interesting when outputting a large amount of data. They tend to be better suited to smaller outputs. Trump has a massive corpus of short writings (tweets) to train on that are around the length we want for our outputs. Second, he has a distinctive style. I think it'll be obvious to anyone seeing the output of our network that we're trying to generate Trump-like speech (assuming this works...).

[I'll be pulling our database of tweets from here](http://www.trumptwitterarchive.com/archive)

In [1]:
import pandas as pd

# Loading and getting a basic idea of the shape of our data.
# You can download this dataset with more information and columns,
#    but I kept it simple for us.
tweet_data = pd.read_csv('trump_tweets.csv', encoding='utf-8')
print(f'Column Headings: {list(tweet_data.columns.values)}')
print(f'Shape: {tweet_data.shape}')

Column Headings: ['text']
Shape: (23985, 1)


In [2]:
# Playing around with accessing the data.
# It's structured like a list of dicts, but you have to use 'iloc'
#    to look up an index in a pandas dataframe.
print(f'Oldest Tweet: {tweet_data.iloc[23984].text}')
print('============================================================================')
print(f'Newest Tweet: {tweet_data.iloc[0].text}')

Oldest Tweet: Be sure to tune in and watch Donald Trump on Late Night with David Letterman as he presents the Top Ten List tonight!
Newest Tweet: Marist/NPR/PBS Poll shows President Trump’s approval rating among Latinos going to 50% an increase in one year of 19%. Thank you working hard!


In [3]:
# I want to get the average words per tweet so I know about what length to make my outputs.
lengths = []
for index, tweet in tweet_data.iterrows():
    lengths.append(len(str(tweet['text']).split(' ')))

total_words = sum(lengths)
average_words = total_words / len(lengths)
print(f'Average Words Per Tweet: {average_words}')

Average Words Per Tweet: 19.73183239524703


In [4]:
import re
from tensorflow.keras.preprocessing.text import Tokenizer

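# Cleaning links out of the tweets.
# Note: '.*' is greedy, so the first pattern removes everything from 'http'
#    through the last whitespace on that line, not just the link itself.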
def clean_tweets(tweets):
    cleaned = []
    for tweet in tweets:
        tweet = re.sub(r'http.*\s', '', tweet)
        tweet = re.sub(r'http.*$', '', tweet)
        tweet = re.sub(r'http', '', tweet)
        cleaned.append(tweet)
    return cleaned

# Removing the structure of the dataframe and making a simple list of tweets.
# This looked less silly when there was more than a single column in the CSV, though it's still useful.
entire_corpus = []
for index, tweet in tweet_data.iterrows():
    entire_corpus.append(str(tweet['text']))

entire_corpus = clean_tweets(entire_corpus)
    
# Here we tell Keras how we want our tokenization to work
tokenizer = Tokenizer(filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\r\n', # Filter out punctuation
                      lower=True, # Make everything lower-case
                      split=' ', # Distinguish between words by spacing
                      char_level=False) # Tokenize words, not individual characters

# Apply the tokenizer
tokenizer.fit_on_texts(entire_corpus)

# Create our actual sequences of numbers
tokenized = tokenizer.texts_to_sequences(entire_corpus)

# tokenizer.word_index is a dict saying what index applies to each word
number_of_words = len(list(tokenizer.word_index)) + 1

print(f'Sample of word index: {list(tokenizer.word_index.items())[:29]}')
print(f'Number of word indexes: {number_of_words}')
print(f'Sample of word sequences: {tokenized[0]}')
print(f'Number of word sequences: {len(tokenized)}')

Sample of word index: [('the', 1), ('to', 2), ('and', 3), ('a', 4), ('of', 5), ('is', 6), ('in', 7), ('for', 8), ('on', 9), ('i', 10), ('you', 11), ('will', 12), ('be', 13), ('great', 14), ('that', 15), ('are', 16), ('with', 17), ('it', 18), ('at', 19), ('our', 20), ('amp', 21), ('we', 22), ('have', 23), ('my', 24), ('he', 25), ('not', 26), ('trump', 27), ('by', 28), ('was', 29)]
Number of word indexes: 24621
Sample of word sequences: [8789, 11955, 6243, 197, 595, 69, 1457, 1000, 1458, 1489, 7250, 85, 2, 800, 63, 850, 7, 76, 155, 5, 2330, 36, 11, 225, 151]
Number of word sequences: 23985


In [5]:
# Save our index to word mapping
import json
with open('trump_word_dict_tokenized.json', 'w') as file:
    output = json.dumps(tokenizer.word_index, indent=4)
    file.write(output)

In [6]:
# tokenizer.index_word is the reverse of the earlier mapping
reverse_index_of_word = tokenizer.index_word

# Playing around with decoding numbers into sentences
reversing_tokens = ' '.join(reverse_index_of_word[word] for word in tokenized[100][:20])

print(reversing_tokens)

23 of federal inmates are illegal immigrants border arrests are up 240 in the great state of texas between 2011


In [7]:
# Saving our reverse mapping
with open('trump_word_dict_reverse.json', 'w') as file:
    output = json.dumps(tokenizer.index_word, indent=4)
    file.write(output)

In [8]:
import numpy as np

features = []
labels = []

# We're going to predict the next word using the last 3
training_length = 3

# Here's where we make our features and labels.
for sequence in tokenized:
    # Create multiple training examples from each sequence
    for index in range(training_length, len(sequence)):
        # Extract the features and label
        extract = sequence[index - training_length:index + 1]
        # Set the features and label
        features.append(extract[:-1])
        labels.append(extract[-1])


# Turn our features into a numpy array
features = np.array(features)

# Our output is going to be a onehot, so here we're creating an array of all 0's
label_placeholder = np.zeros((len(features), number_of_words), dtype = np.int8)
# ...then changing the 0 at the correct index into a 1
for example_index, word_index in enumerate(labels):
    label_placeholder[example_index, word_index] = 1

labels = label_placeholder

print(features.shape)
print(labels.shape)

(392494, 3)
(392494, 24621)
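
To see what the windowing loop in the cell above produces, here is the same logic run on a tiny made-up sequence. The function name `make_windows` is only for this sketch.

```python
def make_windows(sequence, training_length=3):
    features, labels = [], []
    for index in range(training_length, len(sequence)):
        # training_length words of context plus the word that follows them
        window = sequence[index - training_length:index + 1]
        features.append(window[:-1])
        labels.append(window[-1])
    return features, labels

toy_features, toy_labels = make_windows([10, 20, 30, 40, 50])
print(toy_features)  # [[10, 20, 30], [20, 30, 40]]
print(toy_labels)    # [40, 50]
```

A sequence of n words yields n - 3 examples, and any tweet with 3 or fewer tokens contributes none.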


In [9]:
print(f'Word at index 10000: {reverse_index_of_word[np.argmax(labels[10000])]}')

Word at index 10000: me


In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

# Making our model
model = Sequential()

# My output dimensions here are arbitrary
# The other args are determined by the input shape.
model.add(Embedding(input_dim=number_of_words,
              input_length = training_length,
              output_dim=16))

# Again, 64 is arbitrary
# Setting 'return_sequences=True' would let us stack more LSTM layers on top;
#    we only use one LSTM layer here, so it stays False.
model.add(LSTM(64, return_sequences=False))

model.add(Dense(32, activation='relu'))

model.add(Dense(number_of_words, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [11]:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

x_train = features[:375000]
y_train = labels[:375000]
x_test = features[375000:]
y_test = labels[375000:]

callbacks = [
    # Stop if validation loss hasn't improved for 3 epochs
    EarlyStopping(monitor='val_loss', patience=3),
    # Save the model with the best performance after every epoch
    ModelCheckpoint('model1.h5', save_best_only=True, save_weights_only=False)
]

model.fit(x_train, y_train, 
            batch_size=4096, epochs=500,
            validation_data=(x_test, y_test),
            callbacks=callbacks)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 375000 samples, validate on 17494 samples
Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500


<tensorflow.python.keras.callbacks.History at 0x192e51f8da0>

In [12]:
from tensorflow.keras.models import load_model
import json

# This block just allows me to start from the middle of the notebook, if needed.

with open('trump_word_dict_reverse.json', 'r') as file:
    reverse_lookup = json.loads(file.read())
    
with open('trump_word_dict_tokenized.json', 'r') as file:
    tokenized = json.loads(file.read())
    
model = load_model('model1.h5')

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


In [13]:
import numpy as np

# Here we're just testing that prediction works.

# Create an input sequence of words with all 0's
input_words = np.zeros((1, training_length), dtype = np.int8)

# Predict the next word
output = model.predict(input_words)

print(reverse_lookup[str(np.argmax(output))])

the


In [14]:
import random

output_words = []

input_words = [[]]

# Create an input array of training_length (3) random words
for x in range(training_length):
    input_words[0].append(random.randint(0, number_of_words - 1))
    
input_words = np.asarray(input_words)

# These networks output highly repetitive data.
# This function reweights the prediction scores so we don't always get the
#    most highly predicted word.
# A temperature of 1 samples from the model's predictions unchanged, higher values
#    flatten them toward random, and values near 0 almost always give the most likely word.
# Note that 0 won't actually work here because we are dividing by the temperature.
def reweight_word(preds, temperature=0.5):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Flatten to 1-D instead of hard-coding the vocabulary size
    preds = preds.reshape(-1)
    probas = np.random.multinomial(1, preds, 1)[0]
    return np.argmax(probas)

# Predict 20 words
for i in range(20):
    word_oh = model.predict(input_words)
    weighted_index = reweight_word(word_oh)
    # Translate our number to a word
    # (greedy alternative without the function: reverse_lookup[str(np.argmax(word_oh))])
    word = reverse_lookup[str(weighted_index)]
    # Save the word to our output
    output_words.append(word)

    # Create our new input for the next iteration
    new_input_placeholder = [[]]
    for j in range(training_length):
        index = j + 1
        # Reuse words 2 and 3 (as words 1 and 2)
        if j < 2:
            new_input_placeholder[0].append(input_words[0][index])
        # Make word 3 our newly predicted word
        else:
            new_input_placeholder[0].append(weighted_index)

    input_words = np.asarray(new_input_placeholder)
    
output_tweet = ' '.join(output_words)

print(output_tweet)

a y for to the a the and a a and the obama and is the at the in the


## Results

Not great. Our tweet is (I've run this a lot, so what's above is different) "at at 17 00 p m at the u s is a great job of the u s is a". You can see where it's going. It smells of a tweet from President Trump. But it's clearly not great. 

I played around with the temperature a bit. The network likes outputting "is a great job" all the way up to 1. At 1 it outputs total gibberish.
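
To get a feel for what the temperature does to the predictions, here is a standalone sketch of the same reweighting math on a made-up three-word distribution. The helper name `apply_temperature` and the probabilities are assumptions for illustration only.

```python
import math

def apply_temperature(probs, temperature):
    # Same math as reweight_word: log, divide by temperature, re-normalize
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.6, 0.3, 0.1]  # made-up predictions for a 3-word vocabulary
for t in (0.2, 0.5, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

At a temperature of 1 the distribution comes back unchanged, so we are sampling straight from the model's predictions. Lower temperatures pile more and more probability onto the top word, which fits the network getting stuck on its favorite phrases.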

The biggest problem I have with the network as-is is that there is no variety. It's really stuck on the whole 'great job' thing. If it output a variety of tweets, all of similar caliber to the one listed above, I'd be less disappointed.

# Using Pre-trained embeddings

For our 2nd attempt we're going to use pre-trained embedding layers. I think this will help in a few ways:

1. We'll get embeddings that have a much, much larger vocabulary and a better understanding of the relation between all of the words.
2. We'll have an actual dictionary that doesn't include things like partial URLs. I think these very sparsely represented, meaningless URLs and emoji encodings threw things off. 
3. I think our embedding layer may not have had enough data in both volume and variety to learn on. We don't have that problem with pre-trained embeddings.

We're going to use GloVe (Global Vectors for Word Representation) embeddings, which were developed by a group out of Stanford in 2014. We're going to use a small GloVe pre-trained embedding that has 400K words.

I'm downloading the 'glove.6B.zip' embedding from [here](https://nlp.stanford.edu/projects/glove/). This package comes with multiple output embedding sizes: 50, 100, 200, and 300. We'll be using 100.

This embedding will simply pass all zeros if it doesn't know a word.

We're going to have to reload and re-work our data.

In [15]:
from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np
import pandas as pd

# Nothing new in this cell. Just reloading the data.

tweet_data = pd.read_csv('trump_tweets.csv')

entire_corpus = []
for index, tweet in tweet_data.iterrows():
    entire_corpus.append(str(tweet['text']))
    
entire_corpus = clean_tweets(entire_corpus)
    
tokenizer = Tokenizer(filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~',
                      lower=True,
                      split=' ',
                      char_level=False)

tokenizer.fit_on_texts(entire_corpus)

word_index = tokenizer.word_index
reverse_index_word = tokenizer.index_word
number_of_words = len(word_index) + 1
word_counts = tokenizer.word_counts

tokenized = tokenizer.texts_to_sequences(entire_corpus)

features = []
labels = []

training_length = 3

for sequence in tokenized:
    for index in range(training_length, len(sequence)):
        extract = sequence[index - training_length:index + 1]
        features.append(extract[:-1])
        labels.append(extract[-1])
    
features = np.array(features)

label_placeholder = np.zeros((len(features), number_of_words), dtype = np.int8)

for example_index, word_idx in enumerate(labels):
    label_placeholder[example_index, word_idx] = 1
    
labels = label_placeholder

In [16]:
# This is where we actually load the vectors as a numpy array.
glove_vectors = 'glove.6B/glove.6B.100d.txt'
glove = np.loadtxt(glove_vectors, dtype='str', comments=None, encoding='utf8')
# Expecting 400K words, with 100 output dimensions.
print(glove.shape)

(400000, 101)


In [17]:
vectors = glove[:, 1:].astype('float')
words = glove[:, 0]

# Clear the large embedding object from memory
del glove

In [18]:
# Associating words to their embeddings
word_lookup = {word: vector for word, vector in zip(words, vectors)}

embedding_matrix = np.zeros((number_of_words, vectors.shape[1]))

not_found = 0

# Note that word_index is from our Trump tweets
# This loop is counting how many words are in our tweets,
#    but not our embeddings.
for index, word in enumerate(word_index.keys()):
    vector = word_lookup.get(word, None)
    if vector is not None:
        embedding_matrix[index + 1, :] = vector
    else:
        not_found += 1
        
print(f'Words not found in embeddings: {not_found}')

Words not found in embeddings: 9454
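
The same matrix-building idea on toy data, as a sketch. It iterates over `(word, index)` pairs directly, which avoids relying on the dictionary's order lining up with the index numbers. The names `toy_word_index` and `toy_glove` and their contents are made up for this example.

```python
import numpy as np

# Toy stand-ins: a Keras-style word index (starts at 1) and a tiny "GloVe" lookup
toy_word_index = {"the": 1, "great": 2, "covfefe": 3}
toy_glove = {"the": np.array([0.1, 0.2]), "great": np.array([0.3, 0.4])}

# Row 0 is left as zeros because the tokenizer never assigns index 0 to a word
toy_matrix = np.zeros((len(toy_word_index) + 1, 2))

missing = []
for word, index in toy_word_index.items():
    vector = toy_glove.get(word)
    if vector is not None:
        toy_matrix[index] = vector
    else:
        # Unknown words keep their all-zero row
        missing.append(word)

print(missing)  # ['covfefe']
```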


In [19]:
# More memory cleaning
import gc
gc.enable()
del vectors
gc.collect()

0

In [20]:
# Normalize and convert nan (not a number) to 0
embedding_matrix = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1).reshape((-1, 1))
embedding_matrix = np.nan_to_num(embedding_matrix)
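
The division above produces nan for every all-zero row (words without a pre-trained vector), which is why `nan_to_num` follows it. A possible alternative, sketched here on a made-up matrix, is to skip the division for those rows in the first place using the `out` and `where` arguments of `np.divide`.

```python
import numpy as np

toy_embeddings = np.array([
    [3.0, 4.0],
    [0.0, 0.0],  # a word with no pre-trained vector
    [1.0, 0.0],
])

norms = np.linalg.norm(toy_embeddings, axis=1, keepdims=True)

# Divide only where the norm is non-zero; all-zero rows stay all-zero
normalized = np.divide(toy_embeddings, norms,
                       out=np.zeros_like(toy_embeddings),
                       where=norms != 0)
print(normalized)
```

The result is the same as normalizing and then replacing nan with 0, just without the divide-by-zero warning.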

  


In [21]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding, Masking
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

x_train = features[:375000]
y_train = labels[:375000]
x_test = features[375000:]
y_test = labels[375000:]

model = Sequential()

# Notice that we're expecting 100 output dimensions
# We are setting our weights to our embedding_matrix
# We are making it so that our model.fit doesn't try to tune this layer
model.add(Embedding(input_dim=number_of_words,
                    input_length = training_length,
                    output_dim=100,
                    weights=[embedding_matrix],
                    trainable=False,
                    mask_zero=True
                   ))

# Any timesteps that are all zeros will be left out.
model.add(Masking(mask_value=0.0))

model.add(LSTM(64, return_sequences=False))

model.add(Dense(32, activation='relu'))

model.add(Dense(number_of_words, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=3),
    ModelCheckpoint('model2.h5', save_best_only=True, save_weights_only=False)
]

model.fit(x_train, y_train, 
            batch_size=4096, epochs=500,
            validation_data=(x_test, y_test),
            callbacks=callbacks)

Train on 375000 samples, validate on 17554 samples
Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500
Epoch 20/500
Epoch 21/500
Epoch 22/500
Epoch 23/500
Epoch 24/500
Epoch 25/500
Epoch 26/500
Epoch 27/500
Epoch 28/500
Epoch 29/500
Epoch 30/500
Epoch 31/500
Epoch 32/500
Epoch 33/500
Epoch 34/500
Epoch 35/500
Epoch 36/500
Epoch 37/500
Epoch 38/500
Epoch 39/500
Epoch 40/500
Epoch 41/500
Epoch 42/500
Epoch 43/500
Epoch 44/500
Epoch 45/500
Epoch 46/500
Epoch 47/500
Epoch 48/500
Epoch 49/500
Epoch 50/500
Epoch 51/500
Epoch 52/500
Epoch 53/500


<tensorflow.python.keras.callbacks.History at 0x199ae1fb0f0>

In [22]:
from tensorflow.keras.models import load_model
import json

# Nothing new here

with open('trump_word_dict_reverse.json', 'r') as file:
    reverse_lookup = json.loads(file.read())
    
with open('trump_word_dict_tokenized.json', 'r') as file:
    tokenized = json.loads(file.read())
    
model = load_model('model2.h5')

In [69]:
import random

output_words = []

input_words = [[]]

# Nothing new here

for x in range(training_length):
    input_words[0].append(random.randint(0,number_of_words - 1))
    
input_words = np.asarray(input_words)

def reweight_word(preds, temperature=0.85):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    preds = preds.reshape(-1)
    probas = np.random.multinomial(1, preds, 1)[0]
    return np.argmax(probas)

for i in range(20):
    word_oh = model.predict(input_words)
    weighted_index = reweight_word(word_oh)
    # Use the sampled index so the printed word matches what is fed back in
    word = reverse_lookup[str(weighted_index)]
    output_words.append(word)

    new_input_placeholder = [[]]
    for i in range(training_length):
        index = i + 1
        if i < 2:
            new_input_placeholder[0].append(input_words[0][index])
        else:
            new_input_placeholder[0].append(weighted_index)

    input_words = np.asarray(new_input_placeholder)

output_tweet = ' '.join(output_words)

print(output_tweet)

and trump maga in a a the people have the years the people tower of the is the will are


## Results
of the great state of the united states is a total joke and the best of the great state of



# Hyperas