# Project 5: Text Generation with Recurrent Neural Networks, LSTM, and Hyperas

## Hyperas

Hyperas is used for automated machine learning tuning in keras. It's based on the hyperopt library, with a focus on simplification and focus on keras.

The concepts here are going to be pretty simple. The main differences you're going to see between this and our previous notebooks are:

1. We have to use actual data creation and model creation functions.
    - The data function ensures that we only have to load our data once. We have to return the feautres and labels in a particualr order.
    - The model function defines our model and the hyperparameter tunings that we want to try.
2. We'll plug the data and model functions into a hyperas function that loads the data and tunes the model.

There aren't any new machine learning concpets in this notebook, but this tool will be invaluable for finding the best model for any future project.

The only real Hyperas notes I have are:

- tpe - This is the optimization algorithm we'll be using. You can use any algorithm that hyperopt supports. TPE is Tree-structured Parzen Estimator, it's more than just a random search, but most importantly: It's what the docs use.
- Trials - this is a hyperopt trials object that has to be passed to hyperas.

### Imports

In [26]:
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding, Masking, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
import json
import re
import numpy as np
import pandas as pd
import gc

### Data

This function is a little nasty. Hyperas can handle nested function calls, but it doesn't love it. I've opted for one large, flat function.

- I originally had the embeddings loading with the model, but they were having to reload on every itereation. These take around a minute to load, so the entire process is faster with embeddings loaded as data.
- There are sveral layers of abstraction between our code and hyperopt. Hyperas was being annoying about passing variables between functions. Hence, the need to 'global' some variables.

In [27]:
def data():
    print(f'data')
    global training_length
    training_length = 3
    tweet_data = pd.read_csv('trump_tweets.csv')

    entire_corpus = []
    for index, tweet in tweet_data.iterrows():
        entire_corpus.append(str(tweet['text']))
    
    cleaned = []
    for tweet in entire_corpus:
        tweet = re.sub(r'http.*\s', '', tweet)
        tweet = re.sub(r'http.*$', '', tweet)
        tweet = re.sub(r'http', '', tweet)
        cleaned.append(tweet)
        
    entire_corpus = cleaned
        
    tokenizer = Tokenizer(filters=str('!"$%&()*+,-./:;<=>?@[\]^_`{|}~\r\n'),
                          lower=True,
                          split=' ',
                          char_level=False)
    
    tokenizer.fit_on_texts(entire_corpus)
    
    word_index = tokenizer.word_index
    reverse_index_word = tokenizer.index_word
    global number_of_words
    number_of_words = len(word_index) + 1
    word_counts = tokenizer.word_counts
    
    tokenized = tokenizer.texts_to_sequences(entire_corpus)
    
    features = []
    labels = []
    
    for sequence in tokenized:
        for index in range(training_length, len(sequence)):
            extract = sequence[index - training_length:index + 1]
            features.append(extract[:-1])
            labels.append(extract[-1])
    
    features = np.array(features)
    
    label_placeholder = np.zeros((len(features), number_of_words), dtype = np.int8)
    
    for example_index, word_idx in enumerate(labels):
        label_placeholder[example_index, word_idx] = 1
    
    labels = label_placeholder
    
    train_percent = int(round(float(features.shape[0]) * 0.9))
    
    x_train = features[:train_percent]
    y_train = labels[:train_percent]
    x_test = features[train_percent:]
    y_test = labels[train_percent:]
    
    with open('trump_word_dict_tokenized2.json', 'w') as file:
        output = json.dumps(tokenizer.word_index, indent=4)
        file.write(output)
    
    with open('trump_word_dict_reverse2.json', 'w') as file:
        output = json.dumps(tokenizer.index_word, indent=4)
        file.write(output)
        
    print(f'embeddings')
    global pretrained_embeddings
    glove_vectors = 'glove.6B/glove.6B.100d.txt'
    glove = np.loadtxt(glove_vectors, dtype='str', comments=None, encoding='utf8')
    vectors = glove[:, 1:].astype('float')
    words = glove[:, 0]
    del glove
    word_lookup = {word: vector for word, vector in zip(words, vectors)}
    pretrained_embeddings = np.zeros((number_of_words, vectors.shape[1]))
    with open('trump_word_dict_tokenized.json', 'r') as file:
        word_index = json.loads(file.read())
    for index, word in enumerate(word_index.keys()):
        vector = word_lookup.get(word, None)
        if vector is not None:
            pretrained_embeddings[index + 1, :] = vector
    gc.enable()
    del vectors
    gc.collect()
    pretrained_embeddings = pretrained_embeddings / np.linalg.norm(pretrained_embeddings, axis=1).reshape((-1, 1))
    pretrained_embeddings = np.nan_to_num(pretrained_embeddings)
    print(f'embeddings complete')

    return x_train, y_train, x_test, y_test

### Model

We're going to talk syntax and resoning here, rather than cluttering up the code.

- **{{}}** - This is the basic syntax for telling hyperas "here's some params I want you to tune".
- **{{choice([1, 2])}}** - Syntax for passing hyperas a list of options to chosse from. 1 or 2 in this case.
- **{{uniform(0, 1)}}** - Hyperas will decide on it's own, along a uniform distribution of the values you pass.
- **if {{choice(['one', 'two'])}} == 'two'** - Pointing out that we don't have to only use hyperas directly on hyperparameters.
    - At 2 points I use this to try out entire layers, at both the LSTM and Dense level. Notice that the Dense if adds a Dense and a dropout.
    - At 1 point I use this to decide on a learning rate and learning rate decay. This is because I don't want hyperas trying the lowest LRs with the highest LR decay.
- **lstm_size = {{choice([64, 128])}}** - Using hyperas to set a variable to plug in later.
    - Honestly, I didn't really want to do this. I ran into an OOM error at really high trial levels and did this to help reduce the search space.
- **[EarlyStopping(monitor='val_acc', patience=2)]** - We need this to increase the search speed. Hyperas is evaluating accuracy, so I aslo used accuracy here. Accuracy seems to bounce a little less than loss and we have a lot of iterations to try here, so I used 2 patience.
- **'loss': -validation_acc**, - You may notice that at the end of the function we're passing a negative accuracy. This is because hyperas will always try to lower the value you pass to it. We actually want our value to go up, so we're passing its negative for hyperas to assess.

In [3]:
def model(x_train, y_train, x_test, y_test):

    model = Sequential()

    model.add(Embedding(input_dim=number_of_words,
                        input_length = training_length,
                        output_dim=100,
                        weights=[pretrained_embeddings],
                        trainable={{choice([False, True])}},
                        mask_zero=True
                       ))
    
    model.add(Masking(mask_value=0.0))
    
    lstm_size = {{choice([64, 128, 256])}}
    
    if {{choice(['one_lstm', 'two_lstm'])}} == 'two_lstm':
         model.add(LSTM(lstm_size, return_sequences=True))

    model.add(LSTM(lstm_size, return_sequences=False))
    
    model.add(Dense({{choice([32, 64, 256])}}, activation='relu'))
    
    model.add(Dropout({{uniform(0, 1)}}))
    
    if {{choice(['one_dense', 'two_dense'])}} == 'two_dense':
        model.add(Dense({{choice([32, 64])}}, activation='relu'))

        model.add(Dropout({{uniform(0, 1)}}))
    
    model.add(Dense(number_of_words, activation='softmax'))
    
    if {{choice(['norm_lr', 'high_lr'])}} == 'high_lr':
        optimizer = Adam(lr={{choice([0.1, 0.2])}},
                     decay={{choice([0.001, 0.01])}})
    else:
        optimizer = Adam(lr={{choice([0.001, 0.01])}},
                     decay={{choice([0.0, 0.0001])}})
    
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    
    callbacks = [EarlyStopping(monitor='val_acc', patience=2)]
    
    
    result = model.fit(x_train, y_train, 
                       batch_size=4096,
                       epochs=50,
                       verbose=1,
                       validation_data=(x_test, y_test),
                       callbacks=callbacks)
    
    #get the highest validation accuracy of the training epochs
    validation_acc = np.amax(result.history['val_acc'])
    
    print('Best validation acc of epoch:', validation_acc)
    return {'loss': -validation_acc, 'status': STATUS_OK, 'model': model}

### Training

- I originally started with 100 evals but woke up to an OOM that happened around 57. I did some tuning to reduce the search space and cut the evals to 50.

In [4]:
X_train, Y_train, X_test, Y_test = data()

best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=50,
                                      trials=Trials(),
                                      notebook_name='text-gen-lstm-hyperas-2')

print("Evalutation of best performing model:")
print(best_model.evaluate(X_test, Y_test))
print("Best performing model chosen hyper-parameters:")
print(best_run)

data
embeddings




embeddings complete
>>> Imports:
#coding=utf-8

try:
    from hyperopt import Trials, STATUS_OK, tpe
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choice, uniform
except:
    pass

try:
    from tensorflow.keras.preprocessing.text import Tokenizer
except:
    pass

try:
    from tensorflow.keras.models import Sequential
except:
    pass

try:
    from tensorflow.keras.layers import LSTM, Dense, Embedding, Masking, Dropout
except:
    pass

try:
    from tensorflow.keras.optimizers import Adam
except:
    pass

try:
    from tensorflow.keras.callbacks import EarlyStopping
except:
    pass

try:
    import json
except:
    pass

try:
    import re
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    import pandas as pd
except:
    pass

try:
    import gc
except:
    pass

>>> Hyperas search space:

def get_space():
    return {
        'trainable': hp.choice('trainable', [False, True]),
        'lstm

  pretrained_embeddings = pretrained_embeddings / np.linalg.norm(pretrained_embeddings, axis=1).reshape((-1, 1))


embeddings complete
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Best validation acc of epoch: 0.04245451306125378
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Best validation acc of epoch: 0.09805820290025638


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Best validation acc of epoch: 0.05101676771485516
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.017863513596863227
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Best validation acc of epoch: 0.0207175984967

Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Best validation acc of epoch: 0.09059171276010873
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Best validation acc of epoch: 0.10371540667485658
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.042454513031065656
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate o

Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Best validation acc of epoch: 0.12453493698812468
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Best validation acc of epoch: 0.12805157733688988
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Best validation acc of epoch: 0.12685388123264288
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Best validation acc of epoch: 0.1291218593129567
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/5

Epoch 12/50
Best validation acc of epoch: 0.12853575256170832
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Best validation acc of epoch: 0.12466235115959352
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Best validation acc of epoch: 0.09614698556875284
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Best validation acc of epoch: 0.12772029924754805
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epo

Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Best validation acc of epoch: 0.09861882670724266
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Best validation acc of epoch: 0.12797512836360547
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Best validation acc of epoch: 0.046939503678527124
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50


Epoch 20/50
Epoch 21/50
Best validation acc of epoch: 0.10641659424107627
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Best validation acc of epoch: 0.1159981653549202
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Best validation acc of epoch: 0.10391927039451321
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50


Epoch 25/50
Epoch 26/50
Best validation acc of epoch: 0.107486875946779
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Best validation acc of epoch: 0.05285153670569582
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Best validation acc of epoch: 0.04242903013009221
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Best validation acc of epoch: 0.07295754526407977
Train on 353176 samples, validate on 39242 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Best validation acc of epoch: 0.042429030130092

In [6]:
best_model.save('model3.h5')

In [7]:
import random
import numpy as np
from tensorflow.keras.models import load_model
import json

def reweight_word(preds, word_dict_len, temperature):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    preds = preds.reshape(word_dict_len)
    probas = np.random.multinomial(1, preds, 1)[0]
    return np.argmax(probas)

def create_text(model_path, lookup_path, training_length, num_output_words=20, temperature=0.5):
    output_words = []
    input_words = [[]]
    
    model = load_model(model_path)
    
    with open(lookup_path, 'r') as file:
        reverse_lookup = json.loads(file.read())
        
    word_dict_len = len(reverse_lookup) + 1

    for x in range(training_length):
        input_words[0].append(random.randint(0,word_dict_len - 1))
        
    input_words = np.asarray(input_words)

    for i in range(num_output_words):
        word_oh = model.predict(input_words)
        weighted_index = reweight_word(word_oh, word_dict_len, temperature)
        word = reverse_lookup[str(np.argmax(word_oh))]
        output_words.append(word)
    
        new_input_placeholder = [[]]
        for i in range(training_length):
            index = i + 1
            if i < 2:
                new_input_placeholder[0].append(input_words[0][index])
            else:
                new_input_placeholder[0].append(weighted_index)
    
        input_words = np.asarray(new_input_placeholder)
    
    output_tweet = ' '.join(output_words)
    
    return output_tweet

In [8]:
for temp in [0.1, 0.25, 0.5, 0.75, 0.95]:
    print('====================')
    print(f'Temperature: {temp}')
    print('====================')
    for i in range(3):
        tweet = create_text('model3.h5', 'trump_word_dict_reverse2.json', 3,
                        num_output_words=20, temperature=temp)
        print(f'{i + 1}: {tweet}')

Temperature: 0.1


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


1: to the election s is a great guy to be the u and the great of the of the fbi
2: is be a great governor of the great state of ohio and a great disaster for the u s is
3: on record high crime and many more years of the u s is a great guy who is a total
Temperature: 0.25
1: is a total disaster for the u s is the fbi is a best thing the people is a great
2: to the new york times is a total guy who is a a plateau it's a beginning to be a
3: year killing in the u s is a great guy to be in the u s is a great guy
Temperature: 0.5
1: of the great state of ohio is will be interviewed by foxandfriends tonight seanhannity trump discussing the tweets erictrump is
2: to be in the u of a total in the history and doj and the are a great to make
3: the is a a plateau it's a beginning to is have been a total meltdown on the solution york times
Temperature: 0.75
1: of the trump to be been been a to the of the money penalty for the great and smart and


  


2: in the u of on the country and are be let a better than the u of the country and
3: i the the world is be a tax cuts and in the into our country and great honor to the
Temperature: 0.95
1: of great job people in the white of the they to will #trump2016 #makeamericagreatagain omaha missusa of staff the the
2: is restaurant exclusively amp shirts are be to america great again rally in can seen the on the to for
3: in god not of the the the we the u will a happy the news is the palestinian of the
