# Disclaimer
The following notebook is borrowed from [deepschool.io](https://github.com/sachinruk/deepschool.io/blob/master/Lesson%2016%20-%20LSTM%20Trump%20Tweets%20-%20Solutions.ipynb). I have refactored the code a bit, added some print statements to clarify what things look like, and modified the bit that saves the model, to separate the architecture from the weights.
I then pretrained several models that you can load to see the effect of different sizes of the LSTM used.

# LSTM (Long Short Term Memory)

One branch of Deep Learning is dedicated to processing sequences such as time series. These deep nets are **Recurrent Neural Nets (RNNs)**. LSTMs are one of the most widely used types of RNN; Gated Recurrent Units (GRUs) are the other popular type.

This is an illustration from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (A highly recommended read)

![RNNs](./images/RNN-unrolled.png)
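
The unrolled picture above can be sketched in a few lines of NumPy. This is a minimal vanilla RNN (not yet an LSTM), with toy sizes and random weights that are assumptions made only for illustration; the names `rnn_forward`, `W_x`, `W_h` and `b` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_h.shape[0])          # initial hidden state
    states = []
    for x_t in xs:                      # one iteration per time step
        # the same weights are reused at every step; only h carries information forward
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_steps, n_in, n_hidden = 5, 3, 4       # toy sizes, chosen only for illustration
xs = rng.normal(size=(n_steps, n_in))
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

hidden_states = rnn_forward(xs, W_x, W_h, b)
print(hidden_states.shape)              # one hidden vector per time step
```

Each box in the illustration corresponds to one pass through the loop body.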

Pros:
- Really powerful pattern recognition system for time series

Cons:
- Cannot deal with missing time steps.
- Time steps must be discretised and not continuous.

![trump](./images/trump.jpg)

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import re

from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization, LSTM, Embedding, TimeDistributed
from keras.models import model_from_json  # load_model is not imported: a helper with that name is defined further down

import pickle

In [None]:
df = pd.read_csv('data/trump.csv') # might need to change location if on Floydhub
df = df[df.is_retweet=='false']
df.text = df.text.str.lower()
df.text = df.text.str.replace(r'http[\w:/\.]+','', regex=True) # remove urls (regex=True is required in pandas >= 2.0)
df.text = df.text.str.replace(r'[^!\'"#$%&\()*+,-./:;<=>?@_’`{|}~\w\s]',' ', regex=True) # remove everything but characters and punctuation
df.text = df.text.str.replace(r'\s\s+',' ', regex=True) # replace multiple white spaces with a single one
df = df[[len(t)<180 for t in df.text.values]]
df = df[[len(t)>50 for t in df.text.values]]
df.head()

In [None]:
df.shape

In [None]:
trump_tweets = [text for text in df.text.values[::-1]]
trump_tweets[:5]

Create a dictionary to convert letters to numbers and vice versa.

In [None]:
all_tweets = ''.join(trump_tweets)
char2int = dict(zip(set(all_tweets), range(len(set(all_tweets)))))
char2int['<END>'] = len(char2int)
char2int['<GO>'] = len(char2int)
char2int['<PAD>'] = len(char2int)
int2char = dict(zip(char2int.values(), char2int.keys()))

# Print the dictionaries extracted from the data
print(char2int)
print(int2char)

In [None]:
# Encode the text representation of the tweets to be used with neural networks
text_num = [[char2int['<GO>']]+[char2int[c] for c in tweet]+ [char2int['<END>']] for tweet in trump_tweets]

# Print an example to see what is going on
print('<GO>' + trump_tweets[0] +'<END>')
print(text_num[0])

In [None]:
plt.hist([len(t) for t in trump_tweets],50)
plt.show()

In [None]:
len_vocab = len(char2int)
sentence_len = 40

num_examples = 0
for tweet in text_num:
    num_examples += len(tweet)-sentence_len

x = np.zeros((num_examples, sentence_len))
y = np.zeros((num_examples, sentence_len))

k = 0
for tweet in text_num:
    for i in range(len(tweet)-sentence_len):
        x[k,:] = np.array(tweet[i:i+sentence_len])
        y[k,:] = np.array(tweet[i+1:i+sentence_len+1])
        k += 1
        
y = y.reshape(y.shape+(1,))

In [None]:
y.shape
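
To see what the windowing loop above produces, here is a minimal sketch on a toy sequence. The helper `make_windows` and the toy encoding are hypothetical; the point is only that each row of `y` is the matching row of `x` shifted one character to the right, so the network learns to predict the next character at every position.

```python
import numpy as np

def make_windows(encoded, window):
    """Return (x, y) where each row of y is the matching row of x shifted by one."""
    n = len(encoded) - window
    x = np.array([encoded[i:i + window] for i in range(n)])
    y = np.array([encoded[i + 1:i + window + 1] for i in range(n)])
    return x, y

toy_tweet = [9, 0, 1, 2, 3, 8]   # hypothetical encoding: 9 = <GO>, 8 = <END>
toy_x, toy_y = make_windows(toy_tweet, window=3)
print(toy_x)
print(toy_y)
```

A sequence of length 6 with a window of 3 gives 3 training examples, which matches the `len(tweet)-sentence_len` count used above.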

## Many to Many LSTM

In [None]:
model = Sequential()
model.add(Embedding(len_vocab, 512)) # , batch_size=batch_size
model.add(LSTM(512, return_sequences=True)) # , stateful=True
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
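
The parameter counts reported by `model.summary()` can be reproduced by hand. The sketch below assumes the layer sizes used above (512 for both the embedding and the LSTM) and a made-up vocabulary size of 70, since the real `len_vocab` depends on the data; `count_parameters` is a hypothetical helper.

```python
def count_parameters(vocab_size, embed_dim=512, lstm_units=512):
    """Parameter counts for an Embedding -> LSTM -> Dense stack like the one above."""
    # one embed_dim-sized vector per vocabulary entry
    embedding = vocab_size * embed_dim
    # 4 gates, each with input weights, recurrent weights and a bias
    lstm = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
    # weights plus bias, shared across all time steps by TimeDistributed
    dense = lstm_units * vocab_size + vocab_size
    return embedding, lstm, dense

emb_params, lstm_params, dense_params = count_parameters(vocab_size=70)  # 70 is an assumed value
print(emb_params, lstm_params, dense_params)
```

Almost all of the parameters sit in the LSTM layer, and that count does not depend on the vocabulary size.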

Pay special attention to how the probabilities are taken in the prediction function below. `p` has shape `(1, sequence_len, len(char2int))`, where `len(char2int)` is the number of available characters. The leading 1 is the batch dimension: we pass a single sequence to `model.predict`. We are only concerned with the prediction probabilities at the last position of the sequence, because all earlier letters have already been appended. Hence we sample the next letter from the distribution `p[0][-1]`.
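
The indexing can be tried out without a trained model. In this sketch a random array stands in for the output of `model.predict`; the sizes and the softmax used to build it are assumptions made only so that the shapes match the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_len, vocab_size = 6, 4                 # toy sizes

# stand-in for model.predict(...): random scores turned into probabilities with a softmax
scores = rng.normal(size=(1, sequence_len, vocab_size))
p = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

next_char_probs = p[0][-1]                      # distribution that follows the last character
next_char = rng.choice(vocab_size, p=next_char_probs)
print(p.shape, next_char_probs.shape, next_char)
```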

Why did we keep appending to the sequence and predicting on the whole thing, rather than simply feeding in the last letter? Because the model is not stateful, every call to `model.predict` starts from a fresh hidden state and cell memory, so feeding only the last letter would lose the information carried over from the previous letters. Keep in mind that each LSTM unit has 3 inputs: the input x, the hidden state, and the cell memory.

Also note that the cell memory is not passed on to the Dense layer; only the hidden state is.
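
A single LSTM step can be written out in NumPy to make the three inputs and two outputs explicit. This is a sketch of the standard LSTM equations, not Keras's internal implementation; the gate ordering inside `W`, `U` and `b`, the toy sizes and the random weights are all assumptions chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the gates as [input, forget, candidate, output]."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate values
    o = sigmoid(z[3 * n:4 * n])    # output gate
    c = f * c_prev + i * g         # new cell memory, stays inside the LSTM
    h = o * np.tanh(c)             # new hidden state, the only thing later layers see
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4              # toy sizes
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

With `return_sequences=True`, it is the sequence of `h` values that is handed to the Dense layer, one per time step.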

In [None]:
np.random.choice(5,20,p=[0.9, 0.1, 0, 0, 0])

In [None]:
# Function to predict tweets
# encoded_sentence: integer array of encoded tweet
# model: the model used to predict the probabilities of each character
# encode_dict: dictionary to convert from char to integers
# decode_dict: dictionary to convert from integers to chars
# returns a string with the predicted tweet, predicting one char at a time
def predict_tweet(encoded_sentence, model,encode_dict,decode_dict):
    sentence = [decode_dict[l] for l in encoded_sentence]
    for i in range(150):
        if sentence[-1]=='<END>':
            break
        p = model.predict(np.array(encoded_sentence)[None,:])
        encoded_sentence.append(np.random.choice(len(encode_dict),1,p=p[0][-1])[0])
        sentence.append(decode_dict[encoded_sentence[-1]])
    return ''.join(sentence)

In [None]:
# Train the model over 10 epochs, and print a prediction for each epoch
# If you just want a quick result, just train it one epoch by setting n_epochs to 1
n_epochs = 10
for i in range(n_epochs+1):
    print('<'*100)
    print( 'Start training epoch %s' % (i))
    tweet = [char2int['<GO>']] # start from the <GO> token
    print(predict_tweet(tweet,model,char2int,int2char)) # print a random tweet predicted by model so far
    print('>'*100)
    if i!=n_epochs:
        model.fit(x,y, batch_size=1024, epochs=1)

## Saving the model

There are actually three things that need to be saved when saving RNN models in keras.
1. The model architecture. 
2. The model weights. It's a good idea to separate the architecture from the weights, if you decide to retrain your model afterwards.
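3. The associated dictionary that maps characters to the integers fed to the embedding. It is built from a `set`, whose iteration order for strings can differ from one Python run to the next, so a rebuilt dictionary may not match the one the weights were trained with.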
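
As an aside to point 3, the mapping can be made reproducible by sorting the characters before numbering them. This is an alternative sketch, not what this notebook does; `build_vocab` is a hypothetical helper, and saving the dictionary is still the safer option because the mapping also depends on the exact data used.

```python
def build_vocab(texts, specials=('<END>', '<GO>', '<PAD>')):
    """Build char <-> int dictionaries with an order that does not depend on set iteration."""
    chars = sorted(set(''.join(texts)))              # sorting fixes the order
    char2int = {c: i for i, c in enumerate(chars)}
    for token in specials:                           # special tokens get the last indices
        char2int[token] = len(char2int)
    int2char = {i: c for c, i in char2int.items()}
    return char2int, int2char

toy_char2int, toy_int2char = build_vocab(['make', 'america'])
print(toy_char2int)
```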

In [None]:
# 1. Convert model architecture to JSON format
architecture = model.to_json()

# save architecture to JSON file
with open('./architecture.trump.json', 'wt') as json_file:
    json_file.write(architecture)

# 2. Save weights to hdf5 file
model.save_weights('./weights.trump.h5')

# 3. Save the dictionary for the character embeddings
with open('./tweets.pickle', 'wb') as f:
    pickle.dump((char2int, int2char), f)

To load the model run the following:

In [None]:
# Load text Dict
with open('./tweets.pickle', 'rb') as f:
    char2int, int2char = pickle.load(f)

# load architecture from JSON File
with open('./architecture.trump.json', 'rt') as json_file:
    architecture = json_file.read()
# create model from architecture
model2 = model_from_json(architecture)
# load weights from hdf5 file
model2.load_weights('./weights.trump.h5')


In [None]:
# Predict 10 tweets with your trained model
for j in range(10):
    tweet = [char2int['<GO>']] # start from the <GO> token
    print(predict_tweet(tweet,model2,char2int,int2char))
    print('='*100)

In [None]:
# Complete some tweets

tweet = [char2int[letter] for letter in "white supremacists are "]
print(predict_tweet(tweet,model2,char2int,int2char) + "\n")

tweet = [char2int[letter] for letter in "obama is "]
print(predict_tweet(tweet,model2,char2int,int2char)+ "\n")

tweet = [char2int[letter] for letter in "i resign"]
print(predict_tweet(tweet,model2,char2int,int2char))

# Using the pretrained models
In the following, we load the models I have pretrained for you. I used the same architecture and only varied the size of the Embedding network. The sizes, number of epochs trained, and resulting losses are:

- 64, epochs: 40, loss: 1.5484
- 128, epochs: 30, loss: 1.39
- 256, epochs: 20, loss: 1.1716
- 512, epochs: 10, loss: 0.9269

We load each model, and make some predictions. Look at how the predictions evolve.
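
One way to read the losses listed above is to convert them to a per-character perplexity. The sketch assumes the reported numbers are mean per-character cross-entropies in nats, which is what `sparse_categorical_crossentropy` reports; under that assumption, `exp(loss)` is roughly the number of characters the model is still hesitating between at each step.

```python
import math

# losses as listed above, keyed by model size
reported_losses = {64: 1.5484, 128: 1.39, 256: 1.1716, 512: 0.9269}

# perplexity = exp(cross-entropy), assuming the loss is measured in nats
perplexities = {size: math.exp(loss) for size, loss in reported_losses.items()}

for size, ppl in perplexities.items():
    print('size %4d: perplexity %.2f' % (size, ppl))
```

Note that the models were trained for different numbers of epochs, so the comparison mixes the effect of size and of training time.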

In [None]:
# Colors for printing
class colors:
    RED = '\033[91m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'    
    close = '\033[0m'

# Function to load the pretrained models
def load_model(dictionary_file, architecture_file, weight_file):
    # Load text Dict
    with open(dictionary_file, 'rb') as f:
        encode_dict, decode_dict = pickle.load(f)

    # load architecture from JSON File
    with open(architecture_file, 'rt') as json_file:
        architecture = json_file.read()

    # create model from architecture
    model = model_from_json(architecture)
    # load weights from hdf5 file
    model.load_weights(weight_file)
    return encode_dict, decode_dict, model

In [None]:
# Load the pretrained models

char2int_x64, int2char_x64, model_x64 = load_model('./tweets.x64.e40.pickle', './architecture.x64.e40.trump.json', './weights.x64.e40.trump.h5')
char2int_x128, int2char_x128, model_x128 = load_model('./tweets.x128.e30.pickle', './architecture.x128.e30.trump.json', './weights.x128.e30.trump.h5')
char2int_x256, int2char_x256, model_x256 = load_model('./tweets.x256.e20.pickle', './architecture.x256.e20.trump.json', './weights.x256.e20.trump.h5')
char2int_x512, int2char_x512, model_x512 = load_model('./tweets.x512.e10.pickle', './architecture.x512.e10.trump.json', './weights.x512.e10.trump.h5')


In [None]:
# Predict 5 tweets of each model

print(colors.BLUE + '='*50 + ' x64e40 ' + '='*50 + colors.close)
for j in range(5):
    tweet = [char2int_x64['<GO>']] # start from the <GO> token
    print(colors.BLUE + predict_tweet(tweet,model_x64,char2int_x64,int2char_x64) + colors.close)    

print(colors.GREEN + '='*50 + ' x128e30 ' + '='*50 + colors.close)
for j in range(5):
    tweet = [char2int_x128['<GO>']] # start from the <GO> token
    print(colors.GREEN + predict_tweet(tweet,model_x128,char2int_x128,int2char_x128) + colors.close)    
    
print(colors.RED + '='*50 + ' x256e20 ' + '='*50 + colors.close)
for j in range(5):
    tweet = [char2int_x256['<GO>']] # start from the <GO> token
    print(colors.RED + predict_tweet(tweet,model_x256,char2int_x256,int2char_x256) + colors.close)    

print('='*50 + ' x512e10 ' + '='*50)
for j in range(5):
    tweet = [char2int_x512['<GO>']] # start from the <GO> token
    print(predict_tweet(tweet,model_x512,char2int_x512,int2char_x512))    



In [None]:
tweet = [char2int_x64[letter] for letter in "white supremacists are "]
print(colors.BLUE + 'x64e40: ' +predict_tweet(tweet,model_x64,char2int_x64,int2char_x64) + colors.close + "\n")

tweet = [char2int_x128[letter] for letter in "white supremacists are "]
print(colors.GREEN + 'x128e30: ' + predict_tweet(tweet,model_x128,char2int_x128,int2char_x128) + colors.close +"\n")

tweet = [char2int_x256[letter] for letter in "white supremacists are "]
print(colors.RED + 'x256e20: ' + predict_tweet(tweet,model_x256,char2int_x256,int2char_x256) + colors.close +"\n")

tweet = [char2int_x512[letter] for letter in "white supremacists are "]
print('x512e10: ' + predict_tweet(tweet,model_x512,char2int_x512,int2char_x512))

In [None]:
tweet = [char2int_x64[letter] for letter in "obama is "]
print(colors.BLUE + 'x64e40: ' +predict_tweet(tweet,model_x64,char2int_x64,int2char_x64) + colors.close + "\n")

tweet = [char2int_x128[letter] for letter in "obama is "]
print(colors.GREEN + 'x128e30: ' + predict_tweet(tweet,model_x128,char2int_x128,int2char_x128) + colors.close + "\n")

tweet = [char2int_x256[letter] for letter in "obama is "]
print(colors.RED + 'x256e20: ' + predict_tweet(tweet,model_x256,char2int_x256,int2char_x256) + colors.close + "\n")

tweet = [char2int_x512[letter] for letter in "obama is "]
print('x512e10: ' + predict_tweet(tweet,model_x512,char2int_x512,int2char_x512))

In [None]:
tweet = [char2int_x64[letter] for letter in "i resign "]
print(colors.BLUE + 'x64e40: ' +predict_tweet(tweet,model_x64,char2int_x64,int2char_x64) + colors.close + "\n")

tweet = [char2int_x128[letter] for letter in "i resign "]
print(colors.GREEN + 'x128e30: ' + predict_tweet(tweet,model_x128,char2int_x128,int2char_x128) + colors.close + "\n")

tweet = [char2int_x256[letter] for letter in "i resign "]
print(colors.RED + 'x256e20: ' + predict_tweet(tweet,model_x256,char2int_x256,int2char_x256) + colors.close+ "\n")

tweet = [char2int_x512[letter] for letter in "i resign "]
print('x512e10: ' + predict_tweet(tweet,model_x512,char2int_x512,int2char_x512))

In [None]:
tweet = [char2int_x512[letter] for letter in "covfefe means "]
print('x512e10: ' + predict_tweet(tweet,model_x512,char2int_x512,int2char_x512))