In [1]:
"""
What? Vanilla LSTMs

A simple LSTM configuration is the Vanilla LSTM. It is named Vanilla in this book to differentiate it from deeper
LSTMs and the suite of more elaborate configurations. It is the LSTM architecture defined in the original 1997
LSTM paper, and it will give good results on most small sequence prediction problems.

Reference: Long Short-Term Memory Networks with Python, Jason Brownlee
https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
"""
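To make the Vanilla LSTM concrete, here is a minimal NumPy sketch of a single LSTM cell stepped over a short one-hot sequence. It uses the common formulation with input, forget and output gates plus a candidate cell value. The names `lstm_step`, `W`, `U` and `b`, and the order of the gate blocks inside the stacked weight matrices, are assumptions of this sketch, not Keras internals. Each gate has its own input weights, recurrent weights and bias, so an LSTM layer has four times the parameters of a single gate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of an LSTM cell (sketch).

    x: (n_in,), h_prev and c_prev: (units,)
    W: (n_in, 4*units), U: (units, 4*units), b: (4*units,)
    Gate blocks are assumed to be stacked as [input, forget, candidate, output].
    """
    units = h_prev.shape[0]
    z = x @ W + h_prev @ U + b             # all four gate pre-activations at once
    i = sigmoid(z[:units])                 # input gate
    f = sigmoid(z[units:2 * units])        # forget gate
    g = np.tanh(z[2 * units:3 * units])    # candidate cell value
    o = sigmoid(z[3 * units:])             # output gate
    c = f * c_prev + i * g                 # new cell state
    h = o * np.tanh(c)                     # new hidden state (the cell's output)
    return h, c

rng = np.random.default_rng(0)             # arbitrary seed for repeatability
n_in, units = 10, 25
W = rng.normal(scale=0.1, size=(n_in, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in np.eye(n_in)[[3, 1, 4, 1, 5]]:    # five one-hot time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                             # (25,)
print(W.size + U.size + b.size)            # 3600 weights and biases
```

The final hidden state `h` corresponds to what an LSTM layer outputs when only the last time step is returned. That is the vector the single-layer model below feeds into its Dense layer.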

In [2]:
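# Import python modules
import tensorflow as tf
# Switch TensorFlow to TF1-style graph mode (eager execution disabled)
tf.compat.v1.disable_eager_execution()

from random import randint
from numpy import array
from numpy import argmax
# Use the Keras bundled with TensorFlow so the model and the graph-mode setting above come from the same package
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense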



### Echo Sequence Prediction Problem

In [3]:
"""
Given a sequence of random integers as input, output the value of the integer at a specific input time step
that is not specified to the model. For example, if the input sequence of random integers is [5, 3, 2] and the
chosen time step is the second one, then the expected output is 3. Technically, this is a sequence
classification problem; it is formulated as a many-to-one prediction problem, where there are multiple input
time steps and one output time step at the end of the sequence.
"""
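The target can be written as a one-line function. `echo_target` is a hypothetical helper used only for illustration. Python indexes from 0, so "the second value" is index 1.

```python
def echo_target(sequence, out_index):
    """Return the value the model must echo: the integer at the hidden time step."""
    return sequence[out_index]

print(echo_target([5, 3, 2], 1))  # 3 (the second value, 0-based index 1)
```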

### Generate Random Sequences

In [4]:
"""
We can generate random integers in Python using the randint() function, which takes two parameters: the lower
and upper bounds of the range to draw values from, both inclusive. In this lesson, we will define the problem as
having integer values between 0 and 99, i.e. 100 unique values.
"""
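Because randint(a, b) includes both endpoints, drawing values from 0 to 99 requires randint(0, 99). That is why generate_sequence() passes n_features - 1 as the upper bound. Here is a quick empirical check with an arbitrary fixed seed:

```python
from random import randint, seed

seed(1)  # arbitrary fixed seed so the check is repeatable
draws = [randint(0, 99) for _ in range(10_000)]

# Both endpoints are possible: every draw lies in 0..99 inclusive
print(min(draws), max(draws))
print(len(set(draws)), "distinct values seen (at most 100)")
```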

In [5]:
# generate a sequence of random numbers in [0, n_features)
def generate_sequence(length, n_features):
    return [randint(0, n_features-1) for _ in range(length)]

### One Hot Encode Sequences

In [6]:
"""
Next, we need to transform the random integers into a format that is suitable for training an LSTM network. In
this case, we can use a one hot encoding of the integer values, where each value is represented by a 100-element
binary vector that is all 0 values except at the index of the integer, which is marked 1.
"""
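An equivalent vectorized encoding indexes rows of an identity matrix, because row i of np.eye(k) is exactly the one-hot vector for i. This is shown as an alternative, not the book's method. The function names `one_hot_encode_np` and `one_hot_decode_np` are hypothetical.

```python
import numpy as np

def one_hot_encode_np(sequence, n_features):
    """Hypothetical vectorized variant: pick row `value` of the identity matrix for each value."""
    return np.eye(n_features, dtype=int)[sequence]

def one_hot_decode_np(encoded_seq):
    """Index of the 1 in each row, returned as plain Python ints."""
    return np.argmax(encoded_seq, axis=1).tolist()

encoded = one_hot_encode_np([3, 0, 9], 10)
print(encoded[0])                   # [0 0 0 1 0 0 0 0 0 0]
print(one_hot_decode_np(encoded))   # [3, 0, 9]
```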

In [7]:
# one hot encode sequence
def one_hot_encode(sequence, n_features):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_features)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

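# decode a one hot encoded sequence back into a list of integers
def one_hot_decode(encoded_seq):
    # int() turns NumPy integer scalars into plain Python ints for cleaner printing
    return [int(argmax(vector)) for vector in encoded_seq]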

### Worked Example

In [8]:
"""
We can tie all of this together. Below is the complete code listing for generating a sequence of 25 random 
integers and encoding each integer as a binary vector.
"""

In [9]:
# generate random sequence
sequence = generate_sequence(25, 100)
print(sequence)

# one hot encode
encoded = one_hot_encode(sequence, 100)
print(encoded)

# one hot decode
decoded = one_hot_decode(encoded)
print(decoded)

[6, 12, 16, 79, 5, 38, 83, 4, 53, 50, 27, 33, 25, 23, 84, 9, 4, 71, 12, 67, 68, 36, 84, 29, 38]
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
[6, 12, 16, 79, 5, 38, 83, 4, 53, 50, 27, 33, 25, 23, 84, 9, 4, 71, 12, 67, 68, 36, 84, 29, 38]


### Reshape Sequences

In [10]:
"""
The final step is to reshape the one hot encoded sequences into a format that can be used as input to the LSTM.
This involves reshaping the encoded sequence into a 3D array of shape [samples, time steps, features], i.e.
(1, n, k), where n is the number of integers in the generated sequence and k is the number of possible integers
at each time step (e.g. 100).
"""
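The reshape only adds a leading axis of size 1 (the number of samples); the data itself is unchanged. The following NumPy sketch uses a random one-hot array as a stand-in for the output of one_hot_encode():

```python
import numpy as np

length, n_features = 25, 100
rng = np.random.default_rng(0)                        # arbitrary seed
sequence = rng.integers(0, n_features, size=length)   # stand-in for generate_sequence()
encoded = np.eye(n_features, dtype=int)[sequence]     # shape (25, 100): [time steps, features]

# Add the leading "samples" axis: [samples, time steps, features]
X = encoded.reshape((1, length, n_features))
X_alt = encoded[np.newaxis, :, :]                     # equivalent: insert a size-1 axis at the front

print(encoded.shape, X.shape)    # (25, 100) (1, 25, 100)
print(np.array_equal(X, X_alt))  # True
```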

In [11]:
# generate one example for an lstm
def generate_example(length, n_features, out_index):
    # generate sequence
    sequence = generate_sequence(length, n_features)
    # one hot encode
    encoded = one_hot_encode(sequence, n_features)
    # reshape sequence to be 3D
    X = encoded.reshape((1, length, n_features))
    # select output
    y = encoded[out_index].reshape(1, n_features)
    return X, y

### Worked out example

In [12]:
"""
We can put all of this together and test the generation of one example ready for fitting or evaluating an LSTM.
Running the code generates one encoded sequence and prints out the shape of the input and output components of 
the sequence for the LSTM.
"""

In [13]:
X, y = generate_example(25, 100, 2)
print(X.shape)
print(y.shape)

(1, 25, 100)
(1, 100)


### Define and Compile the Model

In [14]:
"""
To keep the model small and ensure it fits in a reasonable time, we will greatly simplify the problem by
reducing the sequence length to 5 integers and the number of features to 10 (i.e. 0-9). The model must specify
the expected dimensionality of the input data, in this case 5 time steps and 10 features. We will use a single
hidden layer LSTM with 25 memory units, chosen with a little trial and error. The output layer is a fully
connected layer (Dense) with 10 neurons for the 10 possible integers that may be output. A softmax activation
function is used on the output layer to allow the network to learn and output a probability distribution over
the possible output values. The network will use the log loss function (categorical cross-entropy) while
training, which is suitable for multiclass classification problems, and the efficient Adam optimization
algorithm. The accuracy metric will be reported each training epoch to give an idea of the skill of the model
in addition to the loss.
"""
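The softmax output and the log loss can be illustrated with a small pure-Python sketch. It uses 3 classes instead of 10 for brevity. `softmax` and `categorical_crossentropy` are hypothetical helper names, and the epsilon clip that avoids log(0) is an assumption of this sketch. Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the correct class.

```python
import math

def softmax(logits):
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip probabilities away from 0 so log() is always defined
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

logits = [0.5, 2.0, -1.0]                          # raw scores for 3 classes
probs = softmax(logits)                            # non-negative and sums to 1
loss = categorical_crossentropy([0, 1, 0], probs)  # true class is index 1
print([round(p, 3) for p in probs])
print(round(loss, 3))
```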

In [15]:
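# define model
length = 5        # time steps per input sequence
n_features = 10   # possible integer values (0-9), i.e. the one-hot vector size
out_index = 2     # the hidden time step whose value must be echoed
model = Sequential()
model.add(LSTM(25, input_shape=(length, n_features)))
model.add(Dense(n_features, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["acc"])
# summary() prints the table itself and returns None, so there is no need to print() it
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 25)                3600      
_________________________________________________________________
dense (Dense)                (None, 10)                260       
=================================================================
Total params: 3,860
Trainable params: 3,860
Non-trainable params: 0
_________________________________________________________________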
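The parameter counts in the summary can be checked by hand. An LSTM layer has four gate blocks, each with input weights (n_in * units), recurrent weights (units * units) and a bias (units). A Dense layer has n_in * units weights plus units biases. The helper names below are hypothetical.

```python
def lstm_params(n_in, units):
    # 4 gate blocks, each with input weights, recurrent weights and a bias vector
    return 4 * (n_in * units + units * units + units)

def dense_params(n_in, units):
    return n_in * units + units  # weights + biases

lstm = lstm_params(10, 25)   # 10 features in, 25 memory units
dense = dense_params(25, 10)  # 25 LSTM outputs in, 10 classes out
print(lstm, dense, lstm + dense)  # 3600 260 3860
```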


### Fit the Model

In [16]:
"""
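In this setup, the number of epochs is the number of iterations of generating a new random sample and fitting the
model on it; each call to fit() sees a single sample, so the batch size is effectively 1. The book fits the model
for 10,000 epochs, a number found with a little trial and error; the loop below uses fewer to save time.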
"""
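Generating one sample per fit() call means thousands of separate calls. An alternative, not the approach used in the book, is to generate examples in batches. Below is a minimal NumPy sketch of a hypothetical `generate_batch()` helper. It returns arrays shaped [samples, time steps, features] and [samples, features], the same layout as generate_example() but with more than one sample.

```python
import numpy as np

def generate_batch(batch_size, length, n_features, out_index, rng):
    """Hypothetical helper: build `batch_size` echo examples at once."""
    # Random integers in [0, n_features), one row per sample
    seqs = rng.integers(0, n_features, size=(batch_size, length))
    # One-hot encode every time step: shape (batch_size, length, n_features)
    X = np.eye(n_features, dtype=int)[seqs]
    # Target is the one-hot vector at the hidden time step: shape (batch_size, n_features)
    y = X[:, out_index, :]
    return X, y

rng = np.random.default_rng(0)  # arbitrary seed
X, y = generate_batch(32, 5, 10, 2, rng)
print(X.shape, y.shape)  # (32, 5, 10) (32, 10)
```

Such a batch could be passed to a single model.fit() call. However, that changes the training procedure to several samples per gradient update, so the epoch count from the book would no longer apply directly.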

In [None]:
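# fit model: each iteration generates one new random example and takes one training step on it
for _ in range(3000):
    X, y = generate_example(length, n_features, out_index)
    # verbose=0 avoids printing a log line for every one of the 3,000 calls
    model.fit(X, y, epochs=1, verbose=0)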

### Evaluate the Model

In [None]:
"""
Once the model is fit, we can estimate the skill of the model when classifying new random sequences. We can do
this by simply making predictions on 1,000 randomly generated sequences and counting the number of correct
predictions made.
"""
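Accuracy should be the number of correct predictions divided by the number of sequences actually evaluated. The sketch below isolates that bookkeeping with plain Python. It uses two toy predictors in place of the LSTM: one that always echoes the correct time step and one that always answers 0. `evaluate`, `make_example`, `oracle` and `guess_zero` are hypothetical names for illustration.

```python
import random

def evaluate(predict_fn, make_example, n_samples):
    """Return the accuracy (in percent) over n_samples freshly generated examples."""
    correct = 0
    for _ in range(n_samples):
        X, y = make_example()
        if predict_fn(X) == y:
            correct += 1
    # Divide by the number of examples actually evaluated, not a hard-coded constant
    return correct / n_samples * 100.0

length, n_features, out_index = 5, 10, 2
rng = random.Random(0)  # arbitrary fixed seed for repeatability

def make_example():
    # Hypothetical stand-in for generate_example(): plain integers instead of one-hot arrays
    seq = [rng.randint(0, n_features - 1) for _ in range(length)]
    return seq, seq[out_index]

def oracle(seq):
    return seq[out_index]  # always echoes the hidden time step

def guess_zero(seq):
    return 0  # ignores the input entirely

print("Accuracy: %f" % evaluate(oracle, make_example, 1000))      # Accuracy: 100.000000
print("Accuracy: %f" % evaluate(guess_zero, make_example, 1000))  # close to 10 on average
```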

In [None]:
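# evaluate model
n_eval = 1000  # number of new random sequences to classify
correct = 0
for _ in range(n_eval):
    X, y = generate_example(length, n_features, out_index)
    yhat = model.predict(X, verbose=0)
    # compare the decoded (argmax) class of the prediction with the expected class
    if one_hot_decode(yhat) == one_hot_decode(y):
        correct += 1
print("Accuracy: %f" % (correct / n_eval * 100.0))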

### Make Predictions With the Model

In [None]:
"""
Finally, we can use the fit model to make predictions on new randomly generated sequences. For this problem,
this is much the same as evaluating the model. Because this is more of a user-facing activity, we can decode
the whole sequence, the expected output, and the prediction, and print them on the screen.
"""
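Decoding works the same way for the input and the prediction. X has shape (1, length, n_features), so iterating over it yields one 2D encoded sequence. The model's softmax output is decoded by taking the argmax of each probability vector. The small sketch below uses hand-made arrays as stand-ins for generate_example() and model.predict().

```python
import numpy as np

def one_hot_decode(encoded_seq):
    # argmax of each row; int() converts NumPy scalars to plain Python ints
    return [int(np.argmax(vector)) for vector in encoded_seq]

# Stand-in for X from generate_example(5, 10, 2): one sample, 5 time steps, 10 features
X = np.eye(10, dtype=int)[np.array([[3, 7, 1, 0, 9]])]
# Stand-in for model.predict(X): one softmax-like probability vector over 10 classes
yhat = np.full((1, 10), 0.01)
yhat[0, 1] = 0.91

print(X.shape)                         # (1, 5, 10)
print([one_hot_decode(x) for x in X])  # [[3, 7, 1, 0, 9]]
print(one_hot_decode(yhat))            # [1]
```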

In [None]:
# prediction on new data
X, y = generate_example(length, n_features, out_index)
yhat = model.predict(X, verbose=0)
# X has shape (1, length, n_features), so iterating over it yields the single encoded sequence
print("Sequence: %s" % [one_hot_decode(x) for x in X])
print("Expected: %s" % one_hot_decode(y))
print("Predicted: %s" % one_hot_decode(yhat))

In [None]:
"""
I have used 3,000 training epochs in the fit loop instead of the book's 10,000, as training was taking too long.
"""