
# INTRODUCTION TO LSTM

The Long Short-Term Memory, or LSTM, network is a type of Recurrent Neural Network (RNN) designed for sequence problems.

Sequence prediction differs from other types of supervised learning: the sequence imposes an order on the observations, and that order must be preserved when training models and making predictions.

Given a standard feedforward Multilayer Perceptron network, an RNN can be thought of as the addition of loops to the architecture. The recurrent connections add state or memory to the network and allow it to learn and harness the ordered nature of observations within input sequences.
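
The effect of a recurrent loop can be sketched with a single scalar state. This is a minimal illustration of a plain RNN update, not an LSTM cell, and the weights `w_x`, `w_h`, and `b` are hypothetical values chosen only for the demonstration. The same three observations fed in a different order leave the network in a different state, which a model that simply summed its inputs could not distinguish.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent update: mix the current input with the previous state."""
    # w_x, w_h and b are made-up illustrative weights, not learned values.
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence):
    """Feed a sequence through the loop one observation at a time."""
    h = 0.0  # initial state: nothing has been seen yet
    for x in sequence:
        h = rnn_step(x, h)
    return h

# Same observations, different order.
forward = run_rnn([1.0, 0.0, 0.0])
backward = run_rnn([0.0, 0.0, 1.0])
print(forward, backward)
```

In `forward` the early input has decayed through two further updates, while in `backward` it arrives at the last step, so the final states differ.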


**LSTM architectures**:

* Vanilla LSTM. Memory cells of a single LSTM layer are used in a simple network structure.
* Stacked LSTM. LSTM layers are stacked one on top of another into deep networks.
* CNN LSTM. A convolutional neural network learns features from spatial input such as images, and the LSTM either takes a sequence of images as input or generates a sequence in response to an image.
* Encoder-Decoder LSTM. One LSTM network encodes input sequences and a separate LSTM network decodes the encoding into an output sequence.
* Bidirectional LSTM. Input sequences are presented and learned both forward and backward.
* Generative LSTM. LSTMs learn the structural relationships in input sequences well enough to generate new, plausible sequences.

## Sequence Prediction Problem

In [3]:
from random import randint
from numpy import array
from numpy import argmax
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense


Using TensorFlow backend.


In [8]:
# generate a sequence of random numbers in [0, n_features)
def generate_sequence(length, n_features):
    return [randint(0, n_features-1) for _ in range(length)]

# one hot encode sequence
def one_hot_encode(sequence, n_features):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_features)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

# decode a one hot encoded sequence back to integers
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

# generate one example for an lstm
def generate_example(length, n_features, out_index):
    # generate sequence
    sequence = generate_sequence(length, n_features)
    # one hot encode
    encoded = one_hot_encode(sequence, n_features)
    # reshape sequence to be 3D
    X = encoded.reshape((1, length, n_features))
    # select output
    y = encoded[out_index].reshape(1, n_features)
    return X, y
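
A quick round-trip check can confirm that encoding and decoding are inverses. The sketch below uses plain lists instead of NumPy arrays so that it runs without dependencies. The names `one_hot_encode_lists` and `one_hot_decode_lists` are stand-ins introduced here, not functions from the notebook, and the input sequence is arbitrary.

```python
def one_hot_encode_lists(sequence, n_features):
    """Encode each integer as a list with a single 1 at that integer's index."""
    return [[1 if i == value else 0 for i in range(n_features)]
            for value in sequence]

def one_hot_decode_lists(encoded_seq):
    """Recover each integer as the index of the largest entry (a list argmax)."""
    return [vector.index(max(vector)) for vector in encoded_seq]

sequence = [2, 5, 6, 1, 2]          # arbitrary example values in [0, 10)
encoded = one_hot_encode_lists(sequence, n_features=10)

# Decoding the encoding should give back the original sequence.
assert one_hot_decode_lists(encoded) == sequence

# The training target is the encoded observation at position out_index.
out_index = 2
y = encoded[out_index]
print(y)                            # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(one_hot_decode_lists([y]))    # [6]
```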





In [10]:
# define model
length = 5
n_features = 10
out_index = 2
model = Sequential()
model.add(LSTM(25, input_shape=(length, n_features)))
model.add(Dense(n_features, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 25)                3600      
_________________________________________________________________
dense_1 (Dense)              (None, 10)                260       
=================================================================
Total params: 3,860
Trainable params: 3,860
Non-trainable params: 0
_________________________________________________________________
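
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout: four weight blocks (input gate, forget gate, output gate, and candidate cell state), each with input weights, recurrent weights, and a bias. The helper names are hypothetical.

```python
def lstm_param_count(units, n_features):
    """Four blocks, each with input weights, recurrent weights and a bias."""
    per_block = units * n_features + units * units + units
    return 4 * per_block

def dense_param_count(n_inputs, n_outputs):
    """One weight per input/output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

lstm_params = lstm_param_count(units=25, n_features=10)
dense_params = dense_param_count(n_inputs=25, n_outputs=10)
print(lstm_params)                  # 3600
print(dense_params)                 # 260
print(lstm_params + dense_params)   # 3860
```

Note that the sequence length of 5 does not appear in the calculation: the same weights are reused at every time step.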


In [20]:
# generate sequence
sequence = generate_sequence(length, n_features)
sequence

[2, 5, 6, 1, 2]

In [26]:
# one hot encode
encoded = one_hot_encode(sequence, n_features)
# one row per time step, one column per possible value
encoded.shape

(5, 10)

In [27]:
# reshape sequence to be 3D
X = encoded.reshape((1, length, n_features))
X.shape

(1, 5, 10)
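
The three axes of the reshaped input are samples, time steps, and features. As a dependency-free illustration, the sketch below mimics the reshape with nested lists. `nested_shape` is a hypothetical helper standing in for NumPy's `.shape`, and the all-zero rows are placeholders rather than real encoded data.

```python
def nested_shape(data):
    """Return the shape of a rectangular nested list, outermost axis first."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]
    return tuple(shape)

length, n_features = 5, 10
# Placeholder rows: one per time step, one column per feature.
encoded = [[0] * n_features for _ in range(length)]

# Wrapping in an outer list adds the samples axis: a batch holding one sample.
X = [encoded]

print(nested_shape(encoded))  # (5, 10)
print(nested_shape(X))        # (1, 5, 10)
```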

In [31]:
encoded[out_index]

array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])

In [29]:
# select output
y = encoded[out_index].reshape(1, n_features)
y

array([[0, 0, 0, 0, 0, 0, 1, 0, 0, 0]])

In [13]:
# generate a single example to inspect (no fitting yet)
X, y = generate_example(length, n_features, out_index)

In [14]:
X

array([[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])

In [15]:
y

array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

In [32]:
# fit model: each iteration generates a fresh example and performs one update on it
for i in range(10000):
    X, y = generate_example(length, n_features, out_index)
    model.fit(X, y, epochs=1, verbose=2)


Epoch 1/1
 - 1s - loss: 2.2611 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3000 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3596 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3458 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2588 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3023 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2598 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2231 - acc: 1.0000
Epoch 1/1
 - 0s - loss: 2.3069 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3441 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2699 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3367 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2631 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2588 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3163 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.3245 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2282 - acc: 1.0000
Epoch 1/1
 - 0s - loss: 2.3229 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.2851 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.4349 - acc: 0.0000e+00
Epoch 1/1
 - 0s - loss: 2.27
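
The losses at the start of the log sit close to 2.30, which is roughly what categorical cross-entropy gives when the model spreads its probability evenly over the 10 classes. The accuracy is always 0 or 1 because each `fit` call sees a single sample. The arithmetic can be checked as below. The `confident` probability vector is made up for illustration and is not taken from the trained model.

```python
import math

def cross_entropy(probabilities, true_index):
    """Categorical cross-entropy for one sample with a one hot target."""
    return -math.log(probabilities[true_index])

n_classes = 10

# A model that has learned nothing assigns 1/10 to every class.
uniform = [1.0 / n_classes] * n_classes
chance_loss = cross_entropy(uniform, true_index=6)
print(round(chance_loss, 4))  # 2.3026

# A hypothetical confident, correct prediction gives a much smaller loss.
confident = [0.01] * n_classes
confident[6] = 0.91
confident_loss = cross_entropy(confident, true_index=6)
print(round(confident_loss, 4))
```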

In [34]:

# evaluate model on 100 freshly generated examples
correct = 0
for i in range(100):
    X, y = generate_example(length, n_features, out_index)
    yhat = model.predict(X)
    if one_hot_decode(yhat) == one_hot_decode(y):
        correct += 1
print('Accuracy: %f' % ((correct/100.0)*100.0))



Accuracy: 100.000000
Sequence: [[4, 6, 6, 9, 0]]
Expected: [6]
Predicted: [6]
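
To put the accuracy figure in context, it helps to know what a model with no knowledge of the input would score. The sketch below simulates a guesser that picks one of the 10 values uniformly at random. The function name, seed, and trial count are assumptions made for this illustration.

```python
from random import Random

def random_guess_accuracy(n_trials, n_features, seed=1):
    """Percentage of trials in which a uniform random guess hits the target."""
    rng = Random(seed)  # fixed seed so the run is repeatable
    correct = 0
    for _ in range(n_trials):
        target = rng.randint(0, n_features - 1)
        guess = rng.randint(0, n_features - 1)
        if guess == target:
            correct += 1
    return 100.0 * correct / n_trials

baseline = random_guess_accuracy(n_trials=10000, n_features=10)
print('Baseline accuracy: %f' % baseline)
```

The baseline lands near 10%, so a score far above that indicates the network has learned to echo the observation at `out_index` rather than guessing.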


In [35]:
# prediction on new data
X, y = generate_example(length, n_features, out_index)
yhat = model.predict(X)
print('Sequence: %s' % [one_hot_decode(x) for x in X])
print('Expected: %s' % one_hot_decode(y))
print('Predicted: %s' % one_hot_decode(yhat))

Sequence: [[4, 3, 6, 6, 6]]
Expected: [6]
Predicted: [6]
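
The softmax layer outputs a probability for each of the 10 values rather than a clean one hot vector, and decoding picks the most probable index. A minimal sketch with a made-up probability row follows (the numbers are not actual model output, and `decode_prediction` is a hypothetical list-based stand-in for `one_hot_decode`):

```python
def decode_prediction(probabilities):
    """Return, for each row, the index holding the highest probability."""
    return [row.index(max(row)) for row in probabilities]

# Illustrative softmax output for a single sample; the entries sum to 1.
yhat = [[0.01, 0.02, 0.01, 0.03, 0.01, 0.02, 0.85, 0.02, 0.02, 0.01]]

predicted = decode_prediction(yhat)
confidence = max(yhat[0])
print('Predicted: %s' % predicted)      # Predicted: [6]
print('Confidence: %.2f' % confidence)  # Confidence: 0.85
```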
