# Naive LSTM for a Three-Char Time Step Window to One-Char Mapping

In Keras, the intended use of LSTMs is to provide context in the form of time steps, rather than as windowed features as with other network types.

We can take our first example and simply change the sequence length from 1 to 3.

In [1]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

# fix random seed for reproducibility
numpy.random.seed(7)

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)

ABC -> D
BCD -> E
CDE -> F
DEF -> G
EFG -> H
FGH -> I
GHI -> J
HIJ -> K
IJK -> L
JKL -> M
KLM -> N
LMN -> O
MNO -> P
NOP -> Q
OPQ -> R
PQR -> S
QRS -> T
RST -> U
STU -> V
TUV -> W
UVW -> X
VWX -> Y
WXY -> Z


The difference is that the input data is reshaped as a sequence of three time steps with one feature each, rather than a single time step with three features.

In [5]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

In [6]:
samples, time_steps, features = X.shape
print("Samples: ", samples)
print("Time Steps: ", time_steps)
print("Features: ", features)

Samples:  23
Time Steps:  3
Features:  1
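
For contrast, the two layouts can be compared side by side. The following is a minimal NumPy-only sketch, not part of the original notebook; the names `windows`, `as_features` and `as_time_steps` are illustrative assumptions. It rebuilds the same integer windows and reshapes them both ways, so the only thing that changes is which axis carries the three letters.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}
seq_length = 3

# Same windows as in the dataset above: 'ABC', 'BCD', ..., 'WXY'
windows = [[char_to_int[c] for c in alphabet[i:i + seq_length]]
           for i in range(len(alphabet) - seq_length)]

# Window method: one time step carrying three features
as_features = np.reshape(windows, (len(windows), 1, seq_length))
# Time step method: three time steps carrying one feature each
as_time_steps = np.reshape(windows, (len(windows), seq_length, 1))

print(as_features.shape)          # (23, 1, 3)
print(as_time_steps.shape)        # (23, 3, 1)
print(as_features[0].tolist())    # [[0, 1, 2]]
print(as_time_steps[0].tolist())  # [[0], [1], [2]]
```

Both arrays hold exactly the same numbers. With the second layout the LSTM reads the letters one step at a time, which is the layout used in this example.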


Reshaping the input to [samples, time steps, features] with one feature per time step is the intended way of providing sequence context to an LSTM in Keras. The full code example is provided below for completeness.

In [7]:
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(time_steps, features)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

Epoch 1/500
 - 0s - loss: 3.2701 - acc: 0.0000e+00
Epoch 2/500
 - 0s - loss: 3.2547 - acc: 0.0000e+00
Epoch 3/500
 - 0s - loss: 3.2465 - acc: 0.0000e+00
Epoch 4/500
 - 0s - loss: 3.2392 - acc: 0.0435
Epoch 5/500
 - 0s - loss: 3.2313 - acc: 0.0435
Epoch 6/500
 - 0s - loss: 3.2229 - acc: 0.0435
Epoch 7/500
 - 0s - loss: 3.2141 - acc: 0.0435
Epoch 8/500
 - 0s - loss: 3.2050 - acc: 0.0435
Epoch 9/500
 - 0s - loss: 3.1938 - acc: 0.0435
Epoch 10/500
 - 0s - loss: 3.1820 - acc: 0.0435
Epoch 11/500
 - 0s - loss: 3.1699 - acc: 0.0435
Epoch 12/500
 - 0s - loss: 3.1551 - acc: 0.0435
Epoch 13/500
 - 0s - loss: 3.1394 - acc: 0.0435
Epoch 14/500
 - 0s - loss: 3.1236 - acc: 0.0435
Epoch 15/500
 - 0s - loss: 3.1059 - acc: 0.0435
Epoch 16/500
 - 0s - loss: 3.0883 - acc: 0.0000e+00
Epoch 17/500
 - 0s - loss: 3.0723 - acc: 0.0000e+00
Epoch 18/500
 - 0s - loss: 3.0585 - acc: 0.0435
Epoch 19/500
 - 0s - loss: 3.0382 - acc: 0.0435
Epoch 20/500
 - 0s - loss: 3.0213 - acc: 0.0870
...

Epoch 170/500
 - 0s - loss: 1.0999 - acc: 0.9130
Epoch 171/500
 - 0s - loss: 1.0898 - acc: 0.9130
Epoch 172/500
 - 0s - loss: 1.0914 - acc: 0.8696
Epoch 173/500
 - 0s - loss: 1.0792 - acc: 0.9130
Epoch 174/500
 - 0s - loss: 1.0835 - acc: 0.8696
Epoch 175/500
 - 0s - loss: 1.0723 - acc: 0.8261
Epoch 176/500
 - 0s - loss: 1.0644 - acc: 0.8696
Epoch 177/500
 - 0s - loss: 1.0609 - acc: 0.9130
Epoch 178/500
 - 0s - loss: 1.0622 - acc: 0.8696
Epoch 179/500
 - 0s - loss: 1.0459 - acc: 0.9130
Epoch 180/500
 - 0s - loss: 1.0428 - acc: 0.9130
Epoch 181/500
 - 0s - loss: 1.0419 - acc: 0.8696
Epoch 182/500
 - 0s - loss: 1.0321 - acc: 0.8696
Epoch 183/500
 - 0s - loss: 1.0279 - acc: 0.9130
Epoch 184/500
 - 0s - loss: 1.0275 - acc: 0.9130
Epoch 185/500
 - 0s - loss: 1.0183 - acc: 0.8696
Epoch 186/500
 - 0s - loss: 1.0132 - acc: 0.8696
Epoch 187/500
 - 0s - loss: 1.0070 - acc: 0.9130
Epoch 188/500
 - 0s - loss: 1.0042 - acc: 0.9565
Epoch 189/500
 - 0s - loss: 1.0001 - acc: 0.9565
...
 - 0s - loss: 0.4658 - acc: 0.9565
Epoch 338/500
 - 0s - loss: 0.4652 - acc: 0.9565
Epoch 339/500
 - 0s - loss: 0.4693 - acc: 1.0000
Epoch 340/500
 - 0s - loss: 0.4642 - acc: 0.9565
Epoch 341/500
 - 0s - loss: 0.4570 - acc: 1.0000
Epoch 342/500
 - 0s - loss: 0.4567 - acc: 1.0000
Epoch 343/500
 - 0s - loss: 0.4552 - acc: 1.0000
Epoch 344/500
 - 0s - loss: 0.4534 - acc: 0.9565
Epoch 345/500
 - 0s - loss: 0.4456 - acc: 1.0000
Epoch 346/500
 - 0s - loss: 0.4427 - acc: 1.0000
Epoch 347/500
 - 0s - loss: 0.4445 - acc: 0.9565
Epoch 348/500
 - 0s - loss: 0.4383 - acc: 1.0000
Epoch 349/500
 - 0s - loss: 0.4458 - acc: 1.0000
Epoch 350/500
 - 0s - loss: 0.4383 - acc: 1.0000
Epoch 351/500
 - 0s - loss: 0.4329 - acc: 1.0000
Epoch 352/500
 - 0s - loss: 0.4347 - acc: 0.9565
Epoch 353/500
 - 0s - loss: 0.4294 - acc: 0.9565
Epoch 354/500
 - 0s - loss: 0.4307 - acc: 1.0000
Epoch 355/500
 - 0s - loss: 0.4279 - acc: 1.0000
Epoch 356/500
 - 0s - loss: 0.4264 - acc: 1.0000
...

In [8]:
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Model Accuracy: 100.00%


In [12]:
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = "".join([int_to_char[value] for value in pattern])
    print(seq_in, "->", result)

ABC -> D
BCD -> E
CDE -> F
DEF -> G
EFG -> H
FGH -> I
GHI -> J
HIJ -> K
IJK -> L
JKL -> M
KLM -> N
LMN -> O
MNO -> P
NOP -> Q
OPQ -> R
PQR -> S
QRS -> T
RST -> U
STU -> V
TUV -> W
UVW -> X
VWX -> Y
WXY -> Z


We can see that the model learns the problem perfectly as evidenced by the model evaluation and the example predictions.

But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any run of three consecutive letters from the alphabet and predict the next letter.
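
As a rough sketch of what querying the model with a single window involves, the snippet below prepares one three-letter input in the same way as the training data. The helper name `encode_window` is hypothetical and not part of the original code, and the `model.predict` call is left as a comment because it assumes the trained `model` from the cells above.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}
int_to_char = {i: c for i, c in enumerate(alphabet)}

def encode_window(letters):
    """Hypothetical helper: turn e.g. 'KLM' into a normalized (1, 3, 1) array."""
    ints = [char_to_int[c] for c in letters]
    # one sample, len(letters) time steps, one feature per time step
    x = np.reshape(ints, (1, len(ints), 1))
    # same normalization as the training data
    return x / float(len(alphabet))

x = encode_window("KLM")
print(x.shape)  # (1, 3, 1)

# Assuming the trained `model` from the cells above is available:
# index = np.argmax(model.predict(x, verbose=0))
# print("KLM ->", int_to_char[index])
```

Each window is encoded and predicted on its own, so nothing from one query carries over to the next.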

It cannot actually enumerate the alphabet. I expect that a large enough multilayer perceptron network might be able to learn the same mapping using the window method.

The LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default the Keras implementation resets the network state after each training batch.