[Reference](http://philipperemy.github.io/keras-stateful-lstm/)

## Questions and Answers

* I’m given a big sequence (e.g. Time Series) and I split it into smaller sequences to construct my input matrix X. Is it possible that the LSTM may find dependencies between the sequences? 

    No, it is not possible unless you use a stateful LSTM. Most problems can be solved with a stateless LSTM, so make sure you really need stateful mode before using it. In stateless mode, long-term memory does not mean that the LSTM will remember the content of previous batches.
    


* Why do we distinguish between stateless and stateful LSTMs in Keras?

    An LSTM has cells and is therefore stateful by definition (not the same meaning of stateful as the one used in Keras). François Chollet gives this definition of statefulness: 
    stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch. 
    
    Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape (nb_samples, timesteps, input_dim), where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients, and propagate them to update the parameter vector. 

    By default, Keras shuffles (permutes) the samples in X and the dependencies between Xi and Xi+1 are lost. Let’s assume there’s no shuffling in our explanation. 

    If the model is stateless, the cell states are reset at each sequence. With the stateful model, all the states are propagated to the next batch. This means that the final state of the sample located at index i, Xi, will be used as the initial state of the sample Xi+bs in the next batch, where bs is the batch size (no shuffling).
     

     



* Why does Keras require the batch size in stateful mode? 

    When the model is stateless, Keras allocates an array for the states of size output_dim (that is, the number of cells in your LSTM). This state array is reset each time a sequence is processed. 

    In stateful mode, Keras must propagate the previous states for each sample across the batches. Referring to the explanation above, the sample at index i in batch #1 (Xi+bs) will know the states of the sample at index i in batch #0 (Xi). In this case, the structure that stores the states has the shape (batch_size, output_dim). This is why you have to specify the batch size when creating the LSTM. If you don’t, Keras may raise an error to remind you: If a RNN is stateful, a complete input_shape must be provided (including batch size).
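The index bookkeeping described in these answers can be sketched in plain Python. This is only an illustration of which sample inherits which state in stateful mode with shuffling disabled; the helper `state_source` is hypothetical and not part of Keras, and the two shape tuples simply restate the sizes given above.

```python
# Illustration only: which earlier sample provides the initial state
# for each sample in stateful mode (no shuffling).
nb_samples = 1024
batch_size = 64

# The batch size must divide the number of samples.
assert nb_samples % batch_size == 0
nb_batches = nb_samples // batch_size


def state_source(sample_index, batch_size):
    """Return the index of the sample whose final state initializes
    `sample_index`, or None for samples in the first batch.
    Hypothetical helper, not a Keras function."""
    if sample_index < batch_size:
        return None  # nothing to inherit in the first batch
    return sample_index - batch_size


# Sample i of batch #1 is X[i + bs]; it continues from X[i] in batch #0.
print(nb_batches)
print(state_source(64, batch_size))
print(state_source(5, batch_size))

# State storage in each mode, with output_dim LSTM cells (as described above).
output_dim = 32
stateless_state_shape = (output_dim,)
stateful_state_shape = (batch_size, output_dim)
```

One practical consequence is that, to continue a long series across batches, consecutive chunks of the same series have to sit `batch_size` positions apart in X.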


[Reference](https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/)

## Target

* How to develop a naive LSTM network for a sequence prediction problem.
* How to carefully manage state through batches and features with an LSTM network.
* How to manually manage state in an LSTM network for stateful prediction.

## Problem Description: Learn the Alphabet

In this tutorial we are going to develop and contrast a number of different LSTM recurrent neural network models.

The context of these comparisons will be a simple sequence prediction problem of learning the alphabet. That is, given a letter of the alphabet, predict the next letter of the alphabet.

This is a simple sequence prediction problem that once understood can be generalized to other sequence prediction problems like time series prediction and sequence classification.

Let’s prepare the problem with some Python code that we can reuse from example to example.

Firstly, let’s import all of the classes and functions we plan to use in this tutorial.

In [None]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

In [None]:
# fix random seed for reproducibility
numpy.random.seed(7)

We can now define our dataset, the alphabet. We define the alphabet in uppercase characters for readability.

Neural networks model numbers, so we need to map the letters of the alphabet to integer values. We can do this easily by creating a dictionary (map) from each character to its index in the alphabet. We can also create a reverse lookup for converting predictions back into characters later.

In [None]:
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
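As a quick check of the two lookups, the sketch below encodes a short string and decodes it again. The example string `word` is arbitrary and only there for illustration.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

word = "LSTM"  # arbitrary example string
# characters -> integers
encoded = [char_to_int[c] for c in word]
# integers -> characters (the reverse lookup used later on predictions)
decoded = "".join(int_to_char[i] for i in encoded)
print(encoded)  # [11, 18, 19, 12]
print(decoded)  # LSTM
```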

Now we need to create our input and output pairs on which to train our neural network. We can do this by defining an input sequence length, then reading sequences from the input alphabet sequence.

For example we use an input length of 1. Starting at the beginning of the raw input data, we can read off the first letter “A” and the next letter as the prediction “B”. We move along one character and repeat until we reach a prediction of “Z”.

In [None]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)

We need to reshape the NumPy array into a format expected by the LSTM networks, that is [samples, time steps, features].



In [None]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
X.shape

In [None]:
# normalize
X = X / float(len(alphabet))

In [None]:
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
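To make the one hot encoding step concrete, here is a plain-Python sketch of what it produces. It assumes the default behaviour in which the number of classes is inferred as the largest label plus one; the function `one_hot` is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
def one_hot(labels, num_classes=None):
    """Plain-Python sketch of one hot encoding integer class labels."""
    if num_classes is None:
        # assumption: infer the class count from the largest label
        num_classes = max(labels) + 1
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the position of the label
        rows.append(row)
    return rows


dataY = list(range(1, 26))  # targets 'B'..'Z' encoded as 1..25
y = one_hot(dataY)
print(len(y), len(y[0]))  # 25 rows, 26 columns
```

Under that assumption the output has 26 columns even though only 25 letters ever appear as targets: column 0 ('A') is never predicted, so it stays zero in every row.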

## Naive LSTM for Learning One-Char to One-Char Mapping

Let’s start off by designing a simple LSTM to learn how to predict the next character in the alphabet given the context of just one character.

We will frame the problem as a random collection of one-letter input to one-letter output pairs. As we will see this is a difficult framing of the problem for the LSTM to learn.

Let’s define an LSTM network with 32 units and an output layer with a softmax activation function for making predictions. Because this is a multi-class classification problem, we can use the log loss function (called “categorical_crossentropy” in Keras), and optimize the network using the Adam optimization algorithm.

The model is fit over 500 epochs with a batch size of 1.
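As a rough sanity check on the log loss mentioned above, the sketch below computes it by hand for a model that assigns equal probability to every class. It assumes 26 output classes (the one hot targets for labels 1..25 plus an unused column 0); the function is a simplified stand-in written for illustration, not the Keras implementation.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Mean log loss over samples.
    y_true rows are one hot, y_pred rows are class probabilities."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        total -= sum(t * math.log(p)
                     for t, p in zip(true_row, pred_row) if t > 0)
    return total / len(y_true)


num_classes = 26  # assumption: one output per letter 'A'..'Z'
# one hot targets for labels 1..25 ('B'..'Z')
y_true = [[1.0 if j == i else 0.0 for j in range(num_classes)]
          for i in range(1, 26)]
# a model that guesses uniformly over all classes
uniform = [[1.0 / num_classes] * num_classes for _ in y_true]

chance_loss = categorical_crossentropy(y_true, uniform)
print(round(chance_loss, 4))  # 3.2581, i.e. ln(26)
```

A training loss that stays close to this value suggests the network is doing no better than guessing.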

In [None]:
# Naive LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

We can see that this problem is indeed difficult for the network to learn.

The reason is, the poor LSTM units do not have any context to work with. Each input-output pattern is shown to the network in a random order and the state of the network is reset after each pattern (each batch where each batch contains one pattern).

This is an abuse of the LSTM network architecture, treating it like a standard multilayer perceptron.

Next, let’s try a different framing of the problem in order to provide more sequence to the network from which to learn.

## Naive LSTM for a Three-Char Feature Window to One-Char Mapping

A popular approach to adding more context to data for multilayer Perceptrons is to use the window method.

This is where previous steps in the sequence are provided as additional input features to the network. We can try the same trick to provide more context to the LSTM network.

Here, we increase the sequence length from 1 to 3, for example:

In [None]:
# Here, we increase the sequence length from 1 to 3, for example:

# prepare the dataset of input to output pairs encoded as integers
seq_length = 3


Each element in the sequence is then provided as a new input feature to the network. This requires a modification of how the input sequences are reshaped in the data preparation step:

In [None]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))

In [None]:
# Naive LSTM to learn three-char window to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, 1, len(pattern)))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

We can see a small lift in performance that may or may not be real. This is a simple problem that we were still not able to learn with LSTMs even with the window method.

Again, this is a misuse of the LSTM network caused by a poor framing of the problem. Indeed, the sequences of letters are time steps of one feature rather than one time step of separate features. We have given more context to the network, but not more sequence as it expects.

In the next section, we will give more context to the network in the form of time steps.

## Naive LSTM for a Three-Char Time Step Window to One-Char Mapping


The difference is that the reshaping of the input data takes the sequence as a time step sequence of one feature, rather than a single time step of multiple features.
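The two layouts can be compared side by side with plain nested lists. This is only a sketch of the resulting structure; the tutorial itself uses `numpy.reshape`, and the helper `shape_of` is a hypothetical convenience for printing the [samples, time steps, features] shape.

```python
seq_length = 3
dataX = [[0, 1, 2], [1, 2, 3]]  # 'ABC' and 'BCD' encoded as integers

# Feature window: one time step holding three features per sample.
as_features = [[window] for window in dataX]
# Time steps: three time steps holding one feature each per sample.
as_time_steps = [[[value] for value in window] for window in dataX]


def shape_of(nested):
    """Return (samples, time steps, features) for a 3-level nested list."""
    return (len(nested), len(nested[0]), len(nested[0][0]))


print(shape_of(as_features))    # (2, 1, 3)
print(shape_of(as_time_steps))  # (2, 3, 1)
print(as_features[0])           # [[0, 1, 2]]
print(as_time_steps[0])         # [[0], [1], [2]]
```

In the second layout the LSTM reads the three letters one after another, so its internal state can carry information from one letter to the next.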

In [None]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

This is the correct intended use of providing sequence context to your LSTM in Keras. The full code example is provided below for completeness.

In [None]:
# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=0)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

We can see that the model learns the problem perfectly as evidenced by the model evaluation and the example predictions.

But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any random sequence of three letters from the alphabet and predict the next letter.

It cannot actually enumerate the alphabet. I expect that a large enough multilayer perceptron network might be able to learn the same mapping using the window method.

The LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default the Keras implementation resets the network state after each training batch.

## LSTM State Within A Batch

The Keras implementation of LSTMs resets the state of the network after each batch.

This suggests that if we had a batch size large enough to hold all input patterns and if all the input patterns were ordered sequentially, that the LSTM could use the context of the sequence within the batch to better learn the sequence.

We can demonstrate this easily by modifying the first example for learning a one-to-one mapping and increasing the batch size from 1 to the size of the training dataset.

Additionally, Keras shuffles the training dataset before each training epoch. To ensure the training data patterns remain sequential, we can disable this shuffling.

In [None]:
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)

The network will learn the mapping of characters using the within-batch sequence, but this context will not be available to the network when making predictions. We can evaluate the ability of the network to make predictions both randomly and in sequence.

The full code example is provided below for completeness.

In [48]:
len(dataX)

25

In [42]:
# Naive LSTM to learn one-char to one-char mapping with all data in each batch
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=seq_length, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(16, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)
# demonstrate predicting random patterns
print("Test a Random Pattern:")
for i in range(0,20):
	pattern_index = numpy.random.randint(len(dataX))
	pattern = dataX[pattern_index]
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
Epoch 1/5000
 - 1s - loss: 3.2569 - acc: 0.0400
Epoch 2/5000
 - 0s - loss: 3.2566 - acc: 0.0400
Epoch 3/5000
 - 0s - loss: 3.2564 - acc: 0.0400
Epoch 4/5000
 - 0s - loss: 3.2561 - acc: 0.0400
Epoch 5/5000
 - 0s - loss: 3.2558 - acc: 0.0400
Epoch 6/5000
 - 0s - loss: 3.2555 - acc: 0.0400
Epoch 7/5000
 - 0s - loss: 3.2552 - acc: 0.0400
Epoch 8/5000
 - 0s - loss: 3.2549 - acc: 0.0400
Epoch 9/5000
 - 0s - loss: 3.2547 - acc: 0.0400
Epoch 10/5000
 - 0s - loss: 3.2544 - acc: 0.0400
Epoch 11/5000
 - 0s - loss: 3.2541 - acc: 0.0400
Epoch 12/5000
 - 0s - loss: 3.2538 - acc: 0.0400
Epoch 13/5000
 - 0s - loss: 3.2535 - acc: 0.0400
Epoch 14/5000
 - 0s - loss: 3.2533 - acc: 0.0400
Epoch 15/5000
 - 0s - loss: 3.2530 - acc: 0.0400
Epoch 16/5000
 - 0s - loss: 3.2527 - acc: 0.0400
Epoch 17/5000
 - 0s - loss: 3.2524 - acc: 0.0400

Epoch 164/5000
 - 0s - loss: 3.1706 - acc: 0.0800
Epoch 165/5000
 - 0s - loss: 3.1695 - acc: 0.0800
Epoch 166/5000
 - 0s - loss: 3.1685 - acc: 0.0800
Epoch 167/5000
 - 0s - loss: 3.1674 - acc: 0.0800
Epoch 168/5000
 - 0s - loss: 3.1663 - acc: 0.0800
Epoch 169/5000
 - 0s - loss: 3.1653 - acc: 0.0800
Epoch 170/5000
 - 0s - loss: 3.1642 - acc: 0.0800
Epoch 171/5000
 - 0s - loss: 3.1631 - acc: 0.0800
Epoch 172/5000
 - 0s - loss: 3.1620 - acc: 0.0800
Epoch 173/5000
 - 0s - loss: 3.1609 - acc: 0.0800
Epoch 174/5000
 - 0s - loss: 3.1598 - acc: 0.0800
Epoch 175/5000
 - 0s - loss: 3.1586 - acc: 0.0800
Epoch 176/5000
 - 0s - loss: 3.1575 - acc: 0.0800
Epoch 177/5000
 - 0s - loss: 3.1564 - acc: 0.0800
Epoch 178/5000
 - 0s - loss: 3.1552 - acc: 0.0800
Epoch 179/5000
 - 0s - loss: 3.1540 - acc: 0.0800
Epoch 180/5000
 - 0s - loss: 3.1529 - acc: 0.0800
Epoch 181/5000
 - 0s - loss: 3.1517 - acc: 0.0800
Epoch 182/5000
 - 0s - loss: 3.1505 - acc: 0.0800
Epoch 183/5000
 - 0s - loss: 3.1493 - acc: 0.0800


Epoch 328/5000
 - 0s - loss: 2.9164 - acc: 0.2400
Epoch 329/5000
 - 0s - loss: 2.9145 - acc: 0.2400
Epoch 330/5000
 - 0s - loss: 2.9127 - acc: 0.2400
Epoch 331/5000
 - 0s - loss: 2.9109 - acc: 0.2400
Epoch 332/5000
 - 0s - loss: 2.9091 - acc: 0.2400
Epoch 333/5000
 - 0s - loss: 2.9072 - acc: 0.2400
Epoch 334/5000
 - 0s - loss: 2.9054 - acc: 0.2400
Epoch 335/5000
 - 0s - loss: 2.9036 - acc: 0.2400
Epoch 336/5000
 - 0s - loss: 2.9018 - acc: 0.2400
Epoch 337/5000
 - 0s - loss: 2.9000 - acc: 0.2400
Epoch 338/5000
 - 0s - loss: 2.8981 - acc: 0.2400
Epoch 339/5000
 - 0s - loss: 2.8963 - acc: 0.2400
Epoch 340/5000
 - 0s - loss: 2.8945 - acc: 0.2400
Epoch 341/5000
 - 0s - loss: 2.8927 - acc: 0.2400
Epoch 342/5000
 - 0s - loss: 2.8909 - acc: 0.2400
Epoch 343/5000
 - 0s - loss: 2.8891 - acc: 0.2400
Epoch 344/5000
 - 0s - loss: 2.8872 - acc: 0.2400
Epoch 345/5000
 - 0s - loss: 2.8854 - acc: 0.2400
Epoch 346/5000
 - 0s - loss: 2.8836 - acc: 0.2400
Epoch 347/5000
 - 0s - loss: 2.8818 - acc: 0.2400


Epoch 492/5000
 - 0s - loss: 2.6445 - acc: 0.2400
Epoch 493/5000
 - 0s - loss: 2.6431 - acc: 0.2400
Epoch 494/5000
 - 0s - loss: 2.6417 - acc: 0.2400
Epoch 495/5000
 - 0s - loss: 2.6403 - acc: 0.2400
Epoch 496/5000
 - 0s - loss: 2.6389 - acc: 0.2400
Epoch 497/5000
 - 0s - loss: 2.6375 - acc: 0.2400
Epoch 498/5000
 - 0s - loss: 2.6360 - acc: 0.2400
Epoch 499/5000
 - 0s - loss: 2.6346 - acc: 0.2400
Epoch 500/5000
 - 0s - loss: 2.6332 - acc: 0.2400
Epoch 501/5000
 - 0s - loss: 2.6318 - acc: 0.2400
Epoch 502/5000
 - 0s - loss: 2.6304 - acc: 0.2400
Epoch 503/5000
 - 0s - loss: 2.6290 - acc: 0.2400
Epoch 504/5000
 - 0s - loss: 2.6276 - acc: 0.2400
Epoch 505/5000
 - 0s - loss: 2.6262 - acc: 0.2400
Epoch 506/5000
 - 0s - loss: 2.6249 - acc: 0.2400
Epoch 507/5000
 - 0s - loss: 2.6235 - acc: 0.2400
Epoch 508/5000
 - 0s - loss: 2.6221 - acc: 0.2400
Epoch 509/5000
 - 0s - loss: 2.6207 - acc: 0.2400
Epoch 510/5000
 - 0s - loss: 2.6193 - acc: 0.2400
Epoch 511/5000
 - 0s - loss: 2.6179 - acc: 0.2400


Epoch 656/5000
 - 0s - loss: 2.4402 - acc: 0.2000
Epoch 657/5000
 - 0s - loss: 2.4391 - acc: 0.2000
Epoch 658/5000
 - 0s - loss: 2.4380 - acc: 0.2000
Epoch 659/5000
 - 0s - loss: 2.4369 - acc: 0.2000
Epoch 660/5000
 - 0s - loss: 2.4357 - acc: 0.2000
Epoch 661/5000
 - 0s - loss: 2.4346 - acc: 0.2000
Epoch 662/5000
 - 0s - loss: 2.4335 - acc: 0.2000
Epoch 663/5000
 - 0s - loss: 2.4324 - acc: 0.2000
Epoch 664/5000
 - 0s - loss: 2.4313 - acc: 0.2000
Epoch 665/5000
 - 0s - loss: 2.4302 - acc: 0.2000
Epoch 666/5000
 - 0s - loss: 2.4291 - acc: 0.2000
Epoch 667/5000
 - 0s - loss: 2.4280 - acc: 0.2000
Epoch 668/5000
 - 0s - loss: 2.4270 - acc: 0.2000
Epoch 669/5000
 - 0s - loss: 2.4259 - acc: 0.2000
Epoch 670/5000
 - 0s - loss: 2.4248 - acc: 0.2000
Epoch 671/5000
 - 0s - loss: 2.4237 - acc: 0.2000
Epoch 672/5000
 - 0s - loss: 2.4226 - acc: 0.2000
Epoch 673/5000
 - 0s - loss: 2.4215 - acc: 0.2000
Epoch 674/5000
 - 0s - loss: 2.4204 - acc: 0.2000
Epoch 675/5000
 - 0s - loss: 2.4193 - acc: 0.2000


Epoch 820/5000
 - 0s - loss: 2.2720 - acc: 0.2800
Epoch 821/5000
 - 0s - loss: 2.2710 - acc: 0.2800
Epoch 822/5000
 - 0s - loss: 2.2701 - acc: 0.2800
Epoch 823/5000
 - 0s - loss: 2.2691 - acc: 0.2800
Epoch 824/5000
 - 0s - loss: 2.2682 - acc: 0.2800
Epoch 825/5000
 - 0s - loss: 2.2672 - acc: 0.2800
Epoch 826/5000
 - 0s - loss: 2.2663 - acc: 0.2800
Epoch 827/5000
 - 0s - loss: 2.2653 - acc: 0.2800
Epoch 828/5000
 - 0s - loss: 2.2643 - acc: 0.2800
Epoch 829/5000
 - 0s - loss: 2.2634 - acc: 0.2800
Epoch 830/5000
 - 0s - loss: 2.2625 - acc: 0.2800
Epoch 831/5000
 - 0s - loss: 2.2615 - acc: 0.2800
Epoch 832/5000
 - 0s - loss: 2.2606 - acc: 0.2800
Epoch 833/5000
 - 0s - loss: 2.2596 - acc: 0.2800
Epoch 834/5000
 - 0s - loss: 2.2587 - acc: 0.2800
Epoch 835/5000
 - 0s - loss: 2.2577 - acc: 0.2800
Epoch 836/5000
 - 0s - loss: 2.2568 - acc: 0.2800
Epoch 837/5000
 - 0s - loss: 2.2558 - acc: 0.2800
Epoch 838/5000
 - 0s - loss: 2.2549 - acc: 0.2800
Epoch 839/5000
 - 0s - loss: 2.2540 - acc: 0.2800


Epoch 984/5000
 - 0s - loss: 2.1288 - acc: 0.5600
Epoch 985/5000
 - 0s - loss: 2.1280 - acc: 0.5600
Epoch 986/5000
 - 0s - loss: 2.1272 - acc: 0.5600
Epoch 987/5000
 - 0s - loss: 2.1264 - acc: 0.5600
Epoch 988/5000
 - 0s - loss: 2.1256 - acc: 0.5600
Epoch 989/5000
 - 0s - loss: 2.1249 - acc: 0.5600
Epoch 990/5000
 - 0s - loss: 2.1241 - acc: 0.5600
Epoch 991/5000
 - 0s - loss: 2.1233 - acc: 0.5600
Epoch 992/5000
 - 0s - loss: 2.1225 - acc: 0.5600
Epoch 993/5000
 - 0s - loss: 2.1217 - acc: 0.5600
Epoch 994/5000
 - 0s - loss: 2.1210 - acc: 0.5600
Epoch 995/5000
 - 0s - loss: 2.1202 - acc: 0.5600
Epoch 996/5000
 - 0s - loss: 2.1194 - acc: 0.5600
Epoch 997/5000
 - 0s - loss: 2.1186 - acc: 0.5600
Epoch 998/5000
 - 0s - loss: 2.1179 - acc: 0.5600
Epoch 999/5000
 - 0s - loss: 2.1171 - acc: 0.5600
Epoch 1000/5000
 - 0s - loss: 2.1163 - acc: 0.5600
Epoch 1001/5000
 - 0s - loss: 2.1155 - acc: 0.5600
Epoch 1002/5000
 - 0s - loss: 2.1148 - acc: 0.5600
Epoch 1003/5000
 - 0s - loss: 2.1140 - acc: 0.5

Epoch 1145/5000
 - 0s - loss: 2.0141 - acc: 0.7200
Epoch 1146/5000
 - 0s - loss: 2.0134 - acc: 0.7200
Epoch 1147/5000
 - 0s - loss: 2.0127 - acc: 0.7200
Epoch 1148/5000
 - 0s - loss: 2.0121 - acc: 0.7200
Epoch 1149/5000
 - 0s - loss: 2.0114 - acc: 0.7200
Epoch 1150/5000
 - 0s - loss: 2.0108 - acc: 0.7200
Epoch 1151/5000
 - 0s - loss: 2.0101 - acc: 0.7200
Epoch 1152/5000
 - 0s - loss: 2.0094 - acc: 0.7200
Epoch 1153/5000
 - 0s - loss: 2.0088 - acc: 0.7200
Epoch 1154/5000
 - 0s - loss: 2.0081 - acc: 0.7200
Epoch 1155/5000
 - 0s - loss: 2.0075 - acc: 0.7200
Epoch 1156/5000
 - 0s - loss: 2.0068 - acc: 0.7200
Epoch 1157/5000
 - 0s - loss: 2.0062 - acc: 0.7200
Epoch 1158/5000
 - 0s - loss: 2.0055 - acc: 0.7200
Epoch 1159/5000
 - 0s - loss: 2.0049 - acc: 0.7200
Epoch 1160/5000
 - 0s - loss: 2.0042 - acc: 0.7200
Epoch 1161/5000
 - 0s - loss: 2.0036 - acc: 0.7200
Epoch 1162/5000
 - 0s - loss: 2.0029 - acc: 0.7200
Epoch 1163/5000
 - 0s - loss: 2.0023 - acc: 0.7200
Epoch 1164/5000
 - 0s - loss: 2

Epoch 1306/5000
 - 0s - loss: 1.9139 - acc: 0.7200
Epoch 1307/5000
 - 0s - loss: 1.9134 - acc: 0.7200
Epoch 1308/5000
 - 0s - loss: 1.9128 - acc: 0.7200
Epoch 1309/5000
 - 0s - loss: 1.9122 - acc: 0.7200
Epoch 1310/5000
 - 0s - loss: 1.9116 - acc: 0.6800
Epoch 1311/5000
 - 0s - loss: 1.9110 - acc: 0.6800
Epoch 1312/5000
 - 0s - loss: 1.9104 - acc: 0.6800
Epoch 1313/5000
 - 0s - loss: 1.9099 - acc: 0.6800
Epoch 1314/5000
 - 0s - loss: 1.9093 - acc: 0.6800
Epoch 1315/5000
 - 0s - loss: 1.9087 - acc: 0.6800
Epoch 1316/5000
 - 0s - loss: 1.9081 - acc: 0.6800
Epoch 1317/5000
 - 0s - loss: 1.9076 - acc: 0.6800
Epoch 1318/5000
 - 0s - loss: 1.9070 - acc: 0.6800
Epoch 1319/5000
 - 0s - loss: 1.9064 - acc: 0.6800
Epoch 1320/5000
 - 0s - loss: 1.9058 - acc: 0.6800
Epoch 1321/5000
 - 0s - loss: 1.9052 - acc: 0.6800
Epoch 1322/5000
 - 0s - loss: 1.9047 - acc: 0.6800
Epoch 1323/5000
 - 0s - loss: 1.9041 - acc: 0.6800
Epoch 1324/5000
 - 0s - loss: 1.9035 - acc: 0.6800
Epoch 1325/5000
 - 0s - loss: 1

Epoch 1467/5000
 - 0s - loss: 1.8244 - acc: 0.7600
Epoch 1468/5000
 - 0s - loss: 1.8239 - acc: 0.7600
Epoch 1469/5000
 - 0s - loss: 1.8233 - acc: 0.7600
Epoch 1470/5000
 - 0s - loss: 1.8228 - acc: 0.7600
Epoch 1471/5000
 - 0s - loss: 1.8223 - acc: 0.7600
Epoch 1472/5000
 - 0s - loss: 1.8218 - acc: 0.7600
Epoch 1473/5000
 - 0s - loss: 1.8213 - acc: 0.7600
Epoch 1474/5000
 - 0s - loss: 1.8208 - acc: 0.7600
Epoch 1475/5000
 - 0s - loss: 1.8202 - acc: 0.7600
Epoch 1476/5000
 - 0s - loss: 1.8197 - acc: 0.7600
Epoch 1477/5000
 - 0s - loss: 1.8192 - acc: 0.7600
Epoch 1478/5000
 - 0s - loss: 1.8187 - acc: 0.7600
Epoch 1479/5000
 - 0s - loss: 1.8182 - acc: 0.7600
Epoch 1480/5000
 - 0s - loss: 1.8177 - acc: 0.7600
Epoch 1481/5000
 - 0s - loss: 1.8172 - acc: 0.7600
Epoch 1482/5000
 - 0s - loss: 1.8166 - acc: 0.7600
Epoch 1483/5000
 - 0s - loss: 1.8161 - acc: 0.7600
Epoch 1484/5000
 - 0s - loss: 1.8156 - acc: 0.7600
Epoch 1485/5000
 - 0s - loss: 1.8151 - acc: 0.7600
Epoch 1486/5000
 - 0s - loss: 1

Epoch 1628/5000
 - 0s - loss: 1.7461 - acc: 0.8400
Epoch 1629/5000
 - 0s - loss: 1.7456 - acc: 0.8400
Epoch 1630/5000
 - 0s - loss: 1.7452 - acc: 0.8400
Epoch 1631/5000
 - 0s - loss: 1.7447 - acc: 0.8400
Epoch 1632/5000
 - 0s - loss: 1.7443 - acc: 0.8400
Epoch 1633/5000
 - 0s - loss: 1.7438 - acc: 0.8400
Epoch 1634/5000
 - 0s - loss: 1.7434 - acc: 0.8400
Epoch 1635/5000
 - 0s - loss: 1.7430 - acc: 0.8400
Epoch 1636/5000
 - 0s - loss: 1.7425 - acc: 0.8400
Epoch 1637/5000
 - 0s - loss: 1.7421 - acc: 0.8400
Epoch 1638/5000
 - 0s - loss: 1.7416 - acc: 0.8400
Epoch 1639/5000
 - 0s - loss: 1.7412 - acc: 0.8400
Epoch 1640/5000
 - 0s - loss: 1.7407 - acc: 0.8400
Epoch 1641/5000
 - 0s - loss: 1.7403 - acc: 0.8400
Epoch 1642/5000
 - 0s - loss: 1.7398 - acc: 0.8400
Epoch 1643/5000
 - 0s - loss: 1.7394 - acc: 0.8400
Epoch 1644/5000
 - 0s - loss: 1.7389 - acc: 0.8400
Epoch 1645/5000
 - 0s - loss: 1.7385 - acc: 0.8400
Epoch 1646/5000
 - 0s - loss: 1.7380 - acc: 0.8400
Epoch 1647/5000
 - 0s - loss: 1

Epoch 1789/5000
 - 0s - loss: 1.6787 - acc: 0.8400
Epoch 1790/5000
 - 0s - loss: 1.6783 - acc: 0.8400
Epoch 1791/5000
 - 0s - loss: 1.6779 - acc: 0.8400
Epoch 1792/5000
 - 0s - loss: 1.6775 - acc: 0.8400
Epoch 1793/5000
 - 0s - loss: 1.6771 - acc: 0.8400
Epoch 1794/5000
 - 0s - loss: 1.6767 - acc: 0.8400
Epoch 1795/5000
 - 0s - loss: 1.6763 - acc: 0.8400
Epoch 1796/5000
 - 0s - loss: 1.6760 - acc: 0.8400
Epoch 1797/5000
 - 0s - loss: 1.6756 - acc: 0.8400
Epoch 1798/5000
 - 0s - loss: 1.6752 - acc: 0.8400
Epoch 1799/5000
 - 0s - loss: 1.6748 - acc: 0.8400
Epoch 1800/5000
 - 0s - loss: 1.6744 - acc: 0.8400
Epoch 1801/5000
 - 0s - loss: 1.6740 - acc: 0.8400
Epoch 1802/5000
 - 0s - loss: 1.6736 - acc: 0.8400
Epoch 1803/5000
 - 0s - loss: 1.6732 - acc: 0.8400
Epoch 1804/5000
 - 0s - loss: 1.6729 - acc: 0.8400
Epoch 1805/5000
 - 0s - loss: 1.6725 - acc: 0.8400
Epoch 1806/5000
 - 0s - loss: 1.6721 - acc: 0.8400
Epoch 1807/5000
 - 0s - loss: 1.6717 - acc: 0.8400
Epoch 1808/5000
 - 0s - loss: 1

Epoch 1950/5000
 - 0s - loss: 1.6187 - acc: 0.8400
Epoch 1951/5000
 - 0s - loss: 1.6183 - acc: 0.8400
Epoch 1952/5000
 - 0s - loss: 1.6180 - acc: 0.8400
Epoch 1953/5000
 - 0s - loss: 1.6176 - acc: 0.8400
Epoch 1954/5000
 - 0s - loss: 1.6173 - acc: 0.8400
Epoch 1955/5000
 - 0s - loss: 1.6169 - acc: 0.8400
Epoch 1956/5000
 - 0s - loss: 1.6166 - acc: 0.8400
Epoch 1957/5000
 - 0s - loss: 1.6162 - acc: 0.8400
Epoch 1958/5000
 - 0s - loss: 1.6159 - acc: 0.8400
Epoch 1959/5000
 - 0s - loss: 1.6155 - acc: 0.8400
Epoch 1960/5000
 - 0s - loss: 1.6151 - acc: 0.8400
Epoch 1961/5000
 - 0s - loss: 1.6148 - acc: 0.8400
Epoch 1962/5000
 - 0s - loss: 1.6144 - acc: 0.8400
Epoch 1963/5000
 - 0s - loss: 1.6141 - acc: 0.8400
Epoch 1964/5000
 - 0s - loss: 1.6137 - acc: 0.8400
Epoch 1965/5000
 - 0s - loss: 1.6134 - acc: 0.8400
Epoch 1966/5000
 - 0s - loss: 1.6130 - acc: 0.8400
Epoch 1967/5000
 - 0s - loss: 1.6127 - acc: 0.8400
Epoch 1968/5000
 - 0s - loss: 1.6123 - acc: 0.8400
Epoch 1969/5000
 - 0s - loss: 1

Epoch 2111/5000
 - 0s - loss: 1.5632 - acc: 0.8400
Epoch 2112/5000
 - 0s - loss: 1.5628 - acc: 0.8400
Epoch 2113/5000
 - 0s - loss: 1.5625 - acc: 0.8400
Epoch 2114/5000
 - 0s - loss: 1.5622 - acc: 0.8400
Epoch 2115/5000
 - 0s - loss: 1.5618 - acc: 0.8400
Epoch 2116/5000
 - 0s - loss: 1.5615 - acc: 0.8400
Epoch 2117/5000
 - 0s - loss: 1.5612 - acc: 0.8400
Epoch 2118/5000
 - 0s - loss: 1.5608 - acc: 0.8400
Epoch 2119/5000
 - 0s - loss: 1.5605 - acc: 0.8400
Epoch 2120/5000
 - 0s - loss: 1.5602 - acc: 0.8400
Epoch 2121/5000
 - 0s - loss: 1.5598 - acc: 0.8400
Epoch 2122/5000
 - 0s - loss: 1.5595 - acc: 0.8400
Epoch 2123/5000
 - 0s - loss: 1.5592 - acc: 0.8400
Epoch 2124/5000
 - 0s - loss: 1.5588 - acc: 0.8400
Epoch 2125/5000
 - 0s - loss: 1.5585 - acc: 0.8400
Epoch 2126/5000
 - 0s - loss: 1.5582 - acc: 0.8400
Epoch 2127/5000
 - 0s - loss: 1.5579 - acc: 0.8400
Epoch 2128/5000
 - 0s - loss: 1.5575 - acc: 0.8400
Epoch 2129/5000
 - 0s - loss: 1.5572 - acc: 0.8400
Epoch 2130/5000
 - 0s - loss: 1

Epoch 2272/5000
 - 0s - loss: 1.5114 - acc: 0.8800
Epoch 2273/5000
 - 0s - loss: 1.5111 - acc: 0.8800
Epoch 2274/5000
 - 0s - loss: 1.5108 - acc: 0.8800
Epoch 2275/5000
 - 0s - loss: 1.5105 - acc: 0.8800
Epoch 2276/5000
 - 0s - loss: 1.5101 - acc: 0.8800
Epoch 2277/5000
 - 0s - loss: 1.5098 - acc: 0.8800
Epoch 2278/5000
 - 0s - loss: 1.5095 - acc: 0.8800
Epoch 2279/5000
 - 0s - loss: 1.5092 - acc: 0.8800
Epoch 2280/5000
 - 0s - loss: 1.5089 - acc: 0.8800
Epoch 2281/5000
 - 0s - loss: 1.5086 - acc: 0.8800
Epoch 2282/5000
 - 0s - loss: 1.5083 - acc: 0.8800
Epoch 2283/5000
 - 0s - loss: 1.5080 - acc: 0.8800
Epoch 2284/5000
 - 0s - loss: 1.5077 - acc: 0.8800
Epoch 2285/5000
 - 0s - loss: 1.5074 - acc: 0.8800
Epoch 2286/5000
 - 0s - loss: 1.5071 - acc: 0.8800
Epoch 2287/5000
 - 0s - loss: 1.5068 - acc: 0.8800
Epoch 2288/5000
 - 0s - loss: 1.5065 - acc: 0.8800
Epoch 2289/5000
 - 0s - loss: 1.5062 - acc: 0.8800
Epoch 2290/5000
 - 0s - loss: 1.5059 - acc: 0.8800
Epoch 2291/5000
 - 0s - loss: 1

Epoch 2433/5000
 - 0s - loss: 1.4641 - acc: 0.9200
Epoch 2434/5000
 - 0s - loss: 1.4638 - acc: 0.9200
Epoch 2435/5000
 - 0s - loss: 1.4635 - acc: 0.9200
Epoch 2436/5000
 - 0s - loss: 1.4632 - acc: 0.9200
Epoch 2437/5000
 - 0s - loss: 1.4630 - acc: 0.9200
Epoch 2438/5000
 - 0s - loss: 1.4627 - acc: 0.9200
Epoch 2439/5000
 - 0s - loss: 1.4624 - acc: 0.9200
Epoch 2440/5000
 - 0s - loss: 1.4621 - acc: 0.9200
Epoch 2441/5000
 - 0s - loss: 1.4618 - acc: 0.9200
Epoch 2442/5000
 - 0s - loss: 1.4616 - acc: 0.9200
Epoch 2443/5000
 - 0s - loss: 1.4613 - acc: 0.9200
Epoch 2444/5000
 - 0s - loss: 1.4610 - acc: 0.9200
Epoch 2445/5000
 - 0s - loss: 1.4607 - acc: 0.9200
Epoch 2446/5000
 - 0s - loss: 1.4604 - acc: 0.9200
Epoch 2447/5000
 - 0s - loss: 1.4602 - acc: 0.9200
Epoch 2448/5000
 - 0s - loss: 1.4599 - acc: 0.9200
Epoch 2449/5000
 - 0s - loss: 1.4596 - acc: 0.9200
Epoch 2450/5000
 - 0s - loss: 1.4593 - acc: 0.9200
Epoch 2451/5000
 - 0s - loss: 1.4590 - acc: 0.9200
Epoch 2452/5000
 - 0s - loss: 1

Epoch 2594/5000
 - 0s - loss: 1.4203 - acc: 0.9200
Epoch 2595/5000
 - 0s - loss: 1.4200 - acc: 0.9200
Epoch 2596/5000
 - 0s - loss: 1.4198 - acc: 0.9200
Epoch 2597/5000
 - 0s - loss: 1.4195 - acc: 0.9200
Epoch 2598/5000
 - 0s - loss: 1.4192 - acc: 0.9200
Epoch 2599/5000
 - 0s - loss: 1.4190 - acc: 0.9200
Epoch 2600/5000
 - 0s - loss: 1.4187 - acc: 0.9200
Epoch 2601/5000
 - 0s - loss: 1.4184 - acc: 0.9200
Epoch 2602/5000
 - 0s - loss: 1.4182 - acc: 0.9200
Epoch 2603/5000
 - 0s - loss: 1.4179 - acc: 0.9200
Epoch 2604/5000
 - 0s - loss: 1.4176 - acc: 0.9200
Epoch 2605/5000
 - 0s - loss: 1.4174 - acc: 0.9200
Epoch 2606/5000
 - 0s - loss: 1.4171 - acc: 0.9200
Epoch 2607/5000
 - 0s - loss: 1.4169 - acc: 0.9200
Epoch 2608/5000
 - 0s - loss: 1.4166 - acc: 0.9200
Epoch 2609/5000
 - 0s - loss: 1.4163 - acc: 0.9200
Epoch 2610/5000
 - 0s - loss: 1.4161 - acc: 0.9200
Epoch 2611/5000
 - 0s - loss: 1.4158 - acc: 0.9200
Epoch 2612/5000
 - 0s - loss: 1.4155 - acc: 0.9200
Epoch 2613/5000
 - 0s - loss: 1

Epoch 2755/5000
 - 0s - loss: 1.3782 - acc: 0.9200
Epoch 2756/5000
 - 0s - loss: 1.3780 - acc: 0.9200
Epoch 2757/5000
 - 0s - loss: 1.3777 - acc: 0.9200
Epoch 2758/5000
 - 0s - loss: 1.3775 - acc: 0.9200
Epoch 2759/5000
 - 0s - loss: 1.3772 - acc: 0.9200
Epoch 2760/5000
 - 0s - loss: 1.3770 - acc: 0.9200
Epoch 2761/5000
 - 0s - loss: 1.3767 - acc: 0.9200
Epoch 2762/5000
 - 0s - loss: 1.3764 - acc: 0.9200
Epoch 2763/5000
 - 0s - loss: 1.3762 - acc: 0.9200
Epoch 2764/5000
 - 0s - loss: 1.3759 - acc: 0.9200
Epoch 2765/5000
 - 0s - loss: 1.3757 - acc: 0.9200
Epoch 2766/5000
 - 0s - loss: 1.3754 - acc: 0.9200
Epoch 2767/5000
 - 0s - loss: 1.3752 - acc: 0.9200
Epoch 2768/5000
 - 0s - loss: 1.3749 - acc: 0.9200
Epoch 2769/5000
 - 0s - loss: 1.3747 - acc: 0.9200
Epoch 2770/5000
 - 0s - loss: 1.3744 - acc: 0.9200
Epoch 2771/5000
 - 0s - loss: 1.3742 - acc: 0.9200
Epoch 2772/5000
 - 0s - loss: 1.3739 - acc: 0.9200
Epoch 2773/5000
 - 0s - loss: 1.3736 - acc: 0.9200
Epoch 2774/5000
 - 0s - loss: 1

Epoch 2916/5000
 - 0s - loss: 1.3380 - acc: 0.9200
Epoch 2917/5000
 - 0s - loss: 1.3377 - acc: 0.9200
Epoch 2918/5000
 - 0s - loss: 1.3375 - acc: 0.9200
Epoch 2919/5000
 - 0s - loss: 1.3372 - acc: 0.9200
Epoch 2920/5000
 - 0s - loss: 1.3370 - acc: 0.9200
Epoch 2921/5000
 - 0s - loss: 1.3367 - acc: 0.9200
Epoch 2922/5000
 - 0s - loss: 1.3365 - acc: 0.9200
Epoch 2923/5000
 - 0s - loss: 1.3362 - acc: 0.9200
Epoch 2924/5000
 - 0s - loss: 1.3360 - acc: 0.9200
Epoch 2925/5000
 - 0s - loss: 1.3358 - acc: 0.9200
Epoch 2926/5000
 - 0s - loss: 1.3355 - acc: 0.9200
Epoch 2927/5000
 - 0s - loss: 1.3353 - acc: 0.9200
Epoch 2928/5000
 - 0s - loss: 1.3350 - acc: 0.9200
Epoch 2929/5000
 - 0s - loss: 1.3348 - acc: 0.9200
Epoch 2930/5000
 - 0s - loss: 1.3345 - acc: 0.9200
Epoch 2931/5000
 - 0s - loss: 1.3343 - acc: 0.9200
Epoch 2932/5000
 - 0s - loss: 1.3340 - acc: 0.9200
Epoch 2933/5000
 - 0s - loss: 1.3338 - acc: 0.9200
Epoch 2934/5000
 - 0s - loss: 1.3335 - acc: 0.9200
Epoch 2935/5000
 - 0s - loss: 1

Epoch 3077/5000
 - 0s - loss: 1.2988 - acc: 0.9200
Epoch 3078/5000
 - 0s - loss: 1.2985 - acc: 0.9200
Epoch 3079/5000
 - 0s - loss: 1.2983 - acc: 0.9200
Epoch 3080/5000
 - 0s - loss: 1.2981 - acc: 0.9200
Epoch 3081/5000
 - 0s - loss: 1.2978 - acc: 0.9200
Epoch 3082/5000
 - 0s - loss: 1.2976 - acc: 0.9200
Epoch 3083/5000
 - 0s - loss: 1.2973 - acc: 0.9200
Epoch 3084/5000
 - 0s - loss: 1.2971 - acc: 0.9200
Epoch 3085/5000
 - 0s - loss: 1.2969 - acc: 0.9200
Epoch 3086/5000
 - 0s - loss: 1.2966 - acc: 0.9200
Epoch 3087/5000
 - 0s - loss: 1.2964 - acc: 0.9200
Epoch 3088/5000
 - 0s - loss: 1.2961 - acc: 0.9200
Epoch 3089/5000
 - 0s - loss: 1.2959 - acc: 0.9200
Epoch 3090/5000
 - 0s - loss: 1.2957 - acc: 0.9200
Epoch 3091/5000
 - 0s - loss: 1.2954 - acc: 0.9200
Epoch 3092/5000
 - 0s - loss: 1.2952 - acc: 0.9200
Epoch 3093/5000
 - 0s - loss: 1.2950 - acc: 0.9200
Epoch 3094/5000
 - 0s - loss: 1.2947 - acc: 0.9200
Epoch 3095/5000
 - 0s - loss: 1.2945 - acc: 0.9200
Epoch 3096/5000
 - 0s - loss: 1

Epoch 3238/5000
 - 0s - loss: 1.2608 - acc: 0.9200
Epoch 3239/5000
 - 0s - loss: 1.2606 - acc: 0.9200
Epoch 3240/5000
 - 0s - loss: 1.2604 - acc: 0.9200
Epoch 3241/5000
 - 0s - loss: 1.2601 - acc: 0.9200
Epoch 3242/5000
 - 0s - loss: 1.2599 - acc: 0.9200
Epoch 3243/5000
 - 0s - loss: 1.2597 - acc: 0.9200
Epoch 3244/5000
 - 0s - loss: 1.2594 - acc: 0.9200
Epoch 3245/5000
 - 0s - loss: 1.2592 - acc: 0.9200
Epoch 3246/5000
 - 0s - loss: 1.2590 - acc: 0.9200
Epoch 3247/5000
 - 0s - loss: 1.2587 - acc: 0.9200
Epoch 3248/5000
 - 0s - loss: 1.2585 - acc: 0.9200
Epoch 3249/5000
 - 0s - loss: 1.2583 - acc: 0.9200
Epoch 3250/5000
 - 0s - loss: 1.2580 - acc: 0.9200
Epoch 3251/5000
 - 0s - loss: 1.2578 - acc: 0.9200
Epoch 3252/5000
 - 0s - loss: 1.2576 - acc: 0.9200
Epoch 3253/5000
 - 0s - loss: 1.2573 - acc: 0.9200
Epoch 3254/5000
 - 0s - loss: 1.2571 - acc: 0.9200
Epoch 3255/5000
 - 0s - loss: 1.2569 - acc: 0.9200
Epoch 3256/5000
 - 0s - loss: 1.2567 - acc: 0.9200
Epoch 3257/5000
 - 0s - loss: 1

Epoch 3399/5000
 - 0s - loss: 1.2238 - acc: 0.9200
Epoch 3400/5000
 - 0s - loss: 1.2235 - acc: 0.9200
Epoch 3401/5000
 - 0s - loss: 1.2233 - acc: 0.9200
Epoch 3402/5000
 - 0s - loss: 1.2231 - acc: 0.9200
Epoch 3403/5000
 - 0s - loss: 1.2228 - acc: 0.9200
Epoch 3404/5000
 - 0s - loss: 1.2226 - acc: 0.9200
Epoch 3405/5000
 - 0s - loss: 1.2224 - acc: 0.9200
Epoch 3406/5000
 - 0s - loss: 1.2222 - acc: 0.9200
Epoch 3407/5000
 - 0s - loss: 1.2219 - acc: 0.9200
Epoch 3408/5000
 - 0s - loss: 1.2217 - acc: 0.9200
Epoch 3409/5000
 - 0s - loss: 1.2215 - acc: 0.9200
Epoch 3410/5000
 - 0s - loss: 1.2213 - acc: 0.9200
Epoch 3411/5000
 - 0s - loss: 1.2210 - acc: 0.9200
Epoch 3412/5000
 - 0s - loss: 1.2208 - acc: 0.9200
Epoch 3413/5000
 - 0s - loss: 1.2206 - acc: 0.9200
Epoch 3414/5000
 - 0s - loss: 1.2203 - acc: 0.9200
Epoch 3415/5000
 - 0s - loss: 1.2201 - acc: 0.9200
Epoch 3416/5000
 - 0s - loss: 1.2199 - acc: 0.9200
Epoch 3417/5000
 - 0s - loss: 1.2197 - acc: 0.9200
Epoch 3418/5000
 - 0s - loss: 1

Epoch 3560/5000
 - 0s - loss: 1.1875 - acc: 0.9200
Epoch 3561/5000
 - 0s - loss: 1.1873 - acc: 0.9200
Epoch 3562/5000
 - 0s - loss: 1.1871 - acc: 0.9200
Epoch 3563/5000
 - 0s - loss: 1.1869 - acc: 0.9200
Epoch 3564/5000
 - 0s - loss: 1.1866 - acc: 0.9200
Epoch 3565/5000
 - 0s - loss: 1.1864 - acc: 0.9200
Epoch 3566/5000
 - 0s - loss: 1.1862 - acc: 0.9200
Epoch 3567/5000
 - 0s - loss: 1.1860 - acc: 0.9200
Epoch 3568/5000
 - 0s - loss: 1.1858 - acc: 0.9200
Epoch 3569/5000
 - 0s - loss: 1.1855 - acc: 0.9200
Epoch 3570/5000
 - 0s - loss: 1.1853 - acc: 0.9200
Epoch 3571/5000
 - 0s - loss: 1.1851 - acc: 0.9200
Epoch 3572/5000
 - 0s - loss: 1.1849 - acc: 0.9200
Epoch 3573/5000
 - 0s - loss: 1.1846 - acc: 0.9200
Epoch 3574/5000
 - 0s - loss: 1.1844 - acc: 0.9200
Epoch 3575/5000
 - 0s - loss: 1.1842 - acc: 0.9200
Epoch 3576/5000
 - 0s - loss: 1.1840 - acc: 0.9200
Epoch 3577/5000
 - 0s - loss: 1.1838 - acc: 0.9200
Epoch 3578/5000
 - 0s - loss: 1.1835 - acc: 0.9200
Epoch 3579/5000
 - 0s - loss: 1

Epoch 3721/5000
 - 0s - loss: 1.1522 - acc: 0.9600
Epoch 3722/5000
 - 0s - loss: 1.1519 - acc: 0.9600
Epoch 3723/5000
 - 0s - loss: 1.1517 - acc: 0.9600
Epoch 3724/5000
 - 0s - loss: 1.1515 - acc: 0.9600
Epoch 3725/5000
 - 0s - loss: 1.1513 - acc: 0.9600
Epoch 3726/5000
 - 0s - loss: 1.1511 - acc: 0.9600
Epoch 3727/5000
 - 0s - loss: 1.1508 - acc: 0.9600
Epoch 3728/5000
 - 0s - loss: 1.1506 - acc: 0.9600
Epoch 3729/5000
 - 0s - loss: 1.1504 - acc: 0.9600
Epoch 3730/5000
 - 0s - loss: 1.1502 - acc: 0.9600
Epoch 3731/5000
 - 0s - loss: 1.1500 - acc: 0.9600
Epoch 3732/5000
 - 0s - loss: 1.1498 - acc: 0.9600
Epoch 3733/5000
 - 0s - loss: 1.1496 - acc: 0.9600
Epoch 3734/5000
 - 0s - loss: 1.1493 - acc: 0.9600
Epoch 3735/5000
 - 0s - loss: 1.1491 - acc: 0.9600
Epoch 3736/5000
 - 0s - loss: 1.1489 - acc: 0.9600
Epoch 3737/5000
 - 0s - loss: 1.1487 - acc: 0.9600
Epoch 3738/5000
 - 0s - loss: 1.1485 - acc: 0.9600
Epoch 3739/5000
 - 0s - loss: 1.1482 - acc: 0.9600
Epoch 3740/5000
 - 0s - loss: 1

Epoch 3882/5000
 - 0s - loss: 1.1176 - acc: 0.9600
Epoch 3883/5000
 - 0s - loss: 1.1174 - acc: 0.9600
Epoch 3884/5000
 - 0s - loss: 1.1172 - acc: 0.9600
Epoch 3885/5000
 - 0s - loss: 1.1170 - acc: 0.9600
Epoch 3886/5000
 - 0s - loss: 1.1168 - acc: 0.9600
Epoch 3887/5000
 - 0s - loss: 1.1166 - acc: 0.9600
Epoch 3888/5000
 - 0s - loss: 1.1164 - acc: 0.9600
Epoch 3889/5000
 - 0s - loss: 1.1161 - acc: 0.9600
Epoch 3890/5000
 - 0s - loss: 1.1159 - acc: 0.9600
Epoch 3891/5000
 - 0s - loss: 1.1157 - acc: 0.9600
Epoch 3892/5000
 - 0s - loss: 1.1155 - acc: 0.9600
Epoch 3893/5000
 - 0s - loss: 1.1153 - acc: 0.9600
Epoch 3894/5000
 - 0s - loss: 1.1151 - acc: 0.9600
Epoch 3895/5000
 - 0s - loss: 1.1149 - acc: 0.9600
Epoch 3896/5000
 - 0s - loss: 1.1147 - acc: 0.9600
Epoch 3897/5000
 - 0s - loss: 1.1144 - acc: 0.9600
Epoch 3898/5000
 - 0s - loss: 1.1142 - acc: 0.9600
Epoch 3899/5000
 - 0s - loss: 1.1140 - acc: 0.9600
Epoch 3900/5000
 - 0s - loss: 1.1138 - acc: 0.9600
Epoch 3901/5000
 - 0s - loss: 1

Epoch 4043/5000
 - 0s - loss: 1.0839 - acc: 0.9600
Epoch 4044/5000
 - 0s - loss: 1.0837 - acc: 0.9600
Epoch 4045/5000
 - 0s - loss: 1.0835 - acc: 0.9600
Epoch 4046/5000
 - 0s - loss: 1.0833 - acc: 0.9600
Epoch 4047/5000
 - 0s - loss: 1.0831 - acc: 0.9600
Epoch 4048/5000
 - 0s - loss: 1.0829 - acc: 0.9600
Epoch 4049/5000
 - 0s - loss: 1.0827 - acc: 0.9600
Epoch 4050/5000
 - 0s - loss: 1.0825 - acc: 0.9600
Epoch 4051/5000
 - 0s - loss: 1.0823 - acc: 0.9600
Epoch 4052/5000
 - 0s - loss: 1.0821 - acc: 0.9600
Epoch 4053/5000
 - 0s - loss: 1.0819 - acc: 0.9600
Epoch 4054/5000
 - 0s - loss: 1.0817 - acc: 0.9600
Epoch 4055/5000
 - 0s - loss: 1.0815 - acc: 0.9600
Epoch 4056/5000
 - 0s - loss: 1.0812 - acc: 0.9600
Epoch 4057/5000
 - 0s - loss: 1.0810 - acc: 0.9600
Epoch 4058/5000
 - 0s - loss: 1.0808 - acc: 0.9600
Epoch 4059/5000
 - 0s - loss: 1.0806 - acc: 0.9600
Epoch 4060/5000
 - 0s - loss: 1.0804 - acc: 0.9600
Epoch 4061/5000
 - 0s - loss: 1.0802 - acc: 0.9600
Epoch 4062/5000
 - 0s - loss: 1

Epoch 4204/5000
 - 0s - loss: 1.0511 - acc: 0.9600
Epoch 4205/5000
 - 0s - loss: 1.0509 - acc: 0.9600
Epoch 4206/5000
 - 0s - loss: 1.0507 - acc: 0.9600
Epoch 4207/5000
 - 0s - loss: 1.0505 - acc: 0.9600
Epoch 4208/5000
 - 0s - loss: 1.0503 - acc: 0.9600
Epoch 4209/5000
 - 0s - loss: 1.0501 - acc: 0.9600
Epoch 4210/5000
 - 0s - loss: 1.0499 - acc: 0.9600
Epoch 4211/5000
 - 0s - loss: 1.0497 - acc: 0.9600
Epoch 4212/5000
 - 0s - loss: 1.0495 - acc: 0.9600
Epoch 4213/5000
 - 0s - loss: 1.0493 - acc: 0.9600
Epoch 4214/5000
 - 0s - loss: 1.0491 - acc: 0.9600
Epoch 4215/5000
 - 0s - loss: 1.0489 - acc: 0.9600
Epoch 4216/5000
 - 0s - loss: 1.0487 - acc: 0.9600
Epoch 4217/5000
 - 0s - loss: 1.0485 - acc: 0.9600
Epoch 4218/5000
 - 0s - loss: 1.0483 - acc: 0.9600
Epoch 4219/5000
 - 0s - loss: 1.0481 - acc: 0.9600
Epoch 4220/5000
 - 0s - loss: 1.0479 - acc: 0.9600
Epoch 4221/5000
 - 0s - loss: 1.0477 - acc: 0.9600
Epoch 4222/5000
 - 0s - loss: 1.0475 - acc: 0.9600
Epoch 4223/5000
 - 0s - loss: 1

Epoch 4365/5000
 - 0s - loss: 1.0190 - acc: 1.0000
Epoch 4366/5000
 - 0s - loss: 1.0188 - acc: 1.0000
Epoch 4367/5000
 - 0s - loss: 1.0186 - acc: 1.0000
Epoch 4368/5000
 - 0s - loss: 1.0185 - acc: 1.0000
Epoch 4369/5000
 - 0s - loss: 1.0183 - acc: 1.0000
Epoch 4370/5000
 - 0s - loss: 1.0181 - acc: 1.0000
Epoch 4371/5000
 - 0s - loss: 1.0179 - acc: 1.0000
Epoch 4372/5000
 - 0s - loss: 1.0177 - acc: 1.0000
Epoch 4373/5000
 - 0s - loss: 1.0175 - acc: 1.0000
Epoch 4374/5000
 - 0s - loss: 1.0173 - acc: 1.0000
Epoch 4375/5000
 - 0s - loss: 1.0171 - acc: 1.0000
Epoch 4376/5000
 - 0s - loss: 1.0169 - acc: 1.0000
Epoch 4377/5000
 - 0s - loss: 1.0167 - acc: 1.0000
Epoch 4378/5000
 - 0s - loss: 1.0165 - acc: 1.0000
Epoch 4379/5000
 - 0s - loss: 1.0163 - acc: 1.0000
Epoch 4380/5000
 - 0s - loss: 1.0161 - acc: 1.0000
Epoch 4381/5000
 - 0s - loss: 1.0159 - acc: 1.0000
Epoch 4382/5000
 - 0s - loss: 1.0157 - acc: 1.0000
Epoch 4383/5000
 - 0s - loss: 1.0155 - acc: 1.0000
Epoch 4384/5000
 - 0s - loss: 1

Epoch 4526/5000
 - 0s - loss: 0.9878 - acc: 1.0000
Epoch 4527/5000
 - 0s - loss: 0.9876 - acc: 1.0000
Epoch 4528/5000
 - 0s - loss: 0.9874 - acc: 1.0000
Epoch 4529/5000
 - 0s - loss: 0.9872 - acc: 1.0000
Epoch 4530/5000
 - 0s - loss: 0.9871 - acc: 1.0000
Epoch 4531/5000
 - 0s - loss: 0.9869 - acc: 1.0000
Epoch 4532/5000
 - 0s - loss: 0.9867 - acc: 1.0000
Epoch 4533/5000
 - 0s - loss: 0.9865 - acc: 1.0000
Epoch 4534/5000
 - 0s - loss: 0.9863 - acc: 1.0000
Epoch 4535/5000
 - 0s - loss: 0.9861 - acc: 1.0000
Epoch 4536/5000
 - 0s - loss: 0.9859 - acc: 1.0000
Epoch 4537/5000
 - 0s - loss: 0.9857 - acc: 1.0000
Epoch 4538/5000
 - 0s - loss: 0.9855 - acc: 1.0000
Epoch 4539/5000
 - 0s - loss: 0.9853 - acc: 1.0000
Epoch 4540/5000
 - 0s - loss: 0.9851 - acc: 1.0000
Epoch 4541/5000
 - 0s - loss: 0.9849 - acc: 1.0000
Epoch 4542/5000
 - 0s - loss: 0.9848 - acc: 1.0000
Epoch 4543/5000
 - 0s - loss: 0.9846 - acc: 1.0000
Epoch 4544/5000
 - 0s - loss: 0.9844 - acc: 1.0000
Epoch 4545/5000
 - 0s - loss: 0

Epoch 4687/5000
 - 0s - loss: 0.9574 - acc: 1.0000
Epoch 4688/5000
 - 0s - loss: 0.9572 - acc: 1.0000
Epoch 4689/5000
 - 0s - loss: 0.9570 - acc: 1.0000
Epoch 4690/5000
 - 0s - loss: 0.9568 - acc: 1.0000
Epoch 4691/5000
 - 0s - loss: 0.9566 - acc: 1.0000
Epoch 4692/5000
 - 0s - loss: 0.9565 - acc: 1.0000
Epoch 4693/5000
 - 0s - loss: 0.9563 - acc: 1.0000
Epoch 4694/5000
 - 0s - loss: 0.9561 - acc: 1.0000
Epoch 4695/5000
 - 0s - loss: 0.9559 - acc: 1.0000
Epoch 4696/5000
 - 0s - loss: 0.9557 - acc: 1.0000
Epoch 4697/5000
 - 0s - loss: 0.9555 - acc: 1.0000
Epoch 4698/5000
 - 0s - loss: 0.9553 - acc: 1.0000
Epoch 4699/5000
 - 0s - loss: 0.9551 - acc: 1.0000
Epoch 4700/5000
 - 0s - loss: 0.9550 - acc: 1.0000
Epoch 4701/5000
 - 0s - loss: 0.9548 - acc: 1.0000
Epoch 4702/5000
 - 0s - loss: 0.9546 - acc: 1.0000
Epoch 4703/5000
 - 0s - loss: 0.9544 - acc: 1.0000
Epoch 4704/5000
 - 0s - loss: 0.9542 - acc: 1.0000
Epoch 4705/5000
 - 0s - loss: 0.9540 - acc: 1.0000
Epoch 4706/5000
 - 0s - loss: 0

Epoch 4848/5000
 - 0s - loss: 0.9277 - acc: 1.0000
Epoch 4849/5000
 - 0s - loss: 0.9275 - acc: 1.0000
Epoch 4850/5000
 - 0s - loss: 0.9274 - acc: 1.0000
Epoch 4851/5000
 - 0s - loss: 0.9272 - acc: 1.0000
Epoch 4852/5000
 - 0s - loss: 0.9270 - acc: 1.0000
Epoch 4853/5000
 - 0s - loss: 0.9268 - acc: 1.0000
Epoch 4854/5000
 - 0s - loss: 0.9266 - acc: 1.0000
Epoch 4855/5000
 - 0s - loss: 0.9265 - acc: 1.0000
Epoch 4856/5000
 - 0s - loss: 0.9263 - acc: 1.0000
Epoch 4857/5000
 - 0s - loss: 0.9261 - acc: 1.0000
Epoch 4858/5000
 - 0s - loss: 0.9259 - acc: 1.0000
Epoch 4859/5000
 - 0s - loss: 0.9257 - acc: 1.0000
Epoch 4860/5000
 - 0s - loss: 0.9255 - acc: 1.0000
Epoch 4861/5000
 - 0s - loss: 0.9254 - acc: 1.0000
Epoch 4862/5000
 - 0s - loss: 0.9252 - acc: 1.0000
Epoch 4863/5000
 - 0s - loss: 0.9250 - acc: 1.0000
Epoch 4864/5000
 - 0s - loss: 0.9248 - acc: 1.0000
Epoch 4865/5000
 - 0s - loss: 0.9246 - acc: 1.0000
Epoch 4866/5000
 - 0s - loss: 0.9245 - acc: 1.0000
Epoch 4867/5000
 - 0s - loss: 0

As we expected, the network is able to use the within-sequence context to learn the alphabet, achieving 100% accuracy on the training data.

Importantly, the network can make accurate predictions for the next letter in the alphabet for randomly selected characters. Very impressive.

## Stateful LSTM for a One-Char to One-Char Mapping

We have seen that we can break up our raw data into fixed-size sequences and that this representation can be learned by the LSTM, but only to learn random mappings of 3 characters to 1 character.

We have also seen that we can misuse the batch size to offer more of the sequence to the network, but only during training.

Ideally, we want to expose the network to the entire sequence and let it learn the inter-dependencies, rather than defining those dependencies explicitly in the framing of the problem.

We can do this in Keras by making the LSTM layers stateful and **manually resetting the state of the network at the end of the epoch, which is also the end of the training sequence.**

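**Question**: Why reset the state of the network at the end of each epoch? Each epoch replays the training sequence from its beginning, so state carried over from the end of the previous pass would no longer correspond to the input being fed in.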

This is truly how LSTM networks are intended to be used. We find that by allowing the network itself to learn the dependencies between the characters, we need a smaller network (half the number of units) and fewer training epochs (almost half).

We first need to define our LSTM layer as stateful. In so doing, we must explicitly specify the batch size as a dimension on the input shape. This also means that when we evaluate the network or make predictions, we must also specify and adhere to this same batch size. This is not a problem now as we are using a batch size of 1. With a batch size other than one, prediction becomes harder, because predictions must then be made in batches and in sequence.

```
batch_size = 1
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
```
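
To make the batch-size constraint concrete, the following minimal sketch shows which samples end up sharing one running state when the data is fed in order without shuffling. It follows the rule quoted earlier, that the state of the sample at index i is reused for the sample at index i + batch_size. The helper names `stateful_batches` and `state_chains` are hypothetical and are not part of Keras; the divisibility check is an assumption of this sketch, mirroring the requirement that the batch size divides the number of samples.

```python
def stateful_batches(num_samples, batch_size):
    """Split sample indices into consecutive batches, as with shuffle=False."""
    if num_samples % batch_size != 0:
        # assumption of this sketch: the batch size must divide the sample count
        raise ValueError("num_samples must be a multiple of batch_size")
    return [list(range(start, start + batch_size))
            for start in range(0, num_samples, batch_size)]


def state_chains(num_samples, batch_size):
    """Group the sample indices that are processed with the same running state."""
    batches = stateful_batches(num_samples, batch_size)
    # slot k of every batch continues the state left by slot k of the previous batch
    return [[batch[slot] for batch in batches] for slot in range(batch_size)]


print(state_chains(25, 1))  # a single chain: samples 0..24 in order
print(state_chains(8, 4))   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With a batch size of 1, all 25 alphabet samples form one chain, which is what we want here. With a larger batch size, consecutive samples land in different chains, so the data would have to be laid out differently for the state to follow the sequence.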

Because the state must be reset by hand after every epoch, we train the model one epoch at a time in a for loop. Again, we do not shuffle the input, preserving the sequence in which the input training data was created.
```
for i in range(300):
	model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
	model.reset_states()
```
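
The effect of the reset can be illustrated without Keras. The sketch below uses a toy recurrence (a decayed running sum, standing in for the LSTM cell update, which it is not) and compares the state at the start of a second epoch with and without a reset. The function `run_epoch` and its `decay` parameter are assumptions made for illustration only.

```python
def run_epoch(inputs, state=0.0, decay=0.5):
    """Feed inputs one at a time through a toy recurrence; return every state seen."""
    states = []
    for x in inputs:
        state = decay * state + x  # toy stand-in for a recurrent cell update
        states.append(state)
    return states


# normalized integer codes for 'A'..'Y', as in the listing
inputs = [i / 26.0 for i in range(25)]

# With a reset, every epoch starts from the same zero state.
first = run_epoch(inputs)
second_with_reset = run_epoch(inputs)

# Without a reset, epoch 2 starts from the state left behind after 'Y'.
second_without_reset = run_epoch(inputs, state=first[-1])

print(first[0], second_with_reset[0], second_without_reset[0])
```

With the reset, the two epochs see identical states for identical inputs. Without it, the first letter of the second epoch is processed with leftover context from the end of the alphabet.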

As mentioned, we specify the batch size when evaluating the performance of the network on the entire training dataset.

```
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
```

Finally, we can demonstrate that the network has indeed learned the entire alphabet. We can seed it with the first letter “A”, request a prediction, feed the prediction back in as an input, and repeat the process all the way to “Z”.

```
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()

```

In [51]:
# Stateful LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
for i in range(300):
	model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
	model.reset_states()
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()
# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()

A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
Epoch 1/1
 - 1s - loss: 3.2662 - acc: 0.0400
Epoch 1/1
 - 0s - loss: 3.2541 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.2483 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.2427 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.2370 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.2309 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.2242 - acc: 0.0400
Epoch 1/1
 - 0s - loss: 3.2166 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.2077 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.1967 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.1829 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.1647 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.1404 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.1090 - acc: 0.1200
Epoch 1/1
 - 0s - loss: 3.0735 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.0425 - acc: 0.0800
Epoch 1/1
 - 0s - loss: 3.0266 - acc: 0.1600
Epoch 1/1
 - 0s - loss: 3.0371 - acc: 0.1200
Epoch 1/1
 - 0s

 - 0s - loss: 1.6569 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6452 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6364 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6250 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6185 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6137 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6051 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6022 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.5992 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.6000 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.5938 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.5851 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5802 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5718 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5784 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5794 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5843 - acc: 0.3200
Epoch 1/1
 - 0s - loss: 1.5891 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.5908 - acc: 0.2800
Epoch 1/1
 - 0s - loss: 1.5869 - acc: 0.2400
Epoch 1/1
 - 0s - loss: 1.5860 - acc: 0.2400
Epoch 1/1
 - 0s - loss: 1.5780 - acc: 0.2400
Epoch 1/1
 - 0s - lo

We can see that the network has memorized the entire alphabet perfectly. It used the context of the samples themselves and learned whatever dependency it needed to predict the next character in the sequence.

We can also see that if we seed the network with the first letter, it can correctly rattle off the rest of the alphabet.

We can also see that it has only learned the full alphabet sequence, and only from a cold start. When asked to predict the next letter from “K”, it predicts “B” and falls back into regurgitating the entire alphabet.
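
One way to picture this behaviour is a deliberately extreme caricature: a predictor that ignores its input entirely and answers from its internal state alone. The `StepCounter` class below is hypothetical and is not how the trained LSTM actually works; it only shows why a model that relies on state rather than input answers “B” when seeded with “K” from a fresh state.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


class StepCounter:
    """Toy 'model' whose prediction depends only on how many inputs it has seen."""

    def __init__(self):
        self.steps = 0

    def predict(self, letter):
        # the input letter is ignored on purpose; only the state matters
        self.steps += 1
        return alphabet[self.steps % len(alphabet)]

    def reset_states(self):
        self.steps = 0


toy = StepCounter()
cold = toy.predict("K")        # fresh state, arbitrary seed letter

toy.reset_states()
for letter in "ABCDEFGHIJ":    # feed 'A'..'J' so the state advances ten steps
    warm = toy.predict(letter)

print("cold start, K ->", cold)
print("after A..J, J ->", warm)
```

From a fresh state the toy answers “B” whatever the seed letter is, and it only reaches “K” once its state has been advanced by the ten preceding letters.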

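To truly predict “K”, the state of the network would need to be warmed up by iteratively feeding it the letters from “A” to “J”. This tells us that we could achieve the same effect with a “stateless” LSTM by preparing training data in which each input is the run of letters from “A” up to some letter, left-padded to a fixed length, and the output is the letter that follows.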
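
As a rough sketch of that framing, the snippet below builds one pattern per letter: the input is everything from “A” up to that point, padded on the left to a fixed width, and the output is the next letter. The function name `prefix_patterns` and the “-” pad marker are assumptions for illustration; in a real pipeline the padding would be zeros on the integer-encoded sequence.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def prefix_patterns(text, pad="-"):
    """Pair every left-padded prefix of `text` with the letter that follows it."""
    width = len(text) - 1  # longest prefix: everything except the last letter
    return [(text[:end].rjust(width, pad), text[end])
            for end in range(1, len(text))]


patterns = prefix_patterns(alphabet)
for seq_in, seq_out in patterns[:4]:
    print(seq_in, "->", seq_out)
print(len(patterns), "patterns, each of length", len(patterns[0][0]))
```

Every input has the same length (25 for the full alphabet), so a stateless LSTM sees the whole warm-up context inside a single sample instead of relying on state carried between samples.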

## LSTM with Variable-Length Input to One-Char Output

In the previous section, we discovered that the Keras “stateful” LSTM was really only a shortcut to replaying the first n-sequences, but didn’t really help us learn a generic model of the alphabet.

In this section we explore a variation of the “stateless” LSTM that learns random subsequences of the alphabet, in an effort to build a model that can be given arbitrary letters or subsequences of letters and predict the next letter in the alphabet.

Firstly, we change the framing of the problem. To simplify, we define a maximum input sequence length and set it to a small value such as 5 to speed up training. This defines the maximum length of the subsequences of the alphabet that will be drawn for training. In extensions, this could just as well be set to the full alphabet (26), or longer if we allow looping back to the start of the sequence.

We also need to define the number of random sequences to create, in this case 1000. This too could be more or less; I expect fewer patterns are actually required.

In [52]:
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
	start = numpy.random.randint(len(alphabet)-2)
	end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
	sequence_in = alphabet[start:end+1]
	sequence_out = alphabet[end + 1]
	dataX.append([char_to_int[char] for char in sequence_in])
	dataY.append(char_to_int[sequence_out])
	print(sequence_in, '->', sequence_out)

K -> L
NOP -> Q
GH -> I
KLMN -> O
X -> Y
Q -> R
NOPQR -> S
HIJ -> K
IJ -> K
C -> D
FG -> H
JKLMN -> O
TU -> V
NOPQR -> S
O -> P
TU -> V
MNOPQ -> R
PQ -> R
S -> T
VWXY -> Z
VWXY -> Z
CD -> E
BCDEF -> G
OPQ -> R
LMNO -> P
HIJKL -> M
STU -> V
GHI -> J
UVWX -> Y
NOPQ -> R
HIJK -> L
NOP -> Q
Q -> R
HIJ -> K
W -> X
QR -> S
UVWX -> Y
H -> I
ABC -> D
RSTUV -> W
VW -> X
OP -> Q
RSTUV -> W
ABC -> D
ABC -> D
GHIJ -> K
WXY -> Z
BCDE -> F
N -> O
JK -> L
X -> Y
TUV -> W
L -> M
F -> G
MN -> O
JKLMN -> O
G -> H
BCDEF -> G
LMN -> O
N -> O
V -> W
BCDEF -> G
KLM -> N
ST -> U
TUV -> W
MN -> O
JKLM -> N
LM -> N
U -> V
FGH -> I
TUV -> W
C -> D
HIJK -> L
UVWX -> Y
W -> X
QR -> S
PQR -> S
STUVW -> X
RSTU -> V
TU -> V
RSTU -> V
JKL -> M
JKL -> M
RSTUV -> W
GHI -> J
V -> W
CD -> E
QRSTU -> V
M -> N
BCDE -> F
WX -> Y
K -> L
VW -> X
GHI -> J
CD -> E
XY -> Z
HI -> J
C -> D
IJK -> L
DEFG -> H
UV -> W
LM -> N
X -> Y
UV -> W
I -> J
NO -> P
ABCD -> E
K -> L
IJK -> L
JKL -> M
EFGHI -> J
JK -> L
TU -> V
IJ -> K
MNOPQ ->

The input sequences vary in length between 1 and max_len and therefore require zero padding. Here, we use left-hand-side (prefix) padding with the Keras built-in pad_sequences() function.


In [53]:
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
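
To see what prefix padding produces, here is a pure-Python sketch of the same idea. It is not the Keras implementation; `left_pad` is a hypothetical helper that mimics the default behaviour used above (zeros added in front, and sequences longer than maxlen trimmed from the front).

```python
def left_pad(sequences, maxlen, value=0.0):
    """Prefix-pad each sequence with `value`, keeping its last `maxlen` items."""
    padded = []
    for seq in sequences:
        kept = [float(v) for v in seq[-maxlen:]]  # trim from the front if too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


# integer codes for "K", "NOP" and "BCDEF"
print(left_pad([[10], [13, 14, 15], [1, 2, 3, 4, 5]], maxlen=5))
# [[0.0, 0.0, 0.0, 0.0, 10.0], [0.0, 0.0, 13.0, 14.0, 15.0], [1.0, 2.0, 3.0, 4.0, 5.0]]
```

One detail worth keeping in mind with this encoding: “A” maps to the integer 0, which is the same value as the padding, so a leading “A” cannot be told apart from a padded position.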

The trained model is evaluated on randomly selected input patterns. This could just as easily be new randomly generated sequences of characters. I also believe this could be a linear sequence seeded with “A”, with outputs fed back in as single-character inputs.

The full code listing is provided below for completeness.

In [58]:
# LSTM with Variable Length Input Sequences to One Character Output
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# from theano.tensor.shared_randomstreams import RandomStreams
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
	start = numpy.random.randint(len(alphabet)-2)
	end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
	sequence_in = alphabet[start:end+1]
	sequence_out = alphabet[end + 1]
	dataX.append([char_to_int[char] for char in sequence_in])
	dataY.append(char_to_int[sequence_out])
	print(sequence_in, '->', sequence_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for i in range(20):
	pattern_index = numpy.random.randint(len(dataX))
	pattern = dataX[pattern_index]
	x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
	x = numpy.reshape(x, (1, max_len, 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)
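
The padding and scaling steps above can be illustrated without Keras. The following is a minimal NumPy-only sketch, assuming the default `padding='pre'` behaviour of `pad_sequences` (zeros are added on the left). The helpers `left_pad` and `encode` are hypothetical names written for this illustration; they are not part of Keras.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}
max_len = 5


def left_pad(seq, maxlen, value=0.0):
    """Hypothetical helper: keep at most the last `maxlen` items and left-pad with `value`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


def encode(text):
    """Encode a run of letters as a (max_len, 1) float32 array scaled into [0, 1)."""
    ids = [char_to_int[ch] for ch in text]
    padded = np.array(left_pad(ids, max_len), dtype="float32")
    return padded.reshape(max_len, 1) / float(len(alphabet))


x_bc = encode("BC")
x_abc = encode("ABC")
print(x_bc.ravel())                  # three leading zeros, then B and C scaled by 1/26
print(np.array_equal(x_bc, x_abc))   # 'A' is encoded as 0, the same value as the padding
```

Because `'A'` maps to 0 and the padding value is also 0, a leading `A` cannot be told apart from padding in this encoding. That is harmless for this task, since the target depends only on the last letter, but it is worth keeping in mind if the mapping is reused elsewhere.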

PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K
M -> N
XY -> Z
QRST -> U
ABC -> D
JKLMN -> O
OP -> Q
XY -> Z
D -> E
T -> U
B -> C
QRSTU -> V
HIJ -> K
JKLM -> N
ABCDE -> F
X -> Y
V -> W
DE -> F
DEFG -> H
BCDE -> F
EFGH -> I
BCDE -> F
FG -> H
RST -> U
TUV -> W
STUV -> W
LMN -> O
P -> Q
MNOP -> Q
JK -> L
MNOP -> Q
OPQRS -> T
UVWXY -> Z
PQRS -> T
D -> E
EFGH -> I
IJK -> L
WX -> Y
STUV -> W
MNOPQ -> R
P -> Q
WXY -> Z
VWX -> Y
V -> W
HI -> J
KLMNO -> P
UV -> W
JKL -> M
ABCDE -> F
WXY -> Z
M -> N
CDEF -> G
KLMNO -> P
RST -> U
RS -> T
W -> X
J -> K
WX -> Y
JKLMN -> O
MN -> O
L -> M
BCDE -> F
TU -> V
MNOPQ -> R
NOPQR -> S
HIJ -> K
JKLM -> N
STUVW -> X
QRST -> U
N -> O
VWXY -> Z
B -> C
UVWX -> Y
OP -> Q
K -> L
C -> D
X -> Y
ST -> U
JKLM -> N
B -> C
QR -> S
RS -> T
VWXY -> Z
S -> T
NOP -> Q
KLMNO -> P
IJ -> K
EF -> G
MNOP -> Q
WXY -> Z
HI -> J
P -> Q
STUVW -> X
Q -> R
MN -> O
O -> P
C -> D
L -> M
JKLM -> N
K -> L
IJKLM -> N
FGHIJ -> K
LM -> N
OPQ -> R
U -> V
...

Epoch 1/500
 - 3s - loss: 3.0781 - acc: 0.0650
Epoch 2/500
 - 2s - loss: 2.7658 - acc: 0.1290
Epoch 3/500
 - 2s - loss: 2.4374 - acc: 0.1960
Epoch 4/500
 - 2s - loss: 2.2116 - acc: 0.2620
Epoch 5/500
 - 2s - loss: 2.0615 - acc: 0.3120
Epoch 6/500
 - 2s - loss: 1.9390 - acc: 0.3270
Epoch 7/500
 - 2s - loss: 1.8382 - acc: 0.3460
Epoch 8/500
 - 2s - loss: 1.7547 - acc: 0.3750
Epoch 9/500
 - 2s - loss: 1.6758 - acc: 0.4160
Epoch 10/500
 - 2s - loss: 1.6026 - acc: 0.4480
Epoch 11/500
 - 2s - loss: 1.5329 - acc: 0.4670
Epoch 12/500
 - 2s - loss: 1.4723 - acc: 0.4950
Epoch 13/500
 - 2s - loss: 1.4189 - acc: 0.5050
Epoch 14/500
 - 2s - loss: 1.3637 - acc: 0.5480
Epoch 15/500
 - 2s - loss: 1.3189 - acc: 0.5540
Epoch 16/500
 - 2s - loss: 1.2704 - acc: 0.5840
Epoch 17/500
 - 2s - loss: 1.2293 - acc: 0.5820
Epoch 18/500
 - 2s - loss: 1.1956 - acc: 0.6100
Epoch 19/500
 - 2s - loss: 1.1453 - acc: 0.6230
Epoch 20/500
 - 2s - loss: 1.1202 - acc: 0.6430
Epoch 21/500
 - 2s - loss: 1.0811 - acc: 0.6570
...

Epoch 171/500
 - 2s - loss: 0.2481 - acc: 0.9230
Epoch 172/500
 - 2s - loss: 0.2577 - acc: 0.9240
Epoch 173/500
 - 2s - loss: 0.3267 - acc: 0.8890
Epoch 174/500
 - 2s - loss: 0.2454 - acc: 0.9260
Epoch 175/500
 - 2s - loss: 0.2465 - acc: 0.9230
Epoch 176/500
 - 2s - loss: 0.2496 - acc: 0.9250
Epoch 177/500
 - 2s - loss: 0.3219 - acc: 0.9130
Epoch 178/500
 - 2s - loss: 0.2452 - acc: 0.9330
Epoch 179/500
 - 2s - loss: 0.2433 - acc: 0.9260
Epoch 180/500
 - 2s - loss: 0.2511 - acc: 0.9110
Epoch 181/500
 - 2s - loss: 0.3451 - acc: 0.9010
Epoch 182/500
 - 2s - loss: 0.2358 - acc: 0.9340
Epoch 183/500
 - 2s - loss: 0.2526 - acc: 0.9280
Epoch 184/500
 - 2s - loss: 0.2490 - acc: 0.9260
Epoch 185/500
 - 2s - loss: 0.2645 - acc: 0.9160
Epoch 186/500
 - 2s - loss: 0.3463 - acc: 0.9030
Epoch 187/500
 - 2s - loss: 0.2306 - acc: 0.9390
Epoch 188/500
 - 2s - loss: 0.2315 - acc: 0.9340
Epoch 189/500
 - 2s - loss: 0.2399 - acc: 0.9330
Epoch 190/500
 - 2s - loss: 0.2335 - acc: 0.9370
Epoch 191/500
...
 - 2s - loss: 0.2197 - acc: 0.9410
Epoch 339/500
 - 2s - loss: 0.1349 - acc: 0.9700
Epoch 340/500
 - 3s - loss: 0.1384 - acc: 0.9680
Epoch 341/500
 - 3s - loss: 0.1404 - acc: 0.9670
Epoch 342/500
 - 2s - loss: 0.1455 - acc: 0.9660
Epoch 343/500
 - 2s - loss: 0.1403 - acc: 0.9640
Epoch 344/500
 - 2s - loss: 0.1846 - acc: 0.9480
Epoch 345/500
 - 2s - loss: 0.1673 - acc: 0.9600
Epoch 346/500
 - 2s - loss: 0.1345 - acc: 0.9670
Epoch 347/500
 - 2s - loss: 0.1366 - acc: 0.9670
Epoch 348/500
 - 2s - loss: 0.1402 - acc: 0.9650
Epoch 349/500
 - 2s - loss: 0.1391 - acc: 0.9590
Epoch 350/500
 - 2s - loss: 0.1403 - acc: 0.9600
Epoch 351/500
 - 2s - loss: 0.1408 - acc: 0.9620
Epoch 352/500
 - 2s - loss: 0.2952 - acc: 0.9350
Epoch 353/500
 - 2s - loss: 0.1314 - acc: 0.9680
Epoch 354/500
 - 2s - loss: 0.1307 - acc: 0.9720
Epoch 355/500
 - 2s - loss: 0.1342 - acc: 0.9630
Epoch 356/500
 - 2s - loss: 0.1355 - acc: 0.9640
Epoch 357/500
 - 2s - loss: 0.1749 - acc: 0.9460
Epoch 358/500
 - 2s - loss: 0.2242

KeyboardInterrupt: 
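
The run above was stopped by hand at epoch 358, after the loss had been hovering around 0.13 to 0.14 for some time. One way to avoid interrupting manually is a patience rule: stop once the loss has not improved for a given number of epochs. Keras provides an `EarlyStopping` callback in `keras.callbacks` for this purpose. The sketch below only illustrates the idea in plain Python; `stop_index` is a hypothetical helper, and the loss values are copied from epochs 339 to 357 of the log above.

```python
def stop_index(losses, patience, min_delta=0.0):
    """Hypothetical helper: return the index at which a patience rule would stop.

    Training stops once `patience` consecutive epochs fail to improve the best
    loss seen so far by more than `min_delta`. Returns None if that never happens.
    """
    best = float("inf")
    wait = 0
    for i, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return i
    return None


# Training loss for epochs 339..357, taken from the log above.
losses = [0.1349, 0.1384, 0.1404, 0.1455, 0.1403, 0.1846, 0.1673,
          0.1345, 0.1366, 0.1402, 0.1391, 0.1403, 0.1408, 0.2952,
          0.1314, 0.1307, 0.1342, 0.1355, 0.1749]

print(stop_index(losses, patience=5))
print(stop_index(losses, patience=10))
```

With noisy per-epoch losses like these, a small patience reacts to a short plateau while a larger one rides through it, so the value is a judgment call rather than a fixed recipe.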