The sequence prediction problem involves learning to predict the next step in the following 10-step sequence:

In [6]:
length = 10
sequence = [i / float(length) for i in range(length)]
print(sequence)

[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


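We must convert the sequence to a supervised learning problem. That means when 0.0 is shown as an input pattern, the network must learn to predict the next step as 0.1. Note that the code below places the current value in the first column and the previous value (from shift(1)) in the second, and later takes the first column as X and the second as y, so as written each input is paired with the value one step before it (for example, 0.1 with 0.0).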

In [5]:
from pandas import concat
from pandas import DataFrame

# create sequence
length = 10
sequence = [i/float(length) for i in range(length)]

# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)
print(df)


     0    0
1  0.1  0.0
2  0.2  0.1
3  0.3  0.2
4  0.4  0.3
5  0.5  0.4
6  0.6  0.5
7  0.7  0.6
8  0.8  0.7
9  0.9  0.8
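
The framing described in the text (input at step t, target at step t+1) can also be sketched in plain Python, without pandas. The helper name `make_pairs` is hypothetical and is not part of the original example.

```python
def make_pairs(sequence):
    """Pair each value with the value that follows it in the sequence."""
    return list(zip(sequence[:-1], sequence[1:]))

length = 10
sequence = [i / float(length) for i in range(length)]
pairs = make_pairs(sequence)

# show the first few (input, target) pairs
for x, y in pairs[:3]:
    print('input=%.1f -> target=%.1f' % (x, y))
```

A 10-step sequence yields 9 pairs, because the last value has no successor to serve as its target.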


We will use a recurrent neural network called a long short-term memory (LSTM) network to learn the sequence. As such, we must transform the input patterns from a 2D array (1 column with 9 rows) to a 3D array with dimensions [rows, timesteps, columns], where timesteps is 1 because each row holds a single timestep per observation.

In [7]:
# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)
print(X.shape, y.shape)

(9, 1, 1) (9,)
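
The reshape only changes how the values are indexed, not the values themselves. A small sketch with NumPy, using a made-up three-value array rather than the tutorial data:

```python
import numpy as np

flat = np.array([0.1, 0.2, 0.3])        # shape (3,): one value per observation
cube = flat.reshape(len(flat), 1, 1)    # shape (3, 1, 1): [rows, timesteps, columns]

print(flat.shape, cube.shape)
# row 1, timestep 0, column 0 is the same value as element 1 of the flat array
print(cube[1, 0, 0] == flat[1])
```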


The training batch size will cover the entire training dataset (batch learning), and predictions will be made one at a time (one-step prediction).

We will use an LSTM network fit for 1000 epochs.

Because the batch size equals the number of training observations (9), the weights will be updated once at the end of each training epoch.

For these experiments, we will require fine-grained control over when the internal state of the LSTM is updated. Normally, Keras clears LSTM state at the end of each batch, but we can control this by making the LSTM stateful and calling model.reset_states() to manage the state manually. This will be needed in later sections.

The network has one input, a hidden layer with 10 units, and an output layer with 1 unit. The default tanh activation functions are used in the LSTM units and a linear activation function in the output layer.
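
As a rough check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and a fully connected output layer; the function names are illustrative only.

```python
def lstm_param_count(n_inputs, n_units):
    # four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (n_units * (n_inputs + n_units) + n_units)

def dense_param_count(n_inputs, n_outputs):
    # one weight per input/output pair plus one bias per output
    return n_inputs * n_outputs + n_outputs

lstm_params = lstm_param_count(n_inputs=1, n_units=10)
dense_params = dense_param_count(n_inputs=10, n_outputs=1)
total_params = lstm_params + dense_params

print(lstm_params, dense_params, total_params)  # 480 11 491
```

None of these counts depend on the batch size, which only fixes the shape of the data passed through the network.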

A mean squared error loss function is used for this regression problem, together with the efficient Adam optimization algorithm.
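
For reference, mean squared error is the average of the squared differences between expected and predicted values. A minimal sketch in plain Python, with made-up numbers rather than model output:

```python
def mean_squared_error(expected, predicted):
    """Average of the squared differences between two equal-length lists."""
    errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sum(errors) / len(errors)

expected = [0.0, 0.1, 0.2]
predicted = [0.0, 0.2, 0.4]   # illustrative values only
mse = mean_squared_error(expected, predicted)
print('MSE: %.4f' % mse)
```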

In [19]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# configure network
n_batch = len(X)
n_epoch = 1000
n_neurons = 10

# design network
model = Sequential()
model.add(
    LSTM(n_neurons,
         batch_input_shape=(n_batch, X.shape[1], X.shape[2]),
         stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

We will fit the network to all of the examples each epoch and reset the state of the network at the end of each epoch manually.

In [22]:
# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
    model.reset_states()

Finally, we will forecast each step in the sequence one at a time.

This requires a batch size of 1, which differs from the batch size of 9 used to fit the network, and results in an error when the example is run.

In [23]:
# online forecast
for i in range(len(X)):
    testX, testy = X[i], y[i]
    testX = testX.reshape(1, 1, 1)
    yhat = model.predict(testX, batch_size=1)
    print('>Expected=%.1f, Predicted=%.1f' % (testy, yhat))

ValueError: in user code:

    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:1147 predict_function  *
        outputs = self.distribute_strategy.run(
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:1122 predict_step  **
        return self(x, training=False)
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:885 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs,
    /home/nikhil/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/input_spec.py:224 assert_input_compatibility
        raise ValueError('Input ' + str(input_index) +

    ValueError: Input 0 is incompatible with layer sequential_3: expected shape=(9, None, 1), found shape=[1, 1, 1]
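
The last line of the traceback shows the cause: the model was built with a fixed batch dimension of 9, while the input has a batch dimension of 1. The check can be mimicked with a small helper. Note that `shapes_compatible` is a hypothetical function written for illustration, not the Keras implementation.

```python
def shapes_compatible(expected, found):
    """Return True when every fixed dimension matches; None accepts any size."""
    if len(expected) != len(found):
        return False
    return all(e is None or e == f for e, f in zip(expected, found))

expected_shape = (9, None, 1)   # taken from the error message above

print(shapes_compatible(expected_shape, (1, 1, 1)))   # batch of 1: rejected
print(shapes_compatible(expected_shape, (9, 1, 1)))   # batch of 9: accepted
```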


### Solution 1: Online Learning (Batch Size = 1)


One solution to this problem is to fit the model using online learning.

This is where the batch size is set to a value of 1 and the network weights are updated after each training example.

This can speed up learning, but it also adds instability to the learning process because the weights vary widely with each batch.

Nevertheless, this will allow us to make one-step forecasts on the problem. The only change required is setting n_batch to 1 as follows:

In [24]:
from pandas import DataFrame
from pandas import concat
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# create sequence
length = 10
sequence = [i / float(length) for i in range(length)]

# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)

# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)

# configure network
n_batch = 1
n_epoch = 1000
n_neurons = 10

# design network
model = Sequential()
model.add(
    LSTM(n_neurons,
         batch_input_shape=(n_batch, X.shape[1], X.shape[2]),
         stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
    model.reset_states()

# online forecast
for i in range(len(X)):
    testX, testy = X[i], y[i]
    testX = testX.reshape(1, 1, 1)
    yhat = model.predict(testX, batch_size=1)
    # predict() returns a 2D array of shape (1, 1); take the scalar before formatting
    print('>Expected=%.1f, Predicted=%.1f' % (testy, yhat[0, 0]))

>Expected=0.0, Predicted=0.0
>Expected=0.1, Predicted=0.1
>Expected=0.2, Predicted=0.2
>Expected=0.3, Predicted=0.3
>Expected=0.4, Predicted=0.4
>Expected=0.5, Predicted=0.5
>Expected=0.6, Predicted=0.6
>Expected=0.7, Predicted=0.7
>Expected=0.8, Predicted=0.8
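
The cost of online learning can be made concrete by counting weight updates. The sketch below assumes one update per batch, which is how mini-batch gradient descent normally works; `weight_updates` is an illustrative helper name.

```python
import math

def weight_updates(n_samples, n_batch, n_epoch):
    """Total number of weight updates, assuming one update per batch."""
    updates_per_epoch = math.ceil(n_samples / n_batch)
    return updates_per_epoch * n_epoch

online_updates = weight_updates(n_samples=9, n_batch=1, n_epoch=1000)
batch_updates = weight_updates(n_samples=9, n_batch=9, n_epoch=1000)

print(online_updates, batch_updates)  # 9000 1000
```

With a batch size of 1, the network performs nine times as many updates over the same 1000 epochs, each based on a single example.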


### Solution 2: Batch Forecasting (Batch Size = N)

Another solution is to make all predictions at once in a batch.

This limits how the model can be used: we would have to use all of the predictions made at once, or keep only the first prediction and discard the rest.


We can adapt the example for batch forecasting by predicting with a batch size equal to the training batch size, then enumerating the batch of predictions, as follows:

In [25]:
from pandas import DataFrame
from pandas import concat
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# create sequence
length = 10
sequence = [i / float(length) for i in range(length)]

# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)

# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)

# configure network
n_batch = len(X)
n_epoch = 1000
n_neurons = 10

# design network
model = Sequential()
model.add(
    LSTM(n_neurons,
         batch_input_shape=(n_batch, X.shape[1], X.shape[2]),
         stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
    model.reset_states()

# batch forecast
yhat = model.predict(X, batch_size=n_batch)
for i in range(len(y)):
    # yhat has shape (n_batch, 1); index the scalar before formatting
    print('>Expected=%.1f, Predicted=%.1f' % (y[i], yhat[i, 0]))

>Expected=0.0, Predicted=0.0
>Expected=0.1, Predicted=0.1
>Expected=0.2, Predicted=0.2
>Expected=0.3, Predicted=0.3
>Expected=0.4, Predicted=0.4
>Expected=0.5, Predicted=0.5
>Expected=0.6, Predicted=0.6
>Expected=0.7, Predicted=0.7
>Expected=0.8, Predicted=0.8
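
To get a single forecast from a model fixed to a batch size of N, one option is to pad the lone input up to N rows and keep only the first prediction. The sketch below illustrates the padding and slicing with NumPy only. Here `fake_predict` is a stand-in for model.predict and `predict_one` is a hypothetical helper; neither is a trained network or part of the Keras API.

```python
import numpy as np

n_batch = 9

def fake_predict(batch):
    # placeholder for model.predict: insists on the training batch size
    assert batch.shape[0] == n_batch, 'batch size must match training'
    return batch.reshape(n_batch, 1) - 0.1

def predict_one(sample):
    # repeat the single sample to fill the batch; only the first row is kept
    batch = np.repeat(sample.reshape(1, 1, 1), n_batch, axis=0)
    yhat = fake_predict(batch)
    return float(yhat[0, 0])

single = predict_one(np.array([0.5]))
print('%.1f' % single)
```

The remaining N-1 predictions are computed and thrown away, which is the wasted work the text refers to.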


### Solution 3: Copy Weights
    
A better solution is to use different batch sizes for training and predicting.

The way to do this is to create a new network with the same architecture but a different batch size, then copy the weights from the fitted network into it.

We can do this easily enough using the get_weights() and set_weights() methods in the Keras API, as follows:

In [35]:
from pandas import DataFrame
from pandas import concat
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# create sequence
length = 10
sequence = [i / float(length) for i in range(length)]

# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)

# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)

# configure network
n_batch = len(X)
n_epoch = 1000
n_neurons = 10

# design network
model = Sequential()
model.add(
    LSTM(n_neurons,
         batch_input_shape=(n_batch, X.shape[1], X.shape[2]),
         stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
    model.reset_states()

# re-define the batch size
n_batch = 1

# re-define model
new_model = Sequential()
new_model.add(
    LSTM(n_neurons,
         batch_input_shape=(n_batch, X.shape[1], X.shape[2]),
         stateful=True))
new_model.add(Dense(1))

# copy weights
old_weights = model.get_weights()
new_model.set_weights(old_weights)

# compile model
new_model.compile(loss='mean_squared_error', optimizer='adam')

# online forecast
for i in range(len(X)):
    testX, testy = X[i], y[i]
    testX = testX.reshape(1, 1, 1)
    yhat = new_model.predict(testX, batch_size=n_batch)
    # predict() returns a 2D array of shape (1, 1); take the scalar before formatting
    print('>Expected=%.1f, Predicted=%.1f' % (testy, yhat[0, 0]))

>Expected=0.0, Predicted=0.0
>Expected=0.1, Predicted=0.1
>Expected=0.2, Predicted=0.2
>Expected=0.3, Predicted=0.4
>Expected=0.4, Predicted=0.6
>Expected=0.5, Predicted=0.9
>Expected=0.6, Predicted=1.1
>Expected=0.7, Predicted=1.4
>Expected=0.8, Predicted=1.6
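
The forecasts above drift away from the expected values after the first few steps. A likely explanation, offered here as an assumption rather than a verified diagnosis, is that the new model is stateful with a batch size of 1, so its internal state carries over from one predict call to the next, whereas during training all 9 samples sat in a single batch and each started from a freshly reset state. The toy recurrent cell below (plain Python, not an LSTM, with arbitrary weights) illustrates how carried-over state changes the outputs compared with resetting before each input.

```python
import math

class ToyCell:
    """A single-unit recurrent cell: h = tanh(w_x * x + w_h * h_prev)."""

    def __init__(self, w_x=1.0, w_h=0.5):
        self.w_x, self.w_h = w_x, w_h
        self.h = 0.0

    def reset_states(self):
        self.h = 0.0

    def predict(self, x):
        self.h = math.tanh(self.w_x * x + self.w_h * self.h)
        return self.h

inputs = [0.1, 0.2, 0.3]

# state carried from one call to the next
cell = ToyCell()
carried = [cell.predict(x) for x in inputs]

# state reset before every call, so each input starts from zero state
fresh = []
for x in inputs:
    cell.reset_states()
    fresh.append(cell.predict(x))

for c, f in zip(carried, fresh):
    print('carried=%.3f, fresh=%.3f' % (c, f))
```

In the Keras example, the corresponding experiment would be to call new_model.reset_states() before each prediction and compare the forecasts with those printed above.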
