# Chapter 9:
# How to develop LSTMs for Time Series Forecasting

After this tutorial, you will know:

* How to develop LSTM models for univariate time series forecasting
* How to develop LSTM models for multivariate time series forecasting
* How to develop LSTM models for multi-step time series forecasting

## 9.2 Univariate LSTM Models

This section is divided into six parts:

1. Data Preparation
2. Vanilla LSTM
3. Stacked LSTM
4. Bidirectional LSTM
5. CNN-LSTM
6. ConvLSTM

### Data Preparation

In [2]:
# univariate data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [3]:
# define input sequence
raw_seq = array([i for i in range(10, 100, 10)])

In [4]:
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)   

In [None]:
# summarize the data
for i in range(len(X)):
    print(X[i],y[i])

In [5]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
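As a cross-check on `split_sequence`, the same samples can be produced with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is an alternative sketch rather than the method used in the rest of this chapter, and the name `windows` is introduced here only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

raw_seq = np.arange(10, 100, 10)   # 10, 20, ..., 90
n_steps = 3

# each window holds n_steps inputs followed by the one value to predict
windows = sliding_window_view(raw_seq, n_steps + 1)
X, y = windows[:, :-1], windows[:, -1]

for x_i, y_i in zip(X, y):
    print(x_i, y_i)
# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90

# add the trailing feature axis expected by the LSTM
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)  # (6, 3, 1)
```

Nine observations with three input steps give six samples, because the last three values have no following value to serve as a target.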

### Vanilla LSTM

A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units and an output layer used to make a prediction. Key to LSTMs is that they offer native support for sequences. Unlike a CNN, which reads across the entire input vector, the LSTM model reads one time step of the sequence at a time and builds up an internal state representation that can be used as a learned context for making a prediction.
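To make the idea of reading one time step at a time concrete, here is a minimal NumPy sketch of an LSTM layer's forward pass. It assumes the standard sigmoid/tanh gate formulation (not the `relu` activation used for the Keras models in this chapter), uses random untrained weights, and the names `lstm_forward`, `W`, `U` and `b` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b):
    """Run an LSTM over x_seq with shape (timesteps, features).

    W: (features, 4 * units), U: (units, 4 * units), b: (4 * units,).
    The gate order inside the stacked weights (input, forget, candidate,
    output) is an arbitrary choice for this sketch.
    """
    units = U.shape[0]
    h = np.zeros(units)   # hidden state, the layer's output
    c = np.zeros(units)   # cell state, the internal memory
    states = []
    for x_t in x_seq:     # one time step at a time
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                # input gate
        f = sigmoid(z[units:2 * units])       # forget gate
        g = np.tanh(z[2 * units:3 * units])   # candidate values
        o = sigmoid(z[3 * units:])            # output gate
        c = f * c + i * g                     # update the memory
        h = o * np.tanh(c)                    # expose part of the memory
        states.append(h)
    return np.array(states), c

rng = np.random.default_rng(0)
n_features, units = 1, 4
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

x_seq = np.array([[0.7], [0.8], [0.9]])   # 3 time steps, 1 feature
states, c = lstm_forward(x_seq, W, U, b)
print(states.shape)      # (3, 4): one hidden state per time step
print(states[-1].shape)  # (4,): the state after the final step
```

The hidden state after the final step is what a following `Dense` layer would receive in a Vanilla LSTM.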

In [1]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

In [7]:
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape = (n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

In [8]:
# fit model
model.fit(X, y, epochs=2000, verbose=0)

<keras.src.callbacks.history.History at 0x76f0dfed5760>

In [9]:
# demonstrate prediction
x_input = array([70,80,90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

In [10]:
print(yhat)

[[100.45489]]


### Stacked LSTM

Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model. An LSTM layer requires a three-dimensional input, and by default an LSTM produces a two-dimensional output as an interpretation from the end of the sequence. We can address this by having the LSTM output a value for each time step in the input data by setting the `return_sequences=True` argument on the layer. This allows us to pass the 3D output of one hidden LSTM layer as input to the next.
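The shape issue can be seen without Keras. The sketch below uses a plain tanh recurrence as a stand-in for an LSTM layer (an assumption made to keep the example short); `toy_recurrent_layer` is a hypothetical helper, and only the output shapes matter here.

```python
import numpy as np

def toy_recurrent_layer(x, units, return_sequences, rng):
    """Toy tanh recurrence over x with shape (samples, timesteps, features)."""
    samples, timesteps, features = x.shape
    W = rng.normal(scale=0.1, size=(features, units))
    U = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((samples, units))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W + h @ U)
        outputs.append(h)
    if return_sequences:
        # one output per time step: (samples, timesteps, units)
        return np.stack(outputs, axis=1)
    # only the final step: (samples, units)
    return h

rng = np.random.default_rng(0)
x = np.array([0.7, 0.8, 0.9]).reshape((1, 3, 1))

last_only = toy_recurrent_layer(x, 5, False, rng)
print(last_only.shape)   # (1, 5), two-dimensional, cannot feed another recurrent layer

all_steps = toy_recurrent_layer(x, 5, True, rng)
print(all_steps.shape)   # (1, 3, 5), three-dimensional, a valid input to the next layer

second = toy_recurrent_layer(all_steps, 5, False, rng)
print(second.shape)      # (1, 5), ready for the Dense output layer
```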

In [11]:
# define model 
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')


In [13]:
# fit model
model.fit(X, y, epochs=2000, verbose=0)

<keras.src.callbacks.history.History at 0x76f0de6d6510>

In [None]:
# demonstrate prediction
x_input = array([70,80,90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

In [None]:
print(yhat)

### Bidirectional LSTM

On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backward and concatenate both interpretations. This is called a Bidirectional LSTM. We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called `Bidirectional`.
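The following NumPy sketch illustrates the forward/backward idea under simplifying assumptions: a plain tanh recurrence stands in for the LSTM, the weights are random, and `final_state` is a hypothetical helper. What it shows is that concatenating the two readings doubles the width of the vector passed to the output layer.

```python
import numpy as np

def final_state(x, W, U):
    """Toy tanh recurrence; returns the state after the last time step."""
    h = np.zeros((x.shape[0], U.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
    return h

rng = np.random.default_rng(1)
units, n_features = 4, 1
x = np.array([0.7, 0.8, 0.9]).reshape((1, 3, n_features))

# each direction has its own weights
W_f = rng.normal(scale=0.5, size=(n_features, units))
U_f = rng.normal(scale=0.5, size=(units, units))
W_b = rng.normal(scale=0.5, size=(n_features, units))
U_b = rng.normal(scale=0.5, size=(units, units))

forward = final_state(x, W_f, U_f)             # reads 0.7, 0.8, 0.9
backward = final_state(x[:, ::-1, :], W_b, U_b)  # reads 0.9, 0.8, 0.7
merged = np.concatenate([forward, backward], axis=-1)
print(merged.shape)  # (1, 8): twice the number of units
```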

In [14]:
from keras.layers import Bidirectional

In [15]:
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')


In [16]:
# fit model
model.fit(X, y, epochs=2000, verbose=0)

<keras.src.callbacks.history.History at 0x76f0b8264c80>

In [17]:
# demonstrate prediction
x_input = array([70,80,90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

In [18]:
print(yhat)

[[100.3858]]


### CNN-LSTM

A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data. The CNN can also be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data. A CNN model can be used in a hybrid model with an LSTM backend, where the CNN interprets subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

The first step is to split the input sequences into subsequences that can be processed by the CNN model. For example, we can first split our univariate time series data into input/output samples with four steps as input and one as output. Each sample can then be split into two sub-samples, each with two time steps. The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input. We can parameterize this and define the number of subsequences as `n_seq` and the number of time steps per subsequence as `n_steps`. The input data can then be reshaped to have the required structure: `[samples, subsequences, timesteps, features]`.

In [19]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
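To see what this reshape does to an individual sample, the self-contained sketch below rebuilds the four-step samples with plain slicing and prints the two subsequences of the first one. The variable `n_steps_total` is introduced here only to avoid reusing `n_steps` for two different meanings.

```python
import numpy as np

raw_seq = np.arange(10, 100, 10)   # 10, 20, ..., 90
n_steps_total = 4                  # input steps per sample before splitting

# four input values per sample, the following value as the target
X = np.array([raw_seq[i:i + n_steps_total]
              for i in range(len(raw_seq) - n_steps_total)])
y = raw_seq[n_steps_total:]
print(X.shape, y.shape)  # (5, 4) (5,)

# split every sample into n_seq subsequences of n_steps values each
n_seq, n_steps, n_features = 2, 2, 1
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
print(X.shape)  # (5, 2, 2, 1)

# first sample: two subsequences, then its target
print(X[0, 0, :, 0], X[0, 1, :, 0], y[0])  # [10 20] [30 40] 50
```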

We want to reuse the same CNN model when reading in each subsequence of data separately.
This can be achieved by wrapping the entire CNN model in a `TimeDistributed` wrapper that will apply the entire model once per input, in this case, once per input subsequence. The CNN model first has a convolutional layer for reading across the subsequence, which requires a number of filters and a kernel size to be specified. The number of filters is the number of reads or interpretations of the input sequence. The kernel size is the number of time steps included in each read operation of the input sequence. The convolutional layer is followed by a max pooling layer that distills the filter maps down to half of their size (the default pool size of `MaxPooling1D` is 2), keeping the most salient features. These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer.

```python
# define the input cnn model
model.add(TimeDistributed(Conv1D(64, 1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D()))
model.add(TimeDistributed(Flatten()))
```

Next, we can define the LSTM part of the model that interprets the CNN model's read of the input sequence and makes a prediction.

```python
# define the output model
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
```

In [21]:
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from keras.layers import Flatten

In [23]:
# define model
model = Sequential()
# define the input cnn model
model.add(TimeDistributed(Conv1D(64, 1, activation = 'relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D()))
model.add(TimeDistributed(Flatten()))
# define the output model
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam',loss='mse')
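The shape changes inside the CNN front end of this model can be traced by hand for a single subsequence. The sketch below is a hand-rolled NumPy stand-in with random weights, not the Keras implementation; it assumes a kernel size of 1, 64 filters and a pool size of 2, matching the layers defined above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_features, n_filters = 2, 1, 64

# one subsequence of two time steps with one feature each
subseq = np.array([[10.0], [20.0]])             # (2, 1)

# convolution with kernel size 1: every time step is mapped on its own
# to n_filters values, followed by the relu activation
kernel = rng.normal(size=(n_features, n_filters))
bias = np.zeros(n_filters)
conv_out = np.maximum(subseq @ kernel + bias, 0.0)
print(conv_out.shape)   # (2, 64)

# max pooling with pool size 2: keep the larger value of each pair of steps
pooled = conv_out.reshape(n_steps // 2, 2, n_filters).max(axis=1)
print(pooled.shape)     # (1, 64)

# flatten into the vector that becomes one LSTM input time step
flat = pooled.reshape(-1)
print(flat.shape)       # (64,)
```

With `n_seq = 2` subsequences per sample, the LSTM therefore receives a sequence of two such 64-value vectors.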

In [24]:
# fit model
model.fit(X, y, epochs=500, verbose=0)

<keras.src.callbacks.history.History at 0x76f0b8b0f7a0>

In [25]:
# demonstrate prediction
x_input = array([60,70,80,90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

In [26]:
print(yhat)

[[100.841805]]


### ConvLSTM

A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit. The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting. The layer expects input as a sequence of two-dimensional images, therefore the shape of input data must be: `[samples, timesteps, rows, columns, features]`.

For our purposes, we can split each sample into subsequences where timesteps will become the number of subsequences, or `n_seq`, and columns will be the number of time steps for each subsequence, or `n_steps`. The number of rows is fixed at 1 as we are working with one-dimensional data. We can now reshape the prepared samples into the required structure.

In [27]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))

We can define the ConvLSTM as a single layer in terms of the number of filters and a two-dimensional kernel size in terms of `(rows, columns)`. As we are working with a one-dimensional series, the number of rows is always fixed to 1 in the kernel. The output of the model must then be flattened before it can be interpreted and a prediction made.
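The size of the flattened output follows from simple arithmetic. The sketch below assumes an unpadded, stride-1 convolution (the Keras defaults for `ConvLSTM2D`), a `(1, 2)` kernel and 64 filters; `valid_conv_size` is a hypothetical helper written for this illustration.

```python
def valid_conv_size(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1

rows, cols = 1, 2                 # each "image" is 1 row by n_steps columns
kernel_rows, kernel_cols = 1, 2   # kernel size (rows, columns)
filters = 64

out_rows = valid_conv_size(rows, kernel_rows)
out_cols = valid_conv_size(cols, kernel_cols)
flattened = out_rows * out_cols * filters
print((out_rows, out_cols, filters), flattened)  # (1, 1, 64) 64
```

So with these settings the `Flatten` layer hands a vector of 64 values to the `Dense` output layer.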

In [29]:
from keras.layers import ConvLSTM2D

In [30]:
# define model
model = Sequential()
# define the ConvLSTM input layer
model.add(ConvLSTM2D(64, (1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')


In [31]:
# fit model
model.fit(X,y, epochs=500, verbose=0)

<keras.src.callbacks.history.History at 0x76f0de6d77a0>

In [32]:
# demonstrate prediction
x_input = array([60,70,80,90])
x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

In [33]:
print(yhat)

[[100.48329]]


## 9.3 Multivariate LSTM Models

Multivariate time series data means data where there is more than one observation for each time step. There are two main models that we may require with multivariate time series data; they are:

1. Multiple Input Series
2. Multiple Parallel Series

### Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series. The input time series are parallel because each series has an observation at the same time steps. We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

In [34]:
from numpy import array
# define input sequence
in_seq1 = array([i for i in range(10, 100, 10)])
in_seq2 = array([i for i in range(15, 100, 10)])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])

We can reshape these three arrays of data as a single dataset where each row is a time step, and each column is a separate time series.

In [35]:
from numpy import hstack
# convert to [row, columns] structure

in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

#horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))

In [36]:
print(dataset)

[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]


In [37]:
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y =  sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)

    return array(X), array(y)

In [38]:
# choose a number of time steps
n_steps = 3
# convert into input/output
X,  y = split_sequences(dataset, n_steps)

As with the univariate time series, we must structure these data into samples with input and output elements. An LSTM model needs sufficient context to learn a mapping from an input sequence to an output value. LSTMs can support parallel input time series as separate variables or features. Therefore, we need to split the data into samples maintaining the order of observations across the two input sequences. With three input time steps, the samples look like this:

In [39]:
for i in range(len(X)):
    print(X[i], y[i])

[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185


That is, the first three time steps of each parallel series are provided as input to the model, and the model associates this with the value in the output series at the third time step, in this case, 65.
We can see that, in transforming the time series into input/output samples to train the model, we have to discard some values from the output time series where we do not have values in the input time series at prior time steps. In turn, the choice of the number of input time steps has an important effect on how much of the training data is used.
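The effect of the window size on the amount of usable data can be worked out directly. The helper `n_samples` below is a hypothetical function that mirrors the stopping rule of `split_sequences` above, where the target is taken from the last row of each window.

```python
def n_samples(n_rows, n_steps):
    """Samples produced when the target comes from the last row of each window."""
    return max(n_rows - n_steps + 1, 0)

n_rows = 9  # time steps in the example dataset
for n_steps in (2, 3, 5, 9):
    used = n_samples(n_rows, n_steps)
    # columns: input steps, samples produced, output values left unused
    print(n_steps, used, n_rows - used)
# 2 8 1
# 3 7 2
# 5 5 4
# 9 1 8
```

With three input steps, the first two output values (25 and 45) never appear as targets, which is why nine rows yield seven samples.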

In [40]:
print(X.shape)
print(y.shape)

(7, 3, 2)
(7,)


We can see that the X component is three-dimensional. The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series.

This is the exact three-dimensional structure expected by an LSTM as input.

We are now ready to fit an LSTM model on this data. We will use a Vanilla LSTM where the number of time steps and parallel series (features) are specified for the input layer via the `input_shape` argument.

In [50]:
# the number of features is the number of parallel input series
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')


In [51]:
# fit model
model.fit(X, y, epochs=200, verbose=0)

<keras.src.callbacks.history.History at 0x76f0b90c6180>

In [43]:
x_input = array([[80,85], [90,95], [100,105]])

In [46]:
X[0].shape

(3, 2)

In [44]:
x_input

array([[ 80,  85],
       [ 90,  95],
       [100, 105]])

In [47]:
n_features = 2

In [52]:
# demonstrate prediction
x_input = array([[80,85], [90,95], [100,105]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose = 0)



In [53]:
print(yhat)

[[207.4449]]


### Multiple Parallel Series

An alternate time series problem is the case where there are multiple parallel time series and a value must be predicted for each.

We may want to predict the value for each of the three time series for the next time step. This might be referred to as multivariate forecasting. Again, the data must be split into input/output samples in order to train the model.

In [54]:
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y =  sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)

    return array(X), array(y)

In [55]:
from numpy import array
# define input sequence
in_seq1 = array([i for i in range(10, 100, 10)])
in_seq2 = array([i for i in range(15, 100, 10)])
out_seq = array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])

from numpy import hstack
# convert to [row, columns] structure

in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

#horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))

In [56]:
# choose a number of time steps
n_steps = 3
# convert into input/output
X, y = split_sequences(dataset, n_steps)

In [57]:
print(X.shape, y.shape)

(6, 3, 3) (6, 3)


The shape of X is three-dimensional, including the number of samples (6), the number of time steps chosen per sample (3), and the number of parallel time series or features (3). The shape of y is two-dimensional, as we might expect, for the number of samples (6) and the number of variables per sample to be predicted (3).

The data is ready to use in an LSTM model that expects three-dimensional input and two-dimensional output shapes for the X and y components of each sample.

In [None]:
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

[[10 15 25]
 [20 25 45]
 [30 35 65]] [40 45 85]
[[20 25 45]
 [30 35 65]
 [40 45 85]] [ 50  55 105]
[[ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]] [ 60  65 125]
[[ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]] [ 70  75 145]
[[ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]] [ 80  85 165]
[[ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]] [ 90  95 185]


We are now ready to fit an LSTM model on this data. We will use a Stacked LSTM where the number of time steps and parallel series (features) are specified for the input layer via the `input_shape` argument. The number of parallel series is also used in the specification of the number of values to predict by the model in the output layer; again, this is three.

In [61]:
n_features = X.shape[2]

In [62]:
# define model 
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')


In [63]:
# fit model
model.fit(X, y, epochs=400, verbose=0)

<keras.src.callbacks.history.History at 0x76f0b8f0f7a0>

In [64]:
# demonstrate prediction
x_input = array([[70,75,145], [80,85,165],[90,95,185]])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)



In [65]:
print(yhat)

[[100.24531  105.234764 205.46284 ]]


## 9.4 Multi-step LSTM Models

A time series forecasting problem that requires a prediction of multiple time steps into the future can be referred to as multi-step time series forecasting. Specifically, these are problems where the forecast horizon or interval is more than one time step. There are two main types of LSTM models that can be used for multi-step forecasting; they are:

1. Vector Output Model
2. Encoder-Decoder Model

### Data Preparation

As with one-step forecasting, a time series used for multi-step time series forecasting must be split into samples with input and output components.

In [76]:
# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [77]:
# define input sequence
raw_seq = array([i for i in range(10, 100, 10)])

In [78]:
# choose a number of time steps
n_steps_in = 3
n_steps_out = 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)   

In [79]:
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

[10 20 30] [40 50]
[20 30 40] [50 60]
[30 40 50] [60 70]
[40 50 60] [70 80]
[50 60 70] [80 90]


Now that we know how to prepare data for multi-step forecasting, let's look at some LSTM models that can learn this mapping.

### Vector Output Model

Like other types of neural network models, the LSTM can output a vector directly that can be interpreted as a multi-step forecast. This approach was seen in the previous section, where one time step of each output time series was forecast as a vector. As with the LSTMs for univariate data in a prior section, the prepared samples must first be reshaped. The LSTM expects data to have a three-dimensional structure of `[samples, timesteps, features]`, and in this case, we only have one feature so the reshape is straightforward.

In [80]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

With the number of input and output steps specified in the `n_steps_in` and `n_steps_out` variables, we can define a multi-step time series forecasting model. The code below defines a Stacked LSTM for multi-step forecasting.

In [81]:
# define model
model = Sequential()
model.add(LSTM(100, activation = 'relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')


In [82]:
# fit model
model.fit(X, y, epochs=50, verbose=0)

<keras.src.callbacks.history.History at 0x76f0b90c73b0>

As expected by the model, the shape of the single sample of input data when making the prediction must be `[1,3,1]` for the 1 sample, 3 time steps of the input, and the single feature.

In [83]:
# demonstrate prediction
x_input = array([70,80,90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose = 0)

In [85]:
print(yhat)

[[104.6682  122.07481]]


### Encoder-Decoder Model

A model specifically developed for forecasting variable-length output sequences is called the Encoder-Decoder LSTM. The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq, problems, such as translating text from one language to another. This model can be used for multi-step time series forecasting. As its name suggests, the model is composed of two sub-models: the encoder and the decoder.

The encoder is a model responsible for reading and interpreting the input sequence. The output of the encoder is a fixed-length vector that represents the model's interpretation of the sequence. The encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as Stacked, Bidirectional, and CNN models.
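The fixed-length property of the encoder output can be illustrated with a small NumPy sketch. It assumes a plain tanh recurrence with random weights in place of a trained LSTM, and `toy_encoder` is a hypothetical name; the point is only that inputs of different lengths are summarized into vectors of the same size.

```python
import numpy as np

def toy_encoder(x_seq, W, U):
    """Read x_seq (timesteps, features) step by step; return the final state."""
    h = np.zeros(U.shape[0])
    for x_t in x_seq:
        h = np.tanh(x_t @ W + h @ U)
    return h

rng = np.random.default_rng(2)
units, n_features = 6, 1
W = rng.normal(scale=0.5, size=(n_features, units))
U = rng.normal(scale=0.5, size=(units, units))

short = np.array([[0.7], [0.8], [0.9]])                  # 3 time steps
longer = np.array([[0.5], [0.6], [0.7], [0.8], [0.9]])   # 5 time steps

encoded_short = toy_encoder(short, W, U)
encoded_longer = toy_encoder(longer, W, U)
print(encoded_short.shape)   # (6,)
print(encoded_longer.shape)  # (6,)
```

The length of the encoded vector depends only on the number of units, not on the number of input time steps.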