## Target

* The benefit of deep neural network architectures.
* The Stacked LSTM recurrent neural network architecture.
* How to implement stacked LSTMs in Python with Keras.

## Stacked LSTM Architecture

Stacked LSTMs are now a stable technique for challenging sequence prediction problems. A Stacked LSTM architecture can be defined as an LSTM model made up of multiple LSTM layers. An LSTM layer above provides a sequence output, rather than a single value output, to the LSTM layer below: one output per input time step, rather than one output for all input time steps.


<img src='stacked_lstm_image/architecture_stacked_lstm.png' >

We can easily create Stacked LSTM models with the Keras deep learning library for Python.

Each LSTM memory cell requires a 3D input of shape [samples, time steps, features]. When an LSTM processes one input sequence of time steps, each memory cell outputs a single value for the whole sequence, and the layer returns the result as a 2D array.
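
As a small illustration of the 3D input requirement, the sketch below reshapes a flat series into the [samples, time steps, features] layout using NumPy only. The series values and the choice of 3 time steps with 1 feature are assumptions made up for this example.

```python
import numpy as np

# A univariate series of 6 observations (made-up values for illustration).
series = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

# Split it into samples of 3 time steps with 1 feature each.
# The resulting layout is [samples, time steps, features].
timesteps, features = 3, 1
data = series.reshape((-1, timesteps, features))

print(data.shape)  # (2, 3, 1)
```

The first axis is the number of sequences, so a single sequence still needs a leading axis of size 1, as in the reshape((1,3,1)) calls used in the cells of this notebook.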

We can demonstrate the single 2D output with a model whose only hidden LSTM layer is also the output layer.

In [4]:
# Example of one output for whole sequence
from keras.models import Sequential
from keras.layers import LSTM
from numpy import array
# define model where LSTM is also output layer
model = Sequential()
model.add(LSTM(1, input_shape=(3,1)))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

[[0.08597416]]


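The output array has two dimensions: [batch, output]. The exact values differ from run to run because the model is untrained and its weights are randomly initialized.

If we input a batch of two sequences, we get one output row per sequence: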

In [5]:
# Example of one output per sequence for a batch of two sequences
from keras.models import Sequential
from keras.layers import LSTM
from numpy import array
# define model where LSTM is also output layer
model = Sequential()
model.add(LSTM(1, input_shape=(3,1)))
model.compile(optimizer='adam', loss='mse')
# two input sequences of three time steps each
data = array([[0.1, 0.2, 0.3],[0.11, 0.21, 0.31]]).reshape((2,3,1))
# make and show prediction
print(model.predict(data))

[[-0.06690586]
 [-0.06988746]]


To stack LSTM layers, we need to change the configuration of the prior LSTM layer to output a 3D array as input for the subsequent layer.

We can do this by setting the return_sequences argument on the layer to True (defaults to False). This will return one output for each input time step and provide a 3D array.
Below is the same example as above with return_sequences=True.

In [6]:
# Example of one output for each input time step
from keras.models import Sequential
from keras.layers import LSTM
from numpy import array
# define model where LSTM is also output layer
model = Sequential()
model.add(LSTM(1, return_sequences=True, input_shape=(3,1)))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

[[[0.01198609]
  [0.03274481]
  [0.05974784]]]
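
To make the difference between the two settings concrete, here is a minimal NumPy sketch of an LSTM forward pass. It is not the Keras implementation: the function name lstm_forward, the gate ordering, and the random weights are assumptions chosen for illustration. It only shows that the default returns the hidden state of the last time step, while return_sequences=True returns the hidden state of every time step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Hypothetical LSTM forward pass.

    x: (batch, timesteps, features)
    W: (features, 4 * units), U: (units, 4 * units), b: (4 * units,)
    """
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        # Gate order (input, forget, candidate, output) is an assumption here.
        i, f, g, o = np.split(z, 4, axis=1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # 3D: (batch, timesteps, units)
    return h                              # 2D: (batch, units)

rng = np.random.default_rng(0)
features, units = 1, 1
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

x = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
last = lstm_forward(x, W, U, b)
seq = lstm_forward(x, W, U, b, return_sequences=True)

print(last.shape)  # (1, 1)
print(seq.shape)   # (1, 3, 1)
```

In this sketch the last time step of the full sequence equals the single output, which is why only the 3D form carries enough information to feed another LSTM layer.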


Below is an example of defining a Stacked LSTM with two hidden LSTM layers:

In [17]:
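# Example of a two-layer Stacked LSTM with a Dense output layer
from keras.models import Sequential
from keras.layers import LSTM,Dense
from numpy import array
# define model: two stacked LSTM layers followed by a Dense output layer
model = Sequential()
# first LSTM returns the full sequence (3D) so the next LSTM can consume it
model.add(LSTM(1, return_sequences=True, input_shape=(3,1)))
# second LSTM returns only the last output (2D); input_shape is only needed on the first layer
model.add(LSTM(1, return_sequences=False))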
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))

[[0.02405564]]


**The following is a side exploration, not part of the main topic: checking the output shape of each layer.**

In [21]:
# Stacked LSTM with a batch of two sequences: the last LSTM returns one output per sequence
from keras.models import Sequential
from keras.layers import LSTM,Dense
from numpy import array
# define model: two stacked LSTM layers followed by a Dense output layer
model = Sequential()
model.add(LSTM(1, return_sequences=True, input_shape=(3,1)))
# input_shape is only needed on the first layer
model.add(LSTM(1, return_sequences=False))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([[0.1, 0.2, 0.3],[0.11, 0.21, 0.31]]).reshape((2,3,1))
# make and show prediction
outputs = model.predict(data)
print(outputs)
print(outputs.shape)

[[4.2805441e-05]
 [4.5176832e-05]]
(2, 1)


In [22]:
# Both LSTM layers return sequences, so Dense(5) is applied at every time step
from keras.models import Sequential
from keras.layers import LSTM,Dense
from numpy import array
# define model: two sequence-returning LSTM layers followed by a Dense layer
model = Sequential()
model.add(LSTM(1, return_sequences=True, input_shape=(3,1)))
# input_shape is only needed on the first layer
model.add(LSTM(1, return_sequences=True))
model.add(Dense(5))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([[0.1, 0.2, 0.3],[0.11, 0.21, 0.31]]).reshape((2,3,1))
# make and show prediction
outputs = model.predict(data)
print(outputs)
print(outputs.shape)

[[[-0.00278195  0.00287771  0.00258103  0.00352587  0.00172118]
  [-0.00955357  0.00988243  0.00886359  0.0121083   0.00591075]
  [-0.02063208  0.02134227  0.01914199  0.02614931  0.01276497]]

 [[-0.00304709  0.00315198  0.00282702  0.00386192  0.00188522]
  [-0.01020056  0.01055168  0.00946386  0.0129283   0.00631104]
  [-0.02169035  0.02243697  0.02012382  0.02749057  0.01341972]]]
(2, 3, 5)
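
The (2, 3, 5) shape above follows from Dense acting only on the last axis of a 3D input. The NumPy sketch below mimics that with a plain matrix product. The random stand-in for the LSTM output, the kernel values, and the zero bias are assumptions for illustration, not values taken from the Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the second LSTM's output: (batch, timesteps, units) = (2, 3, 1).
lstm_out = rng.normal(size=(2, 3, 1))

# A Dense(5)-like transform: kernel maps the last axis from 1 to 5 values.
kernel = rng.normal(size=(1, 5))
bias = np.zeros(5)

# Matrix product over the last axis; batch and time axes are left untouched.
dense_out = lstm_out @ kernel + bias

print(dense_out.shape)  # (2, 3, 5)
```

With a single LSTM unit and a zero bias (assumed here), each time step's five values are one kernel row scaled by a single number, which would be consistent with the proportional rows in the output printed above.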


In [27]:
# Second LSTM has 3 units and returns sequences: three values per time step
from keras.models import Sequential
from keras.layers import LSTM,Dense
from numpy import array
# define model: two sequence-returning LSTM layers, the second is the output layer
model = Sequential()
model.add(LSTM(1, return_sequences=True, input_shape=(3,1)))
# input_shape is only needed on the first layer
model.add(LSTM(3, return_sequences=True))
# model.add(Dense(5))
model.compile(optimizer='adam', loss='mse')
# input time steps
data = array([[0.1, 0.2, 0.3],[0.11, 0.21, 0.31]]).reshape((2,3,1))
# make and show prediction
outputs = model.predict(data)
print(outputs)
print(outputs.shape)

[[[0.00098834 0.0017503  0.00147278]
  [0.00327409 0.00556307 0.00499001]
  [0.00671013 0.01102901 0.0104747 ]]

 [[0.00108394 0.00192047 0.0016161 ]
  [0.0034911  0.00592722 0.00532668]
  [0.00702963 0.01153969 0.01098973]]]
(2, 3, 3)
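
The shapes observed in the three cells above can be summarized with a small pure-Python shape tracer. The helper names lstm_shape, dense_shape, and trace_shapes are hypothetical and do not exist in Keras. They only encode the rules seen so far: an LSTM keeps the time axis when return_sequences=True and drops it otherwise, and Dense replaces the last axis.

```python
from functools import partial

def lstm_shape(input_shape, units, return_sequences=False):
    # An LSTM needs a 3D input: (batch, timesteps, features).
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)
    return (batch, units)

def dense_shape(input_shape, units):
    # Dense only changes the size of the last axis.
    return input_shape[:-1] + (units,)

def trace_shapes(input_shape, layers):
    # Return the input shape followed by the output shape of every layer.
    shapes = [input_shape]
    for layer in layers:
        shapes.append(layer(shapes[-1]))
    return shapes

batch_input = (2, 3, 1)

# Mirrors cell In [21]: LSTM(seq) -> LSTM(last) -> Dense(1)
model_21 = [
    partial(lstm_shape, units=1, return_sequences=True),
    partial(lstm_shape, units=1),
    partial(dense_shape, units=1),
]
# Mirrors cell In [22]: LSTM(seq) -> LSTM(seq) -> Dense(5)
model_22 = [
    partial(lstm_shape, units=1, return_sequences=True),
    partial(lstm_shape, units=1, return_sequences=True),
    partial(dense_shape, units=5),
]
# Mirrors cell In [27]: LSTM(seq) -> LSTM(3 units, seq)
model_27 = [
    partial(lstm_shape, units=1, return_sequences=True),
    partial(lstm_shape, units=3, return_sequences=True),
]

print(trace_shapes(batch_input, model_21)[-1])  # (2, 1)
print(trace_shapes(batch_input, model_22)[-1])  # (2, 3, 5)
print(trace_shapes(batch_input, model_27)[-1])  # (2, 3, 3)
```

Passing a 2D shape to lstm_shape fails at the tuple unpacking, which loosely mirrors why the layer before a stacked LSTM has to return sequences.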
