# <font color="gold"><b>LSTM</b></font>

This [tutorial](https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/) is divided into four parts; they are:

- Univariate LSTM Models
    - [Data Preparation](#Data_Preparation)
    - [Vanilla LSTM](#Vanilla_LSTM)
    - [Stacked LSTM](#Stacked_LSTM)
    - [Bidirectional LSTM](#Bidirectional_LSTM)
    - [CNN LSTM](#CNN_LSTM)
    - [ConvLSTM](#ConvLSTM)
- [Multivariate LSTM Models](#Multivariate_LSTM_Models)
    - Multiple Input Series
    - Multiple Parallel Series
- [Multi-Step LSTM Models](#Multi_Step_LSTM_Models)
    - Data Preparation
    - Vector Output Model
    - Encoder-Decoder Model
- Multivariate Multi-Step LSTM Models
    - Multiple Input Multi-Step Output
    - Multiple Parallel Input and Multi-Step Output


# <font color="gold"><b>Univariate LSTM Models</b></font>
- [Data Preparation](#Data_Preparation)
- [Vanilla LSTM](#Vanilla_LSTM)
- [Stacked LSTM](#Stacked_LSTM)
- [Bidirectional LSTM](#Bidirectional_LSTM)
- [CNN LSTM](#CNN_LSTM)
- [ConvLSTM](#ConvLSTM)

<a id="Data_Preparation"></a>
## <font color="teal"><b>Data Preparation</b></font>

In [27]:
from numpy import array

# split a univariate sequence into samples of n_steps inputs -> next value
def split_sequence(sequence, n_steps):
	X, y = list(), list()
	for i in range(len(sequence)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the sequence
		if end_ix > len(sequence)-1:
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
		X.append(seq_x)
		y.append(seq_y)
	return array(X), array(y)

In [33]:
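# define the input sequence and split it into 3-step samples
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps = 3
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

print(f"X:\n{X.flatten()} \n\ny:{y.flatten()}")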

X:
[10 20 30 20 30 40 30 40 50 40 50 60 50 60 70 60 70 80] 

y:[40 50 60 70 80 90]
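As an alternative that is not part of the original tutorial, the same samples can be built without a Python loop using NumPy's `sliding_window_view` (assuming NumPy 1.20 or later): each window of `n_steps + 1` values holds the inputs followed by the value to predict. The name `split_sequence_vectorized` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_vectorized(sequence, n_steps):
    """Hypothetical loop-free equivalent of split_sequence (assumes NumPy >= 1.20)."""
    seq = np.asarray(sequence)
    # each row holds n_steps inputs followed by the value to predict
    windows = sliding_window_view(seq, n_steps + 1)
    return windows[:, :-1], windows[:, -1]

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X_v, y_v = split_sequence_vectorized(raw_seq, 3)
# add the trailing features axis expected by the LSTM layer
X_v = X_v.reshape((X_v.shape[0], X_v.shape[1], 1))
print(X_v.shape, y_v)
```

The windows are views into the original array, so no data is copied until the reshape needs it.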


<a id="Vanilla_LSTM"></a>
## <font color="teal"><b>Vanilla LSTM</b></font>

In [26]:
# univariate lstm example
from numpy import array
from tensorflow import keras as keras
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense


# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(X, y, epochs=200, verbose=0)

# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)


In [29]:
print(f"y:{y.flatten()} and yhat:{yhat.flatten()}")

y:[40 50 60 70 80 90] and yhat:[102.517494]


<a id="Stacked_LSTM"></a>
## <font color="teal"><b>Stacked LSTM</b></font>

Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.
> However, an LSTM layer requires three-dimensional input, and by default an LSTM produces a two-dimensional output: an interpretation from the end of the sequence.

`Stacking LSTM hidden layers makes the model deeper, more accurately earning the description as a deep learning technique.
It is the depth of neural networks that is generally attributed to the success of the approach on a wide range of challenging prediction problems.`

> Additional hidden layers can be added to a Multilayer Perceptron neural network to make it deeper<br>

> The additional hidden layers are understood to recombine the learned representation from prior layers and create new representations at high levels of abstraction<br>

> e.g. from lines to shapes to objects

- _We can address this mismatch by setting the <font color="green">return_sequences=True</font> argument on the layer, so that the LSTM outputs a value for each time step in the input data_<br>
- _This gives a 3D output from one hidden LSTM layer that can serve as input to the next_

We can therefore define a Stacked LSTM as follows:
```python
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```
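To see why `return_sequences=True` matters, here is a minimal NumPy sketch of an LSTM layer's forward pass. It is not Keras's implementation: the gate ordering, the tanh activations (the models in this notebook use `relu`), and the random weights are assumptions for illustration, and `lstm_forward` and `init` are hypothetical names. With `return_sequences=True` the layer returns one hidden state per time step (3D), which a second LSTM layer can consume; otherwise it returns only the final state (2D).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Hypothetical minimal LSTM forward pass.

    x: [samples, timesteps, features]; W: [features, 4*units];
    U: [units, 4*units]; b: [4*units]. Gate order i, f, g, o is an assumption.
    """
    samples, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((samples, units))  # hidden state
    c = np.zeros((samples, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :units])               # input gate
        f = sigmoid(z[:, units:2 * units])      # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])  # candidate cell state
        o = sigmoid(z[:, 3 * units:])           # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        # one hidden state per time step: 3D [samples, timesteps, units]
        return np.stack(outputs, axis=1)
    # only the final hidden state: 2D [samples, units]
    return h

rng = np.random.default_rng(0)
n_samples, n_steps, n_features, units = 6, 3, 1, 50
x = rng.normal(size=(n_samples, n_steps, n_features))

def init(in_dim):
    """Random (hypothetical) weights for a layer with in_dim input features."""
    return (rng.normal(scale=0.1, size=(in_dim, 4 * units)),
            rng.normal(scale=0.1, size=(units, 4 * units)),
            np.zeros(4 * units))

seq_out = lstm_forward(x, *init(n_features), return_sequences=True)
last_out = lstm_forward(seq_out, *init(units))  # second (stacked) layer
print(seq_out.shape, last_out.shape)  # (6, 3, 50) (6, 50)
```

Feeding the 2D output of a layer without `return_sequences=True` into the second call would fail, because there is no time axis left to iterate over.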

In [34]:
# define model
model = Sequential()
# Pay attention to return_sequences=True
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
# one output unit per feature column (n_features = 1 here)
model.add(Dense(n_features))

model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(X, y, epochs=200, verbose=0)

# demonstrate prediction
x_input = array([70, 80, 90])
# reshape into a 3D array [samples, timesteps, features] for the LSTM input
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[102.34008]]


<a id="Bidirectional_LSTM"></a>
## <font color="teal"><b>Bidirectional LSTM</b></font>
On some sequence prediction problems, it can be beneficial to:
> allow the LSTM model to learn the input sequence both forwards and backwards and concatenate both interpretations

> We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first hidden layer in a wrapper layer called Bidirectional

- __In problems where all time steps of the input sequence are available, Bidirectional LSTMs train two LSTMs instead of one on the input sequence.__<br><br>
- __The first is trained on the input sequence as-is and the second on a reversed copy of the input sequence.__<br><br>
- __This can provide additional context to the network and result in faster and even fuller learning on the problem.__

1) Bidirectional LSTMs are supported in Keras via the Bidirectional layer wrapper <br>
 - This wrapper takes a recurrent layer (e.g. the first LSTM layer) as an argument <br>
 
2) One can specify the merge mode (the `merge_mode` argument), that is, how the forward and backward outputs should be combined before being passed on to the next layer. The options are:
  - __'sum'__: The outputs are added together
  - __'mul'__: The outputs are multiplied together
  - __'concat'__: The outputs are concatenated together (the default), providing double the number of outputs to the next layer
  - __'ave'__: The average of the outputs is taken<br>
  
  
More information is available in this [blog post](https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/).
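The merge modes come down to simple element-wise arithmetic. The sketch below applies each one to two made-up output vectors standing in for the forward and backward LSTM outputs (the values are hypothetical), and also shows the reversed copy of an input sequence that the backward LSTM reads.

```python
import numpy as np

# hypothetical final outputs of the forward and backward LSTMs: [samples, units]
forward_out = np.array([[1.0, 2.0, 3.0]])
backward_out = np.array([[4.0, 5.0, 6.0]])

merged = {
    "sum": forward_out + backward_out,
    "mul": forward_out * backward_out,
    "concat": np.concatenate([forward_out, backward_out], axis=-1),  # doubles the width
    "ave": (forward_out + backward_out) / 2,
}
for mode, out in merged.items():
    print(mode, out.shape, out)

# the backward LSTM sees the input with the time axis reversed
x = np.array([[[10], [20], [30]]])  # [samples, timesteps, features]
x_reversed = x[:, ::-1, :]
print(x_reversed[0, :, 0])
```

Only `'concat'` changes the output width, which is why a Dense layer after a concatenating Bidirectional wrapper sees twice as many inputs.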

In [36]:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))


from keras.layers import Bidirectional

# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(X, y, epochs=200, verbose=0)

# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)

print(yhat)

[[102.624146]]


<a id="CNN_LSTM"></a>
## <font color="teal"><b>CNN LSTM</b></font>
> A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional spatial data such as images

`The CNN can be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data`


_CNN LSTMs (also called Long-term Recurrent Convolutional Networks, LRCN)_ are a class of models that is both spatially and temporally deep, and has the flexibility to be applied to a variety of vision tasks involving sequential inputs and outputs

CNN LSTMs were developed for visual time series prediction problems and for generating textual descriptions from sequences of images, such as:
- __Activity Recognition__: Generating a textual description of an activity demonstrated in a sequence of images
- __Image Description__: Generating a textual description of a single image
- __Video Description__: Generating a textual description of a sequence of images

`
CNN LSTM: A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret
`
> A CNN LSTM can be defined by adding CNN layers on the front end followed by LSTM layers with a Dense layer on the output <br>

> It is helpful to think of this architecture as defining two sub-models: the CNN Model for feature extraction and the LSTM Model for interpreting the features across time steps <br>

___
### <font color="yellow">__CNN Model__</font>
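```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten

cnn = Sequential()
# one 2x2 filter over a 10x10 single-channel image; padding='same' keeps the 10x10 size
cnn.add(Conv2D(1, (2,2), activation='relu', padding='same', input_shape=(10,10,1)))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Flatten())
```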

- The snippet above expects to read in 10×10 pixel images with 1 channel (e.g. black and white)
- The Conv2D will read the image in 2×2 snapshots and output one new 10×10 interpretation of the image
- The MaxPooling2D will pool the interpretation into 2×2 blocks reducing the output to a 5×5 consolidation
- The Flatten layer will take the single 5×5 map and transform it into a 25-element vector ready for some other layer to deal with, such as a Dense for outputting a prediction
> All of this makes sense for image classification and other computer vision tasks
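The shape arithmetic in these bullets can be checked with a small NumPy sketch. The image and the single 2×2 filter are hypothetical, the convolution is a plain cross-correlation followed by ReLU, and the `'same'` padding is modelled as one extra zero row and column at the bottom and right (an assumption about how the framework pads for an even kernel size). `max_pool_2x2` is a hypothetical helper, not a Keras function.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.arange(100, dtype=float).reshape(10, 10)  # hypothetical 10x10 image, 1 channel
kernel = np.full((2, 2), 0.25)                        # hypothetical single 2x2 filter

# 'same' padding (assumed: one zero row/column added at the bottom/right)
padded = np.pad(image, ((0, 1), (0, 1)))
windows = sliding_window_view(padded, (2, 2))         # [10, 10, 2, 2] patches
feature_map = np.maximum(np.einsum("ijab,ab->ij", windows, kernel), 0.0)  # conv + ReLU

def max_pool_2x2(fm):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)  # 5x5 consolidation
flat = pooled.flatten()             # 25-element vector
print(feature_map.shape, pooled.shape, flat.shape)  # (10, 10) (5, 5) (25,)
```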
___

### <font color="yellow">__LSTM Model__</font>
- The CNN model above is only capable of handling a single image, transforming it from input pixels into an internal matrix or vector representation
- We need to repeat this operation across multiple images and allow the LSTM to build up internal state and update weights using backpropagation through time (BPTT) across a sequence of the internal vector representations of input images
- We want to apply the CNN model to each input image and pass on the output of each input image to the LSTM as a single time step
- __We can achieve this by wrapping the entire CNN input model (one layer or more) in a TimeDistributed layer__

```python
model.add(TimeDistributed(...))
model.add(LSTM(...))
model.add(Dense(...))
```
- The __TimeDistributed__ layer achieves the desired outcome of applying the same layer or layers multiple times
- In this case, applying it multiple times to multiple input time steps and in turn providing a sequence of “image interpretations” or “image features” to the LSTM model to work on
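Conceptually, the wrapper applies one set of weights to every time step in turn. The NumPy sketch below illustrates that idea; it is not Keras's implementation. `shared_layer` is a hypothetical stand-in for the wrapped CNN, and `time_distributed` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))  # hypothetical weights shared by every time step

def shared_layer(x):
    """Stands in for the wrapped model: maps [samples, 4] -> [samples, 3]."""
    return np.maximum(x @ W, 0.0)

def time_distributed(layer, x):
    """Apply the same layer to each time step of x: [samples, timesteps, ...]."""
    return np.stack([layer(x[:, t]) for t in range(x.shape[1])], axis=1)

x = rng.normal(size=(2, 5, 4))  # 2 samples, 5 time steps, 4 values per step
out = time_distributed(shared_layer, x)
print(out.shape)  # (2, 5, 3): one interpretation per time step, same weights each time

# equivalent formulation: fold time into the batch axis, apply once, unfold
folded = shared_layer(x.reshape(-1, 4)).reshape(2, 5, 3)
```

Because the weights are shared, the number of parameters does not grow with the number of time steps.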

___
### <font color="yellow">__CNN LSTM Model__</font>
We can define a CNN LSTM model in Keras by first defining the CNN layer or layers, wrapping them in a TimeDistributed layer and then defining the LSTM and output layers
```python
model = Sequential()
# define CNN model
model.add(TimeDistributed(Conv2D(...)))
model.add(TimeDistributed(MaxPooling2D(...)))
model.add(TimeDistributed(Flatten()))
# define LSTM model
model.add(LSTM(...))
model.add(Dense(...))
```
1) Split the input sequences into subsequences that can be processed by the CNN model
 - e.g., we can first split our univariate time series data into input/output samples with four steps as input and one as output <br>
 
2) Each sample can then be split into two sub-samples, each with two time steps <br>
3) The CNN can interpret each subsequence of two time steps and provide a time series of interpretations of the subsequences to the LSTM model to process as input <br>

```python
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)

# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
# number of subsequences as n_seq; number of time steps per subsequence as n_steps
n_seq = 2; n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
```
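To make the reshape concrete, the sketch below rebuilds the samples with an inline list comprehension (equivalent to `split_sequence`, so the block runs on its own) and inspects the first one: each 4-step input becomes two 2-step subsequences. `n_steps_total` is a name introduced here to avoid reusing `n_steps` for two meanings.

```python
import numpy as np

raw_seq = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
n_steps_total = 4      # input steps per sample
n_seq, n_steps = 2, 2  # 2 subsequences of 2 steps each (n_seq * n_steps == 4)
n_features = 1

# same framing as split_sequence: 4 inputs, the next value is the target
X = np.array([raw_seq[i:i + n_steps_total] for i in range(len(raw_seq) - n_steps_total)])
y = raw_seq[n_steps_total:]
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))

print(X.shape)  # (5, 2, 2, 1)
# first sample: subsequences [10, 20] and [30, 40], target 50
print(X[0, :, :, 0], y[0])
```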
- We want to reuse the same CNN model when reading in each sub-sequence of data separately
- This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply the entire model once per input, in this case, once per input subsequence
```python
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
```

- The CNN model first has a convolutional layer for reading across the subsequence that requires a number of filters and a kernel size to be specified
- The number of filters is the number of reads or interpretations of the input sequence
- The kernel size is the number of time steps included in each ‘read’ operation of the input sequence
- The convolution layer is followed by a max pooling layer that distills the filter maps down to half their size, keeping the most salient features
- These structures are then flattened down to a single one-dimensional vector to be used as a single input time step to the LSTM layer
- Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input sequence and makes a prediction
```python
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
```
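The shape flow for a single subsequence can be traced with a NumPy sketch using random, hypothetical filter weights: the `kernel_size=1` convolution reads each of the two time steps separately with 64 filters, pooling with `pool_size=2` reduces the two time steps to one, and flattening leaves a 64-value vector that becomes one time step for the LSTM. Valid (no) padding is assumed, as in the model above.

```python
import numpy as np

rng = np.random.default_rng(2)
filters, kernel_size = 64, 1
subseq = np.array([[10.0], [20.0]])                   # one subsequence: [timesteps=2, features=1]
kernels = rng.normal(size=(kernel_size, 1, filters))  # hypothetical filter weights
bias = np.zeros(filters)

# Conv1D with kernel_size=1 and no padding: each time step is read on its own
conv = np.stack([
    np.einsum("kf,kfo->o", subseq[t:t + kernel_size], kernels) + bias
    for t in range(subseq.shape[0] - kernel_size + 1)
])
conv = np.maximum(conv, 0.0)                          # ReLU -> [2, 64]

pooled = conv.reshape(-1, 2, filters).max(axis=1)     # MaxPooling1D(pool_size=2) -> [1, 64]
flat = pooled.flatten()                               # Flatten -> 64 values per LSTM time step
print(conv.shape, pooled.shape, flat.shape)           # (2, 64) (1, 64) (64,)
```

With two subsequences per sample, the wrapped CNN therefore hands the LSTM a sequence of two 64-value vectors.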
___

In [41]:
from numpy import array

from keras.models import Sequential
from keras.layers import LSTM, Dense, Flatten, TimeDistributed
from keras.layers import Conv1D, MaxPooling1D

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[101.208145]]


<a id = "Multivariate_LSTM_Models"></a>
# <font color="gold"><b>Multivariate LSTM Models</b></font>
- Multiple Input Series
- Multiple Parallel Series
<br>

### __A problem may have:__
- [x] two or more parallel input time series and
- [x] an output time series that is dependent on the input time series
> The input time series are parallel because each series has an observation at the same time steps


In [46]:
# multivariate data preparation
from numpy import array
from numpy import hstack

# define input sequence
in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
dataset

array([[ 10,  15,  25],
       [ 20,  25,  45],
       [ 30,  35,  65],
       [ 40,  45,  85],
       [ 50,  55, 105],
       [ 60,  65, 125],
       [ 70,  75, 145],
       [ 80,  85, 165],
       [ 90,  95, 185]])
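The stacked dataset still has to be framed as supervised samples before it can feed an LSTM. One possible framing, sketched below with a hypothetical `split_sequences` helper, takes `n_steps` rows of the two input columns as X and the output-column value at the last of those rows as y; aligning y with the last input row is an assumption of this sketch. X comes out already three-dimensional, with the number of input series as the features dimension.

```python
import numpy as np

def split_sequences(sequences, n_steps):
    """Hypothetical helper: frame [rows, columns] data as LSTM samples.

    Assumes the last column is the output series and that the target is the
    output value at the last time step of each input window.
    """
    X, y = [], []
    for i in range(len(sequences) - n_steps + 1):
        end_ix = i + n_steps
        X.append(sequences[i:end_ix, :-1])   # n_steps rows of the input columns
        y.append(sequences[end_ix - 1, -1])  # output aligned with the last input row
    return np.array(X), np.array(y)

in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
dataset = np.column_stack((in_seq1, in_seq2, in_seq1 + in_seq2))

X, y = split_sequences(dataset, n_steps=3)
print(X.shape)  # (7, 3, 2): [samples, timesteps, features]
print(y)
```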

<a id="Multi_Step_LSTM_Models"></a>
# <font color="gold"><b>Multi-Step LSTM Models</b></font>