
In [1]:
# Long Short-Term Memory networks, or LSTMs, are a popular and powerful type of Recurrent Neural Network (RNN).
# They can be quite difficult to configure and apply to arbitrary sequence prediction problems, even with well defined
# and “easy to use” interfaces like those provided in the Keras deep learning library in Python.

# One reason for this difficulty in Keras is the use of the TimeDistributed wrapper layer and the need for 
# some LSTM layers to return sequences rather than single values.

<h1>Sequence Learning Problem</h1>

In [2]:
# We will use a simple sequence learning problem to demonstrate the TimeDistributed layer.

# In this problem, the sequence [0.0, 0.2, 0.4, 0.6, 0.8] will be given as input, one item at a time 
# and must be in turn returned as output, one item at a time.

# Think of it as learning a simple echo program: given 0.0 as input, we expect to see 0.0 as output, 
# and likewise for each item in the sequence.

In [3]:
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

In [4]:
length = 5
seq = array([i/float(length) for i in range(length)])
print(seq)

[0.  0.2 0.4 0.6 0.8]


In [5]:
# The example is configurable and you can play with longer/shorter sequences yourself later if you like.
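
# As a minimal sketch of that configurability, the helper below rebuilds the same evenly spaced
# sequence for any length. Note that make_sequence is a hypothetical name introduced here for
# illustration (it is not part of the original notebook), and it returns a plain Python list
# rather than a NumPy array.

```python
def make_sequence(length):
    """Return [0/length, 1/length, ..., (length-1)/length] as floats."""
    return [i / float(length) for i in range(length)]


# Same values as the length-5 sequence used throughout this notebook.
print(make_sequence(5))

# A longer variant with ten evenly spaced items.
print(make_sequence(10))
```

# Wrapping the result in numpy's array() would give the same kind of object as seq above.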

<h1>One-to-One LSTM for Sequence Prediction</h1>

In [6]:
# Before we dive in, it is important to show that this sequence learning problem can be learned piecewise.

# That is, we can reframe the problem into a dataset of input-output pairs for each item in the sequence. 
# Given 0, the network should output 0, given 0.2, the network must output 0.2, and so on.

# This is the simplest formulation of the problem and requires the sequence to be split into input-output pairs 
# and for the sequence to be predicted one step at a time.

# The input-output pairs are as follows:
# X, 	y
# 0.0,	0.0
# 0.2,	0.2
# 0.4,	0.4
# 0.6,	0.6
# 0.8,	0.8

# The input for LSTMs must be three dimensional. We can reshape the 1D sequence into a 3D array with 
# 5 samples, 1 time step, and 1 feature. We will define the output as 5 samples with 1 feature.

X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)
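
# To make the reshaping concrete, the sketch below rebuilds the sequence on its own (so it does
# not depend on earlier cells) and prints the shapes before and after, then lists each
# input-output pair. It assumes only NumPy; the variable names mirror the ones used above.

```python
import numpy as np

length = 5
seq = np.array([i / float(length) for i in range(length)])

# One sample per sequence item, each with 1 time step and 1 feature.
X = seq.reshape(length, 1, 1)
# One target value per sample.
y = seq.reshape(length, 1)

# Shapes of the original sequence, the LSTM input, and the target.
print(seq.shape, X.shape, y.shape)

# Each input sample is paired with the identical target value.
for sample, target in zip(X, y):
    print(sample[0, 0], target[0])
```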

In [7]:
# We will define the network model as having 1 input with 1 time step. 
# The first hidden layer will be an LSTM with 5 units. The output layer will be a fully-connected layer with 1 output.

# define LSTM configuration
n_neurons = length  # 5
n_batch = length  # 5
n_epoch = 1000

# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 5)                 140       
_________________________________________________________________
dense (Dense)                (None, 1)                 6         
=================================================================
Total params: 146
Trainable params: 146
Non-trainable params: 0
_________________________________________________________________


In [9]:
# We can see that the LSTM layer has 140 parameters. This is calculated based on the number of inputs (1) 
# and the number of outputs (5 for the 5 units in the hidden layer), as follows
# (the "+ 1" next to inputs accounts for the bias terms):
# n = 4 * ((inputs + 1) * outputs + outputs^2)
# n = 4 * ((1 + 1) * 5 + 5^2)
# n = 4 * 35
# n = 140

# We can also see that the fully connected layer has only 6 parameters: one weight per input-output pair 
# (5 inputs from the previous layer times 1 output neuron) plus one bias per output.
# n = inputs * outputs + outputs
# n = 5 * 1 + 1
# n = 6
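
# The two formulas above can be checked with a few lines of plain Python. This is only a sketch:
# lstm_param_count and dense_param_count are hypothetical helper names introduced here, not Keras
# functions, and they simply encode the arithmetic shown in this cell.

```python
def lstm_param_count(n_inputs, n_units):
    """Parameters in one LSTM layer.

    There are four weight sets (three gates plus the candidate cell state),
    each with input weights, one bias per unit, and recurrent weights.
    """
    return 4 * ((n_inputs + 1) * n_units + n_units ** 2)


def dense_param_count(n_inputs, n_outputs):
    """Parameters in a fully connected layer: weights plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


lstm_params = lstm_param_count(1, 5)
dense_params = dense_param_count(5, 1)

print(lstm_params)                 # 140
print(dense_params)                # 6
print(lstm_params + dense_params)  # 146
```

# The sum matches the "Total params" line reported by the model summary.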


In [8]:
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)

Epoch 1/1000
1/1 - 2s - loss: 0.2307
Epoch 2/1000
1/1 - 0s - loss: 0.2290
Epoch 3/1000
1/1 - 0s - loss: 0.2272
Epoch 4/1000
1/1 - 0s - loss: 0.2254
Epoch 5/1000
1/1 - 0s - loss: 0.2237
Epoch 6/1000
1/1 - 0s - loss: 0.2219
Epoch 7/1000
1/1 - 0s - loss: 0.2202
Epoch 8/1000
1/1 - 0s - loss: 0.2184
Epoch 9/1000
1/1 - 0s - loss: 0.2167
Epoch 10/1000
1/1 - 0s - loss: 0.2150
Epoch 11/1000
1/1 - 0s - loss: 0.2132
Epoch 12/1000
1/1 - 0s - loss: 0.2115
Epoch 13/1000
1/1 - 0s - loss: 0.2098
Epoch 14/1000
1/1 - 0s - loss: 0.2081
Epoch 15/1000
1/1 - 0s - loss: 0.2064
Epoch 16/1000
1/1 - 0s - loss: 0.2047
Epoch 17/1000
1/1 - 0s - loss: 0.2030
Epoch 18/1000
1/1 - 0s - loss: 0.2014
Epoch 19/1000
1/1 - 0s - loss: 0.1997
Epoch 20/1000
1/1 - 0s - loss: 0.1980
Epoch 21/1000
1/1 - 0s - loss: 0.1964
Epoch 22/1000
1/1 - 0s - loss: 0.1947
Epoch 23/1000
1/1 - 0s - loss: 0.1931
Epoch 24/1000
1/1 - 0s - loss: 0.1915
Epoch 25/1000
1/1 - 0s - loss: 0.1899
Epoch 26/1000
1/1 - 0s - loss: 0.1883
Epoch 27/1000
1/1 - 0

<tensorflow.python.keras.callbacks.History at 0x7f142d06bcd0>

In [10]:
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)

# result has shape (5, 1); take the single output column so each value is a scalar
for value in result[:, 0]:
    print('%.1f' % value)

0.0
0.2
0.4
0.6
0.8


In [None]:
# The network correctly learns the prediction problem.