This document explains the output options of the Keras LSTM layer.

The layer can return three kinds of output, controlled by the arguments passed to the LSTM constructor:

`LSTM(units=, return_sequences=False, return_state=False)`

where
- `units` is an integer that sets the length of the output vector (the dimensionality of the hidden and cell states).
- `return_sequences` is a bool, `False` by default. When it is `True`, the layer returns the **hidden state** of every time step, so the output gains one extra dimension for the time steps.
- `return_state` is also a bool, `False` by default. When it is `True`, the layer additionally returns the last hidden state and the last **cell state**.

Changing the other arguments is not recommended without a good reason: on a GPU, Keras uses the fast cuDNN kernel only when arguments such as `activation`, `recurrent_activation`, `recurrent_dropout`, `unroll` and `use_bias` keep their default values. Any other setting may slow down training significantly.

In [1]:
#@title Check GPU

import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('GPU device not found')
else:
    print('Found GPU at: {}'.format(device_name))

2023-06-07 13:49:15.134092: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


GPU device not found




In [2]:
#@title Version Info
print('tf version: ', tf.__version__)
print('tf.keras version:', tf.keras.__version__)

tf version:  2.10.0
tf.keras version: 2.10.0


In [3]:
#@title Import Libraries
from random import randint
from numpy import array
from numpy import argmax
from tensorflow.keras import backend as K
from tensorflow.keras import models
from numpy import array_equal
import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import LSTM, Bidirectional
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Input
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.layers import RepeatVector

In [4]:
#@title Generate one_hot_encoded Input & Output Sequences


# generate a sequence of random integers
def generate_sequence(length, n_unique):
    return [randint(0, n_unique-1) for _ in range(length)]

# one hot encode sequence
def one_hot_encode(sequence, n_unique):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

# decode a one hot encoded string
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

# prepare data for the LSTM
def get_reversed_pairs(time_steps,vocabulary_size,verbose= False):
    # generate random sequence
    sequence_in = generate_sequence(time_steps, vocabulary_size)
    sequence_out = sequence_in[::-1]
    
    # one hot encode
    X = one_hot_encode(sequence_in, vocabulary_size)
    y = one_hot_encode(sequence_out, vocabulary_size)
    # reshape as 3D
    X = X.reshape((1, X.shape[0], X.shape[1]))
    y = y.reshape((1, y.shape[0], y.shape[1]))

    if(verbose):
        print('Generated sequences as follows')
        
        print('\nOne Sample Input Sequence in raw format:')
        print('X[0]=%s' % (one_hot_decode(X[0])))
        print('\nIn one_hot_encoded format:')
        print('X[0]=%s' % (X[0]))
        print('\nShape of an input to LSTM (X[0].shape): ', X.shape)
    return X,y


def create_dataset(train_size, test_size, time_steps,vocabulary_size):
    pairs = [get_reversed_pairs(time_steps,vocabulary_size) for _ in range(train_size)]
    pairs=np.array(pairs).squeeze()
    X_train = pairs[:,0]
    y_train = pairs[:,1]
    pairs = [get_reversed_pairs(time_steps,vocabulary_size) for _ in range(test_size)]
    pairs=np.array(pairs).squeeze()
    X_test = pairs[:,0]
    y_test = pairs[:,1]
    print('\nShape of Input Batch to LSTM (X_train.shape): ', X_train.shape)
    return X_train, y_train, X_test, y_test

In [5]:
#@title Generate an input sequence

n_timesteps_in = 4  #@param {type:"integer"}
#each input sample has 4 values

n_features = 10   #@param {type:"integer"}
#each value is one_hot_encoded with 10 0/1
#n_timesteps_out = 2  #@param {type:"integer"}
#each output sample has 2 values padded with 0

# generate random sequence
X,y = get_reversed_pairs(n_timesteps_in,  n_features, verbose=True)
# generate datasets
train_size= 100 #@param {type:"integer"}
test_size = 20  

X_train, y_train, X_test, y_test = create_dataset(train_size, test_size, n_timesteps_in, n_features)

Generated sequences as follows

One Sample Input Sequence in raw format:
X[0]=[0, 9, 0, 9]

In one_hot_encoded format:
X[0]=[[1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]
 [1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]]

Shape of an input to LSTM (X[0].shape):  (1, 4, 10)

Shape of Input Batch to LSTM (X_train.shape):  (100, 4, 10)


# Introduction

In this tutorial, we focus on the outputs of the LSTM layer in Keras.

LSTM is a key layer for building powerful models, especially for Seq2Seq learning problems.

To use LSTM effectively in models, we need to understand how its output changes with the given parameters.

Therefore, in this tutorial, we will learn and use 3 important parameters: units, return_sequences, and return_state.

By the end of the tutorial, you will be able to configure the LSTM layer so that its outputs match what your model requires.

If you are interested in LSTM or Deep Learning with Keras, please subscribe to my channel and activate the notifications so that you can be notified when new content is online. Thank you!

Before we start, I would like to mention that I have already prepared several videos that give a better understanding of LSTM. You can access them through the playlists below:

- All About LSTM
- Seq2Seq Learning Problem
- Applied Machine Learning with Keras

# INPUT

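The sample input with a time dimension was generated in cell In [5] above: the training batch X_train has shape (100, 4, 10), that is (samples, time steps, features).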

# QUICK RECAP OF LSTM

## Internal Structure

![image.png](attachment:image.png)

### Roll-Out Representation of LSTM for each Time Step

![image-2.png](attachment:image-2.png)
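
To make the roll-out concrete, here is a minimal NumPy sketch of an LSTM forward pass. It is not the Keras implementation: the helper name `lstm_forward`, the random weights, and the gate order (input, forget, candidate, output) are assumptions made for illustration. It only shows where the "hidden state at every time step", the "last hidden state" and the "last cell state" come from.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Run a plain LSTM over x of shape (batch, time_steps, features).

    W: (features, 4 * units) input weights
    U: (units, 4 * units) recurrent weights
    b: (4 * units,) bias
    The gate order i, f, g, o is an assumption of this sketch.
    """
    batch, time_steps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state, starts at zero
    c = np.zeros((batch, units))  # cell state, starts at zero
    all_h = []
    for t in range(time_steps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)            # candidate cell values
        c = f * c + i * g         # new cell state
        h = o * np.tanh(c)        # new hidden state
        all_h.append(h)
    # all hidden states, last hidden state, last cell state
    return np.stack(all_h, axis=1), h, c

rng = np.random.default_rng(0)
features, units = 10, 16
x = rng.normal(size=(3, 4, features))
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

all_h, h_last, c_last = lstm_forward(x, W, U, b)
print(all_h.shape, h_last.shape, c_last.shape)  # (3, 4, 16) (3, 16) (3, 16)
```

The last slice of `all_h` along the time axis is the same array as `h_last`, which is why the last hidden state appears in more than one of the outputs discussed below.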

## LSTM OUTPUTS

LSTM can return 4 different sets of results/states according to the given parameters:

1. **Default**: Last Hidden State (Hidden State of the last time step)
2. **return_sequences=True**: All Hidden States (Hidden State of ALL the time steps)
3. **return_state=True**: Last Hidden State + Last Hidden State (again!) + Last Cell State (Cell State of the last time step)
4. **return_sequences=True + return_state=True**: All Hidden States (Hidden State of ALL the time steps) + Last Hidden State + Last Cell State (Cell State of the last time step)

Using these 4 different results/states we can stack LSTM layers in various ways.
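
The four cases above can be summarised by the shapes they produce. The helper below is a hypothetical pure-Python sketch (`lstm_output_shapes` is not a Keras function); it simply encodes the rules listed above so that the combinations can be compared side by side.

```python
def lstm_output_shapes(batch, time_steps, units,
                       return_sequences=False, return_state=False):
    """Return the list of output shapes of one LSTM layer (illustrative helper)."""
    # The main output is either every hidden state or only the last one.
    main = (batch, time_steps, units) if return_sequences else (batch, units)
    if not return_state:
        return [main]
    # With return_state=True, the last hidden state and last cell state follow.
    return [main, (batch, units), (batch, units)]

for seq in (False, True):
    for state in (False, True):
        print(f"return_sequences={seq}, return_state={state}:",
              lstm_output_shapes(None, 4, 16, seq, state))
```

Here `None` stands for the batch size, as in the Keras model summaries.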

## LSTM Default return value:

The output is only the **hidden state** at the *last time step*, because the return_sequences and return_state parameters keep their default value (**False**).

The output is a **2D** array of real numbers.

The **first dimension** is the *number of samples in the batch* given to the LSTM layer.

The **second dimension** is the **dimensionality of the output space** defined by the **units** parameter in the Keras LSTM implementation.

![image-3.png](attachment:image-3.png)

### Example Code:

Since, in the following examples, the **LSTM units parameter (dimensionality of the output space) is set to 16**, the last hidden state will have a dimension of 16.

Therefore, the Output Shape becomes (**None, 16**) and the output is a tensor of 16 real numbers for each sample in the batch!

**None** is a placeholder for the **batch_size**.

In [16]:
# define model
numberOfLSTMunits= 16

input =Input(shape=(n_timesteps_in, n_features))
state_h= LSTM(units=numberOfLSTMunits) (input) # units=16, so the output vector has length 16
model1 = Model(inputs=input, outputs=state_h)
model1.summary()

Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_5 (InputLayer)        [(None, 4, 10)]           0         
                                                                 
 lstm_4 (LSTM)               (None, 16)                1728      
                                                                 
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________
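
The 1,728 parameters reported in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout with a bias term (the Keras default): four gates, each with input weights, recurrent weights and a bias. The function name `lstm_param_count` is made up for this example.

```python
def lstm_param_count(n_features, units):
    """Number of trainable parameters of a standard LSTM layer with bias."""
    # Per gate: units * n_features input weights
    #         + units * units recurrent weights
    #         + units biases
    per_gate = units * (n_features + units) + units
    return 4 * per_gate  # input, forget, candidate and output gates

print(lstm_param_count(n_features=10, units=16))  # 1728
```

Note that the count does not depend on the number of time steps, because the same weights are reused at every step.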


In [7]:
result=model1.predict(X_train)
print('input shape:  ', X_train.shape)
print('state_h shape: ', result.shape)
print('result for the first sample/input: \n', result[0])

input shape:   (100, 4, 10)
state_h shape:  (100, 16)
result for the first sample/input: 
 [-0.00423787 -0.12499601 -0.05092138 -0.08194764  0.00860861 -0.07586323
  0.03419714  0.14004989 -0.04617537 -0.16262531  0.10421462  0.01551361
  0.06260992 -0.10016363  0.08497189  0.04524258]


## LSTM return_sequences=True value:

When the **return_sequences parameter is True**, the layer outputs **the hidden state of every time step**.

The output is a **3D** array of real numbers.

The **first dimension** is the ***number of samples in the batch*** given to the LSTM layer.

The **second dimension** is the number of time steps in the input sequence. By indexing the second dimension you can access the hidden states of all the units at **a given time step**.

The **third dimension** is the **dimensionality of the output space** defined by the **units** parameter in the Keras LSTM implementation.

The content of the array is **the hidden state at every time step** of the LSTM layer.

![image.png](attachment:image.png)

### Example Code:

Since we have **4 time steps** and **units (dimensionality of the output space)** is set to 16, the output shape will be (None, 4, 16), because the LSTM returns **1 hidden state** for **each time step**.
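
As a quick illustration of how to index such a 3D output, the sketch below uses a random NumPy array as a stand-in for the layer output (the values are not real hidden states, only the shape matters here).

```python
import numpy as np

# Stand-in for the layer output: 100 samples, 4 time steps, 16 units.
all_state_h = np.random.rand(100, 4, 16)

first_sample = all_state_h[0]        # all 4 hidden states of sample 0
second_step = all_state_h[:, 1, :]   # hidden states of every sample at time step 2
last_step = all_state_h[:, -1, :]    # hidden states of every sample at the last time step

print(first_sample.shape, second_step.shape, last_step.shape)
# (4, 16) (100, 16) (100, 16)
```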

In [8]:
numberOfLSTMunits= 16

input =Input(shape=(n_timesteps_in, n_features))
# units=16 and return_sequences=True, so each hidden state has length 16
# and one hidden state is returned for each of the 4 input time steps
all_state_h= LSTM(numberOfLSTMunits, return_sequences=True) (input)
model1 = Model(inputs=input, outputs=all_state_h)
model1.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 4, 10)]           0         
                                                                 
 lstm_1 (LSTM)               (None, 4, 16)             1728      
                                                                 
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________


In [9]:
result=model1.predict(X_train)

print('input shape:  ', X_train.shape)
print('all_state_h shape: ', result.shape)
print('\nhidden states for the first sample: \n', result[0])
print('\nhidden states for the first sample at the second time step: \n', result[0][1])

input shape:   (100, 4, 10)
all_state_h shape:  (100, 4, 16)

hidden states for the first sample: 
 [[ 1.7485080e-02 -2.1961629e-02 -3.8989577e-02  1.6411316e-02
  -3.8471710e-02 -7.4938521e-02 -2.4482546e-02 -6.9835987e-03
   6.1909340e-02 -3.9654065e-02 -2.2829209e-02 -2.8526425e-02
  -4.3380972e-02  6.4904734e-02  1.5291282e-02 -1.0599064e-02]
 [-9.1466354e-03 -2.6350748e-02 -2.0769355e-03  4.6031315e-02
  -5.0068911e-02 -9.9004805e-02  1.1496827e-02 -7.4695977e-03
   9.3892530e-02  3.1347664e-03 -6.8406366e-02 -5.8822781e-02
  -5.5516638e-02  5.1720788e-05  7.1556993e-02 -4.9828332e-02]
 [ 4.7257274e-02  4.0367410e-02 -2.1410307e-02  1.6856067e-02
  -5.1751290e-02 -2.2808665e-02 -3.6963049e-02  1.2823125e-02
   2.5566861e-02 -1.7201526e-02 -9.6509434e-02  3.8387517e-03
  -1.4135436e-02  2.8236950e-02  1.0354447e-01 -3.3998400e-02]
 [ 8.9223877e-02  8.0023348e-02 -3.2073706e-02  4.2719347e-03
  -5.6281663e-02  1.7613858e-02 -7.4577793e-02  2.6554450e-02
  -3.0847678e-02 -4.4049706e-

## LSTM return_state=True value:

When the **return_state parameter is True**, the LSTM layer outputs the **last** hidden state twice, followed by the last cell state.

The output is a list of **three 2D arrays** of real numbers.

The **first dimension** is the number of samples (batch size) given to the LSTM layer.

The **second dimension** is the **dimensionality of the output space** defined by the **units** parameter in the Keras LSTM layer.

It returns 3 arrays in the result:

- The LSTM hidden state of the last time step: (None, 16). It is 16 because the **dimensionality of the output space (units parameter)** is set to 16.
- The LSTM hidden state of the last time step (again): (None, 16).
- The LSTM cell state of the last time step: (None, 16). The cell state has the same **dimensionality (units parameter)**, 16.

### Example Code:

Since we set the **units parameter (dimensionality of the output space)** to 16, the output shape will be (None, 16) for all 3 tensors.

In [10]:
# define model
numberOfLSTMunits= 16

input =Input(shape=(n_timesteps_in, n_features))
LSTM_output, state_h, state_c= LSTM(numberOfLSTMunits, return_state=True) (input)
model1 = Model(inputs=input, outputs=[LSTM_output, state_h, state_c])
model1.summary()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 4, 10)]           0         
                                                                 
 lstm_2 (LSTM)               [(None, 16),              1728      
                              (None, 16),                        
                              (None, 16)]                        
                                                                 
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________


In [11]:
model1.get_layer(index=1).output_shape

[(None, 16), (None, 16), (None, 16)]

In [12]:
print("Input layer output shape: ", model1.get_layer(index=0).output_shape)
print("LSTM layer output shape: ", model1.get_layer(index=1).output_shape)
results=model1.predict(X_train)
results=array(results)

print("\nWith batch of data:")
print('input shape:  ', X_train.shape)
print('result is 3 2D-array: ', results.shape)
print('\nLSTM_output is in the first array: ', results[0].shape)
print('\nstate_h which is exactly the same with LSTM_output is in the second array: ', results[1].shape)
print('\nIs the content of LSTM_output and state_h  exactly the same?\n ', results[0]==results[1])
print('\nstate_c is in the third array: ', results[2].shape)

Input layer output shape:  [(None, 4, 10)]
LSTM layer output shape:  [(None, 16), (None, 16), (None, 16)]

With batch of data:
input shape:   (100, 4, 10)
result is 3 2D-array:  (3, 100, 16)

LSTM_output is in the first array:  (100, 16)

state_h which is exactly the same with LSTM_output is in the second array:  (100, 16)

Is the content of LSTM_output and state_h  exactly the same?
  [[ True  True  True ...  True  True  True]
 [ True  True  True ...  True  True  True]
 [ True  True  True ...  True  True  True]
 ...
 [ True  True  True ...  True  True  True]
 [ True  True  True ...  True  True  True]
 [ True  True  True ...  True  True  True]]

state_c is in the third array:  (100, 16)
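
Printing an element-wise comparison is hard to read for large arrays. A more compact check, sketched here with stand-in arrays instead of the real predictions, is `np.array_equal`, which returns a single bool.

```python
import numpy as np

# Stand-in for the three arrays returned by model1.predict(X_train):
# the first two are identical copies, the third is different.
rng = np.random.default_rng(0)
h = rng.normal(size=(100, 16)).astype("float32")
c = rng.normal(size=(100, 16)).astype("float32")
results = [h, h.copy(), c]

lstm_output, state_h, state_c = results
print("LSTM_output equals state_h:", np.array_equal(lstm_output, state_h))  # True
print("state_h equals state_c:   ", np.array_equal(state_h, state_c))      # False
```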


## LSTM return_state=True + return_sequences=True value:

**The return_state and return_sequences parameters can be True at the same time**.

In this situation, the LSTM layer returns **3 results**:

- (as return_sequences=True) the hidden states for each input time step,
- (as return_state=True) the hidden state output for the last time step, and
- (as return_state=True) the cell state for the last time step.

![image.png](attachment:image.png)

In [13]:
# define model
numberOfLSTMunits= 16

input =Input(shape=(n_timesteps_in, n_features))
all_state_h, state_h, state_c= LSTM(numberOfLSTMunits, return_sequences=True, return_state=True) (input)
model1 = Model(inputs=input, outputs=[all_state_h, state_h, state_c])
model1.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 4, 10)]           0         
                                                                 
 lstm_3 (LSTM)               [(None, 4, 16),           1728      
                              (None, 16),                        
                              (None, 16)]                        
                                                                 
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________


In [14]:
print("Input layer output shape: ", model1.get_layer(index=0).output_shape)
print("LSTM layer output shape: ", model1.get_layer(index=1).output_shape)

Input layer output shape:  [(None, 4, 10)]
LSTM layer output shape:  [(None, 4, 16), (None, 16), (None, 16)]


In [15]:
results=model1.predict(X_train)
print("\nWith batch of data:")
print('input shape:  ', X_train.shape)
print('result is 3 2D-array len (results): ', len (results))
print('\nall_state_h is in the first array: ', results[0].shape)
print('\nstate_h  is in the second array: ', results[1].shape)
print('\nstate_c is in the third array: ', results[2].shape)


With batch of data:
input shape:   (100, 4, 10)
result is 3 2D-array len (results):  3

all_state_h is in the first array:  (100, 4, 16)

state_h  is in the second array:  (100, 16)

state_c is in the third array:  (100, 16)
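
One relation worth checking is that the last time step of the full sequence of hidden states is the same as the separately returned last hidden state. The sketch below assumes TensorFlow is installed and calls an untrained `tf.keras.layers.LSTM` layer directly on random data, so the values are meaningless and only the shapes and the equality matter.

```python
import numpy as np
import tensorflow as tf

# Untrained layer with both options switched on.
layer = tf.keras.layers.LSTM(16, return_sequences=True, return_state=True)

# Random stand-in batch: 5 samples, 4 time steps, 10 features.
x = np.random.rand(5, 4, 10).astype("float32")
all_state_h, state_h, state_c = layer(x)

print(all_state_h.shape, state_h.shape, state_c.shape)

# The hidden state at the last time step should match state_h.
same = np.allclose(all_state_h[:, -1, :].numpy(), state_h.numpy())
print("last step of all_state_h matches state_h:", same)
```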


## CONCLUSION
- There are 4 possible sets of outputs from the LSTM layer
- Important parameters are
    - units (dimensionality of the output space)
    - return_sequences
    - return_state
- The default value of both return_sequences and return_state is False
- Each combination of True and False values for return_sequences and return_state generates a different set of outputs
- The units parameter (dimensionality of the output space) defines how many numbers each hidden or cell state vector in the resulting tensor contains
