<a href="https://colab.research.google.com/github/rohailkhan/JB-lstm/blob/main/5_Models_for_Sequence_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, LSTM, TimeDistributed, RepeatVector

![LSTM](https://raw.githubusercontent.com/rohailkhan/data/main/lstm.PNG)



# 1 One to One (I/P = one time step, O/P = Dense(1))

In [None]:
features = 1  # illustrative value: number of input features per time step
model = Sequential()
# One-to-one: use a single time step in input_shape
model.add(LSTM(units=3, input_shape=(1, features)))
# input_shape=(time_steps, features); the samples dimension of the 3D input is left out
model.add(Dense(1))

# 2 Many to One (I/P = many time steps, O/P = Dense(1))
Almost the same as one-to-one, except that the time-steps dimension of the input shape is set to the number of input time steps instead of 1.

In [None]:
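steps, features = 5, 1  # illustrative values: input time steps and features per step
model = Sequential()
# Many-to-one: set the time-steps dimension to the number of input steps
model.add(LSTM(units=3, input_shape=(steps, features)))
# input_shape=(time_steps, features); the samples dimension of the 3D input is left out
model.add(Dense(1))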

# 3 One to Many
### e.g. generating a text sequence from an image. We use the ***TimeDistributed*** wrapper to apply the **same output layer at every output step**, for the required number of output steps.

In [None]:
model = Sequential()
model.add(Conv2D(...))
...
model.add(LSTM(...))
model.add(TimeDistributed(Dense(1)))

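# 4 Many to Many
## 1- Fixed length (same number of input and output time steps): input_shape uses `steps`, the LSTM sets return_sequences=True, and the output layer is wrapped in TimeDistributed

The LSTM layer must be configured to return a value for each input time step rather than a single value at the end of the input sequence (return_sequences=True).

## 2- Variable length (different numbers of input and output time steps): add an encoder-decoder, where the encoder maps the input time steps to a fixed-size representation and the decoder expands it into the output time steps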

In [None]:
# In a sense, this model combines the capabilities of the many-to-one and one-to-many models.
# Fixed-length case: if the number of input and output time steps is equal, the LSTM layer must
# return a value for each input time step rather than a single value at the end of the input
# sequence (return_sequences=True). The same Dense layer then produces one output time step
# for each input time step via the TimeDistributed wrapper.
model = Sequential()
model.add(LSTM(..., input_shape=(steps, ...), return_sequences=True))
model.add(TimeDistributed(Dense(1)))

In [None]:
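# Variable-length case (encoder-decoder): input and output lengths may differ
model = Sequential()
model.add(LSTM(..., input_shape=(in_steps, ...)))  # encoder: summarizes the input as one vector
model.add(RepeatVector(out_steps))                 # repeats that vector once per output step
model.add(LSTM(..., return_sequences=True))        # decoder: emits one value per output step
model.add(TimeDistributed(Dense(1)))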
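
The two many-to-many variants above can be checked by inspecting output shapes. This is a minimal sketch, not the notebook's own code: the sizes `in_steps`, `out_steps`, `features` and the 8 LSTM units are illustrative assumptions, and the input shape is declared with `keras.Input` instead of the `input_shape=` argument used elsewhere in this notebook.

```python
from keras import Input
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# Hypothetical sizes, chosen only for illustration.
in_steps, out_steps, features = 6, 3, 1

# Fixed-length many-to-many: one output per input time step.
fixed = Sequential([
    Input(shape=(in_steps, features)),
    LSTM(8, return_sequences=True),
    TimeDistributed(Dense(1)),
])

# Encoder-decoder many-to-many: output length differs from input length.
enc_dec = Sequential([
    Input(shape=(in_steps, features)),
    LSTM(8),                         # encoder: one fixed-size vector per sample
    RepeatVector(out_steps),         # repeat that vector once per output step
    LSTM(8, return_sequences=True),  # decoder: one value per output step
    TimeDistributed(Dense(1)),
])

print(fixed.output_shape)    # (None, 6, 1)
print(enc_dec.output_shape)  # (None, 3, 1)
```

The leading `None` is the samples dimension, which stays unspecified until data is passed in.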

# **Mapping Applications to Models**
# 9 Important Prediction Types in 2 Groups

## **Group 1- Time Series**
1.   **Univariate Time Series Forecasting** : 
  
  I/P= 1 Series with **Many time steps** , O/P prediction= **1 time-step** --->> Many-to-One
2.   **Multivariate Time Series Forecasting** : 
  
  I/P= Many Series with **Many time steps** , O/P prediction= **1 time-step** beyond one or more of the input sequences --->> Many-to-One
3.   **Multi-step Time Series Forecasting** : 
 
  I/P= 1 or Many Series with **Many time steps** , O/P prediction= **Many time steps** beyond one or more of the input sequences --->> Many-to-Many
4.   **Time Series Classification** : 
 
  I/P= 1 or Many Series with **Many time steps** , O/P prediction= **1 Class** --->> Many-to-One

  ************************************************************
  ************************************************************

## **Group 2- Natural Language Processing**

1.   **Image Captioning** : 
  
  I/P= 1 Image, O/P prediction= **Caption sentence** --->> One-to-Many
2.   **Video Captioning** : 
  
  I/P= Video , O/P prediction= **Caption sentence** --->> Many-to-Many
3.   **Sentiment Analysis** : 

  I/P= Sentence , O/P prediction= **Sentiment** --->> Many-to-One
    
4.   **Text Translation** : 
  I/P= Text , O/P prediction= **Translation** --->> Many-to-Many
 
5.   **Text Summarization** : 
  I/P= Text , O/P prediction= **Summary** --->> Many-to-Many

# Cardinality from Time Steps (not Features!)

A common point of confusion is to conflate the above examples of sequence mapping models
with multiple input and output features. A sequence may be comprised of single values, one for
each time step.


![One to One](https://raw.githubusercontent.com/rohailkhan/data/main/JB-lstm5.PNG)

# **Two Common Misunderstandings**
## Time steps as Input Features
Lag observations at previous time steps are framed as input features to the model.
## Time steps as Output Features
Predictions at multiple future time steps are framed as output features to the model.
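
One way to avoid both misframings is to keep lag observations and forecast horizons on the time-step axis. The sketch below assumes a made-up univariate series and hypothetical window sizes `n_in` and `n_out`; it only illustrates the array shapes and does not train a model.

```python
import numpy as np

# Hypothetical univariate series: 1.0, 2.0, ..., 10.0
series = np.arange(1, 11, dtype=float)
n_in, n_out = 3, 2  # assumed number of input lags and output steps

# Slide a window over the series: n_in inputs followed by n_out targets.
n_samples = len(series) - n_in - n_out + 1
X = np.array([series[i:i + n_in] for i in range(n_samples)])
y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])

# Lags and horizons stay on the time-step axis, with one feature per step.
X = X.reshape((n_samples, n_in, 1))
y = y.reshape((n_samples, n_out, 1))

print(X.shape, y.shape)       # (6, 3, 1) (6, 2, 1)
print(X[0].ravel().tolist())  # [1.0, 2.0, 3.0]
print(y[0].ravel().tolist())  # [4.0, 5.0]
```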

The first hidden layer in the network must define the number of inputs to expect, e.g. the
shape of the input layer. Input must be three-dimensional, comprised of samples, time steps,
and features in that order.



*   **Samples**. These are the rows in your data. One sample may be one sequence. This is the first dimension and is usually not specified: the model assumes one or more samples, so only time steps and features are given in `input_shape`.
Note: not to be confused with **batch_input_shape**, which takes 3 values: **batch size, time steps, features**.

*   **Time steps**. These are the past observations for a feature, such as lag variables.

*   **Features**. These are the columns in your data.


Assuming your data is loaded as a NumPy array, you can convert a 1D or 2D dataset to a 3D dataset using the array's `reshape()` method.

Imagine we had 2 columns of input data (X) in a NumPy array. We could treat the two columns as two time steps and reshape it as follows:

`data = data.reshape((data.shape[0], data.shape[1], 1))`

 ( Example of reshaping a NumPy array with **1 feature**)

If you would like columns in your 2D data to become features with one time step, you can reshape it as follows:

`data = data.reshape((data.shape[0], 1, data.shape[1]))`

( Example of reshaping a NumPy array with **1 Time Step**)
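
The two reshapes above can be compared side by side. This is a small sketch on a made-up 4-row, 2-column array; the variable names `as_time_steps` and `as_features` are illustrative only.

```python
import numpy as np

# Hypothetical 2D dataset: 4 rows (samples) and 2 columns.
data = np.array([[0.1, 1.0],
                 [0.2, 0.9],
                 [0.3, 0.8],
                 [0.4, 0.7]])

# Columns become time steps, with one feature per step.
as_time_steps = data.reshape((data.shape[0], data.shape[1], 1))
# Columns become features, with one time step per sample.
as_features = data.reshape((data.shape[0], 1, data.shape[1]))

print(as_time_steps.shape)        # (4, 2, 1)
print(as_features.shape)          # (4, 1, 2)
print(as_time_steps[0].tolist())  # [[0.1], [1.0]]
print(as_features[0].tolist())    # [[0.1, 1.0]]
```

The values are identical in both cases; only the axis on which the two columns sit changes, and that determines whether the LSTM reads them as a sequence or as a single multi-feature observation.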






In [None]:
model = Sequential()
model.add(LSTM(5, input_shape=(2,1)))
model.add(Dense(1))

2 - Make a model with 32 LSTM units

In [None]:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 1)))  # input_shape omits samples: just time steps (10) and features (1)

**Multiple Input features**

In [None]:
from numpy import array
data = array([
[0.1, 1.0],
[0.2, 0.9],
[0.3, 0.8],
[0.4, 0.7],
[0.5, 0.6],
[0.6, 0.5],
[0.7, 0.4],
[0.8, 0.3],
[0.9, 0.2],
[1.0, 0.1]])
data.shape

(10, 2)

In [None]:
data=data.reshape(1,10,2)
model=Sequential()
model.add(LSTM(units=32,input_shape=(10,2)))


**1- Input shape of the model**:
(samples, time_steps, features). Note: samples is usually not specified in the model; time steps are the past observations.

```
model = Sequential()
model.add(LSTM(units=2, input_shape=(time_steps, features), activation='tanh', recurrent_activation='sigmoid', unit_forget_bias=True))
```

**2- Epoch**: one pass of all the training samples through the network to update the weights. An epoch consists of one or more batches.

**3- Batches**: weights are usually updated in batches. A batch is a subset of the training data that passes through the network once, after which the weights are updated.

**- Three types of batches**:

*   Batch size = total number of training samples: weights are updated only after the whole dataset (batch gradient descent).
*   Batch size = 1: weights are updated after every sample (stochastic gradient descent).
*   Batch size = 32, 128, etc.: weights are updated after each batch of that many samples (mini-batch gradient descent).
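
The relationship between batch size and the number of weight updates per epoch can be worked out with simple arithmetic. The helper `updates_per_epoch` and the dataset size of 1000 are hypothetical, introduced only for this illustration; the sketch assumes a final partial batch still triggers an update.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Weight updates in one epoch, counting a final smaller batch as one update."""
    return math.ceil(n_samples / batch_size)

n_samples = 1000  # hypothetical training set size

for name, batch_size in [("batch", n_samples), ("stochastic", 1), ("mini-batch", 32)]:
    print(name, updates_per_epoch(n_samples, batch_size))
# batch 1
# stochastic 1000
# mini-batch 32
```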


**4- LSTM internal state update**: batch size imposes a trade-off between the efficiency of learning, the speed of learning, and the influence of internal state (how often the state is reset). For a stateful LSTM, the batch size must therefore be fixed as part of the input shape.

```
# batch_input_shape=(batch_size, time_steps, features)
```



```
model.add(LSTM(2, stateful=True, batch_input_shape=(10, 5, 1)))  # batch size=10, time steps=5, features=1
```


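**5- LSTM internal-state reset**: a stateful LSTM does not reset its internal state automatically at the end of each epoch. We reset it after each epoch by calling model.reset_states() inside a for loop around fit(). Note that fit() takes batch_size, not batch_input_shape, and it must match the batch size the model was defined with:
```
for i in range(1000):
    model.fit(X, y, epochs=1, batch_size=10)
    model.reset_states()
```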

**6- Batch size for predictions in a stateful LSTM**: the same batch size used to define the stateful LSTM must also be used when making predictions.

**7- Set shuffle=False** to preserve the order of samples, and therefore the state carried across them, when training a stateful LSTM.


```
for i in range(1000):
    model.fit(X, y, epochs=1, shuffle=False, batch_size=10)
    model.reset_states()
```


**8- Points for reset and training**
```
    1- If a prediction is made at the end of each sequence, set batch_size=1 so that state is reset after each sequence
    2- For longer sequences, reset state after each batch and set shuffle=False; for very long sequences, use a larger batch size such as 128
```