<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/rnn/seq_to_seq_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [61]:
import numpy as np

from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.models import Sequential

## LSTM quick recap

When you create an LSTM layer, the first argument sets the `number of memory units` within the layer.

```python
lstm = tf.keras.layers.LSTM(30)  # number of memory units = 30
```

Each unit (or cell) within the layer has an `internal cell state` ($c$) and outputs a `hidden state` ($h$).

```python
inputs = tf.keras.Input(shape=(3, 1))  # 3 time steps, 1 feature
lstm, state_h, state_c = tf.keras.layers.LSTM(1, return_state=True)(inputs)
```

Each LSTM cell will output one hidden state $h$ for each input.

```python
h = tf.keras.layers.LSTM(1)(X)  # X has shape (batch, time steps, features)
```
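To make the two states concrete, here is a minimal sketch of a single memory unit written in plain Python using the standard LSTM gate equations. The helper name `lstm_unit_forward` and the weight values are made up for illustration; Keras initializes its weights randomly, so the numbers will not match the notebook outputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_unit_forward(xs, w, u, b):
    """Run one LSTM memory unit over a sequence of scalar inputs.

    w, u, b map a gate name to its input weight, recurrent weight and bias.
    Gates: 'i' (input), 'f' (forget), 'g' (candidate), 'o' (output).
    """
    h, c = 0.0, 0.0  # hidden state and cell state both start at zero
    hidden_states = []
    for x in xs:
        i = sigmoid(w['i'] * x + u['i'] * h + b['i'])
        f = sigmoid(w['f'] * x + u['f'] * h + b['f'])
        g = math.tanh(w['g'] * x + u['g'] * h + b['g'])
        o = sigmoid(w['o'] * x + u['o'] * h + b['o'])
        c = f * c + i * g        # internal cell state, kept inside the unit
        h = o * math.tanh(c)     # hidden state, emitted at every time step
        hidden_states.append(h)
    return hidden_states, h, c


# Hypothetical hand-picked weights (assumption, not Keras defaults)
w = {'i': 0.5, 'f': 0.5, 'g': 0.5, 'o': 0.5}
u = {'i': 0.1, 'f': 0.1, 'g': 0.1, 'o': 0.1}
b = {'i': 0.0, 'f': 1.0, 'g': 0.0, 'o': 0.0}

hidden_states, state_h, state_c = lstm_unit_forward([0.1, 0.2, 0.3], w, u, b)
print(hidden_states)     # one hidden state per time step
print(state_h, state_c)  # final hidden state and final cell state
```

The last entry of `hidden_states` is the same value as `state_h`, while `state_c` is tracked separately and never appears in the per-step outputs.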


In [65]:
# input time steps
t1 = 0.1
t2 = 0.2
t3 = 0.3
time_steps = [t1, t2, t3]
one_memory_unit = 1

In [66]:
# define the model
inputs1 = layers.Input(shape=(3, 1))
lstm1 = layers.LSTM(one_memory_unit)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# define input data -> inputs must include the batch dimension:
# (batch, time steps, features)
data = np.array(time_steps).reshape((1, 3, 1))

# make a prediction -> should output a single scalar hidden state
model.predict(data)[0][0]



0.09254665

It's possible to access the `hidden state output` $\ldots[\hat{y}_{t-1}],[\hat{y}_{t}],[\hat{y}_{t+1}]\ldots$ for each input time step. 

```python
LSTM(1, return_sequences=True)
```

In [67]:
# define the model
inputs1 = layers.Input(shape=(3, 1))
lstm1 = layers.LSTM(one_memory_unit, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# define input data -> inputs must include the batch dimension:
# (batch, time steps, features)
data = np.array(time_steps).reshape((1, 3, 1))

# make a prediction -> should output y^ for each time step
model.predict(data)



array([[[0.01487367],
        [0.04095905],
        [0.07560855]]], dtype=float32)

Each LSTM cell retains an `internal state` that `is not output`, called the `cell state` ($c$).

Keras provides the `return_state` argument on the LSTM layer, which gives access to the `hidden` state (`state_h`) and the `cell` state (`state_c`).

```python
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
```

In [68]:
# define the model
inputs1 = layers.Input(shape=(3, 1))
lstm1, state_h, state_c = layers.LSTM(one_memory_unit, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# define input data -> inputs must include the batch dimension:
# (batch, time steps, features)
data = np.array(time_steps).reshape((1, 3, 1))

# make a prediction -> outputs the hidden state of the last time step
model.predict(data)



array([[-0.01654461]], dtype=float32)

Hidden state for the last time step

In [69]:
state_h[0]

<tf.Tensor 'strided_slice_11:0' shape=(1,) dtype=float32>

Cell state for the last time step

In [70]:
state_c[0]

<tf.Tensor 'strided_slice_12:0' shape=(1,) dtype=float32>
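The two cells above display symbolic tensors rather than numbers, because `state_h` and `state_c` are not among the model's outputs. One way to get numeric values, sketched here under the assumption that the same `tensorflow.keras` API used in this notebook is available, is to expose both states as additional model outputs.

```python
import numpy as np
from tensorflow.keras import layers, Model

# same single-unit LSTM as above, but the states are also model outputs
inputs1 = layers.Input(shape=(3, 1))
lstm1, state_h, state_c = layers.LSTM(1, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))

# predict now returns one array per output
output, h_value, c_value = model.predict(data, verbose=0)

print(output.shape, h_value.shape, c_value.shape)
print(np.allclose(output, h_value))  # layer output is the last hidden state
print(c_value)                       # final cell state
```

Without `return_sequences=True`, the layer output and `state_h` hold the same value, so only the cell state carries new information here.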

## TimeDistributed Layer

> This wrapper allows you to apply a layer to every temporal slice of an input. `TimeDistributedDense` applies the same Dense (fully-connected) operation to every timestep of a 3D tensor.
>
> Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3).
>
> You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps, independently:

In [51]:
inputs = layers.Input(shape=(10, 128, 128, 3))
conv_2d_layer = layers.Conv2D(64, (3, 3))
outputs = layers.TimeDistributed(conv_2d_layer)(inputs)
outputs.shape

TensorShape([None, 10, 126, 126, 64])
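The output shape follows from simple arithmetic: the time axis is left untouched, and a 3x3 convolution with the default `valid` padding and stride 1 shrinks each spatial axis by 2. The helper names below are hypothetical and only reproduce that calculation; they are not part of Keras.

```python
def conv2d_valid_output_size(size, kernel, stride=1):
    """Output length of one spatial axis for a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def time_distributed_conv2d_shape(input_shape, filters, kernel_size):
    """Shape after applying the same Conv2D to every time step independently."""
    batch, steps, height, width, _channels = input_shape
    kernel_h, kernel_w = kernel_size
    return (
        batch,
        steps,  # the temporal axis passes through unchanged
        conv2d_valid_output_size(height, kernel_h),
        conv2d_valid_output_size(width, kernel_w),
        filters,  # the channel axis becomes the number of filters
    )


shape = time_distributed_conv2d_shape((None, 10, 128, 128, 3), 64, (3, 3))
print(shape)  # (None, 10, 126, 126, 64)
```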

In [55]:
length = 5
seq = np.array([i / length for i in range(length)])
seq

array([0. , 0.2, 0.4, 0.6, 0.8])

## One-to-One LSTM for Sequence Prediction

In [57]:
X = seq.reshape(5, 1, 1)
X

array([[[0. ]],

       [[0.2]],

       [[0.4]],

       [[0.6]],

       [[0.8]]])

In [59]:
y = seq.reshape(5, 1)
y

array([[0. ],
       [0.2],
       [0.4],
       [0.6],
       [0.8]])

We will define the network model as having 1 input with 1 time step. The first hidden layer will be an LSTM with 5 units. The output layer will be a fully-connected layer with 1 output.

In [64]:
length = 5
seq = np.array([i / length for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)

In [72]:
n_memory_units = length
n_batch = length
n_epoch = 1000

model = Sequential([
  layers.LSTM(n_memory_units, input_shape=(1, 1)),
  layers.Dense(1)
])

model.compile(
    loss='mean_squared_error',
    optimizer='adam'
)

model.summary()

Model: "sequential"
_________________________________________________________________
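Layer (type)                 Output Shape              Param #   
=================================================================
lstm_28 (LSTM)               (None, 5)                 140       
_________________________________________________________________
dense (Dense)                (None, 1)                 6         
=================================================================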
Total params: 146
Trainable params: 146
Non-trainable params: 0
_________________________________________________________________
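The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, while a Dense layer has one weight per input per unit plus one bias per unit. The function names below are hypothetical helpers for this check.

```python
def lstm_param_count(units, input_dim):
    """4 gates x (input weights + recurrent weights + bias) per unit."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units, input_dim):
    """One weight per input per unit, plus one bias per unit."""
    return input_dim * units + units


lstm_params = lstm_param_count(units=5, input_dim=1)
dense_params = dense_param_count(units=1, input_dim=5)
print(lstm_params, dense_params, lstm_params + dense_params)  # 140 6 146
```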


In [79]:
history = model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=0)

In [86]:
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print(f'{value[0]:.1f}', end=' ')

0.0 0.2 0.4 0.6 0.8 