## Recurrent Neural Network (RNN)

- The idea behind RNNs is to make use of sequential information

- A traditional feed-forward neural network assumes that all inputs (and outputs) are independent of each other, which is a poor assumption for many tasks

- If you want to predict the next word in a sentence, you need to know which words came before it

- Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far

<img src="simple_rnn.png" width="500" height="500">
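The "memory" idea can be made concrete with a few lines of NumPy. The block below is a minimal sketch of a plain tanh RNN, not the Keras implementation; the function name `simple_rnn_forward`, the weight names `W_x`, `W_h`, `b`, and the toy sizes are assumptions chosen for illustration.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a plain tanh RNN over one sequence.

    inputs: array of shape (time_steps, input_dim)
    returns: hidden states of shape (time_steps, hidden_dim)
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim)  # the "memory" starts empty
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
time_steps, input_dim, hidden_dim = 4, 3, 5
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(time_steps, input_dim))
states = simple_rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (4, 5)

# Change only the FIRST input and compare the LAST hidden state
altered = sequence.copy()
altered[0] += 1.0
altered_states = simple_rnn_forward(altered, W_x, W_h, b)
print(np.abs(states[-1] - altered_states[-1]).max())
```

The last line prints how much the final state moved after changing only the first input. A non-zero value shows that earlier inputs still influence later states, which is what the "memory" refers to.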

### Recurrent Cells in Keras

- SimpleRNN

- LSTM 

- GRU

<img src="LSTM.png" width="500" height="500">

### The number of time steps defines how many times the LSTM cell state is updated for one sample (one MNIST digit, for example)

- To use an LSTM for image classification, we must arrange the data so that it has a sequential structure

- Let's prepare the data (images here) for sequential MNIST classification

- We treat each 28×28 image as a sequence of 28 time steps (one per row), each with 28 features (the pixels in that row)

<img src="mnist_lstm.png" width="300" height="300">
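To make the row-as-time-step reading explicit, here is a small sketch using a synthetic 28×28 array in place of a real digit. The array contents are made up; only the shapes matter.

```python
import numpy as np

# Stand-in for one MNIST digit (synthetic values, not real data)
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Row t of the image is the feature vector the LSTM sees at time step t
for t in (0, 1, 27):
    print(t, image[t].shape)  # each row has shape (28,)

# A batch of N images already has the layout (N, time steps, features)
batch = np.stack([image, image])
print(batch.shape)  # (2, 28, 28)
```

Because MNIST images already come as 28×28 arrays, no real restructuring is needed. The sequence interpretation comes from how the LSTM consumes the second axis one row at a time.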

## Activity: Train an LSTM model with Keras for MNIST Classification

In [None]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, LSTM

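# Load MNIST: images are (N, 28, 28) uint8 arrays, labels are integers 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# One-hot encode the labels for categorical cross-entropy
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Each image is already (28 time steps, 28 features); the reshape makes that explicit
x_train = x_train.reshape(x_train.shape[0], 28, 28)
x_test = x_test.reshape(x_test.shape[0], 28, 28)
print(x_train.shape)  # (60000, 28, 28)

nb_units = 50  # size of the LSTM hidden state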

model = Sequential()
# input_shape for LSTM should be (time steps, features)
model.add(LSTM(nb_units, input_shape=(28, 28)))
model.add(Dense(units=10, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Print out the model summary
model.summary()
epochs = 3

history = model.fit(x_train,
                    y_train,
                    epochs=epochs,
                    batch_size=128,
                    verbose=1,
                    validation_split=0.2)

# evaluate returns [loss, accuracy] because metrics=['accuracy'] was passed to compile
scores = model.evaluate(x_test, y_test, verbose=2)
print("Test accuracy: %.2f%%" % (scores[1] * 100))

### Return Sequence in LSTM

- `model.add(LSTM(nb_units, input_shape=(28, 28), return_sequences=False))` returns only the hidden state of the last time step, so the output shape is `(batch_size, nb_units)`

<img src="return_seq_F.png" width="250" height="250">


- `model.add(LSTM(nb_units, input_shape=(28, 28), return_sequences=True))` returns the hidden state at every time step, so the output shape is `(batch_size, 28, nb_units)`

<img src="return_seq_T.png" width="250" height="250">
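The effect of `return_sequences` can be sketched without Keras. The block below is a minimal NumPy LSTM forward pass for a single sample. The function name `lstm_forward`, the stacked weight layout, and the gate ordering are assumptions made for illustration and are not claimed to match Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Minimal LSTM forward pass for one sample.

    x: (time_steps, d) input sequence
    W: (d, 4h) input weights, U: (h, 4h) recurrent weights, b: (4h,) biases,
    with the four gate blocks stacked side by side (assumed layout).
    """
    h_dim = U.shape[0]
    h = np.zeros(h_dim)  # hidden state
    c = np.zeros(h_dim)  # cell state
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output (assumed order)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    # Either every hidden state, or only the last one
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
time_steps, d, h_dim = 28, 28, 50
W = rng.normal(scale=0.1, size=(d, 4 * h_dim))
U = rng.normal(scale=0.1, size=(h_dim, 4 * h_dim))
b = np.zeros(4 * h_dim)
x = rng.normal(size=(time_steps, d))

last = lstm_forward(x, W, U, b, return_sequences=False)
all_states = lstm_forward(x, W, U, b, return_sequences=True)
print(last.shape)        # (50,)
print(all_states.shape)  # (28, 50)
print(np.array_equal(all_states[-1], last))  # True
```

The final check shows that the `return_sequences=False` output is simply the last row of the `return_sequences=True` output. Stacking LSTM layers needs the full sequence, while a final classifier usually only needs the last state.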

### What does the LSTM model for MNIST look like?

<img src="mnist_lstm_nn.png" width="600" height="600">

### How many parameters does an LSTM have?

- Assume the subscript *t* indexes the time step

<img src="lstm_math.png" width="800" height="800">

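- There are four input weight matrices W, four recurrent weight matrices U, and four bias vectors b

- With input dimension d and h hidden units, the number of parameters of the LSTM is 4dh + 4h² + 4h: 4dh for the W matrices, 4h² for the U matrices, and 4h for the four biases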
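As a quick arithmetic check, the formula can be written as a small helper. The function name `lstm_param_count` is hypothetical, and the two calls use the layer sizes from this notebook (28 features with 50 units, and 1 feature with 10 units).

```python
def lstm_param_count(d, h):
    """Number of trainable parameters of an LSTM with input size d and h units."""
    input_weights = 4 * d * h      # four W matrices, each d x h
    recurrent_weights = 4 * h * h  # four U matrices, each h x h
    biases = 4 * h                 # four bias vectors, each of length h
    return input_weights + recurrent_weights + biases

print(lstm_param_count(28, 50))  # 15800
print(lstm_param_count(1, 10))   # 480
```

The same count can be written as 4 * ((d + 1) * h + h²), which folds the bias into the input term.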

## Activity: Verify the number of parameters of an LSTM in Keras

In [None]:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

input_array = np.array([[[0], [1], [2], [3], [4]], [[5], [1], [2], [3], [6]]])
print(input_array.shape)
model = Sequential()
# input_shape for LSTM should be (time steps, features)
model.add(LSTM(10, input_shape=(5, 1), return_sequences=False))
model.summary()
print(input_array)
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
# The number of parameters of an LSTM layer in Keras equals
# params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)
n_params = 4 * ((1 + 1) * 10 + 10**2)
print(n_params)  # 480
# Compare with the count reported by Keras
print(model.count_params())