# Introduction
Keras provides multiple recurrent layers and related classes. In this Sprint, the goal is to run all of them and then be able to explain the role of each.


They are summarized in the following document:

Recurrent layers - Keras documentation


# Problem 1
Execution of various methods

### NOTE
- I tested the basic GRU, LSTM and SimpleRNN layers
- All on the same dataset
- All with the same basic model structure; the only difference is whether the middle layer is GRU, LSTM or SimpleRNN
- The code lives in separate notebooks to keep execution simple; only the results are recorded here
- The base code is Keras's sample LSTM code [Link](https://github.com/awslabs/keras-apache-mxnet/blob/master/examples/imdb_lstm.py)

### Related Notebooks
- `gru.ipynb`
- `sample_lstm.ipynb`
- `simple_rnn.ipynb`


### Basic model structure
The middle layer is LSTM, GRU or SimpleRNN; the summary below shows the LSTM version.
```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 128)         1280000   
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
_________________________________________________________________
```
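
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual weight layout of these layers with biases enabled: one block of input kernel, recurrent kernel and bias for SimpleRNN, four such blocks for LSTM, and three for GRU. The helper names are illustrative only. Note that GRU in TensorFlow 2 defaults to `reset_after=True`, which stores a second bias vector per block, so its real count there is slightly higher than this formula gives.

```python
def embedding_params(vocab_size, dim):
    # One dense vector of length `dim` per vocabulary entry.
    return vocab_size * dim

def recurrent_params(input_dim, units, blocks):
    # Each block has an input kernel, a recurrent kernel and a bias vector.
    return blocks * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = {
    "embedding": embedding_params(10000, 128),
    "simple_rnn": recurrent_params(128, 128, blocks=1),
    "gru": recurrent_params(128, 128, blocks=3),   # assumes a single bias per block
    "lstm": recurrent_params(128, 128, blocks=4),
    "dense": dense_params(128, 1),
}

print(counts["embedding"])  # 1280000
print(counts["lstm"])       # 131584
print(counts["dense"])      # 129
```

The embedding, LSTM and dense values match the `Param #` column above.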

### Test setup
- Model structure as above
- Dataset: IMDB, with `max_features` = 10000
- Optimizer: Adam
- The middle layer (LSTM/GRU/SimpleRNN) has `dropout` 0.2 and `recurrent_dropout` 0.2
- `batch_size` 32, trained for 2 epochs
- Results are evaluated on error and accuracy
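
A minimal sketch of this setup, assuming the `tensorflow.keras` API. The notebooks themselves are based on the linked `imdb_lstm.py` example and may differ in detail; the function name `build_model`, the sigmoid output and the binary cross-entropy loss are assumptions typical of a binary sentiment task, not something recorded in this note.

```python
from tensorflow import keras
from tensorflow.keras import layers

MAX_FEATURES = 10000  # vocabulary size used for the IMDB dataset

# The three middle layers being compared.
RECURRENT_LAYERS = {
    "SimpleRNN": layers.SimpleRNN,
    "LSTM": layers.LSTM,
    "GRU": layers.GRU,
}

def build_model(kind="LSTM"):
    """Embedding -> recurrent layer -> single-unit output (hypothetical helper)."""
    recurrent = RECURRENT_LAYERS[kind](128, dropout=0.2, recurrent_dropout=0.2)
    model = keras.Sequential([
        keras.Input(shape=(None,), dtype="int32"),  # variable-length word indices
        layers.Embedding(MAX_FEATURES, 128),
        recurrent,
        layers.Dense(1, activation="sigmoid"),      # assumed binary output
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

model = build_model("LSTM")
model.summary()
```

Training would then call `model.fit(...)` with `batch_size=32` and `epochs=2` on the IMDB data, for example loaded with `keras.datasets.imdb.load_data(num_words=MAX_FEATURES)` and padded to a common length.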

### Result Comparison
|                     | SimpleRNN |  LSTM |  GRU  |
|:-------------------:|:---------:|:-----:|:-----:|
|    Test Accuracy    |   0.739   | 0.828 | 0.838 |
|     Test Error      |   0.528   | 0.391 | 0.370 |
|   Train Accuracy    |   0.643   | 0.867 | 0.866 |
|     Train Error     |   0.619   | 0.322 | 0.320 |
| Validation Accuracy |   0.739   | 0.828 | 0.838 |
|  Validation Error   |   0.528   | 0.391 | 0.370 |

### Conclusion
- SimpleRNN falls well behind the other two
- LSTM and GRU perform similarly and do reasonably well on the test, validation and training sets

# Problem 2
(SKIP)

# Problem 3
Explanation of other classes

### Classes
I will read about these classes and explain them, and discuss some ideas where possible.

**RNN**
- RNN
- SimpleRNN
- SimpleRNNCell


**GRU**
- GRU
- GRUCell


**LSTM**
- LSTM
- LSTMCell


**Special?**
- StackedRNNCells
- CuDNNGRU
- CuDNNLSTM

## RNN
Recurrent Neural Network

![image info](./images/simple_rnn.png) RNN structure and its visualization over time
- This is the basic NN for learning sequence data
- Basically it is a modified fully connected (FC) layer where each node takes two inputs:
  - the output of the previous time step, `h`
  - and the current input, `x_i`
- The node keeps feeding its output back into itself at the next iteration until the `x` input is exhausted
- This allows the RNN to learn the sequential properties of the input
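
The loop described above can be sketched in NumPy. This is an illustration only, assuming a `tanh` activation and randomly initialized weights; the names `W_x`, `W_h` and `simple_rnn_forward` are made up for this note and are not Keras API.

```python
import numpy as np

def simple_rnn_forward(xs, W_x, W_h, b):
    """Run a fully connected RNN over xs of shape (time, input_dim)."""
    h = np.zeros(W_h.shape[0])              # initial hidden state
    for x_t in xs:                          # one iteration per time step
        # Each step combines the current input with the previous output.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h                                # state after the last input

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))                # 5 time steps, 3 features each
W_x = rng.normal(size=(3, 4))               # input -> hidden
W_h = rng.normal(size=(4, 4))               # hidden -> hidden (the recurrence)
b = np.zeros(4)

h_last = simple_rnn_forward(xs, W_x, W_h, b)
print(h_last.shape)  # (4,)
```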
  
**Biggest Problem?**
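- Vanishing gradient
- In an RNN this leads to short-term memory: gradients from early time steps shrink as they are propagated back through many steps, so early inputs have little influence on learning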
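
A small numeric illustration of the effect, assuming a one-dimensional RNN `h_t = tanh(w * h_{t-1} + x_t)` with a hand-picked weight. The gradient of the last state with respect to the initial state is a product of one factor per time step, and it shrinks quickly as the sequence grows.

```python
import math

def gradient_through_time(w, xs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x_t in xs:
        h = math.tanh(w * h + x_t)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

xs = [0.5] * 50
grad_short = gradient_through_time(0.9, xs[:5])   # 5 time steps
grad_long = gradient_through_time(0.9, xs)        # 50 time steps
print(grad_short, grad_long)
```

The 50-step gradient is many orders of magnitude smaller than the 5-step one, which is why the earliest inputs barely affect the weight updates.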

### Keras SimpleRNN

This is simply the Keras implementation of the RNN described above, also called a fully connected RNN.

## The "Cell" Variant
Includes SimpleRNNCell, LSTMCell and GRUCell

- I think of these as utility units, though I have not tried using them
- Basically, they take a step **down** the ladder of abstraction in the RNN implementation
- An RNN is a unit that processes a sequence by iterating through time, and the "cell" is one step in time
- A "cell" processes only one iteration of the input, whereas the RNN processes the whole sequence
- A "cell" lets the user (us) build the RNN in more detail and control what it does at each time step
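
The split between "cell" and "RNN" can be sketched in NumPy as follows. `TinyRNNCell` and `run_rnn` are hypothetical stand-ins written for this note; in Keras the corresponding pattern is wrapping a cell such as `SimpleRNNCell` in the generic `RNN` layer.

```python
import numpy as np

class TinyRNNCell:
    """One time step of a fully connected RNN (illustrative, not Keras code)."""

    def __init__(self, input_dim, units, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.5, size=(input_dim, units))
        self.W_h = rng.normal(scale=0.5, size=(units, units))
        self.b = np.zeros(units)
        self.units = units

    def step(self, x_t, h_prev):
        # The cell only knows how to process a single time step.
        return np.tanh(x_t @ self.W_x + h_prev @ self.W_h + self.b)

def run_rnn(cell, xs):
    """The 'RNN' part: loop the cell over the whole sequence."""
    h = np.zeros(cell.units)
    states = []
    for x_t in xs:
        h = cell.step(x_t, h)
        states.append(h)        # having the loop lets us inspect every step
    return np.stack(states)

cell = TinyRNNCell(input_dim=3, units=4)
xs = np.ones((6, 3))            # 6 time steps, 3 features each
states = run_rnn(cell, xs)
print(states.shape)  # (6, 4)
```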

## GRU and LSTM
- Gated Recurrent Unit
- Long Short Term Memory

<table border="0">
 <tr>
    <td>LSTM</td>
    <td>GRU</td>
 </tr>
 <tr>
    <td><img src="./images/lstm.jpg" width="500"/></td>
    <td><img src="./images/gru.jpeg" width="400"/></td>
 </tr>
</table>

Both are RNN architectures that aim to fix the **"short-term memory" problem of the FC RNN**

**The basic ideas of both models include**
- The use of "gates" to control data flow in the unit
- Significant data is remembered while the rest is discarded
- They incorporate a **"data highway"** into their structure (the cell state in LSTM; in GRU the hidden state plays this role), along which data from the previous time step can flow pretty much `freely`, thus keeping the long-term knowledge that has been learnt
- The gates they use include (these are the LSTM gates; GRU uses an update gate and a reset gate instead):
  - input gate: takes part of the current time step's knowledge and adds it to the `data highway`
  - output gate: outputs what has been learnt at the current time step
  - forget gate: removes insignificant information from the `data highway`
- They also make heavy use of the sigmoid and tanh activations
  - sigmoid, with output between 0 and 1, to filter out useless data
  - tanh, with output between -1 and 1, to normalize
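
A single LSTM time step can be sketched in NumPy to show where the gates and the data highway appear. This follows the standard LSTM equations and covers LSTM only, not GRU; the function and variable names are made up for illustration and this is not the Keras source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i = sigmoid(i)              # input gate: how much new information to add
    f = sigmoid(f)              # forget gate: how much of the old cell state to keep
    o = sigmoid(o)              # output gate: how much of the cell state to expose
    g = np.tanh(g)              # candidate values, squashed to (-1, 1)
    c = f * c_prev + i * g      # the "data highway" (cell state)
    h = o * np.tanh(c)          # output of this time step
    return h, c

units, input_dim = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

When the forget gate is close to 1 and the input gate close to 0, `c` is carried over almost unchanged, which is what lets information survive many time steps.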

## Other RNN related classes in Keras
- StackedRNNCells
- CuDNNGRU
- CuDNNLSTM

### StackedRNNCells
A wrapper that allows a stack of RNN cells to behave as a single cell (one time step)
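
The idea can be sketched in NumPy: the stack exposes one `step` method, feeds each cell's output into the next cell, and keeps one state per cell. `TinyCell` and `StackedCells` are hypothetical classes written for this note, not the Keras implementation.

```python
import numpy as np

class TinyCell:
    """Minimal single-step RNN cell (illustrative only)."""

    def __init__(self, input_dim, units, seed):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.5, size=(input_dim, units))
        self.W_h = rng.normal(scale=0.5, size=(units, units))
        self.units = units

    def step(self, x_t, h_prev):
        return np.tanh(x_t @ self.W_x + h_prev @ self.W_h)

class StackedCells:
    """Present several cells as one cell whose state is a list of per-cell states."""

    def __init__(self, cells):
        self.cells = cells

    def initial_state(self):
        return [np.zeros(cell.units) for cell in self.cells]

    def step(self, x_t, states):
        new_states = []
        out = x_t
        for cell, h_prev in zip(self.cells, states):
            out = cell.step(out, h_prev)    # output of one cell feeds the next
            new_states.append(out)
        return out, new_states

stack = StackedCells([TinyCell(3, 8, seed=0), TinyCell(8, 4, seed=1)])
states = stack.initial_state()
for x_t in np.ones((5, 3)):                 # 5 time steps, 3 features each
    out, states = stack.step(x_t, states)
print(out.shape, [s.shape for s in states])  # (4,) [(8,), (4,)]
```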

### CuDNNGRU
Fast GRU implementation backed by cuDNN.

### CuDNNLSTM
Fast LSTM implementation backed by cuDNN.

### NOTE on cuDNN
1. As of TensorFlow 2.0, the CuDNN variants of GRU and LSTM are deprecated
2. Instead, the base GRU and LSTM classes use the cuDNN implementation on GPU **IF** the requirements are met

The requirements are as follows:
```
The requirements to use the cuDNN implementation are:
  1. `activation` == `tanh`
  2. `recurrent_activation` == `sigmoid`
  3. `recurrent_dropout` == 0
  4. `unroll` is `False`
  5. `use_bias` is `True`
  6. Inputs are not masked or strictly right padded.
```
[Reference](https://github.com/tensorflow/tensorflow/blob/r2.1/tensorflow/python/keras/layers/recurrent_v2.py#L902)
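
The constructor-level requirements (items 1 to 5) can be written down as a simple check. `meets_cudnn_requirements` is a hypothetical helper for this note, not part of TensorFlow; item 6 depends on the input data and is not covered here.

```python
def meets_cudnn_requirements(activation="tanh", recurrent_activation="sigmoid",
                             recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Check items 1-5 of the quoted requirements for a layer configuration."""
    return (
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )

print(meets_cudnn_requirements())                        # True
print(meets_cudnn_requirements(recurrent_dropout=0.2))   # False
```

By this check, the Problem 1 setup with `recurrent_dropout` 0.2 does not qualify for the cuDNN implementation.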