# 1. Recurrent layers
[Recurrent Neural Network] (RNN) is a class of neural network architectures in which the nodes of a layer have internal (recurrent) connections, allowing the network to express temporal behaviour. There are many types of RNN layers, but they all share the same overall structure. The image below shows the information flow for a single observation, that is, for a single document in the context of NLP.

<img src='image/rnn_general.png' style='height:200px; margin:20px auto;'>

Here, each green cell $\mathbf{x}_t\in\mathbb{R}^{V\times 1}$ represents the embedding vector of a token, and each blue cell $\mathbf{h}_t\in\mathbb{R}^{D\times 1}$ represents an output vector. The input sequence has length $T$; because the same cell is applied at every time step, the layer simply unrolls to match that length. The most important part of an RNN layer is the grey cell $A$, which is repeated at every step and is responsible for processing the information. At each time step, the output $\mathbf{h}_t$ depends on the current input $\mathbf{x}_t$ and on the previous output $\mathbf{h}_{t-1}$, and therefore indirectly on all earlier steps $\mathbf{h}_{t-2},\mathbf{h}_{t-3},\dots$. This design acts as a form of *memory* and enables an RNN to capture sequential relationships.
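The memory effect can be illustrated with a few lines of NumPy. This is only a sketch: it assumes a tanh cell of the kind defined in the next section, and the names `run_cell`, `W_x`, `W_h` and the sizes used are illustrative choices, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 4, 3                                # embedding size and output size (illustrative)
W_x = rng.normal(scale=0.5, size=(D, V))   # input weights, shared by every time step
W_h = rng.normal(scale=0.5, size=(D, D))   # recurrent weights, shared by every time step

def run_cell(xs):
    """Apply the same cell A to each x_t in turn and collect every h_t."""
    h = np.zeros(D)                        # initial state: nothing has been seen yet
    outputs = []
    for x_t in xs:                         # one iteration per time step
        h = np.tanh(W_x @ x_t + W_h @ h)
        outputs.append(h)
    return np.stack(outputs)               # shape (T, D)

xs = rng.normal(size=(6, V))               # a sequence of T = 6 tokens
hs = run_cell(xs)

# Change only the token at step 2 and rerun the same cell.
xs_changed = xs.copy()
xs_changed[2] += 1.0
hs_changed = run_cell(xs_changed)

# Outputs before the change are identical; outputs from the change onward are not.
print(np.allclose(hs[:2], hs_changed[:2]))
print(np.allclose(hs[2:], hs_changed[2:]))

# The same weights handle a sequence of a different length.
longer = run_cell(rng.normal(size=(9, V)))
print(longer.shape)
```

Because the weights are shared across steps, nothing in `run_cell` depends on the sequence length, which is why the layer can unroll to any $T$.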

[Recurrent Neural Network]: https://en.wikipedia.org/wiki/Recurrent_neural_network

## 1.1. Simple RNN
We call the vanilla architecture [Simple RNN] (1980s) to distinguish it from the RNN family as a whole. An RNN takes vectorized tokens as input, so the input is a tensor of shape $(N\times T\times V)$, where $N$ is the number of observations, $T$ is the sequence length and $V$ is the embedding size.

$$\mathbf{h}_t=\phi(\mathbf{W}_x\mathbf{x}_t+\mathbf{W}_h\mathbf{h}_{t-1}+\mathbf{b}_h)$$

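where:
- $\mathbf{x}_t\in\mathbb{R}^{V\times 1}$ is the input at time step $t$
- $\mathbf{h}_t,\mathbf{h}_{t-1}\in\mathbb{R}^{D\times 1}$ are the outputs at time steps $t$ and $t-1$
- $\mathbf{W}_x\in\mathbb{R}^{D\times V}$, $\mathbf{W}_h\in\mathbb{R}^{D\times D}$ and $\mathbf{b}_h\in\mathbb{R}^{D\times 1}$ are the trainable weights, shared across all time steps
- $\phi$ is the activation function ($\tanh$ by default in Keras)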
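The recurrence above can be sketched as a batched forward pass in NumPy. This is a minimal illustration rather than the Keras implementation: the function name `simple_rnn_forward` is hypothetical, the initial state is assumed to be zero, $\phi$ is assumed to be $\tanh$, and the `return_sequences` flag only mimics the Keras argument of the same name. Note that Keras stores its kernels transposed, e.g. with shape $(V, D)$ instead of $(D, V)$.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b_h, return_sequences=False):
    """Forward pass of a tanh RNN. x has shape (N, T, V).

    Returns the last output (N, D), or all outputs (N, T, D)
    when return_sequences is True.
    """
    N, T, V = x.shape
    D = b_h.shape[0]
    h = np.zeros((N, D))                   # assumed initial state h_0 = 0
    outputs = []
    for t in range(T):
        # Row-vector form of h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
        h = np.tanh(x[:, t, :] @ W_x.T + h @ W_h.T + b_h)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
N, T, V, D = 32, 10, 8, 5                  # illustrative sizes
x = rng.random((N, T, V))
W_x = rng.normal(scale=0.1, size=(D, V))
W_h = rng.normal(scale=0.1, size=(D, D))
b_h = np.zeros(D)

last = simple_rnn_forward(x, W_x, W_h, b_h)
full = simple_rnn_forward(x, W_x, W_h, b_h, return_sequences=True)
print(last.shape)   # (32, 5)
print(full.shape)   # (32, 10, 5)
```

The last slice of the full sequence, `full[:, -1, :]`, equals `last`, which is why a recurrent layer that feeds another recurrent layer needs the full sequence while the final one usually does not.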

<img src='image/rnn_cell.png' style='height:160px; margin:20px auto;'>

<code style='font-size:13px'><a href=https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN>SimpleRNN</a></code>

[Simple RNN]: https://en.wikipedia.org/wiki/Recurrent_neural_network

In [1]:
from sspipe import p, px
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow_hub as hub
import tensorflow_text as text

In [75]:
x = np.random.random((32,10,8))
y = np.random.random((32,))

In [79]:
rnn = layers.SimpleRNN(5)
# keep the layer output in its own variable so the targets `y` are not overwritten
h = rnn(x)
# display(h.shape)

In [80]:
rnn.weights

[<tf.Variable 'simple_rnn_48/simple_rnn_cell_49/kernel:0' shape=(8, 5) dtype=float32, numpy=
 array([[ 0.30787557,  0.279916  ,  0.3244257 , -0.17942247,  0.6445712 ],
        [-0.42084822, -0.2789095 , -0.14702582, -0.48875207, -0.32214996],
        [-0.1430338 ,  0.6396812 ,  0.06047022,  0.31589776,  0.6563163 ],
        [-0.01988047, -0.5084378 , -0.44210398,  0.24137521, -0.29147363],
        [ 0.5190995 , -0.5167712 , -0.43493587, -0.2736781 ,  0.6236826 ],
        [-0.26117405, -0.09863216, -0.39324647, -0.36624175,  0.22229993],
        [-0.59262437, -0.4157125 , -0.3820259 , -0.46295422, -0.5645746 ],
        [-0.6424703 ,  0.26339567, -0.3313635 ,  0.385486  ,  0.42365873]],
       dtype=float32)>,
 <tf.Variable 'simple_rnn_48/simple_rnn_cell_49/recurrent_kernel:0' shape=(5, 5) dtype=float32, numpy=
 array([[-0.08004582,  0.16700688, -0.3425147 , -0.91969395,  0.05048035],
        [-0.6168605 , -0.09782366, -0.7034774 ,  0.30595028,  0.14638162],
        [ 0.2573356 ,  0.1785

In [73]:
model = keras.Sequential([
    layers.SimpleRNN(5, input_shape=(10,8), return_sequences=True),
    layers.SimpleRNN(3),
    layers.Dense(10)
])
model.compile(loss='mse', optimizer='adam')
model.summary()

Model: "sequential_26"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_44 (SimpleRNN)   (None, 10, 5)             70        
                                                                 
 simple_rnn_45 (SimpleRNN)   (None, 3)                 27        
                                                                 
 dense_11 (Dense)            (None, 10)                40        
                                                                 
Total params: 137
Trainable params: 137
Non-trainable params: 0
_________________________________________________________________


In [74]:
# second SimpleRNN: units * (bias + recurrent inputs + inputs from the previous layer)
3*(1+3+5)

27

In [52]:
# first SimpleRNN: kernel (8x5) + recurrent kernel (5x5) + bias (5)
5*8 + 5*5 + 5

70
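The two hand calculations follow the same rule: a Simple RNN layer with $D$ units and input size $V$ has $D\times(V+D+1)$ parameters (input kernel, recurrent kernel and bias). A small sketch that reproduces the numbers in the model summary; the helper names `simple_rnn_params` and `dense_params` are hypothetical, not Keras functions.

```python
def simple_rnn_params(input_dim, units):
    """Input kernel + recurrent kernel + bias of a Simple RNN layer."""
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Kernel + bias of a fully connected layer."""
    return units * (input_dim + 1)

counts = [
    simple_rnn_params(8, 5),   # first SimpleRNN: 8 input features, 5 units
    simple_rnn_params(5, 3),   # second SimpleRNN: 5 input features, 3 units
    dense_params(3, 10),       # Dense: 3 input features, 10 units
]
print(counts, sum(counts))     # [70, 27, 40] 137
```

The count does not depend on the sequence length $T$, because the same weights are reused at every time step.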

In [35]:
model.weights

[<tf.Variable 'simple_rnn_15/simple_rnn_cell_15/kernel:0' shape=(8, 4) dtype=float32, numpy=
 array([[-0.12565374, -0.17764747,  0.19202441,  0.00790167],
        [ 0.42867404,  0.17833441, -0.6105001 ,  0.49296635],
        [-0.40300983,  0.3589633 , -0.26932156, -0.41699216],
        [-0.28938767,  0.16760606, -0.34825772,  0.5725295 ],
        [ 0.3798756 , -0.45982867,  0.30792862, -0.05597204],
        [ 0.39521652,  0.21937352,  0.3404147 ,  0.5420316 ],
        [ 0.37472934,  0.37950474, -0.04657042,  0.18962324],
        [ 0.37935108,  0.24689758, -0.4418146 , -0.25006822]],
       dtype=float32)>,
 <tf.Variable 'simple_rnn_15/simple_rnn_cell_15/recurrent_kernel:0' shape=(4, 4) dtype=float32, numpy=
 array([[-0.80420995, -0.04441664,  0.47590402,  0.35325453],
        [ 0.07364351, -0.4747402 ,  0.57299924, -0.6639806 ],
        [-0.29404065,  0.7626039 ,  0.0019677 , -0.57616967],
        [-0.51123667, -0.4371317 , -0.66722065, -0.31995237]],
       dtype=float32)>,
 <tf.Varia

In [25]:
for weight in model.get_weights():
    print(weight.shape)

(8, 4)
(4, 4)
(4,)


In [22]:
model.fit(x, y)



<keras.callbacks.History at 0x26f277c7ee0>

## 1.2. LSTM
[LSTM] (Long Short-Term Memory, 1997)

<img src='image/lstm_cell.png' style='height:320px; margin:20px auto;'>

<code style='font-size:13px'><a href=https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM>LSTM</a></code>

[LSTM]: https://en.wikipedia.org/wiki/Long_short-term_memory

<img src='image/lstm_steps.png' style='height:520px; margin:20px auto;'>

## 1.3. GRU
[Gated Recurrent Units] (GRU)

<img src='image/gru_cell.png' style='height:320px; margin:20px auto;'>

<code style='font-size:13px'><a href=https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU>GRU</a></code>

[Gated Recurrent Units]: https://en.wikipedia.org/wiki/Gated_recurrent_unit

## 1.4. Bi-directional

<img src='image/rnn_bidirectional.png' style='height:280px; margin:20px auto;'>

# 2. Recurrent architectures

## 2.1. Seq2seq
[Seq2seq]

[Seq2seq]: https://en.wikipedia.org/wiki/Seq2seq

## 2.2. Attention
[Attention] is implemented in Keras as the
<code style='font-size:13px'><a href=https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention>Attention</a></code> layer.

[Attention]: https://en.wikipedia.org/wiki/Attention_(machine_learning)

## 2.3. Transformer
[Transformer]

[Transformer]: https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)

# References
- *amitness.com - [Recurrent Keras layer](https://amitness.com/2020/04/recurrent-layers-keras/)*
- *colah.github.io - [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)*
- *distill.pub - [Memorization in RNNs](https://distill.pub/2019/memorization-in-rnns/)*
- *distill.pub - [Augmented RNNs](https://distill.pub/2016/augmented-rnns/)*
---
- https://www.kaggle.com/code/tanulsingh077/deep-learning-for-nlp-zero-to-transformers-bert
- https://www.kaggle.com/code/kredy10/simple-lstm-for-text-classification