<a href="https://colab.research.google.com/github/smfelixchoi/Deep-Learning/blob/master/No.%20of%20params%20in%20RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How can we count the number of parameters in an RNN?

Reference (Image)

Vanilla RNN: https://ytd2525.wordpress.com/2016/08/03/understanding-deriving-and-extending-the-lstm/

LSTM & GRU: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

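TensorFlow counts only the parameters inside the RNN cell, that is, the parameters of the hidden layer. The output projection $W_{hy}$ is not part of the cell and is therefore not included in the count.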

### 1. Vanilla RNN

$$ 
\large
\begin{align*}
y_{t} &= W_{hy}h_{t} \\
h_{t} &= \tanh(W_{hh}h_{t-1} + W_{xh}x_{t} + b_h)
\end{align*}
$$
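
The hidden-state update above has three trainable pieces: $W_{xh}$, $W_{hh}$ and $b_h$. A minimal sketch of the resulting count, assuming an input dimension `input_dim` and a hidden size `units` (both names are chosen here for illustration), and leaving out $W_{hy}$ since it sits outside the cell:

```python
def vanilla_rnn_params(input_dim: int, units: int) -> int:
    """Count trainable parameters of one vanilla RNN cell (W_hy excluded)."""
    w_xh = input_dim * units   # input-to-hidden weights
    w_hh = units * units       # hidden-to-hidden weights
    b_h = units                # hidden bias
    return w_xh + w_hh + b_h


# Example: 3 input features, 4 hidden units -> 3*4 + 4*4 + 4 = 32
print(vanilla_rnn_params(input_dim=3, units=4))
```

In closed form this is `units * (units + input_dim + 1)`.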

### 2. LSTM

$$
\large
\begin{align}
f_{t} &= \sigma(W_{xh}^{f}x_{t} + W_{hh}^{f}h_{t-1} + b^{f}) \tag{forget gate} \\
i_{t} &= \sigma(W_{xh}^{i}x_{t} + W_{hh}^{i}h_{t-1} + b^{i}) \tag{input gate} \\
\tilde C_{t} &= \tanh(W_{xh}^{\tilde C}x_{t} + W_{hh}^{\tilde C}h_{t-1} + b^{\tilde C}) \tag{Vanilla RNN} \\
C_{t} &= f_{t} \odot C_{t-1} + i_{t} \odot \tilde C_{t} \tag{cell state} \\
o_{t} &= \sigma(W_{xh}^{o}x_{t} + W_{hh}^{o}h_{t-1} + b^{o}) \tag{output gate} \\
h_{t} &= o_{t} \odot \tanh(C_t) \tag{hidden state}
\end{align}
\\ \odot: \text{element-wise product}
$$
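
Four of the LSTM equations above (forget gate, input gate, candidate cell state, output gate) each carry their own $W_{xh}$, $W_{hh}$ and bias, while the cell-state and hidden-state updates add no parameters. The sketch below lists the shapes and sums them; the helper names and the shape dictionary are illustrative assumptions, not part of any library:

```python
from math import prod


def lstm_param_shapes(input_dim: int, units: int) -> dict:
    """Return the shape of every trainable tensor in the LSTM equations."""
    shapes = {}
    for gate in ("f", "i", "c_tilde", "o"):
        shapes[f"W_xh^{gate}"] = (units, input_dim)  # acts on x_t
        shapes[f"W_hh^{gate}"] = (units, units)      # acts on h_{t-1}
        shapes[f"b^{gate}"] = (units,)               # one bias per gate
    return shapes


def count_params(shapes: dict) -> int:
    """Sum the number of entries over all tensors."""
    return sum(prod(shape) for shape in shapes.values())


shapes = lstm_param_shapes(input_dim=3, units=4)
print(count_params(shapes))  # 4 * (4*3 + 4*4 + 4) = 128
```

The total is four times the vanilla RNN count, `4 * units * (units + input_dim + 1)`.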

### 3. GRU

$$
\large
\begin{align}
r_{t} &= \sigma(W_{r}x_t + U_{r}h_{t-1} + b_{r}) \tag{ reset gate } \\
z_{t} &= \sigma(W_{z}x_t + U_{z}h_{t-1} + b_{z}) \tag{ update gate } \\
\tilde h_{t} &= \tanh(W_{h}x_{t} + U_{h}(r_{t} \odot h_{t-1}) + b_{h}) \notag \\
h_{t} &= (1-z_{t}) \odot h_{t-1} + z_{t} \odot \tilde h_{t} \tag{ hidden state}
\end{align}
$$
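
In this form the GRU has three parameter blocks (reset gate, update gate, candidate hidden state), each made of a $W$, a $U$ and one bias. A short sketch of the count, where `input_dim` and `units` are assumed names for the input size and the hidden size:

```python
def gru_params(input_dim: int, units: int) -> int:
    """Count parameters of the GRU equations above: three (W, U, b) blocks."""
    per_block = units * input_dim + units * units + units  # W, U, b
    return 3 * per_block  # reset gate, update gate, candidate state


for d, h in [(3, 4), (10, 32)]:
    print(f"input_dim={d}, units={h}: {gru_params(d, h)} parameters")
```

The last equation only mixes $h_{t-1}$ and $\tilde h_{t}$, so it contributes nothing to the count.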

In [None]:
## tf.keras.layers.GRU(
##    units, activation='tanh', recurrent_activation='sigmoid', use_bias=True,
##    kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
##    bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None,
##    bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
##    recurrent_constraint=None, bias_constraint=None, dropout=0.0,
##    recurrent_dropout=0.0, implementation=2, return_sequences=False,
##    return_state=False, go_backwards=False, stateful=False, unroll=False,
##    time_major=False, reset_after=True, **kwargs)

reset_after

GRU convention (whether to apply reset gate after or before matrix multiplication). False = "before", True = "after" (default and CuDNN compatible).

If `reset_after=True`, TensorFlow 2 keeps two bias vectors per gate (one on the input side, one on the recurrent side) and computes as follows:
$$
\large
\begin{align}
r_{t} &= \sigma(W_{r}x_t + b_{r1} + U_{r}h_{t-1} + b_{r2}) \tag{ reset gate } \\
z_{t} &= \sigma(W_{z}x_t + b_{z1} + U_{z}h_{t-1} + b_{z2}) \tag{ update gate } \\
\tilde h_{t} &= \tanh(W_{h}x_{t} + b_{h1} + r_{t} \odot (U_{h}h_{t-1} + b_{h2})) \notag \\
h_{t} &= (1-z_{t}) \odot h_{t-1} + z_{t} \odot \tilde h_{t} \tag{ hidden state}
\end{align}
$$
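
The second bias in each gate adds `3 * units` parameters compared with the single-bias form. The sketch below makes this concrete using the weight layout that, to my understanding, the Keras GRU layer uses (three gates stacked along the last axis, and a bias of shape `(2, 3 * units)` when `reset_after=True`). The helper functions themselves are hypothetical and written in plain Python, so nothing here calls TensorFlow:

```python
from math import prod


def gru_weight_shapes(input_dim: int, units: int, reset_after: bool = True) -> dict:
    """Shapes of the GRU weights, assuming the stacked three-gate layout."""
    # reset_after=True: separate input and recurrent biases (b_*1 and b_*2).
    # reset_after=False: a single bias per gate.
    bias_shape = (2, 3 * units) if reset_after else (3 * units,)
    return {
        "kernel": (input_dim, 3 * units),        # W_r, W_z, W_h
        "recurrent_kernel": (units, 3 * units),  # U_r, U_z, U_h
        "bias": bias_shape,
    }


def count_params(shapes: dict) -> int:
    """Sum the number of entries over all weight tensors."""
    return sum(prod(shape) for shape in shapes.values())


after = count_params(gru_weight_shapes(3, 4, reset_after=True))
before = count_params(gru_weight_shapes(3, 4, reset_after=False))
print(after, before, after - before)  # 108 96 12
```

With `reset_after=True` the closed form is `3 * units * (units + input_dim + 2)`; with `reset_after=False` it is `3 * units * (units + input_dim + 1)`.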