# Variants of RNN cells
LSTM and GRU are two of the most widely used RNN cells. A GRU has fewer gates and parameters than an LSTM, so it is cheaper to compute, and it often performs comparably in practice.
## LSTM unit
An LSTM unit has three inputs: the input at time step $t$, $x_{t}$; the internal state at time step $t-1$, $c_{t-1}$; and the output at time step $t-1$, $y_{t-1}$.

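Compute the forget, input, and output gates (each $linear$ has its own weights, so the three gates differ even though the formulas look identical):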
$$
f_{t}= \sigma(linear(x_{t}) \oplus linear(y_{t-1}))
$$

$$
i_{t}= \sigma(linear(x_{t}) \oplus linear(y_{t-1}))
$$

$$
o_{t}= \sigma(linear(x_{t}) \oplus linear(y_{t-1}))
$$
Process the input:
$$
c\_in = \tanh(linear(x_{t}) \oplus linear(y_{t-1}))
$$
Update the internal state $c_t$:
$$
c_t = f_{t} \odot c_{t-1} \oplus i_{t} \odot c\_in
$$
Control the output:
$$
y_{t} = o_{t} \odot \tanh(c_t)
$$
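
The LSTM equations above can be sketched in NumPy as a single time step. This is a minimal illustration, not a reference implementation: the helper names (`lstm_step`, `init_lstm_params`), the bias terms, and the random initialization are assumptions that do not appear in the formulas, and each gate is given its own pair of weight matrices for $x_t$ and $y_{t-1}$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, y_prev, c_prev, params):
    """One LSTM time step.

    params maps a gate name ("f", "i", "o", "c") to (W_x, W_y, b),
    i.e. linear(x_t) + linear(y_prev) with a bias (bias is an assumption).
    """
    def gate(name, activation):
        W_x, W_y, b = params[name]
        return activation(x_t @ W_x + y_prev @ W_y + b)

    f_t = gate("f", sigmoid)          # forget gate
    i_t = gate("i", sigmoid)          # input gate
    o_t = gate("o", sigmoid)          # output gate
    c_in = gate("c", np.tanh)         # processed input
    c_t = f_t * c_prev + i_t * c_in   # update internal state
    y_t = o_t * np.tanh(c_t)          # control the output
    return y_t, c_t

def init_lstm_params(input_size, hidden_size, seed=0):
    # Hypothetical initializer, used only to make the sketch runnable.
    rng = np.random.default_rng(seed)
    return {
        name: (
            rng.normal(scale=0.1, size=(input_size, hidden_size)),
            rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
            np.zeros(hidden_size),
        )
        for name in ("f", "i", "o", "c")
    }

batch, input_size, hidden_size = 2, 3, 4
params = init_lstm_params(input_size, hidden_size)
x_t = np.ones((batch, input_size))
y_prev = np.zeros((batch, hidden_size))
c_prev = np.zeros((batch, hidden_size))
y_t, c_t = lstm_step(x_t, y_prev, c_prev, params)
print(y_t.shape, c_t.shape)
```

To process a sequence, one would call `lstm_step` once per time step, feeding the returned `y_t` and `c_t` back in as `y_prev` and `c_prev`.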

## GRU unit
Compute the update and reset gates:
$$
z_{t} = \sigma(linear(x_{t}) \oplus linear(y_{t-1}))
$$

$$
r_{t} = \sigma(linear(x_{t}) \oplus linear(y_{t-1}))
$$
The candidate activation:
$$
\tilde{y}_{t} = \tanh(linear(x_{t}) \oplus linear(r_{t} \odot y_{t-1}))
$$
Get the updated output:
$$
y_{t} = (1 - z_{t})\odot y_{t-1} \oplus z_{t} \odot \tilde{y}_{t}
$$
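
As with the LSTM, the GRU update can be sketched as one NumPy time step. The function name `gru_step`, the parameter layout, and the bias terms are assumptions made for illustration; the update gate, reset gate, candidate activation, and final interpolation follow the equations above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, y_prev, params):
    """One GRU time step.

    params maps "z" (update), "r" (reset), "h" (candidate) to (W_x, W_y, b).
    The bias b is an assumption; the formulas only mention `linear`.
    """
    W_xz, W_yz, b_z = params["z"]
    W_xr, W_yr, b_r = params["r"]
    W_xh, W_yh, b_h = params["h"]

    z = sigmoid(x_t @ W_xz + y_prev @ W_yz + b_z)              # update gate
    r = sigmoid(x_t @ W_xr + y_prev @ W_yr + b_r)              # reset gate
    y_tilde = np.tanh(x_t @ W_xh + (r * y_prev) @ W_yh + b_h)  # candidate
    return (1.0 - z) * y_prev + z * y_tilde                    # new output

batch, input_size, hidden_size = 2, 3, 4
rng = np.random.default_rng(0)
gru_params = {
    name: (
        rng.normal(scale=0.1, size=(input_size, hidden_size)),
        rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
        np.zeros(hidden_size),
    )
    for name in ("z", "r", "h")
}
x_t = np.ones((batch, input_size))
y_prev = np.zeros((batch, hidden_size))
y_t = gru_step(x_t, y_prev, gru_params)
print(y_t.shape)
```

Note that the GRU carries only one tensor between steps, whereas the LSTM carries both an output and an internal state.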


*Note:* $\odot$ and $\oplus$ denote element-wise multiplication and element-wise addition, respectively.

In TensorFlow, a typical RNN cell returns (output, hidden\_state). An LSTM cell returns $y_t$ as the output and a state that holds both $c_t$ and $y_t$, while a GRU cell returns ($y_{t}$, $y_{t}$).

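As for the $linear$ transformation, it takes a list of 2D tensors, each of shape [batch, n\_i]. The tensors are concatenated along the second axis, so each sample becomes a row of shape [1, sum(n\_i)], and the transformation matrix W has shape [sum(n\_i), output\_size]. After the transformation, the output has shape [batch, output\_size]. In this form, a sum such as $linear(x_{t}) \oplus linear(y_{t-1})$ can be computed as a single $linear$ over the list [$x_{t}$, $y_{t-1}$].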
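
The shape bookkeeping for $linear$ can be checked with a small NumPy sketch. The function below is a hypothetical stand-in written for illustration, not TensorFlow's own helper; the optional bias `b` and the constant weights are assumptions. It also shows that multiplying the concatenated input by one matrix equals summing two separate matrix products.

```python
import numpy as np

def linear(inputs, W, b=None):
    """Concatenate a list of [batch, n_i] arrays and apply one matrix W."""
    concat = np.concatenate(inputs, axis=1)  # [batch, sum(n_i)]
    out = concat @ W                         # [batch, output_size]
    return out if b is None else out + b

batch, output_size = 2, 5
x_t = np.ones((batch, 3))       # n_1 = 3
y_prev = np.ones((batch, 4))    # n_2 = 4
W = np.full((3 + 4, output_size), 0.5)

out = linear([x_t, y_prev], W)
print(out.shape)  # (2, 5)

# The same result, written as two separate linear maps added element-wise.
split = x_t @ W[:3] + y_prev @ W[3:]
print(np.allclose(out, split))  # True
```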