# Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
# https://arxiv.org/abs/1412.3555

GRU:

1. Update Gate: $z_t = z(x_t, h_{t-1})=\sigma(f(x_t, h_{t-1})) = \sigma(W^{(z)}_{xh}x_t + U^{(z)}_{hh}h_{t-1})$

1. Reset Gate: $r_t = r(x_t, h_{t-1})=\sigma(g(x_t, h_{t-1})) = \sigma(W^{(r)}_{xh}x_t + U^{(r)}_{hh}h_{t-1})$

1. New: $\tilde{h}_t = \tanh(W^{(\tilde{h})}_{xh}x_t + r_t\circ U^{(\tilde{h})}_{hh}h_{t-1})$

1. Final: $h_t = z_t \circ h_{t-1} + (1-z_t)\circ\tilde{h}_t$
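
The four GRU equations above can be sketched as a single step function. This is a minimal NumPy sketch, not a reference implementation: the names `gru_step`, `init_gru_params`, and the dictionary keys such as `W_z` are hypothetical, the weights are small random values chosen only for illustration, and biases are omitted because the equations omit them. It follows the convention written here, where $z_t$ weights $h_{t-1}$ and $r_t$ multiplies $U h_{t-1}$; the cited paper writes the final interpolation with the roles of $z_t$ and $1-z_t$ swapped and applies $r_t$ to $h_{t-1}$ before $U$.

```python
import numpy as np

def sigmoid(a):
    # Squash activations into (0, 1) so they can act as gates.
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step, following the four equations in order."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)   # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)   # reset gate
    # Candidate ("new") state: the reset gate scales the recurrent term.
    h_tilde = np.tanh(p["W_h"] @ x_t + r_t * (p["U_h"] @ h_prev))
    # Final state: elementwise interpolation between old state and candidate.
    return z_t * h_prev + (1.0 - z_t) * h_tilde

def init_gru_params(input_size, hidden_size, seed=0):
    # Hypothetical helper: small random weights, one (W, U) pair per equation.
    rng = np.random.default_rng(seed)
    shapes = {"W": (hidden_size, input_size), "U": (hidden_size, hidden_size)}
    return {f"{m}_{g}": rng.normal(scale=0.1, size=shapes[m])
            for g in "zrh" for m in "WU"}

gru_params = init_gru_params(input_size=3, hidden_size=4)
h = np.zeros(4)
for x_t in np.random.default_rng(1).normal(size=(5, 3)):  # 5 time steps
    h = gru_step(x_t, h, gru_params)
print(h.shape)  # (4,)
```

Because $h_t$ is an elementwise convex combination of $h_{t-1}$ and a $\tanh$ output, every entry of `h` stays within $[-1, 1]$ when the initial state is zero.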

In English:
1. Both the update gate and the reset gate take $x_t$ and $h_{t-1}$ as inputs.
2. After the matrices $W$ and $U$ (a separate pair for each of $z()$ and $r()$) are applied, the resulting activations are squashed between $0$ and $1$ by the sigmoid function. This lets each value act as a "gate", where ~$1$ is "on" and ~$0$ is "off", with values in between reflecting differing degrees of "on"-ness.
3. If $r_t$ is $0$, then $\tilde{h}_t = \tanh(W^{(\tilde{h})}_{xh}x_t + r_t\circ U^{(\tilde{h})}_{hh}h_{t-1})$ becomes $\tilde{h}_t = \tanh(W^{(\tilde{h})}_{xh}x_t)$, so we forget the past: only the input $x_t$, mapped into the hidden space, is used for the new state at time $t$, which then feeds the final computation of the hidden state, $h_t$.
4. The update gate, $z_t$, controls the split between $h_{t-1}$ and $\tilde{h}_t$. (Note: $z_t$ closer to $1$ copies more of $h_{t-1}$ forward unchanged, which helps to moderate the vanishing gradient problem.)
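
The vanishing-gradient note in item 4 can be made concrete by differentiating the final equation with respect to $h_{t-1}$. This is a short sketch rather than a full analysis; $\operatorname{diag}(v)$ denotes the diagonal matrix built from a vector $v$, a notation assumed here and not used elsewhere in these notes.

$$\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}(z_t) + \operatorname{diag}(1-z_t)\,\frac{\partial \tilde{h}_t}{\partial h_{t-1}} + \operatorname{diag}(h_{t-1}-\tilde{h}_t)\,\frac{\partial z_t}{\partial h_{t-1}}$$

When $z_t$ saturates near $1$, the second term is scaled by $1-z_t \approx 0$, and the third term shrinks as well because the sigmoid derivative $\sigma'(a) = \sigma(a)(1-\sigma(a))$ goes to $0$. The Jacobian is then close to the identity, so the gradient passes through that time step nearly unchanged instead of being repeatedly multiplied by $U$ and a $\tanh$ derivative.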

LSTM:

Calculate the gates:
1. Input Gate: $i_t = input(x_t, h_{t-1})=\sigma(f(x_t, h_{t-1})) = \sigma(W^{(input)}_{xh}x_t + U^{(input)}_{hh}h_{t-1})$
1. Forget Gate: $f_t = forget(x_t, h_{t-1})=\sigma(g(x_t, h_{t-1})) = \sigma(W^{(forget)}_{xh}x_t + U^{(forget)}_{hh}h_{t-1})$
1. Output Gate: $o_t = output(x_t, h_{t-1})=\sigma(h(x_t, h_{t-1})) = \sigma(W^{(output)}_{xh}x_t + U^{(output)}_{hh}h_{t-1})$

Calculate new memory cell:
1. New: $\tilde{c}_t = \tanh(c(x_t, h_{t-1})) = \tanh(W^{(c)}_{xh}x_t + U^{(c)}_{hh}h_{t-1})$

Use the gate $f_t$ to control how much of the past memory state, $c_{t-1}$, to "forget", and the gate $i_t$ to control how much of the new memory state, $\tilde{c}_t$, to remember. Finally, the gate $o_t$ controls how much of the final memory cell gets "out" into the hidden state $h_t$.
1. Final memory cell: $c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$
1. Final hidden state: $h_t = o_t \circ \tanh(c_t)$
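
The LSTM equations above translate into a step function in the same way. Again this is a minimal NumPy sketch under stated assumptions: `lstm_step`, `init_lstm_params`, and the keys such as `W_i` are hypothetical names, the weights are small random values for illustration only, and biases are left out because the equations do not include them.

```python
import numpy as np

def sigmoid(a):
    # Squash activations into (0, 1) so they can act as gates.
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: gates, new memory, final memory cell, hidden state."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev)      # input gate
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev)      # forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev)      # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev)  # new memory cell
    c_t = f_t * c_prev + i_t * c_tilde   # forget part of the old, admit part of the new
    h_t = o_t * np.tanh(c_t)             # expose part of the memory cell
    return h_t, c_t

def init_lstm_params(input_size, hidden_size, seed=0):
    # Hypothetical helper: one (W, U) pair for each of i, f, o, and c.
    rng = np.random.default_rng(seed)
    shapes = {"W": (hidden_size, input_size), "U": (hidden_size, hidden_size)}
    return {f"{m}_{g}": rng.normal(scale=0.1, size=shapes[m])
            for g in "ifoc" for m in "WU"}

lstm_params = init_lstm_params(input_size=3, hidden_size=4)
h_lstm, c_lstm = np.zeros(4), np.zeros(4)
for x_t in np.random.default_rng(1).normal(size=(5, 3)):  # 5 time steps
    h_lstm, c_lstm = lstm_step(x_t, h_lstm, c_lstm, lstm_params)
print(h_lstm.shape, c_lstm.shape)  # (4,) (4,)
```

Unlike the GRU, the LSTM carries two states between steps: the memory cell `c_lstm`, which is not squashed and can grow in magnitude over time, and the hidden state `h_lstm`, which is bounded by the output gate and the $\tanh$.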