<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Libraries" data-toc-modified-id="Libraries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Introduction" data-toc-modified-id="Introduction-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Setup" data-toc-modified-id="Setup-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Setup</a></span></li><li><span><a href="#Calculations" data-toc-modified-id="Calculations-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Calculations</a></span></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Conclusion</a></span></li></ul></div>

# Libraries

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.python.keras.layers import LSTM

In [2]:
sess = tf.InteractiveSession()

# Introduction

This notebook shows how Keras implements the LSTM network described in [Denny Britz's tutorial](http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/) by walking through the calculations using a trivial example.

# Setup

Specify the size of the hidden state:

In [3]:
n_hidden = 1

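Create an LSTM layer (named `lstm_cell` here). Setting `return_state=True` makes it return the final hidden state and cell state in addition to its output: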

In [4]:
lstm_cell = LSTM(units=n_hidden, activation='tanh', recurrent_activation='sigmoid', return_sequences=True, return_state=True, implementation=1)

Create some dummy data to pass into the LSTM layer, with shape (batch size, time steps, input dimension) = (1, 1, 1):

In [5]:
x_i = np.array(2).reshape((1, 1, 1))
x_i = tf.constant(x_i, tf.float32)
x_i.eval()

array([[[ 2.]]], dtype=float32)

Specify some weights in the form Keras expects:

In [6]:
lstm_cell.build(x_i.shape)

weights_kernel = np.array([0, 1, 4, 1]).reshape((1,4))
weights_recurrent = np.array([3, 2, -5, 4]).reshape((1,4))
weights_bias = np.array([1, 2, 3, 2]).reshape((4,))

In [7]:
lstm_cell.set_weights([weights_kernel, weights_recurrent, weights_bias])
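
To make the layout behind these three arrays more concrete, the NumPy-only sketch below builds them for a hypothetical layer with `input_dim = 3` and `units = 2` (both sizes are assumptions chosen for illustration, not values from this notebook) and prints their shapes. The four per-gate blocks sit side by side along the last axis, which is why each array has a factor of 4 in its shape.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
input_dim, units = 3, 2

# The four per-gate blocks are concatenated along the last axis.
kernel = np.zeros((input_dim, 4 * units))   # multiplied with the input x_t
recurrent = np.zeros((units, 4 * units))    # multiplied with the previous hidden state
bias = np.zeros((4 * units,))               # one bias per gate and hidden unit

shapes = [w.shape for w in (kernel, recurrent, bias)]
print(shapes)  # [(3, 8), (2, 8), (8,)]
```

With `input_dim = 1` and `units = 1`, the same rule gives the (1, 4), (1, 4) and (4,) shapes used above.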

Compute the outputs using Keras:

In [8]:
output_keras, h_t_keras, c_t_keras = lstm_cell(x_i)

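Check that the output is the same as h_t (the input has a single time step, so the returned sequence contains only the last hidden state):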

In [9]:
tf.assert_equal(output_keras, h_t_keras).run()

# Calculations

Split the weights into per-gate matrices. Keras stores the four blocks side by side in the order i, f, g, o:

In [10]:
U_i = weights_kernel[:, :n_hidden]
U_f = weights_kernel[:, n_hidden:n_hidden * 2]
U_g = weights_kernel[:, n_hidden * 2:n_hidden * 3]
U_o = weights_kernel[:, n_hidden * 3:]

In [11]:
W_i = weights_recurrent[:, :n_hidden]
W_f = weights_recurrent[:, n_hidden:n_hidden * 2]
W_g = weights_recurrent[:, n_hidden * 2:n_hidden * 3]
W_o = weights_recurrent[:, n_hidden * 3:]

In [12]:
b_i = weights_bias[:n_hidden]
b_f = weights_bias[n_hidden:n_hidden * 2]
b_g = weights_bias[n_hidden * 2:n_hidden * 3]
b_o = weights_bias[n_hidden * 3:]

Set the initial hidden state and cell state to zeros, which is what Keras uses when no initial state is supplied:

In [13]:
s_0 = np.zeros(x_i.shape, np.float32)
c_0 = np.zeros(x_i.shape, np.float32)

Compute the gates (i, f, o) and the candidate memory (g):

In [14]:
i = tf.sigmoid(x_i * U_i  + s_0 * W_i + b_i)
f = tf.sigmoid(x_i * U_f  + s_0 * W_f + b_f)
o = tf.sigmoid(x_i * U_o  + s_0 * W_o + b_o)
g = tf.tanh(   x_i * U_g  + s_0 * W_g + b_g)
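
The element-wise `*` in the cell above works only because the input and the hidden state are both one-dimensional. For larger sizes the products become matrix multiplications. A minimal NumPy-only sketch of the same four computations follows. `sigmoid` and `lstm_gates` are hypothetical helper names introduced for this example, and the i, f, g, o column order is assumed to match the slices taken earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(x_t, h_prev, kernel, recurrent, bias):
    """Return (i, f, g, o) for one time step.

    x_t: (batch, input_dim), h_prev: (batch, units),
    kernel: (input_dim, 4 * units), recurrent: (units, 4 * units),
    bias: (4 * units,), with column blocks ordered i, f, g, o.
    """
    z = x_t @ kernel + h_prev @ recurrent + bias   # all pre-activations at once
    z_i, z_f, z_g, z_o = np.split(z, 4, axis=-1)   # one block per gate
    return sigmoid(z_i), sigmoid(z_f), np.tanh(z_g), sigmoid(z_o)

# The notebook's weights and input, with a zero initial hidden state.
kernel = np.array([[0.0, 1.0, 4.0, 1.0]])
recurrent = np.array([[3.0, 2.0, -5.0, 4.0]])
bias = np.array([1.0, 2.0, 3.0, 2.0])
i, f, g, o = lstm_gates(np.array([[2.0]]), np.zeros((1, 1)), kernel, recurrent, bias)
print(i, f, g, o)
```

Because the initial hidden state is zero, the recurrent weights do not affect this first step.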


Check for c_t:

In [15]:
c_t_manual = tf.multiply(c_0, f) + tf.multiply(g, i)
c_t_manual.eval()

array([[[ 0.7310586]]], dtype=float32)

In [16]:
c_t_keras.eval()

array([[ 0.7310586]], dtype=float32)

In [17]:
tf.assert_equal(c_t_manual, c_t_keras).run()

Check for h_t:

In [18]:
h_t_manual = tf.multiply(tf.tanh(c_t_manual), o)
h_t_manual.eval()

array([[[ 0.61249435]]], dtype=float32)

In [19]:
h_t_keras.eval()

array([[ 0.61249435]], dtype=float32)

In [20]:
tf.assert_equal(h_t_manual, h_t_keras).run()
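
As a cross-check that does not depend on TensorFlow, the whole step can also be reproduced with NumPy alone. This is a minimal sketch under the same assumptions as the manual calculation (gate order i, f, g, o and zero initial states), not Keras's actual implementation. `sigmoid` and `lstm_step` are hypothetical names introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, kernel, recurrent, bias):
    """One LSTM time step; returns (h_t, c_t)."""
    z = x_t @ kernel + h_prev @ recurrent + bias
    z_i, z_f, z_g, z_o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
    g = np.tanh(z_g)              # candidate memory
    c_t = f * c_prev + i * g      # new cell state
    h_t = o * np.tanh(c_t)        # new hidden state
    return h_t, c_t

# Same weights, input and initial states as in the notebook.
kernel = np.array([[0.0, 1.0, 4.0, 1.0]])
recurrent = np.array([[3.0, 2.0, -5.0, 4.0]])
bias = np.array([1.0, 2.0, 3.0, 2.0])
x_t = np.array([[2.0]])
h_0 = np.zeros((1, 1))
c_0 = np.zeros((1, 1))

h_t, c_t = lstm_step(x_t, h_0, c_0, kernel, recurrent, bias)
print(c_t, h_t)
```

The printed values should agree with the `c_t` and `h_t` outputs shown above, up to floating-point precision.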

# Conclusion

1. With `return_sequences=True`, as used here, `LSTM()` outputs the hidden state at every time step; by default (`return_sequences=False`) it outputs only the last hidden state. The two coincide in this example because the input has a single time step.
2. In Keras, the kernel weights are multiplied with the input data $x_t$, and the recurrent weights are multiplied with the previous hidden state.
3. $g$ is the "candidate" memory, while $c_t$ is the final memory that is passed on to the next time step.
4. The only outputs of an LSTM cell at step $t$ are $h_t$ and $c_t$.
5. The number of parameters in an LSTM layer is:

$\text{Parameters for kernel matrices} +\text{Parameters for recurrent matrices}+\text{Parameters for biases}$

where:

$$\begin{eqnarray*}
\text{Parameters for kernel matrices} & = & 4\times\left(\text{input dimension}\times\text{hidden state dimension}\right)\\
\text{Parameters for recurrent matrices} & = & 4\times\left(\text{hidden state dimension}\times \text{hidden state dimension}\right)\\
\text{Parameters for biases} & = & 4\times\text{hidden state dimension}
\end{eqnarray*}
$$
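
The count is simple enough to express as a small function. The sketch below assumes a standard LSTM layer with biases enabled; `lstm_param_count` is a hypothetical helper name, and the second call uses made-up sizes purely for illustration.

```python
def lstm_param_count(input_dim, units):
    """Number of trainable parameters, following the three terms above."""
    kernel = 4 * input_dim * units   # kernel matrices
    recurrent = 4 * units * units    # recurrent matrices
    bias = 4 * units                 # biases
    return kernel + recurrent + bias

print(lstm_param_count(1, 1))     # 12, the layer used in this notebook
print(lstm_param_count(100, 64))  # 25600 + 16384 + 256 = 42240
```

For the layer in this notebook the three terms are 4, 4 and 4, matching the sizes of `weights_kernel`, `weights_recurrent` and `weights_bias`.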