# https://medium.com/@erikhallstrm/using-the-tensorflow-multilayered-lstm-api-f6e7da7bbe40

# Using the Multilayered LSTM API in TensorFlow (4/7)
In the previous article we learned how to use the TensorFlow API to create a recurrent neural network (RNN) with long short-term memory (LSTM).
In this post we will make that architecture deep by introducing an LSTM with multiple layers.

One thing to notice is that for every layer of the network we will need a hidden state and a cell state.
Typically, the input to each LSTM layer is that layer’s own state from the previous time step, together with the hidden activations of the “lower” (previous) layer.
There is a good diagram in this paper: https://arxiv.org/pdf/1409.2329v5.pdf
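
To make the data flow between layers concrete, here is a minimal NumPy sketch of one time step through a stacked LSTM. It is not TensorFlow’s implementation: the gate ordering, the weight layout, the random initialization and the helper names `lstm_step` and `stacked_step` are assumptions made for illustration. The point is only that every layer carries its own cell and hidden state, and that each layer above the first reads the hidden activations of the layer below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, c_prev, h_prev, W, b):
    """One LSTM step for one layer (illustrative gate layout, not TensorFlow's)."""
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)  # input, forget, output gates and candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return c, h

batch_size, input_size, state_size, num_layers = 5, 1, 4, 3
rng = np.random.default_rng(0)

# Layer 0 reads the raw input; every later layer reads the hidden state below it.
in_sizes = [input_size] + [state_size] * (num_layers - 1)
weights = [rng.normal(scale=0.1, size=(n_in + state_size, 4 * state_size))
           for n_in in in_sizes]
biases = [np.zeros(4 * state_size) for _ in range(num_layers)]

# Every layer keeps its own (cell state, hidden state) pair.
states = [(np.zeros((batch_size, state_size)), np.zeros((batch_size, state_size)))
          for _ in range(num_layers)]

def stacked_step(x, states):
    """Run one time step through all layers, bottom to top."""
    new_states = []
    layer_input = x
    for (c_prev, h_prev), W, b in zip(states, weights, biases):
        c, h = lstm_step(layer_input, c_prev, h_prev, W, b)
        new_states.append((c, h))
        layer_input = h  # hidden activations feed the next layer up
    return layer_input, new_states

x_t = rng.integers(0, 2, size=(batch_size, input_size)).astype(float)
top_h, states = stacked_step(x_t, states)
print(top_h.shape, len(states))  # (5, 4) 3
```

The value returned as `top_h` is the hidden state of the last layer, which is what the output layer of the network consumes.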

We could continue to store the states for each layer in many `LSTMStateTuple`s, but that would require a lot of overhead.
Data can only be fed to the placeholders through the `feed_dict` as Python lists or NumPy arrays anyway (not as `LSTMStateTuple`s), so we would still have to convert between the datatypes.
Why not save the whole state for the network in one big tensor?
To do this, the first thing we want to do is replace `_current_cell_state` and `_current_hidden_state` on lines 81–82 with the more generic:
`_current_state = np.zeros((num_layers, 2, batch_size, state_size))`

You also have to declare the new setting `num_layers = 3` at the beginning of the file, but you may choose any number of layers.
The “2” refers to the two states per layer: the cell state and the hidden state.
So for each layer and each sample in a batch, we have both a cell state and a hidden state vector with the size `state_size`.

Now change lines 93 to 103 (the `sess.run` call and the separation of the state tuple) back to the original statement, since the state is now stored in a single tensor:
```
_total_loss, _train_step, _current_state, _predictions_series = sess.run(
    [total_loss, train_step, current_state, predictions_series],
    feed_dict={
        batchX_placeholder: batchX,
        batchY_placeholder: batchY,
        init_state: _current_state
    })
```
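
One detail worth spelling out: the value fetched for `current_state` is a nested tuple (one state tuple per layer), not a single array. Feeding it back into `init_state` presumably works because the fed value is converted to a NumPy array, and a tuple of (cell, hidden) pairs with equal shapes converts to exactly the `[num_layers, 2, batch_size, state_size]` layout. The sketch below shows that conversion with plain NumPy; `StateTuple` is a hypothetical stand-in for `tf.nn.rnn_cell.LSTMStateTuple`, and the fill values are made up so the layers can be told apart.

```python
import collections
import numpy as np

# Hypothetical stand-in for tf.nn.rnn_cell.LSTMStateTuple (a namedtuple with fields c and h)
StateTuple = collections.namedtuple("StateTuple", ("c", "h"))

num_layers, batch_size, state_size = 3, 5, 4

# Fake "fetched" state: layer k has cell state filled with k and hidden state with -k
fetched_state = tuple(
    StateTuple(
        c=np.full((batch_size, state_size), layer, dtype=np.float32),
        h=np.full((batch_size, state_size), -layer, dtype=np.float32),
    )
    for layer in range(num_layers)
)

# Converting the nested tuple gives one array in the placeholder's layout
packed = np.asarray(fetched_state, dtype=np.float32)
print(packed.shape)     # (3, 2, 5, 4)
print(packed[2, 0, 0])  # cell state of layer 2, first sample: [2. 2. 2. 2.]
```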

Next, replace lines 28 to 30 from the previous post:
```
cell_state = tf.placeholder(tf.float32, [batch_size, state_size])
hidden_state = tf.placeholder(tf.float32, [batch_size, state_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(cell_state, hidden_state)
```

with a single placeholder containing the whole state:
`init_state = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])`

Since the TensorFlow multilayer LSTM API accepts the state as a tuple of `LSTMStateTuple`s, we need to unpack the state into this structure.
For each layer in the state we create an `LSTMStateTuple`, and put these in a tuple, as shown below.
Add this just after the `init_state` placeholder.
```
state_per_layer_list = tf.unstack(init_state, axis=0)  # named tf.unpack before TensorFlow 1.0
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(state_per_layer_list[idx][0], state_per_layer_list[idx][1])
     for idx in range(num_layers)]
)
```
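
The same unpacking can be traced with plain NumPy, which may help when checking which slice ends up where. This is only an analogy to the graph code above, not a replacement for it: `StateTuple` is a hypothetical stand-in for `LSTMStateTuple`, and iterating over the array plays the role of unstacking along axis 0.

```python
import collections
import numpy as np

# Hypothetical stand-in for tf.nn.rnn_cell.LSTMStateTuple
StateTuple = collections.namedtuple("StateTuple", ("c", "h"))

num_layers, batch_size, state_size = 3, 5, 4
state = np.arange(num_layers * 2 * batch_size * state_size, dtype=np.float32)
state = state.reshape((num_layers, 2, batch_size, state_size))

# Iterating over a NumPy array walks axis 0: one [2, batch_size, state_size] block per layer
state_per_layer = list(state)

tuple_state = tuple(
    StateTuple(c=layer_state[0], h=layer_state[1]) for layer_state in state_per_layer
)

print(len(tuple_state))        # 3
print(tuple_state[1].h.shape)  # (5, 4)
```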
The forward pass on lines 40 and 41 should be changed to this:
```
# Forward passes
# One LSTMCell object per layer
stacked_rnn = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(stacked_rnn, state_is_tuple=True)
states_series, current_state = tf.contrib.rnn.static_rnn(cell, inputs_series, initial_state=rnn_tuple_state)
```
The multi-layered LSTM is created by making one `LSTMCell` per layer, collecting the cells in a list, and supplying that list to the `MultiRNNCell` API call.
Older code duplicated a single cell with `[cell] * num_layers`; that form appears as commented-out lines in the full listing at the end of this post.
The forward pass uses `tf.contrib.rnn.static_rnn` (named `tf.nn.rnn` in TensorFlow releases before 1.0). Let’s print the output of this function, the `states_series` and `current_state` variables.
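
A note on building the list of cells: in Python, `[cell] * num_layers` repeats a reference to one object rather than creating `num_layers` objects, whereas a loop or list comprehension creates a separate object for each layer. The full listing at the end of this post builds the cells in a loop. The sketch below shows the difference in plain Python; `DummyCell` is a hypothetical stand-in, not a TensorFlow class, and whether sharing one cell object is a problem depends on the TensorFlow version (treat that part as an assumption to check against your installation).

```python
class DummyCell:
    """Hypothetical stand-in for an RNN cell object; not a TensorFlow class."""

    def __init__(self, state_size):
        self.state_size = state_size

num_layers = 3

# List multiplication: three references to the same object
repeated = [DummyCell(4)] * num_layers

# List comprehension: three distinct objects
separate = [DummyCell(4) for _ in range(num_layers)]

print(len({id(c) for c in repeated}))  # 1
print(len({id(c) for c in separate}))  # 3
```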

![Output of the previous states and the last LSTMStateTuples](pics/1__PY2IpauSp2AMQva0XQBvw.png)

Take a look at the tensor names between single quotes: we see that the RNN is unrolled 15 times, once per step of `truncated_backprop_length`.
<u>In the `states_series`, all outputs have the name “Cell2”, which means that the list contains the hidden state of the last LSTM layer at each time step.
Furthermore, the `LSTMStateTuple`s in `current_state` give the whole state of all layers in the network.
“Cell0” refers to the first layer, “Cell1” to the second, and “Cell2” to the third and final layer; “h” and “c” refer to the hidden and cell state.</u>

This is the whole program:
```
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

num_epochs = 5 #100
total_series_length = 50000
truncated_backprop_length = 15
state_size = 4
num_classes = 2
echo_step = 3
batch_size = 5
num_batches = total_series_length//batch_size//truncated_backprop_length
num_layers = 3

def generateData():
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

init_state = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])

state_per_layer_list = tf.unstack(init_state, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(state_per_layer_list[idx][0], state_per_layer_list[idx][1])
     for idx in range(num_layers)]
)

W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

# Unpack columns
inputs_series = tf.split(batchX_placeholder, truncated_backprop_length, 1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

# Forward passes
#cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
#cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

stacked_rnn = []
for _ in range(num_layers):
    stacked_rnn.append(tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True))
cell = tf.nn.rnn_cell.MultiRNNCell(stacked_rnn, state_is_tuple=True)

states_series, current_state = tf.contrib.rnn.static_rnn(cell, inputs_series, initial_state=rnn_tuple_state)

logits_series = [tf.matmul(state, W2) + b2 for state in states_series] #Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

def plot(loss_list, predictions_series, batchX, batchY):
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        x,y = generateData()

        _current_state = np.zeros((num_layers, 2, batch_size, state_size))

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:,start_idx:end_idx]
            batchY = y[:,start_idx:end_idx]

            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder: batchX,
                    batchY_placeholder: batchY,
                    init_state: _current_state
                })


            loss_list.append(_total_loss)

            if batch_idx%100 == 0:
                print("Step",batch_idx, "Batch loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()
```
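
To see what `generateData` produces, here is a scaled-down sketch of the same steps. The tiny sizes and the fixed random seed are assumptions chosen so the arrays are easy to inspect; the logic (roll by `echo_step`, zero the first `echo_step` labels, reshape into one row per batch element) follows the function in the listing.

```python
import numpy as np

echo_step = 3
batch_size = 2
total_series_length = 20  # tiny on purpose; the post uses 50000

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility of this sketch
x = rng.integers(0, 2, size=total_series_length)

# The label series is the input series delayed by echo_step steps
y = np.roll(x, echo_step)
y[0:echo_step] = 0

# Each row is one contiguous sub-series that the batch dimension iterates over
x_rows = x.reshape((batch_size, -1))
y_rows = y.reshape((batch_size, -1))

print(x_rows.shape)  # (2, 10)
print(np.array_equal(y[echo_step:], x[:-echo_step]))  # True
```

Because the roll happens before the reshape, the first `echo_step` labels of every row after the first echo inputs from the end of the previous row.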


Running it prints the batch loss every 100 steps:
```
New data, epoch 0
Step 0 Batch loss 0.69395506
Step 100 Batch loss 0.66662335
Step 200 Batch loss 0.5226827
Step 300 Batch loss 0.44879404
Step 400 Batch loss 0.25740954
Step 500 Batch loss 0.2633376
Step 600 Batch loss 0.008374205
New data, epoch 1
Step 0 Batch loss 0.32695904
Step 100 Batch loss 0.002164044
Step 200 Batch loss 0.0018318464
Step 300 Batch loss 0.0013997598
Step 400 Batch loss 0.0011014303
Step 500 Batch loss 0.0009138199
Step 600 Batch loss 0.0007049917
New data, epoch 2
Step 0 Batch loss 0.507721
Step 100 Batch loss 0.00062423915
Step 200 Batch loss 0.00056607067
Step 300 Batch loss 0.0005106461
Step 400 Batch loss 0.00046286325
Step 500 Batch loss 0.00042856814
Step 600 Batch loss 0.00038027245
New data, epoch 3
Step 0 Batch loss 0.81967944
Step 100 Batch loss 0.00044142202
Step 200 Batch loss 0.00036821366
Step 300 Batch loss 0.00032491906
Step 400 Batch loss 0.00030739343
Step 500 Batch loss 0.00028551513
Step 600 Batch loss 0.00026742835
New data, epoch 4
Step 0 Batch loss 0.614921
Step 100 Batch loss 0.00034183403
Step 200 Batch loss 0.00029021865
Step 300 Batch loss 0.00023091346
Step 400 Batch loss 0.00024590114
Step 500 Batch loss 0.00021660494
Step 600 Batch loss 0.00021964003
```
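
As a quick sanity check on these numbers, the very first logged loss (0.69395506) is close to what a two-class classifier scores when it assigns probability 0.5 to each class, which is what we would expect before any training. The sketch below computes that chance-level cross-entropy; the variable names are illustrative.

```python
import math

num_classes = 2

# Cross-entropy of a uniform prediction over num_classes: -ln(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)

first_logged_loss = 0.69395506  # "Step 0" of epoch 0 in the output above

print(round(chance_loss, 4))  # 0.6931
print(abs(first_logged_loss - chance_loss) < 0.01)  # True
```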