## Learning to count sequences of ones with RNNs

Inspired by http://monik.in/a-noobs-guide-to-implementing-rnn-lstm-using-tensorflow/

In [1]:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()

tf.__version__

'0.12.1'

In [2]:
n_seqs = 100000
seq_length = 5

training_seqs = np.random.randint(2, size=(n_seqs, seq_length, 1)).astype(np.float64)
targets = training_seqs.sum(axis=1, dtype=np.float64)

print(training_seqs[:3, :, 0])

print(targets[:3])

[[ 1.  1.  1.  0.  1.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  1.]]
[[ 4.]
 [ 1.]
 [ 2.]]


In [3]:
BATCH_SIZE = None

sequence_input = tf.placeholder(np.float64, [BATCH_SIZE, seq_length, 1])
target_input = tf.placeholder(np.float64, [BATCH_SIZE, 1])

## RNN model in TF
As explained in http://monik.in/a-noobs-guide-to-implementing-rnn-lstm-using-tensorflow/:

 * We first define a basic [RNNCell](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/RNNCell) architecture (here an `LSTMCell`) with a given number of hidden units.
 * The function [dynamic_rnn](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) then unrolls the recursion and returns the hidden states $h_t$ for all time steps given an input sequence $x$.
 
From http://colah.github.io/posts/2015-08-Understanding-LSTMs :

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)
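To make the unrolling concrete, here is a minimal NumPy sketch of the recursion for a plain tanh RNN. It is deliberately simpler than the `LSTMCell` used in this notebook and is not what `dynamic_rnn` does internally. The function name `unroll_simple_rnn` and the weights `W_x`, `W_h`, `b` are hypothetical and only serve the illustration.

```python
import numpy as np

def unroll_simple_rnn(x, W_x, W_h, b):
    """Return the hidden states of a plain tanh RNN for all time steps.

    x has shape (batch, time, features); the result has shape
    (batch, time, hidden), the same layout as the output of dynamic_rnn.
    """
    batch, time, _ = x.shape
    hidden = W_h.shape[0]
    h = np.zeros((batch, hidden))  # assumed initial state h_0 = 0
    states = []
    for t in range(time):
        # h_t = tanh(x_t W_x + h_{t-1} W_h + b)
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states, axis=1)

rng = np.random.RandomState(0)
num_hidden = 10
x = rng.randint(2, size=(1, 5, 1)).astype(np.float64)
W_x = rng.normal(scale=0.1, size=(1, num_hidden))          # hypothetical input weights
W_h = rng.normal(scale=0.1, size=(num_hidden, num_hidden))  # hypothetical recurrent weights
b = np.zeros(num_hidden)

hidden_states = unroll_simple_rnn(x, W_x, W_h, b)
print(hidden_states.shape)  # (1, 5, 10)
```

The same weights are reused at every time step; only the state $h_t$ changes as the sequence is consumed.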

In [4]:
num_hidden = 10
# In TensorFlow 1.0, use instead: from tensorflow.contrib.rnn import LSTMCell
cell = tf.nn.rnn_cell.LSTMCell(num_units=num_hidden)

sequence_hidden_states, _ = tf.nn.dynamic_rnn(cell, inputs=sequence_input, dtype=np.float64, scope='unrolled_cells')

tf.global_variables_initializer().run()

print(training_seqs[:1, :, 0])

print(np.round(sequence_hidden_states.eval(feed_dict={sequence_input: training_seqs[:1]}), 2))

[[ 1.  1.  1.  0.  1.]]
[[[-0.12 -0.   -0.11 -0.08 -0.06 -0.04 -0.01 -0.01 -0.1  -0.  ]
  [-0.21 -0.   -0.16 -0.13 -0.1  -0.09 -0.03  0.01 -0.15 -0.  ]
  [-0.27 -0.01 -0.18 -0.17 -0.14 -0.12 -0.04  0.03 -0.19 -0.01]
  [-0.21 -0.03 -0.09 -0.15 -0.12 -0.16 -0.04  0.1  -0.13 -0.01]
  [-0.31 -0.04 -0.15 -0.14 -0.13 -0.12 -0.03  0.1  -0.16 -0.03]]]


## Predicting the sum at the end of the sequence

We are going to fit the inner weights of the RNN cell (responsible for computing $h_t$) as well as the fully connected weights $W$ of size $H \times 1$, where $H$ is the number of hidden units:
$$ y = \sum_{t=1}^{T} x_t \approx h_T W$$
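As a shape-only illustration of the readout $h_T W$, the following NumPy sketch uses random stand-in values (not the trained network) for the hidden states and for $W$. It also shows that transposing to time-major order and indexing the last step gives the same result as slicing the last time step directly.

```python
import numpy as np

rng = np.random.RandomState(0)
batch, T, H = 3, 5, 10

# Stand-in for the output of dynamic_rnn: shape (batch, time, hidden)
hidden_states = rng.normal(size=(batch, T, H))
W = rng.normal(size=(H, 1))  # stand-in for the fully connected weights

# Transpose to (time, batch, hidden), then pick the last time step
h_T_gather = np.transpose(hidden_states, [1, 0, 2])[T - 1]
# Equivalent: slice the last time step directly
h_T_slice = hidden_states[:, -1, :]

y_hat = h_T_slice @ W  # one scalar prediction per sequence
print(h_T_slice.shape, y_hat.shape)  # (3, 10) (3, 1)
```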

In [5]:
last_hidden_state = tf.gather(tf.transpose(sequence_hidden_states, [1, 0, 2]), seq_length - 1)

np.round(last_hidden_state.eval(feed_dict={sequence_input: training_seqs[:1]}) , 2).tolist()

[[-0.31, -0.04, -0.15, -0.14, -0.13, -0.12, -0.03, 0.1, -0.16, -0.03]]

In [6]:
W = tf.Variable(tf.random_normal([num_hidden, 1], dtype=np.float64))
tf.global_variables_initializer().run()

prediction = tf.matmul(last_hidden_state, W)

prediction.eval(feed_dict={sequence_input: training_seqs[:1]})

array([[-0.3097567]])

## Training

We minimise the mean squared error loss, i.e. the average of $\left(\sum_{t=1}^{T} x_t - h_T W\right)^2$ over the sequences of our training data set.
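As a small worked example of this loss, the NumPy sketch below uses the targets of the first three training sequences printed earlier. The `predictions` array is hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

targets = np.array([[4.], [1.], [2.]])         # sums of the first three sequences
predictions = np.array([[3.5], [1.5], [2.0]])  # hypothetical model outputs

squared_errors = (predictions - targets) ** 2  # 0.25, 0.25, 0.0
mse = squared_errors.mean()                    # (0.25 + 0.25 + 0.0) / 3
print(mse)
```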

In [7]:
mse_loss = tf.reduce_mean(tf.squared_difference(prediction, target_input))
train_step = tf.train.AdamOptimizer().minimize(mse_loss)

tf.global_variables_initializer().run()

mse_loss.eval(feed_dict={sequence_input: training_seqs[:2], target_input: targets[:2]})

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


7.9947950011152678

In [28]:
BATCH_SIZE = 2048
N_BATCHES = 10

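# Each iteration trains on one random mini-batch, so the 'Epoch' label printed
# below is really a batch counter; rerun this cell several times to train further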
for i in range(N_BATCHES):
    batch_indexes = np.random.choice(n_seqs, BATCH_SIZE)
    batch_loss, _ = sess.run(
        [mse_loss, train_step],
        {sequence_input: training_seqs[batch_indexes], target_input: targets[batch_indexes]})
    
    print('Epoch {:2d} error {:3.1f}'.format(i + 1, batch_loss))

Epoch  1 error 0.3
Epoch  2 error 0.2
Epoch  3 error 0.3
Epoch  4 error 0.2
Epoch  5 error 0.2
Epoch  6 error 0.2
Epoch  7 error 0.2
Epoch  8 error 0.2
Epoch  9 error 0.2
Epoch 10 error 0.2
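Note that the loop above draws each batch with `np.random.choice`, which samples with replacement, so ten iterations visit at most 20480 of the 100000 sequences. A possible alternative, sketched here under the assumption that full passes over the data are wanted, is to shuffle the indexes once per epoch and walk through them in order. The helper name `iterate_minibatches` is hypothetical.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover every sample exactly once, in random order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.RandomState(0)
n_seqs, batch_size = 100000, 2048
batches = list(iterate_minibatches(n_seqs, batch_size, rng))
print(len(batches))  # 49 batches make up one epoch (the last one is smaller)
```

Each yielded index array could be used in place of `batch_indexes` when building the feed dictionary.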


## Inspecting predictions for any sequence

It's neither very good nor impressive! Turning the prediction into a multi-class problem may help (as done in the original blog post).
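For the multi-class variant, the targets would need to become class labels: with sequences of length 5 there are 6 possible counts (0 to 5). The NumPy sketch below shows one way to one-hot encode them. It covers only the target encoding and is an assumption about how the reformulation could look, not the blog post's code. The network would then output one score per class, typically trained with a softmax cross-entropy loss.

```python
import numpy as np

seq_length = 5
n_classes = seq_length + 1  # possible counts: 0, 1, ..., 5

targets = np.array([[4.], [1.], [2.]])  # sums of the first three sequences
class_ids = targets.astype(np.int64).ravel()
one_hot = np.eye(n_classes)[class_ids]  # shape (3, 6), one row per sequence
print(one_hot)

# A predicted count is recovered as the index of the largest class score
recovered = one_hot.argmax(axis=1)
```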

The gist is that we are able to map a sequence of variable length into a fixed-size representation that we can use as an input feature for a supervised task.


In [32]:
prediction.eval(feed_dict={sequence_input: np.array([1, 0, 0, 0, 1]).reshape((1, -1, 1))})

array([[ 2.25796719]])