In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

Using TensorFlow backend.


## Learning to add numbers with a recurrent neural network

Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) capable of learning the relationships between the elements of an input sequence. A good demonstration is to have an LSTM read a sequence of terms joined by a mathematical operation, such as a sum, and output the result of the calculation. This is an instance of sequence-to-sequence learning, as in http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf, applied to addition: in effect, we teach the machine to add numbers.

Let us see how this works. First, we create a Python class that maps characters (0, 1, 2, ...) to one-hot-encoded categories and performs the reverse operation. This will save us a lot of time.

In [2]:
class CharacterTable(object):
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One hot encode given string C.
        # Arguments
            num_rows: Number of rows in the returned one hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[x] for x in x)

    
# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

Let us see how it works, for instance on '12+89'.

In [3]:
digits = 2
# Maximum length of input is 'int + int' (e.g., '12+89'). Maximum length of int is digits.
max_len = digits + 1 + digits
coded = ctable.encode('12+89', max_len)
print(coded)

[[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]


In [4]:
print(ctable.decode(coded))

a = ctable.decode(coded)
for i in range(max_len):
    print(i, " ", a[i])

12+89
0   1
1   2
2   +
3   8
4   9


Now that this is done, we create a training set with many questions and their answers.

Note that a trick (proposed in http://arxiv.org/abs/1410.4615) is to reverse the order of the characters in the questions, which was shown to improve performance. You may try with and without!
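
As a small illustration of the padding and reversal just described, the sketch below pads one question to a fixed width and then reverses it. The helper name `prepare_query` is hypothetical and not part of this notebook; it only mirrors the two string operations used in the data-generation cell.

```python
def prepare_query(a, b, digits=2, invert=True):
    """Pad 'a+b' with spaces to a fixed width, then optionally reverse it."""
    max_len = digits + 1 + digits
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (max_len - len(q))
    return query[::-1] if invert else query

print(repr(prepare_query(7, 12)))                # ' 21+7'
print(repr(prepare_query(7, 12, invert=False)))  # '7+12 '
```

With reversal, the padding ends up at the start of the sequence and the first operand ends up at the end of it.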

In [5]:
# Parameters for the model and dataset.
training_size = 5000
invert = True 

questions = []
expected = []
seen = set()

print('Generating data...')
while len(questions) < training_size:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, digits + 1))))
    a, b = f(), f()
    
    # Skip any addition questions we've already seen
    # Also skip any such that x+Y == Y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    
    # Pad the data with spaces such that it is always max_len.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (max_len - len(q))
    ans = str(a + b)
    
    # Answers can be of maximum size digits + 1.
    ans += ' ' * (digits + 1 - len(ans))
    if invert:
        # As recommended in http://arxiv.org/abs/1410.4615,
        # reverse the query, e.g., '12+345  ' becomes '  543+21'. (Note the space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
    
print('Total addition questions:', len(questions))
print('Examples :')
for i in range(0,10):
    q = questions[i]
    if invert:
        # reverse the query for printing
        q = q[::-1]
    print(q, " ", expected[i])

Generating data...
Total addition questions: 5000
Examples :
17+4    21 
93+6    99 
71+7    78 
89+42   131
73+9    82 
6+11    17 
98+96   194
73+7    80 
6+3     9  
3+61    64 
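
The deduplication key also bounds how many distinct questions can exist. A quick count, assuming the operands range over 0 to 99 (which is what the generator above produces for `digits = 2`), suggests that 5000 questions cover almost every unordered pair, so the validation set is held out from nearly the whole problem space rather than from a small sample of it.

```python
from itertools import combinations_with_replacement

digits = 2
values = range(10 ** digits)  # operands 0..99; a string such as '07' parses to 7

# Each key is a sorted pair (a, b) with a <= b, so we count unordered pairs
# in which both operands may be equal.
n_keys = sum(1 for _ in combinations_with_replacement(values, 2))

print(n_keys)         # 5050
print(5000 / n_keys)  # about 0.99
```

One consequence is that the generation loop slows down toward the end, since most random draws are already in `seen`.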


Now we rewrite these questions and answers as one-hot encodings using our `ctable` object.

In [6]:
print('translation from characters to one-hot encoded question and answers...')
# Use the built-in bool: the np.bool alias was deprecated and later removed from NumPy.
x = np.zeros((len(questions), max_len, len(chars)), dtype=bool)
y = np.zeros((len(questions), digits + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, max_len)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, digits + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

translation from characters to one-hot encoded question and answers...
Training Data:
(4500, 5, 12)
(4500, 3, 12)
Validation Data:
(500, 5, 12)
(500, 3, 12)
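
A cheap sanity check on such tensors is that every time step activates exactly one category, padding included. The sketch below is self-contained and re-implements a minimal `encode` function for illustration; it assumes the same 12-character vocabulary as above but is not the notebook's `CharacterTable`.

```python
import numpy as np

chars = sorted(set('0123456789+ '))
char_indices = {c: i for i, c in enumerate(chars)}

def encode(text, num_rows):
    """One-hot encode `text` into a (num_rows, len(chars)) boolean array."""
    x = np.zeros((num_rows, len(chars)), dtype=bool)
    for i, c in enumerate(text):
        x[i, char_indices[c]] = True
    return x

# Three already-padded questions of length 5.
batch = np.stack([encode(q, 5) for q in ['12+89', '7+3  ', '40+6 ']])

print(batch.shape)                      # (3, 5, 12)
print((batch.sum(axis=-1) == 1).all())  # True
```

If a string were shorter than `num_rows`, the trailing rows would be all zeros and the check would fail, which is why the questions are padded with spaces first.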


Now, the first example reads:

In [7]:
# Index the first sample only; print both arrays so that each one is displayed.
print(x_train[0, :, :])
print(y_train[0, :, :])

Let us now see how to create a recurrent neural network. You can try adding more layers, or using a simple RNN instead of an LSTM.

It is useful to think of our network as an encoder and a decoder. We first use an LSTM *encoder* to turn each input sequence into a single vector that contains information about the entire sequence. Then we repeat this vector $n$ times (where $n$ is the number of timesteps in the output sequence, in our case `digits + 1`). Finally, we run an LSTM *decoder* to turn this constant sequence into the target sequence.

First we create the encoder part: we *encode* the input sequence using an RNN, producing an output vector of size `hidden_size`. This output is the last hidden state of the RNN.

In [8]:
hidden_size = 128

model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, len(chars))))

Now we create the decoder part. As the decoder RNN's input, we provide the encoder's last hidden state at every time step. This means that we repeat it `digits + 1` times, as that is the maximum length of the output; e.g., when `digits = 3`, the largest sum is 999+999 = 1998, which has four characters. We leave open the possibility of stacking more than one layer. Note that by setting `return_sequences` to `True`, we return not only the last output but the output at every time step, in the form `(num_samples, timesteps, output_dim)`. This is necessary because `TimeDistributed` below expects the first dimension after the batch dimension to be the number of time steps.

In [9]:
layers = 1
model.add(RepeatVector(digits + 1))
for _ in range(layers):
    model.add(LSTM(hidden_size, return_sequences=True))
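
To make the repetition step concrete, here is a NumPy sketch of what `RepeatVector` does to the shapes. It is only an illustration under assumed sizes: `encoded` is a random stand-in for the encoder output, and `out_steps` plays the role of `digits + 1`.

```python
import numpy as np

num_samples, hidden_size, out_steps = 4, 128, 3

# Stand-in for the encoder output: one vector per sample.
encoded = np.random.rand(num_samples, hidden_size)

# Insert a time axis and copy the vector along it.
repeated = np.repeat(encoded[:, np.newaxis, :], out_steps, axis=1)

print(repeated.shape)                                  # (4, 3, 128)
print(np.array_equal(repeated[:, 0], repeated[:, 2]))  # True
```

The decoder therefore sees the same input at every step; only its own recurrent state lets it emit different characters at different positions.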

Finally, we apply a dense layer to every temporal slice of the input. For each step of the output sequence, it decides which character should be chosen.

In [10]:
model.add(TimeDistributed(Dense(len(chars))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
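
The effect of the time-distributed dense layer followed by a softmax can be sketched in NumPy as one shared weight matrix applied to every time step. The arrays below are random stand-ins with assumed sizes, not the trained weights of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, out_steps, hidden_size, num_chars = 4, 3, 128, 12

h = rng.normal(size=(num_samples, out_steps, hidden_size))  # stand-in decoder outputs
W = rng.normal(size=(hidden_size, num_chars)) * 0.01        # one matrix shared by all steps
b = np.zeros(num_chars)

logits = h @ W + b                             # matmul broadcasts over samples and steps
logits -= logits.max(axis=-1, keepdims=True)   # stabilise the exponentials
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print(probs.shape)                             # (4, 3, 12)
print(np.allclose(probs.sum(axis=-1), 1.0))    # True
```

Each of the three output positions thus gets its own probability distribution over the 12 characters, and the predicted character is the argmax of that distribution.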

Now we can look at the model that we have created:

In [11]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 3, 128)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 3, 128)            131584    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 3, 12)             1548      
_________________________________________________________________
activation_1 (Activation)    (None, 3, 12)             0         
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
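
The parameter counts in this summary can be checked by hand. The sketch below uses the standard LSTM count (four weight blocks: input, forget and output gates plus the cell candidate, each with input weights, recurrent weights and a bias); the helper names `lstm_params` and `dense_params` are introduced here for illustration only.

```python
def lstm_params(input_dim, units):
    # Four blocks, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

encoder = lstm_params(12, 128)    # 72192
decoder = lstm_params(128, 128)   # 131584
readout = dense_params(128, 12)   # 1548

print(encoder + decoder + readout)  # 205324
```

`RepeatVector` and `Activation` have no weights, and the `TimeDistributed` wrapper shares one dense layer across all time steps, so its count does not depend on the output length.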


What is left to do is to train the neural network. We train the model one epoch at a time, and after each epoch we show its predictions on the same $10$ samples, drawn once at random from the validation set, so we can visualize errors.

In [12]:
batch_size = 128
problems = np.random.randint(0, len(x_val), 10)

for iteration in range(100):
    print('\nIteration', iteration)
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=1,
              validation_data=(x_val, y_val))
   
    # Show output of ten problems chosen at random
    for i in problems:
        question = ctable.decode(x_val[i])
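        # predict_classes was removed from recent Keras releases; take the argmax instead.
        pred = model.predict(x_val[[i]], verbose=0).argmax(axis=-1)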
        correct = ctable.decode(y_val[i])
        guess = ctable.decode(pred[0], calc_argmax=False)
        
        print('%s = %s' % (question[::-1] if invert else question, guess), end=' ')
        if correct == guess:
            print('-- ☑')
        else:
            print('-- ☒ %s' % (correct))
        
    # Count the fraction of good predictions
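    # predict_classes was removed from recent Keras releases; take the argmax instead.
    preds = model.predict(x_val, verbose=0).argmax(axis=-1)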
    correct = np.array([ctable.decode(y_val[i]) for i in range(len(y_val))])   
    guess = np.array([ctable.decode(preds[i], calc_argmax=False) for i in range(len(y_val))])    
    print("Fraction well predicted: ", np.mean(correct == guess))


Iteration 0
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 1   -- ☒ 83 
68+16 = 11  -- ☒ 84 
45+79 = 11  -- ☒ 124
28+41 = 1   -- ☒ 69 
2+99  = 11  -- ☒ 101
28+34 = 11  -- ☒ 62 
2+64  = 1   -- ☒ 66 
36+63 = 11  -- ☒ 99 
56+84 = 11  -- ☒ 140
92+44 = 11  -- ☒ 136
Fraction well predicted:  0.002

Iteration 1
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 11  -- ☒ 83 
68+16 = 11  -- ☒ 84 
45+79 = 116 -- ☒ 124
28+41 = 11  -- ☒ 69 
2+99  = 16  -- ☒ 101
28+34 = 11  -- ☒ 62 
2+64  = 16  -- ☒ 66 
36+63 = 11  -- ☒ 99 
56+84 = 116 -- ☒ 140
92+44 = 11  -- ☒ 136
Fraction well predicted:  0.008

Iteration 2
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 12  -- ☒ 83 
68+16 = 122 -- ☒ 84 
45+79 = 111 -- ☒ 124
28+41 = 12  -- ☒ 69 
2+99  = 12  -- ☒ 101
28+34 = 12  -- ☒ 62 
2+64  = 12  -- ☒ 66 
36+63 = 12  -- ☒ 99 
56+84 = 122 -- ☒ 140
92+44 = 122 -- ☒ 136
Fraction well predicted:  0.008

Iteration 3
Train on 4500 samples, validate on 500 samples
Epoch 1/1
[...]

Fraction well predicted:  0.042

Iteration 19
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 87  -- ☒ 83 
68+16 = 97  -- ☒ 84 
45+79 = 122 -- ☒ 124
28+41 = 77  -- ☒ 69 
2+99  = 57  -- ☒ 101
28+34 = 77  -- ☒ 62 
2+64  = 37  -- ☒ 66 
36+63 = 101 -- ☒ 99 
56+84 = 147 -- ☒ 140
92+44 = 137 -- ☒ 136
Fraction well predicted:  0.048

Iteration 20
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 74  -- ☒ 83 
68+16 = 87  -- ☒ 84 
45+79 = 122 -- ☒ 124
28+41 = 72  -- ☒ 69 
2+99  = 59  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 32  -- ☒ 66 
36+63 = 90  -- ☒ 99 
56+84 = 132 -- ☒ 140
92+44 = 122 -- ☒ 136
Fraction well predicted:  0.032

Iteration 21
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 77  -- ☒ 83 
68+16 = 87  -- ☒ 84 
45+79 = 122 -- ☒ 124
28+41 = 77  -- ☒ 69 
2+99  = 59  -- ☒ 101
28+34 = 67  -- ☒ 62 
2+64  = 30  -- ☒ 66 
36+63 = 90  -- ☒ 99 
56+84 = 132 -- ☒ 140
92+44 = 133 -- ☒ 136
Fraction well predicted:  0.036

Iteration 22
Train on 4500 samples, validate on 500 samples
[...]

Fraction well predicted:  0.116

Iteration 38
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 82  -- ☒ 83 
68+16 = 82  -- ☒ 84 
45+79 = 123 -- ☒ 124
28+41 = 70  -- ☒ 69 
2+99  = 55  -- ☒ 101
28+34 = 60  -- ☒ 62 
2+64  = 36  -- ☒ 66 
36+63 = 100 -- ☒ 99 
56+84 = 143 -- ☒ 140
92+44 = 133 -- ☒ 136
Fraction well predicted:  0.124

Iteration 39
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 82  -- ☒ 83 
68+16 = 82  -- ☒ 84 
45+79 = 122 -- ☒ 124
28+41 = 70  -- ☒ 69 
2+99  = 78  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 38  -- ☒ 66 
36+63 = 90  -- ☒ 99 
56+84 = 140 -- ☑
92+44 = 132 -- ☒ 136
Fraction well predicted:  0.12

Iteration 40
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 82  -- ☒ 83 
68+16 = 82  -- ☒ 84 
45+79 = 129 -- ☒ 124
28+41 = 70  -- ☒ 69 
2+99  = 58  -- ☒ 101
28+34 = 60  -- ☒ 62 
2+64  = 38  -- ☒ 66 
36+63 = 90  -- ☒ 99 
56+84 = 140 -- ☑
92+44 = 134 -- ☒ 136
Fraction well predicted:  0.108

Iteration 41
Train on 4500 samples, validate on 500 samples
[...]

14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 99  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 39  -- ☒ 66 
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.68

Iteration 58
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 73  -- ☒ 83 
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 99  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 39  -- ☒ 66 
36+63 = 99  -- ☑
56+84 = 130 -- ☒ 140
92+44 = 136 -- ☑
Fraction well predicted:  0.64

Iteration 59
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 90  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 37  -- ☒ 66 
36+63 = 900 -- ☒ 99 
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.682

Iteration 60
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 90  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 37  -- ☒ 66 
[...]

14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 900 -- ☒ 101
28+34 = 62  -- ☑
2+64  = 47  -- ☒ 66 
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.866

Iteration 78
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 69  -- ☑
2+99  = 90  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 47  -- ☒ 66 
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.89

Iteration 79
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 69  -- ☑
2+99  = 900 -- ☒ 101
28+34 = 62  -- ☑
2+64  = 57  -- ☒ 66 
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.87

Iteration 80
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 79  -- ☒ 69 
2+99  = 90  -- ☒ 101
28+34 = 62  -- ☑
2+64  = 57  -- ☒ 66 
36+63 = 99  -- ☑
[...]

36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.92

Iteration 98
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 69  -- ☑
2+99  = 900 -- ☒ 101
28+34 = 62  -- ☑
2+64  = 66  -- ☑
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.924

Iteration 99
Train on 4500 samples, validate on 500 samples
Epoch 1/1
14+69 = 83  -- ☑
68+16 = 84  -- ☑
45+79 = 124 -- ☑
28+41 = 69  -- ☑
2+99  = 900 -- ☒ 101
28+34 = 62  -- ☑
2+64  = 66  -- ☑
36+63 = 99  -- ☑
56+84 = 140 -- ☑
92+44 = 136 -- ☑
Fraction well predicted:  0.938


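After 100 epochs, the network answers about 94% of the validation questions exactly right (0.938 in the last iteration), having hovered around 90% over the preceding epochs.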
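
Note that "Fraction well predicted" counts an answer as correct only when all of its characters match. As far as we understand, the `accuracy` metric requested in `model.compile` is computed per character, so it is expected to be higher. The toy arrays below are made-up class indices, used only to show the difference between the two measures.

```python
import numpy as np

# Made-up class indices for 4 answers of 3 characters each.
truth = np.array([[3, 4, 0], [5, 5, 0], [3, 2, 6], [9, 0, 0]])
preds = np.array([[3, 4, 0], [5, 6, 0], [3, 2, 6], [9, 0, 0]])

per_char = np.mean(truth == preds)              # share of matching characters
exact = np.mean((truth == preds).all(axis=-1))  # share of fully correct answers

print(per_char)  # 11 of 12 characters match
print(exact)     # 0.75
```

A single wrong digit costs one twelfth of the per-character score here, but a full quarter of the exact-match score.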