[View in Colaboratory](https://colab.research.google.com/github/mathemakitten/keras-workshop/blob/master/Intro_Keras_Part1_Addition.ipynb)

# Intro to Keras - Teaching a Neural Network to Add 
 
This workshop is based on work created and shared by the Keras team at Google, and is used under the terms of The MIT License (MIT).

Source: https://github.com/keras-team/keras/tree/master/examples

In this section, we will use sequence-to-sequence learning to teach a recurrent neural network to add, without ever explicitly defining the addition function. For example, the input "535+61" should produce the output "596".

The input may optionally be reversed, which has been shown to improve performance in many tasks, as in the following papers:

"Learning to Execute": http://arxiv.org/abs/1410.4615
"Sequence to Sequence Learning with Neural Networks": http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

Two digits, reversed:
+ One-layer LSTM (128 hidden units), 5k training examples = 99% train/test accuracy in 55 epochs
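To make the reversal concrete, here is a minimal pure-Python sketch of how a question is padded to a fixed width and then reversed, mirroring what the data-generation cell below does. `format_query` is a hypothetical helper introduced only for illustration, and it assumes `DIGITS = 3`, so every query is padded to 7 characters.

```python
def format_query(a, b, digits=3, reverse=True):
    """Pad 'a+b' with trailing spaces to digits + 1 + digits chars, optionally reversed."""
    maxlen = digits + 1 + digits
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (maxlen - len(q))   # pad on the right to a fixed length
    return query[::-1] if reverse else query

print(repr(format_query(535, 61, reverse=False)))  # '535+61 '
print(repr(format_query(535, 61)))                 # ' 16+535'
```

Once reversed, the padding moves to the front and the digits of each operand appear least significant first.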

In [0]:
!pip install keras 

from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range



# Creating the addition dataset

The success of a machine learning model depends heavily on the quality of the data fed into it. We're going to generate 50k examples of addition to train our model.

In [0]:
class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """

    def __init__(self, chars):
        """Initialize character table.

        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode given string C.

        # Arguments
            C: String to encode.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the number of rows the same for every example.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

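    def decode(self, x, calc_argmax=True):
        """Decode a one-hot (or probability) matrix back to a string.

        # Arguments
            x: Array of shape (num_rows, len(chars)), or a 1D array of
                character indices if calc_argmax is False.
            calc_argmax: Whether to take the argmax over the last axis first.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)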

# Use this to identify whether examples were correct or not (red = wrong, green = right)
class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'
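To see the one-hot round trip without NumPy, here is a hedged pure-Python stand-in for `CharacterTable`. `ListCharTable` is a hypothetical name; it uses nested lists instead of arrays but the same sorted character ordering, so index 0 is the space character.

```python
class ListCharTable:
    """Pure-Python sketch of CharacterTable: one-hot encode/decode using lists."""

    def __init__(self, chars):
        self.chars = sorted(set(chars))                      # ' ', '+', '0'..'9'
        self.char_indices = {c: i for i, c in enumerate(self.chars)}

    def encode(self, s, num_rows):
        # One row per position; rows past the end of s stay all zero.
        rows = [[0] * len(self.chars) for _ in range(num_rows)]
        for i, c in enumerate(s):
            rows[i][self.char_indices[c]] = 1
        return rows

    def decode(self, rows):
        # argmax of each row (first maximum wins), mapped back to its character.
        return ''.join(self.chars[max(range(len(r)), key=r.__getitem__)] for r in rows)

table = ListCharTable('0123456789+ ')
enc = table.encode('12+5', 7)
print(repr(table.decode(enc)))   # '12+5   '
```

All-zero rows decode to index 0, the space. In the notebook this case does not arise, because every query and answer is already padded with spaces to its full length before encoding.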

# Set up training data parameters

You can adjust these parameters to generate more data or more complex addition examples. 

In [0]:
# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
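Both sequence lengths follow directly from `DIGITS`: a query is at most `DIGITS + 1 + DIGITS` characters, and the largest possible answer, `2 * (10**DIGITS - 1)` (e.g. 999 + 999), always has `DIGITS + 1` digits. A small check, with `sequence_lengths` as a hypothetical helper:

```python
def sequence_lengths(digits):
    """Return (query length, answer length) for operands of up to `digits` digits."""
    maxlen = digits + 1 + digits             # e.g. '345+678'
    largest_sum = 2 * (10 ** digits - 1)     # e.g. 999 + 999 = 1998
    answer_len = len(str(largest_sum))
    return maxlen, answer_len

for d in (1, 2, 3, 4):
    print(d, sequence_lengths(d))
# 1 (3, 2)
# 2 (5, 3)
# 3 (7, 4)
# 4 (9, 5)
```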

In [0]:
# This step actually generates the data 

print('Generating data...')
while len(questions) < TRAINING_SIZE:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, DIGITS + 1))))
    a, b = f(), f()
    
    # Skip any addition questions we've already seen.
    # Also skip swapped versions of a seen question (X+Y == Y+X), hence the sorting.
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    
    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)
    
    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    
    if REVERSE:
        # Reverse the query, e.g., '12+345  ' becomes '  543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print('Total addition questions:', len(questions))

print('Vectorization...')
# Use the builtin bool; the np.bool alias is unavailable in some NumPy versions.
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

Generating data...
Total addition questions: 50000
Vectorization...
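Because duplicate questions (including swapped operands) are skipped, the generation loop can only finish if at least `TRAINING_SIZE` distinct unordered pairs exist. A quick count, assuming operands range over 0 to `10**DIGITS - 1` as produced by `f()`; `distinct_questions` is a hypothetical helper:

```python
def distinct_questions(digits):
    """Number of distinct unordered pairs {a, b} with 0 <= a, b < 10**digits."""
    n = 10 ** digits
    return n * (n + 1) // 2   # pairs with a < b, plus the n pairs with a == b

for d in (1, 2, 3):
    print(d, distinct_questions(d))
# 1 55
# 2 5050
# 3 500500
```

With the default `DIGITS = 3` there are 500,500 possible questions, comfortably more than 50,000. With `DIGITS = 2`, however, only 5,050 exist, so the `while` loop would never reach `TRAINING_SIZE = 50000` and would run forever; lower `TRAINING_SIZE` if you reduce `DIGITS`.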


# Set up training and validation data

We're going to split our generated dataset into training and validation sets to assess model performance. In practice, a third held-out test set is often kept aside for the final performance estimate, while the validation set is used to tune the model.

In [0]:
# Explicitly set apart 10% for validation data that we never train over

split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

Training Data:
(45000, 7, 12)
(45000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)
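The notebook shuffles `x` and `y` with one shared index permutation before splitting, so each question stays paired with its answer and the validation set is not biased toward the larger numbers generated late in the loop. The same idea in plain Python, as a sketch using the standard `random` module rather than NumPy (`shuffle_and_split` and its toy data are hypothetical):

```python
import random

def shuffle_and_split(xs, ys, val_denominator=10, seed=0):
    """Shuffle paired lists in unison, then hold out the last 1/val_denominator."""
    assert len(xs) == len(ys)
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)          # one permutation for both lists
    xs = [xs[i] for i in indices]
    ys = [ys[i] for i in indices]
    split_at = len(xs) - len(xs) // val_denominator
    return (xs[:split_at], ys[:split_at]), (xs[split_at:], ys[split_at:])

questions = ['{}+{}'.format(i, i) for i in range(50)]
answers = [str(2 * i) for i in range(50)]
(x_tr, y_tr), (x_va, y_va) = shuffle_and_split(questions, answers)
print(len(x_tr), len(x_va))   # 45 5
```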


We're going to start by training a long short-term memory (LSTM) network, a type of recurrent neural network built from LSTM units. Here, one LSTM encodes the input question into a fixed-size vector, and a second LSTM decodes that vector into the answer.
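Each LSTM unit keeps a cell state that is updated through gates. The sketch below shows one step of a single scalar LSTM cell (hidden size 1) using the standard gate equations; the weights are hypothetical numbers, and the Keras `LSTM` layer applies the same kind of update with weight matrices over 128 units rather than this exact code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell.

    w maps each gate name ('i', 'f', 'o', 'g') to a (w_x, w_h, b) triple.
    Returns the new (hidden state, cell state).
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h + b

    i = sigmoid(pre('i'))     # input gate: how much new information to write
    f = sigmoid(pre('f'))     # forget gate: how much of the old cell state to keep
    o = sigmoid(pre('o'))     # output gate: how much of the cell state to expose
    g = math.tanh(pre('g'))   # candidate values to write
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Feed a short sequence through the cell; the final h acts as its "encoding".
weights = {'i': (0.5, 0.1, 0.0), 'f': (0.2, 0.3, 1.0),
           'o': (0.4, -0.2, 0.0), 'g': (1.0, 0.5, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, 0.0, -1.0]:
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```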

In [0]:
# Try replacing with GRU or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()

# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))

# As the decoder RNN's input, repeatedly provide the encoder's last output at
# each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))

# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as the TimeDistributed layer below expects
    # the dimension after the batch dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars))))
model.add(layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam', # more on optimizers: https://keras.io/optimizers/
              metrics=['accuracy'])
model.summary()

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_7 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_4 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_8 (LSTM)                (None, 4, 128)            131584    
_________________________________________________________________
time_distributed_4 (TimeDist (None, 4, 12)             1548      
_________________________________________________________________
activation_4 (Activation)    (None, 4, 12)             0         
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
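The parameter counts in the summary can be reproduced by hand. A Keras `LSTM` layer with the default `use_bias=True` has four gates, each with an input kernel, a recurrent kernel and a bias, giving `4 * (units * (input_dim + units) + units)` weights, while the `TimeDistributed` `Dense` layer shares one set of weights across all four output steps. A quick arithmetic check, assuming `HIDDEN_SIZE = 128` and the 12-character vocabulary (the helper names are hypothetical):

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    return input_dim * units + units   # weight matrix + bias

HIDDEN, N_CHARS = 128, 12
encoder = lstm_params(HIDDEN, N_CHARS)   # 72192
decoder = lstm_params(HIDDEN, HIDDEN)    # 131584
output = dense_params(N_CHARS, HIDDEN)   # 1548, shared across the 4 timesteps
print(encoder + decoder + output)        # 205324
```

`RepeatVector` and `Activation` only reshape or transform values, which is why they contribute 0 parameters.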


# Let's start training! 

We're now going to train the model for about 200 iterations of one epoch each, printing a few sample predictions from the validation set after every iteration.
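Inside the loop, each prediction is a (4, 12) array: one softmax distribution over the 12 characters for each output step. Turning it back into text means taking the most likely character at each step. A pure-Python sketch of that decoding, with hypothetical scores that favour the answer "596 ":

```python
import math

def softmax(logits):
    """Numerically stable softmax over one timestep's scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_probabilities(probs, chars):
    """Pick the most likely character at each output timestep and join them."""
    return ''.join(chars[max(range(len(p)), key=p.__getitem__)] for p in probs)

chars = sorted(set('0123456789+ '))   # same ordering as CharacterTable

# Hypothetical scores for the 4 output steps: a high score on the target character.
target = '596 '
logits = [[5.0 if ch == t else 0.0 for ch in chars] for t in target]
probs = [softmax(row) for row in logits]

print(repr(decode_probabilities(probs, chars)))   # '596 '
```

Because softmax preserves the ordering of scores, taking the argmax of the probabilities picks the same character as taking the argmax of the raw scores.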

In [0]:
# Train the model one epoch per iteration and show predictions against the validation dataset.
for iteration in range(1, 201):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))
    
    # For each iteration, select 10 samples from the validation set at random so we can visualize errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
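        # predict_classes is not available in recent Keras versions, so take
        # the argmax of the softmax output over the character axis instead.
        preds = np.argmax(model.predict(rowx, verbose=0), axis=-1)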
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)


--------------------------------------------------
Iteration 1
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 229+89  T 318  ☒ 101
Q 0+74    T 74   ☒ 13
Q 56+26   T 82   ☒ 137
Q 967+733 T 1700 ☒ 1107
Q 858+306 T 1164 ☒ 1107
Q 495+481 T 976  ☒ 101
Q 15+171  T 186  ☒ 101
Q 983+547 T 1530 ☒ 1107
Q 346+84  T 430  ☒ 137
Q 451+72  T 523  ☒ 131

--------------------------------------------------
Iteration 2
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 78+653  T 731  ☒ 806
Q 6+55    T 61   ☒ 16
Q 978+70  T 1048 ☒ 909
Q 68+325  T 393  ☒ 389
Q 388+91  T 479  ☒ 902
Q 924+15  T 939  ☒ 502
Q 112+55  T 167  ☒ 222
Q 46+364  T 410  ☒ 469
Q 46+652  T 698  ☒ 666
Q 858+306 T 1164 ☒ 102

--------------------------------------------------
Iteration 3
Train on 45000 samples, valida

Q 726+15  T 741  ☑ 741
Q 129+25  T 154  ☑ 154
Q 929+525 T 1454 ☑ 1454
Q 83+260  T 343  ☑ 343
Q 11+745  T 756  ☑ 756
Q 5+40    T 45   ☑ 45
Q 976+931 T 1907 ☒ 1817
Q 86+74   T 160  ☑ 160
Q 61+159  T 220  ☑ 220
Q 73+603  T 676  ☑ 676

--------------------------------------------------
Iteration 16
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 751+0   T 751  ☑ 751
Q 866+535 T 1401 ☑ 1401
Q 537+671 T 1208 ☑ 1208
Q 33+32   T 65   ☑ 65
Q 341+5   T 346  ☑ 346
Q 347+302 T 649  ☑ 649
Q 337+62  T 399  ☑ 399
Q 8+506   T 514  ☑ 514
Q 384+361 T 745  ☑ 745
Q 344+758 T 1102 ☑ 1102

--------------------------------------------------
Iteration 17
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 246+379 T 625  ☑ 625
Q 0+714   T 714  ☑ 714
Q 896+65  T 961  ☑

Q 12+725  T 737  ☑ 737
Q 440+27  T 467  ☑ 467
Q 227+466 T 693  ☑ 693
Q 223+569 T 792  ☑ 792
Q 58+134  T 192  ☑ 192
Q 634+88  T 722  ☑ 722
Q 46+652  T 698  ☑ 698
Q 114+855 T 969  ☑ 969
Q 14+480  T 494  ☑ 494
Q 646+205 T 851  ☑ 851

--------------------------------------------------
Iteration 30
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 732+3   T 735  ☑ 735
Q 707+861 T 1568 ☑ 1568
Q 297+60  T 357  ☑ 357
Q 505+537 T 1042 ☑ 1042
Q 40+501  T 541  ☑ 541
Q 921+99  T 1020 ☑ 1020
Q 40+547  T 587  ☑ 587
Q 670+675 T 1345 ☑ 1345
Q 81+70   T 151  ☑ 151
Q 84+305  T 389  ☑ 389

--------------------------------------------------
Iteration 31
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 15+80   T 95   ☑ 95
Q 9+795   T 804  ☑ 804
Q 12+25   T 37   ☑

Q 669+434 T 1103 ☑ 1103
Q 750+67  T 817  ☑ 817
Q 549+478 T 1027 ☒ 1037
Q 684+7   T 691  ☑ 691
Q 149+300 T 449  ☑ 449
Q 467+12  T 479  ☑ 479
Q 891+41  T 932  ☑ 932
Q 499+94  T 593  ☑ 593
Q 6+21    T 27   ☒ 26
Q 23+274  T 297  ☑ 297

--------------------------------------------------
Iteration 44
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 28+150  T 178  ☑ 178
Q 82+561  T 643  ☑ 643
Q 125+29  T 154  ☑ 154
Q 201+17  T 218  ☑ 218
Q 26+44   T 70   ☑ 70
Q 229+42  T 271  ☑ 271
Q 15+42   T 57   ☑ 57
Q 11+408  T 419  ☑ 419
Q 887+91  T 978  ☑ 978
Q 127+702 T 829  ☑ 829

--------------------------------------------------
Iteration 45
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 88+20   T 108  ☑ 108
Q 382+25  T 407  ☑ 407
Q 570+289 T 859  ☑