## 1.0 Mathematical Addition using RNN
This is a sequence-to-sequence learning problem in which an RNN learns to add two numbers given as a string, for example `'57+564'`. It is a simple case: unlike most language-modelling or sequence problems, the input and output lengths are fixed. Every question is padded with space characters to `MAXLEN = 2*DIGITS + 1` characters, and every answer is padded to `DIGITS + 1` characters. This notebook is based on the [official Keras example](https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py).
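
The padding rule can be sketched in isolation. This is only an illustration: the helper names `pad_question` and `pad_answer` are hypothetical and are not used elsewhere in the notebook, and `str.ljust` stands in for the manual space arithmetic used in the dataset cell.

```python
DIGITS = 3
MAXLEN = 2 * DIGITS + 1  # two operands of up to DIGITS digits plus the '+' sign


def pad_question(a, b):
    """Format 'a+b' and right-pad it with spaces to MAXLEN characters."""
    return '{}+{}'.format(a, b).ljust(MAXLEN)


def pad_answer(a, b):
    """Format the sum and right-pad it to DIGITS + 1 characters."""
    return str(a + b).ljust(DIGITS + 1)


print(repr(pad_question(57, 564)))  # '57+564 '
print(repr(pad_answer(57, 564)))    # '621 '
```

The answer width is `DIGITS + 1` because the largest possible sum, `999 + 999 = 1998`, has one more digit than either operand.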

## 2.0 Generate Dataset

In [1]:
from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range

Using TensorFlow backend.


In [2]:
DIGITS = 3
TRAINING_SIZE = 50000
MAXLEN = 2*DIGITS + 1

In [3]:
questions = []
answers = []

In [6]:
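seen = set()

def random_number():
    """Return a random integer with 1 to DIGITS digits."""
    num_digits = np.random.randint(1, DIGITS + 1)
    return int(''.join(np.random.choice(list('0123456789')) for _ in range(num_digits)))

while len(questions) < TRAINING_SIZE:
    a, b = random_number(), random_number()

    # Skip duplicates; a+b and b+a count as the same question.
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)

    # Pad the question with spaces up to MAXLEN.
    q = '{}+{}'.format(a, b)
    question = q + ' ' * (MAXLEN - len(q))

    # Pad the answer up to DIGITS + 1, the length of the longest possible sum.
    ans = str(a + b)
    answer = ans + ' ' * (DIGITS + 1 - len(ans))

    questions.append(question)
    answers.append(answer)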
    

In [11]:
questions[20:25]

['1+7    ', '361+3  ', '57+564 ', '87+6   ', '140+538']

In [12]:
answers[20:25]

['8   ', '364 ', '621 ', '93  ', '678 ']

In [13]:
print("Total number of questions/dataset: ", len(questions))

Total number of questions/dataset:  50000


## 3.0 Vectorization

In [15]:
chars = '0123456789+ '

In [16]:
# `np.bool` was a deprecated alias and is gone from recent NumPy; the builtin `bool` is equivalent.
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)

In [19]:
x[0]

array([[False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False]])

In [20]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

In [21]:
char_indices

{'0': 0,
 '1': 1,
 '2': 2,
 '3': 3,
 '4': 4,
 '5': 5,
 '6': 6,
 '7': 7,
 '8': 8,
 '9': 9,
 '+': 10,
 ' ': 11}

In [24]:
def encode(C, num_rows):
    """One-hot encode given string C.
    # Arguments
        C: string, to be encoded.
        num_rows: Number of rows in the returned one-hot encoding. This is
            used to keep the # of rows for each data the same.
    """
    x = np.zeros((num_rows, len(chars)))
    for i, c in enumerate(C):
        x[i, char_indices[c]] = 1
    return x

In [25]:
print(encode(questions[0], MAXLEN))

[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
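
The matrix printed above can be read back row by row: the position of the single `1` in each row is the index of that character in `chars`. The sketch below is a self-contained illustration of that round trip. The name `one_hot` is hypothetical and simply mirrors `encode`, and the example string `'831+0  '` is what the printed rows spell out.

```python
import numpy as np

chars = '0123456789+ '
char_indices = {c: i for i, c in enumerate(chars)}


def one_hot(text, num_rows):
    """Same scheme as `encode`: one row per character, a single 1 per row."""
    x = np.zeros((num_rows, len(chars)))
    for i, c in enumerate(text):
        x[i, char_indices[c]] = 1
    return x


matrix = one_hot('831+0  ', 7)
positions = matrix.argmax(axis=-1)          # column of the 1 in each row
recovered = ''.join(chars[i] for i in positions)

print(positions.tolist())  # [8, 3, 1, 10, 0, 11, 11]
print(repr(recovered))     # '831+0  '
```

Note that a row of all zeros would also give `argmax == 0` and decode to `'0'`, which is one reason every string is padded to the full number of rows before encoding.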


In [26]:
for i, question in enumerate(questions):
    x[i] = encode(question, MAXLEN)

In [27]:
for i, answer in enumerate(answers):
    y[i] = encode(answer, DIGITS + 1)

## 4.0 Shuffle Dataset

In [29]:
indices = np.arange(len(questions))

In [31]:
np.random.shuffle(indices)

In [33]:
x = x[indices]
y = y[indices]

## 5.0 Train Test Split

In [40]:
split_at = len(x) - len(x) // 10

In [41]:
split_at

45000

In [42]:
(x_train, x_test) = x[:split_at], x[split_at:]
(y_train, y_test) = y[:split_at], y[split_at:]

In [43]:
x_train.shape

(45000, 7, 12)

In [44]:
y_train.shape

(45000, 4, 12)

## 6.0 Build Model

In [46]:
HIDDEN_SIZE = 128

In [47]:
model = Sequential()

In [49]:
model.add(layers.LSTM(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))

In [50]:
model.add(layers.RepeatVector(DIGITS + 1))

In [51]:
model.add(layers.LSTM(HIDDEN_SIZE, return_sequences=True))

In [52]:
model.add(layers.TimeDistributed(layers.Dense(len(chars), activation='softmax')))

In [53]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [54]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 4, 128)            131584    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 4, 12)             1548      
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
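
The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector. The helper names below (`lstm_params`, `dense_params`, `NUM_CHARS`) are hypothetical and exist only for this check.

```python
def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus bias of a fully connected layer."""
    return input_dim * units + units


HIDDEN_SIZE = 128
NUM_CHARS = 12  # len('0123456789+ ')

encoder = lstm_params(NUM_CHARS, HIDDEN_SIZE)    # lstm_1: reads the one-hot question
decoder = lstm_params(HIDDEN_SIZE, HIDDEN_SIZE)  # lstm_2: reads the repeated encoder state
output = dense_params(HIDDEN_SIZE, NUM_CHARS)    # Dense applied at every time step

print(encoder, decoder, output, encoder + decoder + output)
# 72192 131584 1548 205324
```

`RepeatVector` has no weights, and `TimeDistributed` shares one `Dense` layer across all four output steps, so it contributes 1,548 parameters rather than four times that.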


In [55]:
BATCH_SIZE = 128

In [57]:
def decode(x, calc_argmax=True):
    """Decode the given vector or 2D array to their character output.
    # Arguments
        x: A vector or a 2D array of probabilities or one-hot representations;
            or a vector of character indices (used with `calc_argmax=False`).
        calc_argmax: Whether to find the character index with maximum
            probability, defaults to `True`.
    """
    if calc_argmax:
        x = x.argmax(axis=-1)
    return ''.join(indices_char[i] for i in x)

In [75]:
for iteration in range(1, 25):
    print()
    print('-'*50)
    print('Iteration: ', iteration)
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=1, validation_data=(x_test, y_test))
    
    for i in range(10):
        rand_val_index = np.random.randint(0, len(x_test))
        
        test_row_x, test_row_y = x_test[rand_val_index], y_test[rand_val_index]
               
        original_question = decode(test_row_x)
        original_answer = decode(test_row_y)
        
        # Add a batch axis, predict, and take the most likely character at each step.
        # (`predict_classes` was removed from later Keras releases.)
        probabilities = model.predict(test_row_x[np.newaxis], verbose=0)
        predicted_class = probabilities.argmax(axis=-1)
        predicted_answer = decode(predicted_class[0], calc_argmax=False)
        
        if original_answer == predicted_answer:
            print('Correct: ', original_question, '=', predicted_answer)
        else:
            print('Wrong: ', original_question, '!=', predicted_answer)
        


--------------------------------------------------
Iteration:  1
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Correct:  683+197 = 880 
Correct:  955+412 = 1367
Correct:  761+50  = 811 
Correct:  997+68  = 1065
Correct:  829+8   = 837 
Correct:  477+386 = 863 
Correct:  903+444 = 1347
Correct:  23+281  = 304 
Correct:  646+50  = 696 
Correct:  969+68  = 1037

--------------------------------------------------
Iteration:  2
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Correct:  19+15   = 34  
Correct:  614+6   = 620 
Correct:  56+10   = 66  
Correct:  16+419  = 435 
Correct:  18+626  = 644 
Correct:  489+47  = 536 
Correct:  166+63  = 229 
Correct:  654+104 = 758 
Correct:  906+43  = 949 
Correct:  216+967 = 1183

--------------------------------------------------
Iteration:  3
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Correct:  430+4   = 434 
Correct:  886+43  = 929 
Correct:  649+867 = 1516
Correct:  88+14   = 102 
Wrong:  297+23  != 310 
...

Correct:  38+519  = 557 
Correct:  65+238  = 303 
Correct:  951+82  = 1033
Correct:  831+51  = 882 
Correct:  881+961 = 1842
Correct:  695+65  = 760 
Correct:  87+623  = 710 
Correct:  419+81  = 500 
Correct:  280+437 = 717 
Correct:  420+50  = 470 

--------------------------------------------------
Iteration:  18
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Correct:  441+92  = 533 
Correct:  11+334  = 345 
Correct:  701+93  = 794 
Correct:  4+589   = 593 
Correct:  80+621  = 701 
Correct:  72+83   = 155 
Correct:  365+277 = 642 
Correct:  335+774 = 1109
Correct:  964+793 = 1757
Correct:  902+771 = 1673

--------------------------------------------------
Iteration:  19
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Correct:  0+266   = 266 
Correct:  923+42  = 965 
Correct:  72+288  = 360 
Correct:  248+6   = 254 
Correct:  925+2   = 927 
Correct:  48+94   = 142 
Correct:  950+76  = 1026
Correct:  765+514 = 1279
Correct:  641+497 = 1138
Correct:  532+7   = 539
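
The loop above only spot-checks ten random test rows per iteration, and the Keras `accuracy` metric for this output shape is averaged over individual characters. A stricter measure is the fraction of answers in which every character is correct. The following is a minimal sketch of that idea, assuming one-hot targets and predicted probabilities of the same shape; the function name `exact_match_accuracy` is hypothetical, and the toy arrays are made up for illustration only.

```python
import numpy as np


def exact_match_accuracy(y_true, y_prob):
    """Fraction of rows whose predicted characters all match the target.

    y_true: one-hot targets, shape (n, steps, num_chars).
    y_prob: predicted probabilities with the same shape.
    """
    true_idx = y_true.argmax(axis=-1)
    pred_idx = y_prob.argmax(axis=-1)
    return float((true_idx == pred_idx).all(axis=-1).mean())


# Toy data: 2 answers of 2 characters over a 3-symbol alphabet.
y_true = np.array([[[1, 0, 0], [0, 1, 0]],
                   [[0, 0, 1], [0, 1, 0]]])
y_prob = np.array([[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],   # both characters right
                   [[0.2, 0.2, 0.6], [0.7, 0.2, 0.1]]])    # second character wrong

print(exact_match_accuracy(y_true, y_prob))  # 0.5
```

In this notebook the same function could be called as `exact_match_accuracy(y_test, model.predict(x_test))` after training.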