# Sequence-to-sequence model for mathematical sums using Keras LSTM

* Sequence to sequence: the input is a sequence and the output is also a sequence

## There are different kinds of seq2seq models. Some of them are:
* one-to-one: The network produces one output for each input time step
* one-to-many: The network produces multiple outputs for a single input, e.g. image captioning
* many-to-one: The network produces one output after taking inputs at multiple time steps, e.g. forecasting
* many-to-many: The network produces multiple outputs for multiple input time steps, e.g. translation
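
As a rough illustration of the last two cases, the toy recurrence below (a running sum, not a real LSTM) carries a state across time steps. Returning only the final state corresponds to many-to-one, while returning the state at every step corresponds to many-to-many. The function name `toy_recurrence` and its `return_sequences` flag are illustrative assumptions, loosely mirroring the Keras argument of the same name.

```python
def toy_recurrence(inputs, return_sequences=False):
    """Toy stand-in for a recurrent layer: the state is a running sum."""
    state = 0
    states = []
    for value in inputs:          # one iteration per time step
        state = state + value     # update the state with the current input
        states.append(state)
    # many-to-many: one output per time step; many-to-one: only the last one
    return states if return_sequences else state


sequence = [1, 2, 3, 4]
print(toy_recurrence(sequence))                         # many-to-one
print(toy_recurrence(sequence, return_sequences=True))  # many-to-many
```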

#### We are going to use a many-to-many model.

The many-to-many sequence model uses an encoder-decoder architecture.

![alt text](https://media.geeksforgeeks.org/wp-content/uploads/seq2seq.png)
###### image source: https://media.geeksforgeeks.org/wp-content/uploads/seq2seq.png 

* In this simple model, the encoder encodes the input sequence and stores the information in a thought vector; the decoder then decodes that vector to produce the output sequence.

* In our model, the integers are converted to strings, and the neural network learns to map the string of each addition problem to the string of its sum.

# Program Flow

Generating input data for our model:
* X: [[3, 10], [2, 5], [2, 8]]   # pairs of numbers, each between 1 and 10
* Y: [13, 7, 10]                 # sum of each pair in X

Converting them to strings:
* X: ['  3+10', '   2+5', '   2+8']
* Y: ['13', ' 7', '10']

Integer encoding them, since the network operates on numbers rather than characters:
* X: [[11, 11, 3, 10, 1, 0], [11, 11, 11, 2, 10, 5], [11, 11, 11, 2, 10, 8]]
* Y: [[1, 3], [11, 7], [1, 0]]

One-hot encoding (binary encoding) them, shown here for the first example:
* X: shape = 6 x 12
* [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
* [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
* [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
* [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
* [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
* [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

* Y: shape = 2 x 12
* [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
* [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]]

Creating a seq2seq model using Keras LSTM layers and fitting the model on X and Y

Finally, predicting the sums



In [0]:
# Importing essential libraries

import tensorflow as tf
import numpy as np

from math import ceil, log10
from random import randint,seed

In [2]:
#function to generate random pair of numbers and their sum

def random_number_generator(n_examples, n_numbers, largest):
  x,y = list(), list()
  for i in range(n_examples):
    in_pattern = [randint(1,largest) for _ in range(n_numbers)]
    out_pattern = sum(in_pattern)
    x.append(in_pattern)
    y.append(out_pattern)

  return x, y

seed(1)
n_examples = 1000
n_numbers = 2
largest = 10

x,y = random_number_generator(n_examples, n_numbers, largest)
print(x[:3])
print(y[:3])

[[3, 10], [2, 5], [2, 8]]
[13, 7, 10]


In [0]:
# Function to convert numbers to string

def string_convert(x_list, y_list, n_numbers, largest): 
  x_string = list()
  y_string = list()

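  # Padded width of the input string, computed with ceil and log10 from the math library.
  # As parenthesized, this gives 6 for n_numbers=2, largest=10 (the longest input, '10+10', has 5 characters).
  max_input_length = n_numbers * ceil(log10((largest+1)) + n_numbers - 1)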
  
  for word in x_list:
    x_str = '+'.join([str(i) for i in word])
    x_strr = ''.join([' ' for _ in range(max_input_length - len(x_str))]) + x_str
    x_string.append(x_strr)
  
  max_output_length = ceil(log10(n_numbers * (largest + 1)))
  
  for word in y_list:
    y_str = str(word) 
    y_strr = ''.join([' ' for _ in range(max_output_length - len(y_str))]) + y_str
    y_string.append(y_strr)

  return x_string, y_string

In [4]:
x,y = string_convert(x, y, n_numbers, largest)
print(x[:3])
print(y[:3])

['  3+10', '   2+5', '   2+8']
['13', ' 7', '10']
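
The two padding widths used by `string_convert` can be checked by hand. The sketch below evaluates both formulas exactly as they are written in the function for `n_numbers = 2` and `largest = 10`, and compares them with the longest strings that can occur for these settings (`'10+10'` and `'20'`). The variable names `longest_input` and `longest_output` are illustrative and not part of the notebook.

```python
from math import ceil, log10

n_numbers, largest = 2, 10

# Input width, evaluated exactly as written in string_convert
max_input_length = n_numbers * ceil(log10(largest + 1) + n_numbers - 1)

# Output width: number of digits needed for the largest possible sum
max_output_length = ceil(log10(n_numbers * (largest + 1)))

# Longest strings that can occur for these settings
longest_input = '+'.join([str(largest)] * n_numbers)   # '10+10'
longest_output = str(n_numbers * largest)              # '20'

print(max_input_length, len(longest_input))
print(max_output_length, len(longest_output))

# Left-pad a sample with spaces, as string_convert does
print(repr('3+10'.rjust(max_input_length)))
```

With these settings the input width is 6, one more than the five characters of `'10+10'`, so every input string carries at least one leading space.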


In [5]:
# Integer encoding

# The numbers in the generated data are at most 10, so the vocabulary consists of the ten digits, '+' and the padding space:
alphabet = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', ' ']

def integer_encoding(x,y,alphabet):
  char_to_int = dict((c,i) for i,c in enumerate(alphabet))     # maps each character to its index in alphabet. eg: {'0':0, '1':1, ...}
                                                               # which means the character '0' is encoded as 0
  x_encode = list()                                            # integer-encoded input sequences
  for number in x:
    integer = [char_to_int[word] for word in number]
    x_encode.append(integer)

  y_encode = list()
  for number in y:
    integer = [char_to_int[word] for word in number]
    y_encode.append(integer)

  return x_encode,y_encode

x_enc,y_enc = integer_encoding(x,y, alphabet)
print(x_enc[:3])
print(y_enc[:3])

[[11, 11, 3, 10, 1, 0], [11, 11, 11, 2, 10, 5], [11, 11, 11, 2, 10, 8]]
[[1, 3], [11, 7], [1, 0]]
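
To make the mapping concrete, here is a minimal round trip for a single padded string, assuming the same `alphabet` as above. The names `text`, `encoded` and `decoded` are illustrative.

```python
alphabet = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', ' ']

# Forward and reverse lookup tables built from the alphabet
char_to_int = {c: i for i, c in enumerate(alphabet)}
int_to_char = {i: c for i, c in enumerate(alphabet)}

text = '  3+10'                                      # padded input string
encoded = [char_to_int[ch] for ch in text]           # characters -> indices
decoded = ''.join(int_to_char[i] for i in encoded)   # indices -> characters

print(encoded)
print(repr(decoded))
```

The space character sits at index 11 and `'+'` at index 10, which is why padded inputs start with runs of 11.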


In [6]:
#binary encoding or one_hot_encoding

def one_hot_encoding(x,y,vocab_size):
  x_enc = list()
  for seq in x:
    pattern = list()
    for index in seq:
      vector = [0 for _ in range(vocab_size)]
      vector[index] = 1                         # set the position of this character's index to 1; all other positions stay 0
      pattern.append(vector)
    x_enc.append(pattern)

  y_enc = list()
  for seq in y:
    pattern = list()
    for index in seq:
      vector = [0 for _ in range(vocab_size)]
      vector[index] = 1
      pattern.append(vector)
    y_enc.append(pattern)

  return x_enc, y_enc

x_onehot, y_onehot = one_hot_encoding(x_enc,y_enc, len(alphabet))
print(x_onehot[:3])
print(y_onehot[:3])

[[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]]]
[[[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]], [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
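
The nested output above is easier to read for a single short sequence. The sketch below one-hot encodes the integer sequence `[1, 3]` (the sum `'13'`) with a compact helper; `one_hot` is an illustrative name and a simplified variant, not the notebook's `one_hot_encoding` function.

```python
def one_hot(seq, vocab_size):
    """Return one row per index, with a single 1 at that index."""
    return [[1 if j == index else 0 for j in range(vocab_size)] for index in seq]


encoded_sum = [1, 3]              # integer encoding of '13'
rows = one_hot(encoded_sum, 12)   # 12 = vocabulary size

for row in rows:
    print(row)

# Each row has exactly one 1, and the shape is (sequence length, vocabulary size)
print(len(rows), len(rows[0]))
```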


In [7]:
# Converting the list of one hot encoded data to array

x_onehot, y_onehot = np.array(x_onehot), np.array(y_onehot)
print(x_onehot.shape)
print(y_onehot.shape)

# Sequence lengths and vocabulary size, used below to define the model
t = x_onehot.shape[1]                     # t = length of input sequence
o = y_onehot.shape[1]                     # o = length of output sequence
s = len(alphabet)                         # s = vocabulary size
print('length of input sequence: ', t)
print('length of output sequence: ', o)
print('Vocab size :', s)

(1000, 6, 12)
(1000, 2, 12)
length of input sequence:  6
length of output sequence:  2
Vocab size : 12


In [0]:
# Putting all the above functions in one function to generate a random set of data at any point of time

def generate_data(n_samples, n_numbers, largest, alphabet):
  #generating number pair
  x, y = random_number_generator(n_samples, n_numbers, largest)
  #converting to string
  x,y = string_convert(x, y, n_numbers, largest)
  #integer encoding
  x_enc,y_enc = integer_encoding(x,y, alphabet)
  #onehot encoding
  x_onehot, y_onehot = one_hot_encoding(x_enc,y_enc, len(alphabet))
  #converting to array
  x_onehot, y_onehot = np.array(x_onehot), np.array(y_onehot)

  return x_onehot, y_onehot



In [0]:
# inversion of integer encoding

# This is the reverse of integer encoding: each index is mapped back to the character at that position in the alphabet
# eg: with {1:'1', ... 3:'3'}, the index 3 is converted back to the character '3'

def invert(seq, alphabet):
  int_to_char = dict((i,c) for i,c in enumerate(alphabet))
  strings = list()
  for pattern in seq:
    string = int_to_char[np.argmax(pattern)]                # argmax returns the index of the largest value in the one-hot or softmax vector
    strings.append(string)
  return ''.join(strings)
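
The same decoding idea applies to model predictions, where each row holds softmax probabilities instead of exact zeros and ones. The sketch below is a pure-Python variant under stated assumptions: `peaked` builds hypothetical probability rows for illustration, and `decode` is an illustrative stand-in for `invert` that does not need NumPy.

```python
alphabet = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', ' ']


def peaked(index, size=12, peak=0.89):
    """Hypothetical softmax-like row: most of the probability sits at `index`."""
    rest = (1.0 - peak) / (size - 1)
    return [peak if k == index else rest for k in range(size)]


def decode(prob_rows, alphabet):
    """Pick the most probable character at each time step."""
    chars = []
    for probs in prob_rows:
        best = max(range(len(probs)), key=lambda k: probs[k])  # pure-Python argmax
        chars.append(alphabet[best])
    return ''.join(chars)


print(repr(decode([peaked(1), peaked(3)], alphabet)))
print(repr(decode([peaked(11), peaked(7)], alphabet)))
```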

In [0]:
# building model

from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

i = Input(shape=(t,s))                                  # t = length of input sequence,  s = vocabulary size
x = LSTM(100)(i)                                        # Encoder LSTM: returns one 100-dimensional vector per sample
x = RepeatVector(o)(x)                                  # The encoder output is 2D (batch, features) whereas the decoder LSTM expects 3D input (batch, time steps, features)
                                                        # - so RepeatVector repeats the encoder output o times, once per output time step
x = LSTM(50, return_sequences=True)(x)                  # Decoder LSTM: returns an output at every time step
x = TimeDistributed(Dense(s, activation='softmax'))(x)  # We need one output vector per output character (e.g. '2' and '0' for 10+10 = 20)
                                                        # - so TimeDistributed applies the same Dense layer at every output time step

model = Model(i,x)

In [0]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [12]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 6, 12)]           0         
_________________________________________________________________
lstm (LSTM)                  (None, 100)               45200     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 2, 100)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 2, 50)             30200     
_________________________________________________________________
time_distributed (TimeDistri (None, 2, 12)             612       
Total params: 76,012
Trainable params: 76,012
Non-trainable params: 0
_________________________________________________________________
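
The parameter counts in the summary can be reproduced by hand. A standard LSTM layer with bias has four gates, each with input weights, recurrent weights and a bias vector, and a dense layer has one weight per input-output pair plus a bias per output. The helper names `lstm_params` and `dense_params` below are illustrative.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with (input_dim x units) input weights,
    # (units x units) recurrent weights and a bias of length units
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units


encoder = lstm_params(12, 100)   # vocabulary size 12 -> 100 units
decoder = lstm_params(100, 50)   # repeated encoder vector -> 50 units
output = dense_params(50, 12)    # 50 decoder features -> 12 classes

print(encoder, decoder, output, encoder + decoder + output)
```

These values match the 45,200, 30,200 and 612 parameters reported by `model.summary()`, for a total of 76,012.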


In [13]:
r = model.fit(x_onehot, y_onehot, epochs=100, validation_split=0.2)

Epoch 1/100
Epoch 2/100
...
Epoch 78

In [0]:
# Generating random data to test our model
x,y = generate_data(n_examples, n_numbers, largest, alphabet)  
result = model.predict(x)

In [0]:
# Testing our model
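expected = [invert(seq, alphabet) for seq in y]        # decode the true one-hot targets back to strings
predicted = [invert(seq, alphabet) for seq in result]  # decode the predicted probabilities back to strings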

In [16]:
# Getting first 20 numbers and sum prediction
for i in range(20):
  print('Expected=%s, Predicted=%s' % (expected[i], predicted[i]))

Expected= 6, Predicted= 6
Expected=10, Predicted=10
Expected=12, Predicted=12
Expected= 7, Predicted= 7
Expected=11, Predicted=11
Expected=11, Predicted=11
Expected= 6, Predicted= 6
Expected= 8, Predicted= 8
Expected= 5, Predicted= 5
Expected=12, Predicted=12
Expected=16, Predicted=16
Expected= 5, Predicted= 5
Expected=12, Predicted=12
Expected=16, Predicted=16
Expected=10, Predicted=10
Expected=14, Predicted=14
Expected=17, Predicted=17
Expected= 6, Predicted= 6
Expected=11, Predicted=11
Expected=11, Predicted=11
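
Printing the first 20 pairs gives a quick visual check. To summarize a whole test set, one option is an exact-match accuracy over the decoded strings. The sketch below assumes two equally long lists of strings such as `expected` and `predicted` above; the function name `exact_match_accuracy` and the sample values are hypothetical and not taken from the run above.

```python
def exact_match_accuracy(expected, predicted):
    """Fraction of positions where the predicted string equals the expected one."""
    if len(expected) != len(predicted):
        raise ValueError('expected and predicted must have the same length')
    if not expected:
        return 0.0
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


# Hypothetical decoded strings, for illustration only
sample_expected = [' 6', '10', '12', ' 7']
sample_predicted = [' 6', '10', '13', ' 7']

print(exact_match_accuracy(sample_expected, sample_predicted))
```

A sum counts as correct only if every character matches, including the padding space, which is stricter than the per-character accuracy Keras reports during training.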


In [0]:
# saving the model
model.save('math_sum.h5') # creates HDF5 file of the model