# **Encoder-Decoder LSTM (seq2seq)**

 >The ***encoder*** maps a variable-length source
sequence to a *fixed-length vector*, and the ***decoder*** maps the vector representation
back to a *variable-length target sequence*.


 ## **Applications**



>- Machine Translation : translating phrases from English to French.
>- Learning to Execute : calculating the outcome of small programs.
>- Image Captioning : generating a text description for images.
>- Conversational Modeling : generating answers to textual questions.
>- Movement Classification : generating a sequence of commands from a sequence of gestures.

## **Implementation**


#### **Encoder**
- One or more LSTM layers can be used to implement the encoder model.
- The number of memory cells in the final encoder layer defines the length of the fixed-size vector.
```python
model = Sequential()
model.add(LSTM(..., input_shape=(...)))
```


#### **Decoder** 
- One or more LSTM layers can also be used to implement the decoder model.
- This model reads the fixed-size output of the encoder model.
- A Dense layer is used as the output layer of the network.
- The same weights can be used to output each time step in the output sequence by wrapping the Dense layer in a **TimeDistributed** wrapper.
```python
model.add(LSTM(..., return_sequences=True))
model.add(TimeDistributed(Dense(...)))
```
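
To make the weight sharing concrete, here is a framework-free sketch of what wrapping a Dense layer in TimeDistributed amounts to: one weight matrix and one bias applied independently at every time step. The helper names `dense` and `time_distributed_dense` and the toy numbers are illustrative assumptions, not Keras APIs, and the activation is left out.

```python
# Plain-Python sketch (no Keras): apply ONE set of Dense weights at every time step.
# `dense` and `time_distributed_dense` are illustrative helper names, not library calls.

def dense(vector, weights, bias):
    """Compute vector @ weights + bias for a single time step."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(len(bias))
    ]

def time_distributed_dense(sequence, weights, bias):
    """Apply the same weights and bias to each time step of one sequence."""
    return [dense(step, weights, bias) for step in sequence]

# Two time steps with three features each, as a decoder LSTM might emit.
sequence = [[1.0, 0.0, 2.0],
            [0.0, 1.0, 1.0]]
weights = [[1.0, 0.0],   # shape: (3 input features, 2 outputs)
           [0.0, 1.0],
           [1.0, 1.0]]
bias = [0.5, -0.5]

outputs = time_distributed_dense(sequence, weights, bias)
print(outputs)
```

The output keeps one row per time step, which is why the decoder LSTM above needs `return_sequences=True`.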


#### **RepeatVector**
- There is a problem, though: the encoder produces a 2D matrix of outputs (samples × features), whereas the decoder is an LSTM layer that expects a 3D input (samples × time steps × features).
- RepeatVector: this layer simply repeats the provided 2D input multiple times to create a 3D output.
```python
model.add(RepeatVector(...))
```
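
The shape change can be illustrated without Keras. The sketch below mimics the effect of RepeatVector on nested lists; `repeat_vector` is a hypothetical helper written for this note, not the library implementation.

```python
# Plain-Python sketch of RepeatVector's effect on shapes (not the Keras implementation).
# `repeat_vector` is an illustrative helper name.

def repeat_vector(batch, n):
    """Turn (samples, features) into (samples, n, features) by copying each row n times."""
    return [[list(row) for _ in range(n)] for row in batch]

encoded = [[0.1, 0.2, 0.3],   # 2 samples, 3 features: the encoder's 2D output
           [0.4, 0.5, 0.6]]
repeated = repeat_vector(encoded, 2)

# (samples, time steps, features)
shape = (len(repeated), len(repeated[0]), len(repeated[0][0]))
print(shape)
```

Every time step of a sample carries the same encoded vector, so the decoder sees the full summary of the input at each output position.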

### **Addition Prediction Problem**
- The problem is defined as calculating the sum of two or more input numbers. This is
challenging because each digit and mathematical symbol is provided as a character, and the
output is also expected as a sequence of characters. For example, the input 10+6 with the output 16 would
be represented by the sequences:

```
Input: [ '1' , '0' , '+' , '6' ]
Output: [ '1' , '6' ]
```
- The model must learn not only the integer nature of the characters, but also the nature
of the mathematical operation to perform.
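
As a quick illustration of this framing, the following sketch splits an expression and its answer into character sequences. It is only a toy example for a single hard-coded expression; the variable names are chosen here for illustration.

```python
# Split an addition problem and its answer into per-character sequences.
expression = '10+6'

# Compute the sum without eval(): split on '+' and add the integer terms.
answer = str(sum(int(term) for term in expression.split('+')))

input_seq = list(expression)   # one element per input character
output_seq = list(answer)      # one element per output character
print(input_seq, output_seq)
```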


In [2]:
from random import seed
from random import randint
from numpy import array
from math import ceil
from math import log10
from numpy import argmax
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import TimeDistributed
from keras.layers import RepeatVector


## Data Preparation for Model

 ### Generate Sum Pairs


In [3]:
# generate lists of random integers and their sum
def random_sum_pairs(n_examples, n_numbers, largest):
  X, y = list(), list()
  for _ in range(n_examples):
    in_pattern = [randint(1, largest) for _ in range(n_numbers)]
    out_pattern = sum(in_pattern)
    X.append(in_pattern)
    y.append(out_pattern)
  return X, y

### Integers to Padded Strings


In [4]:
# convert data to strings
def to_string(X, y, n_numbers, largest):
  # longest possible input: digits of every term plus the '+' separators
  max_length = n_numbers * ceil(log10(largest+1)) + n_numbers - 1
  Xstr = list()
  for pattern in X:
    strp = '+'.join([str(n) for n in pattern])
    # left-pad with spaces to the fixed input length
    strp = ''.join([' ' for _ in range(int(max_length-len(strp)))]) + strp
    Xstr.append(strp)
  # upper bound on the number of digits in the sum
  max_length = ceil(log10(n_numbers * (largest+1)))
  ystr = list()
  for pattern in y:
    strp = str(pattern)
    # left-pad with spaces to the fixed output length
    strp = ''.join([' ' for _ in range(int(max_length-len(strp)))]) + strp
    ystr.append(strp)
  return Xstr, ystr


### Maximum Length Calculation
```python
max_length = n_numbers * ceil(log10(largest+1)) + n_numbers - 1
max_length = 3 * ceil(log10(10+1)) + 3 - 1
max_length = 3 * ceil(1.0413926851582251) + 3 - 1
max_length = 3 * 2 + 3 - 1
max_length = 6 + 3 - 1
max_length = 8
```
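
The same arithmetic can be checked in code, together with the output-length formula used in `to_string`. The helper name `padded_lengths` is introduced here for illustration only.

```python
from math import ceil, log10

def padded_lengths(n_numbers, largest):
    """Return (input_length, output_length) using the formulas from to_string()."""
    digits_per_term = ceil(log10(largest + 1))
    input_length = n_numbers * digits_per_term + n_numbers - 1
    output_length = ceil(log10(n_numbers * (largest + 1)))
    return input_length, output_length

# Three terms of at most 10 each, e.g. '10+10+10' with the answer '30'.
in_len, out_len = padded_lengths(3, 10)
print(in_len, out_len)
```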

 ### Integer Encoded Sequences


In [5]:
# integer encode strings
def integer_encode(X, y, alphabet):
  char_to_int = dict((c, i) for i, c in enumerate(alphabet))

  Xenc = list()
  for pattern in X:
    integer_encoded = [char_to_int[char] for char in pattern]
    Xenc.append(integer_encoded)
  yenc=list()
  for pattern in y:
    integer_encoded = [char_to_int[char] for char in pattern]
    yenc.append(integer_encoded)
  return Xenc, yenc

### One-Hot Encoded Sequences


In [6]:
# one hot encode
def one_hot_encode(X, y, max_int):
  Xenc = list()
  for seq in X:
    pattern = list()
    for index in seq:
      vector = [0 for _ in range(max_int)]
      vector[index] = 1
      pattern.append(vector)
    Xenc.append(pattern)
  yenc = list()
  for seq in y:
    pattern = list()
    for index in seq:
      vector = [0 for _ in range(max_int)]
      vector[index] = 1
      pattern.append(vector)
    yenc.append(pattern)
  return Xenc, yenc

### Sequence Generation Pipeline


In [7]:
# generate an encoded dataset 
def generate_data(n_samples, n_numbers, largest, alphabet):
  #generate pairs
  X, y = random_sum_pairs(n_samples,n_numbers,largest)
  #convert to string
  X, y = to_string(X, y, n_numbers,largest)
  #integer encode 
  X, y = integer_encode(X, y, alphabet)
  #one hot encode
  X, y = one_hot_encode(X, y, len(alphabet))
  #return as numpy arrays
  X, y = array(X), array(y)
  return X, y 

### Invert Encoding

In [8]:
#invert encoding
def invert(seq,alphabet):
  int_to_char = dict((i,c) for i, c in enumerate(alphabet)) 
  strings = list()
  for pattern in seq:
    string = int_to_char[argmax(pattern)]
    strings.append(string)
  return ''.join(strings)
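
A round trip is a simple way to convince yourself that the encoding and its inverse agree. The sketch below is a condensed, built-ins-only version of the steps above (characters to integers to one-hot vectors and back); `encode` and `decode` are hypothetical names used only here, and `max()` with a key stands in for numpy's `argmax`.

```python
# Round-trip sketch: characters -> one-hot vectors -> characters.
alphabet = [str(d) for d in range(10)] + ['+', ' ']
char_to_int = {c: i for i, c in enumerate(alphabet)}

def encode(text):
    """One-hot encode a string over the alphabet."""
    vectors = []
    for char in text:
        vector = [0] * len(alphabet)
        vector[char_to_int[char]] = 1
        vectors.append(vector)
    return vectors

def decode(vectors):
    """Take the index of the largest value at each time step and map it to a character."""
    return ''.join(alphabet[max(range(len(v)), key=v.__getitem__)] for v in vectors)

original = ' 3+10'
restored = decode(encode(original))
print(repr(restored))
```

Because `decode` only looks at the largest value per time step, it works equally well on softmax probabilities predicted by a model.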

### Example and usage

In [9]:
seed(1)
n_samples = 1

n_numbers = 2
largest = 10
# generate pairs
X, y = random_sum_pairs(n_samples, n_numbers, largest)
print(X, y)
# convert to strings
X, y = to_string(X, y, n_numbers, largest)
print(X, y)
# integer encode
alphabet = [ '0' , '1' , '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9' , '+' ,' ']
X, y = integer_encode(X, y, alphabet)
print(X, y)
# ' ' (space) is encoded as 11
# '+' is encoded as 10
# the digits '0'-'9' are encoded as 0-9
# one hot encode
X, y = one_hot_encode(X, y, len(alphabet))
print(X, y)

[[3, 10]] [13]
[' 3+10'] ['13']
[[11, 3, 10, 1, 0]] [[1, 3]]
[[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]] [[[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]]]


## Define and Compile the Model
 

 ### Specifications of the sequence prediction problem

- n_terms: The number of terms in the equation (e.g. 2 for 10+10).
- largest: The largest numerical value for each term (e.g. 10 for values between 1-10).
- alphabet: The symbols used to encode the input and output sequences (e.g. 0-9, + and ' ').


In [10]:
"""Configuration of input sequence"""
# number of math terms
n_terms = 3
# largest value for any single input term
largest = 10
# scope of possible symbols for each input or output timestep
alphabet = [str(x) for x in range(10)] + ['+', ' ']


### The network needs three configuration values

- n_chars: The size of the alphabet for a single time step (e.g. 12 for 0-9, ‘+’ and ‘ ’).
>The n_chars variable is used to define the number of features in the input layer and the
number of features in the output layer for each input and output time step.
- n_in_seq_length: The number of time steps of encoded input sequences (e.g. 8 for
‘10+10+10’).
>The n_in_seq_length variable is used to define the number of time steps for the input layer of the network.
- n_out_seq_length: The number of time steps of an encoded output sequence (e.g. 2 for
‘30’).
>The n_out_seq_length variable is used to define the number of times to repeat the encoded input in
the RepeatVector, which in turn defines the length of the sequence fed to the decoder for creating
the output sequence.


In [11]:
# size of alphabet: (12 for 0-9 , + and ' ')
n_chars = len(alphabet)
# length of encoded input sequence (8 for '10+10+10'):
# n_terms * ceil(log10(largest+1)) characters for the numbers,
# plus n_terms - 1 characters for the plus signs
n_in_seq_length = n_terms * ceil(log10(largest+1)) + n_terms -1
#length of encoded output sequence (2 for '30')
n_out_seq_length = ceil(log10(n_terms * (largest+1)))

In [12]:
# define LSTM
model = Sequential()
model.add(LSTM(75,input_shape=(n_in_seq_length,n_chars)))
model.add(RepeatVector(n_out_seq_length))
model.add(LSTM(50, return_sequences=True))
model.add(TimeDistributed(Dense(n_chars,activation='softmax')))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 75)                26400     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 2, 75)             0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 2, 50)             25200     
_________________________________________________________________
time_distributed (TimeDistri (None, 2, 12)             612       
Total params: 52,212
Trainable params: 52,212
Non-trainable params: 0
_________________________________________________________________
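
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM formulation with four gates, each holding input weights, recurrent weights and a bias; the helper names are illustrative and not part of Keras.

```python
def lstm_params(units, input_dim):
    """4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units

n_chars = 12
encoder = lstm_params(75, n_chars)   # first LSTM reads one-hot characters
decoder = lstm_params(50, 75)        # second LSTM reads the repeated 75-element vector
output = dense_params(n_chars, 50)   # Dense layer shared across time steps
total = encoder + decoder + output
print(encoder, decoder, output, total)
```

RepeatVector contributes no parameters, and the TimeDistributed wrapper adds none beyond the single Dense layer, because the same weights are reused at every time step.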


In [13]:
X, y = generate_data(75000,n_terms,largest,alphabet)
hist=model.fit(X, y, epochs=1, batch_size=32)



In [15]:
#evaluate model
X, y = generate_data(100, n_terms, largest, alphabet)
loss, acc = model.evaluate(X,y,verbose=0)
print('Loss: %f Accuracy %f' % (loss,acc*100))

Loss: 0.162141 Accuracy 99.500000
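
Keep in mind that, with a softmax output at every time step, the reported accuracy is most likely averaged over individual output characters rather than over whole answers. A stricter per-sequence score can be sketched as follows; the `expected` and `predicted` strings are made up for illustration and are not results from this model.

```python
# Illustrative data only: these strings are invented, not model output.
expected = ['15', '17', '21', '20']
predicted = ['15', '17', '22', '20']

# Per-character accuracy: fraction of matching characters across all positions.
char_pairs = [(e, p) for exp, pred in zip(expected, predicted) for e, p in zip(exp, pred)]
char_accuracy = sum(e == p for e, p in char_pairs) / len(char_pairs)

# Per-sequence accuracy: an answer counts only if every character matches.
seq_accuracy = sum(exp == pred for exp, pred in zip(expected, predicted)) / len(expected)

print(char_accuracy, seq_accuracy)
```

A single wrong digit lowers the per-character score only slightly but makes the whole sum incorrect, so the per-sequence score is the more demanding of the two.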


In [17]:
# predict
for _ in range(10):
  # generate an input-output pair
  X, y = generate_data(1, n_terms, largest, alphabet)
  # make prediction
  yhat = model.predict(X, verbose=0)
  # decode input, expected and predicted
  in_seq = invert(X[0], alphabet)
  out_seq = invert(y[0], alphabet)
  predicted = invert(yhat[0], alphabet)
  print( '%s = %s (expect %s)' % (in_seq, predicted, out_seq))


   5+8+2 = 15 (expect 15)
   2+6+9 = 17 (expect 17)
  5+10+6 = 21 (expect 21)
   5+7+9 = 21 (expect 21)
  1+7+10 = 18 (expect 18)
   1+6+4 = 11 (expect 11)
  6+10+1 = 17 (expect 17)
   5+6+9 = 20 (expect 20)
  3+10+6 = 19 (expect 19)
  2+10+7 = 19 (expect 19)


In [21]:
model.save_weights('/../encoder_decoder')