# Simple Seq2Seq Translation

## Preparation
- Bidirectional LSTM
    * Processes the sequence in both directions, so each timestep can use information from future inputs
    * A standard (unidirectional) LSTM only uses past information
<img src="https://cdn-images-1.medium.com/max/764/1*6QnPUSv_t9BY9Fv8_aLb-Q.png" width="500">


    


- RNN in Keras
https://keras.io/layers/recurrent/

    * keras.layers.RNN(cell, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
    * keras.layers.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
    * Input: a 3D tensor with shape (batch_size, timesteps, input_dim).
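To make the input shape convention and the forward/backward distinction concrete, here is a minimal NumPy sketch. It is not the Keras implementation: it uses a plain sigmoid for the gates (the Keras signature above defaults recurrent_activation to 'hard_sigmoid'), random weights, and joins the two directions by concatenation, which is also the default merge_mode of keras.layers.Bidirectional. The helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, units):
    """Single-layer LSTM over x of shape (batch_size, timesteps, input_dim).

    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,), gates packed as i, f, g, o.
    Returns outputs (batch_size, timesteps, units) and final states h, c (batch_size, units).
    """
    batch_size, timesteps, _ = x.shape
    h = np.zeros((batch_size, units))  # zero initial state
    c = np.zeros((batch_size, units))
    outputs = []
    for t in range(timesteps):  # past -> future only
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :units])               # input gate
        f = sigmoid(z[:, units:2 * units])      # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])  # candidate cell state
        o = sigmoid(z[:, 3 * units:])           # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), h, c

def bidirectional_lstm(x, fwd_params, bwd_params, units):
    """Concatenate a forward LSTM and a backward LSTM run on the reversed sequence."""
    out_f, _, _ = lstm_forward(x, *fwd_params, units)
    out_b, _, _ = lstm_forward(x[:, ::-1, :], *bwd_params, units)
    out_b = out_b[:, ::-1, :]  # re-align so position t summarizes x[t:], i.e. the future
    return np.concatenate([out_f, out_b], axis=-1)

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, units = 2, 5, 3, 4

def init_params():
    return (0.5 * rng.normal(size=(input_dim, 4 * units)),
            0.5 * rng.normal(size=(units, 4 * units)),
            np.zeros(4 * units))

fwd_params, bwd_params = init_params(), init_params()
x = rng.normal(size=(batch_size, timesteps, input_dim))

outputs, h, c = lstm_forward(x, *fwd_params, units)
bi_outputs = bidirectional_lstm(x, fwd_params, bwd_params, units)
print(outputs.shape, bi_outputs.shape)  # (2, 5, 4) (2, 5, 8)
```

The last output of the unidirectional LSTM equals its final hidden state h, and changing only the last input leaves the forward output at t=0 untouched while the bidirectional output at t=0 changes.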



## Import Data

Reference:

https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html



In [44]:
%load_ext autoreload
%autoreload 2
from __future__ import print_function

from keras.models import Model, Sequential
from keras.layers import Input, LSTM, Dense, Lambda
from keras import backend as K
# generateInOut is a local helper; see also https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py
from script.seq2seq import generateInOut
import numpy as np



In [45]:
encoder_input_data,\
decoder_input_data,\
decoder_target_data,\
num_samples, \
input_vocabulary, \
output_vocabulary, \
input_sequence_length, \
output_sequence_length, \
input_token_index, \
target_token_index = \
generateInOut(data_path = './data/fra.txt', num_samples = 10000)

Number of samples: 10000
Number of unique input tokens: 71
Number of unique output tokens: 94
Max sequence length for inputs: 16
Max sequence length for outputs: 59


In [46]:
encoder_input_data.shape # (num_samples, input_sequence_length, input_vocabulary)

(10000, 16, 71)

In [47]:
decoder_input_data.shape # (num_samples, output_sequence_length, output_vocabulary)

(10000, 59, 94)

In [48]:
decoder_target_data.shape # (num_samples, output_sequence_length, output_vocabulary)

(10000, 59, 94)

## Model Definition


- Encoder: unidirectional LSTM
- Decoder: unidirectional LSTM
- Embedding: none (characters are fed as one-hot vectors)

### Model Params

In [49]:
epochs = 1
batch_size = 64
lstm_units = 256

### Encoder Network

Note: for word-level instead of character-level inputs, use an Embedding layer in place of one-hot vectors:
https://keras.io/layers/embeddings/
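Word vocabularies are much larger than character sets, so instead of one-hot vectors an Embedding layer takes integer token IDs and looks up a learned dense vector per token. The NumPy sketch below (toy sizes, with a random matrix standing in for the learned weights) shows that this lookup gives the same result as multiplying one-hot inputs by the embedding matrix. In Keras this would roughly correspond to Embedding(input_dim=vocab_size, output_dim=embedding_dim) applied to an integer Input before the LSTM.

```python
import numpy as np

vocab_size, embedding_dim = 6, 3
rng = np.random.default_rng(0)

# Embedding matrix: one row per token (what an Embedding layer would learn)
E = rng.normal(size=(vocab_size, embedding_dim))

# Integer inputs of shape (batch_size, timesteps)
token_ids = np.array([[2, 5, 0],
                      [1, 1, 4]])

# Embedding lookup: pick rows of E -> (batch_size, timesteps, embedding_dim)
embedded = E[token_ids]

# Equivalent one-hot formulation: (batch_size, timesteps, vocab_size) @ (vocab_size, embedding_dim)
one_hot = np.eye(vocab_size)[token_ids]
via_matmul = one_hot @ E

print(embedded.shape)  # (2, 3, 3)
```

The lookup avoids materializing the large one-hot tensor, which is the main reason to prefer it for word-level models.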

In [50]:
# Define Encoder Input
encoder_inputs = Input(shape=(input_sequence_length, input_vocabulary)) # or (None, input_vocabulary) to accept variable-length sequences

# Define Encoder itself
encoder = LSTM(lstm_units, return_state=True)

# Encoder returns its output plus the final hidden state h and cell state c
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

# The encoder output is discarded; only the final states are passed to the decoder
encoder_states = [state_h, state_c]

### Decoder Network (V1) - Teacher Forcing

In [18]:
# Define Decoder Input
decoder_inputs = Input(shape=(output_sequence_length, output_vocabulary))

# Define the decoder; unlike the encoder, return_sequences=True returns the output at every timestep
decoder_lstm = LSTM(lstm_units, return_sequences=True, return_state=True)

# Extract the outputs; the initial state comes from the encoder's final states
# instead of the default all-zeros state
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state = encoder_states)

decoder_dense = Dense(output_vocabulary, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
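With teacher forcing the decoder is fed the ground-truth previous character at every step, and its target is the same sequence shifted one step ahead. The internals of generateInOut are not shown here, so the sketch below assumes it follows the Keras lstm_seq2seq example, where '\t' marks the start and '\n' the end of each target sentence. encode_target and the toy vocabulary are hypothetical.

```python
import numpy as np

def encode_target(text, token_index, max_len):
    """One-hot decoder input and target for one sentence (teacher forcing).

    Assumes '\t' = start and '\n' = end, as in the Keras lstm_seq2seq example.
    """
    seq = '\t' + text + '\n'
    vocab = len(token_index)
    dec_in = np.zeros((max_len, vocab))
    dec_target = np.zeros((max_len, vocab))
    for t, ch in enumerate(seq):
        dec_in[t, token_index[ch]] = 1.0
        if t > 0:
            # The target at step t-1 is the input at step t (shifted left by one)
            dec_target[t - 1, token_index[ch]] = 1.0
    return dec_in, dec_target

# Hypothetical toy vocabulary
token_index = {ch: i for i, ch in enumerate(['\t', '\n', 'a', 'l', 'u', 't', 's'])}
dec_in, dec_target = encode_target('salut', token_index, max_len=8)
print(dec_in.shape, dec_target.shape)  # (8, 7) (8, 7)
```

The target never contains the START character, and the input never contains the final END character as a prediction target's successor, which is what lets the model learn "given characters up to t, predict character t+1".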

### Decoder Network (V2) - Sampling

In [19]:
# Note: the decoder input is a single timestep (the START character)
decoder_inputs = Input(shape=(1, output_vocabulary))

# Same Decoder
decoder_lstm = LSTM(lstm_units, return_sequences=True, return_state=True)

# Same Dense Layer
decoder_dense = Dense(output_vocabulary, activation='softmax')

# Define some lists
final_outputs = []
previous_states = encoder_states # States from encoder
current_inputs = decoder_inputs # One-hot START character

# Unroll the decoder one step at a time, feeding each step's softmax output back in as the next input
for _ in range(output_sequence_length):
    
    outputs, state_h, state_c = decoder_lstm(current_inputs, initial_state = previous_states)
    densed_output = decoder_dense(outputs) 
    final_outputs.append(densed_output)
    
    current_inputs = densed_output
    previous_states = [state_h, state_c]

# Concatenate the per-step predictions along the time axis: (batch, output_sequence_length, output_vocabulary)
decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(final_outputs)
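The loop above feeds each step's full softmax distribution back in as the next input. At inference time, the Keras blog post referenced above instead picks the most likely character and feeds it back as a one-hot vector (greedy decoding). A minimal NumPy sketch of that alternative, where step_fn is a hypothetical stand-in for one decoder step (a one-timestep LSTM followed by the softmax Dense layer):

```python
import numpy as np

def greedy_decode(step_fn, start_vec, state, max_len, end_index=None):
    """Greedy decoding: feed back a one-hot vector of the argmax at each step.

    step_fn(x, state) -> (probs, new_state) is a hypothetical single decoder step;
    x and probs are vectors of length output_vocabulary.
    """
    x = start_vec
    indices = []
    for _ in range(max_len):
        probs, state = step_fn(x, state)
        idx = int(np.argmax(probs))   # most probable token
        indices.append(idx)
        if idx == end_index:          # stop at the end-of-sequence token
            break
        x = np.eye(len(probs))[idx]   # one-hot feedback instead of the probabilities
    return indices

# Hypothetical toy step: predicts "previous token + 1" with probability 0.7
def toy_step(x, state):
    probs = np.full(4, 0.1)
    probs[(int(np.argmax(x)) + 1) % 4] = 0.7
    return probs, state

print(greedy_decode(toy_step, np.eye(4)[0], None, max_len=10, end_index=3))  # [1, 2, 3]
```

Feeding probabilities keeps the graph differentiable end to end, while one-hot feedback matches what the decoder sees under teacher forcing; which works better here is an empirical question.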

In [26]:
# Start of Sentence one-hot encoding
target_token_index['\t']

0

In [24]:
# Overwrite decoder_input_data with the single START character (input for the V2 decoder)
decoder_input_data = np.zeros(shape = (num_samples, 1, output_vocabulary))
decoder_input_data[:, 0, target_token_index['\t']] = 1.
decoder_input_data.shape  # (num_samples, 1, output_vocabulary)

(10000, 1, 94)

## Model Training

In [22]:
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size = batch_size,
          epochs = epochs,
          validation_split = 0.2)

Train on 8000 samples, validate on 2000 samples
Epoch 1/1



## Model Prediction

In [41]:
# Note: applies only to the V2 (sampling) decoder, whose input is the single START character
probs = model.predict([encoder_input_data[:5,:,:], decoder_input_data[:5,:,:]])
# probs: (5, output_sequence_length, output_vocabulary); take the argmax over the vocabulary axis
predictions = np.argmax(probs, axis = -1)
predictions[0]  # one token index per decoder timestep (output_sequence_length values)

Then map each index back to its character with a reverse lookup of target_token_index, stopping at the first end-of-sequence character.
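A minimal sketch of that reverse lookup, applied to a row such as predictions[0]. It assumes, as in the Keras example, that '\n' is the end-of-sequence character; indices_to_text and the toy vocabulary are hypothetical.

```python
import numpy as np

def indices_to_text(indices, target_token_index, end_char='\n'):
    """Map predicted token indices back to characters, stopping at end_char."""
    reverse_index = {i: ch for ch, i in target_token_index.items()}  # index -> character
    chars = []
    for i in indices:
        ch = reverse_index[int(i)]
        if ch == end_char:
            break
        chars.append(ch)
    return ''.join(chars)

# Hypothetical toy vocabulary; '\t' = start, '\n' = end
toy_index = {'\t': 0, '\n': 1, 'a': 2, 'l': 3, 's': 4, 't': 5, 'u': 6}
print(indices_to_text(np.array([4, 2, 3, 6, 5, 1, 2, 2]), toy_index))  # salut
```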

# Attention

Reference:
- https://medium.com/datalogue/attention-in-keras-1892773a4f22
- https://github.com/datalogue/keras-attention/blob/master/models/custom_recurrents.py
- https://guillaumegenthial.github.io/sequence-to-sequence.html
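As a rough illustration of the core idea, here is a minimal NumPy sketch of dot-product attention for a single decoder step: score each encoder timestep against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder outputs as a context vector. This is one scoring choice among several (additive scoring is another common one), and it needs the full sequence of encoder outputs, so the encoder above would have to use return_sequences=True. Names and shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_attention(encoder_outputs, decoder_state):
    """Dot-product attention for one decoder step.

    encoder_outputs: (batch, timesteps, units), decoder_state: (batch, units).
    Returns context (batch, units) and attention weights (batch, timesteps).
    """
    scores = np.einsum('btu,bu->bt', encoder_outputs, decoder_state)  # similarity per timestep
    weights = softmax(scores, axis=-1)                                # normalize over timesteps
    context = np.einsum('bt,btu->bu', weights, encoder_outputs)       # weighted sum of encoder outputs
    return context, weights

rng = np.random.default_rng(0)
encoder_outputs = rng.normal(size=(2, 5, 4))
decoder_state = rng.normal(size=(2, 4))
context, weights = dot_attention(encoder_outputs, decoder_state)
print(context.shape, weights.shape)  # (2, 4) (2, 5)
```

The context vector is typically concatenated with the decoder output before the softmax Dense layer.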


# Reference

- https://guillaumegenthial.github.io/sequence-to-sequence.html
- https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html