# Home 4: Build a seq2seq model for machine translation.

### Name: [Your-Name?]

### Task: Translate English to Spanish

## 0. You will do the following:

1. Read and run my code.
2. Complete the code in Section 1.1 and Section 4.2.

    * Translating English to **German** is not acceptable!!! Try another language.
    
3. **Make improvements.** Directly modify the code in Section 3. Do at least one of the following; doing more earns up to 2 bonus points toward the total.

    * Bi-LSTM instead of LSTM
    
    * Multi-task learning (e.g., both English to French and English to Spanish)
    
    * Attention
    
4. Evaluate the translation using the BLEU score. 

    * Optional. Up to 1 bonus point toward the total.
    
5. Convert the notebook to an .HTML file. 

    * The HTML file must contain the code and the output after execution.

6. Put the .HTML file in your own Github repo. 

7. Submit the link to the HTML file to Canvas.    


### Hint: 

To implement ```Bi-LSTM```, you will need the following code to build the encoder; the decoder won't be much different.

In [0]:
from keras.layers import Bidirectional, Concatenate, LSTM

encoder_bilstm = Bidirectional(LSTM(latent_dim, return_state=True, 
                                  dropout=0.5, name='encoder_lstm'))
_, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)

state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])

NameError: ignored

In [0]:
# from keras import backend as K 
# K.clear_session()

### Hint: 

To implement multi-task training, you can refer to ```Section 7.1.3 Multi-output models``` of the textbook, ```Deep Learning with Python```.

## 1. Data preparation

1. Download data (e.g., "deu-eng.zip") from http://www.manythings.org/anki/
2. Unzip the .ZIP file.
3. Put the .TXT file (e.g., "deu.txt") in the directory "./Data/".

### 1.1. Load and clean text


In [0]:
import re
import string
from unicodedata import normalize
import numpy

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, mode='rt', encoding='utf-8')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text


# split a loaded document into sentences
def to_pairs(doc):
    lines = doc.strip().split('\n')
    pairs = [line.split('\t') for line in  lines]
    return pairs

def clean_data(lines):
    cleaned = list()
    # prepare regex for char filtering
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    # prepare translation table for removing punctuation
    table = str.maketrans('', '', string.punctuation)
    for pair in lines:
        clean_pair = list()
        for line in pair:
            # normalize unicode characters
            line = normalize('NFD', line).encode('ascii', 'ignore')
            line = line.decode('UTF-8')
            # tokenize on white space
            line = line.split()
            # convert to lowercase
            line = [word.lower() for word in line]
            # remove punctuation from each token
            line = [word.translate(table) for word in line]
            # remove non-printable chars form each token
            line = [re_print.sub('', w) for w in line]
            # remove tokens with numbers in them
            line = [word for word in line if word.isalpha()]
            # store as string
            clean_pair.append(' '.join(line))
        cleaned.append(clean_pair)
    return numpy.array(cleaned)
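
To see what the cleaning steps do to a single sentence, here is a minimal standalone sketch. The helper name `clean_line_sketch` and the two sample sentences are made up for illustration; the steps mirror `clean_data` above (the printable-character filter is omitted because the ASCII conversion already drops those characters).

```python
import string
from unicodedata import normalize


def clean_line_sketch(line):
    """Hypothetical single-sentence version of the cleaning steps in clean_data."""
    # decompose accented chars, then drop everything that is not ASCII
    ascii_line = normalize('NFD', line).encode('ascii', 'ignore').decode('utf-8')
    # translation table that deletes all punctuation
    table = str.maketrans('', '', string.punctuation)
    # lowercase each token and strip punctuation from it
    words = [w.lower().translate(table) for w in ascii_line.split()]
    # keep only purely alphabetic tokens (this drops numbers)
    return ' '.join(w for w in words if w.isalpha())


print(clean_line_sketch('¿Cómo estás?'))       # como estas
print(clean_line_sketch("I'm 25 years old."))  # im years old
```

Note that accents and the inverted question mark are lost, which is why the Spanish text in the outputs below has no diacritics.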

Upload data to Google Drive, and then import from local directory to Google Colab

In [81]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
import os
data_path = os.path.join("/content/gdrive/My Drive/data", "spa.txt")  # named data_path to avoid shadowing the built-in dir()

#### Fill the following blanks:

In [0]:
# e.g., filename = 'Data/deu.txt'
# filename = '/Users/yangyangyu/Documents/SIT_semester1/CS583/hw4/fra-eng/fra.txt'
#filename = '/Users/yangyangyu/Documents/SIT_semester1/CS583/hw4/spa-eng/spa.txt'
filename = data_path
# e.g., n_train = 20000
n_train = 20000

In [0]:
# load dataset
doc = load_doc(filename)

# split into Language1-Language2 pairs
pairs = to_pairs(doc)

# clean sentences
clean_pairs = clean_data(pairs)[0:n_train, :]

In [8]:
for i in range(1000, 1010):
    print('[' + clean_pairs[i, 0] + '] => [' + clean_pairs[i, 1] + ']')

[come early] => [veni temprano]
[come early] => [ven temprano]
[come early] => [vengan temprano]
[come early] => [venga temprano]
[come on in] => [pasale]
[come on in] => [pasele]
[come on in] => [pasenle]
[come on in] => [entre]
[come on in] => [pase]
[come quick] => [ven rapido]


In [9]:
input_texts = clean_pairs[:, 0]
target_texts = ['\t' + text + '\n' for text in clean_pairs[:, 1]]

print('Length of input_texts:  ' + str(input_texts.shape))
print('Length of target_texts: ' + str((len(target_texts),)))  # target_texts is a list, so use len()

Length of input_texts:  (20000,)
Length of target_texts: (20000,)


In [10]:
max_encoder_seq_length = max(len(line) for line in input_texts)
max_decoder_seq_length = max(len(line) for line in target_texts)

print('max length of input  sentences: %d' % (max_encoder_seq_length))
print('max length of target sentences: %d' % (max_decoder_seq_length))

max length of input  sentences: 19
max length of target sentences: 68


**Remark:** At this point, you have two lists of sentences: input_texts and target_texts.

## 2. Text processing

### 2.1. Convert texts to sequences

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.
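
As an illustration of what "tokenization and zero-padding" produces, the sketch below builds the $n\times t$ matrix with plain Python and NumPy instead of Keras. The function name and the toy sentences are assumptions for this example; it indexes characters by descending frequency starting at 1 and reserves 0 for padding, which is meant to approximate (not reproduce exactly) the Keras `Tokenizer` used in the next cell.

```python
from collections import Counter

import numpy as np


def chars_to_padded_matrix(lines, max_len):
    """Hypothetical char-level tokenizer with post zero-padding."""
    counts = Counter(ch for line in lines for ch in line)
    # most frequent char gets index 1; index 0 is reserved for padding
    token_index = {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}
    matrix = np.zeros((len(lines), max_len), dtype='int32')
    for row, line in enumerate(lines):
        ids = [token_index[ch] for ch in line][:max_len]
        matrix[row, :len(ids)] = ids  # the remaining columns stay 0
    return matrix, token_index


toy_lines = ['come early', 'come on in', 'come quick']
toy_matrix, toy_index = chars_to_padded_matrix(toy_lines, max_len=12)
print(toy_matrix.shape)  # (3, 12)
print(toy_matrix[0])
```

Each row is one sentence, each column one character position, and the trailing zeros are padding.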

In [11]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# encode and pad sequences
def text2sequences(max_len, lines):
    tokenizer = Tokenizer(char_level=True, filters='')
    tokenizer.fit_on_texts(lines)
    seqs = tokenizer.texts_to_sequences(lines)
    seqs_pad = pad_sequences(seqs, maxlen=max_len, padding='post')
    return seqs_pad, tokenizer.word_index


encoder_input_seq, input_token_index = text2sequences(max_encoder_seq_length, 
                                                      input_texts)
decoder_input_seq, target_token_index = text2sequences(max_decoder_seq_length, 
                                                       target_texts)

print('shape of encoder_input_seq: ' + str(encoder_input_seq.shape))
print('shape of input_token_index: ' + str(len(input_token_index)))
print('shape of decoder_input_seq: ' + str(decoder_input_seq.shape))
print('shape of target_token_index: ' + str(len(target_token_index)))

shape of encoder_input_seq: (20000, 19)
shape of input_token_index: 27
shape of decoder_input_seq: (20000, 68)
shape of target_token_index: 29


In [12]:
num_encoder_tokens = len(input_token_index) + 1
num_decoder_tokens = len(target_token_index) + 1

print('num_encoder_tokens: ' + str(num_encoder_tokens))
print('num_decoder_tokens: ' + str(num_decoder_tokens))

num_encoder_tokens: 28
num_decoder_tokens: 30


**Remark:** At this point, the input-language and target-language texts have been converted to two matrices.

- Both have n_train rows.
- They have max_encoder_seq_length and max_decoder_seq_length columns, respectively.

The following cells print a sentence and its representation as a sequence.

In [13]:
target_texts[100]

'\tsed buenas\n'

In [14]:
decoder_input_seq[100, :]

array([ 6,  5,  2, 15,  1, 18, 14,  2,  8,  3,  5,  7,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
      dtype=int32)

### 2.2. One-hot encode

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.
- It is represented by an $n\times t \times v$ tensor ($v$ is the vocabulary size: the number of unique chars plus one for the padding index 0) after the one-hot encoding.
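
The shape change from $n\times t$ to $n\times t\times v$ can be checked on a toy example. This is a small NumPy-only sketch (the function name and the toy indices are made up); it relies on the fact that row $k$ of an identity matrix is the one-hot vector for index $k$.

```python
import numpy as np


def onehot_sketch(seqs, vocab_size):
    """Hypothetical one-hot encoder: index k becomes row k of the identity matrix."""
    return np.eye(vocab_size, dtype='float32')[seqs]


# toy (n=2, t=4) matrix of char indices, 0 = padding
toy_seqs = np.array([[3, 1, 2, 0],
                     [2, 2, 0, 0]])
toy_onehot = onehot_sketch(toy_seqs, vocab_size=4)
print(toy_onehot.shape)  # (2, 4, 4)
```

Padding positions are encoded as a one-hot vector at index 0, which matches what `to_categorical` does in the cell below.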

In [15]:
from keras.utils import to_categorical

# one hot encode target sequence
def onehot_encode(sequences, max_len, vocab_size):
    n = len(sequences)
    data = numpy.zeros((n, max_len, vocab_size))
    for i in range(n):
        data[i, :, :] = to_categorical(sequences[i], num_classes=vocab_size)
    return data

encoder_input_data = onehot_encode(encoder_input_seq, max_encoder_seq_length, num_encoder_tokens)
decoder_input_data = onehot_encode(decoder_input_seq, max_decoder_seq_length, num_decoder_tokens)

decoder_target_seq = numpy.zeros(decoder_input_seq.shape)
decoder_target_seq[:, 0:-1] = decoder_input_seq[:, 1:]
decoder_target_data = onehot_encode(decoder_target_seq, 
                                    max_decoder_seq_length, 
                                    num_decoder_tokens)

print(encoder_input_data.shape)
print(decoder_input_data.shape)

(20000, 19, 28)
(20000, 68, 30)
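
The target sequence built above is the decoder input shifted left by one step, so that at each position the decoder is trained to predict the next character. A toy sketch of that shift, with a made-up index sequence (6 stands for the start sign and 7 for the stop sign, as in the printed example earlier):

```python
import numpy as np

# hypothetical decoder input: start sign, three chars, stop sign, padding
toy_decoder_input = np.array([[6, 5, 2, 15, 7, 0]])

# shift left by one position; the last column stays 0 (padding)
toy_decoder_target = np.zeros_like(toy_decoder_input)
toy_decoder_target[:, :-1] = toy_decoder_input[:, 1:]

print(toy_decoder_input)   # [[ 6  5  2 15  7  0]]
print(toy_decoder_target)  # [[ 5  2 15  7  0  0]]
```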


## 3. Build the networks (for training)

- Build encoder, decoder, and connect the two modules to get "model". 

- Fit the model on the bilingual data to train the parameters in the encoder and decoder.

### 3.1. Encoder network

- Input:  one-hot encode of the input language

- Return: 

    -- output (all the hidden states   $h_1, \cdots , h_t$), which is always discarded
    
    -- the final hidden state  $h_t$
    
    -- the final conveyor belt $c_t$

In [16]:
# create BILSTM model

from keras.layers import LSTM,Bidirectional,Input,Concatenate
from keras.models import Model

latent_dim = 256
num_encoder_tokens =28

# inputs of the encoder network
encoder_inputs = Input(shape=(None, num_encoder_tokens), 
                       name='encoder_inputs')

# set the LSTM layer
encoder_bilstm = Bidirectional(LSTM(latent_dim, return_state=True, dropout=0.5, name='encoder_bilstm'))
encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]
# build the encoder network model
encoder_model_bi = Model(inputs = encoder_inputs, outputs = encoder_states, name = 'encoder')






In [17]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(encoder_model_bi, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=encoder_model_bi, show_shapes=False
    ,
    to_file='encoder.png'
)

encoder_model_bi.summary()

Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     (None, None, 28)     0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, 512), (None, 583680      encoder_inputs[0][0]             
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           bidirectional_1[0][1]            
                                                                 bidirectional_1[0][3]            
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 512)          0           bidirectional_1[0][2]      

In [0]:

# from keras.layers import Input, LSTM
# from keras.models import Model

# latent_dim = 256
# num_encoder_tokens =28

# # inputs of the encoder network
# encoder_inputs = Input(shape=(None, num_encoder_tokens), 
#                        name='encoder_inputs')

# # set the LSTM layer
# encoder_lstm = LSTM(latent_dim, return_state=True, 
#                     dropout=0.5, name='encoder_lstm')
# _, state_h, state_c = encoder_lstm(encoder_inputs)

# # build the encoder network model
# encoder_model = Model(inputs=encoder_inputs, 
#                       outputs=[state_h, state_c],
#                       name='encoder')

Print a summary and save the encoder network structure to "./encoder.png"

In [0]:
# from IPython.display import SVG
# from keras.utils.vis_utils import model_to_dot, plot_model

# SVG(model_to_dot(encoder_model, show_shapes=False).create(prog='dot', format='svg'))

# plot_model(
#     model=encoder_model, show_shapes=False
#     ,
#     to_file='encoder.png'
# )

# encoder_model.summary()

### 3.2. Decoder network

- Inputs:  

    -- one-hot encode of the target language
    
    -- the initial hidden state (the encoder's final hidden state $h_t$)
    
    -- the initial conveyor belt (the encoder's final conveyor belt $c_t$)

- Return: 

    -- output (all the hidden states) $h_1, \cdots , h_t$

    -- the final hidden state  $h_t$ (discarded in the training and used in the prediction)
    
    -- the final conveyor belt $c_t$ (discarded in the training and used in the prediction)

In [0]:
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# inputs of the decoder network
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

decoder_state_input_h = Input(shape=(latent_dim *2,), name='decoder_input_h')
decoder_state_input_c = Input(shape=(latent_dim *2,), name='decoder_input_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

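# set the LSTM layer (a unidirectional LSTM despite the name; it uses latent_dim * 2 units to match the concatenated encoder states)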
decoder_bilstm = LSTM(latent_dim * 2, return_sequences=True, return_state=True, dropout=0.5, name='decoder_bilstm')
decoder_bilstm_outputs, state_h1, state_c1 = decoder_bilstm(decoder_input_x, initial_state=decoder_states_inputs)

# set the dense layer
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_bilstm_outputs)

# build the decoder network model
decoder_model_bi = Model(inputs=[decoder_input_x, decoder_state_input_h, decoder_state_input_c],
                      outputs=[decoder_outputs, state_h1, state_c1],
                      name='decoder')


# decoder_model_bi = Model(inputs=[encoder_inputs, decoder_input_x],
#                       outputs=[decoder_outputs, state_h, state_c],
#                       name='decoder')


Print a summary and save the decoder network structure to "./decoder.png"

In [21]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(decoder_model_bi, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=decoder_model_bi, show_shapes=False
    ,
    to_file='decoder.png'
)

decoder_model_bi.summary()

Model: "decoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
decoder_input_h (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_input_c (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_bilstm (LSTM)           [(None, None, 512),  1112064     decoder_input_x[0][0]            
                                                                 decoder_input_h[0][0]      
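
The parameter counts in the encoder and decoder summaries (583680 and 1112064) can be verified by hand. The sketch below uses the standard LSTM parameter formula (four gates, each with an input kernel, a recurrent kernel, and a bias); the helper name is made up for this example.

```python
def lstm_param_count(input_dim, units):
    """Parameters of one LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


# encoder: two LSTMs (forward and backward), 28 input tokens, 256 units each
encoder_params = 2 * lstm_param_count(input_dim=28, units=256)

# decoder: one LSTM, 30 input tokens, 512 units (= 2 * latent_dim)
decoder_params = lstm_param_count(input_dim=30, units=512)

print(encoder_params)  # 583680
print(decoder_params)  # 1112064
```

The decoder needs 512 units because its initial states are the concatenated forward and backward encoder states, each of size 256.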

In [0]:
# # inputs of the decoder network
# decoder_input_h = Input(shape=(latent_dim,), name='decoder_input_h')
# decoder_input_c = Input(shape=(latent_dim,), name='decoder_input_c')
# decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

# # set the LSTM layer
# decoder_lstm = LSTM(latent_dim, return_sequences=True, 
#                     return_state=True, dropout=0.5, name='decoder_lstm')
# decoder_lstm_outputs, state_h, state_c = decoder_lstm(decoder_input_x, 
#                                                       initial_state=[decoder_input_h, decoder_input_c])

# # set the dense layer
# decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
# decoder_outputs = decoder_dense(decoder_lstm_outputs)

# # build the decoder network model
# decoder_model = Model(inputs=[decoder_input_x, decoder_input_h, decoder_input_c],
#                       outputs=[decoder_outputs, state_h, state_c],
#                       name='decoder')

In [0]:
# from IPython.display import SVG
# from keras.utils.vis_utils import model_to_dot, plot_model

# SVG(model_to_dot(decoder_model, show_shapes=False).create(prog='dot', format='svg'))

# plot_model(
#     model=decoder_model, show_shapes=False
#     ,
#     to_file='decoder.png'
# )

# decoder_model.summary()

### 3.3. Connect the encoder and decoder

In [0]:

# input layers
encoder_input_x = Input(shape=(None, num_encoder_tokens), name='encoder_input_x')
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')



encoder_final_states = encoder_model_bi([encoder_input_x])
decoder_bilstm_output, state_h2, state_c2 = decoder_bilstm(decoder_input_x, initial_state= encoder_final_states)
# decoder_bilstm_output, state_h, state_c = decoder_bilstm(decoder_input_x, initial_state=decoder_states_inputs)
decoder_states = [state_h2, state_c2]

# connect encoder to decoder
decoder_pred = decoder_dense(decoder_bilstm_output)
# model_bi = Model(inputs = [decoder_input_x] + decoder_states_inputs, outputs = [decoder_pred] + decoder_states, name='model_training')
model_bi = Model(inputs = [encoder_input_x, decoder_input_x], outputs = decoder_pred, name='model_training')

In [23]:
print(state_h)
print(decoder_state_input_h)

Tensor("concatenate_1/concat:0", shape=(?, 512), dtype=float32)
Tensor("decoder_input_h:0", shape=(?, 512), dtype=float32)


In [24]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(model_bi, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=model_bi, show_shapes=False
    ,
    to_file='model_training.png'
)

model_bi.summary()

Model: "model_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input_x (InputLayer)    (None, None, 28)     0                                            
__________________________________________________________________________________________________
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
encoder (Model)                 [(None, 512), (None, 583680      encoder_input_x[0][0]            
__________________________________________________________________________________________________
decoder_bilstm (LSTM)           [(None, None, 512),  1112064     decoder_input_x[0][0]            
                                                                 encoder[1][0]       

In [0]:
# # input layers
# encoder_input_x = Input(shape=(None, num_encoder_tokens), name='encoder_input_x')
# decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

# # connect encoder to decoder
# encoder_final_states = encoder_model([encoder_input_x])
# decoder_lstm_output, _, _ = decoder_lstm(decoder_input_x, initial_state=encoder_final_states)
# decoder_pred = decoder_dense(decoder_lstm_output)

# model = Model(inputs=[encoder_input_x, decoder_input_x], 
#               outputs=decoder_pred, 
#               name='model_training')

In [0]:
# print(state_h)
# print(decoder_input_h)

In [0]:
# from IPython.display import SVG
# from keras.utils.vis_utils import model_to_dot, plot_model

# SVG(model_to_dot(model, show_shapes=False).create(prog='dot', format='svg'))

# plot_model(
#     model=model, show_shapes=False
#     ,
#     to_file='model_training.png'
# )

# model.summary()

### 3.4. Fit the model on the bilingual dataset

- encoder_input_data: one-hot encode of the input language

- decoder_input_data: one-hot encode of the target language

- decoder_target_data: labels (left shift of decoder_input_data)

- tune the hyper-parameters

- stop when the validation loss stops decreasing.
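
One common way to decide when the validation loss has stopped decreasing is a patience rule: stop once the loss has not improved for a fixed number of epochs. Keras provides this as the `EarlyStopping` callback (with `monitor` and `patience` arguments); the plain-Python sketch below only illustrates the logic. The function name and the loss values are made up, not results from this notebook.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the (0-based) epoch at which training would stop under a patience rule."""
    best_loss = float('inf')
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # new best validation loss: remember it and keep training
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # no improvement for `patience` epochs in a row
            return epoch
    return len(val_losses) - 1  # never triggered: train to the end


# hypothetical validation losses, one per epoch
toy_val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.74]
print(early_stopping_epoch(toy_val_losses, patience=3))  # 5
```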

In [27]:
print('shape of encoder_input_data' + str(encoder_input_data.shape))
print('shape of decoder_input_data' + str(decoder_input_data.shape))
print('shape of decoder_target_data' + str(decoder_target_data.shape))

shape of encoder_input_data(20000, 19, 28)
shape of decoder_input_data(20000, 68, 30)
shape of decoder_target_data(20000, 68, 30)


Configure GPU and CPU (optional):

In [0]:
# configure GPU and CPU
# import tensorflow as tf
# import warnings as wa
# import keras

# wa.filterwarnings("ignore")

# config0 = tf.ConfigProto( device_count = {'GPU': 1 , 'CPU': 64} ) 
# session0 = tf.Session(config=config0) 
# keras.backend.set_session(session0)

In [28]:
model_bi.compile(optimizer='rmsprop', loss='categorical_crossentropy')

model_bi.fit([encoder_input_data, decoder_input_data],  # training data
          decoder_target_data,                       # labels (left shift of the target sequences)
          batch_size=64, epochs=50, validation_split=0.2)

model_bi.save('seq2seq_bi_01.h5')






Train on 16000 samples, validate on 4000 samples
Epoch 1/50
...
Epoch 50/50


In [0]:
# model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# model.fit([encoder_input_data, decoder_input_data],  # training data
#           decoder_target_data,                       # labels (left shift of the target sequences)
#           batch_size=64, epochs=50, validation_split=0.2)

# model.save('seq2seq.h5')

## 4. Make predictions


### 4.1. Translate English to Spanish

1. The encoder reads a sentence (source language) and outputs its final states, $h_t$ and $c_t$.
2. Take the [start] sign "\t" and the final states $h_t$ and $c_t$ as input and run the decoder.
3. Get the new states and the predicted probability distribution.
4. Sample a char from the predicted probability distribution.
5. Take the sampled char and the new states as input and repeat the process (stop when the [stop] sign "\n" is reached).

In [0]:
from keras.models import load_model
# model=load_model('seq2seq_bi_01.h5')

In [0]:
# Reverse-lookup token index to decode sequences back to something readable.
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

In [0]:
def decode_sequence(input_seq):
    states_value = encoder_model_bi.predict(input_seq)

    target_seq = numpy.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model_bi.predict([target_seq] + states_value)

        # this line of code is greedy selection
        # try to use multinomial sampling instead (with temperature)
        sampled_token_index = numpy.argmax(output_tokens[0, -1, :])
        
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        target_seq = numpy.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        states_value = [h, c]

    return decoded_sentence
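
The comment inside `decode_sequence` suggests multinomial sampling with temperature as an alternative to greedy selection. A minimal sketch of that idea follows; the function name, the default values, and the toy distribution are assumptions for illustration. A temperature below 1 sharpens the distribution (closer to argmax), and a temperature above 1 flattens it.

```python
import numpy as np


def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Hypothetical helper: draw one index from probs after temperature scaling."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype='float64')
    # rescale in log space; the small constant avoids log(0)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    scaled = np.exp(logits)
    scaled /= scaled.sum()  # renormalize to a probability distribution
    return int(rng.choice(len(scaled), p=scaled))


toy_probs = [0.1, 0.7, 0.2]
toy_rng = np.random.default_rng(0)
# a very low temperature behaves almost like greedy selection
print(sample_with_temperature(toy_probs, temperature=0.01, rng=toy_rng))  # 1
```

To try it in `decode_sequence`, the `numpy.argmax` line could be replaced with a call such as `sample_with_temperature(output_tokens[0, -1, :], temperature=0.5)`. One caveat: index 0 is the padding index and has no entry in `reverse_target_char_index`, so its probability should probably be set to zero before sampling.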


In [32]:
for seq_index in range(2100, 2120):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('English:       ', input_texts[seq_index])
    print('Spanish (true): ', target_texts[seq_index][1:-1])
    print('Spanish (pred): ', decoded_sentence[0:-1])


-
English:        is he right
Spanish (true):  esta el bien
Spanish (pred):  esta libre
-
English:        is he right
Spanish (true):  se encuentra bien
Spanish (pred):  esta libre
-
English:        is he right
Spanish (true):  ha acertado
Spanish (pred):  esta libre
-
English:        is he right
Spanish (true):  esta en lo cierto
Spanish (pred):  esta libre
-
English:        is he right
Spanish (true):  es lo que el dice
Spanish (pred):  esta libre
-
English:        is it clean
Spanish (true):  esta limpio
Spanish (pred):  esta limpio
-
English:        is it clean
Spanish (true):  esta limpia
Spanish (pred):  esta limpio
-
English:        is it there
Spanish (true):  esta ahi
Spanish (pred):  esta ahi
-
English:        is it to go
Spanish (true):  para llevar
Spanish (pred):  esta bien hoy
-
English:        is it windy
Spanish (true):  esta ventoso
Spanish (pred):  esta loco
-
English:        is it yours
Spanish (true):  es suyo
Spanish (pred):  es vuestro
-
English:        is it your

### 4.2. Translate an English sentence to the target language

1. Tokenization
2. One-hot encode
3. Translate

In [0]:
def transfer_str(str0, max_len=max_encoder_seq_length):
    """Convert a raw string to a zero-padded sequence of char indices."""
    str0 = str0.strip().lower()
    # keep at most max_len chars; skip chars that are not in the training vocabulary
    temp_vec = [input_token_index[ch] for ch in str0[:max_len] if ch in input_token_index]
    temp_vec = pad_sequences([temp_vec], maxlen=max_len, padding='post')
    return temp_vec


In [35]:
# input_sentence = 'why is that'

# input_sequence = <do tokenization...>

# input_x = <do one-hot encode...>

# translated_sentence = <do translation...>

# print('source sentence is: ' + input_sentence)
# print('translated sentence is: ' + translated_sentence)



input_sentence = 'why is that'
input_sequence = transfer_str(input_sentence)
input_x = onehot_encode(input_sequence, max_encoder_seq_length, num_encoder_tokens)
translated_sentence = decode_sequence(input_x)
print(translated_sentence)


por que esto



## 5. Evaluate the translation using BLEU score

Reference: 
- https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
- https://en.wikipedia.org/wiki/BLEU


**Hint:** 

- Randomly partition the dataset into training, validation, and test sets. 

- Evaluate the BLEU score using the test set. Report the average.

- A reasonable BLEU score should be 0.1 ~ 0.3.
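
To make the metric concrete, here is a simplified, pure-Python BLEU sketch: word-level n-gram precision up to bigrams, a geometric mean, and a brevity penalty, with one reference per sentence and no smoothing. It is not the full BLEU definition (which normally uses up to 4-grams and possibly several references), and the function names are made up; a library implementation such as the one described in the first reference above is the better choice for reporting results. The two sentence pairs are taken from the predictions printed in Section 4.1.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=2):
    """Simplified sentence-level BLEU (single reference, no smoothing)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # clipped overlap: a candidate n-gram counts at most as often as in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


pairs_true_pred = [('esta limpio', 'esta limpio'),
                   ('esta el bien', 'esta libre')]
scores = [simple_bleu(true, pred) for true, pred in pairs_true_pred]
print(scores)                     # [1.0, 0.0]
print(sum(scores) / len(scores))  # 0.5
```

Without smoothing, any sentence with no matching bigram scores 0, so short sentences tend to give very spiky scores; averaging over the whole test set is what makes the number meaningful.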

Randomly partition the dataset into training, validation, and test sets.

1. Shuffle the data set.

Data preparation and processing:

In [0]:
# load dataset
doc = load_doc(filename)

# split into Language1-Language2 pairs
pairs = to_pairs(doc)

# clean sentences
clean_pairs1 = clean_data(pairs)[0:60000, :]

In [86]:
# detect the number of elements in pairs
# type(clean_pairs1)
len(clean_pairs1)

60000

In [87]:
# look into data before shuffle
print(clean_pairs1[0:15, :])

[['go' 've' 'ccby france attribution tatoebaorg cm cueyayotl']
 ['go' 'vete' 'ccby france attribution tatoebaorg cm cueyayotl']
 ['go' 'vaya' 'ccby france attribution tatoebaorg cm cueyayotl']
 ['go' 'vayase' 'ccby france attribution tatoebaorg cm arh']
 ['hi' 'hola' 'ccby france attribution tatoebaorg cm leono']
 ['run' 'corre'
  'ccby france attribution tatoebaorg papabear elenitigormiti']
 ['run' 'corran' 'ccby france attribution tatoebaorg papabear cueyayotl']
 ['run' 'corra' 'ccby france attribution tatoebaorg papabear seael']
 ['run' 'corred' 'ccby france attribution tatoebaorg papabear seael']
 ['run' 'corred' 'ccby france attribution tatoebaorg jsakuragi arh']
 ['who' 'quien' 'ccby france attribution tatoebaorg ck shishir']
 ['wow' 'orale' 'ccby france attribution tatoebaorg zifre cueyayotl']
 ['fire' 'fuego' 'ccby france attribution tatoebaorg spamster shishir']
 ['fire' 'incendio'
  'ccby france attribution tatoebaorg spamster marcelostockle']
 ['fire' 'disparad'
  'ccby fran

In [88]:
#shuffle data
import numpy
numpy.random.shuffle(clean_pairs1)
# check data after shuffle
print(clean_pairs1[0:15, :])

[['the lake is frozen' 'el lago esta helado'
  'ccby france attribution tatoebaorg hybrid seael']
 ['she sleeps on her back' 'ella duerme de costado'
  'ccby france attribution tatoebaorg nancy']
 ['tom started flipping out' 'tom empezo a perder los papeles'
  'ccby france attribution tatoebaorg cm albrusgher']
 ['we rented the apartment' 'alquilamos el departamento'
  'ccby france attribution tatoebaorg eldad donramon']
 ['i want to hire you' 'te quiero contratar'
  'ccby france attribution tatoebaorg ck hayastan']
 ['i made a deal with tom' 'hice un trato con tom'
  'ccby france attribution tatoebaorg ck pchamorro']
 ['tom eventually confessed' 'tom finalmente confeso'
  'ccby france attribution tatoebaorg ck albrusgher']
 ['im waiting for a phone call' 'estoy esperando una llamada'
  'ccby france attribution tatoebaorg ck schuager']
 ['tom cut his finger' 'tomas se corto el dedo'
  'ccby france attribution tatoebaorg ck donramon']
 ['i dont feel up to it' 'no me siento capaz de hace

In [89]:
# after the random shuffle, keep the first n_full pairs as the full data set
# (note: n_full is 20000 here, not the 25000 = 20k training + 5k test mentioned originally);
# it will be split later
n_full = 20000
clean_pairs_f = clean_pairs1[0:n_full, :]
# check the partition (total number of pairs in the file: 122936)
len(clean_pairs_f)
# len(clean_test)

20000
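
The split into training, validation, and test sets mentioned in the hint could be done with a random permutation of row indices. This is only a sketch: the function name, the 80/10/10 fractions, and the seed are assumptions, not values prescribed by the assignment.

```python
import numpy as np


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical helper: disjoint random train/validation/test index sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)  # shuffled row indices 0..n-1
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(20000)
print(len(train_idx), len(val_idx), len(test_idx))  # 16000 2000 2000
```

The index arrays can then be used to slice the encoder and decoder arrays consistently, e.g. `encoder_input_data1[test_idx]`, so that the BLEU score is computed only on sentences the model was not trained on.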

In [90]:
for i in range(1000, 1010):
    print('[' + clean_pairs_f[i, 0] + '] => [' + clean_pairs_f[i, 1] + ']')

[im not sure im ready] => [no estoy segura de estar preparada]
[the milk tastes sour] => [la leche tiene un sabor agrio]
[tom is a slow walker] => [tom camina lentamente]
[can you open this door] => [puede abrir esta puerta]
[i have an older brother] => [tengo un hermano mayor]
[i cant stay here forever] => [no puedo quedarme aqui para siempre]
[tom had to resign] => [tom tuvo que renunciar]
[write your name in pencil] => [escribe tu nombre con lapiz]
[she has a beautiful voice] => [ella tiene una bonita voz]
[i used to eat pizza] => [yo solia comer pizza]


In [91]:
input_texts1 = clean_pairs_f[:, 0]
target_texts1 = ['\t' + text + '\n' for text in clean_pairs_f[:, 1]]

print('Length of input_texts:  ' + str(input_texts1.shape))
print('Length of target_texts: ' + str((len(target_texts1),)))  # target_texts1 is a list, so use len()

Length of input_texts:  (20000,)
Length of target_texts: (20000,)


In [92]:
max_encoder_seq_length1 = max(len(line) for line in input_texts1)
max_decoder_seq_length1 = max(len(line) for line in target_texts1)

print('max length of input  sentences: %d' % (max_encoder_seq_length1))
print('max length of target sentences: %d' % (max_decoder_seq_length1))

max length of input  sentences: 28
max length of target sentences: 62


In [93]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# encode and pad sequences
def text2sequences(max_len, lines):
    tokenizer = Tokenizer(char_level=True, filters='')
    tokenizer.fit_on_texts(lines)
    seqs = tokenizer.texts_to_sequences(lines)
    seqs_pad = pad_sequences(seqs, maxlen=max_len, padding='post')
    return seqs_pad, tokenizer.word_index, tokenizer


encoder_input_seq1, input_token_index1, english_tokenizer = text2sequences(max_encoder_seq_length1, 
                                                      input_texts1)
decoder_input_seq1, target_token_index1, spanish_tokenizer = text2sequences(max_decoder_seq_length1, 
                                                       target_texts1)

print('shape of encoder_input_seq1: ' + str(encoder_input_seq1.shape))
print('shape of input_token_index1: ' + str(len(input_token_index1)))
print('shape of decoder_input_seq1: ' + str(decoder_input_seq1.shape))
print('shape of target_token_index1: ' + str(len(target_token_index1)))

shape of encoder_input_seq1: (20000, 28)
shape of input_token_index1: 27
shape of decoder_input_seq1: (20000, 62)
shape of target_token_index1: 29
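
To make the encoding step concrete, here is a minimal plain-Python sketch of what character-level indexing followed by post-padding does. It is only a rough imitation of `Tokenizer(char_level=True)` and `pad_sequences(padding='post')`, not the Keras implementation; the helper names `char_index` and `encode_and_pad` and the toy sentences are made up for illustration, and ties in character frequency are broken alphabetically here just to keep the result deterministic.

```python
from collections import Counter

def char_index(lines):
    """Map each character to an id >= 1, most frequent first (id 0 is kept for padding)."""
    counts = Counter(ch for line in lines for ch in line)
    # sort by descending frequency, then by character so the order is deterministic
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i + 1 for i, ch in enumerate(ordered)}

def encode_and_pad(lines, index, max_len):
    """Turn each line into a list of ids and right-pad it with 0 up to max_len."""
    seqs = []
    for line in lines:
        ids = [index[ch] for ch in line]
        seqs.append(ids + [0] * (max_len - len(ids)))
    return seqs

# hypothetical toy targets, wrapped in '\t' (start) and '\n' (end) like target_texts1
toy_lines = ['\thola\n', '\tno\n']
toy_max_len = max(len(line) for line in toy_lines)
toy_index = char_index(toy_lines)
toy_seqs = encode_and_pad(toy_lines, toy_index, toy_max_len)
toy_vocab_size = len(toy_index) + 1  # +1 because id 0 is the padding slot

print(toy_seqs)        # [[1, 5, 3, 6, 4, 2], [1, 7, 3, 2, 0, 0]]
print(toy_vocab_size)  # 8
```

The trailing zeros in the second sequence are the same kind of padding that shows up at the end of each row of `decoder_input_seq1`.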


In [94]:
# +1 because the tokenizer indices start at 1 and index 0 is reserved for padding
num_encoder_tokens1 = len(input_token_index1) + 1
num_decoder_tokens1 = len(target_token_index1) + 1

print('num_encoder_tokens: ' + str(num_encoder_tokens1))
print('num_decoder_tokens: ' + str(num_decoder_tokens1))

num_encoder_tokens: 28
num_decoder_tokens: 30


In [95]:
target_texts1[100]

'\tcuando viniste a japon\n'

In [96]:
decoder_input_seq1[100, :]

array([10, 16, 13,  3,  6, 15,  4,  1, 19,  9,  6,  9,  5,  8,  2,  1,  3,
        1, 25,  3, 17,  4,  6, 11,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int32)

one-hot encode
 

In [97]:
import numpy
from keras.utils import to_categorical

# one-hot encode a batch of integer sequences
def onehot_encode(sequences, max_len, vocab_size):
    n = len(sequences)
    data = numpy.zeros((n, max_len, vocab_size))
    for i in range(n):
        data[i, :, :] = to_categorical(sequences[i], num_classes=vocab_size)
    return data

encoder_input_data1 = onehot_encode(encoder_input_seq1, max_encoder_seq_length1, num_encoder_tokens1)
decoder_input_data1 = onehot_encode(decoder_input_seq1, max_decoder_seq_length1, num_decoder_tokens1)

decoder_target_seq1 = numpy.zeros(decoder_input_seq1.shape)
decoder_target_seq1[:, 0:-1] = decoder_input_seq1[:, 1:]
decoder_target_data1 = onehot_encode(decoder_target_seq1, 
                                    max_decoder_seq_length1, 
                                    num_decoder_tokens1)

print(decoder_target_data1.shape)
print(encoder_input_data1.shape)
print(decoder_input_data1.shape)

(20000, 62, 30)
(20000, 28, 28)
(20000, 62, 30)
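
The decoder target is the decoder input shifted left by one step, so that at every position the network is trained to predict the next character. A small plain-Python sketch of that shift follows; the token ids are made up and `shift_left` and `one_hot` are hypothetical helpers, not functions from the notebook.

```python
def shift_left(seq, pad_id=0):
    """Drop the first token and append one padding id, keeping the length unchanged."""
    return seq[1:] + [pad_id]

def one_hot(seq, vocab_size):
    """One-hot encode a list of ids as a list of 0/1 rows."""
    return [[1 if j == tok else 0 for j in range(vocab_size)] for tok in seq]

# made-up ids: 1 = '\t' (start), 2 = '\n' (end), 0 = padding
toy_decoder_input = [1, 5, 3, 2, 0, 0]
toy_decoder_target = shift_left(toy_decoder_input)
toy_target_onehot = one_hot(toy_decoder_target, vocab_size=6)

print(toy_decoder_input)     # [1, 5, 3, 2, 0, 0]
print(toy_decoder_target)    # [5, 3, 2, 0, 0, 0]
print(toy_target_onehot[0])  # [0, 0, 0, 0, 0, 1]
```

Note that padding positions are one-hot encoded as class 0 too, so with this setup they presumably contribute to the loss like any other character.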


In [98]:
# split data here into training and test
training_size = 15000
# get training set
encoder_input_data_training = encoder_input_data1[0:training_size, :, : ]
decoder_input_data_training = decoder_input_data1[0:training_size, :, : ]
decoder_target_data_training = decoder_target_data1[0:training_size, :, : ]

print(encoder_input_data_training.shape)
print(decoder_input_data_training.shape)
print(decoder_target_data_training.shape)

# get test set
encoder_input_data_test = encoder_input_data1[training_size:, :, : ]
decoder_input_data_test = decoder_input_data1[training_size:, :, : ]
decoder_target_data_test = decoder_target_data1[training_size:, :, : ]

print(encoder_input_data_test.shape)
print(decoder_input_data_test.shape)
print(decoder_target_data_test.shape)

(15000, 28, 28)
(15000, 62, 30)
(15000, 62, 30)
(5000, 28, 28)
(5000, 62, 30)
(5000, 62, 30)


encoder network

In [0]:
# create BILSTM model

from keras.layers import LSTM,Bidirectional,Input,Concatenate
from keras.models import Model

latent_dim = 256
num_encoder_tokens = num_encoder_tokens1

# inputs of the encoder network
encoder_inputs = Input(shape=(None, num_encoder_tokens), 
                       name='encoder_inputs')

# set the Bi-LSTM layer and concatenate the final forward and backward states
encoder_bilstm = Bidirectional(LSTM(latent_dim, return_state=True, dropout=0.5, name='encoder_bilstm'))
encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]
# build the encoder network model
encoder_model_bi = Model(inputs = encoder_inputs, outputs = encoder_states, name = 'encoder')

In [100]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(encoder_model_bi, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=encoder_model_bi, show_shapes=False
    ,
    to_file='encoder.png'
)

encoder_model_bi.summary()

Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     (None, None, 28)     0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, 512), (None, 583680      encoder_inputs[0][0]             
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           bidirectional_1[0][1]            
                                                                 bidirectional_1[0][3]            
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 512)          0           bidirectional_1[0][2]      

decoder networks

In [0]:
from keras.layers import Input, LSTM, Dense
from keras.models import Model

num_decoder_tokens = num_decoder_tokens1

# inputs of the decoder network
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

decoder_state_input_h = Input(shape=(latent_dim *2,), name='decoder_input_h')
decoder_state_input_c = Input(shape=(latent_dim *2,), name='decoder_input_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

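# set the LSTM layer (unidirectional, despite the 'bilstm' name); it uses latent_dim * 2 units
# so that its state size matches the concatenated encoder states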
decoder_bilstm = LSTM(latent_dim * 2, return_sequences=True, return_state=True, dropout=0.5, name='decoder_bilstm')
decoder_bilstm_outputs, state_h1, state_c1 = decoder_bilstm(decoder_input_x, initial_state=decoder_states_inputs)

# set the dense layer
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_bilstm_outputs)

# build the decoder network model
decoder_model_bi = Model(inputs=[decoder_input_x, decoder_state_input_h, decoder_state_input_c],
                      outputs=[decoder_outputs, state_h1, state_c1],
                      name='decoder')

In [102]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(decoder_model_bi, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=decoder_model_bi, show_shapes=False
    ,
    to_file='decoder.png'
)

decoder_model_bi.summary()

Model: "decoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
decoder_input_h (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_input_c (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_bilstm (LSTM)           [(None, None, 512),  1112064     decoder_input_x[0][0]            
                                                                 decoder_input_h[0][0]      
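
The parameter counts printed in the two summaries can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, which gives 4 * (units * (input_dim + units) + units) weights. The sketch below applies this formula to the sizes used above; `lstm_params` is a helper introduced only for this check.

```python
def lstm_params(input_dim, units):
    """Number of trainable weights in one LSTM layer (4 gates: kernel, recurrent kernel, bias)."""
    return 4 * (units * (input_dim + units) + units)

latent_dim = 256

# encoder: two LSTMs (forward and backward) over 28-dimensional one-hot inputs
encoder_params = 2 * lstm_params(input_dim=28, units=latent_dim)

# decoder: one LSTM with 2 * latent_dim units over 30-dimensional one-hot inputs
decoder_params = lstm_params(input_dim=30, units=2 * latent_dim)

print(encoder_params)  # 583680
print(decoder_params)  # 1112064
```

Both numbers agree with the `Param #` column of the summaries, which also shows that the wider decoder LSTM holds almost twice as many weights as the whole Bi-LSTM encoder.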

connect the encoder and decoder

In [0]:
# input layers
encoder_input_x = Input(shape=(None, num_encoder_tokens), name='encoder_input_x')
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')



# run the encoder and use its final states to initialize the decoder LSTM
encoder_final_states = encoder_model_bi([encoder_input_x])
decoder_bilstm_output, state_h2, state_c2 = decoder_bilstm(decoder_input_x, initial_state=encoder_final_states)
decoder_states = [state_h2, state_c2]

# map the decoder outputs to token probabilities and build the training model
decoder_pred = decoder_dense(decoder_bilstm_output)
model_bi_01 = Model(inputs=[encoder_input_x, decoder_input_x], outputs=decoder_pred, name='model_training')

In [104]:
print(state_h)
print(decoder_state_input_h)

Tensor("concatenate_1/concat:0", shape=(?, 512), dtype=float32)
Tensor("decoder_input_h:0", shape=(?, 512), dtype=float32)


In [105]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(model_bi_01, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=model_bi_01, show_shapes=False
    ,
    to_file='model_training.png'
)

model_bi_01.summary()

Model: "model_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input_x (InputLayer)    (None, None, 28)     0                                            
__________________________________________________________________________________________________
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
encoder (Model)                 [(None, 512), (None, 583680      encoder_input_x[0][0]            
__________________________________________________________________________________________________
decoder_bilstm (LSTM)           [(None, None, 512),  1112064     decoder_input_x[0][0]            
                                                                 encoder[1][0]       

fit the model on the training set

In [106]:
# print(encoder_input_data_training.shape)
# print(decoder_input_data_training.shape)

print('shape of encoder_input_data_training' + str(encoder_input_data_training.shape))
print('shape of decoder_input_data_training' + str(decoder_input_data_training.shape))
print('shape of decoder_target_data_training' + str(decoder_target_data_training.shape))
print('shape of encoder_input_data_test' + str(encoder_input_data_test.shape))
print('shape of decoder_input_data_test' + str(decoder_input_data_test.shape))
print('shape of decoder_target_data_test' + str(decoder_target_data_test.shape))

shape of encoder_input_data_training(15000, 28, 28)
shape of decoder_input_data_training(15000, 62, 30)
shape of decoder_target_data_training(15000, 62, 30)
shape of encoder_input_data_test(5000, 28, 28)
shape of decoder_input_data_test(5000, 62, 30)
shape of decoder_target_data_test(5000, 62, 30)


In [107]:
model_bi_01.compile(optimizer='rmsprop', loss='categorical_crossentropy')

model_bi_01.fit([encoder_input_data_training, decoder_input_data_training],  # training data
          decoder_target_data_training,                       # labels (left shift of the target sequences)
          batch_size=64, epochs=25, validation_data= ([encoder_input_data_test, decoder_input_data_test], decoder_target_data_test))

model_bi_01.save('seq2seq_bi_02.h5')

Train on 15000 samples, validate on 5000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


make predictions


In [0]:
# Reverse-lookup token index to decode sequences back to something readable.
reverse_input_char_index1 = dict((i, char) for char, i in input_token_index1.items())
reverse_target_char_index1 = dict((i, char) for char, i in target_token_index1.items())

In [0]:
def decode_sequence(input_seq):
    states_value = encoder_model_bi.predict(input_seq)

    target_seq = numpy.zeros((1, 1, num_decoder_tokens1))
    target_seq[0, 0, target_token_index1['\t']] = 1.

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model_bi.predict([target_seq] + states_value)

        # greedy selection: always pick the most probable next token
        # (multinomial sampling with a temperature is a possible alternative)
        sampled_token_index = numpy.argmax(output_tokens[0, -1, :])
        
        sampled_char = reverse_target_char_index1[sampled_token_index]
        decoded_sentence += sampled_char

        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length1):
            stop_condition = True

        target_seq = numpy.zeros((1, 1, num_decoder_tokens1))
        target_seq[0, 0, sampled_token_index] = 1.

        states_value = [h, c]

    return decoded_sentence
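
The comment inside `decode_sequence` mentions multinomial sampling with a temperature as an alternative to the greedy `argmax`. Below is a minimal sketch of that idea in plain Python; `sample_with_temperature` is a hypothetical helper, not part of the notebook, and it assumes `probs` is one softmax output vector such as `output_tokens[0, -1, :]`.

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Rescale a probability vector by a temperature and draw one index from it.

    temperature < 1 sharpens the distribution (closer to greedy selection),
    temperature > 1 flattens it (more random output).
    """
    if temperature <= 0:
        raise ValueError('temperature must be positive')
    # divide the log-probabilities by the temperature; the floor avoids log(0)
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    # softmax over the rescaled logits (subtract the max for numerical stability)
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    scaled = [w / total for w in weights]
    index = rng.choices(range(len(scaled)), weights=scaled, k=1)[0]
    return index, scaled

toy_probs = [0.7, 0.2, 0.1]
toy_rng = random.Random(0)  # fixed seed so repeated runs draw the same index
toy_index, sharp = sample_with_temperature(toy_probs, temperature=0.5, rng=toy_rng)
_, flat = sample_with_temperature(toy_probs, temperature=2.0, rng=toy_rng)
print(sharp)
print(flat)
```

To try it in `decode_sequence`, the `numpy.argmax(...)` line could be swapped for the first return value of this function. Sampling makes the output non-deterministic, so BLEU scores would vary from run to run.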

In [109]:
for seq_index in range(2100, 2110):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data_training[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('English:       ', input_texts1[seq_index])
    print('Spanish (true): ', target_texts1[seq_index][1:-1])
    print('Spanish (pred): ', decoded_sentence[0:-1])

-
English:        toms answers were wrong
Spanish (true):  las respuestas de tom eran erroneas
Spanish (pred):  el trabajo es de en casa
-
English:        i won an award as well
Spanish (true):  yo tambien recibi un premio
Spanish (pred):  estaba en presionero en la casa
-
English:        its your turn next
Spanish (true):  despues viene tu turno
Spanish (pred):  es tu nombre estaba ocupado
-
English:        i never said anything
Spanish (true):  nunca dije nada
Spanish (pred):  nunca ha estado en el trabajo
-
English:        what have you heard tom
Spanish (true):  que has escuchado tom
Spanish (pred):  que estas en casa
-
English:        youre under arrest
Spanish (true):  quedas detenido
Spanish (pred):  eres muy ingeligente
-
English:        tom and i help each other
Spanish (true):  tom y yo nos ayudamos mutuamente
Spanish (pred):  tom y yo estara en pariera
-
English:        my aunt had three children
Spanish (true):  mi tia tenia tres hijos
Spanish (pred):  mi hizo es de esta en

calculate bleu score

In [0]:
# load datasets

train = clean_pairs_f[0:15000, 0:2]
test = clean_pairs_f[15000:, 0:2]
# prepare english tokenizer
eng_tokenizer = english_tokenizer
eng_vocab_size = num_encoder_tokens1
eng_length = max_encoder_seq_length1
# prepare spanish tokenizer
spa_tokenizer = spanish_tokenizer
spa_vocab_size = num_decoder_tokens1
spa_length = max_decoder_seq_length1
# prepare data
trainX = [encoder_input_data_training, decoder_input_data_training]
testX = [encoder_input_data_test, decoder_input_data_test]


# print(trainX)
# print(trainX.shape)

In [0]:
from numpy import array
from numpy import argmax
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
from nltk.translate.bleu_score import corpus_bleu

def pred_str(source):
    # decode every one-hot encoded input sentence in `source` and collect the predictions
    pred_str_list = list()
    for seq_index in range(0, len(source)):
        # take one sequence at a time, keeping the batch dimension
        input_seq = source[seq_index: seq_index + 1]
        # drop the last character of the decoded string (normally the '\n' end token)
        decoded_sentence = decode_sequence(input_seq)[0:-1]
        pred_str_list.append(decoded_sentence)
    return pred_str_list

In [119]:
input_seq_training = pred_str(encoder_input_data_training)
print(input_seq_training[1:10])

['ella le agare el carro', 'tom encino a la camisa', 'escuchamos el carro', 'quiero verlo', 'yo tengo un gran erabajo', 'tom se esta a dermiendo', 'estoy encontrado una cara a la casa', 'tom se quedo la camida', 'no me alguno de esto']


In [120]:
print(len(input_seq_training))

15000


In [116]:

input_seq_test = pred_str(encoder_input_data_test)
print(input_seq_test[1:10])



['quiero estar en cosa', 'tengo que decir eso', 'sabemos que tomas estuvieras', 'vi a un arigo', 'todas los gratos', 'eso es muy bien', 'quieres argo de acuerdo', 'ella es muy bien', 'ponte el trabajo']


In [45]:
print(len(input_seq_test))

5000

In [0]:
# function to calculate BLEU scores
# input_seq: list of predicted sentences; raw_dataset: array of [source, target] pairs
def evaluate_model(input_seq, raw_dataset):
  actual, predicted = list(), list()
  for i in range(0, len(input_seq)):
    translation = input_seq[i]
    raw_src, raw_target = raw_dataset[i]
    if i < 10:
      print('src=[%s], target=[%s], predicted=[%s]' % (raw_src, raw_target, translation))
    # corpus_bleu expects, for each sentence, a list of references (each a list of words)
    actual.append([raw_target.split()])
    predicted.append(translation.split())
  # calculate BLEU scores
  # BLEU-1 uses unigrams only
  print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
  # BLEU-2 to BLEU-4 also take higher-order n-grams into account
  print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
  # note: these weights sum to 0.9, not 1 (uniform weights would be 1/3 each)
  print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))
  print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))
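
BLEU is built on clipped (modified) n-gram precision: the share of n-grams in the prediction that also occur in the reference, where each n-gram is credited at most as often as it appears in the reference. The sketch below computes only this ingredient for a single sentence pair in plain Python. It is not a replacement for `corpus_bleu`, which additionally pools the counts over the whole corpus, combines the orders through a weighted geometric mean and applies a brevity penalty. The helper names `ngrams` and `modified_precision` are introduced just for this illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of one hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # each hypothesis n-gram is credited at most as often as it occurs in the reference
    overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return overlap / total if total else 0.0

# one of the word pairs printed by evaluate_model on the test set
toy_reference = 'ponte el sombrero'.split()
toy_hypothesis = 'ponte el trabajo'.split()

p1 = modified_precision(toy_reference, toy_hypothesis, 1)
p2 = modified_precision(toy_reference, toy_hypothesis, 2)
print(p1)  # 2 of the 3 predicted words occur in the reference
print(p2)  # 0.5
```

Because the predictions here are generated character by character but scored word by word, a single wrong character is enough to make a whole word count as a miss.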


In [121]:
# calculate bleu score for training

evaluate_model(input_seq_training, train)

src=[the lake is frozen], target=[el lago esta helado], predicted=[el tiene es algo]
src=[she sleeps on her back], target=[ella duerme de costado], predicted=[ella le agare el carro]
src=[tom started flipping out], target=[tom empezo a perder los papeles], predicted=[tom encino a la camisa]
src=[we rented the apartment], target=[alquilamos el departamento], predicted=[escuchamos el carro]
src=[i want to hire you], target=[te quiero contratar], predicted=[quiero verlo]
src=[i made a deal with tom], target=[hice un trato con tom], predicted=[yo tengo un gran erabajo]
src=[tom eventually confessed], target=[tom finalmente confeso], predicted=[tom se esta a dermiendo]
src=[im waiting for a phone call], target=[estoy esperando una llamada], predicted=[estoy encontrado una cara a la casa]
src=[tom cut his finger], target=[tomas se corto el dedo], predicted=[tom se quedo la camida]
src=[i dont feel up to it], target=[no me siento capaz de hacerlo], predicted=[no me alguno de esto]
BLEU-1: 0.2

In [118]:
# calculate bleu score for test

evaluate_model(input_seq_test, test)

src=[i dont know who to turn to], target=[no se a quien acudir], predicted=[no se que no habe eso]
src=[i want to go fishing], target=[quiero ir a pescar], predicted=[quiero estar en cosa]
src=[ive come here to help you], target=[he venido aqui a ayudarte], predicted=[tengo que decir eso]
src=[we know what you did], target=[sabemos lo que hiciste], predicted=[sabemos que tomas estuvieras]
src=[i saw a sleeping dog], target=[vi a un perro durmiendo], predicted=[vi a un arigo]
src=[make way please], target=[abran paso por favor], predicted=[todas los gratos]
src=[thatll work], target=[eso funcionara], predicted=[eso es muy bien]
src=[would you like to be famous], target=[te gustaria ser famosa], predicted=[quieres argo de acuerdo]
src=[she is cooking for him], target=[ella esta cocinando para el], predicted=[ella es muy bien]
src=[put on your cap], target=[ponte el sombrero], predicted=[ponte el trabajo]
BLEU-1: 0.264672
BLEU-2: 0.155016
BLEU-3: 0.113151
BLEU-4: 0.049260
