
## Machine Translation
By Alexander Siavichay

This notebook implements five models for machine translation. It is based on the original notebook for getting started with machine translation.


## Introduction
Below we build a neural network that forms part of a pipeline-based machine translation system. Pipelining is an implementation technique in which the execution of instructions is overlapped; today it is one of the technologies used to make processors faster[1]. Here the term is used in the looser sense of a chain of processing stages, each feeding the next.

The input is English text, and the output is its French translation.

- **Preprocessing** - Convert the text into sequences of integers.
- **Models** - Build models that accept a sequence of integers as input and return a probability distribution over possible translations.
- **Prediction** - Run the model on English text.

In [1]:
%load_ext autoreload
%aimport helper, tests
%autoreload 1

In [2]:
import collections

import helper
import numpy as np
import project_tests as tests

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model, Sequential
from keras.layers import GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional, Dropout, LSTM
from keras.layers.embeddings import Embedding
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy

Using TensorFlow backend.


### Checking GPU access
If a machine with a GPU is available (recommended, so that training does not take too long), the following code can be used to check the hardware.

In [4]:
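#from tensorflow.python.client import device_lib
import tensorflow as tf
# Note: the bare attribute reference below is a no-op; set_memory_growth only
# takes effect when it is called with a physical device and a boolean flag.
tf.config.experimental.set_memory_growth
# Passing an empty list hides every GPU from TensorFlow, so training runs on the CPU.
tf.config.experimental.set_visible_devices([], 'GPU')
#print(device_lib.list_local_devices())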

## Dataset
The pipeline will be trained and evaluated on a sample with a small vocabulary, in the style of the [WMT](http://www.statmt.org/) datasets.

### Loading the data
The data is located in `data/small_vocab_en` and `data/small_vocab_fr`. The file `small_vocab_en` contains the English sentences, and `small_vocab_fr` contains their French translations.

These files are loaded below:

In [3]:
# Load English data
english_sentences = helper.load_data('data/small_vocab_en')
# Load French data
french_sentences = helper.load_data('data/small_vocab_fr')

print('Dataset Loaded')

Dataset Loaded


### Files
A sample of the English sentences in `small_vocab_en` together with their French translations from `small_vocab_fr`.

In [4]:
for sample_i in range(2):
    print('small_vocab_en Line {}:  {}'.format(sample_i + 1, english_sentences[sample_i]))
    print('small_vocab_fr Line {}:  {}'.format(sample_i + 1, french_sentences[sample_i]))

small_vocab_en Line 1:  new jersey is sometimes quiet during autumn , and it is snowy in april .
small_vocab_fr Line 1:  new jersey est parfois calme pendant l' automne , et il est neigeux en avril .
small_vocab_en Line 2:  the united states is usually chilly during july , and it is usually freezing in november .
small_vocab_fr Line 2:  les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .


The texts have already been preprocessed: punctuation is delimited by spaces, and everything has been converted to lowercase.

### Vocabulary
The complexity of the problem is determined by the complexity of the vocabulary.

In [5]:
english_words_counter = collections.Counter([word for sentence in english_sentences for word in sentence.split()])
french_words_counter = collections.Counter([word for sentence in french_sentences for word in sentence.split()])

print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))
print('{} unique English words.'.format(len(english_words_counter)))
print('10 Most common words in the English dataset:')
print('"' + '" "'.join(list(zip(*english_words_counter.most_common(10)))[0]) + '"')
print()
print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))
print('{} unique French words.'.format(len(french_words_counter)))
print('10 Most common words in the French dataset:')
print('"' + '" "'.join(list(zip(*french_words_counter.most_common(10)))[0]) + '"')

1823250 English words.
227 unique English words.
10 Most common words in the English dataset:
"is" "," "." "in" "it" "during" "the" "but" "and" "sometimes"

1961295 French words.
355 unique French words.
10 Most common words in the French dataset:
"est" "." "," "en" "il" "les" "mais" "et" "la" "parfois"


## Preprocessing
The text is converted into sequences of integers in two steps:

1. Tokenize the words, which generates an id for each word.
2. Add padding so that all sequences have the same length.

### Tokenize (IMPLEMENTATION)

The data is converted to numbers so that it can be processed by the neural network.

Each sentence can be converted into a sequence of word ids using the Keras [`Tokenizer`](https://keras.io/preprocessing/text/#tokenizer) class.
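To make the idea concrete without depending on Keras, the following is a minimal pure-Python sketch of word-to-id tokenization. The names `build_word_index` and `to_ids` are hypothetical, and the token filtering is a simplification (it drops every token that is not purely alphanumeric), so it should not be read as the exact behaviour of the Keras class.

```python
from collections import Counter


def build_word_index(sentences):
    """Map each word to an integer id, most frequent word first.

    Ids start at 1 so that 0 stays free for padding.
    """
    counts = Counter()
    for sentence in sentences:
        # Simplification: lowercase, split on spaces, drop non-alphanumeric tokens.
        counts.update(w for w in sentence.lower().split() if w.isalnum())
    # most_common() keeps first-seen order among words with equal counts.
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def to_ids(sentence, word_index):
    """Replace each known word in a sentence by its id."""
    return [word_index[w] for w in sentence.lower().split() if w in word_index]


sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .',
]
word_index = build_word_index(sentences)
print(to_ids(sentences[0], word_index))
```

Reserving id 0 matters later: the padding step fills short sequences with zeros, so no real word may use that id.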

In [6]:
def tokenize(x):
    """
    Tokenize x
    :param x: List of sentences/strings to be tokenized
    :return: Tuple of (tokenized x data, tokenizer used to tokenize x)
    """
    # TODO: Implement
    text_tokenizer = Tokenizer()
    text_tokenizer.fit_on_texts(x)
    text_tokenized = text_tokenizer.texts_to_sequences(x)
    return text_tokenized, text_tokenizer
tests.test_tokenize(tokenize)

# Tokenize Example output
text_sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .']
text_tokenized, text_tokenizer = tokenize(text_sentences)

print(text_tokenizer.word_index)
print()

for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(sent))
    print('  Output: {}'.format(token_sent))

{'the': 1, 'quick': 2, 'a': 3, 'brown': 4, 'fox': 5, 'jumps': 6, 'over': 7, 'lazy': 8, 'dog': 9, 'by': 10, 'jove': 11, 'my': 12, 'study': 13, 'of': 14, 'lexicography': 15, 'won': 16, 'prize': 17, 'this': 18, 'is': 19, 'short': 20, 'sentence': 21}

Sequence 1 in x
  Input:  The quick brown fox jumps over the lazy dog .
  Output: [1, 2, 4, 5, 6, 7, 1, 8, 9]
Sequence 2 in x
  Input:  By Jove , my quick study of lexicography won a prize .
  Output: [10, 11, 12, 2, 13, 14, 15, 16, 3, 17]
Sequence 3 in x
  Input:  This is a short sentence .
  Output: [18, 19, 3, 20, 21]


### Padding (IMPLEMENTATION)
Padding gives every sequence the same length. Sentences vary in length, so shorter sequences are extended with zeros at the end until all of them match.
For this we use the [`pad_sequences`](https://keras.io/preprocessing/sequence/#pad_sequences) function.
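As an illustration of what post-padding does, here is a small pure-Python sketch. The function name `pad_post` and its `pad_id` parameter are hypothetical, and cutting over-long sequences at the end is an assumption of this sketch rather than a statement about the Keras defaults.

```python
def pad_post(sequences, length=None, pad_id=0):
    """Return copies of the sequences, all extended to the same length.

    If length is None, the length of the longest sequence is used.
    """
    if length is None:
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        kept = list(seq[:length])  # assumption: over-long input is cut at the end
        padded.append(kept + [pad_id] * (length - len(kept)))
    return padded


token_ids = [
    [1, 2, 4, 5, 6, 7, 1, 8, 9],
    [10, 11, 12, 2, 13, 14, 15, 16, 3, 17],
    [18, 19, 3, 20, 21],
]
for row in pad_post(token_ids):
    print(row)
```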

In [7]:
def pad(x, length=None):
    """
    Pad x
    :param x: List of sequences.
    :param length: Length to pad the sequence to.  If None, use length of longest sequence in x.
    :return: Padded numpy array of sequences
    """
    # Default to the length of the longest sequence in x
    if length is None:
        length = max(len(sentence) for sentence in x)
    # Append zeros to the end of each sequence ('post' padding)
    x_padded = pad_sequences(x, maxlen=length, dtype='int32', padding='post')
    return x_padded

tests.test_pad(pad)

# Pad Tokenized output
test_pad = pad(text_tokenized)
for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(np.array(token_sent)))
    print('  Output: {}'.format(pad_sent))

Sequence 1 in x
  Input:  [1 2 4 5 6 7 1 8 9]
  Output: [1 2 4 5 6 7 1 8 9 0]
Sequence 2 in x
  Input:  [10 11 12  2 13 14 15 16  3 17]
  Output: [10 11 12  2 13 14 15 16  3 17]
Sequence 3 in x
  Input:  [18 19  3 20 21]
  Output: [18 19  3 20 21  0  0  0  0  0]


### Preprocessing pipeline
The focus of this notebook is building neural network architectures, so the preprocessing is wrapped in a single function: `preprocess` tokenizes and pads both the English and the French sentences.

In [8]:
def preprocess(x, y):
    """
    Preprocess x and y
    :param x: Feature List of sentences
    :param y: Label List of sentences
    :return: Tuple of (Preprocessed x, Preprocessed y, x tokenizer, y tokenizer)
    """
    preprocess_x, x_tk = tokenize(x)
    preprocess_y, y_tk = tokenize(y)

    preprocess_x = pad(preprocess_x)
    preprocess_y = pad(preprocess_y)

    # Keras's sparse_categorical_crossentropy function requires the labels to be in 3 dimensions
    preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)

    return preprocess_x, preprocess_y, x_tk, y_tk

preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer =\
    preprocess(english_sentences, french_sentences)
    
max_english_sequence_length = preproc_english_sentences.shape[1]
max_french_sequence_length = preproc_french_sentences.shape[1]
english_vocab_size = len(english_tokenizer.word_index)
french_vocab_size = len(french_tokenizer.word_index)

print('Data Preprocessed')
print("Max English sentence length:", max_english_sequence_length)
print("Max French sentence length:", max_french_sequence_length)
print("English vocabulary size:", english_vocab_size)
print("French vocabulary size:", french_vocab_size)

Data Preprocessed
Max English sentence length: 15
Max French sentence length: 21
English vocabulary size: 199
French vocabulary size: 345


## Models
This is the core of the document: building neural network architectures. We start with a simple network and then modify it, adding features to the architecture step by step.

- Model 1 is a simple RNN
- Model 2 is an RNN with an embedding layer
- Model 3 is a bidirectional RNN
- Model 4 is an encoder-decoder RNN

Finally, a custom network architecture is built.

### Converting ids back to text
The network outputs a probability distribution over word ids at each position, which is not yet the French translation. The `logits_to_text` function bridges that gap by turning the network output into French text.
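The decoding step can be sketched in plain Python as follows. This is only an illustration under assumed toy data: `decode`, `toy_index` and `toy_probabilities` are hypothetical names, and the vocabulary contains just three words plus the padding id 0.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def decode(probabilities, word_index):
    """Pick the most likely id at each position and join the matching words."""
    index_to_word = {i: w for w, i in word_index.items()}
    index_to_word[0] = '<PAD>'  # id 0 is reserved for padding
    return ' '.join(index_to_word[argmax(row)] for row in probabilities)


toy_index = {'il': 1, 'est': 2, 'calme': 3}
# One row per output position, one column per id (0 = padding).
toy_probabilities = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.2, 0.1, 0.7],
    [0.9, 0.05, 0.03, 0.02],
]
print(decode(toy_probabilities, toy_index))
```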

In [9]:
def logits_to_text(logits, tokenizer):
    """
    Turn logits from a neural network into text using the tokenizer
    :param logits: Logits from a neural network
    :param tokenizer: Keras Tokenizer fit on the labels
    :return: String that represents the text of the logits
    """
    index_to_words = {id: word for word, id in tokenizer.word_index.items()}
    index_to_words[0] = '<PAD>'

    return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])

print('`logits_to_text` function loaded.')

`logits_to_text` function loaded.


### Model 1: RNN (IMPLEMENTATION)
![RNN](images/rnn.png)
This is a basic model that translates English to French.
The first architecture uses the Keras `Sequential` class, which builds a model by stacking layers linearly. The shape of the input must be specified. The model is then compiled with the `compile` method, where we set parameters such as the `accuracy` metric. Recall that the words were converted to numbers, so the data is handled as NumPy arrays and passed to the `fit` method[2].

The softmax function is an activation function that converts its inputs into probabilities that sum to one. It generalizes the logistic function to several classes, so its output is a probability distribution in which every value lies between 0 and 1. The loss during training is computed separately, here by `sparse_categorical_crossentropy`, which compares that distribution with the correct word id.
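The relationship between softmax and the loss can be sketched with the standard library alone. The helper names `softmax` and `sparse_crossentropy` are hypothetical stand-ins for what Keras does internally for a single output position; the scores are made-up values.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_crossentropy(probs, target_id):
    """Loss for one position: negative log-probability of the correct id."""
    return -math.log(probs[target_id])


scores = [2.0, 1.0, 0.1]  # made-up raw scores for a 3-word vocabulary
probs = softmax(scores)
print(probs, sum(probs))
print(sparse_crossentropy(probs, 0))  # small loss: id 0 is the most likely
print(sparse_crossentropy(probs, 2))  # larger loss: id 2 is unlikely
```

The label is a single integer id rather than a one-hot vector, which is why the "sparse" variant of the loss is used.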

In [13]:
def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a basic RNN on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # Hyperparameters
    learning_rate = 0.005
    
    # TODO: Build the layers
    model = Sequential()
    model.add(GRU(256, input_shape=input_shape[1:], return_sequences=True))
    model.add(TimeDistributed(Dense(1024, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax'))) 

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model

#tests.test_simple_model(simple_model)

# Reshaping the input to work with a basic RNN
tmp_x = pad(preproc_english_sentences, max_french_sequence_length)
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

# Train the neural network
#simple_rnn_model = simple_model(
#    tmp_x.shape,
#    max_french_sequence_length-1,
#    english_vocab_size,
#    french_vocab_size)


simple_rnn_model = simple_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)

print(simple_rnn_model.summary())
simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))




Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_2 (GRU)                  (None, 21, 256)           198144    
_________________________________________________________________
time_distributed_3 (TimeDist (None, 21, 1024)          263168    
_________________________________________________________________
dropout_2 (Dropout)          (None, 21, 1024)          0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, 21, 346)           354650    
Total params: 815,962
Trainable params: 815,962
Non-trainable params: 0
_________________________________________________________________
None
Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
new jersey est parfois parfois en l' et il est il en en <PAD> <PAD> <PAD> <PAD>

In [14]:
# Print prediction(s)
print("Prediction:")
print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

print("\nCorrect Translation:")
print(french_sentences[:1])

print("\nOriginal text:")
print(english_sentences[:1])

Prediction:
new jersey est parfois parfois en l' et il est il en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>

Correct Translation:
["new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."]

Original text:
['new jersey is sometimes quiet during autumn , and it is snowy in april .']


## Analysis
The first model reaches an accuracy of 72%. The batch size is 1024, which shortens processing time, but the prediction shows that translation quality is still low.

The following models work on these aspects in order to improve performance.

### Model 2: Embedding (IMPLEMENTATION)
![RNN](images/embedding.png)
The first model worked with numeric word ids, but word embeddings are a better way to represent words. An embedding is simply a vector that represents each word, and that is the idea behind the next model.
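Conceptually, an embedding layer is a lookup table with one vector per word id. The sketch below illustrates this with hypothetical names (`make_embedding_table`, `embed`) and random values; in a real model the vectors are learned during training rather than fixed.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Create one small random vector per word id (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(ids, table):
    """Replace every id in a sequence by its vector."""
    return [table[i] for i in ids]


table = make_embedding_table(vocab_size=5, dim=3)
vectors = embed([1, 2, 1, 0], table)
print(len(vectors), 'positions,', len(vectors[0]), 'values each')
```

A sequence of length 21 therefore becomes a 21 x `dim` matrix, which matches the `(None, 21, 256)` output shape reported for the embedding layer in the model summary further down.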

In [23]:
def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a RNN model using word embedding on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    number_units = 100
    model = Sequential()
    model.add(Embedding(input_dim=english_vocab_size, output_dim=20))
    model.add(GRU(number_units, return_sequences=True, input_shape=input_shape[1:]))
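    # Note: no activation here, so the layer emits raw scores, while the loss
    # (with its default settings) expects probabilities such as softmax output.
    model.add(TimeDistributed(Dense(french_vocab_size)))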
    
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(lr=0.0014, decay=0.00004),
                  metrics=['accuracy'])
   
    return model
#tests.test_embed_model(embed_model)

# TODO: Reshape the input
tmp_x = pad(preproc_english_sentences, 21)


#simple_rnn_model = simple_model(
#    tmp_x.shape,
#    preproc_french_sentences.shape[1],
#    len(english_tokenizer.word_index)+1,
#    len(french_tokenizer.word_index)+1)

#model = embed_model(tmp_x.shape,
#                    max_french_sequence_length,
#                    english_vocab_size,
#                    french_vocab_size)

model = embed_model(tmp_x.shape,
                    preproc_french_sentences.shape[1],
                    len(english_tokenizer.word_index)+1,
                    len(french_tokenizer.word_index)+1)
# TODO: Train the neural network
model.fit(tmp_x, 
          preproc_french_sentences, 
          batch_size=256, 
          epochs=10, 
          validation_split=0.2)

# TODO: Print prediction(s)
print(logits_to_text(model.predict(tmp_x[:1])[0], french_tokenizer))

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
nous nous est parfois calme calme l' l' novembre novembre novembre novembre est en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


## Analysis
Our first embedding model performs worse than the previous model, with an accuracy of roughly 60%. The first model used a batch size of 1024; here it was reduced to 256, which appears to hurt the final result.

For the next attempt we also raise the learning rate to 0.005 and add a dense layer with `relu` activation, followed by an output layer with `softmax`.

In [18]:
def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a RNN model using word embedding on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement

    # Hyperparameters
    learning_rate = 0.005
    
    # TODO: Build the layers
    model = Sequential()
    model.add(Embedding(english_vocab_size, 256, input_length=input_shape[1], input_shape=input_shape[1:]))
    model.add(GRU(256, return_sequences=True))    
    model.add(TimeDistributed(Dense(1024, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax'))) 

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model

#tests.test_embed_model(embed_model)

# TODO: Reshape the input
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

# TODO: Train the neural network
embed_rnn_model = embed_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)

embed_rnn_model.summary()

embed_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=512, epochs=10, validation_split=0.2)

# TODO: Print prediction(s)
print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 21, 256)           51200     
_________________________________________________________________
gru_5 (GRU)                  (None, 21, 256)           393984    
_________________________________________________________________
time_distributed_9 (TimeDist (None, 21, 1024)          263168    
_________________________________________________________________
dropout_5 (Dropout)          (None, 21, 1024)          0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, 21, 346)           354650    
Total params: 1,063,002
Trainable params: 1,063,002
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
new jersey est parfois calme en l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


In [19]:
# Print prediction(s)
print("Prediction:")
print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

print("\nCorrect Translation:")
print(french_sentences[:1])

print("\nOriginal text:")
print(english_sentences[:1])

Prediction:
new jersey est parfois calme en l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>

Correct Translation:
["new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."]

Original text:
['new jersey is sometimes quiet during autumn , and it is snowy in april .']


## Analysis
With the changes mentioned above, the improvement is notable: accuracy reaches 93%, and the prediction is much better.

### Model 3: Bidirectional RNN (IMPLEMENTATION)
![RNN](images/bidirectional.png)
A limitation of a standard recurrent network is that it cannot see future inputs, so its outputs depend only on past inputs. A bidirectional recurrent network addresses this: it also processes the sequence in reverse, so each position has access to later inputs as well.
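The effect can be illustrated with a toy recurrence in plain Python. This is a sketch, not a GRU: `run_forward` uses a hypothetical decaying sum as the recurrent state, and `run_bidirectional` simply pairs a left-to-right pass with a right-to-left pass.

```python
def run_forward(inputs, decay=0.5):
    """Toy recurrence: each state mixes the previous state with the current input."""
    state, states = 0.0, []
    for x in inputs:
        state = decay * state + x
        states.append(state)
    return states


def run_bidirectional(inputs):
    """Pair the left-to-right state with the right-to-left state at each position."""
    forward = run_forward(inputs)
    backward = run_forward(inputs[::-1])[::-1]
    return list(zip(forward, backward))


# Only the last input is non-zero.
states = run_bidirectional([0.0, 0.0, 1.0])
for position, (fwd, bwd) in enumerate(states):
    print(position, fwd, bwd)
```

At position 0 the forward state is still zero because nothing has been seen yet, while the backward state is already non-zero: it has seen the input at the end of the sequence.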

In [25]:
def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a bidirectional RNN model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement
    number_units = 100
    model = Sequential()
    model.add(Bidirectional(GRU(number_units, return_sequences=True), input_shape=input_shape[1:]))
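    # Note: no activation here, so the layer emits raw scores, while the loss
    # (with its default settings) expects probabilities such as softmax output.
    model.add(TimeDistributed(Dense(french_vocab_size)))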
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(lr=0.0014, decay=0.00003),
                  metrics=['accuracy'])
   
    return model 
    
tmp_x = pad(preproc_english_sentences, 21)
tmp_x = tmp_x.reshape((-1, 21, 1))
#model = bd_model(tmp_x.shape,
#                 max_french_sequence_length,
#                 english_vocab_size,
#                 french_vocab_size)

model = bd_model(tmp_x.shape,
                 preproc_french_sentences.shape[1],
                 len(english_tokenizer.word_index)+1,
                 len(french_tokenizer.word_index)+1)

#preproc_french_sentences.shape[1],
#                    len(english_tokenizer.word_index)+1,
#                    len(french_tokenizer.word_index)+1



model.fit(tmp_x, 
          preproc_french_sentences, 
          batch_size=500, 
          epochs=10, 
          validation_split=0.2)


print(logits_to_text(model.predict(tmp_x[:1])[0], french_tokenizer))
# TODO: Train and Print prediction(s)

Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
le le et et le le le et et et le et le <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


In [20]:
def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a bidirectional RNN model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement

    # Hyperparameters
    learning_rate = 0.003
    
    # TODO: Build the layers
    model = Sequential()
    model.add(Bidirectional(GRU(128, return_sequences=True), input_shape=input_shape[1:]))
    model.add(TimeDistributed(Dense(1024, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax'))) 

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model

#tests.test_bd_model(bd_model)

# TODO: Reshape the input
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

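# TODO: Train and Print prediction(s)
# NOTE: this call builds embed_model, not the bd_model defined above, so the
# summary and prediction that follow come from the embedding model. Training
# bd_model would also need a 3-D input of shape (-1, sequence_length, 1).
embed_rnn_model = embed_model(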
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)

embed_rnn_model.summary()

embed_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=512, epochs=10, validation_split=0.2)

print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 21, 256)           51200     
_________________________________________________________________
gru_6 (GRU)                  (None, 21, 256)           393984    
_________________________________________________________________
time_distributed_11 (TimeDis (None, 21, 1024)          263168    
_________________________________________________________________
dropout_6 (Dropout)          (None, 21, 1024)          0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, 21, 346)           354650    
Total params: 1,063,002
Trainable params: 1,063,002
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
new jersey est parfois calme en l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


In [21]:
# Print prediction(s)
print("Prediction:")
print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

print("\nCorrect Translation:")
print(french_sentences[:1])

print("\nOriginal text:")
print(english_sentences[:1])

Prediction:
new jersey est parfois calme en l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>

Correct Translation:
["new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."]

Original text:
['new jersey is sometimes quiet during autumn , and it is snowy in april .']


## Analysis
Again, the first attempt gave a very poor result, so the model was modified while keeping a batch size of 512. As with the previous network, a layer with `relu` activation was added, which noticeably improved accuracy.

### Model 4: Encoder-Decoder (IMPLEMENTATION)
Time to look at encoder-decoder models.  This model is made up of an encoder and decoder. The encoder creates a matrix representation of the sentence.  The decoder takes this matrix as input and predicts the translation as output.

Create an encoder-decoder model in the cell below.
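The data flow of such a model can be sketched without Keras. The sketch below assumes the simplest variant, in which the encoder compresses the whole input into a single vector and that vector is repeated once per output position before decoding; `encode` and `repeat_vector` are hypothetical helpers, and the toy recurrence stands in for a real recurrent layer.

```python
def encode(inputs, decay=0.5):
    """Toy encoder: fold the whole input sequence into one summary vector."""
    state = 0.0
    for x in inputs:
        state = decay * state + x
    return [state]  # a 1-dimensional "sentence vector"


def repeat_vector(vector, times):
    """Give the decoder one copy of the summary per output position."""
    return [list(vector) for _ in range(times)]


english_ids = [3, 1, 4, 1, 5]
output_sequence_length = 8

summary = encode(english_ids)
decoder_inputs = repeat_vector(summary, output_sequence_length)
print(len(english_ids), '->', len(summary), '->', len(decoder_inputs))
```

Because the input is reduced to one vector first, the input and output sequences no longer need to have the same length.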

In [None]:
def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train an encoder-decoder model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement
    # The learning rate is set directly in the Adam optimizer below.
    model = Sequential()
    model.add(GRU(128, input_shape = input_shape[1:], return_sequences = False))
    model.add(RepeatVector(output_sequence_length))
    model.add(GRU(128, return_sequences = True))
    model.add(TimeDistributed(Dense(french_vocab_size, activation = 'softmax')))
    
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(lr=0.0014, decay=0.00003),
                  metrics=['accuracy'])
    return model


tmp_x = pad(preproc_english_sentences)
tmp_x = tmp_x.reshape((-1, preproc_english_sentences.shape[1], 1))

encodeco_model = encdec_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)
encodeco_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)

# TODO: Train and Print prediction(s)
print(logits_to_text(encodeco_model.predict(tmp_x[:1])[0], french_tokenizer))

In [28]:
def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train an encoder-decoder model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # OPTIONAL: Implement
    
    # Hyperparameters
    learning_rate = 0.001
    
    # Build the layers    
    model = Sequential()
    # Encoder
    model.add(GRU(256, input_shape=input_shape[1:], go_backwards=True))
    model.add(RepeatVector(output_sequence_length))
    # Decoder
    model.add(GRU(256, return_sequences=True))
    model.add(TimeDistributed(Dense(1024, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    
    return model

#tests.test_encdec_model(encdec_model)

# Reshape the input
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

# Train and Print prediction(s)
encdec_rnn_model = encdec_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)

encdec_rnn_model.summary()

encdec_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=512, epochs=10, validation_split=0.2)

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_11 (GRU)                 (None, 256)               198144    
_________________________________________________________________
repeat_vector_3 (RepeatVecto (None, 21, 256)           0         
_________________________________________________________________
gru_12 (GRU)                 (None, 21, 256)           393984    
_________________________________________________________________
time_distributed_17 (TimeDis (None, 21, 1024)          263168    
_________________________________________________________________
dropout_9 (Dropout)          (None, 21, 1024)          0         
_________________________________________________________________
time_distributed_18 (TimeDis (None, 21, 346)           354650    
Total params: 1,209,946
Trainable params: 1,209,946
Non-trainable params: 0
____________________________________________

<keras.callbacks.callbacks.History at 0x16a4c55f388>
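
The parameter counts in the summary above can be checked by hand. The sketch below assumes the classic GRU parameterization (three gates, each with an input kernel, a recurrent kernel, and one bias vector), which is consistent with the reported numbers. The helper names `gru_params` and `dense_params` are illustrative and not part of Keras.

```python
def gru_params(input_dim, units):
    # Three gates (update, reset, candidate), each with an input kernel,
    # a recurrent kernel, and a single bias vector.
    return 3 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input/output pair plus one bias per output unit.
    return input_dim * units + units


# Layer names and sizes are taken from the printed summary.
layers = {
    "gru_11": gru_params(1, 256),
    "gru_12": gru_params(256, 256),
    "time_distributed_17": dense_params(256, 1024),
    "time_distributed_18": dense_params(1024, 346),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"Total params: {total:,}")
```

`RepeatVector` and `Dropout` have no weights, so they contribute nothing to the total.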

In [29]:
# Print prediction(s)
print("Prediction:")
print(logits_to_text(encdec_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

print("\nCorrect Translation:")
print(french_sentences[:1])

print("\nOriginal text:")
print(english_sentences[:1])

Prediction:
new jersey est parfois chaud en mois de il est il est en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>

Correct Translation:
["new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."]

Original text:
['new jersey is sometimes quiet during autumn , and it is snowy in april .']
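
The prediction printed above comes from turning the per-step probability distributions into words. A minimal pure-Python sketch of that idea follows. It assumes the decoding simply takes the most probable id at each time step and treats id 0 as `<PAD>`. The function `decode_argmax` and the toy data are hypothetical stand-ins, not the notebook's `logits_to_text`.

```python
def decode_argmax(step_probs, index_to_word, pad_token="<PAD>"):
    """Pick the most probable word id at each time step and join the words."""
    words = []
    for probs in step_probs:
        # Index of the largest probability in this time step.
        best_id = max(range(len(probs)), key=probs.__getitem__)
        # Ids missing from the vocabulary (here: id 0) are shown as padding.
        words.append(index_to_word.get(best_id, pad_token))
    return " ".join(words)


# Hypothetical toy vocabulary and probabilities (4 time steps, 4 word ids).
toy_index_to_word = {1: "new", 2: "jersey", 3: "est"}
toy_probs = [
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.00, 0.20, 0.10, 0.70],
    [0.90, 0.05, 0.03, 0.02],
]

decoded = decode_argmax(toy_probs, toy_index_to_word)
print(decoded)  # new jersey est <PAD>
```

Because each time step is decoded independently, nothing prevents repeated words such as "il est il est" in the prediction above.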


### Model 5: Custom (IMPLEMENTATION)
Use everything you learned from the previous models to build a single model that combines an embedding layer, an encoder-decoder structure, and bidirectional RNNs.

In [29]:
def model_final(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build a model that incorporates embedding, encoder-decoder, and bidirectional RNN
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # Hyperparameters
    learning_rate = 0.005

    model = Sequential()
    # Embedding: map each English word id to a 128-dimensional vector
    model.add(Embedding(input_dim=english_vocab_size, output_dim=128, input_length=input_shape[1]))
    # Encoder: compress the whole input sequence into a single vector
    model.add(Bidirectional(GRU(256, return_sequences=False)))
    # Repeat the encoded vector once per output time step
    model.add(RepeatVector(output_sequence_length))
    # Decoder: produce one hidden state per output time step
    model.add(Bidirectional(GRU(256, return_sequences=True)))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))

    # Compile model
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])

    return model

#tests.test_model_final(model_final)


print('Final Model Loaded')
# TODO: Train the final model



Final Model Loaded


In [22]:
def model_final(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build a model that incorporates embedding, encoder-decoder, and bidirectional RNN
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement

    # Hyperparameters
    learning_rate = 0.003
    
    # Build the layers    
    model = Sequential()
    # Embedding (input_length already fixes the input shape)
    model.add(Embedding(english_vocab_size, 128, input_length=input_shape[1]))
    # Encoder
    model.add(Bidirectional(GRU(128)))
    model.add(RepeatVector(output_sequence_length))
    # Decoder
    model.add(Bidirectional(GRU(128, return_sequences=True)))
    model.add(TimeDistributed(Dense(512, activation='relu')))
    model.add(Dropout(0.5))
    model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model

#tests.test_model_final(model_final)

print('Final Model Loaded')

Final Model Loaded


## Prediction (IMPLEMENTATION)

In [None]:
def final_predictions(x, y, x_tk, y_tk):
    
    """
    Gets predictions using the final model
    :param x: Preprocessed English data
    :param y: Preprocessed French data
    :param x_tk: English tokenizer
    :param y_tk: French tokenizer
    """
    # Train the neural network using model_final, relying only on the function arguments
    model = model_final(x.shape,
                        y.shape[1],
                        len(x_tk.word_index)+1,
                        len(y_tk.word_index)+1)

    model.fit(x, y, batch_size=1024, epochs=17, validation_split=0.2)
    
    
    ## DON'T EDIT ANYTHING BELOW THIS LINE
    y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
    y_id_to_word[0] = '<PAD>'

    sentence = 'he saw a old yellow truck'
    sentence = [x_tk.word_index[word] for word in sentence.split()]
    sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
    sentences = np.array([sentence[0], x[0]])
    predictions = model.predict(sentences, len(sentences))

    print('Sample 1:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
    print('Il a vu un vieux camion jaune')
    print('Sample 2:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
    print(' '.join([y_id_to_word[np.max(x)] for x in y[0]]))


final_predictions(preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer)
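
The sample sentence in the cell above is converted to word ids and then right-padded to the input length. The sketch below illustrates that step in pure Python. `pad_post` is an illustrative stand-in for `pad_sequences(..., padding='post')`, the vocabulary is a hypothetical toy one, and the length of 15 is assumed from the embedding input shape reported in the model summary.

```python
def pad_post(ids, maxlen, pad_id=0):
    """Right-pad a list of word ids with pad_id up to maxlen."""
    # Like pad_sequences with its default truncating='pre', an over-long
    # sequence keeps only its last maxlen ids.
    ids = ids[-maxlen:]
    return ids + [pad_id] * (maxlen - len(ids))


# Hypothetical toy vocabulary; the real ids come from x_tk.word_index.
toy_word_index = {"he": 1, "saw": 2, "a": 3, "old": 4, "yellow": 5, "truck": 6}

sentence = "he saw a old yellow truck"
ids = [toy_word_index[word] for word in sentence.split()]
padded = pad_post(ids, maxlen=15)
print(padded)
```

As in the notebook cell, a word that is missing from the index raises a `KeyError`, so the sample sentence must use only words seen by the tokenizer.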

In [25]:
def final_predictions(x, y, x_tk, y_tk):
    """
    Gets predictions using the final model
    :param x: Preprocessed English data
    :param y: Preprocessed French data
    :param x_tk: English tokenizer
    :param y_tk: French tokenizer
    """
    # Train the neural network using model_final
    model = model_final(x.shape, y.shape[1],
                        len(x_tk.word_index)+1,
                        len(y_tk.word_index)+1)
    model.summary()
    model.fit(x, y, batch_size=512, epochs=10, validation_split=0.2)

    
    ## DON'T EDIT ANYTHING BELOW THIS LINE
    y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
    y_id_to_word[0] = '<PAD>'

    sentence = 'he saw a old yellow truck'
    sentence = [x_tk.word_index[word] for word in sentence.split()]
    sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
    sentences = np.array([sentence[0], x[0]])
    predictions = model.predict(sentences, len(sentences))

    print('Sample 1:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
    print('Il a vu un vieux camion jaune')
    print('Sample 2:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
    print(' '.join([y_id_to_word[np.max(x)] for x in y[0]]))


final_predictions(preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer)

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 15, 128)           25600     
_________________________________________________________________
bidirectional_4 (Bidirection (None, 256)               197376    
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 21, 256)           0         
_________________________________________________________________
bidirectional_5 (Bidirection (None, 21, 256)           295680    
_________________________________________________________________
time_distributed_15 (TimeDis (None, 21, 512)           131584    
_________________________________________________________________
dropout_8 (Dropout)          (None, 21, 512)           0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, 21, 346)          

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Sample 1:
il a vu un vieux camion jaune <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>
Il a vu un vieux camion jaune
Sample 2:
new jersey est parfois calme pendant l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>
new jersey est parfois calme pendant l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>
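
To compare a prediction with its reference more easily, the `<PAD>` tokens can be dropped first. The following is a small sketch under that assumption. `strip_padding` is a hypothetical helper, and the prediction string is a shortened copy of Sample 1 above.

```python
def strip_padding(text, pad_token="<PAD>"):
    """Remove padding tokens from a space-separated prediction."""
    return " ".join(tok for tok in text.split() if tok != pad_token)


# Shortened copy of the Sample 1 output and its reference translation.
prediction = "il a vu un vieux camion jaune <PAD> <PAD> <PAD>"
reference = "Il a vu un vieux camion jaune"

cleaned = strip_padding(prediction)
print(cleaned)
# The prediction is lowercase, so lowercase the reference before comparing.
print(cleaned == reference.lower())
```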


In [30]:
# Print prediction(s)
print("Prediction:")
print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

print("\nCorrect Translation:")
print(french_sentences[:1])

print("\nOriginal text:")
print(english_sentences[:1])

Prediction:
new jersey est parfois parfois en l' et il est il en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>

Correct Translation:
["new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."]

Original text:
['new jersey is sometimes quiet during autumn , and it is snowy in april .']


### Generate the HTML

**Save your notebook before running the next cell to generate the HTML output.** Then submit your project.

In [31]:
# Save before you run this cell!
!!jupyter nbconvert *.ipynb

['[NbConvertApp] Converting notebook machine_translationv1.ipynb to html',
 '[NbConvertApp] Writing 357192 bytes to machine_translationv1.html',
 '[NbConvertApp] Converting notebook machine_translationv2 - copia.ipynb to html',
 '[NbConvertApp] Writing 535003 bytes to machine_translationv2 - copia.html',
 '[NbConvertApp] Converting notebook machine_translationv2.ipynb to html',
 '[NbConvertApp] Writing 545081 bytes to machine_translationv2.html']

# References
[1] Pipeline (Segmentación), http://ciecfie.epn.edu.ec/wss/VirtualDirectories/80/pag_personales/PChico/Materiales_Micros/pipeline-intro.pdf
[2] Getting started with the Keras Sequential model, https://keras.io/getting-started/sequential-model-guide/
