## Machine Translation Project

## Introduction
In this notebook, we will build a deep neural network that functions as part of an end-to-end machine translation pipeline. The completed pipeline will accept English text as input and return the French translation.

- **Preprocess** - Convert text to sequences of integers.
- **Models** - Create models that accept a sequence of integers as input and return a probability distribution over possible translations.
- **Prediction** - Run the model on English text.

In [None]:
#imports
import os
import collections
import numpy as np
import project_tests as tests

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model
from keras.layers import GRU, LSTM, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional, Dropout
from keras.layers import Embedding
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy

In [None]:
#load dataset
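def load_data(path):
    """
    Load a text file and split it into lines
    :param path: Path to the data file
    :return: List of lines in the file
    """
    input_file = os.path.join(path)
    # Assumes the file is UTF-8 encoded (the French data contains accented characters)
    with open(input_file, "r", encoding="utf-8") as f:
        data = f.read()

    return data.split('\n')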

### Load Data
The data is located in `data/small_vocab_en` and `data/small_vocab_fr`. The `small_vocab_en` file contains English sentences, and the `small_vocab_fr` file contains their French translations. Load the English and French data from these files by running the cell below.

In [None]:
#load English Data
english_sentences = load_data('data/small_vocab_en')
#load French Data
french_sentences = load_data('data/small_vocab_fr')

### Files
Each line in `small_vocab_en` contains an English sentence, and the same line in `small_vocab_fr` contains its French translation. View the first two lines from each file.

In [None]:
#visualize data
for text in range(2):
  print('English text {}: {}'.format(text+1, english_sentences[text]))
  print('French text {}: {}'.format(text+1, french_sentences[text]))


English text 1: new jersey is sometimes quiet during autumn , and it is snowy in april .
French text 1: new jersey est parfois calme pendant l' automne , et il est neigeux en avril .
English text 2: the united states is usually chilly during july , and it is usually freezing in november .
French text 2: les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .


In [None]:
#total number of words and unique words in each vocabulary
english_word_counter = collections.Counter([word for sentence in english_sentences for word in sentence.split()])
french_word_counter = collections.Counter([word for sentence in french_sentences for word in sentence.split()])

print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))
print('{} unique English words.'.format(len(english_word_counter)))
print('10 Most common words in English dataset:')
print('"' + '" "'.join(list(zip(*english_word_counter.most_common(10)))[0])+'"')
print("")
print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))
print('{} unique French words.'.format(len(french_word_counter)))
print('10 Most common words in French dataset:')
print('"' + '" "'.join(list(zip(*french_word_counter.most_common(10)))[0])+'"')

1823250 English words.
227 unique English words.
10 Most common words in English dataset:
"is" "," "." "in" "it" "during" "the" "but" "and" "sometimes"

1961295 French words.
355 unique French words.
10 Most common words in French dataset:
"est" "." "," "en" "il" "les" "mais" "et" "la" "parfois"


## Preprocess
We need to convert the text into sequences of integers using the following preprocessing steps:
1. Tokenize the words into ids.
2. Add padding to make all the sequences the same length.

### Tokenize
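We will turn each sentence into a sequence of word ids using Keras's [`Tokenizer`](https://keras.io/preprocessing/text/#tokenizer) class. With its default settings, `Tokenizer` lowercases the text and filters out most punctuation, which is why `.` and `,` do not appear in the word index printed by the cell below.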

In [None]:
def tokenize(sentences):
  """
  Tokenize sentences
  :param sentences: List of sentences to be tokenized
  :return: Tuple of (tokenized sentences, tokenizer)
  """
  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sentences)
  text_tokenized = tokenizer.texts_to_sequences(sentences)

  return text_tokenized, tokenizer

tests.test_tokenize(tokenize)

# Tokenize Example output
text_sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .']
text_tokenized, text_tokenizer = tokenize(text_sentences)
print(text_tokenizer.word_index)
print()
for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(sent))
    print('  Output: {}'.format(token_sent))

{'the': 1, 'quick': 2, 'a': 3, 'brown': 4, 'fox': 5, 'jumps': 6, 'over': 7, 'lazy': 8, 'dog': 9, 'by': 10, 'jove': 11, 'my': 12, 'study': 13, 'of': 14, 'lexicography': 15, 'won': 16, 'prize': 17, 'this': 18, 'is': 19, 'short': 20, 'sentence': 21}

Sequence 1 in x
  Input:  The quick brown fox jumps over the lazy dog .
  Output: [1, 2, 4, 5, 6, 7, 1, 8, 9]
Sequence 2 in x
  Input:  By Jove , my quick study of lexicography won a prize .
  Output: [10, 11, 12, 2, 13, 14, 15, 16, 3, 17]
Sequence 3 in x
  Input:  This is a short sentence .
  Output: [18, 19, 3, 20, 21]
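
To make the word index above less opaque, here is a minimal pure-Python sketch of how such an index can be built: lowercase the text, replace punctuation with spaces, count the words, and number them from 1 in order of decreasing frequency. The names `FILTERS`, `split_words`, `build_word_index`, and `to_sequences` are illustrative, and the filter set is an assumption meant to mirror the `Tokenizer` defaults. The sketch approximates the behaviour seen in the output; it is not the Keras implementation.

```python
from collections import Counter

# Characters treated as separators (assumed to mirror Tokenizer's default filters)
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TO_SPACES = str.maketrans(FILTERS, ' ' * len(FILTERS))


def split_words(sentence):
    """Lowercase a sentence, drop punctuation, and split on whitespace."""
    return sentence.lower().translate(_TO_SPACES).split()


def build_word_index(sentences):
    """Map each word to an id, starting at 1; frequent words get small ids."""
    counts = Counter()
    for sentence in sentences:
        counts.update(split_words(sentence))
    # most_common() sorts by count; words with equal counts keep first-seen order
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(sentences, word_index):
    """Replace every known word with its id; unknown words are skipped."""
    return [[word_index[w] for w in split_words(s) if w in word_index]
            for s in sentences]


sample_sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .']
sample_index = build_word_index(sample_sentences)
sample_sequences = to_sequences(sample_sentences, sample_index)
print(sample_sequences[0])
```

Id 0 is never assigned to a word in this scheme, which leaves it free to act as the padding value.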


### Padding
We need to make sure that all the sequences are the same length by adding padding to the end of each sequence.

In [None]:
def pad(sentences, length=None):
  """
  Pad sentences
  :param sentences: List of tokenized sentences (sequences of word ids)
  :param length: Length to pad the sequences to. If None, use the length of the longest sequence
  :return: Padded numpy array of sequences
  """
  if length is None:
    length = max(len(sentence) for sentence in sentences)

  return pad_sequences(sentences, maxlen=length, padding='post')

tests.test_pad(pad)

#pad Tokenized output
test_pad = pad(text_tokenized)
for text, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
  print('Sequence {} in x'.format(text+1))
  print('   Input: {}'.format(np.array(token_sent)))
  print('   Output: {}'.format(pad_sent))

Sequence 1 in x
   Input: [1 2 4 5 6 7 1 8 9]
   Output: [1 2 4 5 6 7 1 8 9 0]
Sequence 2 in x
   Input: [10 11 12  2 13 14 15 16  3 17]
   Output: [10 11 12  2 13 14 15 16  3 17]
Sequence 3 in x
   Input: [18 19  3 20 21]
   Output: [18 19  3 20 21  0  0  0  0  0]
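
As a rough illustration of what post-padding does, the following pure-Python sketch appends a fill value to the end of every sequence until all of them share one length. The helper name `pad_post` is hypothetical. Dropping ids from the front of an over-long sequence is an assumption intended to match the default `truncating='pre'` of `pad_sequences`, which the notebook's `pad` function leaves unchanged.

```python
def pad_post(sequences, length=None, value=0):
    """Right-pad (or front-truncate) every sequence to the same length."""
    if length is None:
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        # Over-long sequences keep only their last `length` ids (assumed default)
        kept = list(seq[len(seq) - length:]) if len(seq) > length else list(seq)
        # Shorter sequences are filled up at the end with the padding value
        padded.append(kept + [value] * (length - len(kept)))
    return padded


print(pad_post([[18, 19, 3, 20, 21], [1, 2]]))
```

Padding at the end rather than the start keeps the real words aligned with the first time steps of the model input.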


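### Preprocess Pipeline
The `preprocess` function applies tokenization and padding to both the English features and the French labels.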

In [None]:
def preprocess(x, y):
  """
  preprocess x and y
  :param x: Feature List of sentences
  :param y: Label List of sentences
  :return: Tuple of (Preprocessed x, Preprocessed y, x_tokenizer, y_tokenizer)
  """
  preprocess_x, x_tokenizer = tokenize(x)
  preprocess_y, y_tokenizer = tokenize(y)

  preprocess_x = pad(preprocess_x)
  preprocess_y = pad(preprocess_y)

  #keras's sparse_categorical_crossentropy function requires the labels to be in 3 dimensions
  preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)

  return preprocess_x, preprocess_y, x_tokenizer, y_tokenizer

preprocessed_english_sentences, preprocessed_french_sentences, english_tokenizer, french_tokenizer =\
  preprocess(english_sentences, french_sentences)


max_english_sequence_length = preprocessed_english_sentences.shape[1]
max_french_sequence_length = preprocessed_french_sentences.shape[1]
english_vocab_size = len(english_tokenizer.word_index)
french_vocab_size = len(french_tokenizer.word_index)

print('Data Preprocessed')
print("Max English sentence length:", max_english_sequence_length)
print("Max French sentence length:", max_french_sequence_length)
print("English vocabulary size:", english_vocab_size)
print("French vocabulary size:", french_vocab_size)

Data Preprocessed
Max English sentence length: 15
Max French sentence length: 21
English vocabulary size: 199
French vocabulary size: 344
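
The labels returned by `preprocess` are integer word ids rather than one-hot vectors. The sketch below, with hypothetical names and toy numbers, shows what a sparse categorical cross-entropy computes for one sentence: at every time step it takes the negative log of the probability assigned to the correct id, then averages over the steps. It illustrates the idea only and is not the Keras implementation.

```python
import math


def sparse_cross_entropy(probabilities, label_ids):
    """Mean negative log-likelihood of integer labels under per-step distributions."""
    losses = [-math.log(step_probs[label])
              for step_probs, label in zip(probabilities, label_ids)]
    return sum(losses) / len(losses)


# Two time steps, three possible ids (id 0 is padding in this notebook)
toy_probabilities = [[0.5, 0.25, 0.25],
                     [0.1, 0.8, 0.1]]
toy_labels = [0, 1]
toy_loss = sparse_cross_entropy(toy_probabilities, toy_labels)
print(round(toy_loss, 4))
```

Because each label is used directly as an index into the predicted distribution, the model output needs one entry for every id that can occur in the labels, including the padding id 0.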


In [None]:
def logits_to_text(logits, tokenizer):
  """
  Turn logits from Neural Network into text using the tokenizer
  :param logits: Logits from Neural Network
  :param tokenizer: keras Tokenizer fit on the labels
  :return: String that represents the text of logits
  """
  index_to_words = {id:word for word, id in tokenizer.word_index.items()}
  index_to_words[0]='<PAD>'

  return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])

  
print('`logits_to_text` function loaded.')

`logits_to_text` function loaded.
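
The decoding step in `logits_to_text` can be pictured without NumPy. In this small sketch (the `decode` helper and the toy scores are made up for illustration), the highest-scoring id is picked at each time step and looked up in the inverted word index, with id 0 shown as `<PAD>`.

```python
def decode(step_scores, word_index):
    """Pick the highest-scoring id at each step and join the matching words."""
    id_to_word = {idx: word for word, idx in word_index.items()}
    id_to_word[0] = '<PAD>'
    best_ids = [max(range(len(scores)), key=scores.__getitem__)
                for scores in step_scores]
    return ' '.join(id_to_word[i] for i in best_ids)


toy_index = {'le': 1, 'chat': 2}
toy_scores = [[0.1, 0.7, 0.2],     # step 1: id 1 wins
              [0.2, 0.1, 0.7],     # step 2: id 2 wins
              [0.9, 0.05, 0.05]]   # step 3: id 0 (padding) wins
decoded = decode(toy_scores, toy_index)
print(decoded)
```

This greedy choice treats every time step independently, which is why the model outputs in this notebook can contain repeated or misplaced words.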


### Model 1: RNN
A basic RNN model is a good baseline for sequence data. 

In [None]:
def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build a basic RNN on x and y (training happens outside this function)
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """

    # Build the layers
    learning_rate = 1e-2
    input_seq = Input(input_shape[1:])
    rnn = LSTM(64, return_sequences=True)(input_seq)
    dropout = Dropout(0.2)(rnn)
    # Labels are token ids 1..french_vocab_size plus 0 for padding,
    # so the output layer needs french_vocab_size + 1 units
    logits = TimeDistributed(Dense(french_vocab_size + 1))(dropout)

    model = Model(input_seq, Activation('softmax')(logits))
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])

    
    return model

# tests.test_simple_model(simple_model)


#reshape input
tmp_x = pad(preprocessed_english_sentences, max_french_sequence_length)
tmp_x = tmp_x.reshape((-1, preprocessed_french_sentences.shape[-2], 1))

#Train the neural network
simple_rnn_model = simple_model(
    tmp_x.shape,
    max_french_sequence_length,
    english_vocab_size,
    french_vocab_size
)

simple_rnn_model.fit(tmp_x, preprocessed_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

#print predictions
print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
new jersey est parfois calme en l' et il est il est en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


### Model 2: Embedding
In the simple model we represented words by their ids, but there is a better representation: word embeddings. An embedding is a vector representation of a word that lies close to the vectors of similar words in n-dimensional space, where n is the size of the embedding vectors.

In this model, we will create an RNN model using an embedding layer.
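
Mechanically, an embedding layer can be thought of as a lookup table with one row per token id. The sketch below uses hypothetical helper names and random values in place of learned weights to show that lookup on a padded sequence. Since word ids start at 1 and id 0 is used for padding, the table in this sketch has one more row than there are words.

```python
import random


def make_embedding_table(num_ids, dim, seed=0):
    """Create one small random vector per id (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_ids)]


def embed(sequence, table):
    """Replace each integer id with its row in the table."""
    return [table[token_id] for token_id in sequence]


toy_vocab_size = 5                       # word ids run from 1 to 5
toy_table = make_embedding_table(toy_vocab_size + 1, dim=4)  # extra row for padding id 0
toy_sequence = [3, 1, 5, 0, 0]           # a padded sentence
toy_vectors = embed(toy_sequence, toy_table)
print(len(toy_vectors), len(toy_vectors[0]))
```

During training the rows are adjusted like any other weights, which is how words used in similar contexts can end up with similar vectors.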

In [None]:
def embedding_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
  """
  Build an RNN model using word embedding on x and y (training happens outside this function)
  :param input_shape: Tuple of input shape
  :param output_sequence_length: Length of output sequence
  :param english_vocab_size: Number of unique English words in the dataset
  :param french_vocab_size: Number of unique French words in the dataset
  :return: Keras model built, but not trained
  """

  learning_rate = 1e-2

  input_seq = Input(input_shape[1:])

  # Token ids run from 1 to english_vocab_size and 0 is padding, so the table needs one extra row
  embed_layer = Embedding(english_vocab_size + 1, 64, input_length=output_sequence_length)(input_seq)

  rnn = LSTM(64, return_sequences=True)(embed_layer)

  dropout = Dropout(0.2)(rnn)

  # One output unit per French token id, plus one for the padding id 0
  logits = TimeDistributed(Dense(french_vocab_size + 1))(dropout)

  model = Model(input_seq, Activation('softmax')(logits))

  model.compile(loss=sparse_categorical_crossentropy,
                optimizer=Adam(learning_rate),
                metrics=['accuracy'])
  
  return model

# tests.test_embed_model(embedding_model)

#reshaping input
tmp_x = pad(preprocessed_english_sentences, max_french_sequence_length)
# tmp_x = tmp_x.reshape((-1, preprocessed_french_sentences.shape[-2], 1))

#train the neural network
embedding_rnn_model = embedding_model(
    tmp_x.shape,
    max_french_sequence_length,
    english_vocab_size,
    french_vocab_size)
embedding_rnn_model.fit(tmp_x, preprocessed_french_sentences, batch_size=512, epochs=64, validation_split=0.2)

print(logits_to_text(embedding_rnn_model.predict(tmp_x[:1])[0] , french_tokenizer))


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/64
...
Epoch 64/64
new jersey est parfois calme en l'automne automne l' il est neigeux est en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


### Model 3: Bidirectional RNNs 
One restriction of an RNN is that it can't see future input, only the past. This is where bidirectional recurrent neural networks come in: they process the sequence in both directions, so the output at each position can also depend on later inputs.
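
The idea can be sketched with a toy scalar recurrence. The functions and the weights `w` and `u` below are invented for illustration and are far simpler than an LSTM. One pass reads the inputs left to right, a second pass reads them right to left, and the two states for each position are paired up.

```python
import math


def run_rnn(inputs, w=0.5, u=0.8):
    """Toy recurrence: each state depends on the current input and the previous state."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states


def run_bidirectional(inputs):
    """Pair a left-to-right state with a right-to-left state at every position."""
    forward = run_rnn(inputs)
    backward = run_rnn(inputs[::-1])[::-1]  # re-reverse so positions line up
    return list(zip(forward, backward))


states_a = run_bidirectional([1.0, 2.0, 3.0])
states_b = run_bidirectional([1.0, 2.0, -3.0])  # only the last input differs
print(states_a[0])
print(states_b[0])
```

In the two runs the forward state at the first position is identical, while the backward state differs, because only the backward pass has already seen the final input when it reaches the first position.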

In [None]:
def bidrectional_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
  """
  Build a bidirectional RNN model on x and y (training happens outside this function)
  :param input_shape: Tuple of input shape
  :param output_sequence_length: Length of output sequence
  :param english_vocab_size: Number of unique English words in the dataset
  :param french_vocab_size: Number of unique French words in the dataset
  :return: Keras model built, but not trained
  """

  learning_rate = 1e-2

  input_seq = Input(input_shape[1:])

  rnn = Bidirectional(LSTM(64, return_sequences=True))(input_seq)

  dropout = Dropout(0.2)(rnn)

  # One output unit per French token id, plus one for the padding id 0
  logits = TimeDistributed(Dense(french_vocab_size + 1))(dropout)

  model = Model(input_seq, Activation('softmax')(logits))

  model.compile(loss=sparse_categorical_crossentropy,
                optimizer=Adam(learning_rate),
                metrics=['accuracy'])
  return model

# Reshaping the input to work with a basic RNN
tmp_x = pad(preprocessed_english_sentences, max_french_sequence_length)
tmp_x = tmp_x.reshape((-1, preprocessed_french_sentences.shape[-2], 1))

# Train the neural network
bidrectional_rnn_model = bidrectional_model(
    tmp_x.shape,
    max_french_sequence_length,
    english_vocab_size,
    french_vocab_size)
bidrectional_rnn_model.fit(tmp_x, preprocessed_french_sentences, batch_size=512, epochs=64, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(bidrectional_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/64
...
Epoch 64/64
new jersey est parfois calme en mois de l' il est il en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


### Model 4: Encoder-Decoder
This model is made up of an encoder and a decoder. The encoder condenses the input sentence into a fixed-size vector representation. The decoder takes this representation as input and predicts the translation as output.
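
A toy sketch of that hand-off, using invented helpers and a scalar recurrence instead of a GRU: the encoder keeps only its final state as a summary of the sentence, and that summary is copied once per output time step so a decoder can read it at every step. The copying step plays the role that `RepeatVector` has in the model below.

```python
import math


def encode(inputs, w=0.5, u=0.8):
    """Read the whole input and keep only the final state as a summary vector."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w * x + u * h)
    return [h]  # a one-element vector in this toy example


def repeat_vector(vector, times):
    """Copy the summary once per output time step."""
    return [list(vector) for _ in range(times)]


toy_context = encode([3.0, 1.0, 5.0, 0.0])
toy_decoder_inputs = repeat_vector(toy_context, 6)
print(len(toy_decoder_inputs), len(toy_decoder_inputs[0]))
```

Everything the decoder knows about the source sentence has to pass through this single summary, which is one reason such a model can struggle with longer sentences.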

In [None]:
from keras.layers import RepeatVector

def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build an encoder-decoder model on x and y (training happens outside this function)
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    

    learning_rate = 0.03

    #Encoder
    encoder_inputs = Input(shape=input_shape[1:])
    encoder_gru = GRU(output_sequence_length)(encoder_inputs)
    encoder_outputs = Dense(64, activation='relu')(encoder_gru)
    
    #Decoder
    decoder_inputs = RepeatVector(output_sequence_length)(encoder_outputs)
    decoder_gru = GRU(64, return_sequences=True)(decoder_inputs)
    # One output unit per French token id, plus one for the padding id 0
    output_layer = TimeDistributed(Dense(french_vocab_size + 1, activation='softmax'))
    outputs = output_layer(decoder_gru)

    #Create Model from parameters defined above
    model = Model(inputs=encoder_inputs, outputs=outputs)
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    
    return model


#reshape input
tmp_x = pad(preprocessed_english_sentences, max_french_sequence_length)
tmp_x = tmp_x.reshape((-1, preprocessed_french_sentences.shape[-2], 1))

# Train the neural network
encdec_rnn_model = encdec_model(
    tmp_x.shape,
    max_french_sequence_length,
    english_vocab_size,
    french_vocab_size)
encdec_rnn_model.fit(tmp_x, preprocessed_french_sentences, batch_size=512, epochs=64, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(encdec_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/64
...
Epoch 64/64
new jersey est jamais agréable en l' et il est il est en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


### Model 5: Custom 
Combine ideas from the previous models into a single model that uses an embedding layer followed by a bidirectional RNN.

In [None]:
def model_final_GRU(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build a model that incorporates embedding and a bidirectional RNN on x and y
    (training happens outside this function)
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    learning_rate = 1e-2

    input_seq = Input(input_shape[1:])

    # Token ids run from 1 to english_vocab_size and 0 is padding, so the table needs one extra row
    embed_layer = Embedding(english_vocab_size + 1, 64, input_length=output_sequence_length)(input_seq)

    rnn = Bidirectional(GRU(64, return_sequences=True))(embed_layer)

    dropout = Dropout(0.2)(rnn)

    # One output unit per French token id, plus one for the padding id 0
    logits = TimeDistributed(Dense(french_vocab_size + 1))(dropout)
    
 
    model = Model(input_seq, Activation('softmax')(logits))
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    
    
    
    return model



print('Final Model Loaded')

# Pad the input to the French sequence length (no reshape is applied before the embedding layer)
tmp_x = pad(preprocessed_english_sentences, max_french_sequence_length)

# Train the neural network
final_gru_rnn_model = model_final_GRU(
    tmp_x.shape,
    max_french_sequence_length,
    english_vocab_size,
    french_vocab_size)
final_gru_rnn_model.fit(tmp_x, preprocessed_french_sentences, batch_size=512, epochs=48, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(final_gru_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Final Model Loaded


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/48
...
Epoch 48/48
new jersey est parfois calme au l' automne il il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>


## Prediction

In [None]:
def final_predictions_GRU(x, y, x_tk, y_tk):
    """
    Gets predictions using the final model
    :param x: Preprocessed English data
    :param y: Preprocessed French data
    :param x_tk: English tokenizer
    :param y_tk: French tokenizer
    """
    # Train the neural network using model_final_GRU
    max_french_sequence_length = y.shape[1]
    english_vocab_size = len(x_tk.word_index)
    french_vocab_size = len(y_tk.word_index)

    # Pad the input
    x = pad(x, max_french_sequence_length)

    # Train
    model = model_final_GRU(
        x.shape,
        max_french_sequence_length,
        english_vocab_size,
        french_vocab_size)

    model.fit(x, y, batch_size=512, epochs=80, validation_split=0.2)
    
    
    ## DON'T EDIT ANYTHING BELOW THIS LINE
    y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
    y_id_to_word[0] = '<PAD>'

    sentence = 'he saw a old yellow truck'
    sentence = [x_tk.word_index[word] for word in sentence.split()]
    sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
    sentences = np.array([sentence[0], x[0]])
    predictions = model.predict(sentences, len(sentences))

    print('Sample 1:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
    print('Il a vu un vieux camion jaune')
    print('Sample 2:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
    print(' '.join([y_id_to_word[np.max(x)] for x in y[0]]))


final_predictions_GRU(preprocessed_english_sentences, preprocessed_french_sentences, english_tokenizer, french_tokenizer)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 110288 samples, validate on 27573 samples
Epoch 1/80
...
Epoch 80/8