# Language Translation
In this project, I'll be training a sequence to sequence model on a dataset of English and French sentences that can translate new sentences from English to French.
## The Data
Since translating the whole language of English to French will take lots of time to train, I've used only a small portion of the English corpus.

In [1]:
import helper
import problem_unittests as tests

source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

## Exploring the Data
Played around with view_sentence_range to view different parts of the data.

In [2]:
view_sentence_range = (0, 10)

import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne 

## Implementing Preprocessing Function
### Text to Word Ids
First I need to turn the text into a number so the computer can understand it. In the function `text_to_ids()`, I'll convert `source_text` and `target_text` from words to ids.  I've also added the `<EOS>` word id at the end of `target_text`.  This will help the neural network predict when the sentence should end.



In [3]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    
    # TODO: Implement Function
    
    source_id_text = [[source_vocab_to_int[word] for word in sentence.split()] for sentence in source_text.split('\n')]
        
    target_id_text = [[target_vocab_to_int[word] for word in sentence.split()] + [target_vocab_to_int['<EOS>']]
              for sentence in target_text.split('\n')]

    return source_id_text, target_id_text

tests.test_text_to_ids(text_to_ids)

Tests Passed


### Preprocessed all the data and now saving it~~~

In [4]:
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)

# 1st checkpoint


In [5]:
import numpy as np
import helper

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()

In [6]:
from distutils.version import LooseVersion
import warnings
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Checking TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Checking for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0
Default GPU Device: /gpu:0


## Building the Neural Network
I've build the components necessary to build a Sequence-to-Sequence model by implementing the following functions below:
- `model_inputs`
- `process_decoder_input`
- `encoding_layer`
- `decoding_layer_train`
- `decoding_layer_infer`
- `decoding_layer`
- `seq2seq_model`

### Input
The `model_inputs()` function is used to create the following TF Placeholders for the Neural Network:

- Input text placeholder named "input" using the TF Placeholder name parameter with rank 2.
- Targets placeholder with rank 2.
- Learning rate placeholder with rank 0.
- Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0.
- Target sequence length placeholder named "target_sequence_length" with rank 1
- Max target sequence length tensor named "max_target_len" getting its value from applying tf.reduce_max on the target_sequence_length placeholder. Rank 0.
- Source sequence length placeholder named "source_sequence_length" with rank 1

The placeholders are returned in the following the tuple (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length)

In [7]:
def model_inputs():
 
    # TODO: Implement Function
    Input = tf.placeholder(tf.int32, [None, None], name= 'input')
    target = tf.placeholder(tf.int32, [None, None], name= 'target')
    lr = tf.placeholder(tf.float32,name='lr')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    target_sequence_length = tf.placeholder(tf.int32,[None,],name ='target_sequence_length')
    max_target_len = tf.reduce_max(target_sequence_length)
    source_sequence_length = tf.placeholder(tf.int32,[None,],name ='source_sequence_length')
    
    
    
    return (Input, target, lr, keep_prob, target_sequence_length,max_target_len, source_sequence_length)


tests.test_model_inputs(model_inputs)

Tests Passed


### Process Decoder Input
The function `process_decoder_input` is implemented below by removing the last word id from each batch in `target_data` and concating the GO ID to the begining of each batch.

In [8]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    
    # TODO: Implement Function
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], target_vocab_to_int['<GO>']), ending], 1)
    return dec_input
    


tests.test_process_encoding_input(process_decoder_input)

Tests Passed


### Encoding
Below are the "To-Dos" for the  `encoding_layer()` function to create a Encoder RNN layer:
 * Embed the encoder input using [`tf.contrib.layers.embed_sequence`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence)
 * Construct a [stacked](https://github.com/tensorflow/tensorflow/blob/6947f65a374ebf29e74bb71e36fd82760056d82c/tensorflow/docs_src/tutorials/recurrent.md#stacking-multiple-lstms) [`tf.contrib.rnn.LSTMCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell) wrapped in a [`tf.contrib.rnn.DropoutWrapper`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper)
 * Pass cell and embedded input to [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)

In [10]:
from imp import reload
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers,
                   keep_prob, source_sequence_length,
                   source_vocab_size,encoding_embedding_size): 
                                   
  
    enc_embed_input = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)


    def make_cell(rnn_size):
        enc_cell = tf.contrib.rnn.LSTMCell(rnn_size,initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
                                    
        return enc_cell

    enc_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
    
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, enc_embed_input, sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

tests.test_encoding_layer(encoding_layer)

Tests Passed


### Decoding - Training
To-Dos for creating a training decoding layer:
* Create a [`tf.contrib.seq2seq.TrainingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/TrainingHelper) 
* Create a [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* Obtain the decoder outputs from [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)

In [14]:

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
   
    # TODO: Just Implement Function
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)
 
    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, encoder_state, output_layer)
    
    BasicDecoderOutput,_ = tf.contrib.seq2seq.dynamic_decode(training_decoder, impute_finished=True, maximum_iterations=max_summary_length)
    
    return BasicDecoderOutput


tests.test_decoding_layer_train(decoding_layer_train)

Tests Passed


### Decoding - Inference
To-Dos for Creating inference decoder:
* Create a [`tf.contrib.seq2seq.GreedyEmbeddingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/GreedyEmbeddingHelper)
* Create a [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* Obtain the decoder outputs from [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)

In [15]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    
    # TODO: Implement Function
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, 
                                                                start_tokens=tf.tile([start_of_sequence_id], [batch_size]),
                                                                end_token=end_of_sequence_id)

    
    
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                       inference_helper,
                                                       encoder_state,
                                                       output_layer)
    
    
       
    
    inference_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(decoder=inference_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_target_sequence_length)


    
    return inference_decoder_output
    
    



tests.test_decoding_layer_infer(decoding_layer_infer)

Tests Passed


### Building the Decoding Layer
Next `decoding_layer()` function is implemented to create a Decoder RNN layer.
The To-Dos~~ LOL
* Embed the target sequences
* Construct the decoder LSTM cell (just like we constructed the encoder cell above)
* Create an output layer to map the outputs of the decoder to the elements of our vocabulary
* Use the `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_target_sequence_length, output_layer, keep_prob)` function to get the training logits.
* Use the `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob)` function to get the inference logits.

Note: I've used [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [36]:
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    
    # TODO: Implement Function
    def build_cell(rnn_size, keep_prob):
        lstm = tf.contrib.rnn.LSTMCell(rnn_size)
        lstm_drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return lstm_drop
    # Stack them all
    stacked_lstm = tf.contrib.rnn.MultiRNNCell([build_cell(rnn_size, keep_prob) for _ in range(num_layers)])
    
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    dense_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    with tf.variable_scope("decode") as scope:
        tr_decoder_output = decoding_layer_train(
            encoder_state, stacked_lstm, dec_embed_input, 
            target_sequence_length, max_target_sequence_length, 
            dense_layer, keep_prob)
        scope.reuse_variables()
        inf_decoder_output = decoding_layer_infer(
            encoder_state, stacked_lstm, dec_embeddings, 
            target_vocab_to_int['<GO>'], target_vocab_to_int['<EOS>'], 
            max_target_sequence_length, target_vocab_size, 
            dense_layer, batch_size, keep_prob)
    
    return tr_decoder_output, inf_decoder_output
   
    


tests.test_decoding_layer(decoding_layer)

Tests Passed


### Building the Neural Network
Applying the above functions to~~

- Apply embedding to the input data for the encoder.
- Encode the input using the `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size)`.
- Process target data using the `process_decoder_input(target_data, target_vocab_to_int, batch_size)` function.
- Apply embedding to the target data for the decoder.
- Decode the encoded input using the `decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size)` function.

In [17]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
   
    # TODO: Implement Function
    _,enc_state = encoding_layer(input_data, rnn_size,
                                     num_layers, keep_prob,
                                     source_sequence_length,
                                     source_vocab_size, enc_embedding_size)
    
    
    # Prepare the target sequences we'll feed to the decoder in training mode
    dec_input = process_decoder_input(target_data, target_vocab_to_int, batch_size)
    
    # Pass encoder state and decoder inputs to the decoders
    training_decoder_output, inference_decoder_output = decoding_layer(dec_input, enc_state, 
                                                                       target_sequence_length, 
                                                                       max_target_sentence_length, rnn_size, 
                                                                       num_layers, target_vocab_to_int, 
                                                                       target_vocab_size, batch_size, keep_prob,
                                                                       dec_embedding_size)
    
    return (training_decoder_output, inference_decoder_output)
    



tests.test_seq2seq_model(seq2seq_model)

Tests Passed


## Neural Network Training
### Hyperparameters
Tuning the parameters~~


In [18]:
# Number of Epochs
epochs = 4
# Batch Size
batch_size =128
# RNN Size
rnn_size = 256
# Number of Layers
num_layers = 2
# Embedding Size
encoding_embedding_size = 150
decoding_embedding_size = 150
# Learning Rate
learning_rate = 0.001
# Dropout Keep Probability
keep_probability = 0.8
display_step = 10   #to state how many steps between each debug output statement

In [19]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    #sequence_length = tf.placeholder_with_default(max_target_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)

    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)


### Batch and pad the source and target sequences

In [20]:

def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # Pad
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths


### Training
Training the neural network below on the preprocessed data.

In [21]:
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# Split data to training and validation sets
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch, valid_sources_lengths, valid_targets_lengths ) = next(get_batches(valid_source,
                                                                                                             valid_target,
                                                                                                             batch_size,
                                                                                                             source_vocab_to_int['<PAD>'],
                                                                                                             target_vocab_to_int['<PAD>']))                                                                                                  
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):

            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)

                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)

                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

Epoch   0 Batch   10/1077 - Train Accuracy: 0.2858, Validation Accuracy: 0.3803, Loss: 3.6743
Epoch   0 Batch   20/1077 - Train Accuracy: 0.3852, Validation Accuracy: 0.4482, Loss: 2.8504
Epoch   0 Batch   30/1077 - Train Accuracy: 0.3910, Validation Accuracy: 0.4627, Loss: 2.7234
Epoch   0 Batch   40/1077 - Train Accuracy: 0.4223, Validation Accuracy: 0.4883, Loss: 2.5103
Epoch   0 Batch   50/1077 - Train Accuracy: 0.4219, Validation Accuracy: 0.4950, Loss: 2.4129
Epoch   0 Batch   60/1077 - Train Accuracy: 0.4721, Validation Accuracy: 0.5160, Loss: 2.1428
Epoch   0 Batch   70/1077 - Train Accuracy: 0.4149, Validation Accuracy: 0.4794, Loss: 2.1404
Epoch   0 Batch   80/1077 - Train Accuracy: 0.4434, Validation Accuracy: 0.5053, Loss: 1.9330
Epoch   0 Batch   90/1077 - Train Accuracy: 0.4609, Validation Accuracy: 0.5188, Loss: 1.8752
Epoch   0 Batch  100/1077 - Train Accuracy: 0.4770, Validation Accuracy: 0.5192, Loss: 1.7621
Epoch   0 Batch  110/1077 - Train Accuracy: 0.4762, Validati

Epoch   0 Batch  890/1077 - Train Accuracy: 0.8862, Validation Accuracy: 0.8335, Loss: 0.1768
Epoch   0 Batch  900/1077 - Train Accuracy: 0.8797, Validation Accuracy: 0.8452, Loss: 0.1913
Epoch   0 Batch  910/1077 - Train Accuracy: 0.8508, Validation Accuracy: 0.8462, Loss: 0.1723
Epoch   0 Batch  920/1077 - Train Accuracy: 0.8562, Validation Accuracy: 0.8608, Loss: 0.1724
Epoch   0 Batch  930/1077 - Train Accuracy: 0.8805, Validation Accuracy: 0.8352, Loss: 0.1559
Epoch   0 Batch  940/1077 - Train Accuracy: 0.9031, Validation Accuracy: 0.8370, Loss: 0.1486
Epoch   0 Batch  950/1077 - Train Accuracy: 0.8836, Validation Accuracy: 0.8693, Loss: 0.1367
Epoch   0 Batch  960/1077 - Train Accuracy: 0.8694, Validation Accuracy: 0.8469, Loss: 0.1458
Epoch   0 Batch  970/1077 - Train Accuracy: 0.9156, Validation Accuracy: 0.8697, Loss: 0.1571
Epoch   0 Batch  980/1077 - Train Accuracy: 0.8828, Validation Accuracy: 0.8778, Loss: 0.1477
Epoch   0 Batch  990/1077 - Train Accuracy: 0.8803, Validati

Epoch   1 Batch  700/1077 - Train Accuracy: 0.9594, Validation Accuracy: 0.9183, Loss: 0.0342
Epoch   1 Batch  710/1077 - Train Accuracy: 0.9344, Validation Accuracy: 0.9339, Loss: 0.0363
Epoch   1 Batch  720/1077 - Train Accuracy: 0.9198, Validation Accuracy: 0.9272, Loss: 0.0511
Epoch   1 Batch  730/1077 - Train Accuracy: 0.9437, Validation Accuracy: 0.9190, Loss: 0.0576
Epoch   1 Batch  740/1077 - Train Accuracy: 0.9391, Validation Accuracy: 0.9396, Loss: 0.0393
Epoch   1 Batch  750/1077 - Train Accuracy: 0.9293, Validation Accuracy: 0.9272, Loss: 0.0430
Epoch   1 Batch  760/1077 - Train Accuracy: 0.9414, Validation Accuracy: 0.9350, Loss: 0.0499
Epoch   1 Batch  770/1077 - Train Accuracy: 0.9267, Validation Accuracy: 0.9318, Loss: 0.0434
Epoch   1 Batch  780/1077 - Train Accuracy: 0.9184, Validation Accuracy: 0.9162, Loss: 0.0577
Epoch   1 Batch  790/1077 - Train Accuracy: 0.8773, Validation Accuracy: 0.9435, Loss: 0.0538
Epoch   1 Batch  800/1077 - Train Accuracy: 0.9184, Validati

Epoch   2 Batch  510/1077 - Train Accuracy: 0.9480, Validation Accuracy: 0.9574, Loss: 0.0290
Epoch   2 Batch  520/1077 - Train Accuracy: 0.9810, Validation Accuracy: 0.9666, Loss: 0.0195
Epoch   2 Batch  530/1077 - Train Accuracy: 0.9520, Validation Accuracy: 0.9695, Loss: 0.0325
Epoch   2 Batch  540/1077 - Train Accuracy: 0.9785, Validation Accuracy: 0.9577, Loss: 0.0224
Epoch   2 Batch  550/1077 - Train Accuracy: 0.9383, Validation Accuracy: 0.9542, Loss: 0.0307
Epoch   2 Batch  560/1077 - Train Accuracy: 0.9602, Validation Accuracy: 0.9652, Loss: 0.0251
Epoch   2 Batch  570/1077 - Train Accuracy: 0.9544, Validation Accuracy: 0.9585, Loss: 0.0337
Epoch   2 Batch  580/1077 - Train Accuracy: 0.9680, Validation Accuracy: 0.9567, Loss: 0.0196
Epoch   2 Batch  590/1077 - Train Accuracy: 0.9581, Validation Accuracy: 0.9563, Loss: 0.0269
Epoch   2 Batch  600/1077 - Train Accuracy: 0.9673, Validation Accuracy: 0.9751, Loss: 0.0275
Epoch   2 Batch  610/1077 - Train Accuracy: 0.9679, Validati

Epoch   3 Batch  320/1077 - Train Accuracy: 0.9746, Validation Accuracy: 0.9538, Loss: 0.0263
Epoch   3 Batch  330/1077 - Train Accuracy: 0.9688, Validation Accuracy: 0.9602, Loss: 0.0198
Epoch   3 Batch  340/1077 - Train Accuracy: 0.9889, Validation Accuracy: 0.9691, Loss: 0.0209
Epoch   3 Batch  350/1077 - Train Accuracy: 0.9633, Validation Accuracy: 0.9602, Loss: 0.0165
Epoch   3 Batch  360/1077 - Train Accuracy: 0.9805, Validation Accuracy: 0.9680, Loss: 0.0125
Epoch   3 Batch  370/1077 - Train Accuracy: 0.9717, Validation Accuracy: 0.9521, Loss: 0.0183
Epoch   3 Batch  380/1077 - Train Accuracy: 0.9664, Validation Accuracy: 0.9670, Loss: 0.0138
Epoch   3 Batch  390/1077 - Train Accuracy: 0.9863, Validation Accuracy: 0.9723, Loss: 0.0228
Epoch   3 Batch  400/1077 - Train Accuracy: 0.9727, Validation Accuracy: 0.9716, Loss: 0.0259
Epoch   3 Batch  410/1077 - Train Accuracy: 0.9391, Validation Accuracy: 0.9670, Loss: 0.0335
Epoch   3 Batch  420/1077 - Train Accuracy: 0.9910, Validati

In [22]:
# Save parameters for checkpoint
helper.save_params(save_path)

# 2nd Checkpoint

In [23]:
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

## Sentence to Sequence
Implemented the function `sentence_to_seq()` to preprocess new sentences.
To-dos :p

- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
 - Convert words not in the vocabulary, to the `<UNK>` word id.

In [24]:
def sentence_to_seq(sentence, vocab_to_int):
  
    # TODO: Implement Function
    sentence_int = list()
    
    for word in sentence.lower().split(' '):
        try:
            word_int = vocab_to_int[word]
        except:
            word_int = vocab_to_int['<UNK>']
        sentence_int.append(word_int)

    return sentence_int


tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed


## Translation...yosh!
This will translate `translate_sentence` from English to French.

In [25]:
translate_sentence = 'he saw a old yellow truck .'

translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,
                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,
                                         source_sequence_length: [len(translate_sentence)]*batch_size,
                                         keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in translate_logits]))
print('  French Words: {}'.format(" ".join([target_int_to_vocab[i] for i in translate_logits])))


INFO:tensorflow:Restoring parameters from checkpoints/dev
Input
  Word Ids:      [209, 204, 124, 85, 34, 213, 17]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [80, 49, 69, 183, 141, 312, 284, 190, 1]
  French Words: il a vu un vieux camion jaune . <EOS>


## Imperfect Translation
I noticed that some sentences translated better than others.  The reason was since the dataset I've used only has a vocabulary of 227 English words of the thousands that we use, I'll only see good results using these words.  For this project, I didn't need a perfect translation but a learning experience. However, if we want to create a better translation model, we'll need better data and a better GPU and days of training-phewww.

