# Recurrent Neural Network
### Family of Neural Networks for Processing Sequential Data

In [92]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
··········
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
Please enter the verification code: Access token retrieved correctly.


In [96]:
!mkdir -p colabData
!google-drive-ocamlfuse colabData

fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option


In [98]:
%%bash
echo "Hello World...!!!" > "colabData/Colab Notebooks/Neural_Machine_Tranlation/hello.txt"
ls colabData

97-things-every-programmer-should-know-en.epub
Algoma U
Algorithms
Apresentação TCC.pdf
Ariadny Ricci curriculo.docx
artigos-blog
Artigos Ciêntificos (Bibliografias)
Assuntos.odt
Books
Brock U
Bug fixes for OpenTraj Package.odt
Canada Docs
Carta de Apresentação - PT.odt
cartão
Carta Oferta de Trabalho - DaitanGroup - Thalles Santos Silva - Fev18 - Signed.pdf
CBIE_CertParticipation_PT_Fall2013_Part1408.pdf
Chatbot_dataset
ChatBot_Dataset.ods
cnpq.pdf
Colab
Colab Notebooks
Comprovantes Contas AP
Copy of Ariadny Ricci curriculo.docx
Copy of GANTT CHART TEMPLATE.ods
Cryptography Course Materials
Deeplearning
Deliverable
dlnd_language_translation.ipynb
documentos apto
Documents
Documents (7d91cd91)
English-Financial-Letter.odt
Essays
Fedora Drivers
Ficha de emprego.doc
File Manager App's Description.odt
Final Project Proposal.odt
Final_Report.doc
Français
hello.txt
IBM letter.odt
Intro to Machine Learning
Itau.odt
Latex Documents
lesson1.ipynb
Minimum Spanning Tree Algorithm.odt
MLND.ods
ML

In [93]:
# upload .py files and import them as libraries
from google.colab import files
src = list(files.upload().values())[0]
open('helper.py','wb').write(src)
import helper
import problem_unittests as tests

Saving problem_unittests.py to problem_unittests (2).py
Saving helper.py to helper (3).py


In [94]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving small_vocab_fr to small_vocab_fr (2)
Saving small_vocab_en to small_vocab_en (2)
User uploaded file "small_vocab_fr" with length 10135742 bytes
User uploaded file "small_vocab_en" with length 9085267 bytes


A limitation of vanilla neural networks is that their API is too constrained: **they accept a fixed-size vector as input (e.g. an image) and produce a fixed-size vector as output (e.g. probabilities of different classes)**.

Not only that: these models perform the mapping from input to output using a fixed number of computational steps (e.g. the number of layers in the model).

Recurrent nets allow us to operate over sequences of vectors: sequences in the input, the output, or, in the most general case, both.

<img src="images/rnn_archtectures.jpeg"/>

- Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). 

- Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state. 

(1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification). 

(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). 

(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). 

(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French). 

(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). 

Notice that there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like.

**An RNN's output is influenced not only by the input you just fed in, but also by the entire history of inputs you've fed in in the past.**

Credits: https://karpathy.github.io/2015/05/21/rnn-effectiveness/
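
To make the idea of a fixed recurrent transformation concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The function name `rnn_forward`, the weight names and the sizes are assumptions chosen for illustration; they are not part of this notebook's model.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])                    # initial state
    states = []
    for x in inputs:                               # one step per input vector
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # the same weights are reused at every step
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

short_seq = rng.normal(size=(2, input_dim))
long_seq = rng.normal(size=(7, input_dim))
print(rnn_forward(short_seq, W_xh, W_hh, b_h).shape)  # (2, 3)
print(rnn_forward(long_seq, W_xh, W_hh, b_h).shape)   # (7, 3)
```

The same three parameter arrays handle both sequence lengths, and because each state is computed from the previous one, the last state depends on every earlier input.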

### RNNs

RNN models may differ in terms of:
- (a) directionality – unidirectional or bidirectional;
- (b) depth – single or multi-layer;
- (c) type – often a vanilla **RNN**, a **Long Short-Term Memory (LSTM)**, or a **Gated Recurrent Unit (GRU)**.

## Applications

- Machine translation
- Speech recognition
- Text summarization
- Image captioning 
- Video analytics

### Model overview
Training RNNs is very expensive. The main reason is Backpropagation Through Time (BPTT): the gradients for a sequential model cannot be computed efficiently in parallel across time steps, because **the gradient at each time step depends on the gradients at the steps that follow it**.
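
As a sketch of where this dependency comes from, assume a vanilla RNN with hidden state $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$ and a total loss $L = \sum_t L_t$, where each $L_t$ depends on $h_t$. This notation is an assumption introduced here, not something defined elsewhere in the notebook. The gradient reaching $h_t$ then follows a backward recursion:

$$
\frac{\partial L}{\partial h_t}
= \frac{\partial L_t}{\partial h_t}
+ \left(\frac{\partial h_{t+1}}{\partial h_t}\right)^{\top} \frac{\partial L}{\partial h_{t+1}},
\qquad
\frac{\partial h_{t+1}}{\partial h_t} = \operatorname{diag}\!\left(1 - h_{t+1}^{2}\right) W_{hh}
$$

The term for step $t$ cannot be evaluated until the term for step $t+1$ is known, so the backward pass has to walk through the sequence one step at a time.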

<img src="https://images.pexels.com/photos/248797/pexels-photo-248797.jpeg?auto=compress&cs=tinysrgb&h=350"/>

# Language Translation
A Sequence to Sequence model trained on a dataset of English and French sentences that can translate new sentences from English to French.
## Get the Data
The dataset is limited: its English vocabulary has only about 227 unique words, which makes training faster.

In [0]:
import helper
import problem_unittests as tests
import tensorflow as tf
from tensorflow.python.layers.core import Dense

source_path = 'small_vocab_en'
target_path = 'small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

<img src="images/data_sample.png"/>

## Explore the Data
View different parts of the data.

In [71]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne 

## Implement Preprocessing Function
### Text to Word Ids
We must turn the text into numbers so the computer can understand it. The function `text_to_ids()` turns `source_text` and `target_text` from words into ids. We also need to add the `<EOS>` word id at the end of each sentence in `target_text`. This helps the neural network predict where the sentence should end.

* **`<PAD>`**: During training, we'll need to feed our examples to the network in batches. The inputs in these batches all need to be the same width for the network to do its calculation. Our examples, however, are not all the same length. That's why we need to pad shorter inputs to bring them to the same width as the rest of the batch.

* **`<EOS>`**: This is another necessity of batching as well, but more on the decoder side. It allows us to tell the decoder where a sentence ends, and it allows the decoder to indicate the same thing in its outputs as well.

* **`<UNK>`**: If you're training your model on real data, you'll find you can vastly improve the resource efficiency of your model by ignoring words that don't show up often enough in your vocabulary to warrant consideration. We replace those with `<UNK>`.

* **`<GO>`**: This is the input to the first time step of the decoder to let the decoder know when to start generating output.

## Example

Let's suppose we want to train our model with the following dataset.

- How are you?
- I am fine
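
A minimal sketch of how the four special tokens would apply to this pair. The vocabularies, the id values, the lowercase tokenization and the batch width of 6 are all assumptions made for this example; the notebook's real vocabularies come from `helper`.

```python
# Toy vocabularies; the ids are arbitrary and chosen only for this example.
source_vocab_to_int = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
                       'how': 4, 'are': 5, 'you': 6, '?': 7}
target_vocab_to_int = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
                       'i': 4, 'am': 5, 'fine': 6}

source_sentence = 'how are you ?'
target_sentence = 'i am fine'

# Words -> ids; unseen words fall back to <UNK> (an assumption of this sketch).
source_ids = [source_vocab_to_int.get(w, source_vocab_to_int['<UNK>'])
              for w in source_sentence.split()]
# The target additionally gets <EOS> appended.
target_ids = [target_vocab_to_int.get(w, target_vocab_to_int['<UNK>'])
              for w in target_sentence.split()] + [target_vocab_to_int['<EOS>']]

# Decoder input during training: drop the last id and prepend <GO>.
decoder_input = [target_vocab_to_int['<GO>']] + target_ids[:-1]

# Pad the target to a hypothetical batch width of 6.
padded_target = target_ids + [target_vocab_to_int['<PAD>']] * (6 - len(target_ids))

print(source_ids)     # [4, 5, 6, 7]
print(target_ids)     # [4, 5, 6, 1]
print(decoder_input)  # [3, 4, 5, 6]
print(padded_target)  # [4, 5, 6, 1, 0, 0]
```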


In [72]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    # Map every word to its id; each target sentence additionally ends with the <EOS> id
    source_id_text = [[source_vocab_to_int[word] for word in line.split()] for line in source_text.split('\n')]
    target_id_text = [[target_vocab_to_int[word] for word in line.split()] + [target_vocab_to_int['<EOS>']] for line in target_text.split('\n')]
    
    return source_id_text, target_id_text

tests.test_text_to_ids(text_to_ids)

Tests Passed


In [0]:
# save the dataset after processing
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)

In [0]:
import numpy as np
import helper
import problem_unittests as tests

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()

## Build the Neural Network

<img src="images/encdec.jpg"/>

**Encoder-decoder architecture** – example of a general approach for Neural Machine Translation. 

An encoder converts a source sentence into a "meaning" vector which is passed through a decoder to produce a translation.

### Input

In [0]:
def model_inputs():
    """
    TF Placeholders for input, targets, learning rate, and lengths of source and target sequences.
    :return: Tuple (input, targets, learning rate, keep probability, target sequence length,
    max target sequence length, source sequence length)
    """
    input_ = tf.placeholder(dtype=tf.int32, shape=[None, None], name="input")
    targets = tf.placeholder(dtype=tf.int32, shape=[None, None], name="target_text_placeholder")
    learning_rate = tf.placeholder(dtype=tf.float32, name="learning_rate_placeholder")
    keep_prob = tf.placeholder(dtype=tf.float32, name="keep_prob")
    target_sequence_length = tf.placeholder(dtype=tf.int32, shape=[None], name="target_sequence_length")
    max_target_len = tf.reduce_max(target_sequence_length)
    source_sequence_length = tf.placeholder(dtype=tf.int32, shape=[None], name="source_sequence_length")

    return input_, targets, learning_rate, keep_prob, target_sequence_length, max_target_len, source_sequence_length

### Process Decoder Input
Implement `process_decoder_input` by removing the last word id from each sequence in `target_data` and concatenating the `<GO>` id to the beginning of each sequence.

In [76]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for the decoder
    :param target_data: Target Placeholder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    # remove last word id from each batch in target_data
    target_data = tf.strided_slice(target_data, begin=[0,0], end=[batch_size,-1], strides=[1,1])
    
    # create new tensor with the GO ID
    go_ids = tf.fill([batch_size, 1], target_vocab_to_int['<GO>'])
    
    # add the GO ID to the beginning of each batch
    target_data = tf.concat([go_ids, target_data], 1)
    return target_data
    
tests.test_process_encoding_input(process_decoder_input)

Tests Passed
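
The slicing and concatenation in `process_decoder_input` can be easier to follow on a concrete array. The following NumPy analogue is only an illustration; the function name `process_decoder_input_np` and the id values are assumptions.

```python
import numpy as np

def process_decoder_input_np(target_batch, go_id):
    """NumPy analogue: drop the last column, then prepend a column of <GO> ids."""
    without_last = target_batch[:, :-1]
    go_column = np.full((target_batch.shape[0], 1), go_id, dtype=target_batch.dtype)
    return np.concatenate([go_column, without_last], axis=1)

# Hypothetical ids: <PAD>=0, <EOS>=1, <GO>=3.
target_batch = np.array([[4, 5, 6, 1],
                         [7, 8, 1, 0]])
print(process_decoder_input_np(target_batch, go_id=3))
# [[3 4 5 6]
#  [3 7 8 1]]
```

Note that for a sequence shorter than the batch width, the dropped column holds a `<PAD>` id, so the `<EOS>` id stays in the decoder input for that row.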


### Encoding

- The Encoder RNN consumes the entire input sequence and does **NOT** make any prediction.

- Instead, it outputs a vector that encapsulates the **meaning** of the input sequence (the state).

<img src="images/encoder.png"/>

In [77]:
from importlib import reload
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :param source_sequence_length: a list of the lengths of each sequence in the batch
    :param source_vocab_size: vocabulary size of source data
    :param encoding_embedding_size: embedding size of source data
    :return: tuple (RNN output, RNN state)
    """
    # Maps a sequence of symbols to a sequence of embeddings [batch_size, doc_length, embed_dim].
    # Embed the encoder input using tf.contrib.layers.embed_sequence
    embed_sequence = tf.contrib.layers.embed_sequence(ids=rnn_inputs,
                                     vocab_size=source_vocab_size,
                                     embed_dim=encoding_embedding_size) 
    
    def lstm_cell(rnn_size, keep_prob):
        cell = tf.contrib.rnn.LSTMCell(num_units=rnn_size)  # maybe we need to use a basic LSTM cell instead
        return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
    
    # Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper
    enc_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])
    
    # 'enc_output' is a tensor of shape [batch_size, max_time, rnn_size]
    # 'state' is a N-tuple where N is the number of LSTMCells containing a
    # tf.contrib.rnn.LSTMStateTuple for each cell
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, embed_sequence, sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

tests.test_encoding_layer(encoding_layer)

Tests Passed
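
The embedding step in `encoding_layer` maps each word id to a dense vector. As a rough NumPy analogue, with made-up matrix values and ids (the real embedding matrix is learned during training):

```python
import numpy as np

vocab_size, embed_dim = 6, 3
# Stand-in for a learned embedding matrix: one row per word id.
embedding_matrix = np.arange(vocab_size * embed_dim, dtype=np.float32).reshape(vocab_size, embed_dim)

# A padded batch of word ids with shape [batch_size, max_time]; 0 is a hypothetical <PAD> id.
ids = np.array([[4, 5, 0],
                [2, 0, 0]])

embedded = embedding_matrix[ids]   # row lookup per id
print(embedded.shape)              # (2, 3, 3) = [batch_size, max_time, embed_dim]
print(embedded[0, 0])              # [12. 13. 14.]
```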


### Decoding - Training
Create the training decoder layer. 

- The training decoder waits for the `<GO>` signal to start decoding.
- At each time step, it outputs a word prediction based on the state it received.
- However, during training, at each time step, we do **NOT** pass the predictions as inputs to the next cells - **We pass the actual target words instead**.

<img src="images/decoder_train.png"/>

In [78]:
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """

    # A helper for use during training. Only reads inputs. 
    # Returned sample_ids are the argmax of the RNN output logits.
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input, 
                                                        sequence_length=target_sequence_length)
    
    # cell = tf.contrib.rnn.DropoutWrapper(dec_cell, input_keep_prob=keep_prob)
    
    basic_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell, # An RNNCell instance.
                                                    helper=training_helper, # A Helper instance.
                                                    initial_state=encoder_state, # The initial state of the RNNCell.
                                                    output_layer=output_layer) # Optional layer to apply to the RNN output prior to storing the result or sampling. 
    
    # (final_outputs, final_state, final_sequence_lengths).
    dec_train_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(basic_decoder, 
                                                               maximum_iterations=max_summary_length)
    return dec_train_logits


tests.test_decoding_layer_train(decoding_layer_train)

Tests Passed


### Decoding - Inference

The Decoder for inference time differs in its architecture.

- For evaluation, we do not have the targets. 
- We pass the actual **predictions** as input at the next time steps. 
    - We assume they are good enough at this point.
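
The difference between the two decoding modes can be sketched with a toy lookup table standing in for a trained decoder step. Everything here (`toy_predict`, the tokens, the deliberately wrong transition) is hypothetical and only meant to show how the inputs are chosen at each step.

```python
GO, EOS = '<GO>', '<EOS>'

# Hypothetical stand-in for a decoder step: previous token -> predicted next token.
# It contains a wrong transition ('am' -> 'happy') on purpose.
toy_predict = {GO: 'i', 'i': 'am', 'am': 'happy', 'fine': EOS,
               'happy': 'today', 'today': EOS}

def decode_teacher_forced(target_tokens):
    """Training mode: each step's input is the true previous target token."""
    inputs = [GO] + target_tokens[:-1]
    return [toy_predict[token] for token in inputs]

def decode_greedy(max_steps):
    """Inference mode: each step's input is the model's own previous prediction."""
    outputs, token = [], GO
    for _ in range(max_steps):
        token = toy_predict[token]
        outputs.append(token)
        if token == EOS:
            break
    return outputs

target = ['i', 'am', 'fine', EOS]
print(decode_teacher_forced(target))  # ['i', 'am', 'happy', '<EOS>']
print(decode_greedy(max_steps=10))    # ['i', 'am', 'happy', 'today', '<EOS>']
```

With the true targets as inputs, the mistake at step three does not affect step four. In greedy decoding the wrong word is fed back in, so the error carries forward into the rest of the output.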

<img src="images/decoder_inference.png"/>

In [79]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param max_target_sequence_length: Maximum length of target sequences
    :param vocab_size: Size of decoder/target vocabulary
    :param output_layer: Function to apply the output layer
    :param batch_size: Batch size
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing inference logits and sample_id
    """
    
    start_tokens = tf.tile(tf.constant([start_of_sequence_id], dtype=tf.int32), [batch_size], name='start_tokens')

    # A helper for use during inference.
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=dec_embeddings, # A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup. The returned tensor will be passed to the decoder input.
        start_tokens=start_tokens, # int32 vector shaped [batch_size], the start tokens.
        end_token=end_of_sequence_id) # int32 scalar, the token that marks end of decoding.
    
    # cell = tf.contrib.rnn.DropoutWrapper(dec_cell, input_keep_prob=keep_prob)

    basic_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell, # An RNNCell instance.
                                    helper=inference_helper, # A Helper instance.
                                    initial_state=encoder_state, #  The initial state of the RNNCell.
                                    output_layer=output_layer) # Optional layer to apply to the RNN output prior to storing the result or sampling.
    
    dec_infer_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(basic_decoder, 
                                                               maximum_iterations=max_target_sequence_length)
    return dec_infer_logits
    
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_decoding_layer_infer(decoding_layer_infer)

Tests Passed


### Build the Decoding Layer
Implement `decoding_layer()` to create a Decoder RNN layer.

* Embed the target sequences
* Construct the decoder LSTM cell (just like you constructed the encoder cell above)
* Create an output layer to map the outputs of the decoder to the elements of our vocabulary
* Use your `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_target_sequence_length, output_layer, keep_prob)` function to get the training logits.
* Use your `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob)` function to get the inference logits.

Note: You'll need to use [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [80]:
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    
    # embed the target sequences
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size], -1.0, 1.0))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    def lstm_cell(rnn_size, keep_prob):
        cell = tf.contrib.rnn.LSTMCell(num_units=rnn_size, use_peepholes=False)
        return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)

    dec_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])
    
    # Dense layer to translate the decoder's output at each time
    # step into a choice from the target vocabulary
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    # We construct two decoders that share parameters:
    # one for training, the other for inference
    with tf.variable_scope('decoder', reuse=False):
        train_logits = decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, 
                             max_target_sequence_length, output_layer, keep_prob)
        
    with tf.variable_scope('decoder', reuse=True):
        val_logits = decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, target_vocab_to_int['<GO>'], 
                             target_vocab_to_int['<EOS>'], max_target_sequence_length, target_vocab_size, 
                             output_layer, batch_size, keep_prob)
    

    return train_logits, val_logits


tests.test_decoding_layer(decoding_layer)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:

- Encode the input using your `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size)`.
- Process target data using your `process_decoder_input(target_data, target_vocab_to_int, batch_size)` function.
- Decode the encoded input using your `decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size)` function.

#### Training decoder
Essentially, we'll be creating two decoders that share their parameters: one for training and one for inference. They differ in that we feed the target sequences as inputs to the training decoder at each time step to make it more robust.

We can think of the training decoder as looking like this (except that it works with sequences in batches):
<img src="images/sequence-to-sequence-training-decoder.png"/>

#### Inference decoder
The inference decoder is the one we'll use when we deploy our model to the wild.

<img src="images/sequence-to-sequence-inference-decoder.png"/>

In [81]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
    """
    Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param source_sequence_length: Sequence Lengths of source sequences in the batch
    :param target_sequence_length: Sequence Lengths of target sequences in the batch
    :param max_target_sentence_length: Length of the longest target sequence in the batch
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Encoder embedding size
    :param dec_embedding_size: Decoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """

    enc_output, enc_state = encoding_layer(input_data, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   enc_embedding_size)
    
    # Remove the last word id from each sequence in target_data and concat the GO ID to the beginning of each sequence.
    dec_input = process_decoder_input(target_data, target_vocab_to_int, batch_size)
    
    train_logits, val_logits = decoding_layer(dec_input, enc_state,
                   target_sequence_length, max_target_sentence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, dec_embedding_size)
    
    return train_logits, val_logits


tests.test_seq2seq_model(seq2seq_model)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `num_layers` to the number of layers.
- Set `encoding_embedding_size` to the size of the embedding for the encoder.
- Set `decoding_embedding_size` to the size of the embedding for the decoder.
- Set `learning_rate` to the learning rate.
- Set `keep_probability` to the Dropout keep probability
- Set `display_step` to state how many steps between each debug output statement

In [0]:
# Number of Epochs
epochs = 10
# Batch Size
batch_size = 512
# RNN Size
rnn_size = 256
# Number of Layers
num_layers = 3
# Embedding Size
encoding_embedding_size = 256
decoding_embedding_size = 256
# Learning Rate
learning_rate = .001
# Dropout Keep Probability
keep_probability = 0.5
display_step = 50

### Build the Graph

In [0]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    input_shape = tf.shape(input_data)

    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
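
The `masks` tensor keeps padded positions from contributing to the loss. The NumPy sketch below illustrates the idea, assuming the loss is the softmax cross-entropy averaged over all unmasked positions, which appears to match the default averaging of `sequence_loss`. The helper names are hypothetical and this is not the TensorFlow implementation.

```python
import numpy as np

def sequence_mask(lengths, max_len):
    """1.0 for real positions, 0.0 for padding (analogue of tf.sequence_mask)."""
    return (np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]).astype(np.float32)

def masked_sequence_loss(logits, targets, mask):
    """Softmax cross-entropy averaged over the unmasked positions only."""
    shifted = logits - logits.max(axis=-1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    batch_idx, time_idx = np.indices(targets.shape)
    token_loss = -log_probs[batch_idx, time_idx, targets]         # shape [batch, time]
    return (token_loss * mask).sum() / mask.sum()

mask = sequence_mask([3, 1], max_len=3)
print(mask)
# [[1. 1. 1.]
#  [1. 0. 0.]]

logits = np.zeros((2, 3, 5))                 # uniform predictions over a 5-word vocabulary
targets = np.array([[1, 2, 3], [4, 0, 0]])   # 0 is a hypothetical <PAD> id
loss = masked_sequence_loss(logits, targets, mask)

# Changing the logits at padded steps only must leave the loss unchanged.
logits[1, 1:, :] = np.random.default_rng(0).normal(size=(2, 5))
assert np.isclose(loss, masked_sequence_loss(logits, targets, mask))
```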


Batch and pad the source and target sequences

In [0]:
def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # Pad
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths
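
Note that `get_batches` measures the lengths after padding, so every reported length equals the padded batch width. If the per-sentence lengths before padding are wanted instead, one option is to record them while padding. The helper below is a hypothetical variant of `pad_sentence_batch`, not part of the original notebook.

```python
def pad_batch_with_lengths(sentence_batch, pad_int):
    """Pad a batch to its longest sentence and also return the unpadded lengths."""
    lengths = [len(sentence) for sentence in sentence_batch]
    max_len = max(lengths)
    padded = [sentence + [pad_int] * (max_len - len(sentence)) for sentence in sentence_batch]
    return padded, lengths

padded, lengths = pad_batch_with_lengths([[4, 5, 6, 1], [7, 1], [8, 9, 1]], pad_int=0)
print(padded)   # [[4, 5, 6, 1], [7, 1, 0, 0], [8, 9, 1, 0]]
print(lengths)  # [4, 2, 3]
```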

In [0]:
def word_id_to_text_sentence(sentence_ids, source_int_to_vocab):
    output = []
    for word in sentence_ids:
        output.append(source_int_to_vocab[word])
    return ' '.join(output)

In [0]:
# for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
#     get_batches(train_source, train_target, 1,
#                 source_vocab_to_int['<PAD>'],
#                 target_vocab_to_int['<PAD>'])):
    
#     print(word_id_to_text_sentence(source_batch.squeeze(), source_int_to_vocab))
#     print(word_id_to_text_sentence(target_batch.squeeze(), target_int_to_vocab))
#     print("-------")

### Train
Train the neural network on the preprocessed data. If you have a hard time getting a good loss, check the forums to see if anyone is having the same problem.

In [87]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# Split data to training and validation sets
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch,
 valid_sources_lengths, valid_targets_lengths) = next(
    get_batches(valid_source, valid_target, batch_size,
                source_vocab_to_int['<PAD>'],
                target_vocab_to_int['<PAD>']))
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):

            #print(word_id_to_text_sentence(source_batch[0], source_int_to_vocab))
            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)

                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)

                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

Epoch   0 Batch   10/538 - Train Accuracy: 0.2209, Validation Accuracy: 0.3157, Loss: 4.7975
Epoch   0 Batch   20/538 - Train Accuracy: 0.2974, Validation Accuracy: 0.3457, Loss: 3.6260
Epoch   0 Batch   30/538 - Train Accuracy: 0.2711, Validation Accuracy: 0.3427, Loss: 3.2826
Epoch   0 Batch   40/538 - Train Accuracy: 0.3622, Validation Accuracy: 0.3555, Loss: 2.8023
Epoch   0 Batch   50/538 - Train Accuracy: 0.3172, Validation Accuracy: 0.3714, Loss: 2.8973
Epoch   0 Batch   60/538 - Train Accuracy: 0.3557, Validation Accuracy: 0.4212, Loss: 2.7979
Epoch   0 Batch   70/538 - Train Accuracy: 0.3841, Validation Accuracy: 0.4318, Loss: 2.5700
Epoch   0 Batch   80/538 - Train Accuracy: 0.3861, Validation Accuracy: 0.4522, Loss: 2.5898
Epoch   0 Batch   90/538 - Train Accuracy: 0.4165, Validation Accuracy: 0.4519, Loss: 2.3765
Epoch   0 Batch  100/538 - Train Accuracy: 0.3861, Validation Accuracy: 0.4478, Loss: 2.3407
Epoch   0 Batch  110/538 - Train Accuracy: 0.3992, Validation Accuracy

Epoch   0 Batch  460/538 - Train Accuracy: 0.5004, Validation Accuracy: 0.5391, Loss: 0.8769
Epoch   0 Batch  470/538 - Train Accuracy: 0.5205, Validation Accuracy: 0.5501, Loss: 0.8712
Epoch   0 Batch  480/538 - Train Accuracy: 0.5320, Validation Accuracy: 0.5529, Loss: 0.8466
Epoch   0 Batch  490/538 - Train Accuracy: 0.5260, Validation Accuracy: 0.5522, Loss: 0.8384
Epoch   0 Batch  500/538 - Train Accuracy: 0.5494, Validation Accuracy: 0.5588, Loss: 0.7857
Epoch   0 Batch  510/538 - Train Accuracy: 0.5536, Validation Accuracy: 0.5621, Loss: 0.8149
Epoch   0 Batch  520/538 - Train Accuracy: 0.5350, Validation Accuracy: 0.5673, Loss: 0.8446
Epoch   0 Batch  530/538 - Train Accuracy: 0.5285, Validation Accuracy: 0.5609, Loss: 0.8589
Epoch   1 Batch   10/538 - Train Accuracy: 0.5146, Validation Accuracy: 0.5719, Loss: 0.8362
Epoch   1 Batch   20/538 - Train Accuracy: 0.5534, Validation Accuracy: 0.5701, Loss: 0.7864
Epoch   1 Batch   30/538 - Train Accuracy: 0.5484, Validation Accuracy

Epoch   1 Batch  380/538 - Train Accuracy: 0.5857, Validation Accuracy: 0.6321, Loss: 0.6212
Epoch   1 Batch  390/538 - Train Accuracy: 0.6352, Validation Accuracy: 0.6268, Loss: 0.6020
Epoch   1 Batch  400/538 - Train Accuracy: 0.6055, Validation Accuracy: 0.6381, Loss: 0.6068
Epoch   1 Batch  410/538 - Train Accuracy: 0.5922, Validation Accuracy: 0.6385, Loss: 0.6164
Epoch   1 Batch  420/538 - Train Accuracy: 0.6158, Validation Accuracy: 0.6394, Loss: 0.6050
Epoch   1 Batch  430/538 - Train Accuracy: 0.6258, Validation Accuracy: 0.6360, Loss: 0.6094
Epoch   1 Batch  440/538 - Train Accuracy: 0.6168, Validation Accuracy: 0.6406, Loss: 0.6285
Epoch   1 Batch  450/538 - Train Accuracy: 0.6349, Validation Accuracy: 0.6399, Loss: 0.6059
Epoch   1 Batch  460/538 - Train Accuracy: 0.6047, Validation Accuracy: 0.6438, Loss: 0.5751
Epoch   1 Batch  470/538 - Train Accuracy: 0.6332, Validation Accuracy: 0.6385, Loss: 0.5786
Epoch   1 Batch  480/538 - Train Accuracy: 0.6306, Validation Accuracy

Epoch   2 Batch  300/538 - Train Accuracy: 0.6957, Validation Accuracy: 0.6873, Loss: 0.4644
Epoch   2 Batch  310/538 - Train Accuracy: 0.6990, Validation Accuracy: 0.6758, Loss: 0.4756
Epoch   2 Batch  320/538 - Train Accuracy: 0.6907, Validation Accuracy: 0.6799, Loss: 0.4540
Epoch   2 Batch  330/538 - Train Accuracy: 0.6884, Validation Accuracy: 0.6758, Loss: 0.4406
Epoch   2 Batch  340/538 - Train Accuracy: 0.6896, Validation Accuracy: 0.6916, Loss: 0.4696
Epoch   2 Batch  350/538 - Train Accuracy: 0.7091, Validation Accuracy: 0.6875, Loss: 0.4525
Epoch   2 Batch  360/538 - Train Accuracy: 0.6877, Validation Accuracy: 0.6886, Loss: 0.4602
Epoch   2 Batch  370/538 - Train Accuracy: 0.6988, Validation Accuracy: 0.6982, Loss: 0.4632
Epoch   2 Batch  380/538 - Train Accuracy: 0.7029, Validation Accuracy: 0.6891, Loss: 0.4426
Epoch   2 Batch  390/538 - Train Accuracy: 0.7171, Validation Accuracy: 0.6951, Loss: 0.4294
Epoch   2 Batch  400/538 - Train Accuracy: 0.7052, Validation Accuracy

Epoch   3 Batch  220/538 - Train Accuracy: 0.7411, Validation Accuracy: 0.7644, Loss: 0.3459
Epoch   3 Batch  230/538 - Train Accuracy: 0.7633, Validation Accuracy: 0.7621, Loss: 0.3556
Epoch   3 Batch  240/538 - Train Accuracy: 0.7750, Validation Accuracy: 0.7628, Loss: 0.3494
Epoch   3 Batch  250/538 - Train Accuracy: 0.7674, Validation Accuracy: 0.7635, Loss: 0.3482
Epoch   3 Batch  260/538 - Train Accuracy: 0.7370, Validation Accuracy: 0.7701, Loss: 0.3412
Epoch   3 Batch  270/538 - Train Accuracy: 0.7473, Validation Accuracy: 0.7717, Loss: 0.3429
Epoch   3 Batch  280/538 - Train Accuracy: 0.7941, Validation Accuracy: 0.7770, Loss: 0.3195
Epoch   3 Batch  290/538 - Train Accuracy: 0.7656, Validation Accuracy: 0.7681, Loss: 0.3319
Epoch   3 Batch  300/538 - Train Accuracy: 0.7567, Validation Accuracy: 0.7782, Loss: 0.3301
Epoch   3 Batch  310/538 - Train Accuracy: 0.8023, Validation Accuracy: 0.7828, Loss: 0.3398
Epoch   3 Batch  320/538 - Train Accuracy: 0.7978, Validation Accuracy

Epoch   4 Batch  140/538 - Train Accuracy: 0.8273, Validation Accuracy: 0.8210, Loss: 0.2565
Epoch   4 Batch  150/538 - Train Accuracy: 0.8344, Validation Accuracy: 0.8223, Loss: 0.2376
Epoch   4 Batch  160/538 - Train Accuracy: 0.8186, Validation Accuracy: 0.8249, Loss: 0.2234
Epoch   4 Batch  170/538 - Train Accuracy: 0.8242, Validation Accuracy: 0.8152, Loss: 0.2380
Epoch   4 Batch  180/538 - Train Accuracy: 0.8514, Validation Accuracy: 0.8283, Loss: 0.2265
Epoch   4 Batch  190/538 - Train Accuracy: 0.8352, Validation Accuracy: 0.8228, Loss: 0.2432
Epoch   4 Batch  200/538 - Train Accuracy: 0.8463, Validation Accuracy: 0.8329, Loss: 0.2231
Epoch   4 Batch  210/538 - Train Accuracy: 0.8140, Validation Accuracy: 0.8377, Loss: 0.2301
Epoch   4 Batch  220/538 - Train Accuracy: 0.8170, Validation Accuracy: 0.8393, Loss: 0.2224
Epoch   4 Batch  230/538 - Train Accuracy: 0.8406, Validation Accuracy: 0.8253, Loss: 0.2243
Epoch   4 Batch  240/538 - Train Accuracy: 0.8207, Validation Accuracy

Epoch   5 Batch   60/538 - Train Accuracy: 0.8861, Validation Accuracy: 0.8713, Loss: 0.1634
Epoch   5 Batch   70/538 - Train Accuracy: 0.8668, Validation Accuracy: 0.8652, Loss: 0.1517
Epoch   5 Batch   80/538 - Train Accuracy: 0.8631, Validation Accuracy: 0.8803, Loss: 0.1681
Epoch   5 Batch   90/538 - Train Accuracy: 0.8499, Validation Accuracy: 0.8686, Loss: 0.1671
Epoch   5 Batch  100/538 - Train Accuracy: 0.8973, Validation Accuracy: 0.8688, Loss: 0.1459
Epoch   5 Batch  110/538 - Train Accuracy: 0.8693, Validation Accuracy: 0.8633, Loss: 0.1510
Epoch   5 Batch  120/538 - Train Accuracy: 0.9127, Validation Accuracy: 0.8757, Loss: 0.1460
Epoch   5 Batch  130/538 - Train Accuracy: 0.8769, Validation Accuracy: 0.8709, Loss: 0.1367
Epoch   5 Batch  140/538 - Train Accuracy: 0.8740, Validation Accuracy: 0.8683, Loss: 0.1653
Epoch   5 Batch  150/538 - Train Accuracy: 0.8932, Validation Accuracy: 0.8754, Loss: 0.1455
Epoch   5 Batch  160/538 - Train Accuracy: 0.8653, Validation Accuracy

Epoch   5 Batch  510/538 - Train Accuracy: 0.8910, Validation Accuracy: 0.9102, Loss: 0.1130
Epoch   5 Batch  520/538 - Train Accuracy: 0.8891, Validation Accuracy: 0.9098, Loss: 0.1155
Epoch   5 Batch  530/538 - Train Accuracy: 0.8873, Validation Accuracy: 0.9052, Loss: 0.1194
Epoch   6 Batch   10/538 - Train Accuracy: 0.8906, Validation Accuracy: 0.9151, Loss: 0.1132
Epoch   6 Batch   20/538 - Train Accuracy: 0.8983, Validation Accuracy: 0.9132, Loss: 0.1126
Epoch   6 Batch   30/538 - Train Accuracy: 0.8914, Validation Accuracy: 0.9153, Loss: 0.1227
Epoch   6 Batch   40/538 - Train Accuracy: 0.9137, Validation Accuracy: 0.9148, Loss: 0.0869
Epoch   6 Batch   50/538 - Train Accuracy: 0.8996, Validation Accuracy: 0.9023, Loss: 0.1085
Epoch   6 Batch   60/538 - Train Accuracy: 0.9006, Validation Accuracy: 0.9098, Loss: 0.1044
Epoch   6 Batch   70/538 - Train Accuracy: 0.8912, Validation Accuracy: 0.9006, Loss: 0.0983
Epoch   6 Batch   80/538 - Train Accuracy: 0.9074, Validation Accuracy

Epoch   6 Batch  430/538 - Train Accuracy: 0.8928, Validation Accuracy: 0.9196, Loss: 0.0858
Epoch   6 Batch  440/538 - Train Accuracy: 0.9070, Validation Accuracy: 0.9176, Loss: 0.0915
Epoch   6 Batch  450/538 - Train Accuracy: 0.9023, Validation Accuracy: 0.9173, Loss: 0.0985
Epoch   6 Batch  460/538 - Train Accuracy: 0.8925, Validation Accuracy: 0.9135, Loss: 0.0896
Epoch   6 Batch  470/538 - Train Accuracy: 0.9189, Validation Accuracy: 0.9171, Loss: 0.0782
Epoch   6 Batch  480/538 - Train Accuracy: 0.9137, Validation Accuracy: 0.9201, Loss: 0.0810
Epoch   6 Batch  490/538 - Train Accuracy: 0.9129, Validation Accuracy: 0.9210, Loss: 0.0774
Epoch   6 Batch  500/538 - Train Accuracy: 0.9386, Validation Accuracy: 0.9219, Loss: 0.0681
Epoch   6 Batch  510/538 - Train Accuracy: 0.9254, Validation Accuracy: 0.9256, Loss: 0.0792
Epoch   6 Batch  520/538 - Train Accuracy: 0.8943, Validation Accuracy: 0.9325, Loss: 0.0799
Epoch   6 Batch  530/538 - Train Accuracy: 0.8930, Validation Accuracy

Epoch   7 Batch  350/538 - Train Accuracy: 0.9315, Validation Accuracy: 0.9217, Loss: 0.0731
Epoch   7 Batch  360/538 - Train Accuracy: 0.9037, Validation Accuracy: 0.9341, Loss: 0.0724
Epoch   7 Batch  370/538 - Train Accuracy: 0.9297, Validation Accuracy: 0.9395, Loss: 0.0689
Epoch   7 Batch  380/538 - Train Accuracy: 0.9059, Validation Accuracy: 0.9389, Loss: 0.0626
Epoch   7 Batch  390/538 - Train Accuracy: 0.9224, Validation Accuracy: 0.9286, Loss: 0.0560
Epoch   7 Batch  400/538 - Train Accuracy: 0.9280, Validation Accuracy: 0.9284, Loss: 0.0671
Epoch   7 Batch  410/538 - Train Accuracy: 0.9221, Validation Accuracy: 0.9341, Loss: 0.0721
Epoch   7 Batch  420/538 - Train Accuracy: 0.9365, Validation Accuracy: 0.9297, Loss: 0.0645
Epoch   7 Batch  430/538 - Train Accuracy: 0.9076, Validation Accuracy: 0.9343, Loss: 0.0636
Epoch   7 Batch  440/538 - Train Accuracy: 0.9223, Validation Accuracy: 0.9277, Loss: 0.0716
Epoch   7 Batch  450/538 - Train Accuracy: 0.9131, Validation Accuracy

Epoch   8 Batch  270/538 - Train Accuracy: 0.9316, Validation Accuracy: 0.9441, Loss: 0.0562
Epoch   8 Batch  280/538 - Train Accuracy: 0.9258, Validation Accuracy: 0.9386, Loss: 0.0524
Epoch   8 Batch  290/538 - Train Accuracy: 0.9406, Validation Accuracy: 0.9336, Loss: 0.0559
Epoch   8 Batch  300/538 - Train Accuracy: 0.9226, Validation Accuracy: 0.9382, Loss: 0.0661
Epoch   8 Batch  310/538 - Train Accuracy: 0.9533, Validation Accuracy: 0.9363, Loss: 0.0568
Epoch   8 Batch  320/538 - Train Accuracy: 0.9338, Validation Accuracy: 0.9419, Loss: 0.0528
Epoch   8 Batch  330/538 - Train Accuracy: 0.9567, Validation Accuracy: 0.9483, Loss: 0.0522
Epoch   8 Batch  340/538 - Train Accuracy: 0.9314, Validation Accuracy: 0.9439, Loss: 0.0608
Epoch   8 Batch  350/538 - Train Accuracy: 0.9384, Validation Accuracy: 0.9442, Loss: 0.0649
Epoch   8 Batch  360/538 - Train Accuracy: 0.9166, Validation Accuracy: 0.9439, Loss: 0.0617
Epoch   8 Batch  370/538 - Train Accuracy: 0.9348, Validation Accuracy

Epoch   9 Batch  190/538 - Train Accuracy: 0.9273, Validation Accuracy: 0.9455, Loss: 0.0694
Epoch   9 Batch  200/538 - Train Accuracy: 0.9357, Validation Accuracy: 0.9501, Loss: 0.0475
Epoch   9 Batch  210/538 - Train Accuracy: 0.9321, Validation Accuracy: 0.9412, Loss: 0.0554
Epoch   9 Batch  220/538 - Train Accuracy: 0.9273, Validation Accuracy: 0.9334, Loss: 0.0534
Epoch   9 Batch  230/538 - Train Accuracy: 0.9293, Validation Accuracy: 0.9450, Loss: 0.0505
Epoch   9 Batch  240/538 - Train Accuracy: 0.9420, Validation Accuracy: 0.9483, Loss: 0.0551
Epoch   9 Batch  250/538 - Train Accuracy: 0.9508, Validation Accuracy: 0.9513, Loss: 0.0484
Epoch   9 Batch  260/538 - Train Accuracy: 0.9068, Validation Accuracy: 0.9506, Loss: 0.0592
Epoch   9 Batch  270/538 - Train Accuracy: 0.9379, Validation Accuracy: 0.9425, Loss: 0.0491
Epoch   9 Batch  280/538 - Train Accuracy: 0.9407, Validation Accuracy: 0.9508, Loss: 0.0474
Epoch   9 Batch  290/538 - Train Accuracy: 0.9506, Validation Accuracy
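
The accuracy figures in the log above come from `get_accuracy`, which pads the shorter of the two id matrices with zeros and then takes the mean of the element-wise matches. The following plain-Python sketch reproduces that idea on toy data. `pad_rows`, `token_accuracy` and the toy ids are hypothetical, and it assumes every row in a matrix has the same length. Note that positions where both sides hold the pad value count as correct, so the metric can look better than the translation quality really is.

```python
def pad_rows(rows, width, pad_value=0):
    """Right-pad each row with pad_value up to the given width."""
    return [row + [pad_value] * (width - len(row)) for row in rows]


def token_accuracy(target, predicted):
    """Fraction of positions where target and predicted ids agree."""
    width = max(len(target[0]), len(predicted[0]))
    target = pad_rows(target, width)
    predicted = pad_rows(predicted, width)
    matches = [t == p
               for t_row, p_row in zip(target, predicted)
               for t, p in zip(t_row, p_row)]
    return sum(matches) / len(matches)


toy_target = [[4, 7, 1, 0], [5, 1, 0, 0]]
toy_predicted = [[4, 7, 1], [5, 9, 1]]

accuracy = token_accuracy(toy_target, toy_predicted)
print(accuracy)  # 0.75
```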

### Save Parameters
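Save the `save_path` parameter for inference. Only `save_path` is passed to `helper.save_params`, so `batch_size` must still be defined in the session when you run the translation cell.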

In [0]:
# Save parameters for checkpoint
helper.save_params(save_path)

# Checkpoint

In [0]:
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

## Sentence to Sequence
To feed a sentence into the model for translation, you first need to preprocess it.  Implement the function `sentence_to_seq()` to preprocess new sentences.

- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
 - Convert words that are not in the vocabulary to the `<UNK>` word id.

In [90]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """
    # Map each lowercase token to its id; unknown words fall back to the <UNK> id.
    # split() with no argument also tolerates repeated spaces.
    return [vocab_to_int.get(word, vocab_to_int["<UNK>"]) for word in sentence.lower().split()]

tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed
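
To see the `<UNK>` fallback in action, the sketch below runs the same lookup on a small hand-made vocabulary. `toy_vocab_to_int` and `to_ids` are hypothetical names used only for this illustration; the real `source_vocab_to_int` is loaded from the preprocessed data.

```python
# Hypothetical vocabulary, for illustration only
toy_vocab_to_int = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2,
                    'he': 3, 'saw': 4, 'a': 5, 'truck': 6, '.': 7}


def to_ids(sentence, vocab_to_int):
    """Lowercase the sentence, split on whitespace, and map words to ids."""
    unk_id = vocab_to_int['<UNK>']
    return [vocab_to_int.get(word, unk_id) for word in sentence.lower().split()]


ids = to_ids('He saw a purple truck .', toy_vocab_to_int)
print(ids)  # [3, 4, 5, 2, 6, 7]
```

The word `purple` is not in the toy vocabulary, so it maps to the `<UNK>` id 2.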


## Translate
This will translate `translate_sentence` from English to French.

In [91]:
translate_sentence = 'he saw a old yellow truck .'


translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,
                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,
                                         source_sequence_length: [len(translate_sentence)]*batch_size,
                                         keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in translate_logits]))
print('  French Words: {}'.format(" ".join([target_int_to_vocab[i] for i in translate_logits])))


INFO:tensorflow:Restoring parameters from checkpoints/dev
Input
  Word Ids:      [42, 164, 144, 200, 67, 124, 188]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [4, 208, 149, 235, 94, 113, 139, 326, 79, 1]
  French Words: il pense se la france en octobre dernier . <EOS>


## Imperfect Translation
You might notice that some sentences translate better than others. The dataset used here has a vocabulary of only 227 English words, out of the thousands in everyday use, so you will only see good results for sentences built from those words.
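
One simple check before translating is to list which words of a sentence fall outside the vocabulary, since those will all collapse to `<UNK>`. The sketch below assumes a hypothetical `toy_vocab_to_int`; with the real model you would pass `source_vocab_to_int` instead.

```python
def unknown_words(sentence, vocab_to_int):
    """Return the lowercase words of the sentence that are missing from the vocabulary."""
    return [word for word in sentence.lower().split() if word not in vocab_to_int]


# Hypothetical vocabulary, for illustration only
toy_vocab_to_int = {'<UNK>': 2, 'he': 3, 'saw': 4, 'a': 5,
                    'old': 6, 'yellow': 7, 'truck': 8, '.': 9}

missing = unknown_words('he saw a rusty yellow bicycle .', toy_vocab_to_int)
print(missing)  # ['rusty', 'bicycle']
```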

A more complete dataset: [WMT10 French-English corpus](http://www.statmt.org/wmt10/training-giga-fren.tar).
