# Recurrent Neural Network
### Family of Neural Networks for Processing Sequential Data

In [1]:
# # upload .py files and import them as libraries
# from google.colab import files
# src = list(files.upload().values())[0]
# open('helper.py','wb').write(src)
# import helper
# import problem_unittests as tests

In [2]:
# from google.colab import files

# uploaded = files.upload()

# for fn in uploaded.keys():
#   print('User uploaded file "{name}" with length {length} bytes'.format(
#       name=fn, length=len(uploaded[fn])))

In [3]:
import helper
import problem_unittests as tests
import tensorflow as tf
from tensorflow.python.layers.core import Dense
source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

A limitation of vanilla neural networks is that their API is too constrained: **they accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes)**. 

Moreover, these models perform this mapping (from input to output) using a fixed number of computational steps (e.g. the number of layers in the model). 

Recurrent nets allow us to operate over sequences of vectors: sequences in the input, in the output, or, in the most general case, in both.

<img src="images/rnn_archtectures.jpeg"/>

- Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). 

- Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state. 

(1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification). 

(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). 

(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). 

(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French). 

(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). 

Notice that there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like.

**An RNN output is influenced not only by the input you just fed in, but also by the entire history of inputs you’ve fed in in the past.**

Credits: https://karpathy.github.io/2015/05/21/rnn-effectiveness/
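
To make the recurrence concrete, here is a minimal sketch of a vanilla RNN in plain Python. It is only an illustration: the names `rnn_step` and `run_rnn`, the random weights, and the sizes are assumptions, not part of this notebook's model. The same step function is applied once per input vector, so the sequence can have any length, and the final state depends on every input seen so far.

```python
import math
import random

def rnn_step(x, h, w_xh, w_hh):
    """One recurrent update: new_h[j] = tanh(x . w_xh[:, j] + h . w_hh[:, j])."""
    return [
        math.tanh(
            sum(x[i] * w_xh[i][j] for i in range(len(x)))
            + sum(h[k] * w_hh[k][j] for k in range(len(h)))
        )
        for j in range(len(h))
    ]

def run_rnn(inputs, hidden_size=4, seed=0):
    """Apply the same step to a sequence of any length and return the final state."""
    rng = random.Random(seed)
    input_size = len(inputs[0])
    w_xh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(input_size)]
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    h = [0.0] * hidden_size          # initial state
    for x in inputs:                 # one step per element of the sequence
        h = rnn_step(x, h, w_xh, w_hh)
    return h

# Sequences of different lengths go through the same function.
short = run_rnn([[1.0, 0.0], [0.0, 1.0]])
long_a = run_rnn([[1.0, 0.0]] + [[0.0, 1.0]] * 2)
# Same sequence except for the very first input.
long_b = run_rnn([[0.0, 0.0]] + [[0.0, 1.0]] * 2)

print(len(short), len(long_a))  # state size is independent of sequence length
print(long_a != long_b)         # the first input still influences the final state
```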

### RNNs

RNN models may differ in terms of: 
- (a) directionality – unidirectional or bidirectional; 
- (b) depth – single or multi-layer;
- (c) type – often either a vanilla **RNN**, a **Long Short-term Memory (LSTM)**, or a **gated recurrent unit (GRU)**

## Applications

- Machine translation
- Speech recognition
- Text summarization
- Image captioning 
- Video analytics

### Model overview
Training RNNs is very expensive. The main reason is Backpropagation Through Time (BPTT): computing gradients for a sequential model cannot be done EFFICIENTLY in parallel, because **the gradient at each time step depends on the gradients of the later time steps**.
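
The dependency between gradients can be seen in a small sketch. It assumes a scalar RNN `h_t = tanh(w * h_{t-1} + x_t)` with the final state as the loss; this toy setup and the function names are illustrative assumptions, not the model trained in this notebook. The backward loop has to visit the time steps one after another, because each step needs the gradient handed back by the step after it.

```python
import math

def forward(w, xs):
    """Scalar RNN: h_t = tanh(w * h_{t-1} + x_t), with h_0 = 0. Returns all states."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs

def bptt_grad(w, xs):
    """Gradient of the final state h_T with respect to w, computed backwards in time."""
    hs = forward(w, xs)
    dh = 1.0   # d(loss)/d(h_T), with loss = h_T
    dw = 0.0
    for t in range(len(xs), 0, -1):
        dpre = dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        dw += dpre * hs[t - 1]          # contribution of step t to d(loss)/dw
        dh = dpre * w                   # gradient passed on to step t-1
    return dw

xs = [0.5, -0.3, 0.8, 0.1]
w = 0.7
analytic = bptt_grad(w, xs)

# Sanity check against a central finite difference.
eps = 1e-6
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)
```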

<img src="images/seq2seq.jpg"/>

# Language Translation
A Sequence to Sequence model trained on a dataset of English and French sentences that can translate new sentences from English to French.
## Get the Data
The dataset is limited:
- 227 unique English words (faster training)

<img src="images/data_sample.png"/>

## Explore the Data
View different parts of the data.

In [4]:
view_sentence_range = (0, 10)

import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne 

## Implement Preprocessing Function
### Text to Word Ids
We must turn the text into numbers so the computer can understand it. The function `text_to_ids()` turns `source_text` and `target_text` from words into ids. We also need to add the `<EOS>` word id at the end of each sentence in `target_text`; this helps the neural network predict when the sentence should end.

## Example

Let's suppose we want to train our model with the following dataset.

- How are you?
- I am fine
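
As a rough sketch of what the conversion does on this toy pair, assume the text is lower-cased and tokenized on spaces, and assume the hypothetical vocabularies below (the real ones are built by `helper.py` and will have different ids). The function `toy_text_to_ids` is an illustrative stand-in that follows the same logic as `text_to_ids()`.

```python
# Hypothetical toy vocabularies; the ids are made up for this example.
source_vocab_to_int = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
                       'how': 4, 'are': 5, 'you': 6, '?': 7}
target_vocab_to_int = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
                       'i': 4, 'am': 5, 'fine': 6}

def toy_text_to_ids(source_text, target_text, source_vocab, target_vocab):
    """Turn each line into a list of word ids; target lines also get <EOS> appended."""
    source_ids = [[source_vocab[w] for w in line.split()]
                  for line in source_text.split('\n')]
    target_ids = [[target_vocab[w] for w in line.split()] + [target_vocab['<EOS>']]
                  for line in target_text.split('\n')]
    return source_ids, target_ids

source_ids, target_ids = toy_text_to_ids('how are you ?', 'i am fine',
                                         source_vocab_to_int, target_vocab_to_int)
print(source_ids)  # [[4, 5, 6, 7]]
print(target_ids)  # [[4, 5, 6, 1]]
```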


In [5]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    # Map every word of each source line to its id
    source_id_text = [[source_vocab_to_int[word] for word in line.split()]
                      for line in source_text.split('\n')]
    # Do the same for the target lines, appending <EOS> to each sentence
    target_id_text = [[target_vocab_to_int[word] for word in line.split()] + [target_vocab_to_int['<EOS>']]
                      for line in target_text.split('\n')]
    
    return source_id_text, target_id_text

tests.test_text_to_ids(text_to_ids)

Tests Passed


In [6]:
# save the dataset after processing
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)

In [7]:
import numpy as np
import helper
import problem_unittests as tests

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()

## Build the Neural Network

<img src="images/encdec.jpg"/>

**Encoder-decoder architecture** – example of a general approach for Neural Machine Translation. 

An encoder converts a source sentence into a "meaning" vector which is passed through a decoder to produce a translation.

### Input

In [8]:
def model_inputs():
    """
    TF Placeholders for input, targets, learning rate, and lengths of source and target sequences.
    :return: Tuple (input, targets, learning rate, keep probability, target sequence length,
    max target sequence length, source sequence length)
    """
    input_ = tf.placeholder(dtype=tf.int32, shape=[None, None], name="input")
    targets = tf.placeholder(dtype=tf.int32, shape=[None, None], name="target_text_placeholder")
    learning_rate = tf.placeholder(dtype=tf.float32, name="learning_rate_placeholder")
    keep_prob = tf.placeholder(dtype=tf.float32, name="keep_prob")
    target_sequence_length = tf.placeholder(dtype=tf.int32, shape=[None], name="target_sequence_length")
    max_target_len = tf.reduce_max(target_sequence_length)
    source_sequence_length = tf.placeholder(dtype=tf.int32, shape=[None], name="source_sequence_length")

    return input_, targets, learning_rate, keep_prob, target_sequence_length, max_target_len, source_sequence_length

### Process Decoder Input
Implement `process_decoder_input` by removing the last word id from each sequence in the `target_data` batch and concatenating the `<GO>` id to the beginning of each sequence.

Let's suppose we want to train a Machine Translation model on a dataset with the given question/answer sequences.

<img src="images/example_sentences.png"/>

Before training, we need to convert the text into numbers, so the RNN can understand it.
- Create a vocabulary with the unique words

<img src="images/vocab_example.png"/>

* **`<PAD>`**: During training, we'll need to feed our examples to the network in batches. The inputs in these batches all need to be the same width for the network to do its calculation. Our examples, however, are not of the same length. That's why we'll need to pad shorter inputs to bring them to the same width as the rest of the batch.

* **`<EOS>`**: This is another necessity of batching as well, but more on the decoder side. It allows us to tell the decoder where a sentence ends, and it allows the decoder to indicate the same thing in its outputs as well.

* **`<UNK>`**: If you're training your model on real data, you'll find you can vastly improve the resource efficiency of your model by ignoring words that don't show up often enough in your vocabulary to warrant consideration. We replace those with `<UNK>`.

* **`<GO>`**: This is the input to the first time step of the decoder to let the decoder know when to start generating output.

Lastly, because we train our model using batches, the sentences (within a batch) need to have the **same size**.

<img src="images/example_encoding.png"/>
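
The role of `<UNK>` and `<PAD>` can be sketched in a few lines. The vocabulary, its ids, and the helper names `encode` and `pad_batch` are assumptions made for this illustration: unknown words fall back to `<UNK>`, and shorter sentences are right-padded with `<PAD>` until every row of the batch has the same width.

```python
# Hypothetical shared vocabulary for the toy example.
vocab = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
         'how': 4, 'are': 5, 'you': 6, 'i': 7, 'am': 8, 'fine': 9}

def encode(sentence, vocab):
    """Map words to ids, falling back to <UNK> for out-of-vocabulary words."""
    return [vocab.get(word, vocab['<UNK>']) for word in sentence.lower().split()]

def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one in the batch."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]

# 'today' is not in the vocabulary, so it becomes <UNK> (id 2).
batch = [encode('how are you today', vocab), encode('i am fine', vocab)]
padded = pad_batch(batch, vocab['<PAD>'])
print(padded)  # [[4, 5, 6, 2], [7, 8, 9, 0]]
```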

Now, we just need to represent each of the words as an embedding vector.

<img src="images/embedding_vector_representation.png"/>
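
Conceptually, an embedding is a lookup table with one row per word id. The sketch below uses plain Python lists and made-up sizes; `make_embedding_table` and `embed` are hypothetical names, and the uniform initialization in [-1, 1] is chosen to mirror the decoder embedding used later in this notebook. In the real model the table is a trainable variable.

```python
import random

def make_embedding_table(vocab_size, embed_dim, seed=0):
    """One randomly initialized vector of size embed_dim per word id."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)]

def embed(ids, table):
    """Replace every word id by its row in the table."""
    return [table[i] for i in ids]

table = make_embedding_table(vocab_size=10, embed_dim=3)
vectors = embed([4, 5, 6, 0], table)
print(len(vectors), len(vectors[0]))  # 4 3
```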


In [9]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for the decoder
    :param target_data: Target Placeholder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    # remove last word id from each batch in target_data
    target_data = tf.strided_slice(target_data, begin=[0,0], end=[batch_size,-1], strides=[1,1])
    
    # create new tensor with the GO ID
    go_ids = tf.fill([batch_size, 1], target_vocab_to_int['<GO>'])
    
    # add the GO ID to the beginning of each batch
    target_data = tf.concat([go_ids, target_data], 1)
    return target_data
    
tests.test_process_encoding_input(process_decoder_input)

Tests Passed
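
The effect of this step is easier to see on plain lists. The sketch below is a list-based analogue, not the TensorFlow implementation, and the ids for `<GO>`, `<EOS>` and `<PAD>` are hypothetical. Dropping the last id and prepending `<GO>` shifts the targets one position to the right, so the decoder input at each time step is the target word of the previous step.

```python
def toy_process_decoder_input(target_batch, go_id):
    """Drop the last id of every row and prepend the <GO> id."""
    return [[go_id] + row[:-1] for row in target_batch]

GO, EOS, PAD = 3, 1, 0  # hypothetical ids

targets = [[4, 5, 6, EOS],
           [7, 8, EOS, PAD]]
decoder_inputs = toy_process_decoder_input(targets, GO)
print(decoder_inputs)  # [[3, 4, 5, 6], [3, 7, 8, 1]]
```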


### Encoding

- The Encoder RNN consumes the entire input sequence and does **NOT** make any prediction. 

- Instead, it outputs a vector that encapsulates the **meaning** of the input sequence (the state).

<img src="images/encoder.png"/>

### Embeddings




In [10]:
from importlib import reload  # the imp module is deprecated in favour of importlib
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :param source_sequence_length: a list of the lengths of each sequence in the batch
    :param source_vocab_size: vocabulary size of source data
    :param encoding_embedding_size: embedding size of source data
    :return: tuple (RNN output, RNN state)
    """
    # Maps a sequence of symbols to a sequence of embeddings [batch_size, doc_length, embed_dim].
    # Embed the encoder input using tf.contrib.layers.embed_sequence
    embed_sequence = tf.contrib.layers.embed_sequence(ids=rnn_inputs,
                                     vocab_size=source_vocab_size,
                                     embed_dim=encoding_embedding_size) 
    
    def lstm_cell(rnn_size, keep_prob):
        cell = tf.contrib.rnn.LSTMCell(num_units=rnn_size) # Maybe we need to use a basic LSTM cell
        return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
    
    # Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper
    enc_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])
    
    # 'enc_output' is a tensor of shape [batch_size, max_time, rnn_size]
    # 'state' is a N-tuple where N is the number of LSTMCells containing a
    # tf.contrib.rnn.LSTMStateTuple for each cell
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, embed_sequence, sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

tests.test_encoding_layer(encoding_layer)

Tests Passed


### Decoding - Training
Create the training decoder layer. 

- The training Decoder waits for the `<GO>` signal to start decoding.
- At each time step, it outputs a word prediction based on the state it received.
- However, during training, at each time step, we do **NOT** pass the predictions as inputs to the next cells - **We pass the actual target words instead**.

<img src="images/decoder_train.png"/>

In [11]:

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    # A helper for use during training. Only reads inputs. 
    # Returned sample_ids are the argmax of the RNN output logits.
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input, 
                                                        sequence_length=target_sequence_length)
    
    # cell = tf.contrib.rnn.DropoutWrapper(dec_cell, input_keep_prob=keep_prob)
    
    basic_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell, # An RNNCell instance.
                                                    helper=training_helper, # A Helper instance.
                                                    initial_state=encoder_state, # The initial state of the RNNCell.
                                                    output_layer=output_layer) # Optional layer to apply to the RNN output prior to storing the result or sampling. 
    
    # (final_outputs, final_state, final_sequence_lengths).
    dec_train_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(basic_decoder, 
                                                               maximum_iterations=max_summary_length)
    return dec_train_logits


tests.test_decoding_layer_train(decoding_layer_train)

Tests Passed


### Decoding - Inference

The Decoder for inference time differs in its architecture.

- For evaluation, we do not have the targets. 
- We pass the actual **predictions** as input at the next time steps. 
    - We assume they are good enough at this point.

<img src="images/decoder_inference.png"/>
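
The feedback loop used at inference time can be sketched without TensorFlow. Everything here is an assumption for illustration: `toy_step` is a hypothetical stand-in for the decoder cell plus output layer, and the token ids are made up. The point is only the control flow: start from `<GO>`, take the highest-scoring word, feed it back in as the next input, and stop at `<EOS>` or after a maximum number of steps.

```python
def greedy_decode(step_fn, state, go_id, eos_id, max_len):
    """Feed each predicted id back in as the next input until <EOS> or max_len."""
    token, output = go_id, []
    for _ in range(max_len):
        scores, state = step_fn(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax over the vocabulary
        output.append(token)
        if token == eos_id:
            break
    return output

def toy_step(token, state):
    """Hypothetical decoder step: always favours token + 1, wrapping around to <EOS> (id 1)."""
    vocab_size = 6
    next_token = token + 1 if token + 1 < vocab_size else 1
    scores = [0.0] * vocab_size
    scores[next_token] = 1.0
    return scores, state

GO, EOS = 3, 1  # hypothetical ids
print(greedy_decode(toy_step, None, GO, EOS, max_len=10))  # [4, 5, 1]
print(greedy_decode(toy_step, None, GO, EOS, max_len=2))   # [4, 5]
```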

In [12]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param max_target_sequence_length: Maximum length of target sequences
    :param vocab_size: Size of decoder/target vocabulary
    :param output_layer: Function to apply the output layer
    :param batch_size: Batch size
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing inference logits and sample_id
    """
    start_tokens = tf.tile(tf.constant([start_of_sequence_id], dtype=tf.int32), [batch_size], name='start_tokens')

    # A helper for use during inference.
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=dec_embeddings, # A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup. The returned tensor will be passed to the decoder input.
        start_tokens=start_tokens, # int32 vector shaped [batch_size], the start tokens.
        end_token=end_of_sequence_id) # int32 scalar, the token that marks end of decoding.
    
    # cell = tf.contrib.rnn.DropoutWrapper(dec_cell, input_keep_prob=keep_prob)

    basic_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell, # An RNNCell instance.
                                    helper=inference_helper, # A Helper instance.
                                    initial_state=encoder_state, #  The initial state of the RNNCell.
                                    output_layer=output_layer) # Optional layer to apply to the RNN output prior to storing the result or sampling.)
    
    dec_infer_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(basic_decoder, 
                                                               maximum_iterations=max_target_sequence_length)
    return dec_infer_logits


tests.test_decoding_layer_infer(decoding_layer_infer)

Tests Passed


### Build the Decoding Layer
Implement `decoding_layer()` to create a Decoder RNN layer.

* Embed the target sequences
* Construct the decoder LSTM cell (just like you constructed the encoder cell above)
* Create an output layer to map the outputs of the decoder to the elements of our vocabulary
* Use your `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_target_sequence_length, output_layer, keep_prob)` function to get the training logits.
* Use your `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob)` function to get the inference logits.

Note: You'll need to use [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [13]:
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # embed the target sequences
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size], -1.0, 1.0))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    def lstm_cell(rnn_size, keep_prob):
        cell = tf.contrib.rnn.LSTMCell(num_units=rnn_size, use_peepholes=False)
        return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)

    dec_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])
    
    # 3. Dense layer to translate the decoder's output at each time 
    # step into a choice from the target vocabulary
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    # We construct two decoders that share parameters:
    # one for training, the other for inference
    with tf.variable_scope('decoder', reuse=False):
        train_logits = decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, 
                             max_target_sequence_length, output_layer, keep_prob)
        
    with tf.variable_scope('decoder', reuse=True):
        val_logits = decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, target_vocab_to_int['<GO>'], 
                             target_vocab_to_int['<EOS>'], max_target_sequence_length, target_vocab_size, 
                             output_layer, batch_size, keep_prob)
    

    return train_logits, val_logits

tests.test_decoding_layer(decoding_layer)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:

- Encode the input using your `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size)`.
- Process target data using your `process_decoder_input(target_data, target_vocab_to_int, batch_size)` function.
- Decode the encoded input using your `decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size)` function.

#### 4- Training decoder
Essentially, we'll be creating two decoders which share their parameters: one for training and one for inference. They differ in that we feed the target sequences as inputs to the training decoder at each time step to make it more robust.

We can think of the training decoder as looking like this (except that it works with sequences in batches):
<img src="images/sequence-to-sequence-training-decoder.png"/>

#### 5- Inference decoder
The inference decoder is the one we'll use when we deploy our model to the wild.

<img src="images/sequence-to-sequence-inference-decoder.png"/>

In [14]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
    """
    Build the Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param source_sequence_length: Sequence Lengths of source sequences in the batch
    :param target_sequence_length: Sequence Lengths of target sequences in the batch
    :param max_target_sentence_length: Length of the longest target sequence in the batch
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Encoder embedding size
    :param dec_embedding_size: Decoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    enc_output, enc_state = encoding_layer(input_data, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   enc_embedding_size)
    
    # Remove the last word id from each sequence in target_data and concat the GO ID to the beginning of each sequence.
    dec_input = process_decoder_input(target_data, target_vocab_to_int, batch_size)
    
    train_logits, val_logits = decoding_layer(dec_input, enc_state,
                   target_sequence_length, max_target_sentence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, dec_embedding_size)
    
    return train_logits, val_logits


tests.test_seq2seq_model(seq2seq_model)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `num_layers` to the number of layers.
- Set `encoding_embedding_size` to the size of the embedding for the encoder.
- Set `decoding_embedding_size` to the size of the embedding for the decoder.
- Set `learning_rate` to the learning rate.
- Set `keep_probability` to the Dropout keep probability
- Set `display_step` to state how many steps between each debug output statement

In [15]:
# Number of Epochs
epochs = 10
# Batch Size
batch_size = 512
# RNN Size
rnn_size = 256
# Number of Layers
num_layers = 3
# Embedding Size
encoding_embedding_size = 256
decoding_embedding_size = 256
# Learning Rate
learning_rate = .001
# Dropout Keep Probability
keep_probability = 0.5
display_step = 50

### Build the Graph

In [16]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    #sequence_length = tf.placeholder_with_default(max_target_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)

    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
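
The masking and clipping ideas used in this cell can be illustrated on plain lists. This is a sketch under assumptions: the helper names are hypothetical, the loss values are made up, and the averaging over unmasked tokens reflects our reading of the default behaviour of `tf.contrib.seq2seq.sequence_loss`. The mask zeroes out padded positions so they do not contribute to the loss, and clipping bounds each gradient value to [-1, 1].

```python
def sequence_mask(lengths, max_len):
    """1.0 for real positions, 0.0 for padding (list-based analogue of tf.sequence_mask)."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]

def masked_mean(token_losses, mask):
    """Average the per-token losses over unmasked positions only."""
    total = sum(l * m
                for row_l, row_m in zip(token_losses, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count

def clip_by_value(values, low, high):
    """Clamp every value into the interval [low, high]."""
    return [min(max(v, low), high) for v in values]

mask = sequence_mask([3, 2], max_len=3)
losses = [[0.5, 1.0, 1.5],
          [2.0, 1.0, 9.0]]  # the 9.0 sits on a padded position and is ignored
print(mask)                                        # [[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
print(masked_mean(losses, mask))                   # 1.2
print(clip_by_value([-3.0, 0.2, 7.5], -1.0, 1.0))  # [-1.0, 0.2, 1.0]
```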


Batch and pad the source and target sequences

In [17]:
def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # Pad
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # Lengths fed to the *_sequence_length placeholders (measured after padding)
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths
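
Note that `get_batches` measures the lengths on the already padded arrays, so every length in a batch equals the padded width. If the unpadded lengths are wanted instead (for example, so that a sequence mask ignores `<PAD>` positions), one possible variant, which is not what this notebook does, is sketched below. The function name and the ids are hypothetical.

```python
def batch_with_true_lengths(batch, pad_id):
    """Pad a batch and also return each sequence's length before padding."""
    lengths = [len(seq) for seq in batch]
    width = max(lengths)
    padded = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    return padded, lengths

padded, lengths = batch_with_true_lengths([[4, 5, 6, 1], [7, 8, 1]], pad_id=0)
print(padded)   # [[4, 5, 6, 1], [7, 8, 1, 0]]
print(lengths)  # [4, 3]
```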

In [18]:
def word_id_to_text_sentence(sentence_ids, source_int_to_vocab):
    output = []
    for word in sentence_ids:
        output.append(source_int_to_vocab[word])
    return ' '.join(output)
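
The reverse mapping from ids to words is just the inverted vocabulary dictionary. The sketch below uses a hypothetical toy vocabulary and a stand-in function `ids_to_text` with the same logic as `word_id_to_text_sentence`.

```python
# Hypothetical toy vocabulary; the real one comes from helper.load_preprocess().
vocab_to_int = {'<PAD>': 0, '<EOS>': 1, 'i': 4, 'am': 5, 'fine': 6}
int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}

def ids_to_text(sentence_ids, int_to_vocab):
    """Join the words that correspond to a list of ids."""
    return ' '.join(int_to_vocab[i] for i in sentence_ids)

print(ids_to_text([4, 5, 6, 1, 0], int_to_vocab))  # i am fine <EOS> <PAD>
```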

### Train
Train the neural network on the preprocessed data. If you have a hard time getting a good loss, check the forums to see if anyone is having the same problem.

In [19]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# Split data to training and validation sets
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch, valid_sources_lengths, valid_targets_lengths ) = next(get_batches(valid_source,
                                                                                                             valid_target,
                                                                                                             batch_size,
                                                                                                             source_vocab_to_int['<PAD>'],
                                                                                                             target_vocab_to_int['<PAD>']))                                                                                                  
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):

            #print(word_id_to_text_sentence(source_batch[0], source_int_to_vocab))
            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)

                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)

                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

Epoch   0 Batch   50/269 - Train Accuracy: 0.3661, Validation Accuracy: 0.4284, Loss: 2.2193
Epoch   0 Batch  100/269 - Train Accuracy: 0.4591, Validation Accuracy: 0.4753, Loss: 1.4421
Epoch   0 Batch  150/269 - Train Accuracy: 0.4860, Validation Accuracy: 0.5123, Loss: 1.2168
Epoch   0 Batch  200/269 - Train Accuracy: 0.5024, Validation Accuracy: 0.5366, Loss: 1.0249
Epoch   0 Batch  250/269 - Train Accuracy: 0.5214, Validation Accuracy: 0.5626, Loss: 0.8762
Epoch   1 Batch   50/269 - Train Accuracy: 0.5448, Validation Accuracy: 0.5695, Loss: 0.7618
Epoch   1 Batch  100/269 - Train Accuracy: 0.5892, Validation Accuracy: 0.5872, Loss: 0.6550
Epoch   1 Batch  150/269 - Train Accuracy: 0.5938, Validation Accuracy: 0.6078, Loss: 0.6255
Epoch   1 Batch  200/269 - Train Accuracy: 0.6208, Validation Accuracy: 0.6407, Loss: 0.5990
Epoch   1 Batch  250/269 - Train Accuracy: 0.6310, Validation Accuracy: 0.6503, Loss: 0.5593
Epoch   2 Batch   50/269 - Train Accuracy: 0.6520, Validation Accuracy
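
The accuracy values in the log come from `get_accuracy`, which zero-pads the shorter of the target and prediction arrays along the time axis and then compares them element-wise. Below is a minimal standalone sketch of that idea on toy arrays. The name `padded_accuracy`, the `pad_id=0` default, and the sample ids are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def padded_accuracy(target, preds, pad_id=0):
    """Pad both 2-D id arrays to the same width, then compare element-wise."""
    max_seq = max(target.shape[1], preds.shape[1])
    # A pad width of 0 leaves the array unchanged, so no shape check is needed.
    target = np.pad(target, [(0, 0), (0, max_seq - target.shape[1])],
                    'constant', constant_values=pad_id)
    preds = np.pad(preds, [(0, 0), (0, max_seq - preds.shape[1])],
                   'constant', constant_values=pad_id)
    return np.mean(np.equal(target, preds))

# Toy batch: two target sequences of length 4, predictions of length 3.
target = np.array([[5, 6, 7, 1],
                   [8, 9, 1, 0]])
preds = np.array([[5, 6, 1],
                  [8, 9, 1]])

acc = padded_accuracy(target, preds)
print(acc)  # 6 of the 8 positions match
```

Padded positions are compared as well, so matching padding counts as correct in this sketch. That can make the score look higher than a per-word accuracy would on batches with many short sentences.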

### Save Parameters
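Save the `save_path` parameter for inference. The call below passes only `save_path`, so `batch_size` is not stored and must still be defined in the session when the translation cell runs.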

In [20]:
# Save parameters for checkpoint
helper.save_params(save_path)
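
The source of `helper.save_params` is not shown in this notebook. As a rough sketch of how a helper like it might persist values between cells, assuming a pickle file is used (an assumption, not the helper's confirmed implementation), the following stores both the checkpoint path and a batch size. The names `save_params_sketch`, `load_params_sketch`, `params_sketch.p`, and the batch size of 128 are hypothetical.

```python
import os
import pickle
import tempfile

def save_params_sketch(params, path):
    """Serialize a small dict of parameters to disk (hypothetical helper)."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)

def load_params_sketch(path):
    """Read back the dict written by save_params_sketch (hypothetical helper)."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Write into a temporary directory so the example leaves the project untouched.
params_file = os.path.join(tempfile.mkdtemp(), 'params_sketch.p')
save_params_sketch({'save_path': 'checkpoints/dev', 'batch_size': 128}, params_file)

restored = load_params_sketch(params_file)
print(restored['save_path'], restored['batch_size'])
```

Storing the batch size alongside the path would let the inference cells run after a kernel restart without re-executing the training cells.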

# Checkpoint

In [21]:
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

## Sentence to Sequence
To feed a sentence into the model for translation, you first need to preprocess it.  Implement the function `sentence_to_seq()` to preprocess new sentences.

- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
 - Convert words not in the vocabulary to the `<UNK>` word id.

In [22]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """
    # Lowercase, split on whitespace (this also tolerates repeated spaces),
    # and map words not in the vocabulary to the <UNK> word id.
    return [vocab_to_int.get(word, vocab_to_int["<UNK>"]) for word in sentence.lower().split()]

tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed
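
To see the preprocessing steps on a concrete input, here is a small standalone sketch that restates the same lookup under the name `to_ids` so it runs on its own. The toy vocabulary and its ids are made up for illustration and do not match the real `source_vocab_to_int`.

```python
def to_ids(sentence, vocab_to_int):
    """Lowercase, split on whitespace, and fall back to <UNK> for unknown words."""
    unk = vocab_to_int['<UNK>']
    return [vocab_to_int.get(word, unk) for word in sentence.lower().split()]

# Hypothetical five-entry vocabulary, for illustration only.
toy_vocab = {'<UNK>': 2, 'he': 10, 'saw': 11, 'truck': 12, '.': 13}

ids = to_ids('He saw a TRUCK .', toy_vocab)
print(ids)  # 'a' is not in the toy vocabulary, so it becomes the <UNK> id 2
```

Uppercase input is matched after lowercasing, and every unknown word collapses to the same id, so the model cannot tell different unknown words apart.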


## Translate
This will translate `translate_sentence` from English to French.

In [23]:
translate_sentence = 'he saw a old yellow truck .'


translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

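    # Repeat the sentence batch_size times to fill a batch, allow up to twice the
    # source length for the translation, and keep only the first row of the output.
    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,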
                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,
                                         source_sequence_length: [len(translate_sentence)]*batch_size,
                                         keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in translate_logits]))
print('  French Words: {}'.format(" ".join([target_int_to_vocab[i] for i in translate_logits])))


INFO:tensorflow:Restoring parameters from checkpoints/dev
Input
  Word Ids:      [159, 24, 93, 86, 161, 30, 140]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [181, 286, 282, 178, 130, 304, 1]
  French Words: il a vu jaune camion . <EOS>
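
The printed prediction still contains the `<EOS>` marker. A minimal sketch of a post-processing step that cuts the output at the first end or padding token is shown below. The function name `ids_to_words` and the toy mapping are illustrative; the French ids are taken from the output above, while `<PAD>` having id 0 is an assumption.

```python
def ids_to_words(ids, int_to_vocab, stop_tokens=('<EOS>', '<PAD>')):
    """Convert ids to words and stop at the first end-of-sequence or padding token."""
    words = []
    for i in ids:
        word = int_to_vocab[i]
        if word in stop_tokens:
            break
        words.append(word)
    return ' '.join(words)

# Toy mapping; only the entries needed for this example are included.
toy_int_to_vocab = {0: '<PAD>', 1: '<EOS>', 181: 'il', 286: 'a', 282: 'vu',
                    178: 'jaune', 130: 'camion', 304: '.'}

prediction = [181, 286, 282, 178, 130, 304, 1, 0, 0]
clean = ids_to_words(prediction, toy_int_to_vocab)
print(clean)
```

Stopping at the first stop token also discards anything the decoder emits after `<EOS>`, which is usually padding.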


## Imperfect Translation
You might notice that some sentences translate better than others. The dataset used here has a vocabulary of only 227 English words, a small fraction of the thousands in everyday use, so you will only see good results for sentences built from those words.
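
One way to anticipate a poor translation is to check, before running the model, which words of a sentence fall outside the vocabulary. The sketch below does that with a toy vocabulary. The function name `unknown_words` is hypothetical, the word ids are copied from the translation output above, and the `<UNK>` id of 2 is an assumption.

```python
def unknown_words(sentence, vocab_to_int):
    """Return the words of a sentence that are missing from the vocabulary."""
    return [w for w in sentence.lower().split() if w not in vocab_to_int]

# Toy vocabulary covering only the example sentence.
toy_vocab = {'<UNK>': 2, 'he': 159, 'saw': 24, 'a': 93, 'old': 86,
             'yellow': 161, 'truck': 30, '.': 140}

covered = unknown_words('he saw a old yellow truck .', toy_vocab)
missing = unknown_words('she bought a shiny truck .', toy_vocab)
print(covered)  # every word is in the toy vocabulary
print(missing)  # these words would all be mapped to <UNK>
```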

A more complete dataset: [WMT10 French-English corpus](http://www.statmt.org/wmt10/training-giga-fren.tar).
