# 语言翻译
　　使用seq2seq模型将句子从英语翻译成法语。
## Get the Data
　　因为将整个英语语言翻译成法语需要花费大量时间，这里只使用一小部分英语的语料库进行训练。

In [1]:
import helper
import problem_unittests as tests

source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

  from ._conv import register_converters as _register_converters


## Explore the Data
　　使用 view_sentence_range 来查看数据的不同部分。

In [2]:
view_sentence_range = (0, 10)

import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne 

## 实现预处理函数
### 将文本转为词id
　　就像做其它RNN任务一样，首先需要将文本转为数字。使用自定义函数  `text_to_ids()` 将 `source_text` 和 `target_text` 从单词转为数字ID。但是，还需要在 `target_text` 的末尾处添加 `<EOS>`  字符，这有助于神经网络预测句子何时结束。

In [3]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    '''
    将 source 和 target 文本转为适当的词ID
    Args:
        - source_text: 输入文本（英语）；
        - target_text: 目标文本（法语）；
        - source_vocab_to_int: 输入文本的映射字典，用于将输入文本的词转为ID；
        - target_vocab_to_int: 目标文本的映射字典，用于将目标文本的词转为ID；
    return:
        返回包含元组的列表，每组元组为  (source_id_text, target_id_text)
    '''

    source_sentences = [sentence for sentence in source_text.split('\n')]
    target_sentences = [sentence + ' <EOS>' for sentence in target_text.split('\n')]
    
    source_id_text = [[source_vocab_to_int[word] for word in sentence.split()] for sentence in source_sentences]
    target_id_text = [[target_vocab_to_int[word] for word in sentence.split()] for sentence in target_sentences]
    
    return source_id_text, target_id_text

tests.test_text_to_ids(text_to_ids)

Tests Passed


### 预处理所有数据，并保存
　　运行下面代码预处理所有数据，并保存为文件。

In [4]:
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)

# 检查点
　　这是第一个检查点，如果决定重启 notebook，可以从这里开始，预处理已经保存到磁盘。

In [5]:
import numpy as np
import helper
import problem_unittests as tests

# 每个单词一个id
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()

## 建立神经网络
　　seq2seq模型由以下几个部分组成:
- `model_inputs（模型输入）`
- `process_decoder_input`
- `encoding_layer`
- `decoding_layer_train`
- `decoding_layer_infer`
- `decoding_layer`
- `seq2seq_model`

### 模型输入
　　实现 `model_inputs()`  函数为神经网络创建占位符:
- 输入文本的占位符，为  'input' ，维度为2；
- 目标文本的占位符，维度为2；
- 学习速率占位符；
- dropout 保留概率的占位符，为 'keep_prob'；
- 目标序列长度的占位符，为 'target_sequence_length'；
- 目标序列最大长度的占位符，为 'max_target_len'，通过对 target_sequence_length 使用 tf.reduce_max 获得；
- 输入序列长度的占位符，为 'source_sequence_length'。

In [6]:
import tensorflow as tf

def model_inputs():
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, None, name='learning_rate')
    keep_prob = tf.placeholder(tf.float32, None, name='keep_prob')
    target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_sequence_length')
    source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
                                            
    return input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length


tests.test_model_inputs(model_inputs)

Tests Passed


### 处理解码器的输入
　　实现 `process_decoder_input` 来移除每个batch每个序列的最后一个词id，因为解码器没用到。以及在每个batch每个序列开头添加'GO'字符串标识对应的ID，表示从这里开始。

In [7]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    '''
    处理目标数据用于编码
    Args:
        - target_data: 目标数据的占位符;
        - target_vocab_to_int: 映射词典，将目标数据的词映射为ID;
        - batch_size: 每个 batch 的大小;
    return:
        返回预处理的目标数据
    '''
    # 对每个 batch 移除最后一个词，并在开头添加 <GO>
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    # tf.fill(dims, value)参数会生成一个dims形状并用value填充的tensor
    # tf.fill([2,2], 7) => [[7,7], [7,7]]。tf.concat()会按照某个维度将两个tensor拼接起来
    dec_input = tf.concat([tf.fill([batch_size, 1], target_vocab_to_int['<GO>']), ending], 1)

    return dec_input

tests.test_process_encoding_input(process_decoder_input)

Tests Passed


### 编码器
　　实现 `encoding_layer()` 来创建编码器RNN层:

 * 使用 [`tf.contrib.layers.embed_sequence`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence) 对编码输入进行 embe
 * 构建一个包含 [`tf.contrib.rnn.DropoutWrapper`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper) 的 [stacked](https://github.com/tensorflow/tensorflow/blob/6947f65a374ebf29e74bb71e36fd82760056d82c/tensorflow/docs_src/tutorials/recurrent.md#stacking-multiple-lstms) 的 [`tf.contrib.rnn.LSTMCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell) 
 * 将神经层单元 cell 和经过 embedding 的输入传入 [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)

In [8]:
from imp import reload
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):
    '''
    创建编码层（器）
    Args:
        - rnn_inputs: RNN 的输入
        - rnn_size: RNN 隐藏层的神经元数
        - num_layers: 堆叠的 rnn cell 数量
        - keep_prob: dropout 保留概率
        - source_sequence_length: 在 batch 中每个序列的长度
        - source_vocab_size: 输入数据的词汇表大小
        - encoding_embedding_size: 对输入数据 embedding 的大小
    return:
        返回编码器的输出和隐藏层的状态，tuple (RNN output, RNN state)
    '''
    # 对输入数据进行 embedding
    enc_embed_input = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)
    
    # 初始化 RNN
    def rnnCell(rnn_size):
        enc_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return enc_cell
    
    # 根据 num_layers 堆叠 rnn cell
    enc_cell = tf.contrib.rnn.MultiRNNCell([rnnCell(rnn_size) for _ in range(num_layers)])
    
    # 动态 RNN
    enc_output, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=enc_embed_input,
                                              sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

tests.test_encoding_layer(encoding_layer)

Tests Passed


### training decoder
* 创建 [`tf.contrib.seq2seq.TrainingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/TrainingHelper) 
* 创建 [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* 从 [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode) 获取解码器的输出

In [9]:
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    '''
    创建 training decoder
    Args:
        - encoder_state: 编码器状态
        - dec_cell: 解码器的 cell
        - dec_embed_input: 解码器的 embedded input
        - target_sequence_length: 目标数据每个 batch 的每个序列的长度
        - max_summary_length: batch 里最大序列长度
        - output_layer: 应用于输出层的函数
        - keep_prob: dropout 保留概率
    return:
        BasicDecoderOutput 包含 training logits 和 sample_id
    '''
    training_helper = tf.contrib.seq2seq.TrainingHelper(
                                                        # 嵌入矩阵
                                                        inputs=dec_embed_input,
                                                        # 当前batch中每个序列的长度
                                                        sequence_length=target_sequence_length,
                                                        # 为False，则输入为 [batch_size, sequence_length, embedding_size]
                                                        # 为True，则输入为 [sequence_length, batch_size, embedding_size
                                                        time_major=False)
    # Basic decoder
    # BasicDecoder的作用就是定义一个封装了decoder应该有的功能的实例，根据Helper实例的不同，这个decoder
    #可以实现不同的功能，比如在train的阶段，不把输出重新作为输入，而在inference阶段，将输出接到输入
    training_decoder = tf.contrib.seq2seq.BasicDecoder(
                                                        dec_cell,
                                                        training_helper,
                                                        encoder_state,
                                                        output_layer)

    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(
                                                        training_decoder,
                                                        impute_finished=True,
                                                        maximum_iterations=max_summary_length)[0]
    
    
    return training_decoder_output

tests.test_decoding_layer_train(decoding_layer_train)

Tests Passed


### inference decoder
* 创建 [`tf.contrib.seq2seq.GreedyEmbeddingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/GreedyEmbeddingHelper)
* 创建 [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* 从 [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode) 获取解码器的输出

In [10]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    '''
    创建 inference decoder
    Args:
        - encoder_state: 编码器的状态
        - dec_cell: 解码器的 cell
        - dec_embeddings: 解码器的 embedded input
        - start_of_sequence_id: 给序列开头添加标识符 '<GO>' 的映射ID
        - end_of_sequence_id: 给序列结尾添加标识 '<EOS>' 的映射ID
        - max_target_sequence_length: 目标序列的最大长度
        - vocab_size: 解码器或目标词汇表的大小
        - decoding_scope: 用于 decoding 的 TenorFlow Variable Scope
        - output_layer: 应用于输出层的函数
        - batch_size: batch 的大小
        - keep_prob: dropout 保留概率
    return:
        BasicDecoderOutput 包含 inference logits 和 sample_id
    '''
    # 创建一个常量 tensor 并复制为 batch_size 的大小
    start_tokens = tf.tile(tf.constant([start_of_sequence_id], dtype=tf.int32), [batch_size], name='start_tokens')
    # 这是用于inference阶段的helper，将output输出后的logits使用argmax获得id再经过embedding layer来获取下一时刻的输入。
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
                                                            dec_embeddings,
                                                            start_tokens,
                                                            end_of_sequence_id)

    inference_decoder = tf.contrib.seq2seq.BasicDecoder(
                                                            dec_cell,
                                                            inference_helper,
                                                            encoder_state,
                                                            output_layer)

    inference_decoder_output = tf.contrib.seq2seq.dynamic_decode(
                                                            inference_decoder,
                                                            impute_finished=True,
                                                            maximum_iterations=max_target_sequence_length)[0]

    return inference_decoder_output

tests.test_decoding_layer_infer(decoding_layer_infer)

Tests Passed


### 建立解码层
　　使用 `decoding_layer()` 创建解码 RNN 层
* 对目标序列进行 embedding
* 构建解码器的 LSTM cell
* 创建输出层，将解码器的输出映射到词汇表的某个词
* 使用 `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_target_sequence_length, output_layer, keep_prob)` 函数获得 training logits.
* 使用 `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob)` 函数获得 inference logits.

注意: 需要使用 [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) 来共享 training 和 inference 之间的变量（参数）

In [11]:
from tensorflow.python.layers.core import Dense

def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    '''
    创建解码层
    Args:
        - dec_input: 解码器的输入
        - encoder_state: 编码器的状态
        - target_sequence_length: 每个batch的每个序列长度
        - max_target_sequence_length: 目标序列的最大长度
        - rnn_size: rnn 神经元数
        - num_layers: 神经元层数
        - target_vocab_to_int: 映射字典，将目标序列的词映射为数字ID
        - target_vocab_size: 目标序列词汇表的大小
        - batch_size: batch的大小
        - keep_prob: dropout 保留概率
        - decoding_embedding_size: 解码的embedding大小
    return:
        返回 (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    '''
    # embedding
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
    
    # 构建 decoder cell
    def rnnCell(rnn_size):
        dec_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2018))
        return dec_cell
    
    # 使用 num_layers 构建多层 cell
    dec_cell = tf.contrib.rnn.MultiRNNCell([rnnCell(rnn_size) for _ in range(num_layers)])
    
    # 全连接层将每个时间步长的解码器的输出转为词汇表中的选择
    output_layer = Dense(target_vocab_size, kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
    
    # 设置 training decode 和 inference decoder
    # training decode
    with tf.variable_scope('decode'):
        training_logits = decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                                                                     target_sequence_length, max_target_sequence_length, 
                                                                     output_layer, keep_prob)
    # inference Decoder
    # 与 training decode 共享参数
    with tf.variable_scope('decode', reuse=True):
        inference_logits = decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, target_vocab_to_int['<GO>'],
                                                                        target_vocab_to_int['<EOS>'], max_target_sequence_length,
                                                                        target_vocab_size, output_layer, batch_size, keep_prob)
    
    
    return training_logits, inference_logits
        
tests.test_decoding_layer(decoding_layer)

Tests Passed


### 建立神经网络
- 使用 `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size)` 函数对输入进行 embedding；
- 使用 `process_decoder_input(target_data, target_vocab_to_int, batch_size)` 函数处理目标序列数据；
- 使用 `decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size)` 函数对编码的输入进行解码输出

In [12]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
    '''
    建立seq2seq模型
    Args:
        - input_data: 输入占位符
        - target_data: 目标序列的占位符
        - keep_prob: dropout保留概率，占位符
        - batch_size: batch的大小
        - source_sequence_length: 每个batch每个输入序列的长度
        - target_sequence_length: 每个batch每个目标序列的长度
        - max_target_sentence_length: 目标序列的最大长度
        - source_vocab_size: 输入序列词汇表的大小
        - target_vocab_size: 目标序列词汇表的大小
        - enc_embedding_size: 编码的embedding大小
        - dec_embedding_size: 解码的embedding大小
        - rnn_size: rnn 神经元数
        - num_layers: 神经元层数
        - target_vocab_to_int: 映射字典，将目标序列的词映射为数字ID
    return:
        返回 (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    '''
    # 将输入数据传到编码器，忽视编码器的输出，而使用其隐藏层状态输出
    _, enc_state = encoding_layer(input_data, rnn_size, num_layers, keep_prob, source_sequence_length, 
                                  source_vocab_size, enc_embedding_size)
    
    # 对目标序列进行预处理，并传给解码器
    dec_input = process_decoder_input(target_data, target_vocab_to_int, batch_size) 
    
    # 将编码器的隐藏层状态向量和解码器的输入（目标序列）传到解码器
    training_decoder_output, inference_decoder_output = decoding_layer(dec_input, enc_state, target_sequence_length, 
                                                                       max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, 
                                                                       target_vocab_size, batch_size, keep_prob, dec_embedding_size)
    
    return training_decoder_output, inference_decoder_output

tests.test_seq2seq_model(seq2seq_model)

Tests Passed


## seq2seq模型训练
### 超参数

In [18]:
# 迭代次数
epochs = 5
# batch 大小
batch_size = 128
# 神经元数
rnn_size = 64
# 神经元层数
num_layers = 2
# Embedding 大小
encoding_embedding_size = 30
decoding_embedding_size = 30
# 学习速率
learning_rate = 0.001
# dropout 保留概率
keep_probability = 0.5
display_step = 64

### 建立 Graph

In [14]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    #sequence_length = tf.placeholder_with_default(max_target_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)
    
    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)


对每个batch的输入序列和目标序列进行填充，使batch的每个序列长度保持一致

In [15]:
def pad_sentence_batch(sentence_batch, pad_int):
    '''使用<PAD>进行填充，是的batch的每个序列长度保持一致'''
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    '''生成输入序列和目标序列的批量数据'''
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # 根据索引来切分batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # 填充，保持序列长度一致
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # 保存序列长度
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths


### Train
　　对预处理好的数据进行神经网络训练

In [19]:
def get_accuracy(target, logits):
    """
    计算准确率
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# 切分训练集和验证集
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch, valid_sources_lengths, valid_targets_lengths ) = next(get_batches(valid_source,
                                                                                                             valid_target,
                                                                                                             batch_size,
                                                                                                             source_vocab_to_int['<PAD>'],
                                                                                                             target_vocab_to_int['<PAD>']))                                                                                                  
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):

            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)

                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)

                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # 保存模型
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

Epoch   0 Batch   64/1077 - Train Accuracy: 0.3063, Validation Accuracy: 0.3665, Loss: 3.2900
Epoch   0 Batch  128/1077 - Train Accuracy: 0.3575, Validation Accuracy: 0.3778, Loss: 2.8477
Epoch   0 Batch  192/1077 - Train Accuracy: 0.3719, Validation Accuracy: 0.4254, Loss: 2.8108
Epoch   0 Batch  256/1077 - Train Accuracy: 0.3918, Validation Accuracy: 0.4489, Loss: 2.6002
Epoch   0 Batch  320/1077 - Train Accuracy: 0.4242, Validation Accuracy: 0.4663, Loss: 2.4379
Epoch   0 Batch  384/1077 - Train Accuracy: 0.4555, Validation Accuracy: 0.4979, Loss: 2.2887
Epoch   0 Batch  448/1077 - Train Accuracy: 0.4621, Validation Accuracy: 0.4886, Loss: 2.1422
Epoch   0 Batch  512/1077 - Train Accuracy: 0.4449, Validation Accuracy: 0.5028, Loss: 2.0589
Epoch   0 Batch  576/1077 - Train Accuracy: 0.4276, Validation Accuracy: 0.4762, Loss: 2.0188
Epoch   0 Batch  640/1077 - Train Accuracy: 0.4449, Validation Accuracy: 0.4851, Loss: 1.6839
Epoch   0 Batch  704/1077 - Train Accuracy: 0.4020, Validati

### Save Parameters
　　保存 `batch_size` 和 `save_path` 的 inference 参数

In [20]:
# 保存检查点的参数
helper.save_params(save_path)

# Checkpoint

In [21]:
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

## seq2seq
　　要将一个句子放入模型进行翻译，需要先对其预处理。使用 `sentence_to_seq()`  函数来处理新句子

- 将句子转换为小写
- 使用 `vocab_to_int` 将词映射为ID
 - 在映射时，如果某个词不在词汇表里，则使用 `<UNK>` 的词id代替

In [22]:
def sentence_to_seq(sentence, vocab_to_int):
    '''
    转换序列（词）为id序列
    Args:
        - sentence: string
        - vocab_to_int: 映射字典，将词转换为id
    return:
        - 列表形式的词id
    '''
    sentence_id = [vocab_to_int[word] if word in vocab_to_int.keys() else vocab_to_int['<UNK>'] for word in sentence.lower().split()]
    
    return sentence_id

tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed


## Translate
　　使用 `translate_sentence`  将英语转为法语

In [23]:
translate_sentence = 'he saw a old yellow truck .'

translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # 载入保存模型
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,
                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,
                                         source_sequence_length: [len(translate_sentence)]*batch_size,
                                         keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in translate_logits]))
print('  French Words: {}'.format(" ".join([target_int_to_vocab[i] for i in translate_logits])))

INFO:tensorflow:Restoring parameters from checkpoints/dev
Input
  Word Ids:      [211, 61, 57, 54, 138, 208, 97]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [177, 251, 161, 174, 1]
  French Words: il conduit un camion <EOS>
