# TV Script Generation

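In this project, you'll generate your own [Simpsons](https://zh.wikipedia.org/wiki/%E8%BE%9B%E6%99%AE%E6%A3%AE%E4%B8%80%E5%AE%B6) TV scripts using RNNs. You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons. The neural network you build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you. You'll be using a subset of the original dataset that consists only of the scenes in Moe's Tavern. It does not include other versions of the tavern, such as "Moe's Cavern", "Flaming Moe's", "Uncle Moe's Family Feed-Bag", and so on.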

In [3]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

In [4]:
len(text)

305203

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [5]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.251908396946565
Number of lines: 4258
Average number of words in each line: 11.50164396430249

The sentences 0 to 10:

Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.



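## Implement Preprocessing Functions
The first thing to do to any dataset is preprocessing. Implement the following two preprocessing functions:

- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids. In this function, create two dictionaries:

- A dictionary to go from a word to its id; we'll call it `vocab_to_int`
- A dictionary to go from an id to its word; we'll call it `int_to_vocab`

Return these dictionaries in the following tuple: `(vocab_to_int, int_to_vocab)`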

In [6]:
import numpy as np
import problem_unittests as tests
from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # TODO: Implement Function
    word_counters = Counter(text)
    sorted_vocab = sorted(word_counters, key=word_counters.get, reverse=True)
    
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return vocab_to_int, int_to_vocab


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
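
As a quick sanity check, the sketch below applies the same frequency-sorted construction to a toy word list. The word list and the name `build_lookup_tables` are made up for illustration; the function mirrors `create_lookup_tables` above and is not a replacement for it.

```python
from collections import Counter

def build_lookup_tables(words):
    """Hypothetical stand-in mirroring create_lookup_tables above."""
    counts = Counter(words)
    # Most frequent words receive the smallest ids
    sorted_vocab = sorted(counts, key=counts.get, reverse=True)
    int_to_vocab = dict(enumerate(sorted_vocab))
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab

toy_words = ['moe', 'homer', 'moe', 'beer', 'moe', 'homer']
toy_vocab_to_int, toy_int_to_vocab = build_lookup_tables(toy_words)

# Encode the text as ids, then decode it again
toy_ids = [toy_vocab_to_int[w] for w in toy_words]
round_trip = [toy_int_to_vocab[i] for i in toy_ids]
print(toy_ids)   # [0, 1, 0, 2, 0, 1]
```

The two dictionaries are inverses of each other, so encoding and then decoding returns the original word list.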


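### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters. However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||". Create a dictionary for the following symbols, where the symbol is the key and the token is the value: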

- period ( . )
- comma ( , )
- quotation mark ( " )
- semicolon ( ; )
- exclamation mark ( ! )
- question mark ( ? )
- left parenthesis ( ( )
- right parenthesis ( ) )
- dash ( -- )
- return ( \n )

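This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one. This separates every symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try something like "||dash||".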
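
The sketch below shows how such a dictionary might be applied before splitting on whitespace. It assumes the preprocessing step pads each token with spaces and lowercases the text; the actual logic lives in `helper.preprocess_and_save_data`, which is not shown here, so `apply_tokens` and the two-entry `demo_tokens` dictionary are illustrative names only.

```python
def apply_tokens(text, token_dict):
    """Replace each symbol with its space-padded token, then split into words."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.lower().split()

# Hypothetical two-entry dictionary, just for this demo
demo_tokens = {'!': '||exclamation_mark||', ',': '||comma||'}
demo_words = apply_tokens('Bye, Moe!', demo_tokens)
print(demo_words)
# ['bye', '||comma||', 'moe', '||exclamation_mark||']
```

After this step, "bye" maps to the same vocabulary entry whether or not punctuation follows it.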

In [7]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    token_dict = {'.':'||period||', ',':'||comma||', '"':'||quotation_mark||', ';':'||semicolon||', 
                  '!':'||exclamation_mark||', 
                  '?':'||question_mark||', '(':'||left_parenthesis||', ')':'||right_parenthesis||', '--': '||dash||', '\n': '||return||'}
    return token_dict

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


## Preprocess All the Data and Save It
Running the code cell below will preprocess all the data and save it to file.

In [8]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

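# Checkpoint
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart it, you can start from here. The preprocessed data has already been saved to disk.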

In [9]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

In [10]:
token_dict



{'\n': '||return||',
 '!': '||exclamation_mark||',
 '"': '||quotation_mark||',
 '(': '||left_parenthesis||',
 ')': '||right_parenthesis||',
 ',': '||comma||',
 '--': '||dash||',
 '.': '||period||',
 ';': '||semicolon||',
 '?': '||question_mark||'}

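## Build the Neural Network
You'll build the components necessary for an RNN by implementing the following functions:

- `get_inputs`
- `get_init_cell`
- `get_embed`
- `build_rnn`
- `build_nn`
- `get_batches`

### Check the Version of TensorFlow and Access to GPU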

In [11]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.0.1
Default GPU Device: /gpu:0


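### Input

Implement the `get_inputs()` function to create TF Placeholders for the neural network. It should create the following placeholders:

- Input text placeholder named "input", set through the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple: `(Input, Targets, LearningRate)`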

In [12]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    # TODO: Implement Function
    inputs = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')    
    return inputs, targets, lr


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize

Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).

- Set the RNN size using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
- Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple: `(Cell, InitialState)`

In [13]:
import problem_unittests as tests
import importlib
importlib.reload(tests)
lstm_layer = 2
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initialize state)
    """
    # TODO: Implement Function
    # Build a fresh BasicLSTMCell per layer so that layers do not share weights
    def lstm_cell():
        return tf.contrib.rnn.BasicLSTMCell(rnn_size)

    # Stack lstm_layer LSTM layers into a single multi-layer cell
    cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(lstm_layer)], state_is_tuple=True)

    # Start from an all-zero state and name it so it can be fetched from a loaded graph
    initial_state = cell.zero_state(batch_size, tf.float32)
    initial_state = tf.identity(initial_state, 'initial_state')

    return (cell, initial_state)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### Word Embedding
Apply embedding to `input_data` using TensorFlow. Return the embedded sequence.
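
Conceptually, an embedding lookup is row indexing into a `(vocab_size, embed_dim)` matrix. The NumPy sketch below illustrates the shapes involved. It is not TensorFlow code, and the sizes are arbitrary values chosen for the example.

```python
import numpy as np

vocab_size, embed_dim = 10, 4          # arbitrary illustrative sizes
rng = np.random.RandomState(0)
embedding = rng.uniform(-1, 1, size=(vocab_size, embed_dim))

# A batch of 2 sequences, each 3 word ids long
input_ids = np.array([[1, 5, 2],
                      [7, 7, 0]])

# Indexing with an integer array picks one embedding row per word id
embedded = embedding[input_ids]
print(embedded.shape)   # (2, 3, 4)
```

Each word id is replaced by its `embed_dim`-long vector, so a `[batch size, sequence length]` input becomes `[batch size, sequence length, embed_dim]`.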

In [14]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # TODO: Implement Function
    embedding = tf.Variable(tf.random_uniform((vocab_size,embed_dim), -1,1))
    embed = tf.nn.embedding_lookup(embedding, input_data)
    return embed


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed


### Build RNN
You created an RNN Cell in the `get_init_cell()` function. Time to use the cell to create an RNN.

- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
- Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple: `(Outputs, FinalState)`

In [15]:
def build_rnn(cell, inputs):
    """
    Create a RNN using a RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    # TODO: Implement Function
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    final_state = tf.identity(final_state, name = 'final_state')
    return outputs, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


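### Build the Neural Network
Apply the functions you implemented above to:

- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build the RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple: `(Logits, FinalState)`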
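
To see what the final layer does to the tensor shapes, here is a NumPy sketch of a linear (no activation) fully connected layer applied at every time step. The sizes and random weights are placeholders for illustration, not values from the trained model.

```python
import numpy as np

batch_size, seq_length, rnn_size, vocab_size = 2, 3, 8, 20  # illustrative sizes
rng = np.random.RandomState(1)

rnn_outputs = rng.randn(batch_size, seq_length, rnn_size)
weights = rng.randn(rnn_size, vocab_size)
bias = np.zeros(vocab_size)

# The same weights are applied along the last axis at every time step
logits = rnn_outputs @ weights + bias
print(logits.shape)   # (2, 3, 20)
```

The result holds one unnormalized score per vocabulary word for every position in every sequence.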

In [16]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    # TODO: Implement Function
    embed = get_embed(input_data, vocab_size, embed_dim)

    print (embed.get_shape())
    outputs, final_state = build_rnn(cell, embed)
    
    
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn = None)

    return (logits, final_state)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

(128, 5, 300)
Tests Passed


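### Batches

Implement `get_batches` to create batches of input and targets using `int_text`. The batches should be a Numpy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:

- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 2, 3)` would return the following Numpy array: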

[
  # First Batch
  [
    # Batch of Input
    [[ 1  2  3], [ 7  8  9]],
    # Batch of targets
    [[ 2  3  4], [ 8  9 10]]
  ],
 
  # Second Batch
  [
    # Batch of Input
    [[ 4  5  6], [10 11 12]],
    # Batch of targets
    [[ 5  6  7], [11 12 13]]
  ]
]

In [20]:
def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    # TODO: Implement Function
    words_per_batch = batch_size * seq_length
    batches = len(int_text) // words_per_batch

    # Each target is shifted one word ahead of its input, so one extra word is
    # needed after the last full batch; drop a batch if that word is missing.
    if len(int_text) <= batches * words_per_batch:
        batches -= 1

    input_text = np.array(int_text[:batches * words_per_batch + 1])
    output = np.zeros([batches, 2, batch_size, seq_length])

    for i in range(batches):
        idx = i * seq_length
        for j in range(batch_size):
            # Row j reads from its own contiguous stretch of the text
            start = idx + j * seq_length * batches
            output[i][0][j] = input_text[start: start + seq_length]
            output[i][1][j] = input_text[start + 1: start + 1 + seq_length]

    return output


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

Tests Passed


In [19]:
get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 2, 3)

array([[[[ 1.,  2.,  3.],
         [ 7.,  8.,  9.]],

        [[ 2.,  3.,  4.],
         [ 8.,  9., 10.]]],


       [[[ 4.,  5.,  6.],
         [10., 11., 12.]],

        [[ 5.,  6.,  7.],
         [11., 12., 13.]]]])
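
For comparison, the same layout can be produced without explicit loops. The sketch below is a reshape-based variant under a hypothetical name, `get_batches_reshape`; it assumes at least one full batch is available and returns integer ids instead of the floats shown above.

```python
import numpy as np

def get_batches_reshape(int_text, batch_size, seq_length):
    """Hypothetical reshape-based variant with the same batch layout."""
    words_per_batch = batch_size * seq_length
    # Reserve one trailing word so the last target exists
    n_batches = (len(int_text) - 1) // words_per_batch
    n_words = n_batches * words_per_batch

    xdata = np.array(int_text[:n_words])
    ydata = np.array(int_text[1:n_words + 1])   # inputs shifted by one word

    # Each of the batch_size rows is one contiguous stretch of text;
    # splitting along the columns yields n_batches chunks of seq_length
    x_batches = np.split(xdata.reshape(batch_size, -1), n_batches, axis=1)
    y_batches = np.split(ydata.reshape(batch_size, -1), n_batches, axis=1)
    return np.array(list(zip(x_batches, y_batches)))

demo = get_batches_reshape(list(range(1, 16)), 2, 3)
print(demo.shape)   # (2, 2, 2, 3)
```

Because each row continues where it left off in the previous batch, the RNN state carried between batches during training lines up with the text.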

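## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of a sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to the number of batches between progress printouts.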

In [81]:
# Number of Epochs
num_epochs = 500
# Batch Size
batch_size = 64
# RNN Size
rnn_size = 256
# Embedding Dimension Size
embed_dim = 128
# Sequence Length
seq_length = 12
# Learning Rate
learning_rate = 0.001
# Show stats for every n number of batches
show_every_n_batches = 16

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'

### Build the Graph
Build the graph using the neural network you implemented.

In [82]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)

(?, ?, 128)
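
With all weights set to one, the sequence loss in the cell above reduces to the softmax cross-entropy averaged over every position in the batch. The NumPy sketch below illustrates that reduction under this assumption; `mean_sequence_loss` is an illustrative name, not part of TensorFlow. One consequence is that a model predicting uniformly over a vocabulary of V words has a loss of ln V, which is a useful yardstick for the first training loss reported.

```python
import numpy as np

def mean_sequence_loss(logits, targets):
    """Average softmax cross-entropy over batch and time (uniform weights)."""
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability assigned to each target word
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return -picked.mean()

vocab = 50                                  # illustrative vocabulary size
uniform_logits = np.zeros((2, 3, vocab))    # equal score for every word
demo_targets = np.array([[4, 9, 1], [0, 3, 7]])
loss = mean_sequence_loss(uniform_logits, demo_targets)   # equals ln(50)
```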


## Train
Train the neural network on the preprocessed data. If you get stuck, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.

In [83]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

Epoch   0 Batch    0/89   train_loss = 8.822
Epoch   0 Batch   16/89   train_loss = 6.624
Epoch   0 Batch   32/89   train_loss = 6.378
Epoch   0 Batch   48/89   train_loss = 6.420
Epoch   0 Batch   64/89   train_loss = 6.605
Epoch   0 Batch   80/89   train_loss = 6.676
Epoch   1 Batch    7/89   train_loss = 6.199
Epoch   1 Batch   23/89   train_loss = 6.252
Epoch   1 Batch   39/89   train_loss = 6.073
Epoch   1 Batch   55/89   train_loss = 6.225
Epoch   1 Batch   71/89   train_loss = 6.156
Epoch   1 Batch   87/89   train_loss = 6.209
Epoch   2 Batch   14/89   train_loss = 5.950
Epoch   2 Batch   30/89   train_loss = 6.252
Epoch   2 Batch   46/89   train_loss = 6.254
Epoch   2 Batch   62/89   train_loss = 6.171
Epoch   2 Batch   78/89   train_loss = 6.408
Epoch   3 Batch    5/89   train_loss = 6.128
Epoch   3 Batch   21/89   train_loss = 5.888
Epoch   3 Batch   37/89   train_loss = 6.166
Epoch   3 Batch   53/89   train_loss = 6.316
Epoch   3 Batch   69/89   train_loss = 6.136
Epoch   3 

Epoch  32 Batch   80/89   train_loss = 3.785
Epoch  33 Batch    7/89   train_loss = 3.561
Epoch  33 Batch   23/89   train_loss = 3.691
Epoch  33 Batch   39/89   train_loss = 3.473
Epoch  33 Batch   55/89   train_loss = 3.720
Epoch  33 Batch   71/89   train_loss = 3.544
Epoch  33 Batch   87/89   train_loss = 3.529
Epoch  34 Batch   14/89   train_loss = 3.397
Epoch  34 Batch   30/89   train_loss = 3.675
Epoch  34 Batch   46/89   train_loss = 3.590
Epoch  34 Batch   62/89   train_loss = 3.523
Epoch  34 Batch   78/89   train_loss = 3.538
Epoch  35 Batch    5/89   train_loss = 3.461
Epoch  35 Batch   21/89   train_loss = 3.253
Epoch  35 Batch   37/89   train_loss = 3.541
Epoch  35 Batch   53/89   train_loss = 3.591
Epoch  35 Batch   69/89   train_loss = 3.509
Epoch  35 Batch   85/89   train_loss = 3.502
Epoch  36 Batch   12/89   train_loss = 3.369
Epoch  36 Batch   28/89   train_loss = 3.437
Epoch  36 Batch   44/89   train_loss = 3.493
Epoch  36 Batch   60/89   train_loss = 3.315
Epoch  36 

Epoch  65 Batch   71/89   train_loss = 1.700
Epoch  65 Batch   87/89   train_loss = 1.794
Epoch  66 Batch   14/89   train_loss = 1.756
Epoch  66 Batch   30/89   train_loss = 1.836
Epoch  66 Batch   46/89   train_loss = 1.719
Epoch  66 Batch   62/89   train_loss = 1.648
Epoch  66 Batch   78/89   train_loss = 1.489
Epoch  67 Batch    5/89   train_loss = 1.699
Epoch  67 Batch   21/89   train_loss = 1.715
Epoch  67 Batch   37/89   train_loss = 1.731
Epoch  67 Batch   53/89   train_loss = 1.616
Epoch  67 Batch   69/89   train_loss = 1.695
Epoch  67 Batch   85/89   train_loss = 1.599
Epoch  68 Batch   12/89   train_loss = 1.680
Epoch  68 Batch   28/89   train_loss = 1.688
Epoch  68 Batch   44/89   train_loss = 1.732
Epoch  68 Batch   60/89   train_loss = 1.645
Epoch  68 Batch   76/89   train_loss = 1.612
Epoch  69 Batch    3/89   train_loss = 1.589
Epoch  69 Batch   19/89   train_loss = 1.593
Epoch  69 Batch   35/89   train_loss = 1.677
Epoch  69 Batch   51/89   train_loss = 1.515
Epoch  69 

Epoch  98 Batch   62/89   train_loss = 0.627
Epoch  98 Batch   78/89   train_loss = 0.569
Epoch  99 Batch    5/89   train_loss = 0.612
Epoch  99 Batch   21/89   train_loss = 0.689
Epoch  99 Batch   37/89   train_loss = 0.720
Epoch  99 Batch   53/89   train_loss = 0.641
Epoch  99 Batch   69/89   train_loss = 0.722
Epoch  99 Batch   85/89   train_loss = 0.659
Epoch 100 Batch   12/89   train_loss = 0.682
Epoch 100 Batch   28/89   train_loss = 0.700
Epoch 100 Batch   44/89   train_loss = 0.670
Epoch 100 Batch   60/89   train_loss = 0.622
Epoch 100 Batch   76/89   train_loss = 0.607
Epoch 101 Batch    3/89   train_loss = 0.669
Epoch 101 Batch   19/89   train_loss = 0.643
Epoch 101 Batch   35/89   train_loss = 0.689
Epoch 101 Batch   51/89   train_loss = 0.629
Epoch 101 Batch   67/89   train_loss = 0.634
Epoch 101 Batch   83/89   train_loss = 0.560
Epoch 102 Batch   10/89   train_loss = 0.568
Epoch 102 Batch   26/89   train_loss = 0.618
Epoch 102 Batch   42/89   train_loss = 0.715
Epoch 102 

Epoch 131 Batch   53/89   train_loss = 0.379
Epoch 131 Batch   69/89   train_loss = 0.411
Epoch 131 Batch   85/89   train_loss = 0.369
Epoch 132 Batch   12/89   train_loss = 0.361
Epoch 132 Batch   28/89   train_loss = 0.424
Epoch 132 Batch   44/89   train_loss = 0.398
Epoch 132 Batch   60/89   train_loss = 0.334
Epoch 132 Batch   76/89   train_loss = 0.341
Epoch 133 Batch    3/89   train_loss = 0.389
Epoch 133 Batch   19/89   train_loss = 0.369
Epoch 133 Batch   35/89   train_loss = 0.392
Epoch 133 Batch   51/89   train_loss = 0.365
Epoch 133 Batch   67/89   train_loss = 0.354
Epoch 133 Batch   83/89   train_loss = 0.363
Epoch 134 Batch   10/89   train_loss = 0.309
Epoch 134 Batch   26/89   train_loss = 0.334
Epoch 134 Batch   42/89   train_loss = 0.417
Epoch 134 Batch   58/89   train_loss = 0.383
Epoch 134 Batch   74/89   train_loss = 0.370
Epoch 135 Batch    1/89   train_loss = 0.322
Epoch 135 Batch   17/89   train_loss = 0.402
Epoch 135 Batch   33/89   train_loss = 0.411
Epoch 135 

Epoch 164 Batch   44/89   train_loss = 0.349
Epoch 164 Batch   60/89   train_loss = 0.291
Epoch 164 Batch   76/89   train_loss = 0.298
Epoch 165 Batch    3/89   train_loss = 0.344
Epoch 165 Batch   19/89   train_loss = 0.313
Epoch 165 Batch   35/89   train_loss = 0.337
Epoch 165 Batch   51/89   train_loss = 0.323
Epoch 165 Batch   67/89   train_loss = 0.303
Epoch 165 Batch   83/89   train_loss = 0.328
Epoch 166 Batch   10/89   train_loss = 0.278
Epoch 166 Batch   26/89   train_loss = 0.295
Epoch 166 Batch   42/89   train_loss = 0.369
Epoch 166 Batch   58/89   train_loss = 0.346
Epoch 166 Batch   74/89   train_loss = 0.327
Epoch 167 Batch    1/89   train_loss = 0.288
Epoch 167 Batch   17/89   train_loss = 0.348
Epoch 167 Batch   33/89   train_loss = 0.357
Epoch 167 Batch   49/89   train_loss = 0.282
Epoch 167 Batch   65/89   train_loss = 0.283
Epoch 167 Batch   81/89   train_loss = 0.345
Epoch 168 Batch    8/89   train_loss = 0.347
Epoch 168 Batch   24/89   train_loss = 0.325
Epoch 168 

Epoch 197 Batch   35/89   train_loss = 0.325
Epoch 197 Batch   51/89   train_loss = 0.316
Epoch 197 Batch   67/89   train_loss = 0.301
Epoch 197 Batch   83/89   train_loss = 0.315
Epoch 198 Batch   10/89   train_loss = 0.262
Epoch 198 Batch   26/89   train_loss = 0.288
Epoch 198 Batch   42/89   train_loss = 0.348
Epoch 198 Batch   58/89   train_loss = 0.330
Epoch 198 Batch   74/89   train_loss = 0.323
Epoch 199 Batch    1/89   train_loss = 0.277
Epoch 199 Batch   17/89   train_loss = 0.341
Epoch 199 Batch   33/89   train_loss = 0.342
Epoch 199 Batch   49/89   train_loss = 0.270
Epoch 199 Batch   65/89   train_loss = 0.266
Epoch 199 Batch   81/89   train_loss = 0.310
Epoch 200 Batch    8/89   train_loss = 0.303
Epoch 200 Batch   24/89   train_loss = 0.294
Epoch 200 Batch   40/89   train_loss = 0.296
Epoch 200 Batch   56/89   train_loss = 0.342
Epoch 200 Batch   72/89   train_loss = 0.300
Epoch 200 Batch   88/89   train_loss = 0.311
Epoch 201 Batch   15/89   train_loss = 0.314
Epoch 201 

Epoch 230 Batch   26/89   train_loss = 0.277
Epoch 230 Batch   42/89   train_loss = 0.343
Epoch 230 Batch   58/89   train_loss = 0.321
Epoch 230 Batch   74/89   train_loss = 0.310
Epoch 231 Batch    1/89   train_loss = 0.267
Epoch 231 Batch   17/89   train_loss = 0.328
Epoch 231 Batch   33/89   train_loss = 0.332
Epoch 231 Batch   49/89   train_loss = 0.263
Epoch 231 Batch   65/89   train_loss = 0.248
Epoch 231 Batch   81/89   train_loss = 0.300
Epoch 232 Batch    8/89   train_loss = 0.298
Epoch 232 Batch   24/89   train_loss = 0.288
Epoch 232 Batch   40/89   train_loss = 0.287
Epoch 232 Batch   56/89   train_loss = 0.320
Epoch 232 Batch   72/89   train_loss = 0.286
Epoch 232 Batch   88/89   train_loss = 0.302
Epoch 233 Batch   15/89   train_loss = 0.306
Epoch 233 Batch   31/89   train_loss = 0.321
Epoch 233 Batch   47/89   train_loss = 0.274
Epoch 233 Batch   63/89   train_loss = 0.285
Epoch 233 Batch   79/89   train_loss = 0.280
Epoch 234 Batch    6/89   train_loss = 0.300
Epoch 234 

Epoch 263 Batch   17/89   train_loss = 0.321
Epoch 263 Batch   33/89   train_loss = 0.326
Epoch 263 Batch   49/89   train_loss = 0.257
Epoch 263 Batch   65/89   train_loss = 0.245
Epoch 263 Batch   81/89   train_loss = 0.294
Epoch 264 Batch    8/89   train_loss = 0.295
Epoch 264 Batch   24/89   train_loss = 0.286
Epoch 264 Batch   40/89   train_loss = 0.282
Epoch 264 Batch   56/89   train_loss = 0.314
Epoch 264 Batch   72/89   train_loss = 0.281
Epoch 264 Batch   88/89   train_loss = 0.295
Epoch 265 Batch   15/89   train_loss = 0.299
Epoch 265 Batch   31/89   train_loss = 0.317
Epoch 265 Batch   47/89   train_loss = 0.268
Epoch 265 Batch   63/89   train_loss = 0.283
Epoch 265 Batch   79/89   train_loss = 0.280
Epoch 266 Batch    6/89   train_loss = 0.293
Epoch 266 Batch   22/89   train_loss = 0.297
Epoch 266 Batch   38/89   train_loss = 0.301
Epoch 266 Batch   54/89   train_loss = 0.291
Epoch 266 Batch   70/89   train_loss = 0.268
Epoch 266 Batch   86/89   train_loss = 0.258
Epoch 267 

Epoch 296 Batch    8/89   train_loss = 0.291
Epoch 296 Batch   24/89   train_loss = 0.286
Epoch 296 Batch   40/89   train_loss = 0.290
Epoch 296 Batch   56/89   train_loss = 0.315
Epoch 296 Batch   72/89   train_loss = 0.281
Epoch 296 Batch   88/89   train_loss = 0.296
Epoch 297 Batch   15/89   train_loss = 0.298
Epoch 297 Batch   31/89   train_loss = 0.316
Epoch 297 Batch   47/89   train_loss = 0.268
Epoch 297 Batch   63/89   train_loss = 0.280
Epoch 297 Batch   79/89   train_loss = 0.277
Epoch 298 Batch    6/89   train_loss = 0.293
Epoch 298 Batch   22/89   train_loss = 0.295
Epoch 298 Batch   38/89   train_loss = 0.304
Epoch 298 Batch   54/89   train_loss = 0.294
Epoch 298 Batch   70/89   train_loss = 0.267
Epoch 298 Batch   86/89   train_loss = 0.258
Epoch 299 Batch   13/89   train_loss = 0.291
Epoch 299 Batch   29/89   train_loss = 0.335
Epoch 299 Batch   45/89   train_loss = 0.277
Epoch 299 Batch   61/89   train_loss = 0.264
Epoch 299 Batch   77/89   train_loss = 0.303
Epoch 300 

Epoch 328 Batch   88/89   train_loss = 0.291
Epoch 329 Batch   15/89   train_loss = 0.296
Epoch 329 Batch   31/89   train_loss = 0.313
Epoch 329 Batch   47/89   train_loss = 0.266
Epoch 329 Batch   63/89   train_loss = 0.277
Epoch 329 Batch   79/89   train_loss = 0.273
Epoch 330 Batch    6/89   train_loss = 0.291
Epoch 330 Batch   22/89   train_loss = 0.293
Epoch 330 Batch   38/89   train_loss = 0.297
Epoch 330 Batch   54/89   train_loss = 0.287
Epoch 330 Batch   70/89   train_loss = 0.264
Epoch 330 Batch   86/89   train_loss = 0.254
Epoch 331 Batch   13/89   train_loss = 0.289
Epoch 331 Batch   29/89   train_loss = 0.335
Epoch 331 Batch   45/89   train_loss = 0.272
Epoch 331 Batch   61/89   train_loss = 0.260
Epoch 331 Batch   77/89   train_loss = 0.299
Epoch 332 Batch    4/89   train_loss = 0.304
Epoch 332 Batch   20/89   train_loss = 0.292
Epoch 332 Batch   36/89   train_loss = 0.285
Epoch 332 Batch   52/89   train_loss = 0.252
Epoch 332 Batch   68/89   train_loss = 0.307
Epoch 332 

Epoch 361 Batch   79/89   train_loss = 0.275
Epoch 362 Batch    6/89   train_loss = 0.291
Epoch 362 Batch   22/89   train_loss = 0.294
Epoch 362 Batch   38/89   train_loss = 0.297
Epoch 362 Batch   54/89   train_loss = 0.286
Epoch 362 Batch   70/89   train_loss = 0.268
Epoch 362 Batch   86/89   train_loss = 0.254
Epoch 363 Batch   13/89   train_loss = 0.292
Epoch 363 Batch   29/89   train_loss = 0.337
Epoch 363 Batch   45/89   train_loss = 0.271
Epoch 363 Batch   61/89   train_loss = 0.261
Epoch 363 Batch   77/89   train_loss = 0.300
Epoch 364 Batch    4/89   train_loss = 0.306
Epoch 364 Batch   20/89   train_loss = 0.292
Epoch 364 Batch   36/89   train_loss = 0.286
Epoch 364 Batch   52/89   train_loss = 0.253
Epoch 364 Batch   68/89   train_loss = 0.307
Epoch 364 Batch   84/89   train_loss = 0.273
Epoch 365 Batch   11/89   train_loss = 0.310
Epoch 365 Batch   27/89   train_loss = 0.320
Epoch 365 Batch   43/89   train_loss = 0.278
Epoch 365 Batch   59/89   train_loss = 0.283
Epoch 365 

Epoch 394 Batch   70/89   train_loss = 0.262
Epoch 394 Batch   86/89   train_loss = 0.252
Epoch 395 Batch   13/89   train_loss = 0.287
Epoch 395 Batch   29/89   train_loss = 0.332
Epoch 395 Batch   45/89   train_loss = 0.269
Epoch 395 Batch   61/89   train_loss = 0.258
Epoch 395 Batch   77/89   train_loss = 0.298
Epoch 396 Batch    4/89   train_loss = 0.304
Epoch 396 Batch   20/89   train_loss = 0.291
Epoch 396 Batch   36/89   train_loss = 0.283
Epoch 396 Batch   52/89   train_loss = 0.251
Epoch 396 Batch   68/89   train_loss = 0.306
Epoch 396 Batch   84/89   train_loss = 0.272
Epoch 397 Batch   11/89   train_loss = 0.310
Epoch 397 Batch   27/89   train_loss = 0.319
Epoch 397 Batch   43/89   train_loss = 0.276
Epoch 397 Batch   59/89   train_loss = 0.281
Epoch 397 Batch   75/89   train_loss = 0.256
Epoch 398 Batch    2/89   train_loss = 0.323
Epoch 398 Batch   18/89   train_loss = 0.288
Epoch 398 Batch   34/89   train_loss = 0.295
Epoch 398 Batch   50/89   train_loss = 0.285
Epoch 398 

Epoch 427 Batch   61/89   train_loss = 0.258
Epoch 427 Batch   77/89   train_loss = 0.296
Epoch 428 Batch    4/89   train_loss = 0.302
Epoch 428 Batch   20/89   train_loss = 0.290
Epoch 428 Batch   36/89   train_loss = 0.282
Epoch 428 Batch   52/89   train_loss = 0.249
Epoch 428 Batch   68/89   train_loss = 0.305
Epoch 428 Batch   84/89   train_loss = 0.270
Epoch 429 Batch   11/89   train_loss = 0.308
Epoch 429 Batch   27/89   train_loss = 0.318
Epoch 429 Batch   43/89   train_loss = 0.275
Epoch 429 Batch   59/89   train_loss = 0.280
Epoch 429 Batch   75/89   train_loss = 0.255
Epoch 430 Batch    2/89   train_loss = 0.323
Epoch 430 Batch   18/89   train_loss = 0.286
Epoch 430 Batch   34/89   train_loss = 0.294
Epoch 430 Batch   50/89   train_loss = 0.285
Epoch 430 Batch   66/89   train_loss = 0.271
Epoch 430 Batch   82/89   train_loss = 0.275
Epoch 431 Batch    9/89   train_loss = 0.294
Epoch 431 Batch   25/89   train_loss = 0.298
Epoch 431 Batch   41/89   train_loss = 0.277
Epoch 431 

Epoch 460 Batch   52/89   train_loss = 0.250
Epoch 460 Batch   68/89   train_loss = 0.307
Epoch 460 Batch   84/89   train_loss = 0.274
Epoch 461 Batch   11/89   train_loss = 0.308
Epoch 461 Batch   27/89   train_loss = 0.322
Epoch 461 Batch   43/89   train_loss = 0.278
Epoch 461 Batch   59/89   train_loss = 0.284
Epoch 461 Batch   75/89   train_loss = 0.255
Epoch 462 Batch    2/89   train_loss = 0.323
Epoch 462 Batch   18/89   train_loss = 0.301
Epoch 462 Batch   34/89   train_loss = 0.298
Epoch 462 Batch   50/89   train_loss = 0.293
Epoch 462 Batch   66/89   train_loss = 0.275
Epoch 462 Batch   82/89   train_loss = 0.279
Epoch 463 Batch    9/89   train_loss = 0.297
Epoch 463 Batch   25/89   train_loss = 0.303
Epoch 463 Batch   41/89   train_loss = 0.281
Epoch 463 Batch   57/89   train_loss = 0.284
Epoch 463 Batch   73/89   train_loss = 0.286
Epoch 464 Batch    0/89   train_loss = 0.274
Epoch 464 Batch   16/89   train_loss = 0.374
Epoch 464 Batch   32/89   train_loss = 0.340
Epoch 464 

Epoch 493 Batch   43/89   train_loss = 0.274
Epoch 493 Batch   59/89   train_loss = 0.279
Epoch 493 Batch   75/89   train_loss = 0.254
Epoch 494 Batch    2/89   train_loss = 0.322
Epoch 494 Batch   18/89   train_loss = 0.286
Epoch 494 Batch   34/89   train_loss = 0.293
Epoch 494 Batch   50/89   train_loss = 0.284
Epoch 494 Batch   66/89   train_loss = 0.271
Epoch 494 Batch   82/89   train_loss = 0.275
Epoch 495 Batch    9/89   train_loss = 0.293
Epoch 495 Batch   25/89   train_loss = 0.297
Epoch 495 Batch   41/89   train_loss = 0.277
Epoch 495 Batch   57/89   train_loss = 0.273
Epoch 495 Batch   73/89   train_loss = 0.257
Epoch 496 Batch    0/89   train_loss = 0.238
Epoch 496 Batch   16/89   train_loss = 0.320
Epoch 496 Batch   32/89   train_loss = 0.237
Epoch 496 Batch   48/89   train_loss = 0.293
Epoch 496 Batch   64/89   train_loss = 0.237
Epoch 496 Batch   80/89   train_loss = 0.282
Epoch 497 Batch    7/89   train_loss = 0.290
Epoch 497 Batch   23/89   train_loss = 0.296
Epoch 497 

In [84]:
len(int_text)

69101

## Save Parameters
Save `seq_length` and `save_dir` so they can be reloaded when generating a new TV script.

In [85]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))
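
The source of `helper.save_params` is not shown in this notebook. The block below is a minimal sketch of how such a save/load pair could work, assuming the parameters are pickled to a single file. The function names `save_params_sketch` and `load_params_sketch`, the file name `params.p`, and the demo values `12` and `'./save'` are all hypothetical and chosen only for illustration.

```python
import os
import pickle
import tempfile


def save_params_sketch(params, path):
    """Hypothetical stand-in for helper.save_params: pickle a tuple to disk."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)


def load_params_sketch(path):
    """Hypothetical stand-in for helper.load_params: read the tuple back."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Placeholder values, not the ones used in this notebook.
demo_path = os.path.join(tempfile.mkdtemp(), 'params.p')
save_params_sketch((12, './save'), demo_path)

# The tuple round-trips in the same order it was saved.
demo_seq_length, demo_load_dir = load_params_sketch(demo_path)
```

Saving the two values as one tuple is what lets the checkpoint cell unpack them in a single line after a kernel restart.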

# Checkpoint

In [86]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name). Get the tensors using the following names:

- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple: `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)`

In [87]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # Look up each tensor by name, in the order the caller expects.
    tensor_names = ('input:0', 'initial_state:0', 'final_state:0', 'probs:0')
    return tuple(loaded_graph.get_tensor_by_name(name) for name in tensor_names)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

Tests Passed
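
The names passed to `get_tensor_by_name()` follow TensorFlow's `<op_name>:<output_index>` convention, so `"input:0"` refers to output 0 of the operation named `input`. The sketch below only illustrates that naming scheme in plain Python. `split_tensor_name` is a hypothetical helper, not part of TensorFlow or of this project.

```python
def split_tensor_name(tensor_name):
    """Split '<op_name>:<output_index>' into its two parts (illustrative helper)."""
    # rpartition splits on the last colon, which separates name from index.
    op_name, _, index = tensor_name.rpartition(':')
    return op_name, int(index)


names = ['input:0', 'initial_state:0', 'final_state:0', 'probs:0']
parsed = [split_tensor_name(name) for name in names]
```

This is why the operations in the training graph have to be given these exact names when the graph is built: the lookup at generation time depends on them.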


### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.

In [88]:
test_probabilities = [0.1, 0.8, 0.05, 0.05]
test_int_to_vocab = ['my', 'is', 'a', 'test']
np.random.choice(test_int_to_vocab, 1, p=test_probabilities)[0]

'is'

In [89]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # Sample a word id in proportion to its predicted probability instead of
    # always taking the argmax, which can make the generated text repetitive.
    # Sampling the id (rather than the dict values) keeps the lookup
    # independent of dictionary ordering.
    word_id = np.random.choice(len(probabilities), p=probabilities)
    return int_to_vocab[word_id]


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

Tests Passed
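
To make the difference between sampling and greedy selection concrete, here is a small sketch using the same toy distribution as the test cell above. The names `sample_word`, `demo_probs`, and `demo_int_to_vocab` are introduced only for this illustration, and the fixed seed is an assumption made so that the run is repeatable.

```python
import numpy as np


def sample_word(probabilities, int_to_vocab, rng):
    """Draw one word id according to `probabilities` and map it to a word."""
    word_id = rng.choice(len(probabilities), p=probabilities)
    return int_to_vocab[int(word_id)]


demo_probs = np.array([0.1, 0.8, 0.05, 0.05])
demo_int_to_vocab = {0: 'my', 1: 'is', 2: 'a', 3: 'test'}

rng = np.random.default_rng(0)  # seeded only to keep the demo repeatable
samples = [sample_word(demo_probs, demo_int_to_vocab, rng) for _ in range(1000)]

# Greedy decoding always returns the single most likely word.
greedy = demo_int_to_vocab[int(demo_probs.argmax())]

# Sampling returns each word roughly in proportion to its probability.
share_is = samples.count('is') / len(samples)
```

With sampling, low-probability words still appear occasionally, which adds variety to the generated script at the cost of some coherence.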


## Generate TV Script
This will generate the TV script for you. Set `gen_length` to the length of TV script you want to generate.

In [90]:
gen_length = 2000
# homer_simpson, moe_szyslak, or Barney_Gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)

moe_szyslak: even you let me down, hitler.
man's_voice: hey, it's true.
moe_szyslak:(looking at homer) hey, barney! i'm an urban for a little bit.
barney_gumble: is on the best thing to the hobo!
duffman:(warmly) this isn't none here. he's never hungry, i only nothin' so bad. please, please!

homer_simpson:(shocked) are you sure? cuz one magic should be over here?
homer_simpson: sorry it's all.(points to barney." and sadder is a" flaming moe on the" flaming moe" and no longer! i'm a gear-head!
renee:(to self) that's your dog.
homer_simpson:(reading) very deal.
apu_nahasapeemapetilon:(moans) moe, barney.
barney_gumble: you know, how about lenny?
homer_simpson:(proudly) hey everyone look at my bar friends!
homer_simpson:(sweet) yeah, where's my car?!
bart_simpson:(thru phone) jacques. i've got the hands to moe.
homer_simpson: ahh! i gave me the liver seconds.(c."
c. ooh, eh, who is this saucy?
lenny_leonard: oh, you're just gonna see this wrong.
moe_szyslak:(to moe) oh, this is a second 
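
The last step of the generation cell turns punctuation tokens back into punctuation marks. The sketch below replays that step on a tiny hand-written word list. The contents of `demo_token_dict` (for example `'||period||'`) are assumptions made for illustration, since the real `token_dict` is built earlier in the notebook and is not shown in this section.

```python
# Assumed token names; the real token_dict comes from helper.load_preprocess().
demo_token_dict = {
    '.': '||period||',
    ',': '||comma||',
    '\n': '||return||',
}

demo_words = ['moe_szyslak:', 'hey', '||comma||', 'homer', '||period||',
              '||return||', 'homer_simpson:', 'hi', '||period||']

# Join the generated words, then swap each " <token>" for its symbol so the
# punctuation attaches to the preceding word.
script = ' '.join(demo_words)
for key, token in demo_token_dict.items():
    script = script.replace(' ' + token.lower(), key)

# A newline token leaves a stray space at the start of the next line.
script = script.replace('\n ', '\n')
```

Replacing the token together with its leading space is what removes the gap before commas and periods in the printed script.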

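# The TV Script Is Nonsensical
It's fine if the TV script doesn't make any sense. The training text is less than a megabyte. To get better results, you would need either a smaller vocabulary or more data. Luckily, there is more data! As mentioned at the beginning of this project, this text is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data). We didn't have you train on all of it because that would take too long. However, you are free to train your neural network on the full dataset, after you complete this project, of course.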
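
One common way to shrink the vocabulary is to keep only the most frequent words and map everything else to a single placeholder token. The block below is a minimal sketch of that idea, not something this project requires. The function `limit_vocab`, the `<unk>` token, and the demo sentence are all hypothetical.

```python
from collections import Counter


def limit_vocab(words, max_vocab_size, unk_token='<unk>'):
    """Keep the `max_vocab_size` most frequent words; replace the rest."""
    counts = Counter(words)
    keep = {word for word, _ in counts.most_common(max_vocab_size)}
    return [word if word in keep else unk_token for word in words]


demo_words = 'moe pours a beer and homer drinks a beer and moe sighs'.split()
limited = limit_vocab(demo_words, max_vocab_size=4)

# Four frequent words survive; each rare word collapses into '<unk>'.
vocab_after = set(limited)
```

The trade-off is that the network sees more examples per remaining word, but it can no longer produce the rare words that were replaced.
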
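# Submitting This Project
When submitting this project, make sure you have run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb", and also export it as an HTML file via "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.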