# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs.  You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.  The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset.  It consists of only the scenes in Moe's Tavern.  This doesn't include other versions of the tavern, like "Moe's Cavern", "Flaming Moe's", "Uncle Moe's Family Feed-Bag", etc.

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [2]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.248091603053435
Number of lines: 4257
Average number of words in each line: 11.50434578341555

The sentences 0 to 10:
Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.




## Implement Preprocessing Functions
The first step with any dataset is preprocessing.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words into ids.  In this function, create two dictionaries:
- A dictionary that maps each word to an id, which we'll call `vocab_to_int`
- A dictionary that maps each id back to its word, which we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`

In [3]:
import numpy as np
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # TODO: Implement Function
    _vocab_to_int = {}
    _int_to_vocab = {}
    count = 0
    for it in text:
        if it not in _vocab_to_int:
            _vocab_to_int[it] = count
            _int_to_vocab[count] = it
            count += 1
    return _vocab_to_int, _int_to_vocab


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
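
As a quick sanity check, the two dictionaries should be exact inverses of each other. The sketch below shows one alternative way to build them, assigning the lowest ids to the most frequent words with `collections.Counter`. The function name `create_lookup_tables_by_frequency` and the sample word list are illustrative assumptions, not part of the project's helper code.

```python
from collections import Counter


def create_lookup_tables_by_frequency(words):
    """Build (vocab_to_int, int_to_vocab), giving frequent words the lowest ids."""
    counts = Counter(words)
    # Sort by descending count, breaking ties alphabetically for determinism
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    vocab_to_int = {word: idx for idx, word in enumerate(ordered)}
    int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


sample_words = "moe homer moe barney homer moe".split()
demo_vocab_to_int, demo_int_to_vocab = create_lookup_tables_by_frequency(sample_words)
print(demo_vocab_to_int)  # {'moe': 0, 'homer': 1, 'barney': 2}

# Encoding and then decoding should give back the original words
encoded = [demo_vocab_to_int[w] for w in sample_words]
decoded = [demo_int_to_vocab[i] for i in encoded]
print(decoded == sample_words)  # True
```

Ordering by frequency is optional; any one-to-one mapping between words and ids works for the embedding layer.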


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parenthesis ( ( )
- Right Parenthesis ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one.  This separates each symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".

In [4]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    return {
        ".":"[PERIOD]", ",":"[COMMA]", "\"":"[QUOTE]", ";":"[SEMICOLON]", "!":"[EXCLAMATION_MARK]", 
        "?":"[QUESTION_MARK]", "(":"[LEFT_PARENTHESES]", ")":"[RIGHT_PARENTHESES]", "--":"[DASH]",
        "\n":"[RETURN]"
    }

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed
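
To see why the spaces around each token matter, here is a minimal sketch of how such a dictionary might be applied before splitting on whitespace. It assumes the preprocessing step replaces each symbol with its space-padded token and lowercases the text; the actual logic lives in `helper.preprocess_and_save_data`, which is not shown in this notebook. The names `demo_tokens` and `tokenize_text` are hypothetical.

```python
demo_tokens = {
    ".": "||period||",
    ",": "||comma||",
    "!": "||exclamation_mark||",
    "?": "||question_mark||",
    "\n": "||return||",
}


def tokenize_text(text, tokens):
    """Replace each symbol with a space-padded token, then split into words."""
    for symbol, token in tokens.items():
        text = text.replace(symbol, " {} ".format(token))
    return text.lower().split()


demo_words = tokenize_text("Bye! Bye, Moe.", demo_tokens)
print(demo_words)
# ['bye', '||exclamation_mark||', 'bye', '||comma||', 'moe', '||period||']
```

After this step "bye" and "bye!" share the same word id for "bye", and the exclamation mark becomes a separate word the network can predict on its own.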


## Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [5]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Checkpoint
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
You'll build the components necessary to build an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### Check the Version of TensorFlow and Access to GPU

In [7]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.3'), 'Please use TensorFlow version 1.3 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.3.0
Default GPU Device: /gpu:0


### Input
Implement the `get_inputs()` function to create TF Placeholders for the Neural Network.  It should create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [8]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    # TODO: Implement Function
    inn = tf.placeholder(tf.int32, shape=(None, None), name="input")
    targets = tf.placeholder(tf.int32, shape=(None, None))
    lr = tf.placeholder(tf.float32)
    return inn, targets, lr


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [9]:
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initial state)
    """
    # TODO: Implement Function
    cell = tf.nn.rnn_cell.MultiRNNCell([
        tf.nn.rnn_cell.BasicLSTMCell(num_units=rnn_size)
    ])
    return cell, tf.identity(cell.zero_state(batch_size, tf.float32), name="initial_state")


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed
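
For intuition about what the cell computes and why it needs an initial state, the NumPy sketch below runs one step of a textbook LSTM starting from a zero state. It is an illustration under stated assumptions, not TensorFlow's implementation: `BasicLSTMCell` differs in details such as adding a forget-gate bias, and the names `lstm_step`, `W`, and `b` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step for a batch.

    x: (batch, input_dim); h_prev, c_prev: (batch, rnn_size)
    W: (input_dim + rnn_size, 4 * rnn_size); b: (4 * rnn_size,)
    """
    gates = np.concatenate([x, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(gates, 4, axis=1)  # input, candidate, forget, output
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


demo_batch, demo_input_dim, demo_rnn_size = 2, 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(demo_input_dim + demo_rnn_size, 4 * demo_rnn_size))
b = np.zeros(4 * demo_rnn_size)

# The zero state: both the cell state c and the hidden state h start at zero
h0 = np.zeros((demo_batch, demo_rnn_size))
c0 = np.zeros((demo_batch, demo_rnn_size))

x = rng.normal(size=(demo_batch, demo_input_dim))
h1, c1 = lstm_step(x, h0, c0, W, b)
print(h1.shape, c1.shape)  # (2, 4) (2, 4)
```

Both state tensors have shape `(batch_size, rnn_size)`, which is why `zero_state()` needs the batch size as well as the data type.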


### Word Embedding
Apply embedding to `input_data` using TensorFlow.  Return the embedded sequence.

In [10]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # TODO: Implement Function
    embedding_var = tf.get_variable('embedding', (vocab_size, embed_dim), dtype=tf.float32)
    return  tf.nn.embedding_lookup(embedding_var, input_data)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed
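
Conceptually, an embedding lookup is just row indexing into a `(vocab_size, embed_dim)` table. The NumPy sketch below illustrates the shapes involved; the table values here are placeholders, whereas in the real model they are trainable variables.

```python
import numpy as np

demo_vocab_size, demo_embed_dim = 5, 3
# One row per word id; filled with easy-to-read placeholder values
embedding_table = np.arange(
    demo_vocab_size * demo_embed_dim, dtype=np.float32
).reshape(demo_vocab_size, demo_embed_dim)

# A batch of 2 sequences, each 4 word ids long
input_ids = np.array([[0, 2, 2, 4],
                      [1, 0, 3, 3]])

embedded = embedding_table[input_ids]
print(embedded.shape)            # (2, 4, 3)
print(embedded[0, 1].tolist())   # [6.0, 7.0, 8.0], i.e. row 2 of the table
```

The output gains one trailing dimension of size `embed_dim`, so a `(batch size, sequence length)` input becomes `(batch size, sequence length, embed_dim)`.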


### Build RNN
You created an RNN cell in the `get_init_cell()` function.  Time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
    - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple `(Outputs, FinalState)`

In [11]:
def build_rnn(cell, inputs):
    """
    Create an RNN using an RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    # TODO: Implement Function
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    return outputs, tf.identity(state, name="final_state")


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:
- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple `(Logits, FinalState)`

In [12]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    # TODO: Implement Function
    embedded_input = get_embed(input_data, vocab_size, embed_dim)
    outputs, final_state = build_rnn(cell, embedded_input)
    dense = tf.layers.dense(outputs, activation=None, units=vocab_size)
    return dense, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

Tests Passed
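
The fully connected layer with a linear activation is a matrix multiply plus bias applied at every time step, and the softmax used later for generation turns each row of logits into a probability distribution over the vocabulary. The NumPy sketch below illustrates only the shapes; the random weights and the names `demo_W`, `demo_b`, and `softmax` are assumptions for illustration.

```python
import numpy as np

n_batch, n_steps, n_hidden, n_vocab = 2, 4, 8, 10
rng = np.random.default_rng(0)

rnn_outputs = rng.normal(size=(n_batch, n_steps, n_hidden))
demo_W = rng.normal(scale=0.1, size=(n_hidden, n_vocab))
demo_b = np.zeros(n_vocab)

# Linear layer (no activation) applied independently at every time step
demo_logits = rnn_outputs @ demo_W + demo_b


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


demo_probs = softmax(demo_logits)
print(demo_logits.shape)  # (2, 4, 10)
```

Each `demo_probs[i, t]` sums to 1 and gives the predicted distribution for the word that follows position `t` of sequence `i`.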


### Batches
Implement `get_batches` to create batches of input and targets using `int_text`.  The batches should be a Numpy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return a Numpy array of the following:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Notice that the last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.
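
One way to check your understanding of this layout is a vectorized NumPy sketch that reproduces the example above. It is not the required solution, just an illustration; the name `get_batches_reshape` is hypothetical.

```python
import numpy as np


def get_batches_reshape(int_text, batch_size, seq_length):
    """Vectorized sketch: returns shape (n_batches, 2, batch_size, seq_length)."""
    n_batches = len(int_text) // (batch_size * seq_length)
    n_words = n_batches * batch_size * seq_length
    inputs = np.array(int_text[:n_words])
    # Shift by one word; np.roll wraps the first input around to the last target
    targets = np.roll(inputs, -1)

    def to_batches(flat):
        # Each of the batch_size rows is one contiguous stretch of text,
        # cut into n_batches pieces of seq_length words
        rows = flat.reshape(batch_size, n_batches, seq_length)
        return rows.transpose(1, 0, 2)

    return np.stack([to_batches(inputs), to_batches(targets)], axis=1)


demo_batches = get_batches_reshape(list(range(1, 21)), 3, 2)
print(demo_batches.shape)            # (3, 2, 3, 2)
print(demo_batches[0, 0].tolist())   # [[1, 2], [7, 8], [13, 14]]
print(demo_batches[2, 1].tolist())   # [[6, 7], [12, 13], [18, 1]]
```

Because row `j` of every batch continues the text from row `j` of the previous batch, the RNN state carried between batches stays meaningful.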

In [13]:
def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    # TODO: Implement Function
    # Number of full batches; leftover words that cannot fill a batch are dropped
    n_batches = len(int_text) // (batch_size * seq_length)
    n_words = n_batches * batch_size * seq_length
    res = [[[], []] for _ in range(n_batches)]
    for it in range(batch_size * n_batches):
        start = it * seq_length
        inn = list(int_text[start:start + seq_length])
        # Targets are the inputs shifted by one word; the modulo wraps the
        # very last target around to the first input word
        out = [int_text[(start + k + 1) % n_words] for k in range(seq_length)]
        # Consecutive sequences are dealt round-robin across the batches
        res[it % n_batches][0].append(inn)
        res[it % n_batches][1].append(out)
    return np.array(res)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to the number of batches between progress printouts.
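
When choosing `batch_size` and `seq_length`, it helps to know how many batches (and therefore gradient updates) one epoch contains. The sketch below assumes a hypothetical corpus of 70,000 word ids; `batches_per_epoch` is an illustrative helper, not part of the project code.

```python
def batches_per_epoch(n_words, batch_size, seq_length):
    """Number of full batches per epoch; leftover words are dropped."""
    return n_words // (batch_size * seq_length)


# Hypothetical corpus of 70,000 word ids with a batch size of 128
for demo_seq_length in (10, 50, 100):
    print(demo_seq_length, batches_per_epoch(70000, 128, demo_seq_length))
# 10 54
# 50 10
# 100 5
```

With only a handful of batches per epoch, each epoch performs very few weight updates, so a large `num_epochs` is needed to reach a low loss.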

In [14]:
# Number of Epochs
num_epochs = 2000
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 128
# Embedding Dimension Size
embed_dim = 128
# Sequence Length (longest line in words; needs `word_count_sentence` from the Explore the Data cell)
seq_length = np.max(word_count_sentence)
# Learning Rate
learning_rate = .001
# Show stats for every n number of batches
show_every_n_batches = 5

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'

### Build the Graph
Build the graph using the neural network you implemented.

In [15]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
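
The gradient clipping step in the cell above limits every gradient element to the range [-1, 1], which guards against the exploding gradients that RNNs are prone to. A minimal NumPy illustration of the same element-wise operation, using made-up gradient values:

```python
import numpy as np

demo_gradient = np.array([-3.2, -0.4, 0.0, 0.7, 5.0])
# Values outside [-1, 1] are pulled to the nearest bound; the rest are unchanged
demo_clipped = np.clip(demo_gradient, -1.0, 1.0)
print(demo_clipped.tolist())  # [-1.0, -0.4, 0.0, 0.7, 1.0]
```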

## Train
Train the neural network on the preprocessed data.  If you have a hard time getting a good loss, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.

In [16]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

Epoch   0 Batch    0/5   train_loss = 8.822
Epoch   1 Batch    0/5   train_loss = 8.800
Epoch   2 Batch    0/5   train_loss = 8.460
Epoch   3 Batch    0/5   train_loss = 7.773
Epoch   4 Batch    0/5   train_loss = 7.061
Epoch   5 Batch    0/5   train_loss = 6.492
Epoch   6 Batch    0/5   train_loss = 6.167
Epoch   7 Batch    0/5   train_loss = 6.034
Epoch   8 Batch    0/5   train_loss = 6.020
Epoch   9 Batch    0/5   train_loss = 6.017
Epoch  10 Batch    0/5   train_loss = 5.985
Epoch  11 Batch    0/5   train_loss = 5.963
Epoch  12 Batch    0/5   train_loss = 5.958
Epoch  13 Batch    0/5   train_loss = 5.952
Epoch  14 Batch    0/5   train_loss = 5.944
Epoch  15 Batch    0/5   train_loss = 5.940
Epoch  16 Batch    0/5   train_loss = 5.936
Epoch  17 Batch    0/5   train_loss = 5.933
Epoch  18 Batch    0/5   train_loss = 5.931
Epoch  19 Batch    0/5   train_loss = 5.929
Epoch  20 Batch    0/5   train_loss = 5.927
Epoch  21 Batch    0/5   train_loss = 5.925
Epoch  22 Batch    0/5   train_l

Epoch 187 Batch    0/5   train_loss = 4.424
Epoch 188 Batch    0/5   train_loss = 4.418
Epoch 189 Batch    0/5   train_loss = 4.411
Epoch 190 Batch    0/5   train_loss = 4.405
Epoch 191 Batch    0/5   train_loss = 4.399
Epoch 192 Batch    0/5   train_loss = 4.392
Epoch 193 Batch    0/5   train_loss = 4.386
Epoch 194 Batch    0/5   train_loss = 4.379
Epoch 195 Batch    0/5   train_loss = 4.373
Epoch 196 Batch    0/5   train_loss = 4.366
Epoch 197 Batch    0/5   train_loss = 4.360
Epoch 198 Batch    0/5   train_loss = 4.354
Epoch 199 Batch    0/5   train_loss = 4.348
Epoch 200 Batch    0/5   train_loss = 4.342
Epoch 201 Batch    0/5   train_loss = 4.336
Epoch 202 Batch    0/5   train_loss = 4.330
Epoch 203 Batch    0/5   train_loss = 4.324
Epoch 204 Batch    0/5   train_loss = 4.317
Epoch 205 Batch    0/5   train_loss = 4.311
Epoch 206 Batch    0/5   train_loss = 4.305
Epoch 207 Batch    0/5   train_loss = 4.299
Epoch 208 Batch    0/5   train_loss = 4.293
Epoch 209 Batch    0/5   train_l

Epoch 374 Batch    0/5   train_loss = 3.391
Epoch 375 Batch    0/5   train_loss = 3.387
Epoch 376 Batch    0/5   train_loss = 3.382
Epoch 377 Batch    0/5   train_loss = 3.378
Epoch 378 Batch    0/5   train_loss = 3.371
Epoch 379 Batch    0/5   train_loss = 3.367
Epoch 380 Batch    0/5   train_loss = 3.359
Epoch 381 Batch    0/5   train_loss = 3.356
Epoch 382 Batch    0/5   train_loss = 3.349
Epoch 383 Batch    0/5   train_loss = 3.344
Epoch 384 Batch    0/5   train_loss = 3.338
Epoch 385 Batch    0/5   train_loss = 3.334
Epoch 386 Batch    0/5   train_loss = 3.328
Epoch 387 Batch    0/5   train_loss = 3.323
Epoch 388 Batch    0/5   train_loss = 3.322
Epoch 389 Batch    0/5   train_loss = 3.316
Epoch 390 Batch    0/5   train_loss = 3.312
Epoch 391 Batch    0/5   train_loss = 3.304
Epoch 392 Batch    0/5   train_loss = 3.299
Epoch 393 Batch    0/5   train_loss = 3.293
Epoch 394 Batch    0/5   train_loss = 3.288
Epoch 395 Batch    0/5   train_loss = 3.283
Epoch 396 Batch    0/5   train_l

Epoch 561 Batch    0/5   train_loss = 2.533
Epoch 562 Batch    0/5   train_loss = 2.530
Epoch 563 Batch    0/5   train_loss = 2.523
Epoch 564 Batch    0/5   train_loss = 2.522
Epoch 565 Batch    0/5   train_loss = 2.515
Epoch 566 Batch    0/5   train_loss = 2.518
Epoch 567 Batch    0/5   train_loss = 2.512
Epoch 568 Batch    0/5   train_loss = 2.512
Epoch 569 Batch    0/5   train_loss = 2.503
Epoch 570 Batch    0/5   train_loss = 2.502
Epoch 571 Batch    0/5   train_loss = 2.494
Epoch 572 Batch    0/5   train_loss = 2.494
Epoch 573 Batch    0/5   train_loss = 2.485
Epoch 574 Batch    0/5   train_loss = 2.483
Epoch 575 Batch    0/5   train_loss = 2.477
Epoch 576 Batch    0/5   train_loss = 2.473
Epoch 577 Batch    0/5   train_loss = 2.474
Epoch 578 Batch    0/5   train_loss = 2.466
Epoch 579 Batch    0/5   train_loss = 2.468
Epoch 580 Batch    0/5   train_loss = 2.456
Epoch 581 Batch    0/5   train_loss = 2.459
Epoch 582 Batch    0/5   train_loss = 2.447
Epoch 583 Batch    0/5   train_l

Epoch 748 Batch    0/5   train_loss = 1.871
Epoch 749 Batch    0/5   train_loss = 1.864
Epoch 750 Batch    0/5   train_loss = 1.861
Epoch 751 Batch    0/5   train_loss = 1.860
Epoch 752 Batch    0/5   train_loss = 1.856
Epoch 753 Batch    0/5   train_loss = 1.857
Epoch 754 Batch    0/5   train_loss = 1.847
Epoch 755 Batch    0/5   train_loss = 1.850
Epoch 756 Batch    0/5   train_loss = 1.842
Epoch 757 Batch    0/5   train_loss = 1.841
Epoch 758 Batch    0/5   train_loss = 1.841
Epoch 759 Batch    0/5   train_loss = 1.833
Epoch 760 Batch    0/5   train_loss = 1.848
Epoch 761 Batch    0/5   train_loss = 1.831
Epoch 762 Batch    0/5   train_loss = 1.836
Epoch 763 Batch    0/5   train_loss = 1.829
Epoch 764 Batch    0/5   train_loss = 1.821
Epoch 765 Batch    0/5   train_loss = 1.817
Epoch 766 Batch    0/5   train_loss = 1.813
Epoch 767 Batch    0/5   train_loss = 1.810
Epoch 768 Batch    0/5   train_loss = 1.805
Epoch 769 Batch    0/5   train_loss = 1.808
Epoch 770 Batch    0/5   train_l

Epoch 935 Batch    0/5   train_loss = 1.420
Epoch 936 Batch    0/5   train_loss = 1.414
Epoch 937 Batch    0/5   train_loss = 1.409
Epoch 938 Batch    0/5   train_loss = 1.403
Epoch 939 Batch    0/5   train_loss = 1.408
Epoch 940 Batch    0/5   train_loss = 1.398
Epoch 941 Batch    0/5   train_loss = 1.403
Epoch 942 Batch    0/5   train_loss = 1.394
Epoch 943 Batch    0/5   train_loss = 1.402
Epoch 944 Batch    0/5   train_loss = 1.392
Epoch 945 Batch    0/5   train_loss = 1.397
Epoch 946 Batch    0/5   train_loss = 1.388
Epoch 947 Batch    0/5   train_loss = 1.392
Epoch 948 Batch    0/5   train_loss = 1.383
Epoch 949 Batch    0/5   train_loss = 1.386
Epoch 950 Batch    0/5   train_loss = 1.379
Epoch 951 Batch    0/5   train_loss = 1.381
Epoch 952 Batch    0/5   train_loss = 1.379
Epoch 953 Batch    0/5   train_loss = 1.374
Epoch 954 Batch    0/5   train_loss = 1.377
Epoch 955 Batch    0/5   train_loss = 1.371
Epoch 956 Batch    0/5   train_loss = 1.380
Epoch 957 Batch    0/5   train_l

Epoch 1119 Batch    0/5   train_loss = 1.104
Epoch 1120 Batch    0/5   train_loss = 1.101
Epoch 1121 Batch    0/5   train_loss = 1.096
Epoch 1122 Batch    0/5   train_loss = 1.094
Epoch 1123 Batch    0/5   train_loss = 1.092
Epoch 1124 Batch    0/5   train_loss = 1.090
Epoch 1125 Batch    0/5   train_loss = 1.091
Epoch 1126 Batch    0/5   train_loss = 1.088
Epoch 1127 Batch    0/5   train_loss = 1.096
Epoch 1128 Batch    0/5   train_loss = 1.107
Epoch 1129 Batch    0/5   train_loss = 1.115
Epoch 1130 Batch    0/5   train_loss = 1.105
Epoch 1131 Batch    0/5   train_loss = 1.102
Epoch 1132 Batch    0/5   train_loss = 1.100
Epoch 1133 Batch    0/5   train_loss = 1.096
Epoch 1134 Batch    0/5   train_loss = 1.087
Epoch 1135 Batch    0/5   train_loss = 1.082
Epoch 1136 Batch    0/5   train_loss = 1.080
Epoch 1137 Batch    0/5   train_loss = 1.074
Epoch 1138 Batch    0/5   train_loss = 1.071
Epoch 1139 Batch    0/5   train_loss = 1.068
Epoch 1140 Batch    0/5   train_loss = 1.067
Epoch 1141

Epoch 1302 Batch    0/5   train_loss = 0.874
Epoch 1303 Batch    0/5   train_loss = 0.873
Epoch 1304 Batch    0/5   train_loss = 0.870
Epoch 1305 Batch    0/5   train_loss = 0.872
Epoch 1306 Batch    0/5   train_loss = 0.870
Epoch 1307 Batch    0/5   train_loss = 0.880
Epoch 1308 Batch    0/5   train_loss = 0.870
Epoch 1309 Batch    0/5   train_loss = 0.888
Epoch 1310 Batch    0/5   train_loss = 0.873
Epoch 1311 Batch    0/5   train_loss = 0.889
Epoch 1312 Batch    0/5   train_loss = 0.900
Epoch 1313 Batch    0/5   train_loss = 0.894
Epoch 1314 Batch    0/5   train_loss = 0.883
Epoch 1315 Batch    0/5   train_loss = 0.880
Epoch 1316 Batch    0/5   train_loss = 0.874
Epoch 1317 Batch    0/5   train_loss = 0.871
Epoch 1318 Batch    0/5   train_loss = 0.867
Epoch 1319 Batch    0/5   train_loss = 0.865
Epoch 1320 Batch    0/5   train_loss = 0.864
Epoch 1321 Batch    0/5   train_loss = 0.869
Epoch 1322 Batch    0/5   train_loss = 0.864
Epoch 1323 Batch    0/5   train_loss = 0.860
Epoch 1324

Epoch 1485 Batch    0/5   train_loss = 0.725
Epoch 1486 Batch    0/5   train_loss = 0.753
Epoch 1487 Batch    0/5   train_loss = 0.737
Epoch 1488 Batch    0/5   train_loss = 0.763
Epoch 1489 Batch    0/5   train_loss = 0.735
Epoch 1490 Batch    0/5   train_loss = 0.729
Epoch 1491 Batch    0/5   train_loss = 0.723
Epoch 1492 Batch    0/5   train_loss = 0.723
Epoch 1493 Batch    0/5   train_loss = 0.716
Epoch 1494 Batch    0/5   train_loss = 0.713
Epoch 1495 Batch    0/5   train_loss = 0.712
Epoch 1496 Batch    0/5   train_loss = 0.710
Epoch 1497 Batch    0/5   train_loss = 0.704
Epoch 1498 Batch    0/5   train_loss = 0.707
Epoch 1499 Batch    0/5   train_loss = 0.703
Epoch 1500 Batch    0/5   train_loss = 0.709
Epoch 1501 Batch    0/5   train_loss = 0.707
Epoch 1502 Batch    0/5   train_loss = 0.716
Epoch 1503 Batch    0/5   train_loss = 0.704
Epoch 1504 Batch    0/5   train_loss = 0.710
Epoch 1505 Batch    0/5   train_loss = 0.707
Epoch 1506 Batch    0/5   train_loss = 0.701
Epoch 1507

Epoch 1668 Batch    0/5   train_loss = 0.607
Epoch 1669 Batch    0/5   train_loss = 0.601
Epoch 1670 Batch    0/5   train_loss = 0.596
Epoch 1671 Batch    0/5   train_loss = 0.591
Epoch 1672 Batch    0/5   train_loss = 0.587
Epoch 1673 Batch    0/5   train_loss = 0.584
Epoch 1674 Batch    0/5   train_loss = 0.587
Epoch 1675 Batch    0/5   train_loss = 0.584
Epoch 1676 Batch    0/5   train_loss = 0.582
Epoch 1677 Batch    0/5   train_loss = 0.581
Epoch 1678 Batch    0/5   train_loss = 0.581
Epoch 1679 Batch    0/5   train_loss = 0.580
Epoch 1680 Batch    0/5   train_loss = 0.578
Epoch 1681 Batch    0/5   train_loss = 0.578
Epoch 1682 Batch    0/5   train_loss = 0.577
Epoch 1683 Batch    0/5   train_loss = 0.573
Epoch 1684 Batch    0/5   train_loss = 0.574
Epoch 1685 Batch    0/5   train_loss = 0.574
Epoch 1686 Batch    0/5   train_loss = 0.574
Epoch 1687 Batch    0/5   train_loss = 0.577
Epoch 1688 Batch    0/5   train_loss = 0.584
Epoch 1689 Batch    0/5   train_loss = 0.585
Epoch 1690

Epoch 1851 Batch    0/5   train_loss = 0.479
Epoch 1852 Batch    0/5   train_loss = 0.482
Epoch 1853 Batch    0/5   train_loss = 0.479
Epoch 1854 Batch    0/5   train_loss = 0.482
Epoch 1855 Batch    0/5   train_loss = 0.481
Epoch 1856 Batch    0/5   train_loss = 0.490
Epoch 1857 Batch    0/5   train_loss = 0.493
Epoch 1858 Batch    0/5   train_loss = 0.494
Epoch 1859 Batch    0/5   train_loss = 0.492
Epoch 1860 Batch    0/5   train_loss = 0.505
Epoch 1861 Batch    0/5   train_loss = 0.536
Epoch 1862 Batch    0/5   train_loss = 0.552
Epoch 1863 Batch    0/5   train_loss = 0.541
Epoch 1864 Batch    0/5   train_loss = 0.542
Epoch 1865 Batch    0/5   train_loss = 0.519
Epoch 1866 Batch    0/5   train_loss = 0.506
Epoch 1867 Batch    0/5   train_loss = 0.496
Epoch 1868 Batch    0/5   train_loss = 0.491
Epoch 1869 Batch    0/5   train_loss = 0.487
Epoch 1870 Batch    0/5   train_loss = 0.482
Epoch 1871 Batch    0/5   train_loss = 0.481
Epoch 1872 Batch    0/5   train_loss = 0.475
Epoch 1873
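
A useful reference point for reading this log: a model that predicts every word with equal probability has a cross-entropy of ln(vocabulary size). The first loss printed above, 8.822, is what a uniform prediction gives for a vocabulary of roughly 6,800 tokens. The NumPy sketch below computes the mean cross-entropy over a batch of sequences; the vocabulary size of 6779 is an assumed value chosen to match the log, and `sequence_cross_entropy` is a hypothetical stand-in for the averaged loss, not the TensorFlow function itself.

```python
import numpy as np


def sequence_cross_entropy(logits, targets):
    """Mean cross-entropy over all batch positions and time steps.

    logits: (batch, seq_len, vocab_size); targets: (batch, seq_len) of word ids
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    batch_idx, time_idx = np.indices(targets.shape)
    return -log_probs[batch_idx, time_idx, targets].mean()


assumed_vocab_size = 6779  # assumption: not stated in the notebook
uniform_logits = np.zeros((2, 3, assumed_vocab_size))  # all-equal logits
demo_targets = np.array([[0, 5, 9], [1, 2, 3]])
uniform_loss = float(sequence_cross_entropy(uniform_logits, demo_targets))
print(round(uniform_loss, 3))  # 8.822
```

Any loss well below this baseline means the network has learned something about word order; the log shows the loss falling below 0.5 by the end of training.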

## Save Parameters
Save `seq_length` and `save_dir` for generating a new TV script.

In [28]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))

# Checkpoint

In [29]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Get the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)` 

In [30]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # TODO: Implement Function
    return (
        loaded_graph.get_tensor_by_name("input:0"),
        loaded_graph.get_tensor_by_name("initial_state:0"), 
        loaded_graph.get_tensor_by_name("final_state:0"), 
        loaded_graph.get_tensor_by_name("probs:0")
    )


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

Tests Passed


### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.

In [31]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # TODO: Implement Function
    return int_to_vocab[np.argmax(probabilities)]


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

Tests Passed
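
Always taking the most likely word makes generation deterministic and can lead to repetitive text. A common alternative is to sample the next word in proportion to its predicted probability. The sketch below is one way to do that with NumPy; `pick_word_sampled` and its `rng` parameter are hypothetical names, and this variant is offered as an option to experiment with, not as the required solution.

```python
import numpy as np


def pick_word_sampled(probabilities, int_to_vocab, rng=None):
    """Sample the next word in proportion to its predicted probability."""
    rng = np.random.default_rng() if rng is None else rng
    probabilities = np.asarray(probabilities, dtype=np.float64)
    # Renormalize to guard against float32 rounding in the softmax output
    probabilities = probabilities / probabilities.sum()
    word_id = rng.choice(len(probabilities), p=probabilities)
    return int_to_vocab[int(word_id)]


demo_id_to_word = {0: "moe", 1: "homer", 2: "barney"}
sampled_word = pick_word_sampled([0.1, 0.7, 0.2], demo_id_to_word)
print(sampled_word)  # varies from run to run
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the sampled script reproducible.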


## Generate TV Script
This will generate the TV script for you.  Set `gen_length` to the length of TV script you want to generate.

In [32]:
gen_length = 200
# homer_simpson, moe_szyslak, or barney_gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[0][dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)

INFO:tensorflow:Restoring parameters from ./save
moe_szyslak: i was just doin' what comes naturally to me-- being mean to animals.
man_with_crazy_beard:(voice like christopher lloyd) excuse me, sir. i was wondering if you would judge an outrageous beard contest i'm in tonight.
moe_szyslak: well, anyone can get a laugh at the expense of an ugly dog. but crazy beards? that's where the big boys play.
moe_szyslak: you know, what's the way, homer. and this isn't.
homer_simpson: what are you doing?
moe_szyslak: oh, i'm afraid lenny's dead.
homer_simpson:(shaking head) no, i won't accept that.
moe_szyslak: no, i want you to meet the guy who's gonna help me?
agnes_skinner: i had ya know how much of you!
homer_simpson:(joking) yeah. oh, thank you for saving my life ain't all the inherent legal liability.
sec_agent_#2: you're under arrest!
moe_szyslak:(realizing) hey, maggie!
homer_simpson:(to self, then, then:) but how do i?
barney_gumble:(tentative annoyed grunt
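
The token-removal step at the end of the generation cell explains the formatting of the output above, such as the missing space between a speaker's colon and an opening parenthesis. The sketch below replays that step on a short hand-written word list; `demo_token_dict` and `detokenize` are hypothetical names, and the token strings are assumptions in the "||dash||" style suggested earlier.

```python
demo_token_dict = {
    ".": "||period||",
    ",": "||comma||",
    "\n": "||return||",
    "(": "||left_parentheses||",
    ")": "||right_parentheses||",
}


def detokenize(words, token_dict):
    """Join words and turn punctuation tokens back into symbols."""
    script = " ".join(words)
    for symbol, token in token_dict.items():
        # The space before each token is removed together with the token
        script = script.replace(" " + token.lower(), symbol)
    script = script.replace("\n ", "\n")
    script = script.replace("( ", "(")
    return script


generated_words = [
    "moe_szyslak:", "||left_parentheses||", "sighs", "||right_parentheses||",
    "hey", "||comma||", "homer", "||period||", "||return||",
    "homer_simpson:", "hi", "||period||",
]
demo_script = detokenize(generated_words, demo_token_dict)
print(demo_script)
# moe_szyslak:(sighs) hey, homer.
# homer_simpson: hi.
```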


# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  To get good results, you'll have to use a smaller vocabulary or get more data.  Luckily, there's more data!  As we mentioned at the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data because that would take too long.  However, you are free to train your neural network on all the data, after you complete the project, of course.
# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.

## Other resources

- deeplearning.ai Sequential Models lecture
- https://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
- https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn