# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs.  You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.  The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset that contains only the scenes in Moe's Tavern.  It excludes other versions of the tavern, such as "Moe's Cavern", "Flaming Moe's", and "Uncle Moe's Family Feed-Bag".

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [2]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.248091603053435
Number of lines: 4257
Average number of words in each line: 11.50434578341555

The sentences 0 to 10:
Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.




## Implement Preprocessing Functions
The first step with any dataset is preprocessing.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- A dictionary that maps each word to an id, which we'll call `vocab_to_int`
- A dictionary that maps each id back to its word, which we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`
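As a minimal sketch of the two dictionaries, the variant below assigns ids by descending word frequency so that the mapping is the same on every run. The function name `create_lookup_tables_sorted` and the sample sentence are illustrative assumptions, not part of the project code, and the unit test does not require frequency ordering.

```python
from collections import Counter


def create_lookup_tables_sorted(words):
    """Build word<->id dictionaries with deterministic, frequency-ordered ids."""
    counts = Counter(words)
    # Most frequent words first; ties are broken alphabetically.
    vocab = sorted(counts, key=lambda word: (-counts[word], word))
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    # Invert the first dictionary so both tables are guaranteed to agree.
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


# Hypothetical sample text, already split into words.
words = 'moe pours homer a beer and homer drinks a beer'.split()
vocab_to_int, int_to_vocab = create_lookup_tables_sorted(words)
ids = [vocab_to_int[word] for word in words]
round_trip = [int_to_vocab[i] for i in ids]
print(ids)
print(round_trip == words)
```

Building `int_to_vocab` by inverting `vocab_to_int` means the two tables cannot drift apart, whichever ordering you choose.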

In [3]:
import numpy as np
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    
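    # Build the vocabulary once so both dictionaries share the same ordering.
    vocab = set(text)
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return (vocab_to_int, int_to_vocab)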


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the word "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parenthesis ( ( )
- Right Parenthesis ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one.  This separates each symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".
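To make the effect of the dictionary concrete, here is a small sketch of how such a token table might be applied. The `tokenize` function and the three-entry `demo_tokens` dictionary are assumptions for illustration; the project's `helper` module does this step for you and may differ in detail (for example, in how it handles letter case).

```python
def tokenize(text, symbol_to_token):
    """Replace each symbol with a space-padded token, then split on whitespace."""
    for symbol, token in symbol_to_token.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.split()


# A reduced, hypothetical token table; the real one covers more symbols.
demo_tokens = {
    '!': '||Exclamation_Mark||',
    ',': '||Comma||',
    '\n': '||Line_Break||',
}

tokens = tokenize('Bye, Moe!\nBye!', demo_tokens)
print(tokens)
```

After tokenizing, "Bye," and "Bye!" both contribute the same word "Bye", and the punctuation shows up as separate vocabulary entries.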

In [4]:
# Find all non-alphanumeric and non-space chars in the text.
import re
specialChars = list(set(re.findall("[^A-Za-z0-9À-ÿ ]", text))) # The literal space in the class keeps spaces out of the result.
print(specialChars)

['$', ')', '/', '-', '#', '(', ';', '"', '&', '.', '_', "'", '!', '?', ',', '\n', ':', '%']


In [5]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # Unit test does not allow some of the symbols found in the text
    # so I simply use the ones from the list above...
    symbol_to_token = { '"': "||Double_Quote||",
                        #'&': "||Ampersand||",
                        #'_': "||Underscore||",
                        ')': "||Right_Parantheses||",
                        ',': "||Comma||",
                        '?': "||Question_Mark||",
                        #'%': "||Percent||",
                        #'$': "||Dollar||",
                        #'/': "||Slash||",
                        '(': "||Left_Parantheses||",
                        '.': "||Period||",
                        '\n': "||Line_Break||",
                        #'#': "||Hash||",
                        '!': "||Exclamation_Mark||",
                        #"'": "||Single_Quote||",
                        ';': "||Semi_Colon||",
                        #':': "||Colon||",
                        #'-': "||Dash||",
                        '--': "||Double_Dash||"
                      }
    return symbol_to_token

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


## Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Checkpoint
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.
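The idea behind such a checkpoint can be sketched with the standard `pickle` module. This is only an assumption about how the saving might work; the functions `save_checkpoint` and `load_checkpoint`, the file name, and the toy data are hypothetical, and the project's `helper` module is what actually reads and writes the preprocessed data.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, int_text, vocab_to_int, int_to_vocab, token_dict):
    """Write the preprocessed data to disk as a single pickled tuple."""
    with open(path, 'wb') as f:
        pickle.dump((int_text, vocab_to_int, int_to_vocab, token_dict), f)


def load_checkpoint(path):
    """Read back the tuple written by save_checkpoint."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'preprocess_demo.p')
    save_checkpoint(path, [0, 1, 0],
                    {'moe': 0, 'homer': 1},
                    {0: 'moe', 1: 'homer'},
                    {'!': '||Exclamation_Mark||'})
    restored = load_checkpoint(path)

print(restored[0])
```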

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
You'll build the components of an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### Check the Version of TensorFlow and Access to GPU

In [2]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0




### Input
Implement the `get_inputs()` function to create TF Placeholders for the Neural Network.  It should create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [3]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    # Inputs and targets are 2-D: [batch size, sequence length].
    inputs = tf.placeholder(tf.int32, shape=[None, None], name='input')
    targets = tf.placeholder(tf.int32, shape=[None, None], name='targets')
    learning_rate = tf.placeholder(tf.float32, name='learningRate')
    return inputs, targets, learning_rate

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [4]:
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs (number of units in hidden layer)
    :return: Tuple (cell, initialize state)
    """
    # TensorFlow 1.1 syntax: create a separate cell object for every layer.
    # (The TensorFlow 1.0 idiom MultiRNNCell([lstm] * n_layers) reuses a single
    # cell object, which newer versions reject when stacking several layers.)
    def build_cell(rnn_size):
        # Dropout could be added here with tf.contrib.rnn.DropoutWrapper.
        return tf.contrib.rnn.BasicLSTMCell(rnn_size)

    # Stack n_layers LSTM cells.
    n_layers = 1
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(rnn_size) for _ in range(n_layers)])

    # Set initial state.
    initial_state = cell.zero_state(batch_size, tf.float32)
    # Add name to initial state.
    initial_state = tf.identity(initial_state, name='initial_state')
    
    return cell, initial_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### Word Embedding
Apply embedding to `input_data` using TensorFlow.  Return the embedded sequence.
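Conceptually, an embedding lookup just selects rows of a table by word id. The pure-Python sketch below illustrates that idea without TensorFlow; `make_embedding_table` and `embedding_lookup` are hypothetical helper names, and in the real network the table is a trainable variable rather than fixed random numbers.

```python
import random


def make_embedding_table(vocab_size, embed_dim, seed=0):
    """Create a [vocab_size, embed_dim] table of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)]


def embedding_lookup(table, ids):
    """Map a [batch, seq] nested list of ids to [batch, seq, embed_dim] vectors."""
    return [[table[word_id] for word_id in sequence] for sequence in ids]


table = make_embedding_table(vocab_size=5, embed_dim=3)
embedded = embedding_lookup(table, [[0, 4], [2, 2]])
# Shape of the result: batch size, sequence length, embedding dimension.
print(len(embedded), len(embedded[0]), len(embedded[0][0]))  # 2 2 3
```

Note that both occurrences of id `2` map to the same vector, which is what lets the network share what it learns about a word across positions.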

In [5]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # Create the embedding lookup table with one row per vocabulary word.
    # https://www.tensorflow.org/programmers_guide/embedding
    word_embeddings = tf.get_variable('word_embeddings', [vocab_size, embed_dim])
    # Replace every word id in input_data with its embedding vector.
    embedded_word_ids = tf.nn.embedding_lookup(word_embeddings, input_data)
    return embedded_word_ids


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed


### Build RNN
You created an RNN cell in the `get_init_cell()` function.  Time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
    - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple `(Outputs, FinalState)`

In [6]:
def build_rnn(cell, inputs):
    """
    Create an RNN using an RNN cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    final_state = tf.identity(state, name='final_state')
    return outputs, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:
- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple (Logits, FinalState) 
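It can help to trace how the tensor shape changes through these three steps. The sketch below is a plain-Python bookkeeping aid, not TensorFlow code; `trace_shapes` is a hypothetical helper and the sizes passed to it are made-up values.

```python
def trace_shapes(batch_size, seq_length, vocab_size, rnn_size, embed_dim):
    """List the expected tensor shape after each stage of the network."""
    return [
        ('input ids', (batch_size, seq_length)),
        ('embedded', (batch_size, seq_length, embed_dim)),
        ('rnn outputs', (batch_size, seq_length, rnn_size)),
        ('logits', (batch_size, seq_length, vocab_size)),
    ]


# Hypothetical sizes, chosen only to make the stages easy to tell apart.
shapes = trace_shapes(batch_size=32, seq_length=16,
                      vocab_size=5000, rnn_size=256, embed_dim=128)
for stage, shape in shapes:
    print('{:<12} {}'.format(stage, shape))
```

Only the last dimension changes from stage to stage: `embed_dim` after the embedding, `rnn_size` after the RNN, and `vocab_size` after the fully connected layer.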

In [7]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    embed = get_embed(input_data, vocab_size, embed_dim)
    outputs, final_state = build_rnn(cell, embed)
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)
    
    return logits, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

Tests Passed


### Batches
Implement `get_batches` to create batches of input and targets using `int_text`.  The batches should be a Numpy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return the following NumPy array:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Notice that the last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.
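The layout described above can be reproduced with plain Python lists, which makes the index arithmetic easy to follow. This is one possible sketch under the assumption that `int_text` is a list; `get_batches_sketch` is a hypothetical name, and it returns nested lists rather than the NumPy array the unit test expects.

```python
def get_batches_sketch(int_text, batch_size, seq_length):
    """Return [n_batches][2][batch_size][seq_length] nested lists of ids."""
    words_per_batch = batch_size * seq_length
    n_batches = len(int_text) // words_per_batch
    # Drop the ids that do not fill a complete batch.
    inputs = int_text[:n_batches * words_per_batch]
    # Targets are the inputs shifted left by one, wrapping to the first id.
    targets = inputs[1:] + inputs[:1]
    # Each row of a batch owns one contiguous stretch of the text.
    row_length = n_batches * seq_length
    batches = []
    for b in range(n_batches):
        x, y = [], []
        for row in range(batch_size):
            start = row * row_length + b * seq_length
            x.append(inputs[start:start + seq_length])
            y.append(targets[start:start + seq_length])
        batches.append([x, y])
    return batches


batches = get_batches_sketch(list(range(1, 21)), 3, 2)
print(batches[0])   # first batch: inputs, then targets
print(batches[-1])  # last batch: the final target wraps around to 1
```

Because row `r` always continues where it left off in the previous batch, the RNN state carried from one batch to the next lines up with the text it has already seen.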

In [8]:
def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    # Number of complete batches; each batch holds batch_size * seq_length word ids.
    words_per_batch = batch_size * seq_length
    n_batches = len(int_text) // words_per_batch
    
    # Drop the trailing ids that do not fill a complete batch.
    ids = int_text[:n_batches * words_per_batch]
    
    # Reshape into rows of sequence length.
    idsX = np.array(ids).reshape((-1, seq_length)).tolist()
    idsY = np.roll(np.array(ids), -1).reshape((-1, seq_length)).tolist()
    
    # Append sequences into desired array shape.
    idsAll = []
    for b in range(n_batches):
        batch = [[],[]]
        for s in range(batch_size):
            batch[0].append(idsX[s*n_batches+b])
            batch[1].append(idsY[s*n_batches+b])
        idsAll.append(batch)
    idsAll = np.array(idsAll)

    return idsAll


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to the number of batches between progress printouts.
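When choosing `batch_size` and `seq_length`, it is worth checking how many batches per epoch they produce, since each batch consumes `batch_size * seq_length` word ids. The sketch below assumes a hypothetical corpus of 69,100 word ids; `batches_per_epoch` is an illustrative helper, not part of the project.

```python
def batches_per_epoch(n_words, batch_size, seq_length):
    """Number of complete batches that fit into n_words word ids."""
    return n_words // (batch_size * seq_length)


n_words = 69100  # hypothetical corpus size, for illustration only
batch_size = 32
counts = {seq_length: batches_per_epoch(n_words, batch_size, seq_length)
          for seq_length in (1, 8, 16)}
for seq_length, n_batches in counts.items():
    print('seq_length = {:>2} -> {} batches per epoch'.format(seq_length, n_batches))
```

Longer sequences mean fewer, larger batches per epoch, and they also give the RNN more context to learn from at each step.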

In [9]:
# Number of Epochs
num_epochs = 10
# Batch Size
batch_size = 32
# RNN Size
rnn_size = 3
# Embedding Dimension Size
embed_dim = 3
# Sequence Length
seq_length = 1
# Learning Rate
learning_rate = 0.01
# Show stats for every n number of batches
show_every_n_batches = 10

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'

### Build the Graph
Build the graph using the neural network you implemented.

In [10]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)

## Train
Train the neural network on the preprocessed data.  If you have a hard time getting a good loss, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.
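A useful reference point when judging the loss: with its default settings, `sequence_loss` averages the softmax cross-entropy over the batch and time steps, so a model that spreads its probability evenly over the vocabulary scores the natural log of the vocabulary size. The sketch below computes that baseline for a hypothetical vocabulary of 6,000 words; the function names are illustrative.

```python
import math


def cross_entropy(probabilities, target_id):
    """Cross-entropy (in nats) of one prediction against the true word id."""
    return -math.log(probabilities[target_id])


def uniform_baseline(vocab_size):
    """Loss of a model that gives every word probability 1 / vocab_size."""
    uniform = [1.0 / vocab_size] * vocab_size
    return cross_entropy(uniform, target_id=0)


baseline = uniform_baseline(6000)  # hypothetical vocabulary size
confident = cross_entropy([0.9, 0.05, 0.05], target_id=0)
print('uniform baseline: {:.2f}'.format(baseline))    # about 8.70
print('confident, correct: {:.2f}'.format(confident))
```

A loss close to this baseline at the start of training is what you would expect from an untrained network; a loss that stays near it suggests the model is not learning much.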

In [11]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

Epoch   0 Batch    0/2159   train_loss = 8.822
Epoch   0 Batch   10/2159   train_loss = 8.741
Epoch   0 Batch   20/2159   train_loss = 8.563
Epoch   0 Batch   30/2159   train_loss = 8.355
Epoch   0 Batch   40/2159   train_loss = 8.275
Epoch   0 Batch   50/2159   train_loss = 7.982
Epoch   0 Batch   60/2159   train_loss = 7.724
Epoch   0 Batch   70/2159   train_loss = 6.928
Epoch   0 Batch   80/2159   train_loss = 6.951
Epoch   0 Batch   90/2159   train_loss = 6.266
Epoch   0 Batch  100/2159   train_loss = 5.670
Epoch   0 Batch  110/2159   train_loss = 5.839
Epoch   0 Batch  120/2159   train_loss = 6.323
Epoch   0 Batch  130/2159   train_loss = 6.112
Epoch   0 Batch  140/2159   train_loss = 6.059
Epoch   0 Batch  150/2159   train_loss = 6.216
Epoch   0 Batch  160/2159   train_loss = 5.925
Epoch   0 Batch  170/2159   train_loss = 6.591
Epoch   0 Batch  180/2159   train_loss = 7.185
Epoch   0 Batch  190/2159   train_loss = 4.962
Epoch   0 Batch  200/2159   train_loss = 6.633
...

Epoch   0 Batch 1780/2159   train_loss = 5.421
Epoch   0 Batch 1790/2159   train_loss = 4.949
Epoch   0 Batch 1800/2159   train_loss = 5.042
Epoch   0 Batch 1810/2159   train_loss = 5.209
Epoch   0 Batch 1820/2159   train_loss = 5.562
Epoch   0 Batch 1830/2159   train_loss = 5.606
Epoch   0 Batch 1840/2159   train_loss = 5.582
Epoch   0 Batch 1850/2159   train_loss = 5.490
Epoch   0 Batch 1860/2159   train_loss = 6.288
Epoch   0 Batch 1870/2159   train_loss = 6.132
Epoch   0 Batch 1880/2159   train_loss = 5.755
Epoch   0 Batch 1890/2159   train_loss = 5.887
Epoch   0 Batch 1900/2159   train_loss = 6.965
Epoch   0 Batch 1910/2159   train_loss = 5.228
Epoch   0 Batch 1920/2159   train_loss = 5.444
Epoch   0 Batch 1930/2159   train_loss = 5.841
Epoch   0 Batch 1940/2159   train_loss = 6.300
Epoch   0 Batch 1950/2159   train_loss = 4.900
Epoch   0 Batch 1960/2159   train_loss = 6.303
Epoch   0 Batch 1970/2159   train_loss = 6.514
Epoch   0 Batch 1980/2159   train_loss = 5.099
...

Epoch   1 Batch 1411/2159   train_loss = 4.957
Epoch   1 Batch 1421/2159   train_loss = 5.185
Epoch   1 Batch 1431/2159   train_loss = 5.599
Epoch   1 Batch 1441/2159   train_loss = 4.873
Epoch   1 Batch 1451/2159   train_loss = 6.490
Epoch   1 Batch 1461/2159   train_loss = 4.607
Epoch   1 Batch 1471/2159   train_loss = 5.258
Epoch   1 Batch 1481/2159   train_loss = 4.634
Epoch   1 Batch 1491/2159   train_loss = 6.220
Epoch   1 Batch 1501/2159   train_loss = 4.427
Epoch   1 Batch 1511/2159   train_loss = 4.847
Epoch   1 Batch 1521/2159   train_loss = 5.785
Epoch   1 Batch 1531/2159   train_loss = 6.626
Epoch   1 Batch 1541/2159   train_loss = 5.166
Epoch   1 Batch 1551/2159   train_loss = 5.487
Epoch   1 Batch 1561/2159   train_loss = 4.626
Epoch   1 Batch 1571/2159   train_loss = 5.273
Epoch   1 Batch 1581/2159   train_loss = 5.285
Epoch   1 Batch 1591/2159   train_loss = 5.253
Epoch   1 Batch 1601/2159   train_loss = 5.639
Epoch   1 Batch 1611/2159   train_loss = 4.464
...

Epoch   2 Batch 1042/2159   train_loss = 4.688
Epoch   2 Batch 1052/2159   train_loss = 5.929
Epoch   2 Batch 1062/2159   train_loss = 4.303
Epoch   2 Batch 1072/2159   train_loss = 5.194
Epoch   2 Batch 1082/2159   train_loss = 5.690
Epoch   2 Batch 1092/2159   train_loss = 4.890
Epoch   2 Batch 1102/2159   train_loss = 4.582
Epoch   2 Batch 1112/2159   train_loss = 5.446
Epoch   2 Batch 1122/2159   train_loss = 5.748
Epoch   2 Batch 1132/2159   train_loss = 4.626
Epoch   2 Batch 1142/2159   train_loss = 5.072
Epoch   2 Batch 1152/2159   train_loss = 4.304
Epoch   2 Batch 1162/2159   train_loss = 4.779
Epoch   2 Batch 1172/2159   train_loss = 5.434
Epoch   2 Batch 1182/2159   train_loss = 6.509
Epoch   2 Batch 1192/2159   train_loss = 6.462
Epoch   2 Batch 1202/2159   train_loss = 5.181
Epoch   2 Batch 1212/2159   train_loss = 5.524
Epoch   2 Batch 1222/2159   train_loss = 5.537
Epoch   2 Batch 1232/2159   train_loss = 5.913
Epoch   2 Batch 1242/2159   train_loss = 4.371
...

Epoch   3 Batch  683/2159   train_loss = 4.672
Epoch   3 Batch  693/2159   train_loss = 4.868
Epoch   3 Batch  703/2159   train_loss = 5.445
Epoch   3 Batch  713/2159   train_loss = 4.606
Epoch   3 Batch  723/2159   train_loss = 5.657
Epoch   3 Batch  733/2159   train_loss = 4.982
Epoch   3 Batch  743/2159   train_loss = 4.611
Epoch   3 Batch  753/2159   train_loss = 5.192
Epoch   3 Batch  763/2159   train_loss = 5.355
Epoch   3 Batch  773/2159   train_loss = 4.671
Epoch   3 Batch  783/2159   train_loss = 5.304
Epoch   3 Batch  793/2159   train_loss = 4.771
Epoch   3 Batch  803/2159   train_loss = 5.078
Epoch   3 Batch  813/2159   train_loss = 4.801
Epoch   3 Batch  823/2159   train_loss = 5.344
Epoch   3 Batch  833/2159   train_loss = 5.622
Epoch   3 Batch  843/2159   train_loss = 4.875
Epoch   3 Batch  853/2159   train_loss = 4.460
Epoch   3 Batch  863/2159   train_loss = 4.061
Epoch   3 Batch  873/2159   train_loss = 5.161
Epoch   3 Batch  883/2159   train_loss = 4.539
...

Epoch   4 Batch  274/2159   train_loss = 5.058
Epoch   4 Batch  284/2159   train_loss = 4.539
Epoch   4 Batch  294/2159   train_loss = 5.494
Epoch   4 Batch  304/2159   train_loss = 5.278
Epoch   4 Batch  314/2159   train_loss = 5.732
Epoch   4 Batch  324/2159   train_loss = 4.822
Epoch   4 Batch  334/2159   train_loss = 4.827
Epoch   4 Batch  344/2159   train_loss = 5.804
Epoch   4 Batch  354/2159   train_loss = 5.296
Epoch   4 Batch  364/2159   train_loss = 5.572
Epoch   4 Batch  374/2159   train_loss = 4.905
Epoch   4 Batch  384/2159   train_loss = 5.614
Epoch   4 Batch  394/2159   train_loss = 5.774
Epoch   4 Batch  404/2159   train_loss = 5.018
Epoch   4 Batch  414/2159   train_loss = 4.372
Epoch   4 Batch  424/2159   train_loss = 4.784
Epoch   4 Batch  434/2159   train_loss = 4.654
Epoch   4 Batch  444/2159   train_loss = 5.314
Epoch   4 Batch  454/2159   train_loss = 5.709
Epoch   4 Batch  464/2159   train_loss = 5.787
Epoch   4 Batch  474/2159   train_loss = 5.206
...

Epoch   4 Batch 2034/2159   train_loss = 5.303
Epoch   4 Batch 2044/2159   train_loss = 5.167
Epoch   4 Batch 2054/2159   train_loss = 4.598
Epoch   4 Batch 2064/2159   train_loss = 5.628
Epoch   4 Batch 2074/2159   train_loss = 4.636
Epoch   4 Batch 2084/2159   train_loss = 5.884
Epoch   4 Batch 2094/2159   train_loss = 5.572
Epoch   4 Batch 2104/2159   train_loss = 4.273
Epoch   4 Batch 2114/2159   train_loss = 5.225
Epoch   4 Batch 2124/2159   train_loss = 5.162
Epoch   4 Batch 2134/2159   train_loss = 5.953
Epoch   4 Batch 2144/2159   train_loss = 4.875
Epoch   4 Batch 2154/2159   train_loss = 5.671
Epoch   5 Batch    5/2159   train_loss = 5.463
Epoch   5 Batch   15/2159   train_loss = 4.812
Epoch   5 Batch   25/2159   train_loss = 4.699
Epoch   5 Batch   35/2159   train_loss = 5.816
Epoch   5 Batch   45/2159   train_loss = 5.565
Epoch   5 Batch   55/2159   train_loss = 5.744
Epoch   5 Batch   65/2159   train_loss = 5.230
Epoch   5 Batch   75/2159   train_loss = 4.834
...

Epoch   5 Batch 1665/2159   train_loss = 5.066
Epoch   5 Batch 1675/2159   train_loss = 5.213
Epoch   5 Batch 1685/2159   train_loss = 4.903
Epoch   5 Batch 1695/2159   train_loss = 3.853
Epoch   5 Batch 1705/2159   train_loss = 5.554
Epoch   5 Batch 1715/2159   train_loss = 5.081
Epoch   5 Batch 1725/2159   train_loss = 4.456
Epoch   5 Batch 1735/2159   train_loss = 4.856
Epoch   5 Batch 1745/2159   train_loss = 5.054
Epoch   5 Batch 1755/2159   train_loss = 4.513
Epoch   5 Batch 1765/2159   train_loss = 4.081
Epoch   5 Batch 1775/2159   train_loss = 4.995
Epoch   5 Batch 1785/2159   train_loss = 4.654
Epoch   5 Batch 1795/2159   train_loss = 4.129
Epoch   5 Batch 1805/2159   train_loss = 4.251
Epoch   5 Batch 1815/2159   train_loss = 4.604
Epoch   5 Batch 1825/2159   train_loss = 5.608
Epoch   5 Batch 1835/2159   train_loss = 4.944
Epoch   5 Batch 1845/2159   train_loss = 4.488
Epoch   5 Batch 1855/2159   train_loss = 4.507
Epoch   5 Batch 1865/2159   train_loss = 4.290
...

Epoch   6 Batch 1316/2159   train_loss = 5.061
Epoch   6 Batch 1326/2159   train_loss = 5.379
Epoch   6 Batch 1336/2159   train_loss = 5.021
Epoch   6 Batch 1346/2159   train_loss = 3.643
Epoch   6 Batch 1356/2159   train_loss = 4.915
Epoch   6 Batch 1366/2159   train_loss = 5.599
Epoch   6 Batch 1376/2159   train_loss = 5.784
Epoch   6 Batch 1386/2159   train_loss = 4.371
Epoch   6 Batch 1396/2159   train_loss = 5.172
Epoch   6 Batch 1406/2159   train_loss = 4.632
Epoch   6 Batch 1416/2159   train_loss = 4.846
Epoch   6 Batch 1426/2159   train_loss = 5.294
Epoch   6 Batch 1436/2159   train_loss = 4.526
Epoch   6 Batch 1446/2159   train_loss = 3.971
Epoch   6 Batch 1456/2159   train_loss = 4.758
Epoch   6 Batch 1466/2159   train_loss = 4.413
Epoch   6 Batch 1476/2159   train_loss = 4.743
Epoch   6 Batch 1486/2159   train_loss = 4.640
Epoch   6 Batch 1496/2159   train_loss = 4.699
Epoch   6 Batch 1506/2159   train_loss = 5.380
Epoch   6 Batch 1516/2159   train_loss = 4.801
...

Epoch   7 Batch  907/2159   train_loss = 5.556
Epoch   7 Batch  917/2159   train_loss = 4.247
Epoch   7 Batch  927/2159   train_loss = 5.250
Epoch   7 Batch  937/2159   train_loss = 4.411
Epoch   7 Batch  947/2159   train_loss = 4.615
Epoch   7 Batch  957/2159   train_loss = 4.537
Epoch   7 Batch  967/2159   train_loss = 5.543
Epoch   7 Batch  977/2159   train_loss = 4.586
Epoch   7 Batch  987/2159   train_loss = 4.498
Epoch   7 Batch  997/2159   train_loss = 4.359
Epoch   7 Batch 1007/2159   train_loss = 4.809
Epoch   7 Batch 1017/2159   train_loss = 5.600
Epoch   7 Batch 1027/2159   train_loss = 5.033
Epoch   7 Batch 1037/2159   train_loss = 5.014
Epoch   7 Batch 1047/2159   train_loss = 4.682
Epoch   7 Batch 1057/2159   train_loss = 5.363
Epoch   7 Batch 1067/2159   train_loss = 5.157
Epoch   7 Batch 1077/2159   train_loss = 4.620
Epoch   7 Batch 1087/2159   train_loss = 5.147
Epoch   7 Batch 1097/2159   train_loss = 5.544
Epoch   7 Batch 1107/2159   train_loss = 4.969
...

Epoch   8 Batch  558/2159   train_loss = 4.699
Epoch   8 Batch  568/2159   train_loss = 4.194
Epoch   8 Batch  578/2159   train_loss = 4.791
Epoch   8 Batch  588/2159   train_loss = 4.478
Epoch   8 Batch  598/2159   train_loss = 4.648
Epoch   8 Batch  608/2159   train_loss = 4.757
Epoch   8 Batch  618/2159   train_loss = 4.227
Epoch   8 Batch  628/2159   train_loss = 4.268
Epoch   8 Batch  638/2159   train_loss = 4.666
Epoch   8 Batch  648/2159   train_loss = 5.029
Epoch   8 Batch  658/2159   train_loss = 4.225
Epoch   8 Batch  668/2159   train_loss = 5.814
Epoch   8 Batch  678/2159   train_loss = 5.145
Epoch   8 Batch  688/2159   train_loss = 4.478
Epoch   8 Batch  698/2159   train_loss = 4.487
Epoch   8 Batch  708/2159   train_loss = 4.456
Epoch   8 Batch  718/2159   train_loss = 4.403
Epoch   8 Batch  728/2159   train_loss = 4.506
Epoch   8 Batch  738/2159   train_loss = 4.957
Epoch   8 Batch  748/2159   train_loss = 4.341
Epoch   8 Batch  758/2159   train_loss = 5.119
...

Epoch   9 Batch  149/2159   train_loss = 4.948
Epoch   9 Batch  159/2159   train_loss = 4.157
Epoch   9 Batch  169/2159   train_loss = 4.879
Epoch   9 Batch  179/2159   train_loss = 4.947
Epoch   9 Batch  189/2159   train_loss = 5.222
Epoch   9 Batch  199/2159   train_loss = 4.560
Epoch   9 Batch  209/2159   train_loss = 5.285
Epoch   9 Batch  219/2159   train_loss = 4.841
Epoch   9 Batch  229/2159   train_loss = 4.813
Epoch   9 Batch  239/2159   train_loss = 4.454
Epoch   9 Batch  249/2159   train_loss = 4.491
Epoch   9 Batch  259/2159   train_loss = 5.090
Epoch   9 Batch  269/2159   train_loss = 4.936
Epoch   9 Batch  279/2159   train_loss = 4.099
Epoch   9 Batch  289/2159   train_loss = 5.053
Epoch   9 Batch  299/2159   train_loss = 5.307
Epoch   9 Batch  309/2159   train_loss = 4.831
Epoch   9 Batch  319/2159   train_loss = 4.944
Epoch   9 Batch  329/2159   train_loss = 5.387
Epoch   9 Batch  339/2159   train_loss = 5.338
Epoch   9 Batch  349/2159   train_loss = 5.574
...

Epoch   9 Batch 1899/2159   train_loss = 5.063
Epoch   9 Batch 1909/2159   train_loss = 5.100
Epoch   9 Batch 1919/2159   train_loss = 5.636
Epoch   9 Batch 1929/2159   train_loss = 4.956
Epoch   9 Batch 1939/2159   train_loss = 4.722
Epoch   9 Batch 1949/2159   train_loss = 5.125
Epoch   9 Batch 1959/2159   train_loss = 5.052
Epoch   9 Batch 1969/2159   train_loss = 4.620
Epoch   9 Batch 1979/2159   train_loss = 4.742
Epoch   9 Batch 1989/2159   train_loss = 5.326
Epoch   9 Batch 1999/2159   train_loss = 4.769
Epoch   9 Batch 2009/2159   train_loss = 4.790
Epoch   9 Batch 2019/2159   train_loss = 5.105
Epoch   9 Batch 2029/2159   train_loss = 4.659
Epoch   9 Batch 2039/2159   train_loss = 5.524
Epoch   9 Batch 2049/2159   train_loss = 4.344
Epoch   9 Batch 2059/2159   train_loss = 4.327
Epoch   9 Batch 2069/2159   train_loss = 4.909
Epoch   9 Batch 2079/2159   train_loss = 4.737
Epoch   9 Batch 2089/2159   train_loss = 5.068
Epoch   9 Batch 2099/2159   train_loss = 5.290
Epoch   9 Bat

## Save Parameters
Save `seq_length` and `save_dir` for generating a new TV script.

In [12]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))

# Checkpoint

In [13]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Get the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple: `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)`.
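As a hedged illustration of the lookup described above, the four names can be resolved one by one on the graph object. This is a minimal sketch rather than the reference solution; it assumes only that `loaded_graph` exposes `get_tensor_by_name()`, as `tf.Graph` does, and that the names match the ones listed above.

```python
def get_tensors(loaded_graph):
    """
    Look up the input, initial state, final state, and probabilities tensors.
    :param loaded_graph: Any object with a get_tensor_by_name() method (e.g. a tf.Graph)
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # Tensor names as listed in the instructions; ":0" selects the first output of each op.
    names = ("input:0", "initial_state:0", "final_state:0", "probs:0")
    # Resolve each name in order so the tuple matches the required layout.
    return tuple(loaded_graph.get_tensor_by_name(name) for name in names)
```

The order of `names` determines the order of the returned tuple, so it has to follow the layout given above.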

In [None]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # TODO: Implement Function
    return None, None, None, None


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.
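One possible approach, shown here only as a sketch, is to sample a word id in proportion to its probability instead of always taking the most likely word. The example assumes `probabilities` is a one-dimensional sequence with one entry per word id and that the ids run from `0` to `len(probabilities) - 1`; it uses the standard library's `random.choices` to stay self-contained.

```python
import random


def pick_word(probabilities, int_to_vocab):
    """
    Sample the next word in proportion to its predicted probability.
    :param probabilities: 1-D sequence of probabilities, indexed by word id
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the sampled word
    """
    # Candidate ids are the positions in the probability vector.
    word_ids = range(len(probabilities))
    # Draw one id, weighted by the model's probabilities.
    chosen_id = random.choices(word_ids, weights=probabilities, k=1)[0]
    return int_to_vocab[chosen_id]
```

With NumPy available, `np.random.choice(len(probabilities), p=probabilities)` performs the same kind of weighted draw. Sampling usually gives more varied output than always picking the highest-probability word, which can get stuck repeating the same phrase.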

In [None]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # TODO: Implement Function
    return None


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

## Generate TV Script
This will generate the TV script for you.  Set `gen_length` to the number of words you want the generated TV script to contain.

In [None]:
gen_length = 200
# homer_simpson, moe_szyslak, or barney_gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)
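The token-removal step at the end of the cell above can be tried in isolation. The sketch below wraps the same replacements in a hypothetical `detokenize()` function and runs it on a made-up word list; the `token_dict` entries shown here are illustrative assumptions, not the actual tokens produced by `helper`.

```python
def detokenize(words, token_dict):
    """
    Join generated words and turn punctuation tokens back into symbols.
    :param words: List of generated words and punctuation tokens
    :param token_dict: Dictionary mapping each symbol to its token
    :return: Script text with the tokens replaced by their symbols
    """
    script = ' '.join(words)
    for key, token in token_dict.items():
        # Drop the space in front of the token so "homer ||period||" becomes "homer."
        script = script.replace(' ' + token.lower(), key)
    # Tidy the space left after a newline or an opening parenthesis.
    script = script.replace('\n ', '\n')
    script = script.replace('( ', '(')
    return script


# Hypothetical tokens, for illustration only.
example_tokens = {'.': '||period||', ',': '||comma||', '\n': '||return||'}
example_words = ['moe_szyslak:', 'hey', '||comma||', 'homer', '||period||',
                 '||return||', 'homer_simpson:', 'beer', '||period||']
print(detokenize(example_words, example_tokens))
```

Each token is matched together with the space that precedes it, which is why the punctuation ends up attached to the previous word.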

# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  To get good results, you'll have to use a smaller vocabulary or get more data.  Luckily, there's more data!  As we mentioned at the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data because that would take too long.  However, you are free to train your neural network on all the data after you complete the project.
# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.