# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs.  You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.  The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset that consists only of the scenes in Moe's Tavern.  It does not include other versions of the tavern, such as "Moe's Cavern", "Flaming Moe's", or "Uncle Moe's Family Feed-Bag".

In [214]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [216]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.2519083969
Number of lines: 4258
Average number of words in each line: 11.5016439643
()
The sentences 0 to 10:

Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.



## Implement Preprocessing Functions
The first step with any dataset is preprocessing.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- A dictionary that maps each word to an id, which we'll call `vocab_to_int`
- A dictionary that maps each id back to its word, which we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`

In [217]:
import numpy as np
import problem_unittests as tests
import re
from collections import Counter
def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    word_counts = Counter(text)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return vocab_to_int, int_to_vocab

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
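
As a quick sanity check of the idea, the sketch below builds the two dictionaries for a made-up word list and confirms that the mapping round-trips. The sample words and the helper name `build_lookup` are illustrative assumptions, not part of the project files. Ids are assigned by descending frequency, mirroring the cell above.

```python
from collections import Counter

def build_lookup(words):
    """Map words to ids (the most frequent word gets id 0) and back."""
    counts = Counter(words)
    # Sort by frequency; ties keep first-seen order because the sort is stable
    vocab = sorted(counts, key=counts.get, reverse=True)
    int_to_vocab = dict(enumerate(vocab))
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab

# Hypothetical sample text, already split into words
sample_words = "moe pours homer a beer and homer thanks moe and homer leaves".split()
vocab_to_int, int_to_vocab = build_lookup(sample_words)

# Encode the words as ids, then decode them again
encoded = [vocab_to_int[w] for w in sample_words]
decoded = [int_to_vocab[i] for i in encoded]
```

Because every word has exactly one id and every id exactly one word, `decoded` matches `sample_words`.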


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parenthesis ( ( )
- Right Parenthesis ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one.  This separates each symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".

In [218]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # Map each punctuation symbol to a token that cannot be mistaken for a word
    tokens = {}
    tokens['.'] = '<PERIOD>'
    tokens[','] = '<COMMA>'
    tokens['"'] = '<QUOTATION_MARK>'
    tokens[';'] = '<SEMICOLON>'
    tokens['!'] = '<EXCLAMATION_MARK>'
    tokens['?'] = '<QUESTION_MARK>'
    tokens['('] = '<LEFT_PAREN>'
    tokens[')'] = '<RIGHT_PAREN>'
    tokens['--'] = '<HYPHENS>'
    tokens['\n'] = '<NEW_LINE>'
    return tokens

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed
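
To see what the dictionary is for, here is a minimal sketch of how such a table might be applied before splitting on spaces. The `tokenize` helper and the two-entry table are assumptions for illustration; the project's `helper` module performs the real preprocessing and may differ in detail.

```python
def tokenize(text, token_dict):
    """Replace each symbol with its space-padded token, then split on whitespace."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.split()

# Illustrative subset of the punctuation table
sample_tokens = {'!': '||exclamation_mark||', ',': '||comma||'}

words = tokenize("Bye, Moe! Bye!", sample_tokens)
# "Bye" is now the same word wherever it appears; punctuation became separate tokens
```

Padding each token with spaces is what lets a plain `split()` treat the punctuation as its own word.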


## Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [219]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Check Point
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [220]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
You'll build the components necessary to build an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### Check the Version of TensorFlow and Access to GPU

In [221]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0


  


### Input
Implement the `get_inputs()` function to create TF Placeholders for the Neural Network.  It should create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [222]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    # Shapes are [batch size, sequence length]; both are left as None so any batch fits
    inputs_ = tf.placeholder(tf.int32, [None, None], name='input')
    labels_ = tf.placeholder(tf.int32, [None, None], name='targets')
    learning = tf.placeholder(tf.float32, name='learning_rate')
    return inputs_, labels_, learning


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [223]:
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initialize state)
    """ 
    # A single basic LSTM layer with rnn_size units
    lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size)

    # Wrap the layer in dropout; the keep probability is hard-coded,
    # so dropout stays active whenever this graph is run
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=0.5)

    # Wrap in a MultiRNNCell (one layer here) so more layers can be stacked later
    cell = tf.contrib.rnn.MultiRNNCell([drop])

    # Initial state of all zeros, named so it can be looked up in the graph later
    initial_state = cell.zero_state(batch_size, tf.float32)
    initial_state = tf.identity(initial_state, name="initial_state")

    return cell, initial_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### Word Embedding
Apply embedding to `input_data` using TensorFlow.  Return the embedded sequence.

In [224]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # Embedding matrix with one row per vocabulary word, initialized uniformly in [-1, 1)
    embedding = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
    return tf.nn.embedding_lookup(embedding, input_data)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed
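
Conceptually, an embedding lookup is just row indexing into a `(vocab_size, embed_dim)` matrix. The NumPy sketch below illustrates the shapes involved; the sizes and ids are made up, and NumPy stands in for TensorFlow here.

```python
import numpy as np

vocab_size, embed_dim = 10, 4          # illustrative sizes
rng = np.random.default_rng(0)
embedding = rng.uniform(-1, 1, size=(vocab_size, embed_dim))

# A batch of 2 sequences, each 3 word ids long
input_ids = np.array([[1, 5, 9],
                      [0, 5, 2]])

# Integer-array indexing picks one embedding row per id
embedded = embedding[input_ids]        # shape: (2, 3, 4)
```

The same id always yields the same vector, so both occurrences of id `5` map to identical rows.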


### Build RNN
You created an RNN cell in the `get_init_cell()` function.  Time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
 - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple `(Outputs, FinalState)`

In [225]:
def build_rnn(cell, inputs):
    """
    Create an RNN using an RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    # Unroll the cell over the time dimension; inputs are batch-major
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, time_major=False,
                                             dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')
    return outputs, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:
- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple (Logits, FinalState) 
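
A fully connected layer with a linear activation is a matrix multiply plus a bias, applied at every time step. The NumPy sketch below shows only the shapes; the sizes and random weights are illustrative assumptions, not the TensorFlow implementation.

```python
import numpy as np

batch_size, seq_length, rnn_size, vocab_size = 2, 3, 8, 10  # illustrative sizes
rng = np.random.default_rng(0)

outputs = rng.normal(size=(batch_size, seq_length, rnn_size))  # stand-in for RNN outputs
weights = rng.normal(size=(rnn_size, vocab_size))
bias = np.zeros(vocab_size)

# Linear layer: no activation function, one score per vocabulary word
logits = outputs @ weights + bias          # shape: (2, 3, 10)

# Softmax over the last axis turns the scores into next-word probabilities
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
```

Leaving the logits unactivated matters because the softmax is applied separately when computing probabilities and the loss.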

In [226]:
import tensorflow as tf
print(' tensor flow version is {}'.format(tf.__version__))
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    # Embed the word ids, run them through the RNN, then project to vocabulary logits
    embedding = get_embed(input_data, vocab_size, embed_dim)
    outputs, final_state = build_rnn(cell, embedding)
    # activation_fn=None gives the linear output layer the instructions ask for
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    return (logits, final_state)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

 tensor flow version is 1.1.0
Tests Passed


### Batches
Implement `get_batches` to create batches of input and targets using `int_text`.  The batches should be a NumPy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return the following NumPy array:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Notice that the last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.
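
The wrap-around can be reproduced in a couple of lines. This is a sketch on the first 18 values of the example above (the values that remain once the partial batch is dropped), not the graded implementation.

```python
import numpy as np

inputs = np.arange(1, 19)        # 1..18: the values that fill 3 whole batches
targets = np.roll(inputs, -1)    # shift left by one; the first value wraps to the end

# The final target is the very first input
last_target = targets[-1]
```

Every other target is simply the next input, which is what lets the network learn next-word prediction.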

In [227]:
import numpy as np

def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    # Keep only as many words as fill whole batches
    n_batches = len(int_text) // (batch_size * seq_length)
    x = np.array(int_text[:n_batches * batch_size * seq_length])

    # Targets are the inputs shifted by one word; the last target wraps to the first input
    y = np.zeros_like(x)
    y[:-1], y[-1] = x[1:], x[0]

    # One row per batch element, then cut each row into n_batches windows of seq_length
    x_batches = np.split(x.reshape(batch_size, -1), n_batches, axis=1)
    y_batches = np.split(y.reshape(batch_size, -1), n_batches, axis=1)

    # Shape: (n_batches, 2, batch_size, seq_length)
    return np.array(list(zip(x_batches, y_batches)))

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to how often, in batches, the neural network should print progress.

In [228]:
# Number of Epochs
num_epochs = 25
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 256
# Embedding Dimension Size
embed_dim = 256
# Sequence Length
seq_length = 5
# Learning Rate
learning_rate = 0.01
# Show stats for every n number of batches (1 prints every batch)
show_every_n_batches = 1

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'
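
These settings also determine how many batches make up an epoch. A minimal sketch, assuming a hypothetical corpus length of 69,000 word ids (the real length of `int_text` is not shown here):

```python
def batches_per_epoch(n_words, batch_size, seq_length):
    """Number of full batches; leftover words that cannot fill a batch are dropped."""
    return n_words // (batch_size * seq_length)

n_words = 69000  # hypothetical corpus length, for illustration only
n = batches_per_epoch(n_words, batch_size=128, seq_length=5)
words_used = n * 128 * 5
```

A larger `batch_size` or `seq_length` means fewer, bigger updates per epoch, so the two settings trade off against `num_epochs`.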

### Build the Graph
Build the graph using the neural network you implemented.

In [229]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
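
The gradient clipping step above bounds every gradient element to the range [-1, 1] before the update is applied. A NumPy sketch of the same element-wise operation, using made-up gradient values:

```python
import numpy as np

grad = np.array([-3.2, -0.4, 0.0, 0.7, 5.0])   # illustrative gradient values

# Element-wise clip, analogous to tf.clip_by_value(grad, -1., 1.)
clipped = np.clip(grad, -1.0, 1.0)
```

Values already inside the range pass through unchanged; only the outliers are pulled back to the bounds.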

## Train
Train the neural network on the preprocessed data.  If you have a hard time getting a good loss, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.
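
One way to judge whether a loss is good is to convert it to perplexity. Assuming the reported loss is the average cross-entropy per word in nats, `exp(loss)` is roughly the number of words the model is choosing between at each step, and a model that guesses uniformly over a vocabulary of size `V` scores `ln(V)`. The vocabulary size below is a made-up figure for illustration.

```python
import math

def perplexity(loss):
    """Convert an average cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)

def uniform_baseline_loss(vocab_size):
    """Loss of a model that assigns equal probability to every word."""
    return math.log(vocab_size)

vocab_size = 5000                      # hypothetical vocabulary size
baseline = uniform_baseline_loss(vocab_size)
```

A training loss that starts near the uniform baseline and falls steadily suggests the network is learning; one that stays near it suggests a problem with the setup.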

In [230]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

Epoch   0 Batch    0/107   train_loss = 8.824
Epoch   0 Batch    1/107   train_loss = 8.695
Epoch   0 Batch    2/107   train_loss = 8.336
Epoch   0 Batch    3/107   train_loss = 7.411
Epoch   0 Batch    4/107   train_loss = 6.920
Epoch   0 Batch    5/107   train_loss = 6.863
Epoch   0 Batch    6/107   train_loss = 6.653
Epoch   0 Batch    7/107   train_loss = 6.600
Epoch   0 Batch    8/107   train_loss = 6.527
Epoch   0 Batch    9/107   train_loss = 6.338
Epoch   0 Batch   10/107   train_loss = 6.121
Epoch   0 Batch   11/107   train_loss = 6.186
Epoch   0 Batch   12/107   train_loss = 6.147
Epoch   0 Batch   13/107   train_loss = 6.009
Epoch   0 Batch   14/107   train_loss = 5.887
Epoch   0 Batch   15/107   train_loss = 6.055
Epoch   0 Batch   16/107   train_loss = 6.189
Epoch   0 Batch   17/107   train_loss = 6.231
Epoch   0 Batch   18/107   train_loss = 6.010
Epoch   0 Batch   19/107   train_loss = 6.023
Epoch   0 Batch   20/107   train_loss = 5.862
...

Epoch   1 Batch   72/107   train_loss = 4.884
Epoch   1 Batch   73/107   train_loss = 5.135
Epoch   1 Batch   74/107   train_loss = 4.911
Epoch   1 Batch   75/107   train_loss = 4.812
Epoch   1 Batch   76/107   train_loss = 4.748
Epoch   1 Batch   77/107   train_loss = 4.914
Epoch   1 Batch   78/107   train_loss = 4.941
Epoch   1 Batch   79/107   train_loss = 4.829
Epoch   1 Batch   80/107   train_loss = 4.762
Epoch   1 Batch   81/107   train_loss = 4.929
Epoch   1 Batch   82/107   train_loss = 4.740
Epoch   1 Batch   83/107   train_loss = 4.563
Epoch   1 Batch   84/107   train_loss = 4.933
Epoch   1 Batch   85/107   train_loss = 4.768
Epoch   1 Batch   86/107   train_loss = 4.590
Epoch   1 Batch   87/107   train_loss = 4.753
Epoch   1 Batch   88/107   train_loss = 4.895
Epoch   1 Batch   89/107   train_loss = 4.949
Epoch   1 Batch   90/107   train_loss = 4.990
Epoch   1 Batch   91/107   train_loss = 4.960
Epoch   1 Batch   92/107   train_loss = 4.856
...

Epoch   3 Batch   37/107   train_loss = 4.265
Epoch   3 Batch   38/107   train_loss = 4.193
Epoch   3 Batch   39/107   train_loss = 4.212
Epoch   3 Batch   40/107   train_loss = 4.434
Epoch   3 Batch   41/107   train_loss = 4.234
Epoch   3 Batch   42/107   train_loss = 4.211
Epoch   3 Batch   43/107   train_loss = 4.229
Epoch   3 Batch   44/107   train_loss = 4.137
Epoch   3 Batch   45/107   train_loss = 4.319
Epoch   3 Batch   46/107   train_loss = 4.258
Epoch   3 Batch   47/107   train_loss = 4.287
Epoch   3 Batch   48/107   train_loss = 4.128
Epoch   3 Batch   49/107   train_loss = 4.190
Epoch   3 Batch   50/107   train_loss = 4.175
Epoch   3 Batch   51/107   train_loss = 4.259
Epoch   3 Batch   52/107   train_loss = 4.178
Epoch   3 Batch   53/107   train_loss = 4.133
Epoch   3 Batch   54/107   train_loss = 4.191
Epoch   3 Batch   55/107   train_loss = 4.402
Epoch   3 Batch   56/107   train_loss = 4.241
Epoch   3 Batch   57/107   train_loss = 4.034
...

Epoch   5 Batch    2/107   train_loss = 3.884
Epoch   5 Batch    3/107   train_loss = 3.817
Epoch   5 Batch    4/107   train_loss = 3.724
Epoch   5 Batch    5/107   train_loss = 3.801
Epoch   5 Batch    6/107   train_loss = 3.848
Epoch   5 Batch    7/107   train_loss = 3.689
Epoch   5 Batch    8/107   train_loss = 3.696
Epoch   5 Batch    9/107   train_loss = 3.695
Epoch   5 Batch   10/107   train_loss = 3.626
Epoch   5 Batch   11/107   train_loss = 3.765
Epoch   5 Batch   12/107   train_loss = 3.842
Epoch   5 Batch   13/107   train_loss = 3.700
Epoch   5 Batch   14/107   train_loss = 3.585
Epoch   5 Batch   15/107   train_loss = 3.750
Epoch   5 Batch   16/107   train_loss = 3.734
Epoch   5 Batch   17/107   train_loss = 3.904
Epoch   5 Batch   18/107   train_loss = 3.733
Epoch   5 Batch   19/107   train_loss = 3.777
Epoch   5 Batch   20/107   train_loss = 3.815
Epoch   5 Batch   21/107   train_loss = 3.726
Epoch   5 Batch   22/107   train_loss = 3.902
...

Epoch   6 Batch   74/107   train_loss = 3.461
Epoch   6 Batch   75/107   train_loss = 3.270
Epoch   6 Batch   76/107   train_loss = 3.253
Epoch   6 Batch   77/107   train_loss = 3.574
Epoch   6 Batch   78/107   train_loss = 3.404
Epoch   6 Batch   79/107   train_loss = 3.411
Epoch   6 Batch   80/107   train_loss = 3.404
Epoch   6 Batch   81/107   train_loss = 3.409
Epoch   6 Batch   82/107   train_loss = 3.418
Epoch   6 Batch   83/107   train_loss = 3.336
Epoch   6 Batch   84/107   train_loss = 3.559
Epoch   6 Batch   85/107   train_loss = 3.415
Epoch   6 Batch   86/107   train_loss = 3.296
Epoch   6 Batch   87/107   train_loss = 3.498
Epoch   6 Batch   88/107   train_loss = 3.436
Epoch   6 Batch   89/107   train_loss = 3.505
Epoch   6 Batch   90/107   train_loss = 3.476
Epoch   6 Batch   91/107   train_loss = 3.512
Epoch   6 Batch   92/107   train_loss = 3.453
Epoch   6 Batch   93/107   train_loss = 3.537
Epoch   6 Batch   94/107   train_loss = 3.299
...

Epoch   8 Batch   39/107   train_loss = 3.050
Epoch   8 Batch   40/107   train_loss = 3.264
Epoch   8 Batch   41/107   train_loss = 3.148
Epoch   8 Batch   42/107   train_loss = 3.406
Epoch   8 Batch   43/107   train_loss = 3.332
Epoch   8 Batch   44/107   train_loss = 3.137
Epoch   8 Batch   45/107   train_loss = 3.267
Epoch   8 Batch   46/107   train_loss = 3.162
Epoch   8 Batch   47/107   train_loss = 3.206
Epoch   8 Batch   48/107   train_loss = 3.222
Epoch   8 Batch   49/107   train_loss = 3.337
Epoch   8 Batch   50/107   train_loss = 3.217
Epoch   8 Batch   51/107   train_loss = 3.119
Epoch   8 Batch   52/107   train_loss = 3.155
Epoch   8 Batch   53/107   train_loss = 3.151
Epoch   8 Batch   54/107   train_loss = 3.278
Epoch   8 Batch   55/107   train_loss = 3.234
Epoch   8 Batch   56/107   train_loss = 3.315
Epoch   8 Batch   57/107   train_loss = 3.113
Epoch   8 Batch   58/107   train_loss = 3.177
Epoch   8 Batch   59/107   train_loss = 3.341
...

Epoch  10 Batch    4/107   train_loss = 2.893
Epoch  10 Batch    5/107   train_loss = 3.058
Epoch  10 Batch    6/107   train_loss = 3.129
Epoch  10 Batch    7/107   train_loss = 3.032
Epoch  10 Batch    8/107   train_loss = 2.948
Epoch  10 Batch    9/107   train_loss = 3.012
Epoch  10 Batch   10/107   train_loss = 2.925
Epoch  10 Batch   11/107   train_loss = 2.995
Epoch  10 Batch   12/107   train_loss = 3.092
Epoch  10 Batch   13/107   train_loss = 2.923
Epoch  10 Batch   14/107   train_loss = 2.936
Epoch  10 Batch   15/107   train_loss = 3.007
Epoch  10 Batch   16/107   train_loss = 3.031
Epoch  10 Batch   17/107   train_loss = 3.170
Epoch  10 Batch   18/107   train_loss = 2.994
Epoch  10 Batch   19/107   train_loss = 3.040
Epoch  10 Batch   20/107   train_loss = 3.025
Epoch  10 Batch   21/107   train_loss = 2.976
Epoch  10 Batch   22/107   train_loss = 3.074
Epoch  10 Batch   23/107   train_loss = 3.153
Epoch  10 Batch   24/107   train_loss = 2.854
...

Epoch  11 Batch   76/107   train_loss = 2.881
Epoch  11 Batch   77/107   train_loss = 2.905
Epoch  11 Batch   78/107   train_loss = 2.839
Epoch  11 Batch   79/107   train_loss = 2.869
Epoch  11 Batch   80/107   train_loss = 2.927
Epoch  11 Batch   81/107   train_loss = 2.833
Epoch  11 Batch   82/107   train_loss = 2.899
Epoch  11 Batch   83/107   train_loss = 2.840
Epoch  11 Batch   84/107   train_loss = 3.006
Epoch  11 Batch   85/107   train_loss = 2.826
Epoch  11 Batch   86/107   train_loss = 2.851
Epoch  11 Batch   87/107   train_loss = 2.974
Epoch  11 Batch   88/107   train_loss = 2.918
Epoch  11 Batch   89/107   train_loss = 2.958
Epoch  11 Batch   90/107   train_loss = 2.849
Epoch  11 Batch   91/107   train_loss = 2.892
Epoch  11 Batch   92/107   train_loss = 2.838
Epoch  11 Batch   93/107   train_loss = 2.917
Epoch  11 Batch   94/107   train_loss = 2.725
Epoch  11 Batch   95/107   train_loss = 2.779
Epoch  11 Batch   96/107   train_loss = 2.939
...

Epoch  13 Batch   41/107   train_loss = 2.670
Epoch  13 Batch   42/107   train_loss = 2.914
Epoch  13 Batch   43/107   train_loss = 2.864
Epoch  13 Batch   44/107   train_loss = 2.734
Epoch  13 Batch   45/107   train_loss = 2.654
Epoch  13 Batch   46/107   train_loss = 2.657
Epoch  13 Batch   47/107   train_loss = 2.680
Epoch  13 Batch   48/107   train_loss = 2.759
Epoch  13 Batch   49/107   train_loss = 2.819
Epoch  13 Batch   50/107   train_loss = 2.735
Epoch  13 Batch   51/107   train_loss = 2.618
Epoch  13 Batch   52/107   train_loss = 2.781
Epoch  13 Batch   53/107   train_loss = 2.740
Epoch  13 Batch   54/107   train_loss = 2.798
Epoch  13 Batch   55/107   train_loss = 2.701
Epoch  13 Batch   56/107   train_loss = 2.835
Epoch  13 Batch   57/107   train_loss = 2.689
Epoch  13 Batch   58/107   train_loss = 2.781
Epoch  13 Batch   59/107   train_loss = 2.860
Epoch  13 Batch   60/107   train_loss = 2.755
Epoch  13 Batch   61/107   train_loss = 2.725
...

Epoch  15 Batch    6/107   train_loss = 2.831
Epoch  15 Batch    7/107   train_loss = 2.676
Epoch  15 Batch    8/107   train_loss = 2.605
Epoch  15 Batch    9/107   train_loss = 2.671
Epoch  15 Batch   10/107   train_loss = 2.562
Epoch  15 Batch   11/107   train_loss = 2.651
Epoch  15 Batch   12/107   train_loss = 2.791
Epoch  15 Batch   13/107   train_loss = 2.642
Epoch  15 Batch   14/107   train_loss = 2.548
Epoch  15 Batch   15/107   train_loss = 2.534
Epoch  15 Batch   16/107   train_loss = 2.543
Epoch  15 Batch   17/107   train_loss = 2.621
Epoch  15 Batch   18/107   train_loss = 2.679
Epoch  15 Batch   19/107   train_loss = 2.581
Epoch  15 Batch   20/107   train_loss = 2.579
Epoch  15 Batch   21/107   train_loss = 2.606
Epoch  15 Batch   22/107   train_loss = 2.676
Epoch  15 Batch   23/107   train_loss = 2.772
Epoch  15 Batch   24/107   train_loss = 2.507
Epoch  15 Batch   25/107   train_loss = 2.623
Epoch  15 Batch   26/107   train_loss = 2.559
...

Epoch  16 Batch   78/107   train_loss = 2.526
Epoch  16 Batch   79/107   train_loss = 2.459
Epoch  16 Batch   80/107   train_loss = 2.682
Epoch  16 Batch   81/107   train_loss = 2.513
Epoch  16 Batch   82/107   train_loss = 2.527
Epoch  16 Batch   83/107   train_loss = 2.556
Epoch  16 Batch   84/107   train_loss = 2.703
Epoch  16 Batch   85/107   train_loss = 2.548
Epoch  16 Batch   86/107   train_loss = 2.516
Epoch  16 Batch   87/107   train_loss = 2.708
Epoch  16 Batch   88/107   train_loss = 2.658
Epoch  16 Batch   89/107   train_loss = 2.654
Epoch  16 Batch   90/107   train_loss = 2.537
Epoch  16 Batch   91/107   train_loss = 2.530
Epoch  16 Batch   92/107   train_loss = 2.555
Epoch  16 Batch   93/107   train_loss = 2.592
Epoch  16 Batch   94/107   train_loss = 2.471
Epoch  16 Batch   95/107   train_loss = 2.474
Epoch  16 Batch   96/107   train_loss = 2.671
Epoch  16 Batch   97/107   train_loss = 2.501
Epoch  16 Batch   98/107   train_loss = 2.589
...

Epoch  18 Batch   43/107   train_loss = 2.648
Epoch  18 Batch   44/107   train_loss = 2.463
Epoch  18 Batch   45/107   train_loss = 2.434
Epoch  18 Batch   46/107   train_loss = 2.439
Epoch  18 Batch   47/107   train_loss = 2.480
Epoch  18 Batch   48/107   train_loss = 2.561
Epoch  18 Batch   49/107   train_loss = 2.500
Epoch  18 Batch   50/107   train_loss = 2.526
Epoch  18 Batch   51/107   train_loss = 2.375
Epoch  18 Batch   52/107   train_loss = 2.463
Epoch  18 Batch   53/107   train_loss = 2.599
Epoch  18 Batch   54/107   train_loss = 2.569
Epoch  18 Batch   55/107   train_loss = 2.436
Epoch  18 Batch   56/107   train_loss = 2.592
Epoch  18 Batch   57/107   train_loss = 2.423
Epoch  18 Batch   58/107   train_loss = 2.464
Epoch  18 Batch   59/107   train_loss = 2.586
Epoch  18 Batch   60/107   train_loss = 2.478
Epoch  18 Batch   61/107   train_loss = 2.475
Epoch  18 Batch   62/107   train_loss = 2.538
Epoch  18 Batch   63/107   train_loss = 2.523
Epoch  18 Batch   64/107   train_l

Epoch  20 Batch    8/107   train_loss = 2.423
Epoch  20 Batch    9/107   train_loss = 2.407
Epoch  20 Batch   10/107   train_loss = 2.443
Epoch  20 Batch   11/107   train_loss = 2.552
Epoch  20 Batch   12/107   train_loss = 2.580
Epoch  20 Batch   13/107   train_loss = 2.433
Epoch  20 Batch   14/107   train_loss = 2.486
Epoch  20 Batch   15/107   train_loss = 2.431
Epoch  20 Batch   16/107   train_loss = 2.453
Epoch  20 Batch   17/107   train_loss = 2.503
Epoch  20 Batch   18/107   train_loss = 2.420
Epoch  20 Batch   19/107   train_loss = 2.338
Epoch  20 Batch   20/107   train_loss = 2.424
Epoch  20 Batch   21/107   train_loss = 2.368
Epoch  20 Batch   22/107   train_loss = 2.473
Epoch  20 Batch   23/107   train_loss = 2.496
Epoch  20 Batch   24/107   train_loss = 2.386
Epoch  20 Batch   25/107   train_loss = 2.359
Epoch  20 Batch   26/107   train_loss = 2.369
Epoch  20 Batch   27/107   train_loss = 2.490
Epoch  20 Batch   28/107   train_loss = 2.294
Epoch  20 Batch   29/107   train_l

Epoch  21 Batch   80/107   train_loss = 2.554
Epoch  21 Batch   81/107   train_loss = 2.434
Epoch  21 Batch   82/107   train_loss = 2.259
Epoch  21 Batch   83/107   train_loss = 2.402
Epoch  21 Batch   84/107   train_loss = 2.515
Epoch  21 Batch   85/107   train_loss = 2.366
Epoch  21 Batch   86/107   train_loss = 2.401
Epoch  21 Batch   87/107   train_loss = 2.588
Epoch  21 Batch   88/107   train_loss = 2.430
Epoch  21 Batch   89/107   train_loss = 2.525
Epoch  21 Batch   90/107   train_loss = 2.386
Epoch  21 Batch   91/107   train_loss = 2.407
Epoch  21 Batch   92/107   train_loss = 2.289
Epoch  21 Batch   93/107   train_loss = 2.337
Epoch  21 Batch   94/107   train_loss = 2.343
Epoch  21 Batch   95/107   train_loss = 2.351
Epoch  21 Batch   96/107   train_loss = 2.426
Epoch  21 Batch   97/107   train_loss = 2.334
Epoch  21 Batch   98/107   train_loss = 2.427
Epoch  21 Batch   99/107   train_loss = 2.436
Epoch  21 Batch  100/107   train_loss = 2.417
Epoch  21 Batch  101/107   train_l

Epoch  23 Batch   45/107   train_loss = 2.399
Epoch  23 Batch   46/107   train_loss = 2.256
Epoch  23 Batch   47/107   train_loss = 2.273
Epoch  23 Batch   48/107   train_loss = 2.513
Epoch  23 Batch   49/107   train_loss = 2.446
Epoch  23 Batch   50/107   train_loss = 2.395
Epoch  23 Batch   51/107   train_loss = 2.277
Epoch  23 Batch   52/107   train_loss = 2.272
Epoch  23 Batch   53/107   train_loss = 2.462
Epoch  23 Batch   54/107   train_loss = 2.505
Epoch  23 Batch   55/107   train_loss = 2.291
Epoch  23 Batch   56/107   train_loss = 2.377
Epoch  23 Batch   57/107   train_loss = 2.321
Epoch  23 Batch   58/107   train_loss = 2.347
Epoch  23 Batch   59/107   train_loss = 2.479
Epoch  23 Batch   60/107   train_loss = 2.292
Epoch  23 Batch   61/107   train_loss = 2.263
Epoch  23 Batch   62/107   train_loss = 2.369
Epoch  23 Batch   63/107   train_loss = 2.339
Epoch  23 Batch   64/107   train_loss = 2.293
Epoch  23 Batch   65/107   train_loss = 2.259
Epoch  23 Batch   66/107   train_l

## Save Parameters
Save `seq_length` and `save_dir` for generating a new TV script.

In [231]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))
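
The sketch below shows one way a save/load round trip for these two parameters could work. It assumes the parameters are simply pickled as a tuple; the function names `save_params_sketch` and `load_params_sketch`, the file name `params.p`, and the example values are hypothetical and are not taken from `helper.py`.

```python
import os
import pickle
import tempfile


def save_params_sketch(params, path):
    """Pickle a tuple of parameters to `path` (hypothetical stand-in for a helper)."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)


def load_params_sketch(path):
    """Read back the tuple written by save_params_sketch."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Illustrative values only: a sequence length and a checkpoint path
example_params = (16, './save')

with tempfile.TemporaryDirectory() as tmp_dir:
    params_path = os.path.join(tmp_dir, 'params.p')
    save_params_sketch(example_params, params_path)
    restored_seq_length, restored_dir = load_params_sketch(params_path)
```

Storing the pair on disk is what lets the notebook be restarted at the checkpoint without re-running training.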

# Checkpoint

In [232]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Get the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple: `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)`
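
The names above follow TensorFlow's `<op_name>:<output_index>` convention: `"input:0"` means the first output of the operation named `input`. The small helper below is only an illustration of that naming scheme; `split_tensor_name` is a hypothetical function, not part of TensorFlow or of this project.

```python
def split_tensor_name(tensor_name):
    """Split a tensor name such as 'input:0' into (op_name, output_index)."""
    op_name, _, index = tensor_name.rpartition(':')
    if not op_name or not index.isdigit():
        raise ValueError('Expected "<op_name>:<output_index>", got {!r}'.format(tensor_name))
    return op_name, int(index)


# The four names used in this project all refer to output 0 of their operation
names = ['input:0', 'initial_state:0', 'final_state:0', 'probs:0']
parsed = [split_tensor_name(name) for name in names]
```

Passing the bare operation name without the `:0` suffix is a common mistake, which is why the sketch rejects it.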

In [237]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # Look up each tensor by the name it was given when the graph was built
    input_tensor = loaded_graph.get_tensor_by_name('input:0')
    initial_state = loaded_graph.get_tensor_by_name('initial_state:0')
    final_state = loaded_graph.get_tensor_by_name('final_state:0')
    probs = loaded_graph.get_tensor_by_name('probs:0')
    return input_tensor, initial_state, final_state, probs


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

Tests Passed


### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.

In [243]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # Greedy choice: take the id with the highest probability
    # (np.argmax returns the first index when there are ties)
    max_prob_index = int(np.argmax(probabilities))
    return int_to_vocab[max_prob_index]


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

Tests Passed
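
The implementation above always picks the single most likely word, so the same prime word yields the same script every time. A common alternative is to sample the next word in proportion to its probability. The sketch below is one way to do that; `pick_word_sampled` and its `rng` parameter are hypothetical names introduced here, and the tiny vocabulary is made up for illustration.

```python
import numpy as np


def pick_word_sampled(probabilities, int_to_vocab, rng=None):
    """Sample the next word id in proportion to its predicted probability."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probabilities, dtype=np.float64)
    # Renormalise so the values sum to 1 despite float32 rounding in the softmax
    probs = probs / probs.sum()
    word_id = rng.choice(len(probs), p=probs)
    return int_to_vocab[int(word_id)]


# Made-up three-word vocabulary for illustration
demo_vocab = {0: 'moe', 1: 'homer', 2: 'barney'}
demo_word = pick_word_sampled([0.2, 0.5, 0.3], demo_vocab, rng=np.random.default_rng(0))
```

Sampling trades some coherence for variety; the greedy version tends to repeat the same phrases.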


## Generate TV Script
This will generate the TV script for you.  Set `gen_length` to the number of words (including punctuation tokens) you want to generate.

In [244]:
gen_length = 200
# homer_simpson, moe_szyslak, or barney_gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)

INFO:tensorflow:Restoring parameters from ./save
moe_szyslak:(excited) ooo, bart?
homer_simpson:(morose) oh, moe, i...
moe_szyslak:(excited) nice people.
moe_szyslak:(edgy) wreck it.
homer_simpson:(to comic) moe's tavern.
homer_simpson:(tsking) excuse me, sir, homer...
homer_simpson:(to camera) oh, i don't want no more to someone who cannot have anything should be that moron.
homer_simpson:(tsking) oh, homer.
homer_simpson:(singing) yeah, i know, moe. i think you should have that undermine problem in is one texan who don't like goin'?
barney_gumble: i recommend that was the greatest.
moe_szyslak:(into phone) ah, that's why they lost moe them help.
moe_szyslak:(pointed) hey everybody! i have no done! / get him!
teenage_bart: hi homer, i'm a tanked-up loser in our lives.
lenny_leonard:(impressed) nice) it's new y'know we never been filled nervous, or is that correct?
moe_szyslak:(
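
The "Remove tokens" step at the end of the generation cell can be tried in isolation. The sketch below applies the same replacements to a short hand-written word list. The token strings in `demo_token_dict` (such as `||period||`) are hypothetical placeholders; the real ones are whatever your token lookup function defined earlier in the notebook.

```python
def restore_punctuation(words, token_dict):
    """Join generated words and turn punctuation tokens back into symbols."""
    script = ' '.join(words)
    for key, token in token_dict.items():
        # Drop the space before the token so punctuation attaches to the previous word
        script = script.replace(' ' + token.lower(), key)
    # Remove the space left behind after a newline or an opening parenthesis
    script = script.replace('\n ', '\n')
    script = script.replace('( ', '(')
    return script


# Hypothetical token names, for illustration only
demo_token_dict = {
    '.': '||period||',
    ',': '||comma||',
    '\n': '||return||',
    '(': '||left_parentheses||',
    ')': '||right_parentheses||',
}
demo_words = [
    'moe_szyslak:', '||left_parentheses||', 'excited', '||right_parentheses||',
    'nice', 'people', '||period||', '||return||',
    'homer_simpson:', 'oh', '||comma||', 'moe', '||period||',
]
demo_script = restore_punctuation(demo_words, demo_token_dict)
```

This explains the formatting of the output above, for example why `(excited)` follows the speaker name without a space.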


# The TV Script is Nonsensical
It's OK if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  To get good results, you'll have to use a smaller vocabulary or get more data.  Luckily there's more data!  As we mentioned at the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data, because that would take too long.  However, you are free to train your neural network on all the data after you complete the project.
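
One way to get a smaller vocabulary is to keep only the most frequent words and map everything else to a single "unknown" token. The sketch below is a minimal illustration under that assumption; `build_capped_vocab`, `encode`, and the `<unk>` token are hypothetical names, not part of this project's helpers.

```python
from collections import Counter


def build_capped_vocab(words, max_size, unk_token='<unk>'):
    """Keep the (max_size - 1) most frequent words plus one unknown token."""
    counts = Counter(words)
    kept = [word for word, _ in counts.most_common(max_size - 1)]
    vocab_to_int = {word: i for i, word in enumerate([unk_token] + kept)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


def encode(words, vocab_to_int, unk_token='<unk>'):
    """Map words to ids, sending out-of-vocabulary words to the unknown id."""
    unk_id = vocab_to_int[unk_token]
    return [vocab_to_int.get(word, unk_id) for word in words]


# Made-up word list for illustration
demo_text = 'moe moe moe homer homer barney'.split()
capped_vocab, capped_lookup = build_capped_vocab(demo_text, max_size=3)
demo_ids = encode(demo_text, capped_vocab)
```

Rare words then share one id, which shrinks the output layer at the cost of occasionally generating the unknown token.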
# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.