# TV Script Generation

Author: Jubin Soni

GitHub: [Link](https://github.com/jubins/Deep-Learning-Udacity/blob/master/RecurrentNeuralNetworks/TVScriptGeneratorProject/Dlnd_TV_Script_Generation_JubinSoni.ipynb)

In this project, I generated TV scripts for [The Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) using a recurrent neural network (RNN). The RNN is trained on script lines drawn from 27 seasons and produces the script for a new scene.


## About the Data
I used the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) from Kaggle, which contains scripts from 27 seasons. The neural network I built generates a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).

The dataset used here can be found in this same repository and is a subset of the original dataset. It consists of only the scenes in Moe's Tavern. It doesn't include other versions of the tavern, like "Moe's Cavern", "Flaming Moe's", "Uncle Moe's Family Feed-Bag", etc.

In [36]:
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Exploring the Data
Using `view_sentence_range` to view different parts of the data.

In [37]:
view_sentence_range = (0, 10)

import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.248091603053435
Number of lines: 4257
Average number of words in each line: 11.50434578341555

The sentences 0 to 10:
Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.




## Implementing Preprocessing Functions
As a general rule of thumb, the first thing to do to any dataset is preprocessing. This section contains the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, we first need to transform the words to ids. This function creates two dictionaries:
- A dictionary to go from a word to its id, which we'll call `vocab_to_int`
- A dictionary to go from an id to its word, which we'll call `int_to_vocab`

It returns them as the tuple `(vocab_to_int, int_to_vocab)`.

In [39]:
import numpy as np
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
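    # Deduplicate the words; ids follow the set's (arbitrary) iteration order
    words = set(text)
    vocab_to_int = {w: i for i, w in enumerate(words)}
    # Invert the first mapping so that both tables are guaranteed to agree
    int_to_vocab = {i: w for w, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab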


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
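
As a quick sanity check, the two dictionaries should be exact inverses of each other. The sketch below re-implements the same idea on a toy word list (the sample sentence and the name `create_lookup_tables_sorted` are made up for illustration). It also sorts the vocabulary so that the ids are reproducible between runs, which iterating over a plain `set` of strings does not guarantee.

```python
def create_lookup_tables_sorted(words):
    """Toy variant of create_lookup_tables with a deterministic id order."""
    vocab = sorted(set(words))  # sorting makes the ids stable across runs
    vocab_to_int = {word: idx for idx, word in enumerate(vocab)}
    int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


# Hypothetical sample text, already split into words
sample_words = "moe pours homer a beer and homer thanks moe".split()
vocab_to_int, int_to_vocab = create_lookup_tables_sorted(sample_words)

# Encode the words as ids, then decode them back again
encoded = [vocab_to_int[word] for word in sample_words]
decoded = [int_to_vocab[idx] for idx in encoded]

print(encoded)
print(decoded == sample_words)
```

Repeated words such as "homer" and "moe" map to the same id, so the vocabulary is smaller than the text.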


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters. However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parentheses ( ( )
- Right Parentheses ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one. This separates each symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".

In [47]:
def token_lookup():
    """
    Generates a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    token = {'.':'||Period||',
             ',':'||Comma||',
             '"':'||QuotationMark||',
             ';':'||Semicolon||',
             '!':'||ExclamationMark||',
             '?':'||QuestionMark||',
             '(':'||LeftParentheses||',
             ')':'||RightParentheses||',
             '--':'||Dash||',
             '\n':'||Return||'
            }
    
    return token

"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed
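
To make the effect of this dictionary concrete, here is a minimal sketch of how it could be applied to a line of the script. The exact steps inside `helper.preprocess_and_save_data` are not shown in this notebook, so the `tokenize_text` helper below is an assumption for illustration only. It pads each token with spaces and lowercases the text, which is at least consistent with the lowercase script generated later on.

```python
def token_lookup():
    """Same mapping as the notebook's token_lookup, repeated so this runs standalone."""
    return {
        '.': '||Period||',
        ',': '||Comma||',
        '"': '||QuotationMark||',
        ';': '||Semicolon||',
        '!': '||ExclamationMark||',
        '?': '||QuestionMark||',
        '(': '||LeftParentheses||',
        ')': '||RightParentheses||',
        '--': '||Dash||',
        '\n': '||Return||',
    }


def tokenize_text(text, token_dict):
    """Hypothetical helper: replace each symbol with its space-padded token, then split."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.lower().split()


line = "Moe_Szyslak: (INTO PHONE) Hold on, I'll check."
words = tokenize_text(line, token_lookup())
print(words)

# "bye" and "bye!" now share the same word token
print(tokenize_text('bye!', token_lookup()))
```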


## Preprocessing All the Data and Saving It
The code cell below will preprocess all the data and save it to file.

In [41]:
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Checkpoint: 1
Since we have created the checkpoint above, if we ever decide to come back to this notebook or have to restart the notebook, we can start from here. The preprocessed data has been saved to disk.

In [43]:
#Makes use of helper.py and problem_unittests.py attached in this repository.
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Building the Neural Network
Below I have built the components necessary to create an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### First checking the Version of TensorFlow and Access to GPU
I ran this notebook on an AWS AMI that had TensorFlow version 1.3.0 installed by default. I downgraded to version 1.1.0 because the unit test for the `build_nn` function was giving a shape error; after some searching I found out that it was a version problem. I expect my code to work on version 1.3.0 as well. If you are facing the same problem on AWS, run the command below:
`sudo pip3 install tensorflow-gpu==1.1`.

In [48]:
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0
Default GPU Device: /gpu:0


### 1. Input
Implementing the `get_inputs()` function to create TF Placeholders for the Neural Network.  It will create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

This function returns the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [49]:
def get_inputs():
    """
    Creates TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    Input = tf.placeholder(tf.int32, shape=[None, None], name='input')
    Targets = tf.placeholder(tf.int32, shape=[None, None], name='targets')
    LearningRate = tf.placeholder(tf.float32, name='learning_rate')
    return Input, Targets, LearningRate


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### 2. Building RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [50]:
def get_init_cell(batch_size, rnn_size):
    """
    Creates an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initialize state)
    """
    # A single LSTM layer wrapped in MultiRNNCell; add more cells to the list to stack layers.
    # Dropout (DropoutWrapper with keep_prob=0.8) and a second layer were tried but are not used here.
    lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size)
    Cell = tf.contrib.rnn.MultiRNNCell([lstm])
    # Name the zero state so it can be fetched by name once the graph is reloaded
    InitialState = tf.identity(Cell.zero_state(batch_size, tf.float32), name='initial_state')
    return Cell, InitialState


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### 3. Implementing the Word Embeddings
The function applies an embedding to `input_data` using TensorFlow and returns the embedded sequence.

In [51]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Creates embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    embedding_shape = (vocab_size, embed_dim)
    embedding = tf.Variable(tf.random_uniform(embedding_shape,minval=-1))
    embedded_inputs = tf.nn.embedding_lookup(embedding, input_data)
    return embedded_inputs


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed
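
Conceptually, `tf.nn.embedding_lookup` selects rows of the embedding matrix by word id. The NumPy sketch below mirrors that behaviour with plain integer indexing. The tiny sizes, the sample ids, and the random seed are arbitrary values chosen for illustration, and this is not the TensorFlow implementation itself.

```python
import numpy as np

vocab_size, embed_dim = 6, 4  # toy sizes, far smaller than the real model

# One row per word id, initialised uniformly in [-1, 1) like get_embed above
rng = np.random.RandomState(0)
embedding = rng.uniform(-1.0, 1.0, size=(vocab_size, embed_dim))

# A batch of 2 sequences with 3 word ids each
input_ids = np.array([[0, 2, 5],
                      [3, 3, 1]])

# Integer-array indexing picks one embedding row per id
embedded = embedding[input_ids]

print(embedded.shape)  # (batch size, sequence length, embed_dim)
```

The same word id always yields the same vector, so the two occurrences of id `3` in the second sequence share one row of the matrix.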


### 4. Build RNN
After creating an RNN cell in the `get_init_cell()` function, it's time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
 - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

The function returns the outputs and final state in the following tuple `(Outputs, FinalState)`

In [53]:
def build_rnn(cell, inputs):
    """
    Creates a RNN using a RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    Outputs, FinalState = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    FinalState = tf.identity(FinalState, 'final_state')
    return Outputs, FinalState


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### 5. Building the Neural Network
This step combines the functions implemented above:
- Apply embedding to `input_data` using the `get_embed(input_data, vocab_size, embed_dim)` function.
- Build the RNN using `cell` and the `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

The function below returns the logits and final state in the following tuple `(Logits, FinalState)`

In [54]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Builds part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    embedded_inputs = get_embed(input_data, vocab_size, embed_dim)
    Outputs, FinalState = build_rnn(cell, embedded_inputs)
    Logits = tf.contrib.layers.fully_connected(
                Outputs,
                vocab_size,
                activation_fn=None,
                weights_initializer=tf.truncated_normal_initializer(0.0,0.1),
                biases_initializer=tf.zeros_initializer())
    return Logits, FinalState


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

Tests Passed
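
A fully connected layer with a linear activation is just a matrix multiplication plus a bias, applied independently at every time step. The NumPy sketch below walks through the shapes involved. All sizes are toy values, the RNN outputs are random stand-ins, and the weights use a plain normal distribution with standard deviation 0.1 rather than the truncated normal used in `build_nn`.

```python
import numpy as np

batch_size, seq_length, rnn_size, vocab_size = 2, 3, 5, 7  # toy sizes

rng = np.random.RandomState(1)
# Stand-in for the RNN outputs: one rnn_size vector per word position
rnn_outputs = rng.randn(batch_size, seq_length, rnn_size)

# Linear layer parameters (assumed initialisation, for illustration only)
weights = rng.randn(rnn_size, vocab_size) * 0.1
biases = np.zeros(vocab_size)

# No activation function: the logits are a pure linear projection
logits = rnn_outputs @ weights + biases

print(logits.shape)  # (batch size, sequence length, vocab_size)
```

Each position in each sequence therefore gets one score per vocabulary word, which the softmax later turns into next-word probabilities.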


### 6. Batches
Implementing `get_batches` to create batches of input and targets using `int_text`.  The batches should be a Numpy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

It is a good idea to drop the last batch if you can't fill it with enough data.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return a Numpy array of the following:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Note: The last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.

In [56]:
def get_batches(int_text, batch_size, seq_length):
    """
    Returns batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    n_batches = len(int_text) // (seq_length*batch_size)
    batches = np.zeros((n_batches, 2, batch_size, seq_length))
    int_text = int_text[:(n_batches*seq_length*batch_size)]
    
    for b in range(n_batches):
        for j in range(batch_size):
            # input minibatch
            start = (b+j*n_batches)*seq_length
            batches[b][0][j] = int_text[start:start+seq_length]
            # target minibatch
            end = start+seq_length+1
            if end > len(int_text):
                batches[b][1][j] = int_text[start+1:] + int_text[:end%len(int_text)]
            else:
                batches[b][1][j] = int_text[start+1:end]
    
    return batches



"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

Tests Passed
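
The same batch layout can also be produced without explicit loops. The sketch below is an alternative formulation, not the implementation used in this notebook: it shifts the text by one position with `np.roll` to obtain the targets (which also handles the wrap-around of the last target) and then reshapes. The name `get_batches_vectorized` is made up for this example.

```python
import numpy as np


def get_batches_vectorized(int_text, batch_size, seq_length):
    """Loop-free sketch that returns (n_batches, 2, batch_size, seq_length)."""
    tokens_per_batch = batch_size * seq_length
    n_batches = len(int_text) // tokens_per_batch

    # Drop the tail that cannot fill a complete batch
    inputs = np.array(int_text[:n_batches * tokens_per_batch])
    # Targets are the inputs shifted left by one; the last target wraps to the first input
    targets = np.roll(inputs, -1)

    # Row j of batch b starts at offset (j * n_batches + b) * seq_length
    inputs = inputs.reshape(batch_size, n_batches, seq_length)
    targets = targets.reshape(batch_size, n_batches, seq_length)

    # Stack to (2, batch_size, n_batches, seq_length), then move n_batches to the front
    return np.stack([inputs, targets]).transpose(2, 0, 1, 3)


batches = get_batches_vectorized(list(range(1, 21)), 3, 2)
print(batches.shape)
print(batches[0][0])   # first batch of inputs
print(batches[-1][1])  # last batch of targets, ending with the wrapped value 1
```

On the 20-element example above this reproduces the three batches listed, including the final target of `1`.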


## Neural Network Training
### Hyperparameters Optimization
Tuning the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to the number of batches between progress printouts.

In [57]:
# Number of Epochs
num_epochs = 100
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 300 #320
# Embedding Dimension Size
embed_dim = 350 #320, #300
# Sequence Length
seq_length = 32
# Learning Rate
learning_rate = 0.005 #0.05, #0.09, #0.009
# Show stats for every n number of batches
show_every_n_batches = 64

"""
UDACITY TEST
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'
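
These settings interact with the size of the dataset. The training log further down reports 16 batches per epoch and prints a line every fourth epoch; the short calculation below shows how both follow from `batch_size`, `seq_length`, and `show_every_n_batches`. The token count is a hypothetical value chosen only to be consistent with the 16 batches in the log, not a figure measured from the dataset.

```python
batch_size = 128
seq_length = 32
show_every_n_batches = 64

# Each batch consumes batch_size sequences of seq_length word ids
tokens_per_batch = batch_size * seq_length

# Assumption: any token count from 65536 to 69631 gives the same result
hypothetical_token_count = 69000
n_batches = hypothetical_token_count // tokens_per_batch

# Progress is printed whenever the global batch counter is a multiple of 64
epochs_between_prints = show_every_n_batches // n_batches

print(tokens_per_batch)       # 4096
print(n_batches)              # 16
print(epochs_between_prints)  # 4
```

Raising `batch_size` or `seq_length` lowers the number of batches per epoch, so fewer weight updates happen in each pass over the data.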

### Building the Graph
Building the graph using the neural network previously implemented.

In [58]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
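
With all weights set to one, my understanding is that `seq2seq.sequence_loss` with its default settings amounts to the softmax cross-entropy averaged over every word position in the batch. The NumPy sketch below implements that average as an illustration of the idea, not as a copy of the TensorFlow code. The vocabulary size and the targets are made-up values. It also shows a handy reference point: a model that considers every word equally likely has a loss of `ln(vocab_size)`.

```python
import numpy as np


def mean_cross_entropy(logits, targets):
    """Average softmax cross-entropy over all (batch, time) positions."""
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability assigned to the correct word at each position
    batch_idx, time_idx = np.indices(targets.shape)
    picked = log_probs[batch_idx, time_idx, targets]
    return -picked.mean()


vocab_size = 1000                        # hypothetical vocabulary size
logits = np.zeros((2, 3, vocab_size))    # all-zero logits: a uniform prediction
targets = np.array([[1, 2, 3],
                    [4, 5, 6]])          # arbitrary target word ids

loss = mean_cross_entropy(logits, targets)
print(loss, np.log(vocab_size))
```

A training loss well below `ln(vocab_size)` means the model assigns far more probability to the correct next word than a uniform guess would.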

## Training
Training the neural network on the preprocessed data. The goal is to reach a low training loss. If the loss does not come down, the [forums](https://discussions.udacity.com/) are a good place to check whether anyone else is having the same problem.

In [59]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

Epoch   0 Batch    0/16   train_loss = 8.818
Epoch   4 Batch    0/16   train_loss = 4.022
Epoch   8 Batch    0/16   train_loss = 2.948
Epoch  12 Batch    0/16   train_loss = 2.132
Epoch  16 Batch    0/16   train_loss = 1.636
Epoch  20 Batch    0/16   train_loss = 1.207
Epoch  24 Batch    0/16   train_loss = 0.887
Epoch  28 Batch    0/16   train_loss = 0.663
Epoch  32 Batch    0/16   train_loss = 0.477
Epoch  36 Batch    0/16   train_loss = 0.391
Epoch  40 Batch    0/16   train_loss = 0.291
Epoch  44 Batch    0/16   train_loss = 0.233
Epoch  48 Batch    0/16   train_loss = 0.173
Epoch  52 Batch    0/16   train_loss = 0.137
Epoch  56 Batch    0/16   train_loss = 0.119
Epoch  60 Batch    0/16   train_loss = 0.103
Epoch  64 Batch    0/16   train_loss = 0.097
Epoch  68 Batch    0/16   train_loss = 0.094
Epoch  72 Batch    0/16   train_loss = 0.091
Epoch  76 Batch    0/16   train_loss = 0.089
Epoch  80 Batch    0/16   train_loss = 0.088
Epoch  84 Batch    0/16   train_loss = 0.086
Epoch  88 

## Saving Parameters
Saving `seq_length` and `save_dir` for generating a new TV script.

In [28]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))

# Checkpoint: 2
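At this point, the training loss has come down to about 0.084 (the notebook does not track a separate validation loss) and the trained model has been saved, so we can restart from here if needed.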

In [60]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implementing the Generate Functions
### Get Tensors
Getting tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Loading the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

The function below returns the tensors in the following tuple `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)`

In [61]:
def get_tensors(loaded_graph):
    """
    Loads input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    InputTensor = loaded_graph.get_tensor_by_name('input:0')
    InitialStateTensor = loaded_graph.get_tensor_by_name('initial_state:0')
    FinalStateTensor = loaded_graph.get_tensor_by_name('final_state:0')
    ProbsTensor = loaded_graph.get_tensor_by_name('probs:0')
    return InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor


"""
UDACITY TEST:
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

Tests Passed


### Choosing Word
Implementing the `pick_word()` function to select the next word using `probabilities`.

In [62]:
def pick_word(probabilities, int_to_vocab):
    """
    Picks the next word in the generated text
    :param probabilities: Probabilites of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
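    # Sample a word id by inverting the cumulative sum of the probabilities
    word_id = int(np.searchsorted(np.cumsum(probabilities), np.random.rand()))
    # Guard against rounding: the cumulative sum can end slightly below 1.0
    word_id = min(word_id, len(probabilities) - 1)
    return int_to_vocab[word_id]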


"""
UDACITY TEST
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

Tests Passed
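
Picking the word through the cumulative sum is a form of inverse-CDF sampling: a uniform random number lands in the interval belonging to word `i` with probability `probabilities[i]`. The sketch below checks this on a toy distribution by drawing many samples and comparing the observed frequencies with the probabilities. The three-word vocabulary, the probabilities, and the seed are made up for illustration, and the index is clipped to the last valid id as a guard against floating-point rounding in the cumulative sum.

```python
import numpy as np


def sample_word(probabilities, int_to_vocab, rng):
    """Draw one word by inverse-CDF sampling from `probabilities`."""
    cdf = np.cumsum(probabilities)
    idx = int(np.searchsorted(cdf, rng.rand()))
    idx = min(idx, len(probabilities) - 1)  # guard against cdf[-1] < 1.0
    return int_to_vocab[idx]


# Hypothetical toy vocabulary and next-word distribution
toy_vocab = {0: 'moe', 1: 'homer', 2: 'barney'}
toy_probs = np.array([0.7, 0.2, 0.1])

rng = np.random.RandomState(42)
draws = [sample_word(toy_probs, toy_vocab, rng) for _ in range(10000)]

# Observed frequencies should be close to 0.7, 0.2 and 0.1
freq = {word: draws.count(word) / len(draws) for word in toy_vocab.values()}
print(freq)
```

Sampling instead of always taking the most likely word keeps the generated script from repeating the same phrase over and over, at the cost of occasionally choosing an unlikely word.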


## Generating the TV Script
Now we will generate the TV script.  Setting `gen_length` to the length of TV script we want to generate. For now we will just set a shorter length like 250.

In [63]:
gen_length = 250
# homer_simpson, moe_szyslak, or Barney_Gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)

INFO:tensorflow:Restoring parameters from ./save


moe_szyslak: hey! hey! aw, that's great.
barney_gumble: if you were such big game last, homer.
carl_carlson: say, bob, how come you were never able to kill bart?
moe_szyslak: yeah, a kid should be real at the expense of fistiana.
moe_szyslak: uh, james watt invented the steam engine.
homer_simpson: the best thing?
bart_simpson: wow, it was, it's like there's a sandwich. or my love...
homer_simpson:(singing) hello...
chief_wiggum:(singing) hello...
apu_nahasapeemapetilon:(singing) hello...
all:(murmur understanding assent) oh, i didn't see any that went my head.
are(shocked, nobody could bart to do!
moe_szyslak: calm down, little fella.
moe_szyslak: uh, no. it's probably due to your ugliness. but that doesn't mean-- that's well, i put this up recently, and it's a good team, moe. i got a date with my wife.
homer_simpson: don't worry, moe. i need cash to my wife.(getting idea) y'know, i gotta tell ya everybody, barn.
barney_gumble: no, my god, where i couldn't your flashbacks?
moe_szyslak

# Results

### The TV Script is Nonsensical
We can see that the TV script doesn't make a lot of sense. This is because we trained on less than a megabyte of text. In order to get good results, we'll have to use a smaller vocabulary or get more data. Luckily there's more data! As mentioned at the beginning of this project, the data used here is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) on Kaggle. I have not trained on all the data, because that would take too long. Training on the full dataset is a future improvement that I plan to attempt after completing the projects of this nanodegree course.

# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.