# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs.  You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.  The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset that consists only of the scenes in Moe's Tavern.  It excludes other versions of the tavern, such as "Moe's Cavern", "Flaming Moe's", and "Uncle Moe's Family Feed-Bag".

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [2]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.251908396946565
Number of lines: 4258
Average number of words in each line: 11.50164396430249

The sentences 0 to 10:

Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.



## Implement Preprocessing Functions
Preprocessing is the first step with any dataset.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- Dictionary to go from the words to an id, we'll call `vocab_to_int`
- Dictionary to go from the id to word, we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`

In [3]:
import numpy as np
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # Collect the unique words, then number them in a single pass
    words = list(set(text))
    vocab_to_int = {word: i for i, word in enumerate(words)}
    int_to_vocab = {i: word for i, word in enumerate(words)}
    return vocab_to_int, int_to_vocab


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
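
As a quick check of how the two dictionaries work together, the sketch below encodes a toy word list and decodes it again. The helper `build_lookup_tables` and the toy sentence are hypothetical and exist only for this illustration. Sorting the vocabulary is an assumption made here so that the ids are reproducible between runs; the project does not require it.

```python
def build_lookup_tables(words):
    """Hypothetical variant of create_lookup_tables with a sorted, reproducible vocabulary."""
    vocab = sorted(set(words))
    vocab_to_int = {word: idx for idx, word in enumerate(vocab)}
    int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


# Toy data, not taken from the Simpsons dataset
toy_words = 'moe pours homer a beer and homer drinks a beer'.split()
toy_vocab_to_int, toy_int_to_vocab = build_lookup_tables(toy_words)

# Encode the words as ids, then map the ids back to words
encoded = [toy_vocab_to_int[word] for word in toy_words]
decoded = [toy_int_to_vocab[idx] for idx in encoded]

print(encoded)
print(decoded == toy_words)
```

Repeated words such as "homer" and "beer" map to the same id, which is what lets the embedding layer share one vector per word.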


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parenthesis ( ( )
- Right Parenthesis ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one.  This separates each symbol into its own word, making it easier for the neural network to predict the next word.  Make sure you don't use a token that could be confused with a word.  Instead of using the token "dash", try using something like "||dash||".

In [4]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # Map each punctuation symbol to a token that cannot be mistaken for a word
    return {
        ".": "||Period||",
        ",": "||Comma||",
        "\"": "||QuotationMark||",
        ";": "||Semicolon||",
        "!": "||ExclamationMark||",
        "?": "||QuestionMark||",
        "(": "||LeftParenthesis||",
        ")": "||RightParenthesis||",
        "--": "||Dash||",
        "\n": "||Return||",
    }

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed
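
To see why the tokens help, here is a minimal sketch of how such a dictionary could be applied to raw text. The function `apply_token_lookup` and the three-entry `demo_token_dict` are hypothetical. How `helper.preprocess_and_save_data` actually applies the dictionary is not shown in this notebook, so treat this as one plausible approach and not as the helper's implementation.

```python
def apply_token_lookup(text, token_dict):
    """Lowercase the text, pad each symbol's token with spaces, then split on whitespace."""
    text = text.lower()
    for symbol, token in token_dict.items():
        # The surrounding spaces make the token a separate word after splitting
        text = text.replace(symbol, ' {} '.format(token))
    return text.split()


# Reduced dictionary for the demo; the real one covers all ten symbols
demo_token_dict = {'!': '||Exclamation_Mark||', '.': '||Period||', '\n': '||Return||'}
demo_words = apply_token_lookup('Bye!\nSee you.', demo_token_dict)
print(demo_words)
```

After this step "bye" and "bye!" share the word "bye", and the exclamation mark becomes its own vocabulary entry.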


## Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [5]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Check Point
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
You'll build the components necessary to build an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### Check the Version of TensorFlow and Access to GPU

In [7]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.0.0
Default GPU Device: /gpu:0


### Input
Implement the `get_inputs()` function to create TF Placeholders for the Neural Network.  It should create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [8]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
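    # Word ids are integers, so input and targets use tf.int32;
    # the two unknown dimensions are [batch size, sequence length]
    input_text = tf.placeholder(tf.int32, shape=[None, None], name="input")
    targets = tf.placeholder(tf.int32, shape=[None, None], name="targets")
    learning_rate = tf.placeholder(tf.float32, name="LearningRate")
    return input_text, targets, learning_rate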


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize Cell State using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [9]:
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initialize state)
    """
    # Number of stacked LSTM layers
    rnn_layer_size = 2
    cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicLSTMCell(rnn_size) for _ in range(rnn_layer_size)])
    init_state = tf.identity(cell.zero_state(batch_size, tf.float32), name="initial_state")
    return cell, init_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### Word Embedding
Apply embedding to `input_data` using TensorFlow.  Return the embedded sequence.

In [10]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # Trainable lookup matrix with one row of embed_dim values per vocabulary word
    word_embeddings = tf.get_variable("word_embeddings", [vocab_size, embed_dim])
    # Replace each word id with its embedding row
    embedded_input = tf.nn.embedding_lookup(word_embeddings, input_data)
    return embedded_input


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed
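
Conceptually, an embedding lookup is row indexing into a matrix. The NumPy sketch below mirrors that idea with small, made-up sizes. It illustrates the operation only and is not the TensorFlow code path: the table here is random and not trained, and all names and values are assumptions of this example.

```python
import numpy as np

vocab_size, embed_dim = 6, 4

# Stand-in for the trainable embedding matrix: one row per word id
rng = np.random.RandomState(0)
embedding_table = rng.uniform(-1, 1, size=(vocab_size, embed_dim))

# A batch of 2 sequences with 3 word ids each
input_ids = np.array([[0, 2, 5],
                      [3, 3, 1]])

# Integer-array indexing picks one row per id and keeps the batch layout
embedded = embedding_table[input_ids]
print(embedded.shape)
```

The result has shape `(batch size, sequence length, embed_dim)`, and the two occurrences of id 3 receive identical vectors.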


### Build RNN
You created an RNN cell in the `get_init_cell()` function.  Time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
 - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple `(Outputs, FinalState)`

In [11]:
def build_rnn(cell, inputs):
    """
    Create a RNN using a RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
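    # dtype is required here because no initial state is passed in
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    # Name the final state so it can be fetched from the graph by name
    final_state = tf.identity(final_state, name="final_state")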
    return outputs, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:
- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple `(Logits, FinalState)`

In [12]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
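    # Debug print of the arguments received
    print("rnn size", rnn_size, "input data", input_data, "vocab_size", vocab_size, "embed dim", embed_dim)
    # Embed the word ids, then run the embedded sequence through the RNN
    embedded_inputs = get_embed(input_data, vocab_size, embed_dim)
    outputs, final_state = build_rnn(cell, embedded_inputs)
    # activation=None keeps the fully connected layer linear; one logit per vocabulary word
    logits = tf.layers.dense(outputs, vocab_size, activation=None)
    return logits, final_state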


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

rnn size 256 input data Tensor("Placeholder:0", shape=(128, 5), dtype=int32) vocab_size 27 embed dim 300
Tests Passed
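
The shape bookkeeping of the final layer can be checked without TensorFlow. The sketch below applies a linear projection to stand-in RNN outputs in NumPy and turns the logits into word probabilities. All sizes, names, and random values are assumptions chosen for illustration; they are not the network's trained weights.

```python
import numpy as np

batch_size, seq_length, rnn_size, vocab_size = 2, 3, 8, 5

rng = np.random.RandomState(1)
# Stand-in for the RNN outputs: one rnn_size vector per time step
rnn_outputs = rng.randn(batch_size, seq_length, rnn_size)
# Fully connected layer with a linear activation: weights and bias only
weights = rng.randn(rnn_size, vocab_size) * 0.1
bias = np.zeros(vocab_size)

# The matrix product is applied to the last axis of every time step
logits = rnn_outputs @ weights + bias

# Numerically stable softmax over the vocabulary axis
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(logits.shape)
print(probs.sum(axis=-1))
```

Each time step ends up with `vocab_size` logits, which is why the layer's output size must equal the vocabulary size.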


### Batches
Implement `get_batches` to create batches of input and targets using `int_text`.  The batches should be a NumPy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return a NumPy array of the following:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Notice that the last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.
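
The layout in the example can also be produced with array reshaping in place of explicit loops. The sketch below is one possible vectorized formulation under the rules stated above (drop the incomplete batch, wrap the last target around to the first input). The name `get_batches_vectorized` is hypothetical, and this is an alternative illustration, not the implementation the project expects.

```python
import numpy as np


def get_batches_vectorized(int_text, batch_size, seq_length):
    """Hypothetical loop-free variant that builds the same layout as the example above."""
    n_batches = len(int_text) // (batch_size * seq_length)
    n_used = n_batches * batch_size * seq_length

    # Keep only the words that fill complete batches
    inputs = np.array(int_text[:n_used])
    # Targets are the inputs shifted by one; the last target wraps to the first input
    targets = np.roll(inputs, -1)

    # Row i of each batch comes from the i-th contiguous stretch of the text
    x = inputs.reshape(batch_size, n_batches, seq_length).transpose(1, 0, 2)
    y = targets.reshape(batch_size, n_batches, seq_length).transpose(1, 0, 2)

    # Final shape: (number of batches, 2, batch size, sequence length)
    return np.stack([x, y], axis=1)


example_batches = get_batches_vectorized(list(range(1, 21)), 3, 2)
print(example_batches.shape)
print(example_batches[2, 1])
```

With 20 words, a batch size of 3, and a sequence length of 2, only 18 words are used, so the words 19 and 20 never appear in any batch.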

In [16]:
def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    # Only full batches are used; leftover words at the end of the text are dropped
    num_batches = len(int_text) // (batch_size * seq_length)
    print("text int length", len(int_text))
    print("batch size", batch_size)
    print("seq_length", seq_length)
    print("num batches ", num_batches)
    batches = np.zeros((num_batches, 2, batch_size, seq_length), dtype="int")
    print("batches shape ", batches.shape)
    for batch_no in range(num_batches):
        for i in range(batch_size):
            for j in range(seq_length):
                # Row i owns a contiguous stretch of num_batches * seq_length words;
                # batch batch_no takes the next seq_length words from that stretch
                position_formula = j + i * num_batches * seq_length + batch_no * seq_length
                # Verbose per-element trace, useful only while verifying the indexing
                print("batch no ", batch_no, "i ", i, "j ", j, "position formula ", position_formula)
                batches[batch_no, 0, i, j] = int_text[position_formula]
                # Each target is the next word; the very last target wraps to the first word
                if batch_no == (num_batches - 1) and i == (batch_size - 1) and j == (seq_length - 1):
                    batches[batch_no, 1, i, j] = int_text[0]
                else:
                    batches[batch_no, 1, i, j] = int_text[position_formula + 1]
    print(batches)
    return batches


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

text int length 5000
batch size 128
seq_length 5
num batches  7
batches shape  (7, 2, 128, 5)
batch no  0 i  0 j  0 position formula  0
batch no  0 i  0 j  1 position formula  1
batch no  0 i  0 j  2 position formula  2
batch no  0 i  0 j  3 position formula  3
batch no  0 i  0 j  4 position formula  4
batch no  0 i  1 j  0 position formula  35
batch no  0 i  1 j  1 position formula  36
batch no  0 i  1 j  2 position formula  37
batch no  0 i  1 j  3 position formula  38
batch no  0 i  1 j  4 position formula  39
batch no  0 i  2 j  0 position formula  70
batch no  0 i  2 j  1 position formula  71
batch no  0 i  2 j  2 position formula  72
batch no  0 i  2 j  3 position formula  73
batch no  0 i  2 j  4 position formula  74
batch no  0 i  3 j  0 position formula  105
batch no  0 i  3 j  1 position formula  106
batch no  0 i  3 j  2 position formula  107
batch no  0 i  3 j  3 position formula  108
batch no  0 i  3 j  4 position formula  109
batch no  0 i  4 j  0 position formula  140
ba

batch no  0 i  57 j  1 position formula  1996
batch no  0 i  57 j  2 position formula  1997
batch no  0 i  57 j  3 position formula  1998
batch no  0 i  57 j  4 position formula  1999
batch no  0 i  58 j  0 position formula  2030
batch no  0 i  58 j  1 position formula  2031
batch no  0 i  58 j  2 position formula  2032
batch no  0 i  58 j  3 position formula  2033
batch no  0 i  58 j  4 position formula  2034
batch no  0 i  59 j  0 position formula  2065
batch no  0 i  59 j  1 position formula  2066
batch no  0 i  59 j  2 position formula  2067
batch no  0 i  59 j  3 position formula  2068
batch no  0 i  59 j  4 position formula  2069
batch no  0 i  60 j  0 position formula  2100
batch no  0 i  60 j  1 position formula  2101
batch no  0 i  60 j  2 position formula  2102
batch no  0 i  60 j  3 position formula  2103
batch no  0 i  60 j  4 position formula  2104
batch no  0 i  61 j  0 position formula  2135
batch no  0 i  61 j  1 position formula  2136
batch no  0 i  61 j  2 position fo

batch no  0 i  106 j  2 position formula  3712
batch no  0 i  106 j  3 position formula  3713
batch no  0 i  106 j  4 position formula  3714
batch no  0 i  107 j  0 position formula  3745
batch no  0 i  107 j  1 position formula  3746
batch no  0 i  107 j  2 position formula  3747
batch no  0 i  107 j  3 position formula  3748
batch no  0 i  107 j  4 position formula  3749
batch no  0 i  108 j  0 position formula  3780
batch no  0 i  108 j  1 position formula  3781
batch no  0 i  108 j  2 position formula  3782
batch no  0 i  108 j  3 position formula  3783
batch no  0 i  108 j  4 position formula  3784
batch no  0 i  109 j  0 position formula  3815
batch no  0 i  109 j  1 position formula  3816
batch no  0 i  109 j  2 position formula  3817
batch no  0 i  109 j  3 position formula  3818
batch no  0 i  109 j  4 position formula  3819
batch no  0 i  110 j  0 position formula  3850
batch no  0 i  110 j  1 position formula  3851
batch no  0 i  110 j  2 position formula  3852
batch no  0 i

batch no  1 i  28 j  2 position formula  987
batch no  1 i  28 j  3 position formula  988
batch no  1 i  28 j  4 position formula  989
batch no  1 i  29 j  0 position formula  1020
batch no  1 i  29 j  1 position formula  1021
batch no  1 i  29 j  2 position formula  1022
batch no  1 i  29 j  3 position formula  1023
batch no  1 i  29 j  4 position formula  1024
batch no  1 i  30 j  0 position formula  1055
batch no  1 i  30 j  1 position formula  1056
batch no  1 i  30 j  2 position formula  1057
batch no  1 i  30 j  3 position formula  1058
batch no  1 i  30 j  4 position formula  1059
batch no  1 i  31 j  0 position formula  1090
batch no  1 i  31 j  1 position formula  1091
batch no  1 i  31 j  2 position formula  1092
batch no  1 i  31 j  3 position formula  1093
batch no  1 i  31 j  4 position formula  1094
batch no  1 i  32 j  0 position formula  1125
batch no  1 i  32 j  1 position formula  1126
batch no  1 i  32 j  2 position formula  1127
batch no  1 i  32 j  3 position formu

batch no  1 i  78 j  2 position formula  2737
batch no  1 i  78 j  3 position formula  2738
batch no  1 i  78 j  4 position formula  2739
batch no  1 i  79 j  0 position formula  2770
batch no  1 i  79 j  1 position formula  2771
batch no  1 i  79 j  2 position formula  2772
batch no  1 i  79 j  3 position formula  2773
batch no  1 i  79 j  4 position formula  2774
batch no  1 i  80 j  0 position formula  2805
batch no  1 i  80 j  1 position formula  2806
batch no  1 i  80 j  2 position formula  2807
batch no  1 i  80 j  3 position formula  2808
batch no  1 i  80 j  4 position formula  2809
batch no  1 i  81 j  0 position formula  2840
batch no  1 i  81 j  1 position formula  2841
batch no  1 i  81 j  2 position formula  2842
batch no  1 i  81 j  3 position formula  2843
batch no  1 i  81 j  4 position formula  2844
batch no  1 i  82 j  0 position formula  2875
batch no  1 i  82 j  1 position formula  2876
batch no  1 i  82 j  2 position formula  2877
batch no  1 i  82 j  3 position fo

batch no  2 i  0 j  2 position formula  12
batch no  2 i  0 j  3 position formula  13
batch no  2 i  0 j  4 position formula  14
batch no  2 i  1 j  0 position formula  45
batch no  2 i  1 j  1 position formula  46
batch no  2 i  1 j  2 position formula  47
batch no  2 i  1 j  3 position formula  48
batch no  2 i  1 j  4 position formula  49
batch no  2 i  2 j  0 position formula  80
batch no  2 i  2 j  1 position formula  81
batch no  2 i  2 j  2 position formula  82
batch no  2 i  2 j  3 position formula  83
batch no  2 i  2 j  4 position formula  84
batch no  2 i  3 j  0 position formula  115
batch no  2 i  3 j  1 position formula  116
batch no  2 i  3 j  2 position formula  117
batch no  2 i  3 j  3 position formula  118
batch no  2 i  3 j  4 position formula  119
batch no  2 i  4 j  0 position formula  150
batch no  2 i  4 j  1 position formula  151
batch no  2 i  4 j  2 position formula  152
batch no  2 i  4 j  3 position formula  153
batch no  2 i  4 j  4 position formula  154
b

batch no  2 i  50 j  2 position formula  1762
batch no  2 i  50 j  3 position formula  1763
batch no  2 i  50 j  4 position formula  1764
batch no  2 i  51 j  0 position formula  1795
batch no  2 i  51 j  1 position formula  1796
batch no  2 i  51 j  2 position formula  1797
batch no  2 i  51 j  3 position formula  1798
batch no  2 i  51 j  4 position formula  1799
batch no  2 i  52 j  0 position formula  1830
batch no  2 i  52 j  1 position formula  1831
batch no  2 i  52 j  2 position formula  1832
batch no  2 i  52 j  3 position formula  1833
batch no  2 i  52 j  4 position formula  1834
batch no  2 i  53 j  0 position formula  1865
batch no  2 i  53 j  1 position formula  1866
batch no  2 i  53 j  2 position formula  1867
batch no  2 i  53 j  3 position formula  1868
batch no  2 i  53 j  4 position formula  1869
batch no  2 i  54 j  0 position formula  1900
batch no  2 i  54 j  1 position formula  1901
batch no  2 i  54 j  2 position formula  1902
batch no  2 i  54 j  3 position fo

batch no  2 i  100 j  2 position formula  3512
batch no  2 i  100 j  3 position formula  3513
batch no  2 i  100 j  4 position formula  3514
batch no  2 i  101 j  0 position formula  3545
batch no  2 i  101 j  1 position formula  3546
batch no  2 i  101 j  2 position formula  3547
batch no  2 i  101 j  3 position formula  3548
batch no  2 i  101 j  4 position formula  3549
batch no  2 i  102 j  0 position formula  3580
batch no  2 i  102 j  1 position formula  3581
batch no  2 i  102 j  2 position formula  3582
batch no  2 i  102 j  3 position formula  3583
batch no  2 i  102 j  4 position formula  3584
batch no  2 i  103 j  0 position formula  3615
batch no  2 i  103 j  1 position formula  3616
batch no  2 i  103 j  2 position formula  3617
batch no  2 i  103 j  3 position formula  3618
batch no  2 i  103 j  4 position formula  3619
batch no  2 i  104 j  0 position formula  3650
batch no  2 i  104 j  1 position formula  3651
batch no  2 i  104 j  2 position formula  3652
batch no  2 i

batch no  3 i  22 j  2 position formula  787
batch no  3 i  22 j  3 position formula  788
batch no  3 i  22 j  4 position formula  789
batch no  3 i  23 j  0 position formula  820
batch no  3 i  23 j  1 position formula  821
batch no  3 i  23 j  2 position formula  822
batch no  3 i  23 j  3 position formula  823
batch no  3 i  23 j  4 position formula  824
batch no  3 i  24 j  0 position formula  855
batch no  3 i  24 j  1 position formula  856
batch no  3 i  24 j  2 position formula  857
batch no  3 i  24 j  3 position formula  858
batch no  3 i  24 j  4 position formula  859
batch no  3 i  25 j  0 position formula  890
batch no  3 i  25 j  1 position formula  891
batch no  3 i  25 j  2 position formula  892
batch no  3 i  25 j  3 position formula  893
batch no  3 i  25 j  4 position formula  894
batch no  3 i  26 j  0 position formula  925
batch no  3 i  26 j  1 position formula  926
batch no  3 i  26 j  2 position formula  927
batch no  3 i  26 j  3 position formula  928
batch no  

batch no  3 i  72 j  2 position formula  2537
batch no  3 i  72 j  3 position formula  2538
batch no  3 i  72 j  4 position formula  2539
batch no  3 i  73 j  0 position formula  2570
batch no  3 i  73 j  1 position formula  2571
batch no  3 i  73 j  2 position formula  2572
batch no  3 i  73 j  3 position formula  2573
batch no  3 i  73 j  4 position formula  2574
batch no  3 i  74 j  0 position formula  2605
batch no  3 i  74 j  1 position formula  2606
batch no  3 i  74 j  2 position formula  2607
batch no  3 i  74 j  3 position formula  2608
batch no  3 i  74 j  4 position formula  2609
batch no  3 i  75 j  0 position formula  2640
batch no  3 i  75 j  1 position formula  2641
batch no  3 i  75 j  2 position formula  2642
batch no  3 i  75 j  3 position formula  2643
batch no  3 i  75 j  4 position formula  2644
batch no  3 i  76 j  0 position formula  2675
batch no  3 i  76 j  1 position formula  2676
batch no  3 i  76 j  2 position formula  2677
batch no  3 i  76 j  3 position fo

batch no  3 i  122 j  2 position formula  4287
batch no  3 i  122 j  3 position formula  4288
batch no  3 i  122 j  4 position formula  4289
batch no  3 i  123 j  0 position formula  4320
batch no  3 i  123 j  1 position formula  4321
batch no  3 i  123 j  2 position formula  4322
batch no  3 i  123 j  3 position formula  4323
batch no  3 i  123 j  4 position formula  4324
batch no  3 i  124 j  0 position formula  4355
batch no  3 i  124 j  1 position formula  4356
batch no  3 i  124 j  2 position formula  4357
batch no  3 i  124 j  3 position formula  4358
batch no  3 i  124 j  4 position formula  4359
batch no  3 i  125 j  0 position formula  4390
batch no  3 i  125 j  1 position formula  4391
batch no  3 i  125 j  2 position formula  4392
batch no  3 i  125 j  3 position formula  4393
batch no  3 i  125 j  4 position formula  4394
batch no  3 i  126 j  0 position formula  4425
batch no  3 i  126 j  1 position formula  4426
batch no  3 i  126 j  2 position formula  4427
batch no  3 i

batch no  4 i  44 j  2 position formula  1562
batch no  4 i  44 j  3 position formula  1563
batch no  4 i  44 j  4 position formula  1564
batch no  4 i  45 j  0 position formula  1595
batch no  4 i  45 j  1 position formula  1596
batch no  4 i  45 j  2 position formula  1597
batch no  4 i  45 j  3 position formula  1598
batch no  4 i  45 j  4 position formula  1599
batch no  4 i  46 j  0 position formula  1630
batch no  4 i  46 j  1 position formula  1631
batch no  4 i  46 j  2 position formula  1632
batch no  4 i  46 j  3 position formula  1633
batch no  4 i  46 j  4 position formula  1634
batch no  4 i  47 j  0 position formula  1665
batch no  4 i  47 j  1 position formula  1666
batch no  4 i  47 j  2 position formula  1667
batch no  4 i  47 j  3 position formula  1668
batch no  4 i  47 j  4 position formula  1669
batch no  4 i  48 j  0 position formula  1700
batch no  4 i  48 j  1 position formula  1701
batch no  4 i  48 j  2 position formula  1702
batch no  4 i  48 j  3 position fo

batch no  4 i  94 j  1 position formula  3311
batch no  4 i  94 j  2 position formula  3312
batch no  4 i  94 j  3 position formula  3313
batch no  4 i  94 j  4 position formula  3314
batch no  4 i  95 j  0 position formula  3345
batch no  4 i  95 j  1 position formula  3346
batch no  4 i  95 j  2 position formula  3347
batch no  4 i  95 j  3 position formula  3348
batch no  4 i  95 j  4 position formula  3349
batch no  4 i  96 j  0 position formula  3380
batch no  4 i  96 j  1 position formula  3381
batch no  4 i  96 j  2 position formula  3382
batch no  4 i  96 j  3 position formula  3383
batch no  4 i  96 j  4 position formula  3384
batch no  4 i  97 j  0 position formula  3415
batch no  4 i  97 j  1 position formula  3416
batch no  4 i  97 j  2 position formula  3417
batch no  4 i  97 j  3 position formula  3418
batch no  4 i  97 j  4 position formula  3419
batch no  4 i  98 j  0 position formula  3450
batch no  4 i  98 j  1 position formula  3451
batch no  4 i  98 j  2 position fo

batch no  5 i  16 j  1 position formula  586
batch no  5 i  16 j  2 position formula  587
batch no  5 i  16 j  3 position formula  588
batch no  5 i  16 j  4 position formula  589
batch no  5 i  17 j  0 position formula  620
batch no  5 i  17 j  1 position formula  621
batch no  5 i  17 j  2 position formula  622
batch no  5 i  17 j  3 position formula  623
batch no  5 i  17 j  4 position formula  624
batch no  5 i  18 j  0 position formula  655
batch no  5 i  18 j  1 position formula  656
batch no  5 i  18 j  2 position formula  657
batch no  5 i  18 j  3 position formula  658
batch no  5 i  18 j  4 position formula  659
batch no  5 i  19 j  0 position formula  690
batch no  5 i  19 j  1 position formula  691
batch no  5 i  19 j  2 position formula  692
batch no  5 i  19 j  3 position formula  693
batch no  5 i  19 j  4 position formula  694
batch no  5 i  20 j  0 position formula  725
batch no  5 i  20 j  1 position formula  726
batch no  5 i  20 j  2 position formula  727
batch no  

batch no  5 i  66 j  1 position formula  2336
batch no  5 i  66 j  2 position formula  2337
batch no  5 i  66 j  3 position formula  2338
batch no  5 i  66 j  4 position formula  2339
batch no  5 i  67 j  0 position formula  2370
batch no  5 i  67 j  1 position formula  2371
batch no  5 i  67 j  2 position formula  2372
batch no  5 i  67 j  3 position formula  2373
batch no  5 i  67 j  4 position formula  2374
batch no  5 i  68 j  0 position formula  2405
batch no  5 i  68 j  1 position formula  2406
batch no  5 i  68 j  2 position formula  2407
batch no  5 i  68 j  3 position formula  2408
batch no  5 i  68 j  4 position formula  2409
batch no  5 i  69 j  0 position formula  2440
batch no  5 i  69 j  1 position formula  2441
batch no  5 i  69 j  2 position formula  2442
batch no  5 i  69 j  3 position formula  2443
batch no  5 i  69 j  4 position formula  2444
batch no  5 i  70 j  0 position formula  2475
batch no  5 i  70 j  1 position formula  2476
batch no  5 i  70 j  2 position fo

batch no  5 i  116 j  1 position formula  4086
batch no  5 i  116 j  2 position formula  4087
batch no  5 i  116 j  3 position formula  4088
batch no  5 i  116 j  4 position formula  4089
batch no  5 i  117 j  0 position formula  4120
batch no  5 i  117 j  1 position formula  4121
batch no  5 i  117 j  2 position formula  4122
batch no  5 i  117 j  3 position formula  4123
batch no  5 i  117 j  4 position formula  4124
batch no  5 i  118 j  0 position formula  4155
batch no  5 i  118 j  1 position formula  4156
batch no  5 i  118 j  2 position formula  4157
batch no  5 i  118 j  3 position formula  4158
batch no  5 i  118 j  4 position formula  4159
batch no  5 i  119 j  0 position formula  4190
batch no  5 i  119 j  1 position formula  4191
batch no  5 i  119 j  2 position formula  4192
batch no  5 i  119 j  3 position formula  4193
batch no  5 i  119 j  4 position formula  4194
batch no  5 i  120 j  0 position formula  4225
batch no  5 i  120 j  1 position formula  4226
batch no  5 i

batch no  6 i  38 j  1 position formula  1361
batch no  6 i  38 j  2 position formula  1362
batch no  6 i  38 j  3 position formula  1363
batch no  6 i  38 j  4 position formula  1364
batch no  6 i  39 j  0 position formula  1395
batch no  6 i  39 j  1 position formula  1396
batch no  6 i  39 j  2 position formula  1397
batch no  6 i  39 j  3 position formula  1398
batch no  6 i  39 j  4 position formula  1399
batch no  6 i  40 j  0 position formula  1430
batch no  6 i  40 j  1 position formula  1431
batch no  6 i  40 j  2 position formula  1432
batch no  6 i  40 j  3 position formula  1433
batch no  6 i  40 j  4 position formula  1434
batch no  6 i  41 j  0 position formula  1465
batch no  6 i  41 j  1 position formula  1466
batch no  6 i  41 j  2 position formula  1467
batch no  6 i  41 j  3 position formula  1468
batch no  6 i  41 j  4 position formula  1469
batch no  6 i  42 j  0 position formula  1500
batch no  6 i  42 j  1 position formula  1501
batch no  6 i  42 j  2 position fo

batch no  6 i  88 j  1 position formula  3111
batch no  6 i  88 j  2 position formula  3112
batch no  6 i  88 j  3 position formula  3113
batch no  6 i  88 j  4 position formula  3114
batch no  6 i  89 j  0 position formula  3145
batch no  6 i  89 j  1 position formula  3146
batch no  6 i  89 j  2 position formula  3147
batch no  6 i  89 j  3 position formula  3148
batch no  6 i  89 j  4 position formula  3149
batch no  6 i  90 j  0 position formula  3180
batch no  6 i  90 j  1 position formula  3181
batch no  6 i  90 j  2 position formula  3182
batch no  6 i  90 j  3 position formula  3183
batch no  6 i  90 j  4 position formula  3184
batch no  6 i  91 j  0 position formula  3215
batch no  6 i  91 j  1 position formula  3216
batch no  6 i  91 j  2 position formula  3217
batch no  6 i  91 j  3 position formula  3218
batch no  6 i  91 j  4 position formula  3219
batch no  6 i  92 j  0 position formula  3250
batch no  6 i  92 j  1 position formula  3251
batch no  6 i  92 j  2 position fo
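The logged positions above appear to follow one index formula: `i * (n_batches * seq_length) + batch_no * seq_length + j`. The sketch below checks that reading against a few logged values. The values `n_batches = 7` and `seq_length = 5` are inferred from the log itself (positions advance by 35 per row and `j` runs from 0 to 4); they are assumptions, not taken from the code that printed it.

```python
def position(batch_no, i, j, n_batches=7, seq_length=5):
    """Flat index of element j in row i of batch batch_no (layout inferred from the log)."""
    return i * n_batches * seq_length + batch_no * seq_length + j

# (batch_no, i, j, logged position) copied from the output above
logged = [(5, 68, 0, 2405), (5, 116, 1, 4086), (6, 38, 1, 1361), (6, 92, 1, 3251)]
for batch_no, i, j, expected in logged:
    assert position(batch_no, i, j) == expected
```

Under this reading, consecutive batches continue each row where the previous batch left off.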

## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of passes over the training data.
- Set `batch_size` to the number of sequences in each batch.
- Set `rnn_size` to the number of hidden units in each RNN cell.
- Set `embed_dim` to the size of the word embedding vectors.
- Set `seq_length` to the number of words in each training sequence.
- Set `learning_rate` to the learning rate passed to the optimizer.
- Set `show_every_n_batches` to the number of batches between progress printouts.
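To see how these settings interact, the sketch below estimates how many batches one epoch contains and how many weight updates a full run performs. It assumes that `get_batches` keeps only full batches of `batch_size * seq_length` words. `n_words` is a made-up corpus size for illustration, not a measurement of this dataset.

```python
def batches_per_epoch(n_words, batch_size, seq_length):
    """Number of full batches, assuming leftover words are dropped."""
    return n_words // (batch_size * seq_length)

n_words = 50_000   # hypothetical corpus size, for illustration only
num_epochs = 100

per_epoch = batches_per_epoch(n_words, batch_size=128, seq_length=10)
total_updates = per_epoch * num_epochs
print(per_epoch, total_updates)
```

Doubling `batch_size` or `seq_length` halves the number of updates per epoch, which is worth keeping in mind when choosing `num_epochs` and `show_every_n_batches`.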

In [None]:
# Number of Epochs
num_epochs = 100
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 256
# Embedding Dimension Size
embed_dim = 500
# Sequence Length
seq_length = 10
# Learning Rate
learning_rate = 0.01
# Show stats for every n number of batches
show_every_n_batches = 1000

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'

### Build the Graph
Build the graph using the neural network you implemented.

In [None]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
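The gradient clipping step in the cell above limits every gradient component to the range [-1, 1] before the optimizer applies it. The pure-Python sketch below illustrates that elementwise behaviour on a plain list. The function name and sample values are illustrative; it is a stand-in for the idea, not the TensorFlow op itself.

```python
def clip_by_value(values, low=-1.0, high=1.0):
    """Clamp each value to [low, high] independently."""
    return [max(low, min(high, v)) for v in values]

grads = [-3.2, -0.4, 0.0, 0.7, 12.5]   # made-up gradient components
print(clip_by_value(grads))            # [-1.0, -0.4, 0.0, 0.7, 1.0]
```

Because each component is clamped on its own, large gradients are bounded but their direction can change, unlike clipping by norm, which rescales the whole vector.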

## Train
Train the neural network on the preprocessed data.  If you have a hard time getting a good loss, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.

In [None]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')
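The progress condition in the training loop counts batches across epochs, so a large `show_every_n_batches` can mean very few printed lines. The sketch below replays only that condition to list which `(epoch_i, batch_i)` pairs would print. The value `n_batches=39` is a hypothetical stand-in for `len(batches)`.

```python
def logged_steps(num_epochs, n_batches, show_every_n_batches):
    """Return the (epoch_i, batch_i) pairs at which the training loop prints."""
    steps = []
    for epoch_i in range(num_epochs):
        for batch_i in range(n_batches):
            if (epoch_i * n_batches + batch_i) % show_every_n_batches == 0:
                steps.append((epoch_i, batch_i))
    return steps

# n_batches=39 is an assumed value, not taken from the notebook's data
print(logged_steps(num_epochs=100, n_batches=39, show_every_n_batches=1000))
```

With these assumed numbers only four lines are printed over the whole run, so a smaller `show_every_n_batches` may be more useful while tuning.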

## Save Parameters
Save `seq_length` and `save_dir` for generating a new TV script.

In [None]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))
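`helper.py` is not shown in this notebook, so how `save_params` stores the tuple is not visible here. A minimal sketch, assuming the helper simply serializes the tuple to disk with `pickle`, could look like the following. The function names and the temporary file path are hypothetical.

```python
import os
import pickle
import tempfile

def save_params_sketch(params, path):
    """Write the parameter tuple to disk (assumed behaviour, not the real helper)."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)

def load_params_sketch(path):
    """Read the parameter tuple back."""
    with open(path, 'rb') as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), 'params.p')   # hypothetical location
save_params_sketch((10, './save'), path)
seq_length, load_dir = load_params_sketch(path)
print(seq_length, load_dir)
```

Saving these two values lets the generation cells run after a kernel restart without retraining.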

# Checkpoint

In [None]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Get the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)` 

In [None]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # TODO: Implement Function
    return None, None, None, None


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)
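One possible shape for this function is sketched below: look up each of the four names with `get_tensor_by_name()` and return the results in the required order. `FakeGraph` is a hypothetical stand-in so the sketch runs without TensorFlow; in the notebook you would pass the real `loaded_graph`.

```python
def get_tensors_sketch(loaded_graph):
    """Fetch the four tensors by name, in the order the notebook expects."""
    names = ('input:0', 'initial_state:0', 'final_state:0', 'probs:0')
    return tuple(loaded_graph.get_tensor_by_name(name) for name in names)

class FakeGraph:
    """Hypothetical stand-in for tf.Graph, used only to exercise the sketch."""
    def get_tensor_by_name(self, name):
        return 'tensor<{}>'.format(name)

print(get_tensors_sketch(FakeGraph()))
```

The `:0` suffix refers to the first output of the operation with that name.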

### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.

In [None]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # TODO: Implement Function
    return None


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)
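A common approach is to sample the next word in proportion to its probability, rather than always taking the most likely word, which tends to produce repetitive text. The sketch below does this with the standard library's `random.choices`. It is one possible approach rather than the required solution, and the tiny vocabulary is made up.

```python
import random

def pick_word_sketch(probabilities, int_to_vocab):
    """Sample a word id weighted by its probability and return the word."""
    word_ids = range(len(probabilities))
    word_id = random.choices(word_ids, weights=probabilities, k=1)[0]
    return int_to_vocab[word_id]

# Hypothetical three-word vocabulary; all probability mass is on id 1
vocab = {0: 'moe_szyslak', 1: 'beer', 2: 'homer_simpson'}
print(pick_word_sketch([0.0, 1.0, 0.0], vocab))
```

With NumPy arrays, `np.random.choice(len(probabilities), p=probabilities)` expresses the same idea.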

## Generate TV Script
This will generate the TV script for you.  Set `gen_length` to the number of words you want the generated script to contain.

In [None]:
gen_length = 200
# homer_simpson, moe_szyslak, or barney_gumble
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)
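The last step of the cell above turns punctuation tokens back into symbols. The sketch below replays that replacement logic on a short word list. The `token_dict` entries and the sample words are hypothetical, chosen only to show the effect; the real dictionary comes from the preprocessing step.

```python
def detokenize(words, token_dict):
    """Join generated words and map punctuation tokens back to symbols."""
    script = ' '.join(words)
    for key, token in token_dict.items():
        # Drop the space before the token so punctuation attaches to the previous word
        script = script.replace(' ' + token.lower(), key)
    return script.replace('\n ', '\n').replace('( ', '(')

# Hypothetical token dictionary and generated words
token_dict = {',': '||comma||', '.': '||period||', '\n': '||return||'}
words = ['moe_szyslak:', 'hello', '||comma||', 'homer', '||period||',
         '||return||', 'homer_simpson:', 'hi']
print(detokenize(words, token_dict))
```

The two final replacements remove the stray space that the join leaves after a newline or an opening parenthesis.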

# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  In order to get good results, you'll have to use a smaller vocabulary or get more data.  Luckily there's more data!  As we mentioned in the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data, because that would take too long.  However, you are free to train your neural network on all the data after you complete the project.
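One way to use a smaller vocabulary is to keep only the most frequent words and map everything else to a single placeholder token. The sketch below shows that idea. The `<unk>` token, the function name, and the sample words are assumptions, not part of this project's helper code.

```python
from collections import Counter

def limit_vocab(words, max_size, unk='<unk>'):
    """Keep the max_size most frequent words; replace the rest with unk."""
    keep = {word for word, _ in Counter(words).most_common(max_size)}
    return [word if word in keep else unk for word in words]

words = 'moe beer homer beer moe duff beer'.split()   # made-up sample text
print(limit_vocab(words, max_size=2))
```

Rare words then share one embedding, which gives the network fewer outputs to learn from the same amount of text.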
# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.