# TV Script Generation

In this project, you'll generate your own [Seinfeld](https://en.wikipedia.org/wiki/Seinfeld) TV scripts using RNNs.  You'll be using part of the [Seinfeld dataset](https://www.kaggle.com/thec03u5/seinfeld-chronicles#scripts.csv) of scripts from 9 seasons.  The neural network you'll build will generate a new, "fake" TV script based on patterns it recognizes in this training data.

## Get the Data

The data is already provided for you in `./data/Seinfeld_Scripts.txt` and you're encouraged to open that file and look at the text. 
* As a first step, we'll load in this data and look at some samples.
* Then, you'll be tasked with defining and training an RNN to generate a new script!

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# load in data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)
# print(text[:2000])

## Explore the Data
Play around with `view_line_range` to view different parts of the data. This will give you a sense of the data you'll be working with. You can see, for example, that it is all lowercase text, and each new line of dialogue is separated by a newline character `\n`.

In [2]:
view_line_range = (0, 10)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
import numpy as np
from scipy import stats

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
keep_lines = [ line for line in lines if line != '']
lines = keep_lines
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
# print("word_count_line: ", word_count_line)
average_word_count_line = np.average(word_count_line)
print('Average number of words in each line: {}'.format(np.average(word_count_line)))
print('Median number of words in each line: {}'.format(np.median(word_count_line)))
print('Mode number of words in each line: {}'.format(stats.mode(word_count_line)))

print()
print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))


Dataset Stats
Roughly the number of unique words: 46367
Number of lines: 54616
Average number of words in each line: 11.088582100483375
Median number of words in each line: 8.0
Mode number of words in each line: ModeResult(mode=array([2]), count=array([4517]))

The lines 0 to 10:
jerry: do you know what this is all about? do you know, why were here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about we should go out? this is what theyre talking about...this whole thing, were all out now, no one is home. not one person here is home, were all out! there are people trying to find us, they dont know where we are. (on an imaginary phone) did you ring?, i cant find him. where did he go? he didnt tell me where he was going. he must have gone out. you wanna go out you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the res

---
## Implement Pre-processing Functions
The first thing to do to any dataset is pre-processing.  Implement the following pre-processing functions below:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- Dictionary to go from the words to an id, we'll call `vocab_to_int`
- Dictionary to go from the id to word, we'll call `int_to_vocab`

Return these dictionaries in the following **tuple** `(vocab_to_int, int_to_vocab)`

In [3]:
# imports for Pre-processing functions and testing those functions
import problem_unittests as tests
from collections import Counter
# import re

In [4]:
def create_lookup_tables(words):
    """
    Create lookup tables for vocabulary
    :param words: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    ## Build a dictionary that maps words to integers

    word_counts = Counter(words)
    # sorting the words from most to least frequent in text occurrence
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    # create int_to_vocab dictionary
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    # create vocab_to_int dictionary
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return (vocab_to_int, int_to_vocab)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
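
To see what the two dictionaries do, here is a small round trip on a made-up word list. The sample sentence is hypothetical and not taken from the dataset; the function body repeats the implementation above so the snippet runs on its own.

```python
from collections import Counter


def create_lookup_tables(words):
    """Map words to ids and back; the most frequent word gets id 0."""
    word_counts = Counter(words)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab


# Hypothetical toy input, for illustration only
sample_words = "jerry : hello newman . newman : hello jerry . hello".split()
vocab_to_int, int_to_vocab = create_lookup_tables(sample_words)

# Encode the words as ids, then decode the ids back to words
encoded = [vocab_to_int[w] for w in sample_words]
decoded = [int_to_vocab[i] for i in encoded]

print(vocab_to_int["hello"])      # "hello" occurs most often, so it gets id 0
print(decoded == sample_words)    # the round trip recovers the original words
```

Sorting by frequency is a design choice rather than a requirement: any one-to-one mapping between words and ids would work for the embedding layer.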


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation like periods and exclamation marks can create multiple ids for the same word. For example, "bye" and "bye!" would generate two different word ids.

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( **.** )
- Comma ( **,** )
- Quotation Mark ( **"** )
- Semicolon ( **;** )
- Exclamation mark ( **!** )
- Question mark ( **?** )
- Left Parentheses ( **(** )
- Right Parentheses ( **)** )
- Dash ( **-** )
- Return ( **\n** )

This dictionary will be used to tokenize the symbols and add the delimiter (space) around them.  This separates each symbol as its own word, making it easier for the neural network to predict the next word. Make sure you don't use a value that could be confused as a word; for example, instead of using the value "dash", try using something like "||dash||".

In [5]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    
    token_lookup_dict = {
        '.': '<PERIOD>',
        ',': '<COMMA>',
        '"': '<QUOTATION_MARK>',
        ';': '<SEMICOLON>',
        '!': '<EXCLAMATION_MARK>',
        '?': '<QUESTION_MARK>',
        '(': '<LEFT_PAREN>',
        ')': '<RIGHT_PAREN>',
        '-': '<DASH>',
#         '--': ' <HYPHENS> ',
        '\n': '<NEW_LINE>',
#         ':': ' <COLON> ',
    }

    return token_lookup_dict

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed
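
The sketch below shows one way such a dictionary might be applied to raw text. The `tokenize` helper is hypothetical and written only for illustration; the actual `helper.preprocess_and_save_data` may do this differently. It also uses only a subset of the symbols.

```python
def token_lookup():
    """A subset of the punctuation dictionary, enough for the example."""
    return {
        '.': '<PERIOD>',
        ',': '<COMMA>',
        '!': '<EXCLAMATION_MARK>',
        '?': '<QUESTION_MARK>',
        '\n': '<NEW_LINE>',
    }


def tokenize(text, token_dict):
    """Hypothetical helper: lowercase, pad each symbol's token with spaces, split."""
    text = text.lower()
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.split()


sample = "Bye! Bye, Jerry.\nHello?"
tokens = tokenize(sample, token_lookup())
print(tokens)
```

Because every symbol becomes its own token, "bye!" and "bye," both contribute the same word "bye" to the vocabulary.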


## Pre-process all the data and save it

Running the code cell below will pre-process all the data and save it to file. You're encouraged to look at the code for `preprocess_and_save_data` in the `helper.py` file to see what it's doing in detail, but you do not need to change this code.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Check Point
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [7]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
In this section, you'll build the components necessary to build an RNN by implementing the RNN Module and forward and backpropagation functions.

### Check Access to GPU

In [8]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

No GPU found. Please use a GPU to train your neural network.


## Input
Let's start with the preprocessed input data. We'll use [TensorDataset](http://pytorch.org/docs/master/data.html#torch.utils.data.TensorDataset) to provide a known format to our dataset; in combination with [DataLoader](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader), it will handle batching, shuffling, and other dataset iteration functions.

You can create data with TensorDataset by passing in feature and target tensors. Then create a DataLoader as usual.
```
data = TensorDataset(feature_tensors, target_tensors)
data_loader = torch.utils.data.DataLoader(data, 
                                          batch_size=batch_size)
```

### Batching
Implement the `batch_data` function to batch `words` data into chunks of size `batch_size` using the `TensorDataset` and `DataLoader` classes.

>You can batch words using the DataLoader, but it will be up to you to create `feature_tensors` and `target_tensors` of the correct size and content for a given `sequence_length`.

For example, say we have these as input:
```
words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4
```

Your first `feature_tensor` should contain the values:
```
[1, 2, 3, 4]
```
And the corresponding `target_tensor` should just be the next "word":
```
5
```
This should continue with the second `feature_tensor`, `target_tensor` being:
```
[2, 3, 4, 5]  # features
6             # target

```


In [9]:
def create_tensors_from_words(words, sequence_length):
    
    feature_lists = []
    target_list = []

    # number of (features, target) pairs the word list can provide
    num_targets = len(words) - sequence_length

    # fail loudly instead of silently returning None when the input is too short
    if num_targets < 1:
        raise ValueError('words must contain more than sequence_length items')

    for i in range(num_targets):
        # a window of sequence_length words, and the word that follows it
        feature_lists.append(words[i:sequence_length + i])
        target_list.append(words[sequence_length + i])

    # convert the python lists to numpy arrays, then to pytorch tensors
    feature_tensor = torch.from_numpy(np.asarray(feature_lists))
    target_tensor = torch.from_numpy(np.asarray(target_list))

    return feature_tensor, target_tensor

words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4
feature_tensors, target_tensors = create_tensors_from_words(words, sequence_length)
print(feature_tensors)
print(target_tensors)


tensor([[1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]])
tensor([5, 6, 7])
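
For a long word list, the Python loop above builds one list per window. As an alternative sketch (not the approach used in this notebook), the same windows can be produced with `Tensor.unfold`. The function name `create_tensors_unfold` is made up for this example.

```python
import torch


def create_tensors_unfold(words, sequence_length):
    """Vectorized sketch: sliding windows built with Tensor.unfold."""
    ids = torch.tensor(words, dtype=torch.long)
    if len(ids) <= sequence_length:
        raise ValueError("words must contain more than sequence_length items")
    # every window of length sequence_length, moving one word at a time
    windows = ids.unfold(0, sequence_length, 1)
    # the last window has no following word, so it cannot have a target
    features = windows[:-1]
    # the target for window i is the word right after that window
    targets = ids[sequence_length:]
    return features, targets


features, targets = create_tensors_unfold([1, 2, 3, 4, 5, 6, 7], 4)
print(features)
print(targets)
```

`unfold` returns a view over the original tensor, so the windows share memory until something copies them (for example a call to `.contiguous()`).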


In [10]:
from torch.utils.data import TensorDataset, DataLoader



def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # TODO: Implement Function
    
    #create feature and target tensors from the words with a given sequence_length 
    feature_tensors, target_tensors = create_tensors_from_words(words, sequence_length)

    # create Tensor datasets
    data = TensorDataset(feature_tensors, target_tensors)
    
    # create DataLoader from the tensor dataset with the given batch size
    dataloader = DataLoader(data, batch_size=batch_size, shuffle=True)
  
    # return a dataloader
    return dataloader

# there is no test for this function, but you are encouraged to create
# print statements and tests of your own

# test loader - does not change int_text 
# Use the following line to test batch_data 
#
# text as int
test_words = [1, 2, 3, 4, 5, 6, 7]
# number of words in a sequence
test_sequence_length = 4
# batch size
test_batch_size = 4
test_data_loader = batch_data(test_words, test_sequence_length, test_batch_size)
print("\ntest_data_loader.dataset.tensors:\n{}".format(test_data_loader.dataset.tensors))



test_data_loader.dataset.tensors:
(tensor([[1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]]), tensor([5, 6, 7]))


---
## Build the Neural Network
Implement an RNN using PyTorch's [Module class](http://pytorch.org/docs/master/nn.html#torch.nn.Module). You may choose to use a GRU or an LSTM. To complete the RNN, you'll have to implement the following functions for the class:
 - `__init__` - The initialize function. 
 - `init_hidden` - The initialization function for an LSTM/GRU hidden state
 - `forward` - Forward propagation function.
 
The initialize function should create the layers of the neural network and save them to the class. The forward propagation function will use these layers to run forward propagation and generate an output and a hidden state.

**The output of this model should be the *last* batch of words** after a complete sequence has been processed. That is, for each input sequence of words, we only want to output one, next word.
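
The shape handling behind "only the last word" can be checked on toy tensors. This is a sketch under assumed sizes: a random tensor stands in for the LSTM output (with `batch_first=True`), and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, hidden_dim, output_size = 3, 4, 5, 7  # toy sizes

# stand-in for an LSTM output with batch_first=True: (batch, seq, hidden)
lstm_output = torch.randn(batch_size, seq_len, hidden_dim)
fc = nn.Linear(hidden_dim, output_size)

# Route 1: flatten, apply the linear layer, reshape, keep the last time step
flat = lstm_output.contiguous().view(-1, hidden_dim)      # (batch * seq, hidden)
scores = fc(flat).view(batch_size, -1, output_size)       # (batch, seq, output)
last_scores = scores[:, -1]                               # (batch, output)

# Route 2: apply the linear layer to the last time step only
shortcut = fc(lstm_output[:, -1])

print(last_scores.shape)
print(torch.allclose(last_scores, shortcut, atol=1e-6))
```

Both routes give one row of word scores per input sequence; the second simply skips computing scores for time steps that are thrown away.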

In [12]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them        
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM/GRU layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()
        # TODO: Implement function
        
        # set class variables
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.clip = 5
        
        # define all model layers
        
        # embed layer
        self.embed_layer = nn.Embedding(vocab_size, embedding_dim)

        # LSTM layer
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=dropout, batch_first=True)
  
     
        # fully-connected output layer
        self.fc = nn.Linear(hidden_dim, output_size)

    
    
    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        # TODO: Implement function   
        
        batch_size = nn_input.size(0)
        # embeddings and lstm_out
        embeds = self.embed_layer(nn_input)

        lstm_output, hidden = self.lstm(embeds, hidden)
        
        # stack up lstm outputs
        lstm_output = lstm_output.contiguous().view(-1, self.hidden_dim)


        # fully-connected layer
        fc_output = self.fc(lstm_output)

        # reshape to be batch_size first
        fc_output = fc_output.view(batch_size, -1, self.output_size)
        fc_output = fc_output[:, -1] # get last batch of labels

        # return one batch of output word scores and the hidden state
        return fc_output, hidden
    
    
    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU with zeros, and move it to the GPU if available
        :param batch_size: The batch_size of the hidden state
        :return: tuple (hidden state, cell state), each of dims (n_layers, batch_size, hidden_dim)
        '''
        # Implement function
        
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data

        if train_on_gpu:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

        return hidden

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_rnn(RNN, train_on_gpu)

Tests Passed


### Define forward and backpropagation

Use the RNN class you implemented to apply forward and back propagation. This function will be called, iteratively, in the training loop as follows:
```
loss, hidden = forward_back_prop(rnn, optimizer, criterion, inp, target, hidden)
```

And it should return the average loss over a batch and the hidden state returned by a call to `RNN(inp, hidden)`. Recall that you can get this loss by computing it, as usual, and calling `loss.item()`.

**If a GPU is available, you should move your data to that GPU device, here.**

In [13]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    """
    Forward and backward propagation on the neural network
    :param rnn: The PyTorch Module that holds the neural network
    :param optimizer: The PyTorch optimizer for the neural network
    :param criterion: The PyTorch loss function
    :param inp: A batch of input to the neural network
    :param target: The target output for the batch of input
    :param hidden: The hidden state carried over from the previous batch
    :return: The loss and the latest hidden state Tensor
    """
    
    # TODO: Implement Function
    
    # move data to GPU, if available
    if(train_on_gpu):
        inp, target = inp.cuda(), target.cuda()

    # Creating new variables for the hidden state, otherwise
    # we'd backprop through the entire training history
    hidden = tuple([each.data for each in hidden])

    # zero accumulated gradients
    rnn.zero_grad()

    # get the output from the model
    output, hidden = rnn(inp, hidden)

    
    # perform backpropagation and optimization
    # calculate the loss and perform backprop
    loss = criterion(output, target)    
    loss.backward()
    
    # prevent exploding gradients by clipping the total gradient norm to 5
    nn.utils.clip_grad_norm_(rnn.parameters(), 5)
    optimizer.step()
    loss_float = loss.item()
    # return the loss over a batch and the hidden state produced by our model
    return loss_float, hidden

# Note that these tests aren't completely extensive.
# they are here to act as general checks on the expected outputs of your functions
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed
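
Gradient clipping is easier to trust once you see it act on real gradients. The sketch below uses a toy `nn.Linear` model and deliberately large inputs (both are assumptions for illustration, not part of this project) and compares the gradient norm before and after `clip_grad_norm_`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy model and oversized inputs, chosen only to provoke large gradients
model = nn.Linear(10, 10)
inputs = torch.randn(8, 10) * 100
loss = model(inputs).pow(2).mean()
loss.backward()

max_norm = 5
# clip_grad_norm_ rescales the gradients in place
# and returns the total norm measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# recompute the total 2-norm over all parameter gradients after clipping
norm_after = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)

print(float(norm_before), float(norm_after))
```

Clipping rescales all gradients by the same factor, so the update direction is preserved and only its size is capped.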



## Neural Network Training

With the structure of the network complete and data ready to be fed in the neural network, it's time to train it.

### Train Loop

The training loop is implemented for you in the `train_rnn` function. This function will train the network over all the batches for the number of epochs given. The model progress will be shown every `show_every_n_batches` batches. You'll set this parameter along with other parameters in the next section.

In [26]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""

# def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
#     batch_losses = []
    
#     rnn.train()

#     print("Training for %d epoch(s)..." % n_epochs)
#     for epoch_i in range(1, n_epochs + 1):
        
#         # initialize hidden state
#         hidden = rnn.init_hidden(batch_size)
        
#         for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
#             # make sure you iterate over completely full batches, only
#             n_batches = len(train_loader.dataset)//batch_size
#             if(batch_i > n_batches):
#                 break
            
#             # forward, back prop
#             loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
#             # record loss
#             batch_losses.append(loss)

#             # printing loss stats
#             if batch_i % show_every_n_batches == 0:
#                 print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
#                     epoch_i, n_epochs, np.average(batch_losses)))
#                 batch_losses = []

#     # returns a trained rnn
#     return rnn




def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100, lr=0.001):
    batch_losses = []
    
    # early stopping
    loss_min = np.inf
    stop_counter = 0
    
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # reset the early-stopping counter at the start of each epoch
        stop_counter = 0
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)

        # adaptive learning rate
        # decrease lr every 5 epochs
        if epoch_i % 5 == 0:
            print("learning rate in epoch", epoch_i, "before change:", optimizer.param_groups[0]["lr"])
            lr = lr * (0.1 ** (epoch_i // 5))
            print("lr: ", lr)
            print("before change:", optimizer.param_groups[0]["lr"])
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr 
            print("after change:", optimizer.param_groups[0]["lr"])
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)

            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))
                
                # early stopping
                avg_batch_losses = np.average(batch_losses)                
                if avg_batch_losses <= loss_min:
                    print('avg batch loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(loss_min, avg_batch_losses))
                    # 'model_rnn.pt' = save_path
                    torch.save(rnn.state_dict(), 'model_rnn.pt')
                    loss_min = avg_batch_losses
                    stop_counter = 0

                else:
                    print('avg batch loss increased ({:.6f} --> {:.6f}).  not saving model ...'.format(loss_min, avg_batch_losses))
                    stop_counter += 1
                    if(stop_counter > 15):
                        helper.save_model('./save/trained_rnn', rnn)
                        return rnn                
                
                batch_losses = []

    # returns a trained rnn
    return rnn

### Hyperparameters

Set and train the neural network with the following parameters:
- Set `sequence_length` to the length of a sequence.
- Set `batch_size` to the batch size.
- Set `num_epochs` to the number of epochs to train for.
- Set `learning_rate` to the learning rate for an Adam optimizer.
- Set `vocab_size` to the number of unique tokens in our vocabulary.
- Set `output_size` to the desired size of the output.
- Set `embedding_dim` to the embedding dimension; smaller than the vocab_size.
- Set `hidden_dim` to the hidden dimension of your RNN.
- Set `n_layers` to the number of layers/cells in your RNN.
- Set `show_every_n_batches` to the number of batches at which the neural network should print progress.

If the network isn't getting the desired results, tweak these parameters and/or the layers in the `RNN` class.

In [27]:
import math

# Data params
# Sequence Length
sequence_length = (math.ceil(average_word_count_line))*2  # of words in a sequence
# Batch Size
batch_size = 64

# data loader - do not change
train_loader = batch_data(int_text, sequence_length, batch_size)

In [28]:
# Training parameters
# Number of Epochs
num_epochs = 10
# Learning Rate
learning_rate = 0.001

# Model parameters
# Vocab size
vocab_size = len(vocab_to_int)

# Output size
output_size = len(vocab_to_int)

# Embedding Dimension
embedding_dim = 200

# Hidden Dimension
hidden_dim = 512

# Number of RNN Hidden Layers
n_layers = 3

# Show stats for every n number of batches
show_every_n_batches = 2000 



### Train
In the next cell, you'll train the neural network on the pre-processed data.  If you have a hard time getting a good loss, you may consider changing your hyperparameters. In general, you may get better results with larger hidden and n_layer dimensions, but larger models take a longer time to train. You should also experiment with different sequence lengths, which determine the size of the long range dependencies that a model can learn.

In [29]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""

# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# training the model
trained_rnn = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

# saving the trained model
helper.save_model('./save/trained_rnn', trained_rnn)
print('Model Trained and Saved')

Training for 10 epoch(s)...
Epoch:    1/10    Loss: 5.281672384381294

avg batch loss decreased (inf --> 5.281672).  Saving model ...
Epoch:    1/10    Loss: 4.643662386417389

avg batch loss decreased (5.281672 --> 4.643662).  Saving model ...
Epoch:    1/10    Loss: 4.478282729387283

avg batch loss decreased (4.643662 --> 4.478283).  Saving model ...
Epoch:    1/10    Loss: 4.367510246515274

avg batch loss decreased (4.478283 --> 4.367510).  Saving model ...
Epoch:    1/10    Loss: 4.310431725382805

avg batch loss decreased (4.367510 --> 4.310432).  Saving model ...
Epoch:    1/10    Loss: 4.27254907476902

avg batch loss decreased (4.310432 --> 4.272549).  Saving model ...
Epoch:    2/10    Loss: 4.151652924417541

avg batch loss decreased (4.272549 --> 4.151653).  Saving model ...
Epoch:    2/10    Loss: 4.043157801866531

avg batch loss decreased (4.151653 --> 4.043158).  Saving model ...
Epoch:    2/10    Loss: 4.021179232954979

avg batch loss decreased (4.043158 --> 4.021179


## Question: How did you decide on your model hyperparameters? 
For example, did you try different sequence_lengths and find that one size made the model converge faster? What about your hidden_dim and n_layers; how did you decide on those?

## Answer:

There are many hyperparameters and each has its own purpose, but the main reasons to change one are to improve training speed, how quickly the training loss converges, how quickly the validation loss converges, and CPU/GPU memory use.


### Number of Epochs:

The number of epochs is the number of full passes the model makes over the training data. I first set `num_epochs = 5` to see how low the loss could get while I experimented with the other hyperparameters. Once those were close, I increased it to `num_epochs = 10`, because the loss reached about 3.5 before the 10 epochs were up. (The training loop does not use a separate validation set, so the losses quoted in this answer are the average training loss over the last `show_every_n_batches` batches.) These 10 epochs took 2 days to train.

I also implemented early stopping, a technique that ends training once the loss has stopped decreasing for a while. Every `show_every_n_batches` batches, the average batch loss is compared with the lowest value seen so far. If it is lower, the model is saved and the counter is reset. If it is higher, the counter is incremented, and once the counter exceeds 15 the model is saved and training stops. The counter is also reset at the start of every epoch.
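
The early-stopping rule can be sketched without any model at all. The function below is a simplified stand-in, not the training loop itself: `early_stopping_step` and the loss values are hypothetical, and the counter is never reset per epoch as it is in `train_rnn`.

```python
def early_stopping_step(losses, patience=15):
    """Return (index of the check where training stops, or None, and the best loss)."""
    best = float("inf")
    bad_checks = 0
    for i, loss in enumerate(losses):
        if loss <= best:
            best = loss          # a checkpoint would be saved here
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks > patience:
                return i, best   # too many checks without improvement
    return None, best


# Hypothetical loss values, for illustration only
losses = [5.0, 4.5, 4.2, 4.3, 4.4, 4.25, 4.6]
stop_at, best = early_stopping_step(losses, patience=3)
print(stop_at, best)
```

One thing to keep in mind with the version in `train_rnn`: because its counter restarts every epoch, the stop condition can only be met if a single epoch contains more than 15 loss checks.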


### Batch Size

Here are the standard batch sizes used:

$$ 1, 2, 4, 8, 16, 32, 64, \cdots, 512 $$ 

A batch is the subset of the dataset sent through the model for one forward pass, one loss calculation, and one backpropagation step that updates the weights of each layer. With a small batch size, training is slower because each update is based on less data and more updates are needed per epoch. With a very large batch size, each update is slow to compute and the batch may no longer fit in memory. There is a middle ground, the sweet spot, where the CPU or GPU can handle the calculation while the batch is still large enough to move the weights significantly toward a local, or hopefully a global, minimum.

I started with `batch_size = 32` and increased it to `batch_size = 64`. The change was meant to speed up training without making each step too large to compute. I didn't test larger sizes, even though that might have increased speed and lowered the loss further. I thought 64 was working well enough to reach a loss of about 3.5.


### Sequence Lengths:

My sequence length was picked with a simple idea in mind. I set `sequence_length = (math.ceil(average_word_count_line))*2` so that each training sequence covers roughly two lines of dialogue. With an average of about 11.09 words per line, this gives a sequence length of 24.
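
The arithmetic, using the average printed in the dataset stats earlier in this notebook:

```python
import math

# value printed by the "Dataset Stats" cell
average_word_count_line = 11.088582100483375

# round the average line length up, then cover two lines
sequence_length = math.ceil(average_word_count_line) * 2
print(sequence_length)  # 24
```

Note that the average is measured in whitespace-separated words before punctuation is split into its own tokens, so 24 tokens will often cover somewhat less than two full lines.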


### Learning Rate:

Learning rate is one of the most important hyperparameters. A standard way to start adjusting it is to try `0.1`, `0.01`, `0.001`, `0.0001`, `0.00001`, `0.000001`, and so on. The smaller the learning rate, the longer training takes, and the model can get stuck in a local minimum instead of reaching the global minimum. The larger the learning rate, the faster you can train, but the optimizer may overshoot and never settle in a local or global minimum.

I did some interesting things to optimize my learning rate. I tested `learning_rate = 0.001` and `learning_rate = 0.0001`, found that 0.0001 was too slow, and so started with 0.001. With 0.001 as my initial learning rate, I added learning rate decay, which lowers the rate after a set number of epochs. I multiplied the learning rate by 0.1 at the start of epoch 5, so epochs 1 to 4 use 0.001 and the rate then drops to 0.0001. The output printed right before, during, and right after this change looks like this:


    Epoch:    4/10    Loss: 3.783744016647339

    avg batch loss increased (3.712393 --> 3.783744).  not saving model ...
    Epoch:    4/10    Loss: 3.789862581372261

    avg batch loss increased (3.712393 --> 3.789863).  not saving model ...
    learning rate in epoch 5 before change: 0.001
    lr:  0.0001
    before change: 0.001
    after change: 0.0001
    Epoch:    5/10    Loss: 3.6898119439928587

    avg batch loss decreased (3.712393 --> 3.689812).  Saving model ...
    Epoch:    5/10    Loss: 3.568787024974823

    avg batch loss decreased (3.689812 --> 3.568787).  Saving model ...
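
To see the full schedule without training, the update rule from `train_rnn` can be replayed on its own. This is a sketch: `simulate_lr_schedule` is a hypothetical helper that copies only the learning-rate arithmetic from the training loop.

```python
def simulate_lr_schedule(initial_lr, n_epochs):
    """Replay `lr = lr * (0.1 ** (epoch_i // 5))`, applied on every 5th epoch."""
    lr = initial_lr
    schedule = {}
    for epoch_i in range(1, n_epochs + 1):
        if epoch_i % 5 == 0:
            lr = lr * (0.1 ** (epoch_i // 5))
        schedule[epoch_i] = lr
    return schedule


schedule = simulate_lr_schedule(0.001, 10)
for epoch_i, lr in schedule.items():
    print(epoch_i, lr)
```

Replaying the rule shows that it fires again at epoch 10, where the already reduced rate is multiplied by `0.1 ** 2`, giving roughly 1e-6 for the final epoch. If a constant factor of 0.1 per step is what you want, computing the rate from the initial value instead (`initial_lr * 0.1 ** (epoch_i // 5)`) would give 1e-5 at epoch 10.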


### Dropout Rate

My model uses dropout to keep it from overfitting to the dataset. A good indication of overfitting is a training loss that is much smaller than the validation loss. Dropout temporarily switches off (zeroes) some of a hidden layer's nodes during training: on each forward pass, every node is dropped with a given probability. Dropping a node forces the other nodes and layers to learn useful features on their own instead of leaning on that node as a crutch.

I used a high dropout rate of `dropout=0.5`, on a scale from 0 to 1. This rate may have slowed overall training, because more epochs are needed for changes to spread across the model's hidden layers, but it helped guard against overfitting.
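
A quick way to see what a dropout probability of 0.5 does is to apply `nn.Dropout` to a tensor of ones. This is a standalone sketch; in the model above, the `dropout` argument is passed to `nn.LSTM`, which applies dropout to the outputs of each stacked layer except the last.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2
dropout.train()
train_out = dropout(x)

# evaluation mode: dropout is disabled and the input passes through unchanged
dropout.eval()
eval_out = dropout(x)

print(sorted(set(train_out.tolist())))
print(torch.equal(eval_out, x))
```

The scaling in training mode keeps the expected activation the same in both modes, which is why the model needs `rnn.train()` during training and `rnn.eval()` when generating text.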


### Vocabulary Size & Output Size

Continuing with other hyperparameters, I set vocabulary size as `vocab_size = len(vocab_to_int)` and output size as `output_size = len(vocab_to_int)`.


### Embedding Dimensions:

The embedding dimension should reflect the size of the distinctive vocabulary that carries the most meaning for the model. As a rule of thumb, I sized it at around 1-5% of the vocabulary size. This is why I chose `embedding_dim = 200`.


### Hidden Dimensions:

The hidden dimension is the number of nodes in each hidden layer. By convention, this value is a power of two:

$$ 32, 64, 128, 256, 512, \cdots, 2048 $$ 
 
I experimented with 256 and 512 and found that 512 gave me a lower starting loss, so I stuck with it.


### Hidden Layers:

Lastly, the number of hidden layers is the depth of the RNN, that is, how many recurrent layers are stacked. The model can begin to overfit as you add hidden layers. Again, a good indication of overfitting is a training loss that is noticeably smaller than the validation loss. We add dropout to the model to counter this.

I started with 2 layers and tried up to 5, and found that the model with 3 hidden layers, `n_layers = 3`, worked best without overfitting the training dataset.


---
# Checkpoint

After running the above training cell, your model will be saved by name, `trained_rnn`, and if you save your notebook progress, **you can pause here and come back to this code at another time**. You can resume your progress by running the next cell, which will load in our word:id dictionaries _and_ load in your saved model by name!

In [30]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn')

## Generate TV Script
With the network trained and saved, you'll use it to generate a new, "fake" Seinfeld TV script in this section.

### Generate Text
To generate the text, the network needs to start with a single word and repeat its predictions until it reaches a set length. You'll be using the `generate` function to do this. It takes a word id to start with, `prime_id`, and generates a set length of text, `predict_len`.

In [31]:
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param rnn: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of punctuation token keys to punctuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    
    rnn.eval() # eval mode
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # sample the index of the next word, weighted by the exponentiated output scores
        top_i = torch.multinomial(output.exp().data, 1).item()
        # retrieve that word from the dictionary
        word = int_to_vocab[top_i]
        predicted.append(word)
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = top_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences
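
The loop in `generate` boils down to two ideas: sample the next word id with probability proportional to `exp(score)`, then slide the input window forward by one position. The sketch below shows both in pure Python. `toy_model` is a hypothetical stand-in for the trained RNN, and the parameter values are assumptions chosen only to keep the example small.

```python
import math
import random


def sample_next(scores, rng):
    """Sample an index with probability proportional to exp(score)."""
    # Subtracting the max keeps exp() numerically stable without changing the ratios.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def toy_model(window, vocab_size):
    """Hypothetical stand-in for the RNN: favors the id after the last one."""
    favored = (window[-1] + 1) % vocab_size
    return [2.0 if i == favored else 0.0 for i in range(vocab_size)]


def generate_ids(prime_id, pad_id, vocab_size, sequence_length=4, predict_len=8, seed=0):
    rng = random.Random(seed)
    # Left-pad the window so the prime word sits in the last position.
    window = [pad_id] * (sequence_length - 1) + [prime_id]
    generated = [prime_id]
    for _ in range(predict_len):
        next_id = sample_next(toy_model(window, vocab_size), rng)
        generated.append(next_id)
        # Drop the oldest id and append the new one (the role of np.roll plus assignment).
        window = window[1:] + [next_id]
    return generated


ids = generate_ids(prime_id=1, pad_id=0, vocab_size=5)
print(ids)
```

Because the next id is sampled rather than taken as the single highest-scoring word, rerunning generation with a different seed gives a different script.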

### Generate a New Script
It's time to generate the text. Set `gen_length` to the length of TV script you want to generate and set `prime_word` to one of the following to start the prediction:
- "jerry"
- "elaine"
- "george"
- "kramer"

You can set the prime word to _any word_ in our dictionary, but it's best to start with a name for generating a TV script. (You can also start with any other names you find in the original text file!)

In [32]:
# run the cell multiple times to get different results!
gen_length = 600 # modify the length to your preference
prime_word = 'kramer' # name for starting the script

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)

kramer: insecure doo waa.

jerry: who got this?

helen: what did they do?

george: yeah. he loves cashmere. he thought that i'm not going home. yes you are.

elaine: sure. what's he going? i'll tell you the truth for coming this year in ten minutes.

jerry: alright, he says that?

jerry: he's very unusual.

jerry: cmon.

jerry: hi.

george: ah, jerry,(smacks arms box on his pocket and sits down last to back and ed 3: food) sure... you know, i cannot have a break for you.

elaine: ahhhhh... those are great.

kramer: so, i was saying that misunderstanding is out loud of all the guy could run.

elaine: well, i sold her up so i go to the car.

susan: someone out do he like her.

jerry: no, today's not even joking.

elaine: yeah, well, i'll tell you what it's going to tell nina. i'm going to do something about me, that's where i make! what do you think? somebody else men here at me?! god!

elaine: but we made well outside shorts

elaine: elaine? if you had to do me a favor.

jerry: keith ey

#### Save your favorite scripts

Once you have a script that you like (or find interesting), save it to a text file!

In [33]:
# save script to a text file
with open("generated_script_1.txt", "w") as f:
    f.write(generated_script)

# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  It takes quite a while to get good results, and often, you'll have to use a smaller vocabulary (and discard uncommon words), or get more data.  The Seinfeld dataset is about 3.4 MB, which is big enough for our purposes; for script generation you'll want more than 1 MB of text, generally. 

# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and save another copy as an HTML file by clicking "File" -> "Download as.."->"html". Include the "helper.py" and "problem_unittests.py" files in your submission. Once you download these files, compress them into one zip file for submission.