# TV Script Generation

In this project, I will generate my own [Seinfeld](https://en.wikipedia.org/wiki/Seinfeld) TV scripts using [Recurrent Neural Networks](https://en.wikipedia.org/wiki/Recurrent_neural_network). I will be using a part of the [Seinfeld dataset](https://www.kaggle.com/thec03u5/seinfeld-chronicles#scripts.csv) of scripts from 9 seasons. The Neural Network will generate a new "fake" TV script based on patterns it recognizes in this training data.

The general structure of LSTM (a type of RNN with a more complex structure) looks like this: 

![LSTM](lstm_structure.png)

One of the applications of RNN is text generation which can be reframed as a problem of predicting the next most likely word based on the current and the previous words in the sequence: 

![Words](words.png)

Let's proceed with getting / exploring the data and building a network for predictions. 

## Get the Data

The data is located in `./data/Seinfeld_Scripts.txt`. 

In [1]:
# Load the data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)

## Explore the Data

The dataset is all lowercase text, and each new line of dialogue is separated by a newline character `\n`.

In [5]:
import numpy as np

view_line_range = (0, 10)

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
print('Average number of words in each line: {}'.format(np.average(word_count_line)))

print()
print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))

Dataset Stats
Roughly the number of unique words: 46367
Number of lines: 109233
Average number of words in each line: 5.544240293684143

The lines 0 to 10:
jerry: do you know what this is all about? do you know, why were here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about we should go out? this is what theyre talking about...this whole thing, were all out now, no one is home. not one person here is home, were all out! there are people trying to find us, they dont know where we are. (on an imaginary phone) did you ring?, i cant find him. where did he go? he didnt tell me where he was going. he must have gone out. you wanna go out you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...then youre standing around, what do you do? you go we gotta be getting back. once youre out, you wanna get back! y

---
## Implement Pre-processing Functions
The first thing to do to any dataset is pre-processing. The following pre-processing functions are implemented below:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, I first need to transform the words to unique ids. In this function we will, create two dictionaries:
- Dictionary to go from the words to an id: `vocab_to_int`
- Dictionary to go from the id to word: `int_to_vocab`

These dictionaries will be returned in the following **tuple** `(vocab_to_int, int_to_vocab)`

In [6]:
import problem_unittests as tests
from collections import Counter # for processing 

def text_split(text): 
    return text.split() 
    

def create_lookup_tables(text):
    """
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # Counter 
    word_counts = Counter(text)
    
    # Sorting the words from most to least frequent in text occurrence
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    
    # Create dictionaries
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}
    
    # return tuple
    return (vocab_to_int, int_to_vocab)

# Unit tests 
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed


### Tokenize Punctuation
We will be splitting the script into a word array using spaces as delimiters.  However, punctuations like periods and exclamation marks can create multiple ids for the same word. For example, "bye" and "bye!" would generate two different word ids.

I have implemented the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||". The following dictionary will be created for the symbols where the symbol is the key and value is the token:
- Period ( **.** )
- Comma ( **,** )
- Quotation Mark ( **"** )
- Semicolon ( **;** )
- Exclamation mark ( **!** )
- Question mark ( **?** )
- Left Parentheses ( **(** )
- Right Parentheses ( **)** )
- Dash ( **-** )
- Return ( **\n** )

This dictionary will be used to tokenize the symbols and to add the delimiter (space) around it. This separates each symbols as its own word, making it easier for the neural network to predict the next word. 

In [7]:
def token_lookup():
    """
    Generating a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # Text special symbols replacement 
    dict_punct = {
        '.': '<PERIOD>',  
        ',': '<COMMA>',
        '"': '<QUOTATION_MARK>',
        ';': '<SEMICOLON>',
        '!': '<EXCLAMATION_MARK>',
        '?': '<QUESTION_MARK>',
        '(': '<LEFT_PAREN>',
        ')': '<RIGHT_PAREN>',
        '-': '<DASH>',  
        '\n': '<NEW_LINE>'
    }
    
    return dict_punct
        
# Unit tests
tests.test_tokenize(token_lookup)

Tests Passed


## Pre-process all the data and save it

Running the code cell below pre-processes all the data and saves it to a file. 

In [8]:
# Pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

## Check Point
This is a checkpoint to work with the preprocessed data which has been saved to disk.

In [9]:
import helper
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
In this section, I am building the components necessary to build an RNN by implementing the RNN Module and forward / backpropagation functions.

### Check Access to GPU

In [10]:
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

## Input
Let's start with the preprocessed input data. We'll use [TensorDataset](http://pytorch.org/docs/master/data.html#torch.utils.data.TensorDataset) to provide a known format to our dataset; in combination with [DataLoader](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader), it will handle batching, shuffling, and other dataset iteration functions.

Data with TensorDataset can be created by passing in feature and target tensors. Then I will create a DataLoader.
```
data = TensorDataset(feature_tensors, target_tensors)
data_loader = torch.utils.data.DataLoader(data, 
                                          batch_size=batch_size)
```

### Batching
The `batch_data` function is implemented to batch `words` data into chunks of size `batch_size` using the `TensorDataset` and `DataLoader` classes.

> We will create `feature_tensors` and `target_tensors` of the correct size and content for a given `sequence_length`.

For example, say we have these as input:
```
words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4
```

Your first `feature_tensor` should contain the values:
```
[1, 2, 3, 4]
```
And the corresponding `target_tensor` should just be the next "word"/tokenized word value:
```
5
```
This should continue with the second `feature_tensor`, `target_tensor` being:
```
[2, 3, 4, 5]  # features
6             # target
```

In [11]:
from torch.utils.data import TensorDataset, DataLoader

def batch_data(words, sequence_length, batch_size):
    """
    Batching the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # Cropping text to fully generate batches, need to round to min; length + 1 target
    n_batches = len(words)//(sequence_length + 1)

    x_arr, y_arr = [], []
    
    # Generating simple lists first
    for i in range(0, n_batches): 
        i_from = (sequence_length + 1)*i
        i_to = (sequence_length + 1)*(i+1) - 1
        i_label = (sequence_length + 1)*(i+1) - 1 

        x_arr.append(int_text[i_from:i_to])
        y_arr.append(int_text[i_label])
    
    # Converting to torch tensors
    x_arr = torch.from_numpy(np.array(x_arr))
    y_arr = torch.from_numpy(np.array(y_arr))

    training_set = TensorDataset(x_arr, y_arr)

    dataloader = DataLoader(training_set, batch_size=batch_size, shuffle=True)
    
    # returning a dataloader
    return dataloader

# Test the loader 
test_loader = batch_data(int_text, sequence_length=4, batch_size=1)

# Get data from the loader 
data_iter = iter(test_loader)

In [12]:
# Check the generated set
test_tensor = data_iter.next()
print('Tensor:', test_tensor)

sequence = test_tensor[0].numpy()
next_word = test_tensor[1].numpy()

print('\nFeatures:')

for elem in sequence[0]: 
    print(int_to_vocab[[elem][0]])

print('\nNext word:', int_to_vocab[next_word[0]])
    
#print (int_to_vocab[next_word])

Tensor: [tensor([[    0,    13,     6,  2778]]), tensor([ 2])]

Features:
<new_line>
george:
the
tag

Next word: <comma>


### Testing the dataloader 

Below, we are generating some test text data and defining a dataloader using the function I defined above. Then, we are getting some sample batch of inputs `sample_x` and targets `sample_y` from our dataloader.

Your code should return something like the following:

```
torch.Size([10, 5])
tensor([[ 28,  29,  30,  31,  32],
        [ 21,  22,  23,  24,  25],
        [ 17,  18,  19,  20,  21],
        [ 34,  35,  36,  37,  38],
        [ 11,  12,  13,  14,  15],
        [ 23,  24,  25,  26,  27],
        [  6,   7,   8,   9,  10],
        [ 38,  39,  40,  41,  42],
        [ 25,  26,  27,  28,  29],
        [  7,   8,   9,  10,  11]])

torch.Size([10])
tensor([ 33,  26,  22,  39,  16,  28,  11,  43,  30,  12])
```

### Sizes
Our sample_x should be of size `(batch_size, sequence_length)` or (10, 5) in this case and sample_y should just have one dimension: batch_size (10). 

### Values

The targets, sample_y, are the *next* value in the ordered test_text data. So, for an input sequence `[ 28,  29,  30,  31,  32]` that ends with the value `32`, the corresponding output should be `33`.

In [13]:
# Test dataloader

test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = data_iter.next()

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)

torch.Size([8, 5])
tensor([[ 1252,   545,  8782,  7189,    20],
        [   22,    18,   677,   208,    58],
        [   17,    47,    22,    82,    20],
        [   55,   135,    64,    47,     3],
        [   24,    22,    47,     1,     1],
        [    1,     1,    24,   220,   126],
        [    1,   149,     1,     1,     1],
        [    4,   200,   238,   149,   208]])

torch.Size([8])
tensor([ 241,    1,    6,   24,    1,    2,   84,   58])


---
## Building the Neural Network
I will implement an RNN using PyTorch's [Module class](http://pytorch.org/docs/master/nn.html#torch.nn.Module). To complete the RNN, I have to implement the following functions for the class:
 - `__init__` - The initialize function. 
 - `init_hidden` - The initialization function for an LSTM/GRU hidden state
 - `forward` - Forward propagation function.
 
The initialize function is creating the layers of the neural network and is saving them to the class. The forward propagation function uses these layers to run forward propagation and to generate an output and a hidden state.

**The output of this model should be the *last* batch of word scores** after a complete sequence has been processed. That is, for each input sequence of words, we only want to output the word scores for a single, most likely, next word.

### Notes

1. I am making sure to stack the outputs of the lstm to pass to the fully-connected layer, which is done with `lstm_output = lstm_output.contiguous().view(-1, self.hidden_dim)`
2. The last batch of word scores is checked by shaping the output of the final, fully-connected layer like this:

```
# reshape into (batch_size, seq_length, output_size)
output = output.view(batch_size, -1, self.output_size)
# get last batch
out = output[:, -1]
```

In [14]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initializing the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them        
        :param hidden_dim: The size of the hidden layer outputs
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()
        
        ## Class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        ## Model layers
        # Embedding / LSTM
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=dropout, batch_first=True)
                
        # linear layer
        self.fc = nn.Linear(hidden_dim, output_size)
    
    
    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        
        batch_size = nn_input.size(0) # get the batch size 

        # Embeddings and LSTM output 
        embed = self.embedding(nn_input)
        lstm_out, hidden = self.lstm(embed, hidden)
        
        # Stack LSTM output
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        # FC layer 
        output = self.fc(lstm_out)
        
        # Reshape according to the batch size and output size
        output = output.view(batch_size, -1, self.output_size)
        out = output[:, -1] # get the last batch of labels as an output of FF       
        
        # Return one batch of output word scores and the hidden state
        return out, hidden
    
    
    def init_hidden(self, batch_size):
        '''
        Initializing the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # Initializing hidden state with zero weights and moving to GPU if available

        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden

# Unit tests
tests.test_rnn(RNN, train_on_gpu)

Tests Passed


### Defining forward and backpropagation

I will use the implemented RNN class to apply forward and back propagation. This function will be called, iteratively, in the training loop as follows:
```
loss = forward_back_prop(decoder, decoder_optimizer, criterion, inp, target)
```

It returns the average loss over a batch and the hidden state returned by a call to `RNN(inp, hidden)`. I can get this loss by computing it and calling `loss.item()`.

In [15]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    """
    Forward and backward propagation on the neural network
    :param decoder: The PyTorch Module that holds the neural network
    :param decoder_optimizer: The PyTorch optimizer for the neural network
    :param criterion: The PyTorch loss function
    :param inp: A batch of input to the neural network
    :param target: The target output for the batch of input
    :return: The loss and the latest hidden state Tensor
    """
    
    # Move data to GPU, if available
    if(train_on_gpu):
        rnn.cuda()

    # perform backpropagation and optimization
    h = tuple([each.data for each in hidden])   # hidden state values

    rnn.zero_grad()  # zero out accumulated gradients
    
    if(train_on_gpu):
        inp, target = inp.cuda(), target.cuda()
        
    # model output
    output, h = rnn(inp, h)  
    
    # calculate the loss and perform backpropagation
    loss = criterion(output, target)
    loss.backward()
    
    # clip_grad_norm prevents the exploding gradient problem in RNN / LSTM
    nn.utils.clip_grad_norm_(rnn.parameters(), 5)
    optimizer.step()

    # return the loss over a batch and the hidden state produced by our model
    return loss.item(), h

# Unit tests
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed


## Neural Network Training

With the structure of the network complete and data ready to be fed in the neural network, it is now time to train it.

### Train Loop

The training loop is implemented in the `train_decoder` function. This function will train the network over all the batches for the number of epochs given. The model progress will be shown every number of batches. This number is set with the `show_every_n_batches` parameter. 

In [28]:
# Save a model when loss is decreasing, use baseline here
loss_val = float('inf')

def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    batch_losses = []
    
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)

            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))
                
                # save if a loss is decreasing 
                if np.average(batch_losses) < loss_val: 
                    print('Loss decreased. Saving the model...')
                    helper.save_model('./save/trained_rnn', rnn)        
                
                batch_losses = []

    # returns a trained rnn
    return rnn

### Hyperparameters

I will set and train the neural network with the following parameters:
- `sequence_length` to the length of a sequence.
- `batch_size` to the batch size.
- `num_epochs` to the number of epochs to train for.
- `learning_rate` to the learning rate for an Adam optimizer.
- `vocab_size` to the number of uniqe tokens in our vocabulary.
- `output_size` to the desired size of the output.
- `embedding_dim` to the embedding dimension; smaller than the vocab_size.
- `hidden_dim` to the hidden dimension of your RNN.
- `n_layers` to the number of layers/cells in your RNN.
- `show_every_n_batches` to the number of batches at which the neural network should print progress.

In [29]:
# Data params
# Sequence Length
sequence_length = 10   # of words in a sequence

# Batch Size
batch_size = 64 

# data loader  
train_loader = batch_data(int_text, sequence_length, batch_size)

In [35]:
# Training parameters
# Number of Epochs
num_epochs = 30

# Learning Rate
learning_rate = 0.001 

# Model parameters
# Vocab size and output size
output_size = vocab_size = len(vocab_to_int) 

# Embedding Dimension
embedding_dim = 300

# Hidden Dimension
hidden_dim = 512

# Number of RNN Layers
n_layers = 2

# Show stats for every n number of batches
show_every_n_batches = 500

### Train
In the next cell, I will train the neural network on the pre-processed data. In general, the results are better with larger hidden and n_layer dimensions, but larger models take a longer time to train. 

In [36]:
# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# training the model
trained_rnn = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

Training for 30 epoch(s)...
Epoch:    1/30    Loss: 5.395510953426361

Loss decreased. Saving the model...


  "type " + obj.__name__ + ". It won't be checked "


Epoch:    1/30    Loss: 4.85286203622818

Loss decreased. Saving the model...
Epoch:    2/30    Loss: 4.448871087219755

Loss decreased. Saving the model...
Epoch:    2/30    Loss: 4.272874928951263

Loss decreased. Saving the model...
Epoch:    3/30    Loss: 4.074918312386086

Loss decreased. Saving the model...
Epoch:    3/30    Loss: 3.9594917125701903

Loss decreased. Saving the model...
Epoch:    4/30    Loss: 3.7544392849351467

Loss decreased. Saving the model...
Epoch:    4/30    Loss: 3.715929986000061

Loss decreased. Saving the model...
Epoch:    5/30    Loss: 3.505082797817643

Loss decreased. Saving the model...
Epoch:    5/30    Loss: 3.435906894683838

Loss decreased. Saving the model...
Epoch:    6/30    Loss: 3.2340487768628297

Loss decreased. Saving the model...
Epoch:    6/30    Loss: 3.182234263420105

Loss decreased. Saving the model...
Epoch:    7/30    Loss: 2.9605577573316193

Loss decreased. Saving the model...
Epoch:    7/30    Loss: 2.9356576843261717

Loss 

### Note: choosing the model hyperparameters

I have started with Googling 'text generation LSTM architecture' to check papers describing state-of-art architectures and best practices for text generation. As a result, I ended up reading the following papers: 

1. [GTR-LSTM: A Triple Encoder for Sentence Generation from RDF Data (Trisedya, Qi, Zhang, Wang)](http://aclweb.org/anthology/P18-1151)
2. [Deep Poetry: Word-Level and Character-Level Language Models for Shakespearean Sonnet Generation (Xie, Rastogi)](https://web.stanford.edu/class/cs224n/reports/2762063.pdf)

I also skimmed through some of the references mentioned in the above two papers. 

My conclusion was that the following parameters should be used for the task: 
- 300-dimension word embeddings 
- 512 neurons in the hidden layers 
- batch size of 64 

The paper #2 has some insights on how perplexity changes depending on the number of layers, where we could see that the performance pretty much flattens out after the 3rd layer. Since out goal in this exercise is to achieve loss lower than 3.5 (not too stringent), I thought that a solution with 2 hidden layers could be enough.  

Number of epochs and learning rate was chosen arbitrarily. I usually start with LR of 0.001 and 10 or 20 epochs and increase both if the performance is unsatisfactory. After a few iterations, I increased the number of epochs to 30. 

In [37]:
# Checkpoint
import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn')

## Generate TV Script
With the network trained and saved, we can use it to generate a new, "fake" Seinfeld TV script. 

### Generate Text
To generate the text, the network needs to start with a single word and repeat its predictions until it reaches a set length. I will be using the `generate` function to do this. It takes a word id to start with, `prime_id`, and generates a set length of text, `predict_len`. It uses topk sampling to introduce some randomness in choosing the most likely next word, given an output set of word scores.

In [38]:
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generating text using the neural network
    :param decoder: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of puncuation tokens keys to puncuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
         
        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences

### Generating a New Script
It's time to generate the text. We will set `gen_length` to the length of TV script we want to generate and set `prime_word` to any one of the following to start the prediction:
- "jerry"
- "elaine"
- "george"
- "kramer"

You can set the prime word to _any word_ in our dictionary, but it's best to start with a name for generating a TV script. (You can also start with any other names you find in the original text file!)

In [168]:
gen_length = 250 # modify the length to your preference
prime_word = 'george' # name for starting the script

# Generate the script
pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], 
                            int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)

### Favourite generated scripts

I have generated a number of scripts and I have found several of them especially fascinating. Here they are: 

**This one is about travelling apparently:**

In [169]:
print(v1)

elaine: for two months.

george: alright look, if he was being enough to gamma sometime?

elaine: well. well, i didn't do anything. it was something. i only find my friend.

jerry: so, winona a woman?

george:(to the phone) all right.

jerry:(pointing) well, what d'you got to do with?

george: i told everything i was doing when i saw my car. this shirt can be a meal for me.

jerry: oh, yeah.

george: hey.

jerry: hey, lainie.

kramer:(to the phone) all right, all right. all right. ill tell you something, all day.

rental oh, anyway anyway

elaine: yeah.

jodi: hi, why don't you just double this thing.

jerry: what, are you crazy?

jerry: i'm sorry it's not just just review off the clubs.

jerry: but you don't see if you wouldn't do it on me.(hurriedly laughs)

jerry: oh, hi! no no no no no no. i know, look, but im not going to do that.

jerry: well, i didn't think it was bawdy.


**This one is very intense:**

In [170]:
print(v2)

kramer: two countries. two day jerry. no one's not talking about.

jerry: well, i guess she's gonna go down to my fianc.

jerry:(joking) oh man, good. elaine doesn't know. i really don't think we lost the place i'm a gift.

jerry: oh, i don't know what to do. you know, i don't think so much promise about for this guy. wow. ooh, tell me. and i don't know why you're going to pay me! ha ha ha ha ha ha ha ha ha ha ha!!!!!!!!!!!!!!!!!!!!!!.......... velvet spite, wyck, i'd come off.

george: i have no idea how i feel about about that.

jerry: well, rabbi, we- uh, i've always wanted to maintain at their face as well, it was fun. i was watching my feet, and some yesterday) oh, wait.

kramer:(slapping) hey everybody(to jerry) well, i guess i was wrong. i always don't know, i feel!

jerry: well, i don't know

george: no no no no, that'll


**This one is also about travel but in Monty Python kind of style**. I think I am going to borrow a few phrases from here!

In [172]:
print(v3)

jerry: two countries. when they ask him that, i was babbling with you.

george: i can't tell you that, i said hey, you think she's selling gonna be the bookie.

dana: i have money.

helen: oh. hello, ok.. i am furious i'm very interested.

jerry: what? now, how do you figure me?

jerry: well, alec are you goin' there? what, you wanna get her up? cause shed? what the hell is this?

jack: oh, i read it in 1971.

bystander station, paper.

elaine:(calling her) now, i'm back to you for a position, then you can see the difference, they're empty as--

george:(suspicious) yes. ill choose(back to the booth up) oh, jerry, can i bring a book down together at them.(shows george is kramer) what are you doing in there? they were tired, and i am dating late.

elaine:(picking the phone) all right, all right, fine. meet the subject?

kramer: yes. yes. oh, yeah..(finally takes a breath, she can explain it back back.




**And the last one, a drama thriller**

In [173]:
print(v4)

george: double there.

kramer: oh, you know, lloyd.. you're concerned. concerned...

fred:(interrupting) oh, jerry look at the tickets, you know what that's uh... didn't you believe this?

jerry: i don't want to save a position.

elaine: really? what's wrong?(starts to come to the table and pulls his feet of it.

stewardess: h&h, will not rather have a little problem regarding. yeah.

kramer: ahhh get off.

susan: well, i'm sorry. i'm just getting rid of them for this.

kramer: mmm.

jerry: oh, hi, hi, listen, uh...

elaine:(to jerry) so then we were gonna be an eye!

salesman: sounds really bad?

jerry: no.

elaine: what are you doing?

jerry: i don't know, but they don't get the job. the only time you could have overcharged. this probably be fun.(claps hands to leave) well, here, uh, you want to trade you what time is that, you were going to do with me.

jerry: well, i should ask him. i have to say, i'm


Let's save an example: 

In [174]:
generated_script = v3

In [175]:
# save script to a text file
f =  open("generated_script_1.txt","w")
f.write(generated_script)
f.close()

# The TV Script is Not Perfect
It's ok if the TV script doesn't make perfect sense. It should look like alternating lines of dialogue, here is one such example of a few generated lines.

### Example generated script

>jerry: what about me?
>
>jerry: i don't have to wait.
>
>kramer:(to the sales table)
>
>elaine:(to jerry) hey, look at this, i'm a good doctor.
>
>newman:(to elaine) you think i have no idea of this...
>
>elaine: oh, you better take the phone, and he was a little nervous.
>
>kramer:(to the phone) hey, hey, jerry, i don't want to be a little bit.(to kramer and jerry) you can't.
>
>jerry: oh, yeah. i don't even know, i know.
>
>jerry:(to the phone) oh, i know.
>
>kramer:(laughing) you know...(to jerry) you don't know.

You can see that there are multiple characters that say (somewhat) complete sentences, but it doesn't have to be perfect! It takes quite a while to get good results, and often, you'll have to use a smaller vocabulary (and discard uncommon words), or get more data.  The Seinfeld dataset is about 3.4 MB, which is big enough for our purposes; for script generation you'll want more than 1 MB of text, generally. 