# TV Script Generation

In this project, you'll generate your own [Seinfeld](https://en.wikipedia.org/wiki/Seinfeld) TV scripts using RNNs.  You'll be using part of the [Seinfeld dataset](https://www.kaggle.com/thec03u5/seinfeld-chronicles#scripts.csv) of scripts from 9 seasons.  The neural network you'll build will generate a new, "fake" TV script based on patterns it recognizes in this training data.

## Get the Data

The data is already provided for you in `./data/Seinfeld_Scripts.txt` and you're encouraged to open that file and look at the text. 
* As a first step, we'll load in this data and look at some samples.
* Then, you'll be tasked with defining and training an RNN to generate a new script!

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# load in data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)

## Explore the Data
Play around with `view_line_range` to view different parts of the data. This will give you a sense of the data you'll be working with. You can see, for example, that it is all lowercase text, and each new line of dialogue is separated by a newline character `\n`.

In [2]:
view_line_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
print('Average number of words in each line: {}'.format(np.average(word_count_line)))

print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))

Dataset Stats
Roughly the number of unique words: 46367
Number of lines: 109233
Average number of words in each line: 5.544240293684143
The lines 0 to 10:
jerry: do you know what this is all about? do you know, why were here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about we should go out? this is what theyre talking about...this whole thing, were all out now, no one is home. not one person here is home, were all out! there are people trying to find us, they dont know where we are. (on an imaginary phone) did you ring?, i cant find him. where did he go? he didnt tell me where he was going. he must have gone out. you wanna go out you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...then youre standing around, what do you do? you go we gotta be getting back. once youre out, you wanna get back! yo

---
## Implement Pre-processing Functions
The first thing to do to any dataset is pre-processing.  Implement the following pre-processing functions below:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- Dictionary to go from the words to an id, we'll call `vocab_to_int`
- Dictionary to go from the id to word, we'll call `int_to_vocab`

Return these dictionaries in the following **tuple** `(vocab_to_int, int_to_vocab)`

In [3]:
import problem_unittests as tests
from string import punctuation
from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # consolidate the words into one string
    all_text = ' '.join([word for word in text])
    print(all_text[:2000])  # debug: preview the start of the text

    # drop any newline characters, then split back into words
    text_split = all_text.split('\n')
    all_text = ''.join(text_split)
    words = all_text.split()

    # ids are assigned by descending word frequency, starting at 0
    word_counts = Counter(words)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab, 0)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return (vocab_to_int, int_to_vocab)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

moe_szyslak moe's tavern where the elite meet to drink bart_simpson eh yeah hello is mike there last name rotch moe_szyslak hold on i'll check mike rotch mike rotch hey has anybody seen mike rotch lately moe_szyslak listen you little puke one of these days i'm gonna catch you and i'm gonna carve my name on your back with an ice pick moe_szyslak whats the matter homer you're not your normal effervescent self homer_simpson i got my problems moe give me another one moe_szyslak homer hey you should not drink to forget your problems barney_gumble yeah you should only drink to enhance your social skills
Tests Passed


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks can create multiple ids for the same word. For example, "bye" and "bye!" would generate two different word ids.

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( **.** )
- Comma ( **,** )
- Quotation Mark ( **"** )
- Semicolon ( **;** )
- Exclamation mark ( **!** )
- Question mark ( **?** )
- Left Parentheses ( **(** )
- Right Parentheses ( **)** )
- Dash ( **-** )
- Return ( **\n** )

This dictionary will be used to tokenize the symbols and add a delimiter (space) around each one.  This separates each symbol as its own word, making it easier for the neural network to predict the next word. Make sure you don't use a value that could be confused with a word; for example, instead of using the value "dash", try using something like "||dash||".
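As a rough illustration of how such a dictionary might be applied, the sketch below pads each symbol's token with spaces, lowercases the text, and splits on whitespace. The real logic lives in `helper.preprocess_and_save_data` and may differ; `toy_tokens` and `tokenize` are hypothetical names used only for this example.

```python
# Hypothetical subset of the punctuation dictionary, for illustration only.
toy_tokens = {
    '.': '||Period||',
    ',': '||Comma||',
    '!': '||Exclamation_Mark||',
    '\n': '||Return||',
}

def tokenize(text, token_dict):
    """Replace each symbol with its space-padded token, then split into words."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    # lowercasing is an assumption, chosen to match the lowercase tokens
    # that appear in the preprocessed output
    return text.lower().split()

words = tokenize("bye!\nok, bye.", toy_tokens)
print(words)
```

With the symbols split off, both occurrences of "bye" map to the same word, so they will share a single id in the lookup table.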

In [4]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    punc_to_token = {
        '.': "||Period||", 
        ',': "||Comma||", 
        '"': "||Quotation_Mark||", 
        ';': "||Semicolon||", 
        '!': "||Exclamation_mark||", 
        '?': "||Question_mark||", 
        '(': "||Left_Parentheses||", 
        ')': "||Right_Parentheses||", 
        '-': "||Dash||", 
        '\n': "||Return||"
    }
    
    return punc_to_token

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


## Pre-process all the data and save it

Running the code cell below will pre-process all the data and save it to file. You're encouraged to look at the code for `preprocess_and_save_data` in the `helper.py` file to see what it's doing in detail, but you do not need to change this code.

In [5]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

this is out ||period|| ||period|| ||period|| and out is one of the single most enjoyable experiences of life ||period|| people ||period|| ||period|| ||period|| did you ever hear people talking about we should go out ||question_mark|| this is what theyre talking about ||period|| ||period|| ||period|| this whole thing ||comma|| were all out now ||comma|| no one is home ||period|| not one person here is home ||comma|| were all out ||exclamation_mark|| there are people trying to find us ||comma|| they dont know where we are ||period|| ||left_parentheses|| on an imaginary phone ||right_parentheses|| did you ring ||question_mark|| ||comma|| i cant find him ||period|| where did he go ||question_mark|| he didnt tell me where he was going ||period|| he must have gone out ||period|| you wanna go out you get ready ||comma|| you pick out the clothes ||comma|| right ||question_mark|| you take the shower ||comma|| you get all ready ||comma|| get the cash ||comma|| get your friends ||comma|| the car 

# Checkpoint
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
In this section, you'll build the components necessary to build an RNN by implementing the RNN Module and forward and backpropagation functions.

### Check Access to GPU

In [7]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

## Input
Let's start with the preprocessed input data. We'll use [TensorDataset](http://pytorch.org/docs/master/data.html#torch.utils.data.TensorDataset) to provide a known format to our dataset; in combination with [DataLoader](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader), it will handle batching, shuffling, and other dataset iteration functions.

You can create data with TensorDataset by passing in feature and target tensors. Then create a DataLoader as usual.
```
data = TensorDataset(feature_tensors, target_tensors)
data_loader = torch.utils.data.DataLoader(data, 
                                          batch_size=batch_size)
```

### Batching
Implement the `batch_data` function to batch `words` data into chunks of size `batch_size` using the `TensorDataset` and `DataLoader` classes.

>You can batch words using the DataLoader, but it will be up to you to create `feature_tensors` and `target_tensors` of the correct size and content for a given `sequence_length`.

For example, say we have these as input:
```
words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4
```

Your first `feature_tensor` should contain the values:
```
[1, 2, 3, 4]
```
And the corresponding `target_tensor` should just be the next "word"/tokenized word value:
```
5
```
This should continue with the second `feature_tensor`, `target_tensor` being:
```
[2, 3, 4, 5]  # features
6             # target
```
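The windowing described above can be sketched in plain Python before any tensors are involved. This is only an illustration of the indexing; `sliding_windows` is a hypothetical helper, not part of the project code.

```python
def sliding_windows(words, sequence_length):
    """Return (features, targets): each feature is a window of words,
    and each target is the single word that follows that window."""
    n_windows = len(words) - sequence_length
    features = [list(words[i:i + sequence_length]) for i in range(n_windows)]
    targets = [words[i + sequence_length] for i in range(n_windows)]
    return features, targets

features, targets = sliding_windows([1, 2, 3, 4, 5, 6, 7], sequence_length=4)
for window, target in zip(features, targets):
    print(window, '->', target)
```

Note that a list of `n` words yields `n - sequence_length` windows, because the last window still needs one word after it to serve as its target.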

In [8]:
from torch.utils.data import TensorDataset, DataLoader


def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # TODO: Implement function
    features = np.zeros(((len(words) - sequence_length), sequence_length), dtype=int)
    targets = np.zeros((len(words) - sequence_length), dtype=int)
    for i in range(len(words) - sequence_length):
        features[i] = words[i : i + sequence_length]
        targets[i] = words[i + sequence_length]

## test
#    print(type(features)) # <class 'numpy.ndarray'>
#    print(features[0])    # [0 1 2 3 4]
#    print(targets[0])     # 5
#    print(features[1])    # [1 2 3 4 5]
#    print(targets[1])     # 6
#    print(features[len(words) - sequence_length - 1]) # [44 45 46 47 48]
#    print(targets[len(words) - sequence_length - 1])  # 49

    data = TensorDataset(torch.from_numpy(features), torch.from_numpy(targets))
    data_loader = torch.utils.data.DataLoader(data, shuffle=True, batch_size=batch_size)

    # return a dataloader
    return data_loader

# there is no test for this function, but you are encouraged to create
# print statements and tests of your own

## test
#test_text = range(50)
#batch_data(test_text, sequence_length=5, batch_size=10)

### Test your dataloader 

You'll have to modify this code to test a batching function, but it should look fairly similar.

Below, we're generating some test text data and defining a dataloader using the function you defined, above. Then, we are getting some sample batch of inputs `sample_x` and targets `sample_y` from our dataloader.

Your code should return something like the following (likely in a different order, if you shuffled your data):

```
torch.Size([10, 5])
tensor([[ 28,  29,  30,  31,  32],
        [ 21,  22,  23,  24,  25],
        [ 17,  18,  19,  20,  21],
        [ 34,  35,  36,  37,  38],
        [ 11,  12,  13,  14,  15],
        [ 23,  24,  25,  26,  27],
        [  6,   7,   8,   9,  10],
        [ 38,  39,  40,  41,  42],
        [ 25,  26,  27,  28,  29],
        [  7,   8,   9,  10,  11]])

torch.Size([10])
tensor([ 33,  26,  22,  39,  16,  28,  11,  43,  30,  12])
```

### Sizes
Your sample_x should be of size `(batch_size, sequence_length)` or (10, 5) in this case and sample_y should just have one dimension: batch_size (10). 

### Values

You should also notice that the targets, sample_y, are the *next* value in the ordered test_text data. So, for an input sequence `[ 28,  29,  30,  31,  32]` that ends with the value `32`, the corresponding output should be `33`.

In [9]:
# test dataloader

test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = next(data_iter)  # the built-in next() works across PyTorch versions

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)

torch.Size([10, 5])
tensor([[35, 36, 37, 38, 39],
        [ 5,  6,  7,  8,  9],
        [ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [10, 11, 12, 13, 14],
        [27, 28, 29, 30, 31],
        [30, 31, 32, 33, 34],
        [33, 34, 35, 36, 37],
        [40, 41, 42, 43, 44],
        [ 7,  8,  9, 10, 11]])

torch.Size([10])
tensor([40, 10,  6, 11, 15, 32, 35, 38, 45, 12])


---
## Build the Neural Network
Implement an RNN using PyTorch's [Module class](http://pytorch.org/docs/master/nn.html#torch.nn.Module). You may choose to use a GRU or an LSTM. To complete the RNN, you'll have to implement the following functions for the class:
 - `__init__` - The initialize function. 
 - `init_hidden` - The initialization function for an LSTM/GRU hidden state
 - `forward` - Forward propagation function.
 
The initialize function should create the layers of the neural network and save them to the class. The forward propagation function will use these layers to run forward propagation and generate an output and a hidden state.

**The output of this model should be the *last* batch of word scores** after a complete sequence has been processed. That is, for each input sequence of words, we only want to output the word scores for a single, most likely, next word.

### Hints

1. Make sure to stack the outputs of the LSTM before passing them to your fully-connected layer; you can do this with `lstm_output = lstm_output.contiguous().view(-1, self.hidden_dim)`
2. You can get the last batch of word scores by shaping the output of the final, fully-connected layer like so:

```
# reshape into (batch_size, seq_length, output_size)
output = output.view(batch_size, -1, self.output_size)
# get last batch
out = output[:, -1]
```
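To see why `output[:, -1]` picks out the final time step of every sequence, it can help to trace the reshape by hand. The sketch below uses plain Python lists in place of tensors and assumes a batch-first, row-major layout (as with `batch_first=True`); `last_step_rows` is a hypothetical helper, not a PyTorch function.

```python
def last_step_rows(flat_rows, batch_size):
    """Mimic view(batch_size, -1, output_size)[:, -1] on a flat list of rows."""
    seq_length = len(flat_rows) // batch_size
    # group consecutive rows into one sequence per batch element
    grouped = [flat_rows[b * seq_length:(b + 1) * seq_length]
               for b in range(batch_size)]
    # keep only the final time step of each sequence
    return [sequence[-1] for sequence in grouped]

# 2 sequences x 3 time steps; each row is labelled (sequence, step) for readability
flat = [('seq0', 't0'), ('seq0', 't1'), ('seq0', 't2'),
        ('seq1', 't0'), ('seq1', 't1'), ('seq1', 't2')]
last = last_step_rows(flat, batch_size=2)
print(last)
```

Each returned row stands for the word scores produced after the model has seen a whole input sequence, which is what the loss is computed against.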

In [None]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them        
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM/GRU layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()
        
        # TODO: Implement function
        
        # set class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

        # define model layers
        self.embd = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout = dropout, batch_first = True)
#        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        # TODO: Implement function   
        batch_size = nn_input.size(0)

        # embeddings and lstm_out
        embeds = self.embd(nn_input)
        lstm_out, hidden = self.lstm(embeds, hidden)
        
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        # dropout and fully-connected layer
#        out = self.dropout(lstm_out)
#        out = self.fc(out)
        out = self.fc(lstm_out)

        # reshape to (batch_size, seq_length, output_size)
        out = out.view(batch_size, -1, self.output_size)
        out = out[:, -1] # keep only the scores from the last time step

        # return one batch of output word scores and the hidden state
        return out, hidden
    
    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # Implement function
        
        # initialize hidden state with zero weights, and move to GPU if available
        weight = next(self.parameters()).data

        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

        return hidden

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_rnn(RNN, train_on_gpu)

### Memo
**In the original code, an extra `nn.Dropout(0.3)` layer before the fully-connected layer hurt performance.**  
**Removing that dropout layer improved performance dramatically.**  


### Define forward and backpropagation

Use the RNN class you implemented to apply forward and back propagation. This function will be called, iteratively, in the training loop as follows:
```
loss, hidden = forward_back_prop(rnn, optimizer, criterion, inp, target, hidden)
```

And it should return the average loss over a batch and the hidden state returned by a call to `RNN(inp, hidden)`. Recall that you can get this loss by computing it, as usual, and calling `loss.item()`.

**If a GPU is available, you should move your data to that GPU device, here.**

In [11]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    """
    Forward and backward propagation on the neural network
    :param rnn: The PyTorch Module that holds the neural network
    :param optimizer: The PyTorch optimizer for the neural network
    :param criterion: The PyTorch loss function
    :param inp: A batch of input to the neural network
    :param target: The target output for the batch of input
    :param hidden: The hidden state carried over from the previous batch
    :return: The loss and the latest hidden state Tensor
    """
    
    # TODO: Implement Function
    clip=5 # gradient clipping
    
    # move data to GPU, if available
    if(train_on_gpu):
        inp, target = inp.cuda(), target.cuda()
    
    # detach the hidden state from its history so that backprop
    # does not run through all previous batches
    hidden = tuple([each.data for each in hidden])

    # zero accumulated gradients
    rnn.zero_grad()

    # get the output from the model
    output, hidden = rnn(inp, hidden)

    # calculate the loss and perform backprop
    loss = criterion(output.squeeze(), target)
    loss.backward()
    
    # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
    nn.utils.clip_grad_norm_(rnn.parameters(), clip)
    optimizer.step()

    # return the loss over a batch and the hidden state produced by our model
    return loss.item(), hidden

# Note that these tests aren't completely extensive.
# they are here to act as general checks on the expected outputs of your functions
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed


## Neural Network Training

With the structure of the network complete and data ready to be fed in the neural network, it's time to train it.

### Train Loop

The training loop is implemented for you in the `train_rnn` function. This function will train the network over all the batches for the number of epochs given. Progress is printed every `show_every_n_batches` batches. You'll set this parameter along with other parameters in the next section.

In [12]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""

def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    batch_losses = []
    
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)

            # printing loss stats
            if batch_i % show_every_n_batches == 0:
#                print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
#                    epoch_i, n_epochs, np.average(batch_losses)))
                print('Epoch: {:>4}/{:<4}  Loss: {}'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))
                batch_losses = []

    # returns a trained rnn
    return rnn

### Hyperparameters

Set and train the neural network with the following parameters:
- Set `sequence_length` to the length of a sequence.
- Set `batch_size` to the batch size.
- Set `num_epochs` to the number of epochs to train for.
- Set `learning_rate` to the learning rate for an Adam optimizer.
- Set `vocab_size` to the number of unique tokens in our vocabulary.
- Set `output_size` to the desired size of the output.
- Set `embedding_dim` to the embedding dimension; smaller than the vocab_size.
- Set `hidden_dim` to the hidden dimension of your RNN.
- Set `n_layers` to the number of layers/cells in your RNN.
- Set `show_every_n_batches` to the number of batches at which the neural network should print progress.

If the network isn't getting the desired results, tweak these parameters and/or the layers in the `RNN` class.

In [13]:
# Data params
# Sequence Length
sequence_length = 10  # of words in a sequence
# Batch Size
batch_size = 64

# data loader - do not change
train_loader = batch_data(int_text, sequence_length, batch_size)

In [14]:
# Training parameters
# Number of Epochs
num_epochs = 20
# Learning Rate
learning_rate = 0.001

# Model parameters
# Vocab size
vocab_size = len(vocab_to_int)
# Output size
output_size = vocab_size
# Embedding Dimension
embedding_dim = 256
# Hidden Dimension
hidden_dim = 800
# Number of RNN Layers
n_layers = 2

# Show stats for every n number of batches
show_every_n_batches = 640

### Train
In the next cell, you'll train the neural network on the pre-processed data.  If you have a hard time getting a good loss, you may consider changing your hyperparameters. In general, you may get better results with larger hidden and n_layer dimensions, but larger models take a longer time to train. 
> **You should aim for a loss less than 3.5.** 

You should also experiment with different sequence lengths, which determine the size of the long range dependencies that a model can learn.
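One way to put the loss target in context: `nn.CrossEntropyLoss` reports the mean negative log-likelihood in nats, so `exp(loss)` is the model's perplexity, and a model guessing uniformly over `V` tokens would score `ln(V)`. The sketch below does this arithmetic; the vocabulary size is a hypothetical round number chosen for illustration, not the notebook's actual `vocab_size`.

```python
import math

def perplexity(cross_entropy_loss):
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(cross_entropy_loss)

def uniform_baseline_loss(vocab_size):
    """Loss of a model that assigns equal probability to every token."""
    return math.log(vocab_size)

hypothetical_vocab_size = 20000  # assumption for illustration only
print('uniform baseline loss: {:.2f}'.format(
    uniform_baseline_loss(hypothetical_vocab_size)))
print('perplexity at loss 3.5: {:.1f}'.format(perplexity(3.5)))
```

A loss of 3.5 corresponds to a perplexity of roughly 33, meaning the model is on average about as uncertain as if it were choosing among 33 equally likely next words.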

In [15]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""

# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# training the model
trained_rnn = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

# saving the trained model
helper.save_model('./save/trained_rnn', trained_rnn)
print('Model Trained and Saved')

Training for 20 epoch(s)...
Epoch:    1/20    Loss: 5.229137389361858
Epoch:    1/20    Loss: 4.6782575029879805
Epoch:    1/20    Loss: 4.523252369463444
Epoch:    1/20    Loss: 4.39951732121408
Epoch:    1/20    Loss: 4.321641484647989
Epoch:    1/20    Loss: 4.285001425072551
Epoch:    1/20    Loss: 4.232086840644479
Epoch:    1/20    Loss: 4.198610675707459
Epoch:    1/20    Loss: 4.211443359032273
Epoch:    1/20    Loss: 4.143431318178773
Epoch:    1/20    Loss: 4.108359711617231
Epoch:    1/20    Loss: 4.136287257820368
Epoch:    1/20    Loss: 4.091873025521636
Epoch:    1/20    Loss: 4.082285134494304
Epoch:    1/20    Loss: 4.10346759557724
Epoch:    1/20    Loss: 4.078864604234695
Epoch:    1/20    Loss: 4.051289072632789
Epoch:    1/20    Loss: 4.06037515103817
Epoch:    1/20    Loss: 4.0634753502905365
Epoch:    1/20    Loss: 4.016846016794443
Epoch:    1/20    Loss: 4.022511497139931
Epoch:    2/20    Loss: 3.884987689822886
Epoch:    2/20    Loss: 3.7904474064707756
Epoch:

Epoch:   10/20    Loss: 3.0818775221705437
Epoch:   10/20    Loss: 3.049671177379787
Epoch:   10/20    Loss: 3.0897314865142107
Epoch:   10/20    Loss: 3.0922563571482895
Epoch:   10/20    Loss: 3.126779767498374
Epoch:   10/20    Loss: 3.140875707566738
Epoch:   10/20    Loss: 3.1425112027674915
Epoch:   10/20    Loss: 3.1609161268919705
Epoch:   10/20    Loss: 3.1613209303468466
Epoch:   10/20    Loss: 3.1826103407889605
Epoch:   10/20    Loss: 3.1734988264739514
Epoch:   10/20    Loss: 3.217325211688876
Epoch:   10/20    Loss: 3.2241872802376745
Epoch:   10/20    Loss: 3.221763363480568
Epoch:   10/20    Loss: 3.2503265094012024
Epoch:   10/20    Loss: 3.2601236172020434
Epoch:   11/20    Loss: 3.090364319698344
Epoch:   11/20    Loss: 2.9603553995490075
Epoch:   11/20    Loss: 2.972752561420202
Epoch:   11/20    Loss: 2.970705915428698
Epoch:   11/20    Loss: 3.0108174800872805
Epoch:   11/20    Loss: 3.031998048350215
Epoch:   11/20    Loss: 3.03409095518291
Epoch:   11/20    Loss

Epoch:   19/20    Loss: 2.862694627046585
Epoch:   19/20    Loss: 2.876845060288906
Epoch:   19/20    Loss: 2.915335972607136
Epoch:   19/20    Loss: 2.9086193937808273
Epoch:   19/20    Loss: 2.9207887187600137
Epoch:   19/20    Loss: 2.9207478921860455
Epoch:   19/20    Loss: 2.9499135542660953
Epoch:   19/20    Loss: 2.957243686541915
Epoch:   19/20    Loss: 2.9783644203096626
Epoch:   19/20    Loss: 2.9611406806856393
Epoch:   20/20    Loss: 2.8080363532134167
Epoch:   20/20    Loss: 2.710379865951836
Epoch:   20/20    Loss: 2.7313403541222216
Epoch:   20/20    Loss: 2.7475198272615673
Epoch:   20/20    Loss: 2.7642371913418176
Epoch:   20/20    Loss: 2.8081039214506744
Epoch:   20/20    Loss: 2.7939363082870843
Epoch:   20/20    Loss: 2.7994265543296932
Epoch:   20/20    Loss: 2.81603539660573
Epoch:   20/20    Loss: 2.824087541550398
Epoch:   20/20    Loss: 2.8570318717509506
Epoch:   20/20    Loss: 2.8544814608991147
Epoch:   20/20    Loss: 2.863310379907489
Epoch:   20/20    Lo

### Question: How did you decide on your model hyperparameters? 
For example, did you try different sequence_lengths and find that one size made the model converge faster? What about your hidden_dim and n_layers; how did you decide on those?

**Answer:**

For sequence_length, I tried 10, 16, and 20; 10 worked better than the others.  
I suspected that values smaller than 10 would give too little context to predict the next word appropriately.

For batch_size, I tried only 64.  
I confirmed that training was going well.  
In addition, according to the `nvidia-smi` command, memory usage was 3333MiB / 7973MiB, which seemed appropriate.

For num_epochs, I tried 10, 20, 30, and 50.  
30 might be appropriate, but I couldn't get the loss below 3.0 by the end of training.  
So I chose 50 instead of 30.

For learning_rate, I tried only 0.001.  
I found an article mentioning that 0.001 or nearby values are appropriate.

For embedding_dim, I tried 128, 256, 400, and 512.  
In the end, I chose 256, considering that the number of unique words was 46367  
and that the Sentiment_RNN_Exercise example used 400 for 74072 unique words.

For hidden_dim, I tried 128, 256, 400, and 800.  
In the end, I chose 800 because I couldn't get the loss below 3.5 with smaller values of hidden_dim.

For n_layers, I chose 2 because the criteria require a value between 1 and 3.

---
# Checkpoint

After running the above training cell, your model will be saved by name, `trained_rnn`, and if you save your notebook progress, **you can pause here and come back to this code at another time**. You can resume your progress by running the next cell, which will load in our word:id dictionaries _and_ load in your saved model by name!

In [16]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn')

## Generate TV Script
With the network trained and saved, you'll use it to generate a new, "fake" Seinfeld TV script in this section.

### Generate Text
To generate the text, the network needs to start with a single word and repeat its predictions until it reaches a set length. You'll be using the `generate` function to do this. It takes a word id to start with, `prime_id`, and generates a set length of text, `predict_len`. Also note that it uses top-k sampling to introduce some randomness in choosing the next word from the most likely candidates, given an output set of word scores!
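The sampling idea can be sketched on its own, without a trained model. This minimal version uses only the standard library and a made-up probability list; `sample_top_k` and `toy_probs` are hypothetical names, and the notebook's own `generate` function uses PyTorch and NumPy instead.

```python
import random

def sample_top_k(scores, k, rng=random):
    """Pick an index from the k highest scores, weighted by those scores."""
    # indices of the k largest scores
    top_indices = sorted(range(len(scores)),
                         key=lambda i: scores[i], reverse=True)[:k]
    top_scores = [scores[i] for i in top_indices]
    # random.choices normalises the weights, so they need not sum to 1
    return rng.choices(top_indices, weights=top_scores, k=1)[0]

toy_probs = [0.05, 0.40, 0.10, 0.30, 0.15]  # hypothetical word probabilities
picks = {sample_top_k(toy_probs, k=3) for _ in range(200)}
print(picks)  # only indices of the three most probable words can appear
```

Restricting the draw to the top `k` candidates keeps unlikely words out of the script, while the weighted random choice stops the generator from repeating the single most probable word every time.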

In [17]:
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param rnn: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of punctuation tokens keys to punctuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
         
        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        if train_on_gpu:
            current_seq = current_seq.cpu() # move to cpu
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences
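
The punctuation-restoring step at the end of `generate` can be tried on its own. The sketch below is a simplified stand-in: `restore_punctuation`, the four-entry `token_dict` and the `words` list are made up for this example, and it assumes the generated words contain lowercase tokens, which is what the `token.lower()` call in `generate` suggests.

```python
# Hypothetical subset of a punctuation-to-token mapping
token_dict = {
    '.': '||Period||',
    ',': '||Comma||',
    '?': '||Question_mark||',
    '\n': '||Return||',
}

def restore_punctuation(words, token_dict):
    """Join generated words and swap punctuation tokens back in."""
    text = ' '.join(words)
    for punctuation, token in token_dict.items():
        # match the lowercase form of the token, as generate() does
        text = text.replace(' ' + token.lower(), punctuation)
    # drop the space that follows a restored newline
    return text.replace('\n ', '\n')

words = ['jerry:', 'what', '||question_mark||', '||return||',
         'george:', 'yeah', '||comma||', 'i', 'suppose', '||period||']
print(restore_punctuation(words, token_dict))
```

The leading space in `' ' + token.lower()` is what attaches each punctuation mark directly to the preceding word.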

### Generate a New Script
It's time to generate the text. Set `gen_length` to the length of TV script you want to generate and set `prime_word` to one of the following to start the prediction:
- "jerry"
- "elaine"
- "george"
- "kramer"

You can set the prime word to _any word_ in our dictionary, but it's best to start with a name for generating a TV script. (You can also start with any other names you find in the original text file!)

In [36]:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'jerry' # name for starting the script

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)

jerry: rooms climbing everywhere, but apparently, i'm gonna get you a note for some new tv city?

george: yeah, i suppose.

jerry:(cont'd) i can't believe this, kramer..

kramer: yeah, yeah. i got to get out of my mind, moving into a space.

jerry: what are you saying?

george: i'm gay. i'm shaking.

jerry: what? what are you doing, you should've seen a relationship before i find myself.

elaine:(laughs, frustrated) stella.

george:(shaking hands triumphantly) hey, hey, hey! hey.(to kramer) hey, what about your arm?!

kramer: well, i guess i'll be lying.

jerry: well, i think i'm gonna have to get my money, i dont want to hear anything to do with this guy from nbc. he's a real nut, but i just want to be a very eligible lesbian. i mean it...

jerry: oh, no, not really.

george: what are you doing here?

jerry: i don't know, but you don't say anything.

jerry: i thought it might be something in a situation. i just wanted to see you.

kramer: oh!

jerry: oh. i knew it! i saw you!

morty: 

#### Save your favorite scripts

Once you have a script that you like (or find interesting), save it to a text file!

In [98]:
# save script to a text file
with open("generated_script_1.txt", "w") as f:
    f.write(generated_script)

# The TV Script is Not Perfect
It's ok if the TV script doesn't make perfect sense. It should look like alternating lines of dialogue; here is one such example of a few generated lines.

### Example generated script

>jerry: what about me?
>
>jerry: i don't have to wait.
>
>kramer:(to the sales table)
>
>elaine:(to jerry) hey, look at this, i'm a good doctor.
>
>newman:(to elaine) you think i have no idea of this...
>
>elaine: oh, you better take the phone, and he was a little nervous.
>
>kramer:(to the phone) hey, hey, jerry, i don't want to be a little bit.(to kramer and jerry) you can't.
>
>jerry: oh, yeah. i don't even know, i know.
>
>jerry:(to the phone) oh, i know.
>
>kramer:(laughing) you know...(to jerry) you don't know.

You can see that there are multiple characters that say (somewhat) complete sentences, but it doesn't have to be perfect! It takes quite a while to get good results, and often, you'll have to use a smaller vocabulary (and discard uncommon words), or get more data.  The Seinfeld dataset is about 3.4 MB, which is big enough for our purposes; for script generation you'll want more than 1 MB of text, generally. 

# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and save another copy as an HTML file by clicking "File" -> "Download as.."->"html". Include the "helper.py" and "problem_unittests.py" files in your submission. Once you download these files, compress them into one zip file for submission.

# Grid Search Version

In [43]:
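def train_rnn_get_loss(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    """
    Train the network and return it together with the last epoch's average loss.
    Relies on the global `train_loader` defined before the call.
    """
    rnn.train()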

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        batch_losses = []
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)
            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Loss: {}'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))

        print('Epoch: {:>4}/{:<4}  Loss: {}'.format(epoch_i, n_epochs, np.average(batch_losses)))

    # return the trained rnn and the average loss of the last epoch
    return rnn, np.average(batch_losses)

# Track the lowest loss across the whole grid, so the saved model is the overall best
min_loss = 10000

# Sequence Length
for sequence_length in [10, 16]:  # number of words in a sequence
    # Batch Size
    batch_size = 64

    # data loader - do not change
    train_loader = batch_data(int_text, sequence_length, batch_size)

    # Training parameters
    # Number of Epochs
    num_epochs = 10
    # Learning Rate
    learning_rate = 0.001

    # Model parameters
    # Vocab size
    vocab_size = len(vocab_to_int)
    # Output size
    output_size = vocab_size
    # Embedding Dimension
    for embedding_dim in [128, 256]:
        # Hidden Dimension
        for hidden_dim in [256, 800]:
            # Number of RNN Layers
            for n_layers in [2, 3]:

                # Show stats for every n number of batches
                show_every_n_batches = 1280

                # create model and move to gpu if available
                rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
                if train_on_gpu:
                    rnn.cuda()

                # defining loss and optimization functions for training
                optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
                criterion = nn.CrossEntropyLoss()

                print('batch_size:{}, learning_rate:{}, embedding_dim:{}, hidden_dim:{}, n_layers:{}'.format(batch_size, learning_rate, embedding_dim, hidden_dim, n_layers))
                # training the model
                trained_rnn, average_loss = train_rnn_get_loss(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)
                
                if (average_loss < min_loss):
                    min_loss = average_loss
                    # saving the trained model
                    helper.save_model('./save/trained_rnn_grid_search', trained_rnn)
                    print('Model Trained and Saved')


batch_size:64, learning_rate:0.001, embedding_dim:128, hidden_dim:256, n_layers:2
Training for 10 epoch(s)...
Epoch:    1/10    Loss: 5.264968131482601
Epoch:    1/10    Loss: 4.999911734741181
Epoch:    1/10    Loss: 4.848859952017665
Epoch:    1/10    Loss: 4.7442983041517435
Epoch:    1/10    Loss: 4.667877151221037
Epoch:    1/10    Loss: 4.6096951281651855
Epoch:    1/10    Loss: 4.561065202206374
Epoch:    1/10    Loss: 4.519967463822104
Epoch:    1/10    Loss: 4.484614451664189
Epoch:    1/10    Loss: 4.455811984539032
Epoch:    1/10    Loss: 4.4340362672835205
Epoch:    2/10    Loss: 3.993470881320536
Epoch:    2/10    Loss: 3.997169641871005
Epoch:    2/10    Loss: 3.9992007171114285
Epoch:    2/10    Loss: 3.995821291860193
Epoch:    2/10    Loss: 3.9943844141066074
Epoch:    2/10    Loss: 3.9936816899105905
Epoch:    2/10    Loss: 3.992791992958103
Epoch:    2/10    Loss: 3.9916433964390308
Epoch:    2/10    Loss: 3.9916051532038384
Epoch:    2/10    Loss: 3.9916085172630846
...

Epoch:    8/10    Loss: 3.64770968798548
Epoch:    8/10    Loss: 3.654914181306958
Epoch:    8/10    Loss: 3.661445679375902
Epoch:    8/10    Loss: 3.6720642073079945
Epoch:    8/10    Loss: 3.6795959917207557
Epoch:    8/10    Loss: 3.6886286893859506
Epoch:    8/10    Loss: 3.6949191309278833
Epoch:    8/10    Loss: 3.700994110479951
Epoch:    8/10    Loss: 3.7089974397234617
Epoch:    8/10    Loss: 3.7151971180971457
Epoch:    9/10    Loss: 3.6261410348117353
Epoch:    9/10    Loss: 3.6305971120484175
Epoch:    9/10    Loss: 3.6395607352877657
Epoch:    9/10    Loss: 3.64665651503019
Epoch:    9/10    Loss: 3.653685080856085
Epoch:    9/10    Loss: 3.6610509720941384
Epoch:    9/10    Loss: 3.6672159915257776
Epoch:    9/10    Loss: 3.6728388684103264
Epoch:    9/10    Loss: 3.6775356138745945
Epoch:    9/10    Loss: 3.685173584166914
Epoch:    9/10    Loss: 3.6909140724133143
Epoch:   10/10    Loss: 3.6118910860270264
Epoch:   10/10    Loss: 3.6026403850875797
...

Epoch:    5/10    Loss: 3.6624406481782597
Epoch:    5/10    Loss: 3.67044678600505
Epoch:    5/10    Loss: 3.6739292223379016
Epoch:    5/10    Loss: 3.677812303043902
Epoch:    5/10    Loss: 3.6816658697756273
Epoch:    5/10    Loss: 3.687349981069565
Epoch:    5/10    Loss: 3.691051408379442
Epoch:    5/10    Loss: 3.6975777609460057
Epoch:    5/10    Loss: 3.7016785000957695
Epoch:    6/10    Loss: 3.54476048797369
Epoch:    6/10    Loss: 3.5541616732254626
Epoch:    6/10    Loss: 3.55945693453153
Epoch:    6/10    Loss: 3.5681024524383247
Epoch:    6/10    Loss: 3.575426831804216
Epoch:    6/10    Loss: 3.584946066917231
Epoch:    6/10    Loss: 3.5919636079509343
Epoch:    6/10    Loss: 3.5994318182813005
Epoch:    6/10    Loss: 3.603767972025606
Epoch:    6/10    Loss: 3.611846964973956
Epoch:    6/10    Loss: 3.6168483886462868
Epoch:    7/10    Loss: 3.4850825188681482
Epoch:    7/10    Loss: 3.477480092551559
Epoch:    7/10    Loss: 3.483951543706159
...

Epoch:    2/10    Loss: 4.087346386648715
Epoch:    2/10    Loss: 4.091070807849367
Epoch:    2/10    Loss: 4.090618892173682
Epoch:    2/10    Loss: 4.0917139947647225
Epoch:    2/10    Loss: 4.089782899535365
Epoch:    2/10    Loss: 4.090260819587857
Epoch:    2/10    Loss: 4.090196763471104
Epoch:    3/10    Loss: 3.910403460636735
Epoch:    3/10    Loss: 3.925250784680247
Epoch:    3/10    Loss: 3.924754748555521
Epoch:    3/10    Loss: 3.9309794328641146
Epoch:    3/10    Loss: 3.9341189857199788
Epoch:    3/10    Loss: 3.9350391063218315
Epoch:    3/10    Loss: 3.9396994755470325
Epoch:    3/10    Loss: 3.9432947943918406
Epoch:    3/10    Loss: 3.945910024518768
Epoch:    3/10    Loss: 3.9516319378092883
Epoch:    3/10    Loss: 3.954579643936699
Epoch:    4/10    Loss: 3.8407948300242425
Epoch:    4/10    Loss: 3.835350781865418
Epoch:    4/10    Loss: 3.83278302544107
Epoch:    4/10    Loss: 3.8333448130637406
Epoch:    4/10    Loss: 3.836094100549817
...

Epoch:    9/10    Loss: 3.2204678751735223
Epoch:    9/10    Loss: 3.232292750943452
Epoch:    9/10    Loss: 3.2434076326924286
Epoch:   10/10    Loss: 3.073245872091502
Epoch:   10/10    Loss: 3.081191825354472
Epoch:   10/10    Loss: 3.096925279032439
Epoch:   10/10    Loss: 3.1122453997144475
Epoch:   10/10    Loss: 3.1255539071373644
Epoch:   10/10    Loss: 3.135210199886933
Epoch:   10/10    Loss: 3.1481598422330404
Epoch:   10/10    Loss: 3.1587151309126056
Epoch:   10/10    Loss: 3.170471491592212
Epoch:   10/10    Loss: 3.1822154447901996
Epoch:   10/10    Loss: 3.1921824236683785
batch_size:64, learning_rate:0.001, embedding_dim:256, hidden_dim:800, n_layers:3
Training for 10 epoch(s)...
Epoch:    1/10    Loss: 5.215050030127168
Epoch:    1/10    Loss: 4.953537492454052
Epoch:    1/10    Loss: 4.811015879859527
Epoch:    1/10    Loss: 4.719901277823373
Epoch:    1/10    Loss: 4.650515756309033
Epoch:    1/10    Loss: 4.599604161828756
Epoch:    1/10    Loss: 4.554820132867566
...


Epoch:    6/10    Loss: 3.620685564074665
Epoch:    6/10    Loss: 3.6269950405872953
Epoch:    7/10    Loss: 3.5021766813471915
Epoch:    7/10    Loss: 3.5158820994198323
Epoch:    7/10    Loss: 3.527199724378685
Epoch:    7/10    Loss: 3.536724126059562
Epoch:    7/10    Loss: 3.541896550171077
Epoch:    7/10    Loss: 3.5498275894050795
Epoch:    7/10    Loss: 3.5590915934049656
Epoch:    7/10    Loss: 3.565455923555419
Epoch:    7/10    Loss: 3.5736893938026495
Epoch:    7/10    Loss: 3.579940702319145
Epoch:    7/10    Loss: 3.587010841130834
Epoch:    8/10    Loss: 3.4792083259671926
Epoch:    8/10    Loss: 3.478154028672725
Epoch:    8/10    Loss: 3.4934846660743157
Epoch:    8/10    Loss: 3.5000160037074237
Epoch:    8/10    Loss: 3.5047769782692195
Epoch:    8/10    Loss: 3.5138338550614816
Epoch:    8/10    Loss: 3.523058999888599
Epoch:    8/10    Loss: 3.5299277785234153
Epoch:    8/10    Loss: 3.538496013875637
Epoch:    8/10    Loss: 3.547138053011149
...

Epoch:    3/10    Loss: 3.6506281670341707
Epoch:    4/10    Loss: 3.408326674439013
Epoch:    4/10    Loss: 3.4159938456490635
Epoch:    4/10    Loss: 3.425401338810722
Epoch:    4/10    Loss: 3.4422694918699563
Epoch:    4/10    Loss: 3.4545797811821104
Epoch:    4/10    Loss: 3.4666121816262603
Epoch:    4/10    Loss: 3.474711576423475
Epoch:    4/10    Loss: 3.486421631788835
Epoch:    4/10    Loss: 3.495206938725379
Epoch:    4/10    Loss: 3.503063950277865
Epoch:    4/10    Loss: 3.512598273025077
Epoch:    5/10    Loss: 3.275439489632845
Epoch:    5/10    Loss: 3.2811921460554005
Epoch:    5/10    Loss: 3.2928880234559377
Epoch:    5/10    Loss: 3.3079421288799495
Epoch:    5/10    Loss: 3.3209659817814825
Epoch:    5/10    Loss: 3.335747663769871
Epoch:    5/10    Loss: 3.3484539619247826
Epoch:    5/10    Loss: 3.35956149129197
Epoch:    5/10    Loss: 3.3690309659267466
Epoch:    5/10    Loss: 3.380601658578962
Epoch:    5/10    Loss: 3.3900244008487257
...

Epoch:    1/10    Loss: 5.228715258464217
Epoch:    1/10    Loss: 4.942526105605066
Epoch:    1/10    Loss: 4.7785799597079555
Epoch:    1/10    Loss: 4.679375772783533
Epoch:    1/10    Loss: 4.604295153543353
Epoch:    1/10    Loss: 4.548613437544555
Epoch:    1/10    Loss: 4.5042989672028595
Epoch:    1/10    Loss: 4.466915598721243
Epoch:    1/10    Loss: 4.43254047534946
Epoch:    1/10    Loss: 4.405462696831673
Epoch:    1/10    Loss: 4.382922561323648
Epoch:    2/10    Loss: 3.9461035514250398
Epoch:    2/10    Loss: 3.9467760029248895
Epoch:    2/10    Loss: 3.9492753335585196
Epoch:    2/10    Loss: 3.9486444569192827
Epoch:    2/10    Loss: 3.9439453960210087
Epoch:    2/10    Loss: 3.9445254194550214
Epoch:    2/10    Loss: 3.946451954969338
Epoch:    2/10    Loss: 3.9496539297048003
Epoch:    2/10    Loss: 3.9522061460961897
Epoch:    2/10    Loss: 3.9537505325488747
Epoch:    2/10    Loss: 3.9546624644275785
Epoch:    3/10    Loss: 3.763377636484802
...

Epoch:    8/10    Loss: 3.738755423538387
Epoch:    8/10    Loss: 3.7464617064843577
Epoch:    8/10    Loss: 3.7548239655792712
Epoch:    8/10    Loss: 3.7603790393797683
Epoch:    8/10    Loss: 3.765430630888376
Epoch:    8/10    Loss: 3.7710920579731466
Epoch:    8/10    Loss: 3.777512085200006
Epoch:    9/10    Loss: 3.690780128352344
Epoch:    9/10    Loss: 3.6887072461657224
Epoch:    9/10    Loss: 3.705173442761103
Epoch:    9/10    Loss: 3.708821218693629
Epoch:    9/10    Loss: 3.713466298133135
Epoch:    9/10    Loss: 3.717600200790912
Epoch:    9/10    Loss: 3.7270566176889197
Epoch:    9/10    Loss: 3.7318390691885726
Epoch:    9/10    Loss: 3.739780963398516
Epoch:    9/10    Loss: 3.7475055928342043
Epoch:    9/10    Loss: 3.754077760487996
Epoch:   10/10    Loss: 3.6438882002606987
Epoch:   10/10    Loss: 3.665692257415503
Epoch:   10/10    Loss: 3.6736817818755907
Epoch:   10/10    Loss: 3.685979281552136
Epoch:   10/10    Loss: 3.695853739604354
...

Epoch:    5/10    Loss: 3.574837555674215
Epoch:    5/10    Loss: 3.5818678147558654
Epoch:    5/10    Loss: 3.58983765959274
Epoch:    5/10    Loss: 3.5965291536723574
Epoch:    5/10    Loss: 3.603615446314216
Epoch:    5/10    Loss: 3.610638679416677
Epoch:    6/10    Loss: 3.4570452101528644
Epoch:    6/10    Loss: 3.4645339022390544
Epoch:    6/10    Loss: 3.4736134202529985
Epoch:    6/10    Loss: 3.4789932266343384
Epoch:    6/10    Loss: 3.486119572073221
Epoch:    6/10    Loss: 3.4912211169799168
Epoch:    6/10    Loss: 3.4991075305268167
Epoch:    6/10    Loss: 3.507583526801318
Epoch:    6/10    Loss: 3.5156731380977564
Epoch:    6/10    Loss: 3.5232170472852884
Epoch:    6/10    Loss: 3.528836087681741
Epoch:    7/10    Loss: 3.3786056648939846
Epoch:    7/10    Loss: 3.383696526195854
Epoch:    7/10    Loss: 3.387665758219858
Epoch:    7/10    Loss: 3.3976167250890286
Epoch:    7/10    Loss: 3.408801605850458
Epoch:    7/10    Loss: 3.416221312681834
...

|sequence_length|embedding_dim|hidden_dim|n_layers|average_loss|
|:---:|:---:|:---:|:---:|:---:|
|10|128|256|2|3.529771206599413|
|10|128|256|3|3.6684367423745496|
|10|128|800|2|3.0826866824381134|
|10|128|800|3|3.3583731963632952|
|10|256|256|2|3.5334890530350407|
|10|256|256|3|3.6434468732296508|
|10|256|800|2|3.1921824236683785|
|10|256|800|3|3.2873149972297564|
|16|128|256|2|3.50244138775879|
|16|128|256|3|3.6361112283281622|
|16|128|800|2|**2.9521624652294913**|
|16|128|800|3|3.5764145333005053|
|16|256|256|2|3.5163858515543343|
|16|256|256|3|3.735428644160592|
|16|256|800|2|3.1695165765954325|
|16|256|800|3|3.2587432532728546|
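
The same grid can also be enumerated without nested loops. The following sketch uses `itertools.product` and the losses copied from the table above to pick the best configuration. The names `grid`, `losses`, `configs` and `results` are only for this illustration, and no training happens here.

```python
from itertools import product

# Hyperparameter grid from the search above
grid = {
    'sequence_length': [10, 16],
    'embedding_dim': [128, 256],
    'hidden_dim': [256, 800],
    'n_layers': [2, 3],
}

# Final average losses copied from the results table, in row order
losses = [
    3.529771206599413, 3.6684367423745496, 3.0826866824381134, 3.3583731963632952,
    3.5334890530350407, 3.6434468732296508, 3.1921824236683785, 3.2873149972297564,
    3.50244138775879, 3.6361112283281622, 2.9521624652294913, 3.5764145333005053,
    3.5163858515543343, 3.735428644160592, 3.1695165765954325, 3.2587432532728546,
]

# product() varies the last key fastest, which matches the row order of the table
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
results = list(zip(configs, losses))

# pick the configuration with the lowest average loss
best_config, best_loss = min(results, key=lambda pair: pair[1])
print(best_config, best_loss)
```

In a real search, each entry of `configs` would be passed to the training function instead of being paired with a stored loss.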

In [44]:
min_loss = 10000

# Sequence Length
sequence_length = 16 # of words in a sequence

# Batch Size
batch_size = 64

# data loader - do not change
train_loader = batch_data(int_text, sequence_length, batch_size)

# Training parameters
# Number of Epochs
num_epochs = 20
# Learning Rate
learning_rate = 0.001

# Model parameters
# Vocab size
vocab_size = len(vocab_to_int)
# Output size
output_size = vocab_size
# Embedding Dimension
embedding_dim = 128
# Hidden Dimension
hidden_dim = 800
# Number of RNN Layers
n_layers = 2

# Show stats for every n number of batches
show_every_n_batches = 1280

In [45]:
# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

print('batch_size:{}, learning_rate:{}, embedding_dim:{}, hidden_dim:{}, n_layers:{}'.format(batch_size, learning_rate, embedding_dim, hidden_dim, n_layers))
# training the model
trained_rnn, average_loss = train_rnn_get_loss(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

if (average_loss < min_loss):
    min_loss = average_loss
    # saving the trained model
    helper.save_model('./save/trained_rnn_2', trained_rnn)
    print('Model Trained and Saved')

batch_size:64, learning_rate:0.001, embedding_dim:128, hidden_dim:800, n_layers:2
Training for 20 epoch(s)...
Epoch:    1/20    Loss: 5.049312077462673
Epoch:    1/20    Loss: 4.785189735144376
Epoch:    1/20    Loss: 4.643437670730054
Epoch:    1/20    Loss: 4.544743520300836
Epoch:    1/20    Loss: 4.470967053063214
Epoch:    1/20    Loss: 4.412431323900819
Epoch:    1/20    Loss: 4.370049592825983
Epoch:    1/20    Loss: 4.333909240504727
Epoch:    1/20    Loss: 4.301998683561881
Epoch:    1/20    Loss: 4.276040617171675
Epoch:    1/20    Loss: 4.256299495235169
Epoch:    2/20    Loss: 3.8173890888690947
Epoch:    2/20    Loss: 3.8178693807683883
Epoch:    2/20    Loss: 3.819128568905095
Epoch:    2/20    Loss: 3.821230503730476
Epoch:    2/20    Loss: 3.8224444828554986
Epoch:    2/20    Loss: 3.8213826744817196
Epoch:    2/20    Loss: 3.8218089948009166
Epoch:    2/20    Loss: 3.823505443846807
Epoch:    2/20    Loss: 3.8268416297725505
Epoch:    2/20    Loss: 3.827874788157642
...

Epoch:   19/20    Loss: 2.459620485454798
Epoch:   19/20    Loss: 2.468232113867998
Epoch:   19/20    Loss: 2.4856557890151936
Epoch:   19/20    Loss: 2.50496483319439
Epoch:   19/20    Loss: 2.5210230265371503
Epoch:   19/20    Loss: 2.5384905927969763
Epoch:   19/20    Loss: 2.5520316815535935
Epoch:   19/20    Loss: 2.565461745543871
Epoch:   19/20    Loss: 2.5797230710999832
Epoch:   19/20    Loss: 2.592569358451292
Epoch:   19/20    Loss: 2.6035747996599983
Epoch:   20/20    Loss: 2.4445276531390845
Epoch:   20/20    Loss: 2.454839697200805
Epoch:   20/20    Loss: 2.46934798459212
Epoch:   20/20    Loss: 2.4862052518874407
Epoch:   20/20    Loss: 2.504067032150924
Epoch:   20/20    Loss: 2.5162599871711184
Epoch:   20/20    Loss: 2.530266154025282
Epoch:   20/20    Loss: 2.5426904736319558
Epoch:   20/20    Loss: 2.553757861805045
Epoch:   20/20    Loss: 2.5655411624349655
Epoch:   20/20    Loss: 2.5770081997533527
Model Trained and Saved


In [47]:
import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn_2')

In [56]:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'jerry' # name for starting the script

pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)

jerry: weddings vandelay

george: what are you doing? i thought you were changing him. that's why he came home and he was on the stage.

jerry: what?

elaine: you know, i'm not married. i mean, i have to tell you, i'm not gonna tell you.

george:(shouting) i mean, thats exactly what it would be.. i just realized kruger might be dating someone like this.

elaine: oh..

jerry: i thought we were going to get the skin. we had a deal with you to write on your mind.

elaine: i don't know, opera.

jerry:(cont'd) yeah, i guess.

elaine:(to anna) oh, my god.

george: what?

jerry: the network movement. the whole stock is based on the non- marathon.

jerry:(to george) so?

elaine: i don't know.

jerry: well if you can't get it out of my cab?

kramer: oh, no, no. i got a big nose. i'm a little worried.

jerry:(continuing) yeah.

elaine: well, you know i have spoken to the movies. this is close.

kramer: hey, hey. what's that?

elaine: i thought we were looking at hop sips.

george:(still quiet) w

In [57]:
# save script to a text file
with open("generated_script_2.txt", "w") as f:
    f.write(generated_script)

## Further Improvements (Kaizen)

In [4]:
# load in data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)

### Add an Assertion

In [108]:
import problem_unittests as tests
from string import punctuation
# from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    assert type(text) is list
    
    all_text = ' '.join([word for word in text]) # consolidate to one string
    text_split = all_text.split('\n')
    all_text = ''.join(text_split)
    words = all_text.split() # get word list

    # A set of unique words, sorted in reverse alphabetical order, replaces the
    # earlier Counter-based ordering by frequency:
    #    word_counts = Counter(words)
    #    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    word_sets = set(words)
    sorted_vocab = sorted(word_sets, reverse=True)
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab, 0)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return (vocab_to_int, int_to_vocab)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed


In [24]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    punc_to_token = {
        '.': "||Period||", 
        ',': "||Comma||", 
        '"': "||Quotation_Mark||", 
        ';': "||Semicolon||", 
        '!': "||Exclamation_mark||", 
        '?': "||Question_mark||", 
        '(': "||Left_Parentheses||", 
        ')': "||Right_Parentheses||", 
        '-': "||Dash||", 
        '\n': "||Return||"
    }
    
    return punc_to_token

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


In [25]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

In [26]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

In [27]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

### Add an Assertion, Use a Single List Comprehension, Remove Shuffling, Use num_workers
A reviewer said we could use num_workers=-1, but a negative value was not allowed.  
Alternatively, I could use os.cpu_count() or psutil.cpu_count().
I could also set pin_memory to True.
I referred to https://qiita.com/sugulu_Ogawa_ISID/items/62f5f7adee083d96a587 (in Japanese) to accelerate the training and inference process.
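
One way to work around the rejected `-1` is a small helper that turns a requested value into a valid worker count before it reaches the `DataLoader`. This is only a sketch: `resolve_num_workers` is a hypothetical name, and reading a negative value as "use every core" is an assumption made here, not PyTorch behavior.

```python
import os

def resolve_num_workers(requested):
    """Turn a requested worker count into a non-negative integer.

    Hypothetical helper: None or a negative value means "use every CPU core".
    """
    if requested is None or requested < 0:
        # os.cpu_count() can return None when the count cannot be determined
        return os.cpu_count() or 0
    return requested

print(resolve_num_workers(-1))  # number of logical cores on this machine
print(resolve_num_workers(4))
```

A result of `0` is still a valid `num_workers` value; in that case the data is loaded in the main process.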

In [101]:
import psutil
psutil.cpu_count()

16

In [102]:
import os
os.cpu_count()

16

In [103]:
import os
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """

    n = len(words) - sequence_length
    features = np.array([words[i : i + sequence_length] for i in range(n)])
    targets = np.array([words[i + sequence_length] for i in range(n)])
    # The 3 lines above, built on list comprehensions, replace the following code:
    #    features = np.zeros(((len(words) - sequence_length), sequence_length), dtype=int)
    #    targets = np.zeros((len(words) - sequence_length), dtype=int)
    #    for i in range(len(words) - sequence_length):
    #        features[i] = words[i : i + sequence_length]
    #        targets[i] = words[i + sequence_length]
    
    assert type(features) is np.ndarray
    assert type(targets) is np.ndarray

    data = TensorDataset(torch.from_numpy(features), torch.from_numpy(targets))
    data_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, num_workers=os.cpu_count(), pin_memory=True)
    # The call above, with num_workers and pin_memory, replaces the following code:
    #    data_loader = torch.utils.data.DataLoader(data, batch_size=batch_size)

    return data_loader

In [98]:
# test dataloader

test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = next(data_iter)

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)

torch.Size([10, 5])
tensor([[ 0,  1,  2,  3,  4],
        [ 1,  2,  3,  4,  5],
        [ 2,  3,  4,  5,  6],
        [ 3,  4,  5,  6,  7],
        [ 4,  5,  6,  7,  8],
        [ 5,  6,  7,  8,  9],
        [ 6,  7,  8,  9, 10],
        [ 7,  8,  9, 10, 11],
        [ 8,  9, 10, 11, 12],
        [ 9, 10, 11, 12, 13]])

torch.Size([10])
tensor([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])


In [71]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them        
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()
        
        # TODO: Implement function
        
        # set class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

        # define model layers
        self.embd = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout = dropout, batch_first = True)
#        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        # TODO: Implement function   
        batch_size = nn_input.size(0)

        # embeddings and lstm_out
        embeds = self.embd(nn_input)
        lstm_out, hidden = self.lstm(embeds, hidden)
        
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        # dropout and fully-connected layer
#        out = self.dropout(lstm_out)
#        out = self.fc(out)
        out = self.fc(lstm_out)

        # reshape to be batch_size first
        out = out.view(batch_size, -1, self.output_size)
        out = out[:, -1] # keep only the last time step's word scores for each sequence

        # return one batch of output word scores and the hidden state
        return out, hidden
    
    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state as a tuple (h_0, c_0), each of dims (n_layers, batch_size, hidden_dim)
        '''
        # Implement function
        
        # initialize hidden state with zero weights, and move to GPU if available
        weight = next(self.parameters()).data

        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

        return hidden

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_rnn(RNN, train_on_gpu)

Tests Passed


In [72]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    """
    Forward and backward propagation on the neural network
    :param rnn: The PyTorch Module that holds the neural network
    :param optimizer: The PyTorch optimizer for the neural network
    :param criterion: The PyTorch loss function
    :param inp: A batch of input to the neural network
    :param target: The target output for the batch of input
    :param hidden: The hidden state carried over from the previous batch
    :return: The loss and the latest hidden state Tensor
    """
    
    # TODO: Implement Function
    clip=5 # gradient clipping
    
    # move data to GPU, if available
    if(train_on_gpu):
        inp, target = inp.cuda(), target.cuda()
    
    # detach the hidden state from its history so that backprop
    # does not run through all previous batches
    hidden = tuple([each.data for each in hidden])

    # zero accumulated gradients
    rnn.zero_grad()

    # get the output from the model
    output, hidden = rnn(inp, hidden)

    # calculate the loss and perform backprop
    # output is already (batch_size, output_size); squeeze() would break a batch of size 1
    loss = criterion(output, target)
    loss.backward()
    
    # `clip_grad_norm_` rescales gradients whose overall norm exceeds `clip`, which helps prevent the exploding gradient problem in RNNs / LSTMs.
    nn.utils.clip_grad_norm_(rnn.parameters(), clip)
    optimizer.step()

    # return the loss over a batch and the hidden state produced by our model
    return loss.item(), hidden

# Note that these tests aren't completely extensive.
# they are here to act as general checks on the expected outputs of your functions
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed
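
The clipping step in `forward_back_prop` can be illustrated without PyTorch. The sketch below is a simplified, pure-Python approximation of clipping by global norm: it measures one L2 norm over all gradients and scales everything down by the same factor when that norm exceeds the limit. The function name `clip_by_global_norm` and the toy gradient values are assumptions made for illustration; this is not the library's actual implementation.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradient vectors so their combined L2 norm is at most max_norm.

    grads is a list of lists of floats (one inner list per parameter).
    Returns the rescaled gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Never scale up: the factor is capped at 1.0
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm

# Toy gradients (hypothetical values): global norm = sqrt(9 + 16 + 144) = 13
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its size is limited, which is why this is usually preferred over clipping each value independently.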


In [76]:
def train_rnn_get_loss(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):

    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        batch_losses = []
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)
            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Loss: {}'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))

        print('Epoch: {:>4}/{:<4}  Loss: {}'.format(epoch_i, n_epochs, np.average(batch_losses)))

    # returns a trained rnn
    return rnn, np.average(batch_losses)

In [77]:
min_loss = 10000

# Sequence Length
sequence_length = 16 # of words in a sequence

# Batch Size
batch_size = 64

# data loader - do not change
train_loader = batch_data(int_text, sequence_length, batch_size)

# Training parameters
# Number of Epochs
num_epochs = 20
# Learning Rate
learning_rate = 0.001

# Model parameters
# Vocab size
vocab_size = len(vocab_to_int)
# Output size
output_size = vocab_size
# Embedding Dimension
embedding_dim = 128
# Hidden Dimension
hidden_dim = 800
# Number of RNN Layers
n_layers = 2

# Show stats for every n number of batches
show_every_n_batches = 1280

In [78]:
# create model and move to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
if train_on_gpu:
    rnn.cuda()

# defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

print('batch_size:{}, learning_rate:{}, embedding_dim:{}, hidden_dim:{}, n_layers:{}'.format(batch_size, learning_rate, embedding_dim, hidden_dim, n_layers))
# training the model
trained_rnn, average_loss = train_rnn_get_loss(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

if (average_loss < min_loss):
    min_loss = average_loss
    # saving the trained model
    helper.save_model('./save/trained_rnn_3', trained_rnn)
    print('Model Trained and Saved')

batch_size:64, learning_rate:0.001, embedding_dim:128, hidden_dim:800, n_layers:2
Training for 20 epoch(s)...
Epoch:    1/20    Loss: 5.100201232917607
Epoch:    1/20    Loss: 4.848501576017588
Epoch:    1/20    Loss: 4.686472039110958
Epoch:    1/20    Loss: 4.607305602775886
Epoch:    1/20    Loss: 4.565404795743525
Epoch:    1/20    Loss: 4.52090911657239
Epoch:    1/20    Loss: 4.467185576193567
Epoch:    1/20    Loss: 4.430700896296185
Epoch:    1/20    Loss: 4.415552356125166
Epoch:    1/20    Loss: 4.403643614938483
Epoch:    1/20    Loss: 4.3952877602852265
Epoch:    2/20    Loss: 3.958196663670242
Epoch:    2/20    Loss: 3.930439414130524
Epoch:    2/20    Loss: 3.889227632402132
Epoch:    2/20    Loss: 3.887143304827623
Epoch:    2/20    Loss: 3.9008681781403722
Epoch:    2/20    Loss: 3.895195374684408
Epoch:    2/20    Loss: 3.8755473463263894
Epoch:    2/20    Loss: 3.8655031170346774
Epoch:    2/20    Loss: 3.871496046386245
Epoch:    2/20    Loss: 3.8774154006596655
Epoc

Epoch:   18/20    Loss: 2.52813641126785
Epoch:   18/20    Loss: 2.5288748277164994
Epoch:   18/20    Loss: 2.5300788630421764
Epoch:   19/20    Loss: 2.5745859612710773
Epoch:   19/20    Loss: 2.5537395623046906
Epoch:   19/20    Loss: 2.5322199574671687
Epoch:   19/20    Loss: 2.5116593109909444
Epoch:   19/20    Loss: 2.5185046639107167
Epoch:   19/20    Loss: 2.512190491706133
Epoch:   19/20    Loss: 2.5009215847722124
Epoch:   19/20    Loss: 2.4968484578537753
Epoch:   19/20    Loss: 2.4972836788329813
Epoch:   19/20    Loss: 2.4973093916941433
Epoch:   19/20    Loss: 2.4992402447554625
Epoch:   20/20    Loss: 2.540177704859525
Epoch:   20/20    Loss: 2.5248253119643778
Epoch:   20/20    Loss: 2.5059250642545523
Epoch:   20/20    Loss: 2.484659798955545
Epoch:   20/20    Loss: 2.4919041772186756
Epoch:   20/20    Loss: 2.4847005383111536
Epoch:   20/20    Loss: 2.4748326434886883
Epoch:   20/20    Loss: 2.4699899445287885
Epoch:   20/20    Loss: 2.4702270295470954
Epoch:   20/20  
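
One way to read the loss values in this log is to convert them to perplexity. `nn.CrossEntropyLoss` reports the mean negative log-likelihood in nats, so `exp(loss)` can be read as the effective number of equally likely words the model is choosing between. The sketch below uses three values rounded from the log above; note that they are running training averages, not a held-out evaluation, and the helper name `perplexity` is only illustrative.

```python
import math

def perplexity(cross_entropy_loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_loss)

# Losses rounded from the training log: first report, start of epoch 2, end of epoch 20
for loss in (5.10, 3.96, 2.47):
    print('loss {:.2f} -> perplexity {:.1f}'.format(loss, perplexity(loss)))
```

A loss near 2.47 corresponds to a training perplexity of roughly 12, compared with well over 100 at the first report.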

In [79]:
import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn_3')

In [81]:
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param rnn: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of punctuation tokens keys to punctuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
         
        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        if train_on_gpu:
            current_seq = current_seq.cpu() # move to cpu
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences
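
The two core ideas inside the `generate` loop, top-k sampling and the sliding input window, can be sketched with the standard library alone. Everything below is a simplified illustration under stated assumptions: `PAD_ID`, the toy scores, and the helper names are hypothetical, and plain lists stand in for the tensors and NumPy arrays used in the function above.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_top_k(scores, k, rng=random):
    """Sample a word id from the k highest-probability candidates only."""
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]  # choices() renormalizes the weights itself
    return rng.choices(top, weights=weights, k=1)[0]

def slide_window(seq, new_id):
    """Drop the oldest id and append the newly generated one."""
    return seq[1:] + [new_id]

PAD_ID = 0                       # hypothetical padding id
window = [PAD_ID] * 4 + [7]      # padded sequence ending in the prime word id
scores = [0.1, 2.0, 0.3, 1.5, -1.0, 0.9, 0.0, 0.2]  # toy scores for 8 words

rng = random.Random(0)
next_id = sample_top_k(scores, k=3, rng=rng)
window = slide_window(window, next_id)
print(next_id, window)
```

Restricting the draw to the top k candidates keeps the output varied between runs while ruling out very unlikely words; a larger `top_k` gives more variety at the cost of coherence.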

In [86]:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'jerry' # name for starting the script

pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)

jerry: trial.

george: medium turkey chili for you. speaking of money, i don't know if i could bring the video machines.

helen: you know, whoever else has changed here.

morty: i remember when they gave them to you.

george: you know, sometimes you weren't ready for coming.

boy: i told you you were supposed to be publishing to find me anywhere.

morty: i bet it was a good snack- you h behavior. i was going to be held accountable for the lipo. you know, this trial could never be different.

elaine: oh, yeah? well, thanks. okay, bye. we'll take care beef personally ready for you, cheese, massachusetts.

jerry: what happened to you?

kramer: oh, no, that's not good enough to be.

jerry: alright, we'll see ya.

jerry: yeah?

george: yeah.

jerry: boy, i am getting out of town, sir.

jerry: what?

kramer: well, we have our money for you.

elaine: what happened to pop?

jerry: you don't respect coming from that. how's anyone going to do?

doctor: i already bought that one.

george: aah! aa

In [87]:
# save script to a text file
with open("generated_script_3.txt", "w") as f:
    f.write(generated_script)

# ToDo in the future
* Try Bi-LSTM
* Try Optuna for hyperparameter tuning
* Try evaluating with BLEU score
* Try using the 4th root of the vocab size as the embedding size
* Try measuring the training time
* Try to use TODO comments
* Try to use appropriate parameter names
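
As a starting point for the embedding-size item, the fourth-root heuristic is easy to compute. The sketch below is only a rule of thumb, not a tuned value: the function name is hypothetical, and 46367 is the rough unique-word count printed in the dataset stats, which may differ from the actual `len(vocab_to_int)` after preprocessing.

```python
def fourth_root_embedding_dim(vocab_size):
    """Rule-of-thumb embedding size: the fourth root of the vocab size, rounded."""
    return max(1, round(vocab_size ** 0.25))

# 46367 is the approximate unique-word count from the dataset stats (an assumption here)
for v in (10_000, 46_367, 1_000_000):
    print(v, fourth_root_embedding_dim(v))
```

The heuristic suggests a far smaller embedding than the `embedding_dim = 128` used in this notebook, so it would be worth comparing both settings on loss and training time before adopting it.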
