# Building a Chatbot

In this project, we will build a chatbot using conversations from Cornell University's [Movie Dialogue Corpus](https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html). The main features of our model are LSTM cells, a bidirectional dynamic RNN, and decoders with attention. 

The conversations will be cleaned rather extensively to help the model to produce better responses. As part of the cleaning process, punctuation will be removed, rare words will be replaced with "UNK" (our "unknown" token), longer sentences will not be used, and all letters will be in the lowercase. 

With a larger amount of data, it would be more practical to keep features, such as punctuation. However, I am using FloydHub's GPU services and I don't want to get carried away with too training for too long.

In [1]:
import numpy as np
import tensorflow as tf
import time

from corpus import Corpus

tf.__version__

np.random.seed(1)
tf.set_random_seed(1)

Most of the code to load the data is courtesy of https://github.com/suriyadeepan/practical_seq2seq/blob/master/datasets/cornell_corpus/data.py.

## Load and Preprocess Data

In [2]:
cornell_corpus = Corpus("movie_lines.txt", "movie_conversations.txt", max_vocab=8100, max_line_length=30)

questions_text = cornell_corpus.prompts
answers_text = cornell_corpus.answers
questions_int = cornell_corpus.prompts_int
answers_int = cornell_corpus.answers_int

UNK = cornell_corpus.unk
vocab2int = cornell_corpus.vocab2int
int2vocab = cornell_corpus.int2vocab

METATOKEN_INDEX = len(vocab2int)
META = "<META>"
EOS = "<EOS>"
PAD = "<PAD>"
GO = "<GO>"
codes = [META, EOS, PAD, GO]
    

source_vocab_size = len(vocab2int)
dest_vocab_size = len(vocab2int)

vocab_dicts = (vocab2int, int2vocab)
(questions_vocab_to_int, questions_int_to_vocab) = vocab_dicts
(answers_vocab_to_int, answers_int_to_vocab) = vocab_dicts

In [5]:
for i in range(50):
    print(questions_text[i])
    print(answers_text[i])
    print()

['can', 'we', 'make', 'this', 'quick', '<UNK>', '<UNK>', 'and', 'andrew', 'barrett', 'are', 'having', 'an', 'incredibly', '<UNK>', 'public', 'break', 'up', 'on', 'the', '<UNK>', 'again']
['well', 'i', 'thought', 'we', 'would', 'start', 'with', '<UNK>', 'if', 'that', 'okay', 'with', 'you']

['well', 'i', 'thought', 'we', 'would', 'start', 'with', '<UNK>', 'if', 'that', 'okay', 'with', 'you']
['not', 'the', 'hacking', 'and', '<UNK>', 'and', '<UNK>', 'part', 'please']

['not', 'the', 'hacking', 'and', '<UNK>', 'and', '<UNK>', 'part', 'please']
['okay', 'then', 'how', 'about', 'we', 'try', 'out', 'some', 'french', '<UNK>', 'saturday', 'night']

['you', 'are', 'asking', 'me', 'out', 'that', 'so', 'cute', 'what', 'your', 'name', 'again']
['forget', 'it']

['no', 'no', 'it', 'my', 'fault', 'we', 'did', 'not', 'have', 'a', 'proper', 'introduction']
['cameron']

['cameron']
['the', 'thing', 'is', 'cameron', 'i', 'at', 'the', 'mercy', 'of', 'a', 'particularly', '<UNK>', 'breed', 'of', 'loser', '

In [6]:
#DEBUGGING AND DISPLAY PURPOSES ONLY
def int_to_text(sequence, int2vocab):
    return [int2vocab[index] for index in sequence if index != METATOKEN_INDEX]

def text_to_int(sequence, vocab2int):
    return [vocab2int.get(token, vocab2int[UNK]) for token in sequence if token not in codes]

## Word2Vec Embeddings

In [7]:
combined_corpus=[]
combined_corpus.extend(questions_text)
combined_corpus.extend(answers_text)

In [8]:
len(combined_corpus)

394916

In [9]:
combined_corpus

[['can',
  'we',
  'make',
  'this',
  'quick',
  '<UNK>',
  '<UNK>',
  'and',
  'andrew',
  'barrett',
  'are',
  'having',
  'an',
  'incredibly',
  '<UNK>',
  'public',
  'break',
  'up',
  'on',
  'the',
  '<UNK>',
  'again'],
 ['well',
  'i',
  'thought',
  'we',
  'would',
  'start',
  'with',
  '<UNK>',
  'if',
  'that',
  'okay',
  'with',
  'you'],
 ['not', 'the', 'hacking', 'and', '<UNK>', 'and', '<UNK>', 'part', 'please'],
 ['you',
  'are',
  'asking',
  'me',
  'out',
  'that',
  'so',
  'cute',
  'what',
  'your',
  'name',
  'again'],
 ['no',
  'no',
  'it',
  'my',
  'fault',
  'we',
  'did',
  'not',
  'have',
  'a',
  'proper',
  'introduction'],
 ['cameron'],
 ['the',
  'thing',
  'is',
  'cameron',
  'i',
  'at',
  'the',
  'mercy',
  'of',
  'a',
  'particularly',
  '<UNK>',
  'breed',
  'of',
  'loser',
  'my',
  'sister',
  'i',
  'ca',
  'not',
  'date',
  'until',
  'she',
  'does'],
 ['why'],
 ['<UNK>',
  'mystery',
  'she',
  'used',
  'to',
  'be',
  'really'

In [10]:
from gensim.models import Word2Vec
embedding_size = 1024
model = Word2Vec(sentences=combined_corpus, size=embedding_size, window=5, min_count=1, workers=4, sg=0)

In [11]:
model.wv['well']

array([ 0.5658267 ,  0.18310311, -0.2876634 , ...,  0.14245577,
        0.45896396, -0.28743207], dtype=float32)

In [12]:
wordVecs = model.wv

In [13]:
word_vecs = np.zeros((len(model.wv.vocab),1024))
for i,word in enumerate(model.wv.index2word):
        word_vecs[i] = model[word]
      

  This is separate from the ipykernel package so we can avoid doing imports until


In [14]:
print("Vocabulary lengths")
print(len(word_vecs))
print(len(questions_vocab_to_int))
print(len(answers_vocab_to_int))
print(len(questions_int_to_vocab))
print(len(answers_int_to_vocab))

Vocabulary lengths
8101
8101
8101
8101
8101


## Additional Preprocessing

In [15]:
#Add EOS tokens to target data now that the embeddings have been trained
for i in range(len(answers_int)):
    answers_text[i] += " " + EOS
    answers_int[i].append(METATOKEN_INDEX)
    
    #answers_int[i].append(answers_vocab_to_int[EOS])

In [16]:
# Sort questions and answers by the length of questions.
# This will reduce the amount of padding during training
# Which should speed up training and help to reduce the loss

max_source_line_length = max( [len(sentence) for sentence in questions_int])
max_targ_line_length = max([len(sentence) for sentence in answers_int])
max_line_length = max(max_source_line_length, max_targ_line_length)

sorted_questions = []
sorted_answers = []

for length in range(1, max_line_length+1):
    for index, sequence in enumerate(questions_int):
        if len(sequence) == length:
            sorted_questions.append(questions_int[index])
            sorted_answers.append(answers_int[index])

print(len(sorted_questions))
print(len(sorted_answers))
print()
indices = [0, 1, 2, len(sorted_questions) - 1]
for i in indices:
    print(int_to_text(sorted_questions[i], questions_int_to_vocab))
    print(int_to_text(sorted_answers[i], answers_int_to_vocab))
    print()

197194
197194

['cameron']
['the', 'thing', 'is', 'cameron', 'i', 'at', 'the', 'mercy', 'of', 'a', 'particularly', '<UNK>', 'breed', 'of', 'loser', 'my', 'sister', 'i', 'ca', 'not', 'date', 'until', 'she', 'does']

['why']
['<UNK>', 'mystery', 'she', 'used', 'to', 'be', 'really', 'popular', 'when', 'she', 'started', 'high', 'school', 'then', 'it', 'was', 'just', 'like', 'she', 'got', 'sick', 'of', 'it', 'or', 'something']

['there']
['where']

['yes', 'i', 'see', 'you', 'have', 'issued', 'each', 'of', 'them', 'with', 'a', 'martini', 'henry', '<UNK>', 'our', '<UNK>', 'for', 'native', '<UNK>', 'one', 'rifle', 'to', 'ten', 'men', 'and', 'only', 'five', 'rounds', 'per', 'rifle']
['but', 'will', 'they', 'make', 'good', 'use', 'of', 'them']



In [17]:
import numpy
np.save('word_Vecs.npy',word_vecs)

In [18]:
#FIXME: This really should be something like "preprocess_targets"
def process_decoding_input(target_data, batch_size):
    '''Remove the last word id from each batch and concat the <GO> to the begining of each batch'''
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat( [tf.fill([batch_size, 1], METATOKEN_INDEX), ending], 1)
    return dec_input


In [19]:
def dropout_cell(rnn_size, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size)
    return tf.contrib.rnn.DropoutWrapper(lstm, input_keep_prob=keep_prob)

def multi_dropout_cell(rnn_size, keep_prob, num_layers):    
    return tf.contrib.rnn.MultiRNNCell( [dropout_cell(rnn_size, keep_prob) for _ in range(num_layers)] )

In [20]:
def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, sequence_lengths):
    """
    Create the encoding layer
    
    Returns a tuple `(outputs, output_states)` where
      outputs is a 2-tuple of vectors of dimensions [sequence_length, rnn_size] for the forward and backward passes
      output_states is a 2-tupe of the final hidden states of the forward and backward passes
    
    """
    forward_cell = multi_dropout_cell(rnn_size, keep_prob, num_layers)
    backward_cell = multi_dropout_cell(rnn_size, keep_prob, num_layers)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw = forward_cell,
                                                   cell_bw = backward_cell,
                                                   sequence_length = sequence_lengths,
                                                   inputs = rnn_inputs,
                                                    dtype=tf.float32)
    return outputs, states

## Decoding

In [21]:
def decoding_layer(enc_state, enc_outputs, dec_embed_input, dec_embeddings, #Inputs
                        rnn_size, num_layers, output_layer, #Architecture
                        keep_prob, beam_width, #Hypeparameters
                        target_lengths, batch_size): 
   
    with tf.variable_scope("decoding", reuse=tf.AUTO_REUSE) as decoding_scope:
        dec_cell = multi_dropout_cell(rnn_size, keep_prob, num_layers)
        init_dec_state_size = batch_size
        #TRAINING
        train_attn = tf.contrib.seq2seq.BahdanauAttention(num_units=dec_cell.output_size, memory=enc_outputs)
        
        train_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, train_attn,
                                                    attention_layer_size=dec_cell.output_size)
        
        
        helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_lengths, time_major=False)
        train_decoder = tf.contrib.seq2seq.BasicDecoder(train_cell, helper,
                            train_cell.zero_state(init_dec_state_size, tf.float32)
                                                        .clone(cell_state=enc_state),
                            output_layer = output_layer)
        outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(train_decoder, impute_finished=True, scope=decoding_scope)
        logits = outputs.rnn_output

        #INFERENCE
        #Tile inputs
        enc_state = tf.contrib.seq2seq.tile_batch(enc_state, beam_width)
        enc_outputs = tf.contrib.seq2seq.tile_batch(enc_outputs, beam_width)
        init_dec_state_size *= beam_width
        
        infer_attn = tf.contrib.seq2seq.BahdanauAttention(num_units=dec_cell.output_size,
                                                          memory=enc_outputs)#,
                                                          #dtype=tf.float64)
        infer_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, infer_attn,
                                                    attention_layer_size=dec_cell.output_size)
        
        
        start_tokens = tf.tile( [METATOKEN_INDEX], [batch_size]) #Not by batch_size*beam_width, strangely
        end_token = METATOKEN_INDEX
        
        decoder = tf.contrib.seq2seq.BeamSearchDecoder(cell = infer_cell,
            embedding = dec_embeddings,
            start_tokens = start_tokens, 
            end_token = end_token,
            beam_width = beam_width,
            initial_state = infer_cell.zero_state(init_dec_state_size, tf.float32).clone(cell_state=enc_state),
            output_layer = output_layer
        )  
        final_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, scope=decoding_scope)
        
        ids = final_decoder_output.predicted_ids
        beams = ids
                
    return logits, beams

In [22]:
def seq2seq_model(wordVecs,input_data, target_data, keep_prob, batch_size,
                  source_lengths, target_sequence_lengths,
                  vocab_size, enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, beam_width):
    

    W = tf.Variable(wordVecs,trainable=False,name="W")
    enc_embed_input = tf.nn.embedding_lookup(W, input_data)
    enc_outputs, enc_states = encoding_layer(enc_embed_input, rnn_size, num_layers, keep_prob, source_lengths)    
    concatenated_enc_output = tf.concat(enc_outputs, -1)
    init_dec_state = enc_states[0]    
    
    
    dec_input = process_decoding_input(target_data, batch_size)
    dec_embeddings = W 
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
    
    output_layer = tf.layers.Dense(vocab_size,bias_initializer=tf.zeros_initializer(),activation=tf.nn.relu)
    logits, beams = decoding_layer(init_dec_state,
                            concatenated_enc_output,
                            dec_embed_input,
                            dec_embeddings,
                            rnn_size, 
                            num_layers,
                            output_layer,
                            keep_prob,
                            beam_width,
                            target_sequence_lengths, 
                            batch_size
                            )
    
    
    return logits, beams

In [23]:
#Network Architecture
rnn_size = 128
num_layers = 2
encoding_embedding_size = embedding_size
decoding_embedding_size = embedding_size

#Training
epochs = 100
train_batch_size = 128
learning_rate = 0.001

learning_rate_decay = 0.9
min_learning_rate = 0.00001

keep_probability = 0.75
vocab_size = len(answers_vocab_to_int)
#Decoding
beam_width = 10

#Validation
valid_batch_size = 128

wordVecs = np.load('word_Vecs.npy').astype(np.float32)
metatoken_embedding = np.zeros((1, embedding_size), dtype=wordVecs.dtype)
wordVecsWithMeta = np.concatenate( (wordVecs, metatoken_embedding), axis=0 )
vocab_size_with_meta = wordVecsWithMeta.shape[0]

print("vocab_size_with_meta =", vocab_size_with_meta)
print("METATOKEN_INDEX =", METATOKEN_INDEX)
print("wordVecsWithMeta.shape =", wordVecsWithMeta.shape)
print("wordVecsWithMeta[METATOKEN_INDEX] =", wordVecsWithMeta[METATOKEN_INDEX])

#print(wordVecsWithMeta.dtype)

vocab_size_with_meta = 8102
METATOKEN_INDEX = 8101
wordVecsWithMeta.shape = (8102, 1024)
wordVecsWithMeta[METATOKEN_INDEX] = [0. 0. 0. ... 0. 0. 0.]


In [24]:
# Reset the graph to ensure that it is ready for training
tf.reset_default_graph()


#                                      batch_size, time
input_data = tf.placeholder(tf.int32, [None,       None], name='input')
targets = tf.placeholder(tf.int32,    [None,       None], name='targets')
lr = tf.placeholder(tf.float32, name='learning_rate')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

#                                          batch_size
source_lengths = tf.placeholder(tf.int32, [None], name="source_lengths")
target_lengths = tf.placeholder(tf.int32, [None], name="target_lengths")
batch_size = tf.shape(input_data)[0]


# Create the training and inference logits
train_logits, beams = \
seq2seq_model(wordVecsWithMeta,input_data, targets, keep_prob, batch_size,
              source_lengths, target_lengths, 
    vocab_size_with_meta, encoding_embedding_size, decoding_embedding_size, rnn_size, num_layers, beam_width)

# Find the shape of the input data for sequence_loss
with tf.name_scope("optimization"):

#     huber_loss = tf.losses.huber_loss(
#                    train_logits,
#                    tf.one_hot(targets,vocab_size_with_meta,axis=-1),
#                    delta=1.0,
#                    scope=None,
#                    loss_collection=tf.GraphKeys.LOSSES,
#                    reduction=tf.losses.Reduction.SUM_BY_NONZERO_WEIGHTS)

   # cost=tf.reduce_mean(tf.nn.l2_loss(train_logits - tf.one_hot(targets,vocab_size,axis=-1)))
    #cost = tf.reduce_mean(tf.square(tf.subtract(train_logits,targets)))
    #cost = losses * tf.ones([batch_size, max_sequence_length_batch])
    
    eval_mask = tf.sequence_mask(target_lengths, dtype=tf.float32)
    xent_loss = tf.contrib.seq2seq.sequence_loss(train_logits, targets, eval_mask)
    
    cost = xent_loss

    optimizer = tf.train.AdamOptimizer(lr)
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)


Instructions for updating:
Use the retry module or similar alternatives.


In [25]:
def pad_sentence_batch(sentence_batch, vocab_to_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    pad_int = METATOKEN_INDEX
    max_sentence_length = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence_length - len(sentence)) for sentence in sentence_batch]

In [26]:
def batch_data(questions, answers, batch_size):
    """Batch questions and answers together"""
    for batch_i in range(0, len(questions)//batch_size):
        start_i = batch_i * batch_size
        questions_batch = questions[start_i:start_i + batch_size]
        answers_batch = answers[start_i:start_i + batch_size]
        
        source_lengths = np.array( [len(sentence) for sentence in questions_batch] )
        target_lengths = np.array( [len(sentence) for sentence in answers_batch])
        
        pad_questions_batch = np.array(pad_sentence_batch(questions_batch, questions_vocab_to_int))
        pad_answers_batch = np.array(pad_sentence_batch(answers_batch, answers_vocab_to_int))
        yield source_lengths, target_lengths, pad_questions_batch, pad_answers_batch

In [27]:
def parallel_shuffle(source_sequences, target_sequences):
    if len(source_sequences) != len(target_sequences):
        raise ValueError("Cannot shuffle parallel sets with different numbers of sequences")
    indices = np.random.permutation(len(source_sequences))
    shuffled_source = [source_sequences[indices[i]] for i in range(len(indices))]
    shuffled_target = [target_sequences[indices[i]] for i in range(len(indices))]
    
    return (shuffled_source, shuffled_target)

In [28]:
def check_response(session, question_int, answer_int=None, best_only=True):
    """
    session - the TensorFlow session
    question_int - a list of integers
    answer - the actual, correct response (if available)
    """
    
    two_d_question_int = [question_int]
    q_lengths = [len(question_int)]
    
    pad_q = METATOKEN_INDEX
    print("Sample Question")
    print("  Word Ids:    {}".format([i for i in question_int if i != pad_q]))
    print("  Input Words: {}".format(int_to_text(question_int, questions_int_to_vocab)))
        
    pad_a = METATOKEN_INDEX
    if answer_int:
        print("Actual Answer")
        print("  Word Ids:    {}".format([i for i in answer_int if i != pad_a]))
        print("  Input Words: {}".format(int_to_text(answer_int, answers_int_to_vocab)))
    
    [beam_output] = session.run([beams], feed_dict = {input_data: np.array(two_d_question_int, dtype=np.float32),
                                                      source_lengths: q_lengths,
                                                      keep_prob: 1})
    
    limit = 1 if best_only else beam_width
    beam_output = beam_output[0] #We only have one sample
    for i in range(limit):
        beam = beam_output[:, i]
        print("\nBeam Answer", i)
        print('  Word Ids:       {}'.format([i for i in beam if i != pad_a]))
        print('  Response Words: {}'.format(int_to_text(beam, answers_int_to_vocab)))

In [29]:
# Validate the training with 10% of the data
train_valid_split = int(len(sorted_questions)*0.15)

# Split the questions and answers into training and validating data
train_questions = sorted_questions[train_valid_split:]
train_answers = sorted_answers[train_valid_split:]

valid_questions = sorted_questions[:train_valid_split]
valid_answers = sorted_answers[:train_valid_split]

print(len(train_questions))
print(len(valid_questions))

167615
29579


In [None]:
#TRAINING
display_step = 100 # Check training loss after every 100 batches
total_train_loss = 0 # Record the training loss for each display step

#VALIDATION
stop_early = 0 
stop = 5 # If the validation loss does decrease in 5 consecutive checks, stop training
validation_check = ((len(train_questions))//train_batch_size//2)-1 #Check validation loss every half-epoch
summary_valid_loss = [] # Record the validation loss for saving improvements in the model

#Minimum number of epochs before we start checking sample output with beam search
min_epochs_before_sampling = 2 


checkpoint = "./checkpoints/best_model.ckpt" 

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(1, epochs+1):
        if epoch_i > min_epochs_before_sampling:
                sample_exchange = np.random.choice(len(valid_questions))
                sample_question = valid_questions[sample_exchange]
                sample_answer = valid_answers[sample_exchange]
                check_response(sess, sample_question, sample_answer)
        
        print("Shuffling training data . . .")
        (train_questions, train_answers) = parallel_shuffle(train_questions, train_answers)
        
        
        for batch_i, (q_lengths, a_lengths, questions_batch, answers_batch) in enumerate(
                batch_data(train_questions, train_answers, train_batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: questions_batch,
                 targets: answers_batch,
                 source_lengths: q_lengths,
                 target_lengths: a_lengths,
                 lr: learning_rate,
                 keep_prob: keep_probability})

            total_train_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time
            
            total_train_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>9.6f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(train_questions) // train_batch_size, 
                              total_train_loss / display_step, 
                              batch_time*display_step),
                         flush=True)
                total_train_loss = 0

            if batch_i % validation_check == 0 and batch_i > 0:
                print("Shuffling validation data . . .")
                (valid_questions, valid_answers) = parallel_shuffle(valid_questions, valid_answers)
                total_valid_loss = 0
                start_time = time.time()
                for batch_ii, (q_lengths, a_lengths, questions_batch, answers_batch) in \
                        enumerate(batch_data(valid_questions, valid_answers, valid_batch_size)):
                    valid_loss = sess.run(
                    cost, {input_data: questions_batch,
                           targets: answers_batch,
                           source_lengths: q_lengths,
                           target_lengths: a_lengths,
                           keep_prob: 1})
                    total_valid_loss += valid_loss
                end_time = time.time()
                batch_time = end_time - start_time
                avg_valid_loss = total_valid_loss / (len(valid_questions) / valid_batch_size)
                print('Valid Loss: {:>9.6f}, Seconds: {:>5.2f}'.format(avg_valid_loss, batch_time), flush=True)
                


                # Reduce learning rate, but not below its minimum value
                learning_rate *= learning_rate_decay
                if learning_rate < min_learning_rate:
                    learning_rate = min_learning_rate

                summary_valid_loss.append(avg_valid_loss)
                if avg_valid_loss <= min(summary_valid_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)
                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                        
    
        if stop_early == stop:
            print("Stopping Training.")
            break


Shuffling training data . . .
Epoch   1/100 Batch    0/1309 - Loss:  0.179993, Seconds: 112.63
Epoch   1/100 Batch  100/1309 - Loss: 12.785852, Seconds: 18.42
Epoch   1/100 Batch  200/1309 - Loss: 12.063487, Seconds: 18.44
Epoch   1/100 Batch  300/1309 - Loss: 11.892944, Seconds: 19.99
Epoch   1/100 Batch  400/1309 - Loss: 11.633031, Seconds: 18.78
Epoch   1/100 Batch  500/1309 - Loss: 11.487310, Seconds: 17.87
Epoch   1/100 Batch  600/1309 - Loss: 11.412990, Seconds: 20.29
Shuffling validation data . . .
Valid Loss:  5.745995, Seconds: 14.47
New Record!
Epoch   1/100 Batch  700/1309 - Loss: 11.309823, Seconds: 19.97
Epoch   1/100 Batch  800/1309 - Loss: 11.275793, Seconds: 16.41
Epoch   1/100 Batch  900/1309 - Loss: 11.229212, Seconds: 17.15
Epoch   1/100 Batch 1000/1309 - Loss: 11.186867, Seconds: 19.35
Epoch   1/100 Batch 1100/1309 - Loss: 11.116487, Seconds: 22.04
Epoch   1/100 Batch 1200/1309 - Loss: 11.108699, Seconds: 20.04
Epoch   1/100 Batch 1300/1309 - Loss: 11.081388, Second

In [30]:
def question_to_seq(question, vocab_to_int, int_to_vocab):
    '''Prepare the question for the model'''
    cleaned_question = Corpus.clean_sequence(question)
    return [vocab_to_int.get(word, vocab_to_int[UNK]) for word in cleaned_question]


In [56]:
# Use a question from the data as your input
random = np.random.choice(len(sorted_questions))
question_int = sorted_questions[random]
answer_int = sorted_answers[random]

saver = tf.train.Saver()
with tf.Session() as sess:
    # Run the model with the input question
    saver.restore(sess, checkpoint)
    check_response(sess, question_int, answer_int, best_only=False)


INFO:tensorflow:Restoring parameters from ./checkpoints/best_model.ckpt
Sample Question
  Word Ids:    [58, 2032, 1, 50, 0, 10, 113, 249, 0, 13, 41, 162, 177, 193, 8100, 480, 2, 8100, 110, 1102, 74, 1178, 660]
  Input Words: ['yes', 'vada', 'i', 'think', 'you', 'are', 'very', 'pretty', 'you', 'have', 'got', 'these', 'great', 'big', '<UNK>', 'eyes', 'the', '<UNK>', 'little', 'nose', 'an', 'amazing', 'mouth']
Actual Answer
  Word Ids:    [2, 548, 56, 310, 7, 3, 50, 1, 123]
  Input Words: ['the', 'boys', 'at', 'school', 'do', 'not', 'think', 'i', 'am']

Beam Answer 0
  Word Ids:       [9]
  Response Words: ['what']

Beam Answer 1
  Word Ids:       [21]
  Response Words: ['no']

Beam Answer 2
  Word Ids:       [1, 7, 3, 20]
  Response Words: ['i', 'do', 'not', 'know']

Beam Answer 3
  Word Ids:       [9, 10, 0]
  Response Words: ['what', 'are', 'you']

Beam Answer 4
  Word Ids:       [1, 7, 3, 20, 9]
  Response Words: ['i', 'do', 'not', 'know', 'what']

Beam Answer 5
  Word Ids:       [1, 