# 0. Introduction

This is a simple abstractive text summarization model that is based on the sequence-to-sequence with attention model. The original code is based on Pan and Liu's model from ["Sequence-to-Sequence with Attention Model for Text Summarization"](https://github.com/weichengzhang/Summarization), and simplified by [Currie](https://github.com/Currie32/Text-Summarization-with-Amazon-Reviews/blob/master/summarize_reviews.ipynb). The model was trained on [amazon's product reviews](https://www.kaggle.com/snap/amazon-fine-food-reviews). The original dataset is too large and will take a few days to train the model on a local environment only using CPU, so the training set only used a part of the dataset. 

For comparison, we will use the same text and summarize using a library called [sumy](https://github.com/miso-belica/sumy), which has multiple automatic summarization methods. We will use a TextRank method to automatically summarize text by extracting sentences that are deemed important.

# 1. Importing and cleaning data

In [1]:
# libraries
import pandas as pd
import numpy as np
import tensorflow as tf
import re
from nltk.corpus import stopwords
import time
from tensorflow.python.layers.core import Dense
from tensorflow.python.ops.rnn_cell_impl import _zero_state_tensors

In [2]:
df = pd.read_csv("Reviews.csv")

In [3]:
print(df.shape)
df.head(2)

(568454, 10)


Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,5,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,1,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...


In [4]:
# null values
df.isnull().sum()

Id                         0
ProductId                  0
UserId                     0
ProfileName               16
HelpfulnessNumerator       0
HelpfulnessDenominator     0
Score                      0
Time                       0
Summary                   26
Text                       0
dtype: int64

In [5]:
# remove null values and unneeded featured
df = df.dropna()
df = df[['Summary','Text']]

In [6]:
# view some reviews
for i in range(2):
    print('Review {}:'.format(i+1))
    print(df.loc[i,'Summary'])
    print(df.loc[i,'Text'])
    print()

Review 1:
Good Quality Dog Food
I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.

Review 2:
Not as Advertised
Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".



# 2. Preparing data for training

In [10]:
# A list of contractions from http://stackoverflow.com/questions/19790188/expanding-english-language-contractions-in-python
contractions = { 
"ain't": "am not",
"aren't": "are not",
"can't": "cannot",
"can't've": "cannot have",
"'cause": "because",
"could've": "could have",
"couldn't": "could not",
"couldn't've": "could not have",
"didn't": "did not",
"doesn't": "does not",
"don't": "do not",
"hadn't": "had not",
"hadn't've": "had not have",
"hasn't": "has not",
"haven't": "have not",
"he'd": "he would",
"he'd've": "he would have",
"he'll": "he will",
"he's": "he is",
"how'd": "how did",
"how'll": "how will",
"how's": "how is",
"i'd": "i would",
"i'll": "i will",
"i'm": "i am",
"i've": "i have",
"isn't": "is not",
"it'd": "it would",
"it'll": "it will",
"it's": "it is",
"let's": "let us",
"ma'am": "madam",
"mayn't": "may not",
"might've": "might have",
"mightn't": "might not",
"must've": "must have",
"mustn't": "must not",
"needn't": "need not",
"oughtn't": "ought not",
"shan't": "shall not",
"sha'n't": "shall not",
"she'd": "she would",
"she'll": "she will",
"she's": "she is",
"should've": "should have",
"shouldn't": "should not",
"that'd": "that would",
"that's": "that is",
"there'd": "there had",
"there's": "there is",
"they'd": "they would",
"they'll": "they will",
"they're": "they are",
"they've": "they have",
"wasn't": "was not",
"we'd": "we would",
"we'll": "we will",
"we're": "we are",
"we've": "we have",
"weren't": "were not",
"what'll": "what will",
"what're": "what are",
"what's": "what is",
"what've": "what have",
"where'd": "where did",
"where's": "where is",
"who'll": "who will",
"who's": "who is",
"won't": "will not",
"wouldn't": "would not",
"you'd": "you would",
"you'll": "you will",
"you're": "you are"
}

In [11]:
# create function that cleans text
def clean_text(text, remove_stopwords = True):
    '''Remove unwanted characters, stopwords, and format the text to create fewer nulls word embeddings'''
    
    # Convert words to lower case
    text = text.lower()
    
    # Replace contractions with their longer forms 
    if True:
        text = text.split()
        new_text = []
        for word in text:
            if word in contractions:
                new_text.append(contractions[word])
            else:
                new_text.append(word)
        text = " ".join(new_text)
    
    # Format words and remove unwanted characters
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
    text = re.sub(r'\<a href', ' ', text)
    text = re.sub(r'&amp;', '', text) 
    text = re.sub(r'[_"\-;%()|+&=*%.,!?:#$@\[\]/]', ' ', text)
    text = re.sub(r'<br />', ' ', text)
    text = re.sub(r'\'', ' ', text)
    
    # Optionally, remove stop words
    if remove_stopwords:
        text = text.split()
        stops = set(stopwords.words("english"))
        text = [w for w in text if not w in stops]
        text = " ".join(text)

    return text

In [12]:
# remove stopwords from texts for training purposes,
# but keep them in summary to result in more coherent summary

# Clean the summaries and texts
clean_summaries = []
for summary in df.Summary:
    clean_summaries.append(clean_text(summary, remove_stopwords=False))
print("Summaries are complete.")

clean_texts = []
for text in df.Text:
    clean_texts.append(clean_text(text))
print("Texts are complete.")

Summaries are complete.
Texts are complete.


In [13]:
# Find the number of times each word was used and the size of the vocabulary
def count_words(count_dict, text):
    '''Count the number of occurrences of each word in a set of text'''
    for sentence in text:
        for word in sentence.split():
            if word not in count_dict:
                count_dict[word] = 1
            else:
                count_dict[word] += 1

word_counts = {}

count_words(word_counts, clean_summaries)
count_words(word_counts, clean_texts)
            
print("Size of Vocabulary:", len(word_counts))

Size of Vocabulary: 132884


In [14]:
# Load Conceptnet Numberbatch's (CN) embeddings, similar to GloVe, but probably better 
# (https://github.com/commonsense/conceptnet-numberbatch)
embeddings_index = {}
with open('numberbatch-en-17.06.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split(' ')
        word = values[0]
        embedding = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = embedding

print('Word embeddings:', len(embeddings_index))

FileNotFoundError: [Errno 2] No such file or directory: 'numberbatch-en-17.06.txt'

In [21]:
# Find the number of words that are missing from CN, and are used more than our threshold.
missing_words = 0
threshold = 20

for word, count in word_counts.items():
    if count > threshold:
        if word not in embeddings_index:
            missing_words += 1
            
missing_ratio = round(missing_words/len(word_counts),4)*100
            
print("Number of words missing from CN:", missing_words)
print("Percent of words that are missing from vocabulary: {}%".format(missing_ratio))

Number of words missing from CN: 3870
Percent of words that are missing from vocabulary: 2.91%


In [24]:
# Limit the vocab that we will use to words that appear ≥ threshold or are in GloVe

#dictionary to convert words to integers
vocab_to_int = {} 

value = 0
for word, count in word_counts.items():
    if count >= threshold or word in embeddings_index:
        vocab_to_int[word] = value
        value += 1

# Special tokens that will be added to our vocab
codes = ["<UNK>","<PAD>","<EOS>","<GO>"]   

# Add codes to vocab
for code in codes:
    vocab_to_int[code] = len(vocab_to_int)

# Dictionary to convert integers to words
int_to_vocab = {}
for word, value in vocab_to_int.items():
    int_to_vocab[value] = word

usage_ratio = round(len(vocab_to_int) / len(word_counts),4)*100

print("Total number of unique words:", len(word_counts))
print("Number of words we will use:", len(vocab_to_int))
print("Percent of words we will use: {}%".format(usage_ratio))

Total number of unique words: 132884
Number of words we will use: 59595
Percent of words we will use: 44.85%


In [25]:
# Need to use 300 for embedding dimensions to match CN's vectors.
embedding_dim = 300
nb_words = len(vocab_to_int)

# Create matrix with default values of zero
word_embedding_matrix = np.zeros((nb_words, embedding_dim), dtype=np.float32)
for word, i in vocab_to_int.items():
    if word in embeddings_index:
        word_embedding_matrix[i] = embeddings_index[word]
    else:
        # If word not in CN, create a random embedding for it
        new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))
        embeddings_index[word] = new_embedding
        word_embedding_matrix[i] = new_embedding

# Check if value matches len(vocab_to_int)
print(len(word_embedding_matrix))

59595


In [26]:
def convert_to_ints(text, word_count, unk_count, eos=False):
    '''Convert words in text to an integer.
       If word is not in vocab_to_int, use UNK's integer.
       Total the number of words and UNKs.
       Add EOS token to the end of texts'''
    ints = []
    for sentence in text:
        sentence_ints = []
        for word in sentence.split():
            word_count += 1
            if word in vocab_to_int:
                sentence_ints.append(vocab_to_int[word])
            else:
                sentence_ints.append(vocab_to_int["<UNK>"])
                unk_count += 1
        if eos:
            sentence_ints.append(vocab_to_int["<EOS>"])
        ints.append(sentence_ints)
    return ints, word_count, unk_count

In [27]:
# Apply convert_to_ints to clean_summaries and clean_texts
word_count = 0
unk_count = 0

int_summaries, word_count, unk_count = convert_to_ints(clean_summaries, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(clean_texts, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100

print("Total number of words in headlines:", word_count)
print("Total number of UNKs in headlines:", unk_count)
print("Percent of words that are UNK: {}%".format(unk_percent))

Total number of words in headlines: 25679933
Total number of UNKs in headlines: 192245
Percent of words that are UNK: 0.75%


In [28]:
def create_lengths(text):
    '''Create a data frame of the sentence lengths from a text'''
    lengths = []
    for sentence in text:
        lengths.append(len(sentence))
    return pd.DataFrame(lengths, columns=['counts'])

lengths_summaries = create_lengths(int_summaries)
lengths_texts = create_lengths(int_texts)

print("Summaries:")
print(lengths_summaries.describe())
print()
print("Texts:")
print(lengths_texts.describe())

Summaries:
              counts
count  568411.000000
mean        4.181624
std         2.657872
min         0.000000
25%         2.000000
50%         4.000000
75%         5.000000
max        48.000000

Texts:
              counts
count  568411.000000
mean       41.996835
std        42.520873
min         1.000000
25%        18.000000
50%        29.000000
75%        50.000000
max      2085.000000


In [29]:
def unk_counter(sentence):
    '''Counts the number of time UNK appears in a sentence.'''
    unk_count = 0
    for word in sentence:
        if word == vocab_to_int["<UNK>"]:
            unk_count += 1
    return unk_count

# Sort the summaries and texts by the length of the texts, shortest to longest
# Limit the length of summaries and texts based on the min and max ranges.
# Remove reviews that include too many UNKs

sorted_summaries = []
sorted_texts = []
max_text_length = 84
max_summary_length = 13
min_length = 2
unk_text_limit = 1
unk_summary_limit = 0

for length in range(min(lengths_texts.counts), max_text_length): 
    for count, words in enumerate(int_summaries):
        if (len(int_summaries[count]) >= min_length and
            len(int_summaries[count]) <= max_summary_length and
            len(int_texts[count]) >= min_length and
            unk_counter(int_summaries[count]) <= unk_summary_limit and
            unk_counter(int_texts[count]) <= unk_text_limit and
            length == len(int_texts[count])
           ):
            sorted_summaries.append(int_summaries[count])
            sorted_texts.append(int_texts[count])
        
# Compare lengths to ensure they match
print(len(sorted_summaries))
print(len(sorted_texts))

425615
425615


# 3. Building seq2seq with attention model

In [30]:
def model_inputs():
    '''Create palceholders for inputs to the model'''
    
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    summary_length = tf.placeholder(tf.int32, (None,), name='summary_length')
    max_summary_length = tf.reduce_max(summary_length, name='max_dec_len')
    text_length = tf.placeholder(tf.int32, (None,), name='text_length')

    return input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length

In [31]:
def process_encoding_input(target_data, vocab_to_int, batch_size):
    '''Remove the last word id from each batch and concat the <GO> to the begining of each batch'''
    
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input

In [32]:
def encoding_layer(rnn_size, sequence_length, num_layers, rnn_inputs, keep_prob):
    '''Create the encoding layer'''
    
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer)):
            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, 
                                                    input_keep_prob = keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, 
                                                    input_keep_prob = keep_prob)

            enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                                    cell_bw, 
                                                                    rnn_inputs,
                                                                    sequence_length,
                                                                    dtype=tf.float32)
    # Join outputs since we are using a bidirectional RNN
    enc_output = tf.concat(enc_output,2)
    
    return enc_output, enc_state

In [33]:
def training_decoding_layer(dec_embed_input, summary_length, dec_cell, initial_state, output_layer, 
                            vocab_size, max_summary_length):
    '''Create the training logits'''
    
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=summary_length,
                                                        time_major=False)

    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                       training_helper,
                                                       initial_state,
                                                       output_layer) 

    training_logits, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                           output_time_major=False,
                                                           impute_finished=True,
                                                           maximum_iterations=max_summary_length)
    return training_logits


In [34]:
def inference_decoding_layer(embeddings, start_token, end_token, dec_cell, initial_state, output_layer,
                             max_summary_length, batch_size):
    '''Create the inference logits'''
    
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')
    
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,
                                                                start_tokens,
                                                                end_token)
                
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        initial_state,
                                                        output_layer)
                
    inference_logits, _ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                            output_time_major=False,
                                                            impute_finished=True,
                                                            maximum_iterations=max_summary_length)
    
    return inference_logits

In [35]:
def decoding_layer(dec_embed_input, embeddings, enc_output, enc_state, vocab_size, text_length, summary_length, 
                   max_summary_length, rnn_size, vocab_to_int, keep_prob, batch_size, num_layers):
    '''Create the decoding cell and attention for the training and inference decoding layers'''
    
    for layer in range(num_layers):
        with tf.variable_scope('decoder_{}'.format(layer)):
            lstm = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            dec_cell = tf.contrib.rnn.DropoutWrapper(lstm, 
                                                     input_keep_prob = keep_prob)
    
    output_layer = Dense(vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size,
                                                  enc_output,
                                                  text_length,
                                                  normalize=False,
                                                  name='BahdanauAttention')

    dec_cell = tf.contrib.seq2seq.DynamicAttentionWrapper(dec_cell,
                                                          attn_mech,
                                                          rnn_size)
            
    initial_state = tf.contrib.seq2seq.DynamicAttentionWrapperState(enc_state[0],
                                                                    _zero_state_tensors(rnn_size, 
                                                                                        batch_size, 
                                                                                        tf.float32)) 
    with tf.variable_scope("decode"):
        training_logits = training_decoding_layer(dec_embed_input, 
                                                  summary_length, 
                                                  dec_cell, 
                                                  initial_state,
                                                  output_layer,
                                                  vocab_size, 
                                                  max_summary_length)
    with tf.variable_scope("decode", reuse=True):
        inference_logits = inference_decoding_layer(embeddings,  
                                                    vocab_to_int['<GO>'], 
                                                    vocab_to_int['<EOS>'],
                                                    dec_cell, 
                                                    initial_state, 
                                                    output_layer,
                                                    max_summary_length,
                                                    batch_size)

    return training_logits, inference_logits

In [36]:
def seq2seq_model(input_data, target_data, keep_prob, text_length, summary_length, max_summary_length, 
                  vocab_size, rnn_size, num_layers, vocab_to_int, batch_size):
    '''Use the previous functions to create the training and inference logits'''
    
    # Use Numberbatch's embeddings and the newly created ones as our embeddings
    embeddings = word_embedding_matrix
    
    enc_embed_input = tf.nn.embedding_lookup(embeddings, input_data)
    enc_output, enc_state = encoding_layer(rnn_size, text_length, num_layers, enc_embed_input, keep_prob)
    
    dec_input = process_encoding_input(target_data, vocab_to_int, batch_size)
    dec_embed_input = tf.nn.embedding_lookup(embeddings, dec_input)
    
    training_logits, inference_logits  = decoding_layer(dec_embed_input, 
                                                        embeddings,
                                                        enc_output,
                                                        enc_state, 
                                                        vocab_size, 
                                                        text_length, 
                                                        summary_length, 
                                                        max_summary_length,
                                                        rnn_size, 
                                                        vocab_to_int, 
                                                        keep_prob, 
                                                        batch_size,
                                                        num_layers)
    
    return training_logits, inference_logits

In [37]:
def pad_sentence_batch(sentence_batch):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [vocab_to_int['<PAD>']] * (max_sentence - len(sentence)) for sentence in sentence_batch]

In [38]:
def get_batches(summaries, texts, batch_size):
    """Batch summaries, texts, and the lengths of their sentences together"""
    for batch_i in range(0, len(texts)//batch_size):
        start_i = batch_i * batch_size
        summaries_batch = summaries[start_i:start_i + batch_size]
        texts_batch = texts[start_i:start_i + batch_size]
        pad_summaries_batch = np.array(pad_sentence_batch(summaries_batch))
        pad_texts_batch = np.array(pad_sentence_batch(texts_batch))
        
        # Need the lengths for the _lengths parameters
        pad_summaries_lengths = []
        for summary in pad_summaries_batch:
            pad_summaries_lengths.append(len(summary))
        
        pad_texts_lengths = []
        for text in pad_texts_batch:
            pad_texts_lengths.append(len(text))
        
        yield pad_summaries_batch, pad_texts_batch, pad_summaries_lengths, pad_texts_lengths

In [39]:
# Set the Hyperparameters
epochs = 100
batch_size = 64
rnn_size = 256
num_layers = 2
learning_rate = 0.005
keep_probability = 0.75

In [40]:
# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      summary_length,
                                                      max_summary_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights for sequence_loss
    masks = tf.sequence_mask(summary_length, max_summary_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(learning_rate)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

  from ._conv import register_converters as _register_converters


Graph is built.


# 4. Training the model using training dataset

In [52]:
# only use subset
# Subset the data for training
start = 200000
end = start + 30000
sorted_summaries_short = sorted_summaries[start:end]
sorted_texts_short = sorted_texts[start:end]
print("The shortest text length:", len(sorted_texts_short[0]))
print("The longest text length:",len(sorted_texts_short[-1]))

The shortest text length: 25
The longest text length: 29


In [53]:
# Train the Model
learning_rate_decay = 0.95
min_learning_rate = 0.0005
display_step = 20 # Check training loss after every 20 batches
stop_early = 0 
stop = 3 # If the update loss does not decrease in 3 consecutive update checks, stop training
per_epoch = 3 # Make 3 update checks per epoch
update_check = (len(sorted_texts_short)//batch_size//per_epoch)-1

update_loss = 0 
batch_loss = 0
summary_update_loss = [] # Record the update losses for saving improvements in the model

checkpoint = "best_model.ckpt" 
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # If we want to continue training a previous session
    #loader = tf.train.import_meta_graph("./" + checkpoint + '.meta')
    #loader.restore(sess, checkpoint)
    
    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (summaries_batch, texts_batch, summaries_lengths, texts_lengths) in enumerate(
                get_batches(sorted_summaries_short, sorted_texts_short, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: summaries_batch,
                 lr: learning_rate,
                 summary_length: summaries_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts_short) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0

            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                summary_update_loss.append(update_loss)
                
                # If the update loss is at a new minimum, save the model
                if update_loss <= min(summary_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    stop_early += 1
                    if stop_early == stop:
                        break
                update_loss = 0
            
                    
        # Reduce learning rate, but not below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        if stop_early == stop:
            print("Stopping Training.")
            break

Epoch   1/100 Batch   20/468 - Loss:  4.662, Seconds: 67.04
Epoch   1/100 Batch   40/468 - Loss:  2.869, Seconds: 65.10
Epoch   1/100 Batch   60/468 - Loss:  2.841, Seconds: 55.84
Epoch   1/100 Batch   80/468 - Loss:  2.721, Seconds: 70.62
Epoch   1/100 Batch  100/468 - Loss:  2.785, Seconds: 60.29
Epoch   1/100 Batch  120/468 - Loss:  2.679, Seconds: 60.47
Epoch   1/100 Batch  140/468 - Loss:  2.560, Seconds: 64.61
Average loss for this update: 2.978
New Record!
Epoch   1/100 Batch  160/468 - Loss:  2.558, Seconds: 68.57
Epoch   1/100 Batch  180/468 - Loss:  2.500, Seconds: 62.68
Epoch   1/100 Batch  200/468 - Loss:  2.588, Seconds: 53.37
Epoch   1/100 Batch  220/468 - Loss:  2.570, Seconds: 60.53
Epoch   1/100 Batch  240/468 - Loss:  2.510, Seconds: 44.45
Epoch   1/100 Batch  260/468 - Loss:  2.345, Seconds: 58.32
Epoch   1/100 Batch  280/468 - Loss:  2.205, Seconds: 66.59
Epoch   1/100 Batch  300/468 - Loss:  2.298, Seconds: 58.46
Average loss for this update: 2.423
New Record!
Epoc

Epoch   6/100 Batch  200/468 - Loss:  1.259, Seconds: 57.52
Epoch   6/100 Batch  220/468 - Loss:  1.255, Seconds: 70.99
Epoch   6/100 Batch  240/468 - Loss:  1.178, Seconds: 59.74
Epoch   6/100 Batch  260/468 - Loss:  1.106, Seconds: 68.59
Epoch   6/100 Batch  280/468 - Loss:  0.985, Seconds: 71.71
Epoch   6/100 Batch  300/468 - Loss:  1.122, Seconds: 59.45
Average loss for this update: 1.158
New Record!
Epoch   6/100 Batch  320/468 - Loss:  1.200, Seconds: 58.79
Epoch   6/100 Batch  340/468 - Loss:  1.376, Seconds: 49.48
Epoch   6/100 Batch  360/468 - Loss:  1.258, Seconds: 66.36
Epoch   6/100 Batch  380/468 - Loss:  1.198, Seconds: 67.66
Epoch   6/100 Batch  400/468 - Loss:  1.114, Seconds: 72.88
Epoch   6/100 Batch  420/468 - Loss:  1.038, Seconds: 53.84
Epoch   6/100 Batch  440/468 - Loss:  1.090, Seconds: 59.10
Epoch   6/100 Batch  460/468 - Loss:  1.273, Seconds: 58.25
Average loss for this update: 1.203
No Improvement.
Epoch   7/100 Batch   20/468 - Loss:  1.455, Seconds: 62.79


Epoch  11/100 Batch  380/468 - Loss:  0.836, Seconds: 67.66
Epoch  11/100 Batch  400/468 - Loss:  0.804, Seconds: 66.46
Epoch  11/100 Batch  420/468 - Loss:  0.755, Seconds: 53.86
Epoch  11/100 Batch  440/468 - Loss:  0.774, Seconds: 57.80
Epoch  11/100 Batch  460/468 - Loss:  0.910, Seconds: 59.18
Average loss for this update: 0.851
No Improvement.
Epoch  12/100 Batch   20/468 - Loss:  1.067, Seconds: 63.50
Epoch  12/100 Batch   40/468 - Loss:  0.943, Seconds: 62.57
Epoch  12/100 Batch   60/468 - Loss:  0.887, Seconds: 53.40
Epoch  12/100 Batch   80/468 - Loss:  0.853, Seconds: 66.72
Epoch  12/100 Batch  100/468 - Loss:  0.860, Seconds: 57.90
Epoch  12/100 Batch  120/468 - Loss:  0.776, Seconds: 58.20
Epoch  12/100 Batch  140/468 - Loss:  0.767, Seconds: 59.37
Average loss for this update: 0.875
No Improvement.
Epoch  12/100 Batch  160/468 - Loss:  0.810, Seconds: 66.42
Epoch  12/100 Batch  180/468 - Loss:  0.828, Seconds: 62.70
Epoch  12/100 Batch  200/468 - Loss:  0.839, Seconds: 53

Epoch  17/100 Batch  100/468 - Loss:  0.681, Seconds: 58.23
Epoch  17/100 Batch  120/468 - Loss:  0.615, Seconds: 57.58
Epoch  17/100 Batch  140/468 - Loss:  0.612, Seconds: 59.12
Average loss for this update: 0.694
No Improvement.
Epoch  17/100 Batch  160/468 - Loss:  0.646, Seconds: 67.22
Epoch  17/100 Batch  180/468 - Loss:  0.667, Seconds: 62.65
Epoch  17/100 Batch  200/468 - Loss:  0.685, Seconds: 54.01
Epoch  17/100 Batch  220/468 - Loss:  0.662, Seconds: 58.16
Epoch  17/100 Batch  240/468 - Loss:  0.634, Seconds: 44.82
Epoch  17/100 Batch  260/468 - Loss:  0.596, Seconds: 57.63
Epoch  17/100 Batch  280/468 - Loss:  0.547, Seconds: 67.39
Epoch  17/100 Batch  300/468 - Loss:  0.634, Seconds: 58.99
Average loss for this update: 0.627
New Record!
Epoch  17/100 Batch  320/468 - Loss:  0.604, Seconds: 59.37
Epoch  17/100 Batch  340/468 - Loss:  0.699, Seconds: 49.90
Epoch  17/100 Batch  360/468 - Loss:  0.664, Seconds: 67.07
Epoch  17/100 Batch  380/468 - Loss:  0.630, Seconds: 68.32


Epoch  22/100 Batch  280/468 - Loss:  0.464, Seconds: 67.47
Epoch  22/100 Batch  300/468 - Loss:  0.568, Seconds: 57.77
Average loss for this update: 0.542
New Record!
Epoch  22/100 Batch  320/468 - Loss:  0.526, Seconds: 58.93
Epoch  22/100 Batch  340/468 - Loss:  0.581, Seconds: 50.46
Epoch  22/100 Batch  360/468 - Loss:  0.567, Seconds: 67.25
Epoch  22/100 Batch  380/468 - Loss:  0.539, Seconds: 67.47
Epoch  22/100 Batch  400/468 - Loss:  0.554, Seconds: 68.26
Epoch  22/100 Batch  420/468 - Loss:  0.499, Seconds: 55.19
Epoch  22/100 Batch  440/468 - Loss:  0.521, Seconds: 58.28
Epoch  22/100 Batch  460/468 - Loss:  0.584, Seconds: 58.40
Average loss for this update: 0.555
No Improvement.
Epoch  23/100 Batch   20/468 - Loss:  0.702, Seconds: 61.98
Epoch  23/100 Batch   40/468 - Loss:  0.628, Seconds: 63.62
Epoch  23/100 Batch   60/468 - Loss:  0.577, Seconds: 53.43
Epoch  23/100 Batch   80/468 - Loss:  0.561, Seconds: 67.56
Epoch  23/100 Batch  100/468 - Loss:  0.563, Seconds: 58.42


Epoch  27/100 Batch  460/468 - Loss:  0.538, Seconds: 58.71
Average loss for this update: 0.497
No Improvement.
Epoch  28/100 Batch   20/468 - Loss:  0.611, Seconds: 61.57
Epoch  28/100 Batch   40/468 - Loss:  0.570, Seconds: 62.55
Epoch  28/100 Batch   60/468 - Loss:  0.529, Seconds: 54.21
Epoch  28/100 Batch   80/468 - Loss:  0.494, Seconds: 68.01
Epoch  28/100 Batch  100/468 - Loss:  0.513, Seconds: 57.96
Epoch  28/100 Batch  120/468 - Loss:  0.455, Seconds: 57.63
Epoch  28/100 Batch  140/468 - Loss:  0.452, Seconds: 57.82
Average loss for this update: 0.515
No Improvement.
Epoch  28/100 Batch  160/468 - Loss:  0.482, Seconds: 67.21
Epoch  28/100 Batch  180/468 - Loss:  0.504, Seconds: 62.52
Epoch  28/100 Batch  200/468 - Loss:  0.536, Seconds: 54.27
Epoch  28/100 Batch  220/468 - Loss:  0.487, Seconds: 58.14
Epoch  28/100 Batch  240/468 - Loss:  0.480, Seconds: 45.32
Epoch  28/100 Batch  260/468 - Loss:  0.441, Seconds: 59.19
Epoch  28/100 Batch  280/468 - Loss:  0.406, Seconds: 67

Epoch  33/100 Batch  160/468 - Loss:  0.439, Seconds: 66.76
Epoch  33/100 Batch  180/468 - Loss:  0.454, Seconds: 62.37
Epoch  33/100 Batch  200/468 - Loss:  0.463, Seconds: 54.38
Epoch  33/100 Batch  220/468 - Loss:  0.439, Seconds: 57.81
Epoch  33/100 Batch  240/468 - Loss:  0.449, Seconds: 44.98
Epoch  33/100 Batch  260/468 - Loss:  0.405, Seconds: 58.13
Epoch  33/100 Batch  280/468 - Loss:  0.372, Seconds: 67.76
Epoch  33/100 Batch  300/468 - Loss:  0.442, Seconds: 58.72
Average loss for this update: 0.428
No Improvement.
Stopping Training.


# 5. Generating and comparing summaries

In [160]:
def text_to_seq(text):
    '''Prepare the text for the model'''
    
    text = clean_text(text)
    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in text.split()]

def summarize_text(input_text):
    # measure time
    t0 = time.time()
    text = text_to_seq(input_text)

    checkpoint = "./best_model.ckpt"

    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:
        # Load saved model
        loader = tf.train.import_meta_graph(checkpoint + '.meta')
        loader.restore(sess, checkpoint)

        input_data = loaded_graph.get_tensor_by_name('input:0')
        logits = loaded_graph.get_tensor_by_name('predictions:0')
        text_length = loaded_graph.get_tensor_by_name('text_length:0')
        summary_length = loaded_graph.get_tensor_by_name('summary_length:0')
        keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

        #Multiply by batch_size to match the model's input parameters
        answer_logits = sess.run(logits, {input_data: [text]*batch_size, 
                                          summary_length: [np.random.randint(5,8)], 
                                          text_length: [len(text)]*batch_size,
                                          keep_prob: 1.0})[0] 

    # Remove the padding from the text
    pad = vocab_to_int["<PAD>"] 
    print(" ".join([int_to_vocab[i] for i in answer_logits if i != pad]))
    
    t1 = time.time()
    print("Time taken to summarize:",t1-t0)

In [161]:
# import sumy's text_rank module as an example of extractive summary
from sumy.summarizers.text_rank import TextRankSummarizer
from nltk.corpus import stopwords
from nltk.tokenize import punkt
from sumy.nlp.stemmers import Stemmer
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

def text_rank_summary(text, language = 'english',length = 1):
    # measure time
    t0 = time.time()
    
    parser = PlaintextParser(text, Tokenizer(language))
    stemmer = Stemmer(language)
    summarizer = TextRankSummarizer()
    summarizer.stop_words = stopwords.words('english')
    
    for i in summarizer(parser.document,length):
        print(i)
    
    t1 = time.time()
    print("Time taken to summarize:", t1 - t0)

In [162]:
# comparing text
def compare_text(text, length = 1):
    # original text
    print("----------ORIGINAL TEXT----------")
    print(text)

    # Abstractive summarization
    print("\n----------ABSTRACTIVE SUMMARIZATION----------")
    summarize_text(text)

    # Extractive summarization
    print("\n----------EXTRACTIVE SUMMARIZATION----------")
    print(text_rank_summary(text),length)

In [163]:
example = """For those of us without a two week vacation to devote to this tome, I would like to see a Clif's Notes version. Maybe 10,000 random numbers? \
SPOILER ALERT: They just pretty much stay random the whole time, no plot twists or anything. I mean if you've seen one random number, you've seen them all. In a slap in the face of randomness, the very randomness of it got repetitive after a few pages. Save yourself the time, and if you need a random number, just sort of think of a random number in your head and write it down. Odds are its in the book already, and you saved yourself $80.\
On the plus side, great comments. Please read my upcoming meta-pop-economics book, "Absurdity, Humor, and Metacommentary in Current Anonymous Internet Communication, A Case Study: Literary Criticism of the Amazon.com Comments Section for the Book 'A Million Random Digits with 100,000 Normal Deviates' by the Rand Corporation." Coming soon to Amazon.com."""

compare_text(example)

----------ORIGINAL TEXT----------
For those of us without a two week vacation to devote to this tome, I would like to see a Clif's Notes version. Maybe 10,000 random numbers? SPOILER ALERT: They just pretty much stay random the whole time, no plot twists or anything. I mean if you've seen one random number, you've seen them all. In a slap in the face of randomness, the very randomness of it got repetitive after a few pages. Save yourself the time, and if you need a random number, just sort of think of a random number in your head and write it down. Odds are its in the book already, and you saved yourself $80.On the plus side, great comments. Please read my upcoming meta-pop-economics book, "Absurdity, Humor, and Metacommentary in Current Anonymous Internet Communication, A Case Study: Literary Criticism of the Amazon.com Comments Section for the Book 'A Million Random Digits with 100,000 Normal Deviates' by the Rand Corporation." Coming soon to Amazon.com.

----------ABSTRACTIVE SUMMAR

In [164]:
example2 = """Ferrero Rocher is one of my favorite candies so I decided to order some as a Valentine’s gift for myself. \
The package arrived on time and the appropriate protection was used as this product did not arrive damaged or cracked. \
The product was just as described – 2 tiers of 24 candies on each level, for a total of 48 pieces. \
Unfortunately, when I opened the candy wrapper I noticed the outer chocolate shell looked off-color and had a white dried-out tint to it. I proceeded to eat the candy and found that it tasted a bit off and seemed stale. \
I thought maybe it was just my taste buds or that it was possibly just that one piece. So I went ahead and ate a couple more pieces. Every piece tasted just like the first one – stale! \
I decided to try and find the LOT codes to check out the expiration date. Sure enough, these chocolates were beyond expired! \
I am extremely disappointed and unhappy with my purchase. I know mistakes can happen, so if this company is able to fix this problem, I will post an update. However, at this time I do not recommend this product from this seller."""

compare_text(example2)

----------ORIGINAL TEXT----------
Ferrero Rocher is one of my favorite candies so I decided to order some as a Valentine’s gift for myself. The package arrived on time and the appropriate protection was used as this product did not arrive damaged or cracked. The product was just as described – 2 tiers of 24 candies on each level, for a total of 48 pieces. Unfortunately, when I opened the candy wrapper I noticed the outer chocolate shell looked off-color and had a white dried-out tint to it. I proceeded to eat the candy and found that it tasted a bit off and seemed stale. I thought maybe it was just my taste buds or that it was possibly just that one piece. So I went ahead and ate a couple more pieces. Every piece tasted just like the first one – stale! I decided to try and find the LOT codes to check out the expiration date. Sure enough, these chocolates were beyond expired! I am extremely disappointed and unhappy with my purchase. I know mistakes can happen, so if this company is able

In [165]:
example3 = """Out of all the furniture I ever bought online, this was by far the best choice... Both in quality and price! I seriously have one of these rugs in every room of my house now. I ordered the 6-7 x 9' rug in Turquoise on 01/17/16 for $129 w/FREE Shipping. It was delivered in pristine condition less than a week later. Although it was a risk to buy a rug without seeing first in person, I was very happy with the outcome and highly recommend it to anyone looking for a good quality yet affordable shag carpet. In fact, I've purchased several other rugs like this in various colors/sizes and have found them all to be of the same high quality.\
APPEARANCE | This rug is absolutely gorgeous. The pile is just the right length and the shag is has a comfortable cushy texture that also looks amazing. It's all over soft and cozy and is the perfect area for performing my daily yoga routine. The material is thick, heavy (in a good way) and feels luxurious. My family members and guests love to sit on it. I've received many compliments (from
both males and females of varying ages) and it seems like everyone who enters the house is drawn to sitting on it!\
SHEDDING | It hardly sheds (only 4 or 5 threads fell out over the 1st month). I rarely find any loose threads. It's held up well to vacuuming, even with a Shop Vac.\
STAINING/CLEANING | The rug is resistant to staining and spot cleans easily. Unlike virtually everything else in my house, it doesn't attract fur from my large German Shepard (who also loves to lay on it)! It's also "kid-friendly," meaning that it's easy to keep clean and durable.\
COMPETITION | I spent a lot of time researching shag rugs on Amazon, Wayfair, and Overstock.com prior to going with this one. Sweet Home brand was by far the best priced... It had a lot of positive reviews and I feel they were accurate. I'll definitely continue to buy from Sweet Home brand.\
COLOR | The Turquoise is true to color, just as pictured. It's a light turquoise... Definitely NOT teal or baby blue. It's been easy to match with other turquoise-colored decor. I also have the Trellis Design in Charcoal-Cream. It's an attractive dark grey, definitely not black or blue, and the cream is... well, cream. I also have one in the lighter grey (also trellis design). It's best described as a neutral, maybe slightly cool toned (as opposed to warm) grey and the trellis pattern is also a light-medium cream.\
PLACEMENT | This huge rug rolled out nicely and stays in place. It's didn't buckle or roll up on itself, even on the 1st day. Note that I have one over carpet, one on wood flooring, and the others over tile. They retain their grip well on each type of surface and I do NOT use any rug pads under them.\
COORDINATION | This rug adds visual interest to the room in terms of color and texture. Since I decorate with lots of white, silver, grey and clear lucite pieces, the turquoise prevents the room from looking sterile or drab. In order to tie in the vibrant color, I ordered some faux fur pillow covers in turquoise from the Euphoria CaliTime brand. The color isn't exact but it's close enough and looks awesome. (Sweet Home also makes a faux fur pillow in turquoise which appears to be an exact match but I passed on it as the cover is not detachable and I like to have the ability to wash them.) The overall look has major swag! It's chick, yet inviting and comfortable.\
INSPECT | When I bought a 2nd rug (the Trellis design in Charcoal/Cream), I ordered it in 6'7" x 9'3" but received one that was 4'11" 6'8." I advise other buyers to inspect the size and confirm it's accurate upon arrival. The seller was SUPER cool about correcting the issue and even allowed me to keep the smaller-sized rug. I found this to be great customer service as returning a rug is NOT an easy task. Therefore, in the event that you do not receive the correct size or color, I urge you to NOT flip out and leave a nasty review.... Give them an opportunity to correct the issue first. Hopefully you'll have a positive experience like I did. """

compare_text(example3)

----------ORIGINAL TEXT----------
Out of all the furniture I ever bought online, this was by far the best choice... Both in quality and price! I seriously have one of these rugs in every room of my house now. I ordered the 6-7 x 9' rug in Turquoise on 01/17/16 for $129 w/FREE Shipping. It was delivered in pristine condition less than a week later. Although it was a risk to buy a rug without seeing first in person, I was very happy with the outcome and highly recommend it to anyone looking for a good quality yet affordable shag carpet. In fact, I've purchased several other rugs like this in various colors/sizes and have found them all to be of the same high quality.APPEARANCE | This rug is absolutely gorgeous. The pile is just the right length and the shag is has a comfortable cushy texture that also looks amazing. It's all over soft and cozy and is the perfect area for performing my daily yoga routine. The material is thick, heavy (in a good way) and feels luxurious. My family members 

# 6. Conclusion

As can be seen from the example, seq2seq attention model produces a much more relevant and coherent result. However, it is also much slower than the extractive textrank method. In particular, if we consider the time taken to train the model, abstractive methods can be excruciatingly slow, even with the limited training set used in the context. Depending on the usage (especially in the commercial production settings), abstractive methods may still prove to be limited in practicality.