## AIDI 2004 - AI in Enterprise Systems

### FINAL PROJECT

by Michael Molnar and Vasundara Chandre Gowda

Condensing long passages of text into short, representative snippets is an important task in natural language processing. Readers are often left frustrated at the end of an article, having found they were lured in by a sensational, clickbait headline.

In this project we will build and train our own deep learning model that generates a novel summary capturing the main points of a news article, product review, or other text sample.

We will build a web application and deploy our model for use. 

### NOTEBOOK 4:

In this notebook we will build the module that will be the basis of our Flask application. It takes a user's input text, cleans and processes it, and then uses our models to generate a summary.
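
The cleaning stage can be tried on its own, without TensorFlow or downloaded NLTK data. The sketch below repeats the regex steps of `clean_text` (defined in the cell below) on one sample sentence. The three-entry contraction map and the small stopword set are hypothetical stand-ins for `contraction_mappings` and NLTK's English list; the stand-in includes "not" because NLTK's English list does.

```python
import re

# Hypothetical stand-ins for contraction_mappings and NLTK's English stopword list
CONTRACTIONS = {"i've": "i have", "it's": "it is", "don't": "do not"}
STOP_WORDS = {"i", "have", "it", "is", "and", "but", "do", "not", "the", "a", "this"}

def clean_text_sketch(text: str) -> str:
    text = text.lower()
    text = re.sub(r"\([^)]*\)", "", text)   # drop text in parentheses
    text = text.replace('"', "")            # drop double quotes
    # Expand contractions word by word
    text = " ".join(CONTRACTIONS.get(w, w) for w in text.split(" "))
    text = re.sub(r"'s\b", "", text)        # drop possessive 's
    text = re.sub(r"[^a-zA-Z]", " ", text)  # non-letters become spaces
    # Remove stopwords and single-letter tokens
    return " ".join(w for w in text.split() if w not in STOP_WORDS and len(w) > 1)

sample = "I've tried it (twice) and it's \"great\", but don't buy the 2kg bag!"
print(clean_text_sketch(sample))  # tried great buy kg bag
```

The sample shows two side effects of this style of cleaning: "do not buy" collapses to "buy" once "not" is treated as a stopword, and digits are stripped, so "2kg" becomes "kg".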

In [6]:
import numpy as np
import re
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.sequence import pad_sequences 
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import load_model
import pickle

# Maximum input length and maximum summary length, in tokens
MAX_LEN_TEXT = 50
MAX_LEN_SUMMARY = 10

# Load the tokenizers
with open('text_tokenizer.pkl', 'rb') as handle:
    text_tokenizer = pickle.load(handle)

with open('summary_tokenizer.pkl', 'rb') as handle:
    summary_tokenizer = pickle.load(handle)
    
# Create the mappings to go between integer indexes and words
reverse_target_word_index = summary_tokenizer.index_word 
reverse_source_word_index = text_tokenizer.index_word 
target_word_index = summary_tokenizer.word_index

# Load the trained encoder and decoder models
encoder_model = load_model('encoder_model.h5')
decoder_model = load_model('decoder_model.h5')

# Map common contractions to their expanded forms
contraction_mappings = {"ain't": "is not", "aren't": "are not", "can't": "cannot", "'cause": "because", 
                       "could've": "could have", "couldn't": "could not", "didn't": "did not", 
                       "doesn't": "does not", "don't": "do not", "hadn't": "had not", "hasn't": "has not", 
                       "haven't": "have not", "he'd": "he would","he'll": "he will", "he's": "he is", 
                       "how'd": "how did", "how'd'y": "how do you", "how'll": "how will", "how's": "how is",
                       "I'd": "I would", "I'd've": "I would have", "I'll": "I will", "I'll've": "I will have",
                       "I'm": "I am", "I've": "I have", "i'd": "i would", "i'd've": "i would have", 
                       "i'll": "i will",  "i'll've": "i will have","i'm": "i am", "i've": "i have", 
                       "isn't": "is not", "it'd": "it would", "it'd've": "it would have", "it'll": "it will", 
                       "it'll've": "it will have","it's": "it is", "let's": "let us", "ma'am": "madam", 
                       "mayn't": "may not", "might've": "might have","mightn't": "might not",
                       "mightn't've": "might not have", "must've": "must have", "mustn't": "must not", 
                       "mustn't've": "must not have", "needn't": "need not", "needn't've": "need not have",
                       "o'clock": "of the clock", "oughtn't": "ought not", "oughtn't've": "ought not have", 
                       "shan't": "shall not", "sha'n't": "shall not", "shan't've": "shall not have", 
                       "she'd": "she would", "she'd've": "she would have", "she'll": "she will", 
                       "she'll've": "she will have", "she's": "she is", "should've": "should have", 
                       "shouldn't": "should not", "shouldn't've": "should not have", "so've": "so have",
                       "so's": "so as", "this's": "this is","that'd": "that would", "that'd've": "that would have", 
                       "that's": "that is", "there'd": "there would", "there'd've": "there would have", 
                       "there's": "there is", "here's": "here is","they'd": "they would", "they'd've": "they would have", 
                       "they'll": "they will", "they'll've": "they will have", "they're": "they are", 
                       "they've": "they have", "to've": "to have", "wasn't": "was not", "we'd": "we would", 
                       "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're": "we are",
                       "we've": "we have", "weren't": "were not", "what'll": "what will", "what'll've": "what will have", 
                       "what're": "what are", "what's": "what is", "what've": "what have", "when's": "when is", 
                       "when've": "when have", "where'd": "where did", "where's": "where is", "where've": "where have", 
                       "who'll": "who will", "who'll've": "who will have", "who's": "who is", "who've": "who have", 
                       "why's": "why is", "why've": "why have", "will've": "will have", "won't": "will not", 
                       "won't've": "will not have", "would've": "would have", "wouldn't": "would not", 
                       "wouldn't've": "would not have", "y'all": "you all", "y'all'd": "you all would",
                       "y'all'd've": "you all would have","y'all're": "you all are","y'all've": "you all have",
                       "you'd": "you would", "you'd've": "you would have", "you'll": "you will", 
                       "you'll've": "you will have", "you're": "you are", "you've": "you have"}

# Load NLTK's English stopwords (a distinct name avoids shadowing the imported stopwords module)
stop_words = set(stopwords.words('english'))

def clean_text(text):
    """
    Clean raw input text: lowercase it, expand contractions, and remove
    punctuation, stopwords, and single-letter tokens.
    """
    # Lowercase the text
    new_text = text.lower()
    # Remove text inside parentheses and any double quotes
    new_text = re.sub(r'\([^)]*\)', '', new_text)
    new_text = re.sub('"', '', new_text)
    # Expand contractions
    new_text = ' '.join([contraction_mappings.get(x, x) for x in new_text.split(' ')])
    # Remove possessive 's
    new_text = re.sub(r"'s\b", '', new_text)
    # Replace non-alphabetic characters with a space
    new_text = re.sub('[^a-zA-Z]', ' ', new_text)
    # Split the text into tokens and remove stopwords
    tokens = [word for word in new_text.split() if word not in stop_words]
    # Keep only tokens that are longer than one letter
    words = [t for t in tokens if len(t) > 1]
    # Return the rejoined string
    return ' '.join(words).strip()

def decode_sequence(input_seq):
    """
    Generate a summary for one padded input sequence using greedy decoding.
    """
    # Encode the input: encoder outputs plus the final hidden and cell states
    e_out, e_h, e_c = encoder_model.predict(input_seq)

    # Generate an empty target sequence of length 1
    target_seq = np.zeros((1, 1))

    # Choose the 'start' token as the first word of the target sequence
    target_seq[0, 0] = target_word_index['start']

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + [e_out, e_h, e_c])

        # Greedily pick the most probable next token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        # Index 0 is reserved for padding and has no word; treat it like 'end'
        sampled_token = reverse_target_word_index.get(sampled_token_index, 'end')

        if sampled_token != 'end':
            decoded_sentence += ' ' + sampled_token

        # Exit condition: either hit the maximum length or find the stop word
        if sampled_token == 'end' or len(decoded_sentence.split()) >= (MAX_LEN_SUMMARY - 1):
            stop_condition = True

        # Update the target sequence (of length 1)
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update internal states
        e_h, e_c = h, c

    return decoded_sentence

def summarize():
    """
    Prompt for text, then print a generated summary of it.
    """
    # Take input text
    text = input('Enter your text: ')
    # Clean the text
    review = clean_text(text)
    # Truncate to the maximum input length
    condensed_review = review.split()[:MAX_LEN_TEXT]
    # Join into a string
    condensed_review = ' '.join(condensed_review)
    # Tokenize
    tokenized_review = text_tokenizer.texts_to_sequences([condensed_review])
    # Pad the sequence
    pad_tokenized_review = pad_sequences(tokenized_review, maxlen=MAX_LEN_TEXT, padding='post')
    # Generate the summary
    summary = decode_sequence(pad_tokenized_review.reshape(1, MAX_LEN_TEXT))
    # Print results
    print('Summary: ', summary)
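
The preprocessing in `summarize()` (truncate to `MAX_LEN_TEXT` words, map words to integer ids, post-pad with zeros) can be mirrored in plain Python to see exactly what the encoder receives. This is a sketch, not the Keras implementation: it assumes unknown words are simply dropped, which is how `texts_to_sequences` behaves when the tokenizer has no `oov_token`, and the function name `to_padded_ids` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List

MAX_LEN_TEXT = 50  # same limit as the notebook

def to_padded_ids(cleaned: str, word_index: Dict[str, int],
                  max_len: int = MAX_LEN_TEXT) -> List[int]:
    """Truncate, index, and post-pad one cleaned review (toy stand-in for the Keras calls)."""
    words = cleaned.split()[:max_len]                        # keep the first max_len words
    ids = [word_index[w] for w in words if w in word_index]  # unknown words are dropped
    return ids + [0] * (max_len - len(ids))                  # pad with 0 at the end ('post')

# Hypothetical vocabulary; the real one is text_tokenizer.word_index
toy_index = {"great": 1, "coffee": 2, "cup": 3}
print(to_padded_ids("great coffee awful cup", toy_index, max_len=6))  # [1, 2, 3, 0, 0, 0]
```

Because words are truncated before unknown words are removed, a long review can end up with fewer than `MAX_LEN_TEXT` ids; the padding then fills the remaining positions.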



### Generating Summaries
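
Every summary below comes from `decode_sequence`, which decodes greedily: at each step it keeps only the single most probable token, feeds it back in, and stops at 'end' or after `MAX_LEN_SUMMARY - 1` words. The toy below isolates that loop. The five-word vocabulary and the fixed score table are hypothetical stand-ins for the decoder's softmax output; unlike the real decoder, the table carries no hidden or cell state, so each step depends only on the previous token.

```python
import numpy as np

# Hypothetical toy vocabulary and next-token scores standing in for decoder_model.predict
VOCAB = ["<pad>", "start", "great", "coffee", "end"]
WORD_INDEX = {w: i for i, w in enumerate(VOCAB)}
NEXT_SCORES = np.array([
    [0.0, 0.0, 0.0, 0.0, 1.0],  # after <pad>
    [0.0, 0.0, 0.9, 0.1, 0.0],  # after start
    [0.0, 0.0, 0.0, 0.8, 0.2],  # after great
    [0.0, 0.0, 0.1, 0.0, 0.9],  # after coffee
    [0.0, 0.0, 0.0, 0.0, 1.0],  # after end
])

def greedy_decode(max_len_summary: int = 10) -> str:
    token = WORD_INDEX["start"]
    words = []
    while True:
        token = int(np.argmax(NEXT_SCORES[token]))  # keep only the most likely next token
        word = VOCAB[token]
        if word == "end":
            break
        words.append(word)
        if len(words) >= max_len_summary - 1:       # same length cap as decode_sequence
            break
    return " ".join(words)

print(greedy_decode())  # great coffee
```

Greedy search is cheap, but because it commits to one token at a time it can miss a summary that is more probable overall; beam search is the usual alternative, though this notebook does not use it.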

In [14]:
summarize()

Enter your text: give rating order make comment really like eden organic beans used many years usually purchased locally available however beans received online source date black beans expiration dates pinto beans expiration dates check dates open cooked one package beans thinking check date thought would even issue since ordering direct going mine old dates know use rest 
Summary:   great product poor packaging


In [10]:
summarize()

Enter your text: grounds cup started packing much coffee could cups could barely get top made much better cup coffee however still leaked still grounds coffee make keurig run much slower mention cleanup business uses keurig brew coffee choice also couple tea hot chocolate drinkers everytime use disposable cups wipe machine drip tray cup also clean interior cup next person put leftovers nice idea amount work require makes paying little convenience regular cups attractive probably use back regular cups find landfill friendly solution 
Summary:   excellent coffee


In [11]:
summarize()

Enter your text: reviews decided purchase natural balance several different sites included natural balance among best dry food happy cat switched foods effortlessly even gradually introduce like one day eating purina one next natural balance one major thing noticed weeks food coat much softer shinier silky soft holy cow know good things fur good sign bowel movements always nice solid look even better food also feed less day makes bag last longer lbs purina one would convert keep buying food thanks natural balance 
Summary:   my cat loves this


In [12]:
summarize()

Enter your text: problem though one store sells box seldom shelf store carries regularly sells box living rural area long ways nearest large town two groceries closest small town depend two stores getting frustrated either finding tea paying exorbitant price found delivered regular intervals shipping charges savings store prices ecstatic running store store hoping getting strange looks stand coffee tea aisle shelf empty wondering going drink feel good bigelow box empty favorite tea comes right mailbox reasonably priced automatically like magic life good 
Summary:   great price and taste


In [16]:
summarize()

Enter your text: this is my favorite pizza
Summary:   yummy


In [17]:
summarize()

Enter your text: this is decent, though I have had better.
Summary:   good
