In [1]:
import spacy
import pandas
from spacy.language import Language
from spacy.symbols import nsubj, acomp
from collections import Counter
from heapq import nlargest
nlp = spacy.load('en_core_web_sm')

In [2]:
rd = pandas.read_csv('preprossecedData.csv')
products = rd['name'].unique()

In [3]:
@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text in ["," ,".", "!", "?", ":"]:
            doc[token.i + 1].is_sent_start = True
            continue
        if "," in token.text or "." in token.text or "!" in token.text or "?" in token.text or ":" in token.text:
            doc[token.i].is_sent_start = True
    return doc
nlp.add_pipe("set_custom_boundaries", before="parser")

<function __main__.set_custom_boundaries(doc)>

The aim of this notebook is to improve the summarization functions from summarization.ipynb by changing the sentence weighting to favor descriptive sentences consisting of noun-adjective pairs. The following steps will be made based on the explanations at https://achyutjoshi.github.io/aspect_extraction/aspectextraction

In [4]:
def hasPair(sent):
    pairExists = False
    for possible_subject in sent:
        if possible_subject.dep == nsubj and possible_subject.head.dep == acomp:
            pairExists = True
            break
    return pairExists

In [5]:
nlp.vocab["it"].is_stop = False
nlp.vocab["this"].is_stop = False
nlp.vocab["they"].is_stop = False
nlp.vocab["these"].is_stop = False

In [6]:
#Positive summary
def summarizeP(product):
    df = rd[(rd['name'] == product) & (rd['doRecommend'] == True)]
    reviewText = df['text'].str.cat(sep='. ')
    reviewDoc = nlp(reviewText)
    pos_tag = ['NOUN', 'ADJ']
    words = [ token.lemma_ for token in reviewDoc if token.is_stop != True and token.is_punct != True and token.pos_ in pos_tag]
    freq_word = Counter(words)
    max_freq = freq_word.most_common(1)[0][1]
    for word in freq_word.keys():
        freq_word[word] = (freq_word[word]/max_freq)
    sent_strength = {}
    for sent in reviewDoc.sents:
        for word in sent:
            if word.lemma_ in freq_word.keys():
                if sent in sent_strength.keys():
                    sent_strength[sent] += freq_word[word.lemma_]
                else:
                    sent_strength[sent] = freq_word[word.lemma_]
        if hasPair(sent):
            sent_strength[sent] += 1
    important_sents = nlargest(3, sent_strength, key=sent_strength.get)
    final_sentences = [ w.text for w in important_sents ]
    summary = ' STOP '.join(final_sentences)#"STOP" included to highlight beginnings of sentences (debug)
    return summary

In [7]:
#Negative summary
def summarizeN(product):
    df = rd[(rd['name'] == product) & (rd['doRecommend'] == False)]
    reviewText = df['text'].str.cat(sep='. ')
    reviewDoc = nlp(reviewText)
    pos_tag = ['NOUN', 'ADJ']
    words = [ token.lemma_ for token in reviewDoc if token.is_stop != True and token.is_punct != True and token.pos_ in pos_tag]
    freq_word = Counter(words)
    max_freq = freq_word.most_common(1)[0][1]
    for word in freq_word.keys():
        freq_word[word] = (freq_word[word]/max_freq)
    sent_strength = {}
    for sent in reviewDoc.sents:
        for word in sent:
            if word.lemma_ in freq_word.keys():
                if sent in sent_strength.keys():
                    sent_strength[sent] += freq_word[word.lemma_]
                else:
                    sent_strength[sent] = freq_word[word.lemma_]
        if hasPair(sent):
            sent_strength[sent] += 1
    important_sents = nlargest(3, sent_strength, key=sent_strength.get)
    final_sentences = [ w.text for w in important_sents ]
    summary = ' STOP '.join(final_sentences)#"STOP" included to highlight beginnings of sentences (debug)
    return summary

In [8]:
print("PSummarizing", products[0])
print()
print(summarizeP(products[0]))

PSummarizing AmazonBasics AAA Performance Alkaline Batteries (36 Count)

great price on good batteries that seem to be just as good as most alkaline batteries I've used. STOP We've been going through a lot of AAA sized batteries and have been able to verify that these Amazon Basics batteries last just as long as other name brand alkaline batteries from Duracell, STOP These batteries seem to hold up as well as any other batteries I've used and they are cheaper than the name brand batteries I had been using


In [9]:
print("NSummarizing", products[0])
print()
print(summarizeN(products[0]))

NSummarizing AmazonBasics AAA Performance Alkaline Batteries (36 Count)

If these batteries continue to perform well and have normal lives we will switch all of our battery purchases from name brand batteries to this brand STOP They're such a good deal that after I got these I replaced the batteries in every device I own that uses batteries (to avoid old batteries corroding and destroying anything). STOP I can't tell you how they are tested or at what point a battery is rejected but I can tell you that there are only certain types of batteries that I will buy because I don't believe in throwing away money and I Amazon batteries are right up there with my other 2 brands.


In [10]:
print("PSummarizing", products[1])
print()
print(summarizeP(products[1]))

PSummarizing AmazonBasics AA Performance Alkaline Batteries (48 Count) - Packaging May Vary

materials.con's noneoverall great batteries great quality great bargainwould buy again yes they seam to be great batteries use in all devices like flashlights, STOP We've been going through a lot of AAA sized batteries and have been able to verify that these Amazon Basics batteries last just as long as other name brand alkaline batteries from Duracell, STOP Great batteries and great the price is great to I will almost definitely buy these again


In [11]:
print("NSummarizing", products[1])
print()
print(summarizeN(products[1]))

NSummarizing AmazonBasics AA Performance Alkaline Batteries (48 Count) - Packaging May Vary

Why pay extra for Duracell batteries when AmazonBasics' batteries will last almost as long I bought a 48-pack of AA batteries for just 12 that's just 25 cents per battery! STOP They're such a good deal that after I got these I replaced the batteries in every device I own that uses batteries (to avoid old batteries corroding and destroying anything). STOP Do you have battery-operated items Do you need AA batteries to power your items These AA batteries will power your items


In [12]:
print("PSummarizing", products[2])
print()
print(summarizeP(products[2]))

PSummarizing AmazonBasics Backpack for Laptops up to 17-inches

This is a nice big bag with lots of pockets and zippers. STOP although for extended use in rain you would need a separate rain cover made of coated waterproof nylon. STOP If I had a choice I'd rather the laptop slot be replaced with a waterproof cooler space as I purchased this for hiking.


In [13]:
print("NSummarizing", products[2])
print()
print(summarizeN(products[2]))

NSummarizing AmazonBasics Backpack for Laptops up to 17-inches

There is mid-sized compartment that is in front of the main compartment and a smaller front compartment with the obligatory organizer/pockets to hold pens, STOP My normal backpack is a smaller one from Timberline which has less compartments but seems to hold a great deal of stuff and not feel nearly as bulky as this one STOP This AmazonBasics laptop is well-made and sturdy with a variety of pocket sizes and places to put stuff.


In [14]:
print("PSummarizing", products[3])
print()
print(summarizeP(products[3]))

PSummarizing AmazonBasics 15.6-Inch Laptop and Tablet Bag

it's a great value for a reasonably low price STOP For the price very good value and its lasting well. STOP This bag is really great value for money,


In [15]:
print("NSummarizing", products[3])
print()
print(summarizeN(products[3]))

NSummarizing AmazonBasics 15.6-Inch Laptop and Tablet Bag

fab case for any laptop would buy another again i got it for my laptop and it is fine and strong plenty of pockets. STOP On arrival I put my dell Inspiron laptop inside and the charger went in the front pocket. STOP arrived on time well packed disappointed with quality does the job it was only for keeping dust off second laptop,
