In [1]:
import spacy
import pandas
from collections import Counter
from heapq import nlargest
nlp = spacy.load('en_core_web_sm')

In [2]:
rd = pandas.read_csv('preprossecedData.csv')
products = rd['name'].unique()

The function below is mostly based on the spaCy summarization algorithm in https://medium.com/analytics-vidhya/text-summarization-using-spacy-ca4867c6b744.

It takes a product name as an argument, combines all reviews of that product into one text, and builds the summary from the sentences of that text.

The weight of each sentence is determined by the sum of the normalized frequencies of the words in that sentence.

The normalized frequency of each word is its frequency in the text, divided by the frequency of the most common word.
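The weighting described above can be illustrated on a toy word list. This is a minimal sketch, not the notebook's code: the word list and the helper name `sentence_weight` are made up for illustration, and sentences are represented as plain lists of words instead of spaCy spans.

```python
from collections import Counter

# Toy content words (hypothetical, not taken from the dataset).
words = ["battery", "price", "battery", "great", "price", "battery"]

# Raw counts: battery=3, price=2, great=1.
freq = Counter(words)
max_freq = freq.most_common(1)[0][1]

# Normalized frequency = count / count of the most common word.
normalized = {word: count / max_freq for word, count in freq.items()}

def sentence_weight(sentence_words, normalized_freq):
    """Sum the normalized frequencies of the known words in one sentence."""
    return sum(normalized_freq.get(word, 0.0) for word in sentence_words)

print(normalized)
print(sentence_weight(["great", "battery"], normalized))
```

Here the most common word gets a normalized frequency of 1, every other word gets a value between 0 and 1, and words that never appear in the frequency table contribute nothing to the sentence weight.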

In [3]:
def summarize(product):
    # Combine every review of the product into a single text.
    df = rd[rd['name'] == product]
    reviewText = df['text'].str.cat(sep=' ')
    reviewDoc = nlp(reviewText)

    # Keep only content words: no stop words, no punctuation, selected POS tags.
    pos_tag = ['PROPN', 'ADJ', 'NOUN', 'VERB']
    words = [token.text for token in reviewDoc
             if not token.is_stop and not token.is_punct and token.pos_ in pos_tag]
    if not words:
        # No content words (e.g. no reviews for this product): nothing to summarize.
        return ''

    # Normalize each word frequency by the frequency of the most common word.
    freq_word = Counter(words)
    max_freq = freq_word.most_common(1)[0][1]
    for word in freq_word:
        freq_word[word] = freq_word[word] / max_freq

    # The weight of a sentence is the sum of the normalized frequencies of its words.
    sent_strength = {}
    for sent in reviewDoc.sents:
        for word in sent:
            if word.text in freq_word:
                sent_strength[sent] = sent_strength.get(sent, 0) + freq_word[word.text]

    # Join the three highest-weighted sentences into the summary.
    important_sents = nlargest(3, sent_strength, key=sent_strength.get)
    return ' '.join(sent.text for sent in important_sents)

In [4]:
summarize(products[0])

"My roommates stole them all and never complained! great price and long lasting same as above batteries, great price and long lasting and appreciate quality and great price these are the only AA batteries i buy now, great price and outlasts any of the batteries from the big box stores I haven't tested them to compare to other brands, but the Amazon batteries are a great price and great packaging. I never use amazon batteries before,but I love it, It's good for the price,it last just as long as duracell batteries, and the design of the batteries is just cute Pros: You can get a LOT of batteries for a very cheap price!Cons: Batteries do not last very long at ALL!I don't mind changing out batteries on something I use constantly since I got the hundred back, but it can get a little annoying doing it once a week. I will never buy batteries from anywhere else Good batteries, just as great as name brand but for a cheaper price Good batteries, the humble packaging makes them cheaper than other

With the current definition of the summarize function, some of the weighted sentences are too long. This is a weakness of using the spaCy dependency parser to set sentence boundaries in the text that contains all reviews of the product. Because the weight of a sentence is a sum over its words, overly long sentences also tend to receive the highest weights.

A possible solution to this problem is to use rule-based sentence segmentation instead of spaCy's default segmentation technique.
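One way to try rule-based segmentation is spaCy's built-in `sentencizer` component, which splits on sentence-final punctuation. The sketch below assumes spaCy v3 (where components are added by name) and uses a blank English pipeline only to demonstrate the segmentation; the helper name `split_sentences` and the sample text are hypothetical.

```python
import spacy

# Assumption: spaCy v3, where built-in components are added by name.
# A blank pipeline is enough to demonstrate segmentation; it has no tagger or parser.
rule_nlp = spacy.blank("en")
rule_nlp.add_pipe("sentencizer")  # marks sentence starts after punctuation such as . ! ?

def split_sentences(text):
    """Return the rule-based sentences of `text` as plain strings (hypothetical helper)."""
    return [sent.text for sent in rule_nlp(text).sents]

# Toy text, not taken from the dataset.
for sentence in split_sentences("Great price. Lasts long! Would buy again?"):
    print(sentence)
```

The summarize function also needs part-of-speech tags, which a blank pipeline does not provide. One option, untested here, would be to load `en_core_web_sm` with the parser excluded and add the `sentencizer` to that pipeline. Note that a punctuation rule can only split where punctuation exists, so reviews that end without a period would still run together once they are joined by a space.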