# AI6122 Text Management Review Summarizer
Task List:
- [ ] Define and justify what a summarizer is. E.g.
  + a list of keywords
  + a list of key phrases
  + a list of noun-adjective pairs 
  + a list of nounPhrase - adjectivePhrase pairs 
  + a list of representative sentences 
- [ ] Technical challenges to achieve ideal summarization and your solution.
- [ ] Justify approach is best option for each component in your solution.
- [ ] Justify limitations to your approach.
- [ ] Evaluate solution with possible alternative solutions (baselines).
- [ ] Randomly choose 3 products to create product review summary.

Possible algorithms to try:
- [x] (Extractive) Word frequency sentence extraction: score sentences by word freq
- (Extractive) Text rank algorithm: uses cosine similarity
- (Extractive/ Abstractive) Machine learning 



## Colab Configuration

In [0]:
from google.colab import drive
drive.mount('/content/drive')
import os 
os.chdir('/content/drive/My Drive/Colab Notebooks/txtmgmt')
!pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/Colab Notebooks/txtmgmt


## Import Modules & Configurations

In [0]:
from collections import OrderedDict
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
import pandas as pd
import multiprocessing as mp
import datetime
from helpers.duallogger import loggersetup
from helpers.filehelper import is_not_empty_file_exists, write_to_file, load_from_file
import logging

from nltk.corpus import stopwords
import nltk
import re
import heapq

In [0]:
nltk.download('punkt')
nltk.download('stopwords')
stopwords_nltk = stopwords.words('english')
stopwords_spacy = list(STOP_WORDS)
stopwords_spacy.append('\n')
stopwords = stopwords_nltk + list(set(stopwords_spacy) - set(stopwords_nltk))

print("sw nltk: ", len(stopwords_nltk))
print("sw spacy: ", len(stopwords_spacy))
print("combined: ", len(stopwords))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
sw nltk:  179
sw spacy:  327
combined:  383


In [0]:
cores = mp.cpu_count()
print("Cores:", cores)
gpu = spacy.prefer_gpu()
print("GPU:", gpu)

log_dir = './logs/'
log = loggersetup(log_dir, stdout_level=logging.DEBUG, file_level=logging.DEBUG)

# log.debug('Debug message, should only appear in the file.')
# log.info('Info message, should appear in file and stdout.')
# log.warning('Warning message, should appear in file and stdout.')
# log.error('Error message, should appear in file and stdout.')

Cores: 2
GPU: False


In [0]:
parameters = OrderedDict()
parameters['json_file'] = 'CellPhoneReview.json'
parameters['reload_prod_reviews'] = True
parameters['prod_reviews_path'] = './data/prod_reviews.data'
parameters['clean_reviews'] = True
parameters['reload_clean_reviews'] = True
parameters['cleaned_reviews_path'] = './data/prod_reviews_cleaned.data'

## Data Preprocessing

In [101]:
if not parameters['reload_prod_reviews'] or not is_not_empty_file_exists(parameters['prod_reviews_path']):
    data = pd.read_json(parameters['json_file'], lines = True)
    prod_reviews = data.groupby(['asin'])['reviewText'].apply(' '.join).reset_index()
    log.info("Writing prod_reviews to %s" % parameters['prod_reviews_path'])
    write_to_file(parameters['prod_reviews_path'], prod_reviews)
else:
    log.info("Reloading prod_reviews from %s" % parameters['prod_reviews_path'])
    prod_reviews = load_from_file(parameters['prod_reviews_path'])

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)
prod_reviews.head(3) 

[INFO] Reloading prod_reviews from ./data/prod_reviews.data


Unnamed: 0,asin,reviewText
0,120401325X,"These stickers work like the review says they do. They stick on great and they stay on the phone. They are super stylish and I can share them with my sister. :) These are awesome and make my phone look so stylish! I have only used one so far and have had it on for almost a year! CAN YOU BELIEVE THAT! ONE YEAR!! Great quality! Item arrived in great time and was in perfect condition. However, I ordered these buttons because they were a great deal and included a FREE screen protector. I never received one. Though its not a big deal, it would've been nice to get it since they claim it comes with one. awesome! stays on, and looks great. can be used on multiple apple products. especially having nails, it helps to have an elevated key. These make using the home button easy. My daughter and I both like them. I would purchase them again. Well worth the price. Came just as described.. It doesn't come unstuck and its cute! People ask where I got them from & it's great when driving."
1,3998899561,"it worked for the first week then it only charge my phone to 20%. it is a waste of money. Good case, solid build. Protects phone all around with good access to buttons. Battery charges with full battery lasts me a full day. I usually leave my house around 7am and return at 10pm. I'm glad that it lasts from start to end. 5/5 This is a fantastic case. Very stylish and protects my phone. Easy access to all buttons and features, without any loss of phone reception. But most importantly, it double power, just as promised. Great buy this case fits perfectly on the s4 and keeps me powerd all day I can't complain! a+ recommend it to all This is the first battery case I have had for my Galaxy S4. The S4 fits very well, is slim and doesn't add much weight to the Galaxy S4. It doubles the battery life. You can charge either the battery, the phone or both. There is a handy on-off switch with leds to indicate the level of charge.The battery case came on time and was packaged well. Well worth the price. Performs exactly as advertised . It's very sturdily built,and provides lots of boost . It does exactly what it's supposed too .Easy to insert phone in and out . Definitely a 5 star experience. Don't know what I would do without this case LOVE LOVE LOVE it. Unlike Most of the Rechargeable Battery cases, PowerBear Lasts up to 2 whole days. It doesn't heat up like most of the other ones, and I was completely fascinated by the ultra light and sleek design for the case. Before I was using the Mophie case but I couldn't wear it often because it was like having a hot brick in your pocket, Hence I had to always leave it at home.On the contrary, with PowerBear, I never take it off because I can't even tell the difference. Also it is build in a super STRONG manner and even though I dropped my phone a few times, its shock resistant technology won't let a single thing happen to the case or the phone. The PowerBear case became an extension to my phone that I never have to take off because when I charge it at night, it charges both my phone and the case, and I have battery life for more than two days. I was also shocked to hear all the positive compliments I was hearing from people at my job, fitness center, and throughout the community. Everybody was loving the case, and what they don't know is that it cost me 60 % CHEAPER then all the other brand. This is the best purchase I've made on the internet, and I am going to buy more for my family in time for the Holidays. THANK YOU POWERBEAR!!!!!!!!!!!!!! Just what I needed. I needed a phone case for myself and my two sons, but I also needed new replacement batteries. Now this isn't the case, since I got both in one. Awesome thanks A+ When there is no outlets, or chargers near by its Powerbear to the rescue! Ordered one for my husband, and myself. Great purchase!! It works great. Doesn't heat up like crazy like the other ones I got, and cheaper too! Its definetly the best power case for the S4 you can get, thats why I got one for me and my wife. I wonder why its called power bear.."
2,6073894996,"Surprisingly, this inexpensive version works just as well and just as reliably as the expensive variety. It has been working for me for months now. No problem. Excellent value. I have tested this against the griffin dual output unit.I checked the charging current.This unit was charging my galaxy note battery with 70 ma.Griffin was charging with 40 ma! And the griffin was 4 times more expensive.I have not used these for very long. I bought 15 of them, because they are so cheap and because they actually do seem to provide high current.No idea how long they last. I assume they will work fine. I have not been using them much. I did the testing , just to stock up on a high current charger. This passed and I stocked up. It worked great for the first couple of weeks then it just stopped completely.. so basically a small waste of money. I love that it has two ports for my phone and ipod. Who wants to be putting too many things in one socket. Sleek and convenient to store and I just love it. just what you need, I am always having to charge my phone and then find I have another item to charge also. does not have the need amps to charge things like ipads, or hp touchpads. but its super small and compact. They are nothing special for sure, but it's nice that you can tell when it is powered up by the led that glows in it. able to charge two phones at once in the car which is nice. I have several chargers. Have more than one vehicle so I keep more than one of these in each. Nothing more frustrating than finding all of a sudden one quits working, and you have no way to recharge your phone. That is always when you need it most. This one works well for any of my products. I bought this a little skeptical. After I tried it I bought two more. It works great and so far it has lasted for about 3 months. If that changes I will update this review. I am disappointed that the 1A didn't work with my iPad. That's what I get for buying a cheap adapter. This is a nice charger but you can tell it was made cheaply in China. When it is charging the phone, the car radio gets LOTS of static. Not so much that I have to stop charging but like when you are near power lines and the radio station is far away.So, no RF shielding.I gave it 4 stars because it works fine for me, but if you listen to the radio, you might consider it is more like 2 or 3 stars. After a week only one side works Yo get exactly what you order in a timely fashion. And the item is just as described. Great buy if you ask me This is a terribly awesome product in my Subaru's console. Passengers love it, I love it because it emits a blue glow, because it has held up to abuse over the years and performs like new. I've bough a munch of different things like this over the years. Most wouldn't stay in the jack, or would give out after a few days. This one is GREAT! Only works one side at a time. When you connect two cables, one side stop working and also overheated burning the fuses. I purchased two of them and it's the same problem. Cheap and bad quality. it works great i like i can charge two thing at the same now i dont have to wait for my to finish charging her phone It came at last, good looking and the price was good and i believe it is worth the time I waited for it to come to me good job Didn't last very long. Worked great when it worked but it is a cheap piece of plastic crap so I shouldn't have expected it to last. excellent product, works great , have easy handling, and good quality as it is announced. reached as is shown time and in very good condition thank you very much for everything Purchased two and we put one in each car. Now we charge our ipad and iohone together. Can also use to power my Galaxy Note 2 or recharge other USB devices. If you have to think about it you need counseling great charger for 2 devices and i haven't had any problems so far 3yrs now Purchased product about many a month ago. Pros: Loved it work just fine!! Cool Blue light feature when you plug it into the lighter!! Can plug in two things at one time....Con: Top clear piece came off easily overtime (I just put clear tape around the top) TOTAL: PROS:95% CONS:5% I bought this so that I could use and charge my Tab at the same time. My tab does not recognize the high power port and will either charge while off or use power while on, it will not charge while powered on. I bought this to charge my iphone and tablet in the car. works for iphone but not the tablet. I needed a stronger charge for that but this is still great for both iphoe and ipod together. it's cheap, small and compact Works great. The blue led light is a nice touch in the car. We charge a Samsung and Iphone using this outlet. Nice low profile too. works great and charges ipads, tablets, smartphones as well as bluetooth speakers and headsets. I would recommend it to anyone looking for a universal car charger. I could only give this USB car charger 2 stars because although it worked fine for about 3 months, it subsequently died on me.Pros:-Has 2 USB ports for charging, one (the top) is 2.1 amps. The bottom slot is lower, presumably 1 or 1.5 amps.-Fits well in my charging socket - holds tight. I've had some that were loose in my charging socket.-Works well to charge my iPhone in the top slot and the bottom slot works with most Android phones, except the very high end ones.-Has a blue LED light to tell you it is on and ready to charge.-The USB sockets seem to be well made and tightly fit all my cables.-Pretty solid construction.-It cost less than $2.Cons:-It died on me after 3 months.I had really enjoyed using this USB charger. It rapidly charged my iPhone and worked well with charging the phones of my family and friends. The two slot design makes this design very handy; with a 1 slot design, no one else can charge their device unless I remove my cord. Of course, after this charger died, I had to go back to the 1 slot design I was using before.After 3 months, I began to notice the LED light flickering. It was less than a week later that is quit working permanently. I'm not sure what the problem was - faulty wiring, bad design, I guess I'll never know. I thought about getting another since they are so cheap, but then I thought better of it. Maybe they are so cheap because they are simply disposable after a short period of time.For whatever reason, it died and I was forced to look for another 2 slot design - I haven't found one in my price range yet. This charger works great and is short, unlike most chargers. It does a great job with charging both my phone and my GPS when I am in the car. Great value and easy on the wallet. I bought 2 of this and tried to test first ... after few minutes of charging, it felt hot. Pulled it out and the product smelled burnt. Tried the other one too and same thing. Be careful... this one could be fire hazard or could potentially destroy the electrical system This portable USB Port car charger is a must have for people on the go. Pair this with an extra or spare cord and you are good to go! I love the way it can charge 2 devices at once! Seems to charge quicker without GPS on, with it on, it only holds the battery, but better than it dying!!! Bought so we could charge to phones at the same time while in the car from the same port. Works. I received this product before I expected. It looks pretty good and It works with my Iphone (3GS) and my phone (HTC Evo V 3D). It is a good deal because It is not easy to find something like this for this price I have several of these, from various retailers, all the same, each made in some anonymous Chinese factory, all exactly alike.They're great. They really do work, and they do deliver 2.1 and 1.0 power as they say they will do. They do the job.You wouldn't expect much QC for a $2 electrical product, but they each work just fine.They don't last forever-- about a year or two of active daily use seems about par, then they suddenly die. So do have some replacements on hand. At this price, no problem.Yes, they work with both my iPhone and iPad. Also with every other USB device I've needed to charge.The LED light that indicates you are plugged in correctly is very helpful.These are great for rental cars and travel. If you leave one behind, hey, it's only $2.I'm totally pleased with these clever little gems. No complaints. I wouldn't be without a couple, especially for travel.Hope this is helpful. Happy trails, all. good product at low price.purchased this looking for a smaller charger and I love Griffin products.Free shipping just took a little longer I use this in my car to charge my phone and my iPod as needed. It works well, but the bright blue light can be annoying when driving at night. Overall, a very good deal."


In [0]:
# Create the nlp object
nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)

['tagger', 'parser', 'ner']


In [0]:
# Define function to cleanup text by removing personal pronouns, stopwords, and punctuations
def cleanup_text(text, stopwords, punc):
    texts = []
    doc = nlp(text, disable=['parser', 'ner']) # only do tokenization and pos tagging
    tokens = [tok.lemma_.lower().strip() for tok in doc if tok.lemma_ != '-PRON-']
    tokens = [tok for tok in tokens if tok not in stopwords and tok not in punc]
    tokens = ' '.join(tokens)
    texts.append(tokens)
    return pd.Series(texts)

if parameters['clean_reviews']:
    log.debug("reviews to be stripped away with stopwords and punctation")
    if not parameters['reload_clean_reviews'] or not is_not_empty_file_exists(parameters['cleaned_reviews_path']):
      punctuations = '!"#$%&\'()*+,-/:;<=>?@[\\]^_`{|}~©'
      prod_reviews['processedText'] = prod_reviews['reviewText'].apply(lambda x: cleanup_text(x, stopwords, punctuations))
      log.info("Writing cleaned reviews to %s" % parameters['cleaned_reviews_path'])
      write_to_file(parameters['cleaned_reviews_path'], prod_reviews)
    else:
      log.info("Reloading cleaned reviews from %s" % parameters['cleaned_reviews_path'])
      prod_reviews = load_from_file(parameters['cleaned_reviews_path'])
else:
    log.info("unprocessed reviews will be used")
    prod_reviews['processedText'] = prod_reviews['reviewText']

prod_reviews['processedText'][0]

[DEBUG] reviews to be stripped away with stopwords and punctation
[INFO] Reloading cleaned reviews from ./data/prod_reviews_cleaned.data


'sticker work like review . stick great stay phone . super stylish share sister . :) awesome phone look stylish use far year believe year great quality item arrive great time perfect condition . order button great deal include free screen protector . receive . big deal nice claim come . awesome stay look great . use multiple apple product . especially nail help elevated key . use home button easy . daughter like . purchase . worth price . come describe .. come unstuck cute people ask great drive .'

## Review Summarizer

### Tokenize Words & Calculate Word Frequency 

Only calculate word frequencies of non-stopwords



In [0]:
def _word_frequency(review) -> dict: 
  global stopwords 
  word_frequencies = {}  
  for token in nlp.make_doc(review):   #tokenize
      if token.text not in stopwords:
          if token.text not in word_frequencies.keys():
              word_frequencies[token.text] = 1
          else:
              word_frequencies[token.text] += 1
  return word_frequencies

# wf = _word_frequency(prod_reviews['reviewText'][8])
# print(wf)

### Tokenize Sentences & Score Sentences

In [0]:
def _tokenize_sentences(review) -> dict:
  doc = nlp(review)
  sentence_dict = {}
  for _, sentence in enumerate(doc.sents): 
    tokenized_sent = re.sub(r'[^\w\s]','', str(sentence))
    if tokenized_sent is not None:
      sentence_dict[tokenized_sent] = len(nlp.make_doc(sentence.text))
  return sentence_dict

# sent_dict = _tokenize_sentences(prod_reviews['reviewText'][8])
# print([(key, value) for key, value in sent_dict.items()])

def _score_sentences(sentences:dict, word_frequencies:dict) -> dict:
    """Score sentences based on word frequencies"""
    sentence_scores = {}

    for sent, word_count_in_sent in sentences.items():
        for word_freq in word_frequencies:
            if word_freq in sent.lower():
                if sent in sentence_scores: # use first 10 char as key
                    sentence_scores[sent] += word_frequencies[word_freq]
                else:
                    sentence_scores[sent] = word_frequencies[word_freq]
        
        if sent in sentence_scores: # divide sentence score by word count to reduce advantages of long sentences
          sentence_scores[sent] = sentence_scores[sent] / word_count_in_sent

    return sentence_scores

# sent_scores = _score_sentences(sent_dict, wf)
# print([(key, value) for key, value in sent_scores.items()])

### Sentence-level Summarizer


In [0]:
def generate_summary(review, num_of_sentences):
    word_frequencies = _word_frequency(review)

    sentences = _tokenize_sentences(review)

    sentence_scores = _score_sentences(sentences, word_frequencies)
    
    summary_sentences = heapq.nlargest(num_of_sentences, sentence_scores, key=sentence_scores.get)
    
    return ''.join(summary_sentences)

idx = 3
num_of_sentences = 7
if parameters['clean_reviews']:
    review = prod_reviews['processedText'][idx]
    
else:
    review = prod_reviews['reviewText'][idx]

summary = generate_summary(review, num_of_sentences)
log.info("ORIGINAL REVIEW: %s", review)
log.info("SUMMARIZED REVIEW: %s", summary)

[INFO] ORIGINAL REVIEW: love case pretty . love way case feel touch rubber . happy idea design sweet idea wear paint case sealant . cool 3d effect cost paint rub . pretty . worried order picture description page change desireable green orange cover . alas come day earlier expect beautiful pink product expect rubberized cover feel little greasy like armor all'd design gorgeous little 3d look pretty phone case love . love change .thanks case white silver . pretty case- fit phone perfectly . long arrive . defintiely worth price ... order . case 15 especially cell phone store . believe mark thing great case ... fit phone perfectly use time time . recommend received item quickly . design vivid expect . cover soft rubberize durable . receive compliment . excellent buy recommend want good look phone . cover old phone look feel new . like order cover little money snazzy phone .
[INFO] SUMMARIZED REVIEW: look pretty phone case love love case pretty pretty case fit phone perfectly cover old phon