# Aspect Based Sentiment Analysis <a class="anchor" id="chapter1"></a>

### Table of Contents

* [Aspect Based Sentiment Analysis](#chapter1)
    * [Importing Data & Libraries](#section_1_1)
    * [Feature Extraction](#section_1_2)
    * [Sentiment Analysis](#section_1_3)
    * [Visualization](#section_1_4)

## Importing Data & Libraries <a class="anchor" id="section_1_1"></a>

In [1]:
# Ignoring warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
import numpy as np
import pandas as pd
import gzip
import json
import spacy
from tqdm import tqdm
import re
import operator

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [3]:
# Show full cell contents instead of truncating long review text
# (None replaces the deprecated -1 value)
pd.set_option("display.max_colwidth", None)

In [26]:
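def parse(path):
  # Stream one JSON record per line and close the gzip file when done
  with gzip.open(path, 'rb') as g:
    for l in g:
      yield json.loads(l)

def getDF(path):
  # Collect the records keyed by row number, then build the DataFrame
  df = {}
  for i, d in enumerate(parse(path)):
    df[i] = d
  return pd.DataFrame.from_dict(df, orient='index')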

df_review = getDF('C:/Users/Esra/Desktop/Aspect Based Sentiment Analysis - Amazon Review Dataset/Dataset/Arts_Crafts_and_Sewing.json.gz')
df_metadata = getDF('C:/Users/Esra/Desktop/Aspect Based Sentiment Analysis - Amazon Review Dataset/Dataset/meta_Arts_Crafts_and_Sewing.json.gz')

In [30]:
df_review = df_review.loc[:, ['asin', 'reviewText', 'summary', 'overall']]

In [7]:
df_metadata = df_metadata.loc[:, ['title', 'brand', 'asin']]
df_metadata.head(5)

Unnamed: 0,title,brand,asin
0,You Son of a Bitch! 1987 Embroidered Patch,Honchosfx,6665560953
1,Origami Stars Papers Package 1H (5 packs),,7000000376
2,Yi De Ge Chinese Calligraphy Sumi Drawing Black Liquid Ink (black),MasterChinese,7000001089
3,"10 pcs/Lot GITD Skeleton Skull ,Knife/ Flashlight/ Paracord Bracelet Accessories (N02)",Skull Beads N02.,7107269291
4,"Pinkie Tm girl flower Handmade soap silicone mold ,silica gel mould,silicon candle moulds,gift favor",,7121277158


In [None]:
# When the reviews are right-joined onto the metadata on `asin`, the joined rows contain no reviewText,
# so the metadata could not be used in the rest of the analysis.
# Note: df_sample_review is created in the Feature Extraction section.
df_joined_dataset = pd.merge(left=df_sample_review, right=df_metadata, on='asin', how='right')
df_joined_dataset[df_joined_dataset['reviewText'].notnull()]
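
One way to see how many metadata rows actually find a matching review is the `indicator` option of `pd.merge`. The sketch below uses two tiny made-up frames (the ASINs, texts and titles are placeholders, not rows from the dataset) and separates matched rows from metadata-only rows.

```python
import pandas as pd

# Toy stand-ins for the review and metadata tables (values are illustrative only)
reviews = pd.DataFrame({"asin": ["A1", "A2"], "reviewText": ["nice yarn", "dull scissors"]})
metadata = pd.DataFrame({"asin": ["A2", "A3"], "title": ["Scissors", "Glue"]})

# A right join keeps every metadata row; `_merge` records where each row came from
joined = pd.merge(reviews, metadata, on="asin", how="right", indicator=True)

# Rows tagged 'right_only' have metadata but no review, so reviewText is NaN there
unmatched = joined[joined["_merge"] == "right_only"]
matched = joined[joined["reviewText"].notnull()]
print(len(matched), "matched,", len(unmatched), "metadata-only")
```

If every row comes back as `right_only`, the two tables share no `asin` values, which would be worth checking (for example, differing string formats) before discarding the metadata.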

## Feature Extraction <a class="anchor" id="section_1_2"></a>


In [32]:
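# Draw 50,000 distinct row positions; sampling without replacement keeps the index labels unique
sampling_indexes = np.random.choice(len(df_review), size=50000, replace=False)
# Copy the slice so that new columns can be added later without a SettingWithCopyWarning
df_sample_review = df_review.iloc[sampling_indexes, :].copy()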

In [34]:
df_sample_review.head(5)

Unnamed: 0,asin,reviewText,summary,overall
1783874,B01D3VCTKG,I love the chalk markers! They are so bright and vibrant and work great on the boards I used them on for my wedding. As soon as mine run out I am definitely getting more!!!,Best Chalkboard Markers,5.0
1140390,B00CD33ML6,"I use these for mirrors and windows for anything I don't want to wipe off right away. These do require a wet rag to remove and are a little messier than ""Liquid Chalkers"". However, they are perfect if you don't want your work to wipe off if somebody brushes past it.\n\nI use these to do things like outline a calendar on a mirror (and use ""liquid chalkers"" to put items into the calendar so I can easily wipe it off)\n\nThese are also the ones I let my 2 year old use since she gets upset if she messes up her work as she's drawing. These are great for kids....I have big mirrors and sliding glass doors - as I brainstorm on the top half, she draws on the bottom! Very fun!\n\nnote - I haven't tried them on any surface but mirrors and windows. You do need some patience to get them going the first time...be aware of that and don't pull them out in front of your inpatient toddler until you get them all going!",Love these for mirrors and windows,5.0
1004332,B008H3NJ4G,"Quality mica powder; I use it for glitter tattoos. The shine is vibrant, and it is the perfect pearl color. The only reason I took off a star was for the price, but if you are looking for a good mica this is a great one. I would recommend a cheaper glitter if you are only using it for glitter tattoos though.",Quality but a bit expensive,4.0
581400,B0024KMQ6K,great paper--good quality with wonderful feel. I'm a professional artist.,great product,5.0
2671664,B00ZSS5N38,"Worked amazingly well with a Heidi Swapp toner pen, deco foil and laminator. The remaining negative space was still usable due to the border dies included in the set.",Highly versatile,5.0


In [35]:
nlp = spacy.load("en_core_web_sm")

In [None]:
aspect_terms = []
comp_terms = []
for x in tqdm(sampling_indexes):
    amod_pairs = []
    advmod_pairs = []
    compound_pairs = []
    xcomp_pairs = []
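    neg_pairs = []
    pairs = []  # stays empty when the review has no usable text
    review_text = df_sample_review.loc[x, 'reviewText']
    # Missing reviews are NaN, so check for a non-empty string explicitly
    if isinstance(review_text, str) and review_text.strip():
        lines = review_text.split('.')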
        for line in lines:
            doc = nlp(line)
            str1=''
            str2=''
            for token in doc:
                if token.pos_ == 'NOUN':
                    for j in token.lefts:
                        if j.dep_ == 'compound':
                            compound_pairs.append((j.text+' '+token.text,token.text))
                        if j.dep_ == 'amod' and j.pos_ == 'ADJ': #primary condition
                            str1 = j.text+' '+token.text
                            amod_pairs.append(j.text+' '+token.text)
                            for k in j.lefts:
                                if k.dep_ == 'advmod': #secondary condition to get adjective of adjectives
                                    str2 = k.text+' '+j.text+' '+token.text
                                    amod_pairs.append(k.text+' '+j.text+' '+token.text)
                            # Keep only the longer phrase when an adverb extends the adjective-noun pair
                            if str1 in str2 and str1 in amod_pairs:
                                amod_pairs.remove(str1)
                if token.pos_ == 'VERB':
                    for j in token.lefts:
                        if j.dep_ == 'advmod' and j.pos_ == 'ADV':
                            advmod_pairs.append(j.text+' '+token.text)
                        if j.dep_ == 'neg' and j.pos_ == 'ADV':
                            neg_pairs.append(j.text+' '+token.text)
                    for j in token.rights:
                        if j.dep_ == 'advmod'and j.pos_ == 'ADV':
                            advmod_pairs.append(token.text+' '+j.text)
                if token.pos_ == 'ADJ':
                    # Negation attached to the left of the adjective, if any
                    negs = [h for h in token.lefts if h.dep_ == 'neg']
                    for j in token.rights:
                        if j.dep_ == 'xcomp':
                            for k in j.lefts:
                                if k.dep_ == 'aux':
                                    phrase = token.text+' '+k.text+' '+j.text
                                    if negs:
                                        neg_pairs.append(negs[0].text+' '+phrase)
                                    else:
                                        xcomp_pairs.append(phrase)
        pairs = list(set(amod_pairs+advmod_pairs+neg_pairs+xcomp_pairs))
        for i in range(len(pairs)):
            for comp in compound_pairs:
                # Expand a head noun to its full compound, e.g. "reference" -> "knitting reference"
                if comp[1] in pairs[i]:
                    pairs[i] = pairs[i].replace(comp[1], comp[0])
    aspect_terms.append(pairs)
    comp_terms.append(compound_pairs)
   
df_sample_review['compound_nouns'] = comp_terms
df_sample_review['aspect_keywords'] = aspect_terms
df_sample_review.head()

 10%|█         | 5094/50000 [04:21<28:55, 25.88it/s]  

In [11]:
df_sample_review.iloc[4, [2,4,5]]

reviewText         A gazillion pattern stitches, lucidly explained. Illustrations for each one on the same page as its instructions. A very thorough guide to basic techniques in an appendix. You won't get full patterns here, but the stitches are easily adapted to straightforward knitting projects on both straight and circular needles. A very useful book.
compound_nouns     [(gazillion pattern, pattern), (pattern stitches, stitches)]
aspect_keywords    [thorough guide, useful book, same page, full gazillion patterns, straightforward projects, very useful book, very thorough guide, lucidly explained, basic techniques, straight needles, easily adapted]
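
The last step of the extraction loop, where a head noun inside a keyword is replaced by its full compound, can be illustrated without spaCy. This is a minimal sketch: `expand_compounds` is a hypothetical helper name, and the input phrases are illustrative pre-expansion forms rather than values taken from the notebook.

```python
def expand_compounds(pairs, compound_pairs):
    """Replace each head noun found in a phrase with its full compound noun."""
    expanded = []
    for phrase in pairs:
        for compound, head in compound_pairs:
            # Plain substring match, so "pattern" would also match "patterns"
            if head in phrase:
                phrase = phrase.replace(head, compound)
        expanded.append(phrase)
    return expanded

# (compound, head) tuples in the same shape as the compound_nouns column
compound_pairs = [("knitting reference", "reference"), ("Kindle format", "format")]
pairs = ["Good reference", "electronic format", "future projects"]
print(expand_compounds(pairs, compound_pairs))
```

Because the match is a plain substring test, a head such as "pattern" also fires inside "patterns", which is how a keyword like "full gazillion patterns" can arise. Matching on whole tokens would avoid that, at the cost of a slightly longer loop.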

## Sentiment Analysis <a class="anchor" id="section_1_3"></a>

In [12]:
analyser = SentimentIntensityAnalyzer()

sentiment = []
# Iterate over the column values directly; the sampled frame's index labels are not 0..n-1
for keywords in df_sample_review['aspect_keywords']:
    score_dict = {'pos': 0, 'neg': 0, 'neu': 0}
    if len(keywords) != 0:
        for aspects in keywords:
            sent = analyser.polarity_scores(aspects)
            score_dict['neg'] += sent['neg']
            score_dict['pos'] += sent['pos']
            #score_dict['neu'] += sent['neu']
        # Review-level label: the class with the largest summed score.
        # Ties (including all-zero scores) resolve to 'pos', the first key.
        # `aspects` is the last keyword seen in the loop above.
        sentiment.append((aspects, max(score_dict.items(), key=operator.itemgetter(1))[0]))
    else:
        sentiment.append('NaN')
df_sample_review['sentiment'] = sentiment
df_sample_review.head()

Unnamed: 0,overall,asin,reviewText,summary,compound_nouns,aspect_keywords,sentiment
0,5.0,449819906,"I've read this book already and I've got plans for using it in future projects. I'm DELIGHTED with the patterns in it and the advice and suggestions are just as good as you would expect from Melissa Leapman. I'm so glad that I bought this. As a lifelong and addicted knitter, this has been a valuable addition to my already good sized book collection. Thanks Melissa for this very special knitting treat.",A WONDERFUL BOOK,"[(book collection, collection), (knitting treat, treat)]","[already good book collection, very special knitting treat, valuable addition, read already, lifelong knitter, sized book collection, special knitting treat, future projects, good book collection]","(good book collection, pos)"
1,5.0,449819906,Nicely written directions.,Nice,[],[Nicely written],"(Nicely written, pos)"
2,5.0,449819906,love it,Five Stars,[],[],
3,5.0,449819906,"Good additional knitting reference to have available in an electronic format. I like that it is easy to jump from one bit of information to another, since I was not sure if all the links to topics would be available in a Kindle format.",Good Reference in Kindle Edition,"[(knitting reference, reference), (Kindle format, format)]","[Good knitting reference, additional knitting reference, electronic Kindle format]","(electronic Kindle format, pos)"
4,5.0,449819906,"A gazillion pattern stitches, lucidly explained. Illustrations for each one on the same page as its instructions. A very thorough guide to basic techniques in an appendix. You won't get full patterns here, but the stitches are easily adapted to straightforward knitting projects on both straight and circular needles. A very useful book.","Extremely clear, thorough","[(gazillion pattern, pattern), (pattern stitches, stitches)]","[thorough guide, useful book, same page, full gazillion patterns, straightforward projects, very useful book, very thorough guide, lucidly explained, basic techniques, straight needles, easily adapted]","(easily adapted, pos)"
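
The aggregation used above can be examined in isolation. The sketch below assumes per-keyword scores shaped like the dictionaries VADER's `polarity_scores` returns (only the `pos`, `neg` and `neu` keys are used); `review_label` is a hypothetical helper and the score values are invented for illustration.

```python
import operator

def review_label(keyword_scores):
    """Sum 'pos' and 'neg' over all keyword scores and return the larger class."""
    totals = {"pos": 0.0, "neg": 0.0, "neu": 0.0}
    for scores in keyword_scores:
        totals["pos"] += scores["pos"]
        totals["neg"] += scores["neg"]
    # max() returns the first maximal item, so ties fall back to 'pos'
    return max(totals.items(), key=operator.itemgetter(1))[0]

# Hypothetical scores, not real VADER output
mixed = [{"pos": 0.7, "neg": 0.0, "neu": 0.3}, {"pos": 0.0, "neg": 0.4, "neu": 0.6}]
neutral_only = [{"pos": 0.0, "neg": 0.0, "neu": 1.0}]
negative = [{"pos": 0.1, "neg": 0.5, "neu": 0.4}]
print(review_label(mixed), review_label(neutral_only), review_label(negative))
```

Since the neutral score is never accumulated, a review whose keywords are all neutral is still labelled `pos`. This may explain labels such as "(when traveling, pos)" in the grouped output; thresholding on VADER's `compound` score would be one alternative.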


In [14]:
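# Count reviews per (ASIN, sentiment) pair; ascending must be a bool, since the string "False" is truthy
result = df_sample_review.groupby(["asin", "sentiment"]).count().sort_values(by="aspect_keywords", ascending=False).reset_index()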

In [15]:
result

Unnamed: 0,asin,sentiment,overall,reviewText,summary,compound_nouns,aspect_keywords,title,brand
0,0449819906,"(Awesome book, pos)",1,1,1,1,1,0,0
1,0471749915,"(Very good book, pos)",1,1,1,1,1,0,0
2,0471749915,"(Good book, pos)",1,1,1,1,1,0,0
3,0471749915,"(Excellent book, pos)",1,1,1,1,1,0,0
4,0449819906,"(when traveling, pos)",1,1,1,1,1,0,0
...,...,...,...,...,...,...,...,...,...
84,0449819906,"(just love, pos)",1,1,1,1,1,0,0
85,0449819906,"(really love, pos)",2,2,2,2,2,0,0
86,0449819906,"(Great book, pos)",2,2,2,2,2,0,0
87,0471749915,,2,2,2,2,2,0,0
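
To make the shape of `result` easier to read, here is a small sketch of the same group-and-count pattern on a toy frame. The ASINs, labels and keywords are placeholders, and the sentiment column is simplified to a plain string rather than the tuple used above.

```python
import pandas as pd

# Toy rows; values are placeholders, not taken from the dataset
toy = pd.DataFrame({
    "asin": ["A1", "A1", "A1", "A2"],
    "sentiment": ["pos", "pos", "neg", "pos"],
    "aspect_keywords": [["great book"], ["really love"], ["poor binding"], ["good paper"]],
})

# count() reports the number of non-null cells per group in every remaining column
counts = (
    toy.groupby(["asin", "sentiment"])
       .count()
       .sort_values(by="aspect_keywords", ascending=False)
       .reset_index()
)
print(counts)
```

Each row of `counts` is one (ASIN, sentiment) pair, and the `aspect_keywords` column holds the number of reviews in that group, which is the value the pie chart later uses.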


## Visualization <a class="anchor" id="section_1_4"></a>

In [16]:
import plotly.express as px
import plotly.graph_objects as go
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

# code and plot setup
# settings
pd.options.plotting.backend = "plotly"

In [18]:
# JupyterDash, dcc, html, Input, Output and px are imported in the previous cell
app = JupyterDash(__name__)

app.layout = html.Div([
    html.P("Review Analysis"),
    dcc.Dropdown(
        id='asin',
        options=[{'value': x, 'label': x} for x in result.asin.unique()],
        value="0449819906",
        placeholder="Select an ASIN",
        clearable=False
    ),
    dcc.Graph(id="pie-chart"),
])

# Redraw the pie chart whenever a different ASIN is selected
@app.callback(
    Output("pie-chart", "figure"),
    [Input('asin', 'value')])
def generate_chart(value):
    # Share of each sentiment label among the selected product's reviews
    df = result[result['asin'] == str(value)]
    fig = px.pie(df, names='sentiment', values='aspect_keywords')
    return fig

app.run_server(mode='jupyterlab', port = 8191, dev_tools_ui=True, dev_tools_hot_reload =True, threaded=True)