# Aspect-based sentiment analysis

In this notebook we use the [aspect_based_sentiment_analysis](https://github.com/ScalaConsultants/Aspect-Based-Sentiment-Analysis) model to score the sentiment that our textual reviews express towards individual aspects.

## Notebook outline
* Aspect-based sentiment analysis
* Inconsistency between ratings and text?

In [None]:
!pip install aspect_based_sentiment_analysis  # installing the module for aspect-based sentiment analysis

In [None]:
# useful imports
from google.colab import drive
import aspect_based_sentiment_analysis as absa
import pandas as pd
import seaborn as sns

In [None]:
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## Aspect-based sentiment analysis

We're going to use the NLP model to analyze the sentiment expressed towards each aspect in the reviews users wrote.

In [None]:
# loading the nlp model for aspect based sentiment analysis
nlp = absa.load() 

In [None]:
# reading from pkl file we created in "data_import.ipynb"
data = pd.read_pickle('/content/drive/My Drive/reviews.pkl')

We're going to perform the analysis on one specific user, 'stjamesgate.163714', who has written around 2500 reviews. 

In [None]:
# specific user we're going to focus on
specific_user = 'stjamesgate.163714'

# selecting reviews that user wrote
one_user_data = data[data['User Id']==specific_user] 
print(one_user_data.shape) 

# we are going to take the first 50 reviews as a sample
test_sample = one_user_data.head(50) 

(2504, 17)


In [None]:
def sentiment_analysis(text_series, sentiments={'Positive': 2, 'Neutral': 0, 'Negative': 1}, aspects=['Appearance', 'Aroma', 'Palate', 'Taste']):
  """
  Function that does sentiment analysis based on the text.

  Input:
    - text_series: pandas series containing textual reviews that users wrote
    - sentiments: dictionary mapping each sentiment name to its position in
      the score vector returned by the model
    - aspects: list of aspects we're interested in

  Returns:
    - dataframe containing scores for each sentiment towards each aspect of
      textual reviews from text_series.
  """

  # we're applying the model to each textual review
  tasks = text_series.apply(lambda text: nlp(text, aspects=aspects))

  # dictionary that maps each "<aspect> <sentiment>" column name to a list
  # holding the score of that sentiment towards that aspect for each review
  columns = {}
  for aspect in aspects:
    for sentiment_str in sentiments:
      # initially, the list is empty
      columns[aspect + ' ' + sentiment_str] = []

  for task in tasks:
    for aspect_index, aspect in enumerate(aspects):
      scores = task.examples[aspect_index].scores
      for sentiment_str, i in sentiments.items():
        columns[aspect + ' ' + sentiment_str].append(scores[i])

  # converting dictionary to pandas dataframe
  return pd.DataFrame(columns).set_index(text_series.index)
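
The inner loop of the function can be illustrated on a single score vector. This is a minimal sketch, assuming the model orders its scores as [neutral, negative, positive], which is what the `sentiments` mapping implies. The values are the Aroma scores of review 1 from the output shown in this notebook, and the variable names are only illustrative.

```python
# Score vector for one aspect of one review.
# Assumption: the order is [neutral, negative, positive], as implied by
# the `sentiments` mapping used in sentiment_analysis.
scores = [0.127827, 0.437255, 0.434918]

sentiments = {'Positive': 2, 'Neutral': 0, 'Negative': 1}
aspect = 'Aroma'

# Build one "<aspect> <sentiment>" entry per sentiment by looking up
# the score at the index stored in the mapping.
row = {aspect + ' ' + name: scores[i] for name, i in sentiments.items()}
print(row)
```

The dictionary order of `sentiments` decides the column order, while the integer values decide which score ends up in which column.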

In [None]:
# getting sentiment analysis from the function
columns = sentiment_analysis(test_sample['Text']) 

# appending the score columns to the sample (rows are aligned on the index)
expanded_test_sample = pd.concat([test_sample, columns], axis=1) 
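
The concatenation relies on both frames sharing the same index, which is why `sentiment_analysis` ends with `set_index(text_series.index)`. The sketch below uses made-up toy frames (`reviews` and `scores` are hypothetical names, not objects from this notebook) to show what would happen without that step.

```python
import pandas as pd

# Toy stand-ins (made-up values) for the review sample and the score frame.
reviews = pd.DataFrame({'Text': ['first review', 'second review']}, index=[206, 217])

# A frame built from plain lists gets a default 0..n-1 index.
scores = pd.DataFrame({'Aroma Positive': [0.14, 0.86]})

# Concatenating along columns aligns on index labels, so labels 206/217
# and 0/1 never match and the gaps are filled with NaN.
misaligned = pd.concat([reviews, scores], axis=1)

# Reusing the review index keeps each score next to its review.
aligned = pd.concat([reviews, scores.set_index(reviews.index)], axis=1)

print(misaligned)
print(aligned)
```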

In [None]:
# displaying the results
expanded_test_sample[[aspect + ' ' + sentiment for aspect in ['Appearance', 'Aroma', 'Palate', 'Taste'] for sentiment in ['Positive', 'Neutral', 'Negative']]].head()

| | Appearance Positive | Appearance Neutral | Appearance Negative | Aroma Positive | Aroma Neutral | Aroma Negative | Palate Positive | Palate Neutral | Palate Negative | Taste Positive | Taste Neutral | Taste Negative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.339148 | 0.226442 | 0.43441 | 0.434918 | 0.127827 | 0.437255 | 0.31074 | 0.30207 | 0.38719 | 0.285061 | 0.040979 | 0.673961 |
| 206 | 0.184759 | 0.04446 | 0.770781 | 0.139886 | 0.031676 | 0.828438 | 0.063168 | 0.019165 | 0.917667 | 0.04517 | 0.026108 | 0.928722 |
| 217 | 0.720533 | 0.165911 | 0.113556 | 0.856164 | 0.07169 | 0.072146 | 0.737318 | 0.182808 | 0.079873 | 0.540634 | 0.364637 | 0.094729 |
| 238 | 0.946061 | 0.020944 | 0.032995 | 0.870062 | 0.035975 | 0.093963 | 0.853761 | 0.065289 | 0.080949 | 0.89259 | 0.042577 | 0.064833 |
| 252 | 0.710887 | 0.167492 | 0.121621 | 0.861306 | 0.05798 | 0.080715 | 0.729664 | 0.162302 | 0.108034 | 0.729717 | 0.177313 | 0.09297 |
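
One way to read these scores at a glance is to keep only the highest-scoring sentiment per review. The sketch below does this for the Aroma columns of the first three reviews, with the values copied from the output above. The frame name `aroma` is illustrative, and the same idea would apply to the other aspects.

```python
import pandas as pd

# Aroma scores of reviews 1, 206 and 217, copied from the output above.
aroma = pd.DataFrame(
    {
        'Aroma Positive': [0.434918, 0.139886, 0.856164],
        'Aroma Neutral': [0.127827, 0.031676, 0.07169],
        'Aroma Negative': [0.437255, 0.828438, 0.072146],
    },
    index=[1, 206, 217],
)

# idxmax(axis=1) returns, for each row, the name of the column holding
# the largest value; the aspect prefix is then stripped from that name.
dominant = aroma.idxmax(axis=1).str.replace('Aroma ', '', regex=False)
print(dominant)
```

Review 1 shows why the raw scores are still worth keeping: its positive and negative Aroma scores are almost tied, which a single label hides.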


## Inconsistency between ratings and text?

We're now going to compare the model's sentiment scores with the ratings users gave.

Specifically, we'll look at reviews whose text gets a low 'Aroma Positive' score (below 0.22) even though the user rated the aroma 4 or higher:

In [None]:
expanded_test_sample[(expanded_test_sample['Aroma']>=4) & (expanded_test_sample['Aroma Positive']<0.22)][['Text','Aroma']]

| | Text | Aroma |
|---|---|---|
| 293 | Hazy dull blonde with a film of steady white f... | 4.0 |
| 780 | Pale amber with rimming off white froth that r... | 4.0 |


In [None]:
expanded_test_sample.loc[293, 'Text']

'Hazy dull blonde with a film of steady white froth. 3.75Peach, grass, + wheat. 4Cream of Wheat, fresh apricot, then hay, tangerine + papaya. 4Super soft, almost medium, round. 4.25Holy Vermont! Pillowy body, peachy esters - all there. Too aroma, if anything - no edge. (Where’s the Summit?) At 5.5%, I’d stay all night. 4'

In this example, the NLP model gave a low 'Aroma Positive' score, while the user rated the aroma 4 on a [1,5] scale.

However, the textual review suggests that the user didn't particularly like the aroma.

This indicates that the model could also be used to detect inconsistencies between the ratings users gave and their textual reviews.
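
The same check could be repeated for every aspect. The sketch below is one possible way to do it, not part of the original analysis: `flag_inconsistent` is a hypothetical helper, it assumes each aspect has a rating column named after the aspect (as 'Aroma' does) next to its '<aspect> Positive' score column, and it reuses the thresholds 4 and 0.22 from the filter above. The toy rows are made up to show the shape of the result.

```python
import pandas as pd

def flag_inconsistent(df, aspects, min_rating=4, max_positive=0.22):
    """Return {aspect: index labels} of rows with a high rating but a low positive score."""
    flagged = {}
    for aspect in aspects:
        # Assumption: the rating column is named after the aspect and the
        # score column is named "<aspect> Positive".
        mask = (df[aspect] >= min_rating) & (df[aspect + ' Positive'] < max_positive)
        flagged[aspect] = df.index[mask].tolist()
    return flagged

# Made-up rows, only to illustrate the output.
toy = pd.DataFrame(
    {
        'Aroma': [4.0, 4.5, 2.0],
        'Aroma Positive': [0.10, 0.90, 0.15],
        'Taste': [3.0, 4.0, 4.0],
        'Taste Positive': [0.05, 0.20, 0.80],
    },
    index=[10, 11, 12],
)

result = flag_inconsistent(toy, ['Aroma', 'Taste'])
print(result)
```

On the notebook's data the call would presumably look like `flag_inconsistent(expanded_test_sample, ['Appearance', 'Aroma', 'Palate', 'Taste'])`, provided all four rating columns exist. Flagged reviews still need to be read by hand, since a low positive score can also come from the model misreading the text.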