In [1]:
from google.colab import drive # mount Google Drive so the notebook can read files stored there
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
!pip install aspect_based_sentiment_analysis # install the package for aspect-based sentiment analysis

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


#### Aspect-based sentiment analysis

We'll use an NLP model from the `aspect_based_sentiment_analysis` package to score the sentiment that each review expresses towards specific aspects of the beer.

In [3]:
#imports
import aspect_based_sentiment_analysis as absa
import pandas as pd
import seaborn as sns

In [4]:
nlp = absa.load() #loading the model

Downloading:   0%|          | 0.00/1.08k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/438M [00:00<?, ?B/s]

Some layers from the model checkpoint at absa/classifier-rest-0.2 were not used when initializing BertABSClassifier: ['dropout_379']
- This IS expected if you are initializing BertABSClassifier from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertABSClassifier from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of BertABSClassifier were not initialized from the model checkpoint at absa/classifier-rest-0.2 and are newly initialized: ['dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]


In [5]:
!ls "/content/drive/My Drive/reviews.pkl"
data = pd.read_pickle('/content/drive/My Drive/reviews.pkl') #reading from pkl file we created in "data_import.ipynb"
data.head()

'/content/drive/My Drive/reviews.pkl'


Unnamed: 0,Beer Name,Beer Id,Brewery Name,Brewery Id,Style,Abv,Date,Username,User Id,Appearance,Aroma,Palate,Taste,Overall,Rating,Text,Review
0,Régab,142544,Societe des Brasseries du Gabon (SOBRAGA),37262,Euro Pale Lager,4.5,2015-08-20 10:00:00,nmann08,nmann08.184925,3.25,2.75,3.25,2.75,3.0,2.88,"From a bottle, pours a piss yellow color with ...",
1,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2009-02-20 11:00:00,StJamesGate,stjamesgate.163714,3.0,3.5,3.5,4.0,3.5,3.67,Pours pale copper with a thin head that quickl...,
2,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2006-03-13 11:00:00,mdagnew,mdagnew.19527,4.0,3.5,3.5,4.0,3.5,3.73,"500ml Bottle bought from The Vintage, Antrim.....",
3,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2004-12-01 11:00:00,helloloser12345,helloloser12345.10867,4.0,3.5,4.0,4.0,4.5,3.98,Serving: 500ml brown bottlePour: Good head wit...,
4,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2004-08-30 10:00:00,cypressbob,cypressbob.3708,4.0,4.0,4.0,4.0,4.0,4.0,"500ml bottlePours with a light, slightly hazy ...",


We'll run the analysis on a single user, 'stjamesgate.163714', who has written 2,504 reviews in this dataset.

In [6]:
specific_user = 'stjamesgate.163714' #specific user we're going to focus on
one_user_data = data[data['User Id']==specific_user] #selecting reviews that user wrote
print(one_user_data.shape) 
test_sample = one_user_data.head(50) #we're going to take the sample of 50 reviews

(2504, 17)


In [7]:
def sentiment_analysis(text_series, sentiments=None, aspects=None):
  """
  Run aspect-based sentiment analysis on a series of review texts.

  Input:
    - text_series: pandas Series containing the textual reviews that users wrote
    - sentiments: dictionary mapping each sentiment name to its index in the
      model's score vector (default: Neutral=0, Negative=1, Positive=2)
    - aspects: list of aspects we're interested in

  Returns:
    - DataFrame with one column per (aspect, sentiment) pair, holding the score
      of that sentiment towards that aspect for each review in text_series.
  """
  # Defaults are set here to avoid mutable default arguments.
  if sentiments is None:
    sentiments = {'Positive': 2, 'Neutral': 0, 'Negative': 1}
  if aspects is None:
    aspects = ['Appearance', 'Aroma', 'Palate', 'Taste']

  tasks = text_series.apply(lambda text: nlp(text, aspects=aspects)) # apply the model to each review

  # Maps each "<aspect> <sentiment>" column name to a list of scores, one per review.
  columns = {}
  for aspect in aspects:
    for sentiment_str in sentiments:
      columns[aspect + ' ' + sentiment_str] = [] # initially empty

  for task in tasks:
    for aspect_index, aspect in enumerate(aspects):
      scores = task.examples[aspect_index].scores
      for sentiment_str, i in sentiments.items():
        columns[aspect + ' ' + sentiment_str].append(scores[i])

  return pd.DataFrame(columns).set_index(text_series.index) # convert the dictionary to a DataFrame
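
To make the column-building step easier to follow without downloading the model, here is a minimal sketch that reproduces the same loop on stubbed model output. The `fake_task` helper, the `build_columns` name, and the score values are hypothetical and exist only for illustration; the only thing taken from the notebook is the shape it relies on (`task.examples[i].scores`, indexed by the `sentiments` mapping).

```python
from types import SimpleNamespace

# Same mapping the notebook uses: sentiment name -> index in the score vector.
SENTIMENTS = {'Positive': 2, 'Neutral': 0, 'Negative': 1}
ASPECTS = ['Appearance', 'Aroma']  # shortened list, for illustration only


def fake_task(score_vectors):
    """Hypothetical stand-in for the model output: one example per aspect."""
    return SimpleNamespace(
        examples=[SimpleNamespace(scores=s) for s in score_vectors]
    )


def build_columns(tasks, aspects=ASPECTS, sentiments=SENTIMENTS):
    """Collect one list of scores per '<aspect> <sentiment>' column."""
    columns = {f'{a} {s}': [] for a in aspects for s in sentiments}
    for task in tasks:
        for aspect, example in zip(aspects, task.examples):
            for name, idx in sentiments.items():
                columns[f'{aspect} {name}'].append(example.scores[idx])
    return columns


# Two made-up "reviews", each with [neutral, negative, positive] scores per aspect.
tasks = [
    fake_task([[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]),
    fake_task([[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]),
]
columns = build_columns(tasks)
print(columns['Appearance Positive'])
print(columns['Aroma Neutral'])
```

Each list ends up with one entry per review, which is why the resulting DataFrame can take over the index of `text_series`.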

In [8]:
columns = sentiment_analysis(test_sample['Text']) #getting sentiment analysis from the function
expanded_test_sample = pd.concat([test_sample, columns], axis=1) #concatenating scores in each row
expanded_test_sample.head() 

Unnamed: 0,Beer Name,Beer Id,Brewery Name,Brewery Id,Style,Abv,Date,Username,User Id,Appearance,...,Appearance Negative,Aroma Positive,Aroma Neutral,Aroma Negative,Palate Positive,Palate Neutral,Palate Negative,Taste Positive,Taste Neutral,Taste Negative
1,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2009-02-20 11:00:00,StJamesGate,stjamesgate.163714,3.0,...,0.43441,0.434918,0.127827,0.437255,0.31074,0.30207,0.38719,0.285061,0.040979,0.673961
206,300,98728,Whitewater Brewing Co,3415,English Pale Ale,3.5,2013-08-30 10:00:00,StJamesGate,stjamesgate.163714,3.5,...,0.770781,0.139886,0.031676,0.828438,0.063168,0.019165,0.917667,0.04517,0.026108,0.928722
217,Belfast Ale,16371,Whitewater Brewing Co,3415,English Pale Ale,4.5,2011-01-24 11:00:00,StJamesGate,stjamesgate.163714,4.0,...,0.113556,0.856164,0.07169,0.072146,0.737318,0.182808,0.079873,0.540634,0.364637,0.094729
238,Belfast Lager,38838,Whitewater Brewing Co,3415,Munich Helles Lager,4.5,2011-10-29 10:00:00,StJamesGate,stjamesgate.163714,4.5,...,0.032995,0.870062,0.035975,0.093963,0.853761,0.065289,0.080949,0.89259,0.042577,0.064833
252,Clotworthy Dobbin,33820,Whitewater Brewing Co,3415,Irish Red Ale,5.0,2012-09-26 10:00:00,StJamesGate,stjamesgate.163714,3.5,...,0.121621,0.861306,0.05798,0.080715,0.729664,0.162302,0.108034,0.729717,0.177313,0.09297
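
In the rows displayed above, the three scores for each aspect appear to sum to 1, so they can be read as a distribution over the sentiments. This is an observation from the printed values, not something the notebook states. The sketch below checks it for the first row (index 1) and picks the highest-scoring sentiment per aspect; the helper names `aspect_scores` and `dominant_sentiment` are hypothetical.

```python
import math

# Scores copied from the row with index 1 in the output above.
row = {
    'Aroma Positive': 0.434918, 'Aroma Neutral': 0.127827, 'Aroma Negative': 0.437255,
    'Taste Positive': 0.285061, 'Taste Neutral': 0.040979, 'Taste Negative': 0.673961,
}
SENTIMENTS = ('Positive', 'Neutral', 'Negative')


def aspect_scores(row, aspect):
    """Return the {sentiment: score} mapping for one aspect of one review."""
    return {s: row[f'{aspect} {s}'] for s in SENTIMENTS}


def dominant_sentiment(row, aspect):
    """Return the sentiment with the highest score for the given aspect."""
    scores = aspect_scores(row, aspect)
    return max(scores, key=scores.get)


for aspect in ('Aroma', 'Taste'):
    total = sum(aspect_scores(row, aspect).values())
    # Allow for rounding in the displayed values.
    print(aspect, math.isclose(total, 1.0, abs_tol=1e-5), dominant_sentiment(row, aspect))
```

Note that for Aroma the positive and negative scores are nearly tied, so a single dominant label hides how uncertain the model is for this review.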


We'll now compare the model's scores with the numeric ratings the user gave.

Specifically, we'll look at reviews where the user rated the aroma highly but the model gave a low 'Aroma Positive' score:

In [9]:
expanded_test_sample[(expanded_test_sample['Aroma']>=4) & (expanded_test_sample['Aroma Positive']<0.22)]

Unnamed: 0,Beer Name,Beer Id,Brewery Name,Brewery Id,Style,Abv,Date,Username,User Id,Appearance,...,Appearance Negative,Aroma Positive,Aroma Neutral,Aroma Negative,Palate Positive,Palate Neutral,Palate Negative,Taste Positive,Taste Neutral,Taste Negative
293,"Push & Pull Chinook, Summit, Cascade",255185,Boundary Brewing Cooperative,40307,American Pale Ale (APA),5.5,2016-11-03 11:00:00,StJamesGate,stjamesgate.163714,3.75,...,0.512566,0.027222,0.002519,0.970259,0.232234,0.384935,0.382832,0.168574,0.464608,0.366818
780,Urban IPA,86045,Tiny Rebel Brewing Co.,29967,American IPA,5.5,2013-07-03 10:00:00,StJamesGate,stjamesgate.163714,3.75,...,0.493053,0.160749,0.538922,0.300329,0.049148,0.534601,0.416251,0.034016,0.452187,0.513796


In [12]:
expanded_test_sample.loc[780, 'Text']

"Pale amber with rimming off white froth that rings and speckles.Caramel but also deep tropical fruit and orange hard candy hops on the nose.Danish and orange marmalade then resiny, grassy, Seville orange hops. Lime finish and nettle linger.Medium, chewy, round.I heard this was like SNPA, only better, but it drinks way too heavy for an APA. Then I thought it might line up against Punk (same strength) but it's still not as bright or refreshing. Aroma hops grow as it goes, but still not getting 60 IBUs or much in the way of noble hops. An odd duck.2nd tasting: Honey nose with marmalade underneath, crystal cuts very lean, touch of wood spice + oily orange hops.Hop character and strength means this is probably actually meant to line up in the Bengal Lancer/White Shield mould of &quot;traditional&quot; English IPAs (though how is it then &quot;intercontinental?&quot;). But it's too crisp, chalky and ultimately harsh for that. In the end, falls between two stools."

In this example, the model gave a low 'Aroma Positive' score (0.16), while the user rated the aroma 4 on a scale from 1 to 5.

However, the review text suggests that the user was not particularly impressed by the aroma.

This suggests that the model could also be used to detect inconsistencies between the ratings users gave and what they wrote in their reviews.
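
As a rough sketch of how such a check could be reused for any aspect, the function below flags reviews with a high numeric rating but a low positive score. It works on plain dictionaries so it runs without the model or the dataset. The name `find_mismatches` is hypothetical; the default thresholds (4 and 0.22) are the ones used in the filter above, and the two rows reuse the Aroma values shown earlier for reviews 1 and 780.

```python
def find_mismatches(rows, aspect, min_rating=4.0, max_positive=0.22):
    """Return rows whose rating for `aspect` is high but whose positive score is low."""
    return [
        r for r in rows
        if r[aspect] >= min_rating and r[f'{aspect} Positive'] < max_positive
    ]


# Aroma rating and 'Aroma Positive' score for two reviews from the notebook output.
rows = [
    {'id': 1, 'Aroma': 3.5, 'Aroma Positive': 0.434918},
    {'id': 780, 'Aroma': 4.0, 'Aroma Positive': 0.160749},
]

flagged = find_mismatches(rows, 'Aroma')
print([r['id'] for r in flagged])
```

The thresholds are a judgment call: loosening them flags more reviews for manual inspection, at the cost of more false alarms.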