In [1]:
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [2]:
import os
os.chdir('/gdrive/My Drive/Sentihood')

In [3]:
import pandas as pd
from collections import defaultdict
from tqdm import tqdm

# Approach

In the Sentihood dataset, each text mentions one or more targets (LOCATION1 and LOCATION2), and each target is annotated with aspects and their corresponding sentiments. Because a target can be associated with multiple aspects, a model that predicts a single aspect and sentiment for a given text and target pair would not be a good fit.

Hence, I designed a model that performs multi-label classification over the aspects and, for each aspect, predicts one of 3 sentiment classes (None is the extra class, indicating that the aspect is not appropriate for the target).

The model takes the text and target as input and predicts probabilities for all possible aspects (i.e. 12 values) and probabilities for all possible sentiments of every aspect (i.e. 12*3=36 values). The experiments use _bert-base-uncased_ and _roberta_ as the encoder, and the hidden state of the first token is used to produce the logits.
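
The two outputs described above can be turned back into (aspect, sentiment) pairs in several ways. The following is a minimal sketch of one option, not the notebook's actual decoding code. The function name `decode`, the 0.5 aspect threshold and the order of the sentiment classes are all assumptions made for illustration; the aspect names are taken from the results printed later in this notebook.

```python
# Hypothetical decoding of the two model outputs (12 aspect probabilities
# and a 12x3 matrix of sentiment probabilities). Illustrative only.
ASPECTS = ["safety", "general", "price", "live", "transit-location", "quiet",
           "shopping", "dining", "nightlife", "multicultural", "green-nature",
           "touristy"]
SENTIMENTS = ["None", "Positive", "Negative"]  # assumed class order


def decode(aspect_probs, sentiment_probs, threshold=0.5):
    """Return (aspect, sentiment) pairs for aspects judged to be present."""
    assert len(aspect_probs) == len(ASPECTS)
    assert len(sentiment_probs) == len(ASPECTS)
    pairs = []
    for name, p_aspect, p_sent in zip(ASPECTS, aspect_probs, sentiment_probs):
        if p_aspect < threshold:  # assumed cut-off for the multi-label head
            continue
        best = max(range(len(SENTIMENTS)), key=lambda c: p_sent[c])
        if SENTIMENTS[best] != "None":  # "None" means the aspect does not apply
            pairs.append((name, SENTIMENTS[best]))
    return pairs


# Toy probabilities: only "safety" and "price" pass the aspect threshold.
aspect_probs = [0.1] * 12
aspect_probs[0] = 0.9  # safety
aspect_probs[2] = 0.8  # price
sentiment_probs = [[0.8, 0.1, 0.1] for _ in ASPECTS]
sentiment_probs[0] = [0.1, 0.2, 0.7]  # safety -> Negative
sentiment_probs[2] = [0.2, 0.7, 0.1]  # price -> Positive
print(decode(aspect_probs, sentiment_probs))
# [('safety', 'Negative'), ('price', 'Positive')]
```

Requiring both heads to agree (aspect probability above the threshold and a non-None sentiment) is only one possible design; either head could also be used on its own.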


# How to use the missing data?

The Sentihood dataset also has some missing values. In order to use the unlabelled data as well, I used pseudo-labelling, a semi-supervised learning technique. Only the missing data from the train and dev sets is pseudo-labelled; the test set is left out to prevent any leakage.
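
As a rough illustration of the idea, pseudo-labelling lets an already trained model label the unlabelled examples, then adds those predictions to the training data. The sketch below keeps only predictions above a confidence threshold. That selection rule, the 0.9 value and the names `pseudo_label` and `predict_proba` are assumptions for illustration; the notebook does not state how its pseudo-labels were chosen.

```python
def pseudo_label(unlabelled, predict_proba, labels, min_confidence=0.9):
    """Return (example, label) pairs for confidently predicted examples.

    `predict_proba` is a hypothetical callable that maps an example to a
    list of class probabilities, one per entry of `labels`.
    """
    selected = []
    for example in unlabelled:
        probs = predict_proba(example)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_confidence:  # assumed selection rule
            selected.append((example, labels[best]))
    return selected


# Toy stand-in for a trained model: a lookup table of class probabilities.
toy_scores = {
    "LOCATION1 is very safe": [0.02, 0.95, 0.03],
    "LOCATION1 is next to LOCATION2": [0.50, 0.30, 0.20],
}
labels = ["None", "Positive", "Negative"]
print(pseudo_label(list(toy_scores), toy_scores.get, labels))
# [('LOCATION1 is very safe', 'Positive')]
```

The unlabelled pool passed in would contain only train and dev examples, in line with the leakage concern above.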

# Results Analysis

In [4]:
def analyze(file_name):
  """Print aspect, sentiment and aspect-wise sentiment accuracy for a predictions CSV."""
  df = pd.read_csv(file_name)

  # Gold-label counts and correct-prediction counts, keyed by label.
  aspect, sentiment = defaultdict(int), defaultdict(int)
  c_aspect, c_sentiment = defaultdict(int), defaultdict(int)

  for _, r in df.iterrows():
    # Each column holds space-separated labels; gold and predicted labels
    # are compared position by position.
    a_l = str(r['aspect']).split(' ')
    s_l = str(r['sentiment']).split(' ')
    a_p = str(r['pred_aspect']).split(' ')
    s_p = str(r['pred_sentiment']).split(' ')

    for gold, pred in zip(a_l, a_p):
      aspect[gold] += 1
      if gold == pred:
        c_aspect[gold] += 1

    # a_l is kept in the zip so that iteration stops at the shortest list.
    for _, gold, pred in zip(a_l, s_l, s_p):
      sentiment[gold] += 1
      if gold == pred:
        c_sentiment[gold] += 1

  print('Aspect/Sentiment wise accuracy')
  print('----------------------------Aspect----------------------------')
  for k, v in aspect.items():
    print(f'{k} --> {c_aspect[k]/v}')
  print()
  print('----------------------------Sentiment----------------------------')
  for k, v in sentiment.items():
    print(f'{k} --> {c_sentiment[k]/v}')
  print()
  print('----------------------------Aspect wise Sentiment----------------------------')
  for k in aspect:
    # Totals (tpos, tneg) and correct predictions (pos, neg) for aspect k.
    tpos, tneg, pos, neg = 0, 0, 0, 0
    for _, r in df.iterrows():
      a_l = str(r['aspect']).split(' ')
      s_l = str(r['sentiment']).split(' ')
      s_p = str(r['pred_sentiment']).split(' ')

      for idx, a in enumerate(a_l):
        if a == k:
          if s_l[idx] == 'Positive':
            tpos += 1
            if idx < len(s_p) and s_l[idx] == s_p[idx]:
              pos += 1
          if s_l[idx] == 'Negative':
            tneg += 1
            if idx < len(s_p) and s_l[idx] == s_p[idx]:
              neg += 1
    # Fall back to a raw 'correct/total' string when the ratio would be 0 or undefined.
    ppos = pos/tpos if tpos != 0 and pos != 0 else f'{pos}/{tpos}'
    pneg = neg/tneg if tneg != 0 and neg != 0 else f'{neg}/{tneg}'

    print(f'Aspect : {k} | Postive: {ppos} | Negative: {pneg}')
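
`analyze` reads four columns (`aspect`, `sentiment`, `pred_aspect`, `pred_sentiment`), each holding space-separated labels that are matched position by position. The sketch below reproduces only the per-aspect accuracy step on an in-memory CSV, to make that layout concrete. The toy rows and the helper name `aspect_accuracy` are made up for illustration and are not part of the actual `sub.csv` files.

```python
import csv
import io

# Hypothetical two-row file in the layout that analyze() expects.
toy_csv = """aspect,sentiment,pred_aspect,pred_sentiment
safety price,Positive Negative,safety general,Positive Positive
general,Positive,general,Positive
"""


def aspect_accuracy(csv_text):
    """Per-aspect accuracy, pairing gold and predicted labels by position."""
    total, correct = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        gold_labels = row["aspect"].split(" ")
        pred_labels = row["pred_aspect"].split(" ")
        for gold, pred in zip(gold_labels, pred_labels):
            total[gold] = total.get(gold, 0) + 1
            correct[gold] = correct.get(gold, 0) + (gold == pred)
    return {a: correct[a] / total[a] for a in total}


print(aspect_accuracy(toy_csv))
# {'safety': 1.0, 'price': 0.0, 'general': 1.0}
```

Because the comparison is positional, a prediction counts as correct only if it appears in the same slot as the gold label, so the reported accuracies depend on the order in which labels were written to the file.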

## BERT BASE MODEL RESULTS

In [5]:
analyze(os.path.join('run_bert_multi_20', 'sub.csv'))

Aspect/Sentiment wise accuracy
----------------------------Aspect----------------------------
safety --> 0.7785234899328859
general --> 0.8460176991150442
price --> 0.6609442060085837
live --> 0.6145833333333334
transit-location --> 0.6502463054187192
quiet --> 0.4583333333333333
shopping --> 0.6973684210526315
dining --> 0.6
nightlife --> 0.676056338028169
multicultural --> 0.7708333333333334
green-nature --> 0.6136363636363636
touristy --> 0.6666666666666666

----------------------------Sentiment----------------------------
Positive --> 0.9306759098786829
Negative --> 0.709832134292566

----------------------------Aspect wise Sentiment----------------------------
Aspect : safety | Postive: 0.92 | Negative: 0.7469879518072289
Aspect : general | Postive: 0.8819599109131403 | Negative: 0.8129496402877698
Aspect : price | Postive: 0.7685185185185185 | Negative: 0.6153846153846154
Aspect : live | Postive: 0.8333333333333334 | Negative: 0.5925925925925926
Aspect : transit-location | Postiv

## BERT MODEL + PSEUDO-LABELLING RESULTS

In [6]:
analyze(os.path.join('run_bert_multi_pseudo', 'sub.csv'))

Aspect/Sentiment wise accuracy
----------------------------Aspect----------------------------
safety --> 0.7635135135135135
general --> 0.8253119429590018
price --> 0.6331877729257642
live --> 0.65625
transit-location --> 0.6548223350253807
quiet --> 0.4782608695652174
shopping --> 0.76
dining --> 0.6363636363636364
nightlife --> 0.7101449275362319
multicultural --> 0.723404255319149
green-nature --> 0.627906976744186
touristy --> 0.72

----------------------------Sentiment----------------------------
Positive --> 0.9241622574955908
Negative --> 0.6771844660194175

----------------------------Aspect wise Sentiment----------------------------
Aspect : safety | Postive: 0.88 | Negative: 0.7469879518072289
Aspect : general | Postive: 0.8663697104677061 | Negative: 0.7913669064748201
Aspect : price | Postive: 0.7777777777777778 | Negative: 0.5874125874125874
Aspect : live | Postive: 0.8333333333333334 | Negative: 0.2962962962962963
Aspect : transit-location | Postive: 0.8611111111111112 | 

BERT-based models are quite accurate at predicting aspects like "general", "safety" and "multicultural", and perform somewhat poorly on aspects like "quiet", "green-nature" and "price". Pseudo-labelling does not improve the results overall, but its scores are close to those of the base model. Looking at the aspect-wise sentiment predictions, the models perform well on the Positive class for almost all aspects. Performance on the Negative class is poorer, likely due to class imbalance.

## ROBERTA BASE MODEL RESULTS

In [7]:
analyze(os.path.join('run_roberta_multi_20', 'sub.csv'))

Aspect/Sentiment wise accuracy
----------------------------Aspect----------------------------
safety --> 0.7872340425531915
general --> 0.8405017921146953
price --> 0.6454545454545455
live --> 0.5789473684210527
transit-location --> 0.6403940886699507
quiet --> 0.5
shopping --> 0.7397260273972602
dining --> 0.6060606060606061
nightlife --> 0.6617647058823529
multicultural --> 0.7755102040816326
green-nature --> 0.6585365853658537
touristy --> 0.6086956521739131

----------------------------Sentiment----------------------------
Positive --> 0.9225289403383793
Negative --> 0.683046683046683

----------------------------Aspect wise Sentiment----------------------------
Aspect : safety | Postive: 0.8533333333333334 | Negative: 0.7228915662650602
Aspect : general | Postive: 0.8752783964365256 | Negative: 0.7410071942446043
Aspect : price | Postive: 0.6944444444444444 | Negative: 0.5804195804195804
Aspect : live | Postive: 0.8205128205128205 | Negative: 0.48148148148148145
Aspect : transit-l

## ROBERTA MODEL + PSEUDO-LABELLING RESULTS

In [8]:
analyze(os.path.join('run_roberta_multi_pseudo_20', 'sub.csv'))

Aspect/Sentiment wise accuracy
----------------------------Aspect----------------------------
safety --> 0.7777777777777778
general --> 0.8375451263537906
price --> 0.6858407079646017
live --> 0.5894736842105263
transit-location --> 0.64
shopping --> 0.7432432432432432
dining --> 0.6363636363636364
quiet --> 0.45454545454545453
nightlife --> 0.7611940298507462
multicultural --> 0.7755102040816326
green-nature --> 0.6410256410256411
touristy --> 0.68

----------------------------Sentiment----------------------------
Positive --> 0.9126559714795008
Negative --> 0.7241379310344828

----------------------------Aspect wise Sentiment----------------------------
Aspect : safety | Postive: 0.7866666666666666 | Negative: 0.7590361445783133
Aspect : general | Postive: 0.8596881959910914 | Negative: 0.7913669064748201
Aspect : price | Postive: 0.75 | Negative: 0.6573426573426573
Aspect : live | Postive: 0.8076923076923077 | Negative: 0.37037037037037035
Aspect : transit-location | Postive: 0.8555

RoBERTa-based models perform a little better than BERT as per metrics like F1 score and AUC-ROC score, specifically on sentiment. They perform well on aspects like "general", "safety" and "multicultural", and somewhat poorly on "quiet", "live" and "dining". In the aspect-wise sentiment predictions, the RoBERTa models perform a little better on the Negative class, most clearly with pseudo-labelling (overall Negative accuracy of about 0.72).

## Grammar Evaluation

Despite the availability of many machine learning libraries, my favourite library is __PyTorch__, since it gives complete control over the ML pipeline. Moreover, the PyTorch discussion forum is very active and helpful. It is no surprise that the number of GitHub repositories with PyTorch code is on the rise. The one thing I dislike is that working with PyTorch takes considerable effort. One has full control of the code, but "with great power comes great responsibility." Hence, one must be very careful while working with PyTorch and pay attention to every minute detail.