## Baseline - Frequent Aspect Extraction
This baseline is an adaptation of the one used for opinion target extraction (OTE) in the 2014, 2015 and 2016 editions of the SemEval Aspect-Based Sentiment Analysis task.

Sources:

Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., & Manandhar, S. (2014, August). Semeval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) (pp. 27-35).

Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., & Androutsopoulos, I. (2015, June). Semeval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Association for Computational Linguistics, Denver, Colorado (pp. 486-495).

In [1]:
from __future__ import unicode_literals
from __future__ import division
from __future__ import print_function

In [2]:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
trainset = etree.parse('../corpus/SemEvalABSA2015EnglishRestaurants_train.xml', parser)
testset = etree.parse('../corpus/SemEvalABSA2015EnglishRestaurants_test.xml', parser)
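
The cells below read only a few pieces of each file: the `sentence` nodes, their `text` child, and the `target`, `from` and `to` attributes of each `Opinion`. As a rough illustration of that layout, here is a minimal sketch on a hand-written fragment. The nesting under `Reviews`/`Review`/`sentences` and the `from="0" to="0"` offsets on the NULL target are assumptions about the corpus format, and the standard-library `xml.etree.ElementTree` stands in for `lxml` so the snippet runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; it shows just the
# attributes that this notebook reads (target, from, to).
sample = '''
<Reviews>
  <Review>
    <sentences>
      <sentence>
        <text>Great food.</text>
        <Opinions>
          <Opinion target="food" from="6" to="10"/>
        </Opinions>
      </sentence>
      <sentence>
        <text>Overpriced and not tasty</text>
        <Opinions>
          <Opinion target="NULL" from="0" to="0"/>
        </Opinions>
      </sentence>
    </sentences>
  </Review>
</Reviews>
'''

root = ET.fromstring(sample)
spans = []
for sentence in root.iter('sentence'):
    text = sentence.find('text').text
    for opinion in sentence.iter('Opinion'):
        # NULL targets have no explicit mention in the text, so skip them
        if opinion.get('target') != 'NULL':
            start, end = int(opinion.get('from')), int(opinion.get('to'))
            # the offsets index directly into the sentence text
            spans.append((opinion.get('target'), text[start:end]))

print(spans)
```

The `from`/`to` offsets are character positions in the sentence text, which is why the regex matches later on can be compared with the gold annotations through `m.start()` and `m.end()`.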

## 1. All aspects in trainset

In [3]:
from collections import Counter
targets = Counter([opinion_node.get('target').lower()
                    for opinion_node in trainset.iter('Opinion')
                    if opinion_node.get('target') != 'NULL'
                   ])

In [4]:
import ipy_table

data = [['freq', '%freq', 'target']]
for target, freq in targets.most_common(20):
    ratio = freq / sum(targets.values()) *100
    data.append([freq, '{:.1f}%'.format(ratio), target])

ipy_table.make_table(data)
ipy_table.apply_theme('basic')

| freq | %freq | target |
|---|---|---|
| 158 | 12.4% | food |
| 117 | 9.1% | service |
| 82 | 6.4% | place |
| 29 | 2.3% | restaurant |
| 27 | 2.1% | staff |
| 26 | 2.0% | pizza |
| 21 | 1.6% | atmosphere |
| 20 | 1.6% | sushi |
| 16 | 1.3% | decor |


In [5]:
# Build a regex to match the targets in the text
import re
targets_list = sorted(list(targets))
targets_list.sort(key=len, reverse=True)
targets_pattern = r'\b(' + '|'.join([re.escape(t) for t in targets_list]) + r')\b'
len(targets_list)

493
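
Sorting the targets longest-first before joining them matters because Python's `re` alternation takes the first alternative that matches at a given position, not the longest one. A minimal sketch with a hypothetical three-item lexicon (the names `toy_targets` and `build_pattern` are illustrative and not part of the notebook):

```python
import re

# Hypothetical toy lexicon; the real one is built from the training set.
toy_targets = ['fish', 'fish and chips', 'food']

def build_pattern(items):
    # Longest alternatives first, so multi-word targets win over their prefixes.
    ordered = sorted(items, key=len, reverse=True)
    return r'\b(' + '|'.join(re.escape(t) for t in ordered) + r')\b'

text = 'The fish and chips was overpriced.'

longest_first = [m.group()
                 for m in re.finditer(build_pattern(toy_targets), text, flags=re.I)]

# Same alternatives in shortest-first order, for comparison.
naive = r'\b(' + '|'.join(re.escape(t) for t in sorted(toy_targets, key=len)) + r')\b'
shortest_first = [m.group() for m in re.finditer(naive, text, flags=re.I)]

print(longest_first)   # ['fish and chips']
print(shortest_first)  # ['fish']
```

With the shortest-first order the span `fish` is consumed before `fish and chips` can be tried, so the multi-word target would never be predicted.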

In [6]:
test_gold = list()
prediction = list()

for sentence_node in testset.iter('sentence'):    
    sentence_opinions = []
    for opinion_node in sentence_node.iter('Opinion'):
        target = opinion_node.get('target')
        start = int(opinion_node.get('from'))
        end = int(opinion_node.get('to'))
        # the evaluation explicitly says to discard NULL targets
        if target != 'NULL':
            sentence_opinions.append((target, start, end))
    test_gold.append(sentence_opinions)
    
    text = sentence_node.xpath('./text/text()')[0]
    text_opinions = []
    
    for m in re.finditer(targets_pattern, text, flags=re.I):
        text_opinions.append( (m.group(), m.start(), m.end()) )
    prediction.append(text_opinions)

In [7]:
data = [['Gold Standard', 'Predicted', 'Sentence']]
for index, (gold, pred) in enumerate(list(zip(test_gold, prediction))[:100]):
    sentence = list(testset.iter('sentence'))[index].xpath('./text/text()')[0]
    data.append([gold, pred, sentence])

ipy_table.make_table(data)
ipy_table.set_global_style(wrap=True)
ipy_table.apply_theme('basic')

| Gold Standard | Predicted | Sentence |
|---|---|---|
| `[('Al Di La', 5, 13)]` | `[]` | Love Al Di La |
| `[('place', 17, 22)]` | `[('place', 17, 22)]` | I recommend this place to everyone. |
| `[('food', 6, 10)]` | `[('food', 6, 10)]` | Great food. |
| `[]` | `[]` | One of my favorite places in Brooklyn. |
| `[('pastas', 4, 10), ('risottos', 31, 39), ('sepia', 58, 63), ('braised rabbit', 87, 101)]` | `[('pastas', 4, 10)]` | The pastas are incredible, the risottos (particularly the sepia) are fantastic and the braised rabbit is amazing. |
| `[]` | `[]` | Overpriced and not tasty |
| `[('food', 4, 8)]` | `[('food', 4, 8)]` | The food here was mediocre at best. |
| `[('fish and chips', 27, 41)]` | `[('fish and chips', 27, 41)]` | It was totally overpriced- fish and chips was about $15.... |
| `[]` | `[('drink', 54, 59), ('place', 70, 75)]` | There are so many other great places to go to eat and drink..... this place is not worth it... |


### Aspect-extraction Evaluation methodology

Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., & Androutsopoulos, I. (2015, June). Semeval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Association for Computational Linguistics, Denver, Colorado (pp. 486-495).

http://www.anthology.aclweb.org/S/S15/S15-2082.pdf

From 4.1 Evaluation Measures, page 491:

> Slot 2: F-1 scores are calculated by comparing the targets that a system returned (for all the sentences) to the corresponding gold targets (using micro-averaging). The targets are extracted using their starting and ending offsets. The calculation for each sentence considers only distinct targets and discards NULL targets, since they do not correspond to explicit mentions.
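
One possible reading of this description, written as a small function: counts are pooled over all sentences (micro-averaging), targets are compared by their `(start, end)` offsets, and duplicates within a sentence are collapsed with a `set`. This is a sketch under those assumptions, not the official scorer; the name `micro_prf` and the toy data are hypothetical. It also differs from the cells below, which compare `(target, start, end)` triples and do not remove duplicates.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over (start, end) offsets.

    gold, predicted: one list of (start, end) tuples per sentence,
    with NULL targets already removed.
    """
    correct = n_pred = n_gold = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        # keep only distinct targets within each sentence
        gold_set, pred_set = set(gold_spans), set(pred_spans)
        correct += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Toy data: three sentences, one list of spans per sentence.
gold = [[(4, 10), (31, 39)], [], [(6, 10)]]
predicted = [[(4, 10)], [(54, 59)], [(6, 10), (6, 10)]]

p, r, f = micro_prf(gold, predicted)
print('P={:.2f} R={:.2f} F1={:.2f}'.format(p, r, f))
```

In the toy data two of the three distinct predictions are correct and two of the three gold targets are found, so precision, recall and F1 all come out at 2/3.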

In [8]:
# Micro-averaged Precision
correct = 0
total = 0
for index in range(len(list(testset.iter('sentence')))):
    correct += len([x for x in test_gold[index] if x in prediction[index]])
    total += len(prediction[index])

precision = 100 * correct / total
print('Precision: {:.2f}%'.format(precision))

Precision: 50.81%


In [9]:
# Micro-averaged Recall
correct = 0
total = 0
for index in range(len(list(testset.iter('sentence')))):
    correct += len([x for x in test_gold[index] if x in prediction[index]])
    total += len(test_gold[index])

recall = 100* correct / total
print('Recall: {:.2f}%'.format(recall))

Recall: 58.12%


In [10]:
print('F-measure: {:.2f}%'.format((2 * precision * recall) / (precision + recall)))

F-measure: 54.22%


In [16]:
# Save the prediction (Optional)
# Work on a deep copy so that the gold annotations in `testset` stay
# intact for the evaluation cells that follow.
import copy
import re
pred_tree = copy.deepcopy(testset)
for sentence_node in pred_tree.iter('sentence'):
    opinions_node = sentence_node.xpath('./Opinions')
    if opinions_node:
        opinions_node = opinions_node[0]
    else:
        opinions_node = etree.SubElement(sentence_node, 'Opinions')

    # drop the gold opinions before writing the predicted ones
    for opinion_node in sentence_node.xpath('./Opinions/Opinion'):
        opinions_node.remove(opinion_node)

    text = sentence_node.xpath('./text/text()')[0]
    # same case-insensitive matching as in the evaluation above
    for m in re.finditer(targets_pattern, text, flags=re.I):
        opinion_node = etree.SubElement(opinions_node, 'Opinion')
        opinion_node.set('target', m.group())
        opinion_node.set('from', str(m.start()))
        opinion_node.set('to', str(m.end()))

pred_tree.write('../corpus/pred.xml', encoding='utf8', xml_declaration=True, pretty_print=True)

## 2. All aspects in trainset removing stopwords

In [11]:
from collections import Counter
from nltk.corpus import stopwords
# build the targets
# (a separate name keeps the `stopwords` module from being shadowed)
stopword_set = set(stopwords.words('english'))
targets = Counter([opinion_node.get('target').lower()
                    for opinion_node in trainset.iter('Opinion')
                    if opinion_node.get('target') != 'NULL' and
                       opinion_node.get('target').lower() not in stopword_set
                   ])

# print the targets
import ipy_table
data = [['freq', '%freq', 'target']]
for target, freq in targets.most_common(20):
    ratio = freq / sum(targets.values()) *100
    data.append([freq, '{:.1f}%'.format(ratio), target])

ipy_table.make_table(data)
ipy_table.apply_theme('basic')

| freq | %freq | target |
|---|---|---|
| 158 | 12.4% | food |
| 117 | 9.1% | service |
| 82 | 6.4% | place |
| 29 | 2.3% | restaurant |
| 27 | 2.1% | staff |
| 26 | 2.0% | pizza |
| 21 | 1.6% | atmosphere |
| 20 | 1.6% | sushi |
| 16 | 1.3% | decor |


In [12]:
import re

# function to evaluate targets in testset and return precision, recall and f-measure
def evaluate(targets):

    # Build a regex to match the targets in the text
    targets_list = sorted(list(targets))
    targets_list.sort(key=len, reverse=True)
    targets_pattern = r'\b(' + '|'.join([re.escape(t) for t in targets_list]) + r')\b'

    test_gold = list()
    prediction = list()

    for sentence_node in testset.iter('sentence'):    
        sentence_opinions = []
        for opinion_node in sentence_node.iter('Opinion'):
            target = opinion_node.get('target')
            start = int(opinion_node.get('from'))
            end = int(opinion_node.get('to'))
            # the evaluation explicitly says to discard NULL targets
            if target != 'NULL':
                sentence_opinions.append((target, start, end))
        test_gold.append(sentence_opinions)

        text = sentence_node.xpath('./text/text()')[0]
        text_opinions = []

        for m in re.finditer(targets_pattern, text, flags=re.I):
            text_opinions.append( (m.group(), m.start(), m.end()) )
        prediction.append(text_opinions)
        
    # Micro-averaged Precision
    correct = 0
    total = 0
    for index in range(len(list(testset.iter('sentence')))):
        correct += len([x for x in test_gold[index] if x in prediction[index]])
        total += len(prediction[index])

    precision = 100 * correct / total    

    # Micro-averaged Recall
    correct = 0
    total = 0
    for index in range(len(list(testset.iter('sentence')))):
        correct += len([x for x in test_gold[index] if x in prediction[index]])
        total += len(test_gold[index])

    recall = 100* correct / total
    
    # F-measure
    if precision + recall != 0:
        fmeasure = (2 * precision * recall) / (precision + recall)
    else:
        fmeasure = 0
    
    return (precision, recall, fmeasure)
    
  

In [13]:
precision, recall, fmeasure = evaluate(targets)
print('Precision: {:.2f}%'.format(precision))
print('Recall: {:.2f}%'.format(recall))
print('F-measure: {:.2f}%'.format(fmeasure))

Precision: 50.81%
Recall: 58.12%
F-measure: 54.22%


## 3. All aspects in trainset with a cut in frequency

In [16]:
# build the targets
targets = Counter([opinion_node.get('target').lower()
                    for opinion_node in trainset.iter('Opinion')
                    if opinion_node.get('target') != 'NULL'])

data = [['cut', 'number of targets', 'precision', 'recall', 'f-measure']]
for min_freq in range(0,110,5):
    min_freq = min_freq/10
    target_list = [target for target, freq in targets.items() if freq/sum(targets.values()) >= min_freq/100]
    precision, recall, fmeasure = evaluate(target_list)
    data.append(['{:.1f}%'.format(min_freq), 
                 len(target_list),
                 '{:.2f}%'.format(precision), 
                 '{:.2f}%'.format(recall), 
                 '{:.2f}%'.format(fmeasure)])

ipy_table.make_table(data)
ipy_table.apply_theme('basic')

| cut | number of targets | precision | recall | f-measure |
|---|---|---|---|---|
| 0.0% | 493 | 50.81% | 58.12% | 54.22% |
| 0.5% | 22 | 66.84% | 41.88% | 51.49% |
| 1.0% | 10 | 71.09% | 35.01% | 46.91% |
| 1.5% | 8 | 70.49% | 34.00% | 45.88% |
| 2.0% | 6 | 70.37% | 31.83% | 43.83% |
| 2.5% | 3 | 75.74% | 25.63% | 38.30% |
| 3.0% | 3 | 75.74% | 25.63% | 38.30% |
| 3.5% | 3 | 75.74% | 25.63% | 38.30% |
| 4.0% | 3 | 75.74% | 25.63% | 38.30% |
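
The cut keeps a target only when its share of all annotated targets reaches the threshold, which is why the lexicon shrinks so quickly in the table: precision rises while recall falls. A minimal sketch of the filter on hypothetical counts (`toy_counts` and `cut_by_share` are illustrative names, and the numbers are not taken from the corpus):

```python
from collections import Counter

# Hypothetical counts, chosen so that the shares are easy to read.
toy_counts = Counter({'food': 50, 'service': 30, 'pizza': 15, 'sepia': 5})

def cut_by_share(counts, min_share):
    """Keep targets whose share of all annotated targets is >= min_share (0..1)."""
    total = sum(counts.values())
    return sorted(t for t, freq in counts.items() if freq / total >= min_share)

no_cut = cut_by_share(toy_counts, 0.0)
ten_percent = cut_by_share(toy_counts, 0.10)
forty_percent = cut_by_share(toy_counts, 0.40)

print(no_cut)         # ['food', 'pizza', 'sepia', 'service']
print(ten_percent)    # ['food', 'pizza', 'service']
print(forty_percent)  # ['food']
```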


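## 4. All aspects with relative frequency

A target is kept only if the number of times it is annotated as an aspect, divided by the number of times it occurs in the text (counted here over the tokens of `testset`), reaches the cut.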
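
The relative-frequency criterion can be sketched on toy data as follows. The helper `annotation_ratio`, the counts and the text are hypothetical; the text is padded with spaces so that occurrences at its edges are counted, which is a small departure from the cell below.

```python
from collections import Counter

# Hypothetical annotation counts and a hypothetical tokenised, lowercased text.
annotated = Counter({'place': 2, 'drink': 1})
text = 'i love this place . the place serves a good drink . we drink and drink here .'

def annotation_ratio(target, annotated_counts, text):
    """Times `target` was annotated divided by times it occurs in `text`."""
    padded = ' ' + text + ' '  # pad so matches at the edges are counted
    occurrences = padded.count(' ' + target + ' ')
    # guard against division by zero for targets that never occur
    return annotated_counts[target] / max(occurrences, 1e-5)

place_ratio = annotation_ratio('place', annotated, text)
drink_ratio = annotation_ratio('drink', annotated, text)
kept = [t for t in annotated if annotation_ratio(t, annotated, text) >= 0.5]

print(place_ratio, drink_ratio, kept)
```

Here `place` is annotated every time it occurs, while `drink` is annotated once in three occurrences, so a 50% cut keeps only `place`. Words that are common in the text but rarely annotated are the ones this filter is meant to remove.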

In [17]:
sentences = ' '.join([node.get('form').lower() for node in testset.iter('word')])
freqlist =  Counter([node.get('target').lower() for node in testset.iter('Opinion')])

# build the targets
targets = Counter([opinion_node.get('target').lower()
                    for opinion_node in trainset.iter('Opinion')
                    if opinion_node.get('target') != 'NULL'])

data = [['cut', 'number of targets', 'precision', 'recall', 'f-measure']]
for min_freq in range(0,100,5):
    
    target_list = [target for target, freq in targets.items() if freq/max(sentences.count(' ' + target + ' '),0.00001) >= min_freq/100]
        
    precision, recall, fmeasure = evaluate(target_list)
    data.append(['{:.1f}%'.format(min_freq), 
                 len(target_list),
                 '{:.2f}%'.format(precision), 
                 '{:.2f}%'.format(recall), 
                 '{:.2f}%'.format(fmeasure)])

ipy_table.make_table(data)
ipy_table.apply_theme('basic')

| cut | number of targets | precision | recall | f-measure |
|---|---|---|---|---|
| 0.0% | 493 | 50.81% | 58.12% | 54.22% |
| 5.0% | 491 | 56.14% | 57.45% | 56.79% |
| 10.0% | 491 | 56.14% | 57.45% | 56.79% |
| 15.0% | 486 | 58.71% | 56.45% | 57.56% |
| 20.0% | 484 | 60.00% | 55.78% | 57.81% |
| 25.0% | 482 | 61.34% | 55.28% | 58.15% |
| 30.0% | 482 | 61.34% | 55.28% | 58.15% |
| 35.0% | 473 | 63.29% | 53.43% | 57.95% |
| 40.0% | 473 | 63.29% | 53.43% | 57.95% |
