### Aspect Classification

In SemEval 2015 and 2016, the sentence-level ABSA task defined a subtask called Aspect Category Detection, whose aim is to identify every entity E and attribute A pair towards which an opinion is expressed in a given text [24]. Specifically, given an input sentence such as “The food was delicious”, ABSA needs to detect the E and A pair (category=FOOD#QUALITY) for the target word “food” and to estimate its sentiment as either positive or negative. The English dataset is provided for two domains: Laptops and Restaurants. We have chosen the latter for this evaluation. In the restaurant domain, SemEval predefines a set of entities (SERVICE, RESTAURANT, FOOD, DRINKS, AMBIENCE and LOCATION), which can be viewed as general aspect categories. Our task of aspect category classification consists of assigning a general aspect category to opinion target words. For example, words such as wine, beverage and soda are classified into DRINKS, while words such as bread, fish and cheese are classified into FOOD. Note that only the entity E (e.g., FOOD) is used as the general aspect category; the attribute (e.g., QUALITY) is not considered, for simplicity.
This task is challenging for semantic relatedness methods, especially corpus-based ones. For instance, in restaurant review corpora, target words such as fish and wine often appear in the same surrounding contexts (e.g., “the fish is delicious and the wine is great”). Since corpus-based methods rely on the co-occurrence of terms in a corpus, they can hardly discriminate between terms from different categories that are frequently collocated (e.g., fish and wine). In such a scenario, knowledge-based methods are useful because they bring in the structural knowledge of a domain taxonomy. Semantic similarity methods can measure the taxonomical similarity between target words and aspect categories in order to classify each target word into the correct aspect category.
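
As a rough illustration of why taxonomy structure helps, the sketch below computes a path-based similarity over a tiny hand-written is-a hierarchy. The `PARENT` table, the node names and the helper functions are all hypothetical; a real setup would query WordNet rather than a toy dictionary.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical data, not WordNet itself.
PARENT = {
    "fish": "food",
    "bread": "food",
    "cheese": "food",
    "wine": "drinks",
    "soda": "drinks",
    "food": "consumable",
    "drinks": "consumable",
    "consumable": None,
}


def ancestors(node):
    """Return the chain from node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest is-a path between a and b)."""
    up_a = ancestors(a)
    steps_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for steps_from_a, n in enumerate(up_a):
        if n in steps_from_b:  # first common ancestor on the way up
            return 1.0 / (1 + steps_from_a + steps_from_b[n])
    return 0.0


def classify(word, categories=("food", "drinks")):
    """Assign the category node that is closest to the word in the taxonomy."""
    return max(categories, key=lambda c: path_similarity(word, c))


print(classify("fish"), classify("wine"))
```

Even though fish and wine may co-occur in the same sentences, the toy taxonomy keeps them apart because each sits under a different category node.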

### Unsupervised Similarity-based Aspect Classification

These evaluation experiments classify aspect categories based on semantic similarity scores. The results have been reported in the article:
> Ganggao Zhu and Carlos A. Iglesias [Computing Semantic Similarity of Concepts in Knowledge Graphs](http://ieeexplore.ieee.org/document/7572993/), TKDE, 2016.

The idea is to extract the most frequent words associated with a category so that those words can be used as features to represent the category. Then, given target words, the semantic similarity between the target words and the feature words is computed, and the category with the highest similarity score is chosen. The effectiveness of different knowledge-based semantic similarity metrics has been evaluated using this task.
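
A minimal sketch of this procedure is shown below. It is not the `SimCatClassifier` implementation: the function names, the toy similarity table and the choice of taking the maximum similarity over a category's feature words are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def top_feature_words(samples, k=2):
    """samples: iterable of (target_word, category) pairs.
    Returns {category: the k most frequent target words of that category}."""
    counts = defaultdict(Counter)
    for word, category in samples:
        counts[category][word.lower()] += 1
    return {cat: [w for w, _ in c.most_common(k)] for cat, c in counts.items()}


def classify(word, features, sim):
    """Pick the category whose feature words are most similar to `word`.
    Aggregating with max() is an assumption of this sketch."""
    def best_score(category):
        return max(sim(word, f) for f in features[category])
    return max(features, key=best_score)


# Hypothetical similarity scores standing in for a WordNet-based metric.
TOY_SCORES = {
    frozenset(("beer", "wine")): 0.8,
    frozenset(("beer", "fish")): 0.2,
}


def toy_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


samples = [("wine", "DRINKS"), ("wine", "DRINKS"), ("soda", "DRINKS"),
           ("fish", "FOOD"), ("fish", "FOOD"), ("bread", "FOOD")]
features = top_feature_words(samples, k=1)
predicted = classify("beer", features, toy_sim)
print(features, predicted)
```

Any function with the signature `sim(word_a, word_b) -> float` can be plugged in, which is what makes it possible to compare several similarity metrics on the same task.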

The datasets can be found in [Aspect Based Sentiment Analysis 15](http://alt.qcri.org/semeval2015/task5/) and [Aspect Based Sentiment Analysis 16](http://alt.qcri.org/semeval2016/task5/).

In [1]:
from sematch.classification import SimCatClassifier
from sematch.semantic.similarity import WordNetSimilarity
from sematch.evaluation import ABSAEvaluation, generate_report


ABSA15_Train = 'eval/aspect/ABSA-15_Restaurants_Train_Final.xml'
ABSA15_Test = 'eval/aspect/ABSA15_Restaurants_Test.xml'
ABSA16_Train = 'eval/aspect/ABSA16_Restaurants_Train_SB1_v2.xml'
ABSA16_Test = 'eval/aspect/ABSA16_Restaurants_Test_Gold.xml'

In [12]:
wns = WordNetSimilarity()
sim_metric_path = lambda x, y: wns.word_similarity(x, y, 'path')
sim_metric_lch = lambda x, y: wns.word_similarity(x, y, 'lch')
sim_metric_wup = lambda x, y: wns.word_similarity(x, y, 'wup')
sim_metric_li = lambda x, y: wns.word_similarity(x, y, 'li')
sim_metric_res = lambda x, y: wns.word_similarity(x, y, 'res')
sim_metric_lin = lambda x, y: wns.word_similarity(x, y, 'lin')
sim_metric_jcn = lambda x, y: wns.word_similarity(x, y, 'jcn')
sim_metric_wpath = lambda x, y: wns.word_similarity_wpath(x, y, 0.9)
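
For orientation, the relation between the path metric and the weighted path (wpath) metric can be sketched as follows. The formula is written from the definition in the TKDE article cited above as best understood here; that the `0.9` passed to `word_similarity_wpath` plays the role of the weight `k` is an assumption, and the path lengths and information content (IC) values are made-up inputs rather than WordNet lookups.

```python
def path_sim(length):
    """Path similarity: inverse of (1 + shortest path length between concepts)."""
    return 1.0 / (1.0 + length)


def wpath_sim(length, ic_lcs, k=0.9):
    """Weighted path similarity: the path length is scaled by k ** IC(lcs),
    where lcs is the least common subsumer of the two concepts and 0 < k <= 1.
    A more specific (higher-IC) subsumer shrinks the effective path length."""
    return 1.0 / (1.0 + length * (k ** ic_lcs))


# Same path length, different specificity of the shared ancestor (toy values).
print(path_sim(2), wpath_sim(2, ic_lcs=0.0), wpath_sim(2, ic_lcs=5.0))
```

With `k = 1` or an IC of zero the weighted form reduces to the plain path metric, so the parameter controls how much the information content contributes.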

In [13]:
absa_eval = ABSAEvaluation()
X_train_15, y_train_15 = absa_eval.load_dataset(ABSA15_Train)
X_test_15, y_test_15 = absa_eval.load_dataset(ABSA15_Test)
X_train_16, y_train_16 = absa_eval.load_dataset(ABSA16_Train)
X_test_16, y_test_16 = absa_eval.load_dataset(ABSA16_Test)

X_train = X_train_15 + X_train_16
y_train = y_train_15 + y_train_16

X_test = X_test_15 + X_test_16
y_test = y_test_15 + y_test_16

X = X_train + X_test
y = y_train + y_test

In [14]:
sim_path_classifier = SimCatClassifier.train(zip(X, y), sim_metric_path)
absa_eval.evaluate(X, y, sim_path_classifier)

macro averge:  (0.65889932837536824, 0.73624800391990597, 0.68088204044995548, None)
micro average:  (0.79437131184748067, 0.79437131184748067, 0.79437131184748067, None)
weighted average:  (0.81527408112409394, 0.79437131184748067, 0.80048315830078653, None)
accuracy:  0.794371311847
             precision    recall  f1-score   support

    SERVICE       0.47      0.44      0.46       519
 RESTAURANT       0.84      0.82      0.83       228
       FOOD       0.96      0.85      0.90      2256
   LOCATION       0.26      0.67      0.38        54
   AMBIENCE       0.61      0.70      0.65       597
     DRINKS       0.80      0.94      0.87       752

avg / total       0.82      0.79      0.80      4406

(confusion matrix output truncated)

In [15]:
sim_lch_classifier = SimCatClassifier.train(zip(X, y), sim_metric_lch)
absa_eval.evaluate(X, y, sim_lch_classifier)

macro averge:  (0.65635306731834364, 0.70435973578754885, 0.66306397717679955, None)
micro average:  (0.78801634135270082, 0.78801634135270082, 0.78801634135270082, None)
weighted average:  (0.80384726345569513, 0.78801634135270082, 0.79184681236417442, None)
accuracy:  0.788016341353
             precision    recall  f1-score   support

    SERVICE       0.47      0.43      0.45       519
 RESTAURANT       0.84      0.63      0.72       228
       FOOD       0.94      0.86      0.90      2256
   LOCATION       0.28      0.67      0.40        54
   AMBIENCE       0.62      0.69      0.65       597
     DRINKS       0.79      0.94      0.86       752

avg / total       0.80      0.79      0.79      4406

(confusion matrix output truncated)

In [16]:
sim_wup_classifier = SimCatClassifier.train(zip(X, y), sim_metric_wup)
absa_eval.evaluate(X, y, sim_wup_classifier)

macro averge:  (0.63090642367412297, 0.68519200438159267, 0.63717825320797139, None)
micro average:  (0.76940535633227414, 0.76940535633227414, 0.76940535633227414, None)
weighted average:  (0.784310525981046, 0.76940535633227414, 0.77077650572541434, None)
accuracy:  0.769405356332
             precision    recall  f1-score   support

    SERVICE       0.42      0.33      0.37       519
 RESTAURANT       0.84      0.67      0.75       228
       FOOD       0.95      0.86      0.90      2256
   LOCATION       0.26      0.67      0.37        54
   AMBIENCE       0.62      0.65      0.63       597
     DRINKS       0.70      0.93      0.80       752

avg / total       0.78      0.77      0.77      4406

(confusion matrix output truncated)

In [17]:
sim_li_classifier = SimCatClassifier.train(zip(X, y), sim_metric_li)
absa_eval.evaluate(X, y, sim_li_classifier)

macro averge:  (0.65984906716092195, 0.70155091577843098, 0.66725551753582202, None)
micro average:  (0.78302315024965952, 0.78302315024965952, 0.78302315024965952, None)
weighted average:  (0.79737279799046923, 0.78302315024965952, 0.78556460252592386, None)
accuracy:  0.78302315025
             precision    recall  f1-score   support

    SERVICE       0.47      0.43      0.45       519
 RESTAURANT       0.84      0.63      0.72       228
       FOOD       0.94      0.86      0.90      2256
   LOCATION       0.35      0.67      0.46        54
   AMBIENCE       0.60      0.68      0.64       597
     DRINKS       0.76      0.94      0.84       752

avg / total       0.80      0.78      0.79      4406

(confusion matrix output truncated)

In [18]:
sim_res_classifier = SimCatClassifier.train(zip(X, y), sim_metric_res)
absa_eval.evaluate(X, y, sim_res_classifier)

macro averge:  (0.56082227851282107, 0.67964920406478158, 0.55814905235610668, None)
micro average:  (0.72378574670903317, 0.72378574670903317, 0.72378574670903317, None)
weighted average:  (0.78880521176606977, 0.72378574670903317, 0.74026266367828464, None)
accuracy:  0.723785746709
             precision    recall  f1-score   support

    SERVICE       0.49      0.42      0.45       519
 RESTAURANT       0.48      0.89      0.62       228
       FOOD       0.97      0.83      0.89      2256
   LOCATION       0.09      0.74      0.16        54
   AMBIENCE       0.53      0.27      0.36       597
     DRINKS       0.80      0.93      0.86       752

avg / total       0.79      0.72      0.74      4406

(confusion matrix output truncated)

In [19]:
sim_lin_classifier = SimCatClassifier.train(zip(X, y), sim_metric_lin)
absa_eval.evaluate(X, y, sim_lin_classifier)

macro averge:  (0.57566941584222298, 0.674909393775462, 0.56767607988833857, None)
micro average:  (0.73104857013163871, 0.73104857013163871, 0.73104857013163871, None)
weighted average:  (0.79293272073125476, 0.73104857013163871, 0.74815810907979352, None)
accuracy:  0.731048570132
             precision    recall  f1-score   support

    SERVICE       0.49      0.42      0.45       519
 RESTAURANT       0.59      0.86      0.70       228
       FOOD       0.97      0.85      0.91      2256
   LOCATION       0.08      0.74      0.15        54
   AMBIENCE       0.53      0.26      0.35       597
     DRINKS       0.80      0.93      0.86       752

avg / total       0.79      0.73      0.75      4406

(confusion matrix output truncated)

In [20]:
sim_jcn_classifier = SimCatClassifier.train(zip(X, y), sim_metric_jcn)
absa_eval.evaluate(X, y, sim_jcn_classifier)

macro averge:  (0.60669251709544469, 0.70306569831293275, 0.6093556279578537, None)
micro average:  (0.7319564230594644, 0.7319564230594644, 0.7319564230594644, None)
weighted average:  (0.82166030674304757, 0.7319564230594644, 0.7665347945260822, None)
accuracy:  0.731956423059
             precision    recall  f1-score   support

    SERVICE       0.51      0.44      0.47       519
 RESTAURANT       0.57      0.82      0.68       228
       FOOD       0.97      0.77      0.85      2256
   LOCATION       0.07      0.67      0.13        54
   AMBIENCE       0.65      0.67      0.66       597
     DRINKS       0.87      0.86      0.87       752

avg / total       0.82      0.73      0.77      4406

(confusion matrix output truncated)

In [21]:
sim_wpath_classifier = SimCatClassifier.train(zip(X, y), sim_metric_wpath)
absa_eval.evaluate(X, y, sim_wpath_classifier)

macro averge:  (0.66528658284033471, 0.74203316558663579, 0.68911802449209258, None)
micro average:  (0.80118020880617336, 0.80118020880617336, 0.80118020880617336, None)
weighted average:  (0.81764097352537535, 0.80118020880617336, 0.80528936770377413, None)
accuracy:  0.801180208806
             precision    recall  f1-score   support

    SERVICE       0.49      0.43      0.46       519
 RESTAURANT       0.84      0.85      0.84       228
       FOOD       0.97      0.86      0.91      2256
   LOCATION       0.29      0.67      0.41        54
   AMBIENCE       0.62      0.70      0.66       597
     DRINKS       0.79      0.94      0.86       752

avg / total       0.82      0.80      0.81      4406

(confusion matrix output truncated)

### Supervised Similarity-based Aspect Classification

These evaluation experiments classify aspect categories using a Support Vector Machine with semantic similarity scores as features. The results have been reported in the article:
> Oscar Araque, Ganggao Zhu, Manuel Garcia-Amado and Carlos A. Iglesias [Mining the Opinionated Web: Classification and Detection of Aspect Contexts for Aspect Based Sentiment Analysis](http://sentic.net/sentire2016araque.pdf), ICDM sentire, 2016.

In the restaurant domain, SemEval predefines a set of entity labels (SERVICE, RESTAURANT, FOOD, DRINKS, AMBIENCE, LOCATION) and a set of attribute labels (GENERAL, PRICES, QUALITY, STYLE_OPTIONS, MISCELLANEOUS). Entity and attribute labels combine into 12 categories (e.g., FOOD#QUALITY). In this evaluation, we use the complete category set.

The baseline for aspect category classification provided by SemEval employs a Support Vector Machine (SVM) with a linear kernel. Specifically, n unigram features are extracted from the training data, and the category value (e.g., FOOD#QUALITY) of each tuple is used as the correct label of its feature vector. For each test sentence, a feature vector is built and the trained SVM predicts its category. We implemented this baseline and provide it as the "onehot" model.
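
The unigram representation described above can be sketched in a few lines. This is an illustration only, not the code behind the "onehot" model: the whitespace tokenizer, the function names and the example phrases are assumptions, and the SVM training step itself is left out.

```python
def build_vocabulary(train_texts):
    """Map every unigram seen in the training texts to a vector index."""
    vocab = sorted({tok for text in train_texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}


def one_hot_vector(text, vocab):
    """Binary vector with a 1 at the index of each known unigram in `text`."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec


vocab = build_vocabulary(["red wine", "grilled fish"])
seen = one_hot_vector("white wine", vocab)    # "wine" is in the vocabulary
unseen = one_hot_vector("cold beer", vocab)   # no known unigram at all
print(vocab, seen, unseen)
```

The all-zero vector for the second phrase shows the weakness of this representation: a target made only of words absent from the training data carries no information for the classifier.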

This unigram feature representation cannot handle feature words that were not encountered during training. As reported in SemEval, word clusters learned from Yelp data have been used to expand the features in order to address this ‘lack of vocabulary’ problem. We adopt this approach by training Word2Vec on Yelp data. This yields a continuous representation of words, namely word embeddings, in which words that co-occur frequently are mapped to vectors that are close in the vector space. Following the distributional semantics hypothesis, words that occur in the same surrounding context are treated as related and therefore receive a high similarity. However, words similar to the unigram features may not appear in the training data; if such words were simply added to the feature vector, the SVM model would give them the same weight as the unigram features that do appear in the training data. In this way, the valuable information carried by the similarity scores between words would be lost. In order to integrate the word2vec similarity model into the SVM, we retain the n-dimensional unigram feature vector and use the similarity score as the value of each dimension. The idea of using a similarity feature vector for target words is to learn a semantic predictive model for each category based on the feature words and the similarity model.
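
A small sketch of such a similarity feature vector follows. The two-dimensional embeddings are invented for the example, and taking the maximum score over the target words for each dimension is an assumption; the text above only states that each dimension holds a similarity score.

```python
import math

# Hypothetical 2-d embeddings standing in for Word2Vec vectors trained on Yelp.
EMB = {
    "wine": (0.9, 0.1),
    "beer": (0.8, 0.2),
    "fish": (0.1, 0.9),
    "salmon": (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_vector(target_words, feature_words, emb=EMB):
    """One dimension per unigram feature word; the value is the best similarity
    between that feature word and any target word (0.0 if none is known)."""
    vec = []
    for f in feature_words:
        scores = [cosine(emb[t], emb[f])
                  for t in target_words if t in emb and f in emb]
        vec.append(max(scores) if scores else 0.0)
    return vec


feature_words = ["fish", "wine"]          # unigram features from training data
beer_vec = similarity_vector(["beer"], feature_words)
unknown_vec = similarity_vector(["xyz"], feature_words)
print(beer_vec, unknown_vec)
```

Unlike the binary representation, the target word "beer" now gets a non-zero vector even though it is not one of the feature words, and its value is larger on the "wine" dimension than on the "fish" dimension.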

However, the Word2Vec embedding relies on co-occurrence within the same surrounding context, which causes a wide variety of words to be considered related. Furthermore, Word2Vec represents words without distinguishing their different meanings (word senses) and lexical relations. This makes it hard for the Word2Vec model to discriminate between words from different categories that are frequently collocated (e.g., food and drink). For instance, in the restaurant domain, target words such as fish and wine often appear in the same surrounding contexts (e.g., “the fish is delicious and the wine is great”). If a word2vec model is trained on such a corpus simply based on co-occurrences of words, many words belonging to different categories end up with similar similarity scores. For this problem, knowledge-based semantic similarity methods are useful because they bring in the structural knowledge of a domain taxonomy. Semantic similarity measures exploit the hierarchical classification of words via the is-a relation; the intuition is that two words are more similar if they are closer to each other in a taxonomy such as WordNet. Compared to word2vec, semantic similarity retains taxonomical information from WordNet, but it only covers the limited set of words contained in WordNet. By combining word2vec-based semantic relatedness and WordNet-based semantic similarity, the similarity model of feature words can both handle a large vocabulary and capture the hierarchical information of common words in WordNet.
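
The text does not spell out how the two similarity sources are merged. One possible way, assumed here purely for illustration, is to concatenate a WordNet-based similarity vector with a Word2Vec-based one so that the SVM can weight each source separately. All names and scores below are hypothetical.

```python
def combined_vector(target, feature_words, wordnet_sim, w2v_sim):
    """Concatenate WordNet-based and Word2Vec-based similarity features.
    Concatenation is an assumption of this sketch, not a documented behaviour."""
    wn_part = [wordnet_sim(target, f) for f in feature_words]
    w2v_part = [w2v_sim(target, f) for f in feature_words]
    return wn_part + w2v_part


# Toy score tables: "sake" is deliberately missing from the WordNet-like table.
WN = {("salmon", "fish"): 0.5, ("salmon", "wine"): 0.1}
W2V = {("salmon", "fish"): 0.9, ("salmon", "wine"): 0.6,
       ("sake", "fish"): 0.5, ("sake", "wine"): 0.8}


def toy_wordnet_sim(a, b):
    return WN.get((a, b), 0.0)


def toy_w2v_sim(a, b):
    return W2V.get((a, b), 0.0)


feature_words = ["fish", "wine"]
salmon_vec = combined_vector("salmon", feature_words, toy_wordnet_sim, toy_w2v_sim)
sake_vec = combined_vector("sake", feature_words, toy_wordnet_sim, toy_w2v_sim)
print(salmon_vec, sake_vec)
```

For a word covered by both sources, the taxonomy part separates the categories more sharply; for a word outside the taxonomy, the embedding part still provides a signal.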

In [1]:
from sematch.classification import SimCatSVMClassifier
from sematch.evaluation import ABSAEvaluation, generate_report

absa_eval = ABSAEvaluation()

ABSA16_Train = 'eval/aspect/ABSA16_Restaurants_Train_SB1_v2.xml'
ABSA16_Test = 'eval/aspect/ABSA16_Restaurants_Test_Gold.xml'

In [2]:
X_train, y_train = absa_eval.load_dataset(ABSA16_Train, cat_full=True)
X_test, y_test = absa_eval.load_dataset(ABSA16_Test, cat_full=True)

Simple baseline: one-hot representation using unigram features.

In [3]:
bow_model = SimCatSVMClassifier.train(X_train, y_train)
absa_eval.evaluate(X_test, y_test, bow_model, detailed_report=False)

Building the model
Complete model building in 3.278 seconds
accuracy:  0.744615384615
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.91      0.49      0.64        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.82      0.64      0.72        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.68      0.96      0.80       283
             FOOD#PRICES       0.71      0.33      0.45        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.71      0.62      0.66        58
      FOOD#STYLE_OPTIONS       0.29      0.11      0.16        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.99      0.89      0.94       107

             avg / total       0.72      0.74      0.7

Knowledge-based similarity features, using semantic similarity metrics with WordNet.

In [4]:
wordnet_path_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='path')
absa_eval.evaluate(X_test, y_test, wordnet_path_model, detailed_report=False)

Building the model
Complete model building in 109.666 seconds
accuracy:  0.78
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.81      0.78      0.79        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.85      0.77      0.81        22
        LOCATION#GENERAL       0.67      0.73      0.70        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.74      0.95      0.83       283
             FOOD#PRICES       0.71      0.33      0.45        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.70      0.67      0.68        58
      FOOD#STYLE_OPTIONS       0.29      0.11      0.16        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.98      0.92      0.95       107

             avg / total       0.77      0.78      0.75       

In [5]:
wordnet_lch_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='lch')
absa_eval.evaluate(X_test, y_test, wordnet_lch_model, detailed_report=False)

Building the model
Complete model building in 136.443 seconds
accuracy:  0.758461538462
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.72      0.80      0.76        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.81      0.77      0.79        22
        LOCATION#GENERAL       0.64      0.64      0.64        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.79      0.89      0.83       283
             FOOD#PRICES       0.57      0.33      0.42        51
RESTAURANT#MISCELLANEOUS       1.00      0.90      0.95        10
            FOOD#QUALITY       0.52      0.74      0.61        58
      FOOD#STYLE_OPTIONS       0.22      0.11      0.15        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.97      0.93      0.95       107

             avg / total       0.72      0.76      0

In [6]:
wordnet_wup_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='wup')
absa_eval.evaluate(X_test, y_test, wordnet_wup_model, detailed_report=False)

Building the model
Complete model building in 147.147 seconds
accuracy:  0.750769230769
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.75      0.71      0.73        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.85      0.77      0.81        22
        LOCATION#GENERAL       0.67      0.73      0.70        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.78      0.89      0.83       283
             FOOD#PRICES       0.58      0.29      0.39        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.52      0.71      0.60        58
      FOOD#STYLE_OPTIONS       0.20      0.06      0.09        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.84      0.96      0.90       107

             avg / total       0.70      0.75      0

In [8]:
wordnet_li_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='li')
absa_eval.evaluate(X_test, y_test, wordnet_li_model, detailed_report=False)

Building the model
Complete model building in 77.879 seconds
accuracy:  0.787692307692
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.76      0.81      0.79        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.85      0.77      0.81        22
        LOCATION#GENERAL       0.67      0.73      0.70        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.78      0.95      0.85       283
             FOOD#PRICES       0.60      0.35      0.44        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.71      0.71      0.71        58
      FOOD#STYLE_OPTIONS       0.20      0.06      0.09        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.95      0.94      0.95       107

             avg / total       0.74      0.79      0.

In [7]:
wordnet_res_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='res')
absa_eval.evaluate(X_test, y_test, wordnet_res_model, detailed_report=False)

Building the model
Complete model building in 21.529 seconds
accuracy:  0.743076923077
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.80      0.73      0.76        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.59      0.59      0.59        22
        LOCATION#GENERAL       0.42      0.45      0.43        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.74      0.90      0.82       283
             FOOD#PRICES       0.64      0.27      0.38        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.67      0.67      0.67        58
      FOOD#STYLE_OPTIONS       0.25      0.17      0.20        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.88      0.93      0.91       107

             avg / total       0.70      0.74      0.

In [9]:
wordnet_lin_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='lin')
absa_eval.evaluate(X_test, y_test, wordnet_lin_model, detailed_report=False)

Building the model
Complete model building in 21.652 seconds
accuracy:  0.773846153846
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.74      0.83      0.78        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.93      0.64      0.76        22
        LOCATION#GENERAL       0.54      0.64      0.58        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.74      0.96      0.84       283
             FOOD#PRICES       0.75      0.29      0.42        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.67      0.60      0.64        58
      FOOD#STYLE_OPTIONS       0.20      0.06      0.09        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.94      0.95       107

             avg / total       0.73      0.77      0.

In [10]:
wordnet_jcn_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='jcn')
absa_eval.evaluate(X_test, y_test, wordnet_jcn_model, detailed_report=False)

Building the model
Complete model building in 28.361 seconds
accuracy:  0.767692307692
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.82      0.71      0.76        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.81      0.59      0.68        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.72      0.96      0.82       283
             FOOD#PRICES       0.74      0.33      0.46        51
RESTAURANT#MISCELLANEOUS       0.91      1.00      0.95        10
            FOOD#QUALITY       0.71      0.62      0.66        58
      FOOD#STYLE_OPTIONS       0.29      0.11      0.16        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.98      0.92      0.95       107

             avg / total       0.77      0.77      0.

In [11]:
wordnet_wpath_model = SimCatSVMClassifier.train(X_train, y_train, feature='wordnet', wn_method='wpath')
absa_eval.evaluate(X_test, y_test, wordnet_wpath_model, detailed_report=False)

Building the model
Complete model building in 83.651 seconds
accuracy:  0.792307692308
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.76      0.81      0.79        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.85      0.77      0.81        22
        LOCATION#GENERAL       0.62      0.73      0.67        11
          DRINKS#QUALITY       0.00      0.00      0.00        22
      RESTAURANT#GENERAL       0.77      0.95      0.85       283
             FOOD#PRICES       0.67      0.35      0.46        51
RESTAURANT#MISCELLANEOUS       0.91      1.00      0.95        10
            FOOD#QUALITY       0.72      0.72      0.72        58
      FOOD#STYLE_OPTIONS       0.20      0.06      0.09        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.98      0.96      0.97       107

             avg / total       0.74      0.79      0.

Dense vector-based similarity features. Word2Vec is used to train a word similarity model on the Yelp dataset.

In [6]:
vec_file = 'models/w2v-400d-yelp-comment_w2vformat'
binary = False
yelp_vector_model = SimCatSVMClassifier.train(X_train, y_train, feature='word2vec', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, yelp_vector_model, detailed_report=False)

Building the model
Complete model building in 69.400 seconds
accuracy:  0.818461538462
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.84      0.88      0.86        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.72      0.82      0.77        22
        LOCATION#GENERAL       0.75      0.82      0.78        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.82      0.95      0.88       283
             FOOD#PRICES       0.75      0.35      0.48        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.64      0.84      0.73        58
      FOOD#STYLE_OPTIONS       0.00      0.00      0.00        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.79      0.82      0.

Combining WordNet similarity and Word2Vec similarity

In [7]:
both_path_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='path', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_path_model, detailed_report=False)

Building the model
Complete model building in 209.375 seconds
accuracy:  0.82
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.85      0.88      0.87        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.95      0.86      0.90        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.81      0.96      0.88       283
             FOOD#PRICES       0.69      0.35      0.47        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.67      0.81      0.73        58
      FOOD#STYLE_OPTIONS       0.29      0.11      0.16        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.80      0.82      0.79       

In [8]:
both_lch_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='lch', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_lch_model, detailed_report=False)

Building the model
Complete model building in 227.442 seconds
accuracy:  0.810769230769
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.84      0.88      0.86        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.82      0.82      0.82        22
        LOCATION#GENERAL       0.67      0.73      0.70        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.81      0.94      0.87       283
             FOOD#PRICES       0.72      0.35      0.47        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.69      0.81      0.75        58
      FOOD#STYLE_OPTIONS       0.22      0.11      0.15        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.95      0.97      0.96       107

             avg / total       0.80      0.81      0
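
The `DRINKS#QUALITY` row (precision 1.00, recall 0.05, f1 0.09) shows how a class can have perfect precision and still score poorly. The counts used below are inferred from the report rather than printed by it: with a support of 22, a recall of 0.05 corresponds to one correct prediction, and a precision of 1.00 implies no false positives. The helper `precision_recall_f1` is an illustrative name.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean, so it is dominated by the smaller value.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts: 1 of 22 instances recovered, no false positives.
p, r, f1 = precision_recall_f1(tp=1, fp=0, fn=21)
print(f"{p:.2f} {r:.2f} {f1:.2f}")
```

The printed values reproduce the reported row, which suggests the classifier assigns this label to only a single test instance.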

In [9]:
both_wup_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='wup', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_wup_model, detailed_report=False)

Building the model
Complete model building in 219.896 seconds
accuracy:  0.813846153846
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.87      0.88      0.87        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.90      0.82      0.86        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.80      0.96      0.87       283
             FOOD#PRICES       0.72      0.35      0.47        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.68      0.76      0.72        58
      FOOD#STYLE_OPTIONS       0.22      0.11      0.15        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.80      0.81      0

In [10]:
both_li_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='li', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_li_model, detailed_report=False)

Building the model
Complete model building in 149.345 seconds
accuracy:  0.812307692308
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.85      0.88      0.87        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.90      0.82      0.86        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.81      0.95      0.87       283
             FOOD#PRICES       0.72      0.35      0.47        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.66      0.78      0.71        58
      FOOD#STYLE_OPTIONS       0.25      0.11      0.15        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.93      0.98      0.95       107

             avg / total       0.80      0.81      0

In [11]:
both_res_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='res', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_res_model, detailed_report=False)

Building the model
Complete model building in 98.903 seconds
accuracy:  0.784615384615
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.80      0.86      0.83        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.86      0.82      0.84        22
        LOCATION#GENERAL       0.54      0.64      0.58        11
          DRINKS#QUALITY       0.17      0.05      0.07        22
      RESTAURANT#GENERAL       0.81      0.90      0.85       283
             FOOD#PRICES       0.74      0.33      0.46        51
RESTAURANT#MISCELLANEOUS       0.90      0.90      0.90        10
            FOOD#QUALITY       0.68      0.83      0.74        58
      FOOD#STYLE_OPTIONS       0.14      0.11      0.12        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.93      0.96      0.94       107

             avg / total       0.76      0.78      0.

In [12]:
both_lin_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='lin', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_lin_model, detailed_report=False)

Building the model
Complete model building in 93.706 seconds
accuracy:  0.813846153846
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.84      0.88      0.86        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.82      0.82      0.82        22
        LOCATION#GENERAL       0.67      0.73      0.70        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.81      0.96      0.88       283
             FOOD#PRICES       0.75      0.35      0.48        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.67      0.76      0.71        58
      FOOD#STYLE_OPTIONS       0.25      0.11      0.15        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.80      0.81      0.

In [13]:
both_jcn_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='jcn', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_jcn_model, detailed_report=False)

Building the model
Complete model building in 104.471 seconds
accuracy:  0.82
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.84      0.88      0.86        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.79      0.86      0.83        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.82      0.96      0.88       283
             FOOD#PRICES       0.74      0.33      0.46        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.67      0.81      0.73        58
      FOOD#STYLE_OPTIONS       0.29      0.11      0.16        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.80      0.82      0.79       

In [14]:
both_wpath_model = SimCatSVMClassifier.train(X_train, y_train, feature='both', wn_method='wpath', vec_file=vec_file, binary=binary)
absa_eval.evaluate(X_test, y_test, both_wpath_model, detailed_report=False)

Building the model
Complete model building in 156.562 seconds
accuracy:  0.809230769231
                          precision    recall  f1-score   support

           DRINKS#PRICES       0.83      0.88      0.85        59
        AMBIENCE#GENERAL       0.00      0.00      0.00         4
         SERVICE#GENERAL       0.86      0.82      0.84        22
        LOCATION#GENERAL       0.73      0.73      0.73        11
          DRINKS#QUALITY       1.00      0.05      0.09        22
      RESTAURANT#GENERAL       0.80      0.95      0.87       283
             FOOD#PRICES       0.69      0.35      0.47        51
RESTAURANT#MISCELLANEOUS       1.00      1.00      1.00        10
            FOOD#QUALITY       0.68      0.74      0.71        58
      FOOD#STYLE_OPTIONS       0.20      0.11      0.14        18
       RESTAURANT#PRICES       0.00      0.00      0.00         5
    DRINKS#STYLE_OPTIONS       0.96      0.98      0.97       107

             avg / total       0.79      0.81      0
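
To compare the eight combined runs (In [7] to In [14]) at a glance, the accuracies and model-building times printed above can be collected and ranked. The snippet below only restates numbers from the cell outputs; the `results` dictionary and the tie-breaking rule (faster build first) are choices made for this summary.

```python
# wn_method -> (accuracy, model building time in seconds), from the outputs above.
results = {
    "path": (0.82, 209.375),
    "lch": (0.810769230769, 227.442),
    "wup": (0.813846153846, 219.896),
    "li": (0.812307692308, 149.345),
    "res": (0.784615384615, 98.903),
    "lin": (0.813846153846, 93.706),
    "jcn": (0.82, 104.471),
    "wpath": (0.809230769231, 156.562),
}

# Sort by accuracy (descending), breaking ties by build time (ascending).
ranked = sorted(results.items(), key=lambda kv: (-kv[1][0], kv[1][1]))

for method, (accuracy, seconds) in ranked:
    print(f"{method:<6} accuracy={accuracy:.3f}  build_time={seconds:.1f}s")
```

Under this ordering `jcn` and `path` tie for the highest accuracy (0.82) and `res` is lowest (about 0.785). The spread between best and worst is roughly 3.5 percentage points on a single 650-instance test split, so the ranking should be read as indicative rather than conclusive.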