## Sentiment Classification in Practice

**Sentiment classification of reviews**

- Classification of the nominal sentiment polarity or score of a customer review on a product, service, or work of art.

**Data**

- 2100 English hotel reviews from TripAdvisor.
  900 training, 600 validation, and 600 test reviews.
- Each review has a sentiment score from {1, ..., 5}.

![](figs/7.png)


**Tasks**

- 3-class sentiment: 1–2 mapped to negative, 3 to neutral, 4–5 to positive.
  Training set balanced with random undersampling.
- 5-class sentiment: each score interpreted as one (nominal) class.
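Below is a minimal sketch of the 3-class label mapping and the random undersampling of the training set. It assumes reviews arrive as `(text, score)` pairs. The function names and the fixed random seed are illustrative choices and not part of the original setup. For the 5-class task, the score itself would serve as the label.

```python
import random
from collections import defaultdict


def to_three_class(score):
    """Map a review score from {1, ..., 5} to a 3-class sentiment label."""
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def undersample(examples, seed=0):
    """Randomly drop examples so that every label is as frequent as the rarest one."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced


reviews = [("great stay", 5), ("fine", 3), ("awful", 1), ("lovely", 4), ("nice", 4)]
train = [(text, to_three_class(score)) for text, score in reviews]
balanced_train = undersample(train)  # one review per label remains
```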

**Approach**

- Algorithm: linear SVM with one-versus-all multi-class handling.
- Features: combination of several standard and specific feature types.
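The one-versus-all scheme can be made concrete with a dependency-free sketch. It trains one binary linear SVM per class using Pegasos-style subgradient steps on the hinge loss, then predicts the class with the highest score. The Pegasos optimizer, the deterministic cyclic pass over the data, the regularized bias feature, and all names are assumptions made for illustration. In practice, a library implementation such as scikit-learn's `LinearSVC` would typically be used instead.

```python
from collections import defaultdict


def dot(w, x):
    """Dot product of two sparse vectors given as {feature: value} dicts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def with_bias(x):
    """Add a constant feature so the hyperplane need not pass through the origin."""
    return {**x, "__bias__": 1.0}


def train_binary_svm(xs, ys, c=1.0, epochs=50):
    """Linear SVM for labels +1/-1 via Pegasos-style subgradient steps on the hinge loss."""
    lam = 1.0 / (c * len(xs))  # regularization weight corresponding to cost C
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # cyclic pass (assumption; Pegasos samples randomly)
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            for f in w:
                w[f] *= 1.0 - eta * lam  # shrink step from the L2 regularizer
            if margin < 1.0:  # hinge loss is active: move towards y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)


def train_one_vs_all(xs, labels, c=1.0, epochs=50):
    """One binary SVM per class: that class against all others."""
    xs = [with_bias(x) for x in xs]
    return {
        k: train_binary_svm(xs, [1 if label == k else -1 for label in labels], c, epochs)
        for k in sorted(set(labels))
    }


def predict(models, x):
    """Pick the class whose binary SVM gives the highest score."""
    x = with_bias(x)
    return max(models, key=lambda k: dot(models[k], x))


xs = [{"clean": 1.0}, {"dirty": 1.0}, {"okay": 1.0}]
labels = ["positive", "negative", "neutral"]
models = train_one_vs_all(xs, labels)
```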


### Feature Engineering

**What is feature engineering?**

- The design and development of the feature representation of instances used to address a given task.
- The representation governs what patterns can be found during learning.

**Standard vs. specific features**

- Standard: features that can be derived from (more or less) general linguistic phenomena and that may help in several tasks.
- Specific: features that are engineered for a specific task, usually based on expert knowledge about the task.

**Features covered here**

- Standard content features: token n-grams, target class features.
- Standard style features: POS and phrase n-grams, stylometric features.
- Specific features: local sentiment, discourse relations, flow patterns.


**Some General Linguistic Phenomena**

![](figs/8.png)


#### Standard Content Feature Types

**Token n-grams**

- Token unigrams (bag-of-words): the distribution of all token 1-grams that occur in at least 5% of all training texts.
- Token bigrams/trigrams: analogously for 2-grams and 3-grams.
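Below is a minimal sketch of the token n-gram features. It assumes texts are already tokenized (and, for example, lowercased). The vocabulary keeps every n-gram that occurs in at least 5% of the training texts, and each text is represented by the relative frequencies of those n-grams. Reading "distribution" as relative frequency among all n-grams of a text is an assumption, and the function names are hypothetical. The same code would apply to POS tags or phrase types if tag sequences are passed instead of tokens.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_vocabulary(train_docs, n, min_doc_share=0.05):
    """Keep n-grams that occur in at least `min_doc_share` of all training texts."""
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(ngrams(tokens, n)))  # count each n-gram once per text
    threshold = min_doc_share * len(train_docs)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def ngram_distribution(tokens, n, vocabulary):
    """Relative frequency of each vocabulary n-gram among all n-grams of one text."""
    grams = ngrams(tokens, n)
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero for very short texts
    return [counts[g] / total for g in vocabulary]


docs = [["the", "room", "was", "clean"], ["the", "room", "was", "noisy"], ["great", "view"]]
unigram_vocab = select_vocabulary(docs, 1, min_doc_share=0.5)  # toy threshold for 3 texts
```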

**Target class features**

- Core vocabulary: the distribution of all words that occur at least three times as often in one class as in every other class.
- Sentiment scores: the mean positivity, negativity, and objectivity of all words according to SentiWordNet, computed both for each word's first sense and for the average over all its senses.
- Sentiment words: the distribution of all subjective words in SentiWordNet.
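The core vocabulary criterion can be sketched as follows. This sketch compares relative frequencies (counts divided by the number of tokens in each class) rather than raw counts, which is an assumption made so that classes of different sizes are comparable. The function name and the toy data are hypothetical. The SentiWordNet-based features are not sketched here, since they depend on the lexicon's actual data files.

```python
from collections import Counter


def core_vocabulary(docs, labels, factor=3.0):
    """Map each core word to the class in which its relative frequency is at least
    `factor` times as high as in every other class."""
    counts = {label: Counter() for label in set(labels)}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    totals = {label: sum(c.values()) or 1 for label, c in counts.items()}

    def rel(word, label):
        return counts[label][word] / totals[label]

    core = {}
    for word in set().union(*counts.values()):
        for label in counts:
            share = rel(word, label)
            others = [rel(word, other) for other in counts if other != label]
            if share > 0 and all(share >= factor * r for r in others):
                core[word] = label
    return core


docs = [["great", "room", "great"], ["dirty", "room"], ["okay", "room"]]
labels = ["positive", "negative", "neutral"]
core = core_vocabulary(docs, labels)  # "room" is equally common everywhere, so it is excluded
```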


#### Standard Style Feature Types

**Part-of-speech (POS) tag n-grams**

- POS unigrams. The distribution of all part-of-speech 1-grams that occur in at least 5% of all training texts.
- POS bigrams/trigrams. Analogously for 2-grams and 3-grams.

**Phrase type n-grams**

- Phrase unigrams. The distribution of all phrase type 1-grams that occur in at least 5% of all training texts.
- Phrase bigrams/trigrams. Analogously for 2-grams and 3-grams.

**Stylometric features**

- Character trigrams. The distribution of all character 3-grams that occur in at least 5% of all training texts.
- Function words. The distribution of the 100 most frequent words in the training set.
- Lexical statistics. Average numbers of tokens, clauses, and sentences.
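The following sketch covers two of the stylometric feature types, under the same assumptions as before (pre-tokenized input, with relative frequencies as the "distribution"). The first is character trigrams of a raw text. The second is function words, taken as the 100 most frequent training words. The names are hypothetical. Character trigram vocabularies could be filtered with the same 5% document-frequency rule as the token n-grams.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def function_words(train_docs, k=100):
    """The k most frequent words in the training set (ties keep first-seen order)."""
    counts = Counter(token for tokens in train_docs for token in tokens)
    return [word for word, _ in counts.most_common(k)]


def relative_frequencies(items, vocabulary):
    """Share of each vocabulary entry among all items of one text."""
    counts = Counter(items)
    total = len(items) or 1
    return [counts[v] / total for v in vocabulary]


train_docs = [["the", "room", "the"], ["the", "bed"]]
top_words = function_words(train_docs, k=2)  # k=100 in the setting described above
```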


#### Evaluation of the Standard Feature Types

![](figs/11.png)


**Evaluation**

- One linear SVM for each feature type alone and for their combination.
- Training on training set, tuning on validation set, test on test set.

**Discussion**

- Token unigrams perform best, but some other types come close.
- The combination does not outperform the best single feature type.
- An accuracy of 60.8% does not seem very good.


#### Review Argumentation

**Example hotel review**

“We spent one night at that hotel. Staff at the front desk was very nice, the room was clean and cozy, and the hotel lies in the city center... but all this never justifies the price, which is outrageous!”

![](figs/12.png)

**A shallow model of review argumentation**

- A review can be seen as a flow of local sentiments on domain concepts that are connected by discourse relations.


#### Specific Feature Types for Review Sentiment Analysis

**Local sentiment distribution**

- The relative frequencies of positive, neutral, and negative local sentiment, as well as of changes between local sentiments.

  > positive 0.4 neutral 0.4 negative 0.2 (neutral, positive) 0.25 ...

- The average local sentiment value from 0.0 (negative) to 1.0 (positive).

  > average sentiment 0.6

- The interpolated local sentiment at each normalized position in the text.

  > e.g., normalization length 9: (0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0)
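The three local sentiment feature types can be reproduced on the example review. This sketch assumes the review has been segmented into five units with the local sentiments neutral, positive, positive, neutral, negative, and that negative, neutral, and positive map to 0.0, 0.5, and 1.0. This reading is inferred from the example values rather than stated in the source. Changes are counted over all consecutive pairs of units, including unchanged pairs such as (positive, positive), which is what makes (neutral, positive) come out at 0.25. Using linear interpolation for the normalized flow is likewise an assumption.

```python
from collections import Counter

VALUE = {"negative": 0.0, "neutral": 0.5, "positive": 1.0}


def sentiment_distribution(flow):
    """Relative frequencies of local sentiments and of consecutive sentiment pairs."""
    features = {s: flow.count(s) / len(flow) for s in VALUE}
    pairs = list(zip(flow, flow[1:]))
    for pair, count in Counter(pairs).items():
        features[pair] = count / len(pairs)
    return features


def average_sentiment(flow):
    """Mean local sentiment value between 0.0 (negative) and 1.0 (positive)."""
    return sum(VALUE[s] for s in flow) / len(flow)


def normalized_flow(flow, length=9):
    """Linearly interpolate the sentiment values onto `length` equally spaced positions."""
    values = [VALUE[s] for s in flow]
    if len(values) == 1:
        return [values[0]] * length
    result = []
    for i in range(length):
        x = i * (len(values) - 1) / (length - 1)  # position on the original flow
        lo = int(x)
        hi = min(lo + 1, len(values) - 1)
        result.append(values[lo] + (x - lo) * (values[hi] - values[lo]))
    return result


flow = ["neutral", "positive", "positive", "neutral", "negative"]
dist = sentiment_distribution(flow)  # positive 0.4, neutral 0.4, negative 0.2, ...
avg = average_sentiment(flow)        # 0.6
curve = normalized_flow(flow)        # [0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```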


**Discourse relation distribution**

- The distribution of discourse relation types in the text.

  > background 0.25 elaboration 0.5 contrast 0.25 (all others 0.0)

- The distribution of combinations of relation types and local sentiments.

  > background(neutral, positive) 0.25 elaboration(positive, positive) 0.25 ...
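Both discourse relation features reduce to relative frequencies over the relations found in a text. This sketch encodes each relation as a (type, sentiment of first unit, sentiment of second unit) triple. The four relations listed for the example review are an assumed analysis, chosen to match the values above rather than produced by an actual discourse parser.

```python
from collections import Counter


def relation_distribution(relations):
    """Relative frequency of each discourse relation type."""
    counts = Counter(rel_type for rel_type, _, _ in relations)
    return {rel_type: c / len(relations) for rel_type, c in counts.items()}


def relation_sentiment_distribution(relations):
    """Relative frequency of each combination of relation type and local sentiments."""
    counts = Counter(f"{rel_type}({a}, {b})" for rel_type, a, b in relations)
    return {key: c / len(relations) for key, c in counts.items()}


relations = [
    ("background", "neutral", "positive"),
    ("elaboration", "positive", "positive"),
    ("elaboration", "positive", "neutral"),
    ("contrast", "neutral", "negative"),
]
types = relation_distribution(relations)  # background 0.25, elaboration 0.5, contrast 0.25
combos = relation_sentiment_distribution(relations)
```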


**Sentiment flow patterns**

- The similarity of the normalized flow of the text to each flow pattern.

![](figs/13.png)
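The following sketch shows the flow pattern features, where each feature is the similarity between the text's normalized flow and one pattern. Both the similarity measure and the two patterns are made-up assumptions. The measure is 1 minus the mean absolute difference, which stays in [0, 1] for values in [0, 1]. In an actual system, the patterns would come from the training data, and the measure may differ.

```python
def flow_similarity(flow, pattern):
    """1 minus the mean absolute difference of two equally long normalized flows."""
    if len(flow) != len(pattern):
        raise ValueError("flows must have the same normalized length")
    return 1.0 - sum(abs(a - b) for a, b in zip(flow, pattern)) / len(flow)


def flow_pattern_features(flow, patterns):
    """One similarity feature per (hypothetical) flow pattern."""
    return {name: flow_similarity(flow, pattern) for name, pattern in patterns.items()}


patterns = {
    "constant_positive": [1.0] * 9,
    "falling": [1.0 - i / 8 for i in range(9)],  # from positive down to negative
}
example_flow = [0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
features = flow_pattern_features(example_flow, patterns)
```

With these made-up patterns, the example flow is closer to the falling pattern (about 0.72) than to the constant positive one (about 0.64).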

**Content and style features**

- Content: token n-grams, sentiment scores.
- Style: part-of-speech n-grams, character trigrams, lexical statistics.


#### Evaluation of the Specific Feature Types

![](figs/14.png)


**Evaluation**

- One linear SVM for each feature type alone and for their combination.
- Training on training set, tuning on validation set, test on test set.
- Both 3-class and 5-class.

**Cost hyperparameter tuning**

- Tested $C$ values: 0.001, 0.01, 0.1, 1.0, 50.0.
- The $C$ value with the highest accuracy on the validation set is used on the test set.
- Results shown here for the 3-class task only.
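The tuning protocol can be written independently of the classifier: select $C$ on the validation set, train again on the training set with that value, and evaluate on the test set. In this sketch, `fit(data, c)` and `accuracy(model, data)` are hypothetical callables that stand in for any SVM training and evaluation routine. Ties between $C$ values are resolved in favor of the first value tried.

```python
def select_c(c_values, fit, accuracy, train, validation):
    """Return the C value with the highest validation accuracy, and that accuracy."""
    best_c, best_acc = None, float("-inf")
    for c in c_values:
        model = fit(train, c)
        acc = accuracy(model, validation)
        if acc > best_acc:  # strict '>' keeps the first C on ties
            best_c, best_acc = c, acc
    return best_c, best_acc


def tune_and_test(c_values, fit, accuracy, train, validation, test):
    """Tune C on the validation set, then report test accuracy for the chosen C."""
    best_c, _ = select_c(c_values, fit, accuracy, train, validation)
    model = fit(train, best_c)
    return best_c, accuracy(model, test)


C_VALUES = [0.001, 0.01, 0.1, 1.0, 50.0]  # the values listed above
```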


#### Results and Discussion for the Specific Features

**Effectiveness results on test set (accuracy)**

![](figs/15.png)


**Discussion**

- Content and style features: slightly weaker than in the experiment above, due to small differences in the experimental setting.
- Sentiment flow patterns: their impact is more visible across domains.
- Combination of features: pays off this time, which suggests that the feature types are more complementary.
- The 5-class accuracy seems insufficient.
- Classification fails to model the ordinal relation between the classes; regression might be better.
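The importance of the ordinal relation can be seen with two made-up sets of 5-class predictions that have the same accuracy. Accuracy treats predicting 4 for a 5-star review exactly like predicting 1. In contrast, an ordinal measure such as the mean absolute error separates the two cases. The mean absolute error is a common choice for evaluating regression, but it is not taken from the source.

```python
def accuracy(gold, predicted):
    """Share of exactly correct scores."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_absolute_error(gold, predicted):
    """Average distance between gold and predicted scores."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


gold = [5, 4, 2, 1]
near_misses = [4, 4, 2, 2]  # errors are off by one score
far_misses = [1, 4, 2, 5]   # errors are off by four scores

# Same accuracy (0.5) for both, but MAE 0.5 vs. 2.0.
```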
