# Sentiment Analysis Benchmarks

This notebook benchmarks sentiment analysis solutions in order to select the best one for the framework proposed in my dissertation. I compare three candidates:

- TextBlob - basic lexicon-and-rule-based analyzer
- TextBlob - pre-trained Naive Bayes classifier
- VADER - lexicon-based analyzer designed for social network posts

In [10]:
import nltk
import pandas as pd
from sklearn.metrics import f1_score
from textblob import TextBlob
from textblob.en.sentiments import NaiveBayesAnalyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

A small portion of the [Twitter Sentiment Dataset](https://www.kaggle.com/datasets/saurabhshahane/twitter-sentiment-dataset) is used as ground truth for the evaluation. The dataset contains tweets related to Indian politics. Each tweet is labeled as positive (1), negative (-1) or neutral (0):

In [2]:
tweets = pd.read_csv('../datasets/twitter_benchmark.csv').dropna()[:10000]
tweets['category'] = tweets['category'].astype(int)
tweets.head(5)

Unnamed: 0,clean_text,category
0,when modi promised “minimum government maximum...,-1
1,talk all the nonsense and continue all the dra...,0
2,what did just say vote for modi welcome bjp t...,1
3,asking his supporters prefix chowkidar their n...,1
4,answer who among these the most powerful world...,1


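All sentiment analysis methods under consideration return a continuous polarity value ranging from -1 to 1. The following function transforms this continuous value into a discrete label matching the evaluation dataset: values below `-DISCRETIZATION_BOUNDARY` become negative (-1), values above `DISCRETIZATION_BOUNDARY` become positive (1), and everything in between is neutral (0).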

In [3]:
DISCRETIZATION_BOUNDARY = 0.05

def discretize_polarity(sent: float) -> int:
    if sent < -DISCRETIZATION_BOUNDARY:
        return -1
    elif sent > DISCRETIZATION_BOUNDARY:
        return 1
    else:
        return 0
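
To illustrate how the boundary behaves, the following minimal sketch repeats the function from the cell above and applies it to a few hand-picked polarity values. The `samples` list is purely illustrative and is not taken from the dataset. Note that values exactly at the boundary (such as `-0.05` and `0.05`) are treated as neutral, because both comparisons are strict.

```python
DISCRETIZATION_BOUNDARY = 0.05

def discretize_polarity(sent: float) -> int:
    # Strict comparisons: values exactly on the boundary count as neutral.
    if sent < -DISCRETIZATION_BOUNDARY:
        return -1
    elif sent > DISCRETIZATION_BOUNDARY:
        return 1
    else:
        return 0

# Illustrative polarity values (not from the dataset).
samples = [-0.3, -0.05, 0.0, 0.05, 0.4]
labels = [discretize_polarity(s) for s in samples]
print(labels)  # [-1, 0, 0, 0, 1]
```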

Now we evaluate all three methods. Our metric of choice is `f1_score` with weighted averaging.
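
For readers unfamiliar with the `average='weighted'` setting used in the cells below: it computes an F1 score per class and averages them, weighting each class by the number of true instances it has. The sketch below reimplements that idea in plain Python for illustration only. The helper name `weighted_f1` and the toy labels are assumptions made for this example, and the notebook itself relies on `sklearn.metrics.f1_score`.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true instances per class
    total = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true  # weight the class F1 by its support
    return total / len(y_true)

# Toy labels, not taken from the dataset.
y_true = [1, 1, 0, -1]
y_pred = [1, 0, 0, -1]
score = weighted_f1(y_true, y_pred)
print(round(score, 2))  # 0.75
```

In this toy case the positive and neutral classes each reach an F1 of 2/3, the negative class reaches 1, and the positive class counts twice because it has two true instances.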

## TextBlob - lexicon-based

In [4]:
sentiments = tweets['clean_text'].apply(lambda t: TextBlob(t).sentiment.polarity)
sentiments.head()

0   -0.300000
1    0.000000
2    0.483333
3    0.150000
4    0.400000
Name: clean_text, dtype: float64

In [5]:
sentiments = sentiments.apply(discretize_polarity)
sentiments.head()

0   -1
1    0
2    1
3    1
4    1
Name: clean_text, dtype: int64

In [6]:
f1_score(tweets['category'], sentiments, average='weighted')

0.93193707680778

## VADER

In [7]:
analyzer = SentimentIntensityAnalyzer()
vader_sentiment = tweets['clean_text'].apply(lambda t: analyzer.polarity_scores(t)['compound'])
vader_sentiment.head()

0    0.5267
1   -0.4019
2    0.7096
3   -0.0713
4    0.4754
Name: clean_text, dtype: float64

In [8]:
vader_sentiment = vader_sentiment.apply(discretize_polarity)
vader_sentiment.head()

0    1
1   -1
2    1
3   -1
4    1
Name: clean_text, dtype: int64

In [9]:
f1_score(tweets['category'], vader_sentiment, average='weighted')

0.5622663754024169

## Naive Bayes

The Naive Bayes analyzer returns the probability that a given tweet is positive (and the complementary probability that it is negative). Before we can apply the discretization function, we transform this probability into a polarity value in the expected `[-1, 1]` interval: we multiply the probability by 2 and subtract 1 from the result.
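
The mapping described above can be sketched in isolation as follows. The function name `probability_to_polarity` and the sample probabilities are hypothetical and only serve the illustration. The notebook applies the same arithmetic inline.

```python
def probability_to_polarity(p_pos: float) -> float:
    """Linearly map a probability in [0, 1] to a polarity in [-1, 1]."""
    return p_pos * 2 - 1

# Illustrative probabilities (not classifier output).
for p_pos in (0.0, 0.5, 1.0):
    print(p_pos, "->", probability_to_polarity(p_pos))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```

One consequence of this mapping is worth keeping in mind: with the discretization boundary of 0.05 used in this notebook, only probabilities of roughly 0.475 to 0.525 end up labeled as neutral, so a binary classifier of this kind will rarely produce the neutral label.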

Some `nltk` datasets are required to make the classifier work.

In [15]:
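# Corpora needed by TextBlob's NaiveBayesAnalyzer
nltk.download('movie_reviews')
nltk.download('punkt')
nb = NaiveBayesAnalyzer()
# analyze() returns (classification, p_pos, p_neg); index 1 is p_pos,
# which is rescaled from [0, 1] to [-1, 1]
nb_sentiment = tweets['clean_text'].apply(lambda t: nb.analyze(t)[1] * 2 - 1)
nb_sentiment.head()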

[nltk_data] Downloading package movie_reviews to
[nltk_data]     /home/milos/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
[nltk_data] Downloading package punkt to /home/milos/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


0    0.512373
1   -0.591206
2    0.782222
3   -0.451511
4    0.858200
Name: clean_text, dtype: float64

In [16]:
nb_sentiment = nb_sentiment.apply(discretize_polarity)
nb_sentiment.head()

0    1
1   -1
2    1
3   -1
4    1
Name: clean_text, dtype: int64

In [17]:
f1_score(tweets['category'], nb_sentiment, average='weighted')

0.34057464628325873

## Conclusion

The TextBlob analyzer based on a sentiment lexicon plus several additional rules achieves the best weighted `f1_score` (about 0.93, compared with 0.56 for VADER and 0.34 for the Naive Bayes classifier). It was therefore included in the proposed framework as the sentiment analysis solution.