![](https://www.scubadiving.com/sites/scubadiving.com/files/styles/opengraph_1_91x1/public/images/2015/10/darth-vader.jpg?itok=fW9Tvc0i)
([Image Source](https://www.google.com/search?q=Vader&client=safari&rls=en&sxsrf=ALeKk02vuMaylLazNndJ2sffmpnux3uErA:1612409404732&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjBwLSOpc_uAhXvwzgGHX6oBOwQ_AUoAnoECAIQBA&biw=1920&bih=1000&dpr=1#imgrc=ni-aeMdeIhzgFM))

# Introduction 

Two popular libraries for performing unsupervised sentiment classification are VADER and TextBlob. Both can assign a numeric score to a sentence without needing any labels to compare against. However, we still need to decide how to map these scores to categories such as Negative, Neutral, and Positive.

In this notebook, we perform a grid search to find the optimal split points for exactly this mapping.
Once we find these optimal splits for both VADER and TextBlob, we compare the best accuracy each library achieves.
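
To make the role of the split points concrete, here is a minimal sketch on made-up scores. The helper names `to_label` and `accuracy`, and the toy scores and targets, are illustrative assumptions and are not taken from the dataset used below. It only shows that the same scores can yield different accuracies depending on where the positive and negative cut-offs are placed, which is what the grid search exploits.

```python
# Illustrative only: toy scores and labels, not real VADER/TextBlob output.

def to_label(score, pos, neg):
    """Map a score to 1 (above pos), -1 (below neg), or 0 (in between)."""
    if score > pos:
        return 1
    if score < neg:
        return -1
    return 0

def accuracy(scores, targets, pos, neg):
    """Fraction of scores whose thresholded label matches the target."""
    preds = [to_label(s, pos, neg) for s in scores]
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

toy_scores = [0.8, 0.2, -0.6, 0.0, 0.4, -0.1]
toy_targets = [1, 0, -1, 0, 1, 0]

# Splitting at zero leaves almost no room for "neutral".
acc_tight = accuracy(toy_scores, toy_targets, pos=0.0, neg=0.0)

# A wider neutral band fits these toy labels better.
acc_wide = accuracy(toy_scores, toy_targets, pos=0.3, neg=-0.3)

print(acc_tight, acc_wide)
```

On this toy data the wider neutral band classifies more examples correctly; on real data the best band has to be searched for, as done in the cells below.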

In [None]:
# imports 
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
from sklearn.model_selection import train_test_split

from tqdm.notebook import tqdm
import textblob

import pandas as pd 
import numpy as np 
import os 

In [None]:
# generate sentiment scores using VADER and TextBlob
sia = SIA()

data = pd.read_csv("/kaggle/input/twitter-airline-sentiment/Tweets.csv")

# Balance the dataset by sampling 2363 tweets from each class.
# This is because the smallest class (Positive) has only 2363 samples.

g = data.groupby('airline_sentiment')
data = g.apply(lambda x: x.sample(g.size().min(), random_state=42).reset_index(drop=True))

vader = [sia.polarity_scores(x) for x in data['text']] 

blob_sentiments = [textblob.TextBlob(x).sentiment[0] for x in data['text']]
vader_sentiments = [x['compound'] for x in vader]
targets = data['airline_sentiment'].replace({ "negative": -1, "neutral": 0, "positive": 1})

In [None]:
# convert float to labels based on threshold
def get_sign(x, p, n):
    if x > p:
        return 1
    if x < n:
        return -1 
    return 0

In [None]:
# grid search for the (pos, neg) split points that maximise accuracy
def get_best_threshold(sentiments, targets):

    # candidate thresholds from -1 up to (but not including) 1, in steps of 0.05
    neg_thresh = np.arange(-1, 1, 0.05)
    pos_thresh = np.arange(-1, 1, 0.05)

    best_params = []
    best_acc = 0
    i = 0
    total = len(pos_thresh) * len(neg_thresh)

    for p in pos_thresh:
        for n in neg_thresh:
            i += 1
            print(f"Processing: {i/total*100:.2f}%", end="\r")

            params = (p, n)
            res = [get_sign(x, p, n) for x in sentiments]
            # targets is a pandas Series, so this comparison is element-wise
            acc = sum(res == targets) / len(targets)

            if acc > best_acc:
                best_acc = acc
                best_params = params

    return best_acc, best_params

In [None]:
# train-test-split to check for overfitting
(
    vader_train, 
    vader_test,  
    blob_train, 
    blob_test, 
    y_train, 
    y_test 
) = train_test_split(
    vader_sentiments, 
    blob_sentiments, 
    targets, 
    stratify=targets,
    test_size=0.1,
    random_state=42
)

In [None]:
# Best split points and max accuracy for TextBlob
best_acc, best_params = get_best_threshold(blob_train, y_train)
print("\nTextBlob Results: ")
print("\nBest Accuracy: ", best_acc)
print("Best (pos, neg) threshold values: ", best_params)

preds = [get_sign(x, *best_params) for x in blob_test]
val_acc = sum(preds == y_test)/len(y_test)
print("\nValid Accuracy for selected params: ", val_acc)

In [None]:
# Best split points and max accuracy for VADER
best_acc, best_params = get_best_threshold(vader_train, y_train)

print("\nVADER Results: ")
print("\nBest Accuracy: ", best_acc)
print("Best (pos, neg) threshold values: ", best_params)

preds = [get_sign(x, *best_params) for x in vader_test]
val_acc = sum(preds == y_test)/len(y_test)
print("\nValid Accuracy for selected params: ", val_acc)

# Conclusions:
1. VADER significantly outperforms TextBlob on Twitter sentiment.
2. Best accuracy when using VADER: ~62%

Best thresholds for splitting VADER tweet sentiments:
* sentiment['compound'] < -0.05 => Negative
* sentiment['compound'] > 0.35 => Positive
* -0.05 <= sentiment['compound'] <= 0.35 => Neutral
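
The rule above could be applied to new compound scores along these lines. This is a minimal sketch: the function name `classify_compound` and the string labels are assumptions made for illustration, and the default cut-offs are simply the values reported above for this dataset, so they may not transfer to other data.

```python
def classify_compound(score, pos=0.35, neg=-0.05):
    """Map a VADER compound score to a label using the thresholds found above.

    Scores strictly above `pos` are positive, scores strictly below `neg`
    are negative, and everything in between (boundaries included) is neutral.
    """
    if score > pos:
        return "positive"
    if score < neg:
        return "negative"
    return "neutral"

# Hypothetical compound scores, for illustration only
for s in (0.5, 0.35, 0.0, -0.05, -0.2):
    print(s, classify_compound(s))
```

In the notebook itself, `score` would be the `compound` entry of the dictionary returned by `sia.polarity_scores(text)`.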