# Vader Sentiment Analysis

**Vader** is an excellent library for getting rapid sentiment analysis results, particularly for *social media* text. Its main **advantages** are:

* No labeling process is required!
* Fast and easy to deploy.
* Reasonable accuracy even without text preprocessing.

However, there are some **disadvantages** as well. The primary one is that it is a rule-based approach: it takes the predefined polarity score of each word (and emoji!) and sums them up to get the final score of the sentence or paragraph, depending on the unit of text whose sentiment we want to extract.
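To make the rule-based idea concrete, here is a minimal sketch of lexicon scoring. The three-word lexicon and its valence values are hypothetical, made up for illustration; the real Vader lexicon is far larger, and the library also applies heuristics (negation, intensifiers, capitalization, punctuation) that are left out here. The final step follows the form of normalization Vader uses to squash the summed valence into the range (-1, 1), with `alpha = 15` as its default.

```python
import math

# Hypothetical toy lexicon: the words and valence values are illustrative only.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5}


def toy_compound(text, alpha=15):
    """Sum the valence of known words, then squash the sum into (-1, 1)."""
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return total / math.sqrt(total * total + alpha)


for sentence in ["good movie", "bad movie", "great movie but bad ending"]:
    print(f"{sentence!r}: {toy_compound(sentence):+.3f}")
```

Words missing from the lexicon contribute nothing, which is why a rule-based scorer needs no training data but also cannot learn new vocabulary on its own.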

Another disadvantage that I have discovered thus far, connected to the first one, is that we cannot go beyond a certain accuracy compared to learned NLP approaches. I usually prefer training an NLP model (such as BERT) to attain higher success rates. In a future notebook, I intend to compare these results with a BERT model.

* Rule-Based sentiment analysis & no learning.

In [None]:
!pip install vaderSentiment

In [None]:
import numpy as np 
import pandas as pd 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import time
import os
for dirname, _, filenames in os.walk('/kaggle/input/tweet-sentiment-extraction/'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

We will be using the "Tweet Sentiment Extraction" data from Kaggle, in particular the "text" and "sentiment" columns.

In [None]:
data = pd.read_csv('/kaggle/input/tweet-sentiment-extraction/train.csv')

In [None]:
data.shape

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.info()

In [None]:
data.isnull().sum()

In [None]:
data.dropna(inplace=True)

In [None]:
data.info()

Initialize the sentiment analyzer and calculate the sentiment score of each tweet in the "text" column:

In [None]:
analyzer = SentimentIntensityAnalyzer()

In [None]:
def calculate_sentiment_scores(sentence):
    # 'compound' is the overall score, normalized to the range [-1, 1]
    return analyzer.polarity_scores(sentence)['compound']

In [None]:
start = time.time()

# Score every tweet in the "text" column
eng_snt_score = [calculate_sentiment_scores(comment) for comment in data.text.to_list()]

end = time.time()

# total time taken
print(f"Runtime of the program is {(end - start)/60:.3f} minutes or {(end - start):.3f} seconds")

In [None]:
data['sentiment_score'] = np.array(eng_snt_score)
data.head()

In [None]:
def score_to_label(score):
    # Thresholds of +/-0.05 on the compound score
    if score >= 0.05:
        return 'positive'
    if score <= -0.05:
        return 'negative'
    return 'neutral'

# One label per row, in the same order as the DataFrame
vader_sentiment = [score_to_label(score) for score in data['sentiment_score']]
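The same thresholding can also be written without an explicit loop. Below is a sketch using `np.select` on a small, hypothetical array of compound scores (the values are made up to hit the boundaries); the ±0.05 cut-offs are the ones used above.

```python
import numpy as np

# Hypothetical compound scores, for illustration only.
scores = np.array([0.62, 0.05, 0.0, -0.049, -0.05, -0.8])

# Conditions are checked in order; anything matching neither is 'neutral'.
labels = np.select(
    [scores >= 0.05, scores <= -0.05],
    ["positive", "negative"],
    default="neutral",
)
print(labels.tolist())
```

On the notebook's data, `scores` would presumably be `data['sentiment_score']`, and the result could be assigned directly to a new column.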

In [None]:
data['vader_sentiment_labels'] = vader_sentiment

In [None]:
data.head(15)

In [None]:
data['actual_label'] = data['sentiment'].map({'positive': 1, 'neutral': 0, 'negative':-1})
data['predicted_label'] = data['vader_sentiment_labels'].map({'positive': 1, 'neutral': 0, 'negative':-1})

data.head()

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
y_act = data['actual_label'].values
y_pred = data['predicted_label'].values

In [None]:
accuracy_score(y_act, y_pred)
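A single accuracy number does not show which classes are being confused. As a sketch of what the metric measures, and of how a per-class breakdown could look, the snippet below computes accuracy and per-class recall by hand. The labels are hypothetical toy values, not results from this dataset, and the names `y_true` and `y_hat` are introduced only for this illustration.

```python
from collections import Counter

# Hypothetical labels for illustration: 1 = positive, 0 = neutral, -1 = negative.
y_true = [1, 1, 0, 0, 0, -1, -1, 1]
y_hat = [1, 0, 0, 1, 0, -1, 0, 1]

# Accuracy is the share of exact matches.
accuracy = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

# Count (actual, predicted) pairs to see where the errors fall.
confusion = Counter(zip(y_true, y_hat))

# Recall per class: correct predictions divided by actual occurrences.
support = Counter(y_true)
recall = {label: confusion[(label, label)] / support[label] for label in support}

print(f"accuracy = {accuracy:.3f}")
print("recall per class:", recall)
```

Applied to `y_act` and `y_pred`, the same breakdown would show whether the errors concentrate in one class, for example neutral tweets being scored as positive.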

**64% accuracy** is not bad for **classifying the sentiment of 27481 tweets in about 3 seconds**! Moreover, we did not apply any text preprocessing, so this accuracy might be improved with proper preprocessing. The main advantage is that no labeling process is involved; however, for higher accuracy we would prefer a trained NLP model.
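As one possible starting point for preprocessing, here is a minimal sketch that strips URLs and @mentions from a tweet. The function name `light_clean` and the example tweet are hypothetical, and whether this step actually improves the accuracy above is untested. It deliberately leaves capitalization, punctuation, and emoticons alone, since Vader's rules make use of them.

```python
import re


def light_clean(tweet):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    tweet = re.sub(r"https?://\S+", " ", tweet)   # drop links
    tweet = re.sub(r"@\w+", " ", tweet)           # drop user mentions
    return re.sub(r"\s+", " ", tweet).strip()


print(light_clean("@user  I LOVE this!!! http://example.com :)"))
```

In the notebook this could be applied with `data['text'].apply(light_clean)` before scoring, and the accuracy compared against the unprocessed run.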