# Vader Sentiment Analysis

**Vader** is an excellent library for getting rapid sentiment analysis results, particularly for *social media* text. Its main **advantages** are:

* No labeling process is required.
* Fast and easy to deploy.
* Reasonable accuracy even without text preprocessing.

However, there are some **disadvantages** as well. The primary one is that it is a rule-based approach: it looks up a predefined polarity score for each word (and emoji!) and combines these scores into a final score for the sentence or paragraph, depending on the unit of text whose sentiment we want to extract.

Another disadvantage that I have discovered thus far, related to the first one, is that we cannot go beyond a certain accuracy compared to trained models. I usually prefer training a model (such as BERT) to attain higher accuracy. In a future notebook, I intend to compare these results with a BERT model.

* Rule-based sentiment analysis with no learning.
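
To make the rule-based idea concrete, here is a minimal sketch of lexicon scoring. The tiny `TOY_LEXICON` and its valence values are made up for illustration and are not Vader's actual lexicon, and the sketch skips the extra rules Vader applies for negation, intensifiers, capitalization and punctuation. The `normalize` step, `x / sqrt(x^2 + 15)`, mirrors the way Vader squashes the summed valence into its compound score between -1 and 1.

```python
import math

# Hypothetical valence values, for illustration only (not Vader's real lexicon).
TOY_LEXICON = {"good": 2.0, "great": 3.0, "sad": -2.0, "awful": -3.0}


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(sentence):
    """Sum the valence of known words, then normalize the total."""
    words = sentence.lower().split()
    total = sum(TOY_LEXICON.get(word, 0.0) for word in words)
    return normalize(total)


print(toy_compound("what a great day"))        # positive score
print(toy_compound("sad and awful news"))      # negative score
print(toy_compound("see you at the station"))  # 0.0, no lexicon words
```

Because every score comes from a fixed lookup table, nothing is learned from the data: a word missing from the lexicon simply contributes zero.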

In [1]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [2]:
import numpy as np 
import pandas as pd 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import time
import os
for dirname, _, filenames in os.walk('/kaggle/input/tweet-sentiment-extraction/'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/tweet-sentiment-extraction/train.csv
/kaggle/input/tweet-sentiment-extraction/test.csv
/kaggle/input/tweet-sentiment-extraction/sample_submission.csv


We will be using the "Tweet Sentiment Extraction" data from Kaggle, in particular the "text" and "sentiment" columns.

In [3]:
data = pd.read_csv('/kaggle/input/tweet-sentiment-extraction/train.csv')

In [4]:
data.shape

(27481, 4)

In [5]:
data.head()

Unnamed: 0,textID,text,selected_text,sentiment
0,cb774db0d1,"I`d have responded, if I were going","I`d have responded, if I were going",neutral
1,549e992a42,Sooo SAD I will miss you here in San Diego!!!,Sooo SAD,negative
2,088c60f138,my boss is bullying me...,bullying me,negative
3,9642c003ef,what interview! leave me alone,leave me alone,negative
4,358bd9e861,"Sons of ****, why couldn`t they put them on t...","Sons of ****,",negative


In [6]:
data.tail()

Unnamed: 0,textID,text,selected_text,sentiment
27476,4eac33d1c0,wish we could come see u on Denver husband l...,d lost,negative
27477,4f4c4fc327,I`ve wondered about rake to. The client has ...,", don`t force",negative
27478,f67aae2310,Yay good for both of you. Enjoy the break - y...,Yay good for both of you.,positive
27479,ed167662a5,But it was worth it ****.,But it was worth it ****.,positive
27480,6f7127d9d7,All this flirting going on - The ATG smiles...,All this flirting going on - The ATG smiles. Y...,neutral


In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27481 entries, 0 to 27480
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   textID         27481 non-null  object
 1   text           27480 non-null  object
 2   selected_text  27480 non-null  object
 3   sentiment      27481 non-null  object
dtypes: object(4)
memory usage: 858.9+ KB


In [8]:
data.isnull().sum()

textID           0
text             1
selected_text    1
sentiment        0
dtype: int64

In [9]:
data.dropna(inplace=True)

In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27480 entries, 0 to 27480
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   textID         27480 non-null  object
 1   text           27480 non-null  object
 2   selected_text  27480 non-null  object
 3   sentiment      27480 non-null  object
dtypes: object(4)
memory usage: 1.0+ MB


Initialize the sentiment analyzer and calculate the sentiment score of each tweet in the "text" column. We keep only the `compound` score, a single normalized value between -1 and 1:

In [11]:
analyzer = SentimentIntensityAnalyzer()

In [12]:
def calculate_sentiment_scores(sentence):
    # Keep only the compound score, a normalized value in [-1, 1]
    sntmnt = analyzer.polarity_scores(sentence)['compound']
    return sntmnt

In [13]:
start = time.time()

eng_snt_score =  []

for comment in data.text.to_list():
    snts_score = calculate_sentiment_scores(comment)
    eng_snt_score.append(snts_score)
    
end = time.time()

# total time taken
print(f"Runtime of the program is {(end - start)/60} minutes or {(end - start)} seconds")

Runtime of the program is 0.06339306831359863 minutes or 3.803584098815918 seconds


In [14]:
data['sentiment_score'] = np.array(eng_snt_score)
data.head()

Unnamed: 0,textID,text,selected_text,sentiment,sentiment_score
0,cb774db0d1,"I`d have responded, if I were going","I`d have responded, if I were going",neutral,0.0
1,549e992a42,Sooo SAD I will miss you here in San Diego!!!,Sooo SAD,negative,-0.7437
2,088c60f138,my boss is bullying me...,bullying me,negative,-0.5994
3,9642c003ef,what interview! leave me alone,leave me alone,negative,-0.3595
4,358bd9e861,"Sons of ****, why couldn`t they put them on t...","Sons of ****,",negative,0.0


In [15]:
vader_sentiment = []

# Thresholds: >= 0.05 is positive, <= -0.05 is negative, anything in between is neutral
for score in data['sentiment_score']:
    if score >= 0.05:
        vader_sentiment.append('positive')
    elif score <= -0.05:
        vader_sentiment.append('negative')
    else:
        vader_sentiment.append('neutral')

In [16]:
data['vader_sentiment_labels'] = vader_sentiment

In [17]:
data.head(15)

Unnamed: 0,textID,text,selected_text,sentiment,sentiment_score,vader_sentiment_labels
0,cb774db0d1,"I`d have responded, if I were going","I`d have responded, if I were going",neutral,0.0,neutral
1,549e992a42,Sooo SAD I will miss you here in San Diego!!!,Sooo SAD,negative,-0.7437,negative
2,088c60f138,my boss is bullying me...,bullying me,negative,-0.5994,negative
3,9642c003ef,what interview! leave me alone,leave me alone,negative,-0.3595,negative
4,358bd9e861,"Sons of ****, why couldn`t they put them on t...","Sons of ****,",negative,0.0,neutral
5,28b57f3990,http://www.dothebouncy.com/smf - some shameles...,http://www.dothebouncy.com/smf - some shameles...,neutral,0.4215,positive
6,6e0c6d75b1,2am feedings for the baby are fun when he is a...,fun,positive,0.7506,positive
7,50e14c0bb8,Soooo high,Soooo high,neutral,0.0,neutral
8,e050245fbd,Both of you,Both of you,neutral,0.0,neutral
9,fc2cbefa9d,Journey!? Wow... u just became cooler. hehe....,Wow... u just became cooler.,positive,0.695,positive


In [18]:
data['actual_label'] = data['sentiment'].map({'positive': 1, 'neutral': 0, 'negative':-1})
data['predicted_label'] = data['vader_sentiment_labels'].map({'positive': 1, 'neutral': 0, 'negative':-1})

data.head()

Unnamed: 0,textID,text,selected_text,sentiment,sentiment_score,vader_sentiment_labels,actual_label,predicted_label
0,cb774db0d1,"I`d have responded, if I were going","I`d have responded, if I were going",neutral,0.0,neutral,0,0
1,549e992a42,Sooo SAD I will miss you here in San Diego!!!,Sooo SAD,negative,-0.7437,negative,-1,-1
2,088c60f138,my boss is bullying me...,bullying me,negative,-0.5994,negative,-1,-1
3,9642c003ef,what interview! leave me alone,leave me alone,negative,-0.3595,negative,-1,-1
4,358bd9e861,"Sons of ****, why couldn`t they put them on t...","Sons of ****,",negative,0.0,neutral,-1,0


In [19]:
from sklearn.metrics import accuracy_score

In [20]:
y_act = data['actual_label'].values
y_pred = data['predicted_label'].values

In [21]:
accuracy_score(y_act, y_pred)

0.6362809315866085
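
A single accuracy figure hides which classes get confused with each other. The sketch below counts (actual, predicted) pairs in plain Python. The short `actual` and `predicted` lists are made-up stand-ins for `y_act` and `y_pred`, not results from this notebook, and the helper names are hypothetical.

```python
from collections import Counter

# Made-up labels for illustration: 1 = positive, 0 = neutral, -1 = negative.
actual = [1, 0, -1, -1, 0, 1]
predicted = [1, 1, -1, 0, 0, 1]


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


counts = confusion_counts(actual, predicted)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f"actual={true_label:>2} predicted={pred_label:>2} count={n}")

print(f"accuracy = {accuracy(actual, predicted):.2f}")
```

On the real data, `sklearn.metrics.confusion_matrix(y_act, y_pred)` should give the same breakdown in matrix form.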

**64% accuracy** is not bad for **classifying the sentiment of 27,480 tweets in under 4 seconds**! Moreover, we did not apply any text preprocessing, so this accuracy might be improved with proper preprocessing. The main advantage is that no labeling process is involved; for higher accuracy, however, we would prefer a trained model such as BERT.
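
As one possible starting point for the preprocessing mentioned above, the sketch below applies a few light clean-up steps with regular expressions. The function name `clean_tweet` and the choice of steps are assumptions, not something evaluated in this notebook: it replaces the backtick that this dataset uses in place of an apostrophe (as in "couldn`t"), removes URLs and @mentions, and collapses whitespace. Capitalization, punctuation and emojis are deliberately left alone, since Vader's rules take them into account. Whether this actually improves the accuracy here is untested.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(text):
    """Apply light clean-up to a single tweet before scoring it."""
    text = text.replace("`", "'")          # dataset uses ` as an apostrophe
    text = URL_PATTERN.sub(" ", text)      # drop links
    text = MENTION_PATTERN.sub(" ", text)  # drop @mentions
    return " ".join(text.split())          # collapse repeated whitespace


print(clean_tweet("I`d have responded, if I were going"))
print(clean_tweet("@friend check http://example.com  Sooo SAD!!!"))
```

The cleaned text could then be passed to `calculate_sentiment_scores` in place of the raw tweet, and the accuracy recomputed to see whether the change helps.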