<a href="https://colab.research.google.com/github/souparnabose99/Sentiment-Analysis-NLTK/blob/main/Vader_Sentiment_Analysis_NLTK.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install NLTK & VADER:

In [1]:
import nltk

In [2]:
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [4]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sent_int_analyzer = SentimentIntensityAnalyzer()
# SentimentIntensityAnalyzer.polarity_scores() takes a string and returns a dictionary of scores in four categories: neg, neu, pos, and compound

In [5]:
sample = "This is a good movie"
sent_int_analyzer.polarity_scores(sample)

{'compound': 0.4404, 'neg': 0.0, 'neu': 0.508, 'pos': 0.492}
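The compound value above can be reproduced by hand. This is a minimal sketch, assuming that "good" is the only sentiment-bearing word in the sample, that its lexicon valence is 1.9, and that VADER squashes the summed valence with `x / sqrt(x^2 + 15)`. The helper name `normalize` and these constants are recalled from the reference implementation, so treat them as assumptions rather than a guaranteed API.

```python
import math

def normalize(total_valence, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total_valence / math.sqrt(total_valence ** 2 + alpha)

# Assumed lexicon valence for "good" (not read from the lexicon file here)
good_valence = 1.9

compound = round(normalize(good_valence), 4)
print(compound)  # 0.4404
```

Under these assumptions the result agrees with the `compound` entry returned by `polarity_scores` for "This is a good movie".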

In [6]:
sample = "This is a really awesome movie"
sent_int_analyzer.polarity_scores(sample)

{'compound': 0.659, 'neg': 0.0, 'neu': 0.477, 'pos': 0.523}

In [7]:
sample = "This is the best, most awesome movie EVER MADE!!"
sent_int_analyzer.polarity_scores(sample)

{'compound': 0.88, 'neg': 0.0, 'neu': 0.433, 'pos': 0.567}

In [8]:
sample = "This is the WORST movie EVER MADE!!"
sent_int_analyzer.polarity_scores(sample)

{'compound': -0.7519, 'neg': 0.474, 'neu': 0.526, 'pos': 0.0}
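The last three samples show the score moving with intensifiers ("really", "most"), capitalisation and exclamation marks. The sketch below tries to reproduce those compound values by hand. All constants and lexicon valences are assumptions recalled from the VADER reference implementation (they are not read from the installed package), and the `compound` helper is hypothetical.

```python
import math

# Constants as recalled from the VADER reference implementation (assumptions)
BOOSTER_INCR = 0.293  # intensifier such as "really" or "most" directly before the word
CAPS_INCR = 0.733     # sentiment word in ALL CAPS inside a mixed-case sentence
EXCLAIM_INCR = 0.292  # added per "!", with at most four marks counted
ALPHA = 15            # normalization constant

def compound(valences, exclamations=0):
    """Sum adjusted word valences, apply punctuation emphasis, then normalize."""
    total = sum(valences)
    bump = min(exclamations, 4) * EXCLAIM_INCR
    if total > 0:
        total += bump
    elif total < 0:
        total -= bump
    return round(total / math.sqrt(total * total + ALPHA), 4)

# Assumed lexicon valences: awesome = 3.1, best = 3.2, worst = -3.1
really_awesome = compound([3.1 + BOOSTER_INCR])
best_most_awesome = compound([3.2, 3.1 + BOOSTER_INCR], exclamations=2)
worst_caps = compound([-3.1 - CAPS_INCR], exclamations=2)

print(really_awesome, best_most_awesome, worst_caps)  # 0.659 0.88 -0.7519
```

Under these assumptions the three values agree with the outputs of cells 6 to 8, which suggests the differences between the samples come from the booster word, the capitalised "WORST" and the two exclamation marks.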

### Load Dataset:

In [9]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import requests

In [13]:
!wget https://raw.githubusercontent.com/souparnabose99/Sentiment-Analysis-NLTK/main/amazonreviews.tsv

--2021-06-26 05:17:55--  https://raw.githubusercontent.com/souparnabose99/Sentiment-Analysis-NLTK/main/amazonreviews.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4458101 (4.3M) [text/plain]
Saving to: ‘amazonreviews.tsv.1’


2021-06-26 05:17:55 (51.3 MB/s) - ‘amazonreviews.tsv.1’ saved [4458101/4458101]



In [14]:
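pd.set_option('display.max_columns', None)

# wget saved this download as 'amazonreviews.tsv.1' because a file named 'amazonreviews.tsv' already existed
df = pd.read_csv('amazonreviews.tsv.1', sep='\t')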
df.head(10)

index,label,review
0,pos,Stuning even for the non-gamer: This sound tra...
1,pos,The best soundtrack ever to anything.: I'm rea...
2,pos,Amazing!: This soundtrack is my favorite music...
3,pos,Excellent Soundtrack: I truly like this soundt...
4,pos,"Remember, Pull Your Jaw Off The Floor After He..."
5,pos,an absolute masterpiece: I am quite sure any o...
6,neg,"Buyer beware: This is a self-published book, a..."
7,pos,Glorious story: I loved Whisper of the wicked ...
8,pos,A FIVE STAR BOOK: I just finished reading Whis...
9,pos,Whispers of the Wicked Saints: This was a easy...


### Review counts:

In [16]:
df['label'].value_counts()

neg    5097
pos    4903
Name: label, dtype: int64

### Check missing values:

In [17]:
df.isnull().sum()

label     0
review    0
dtype: int64

### Checking for whitespace-only reviews:

In [21]:
blanks = []

# Collect the index of every review that consists only of whitespace
for ind, lab, rev in df.itertuples():
    if isinstance(rev, str) and rev.isspace():
        blanks.append(ind)

In [22]:
blanks

[]
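The same check can be written without an explicit loop. This is a sketch on a small made-up frame (`toy` and its rows are hypothetical, not taken from the dataset), using the pandas string accessor `Series.str.isspace()`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the review data
toy = pd.DataFrame({
    "label": ["pos", "neg", "pos"],
    "review": ["Great read", "   ", ""],
})

# True where the review contains only whitespace characters
is_blank = toy["review"].str.isspace()

# Index values of the whitespace-only rows
blank_idx = toy.loc[is_blank].index.tolist()
print(blank_idx)  # [1]
```

Like the loop, this does not flag the empty string in the last row, because `"".isspace()` is `False`. Comparing `toy["review"].str.strip()` with `""` would catch both cases.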

### Checking sentiment score for a single review:

In [23]:
df.iloc[0]['review']

'Stuning even for the non-gamer: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^'

In [24]:
sent_int_analyzer.polarity_scores(df.iloc[0]['review'])

{'compound': 0.9454, 'neg': 0.088, 'neu': 0.669, 'pos': 0.243}

### Adding scores to dataframe:

In [26]:
df['scores'] = df['review'].apply(lambda review: sent_int_analyzer.polarity_scores(review))

df.head(10)

index,label,review,scores
0,pos,Stuning even for the non-gamer: This sound tra...,"{'neg': 0.088, 'neu': 0.669, 'pos': 0.243, 'co..."
1,pos,The best soundtrack ever to anything.: I'm rea...,"{'neg': 0.018, 'neu': 0.837, 'pos': 0.145, 'co..."
2,pos,Amazing!: This soundtrack is my favorite music...,"{'neg': 0.04, 'neu': 0.692, 'pos': 0.268, 'com..."
3,pos,Excellent Soundtrack: I truly like this soundt...,"{'neg': 0.09, 'neu': 0.615, 'pos': 0.295, 'com..."
4,pos,"Remember, Pull Your Jaw Off The Floor After He...","{'neg': 0.0, 'neu': 0.746, 'pos': 0.254, 'comp..."
5,pos,an absolute masterpiece: I am quite sure any o...,"{'neg': 0.014, 'neu': 0.737, 'pos': 0.249, 'co..."
6,neg,"Buyer beware: This is a self-published book, a...","{'neg': 0.124, 'neu': 0.806, 'pos': 0.069, 'co..."
7,pos,Glorious story: I loved Whisper of the wicked ...,"{'neg': 0.064, 'neu': 0.588, 'pos': 0.349, 'co..."
8,pos,A FIVE STAR BOOK: I just finished reading Whis...,"{'neg': 0.113, 'neu': 0.712, 'pos': 0.174, 'co..."
9,pos,Whispers of the Wicked Saints: This was a easy...,"{'neg': 0.033, 'neu': 0.777, 'pos': 0.19, 'com..."


In [28]:
df['compound'] = df['scores'].apply(lambda x: x.get('compound'))
df.head(10)

index,label,review,scores,compound
0,pos,Stuning even for the non-gamer: This sound tra...,"{'neg': 0.088, 'neu': 0.669, 'pos': 0.243, 'co...",0.9454
1,pos,The best soundtrack ever to anything.: I'm rea...,"{'neg': 0.018, 'neu': 0.837, 'pos': 0.145, 'co...",0.8957
2,pos,Amazing!: This soundtrack is my favorite music...,"{'neg': 0.04, 'neu': 0.692, 'pos': 0.268, 'com...",0.9858
3,pos,Excellent Soundtrack: I truly like this soundt...,"{'neg': 0.09, 'neu': 0.615, 'pos': 0.295, 'com...",0.9814
4,pos,"Remember, Pull Your Jaw Off The Floor After He...","{'neg': 0.0, 'neu': 0.746, 'pos': 0.254, 'comp...",0.9781
5,pos,an absolute masterpiece: I am quite sure any o...,"{'neg': 0.014, 'neu': 0.737, 'pos': 0.249, 'co...",0.99
6,neg,"Buyer beware: This is a self-published book, a...","{'neg': 0.124, 'neu': 0.806, 'pos': 0.069, 'co...",-0.8744
7,pos,Glorious story: I loved Whisper of the wicked ...,"{'neg': 0.064, 'neu': 0.588, 'pos': 0.349, 'co...",0.9908
8,pos,A FIVE STAR BOOK: I just finished reading Whis...,"{'neg': 0.113, 'neu': 0.712, 'pos': 0.174, 'co...",0.8353
9,pos,Whispers of the Wicked Saints: This was a easy...,"{'neg': 0.033, 'neu': 0.777, 'pos': 0.19, 'com...",0.8196


In [29]:
# A compound score of exactly 0 is labelled 'pos' here
df['compound_label'] = df['compound'].apply(lambda score: 'pos' if score >= 0 else 'neg')
df.head()

index,label,review,scores,compound,compound_label
0,pos,Stuning even for the non-gamer: This sound tra...,"{'neg': 0.088, 'neu': 0.669, 'pos': 0.243, 'co...",0.9454,pos
1,pos,The best soundtrack ever to anything.: I'm rea...,"{'neg': 0.018, 'neu': 0.837, 'pos': 0.145, 'co...",0.8957,pos
2,pos,Amazing!: This soundtrack is my favorite music...,"{'neg': 0.04, 'neu': 0.692, 'pos': 0.268, 'com...",0.9858,pos
3,pos,Excellent Soundtrack: I truly like this soundt...,"{'neg': 0.09, 'neu': 0.615, 'pos': 0.295, 'com...",0.9814,pos
4,pos,"Remember, Pull Your Jaw Off The Floor After He...","{'neg': 0.0, 'neu': 0.746, 'pos': 0.254, 'comp...",0.9781,pos
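The labelling above uses a single cut at 0 because the dataset only has `pos` and `neg` labels. A common alternative, sketched here, leaves a neutral band around zero. The ±0.05 cut-offs are the convention usually cited in the VADER project's documentation, and `label_from_compound` is a hypothetical helper, not something this notebook defines.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to 'pos', 'neg' or 'neu' using a symmetric neutral band."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"

# Two compound scores from the table above, plus two near-zero values
for score in (0.9454, -0.8744, 0.0, 0.03):
    print(score, label_from_compound(score))
```

With a three-way scheme the `neu` rows would need to be dropped or mapped to one side before comparing against the two-class ground truth.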


### Comparing compound_labels with original labels:

In [30]:
df['label'].value_counts()

neg    5097
pos    4903
Name: label, dtype: int64

In [31]:
df['compound_label'].value_counts()

pos    6942
neg    3058
Name: compound_label, dtype: int64

In [32]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

accuracy_score(df['label'], df['compound_label'])

0.7091

In [34]:
print(classification_report(df['label'], df['compound_label']))

              precision    recall  f1-score   support

         neg       0.86      0.51      0.64      5097
         pos       0.64      0.91      0.75      4903

    accuracy                           0.71     10000
   macro avg       0.75      0.71      0.70     10000
weighted avg       0.75      0.71      0.70     10000



In [35]:
print(confusion_matrix(df['label'], df['compound_label']))

[[2623 2474]
 [ 435 4468]]


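VADER performs well on positive reviews (recall 0.91) but poorly on negative ones (recall 0.51): 2,474 of the 5,097 negative reviews are labelled positive, which pulls overall accuracy down to 0.71.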
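The figures in the classification report can be recomputed from the confusion matrix counts printed above, where rows are the true labels and columns the predicted labels, both in the order `neg`, `pos`. The variable names below are illustrative only.

```python
# Counts copied from the confusion matrix output
neg_as_neg, neg_as_pos = 2623, 2474  # true 'neg' row
pos_as_neg, pos_as_pos = 435, 4468   # true 'pos' row

total = neg_as_neg + neg_as_pos + pos_as_neg + pos_as_pos

# Share of all reviews whose predicted label matches the true label
accuracy = (neg_as_neg + pos_as_pos) / total

# Recall: share of each true class that was recovered
neg_recall = neg_as_neg / (neg_as_neg + neg_as_pos)
pos_recall = pos_as_pos / (pos_as_neg + pos_as_pos)

# Precision: share of each predicted class that was correct
neg_precision = neg_as_neg / (neg_as_neg + pos_as_neg)
pos_precision = pos_as_pos / (neg_as_pos + pos_as_pos)

print(f"accuracy {accuracy:.4f}")                              # accuracy 0.7091
print(f"neg precision {neg_precision:.2f} recall {neg_recall:.2f}")  # 0.86, 0.51
print(f"pos precision {pos_precision:.2f} recall {pos_recall:.2f}")  # 0.64, 0.91
```

Nearly half of the true negatives land in the `pos` column, which is what drives the low `neg` recall and the low `pos` precision.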