# TextBlob and VADER Sentiment Analyzers to Classify Movie Reviews

This project classifies movie reviews by user sentiment using two popular natural language processing tools: VADER (Valence Aware Dictionary and sEntiment Reasoner) and TextBlob. Both are rule-based, lexicon-driven sentiment analyzers designed to interpret the polarity of text data. VADER is particularly effective for analyzing sentiment expressed in social media and short texts, while TextBlob provides polarity and subjectivity scores. The objective is to compare how accurately each tool determines whether a movie review is positive or negative, using the binary labels in the dataset as ground truth.



##  Importing Libraries:

In [1]:
import pandas as pd
from textblob import TextBlob
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

##  Loading data

In [2]:
# Path to the TSV file
file_path = r"C:\Users\sanas\Desktop\master\DSC550 Datasets\labeledTrainData.tsv\labeledTrainData.tsv"

# Read the TSV file into a pandas DataFrame
data = pd.read_csv(file_path, sep='\t')

# Display the first few rows of the DataFrame to check if it loaded properly
print(data.head())

       id  sentiment                                             review
0  5814_8          1  With all this stuff going down at the moment w...
1  2381_9          1  \The Classic War of the Worlds\" by Timothy Hi...
2  7759_3          0  The film starts with a manager (Nicholas Bell)...
3  3630_4          0  It must be assumed that those who praised this...
4  9495_8          1  Superbly trashy and wondrously unpretentious 8...


- How many positive and negative reviews are there?


In [4]:
# count the number of positive and negative reviews
count_reviews = data['sentiment'].value_counts()
# display the number of positive and negative reviews
print("Number of Positive Reviews: ", count_reviews.get(1,0))
print("Number of Negative Reviews: ", count_reviews.get(0,0))

Number of Positive Reviews:  12500
Number of Negative Reviews:  12500


### - Using TextBlob library to classify movie reviews


In [5]:
pip show textblob

Name: textblob
Version: 0.18.0.post0
Summary: Simple, Pythonic text processing. Sentiment analysis, part-of-speech tagging, noun phrase parsing, and more.
Home-page: 
Author: 
Author-email: Steven Loria <sloria1@gmail.com>
License: 
Location: C:\Users\sanas\anaconda3\Lib\site-packages
Requires: nltk
Required-by: 
Note: you may need to restart the kernel to use updated packages.




In [6]:
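# Create a function to classify the review using TextBlob.
def classify_sentiment(review):
    # polarity ranges from -1 (most negative) to 1 (most positive)
    polarity = TextBlob(review).sentiment.polarity
    # a polarity of exactly 0 (neutral) is counted as positive
    return 1 if polarity >= 0 else 0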
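
The rule above places the decision boundary at a polarity of 0, so a review with no detectable sentiment is labeled positive. The sketch below isolates that thresholding step so the effect of moving the boundary is visible. The function name `label_from_polarity`, the `threshold` parameter, and the sample polarity values are all assumptions for illustration; no TextBlob call is made.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Return 1 (positive) when polarity >= threshold, else 0 (negative)."""
    return 1 if polarity >= threshold else 0

# Made-up polarity values, for illustration only
for p in (-0.4, 0.0, 0.02, 0.3):
    default_label = label_from_polarity(p)
    stricter_label = label_from_polarity(p, threshold=0.05)
    print(p, default_label, stricter_label)
```

Whether a stricter threshold helps on this dataset is not something the notebook measures; it would need to be checked against the labeled reviews.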

In [7]:
# Apply the function to the review column 
data['predicted_sentiment'] = data['review'].apply(classify_sentiment)

In [8]:
# Display the columns (id, sentiment, predicted_sentiment)
data[['id', 'sentiment', 'predicted_sentiment']]

            id  sentiment  predicted_sentiment
0       5814_8          1                    1
1       2381_9          1                    1
2       7759_3          0                    0
3       3630_4          0                    1
4       9495_8          1                    0
...        ...        ...                  ...
24995   3453_3          0                    1
24996   5064_1          0                    1
24997  10905_3          0                    1
24998  10194_3          0                    1


- Now, let's check the accuracy of the TextBlob sentiment analyzer and compare it to random guessing.

In [14]:
# calculate the number of correct predictions
correct_predictions = (data['sentiment']==data['predicted_sentiment']).sum()
correct_predictions

17131

In [12]:
# count the total number of reviews
total_number_reviews = len(data)
total_number_reviews

25000

In [15]:
# Calculate the accuracy 
textblob_accuracy = correct_predictions/total_number_reviews
textblob_accuracy

0.68524

In [12]:
print(f'TextBlob Accuracy: {textblob_accuracy*100:.2f}%')

TextBlob Accuracy: 68.52%


In [13]:
# Compare the model accuracy to random guessing (assumed to be 50%, because the two classes are balanced)
if textblob_accuracy > 0.5:
    print('Model performs better than random guessing')
else:
    print('Model does not perform better than random guessing')

Model performs better than random guessing
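
Accuracy alone does not show which kind of error dominates, and several of the rows displayed earlier are negative reviews that were predicted as positive. One way to look at this is to count each (actual, predicted) pair. The sketch below does so in plain Python on toy labels; the helper name `confusion_counts` and the sample lists are assumptions for illustration, and in the notebook the inputs would be `data['sentiment']` and `data['predicted_sentiment']`.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs for binary labels 0/1."""
    pairs = Counter(zip(actual, predicted))
    return {
        "true_positive": pairs[(1, 1)],
        "false_positive": pairs[(0, 1)],
        "true_negative": pairs[(0, 0)],
        "false_negative": pairs[(1, 0)],
    }

# Toy labels, for illustration only
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1]
result = confusion_counts(actual, predicted)
print(result)
```

A large false-positive count relative to false negatives would suggest the `>= 0` threshold is too lenient for negative reviews.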


### - Using VADER text sentiment analyzer to classify movie reviews

In [16]:
# download vader_lexicon package

nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\sanas\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [17]:
# Initialize VADER sentiment analyser
sia = SentimentIntensityAnalyzer()


In [18]:
# Create a function to classify the sentiment using VADER
def vader_sentiment(review):
    score = sia.polarity_scores(review)['compound']  # get the compound score from VADER
    return 1 if score >= 0 else 0
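
The function above splits the compound score at 0 because the dataset has only two labels. The VADER documentation instead suggests a three-way reading: compound scores of 0.05 or more are positive, -0.05 or less are negative, and anything in between is neutral. A minimal sketch of that convention follows; the function name `vader_label` and the `cutoff` parameter are assumptions for illustration, and the scores passed in are made up rather than produced by VADER.

```python
def vader_label(compound, cutoff=0.05):
    """Map a compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"

# Made-up compound scores, for illustration only
for score in (-0.62, -0.01, 0.0, 0.04, 0.81):
    print(score, vader_label(score))
```

Since the reviews here carry no neutral label, a three-way output cannot be scored directly against `data['sentiment']` without deciding how to treat the neutral band.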

In [19]:
# Apply the VADER sentiment classification to the 'review' column:
data['vader_predicted_sentiment']= data['review'].apply(vader_sentiment)

In [20]:
# display the first few rows with vader_predicted_sentiment
data[['id', 'sentiment', 'vader_predicted_sentiment']].head()

       id  sentiment  vader_predicted_sentiment
0  5814_8          1                          0
1  2381_9          1                          1
2  7759_3          0                          0
3  3630_4          0                          0
4  9495_8          1                          1


In [21]:
# calculate the vader correct predictions
vader_correct_predictions = (data['sentiment']==data['vader_predicted_sentiment']).sum()
vader_correct_predictions

17339

In [22]:
# Calculate the accuracy of the vader model:
vader_accuracy = vader_correct_predictions/total_number_reviews
vader_accuracy

0.69356

In [23]:
print(f'VADER Accuracy: {vader_accuracy*100:.2f}%')

VADER Accuracy: 69.36%


In [24]:
# Compare to random guessing:
if vader_accuracy > 0.5:
    print("VADER model is performing better than random guessing.")
else:
    print("VADER model is not performing better than random guessing.")

VADER model is performing better than random guessing.


The VADER sentiment analyzer performs slightly better than the TextBlob sentiment analyzer on this dataset (69.36% versus 68.52% accuracy), and both perform better than the 50% expected from random guessing. VADER is specifically tuned for social media and other short, informal text (tweets, reviews, comments), which may partly explain its small edge here.
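
To put the gap between the two analyzers in concrete terms, the short calculation below reuses the correct-prediction counts reported in the outputs above. The variable names are chosen for illustration only.

```python
# Figures copied from the notebook outputs above
total_reviews = 25000
textblob_correct = 17131
vader_correct = 17339

extra_correct = vader_correct - textblob_correct
gap_points = 100 * extra_correct / total_reviews

print(f"VADER classifies {extra_correct} more reviews correctly "
      f"({gap_points:.2f} percentage points)")
```

This only quantifies the difference in accuracy; it does not show whether the two analyzers make mistakes on the same reviews.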