# VADER Sentiment Analyser (SA) for CX web app tool

#### Author: Felipe Valencia - Data Scientist at Dataplicada


## Project Introduction

This program is one component of a larger initiative to improve the accuracy of sentiment analysis in customer experience (CX) tools. We are comparing three sentiment analysis models: VADER (Valence Aware Dictionary and sEntiment Reasoner), TextBlob, and a fine-tuned checkpoint of the DistilBERT-base-uncased model. Our goal is to identify the most effective approach for a web-based feedback tool that lets businesses upload multiple comments and reviews for evaluation.

## VADER for SA Introduction

The program analyses customer feedback using the VADER* model and classifies each comment on a 1-to-5-star scale, which can then be used for deeper data analysis.

_*VADER is a rule-based sentiment analysis tool tailored for social media text._


**Note:** _As we progress, we will rigorously test VADER, TextBlob, and DistilBERT-base-uncased for this classification task, prioritising accuracy while also considering factors such as server storage, speed, and CPU usage. This comprehensive analysis will ensure we choose the best sentiment analysis option for our users, ultimately enhancing their understanding of customer feedback and improving overall service quality._

In [None]:
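# Install libraries (uncomment the next line if they are not already installed)
# %pip install pandas numpy vaderSentiment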


In [1]:
# Load libraries
import pandas as pd
import numpy as np

In [2]:
# Read CSV

data_file = pd.read_csv("Datafiniti_Hotel_Reviews.csv")

In [3]:
# Convert ratings from float to integer

data_file['reviews.rating'] = data_file['reviews.rating'].astype(int)

# Convert text to string

data_file['reviews.text'] = data_file['reviews.text'].astype(str)

In [4]:
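# Simplify the dataframe
# .copy() makes this an independent DataFrame, so adding columns later
# does not trigger pandas' SettingWithCopyWarning

data = data_file[['id', 'reviews.rating', 'reviews.text']].copy()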

In [5]:
data

Unnamed: 0,id,reviews.rating,reviews.text
0,AVwc252WIN2L1WUfpqLP,5,Our experience at Rancho Valencia was absolute...
1,AVwc252WIN2L1WUfpqLP,5,Amazing place. Everyone was extremely warm and...
2,AVwc252WIN2L1WUfpqLP,5,We booked a 3 night stay at Rancho Valencia to...
3,AVwdOclqIN2L1WUfti38,2,Currently in bed writing this for the past hr ...
4,AVwdOclqIN2L1WUfti38,5,I live in Md and the Aloft is my Home away fro...
...,...,...,...
9995,AVwd4TMv_7pvs4fz-Ers,3,It is hard for me to review an oceanfront hote...
9996,AVwdRp4DIN2L1WUfuGZZ,4,"I live close by, and needed to stay somewhere ..."
9997,AVwd1TbkByjofQCxs6FH,4,Rolled in 11:30 laid out heads down woke up to...
9998,AVwdHbizIN2L1WUfsXto,1,Absolutely terrible..I was told I was being gi...


In [6]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialise the VADER sentiment analyser

sid = SentimentIntensityAnalyzer()
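
For context on the scores this analyser produces: `polarity_scores` returns `neg`, `neu`, `pos` and `compound` values, and the compound value is the summed word valence squashed into the range -1 to 1. The sketch below reproduces that squashing step in plain Python. It assumes the library's default smoothing constant of 15 and is meant as an illustration of the formula, not a replacement for the library call; `normalise_valence` is a name chosen here for illustration.

```python
import math

def normalise_valence(valence_sum, alpha=15):
    """Squash an unbounded valence sum into [-1, 1] (alpha=15 is assumed to be VADER's default)."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp defensively so the result never leaves the documented range
    return max(-1.0, min(1.0, score))

for total in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"valence sum {total:+.1f} -> compound {normalise_valence(total):+.4f}")
```

A valence sum of 1.0 maps to 0.25 because 1 / sqrt(1 + 15) = 0.25, which shows how the raw sum is compressed before any star thresholds are applied.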

In [7]:
# Generate the 5-star classification with VADER

def sentiment_Vader(text):
    """
    Map a text string to a 1-5 star rating using VADER's compound polarity score.

    The compound score ranges from -1 (most negative) to +1 (most positive).
    """
    polarity = sid.polarity_scores(text)
    compound_score = polarity['compound']

    if compound_score >= 0.6:
        return 5  # Very positive
    elif compound_score >= 0.2:
        return 4  # Positive
    elif compound_score >= -0.2:
        return 3  # Neutral
    elif compound_score >= -0.6:
        return 2  # Negative
    else:
        return 1  # Very negative
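
The threshold ladder in `sentiment_Vader` can also be written as a lookup over sorted cut points, which keeps the boundaries in one place if they need tuning later. The following is a minimal sketch using the standard library's `bisect`; `compound_to_stars` and `CUT_POINTS` are hypothetical names, and the cut points are copied from the function above.

```python
from bisect import bisect_right

# Lower bounds for 2, 3, 4 and 5 stars, copied from the if/elif ladder
CUT_POINTS = [-0.6, -0.2, 0.2, 0.6]

def compound_to_stars(compound_score):
    """Return 1-5 stars; a score equal to a cut point takes the higher rating."""
    return bisect_right(CUT_POINTS, compound_score) + 1

for score in (-0.9, -0.6, 0.0, 0.2, 0.75):
    print(score, "->", compound_to_stars(score))
```

`bisect_right` counts how many cut points are less than or equal to the score, so the boundary behaviour matches the `>=` comparisons in the original ladder.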

In [8]:
import time

# Process the reviews
print("Starting sentiment analysis...")
start_time = time.perf_counter()

# Score every review. Missing text became the string "nan" after astype(str),
# so those rows get pd.NA instead of a star rating
vader_stars = data['reviews.text'].apply(lambda x: sentiment_Vader(x) if x != "nan" else pd.NA)

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning;
# the nullable Int64 dtype keeps the ratings as integers alongside missing values
data = data.assign(**{'vader.sentiment': vader_stars.astype('Int64')})

print(f"Processing completed in {(time.perf_counter() - start_time) / 60:.2f} minutes")

Starting sentiment analysis...
Processing completed in 0.04 minutes


In [9]:
data

Unnamed: 0,id,reviews.rating,reviews.text,vader.sentiment
0,AVwc252WIN2L1WUfpqLP,5,Our experience at Rancho Valencia was absolute...,5
1,AVwc252WIN2L1WUfpqLP,5,Amazing place. Everyone was extremely warm and...,5
2,AVwc252WIN2L1WUfpqLP,5,We booked a 3 night stay at Rancho Valencia to...,5
3,AVwdOclqIN2L1WUfti38,2,Currently in bed writing this for the past hr ...,3
4,AVwdOclqIN2L1WUfti38,5,I live in Md and the Aloft is my Home away fro...,5
...,...,...,...,...
9995,AVwd4TMv_7pvs4fz-Ers,3,It is hard for me to review an oceanfront hote...,5
9996,AVwdRp4DIN2L1WUfuGZZ,4,"I live close by, and needed to stay somewhere ...",5
9997,AVwd1TbkByjofQCxs6FH,4,Rolled in 11:30 laid out heads down woke up to...,5
9998,AVwdHbizIN2L1WUfsXto,1,Absolutely terrible..I was told I was being gi...,3
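
With the reviewer's rating and VADER's prediction side by side, agreement between the two can be quantified. The sketch below computes exact-match accuracy and mean absolute error in stars using only the nine rows visible in the preview above, so the numbers it prints say nothing about the full dataset; `star_agreement` is a hypothetical helper name.

```python
def star_agreement(actual, predicted):
    """Return (exact-match accuracy, mean absolute error in stars)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    pairs = list(zip(actual, predicted))
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    mae = sum(abs(a - p) for a, p in pairs) / len(pairs)
    return accuracy, mae

# Values copied from the nine rows shown in the preview (not the full dataset)
actual_ratings = [5, 5, 5, 2, 5, 3, 4, 4, 1]
vader_ratings = [5, 5, 5, 3, 5, 5, 5, 5, 3]

accuracy, mae = star_agreement(actual_ratings, vader_ratings)
print(f"exact match: {accuracy:.2f}, mean absolute error: {mae:.2f} stars")
```

On the full DataFrame, the same two lists could come from `data['reviews.rating']` and `data['vader.sentiment']` after dropping rows that have no prediction.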


In [10]:
# Save DataFrame to CSV
data.to_csv('output_Vader.csv', index=False)  # Set index=False to avoid saving row indices