## ASSIGNMENT 4
## SENTIMENT ANALYSIS

### IMPORT PANDAS AND READ .CSV FILE

In [41]:
# Import the pandas package and read the .csv file

import pandas as pd 
df = pd.read_csv( 'reviews_data.csv')
df.sample(10)

| | name | location | Date | Rating | Review | Image_Links |
|---|---|---|---|---|---|---|
| 366 | Brad | Elk Grove, CA | Reviewed Aug. 11, 2017 | 3.0 | Starbucks employees that gotten out of hand. T... | ['No Images'] |
| 639 | Larry | Livermore, CA | Reviewed Sept. 4, 2012 | 2.0 | I go regularly to the Starbucks at 223 South V... | ['No Images'] |
| 781 | Mari | Claremont, CA | Reviewed July 5, 2009 | NaN | I asked for a tall coffee that would be poured... | ['No Images'] |
| 438 | margaret | Tacoma, WA | Reviewed March 31, 2016 | 1.0 | Has anyone been annoyed with the service at St... | ['No Images'] |
| 121 | Linda | Fountain Hills, AZ | Reviewed June 1, 2021 | 1.0 | I contacted Starbucks customer service to ask ... | ['No Images'] |
| 215 | Cathi | Canada | Reviewed Dec. 19, 2018 | 1.0 | My sweetheart and I went into our neighborhood... | ['No Images'] |
| 210 | Francois | San Marcos, CA | Reviewed Jan. 18, 2019 | 5.0 | Aside from great drinks and good food the serv... | ['No Images'] |
| 810 | Daniel | Loveland, CO | Reviewed Dec. 24, 2008 | NaN | No Review Text | ['No Images'] |
| 466 | VICTORIA | Salt Lake City, UT | Reviewed Oct. 12, 2015 | 1.0 | I love Starbucks!!! I frequent Starbucks every... | ['No Images'] |
| 170 | Sharon | Grand Rapids, MI | Reviewed Oct. 1, 2019 | 1.0 | I have been a loyal customer of Starbucks for ... | ['No Images'] |


### NLTK AND OPINION LEXICON 

The Natural Language Toolkit (NLTK) is used to access the Opinion Lexicon, a list of positive and negative opinion (sentiment) words.

In [42]:
# Importing nltk package

from sklearn import preprocessing
import nltk
nltk.download('opinion_lexicon')
from nltk.corpus import opinion_lexicon                     # using opinion lexicon dataset from nltk.corpus
from nltk.tokenize import word_tokenize

print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',      # printing the first 10 positive words
      opinion_lexicon.positive()[:10])
print('Examples of negative words in opinion lexicon',      # printing the first 10 negative words
      opinion_lexicon.negative()[:10])


Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /Users/sid/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


### DICTIONARY FOR SCORING REVIEWS

In [43]:
# Create a dictionary that we can use to score the review text

nltk.download('punkt')
df.rename(columns={"Review": "text"}, inplace=True)
pos_score = 1
neg_score = -1
word_dict = {}
 
# Adding the positive words to the dictionary

for word in opinion_lexicon.positive():
        word_dict[word] = pos_score
      
# Adding the negative words to the dictionary

for word in opinion_lexicon.negative():
        word_dict[word] = neg_score


[nltk_data] Downloading package punkt to /Users/sid/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### BING_LIU_SCORE FUNCTION

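The bing_liu_score function tokenizes a review, looks up each token in word_dict, and sums the matching values (+1 for each positive word, -1 for each negative word) to give the review's sentiment score. The dataframe df is later grouped by the unique values in the 'Rating' column, and the mean of the 'Bing_Liu_Score' column is calculated for each group to give the average sentiment score per rating.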

In [44]:
# bing_liu_score function 

def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score 
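
As a minimal sketch of how this kind of lexicon scoring behaves, the example below uses a tiny hand-made dictionary and a simple regular-expression tokenizer in place of the Opinion Lexicon and word_tokenize. The names toy_dict and toy_score are hypothetical and exist only for illustration; the real lexicon holds several thousand words.

```python
import re

# Hypothetical miniature lexicon: +1 for positive words, -1 for negative words
toy_dict = {"great": 1, "friendly": 1, "love": 1, "rude": -1, "cold": -1, "wrong": -1}

def toy_score(text):
    """Sum the lexicon values of every token found in toy_dict."""
    # Simplified tokenizer (assumption): lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(toy_dict.get(token, 0) for token in tokens)

print(toy_score("Great coffee and friendly staff"))        # 2
print(toy_score("Rude barista, cold latte, wrong order"))  # -3
print(toy_score("Not great"))                              # 1
```

The last call shows a known limitation of plain word counting: negation is ignored, so "Not great" scores as positive. This may be one reason a strongly negative review can still end up with a positive score.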

### REPLACING NULL VALUES

In [45]:
# Filling empty values with 'no review'

df['text'] = df['text'].fillna('no review')
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)

### HEAD METHOD


In [46]:
# Using head() on dataframe

df[['Rating',"text", 'Bing_Liu_Score']].head(10)

| | Rating | text | Bing_Liu_Score |
|---|---|---|---|
| 0 | 5.0 | Amber and LaDonna at the Starbucks on Southwes... | 5 |
| 1 | 5.0 | ** at the Starbucks by the fire station on 436... | 9 |
| 2 | 5.0 | I just wanted to go out of my way to recognize... | 3 |
| 3 | 5.0 | Me and my friend were at Starbucks and my card... | 6 |
| 4 | 5.0 | I’m on this kick of drinking 5 cups of warm wa... | 10 |
| 5 | 1.0 | We had to correct them on our order 3 times. T... | 1 |
| 6 | 1.0 | I have tried Starbucks several different times... | -1 |
| 7 | 1.0 | Starbucks near me just launched new fall foods... | 1 |
| 8 | 1.0 | I ordered online for the Reisterstown Rd, St T... | -3 |
| 9 | 1.0 | Staff at the Smythe St. Superstore location in... | -5 |


### GROUPBY RATING

In [47]:
# Grouping by the unique values in 'Rating' and averaging the Bing Liu score

df.groupby('Rating').agg({'Bing_Liu_Score':'mean'})

| Rating | Bing_Liu_Score |
|---|---|
| 1.0 | -0.682927 |
| 2.0 | -0.070707 |
| 3.0 | 1.424242 |
| 4.0 | 2.358974 |
| 5.0 | 4.012048 |


## APPLYING F-1 SCORING FOR BING-LIU ALGORITHM

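We applied the F1 score to evaluate the performance of the Bing Liu algorithm. Ratings of 4 and above are treated as positive, ratings of 2 and below as negative, and everything else as neutral. Reviews with a missing rating also fall into the neutral class, because both comparisons are false for NaN. The predicted label comes from the sign of the Bing Liu score.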

In [50]:
from sklearn.metrics import f1_score

# Apply bing_liu_score function to calculate sentiment scores for each review
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)

# Define true labels based on the 'Rating' column
df['True_Labels'] = df['Rating'].apply(lambda x: 'positive' if x >= 4 else 'negative' if x <= 2 else 'neutral')

# Define predicted scores based on the 'Bing_Liu_Score' column
df['Predicted_Scores'] = df['Bing_Liu_Score'].apply(lambda x: 'positive' if x > 0 else 'negative' if x < 0 else 'neutral')

# Calculate F1-score
f1 = f1_score(df['True_Labels'], df['Predicted_Scores'], average='weighted')

print("F1-score:", f1)



F1-score: 0.5401984518579771
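
The labeling rules used in the cell above can be written as two small helper functions. This is a sketch only: rating_to_label and score_to_label are hypothetical names, and the threshold parameter is an assumption added for illustration (the notebook effectively uses 0, so any non-zero score counts as positive or negative).

```python
import math

def rating_to_label(rating):
    """Map a 1-5 star rating to a sentiment class."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    # Ratings of 3 land here, and so do missing ratings (NaN fails both comparisons)
    return "neutral"

def score_to_label(score, threshold=0):
    """Map a numeric sentiment score to a class; threshold is a hypothetical knob."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print([rating_to_label(r) for r in (5.0, 3.0, 1.0, math.nan)])
print([score_to_label(s) for s in (4, 0, -2)])
print(score_to_label(1, threshold=1))  # a wider neutral band absorbs weak scores
```

With a threshold of 0, only an exactly zero score is predicted as neutral, so the neutral class is likely to be predicted rarely. Widening the band is one option to experiment with, not something the notebook does.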


## VADER LEXICON SCORING ALGORITHM

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It assigns sentiment scores to text documents, indicating the positivity, negativity, or neutrality of the sentiment expressed in the text. The code below uses the 'compound' score, a single normalized value between -1 (most negative) and +1 (most positive).
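
VADER's headline output is a 'compound' score between -1 and +1. To make that range concrete, the sketch below reproduces the normalization that, to the best of our understanding, the reference implementation applies to the summed word valences: x / sqrt(x² + α) with α = 15. The function name normalize_valence is ours and the input sums are made-up values; the real tool also applies rules for negation, punctuation, and capitalization before this step.

```python
import math

def normalize_valence(total, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

# Made-up valence sums, for illustration only
for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f"{total:+.1f} -> {normalize_valence(total):+.4f}")
```

Because of this squashing, long reviews with many sentiment words tend to saturate near -1 or +1, while short or mixed reviews stay close to 0.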

In [53]:
import nltk
nltk.download('vader_lexicon')
import pandas as pd 
from nltk.sentiment import SentimentIntensityAnalyzer

df = pd.read_csv('reviews_data.csv')

sid = SentimentIntensityAnalyzer()

def vader_score(text):
    sentiment_score = sid.polarity_scores(text)['compound']
    return sentiment_score

# Replace missing reviews with a placeholder so every row can be scored
df['Review'] = df['Review'].fillna('no review')

df['VADER_Score'] = df['Review'].apply(vader_score)

print(df[['Rating', 'Review', 'VADER_Score']].head(10))

print(df.groupby('Rating').agg({'VADER_Score': 'mean'}))


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/sid/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


   Rating                                             Review  VADER_Score
0     5.0  Amber and LaDonna at the Starbucks on Southwes...       0.8991
1     5.0  ** at the Starbucks by the fire station on 436...       0.7766
2     5.0  I just wanted to go out of my way to recognize...       0.5242
3     5.0  Me and my friend were at Starbucks and my card...       0.9698
4     5.0  I’m on this kick of drinking 5 cups of warm wa...       0.9793
5     1.0  We had to correct them on our order 3 times. T...      -0.7269
6     1.0  I have tried Starbucks several different times...      -0.8963
7     1.0  Starbucks near me just launched new fall foods...       0.8994
8     1.0  I ordered online for the Reisterstown Rd, St T...      -0.8316
9     1.0  Staff at the Smythe St. Superstore location in...      -0.7912
        VADER_Score
Rating             
1.0       -0.149688
2.0       -0.020645
3.0        0.159991
4.0        0.642367
5.0        0.724286


## APPLYING F-1 SCORING

We used the F1 score to evaluate how well the model predicts three classes: positive, negative, and neutral. The F1 score combines a model's precision and recall into a single metric (their harmonic mean), balancing these two aspects of performance. With average='weighted', the per-class F1 scores are averaged, weighted by the number of true instances of each class.
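
As a worked illustration of the metric, the sketch below computes per-class precision, recall, and F1 by hand and then takes the support-weighted average, which is our understanding of what average='weighted' does in scikit-learn. The label lists are made up for the example, and weighted_f1 is a hypothetical helper, not part of any library.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * count
    return total / len(y_true)

# Made-up labels for illustration only
y_true = ["positive", "positive", "negative", "negative", "neutral", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.4667
```

Here the per-class F1 values are 0.4 (positive), 2/3 (negative), and 0 (neutral, never predicted), so the weighted score is (0.4·2 + 2/3·3 + 0·1) / 6. A class that is never predicted pulls the score down in proportion to how many reviews truly belong to it.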

In [54]:
df['VADER_Score'] = df['Review'].apply(vader_score)

df['True_Labels'] = df['Rating'].apply(lambda x: 'positive' if x >= 4 else 'negative' if x <= 2 else 'neutral')

df['Predicted_Labels'] = df['VADER_Score'].apply(lambda x: 'positive' if x > 0 else 'negative' if x < 0 else 'neutral')

f1 = f1_score(df['True_Labels'], df['Predicted_Labels'], average='weighted')

print("F1-score:", f1)

F1-score: 0.49143257430718185


## TEXTBLOB

In [55]:
import pandas as pd
from textblob import TextBlob
from sklearn.metrics import f1_score

df = pd.read_csv('reviews_data.csv')

def textblob_score(text):
    sentiment_score = TextBlob(text).sentiment.polarity
    return sentiment_score

# Replace missing reviews with a placeholder so every row can be scored
df['Review'] = df['Review'].fillna('no review')

df['TextBlob_Score'] = df['Review'].apply(textblob_score)

print(df[['Rating', 'Review', 'TextBlob_Score']].head(10))

print(df.groupby('Rating').agg({'TextBlob_Score': 'mean'}))


   Rating                                             Review  TextBlob_Score
0     5.0  Amber and LaDonna at the Starbucks on Southwes...        0.340816
1     5.0  ** at the Starbucks by the fire station on 436...        0.289394
2     5.0  I just wanted to go out of my way to recognize...       -0.060714
3     5.0  Me and my friend were at Starbucks and my card...        0.263750
4     5.0  I’m on this kick of drinking 5 cups of warm wa...        0.356905
5     1.0  We had to correct them on our order 3 times. T...        0.008929
6     1.0  I have tried Starbucks several different times...        0.000000
7     1.0  Starbucks near me just launched new fall foods...        0.133144
8     1.0  I ordered online for the Reisterstown Rd, St T...       -0.500000
9     1.0  Staff at the Smythe St. Superstore location in...       -0.173810
        TextBlob_Score
Rating                
1.0          -0.029953
2.0           0.022190
3.0           0.112247
4.0           0.244469
5.0           0

## APPLYING F-1 SCORING 

In [57]:
df['True_Labels'] = df['Rating'].apply(lambda x: 'positive' if x >= 4 else 'negative' if x <= 2 else 'neutral')

df['Predicted_Labels'] = df['TextBlob_Score'].apply(lambda x: 'positive' if x > 0 else 'negative' if x < 0 else 'neutral')

f1 = f1_score(df['True_Labels'], df['Predicted_Labels'], average='weighted')

print("F1-score:", f1)

F1-score: 0.5395528343601881


In [59]:
#!pip install senticnet


Collecting senticnet
  Downloading senticnet-1.6-py3-none-any.whl (51.9 MB)
Installing collected packages: senticnet
Successfully installed senticnet-1.6
Note: you may need to restart the kernel to use updated packages.


## SENTICNET SCORING ALGORITHM

In [65]:
import pandas as pd
from senticnet.senticnet import SenticNet

df = pd.read_csv('reviews_data.csv')

sn = SenticNet()
def senticnet_score(text):
    words = text.split()
    sentiment_scores = [float(sn.polarity_value(word.lower())) if word.lower() in sn.data else 0 for word in words]
    if sentiment_scores:
        sentiment_score = sum(sentiment_scores) / len(sentiment_scores)
    else:
        sentiment_score = 0  
    return sentiment_score

# Replace missing reviews with a placeholder so every row can be scored
df['Review'] = df['Review'].fillna('no review')

df['SenticNet_Score'] = df['Review'].apply(senticnet_score)

print(df[['Rating', 'Review', 'SenticNet_Score']].head(10))

print(df.groupby('Rating').agg({'SenticNet_Score': 'mean'}))


   Rating                                             Review  SenticNet_Score
0     5.0  Amber and LaDonna at the Starbucks on Southwes...         0.024383
1     5.0  ** at the Starbucks by the fire station on 436...         0.035695
2     5.0  I just wanted to go out of my way to recognize...         0.069542
3     5.0  Me and my friend were at Starbucks and my card...         0.057500
4     5.0  I’m on this kick of drinking 5 cups of warm wa...         0.082932
5     1.0  We had to correct them on our order 3 times. T...        -0.036397
6     1.0  I have tried Starbucks several different times...        -0.001422
7     1.0  Starbucks near me just launched new fall foods...         0.066921
8     1.0  I ordered online for the Reisterstown Rd, St T...        -0.005109
9     1.0  Staff at the Smythe St. Superstore location in...         0.005837
        SenticNet_Score
Rating                 
1.0            0.019291
2.0            0.021642
3.0            0.034656
4.0            0.05071
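
One thing worth checking in senticnet_score is tokenization: text.split() leaves punctuation attached to words, so a token such as "great!" is unlikely to match a lexicon key, and words that are not found still count in the denominator of the average. The sketch below illustrates both effects with a made-up polarity table; toy_polarity, split_score, and matched_only_score are hypothetical names, and none of the values come from SenticNet.

```python
import re

# Made-up polarity values for illustration; not taken from SenticNet
toy_polarity = {"great": 0.8, "terrible": -0.9, "slow": -0.4}

def split_score(text):
    """Mirror the notebook's approach: whitespace split, average over all words."""
    words = text.split()
    scores = [toy_polarity.get(w.lower(), 0) for w in words]
    return sum(scores) / len(scores) if scores else 0

def matched_only_score(text):
    """Alternative sketch: strip punctuation, average over matched words only."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [toy_polarity[w] for w in words if w in toy_polarity]
    return sum(scores) / len(scores) if scores else 0

review = "Terrible! The service was slow."
print(split_score(review))         # 0.0, because "Terrible!" and "slow." never match
print(matched_only_score(review))  # mean of -0.9 and -0.4
```

Whether this changes the F1 score on the real data is untested here; it is offered only as a possible explanation for why the per-review scores above cluster so close to zero.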

## F-1 SCORING 

In [66]:
from sklearn.metrics import f1_score

df['True_Labels'] = df['Rating'].apply(lambda x: 'positive' if x >= 4 else 'negative' if x <= 2 else 'neutral')

df['Predicted_Labels'] = df['SenticNet_Score'].apply(lambda x: 'positive' if x > 0 else 'negative' if x < 0 else 'neutral')

f1 = f1_score(df['True_Labels'], df['Predicted_Labels'], average='weighted')

print("F1 Score:", f1)


F1 Score: 0.29808810480714026


## CONCLUSION