# ASSIGNMENT-4 SENTIMENT ANALYSIS

## SENTIMENT ANALYSIS

Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotion expressed in a piece of text. The goal is to identify whether the expressed sentiment is positive, negative, or neutral.

In [31]:
#This dataset contains sentiment of users 
import pandas as pd 
df = pd.read_csv( 'sentiment.csv', encoding='latin1')
df.sample(10)


Unnamed: 0,textID,text,sentiment,Time of Tweet,Age of User,Country,Population -2020,Land Area (Km²),Density (P/Km²)
764,803e9c0931,While driving u come across aggressive driving...,negative,night,31-45,Turkey,84339067,770000.0,110
2291,1dda981140,and so another week begins. this one has got t...,positive,night,31-45,Madagascar,27691018,581795.0,48
962,2c5f261ff3,Jesus heals,positive,night,31-45,Uganda,45741007,199810.0,229
3291,695f322a0a,I hate when you cant sleep,negative,morning,46-60,Ireland,4937786,68890.0,72
2346,ecfcc8230f,ahaha thats okay and thanks,positive,morning,46-60,Singapore,5850342,700.0,8358
3460,77a7b3282b,_mommy oh well i hope she gets better,positive,noon,60-70,Estonia,1326535,42390.0,31
2426,63983dd792,"_dubbs curse you, igloo dwellers!!",negative,night,70-100,Croatia,4105267,55960.0,73
1397,5c8a453297,My head hurts.... Can wait to see the new pho...,neutral,night,70-100,Equatorial Guinea,1402985,28050.0,50
1808,932af12853,Hey! Let`s Follow each other! Wouldn`t that ju...,positive,night,70-100,Australia,25499884,7682300.0,3
1030,52fc580b72,"burning cd`s,,,,,,,,, **** outa blank disc`s",positive,noon,60-70,Estonia,1326535,42390.0,31


## IMPORTING LIBRARIES

In [32]:
#Importing preprocessing module from sklearn
from sklearn import preprocessing
import nltk

## DOWNLOADING OPINION LEXICON

In [33]:
#downloads the Opinion Lexicon dataset from NLTK
nltk.download('opinion_lexicon')


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


True

## IMPORTING COMPONENTS FROM NLTK

In [34]:
#opinion_lexicon provides access to positive and negative words
from nltk.corpus import opinion_lexicon
from nltk.tokenize import word_tokenize #word_tokenize is used for tokenization.



## PRINTING INFORMATION

In [35]:
# prints total number of words in the opinion lexicon
print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:10])     #display examples of positive and negative words from the opinion lexicon
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:10])


Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


## DOWNLOADING 'PUNKT' TOKENIZER

In [36]:
#word_tokenize relies on 'punkt', a pre-trained unsupervised model that splits text into sentences before they are tokenized into words
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## CREATING SENTIMENT SCORING DICTIONARY FOR TEXT

In [37]:
# Keep the tweet text in a column named 'text' (it already has that name, so this rename is a no-op)
df.rename(columns={"text": "text"}, inplace=True)

## INITIALIZING SCORES

In [38]:
#set the positive and negative scores to be assigned to words in the word_dict dictionary.
pos_score = 1
neg_score = -1


##  CREATING WORD DICTIONARY

In [39]:
#Adding the positive words to the dictionary
word_dict = {}
for word in opinion_lexicon.positive():
    word_dict[word] = pos_score       #assigns each positive word in the Opinion Lexicon a positive score

# Adding the negative words to the dictionary
for word in opinion_lexicon.negative():
    word_dict[word] = neg_score       #assigns each negative word in the Opinion Lexicon a negative score
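
One consequence of filling the dictionary in two passes is that a word present in both lists keeps whichever score was assigned last, which here is the negative one. The sketch below illustrates this with made-up word lists; `toy_positive`, `toy_negative`, and `toy_dict` are hypothetical names and do not reproduce the real lexicon.

```python
# Hypothetical toy lists standing in for opinion_lexicon.positive() / .negative()
toy_positive = ["good", "happy", "envious"]
toy_negative = ["bad", "sad", "envious"]

toy_dict = {}
for word in toy_positive:
    toy_dict[word] = 1
for word in toy_negative:
    toy_dict[word] = -1   # overwrites any score set during the positive pass

# Words that appear in both lists
overlap = set(toy_positive) & set(toy_negative)
print(sorted(overlap))      # ['envious']
print(toy_dict["envious"])  # -1
print(len(toy_dict))        # 5 keys, although the two lists hold 6 entries
```

If any overlap exists in the real lexicon, the dictionary will have fewer keys than the total word count printed earlier.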


## BING_LIU_SCORE 

The bing_liu_score function is a sentiment scoring function that assigns a sentiment score to a piece of text based on a predefined dictionary (word_dict). 

In [40]:
# Sentiment analysis function using a simple bag-of-words approach
def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())  #word_tokenize splits the text into individual words.
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score  
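
To make the scoring rule concrete, here is a minimal self-contained sketch of the same idea. It assumes a tiny hand-made lexicon (`toy_word_dict`) and a simple regex tokenizer in place of `word_tokenize`, so the numbers are illustrative and not output from the notebook.

```python
import re

# Hypothetical mini-lexicon; the notebook builds word_dict from the NLTK opinion lexicon
toy_word_dict = {"happy": 1, "great": 1, "like": 1, "hate": -1, "hurts": -1}

def toy_lexicon_score(text, lexicon):
    """Sum the lexicon scores of every token found in the text."""
    # Regex tokenizer used as a stand-in for nltk.word_tokenize
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

print(toy_lexicon_score("happy bday!", toy_word_dict))                         # 1
print(toy_lexicon_score("I hate when you cant sleep", toy_word_dict))          # -1
print(toy_lexicon_score("Last session of the day", toy_word_dict))             # 0
print(toy_lexicon_score("great, great day but my head hurts", toy_word_dict))  # 1
```

Repeated words count every time they occur, and negation is ignored: under this rule "not happy" would still score +1.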


## BING_LIU_SCORE() FOR TEXT DF

In [41]:
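# Replace missing values in the 'text' column with the string 'no review'
# (assigning the result back avoids the chained inplace call, which recent pandas versions warn about)
df['text'] = df['text'].fillna('no review')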
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)


## TOP 10 ROWS

In [42]:
#Previewing the sentiment analysis results for the top 10 rows
df[['Time of Tweet',"text", 'Bing_Liu_Score']].head(10)


Unnamed: 0,Time of Tweet,text,Bing_Liu_Score
0,morning,Last session of the day http://twitpic.com/67ezh,0
1,noon,Shanghai is also really exciting (precisely -...,4
2,night,"Recession hit Veronique Branquinho, she has to...",-2
3,morning,happy bday!,1
4,noon,http://twitpic.com/4w75p - I like it!!,1
5,night,that`s great!! weee!! visitors!,1
6,morning,I THINK EVERYONE HATES ME ON HERE lol,-1
7,noon,"soooooo wish i could, but im in school and my...",0
8,night,and within a short time of the last clue all ...,0
9,morning,What did you get? My day is alright.. haven`...,0


## AVERAGE BING LIU SENTIMENT SCORES GROUPED BY TIME OF TWEET

In [43]:
#Calculating the mean Bing Liu sentiment score
df.groupby('Time of Tweet').agg({'Bing_Liu_Score':'mean'})


Time of Tweet,Bing_Liu_Score
morning,0.174024
night,0.198642
noon,0.241935


# FLAIR ALGORITHM

Flair supports multiple languages and provides pre-trained models for various NLP tasks, making it a versatile library for natural language processing.

## IMPORTING ALL REQUIRED MODULES

In [44]:
#pandas for data manipulation and analysis; Flair for NLP
import pandas as pd
from flair.data import Sentence
from flair.models import TextClassifier    # Flair's text classifier for text classification tasks
from flair.training_utils import EvaluationMetric
from flair.datasets import CSVClassificationCorpus



## LOADING AND PREPROCESSING

In [45]:
# Load the CSV file
df = pd.read_csv('sentiment.csv', encoding='latin1')
df.rename(columns={"text": "text"}, inplace=True)



## HANDLING MISSING VALUES

In [46]:
# Replace missing values in the 'text' column with the string 'no review'
df['text'] = df['text'].fillna('no review')

# Load Flair sentiment model
classifier = TextClassifier.load('sentiment')


## SENTIMENT ANALYSIS USING FLAIR

In [47]:
# Function for sentiment analysis using Flair
def flair_sentiment(text):
    sentence = Sentence(text)
    classifier.predict(sentence)
    return sentence.labels[0].value

# Apply Flair sentiment analysis to the 'text' column
df['Flair_Sentiment'] = df['text'].apply(flair_sentiment)



## PREVIEW FOR TOP 10 ROWS


In [48]:
# Previewing the sentiment analysis results for the top 10 rows
print(df[['Time of Tweet', 'text', 'Flair_Sentiment']].head(10))


  Time of Tweet                                               text  \
0       morning  Last session of the day  http://twitpic.com/67ezh   
1          noon   Shanghai is also really exciting (precisely -...   
2         night  Recession hit Veronique Branquinho, she has to...   
3       morning                                        happy bday!   
4          noon             http://twitpic.com/4w75p - I like it!!   
5         night                    that`s great!! weee!! visitors!   
6       morning            I THINK EVERYONE HATES ME ON HERE   lol   
7          noon   soooooo wish i could, but im in school and my...   
8         night   and within a short time of the last clue all ...   
9       morning   What did you get?  My day is alright.. haven`...   

  Flair_Sentiment  
0        NEGATIVE  
1        POSITIVE  
2        NEGATIVE  
3        POSITIVE  
4        POSITIVE  
5        POSITIVE  
6        NEGATIVE  
7        NEGATIVE  
8        NEGATIVE  
9        NEGATIVE  


## COMPUTING AND DISPLAYING

In [49]:

# Calculating the distribution of Flair sentiment labels
sentiment_distribution = df['Flair_Sentiment'].value_counts()
print("Flair Sentiment Distribution:\n", sentiment_distribution)

Flair Sentiment Distribution:
 Flair_Sentiment
POSITIVE    1835
NEGATIVE    1699
Name: count, dtype: int64
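
The distribution above contains only POSITIVE and NEGATIVE labels, whereas the dataset's `sentiment` column also has a neutral class. Each Flair label also carries a confidence value in `sentence.labels[0].score`. One possible way to make Flair's output comparable with the numeric scores from the other methods is to convert label and confidence into a signed value and treat low-confidence predictions as neutral. The helper name `signed_flair_score` and the 0.6 threshold are assumptions chosen for illustration and are not part of Flair.

```python
def signed_flair_score(label, confidence, neutral_below=0.6):
    """Map a (label, confidence) pair to a signed score in [-1, 1].

    Predictions with confidence under `neutral_below` are treated as neutral (0.0).
    The threshold is an illustrative assumption, not a Flair default.
    """
    if confidence < neutral_below:
        return 0.0
    return confidence if label == "POSITIVE" else -confidence

# Hypothetical (label, confidence) pairs, as they would be read from
# sentence.labels[0].value and sentence.labels[0].score
examples = [("POSITIVE", 0.98), ("NEGATIVE", 0.91), ("NEGATIVE", 0.55)]
for label, confidence in examples:
    print(label, confidence, signed_flair_score(label, confidence))
```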


# AFINN ALGORITHM

The AFINN algorithm assigns pre-defined sentiment scores to words based on their polarity. Word scores are integers from -5 (most negative) to +5 (most positive); words that are not in the lexicon contribute zero, and the score of a text is the sum of its word scores.
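
The difference from a plain +1/-1 lexicon is that each word carries a graded weight. The sketch below uses a handful of valences chosen to be consistent with the `Afinn_Score` outputs shown in this notebook; treat them as illustrative rather than as an authoritative copy of the lexicon. The regex tokenizer and the name `toy_afinn_score` are assumptions, not the `afinn` package's implementation.

```python
import re

# Illustrative subset of word valences (assumed values, see the note above)
toy_afinn = {"happy": 3, "great": 3, "like": 2, "lol": 3, "hates": -3}

def toy_afinn_score(text):
    """Sum the valence of every known word; unknown words contribute 0."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return float(sum(toy_afinn.get(token, 0) for token in tokens))

print(toy_afinn_score("happy bday!"))                            # 3.0
print(toy_afinn_score("I like it!!"))                            # 2.0
print(toy_afinn_score("I THINK EVERYONE HATES ME ON HERE lol"))  # 0.0
```

The last example shows how opposing words cancel: a strongly negative word and a strongly positive one sum to zero, so a score of 0 does not always mean the text is neutral.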

## IMPORTING PACKAGES

In [50]:
#importing necessary modules
import pandas as pd
from afinn import Afinn    #Afinn library for sentiment analysis
from nltk.tokenize import word_tokenize    #NLTK for natural language processing, specifically word tokenization
import nltk
nltk.download('punkt')



[nltk_data] Downloading package punkt to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## LOADING DATA

In [51]:
# Load the CSV file
df = pd.read_csv('sentiment.csv', encoding='latin1')
df.rename(columns={"text": "text"}, inplace=True)



## REPLACING MISSING VALUES

In [52]:
# Replacing missing values in the 'text' column with the string 'no review'
df['text'] = df['text'].fillna('no review')


## AFINN SENTIMENT ANALYSIS

In [53]:
# Initialize Afinn
afinn = Afinn()

# Sentiment analysis function using Afinn
def afinn_score(text):
    return afinn.score(text)



## ANALYSIS AND RESULT

In [54]:
# Apply Afinn sentiment analysis to the 'text' column
df['Afinn_Score'] = df['text'].apply(afinn_score)

# Previewing the sentiment analysis results for the top 10 rows
print(df[['Time of Tweet', 'text', 'Afinn_Score']].head(10))

# Calculating the mean Afinn sentiment score
mean_afinn_score = df.groupby('Time of Tweet')['Afinn_Score'].mean()
print("Mean Afinn Sentiment Score:\n", mean_afinn_score)


  Time of Tweet                                               text  \
0       morning  Last session of the day  http://twitpic.com/67ezh   
1          noon   Shanghai is also really exciting (precisely -...   
2         night  Recession hit Veronique Branquinho, she has to...   
3       morning                                        happy bday!   
4          noon             http://twitpic.com/4w75p - I like it!!   
5         night                    that`s great!! weee!! visitors!   
6       morning            I THINK EVERYONE HATES ME ON HERE   lol   
7          noon   soooooo wish i could, but im in school and my...   
8         night   and within a short time of the last clue all ...   
9       morning   What did you get?  My day is alright.. haven`...   

   Afinn_Score  
0          0.0  
1          6.0  
2         -4.0  
3          3.0  
4          2.0  
5          3.0  
6          0.0  
7          0.0  
8          0.0  
9          0.0  
Mean Afinn Sentiment Score:
 Time of Tweet

# VADER ALGORITHM

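VADER (Valence Aware Dictionary and sEntiment Reasoner) is designed specifically for sentiment analysis and handles short texts, social media language, and emoticons well. Alongside positive, negative, and neutral proportions, it provides a compound score that summarizes overall sentiment on a scale from -1 (most negative) to +1 (most positive).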
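
As a rough sketch of how the compound score behaves: VADER sums the adjusted word valences and squashes the total into the range -1 to +1. The `normalize` function below mirrors that squashing step as I understand it (with a constant of 15), and `label_from_compound` applies the commonly cited ±0.05 cut-offs for turning a compound score into a label. Both are written from scratch for illustration; the function names and the default threshold are assumptions, not calls into NLTK.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

def label_from_compound(compound, threshold=0.05):
    """Turn a compound score into a coarse label using symmetric cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Larger raw sums move the compound score closer to +1 or -1 without reaching it
for raw in (0.0, 1.0, 4.0, -10.0):
    print(raw, round(normalize(raw), 4))

# Compound values taken from the VADER_Score column in this notebook
print(label_from_compound(0.7501))   # positive
print(label_from_compound(-0.7345))  # negative
print(label_from_compound(0.0))      # neutral
```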

## IMPORTING PACKAGES

In [55]:
#Importing libraries and modules
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer   #NLTK's SentimentIntensityAnalyzer for sentiment analysis
import nltk
from nltk.tokenize import word_tokenize
nltk.download('vader_lexicon')
nltk.download('punkt')


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## LOADING DATASET

In [56]:
# Load Sentiment Data from CSV
df = pd.read_csv('sentiment.csv', encoding='latin1')

# Keep the tweet text in a column named 'text' (it already has that name, so this rename is a no-op)
df.rename(columns={"text": "text"}, inplace=True)

# Handle missing values in the 'text' column
df['text'] = df['text'].fillna('no review')



## NLTK'S SENTIMENTINTENSITYANALYZER FOR SENTIMENT ANALYSIS

In [57]:
# Initialize NLTK's SentimentIntensityAnalyzer for sentiment analysis
sia = SentimentIntensityAnalyzer()

def vader_score(texts):
    return sia.polarity_scores(texts)['compound']

df['VADER_Score'] = df['text'].apply(vader_score)

print(df[['Time of Tweet', 'text', 'VADER_Score']].head(10))

mean_vader_score = df.groupby('Time of Tweet')['VADER_Score'].mean()
print("Mean VADER Sentiment Score:\n", mean_vader_score)


  Time of Tweet                                               text  \
0       morning  Last session of the day  http://twitpic.com/67ezh   
1          noon   Shanghai is also really exciting (precisely -...   
2         night  Recession hit Veronique Branquinho, she has to...   
3       morning                                        happy bday!   
4          noon             http://twitpic.com/4w75p - I like it!!   
5         night                    that`s great!! weee!! visitors!   
6       morning            I THINK EVERYONE HATES ME ON HERE   lol   
7          noon   soooooo wish i could, but im in school and my...   
8         night   and within a short time of the last clue all ...   
9       morning   What did you get?  My day is alright.. haven`...   

   VADER_Score  
0       0.0000  
1       0.7501  
2      -0.7345  
3       0.6114  
4       0.4738  
5       0.7405  
6      -0.2103  
7      -0.3048  
8       0.0000  
9       0.0000  
Mean VADER Sentiment Score:
 Time of Tweet

## CONCLUSION

For the above dataset, the Bing_Liu_Score and VADER algorithms appear more suitable, as their positive and negative scores follow the sentiment of the sample tweets more closely.
Bing_Liu_Score provides an integer sentiment score, which makes it suitable for measuring sentiment strength.
VADER provides a compound sentiment score, which suits cases that need a continuous score focused on overall sentiment. Of the two, VADER gives the more precise sentiment scores for this dataset.
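
This comparison rests on inspecting sample rows. Because the dataset also includes a labelled `sentiment` column, one way to check it quantitatively would be to convert each numeric score into a label and measure how often it matches. The sketch below uses made-up scores and labels, not results from this dataset; the function names and the neutral band are assumptions.

```python
def score_to_label(score, neutral_band=0.0):
    """Map a numeric score to 'positive', 'negative', or 'neutral'."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

def agreement(scores, gold_labels, neutral_band=0.0):
    """Fraction of rows where the label derived from the score matches the gold label."""
    predicted = [score_to_label(score, neutral_band) for score in scores]
    matches = sum(p == g for p, g in zip(predicted, gold_labels))
    return matches / len(gold_labels)

# Hypothetical values for illustration only
gold = ["positive", "negative", "neutral", "positive"]
bing_scores = [1, -1, 0, 0]
vader_scores = [0.61, -0.21, 0.0, 0.47]

print(agreement(bing_scores, gold))                      # 0.75
print(agreement(vader_scores, gold, neutral_band=0.05))  # 1.0
```

On the real data, the same functions could be applied to `df['Bing_Liu_Score']`, `df['VADER_Score']`, and `df['sentiment']` to put a number on the comparison.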
