# ASSIGNMENT-4

## SENTIMENT ANALYSIS 

### IMPORTING DATA FROM CSV FILE

In [24]:
import pandas as pd 
df = pd.read_csv("sentiment_tweets3.csv")
df.sample(10)

| | Index | message to examine | label (depression result) |
|---|---|---|---|
| 5990 | 598128 | @michaelsheen flange huh i will remember that ... | 0 |
| 3583 | 363196 | @K7vans Yeah it's 7:00AM here. Still too early... | 0 |
| 7635 | 761676 | Stuck doing a tonnnn of homework.. Fun fun... ... | 0 |
| 8554 | 800554 | Every time I see my Facebook profile, I imagin... | 1 |
| 8296 | 800296 | Avoid anxiety and #depression by knowing these... | 1 |
| 6425 | 640599 | had the best time at Dana's birthday party | 0 |
| 2509 | 256041 | I got a new Tattoo yesterday it is a mama lion... | 0 |
| 7153 | 713133 | welcome to work: 232 mejlÅ¯ aÅ¾ jsem si z toho... | 0 |
| 7163 | 715016 | Watching ze Lakers game 5. &amp;feeling 'Just... | 0 |
| 2911 | 297287 | @rosiebunny No trust me, it'll work for every... | 0 |


### OPINION LEXICON

The Opinion Lexicon, compiled by Minqing Hu and Bing Liu, is a list of about 6,800 English words labeled as either positive or negative; these labels are used to assign sentiment scores to text.
It is a foundational resource for lexicon-based sentiment analysis because it lets us identify the sentiment-bearing words in a text.




In [25]:
import nltk
nltk.download('opinion_lexicon')
from nltk.corpus import opinion_lexicon
from nltk.tokenize import word_tokenize

print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:10])
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:10])


Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


### CREATING A DICTIONARY

Here we initialize a sentiment dictionary from NLTK's Opinion Lexicon: every positive word is mapped to a score of +1 and every negative word to -1. The cell also downloads the punkt tokenizer models and renames the tweet column from "message to examine" to "text", preparing the tweets for scoring.

In [29]:
# Build a dictionary that maps each lexicon word to a score for the tweet text
nltk.download('punkt')  # tokenizer models used by word_tokenize
# Use a shorter name for the tweet column
df.rename(columns={"message to examine": "text"}, inplace=True)
pos_score = 1
neg_score = -1
word_dict = {}

# Add the positive words to the dictionary
for word in opinion_lexicon.positive():
    word_dict[word] = pos_score

# Add the negative words to the dictionary
for word in opinion_lexicon.negative():
    word_dict[word] = neg_score


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### BING-LIU LEXICON SCORING ALGORITHM

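Here we compute the Bing Liu score of each tweet: the text is lowercased and tokenized into a bag of words, each token is looked up in the dictionary, and the scores of the matching tokens (+1 or -1) are added to the overall score. Tokens that are not in the lexicon contribute nothing, so the result is the number of positive words minus the number of negative words.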

In [30]:
def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score  
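
To make the scoring rule concrete, here is a minimal, dependency-free sketch of the same summation. It assumes a tiny hypothetical lexicon (`toy_dict`) instead of the full Opinion Lexicon, and it uses a simple regular expression in place of `word_tokenize`, so its tokenization can differ slightly from NLTK's.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words
toy_dict = {"good": 1, "sweet": 1, "nice": 1, "bad": -1, "sad": -1}


def toy_bing_liu_score(text):
    # Simple regex tokenizer as a stand-in for nltk's word_tokenize
    tokens = re.findall(r"[a-z']+", text.lower())
    # Sum the +1/-1 values of tokens found in the lexicon; unknown tokens add 0
    return sum(toy_dict.get(tok, 0) for tok in tokens)


print(toy_bing_liu_score("Night, darlin'! Sweet dreams to you"))  # 1
print(toy_bing_liu_score("good day, bad night, sad me"))           # -1
print(toy_bing_liu_score("this is not good"))                      # 1
```

The last call shows a limitation of plain summation: "not good" still scores +1, because this scoring has no negation handling.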


### FILLING NULL VALUES

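We replace missing values in the text column with the placeholder 'no review' so that every entry is a string, and then apply bing_liu_score to each tweet to create the Bing_Liu_Score column.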

In [31]:

df['text'] = df['text'].fillna('no review')

df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)
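
A small made-up example shows why the fill is needed: bing_liu_score calls `text.lower()`, but a missing cell holds NaN or None rather than a string, so applying a string method to an unfilled column raises `AttributeError`. The data below is invented for illustration.

```python
import pandas as pd

# Made-up column with one missing tweet
texts = pd.Series(["So sleepy. Good times tonight though", None])

# Applying a string method to the missing entry fails
try:
    texts.apply(lambda t: t.lower())
except AttributeError as err:
    print("without fillna:", err)

# After filling, every entry is a string and the method applies cleanly
filled = texts.fillna("no review")
lowered = filled.apply(lambda t: t.lower())
print(lowered.tolist())  # ['so sleepy. good times tonight though', 'no review']
```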


### HEAD

This cell displays the first 10 rows of the label, text and Bing_Liu_Score columns.

In [33]:
df[['label (depression result)',"text", 'Bing_Liu_Score']].head(10)


| | label (depression result) | text | Bing_Liu_Score |
|---|---|---|---|
| 0 | 0 | just had a real good moment. i missssssssss hi... | 1 |
| 1 | 0 | is reading manga http://plurk.com/p/mzp1e | 0 |
| 2 | 0 | @comeagainjen http://twitpic.com/2y2lx - http:... | 0 |
| 3 | 0 | @lapcat Need to send 'em to my accountant tomo... | 0 |
| 4 | 0 | ADD ME ON MYSPACE!!! myspace.com/LookThunder | 0 |
| 5 | 0 | so sleepy. good times tonight though | 1 |
| 6 | 0 | @SilkCharm re: #nbn as someone already said, d... | 0 |
| 7 | 0 | 23 or 24ï¿½C possible today. Nice | 1 |
| 8 | 0 | nite twitterville workout in the am -ciao | 0 |
| 9 | 0 | @daNanner Night, darlin'! Sweet dreams to you | 1 |


### GROUPBY

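This code computes the mean Bing Liu score of the tweets in each depression-label group (label 1 marks depression-related tweets, label 0 the rest). Tweets labeled 0 average a mildly positive score of about 0.56, while tweets labeled 1 average about -1.41, so on average the lexicon score separates the two classes.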

In [34]:
df.groupby('label (depression result)').agg({'Bing_Liu_Score':'mean'})


| label (depression result) | Bing_Liu_Score (mean) |
|---|---|
| 0 | 0.5575 |
| 1 | -1.408384 |
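
The same groupby pattern on a small made-up DataFrame (the scores are invented for illustration) shows what the aggregation returns: one row per label, holding the mean score of that group's tweets. Here it gives 0.5 for label 0 and -1.5 for label 1.

```python
import pandas as pd

# Invented scores for four tweets, two per label
toy = pd.DataFrame({
    "label (depression result)": [0, 0, 1, 1],
    "Bing_Liu_Score": [1, 0, -2, -1],
})

# One row per label, with the mean Bing_Liu_Score of that group
means = toy.groupby("label (depression result)").agg({"Bing_Liu_Score": "mean"})
print(means)
```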


## VADER ALGORITHM

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis algorithm designed for text, particularly social media. It evaluates sentiment by considering the intensity of positive and negative words, along with features like punctuation and emoticons, providing a compound sentiment score.

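Like Bing Liu scoring, VADER is widely used for sentiment analysis, but instead of a flat +1/-1 per word it weights each word by the intensity of its positive or negative valence. The compound score is normalized to lie between -1 (most negative) and +1 (most positive).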
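
VADER's compound score comes from summing the rule-adjusted valences of the words in a text and squashing that sum with x / sqrt(x^2 + alpha), where alpha = 15, then clipping to [-1, 1]. The sketch below re-implements only this final normalization step, plus the ±0.05 thresholds that the VADER documentation suggests for labeling a text positive, neutral or negative. The valence 1.9 for "good" is an assumption about VADER's lexicon; with it, the result is consistent with the 0.4404 that the VADER cell in this section reports for the first tweet.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clip to the valid range
    return max(-1.0, min(1.0, norm_score))


def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using the +/-0.05 convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Assumption: "good" carries a valence of 1.9 in the VADER lexicon
compound = normalize(1.9)
print(round(compound, 4), label_from_compound(compound))  # 0.4404 positive
```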

In [43]:
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()
def vader_score(text):
    # Analyze sentiment of the text
    scores = sia.polarity_scores(text)
    return scores['compound']
df['VADER_Score'] = df['text'].apply(vader_score)
print(df[['text', 'VADER_Score']])


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


                                                    text  VADER_Score
0      just had a real good moment. i missssssssss hi...       0.4404
1             is reading manga  http://plurk.com/p/mzp1e       0.0000
2      @comeagainjen http://twitpic.com/2y2lx - http:...       0.0000
3      @lapcat Need to send 'em to my accountant tomo...       0.4404
4          ADD ME ON MYSPACE!!!  myspace.com/LookThunder       0.0000
...                                                  ...          ...
10309  No Depression by G Herbo is my mood from now o...      -0.8126
10310  What do you do when depression succumbs the br...      -0.2960
10311  Ketamine Nasal Spray Shows Promise Against Dep...      -0.7845
10312  dont mistake a bad day with depression! everyo...       0.1950
10313                                                  0       0.0000

[10314 rows x 2 columns]


## TEXTBLOB ALGORITHM

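TextBlob's default sentiment analyzer (PatternAnalyzer) is lexicon-based: it looks up the polarity of each known word, adjusts it for modifiers such as intensifiers ("very") and negation ("not"), and averages the results into a polarity score between -1.0 (negative) and 1.0 (positive). TextBlob also provides a machine-learning NaiveBayesAnalyzer, but it is not used here.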
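
The difference between summing (as in the Bing Liu score) and averaging (as TextBlob's default analyzer does) can be illustrated with a simplified sketch. The word polarities in the hypothetical `toy_polarity` dictionary are made up, and TextBlob's modifier and negation handling is omitted, so this is not TextBlob's implementation, only the aggregation idea.

```python
from statistics import mean

# Hypothetical word polarities in [-1, 1]; TextBlob's real lexicon values differ
toy_polarity = {"good": 0.7, "great": 0.8, "bad": -0.7}


def summed_polarity(tokens):
    # Bing Liu style: add up every matched word
    return sum(toy_polarity[t] for t in tokens if t in toy_polarity)


def averaged_polarity(tokens):
    # Averaging style: the result stays in [-1, 1] whatever the text length
    hits = [toy_polarity[t] for t in tokens if t in toy_polarity]
    return mean(hits) if hits else 0.0


tokens = "good food great view bad service".split()
print(round(summed_polarity(tokens), 2), round(averaged_polarity(tokens), 2))  # 0.8 0.27
```

Averaging keeps the score bounded regardless of tweet length, whereas a summed score grows with the number of sentiment words.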

In [49]:
import pandas as pd
from textblob import TextBlob

def textblob_score(text):
    # Create a TextBlob object
    blob = TextBlob(text)
    # Get the sentiment polarity
    sentiment_score = blob.sentiment.polarity
    return sentiment_score
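# Note: re-reading the CSV replaces df, dropping the "text", Bing_Liu_Score and VADER_Score columns built above
df = pd.read_csv("sentiment_tweets3.csv")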
df['TextBlob_Score'] = df['message to examine'].apply(textblob_score)  

print(df[['message to examine', 'TextBlob_Score']])


                                      message to examine  TextBlob_Score
0      just had a real good moment. i missssssssss hi...        0.600000
1             is reading manga  http://plurk.com/p/mzp1e        0.000000
2      @comeagainjen http://twitpic.com/2y2lx - http:...        0.000000
3      @lapcat Need to send 'em to my accountant tomo...        0.041667
4          ADD ME ON MYSPACE!!!  myspace.com/LookThunder        0.000000
...                                                  ...             ...
10309  No Depression by G Herbo is my mood from now o...        0.000000
10310  What do you do when depression succumbs the br...        0.000000
10311  Ketamine Nasal Spray Shows Promise Against Dep...        0.000000
10312  dont mistake a bad day with depression! everyo...       -1.000000
10313                                                  0        0.000000

[10314 rows x 2 columns]


### SENTIWORDNET

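SentiWordNet assigns sentiment scores to WordNet synsets (sets of synonyms that represent a single concept). Each synset receives a positivity, a negativity and an objectivity score that sum to 1, which allows more nuanced analysis than a purely positive/negative word list. In the code below, each non-stopword token is matched to its first SentiWordNet synset, the token's score is its positivity minus its negativity, and a tweet's score is the average over all matched tokens.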

In [50]:
import nltk
nltk.download('averaged_perceptron_tagger')  # POS tagger used by nltk.pos_tag
nltk.download('sentiwordnet')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


True

In [48]:
import pandas as pd
from nltk.corpus import sentiwordnet as swn
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import nltk

# Download NLTK resources
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

# Initialize WordNet Lemmatizer and stopwords
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def sentiwordnet_score(text):
    total_score = 0
    word_count = 0
    tokenized_text = word_tokenize(text.lower())
    for word, pos_tag in nltk.pos_tag(tokenized_text):
        if word in stop_words:
            continue
        synsets = list(swn.senti_synsets(word))
        if synsets:
            synset = synsets[0]
            total_score += synset.pos_score() - synset.neg_score()
            word_count += 1
    if word_count == 0:
        return 0
    return total_score / word_count

# Read the CSV file into a DataFrame
df = pd.read_csv("sentiment_tweets3.csv") 
df['SentiWordNet_Score'] = df['message to examine'].apply(sentiwordnet_score)  
print(df[['message to examine', 'SentiWordNet_Score']])


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\laxmi\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


                                      message to examine  SentiWordNet_Score
0      just had a real good moment. i missssssssss hi...            0.100000
1             is reading manga  http://plurk.com/p/mzp1e            0.000000
2      @comeagainjen http://twitpic.com/2y2lx - http:...            0.000000
3      @lapcat Need to send 'em to my accountant tomo...            0.045455
4          ADD ME ON MYSPACE!!!  myspace.com/LookThunder            0.500000
...                                                  ...                 ...
10309  No Depression by G Herbo is my mood from now o...            0.035714
10310  What do you do when depression succumbs the br...            0.031250
10311  Ketamine Nasal Spray Shows Promise Against Dep...            0.027778
10312  dont mistake a bad day with depression! everyo...           -0.468750
10313                                                  0            0.000000

[10314 rows x 2 columns]
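
The loop in sentiwordnet_score calls `nltk.pos_tag` but never uses the tag, and `lemmatizer` is created but never called, so each word is matched against its first synset across all parts of speech. One possible refinement (a sketch, not what produced the scores above) is to translate each Penn Treebank tag into WordNet's part-of-speech letters with a hypothetical helper such as `penn_to_wordnet`, and pass the result as the `pos` argument of `swn.senti_synsets(word, pos)`. A `None` result keeps the current behavior, since `senti_synsets` then searches all parts of speech.

```python
def penn_to_wordnet(tag):
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS letter."""
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("NN"):
        return "n"  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return None  # other tags (determiners, pronouns, ...) have no WordNet POS


print(penn_to_wordnet("JJ"), penn_to_wordnet("NNS"), penn_to_wordnet("DT"))  # a n None
```

When the mapped letter is not `None`, the same value could also be passed to `lemmatizer.lemmatize(word, pos)` so that inflected forms are reduced before the synset lookup.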


## CONCLUSION

Here, I applied four lexicon-based scoring methods to the tweets dataset: Bing Liu lexicon scoring, SentiWordNet, TextBlob and VADER.

VADER (Valence Aware Dictionary and sEntiment Reasoner): VADER is often preferred for its simplicity, speed and effectiveness on social-media text and short informal messages. It requires no training data and scores sentiment using a lexicon combined with grammatical and syntactic rules (for example, for punctuation, capitalization and degree modifiers).

Because my dataset consists of tweets, i.e. social-media text that is likely to contain informal messages, VADER is a good choice among the four methods.