# Assignment - Exercise 4.2 Sentiment Analysis
## Week#4
## Date - July-02-2021
## Author - Ganesh Kale

#### Import required packages

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

Load the data file into a DataFrame

In [9]:
comm = pd.read_csv("data/DailyComments.csv")

In [10]:
comm

|   | Day of Week | comments |
|---|---|---|
| 0 | Monday | Hello, how are you? |
| 1 | Tuesday | Today is a good day! |
| 2 | Wednesday | It's my birthday so it's a really special day! |
| 3 | Thursday | Today is neither a good day or a bad day! |
| 4 | Friday | I'm having a bad day. |
| 5 | Saturday | There' s nothing special happening today. |
| 6 | Sunday | Today is a SUPER good day! |


### Scheme Used for Sentiment Analysis - VADER from NLTK

VADER (Valence Aware Dictionary and sEntiment Reasoner) is used to categorize each comment as positive, negative or neutral. It is a text sentiment analysis model that is sensitive to both the polarity (positive vs. negative) and the intensity (strength) of the emotion expressed. It ships with the Natural Language Toolkit (NLTK) package and works on unlabeled text, so no training data is needed.
VADER has a built-in lexicon of sentiment-related words and applies rule-based adjustments tuned to sentiment expressed on social media. For a given sentence it returns four scores: a positive, a neutral and a negative score, plus a compound score.
- pos: the proportion of the text that is positive
- neu: the proportion of the text that is neutral
- neg: the proportion of the text that is negative
- compound: the normalized sum of the valence ratings of all lexicon words in the text (after the rule-based adjustments), scaled to the range -1 (most negative) to +1 (most positive)

The pos, neu and neg proportions add up to 1 (up to rounding), and the compound score ranges from -1 to 1.
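The squashing step behind the compound score can be sketched in a few lines. As I understand the VADER source (including the copy bundled with NLTK), the summed valence x is mapped to x / sqrt(x^2 + alpha) with alpha = 15; treat that constant as an assumption to check against your installed version. The function name `normalize` here is only for illustration.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1], VADER-style.

    alpha=15 is assumed to match VADER's default; a larger alpha flattens the curve.
    """
    norm = score / math.sqrt(score * score + alpha)
    # x / sqrt(x^2 + alpha) already lies strictly inside (-1, 1); clamp defensively.
    return max(-1.0, min(1.0, norm))


print(normalize(1.0))    # 1 / sqrt(16) = 0.25
print(normalize(2.192))  # about 0.4926
print(normalize(100.0))  # close to, but below, 1
```

As a sanity check, assuming "good" has a lexicon valence of 1.9 and the single "!" adds VADER's exclamation boost of 0.292, Tuesday's comment "Today is a good day!" sums to 2.192, and normalize(2.192) rounds to 0.4926, which matches its compound score in the results table.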
The commonly used compound-score thresholds for each polarity are:
- positive: compound score >= 0.05
- neutral: -0.05 < compound score < 0.05
- negative: compound score <= -0.05
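The threshold rule above can be written as a small function. This is a minimal sketch, not part of NLTK; `label_from_compound` is a hypothetical helper name. Applied to the compound column of the results table, it would label Wednesday's comment (0.5497) positive and Saturday's (-0.3089) negative, whereas the notebook's `max_score` function labels both neutral, so the two labeling schemes do not always agree.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label using the +/-0.05 convention.

    Hypothetical helper for illustration; not part of NLTK's API.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores taken from the notebook's results table
for day, score in [("Wednesday", 0.5497), ("Saturday", -0.3089), ("Monday", 0.0)]:
    print(day, label_from_compound(score))
```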

In [11]:
nltk.download('vader_lexicon')

vader = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/ganeshkale/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [12]:
# Label a text with whichever of VADER's pos/neg/neu proportions is largest.
# Ties are broken in the order positive, negative, neutral.
def max_score(text):
    scores = vader.polarity_scores(text)  # score once instead of three times
    pos, neg, neu = scores['pos'], scores['neg'], scores['neu']
    top = max(pos, neg, neu)

    if top == pos:
        return 'positive'
    elif top == neg:
        return 'negative'
    else:
        return 'neutral'

Create a separate column that tags each comment as positive, negative or neutral

In [13]:
comm['sentiment'] = comm['comments'].apply(max_score)

Create new columns for the compound, positive, negative and neutral scores

In [14]:
# Score each comment once, then split VADER's score dict into columns
scores = pd.DataFrame(comm['comments'].apply(vader.polarity_scores).tolist(), index=comm.index)
comm['compound'] = scores['compound']
comm['positive'] = scores['pos']
comm['negative'] = scores['neg']
comm['neutral'] = scores['neu']

In [15]:
comm

|   | Day of Week | comments | sentiment | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|---|
| 0 | Monday | Hello, how are you? | neutral | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | Tuesday | Today is a good day! | positive | 0.4926 | 0.516 | 0.0 | 0.484 |
| 2 | Wednesday | It's my birthday so it's a really special day! | neutral | 0.5497 | 0.336 | 0.0 | 0.664 |
| 3 | Thursday | Today is neither a good day or a bad day! | negative | -0.735 | 0.0 | 0.508 | 0.492 |
| 4 | Friday | I'm having a bad day. | negative | -0.5423 | 0.0 | 0.538 | 0.462 |
| 5 | Saturday | There' s nothing special happening today. | neutral | -0.3089 | 0.0 | 0.361 | 0.639 |
| 6 | Sunday | Today is a SUPER good day! | positive | 0.8327 | 0.723 | 0.0 | 0.277 |


### The Sentiment of Each Comment

In [16]:
comm.filter(['Day of Week','comments','sentiment'])

|   | Day of Week | comments | sentiment |
|---|---|---|---|
| 0 | Monday | Hello, how are you? | neutral |
| 1 | Tuesday | Today is a good day! | positive |
| 2 | Wednesday | It's my birthday so it's a really special day! | neutral |
| 3 | Thursday | Today is neither a good day or a bad day! | negative |
| 4 | Friday | I'm having a bad day. | negative |
| 5 | Saturday | There' s nothing special happening today. | neutral |
| 6 | Sunday | Today is a SUPER good day! | positive |


# Sentiment Analysis on a Different Data Set - Tweets from Twitter

Load the tweets data set (from Kaggle Datasets)

In [17]:
tweets = pd.read_csv("data/tweets.csv")

In [18]:
tweets.shape
tweets.head()

(17197, 2)

|   | id | tweet |
|---|---|---|
| 0 | 31963 | #studiolife #aislife #requires #passion #dedic... |
| 1 | 31964 | @user #white #supremacists want everyone to s... |
| 2 | 31965 | safe ways to heal your #acne!! #altwaystohe... |
| 3 | 31966 | is the hp and the cursed child book up for res... |
| 4 | 31967 | 3rd #bihday to my amazing, hilarious #nephew... |


Using VADER, calculate the polarity scores for each raw tweet

In [19]:
# Score each raw tweet once, then split VADER's score dict into columns
scores = pd.DataFrame(tweets['tweet'].apply(vader.polarity_scores).tolist(), index=tweets.index)
tweets['positive'] = scores['pos']
tweets['negative'] = scores['neg']
tweets['neutral'] = scores['neu']
tweets['compound'] = scores['compound']

Remove the "#" characters and "@user" placeholders from the tweets, since these tokens push the neutral score up

In [20]:
tweets['tweets'] = tweets.tweet.apply(lambda x : x.replace('@user',''))
tweets['tweets'] = tweets.tweets.apply(lambda x : x.replace('#',''))
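The two replace calls above can also be done in a single regular-expression pass. This is a minimal sketch; `clean_tweet` is a hypothetical helper name, and the pattern removes only the literal "@user" placeholder and "#" characters, exactly like the original code.

```python
import re

# Matches the anonymised "@user" mention placeholder or any "#" character.
NOISE = re.compile(r"@user|#")


def clean_tweet(text: str) -> str:
    """Remove '@user' placeholders and '#' signs; all other text is left untouched."""
    return NOISE.sub("", text)


print(repr(clean_tweet("@user #happy #friday")))  # ' happy friday'
```

With pandas, the same pattern can be applied to the whole column via tweets['tweet'].str.replace(r'@user|#', '', regex=True). The leading space left behind by a removed mention is kept, as in the original; chain .str.strip() if it matters.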

Create a new column that tags each cleaned tweet as positive, negative or neutral using max_score

In [21]:
tweets['sentiment'] = tweets['tweets'].apply(max_score)

A random sample of 10 tweets after tagging each one with a sentiment

In [22]:
tweets.filter(['id','tweets','sentiment']).sample(10)

|   | id | tweets | sentiment |
|---|---|---|---|
| 14483 | 46446 | the hairy legged mystery via fun childrensli... | neutral |
| 17110 | 49073 | tomorrow is my pageant nervous | neutral |
| 9902 | 41865 | di! dark digitala di fantasy girl ayyasap ... | neutral |
| 6937 | 38900 | ðððððð10 minutes go for ult... | neutral |
| 8138 | 40101 | wish your dear ones bihday with this lovely a... | positive |
| 7346 | 39309 | we're at the sta raring to go for horley carni... | neutral |
| 6621 | 38584 | math make me happy you not so much math mak... | neutral |
| 16169 | 48132 | love u papa fathers day | positive |
| 10127 | 42090 | lorena would like to meet somebody who know... | neutral |
| 9558 | 41521 | that's crazy yo !! | neutral |


Distribution of sentiment labels across all tweets

In [23]:
tweets.sentiment.value_counts()

neutral     14716
positive     2135
negative      346
Name: sentiment, dtype: int64
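Converting these counts to shares makes the imbalance easier to read. A small sketch using the counts printed above; in pandas, tweets.sentiment.value_counts(normalize=True) returns the same fractions directly.

```python
# Label counts copied from the value_counts() output above.
counts = {"neutral": 14716, "positive": 2135, "negative": 346}

total = sum(counts.values())  # 17197, the same as the row count in tweets.shape
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:>8}: {share:.1%}")
```

Under the max_score rule, about 85.6% of the tweets are labeled neutral, 12.4% positive and 2.0% negative.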

# END