# <center>Sentiment Analysis (in Python)</center>

This analysis uses tweets containing the word <b>'Bandersnatch'</b> to gauge recent public sentiment about the movie.

Bandersnatch is an interactive psychological thriller film released by Netflix on December 28, 2018. Since then it has been trending on social media for both good and bad reasons. 
Good - for its innovative interactive format and thriller story, and 
Bad  - as Netflix faces a lawsuit over the movie for trademark infringement, false designation of origin, unfair competition and trademark dilution.

In [73]:
# print all the outputs in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [74]:
import nltk
import re
import string

# <b>Approach 1: Using an external lexicon file</b>

The files containing positive and negative words were downloaded from the website "http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html". The lexicon accompanies the research papers below. 
1. Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
2. Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

In [75]:
#Read POSITIVE words from the file 'positive_words.txt' and store them in a dictionary.
#A dictionary is used because its look-up is much faster than a list's.
d_positive_words={}
with open('positive_words.txt', 'r') as read_positive:
    for line in read_positive:
        word = line.rstrip()
        if word: #skip blank lines so that the empty string is never treated as a sentiment word
            d_positive_words[word]=word

In [76]:
#Read NEGATIVE words from the file 'negative_words.txt' and store them in a dictionary.
#A dictionary is used because its look-up is much faster than a list's.
d_negative_words={}
with open('negative_words.txt', 'r') as read_negative:
    for line in read_negative:
        word = line.rstrip()
        if word: #skip blank lines so that the empty string is never treated as a sentiment word
            d_negative_words[word]=word
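Both word lists are loaded the same way, so the two cells above could share one helper. The sketch below is only an illustration, not the notebook's code: `load_lexicon` is a hypothetical name, it returns a `set` (membership tests with `in` behave like the dictionary look-ups above), and it assumes that lines starting with `;` are header comments, which may or may not apply to the files used here. The demo writes a small temporary file so that it runs without the real word lists.

```python
import os
import tempfile

def load_lexicon(path, encoding="utf8"):
    """Return the words of a one-word-per-line file as a set."""
    words = set()
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:
            word = line.strip()
            # Skip blank lines and (assumed) ';' comment lines.
            if word and not word.startswith(";"):
                words.add(word)
    return words

# Demo on a throwaway file with made-up content.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("; header comment\n\ngood\ngreat\n")
    demo_path = tmp.name

demo_words = load_lexicon(demo_path)
os.remove(demo_path)
print(sorted(demo_words))
```

With the real files, the calls would look like `load_lexicon('positive_words.txt')` and `load_lexicon('negative_words.txt')`, possibly with a different `encoding` depending on how the files were saved.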

In [77]:
#Read the file which contains emojis and other undesired characters such as emoticons, !?@#$%^&*()__+= etc.
with open('emoji_and_nonalphanumeric.txt', 'r',encoding="utf8") as read_non_desired_char:
    non_desired_char = read_non_desired_char.read()
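The characters read here are later passed to `str.strip`, which treats its argument as a set of characters and removes them only from the two ends of a string, never from the middle. The short sketch below illustrates that behaviour; the `unwanted` string is a made-up stand-in for the contents of 'emoji_and_nonalphanumeric.txt'.

```python
# Hypothetical stand-in for the characters read from the file.
unwanted = "!?@#$%^&*()_+=\n"

# Leading and trailing unwanted characters are removed...
edge_cleaned = "!!great!!".strip(unwanted)
# ...but an unwanted character inside the token is left alone.
inner_kept = "so!good".strip(unwanted)

print(edge_cleaned)
print(inner_kept)
```

A token with an unwanted character in the middle therefore stays as it is and will usually not match any lexicon entry.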

In [78]:
#Read the file bandersnatch.txt, which contains the extracted tweets (extracted on Jan 12, 2019, 1:30 PM).
#Each line is treated as one tweet and is lower-cased.
with open('bandersnatch.txt', 'r',encoding="utf8") as read_tweets:
    raw_words = [line.lower() for line in read_tweets]

<b>Data Cleaning</b>

In [79]:
wn = nltk.WordNetLemmatizer()
def f_lemmatize(text):
    #Lemmatize a single word; with no part-of-speech tag given, WordNet treats it as a noun.
    if text is not None:
        text = wn.lemmatize(text)
    return text

In [80]:
def f_remove_punctuations(text):
    #Drop every character that appears in string.punctuation (ASCII punctuation only).
    if text is not None:
        text = "".join([char for char in text if char not in string.punctuation])
    return text

In [81]:
def f_remove_stopwords(text):
    #Split the text on non-word characters and drop English stop words; returns a list of tokens.
    if text is not None:
        tokens = re.split(r'\W+', text)
        text = [word for word in tokens if word not in stopwords]
    return text

In [82]:
stopwords = nltk.corpus.stopwords.words('english')
def f_clean_data(text):
    if text is not None:
        text = f_remove_punctuations(text)
        text = f_remove_stopwords(text)
    return text

In [83]:
clean_words=[]
for tweet in raw_words:
    tokens = f_clean_data(tweet)
    #Keep only the non-empty tokens of the tweet and lemmatize each one.
    clean_words.append([f_lemmatize(token) for token in tokens if len(token)>0])
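To make the effect of the cleaning steps concrete, here is a minimal, self-contained sketch of the same punctuation and stop-word logic applied to one made-up tweet. `clean_tweet` and `DEMO_STOPWORDS` are hypothetical names; the tiny stop-word set stands in for NLTK's English list so that the example runs without downloading any corpus, and lemmatization is left out for the same reason.

```python
import re
import string

# Hypothetical miniature stop-word set, standing in for NLTK's English list.
DEMO_STOPWORDS = {"the", "is", "a", "and", "this", "it"}

def clean_tweet(text, stopwords=DEMO_STOPWORDS):
    """Drop ASCII punctuation, split on non-word characters, remove stop words."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = re.split(r"\W+", no_punct)
    # Keep non-empty tokens that are not stop words.
    return [tok for tok in tokens if tok and tok not in stopwords]

sample = "this movie is a brilliant, mind-bending thriller!"
cleaned = clean_tweet(sample)
print(cleaned)
```

Because punctuation is deleted before splitting, the hyphenated "mind-bending" becomes the single token "mindbending", which a lexicon is unlikely to contain.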

## Solution 1: Using each word to gauge the overall sentiment.

In [84]:
#Define variables to store respective sentiment counts and initialize them to zero.
positive_count = 0
negative_count = 0
neutral_count = 0

#For each word, assign a value of +1 for positive sentiment, a value -1 for negative sentiment, or a value of 0 for neutral sentiment.
for i in clean_words:
    for j in i:
        j = j.strip(non_desired_char) #strip non-desired characters like emoticons, !?@#$%^&*()__+= etc.
        if j in d_positive_words:
            positive_count+=1
        elif j in d_negative_words:
            negative_count-=1
        else:
            neutral_count+=0
            
print("Positive words= %d, Negative words= %d"%(positive_count,negative_count))

Positive words= 8907, Negative words= -7251


The function <b>finalsentiment</b> below evaluates the overall sentiment from the sum of the positive (+ve values), negative (-ve values) and neutral (0's) word scores.
1. If the sum is positive (i.e. greater than 0), the sentiment is considered positive.
2. If the sum is negative (i.e. less than 0), the sentiment is considered negative.
3. If the sum is equal to 0, the sentiment is considered neither positive nor negative.

In [85]:
#Function to evaluate and print overall sentiment
def finalsentiment(positive_count,negative_count,neutral_count):
    sentiment = positive_count+negative_count+neutral_count
    if sentiment>0:
        print("Solution 1: Extrated tweets shows positive sentiment for the 'bandersnatch' movie.")
    elif sentiment<0:
        print("Solution 1: Extrated tweets shows negative sentiment for the 'bandersnatch' movie.")
    else:
        print("Solution 1: Extrated tweets shows neither positive or negative sentiments for the 'bandersnatch' movie.")

In [86]:
#Calling above function 'finalsentiment'
finalsentiment(positive_count,negative_count,neutral_count)

Solution 1: Extrated tweets shows positive sentiment for the 'bandersnatch' movie.
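For reference, the whole of Solution 1 can be condensed into one function that returns the net score and a label instead of printing them. This is a minimal sketch on toy data, not a replacement for the cells above: `word_level_sentiment`, the toy lexicons and the toy tweets are all made up for illustration.

```python
def word_level_sentiment(tweets, positive_words, negative_words):
    """Return (net_score, label): +1 per positive word, -1 per negative word."""
    net = 0
    for tokens in tweets:
        for tok in tokens:
            if tok in positive_words:
                net += 1
            elif tok in negative_words:
                net -= 1
    if net > 0:
        label = "positive"
    elif net < 0:
        label = "negative"
    else:
        label = "neutral"
    return net, label

# Toy lexicons and already-tokenized toy tweets.
toy_positive = {"brilliant", "fun"}
toy_negative = {"boring", "lawsuit"}
toy_tweets = [["brilliant", "fun", "movie"], ["boring", "ending"]]

result = word_level_sentiment(toy_tweets, toy_positive, toy_negative)
print(result)
```

With the counts printed earlier, the same sum is 8907 - 7251 = 1656, which is greater than 0 and hence gives the positive verdict.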


## Solution 2: Using whole 'tweets'  instead of just 'words' to gauge the sentiment.

Here, I first decide whether each tweet is positive or negative by counting the number of positive and negative words in it. Then,
1. If the number of positive words in a tweet is <b>greater</b> than the number of negative words, the <b>tweet</b> is considered <b>positive</b>.
2. If the number of positive words in a tweet is <b>less</b> than the number of negative words, the <b>tweet</b> is considered <b>negative</b>.
3. If the numbers of positive and negative words in a tweet are <b>equal</b>, or if the tweet contains no positive or negative words at all, the <b>tweet</b> is considered <b>neutral</b>.
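The three rules above can be sketched as a small pure function. This is an illustration on toy data under stated assumptions: `classify_tweet`, the toy lexicons and the toy tweets are hypothetical, and a word is checked against the positive list first, so a word present in both lists counts as positive only.

```python
def classify_tweet(tokens, positive_words, negative_words):
    """Label one tokenized tweet as 'positive', 'negative' or 'neutral'."""
    pos = 0
    neg = 0
    for tok in tokens:
        if tok in positive_words:
            pos += 1
        elif tok in negative_words:
            neg += 1
    if pos > neg:
        return "positive"
    if pos < neg:
        return "negative"
    return "neutral"  # tie, or no sentiment words at all

toy_positive = {"brilliant", "fun"}
toy_negative = {"boring", "lawsuit"}
toy_tweets = [
    ["brilliant", "fun", "but", "boring"],  # 2 positive vs 1 negative
    ["lawsuit", "news"],                    # 0 positive vs 1 negative
    ["fun", "but", "boring"],               # tie
    ["watching", "tonight"],                # no sentiment words
]
labels = [classify_tweet(t, toy_positive, toy_negative) for t in toy_tweets]
print(labels)
```

Note that a tie and a tweet without any sentiment words end up in the same neutral bucket, so the neutral count mixes two different situations.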

In [87]:
#Define variables to store respective sentiment counts and initialize them to zero.
positive_tweet_count = 0
negative_tweet_count = 0
neutral_tweet_count = 0

#Loop through each 'tweet' and decide whether the tweet is positive, negative or neutral.
#Each 'tweet' adds +1 to exactly one of the three counters: positive, negative or neutral.
for i in clean_words:
    positive_count = 0 #use to store positive word count for a tweet
    negative_count = 0 #use to store negative word count for a tweet
    neutral_count = 0
    #Loop through each word in a tweet and count the number of positive and negative words.
    for j in i:
        j = j.strip(non_desired_char) #strip non-desired characters like emoticons, !?@#$%^&*()__+= etc.
        if j in d_positive_words:
            positive_count+=1
        elif j in d_negative_words:
            negative_count+=1
        else:
            neutral_count+=0
    #Conclude whether the tweet is positive, negative or neutral
    if positive_count > negative_count:
        positive_tweet_count+=1
    elif positive_count < negative_count:
        negative_tweet_count+=1
    else:
        neutral_tweet_count+=1
        
print("Positive tweets= %d, Negative tweets= %d, Neutral tweets= %d, Total tweets:%d "%(positive_tweet_count,negative_tweet_count,neutral_tweet_count,len(clean_words)))

Positive tweets= 5114, Negative tweets= 3738, Neutral tweets= 9148, Total tweets:18000 
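The raw counts are easier to compare as shares of the total. The snippet below only redoes arithmetic on the three numbers printed above; the variable names are made up for the example.

```python
# Counts copied from the output above.
counts = {"positive": 5114, "negative": 3738, "neutral": 9148}

total = sum(counts.values())
# Percentage share of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```

Roughly half of the tweets are neutral under this rule, while positive tweets outnumber negative ones among the rest.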


In [88]:
#Function to evaluate and print overall sentiment
def tweetsentiment(positive_tweet_count,negative_tweet_count,neutral_tweet_count):
    if positive_tweet_count>negative_tweet_count and positive_tweet_count>neutral_tweet_count:
        print("Solution 2: Extrated tweets shows positive sentiment for the 'bandersnatch' movie.")
    elif negative_tweet_count>positive_tweet_count and negative_tweet_count>neutral_tweet_count:
        print("Solution 2: Extrated tweets shows negative sentiment for the 'bandersnatch' movie.")
    else:
        print("Solution 2: Extrated tweets shows neither positive or negative sentiments for the 'bandersnatch' movie.")

In [89]:
#calling function tweetsentiment
tweetsentiment(positive_tweet_count,negative_tweet_count,neutral_tweet_count)

Solution 2: Extrated tweets shows neither positive or negative sentiments for the 'bandersnatch' movie.


# Approach 2: Using the NLTK library (VADER)

In [108]:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer 
sid = SentimentIntensityAnalyzer()
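#Score every raw tweet with VADER. raw_words was lower-cased when read, so VADER's emphasis boost for ALL-CAPS words does not apply here.
score=[]
for tw in raw_words:
    score.append(sid.polarity_scores(tw))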

In [110]:
df_score = pd.DataFrame(score)

In [160]:
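#Label each tweet from VADER's compound score: 1 if it is above 0, -1 if it is below 0, otherwise 0.
df_score['sentiment'] = df_score['compound'].apply(lambda a: 1 if float(a)>0 else -1 if float(a)<0 else 0)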

In [159]:
positive_tweet_count = df_score[df_score.sentiment==1].sentiment.count()
negative_tweet_count = df_score[df_score.sentiment==-1].sentiment.count()
neutral_tweet_count = df_score[df_score.sentiment==0].sentiment.count()
#tweetsentiment() is reused from Solution 2, which is why its message is still prefixed with "Solution 2".
tweetsentiment(positive_tweet_count,negative_tweet_count,neutral_tweet_count)

Solution 2: Extrated tweets shows neither positive or negative sentiments for the 'bandersnatch' movie.
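The `sentiment` column above treats any non-zero compound score as positive or negative. The VADER authors' documentation suggests a small dead zone instead, commonly ±0.05, so that near-zero scores count as neutral. The sketch below shows that variant as a pure function. It is an alternative to compare against, not the method used in this notebook: `label_compound` and its `threshold` parameter are hypothetical names, and the scores in the demo are made up.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to 1 (positive), -1 (negative) or 0 (neutral)."""
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0

# Made-up compound scores for illustration.
demo_scores = [0.6249, 0.02, -0.01, -0.4404, 0.0]
demo_labels = [label_compound(s) for s in demo_scores]
print(demo_labels)
```

In the notebook this could be applied with `df_score['compound'].apply(label_compound)`. The neutral count can only stay the same or grow compared with the strict comparison against 0.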
