# Sentiment Analysis of Movie Reviews
### GA DSI-SG-26 Capstone
> By: Matthew Lio
---

Project notebook organisation:

1. Data Cleaning and EDA
2. Lexicon-based Models (current notebook)
3. Binary Classification ML Models
4. Deep Learning Models

# 2. Lexicon-based Models
---

### Contents:
- [About Lexicon Models](#About-Lexicon-Models)
- [Library Imports](#Library-Imports)
- [VADER Lexicon](#VADER-Lexicon)
- [AFINN Lexicon](#AFINN-Lexicon)
- [TextBlob Lexicon](#TextBlob-Lexicon)
- [Lexicon Models Evaluation](#Lexicon-Models-Evaluation)

## About Lexicon Models

One approach to sentiment analysis is the lexicon-based approach. A lexicon is the vocabulary of a person, language or branch of knowledge. In lexicon-based sentiment analysis, we start from an existing dictionary of words, each labelled with a positive, negative or neutral sentiment, along with attributes such as polarity, part of speech, subjectivity, mood and modality [[source]](https://medium.com/nerd-for-tech/sentiment-analysis-lexicon-models-vs-machine-learning-b6e3af8fe746). Simply put, every word in the dictionary has a corresponding sentiment score.

This technique uses pre-built lexicon-based models, which calculate the sentiment polarity of a text or document from the semantic orientation of its individual words. The text is first tokenized, and each token is matched against the words in the lexicon to look up its sentiment score. A combining function, such as a sum or an average, is then applied to the token scores to produce the final sentiment prediction for the whole text.
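
The tokenize, look up and combine steps described above can be sketched in a few lines. This is a minimal illustration only: `TOY_LEXICON`, its scores and the helper names are hypothetical and are not taken from any of the lexicons used later in this notebook.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (not from any real lexicon)
TOY_LEXICON = {"good": 2, "great": 3, "nice": 2, "bad": -2, "awful": -3, "boring": -2}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Combining function: sum the scores of all tokens found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def predict_sentiment(text):
    """Map the combined score to a binary label: 1 (positive) or 0 (negative)."""
    return 1 if lexicon_score(text) > 0 else 0


print(lexicon_score("A great cast and a nice story"))  # 3 + 2 = 5
print(lexicon_score("Boring plot and awful acting"))   # -2 + -3 = -5
```

Words that are not in the lexicon contribute nothing, so the prediction depends entirely on which words the dictionary happens to cover.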

However, there are several issues with this approach. For instance, in online reviews and other online text, the presence of more positive words does not necessarily make a review positive, and vice versa [[source]](https://www.sciencedirect.com/topics/computer-science/lexicon-based-approach). This is because the meaning of the whole text can differ from that of the individual words used, for example when phrases or sentences imply sarcasm or make a comparison.

In this notebook, we are going to explore three popular lexicon-based models:
- VADER Lexicon
- AFINN Lexicon
- TextBlob Lexicon

## Library Imports

In [1]:
# !pip install afinn
# !pip install textblob

In [39]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import regex as re

import math
import matplotlib.ticker as mticker

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

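# Note: plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
# on newer versions, use ConfusionMatrixDisplay and RocCurveDisplay instead.
from sklearn.metrics import (confusion_matrix, plot_confusion_matrix, classification_report, plot_roc_curve, roc_auc_score,
                             accuracy_score, precision_score, recall_score, f1_score, auc, precision_recall_curve, average_precision_score)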

from afinn import Afinn
from textblob import TextBlob

from nltk.corpus import sentiwordnet as swn
from nltk.corpus import wordnet as wn
from nltk.tag import pos_tag

# import spacy
# from spacy.language import Language
# from spacy_langdetect import LanguageDetector
# from textblob import TextBlob

import warnings
import time
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)
pd.options.display.max_colwidth = 400

In [3]:
# import cleaned data
train = pd.read_csv('../data/train_cleaned.csv', index_col = 0)

In [4]:
train

Unnamed: 0,text,sentiment,rate
0,zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...,1,10
1,word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...,0,1
2,everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment,1,10
3,there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer ...,0,1
4,i've just had the evidence that confirmed my suspicions. a bunch of kid to put on the dvd of titanic on a fantastic state of the art mega screen home entertainment type deal. only two of them had actually seen it before. but they all had seen the moment of kate leo and celine dion so many time that most of them felt they had seen the whole movie. shortly after the epic started they started to ...,0,2
...,...,...,...
24899,footlight parade released viewed . the ice cream cone is invented in new york. kevin after a long and busy break we hit another busby berkeley musical from warner bros. this time it's the ultra fast paced footlight parade starring james cagney a juggernaut stage producer chester kent. i am certain that cagney channeling berkeley with his performance of the irrepressible kent who to come up wit...,1,8
24900,deeply humorous yet honest comedy about a bunch of grownup bill paxton julie warner kevin pollak elizabeth perkins vincent spano matt craven and diane lane who are invited back to spend a week to tomawka a camp in ontario canada by their former consuelor alan arkin . writer director mike binder drew upon his experience at the same camp a the main source of creating a gentle and understanding y...,1,9
24901,st watched out of dir sydney pollack dvd version i watched titled day of the condor so so cia drama full of laid back performance making for a very laid back movie. the premise of the story revolves around out of member of a cia research group being killed with robert redford's character codename condor being the one that left. who killed them and why that's what redford try to find out while ...,0,4
24902,i watch lot of scary movie or at least they try to be and this to be the worst if not nd worst movie i have ever had to make myself try to sit through. i never knew the depth of masacism until i rented this piece of moldy cheese covered in a used latex contraceptive. i am a fan of julian sans but this is worse than i would hope for him. on the other hand the story promising and i intrigued...f...,0,2


## VADER Lexicon

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. The emotion intensity of each word in the text is derived, and a single compound score is obtained by summing the individual scores and normalising the sum to lie between -1 and 1. In addition to the compound score, VADER also reports three separate scores: positive, negative and neutral. It is available in the NLTK package and can be applied directly to unlabeled text data.
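
To see how an unbounded sum of word scores ends up between -1 and 1, here is a small sketch of the normalisation step. It assumes the formula used in the reference VADER implementation, `x / sqrt(x^2 + alpha)` with `alpha = 15`, as far as we are aware. The function is a standalone re-implementation for illustration and does not call NLTK.

```python
import math


def normalize(raw_sum, alpha=15):
    """Squash an unbounded sum of valence scores into the range [-1, 1]."""
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clip as a safeguard so the result never leaves [-1, 1]
    return max(-1.0, min(1.0, norm))


for raw in (-10, -1, 0, 1, 10):
    print(raw, round(normalize(raw), 4))
```

Because the function flattens out quickly, a long review with many sentiment-bearing words is pushed towards -1 or +1. This would be consistent with how many of the compound scores in this notebook sit close to the extremes.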

In [5]:
# Instantiate VADER Sentiment Intensity Analyzer
sent = SentimentIntensityAnalyzer()

### Accurate predictions using VADER

Let's explore some correct predictions and scores from VADER.

In [6]:
# Positive sentiment review
print(train['text'][2])
sent.polarity_scores(train['text'][2])

everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment


{'neg': 0.0, 'neu': 0.746, 'pos': 0.254, 'compound': 0.9612}

In [7]:
# Negative sentiment review
print(train['text'][24903])
sent.polarity_scores(train['text'][24903])

absolutely the worst film yet by burton who seems to be getting worse with each film he directs. a miserable script loaded with cliche is only the first of many objectionable aspect to this film. this is the kind of movie where every time something happens you'll be sure to hear someone shout out he's lost his gun or whatever it is to let everybody know. carter is really awful and so is wahlberg who can't play this straight and be convincing. very nice effect and photography but poor music in the john williams mold by burton's crony elfman. heston appears in a nonsensichal scene to spout out his most famous catch phrase from the first movie. very poor results. if anyone else out there also saw sleepy hollow they will probably have noticed a i have the declining quality of burton's films. i've heard that this particular project produced by others and that burton brought in a director in which case his judgement should be questioned. but i think he allowed any possible vision he might ha

{'neg': 0.147, 'neu': 0.757, 'pos': 0.097, 'compound': -0.9669}

### Inaccurate predictions using VADER

In [8]:
# Positive sentiment review
print(train['text'][0])
sent.polarity_scores(train['text'][0])

# VADER shows negative sentiment!
# Even though this is actually a positive review

zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker and actor had it is a remarkable product. in term of explaining the motif and action of the two young suicide murderer it is better than 'elephant' in term of being a film that get under our 'rationalistic' skin it is a far far better film than almost anything you are likely to see. flawed but honest with a terrible honesty.


{'neg': 0.157, 'neu': 0.702, 'pos': 0.141, 'compound': -0.3816}

In [9]:
# Negative sentiment review
print(train['text'][3])
sent.polarity_scores(train['text'][3])

# VADER shows positive sentiment!
# Even though this is actually a negative review

there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer who happens to be the oldest guy of the bunch is playing the youngest dwarf. the film is filled with moment that scream for caption saying you're supposed to laugh now . it's hard to believe that this crap's supposed to be a comedy. many people actually stood up and left the cinema minute into the movie. i should have done the same instead of wasting my time... pain


{'neg': 0.079, 'neu': 0.768, 'pos': 0.153, 'compound': 0.8907}

From the incorrect predictions above, it seems that the way VADER works, by summing the emotion intensity of individual words, skewed the overall scores and led to the wrong labels. Many of the words in these reviews simply describe the movie (for example, "suicide" and "slaughtering" in the first review), yet VADER still counts them towards the sentiment, which resulted in wrong predictions.

### Confusing predictions using VADER

In [10]:
# Negative sentiment review
print(train['text'][1])
sent.polarity_scores(train['text'][1])

# VADER shows positive sentiment!
# Even though the words in this review clearly show a negative review

word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airplane. i won't list them here but just mention the coloring of the plane. they didn't even manage to show an airliner in the color of a fictional airline but instead used a painted in the original boeing livery. very bad. the plot is stupid and been done many time before only much much better. there are so many ridiculous moment here that i lost count of it really early. also i on the bad guys' side all the time in the movie because the good guy were so stupid. executive decision should without a doubt be you're choice over this one even the turbulence movie are better. in fact every other movie in 

{'neg': 0.122, 'neu': 0.744, 'pos': 0.134, 'compound': 0.7007}

This example is an interesting one. Even though this review contains mostly negative words, VADER still predicted it as a positive review overall.

### VADER Lexicon for all reviews

We will now apply VADER to all of the reviews. Columns for the scores and for the sentiment predictions will be added to our dataframe. We will do the same for the other lexicon-based models, and at the end we will compare and evaluate all of them.

In [11]:
# select the review text column (a pandas Series) to feed into VADER
vader_texts = train['text']

In [12]:
vader_texts.head()

0    zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...
1    word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...
2               everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be

In [13]:
# initialise list of VADER compound scores
vader_scores = []

# get the compound score of every review from VADER
for text in vader_texts:
    vader_scores.append(sent.polarity_scores(text)['compound'])

In [14]:
vader_scores

[-0.3816,
 0.7007,
 0.9612,
 0.8907,
 0.0561,
 0.9872,
 0.9659,
 0.9464,
 -0.9896,
 -0.9208,
 -0.8977,
 0.1963,
 0.1984,
 0.9985,
 -0.6705,
 0.9927,
 0.7004,
 0.9764,
 0.5351,
 0.9946,
 0.9694,
 0.9855,
 0.9512,
 0.7043,
 0.9751,
 -0.8885,
 -0.9896,
 -0.1172,
 0.8687,
 0.9622,
 0.8098,
 0.5331,
 -0.9217,
 -0.9909,
 0.9742,
 0.3247,
 0.9981,
 0.7717,
 0.3835,
 -0.7129,
 0.9696,
 0.6764,
 0.7949,
 -0.3182,
 -0.9941,
 0.9836,
 0.1531,
 0.8511,
 0.9753,
 0.9898,
 -0.9917,
 -0.8564,
 0.9922,
 0.9832,
 -0.9113,
 0.9662,
 0.9874,
 0.9308,
 0.9683,
 0.7695,
 0.4883,
 0.9861,
 -0.8037,
 0.3335,
 0.9502,
 -0.6249,
 0.9962,
 0.5106,
 0.9631,
 -0.9776,
 -0.7943,
 0.9822,
 -0.9883,
 -0.9869,
 0.7195,
 0.9828,
 0.0258,
 -0.8275,
 0.2486,
 -0.8397,
 0.963,
 0.9411,
 0.8395,
 0.9337,
 0.9833,
 0.9837,
 0.9939,
 0.9125,
 0.9311,
 0.4888,
 0.9925,
 -0.9944,
 0.7964,
 0.8098,
 -0.9824,
 0.9785,
 0.986,
 -0.0258,
 -0.5717,
 -0.9879,
 -0.8767,
 0.9846,
 -0.3937,
 0.8848,
 0.2566,
 0.4927,
 0.9965,
 0.5427,

In [15]:
# Create new column for VADER compound scores
train.loc[:,'v_compound'] = vader_scores

In [16]:
# check VADER compound scores
train.head()

Unnamed: 0,text,sentiment,rate,v_compound
0,zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...,1,10,-0.3816
1,word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...,0,1,0.7007
2,everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment,1,10,0.9612
3,there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer ...,0,1,0.8907
4,i've just had the evidence that confirmed my suspicions. a bunch of kid to put on the dvd of titanic on a fantastic state of the art mega screen home entertainment type deal. only two of them had actually seen it before. but they all had seen the moment of kate leo and celine dion so many time that most of them felt they had seen the whole movie. shortly after the epic started they started to ...,0,2,0.0561


In [17]:
# function to compute binary sentiment: Positive(1) or Negative(0)
def to_binary_sentiment(score):
    if score > 0:
        return 1
    else:
        return 0
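
Note that the function above labels a score of exactly 0 as negative. As a point of comparison, the sketch below uses a neutral band around zero. The ±0.05 cut-offs are the ones commonly suggested for VADER compound scores, to our knowledge. The name `to_three_way_sentiment` is hypothetical, and since our dataset only has binary labels this is an illustration rather than a replacement.

```python
def to_three_way_sentiment(score, threshold=0.05):
    """Return 1 (positive), 0 (neutral) or -1 (negative) using a neutral band."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


# First five VADER compound scores shown in the dataframe above
compound_scores = [-0.3816, 0.7007, 0.9612, 0.8907, 0.0561]
print([to_three_way_sentiment(s) for s in compound_scores])
```

The fifth review, with a compound score of 0.0561, sits only just above the band, which shows how sensitive borderline predictions are to the choice of threshold.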

In [18]:
# Create new column for VADER sentiment
train.loc[:,'v_sentiment'] = train.loc[:,'v_compound'].apply(to_binary_sentiment)

In [19]:
# check sentiment as predicted by VADER
train.head()

Unnamed: 0,text,sentiment,rate,v_compound,v_sentiment
0,zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...,1,10,-0.3816,0
1,word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...,0,1,0.7007,1
2,everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment,1,10,0.9612,1
3,there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer ...,0,1,0.8907,1
4,i've just had the evidence that confirmed my suspicions. a bunch of kid to put on the dvd of titanic on a fantastic state of the art mega screen home entertainment type deal. only two of them had actually seen it before. but they all had seen the moment of kate leo and celine dion so many time that most of them felt they had seen the whole movie. shortly after the epic started they started to ...,0,2,0.0561,1


## AFINN Lexicon

The AFINN lexicon is a list of English terms manually rated for valence with an integer between -5 (negative) and +5 (positive) by Finn Årup Nielsen between 2009 and 2011. It contains 3300+ words, each with an associated polarity score. It is perhaps one of the simplest and most popular lexicons for sentiment analysis [[source]](https://www.geeksforgeeks.org/python-sentiment-analysis-using-affin/).

In [20]:
# instantiate AFINN lexicon
afn = Afinn()

### Accurate predictions using AFINN

In [21]:
# Positive sentiment review
print(train['text'][24900])
afn.score(train['text'][24900])

deeply humorous yet honest comedy about a bunch of grownup bill paxton julie warner kevin pollak elizabeth perkins vincent spano matt craven and diane lane who are invited back to spend a week to tomawka a camp in ontario canada by their former consuelor alan arkin . writer director mike binder drew upon his experience at the same camp a the main source of creating a gentle and understanding yarn that make sense. also the movie plenty of funny moment some of which are completely bizarre like my favorite the one involves using masking tape. newton thomas sigel the usual suspect three king provides the film with some impressive shot of the canadian wilderness. among the cast sam raimi director of the evil dead film and the gift appears here a arkin's bumbling right hand man. one more thing this film reassured me that a camp doesn't have to be a site of bloody murders.


5.0

In [22]:
# Negative sentiment review
print(train['text'][24903])
afn.score(train['text'][24903])

absolutely the worst film yet by burton who seems to be getting worse with each film he directs. a miserable script loaded with cliche is only the first of many objectionable aspect to this film. this is the kind of movie where every time something happens you'll be sure to hear someone shout out he's lost his gun or whatever it is to let everybody know. carter is really awful and so is wahlberg who can't play this straight and be convincing. very nice effect and photography but poor music in the john williams mold by burton's crony elfman. heston appears in a nonsensichal scene to spout out his most famous catch phrase from the first movie. very poor results. if anyone else out there also saw sleepy hollow they will probably have noticed a i have the declining quality of burton's films. i've heard that this particular project produced by others and that burton brought in a director in which case his judgement should be questioned. but i think he allowed any possible vision he might ha

-3.0

### Inaccurate predictions using AFINN

In [23]:
# Positive sentiment review
print(train['text'][0])
afn.score(train['text'][0])

# AFINN shows negative sentiment!
# Even though this is actually a positive review

zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker and actor had it is a remarkable product. in term of explaining the motif and action of the two young suicide murderer it is better than 'elephant' in term of being a film that get under our 'rationalistic' skin it is a far far better film than almost anything you are likely to see. flawed but honest with a terrible honesty.


-5.0

In [24]:
# Negative sentiment review
print(train['text'][3])
afn.score(train['text'][3])

# AFINN shows positive sentiment!
# Even though this is actually a negative review

there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer who happens to be the oldest guy of the bunch is playing the youngest dwarf. the film is filled with moment that scream for caption saying you're supposed to laugh now . it's hard to believe that this crap's supposed to be a comedy. many people actually stood up and left the cinema minute into the movie. i should have done the same instead of wasting my time... pain


5.0

Similar to VADER, both of these reviews were also wrongly predicted by the AFINN lexicon model. The reason is probably the same: the score is a sum of individual word scores, which does not distinguish words that merely describe the movie from words that express an opinion, and does not recognise sarcasm.

### AFINN Lexicon for all reviews

Applying AFINN lexicon model to all reviews for final comparison and evaluation later.

In [25]:
def get_afn_scores(text):
    return afn.score(text)

In [26]:
# Create new column for AFINN scores
train.loc[:,'afinn_scores'] = train.loc[:,'text'].apply(get_afn_scores)

# Create new column for AFINN sentiment
train.loc[:,'afinn_sentiment'] = train.loc[:,'afinn_scores'].apply(to_binary_sentiment)

In [27]:
train.head()

Unnamed: 0,text,sentiment,rate,v_compound,v_sentiment,afinn_scores,afinn_sentiment
0,zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...,1,10,-0.3816,0,-5.0,0
1,word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...,0,1,0.7007,1,-16.0,0
2,everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment,1,10,0.9612,1,15.0,1
3,there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer ...,0,1,0.8907,1,5.0,1
4,i've just had the evidence that confirmed my suspicions. a bunch of kid to put on the dvd of titanic on a fantastic state of the art mega screen home entertainment type deal. only two of them had actually seen it before. but they all had seen the moment of kate leo and celine dion so many time that most of them felt they had seen the whole movie. shortly after the epic started they started to ...,0,2,0.0561,1,-2.0,0


In [48]:
train['afinn_scores'].value_counts()

0.0      733
4.0      731
6.0      709
2.0      700
5.0      687
        ... 
135.0      1
100.0      1
103.0      1
124.0      1
77.0       1
Name: afinn_scores, Length: 208, dtype: int64

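Interestingly, some scores are above 5 or below -5. This is expected: the -5 to +5 range applies to each individual word, while `afn.score` sums the valences of all matched words in a review, so longer reviews can reach much larger totals.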
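
A small sketch makes the effect of summing concrete. It assumes that a review-level score is the sum of the valences of all matched words. `TOY_VALENCES` holds illustrative values on the -5 to +5 scale (not looked up from AFINN), and `mean_score` is a hypothetical alternative that is not used elsewhere in this notebook.

```python
# Hypothetical per-word valences on a -5..+5 scale (illustrative only)
TOY_VALENCES = {"good": 3, "funny": 4, "bad": -3, "terrible": -3}


def summed_score(text, valences=TOY_VALENCES):
    """Sum per-word valences over the whole text, so the total is unbounded."""
    return float(sum(valences.get(word, 0) for word in text.lower().split()))


def mean_score(text, valences=TOY_VALENCES):
    """Average valence over matched words only, which stays within -5..+5."""
    matched = [valences[w] for w in text.lower().split() if w in valences]
    return sum(matched) / len(matched) if matched else 0.0


review = "good funny good funny good"
print(summed_score(review))  # 3 + 4 + 3 + 4 + 3 = 17.0
print(mean_score(review))    # 17 / 5 = 3.4
```

Since we only use the sign of the score for the binary prediction, the unbounded range does not affect `afinn_sentiment`. It would matter if scores were compared across reviews of different lengths.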

## TextBlob Lexicon

TextBlob is a simple library that supports complex analysis and operations on textual data. It returns the polarity and subjectivity of a text. Polarity lies between -1 and 1, where -1 indicates a negative sentiment and 1 a positive sentiment. Negation words reverse the polarity. TextBlob also has semantic labels that help with fine-grained analysis; for example, it takes emoticons and exclamation marks into account.

Subjectivity lies between 0 and 1. It quantifies how much of the text is personal opinion as opposed to factual information: a higher subjectivity means that the text contains more personal opinion and less factual information.

Lastly, TextBlob has an intensity parameter, which it uses when calculating subjectivity. Intensity determines whether a word modifies the next word in the sentence; adverbs such as "very" act as modifiers in this way. [[source]](https://towardsdatascience.com/my-absolute-go-to-for-sentiment-analysis-textblob-3ac3a11d524#:~:text=TextBlob%20is%20a%20simple%20library,classifying%20negative%20and%20positive%20words)
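
The idea of modifiers acting on the next word can be sketched as follows. This is a simplified illustration, not TextBlob's actual algorithm: the dictionaries, their values and the `phrase_polarity` helper are all hypothetical, and the real library applies its own rules and weights.

```python
# Illustrative values only; these are not read from TextBlob's lexicon
POLARITY = {"good": 0.7, "great": 0.8, "bad": -0.7}
INTENSIFIERS = {"very": 1.3, "slightly": 0.5}
NEGATIONS = {"not"}


def phrase_polarity(phrase):
    """Score a short phrase, letting modifiers act on the next sentiment word."""
    scale = 1.0
    scores = []
    for word in phrase.lower().split():
        if word in NEGATIONS:
            scale *= -1.0                 # negation reverses the polarity
        elif word in INTENSIFIERS:
            scale *= INTENSIFIERS[word]   # intensifier scales the polarity
        elif word in POLARITY:
            value = POLARITY[word] * scale
            scores.append(max(-1.0, min(1.0, value)))  # keep within [-1, 1]
            scale = 1.0                   # modifiers apply to one word only
    # Average over the sentiment-bearing words; 0.0 if none were found
    return sum(scores) / len(scores) if scores else 0.0


for phrase in ("good", "very good", "not good", "very great"):
    print(phrase, phrase_polarity(phrase))
```

Unlike a plain sum of word scores, this kind of rule lets "not good" come out negative even though "good" on its own is positive.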

For our analysis, we will only take a look at polarity scores from TextBlob. We first create columns for both polarity scores and sentiment prediction.

In [28]:
# Create a function to get polarity scores
def polarity(text): 
    return TextBlob(text).sentiment.polarity

In [29]:
# getting TextBlob polarity scores for all texts
train.loc[:,'tb_polarity'] = train.loc[:,'text'].apply(polarity)

# Create new column for TextBlob sentiment
train.loc[:,'tb_sentiment'] = train.loc[:,'tb_polarity'].apply(to_binary_sentiment)

In [30]:
train.head()

Unnamed: 0,text,sentiment,rate,v_compound,v_sentiment,afinn_scores,afinn_sentiment,tb_polarity,tb_sentiment
0,zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker a...,1,10,-0.3816,0,-5.0,0,0.091176,1
1,word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airpl...,0,1,0.7007,1,-16.0,0,-0.046733,0
2,everyone play their part pretty well in this little nice movie . belushi get the chance to live part of his life differently but end up realizing that what he had going to be just a good or maybe even better. the movie show u that we ought to take advantage of the opportunity we have not the one we do not or cannot have. if u can get this movie on video for around it d be an investment,1,10,0.9612,1,15.0,1,0.285552,1
3,there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer ...,0,1,0.8907,1,5.0,1,0.125595,1
4,i've just had the evidence that confirmed my suspicions. a bunch of kid to put on the dvd of titanic on a fantastic state of the art mega screen home entertainment type deal. only two of them had actually seen it before. but they all had seen the moment of kate leo and celine dion so many time that most of them felt they had seen the whole movie. shortly after the epic started they started to ...,0,2,0.0561,1,-2.0,0,0.073993,1


### Accurate predictions using TextBlob

In [36]:
# Accurate positive review
print(train['text'][0])
train['tb_sentiment'][0]

zero day lead you to think even re think why two boy young men would do what they did commit mutual suicide via slaughtering their classmates. it capture what must be beyond a bizarre mode of being for two human who have decided to withdraw from common civility in order to define their own mutual world via coupled destruction. it is not a perfect movie but given what money time the filmmaker and actor had it is a remarkable product. in term of explaining the motif and action of the two young suicide murderer it is better than 'elephant' in term of being a film that get under our 'rationalistic' skin it is a far far better film than almost anything you are likely to see. flawed but honest with a terrible honesty.


1

In [37]:
# Accurate negative review
print(train['text'][1])
train['tb_sentiment'][1]

word can't describe how bad this movie is. i can't explain it by writing only. you have too see it for yourself to get at grip of how horrible a movie really can be. not that i recommend you to do that. there are so many clich s mistake and all other negative thing you can imagine here that will just make you cry. to start with the technical first there are a lot of mistake regarding the airplane. i won't list them here but just mention the coloring of the plane. they didn't even manage to show an airliner in the color of a fictional airline but instead used a painted in the original boeing livery. very bad. the plot is stupid and been done many time before only much much better. there are so many ridiculous moment here that i lost count of it really early. also i on the bad guys' side all the time in the movie because the good guy were so stupid. executive decision should without a doubt be you're choice over this one even the turbulence movie are better. in fact every other movie in 

0

Interestingly, TextBlob correctly predicted the first review in the dataframe as positive, whereas both VADER and AFINN misclassified it as negative.

### Inaccurate predictions using TextBlob

In [38]:
# Inaccurate prediction: a negative review (sentiment = 0) classified as positive
print(train['text'][3])
train['tb_sentiment'][3]

there are a lot of highly talented filmmaker actor in germany now. none of them are associated with this movie . why in the world do producer actually invest money in something like this this you could have made good film with the budget of this garbage it's not entertaining to have seven grown men running around a dwarf pretending to be funny. what is funny though is that the film's producer who happens to be the oldest guy of the bunch is playing the youngest dwarf. the film is filled with moment that scream for caption saying you're supposed to laugh now . it's hard to believe that this crap's supposed to be a comedy. many people actually stood up and left the cinema minute into the movie. i should have done the same instead of wasting my time... pain


1

Looking at both the correct and incorrect classifications, it is hard to tell from the polarity score alone how TextBlob weighs personal opinion against factual information. Its ability to report a subjectivity score alongside polarity is nevertheless what sets TextBlob apart from the other lexicon-based models.

## Lexicon Models Evaluation

Now that all 3 lexicon-based models have predicted the sentiment polarity of our reviews, let's see how each of them fares in terms of accuracy.

### VADER Lexicon

In [43]:
print(classification_report(train['sentiment'], train['v_sentiment']))
confusion_matrix(train['sentiment'], train['v_sentiment'])

              precision    recall  f1-score   support

           0       0.78      0.54      0.64     12432
           1       0.65      0.85      0.74     12472

    accuracy                           0.69     24904
   macro avg       0.72      0.69      0.69     24904
weighted avg       0.72      0.69      0.69     24904



array([[ 6670,  5762],
       [ 1843, 10629]], dtype=int64)

### AFINN Lexicon

In [44]:
print(classification_report(train['sentiment'], train['afinn_sentiment']))
confusion_matrix(train['sentiment'], train['afinn_sentiment'])

              precision    recall  f1-score   support

           0       0.79      0.58      0.67     12432
           1       0.67      0.85      0.75     12472

    accuracy                           0.71     24904
   macro avg       0.73      0.71      0.71     24904
weighted avg       0.73      0.71      0.71     24904



array([[ 7198,  5234],
       [ 1916, 10556]], dtype=int64)

### TextBlob Lexicon

In [45]:
print(classification_report(train['sentiment'], train['tb_sentiment']))
confusion_matrix(train['sentiment'], train['tb_sentiment'])

              precision    recall  f1-score   support

           0       0.89      0.43      0.58     12432
           1       0.62      0.95      0.75     12472

    accuracy                           0.69     24904
   macro avg       0.76      0.69      0.66     24904
weighted avg       0.76      0.69      0.66     24904



array([[ 5289,  7143],
       [  668, 11804]], dtype=int64)

The results of all three lexicon-based models are quite underwhelming. AFINN is the best in terms of accuracy, at 0.71, while VADER and TextBlob both score 0.69. All three models also recall positive reviews far better than negative ones (recall of 0.85 to 0.95 for class 1 versus 0.43 to 0.58 for class 0), which suggests that they lean towards predicting positive sentiment.
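As a quick sanity check, the headline figures can be re-derived directly from the three confusion matrices printed above. The sketch below copies those matrices by hand and assumes scikit-learn's layout (rows are true labels, columns are predicted labels); the helper name `summarise` and the metric keys are illustrative and not part of the original notebook.

```python
# Confusion matrices copied from the outputs above.
# Assumed layout (scikit-learn convention): rows are true labels,
# columns are predicted labels, i.e. [[TN, FP], [FN, TP]] with 1 = positive.
confusion_matrices = {
    "VADER": [[6670, 5762], [1843, 10629]],
    "AFINN": [[7198, 5234], [1916, 10556]],
    "TextBlob": [[5289, 7143], [668, 11804]],
}


def summarise(matrix):
    """Derive accuracy, per-class recall and the share of positive predictions."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "negative_recall": tn / (tn + fp),
        "positive_recall": tp / (tp + fn),
        "predicted_positive_share": (fp + tp) / total,
    }


summary = {name: summarise(m) for name, m in confusion_matrices.items()}

for name, metrics in summary.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

The `predicted_positive_share` value makes the bias visible: although the two classes are almost perfectly balanced (12,432 negative versus 12,472 positive reviews), every model labels well over half of the reviews as positive, with TextBlob doing so most often.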

This is likely because lexicon-based models handle sarcasm and comparative phrases poorly. The meaning of a text often depends on whole phrases or sentences rather than on individual words, yet most lexicon-based models score each word largely in isolation. As a result, they capture little of the context of a review.
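The limitation can be illustrated with a deliberately simplified word-level scorer. Everything in the sketch below is hypothetical: the four-word `TOY_LEXICON`, its scores, the `toy_score` and `toy_sentiment` helpers and the `> 0` threshold are invented for illustration and are not taken from VADER, AFINN or TextBlob. Real lexicon models are more sophisticated (VADER, for example, applies heuristics for negation and intensifiers), so this only shows the underlying problem with summing per-word scores.

```python
import re

# Hypothetical four-word lexicon, invented for illustration only;
# real lexicons contain thousands of scored entries.
TOY_LEXICON = {"good": 3, "funny": 2, "bad": -3, "garbage": -3}


def toy_score(text):
    """Sum the lexicon scores of individual words, ignoring word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


def toy_sentiment(text):
    """Map the summed score to a binary label (assumed threshold: > 0 is positive)."""
    return 1 if toy_score(text) > 0 else 0


plain_praise = "a good and funny film"
negated_praise = "not good and not funny at all"

# Both sentences contain the same scored words, so they receive the same score
# even though the second one expresses the opposite opinion.
print(toy_score(plain_praise), toy_sentiment(plain_praise))
print(toy_score(negated_praise), toy_sentiment(negated_praise))
```

Because "not" carries no score of its own in this toy lexicon, the negated sentence is still labelled positive. The misclassified review shown earlier contains words such as "good", "funny" and "talented", which a purely word-level scorer could reward in much the same way.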

To build a more robust sentiment classifier, we cannot rely on lexicon-based models alone. We instead need to train machine learning algorithms on labeled datasets. In the next notebook, we explore several machine learning algorithms and their effectiveness in sentiment analysis.