# **Exploring Sentiment Analysis in Natural Language Processing (NLP) with Vader**

In this notebook, we explore sentiment analysis, a core task in Natural Language Processing (NLP). Our focus is lexicon-based sentiment analysis using Vader ("Valence Aware Dictionary and sEntiment Reasoner").

NLP stands at the crossroads of computer science and linguistics, aiming to equip computers with the capability to comprehend and interpret human language. Within this field, sentiment analysis plays a pivotal role, enabling us to discern the emotional tone and sentiments expressed within textual data.

Vader is open-source software, so you can inspect its code and adapt it to your own requirements.

Throughout this week's lab sessions, we focus on the lexicon, an integral component of sentiment analysis, and look at how Vader uses it to assign sentiment scores to text.

To get started, the following cells install the vaderSentiment package, import the required libraries, and instantiate a SentimentIntensityAnalyzer object.

In [1]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
pd.set_option('display.max_colwidth', 140)
analyzer = SentimentIntensityAnalyzer()

For details on how the scores are computed, see the "About the Scoring" section of the Vader GitHub README: [Vader Scoring Information](https://github.com/cjhutto/vaderSentiment#about-the-scoring).

That section is particularly useful for researchers who want standardized thresholds for classifying sentences as positive, neutral, or negative. Typical threshold values used in the literature are:

- Positive sentiment: A compound score greater than or equal to 0.05.
- Neutral sentiment: A compound score greater than -0.05 and less than 0.05.
- Negative sentiment: A compound score less than or equal to -0.05.

The compound score is the metric most commonly used by researchers, including Vader's authors, when a single measure of sentiment is needed.
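
The thresholds listed above can be sketched as a small helper. This is a minimal illustration rather than part of Vader itself: the function name `classify_compound` and its `threshold` parameter are assumptions introduced here, and 0.05 is simply the typical value quoted above.

```python
def classify_compound(compound, threshold=0.05):
    """Map a compound score to a sentiment label.

    Hypothetical helper (not part of vaderSentiment); the default
    threshold of 0.05 follows the typical values listed above.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Example: one compound score from each of the three bands
for score in (0.6, 0.0, -0.4767):
    print(score, classify_compound(score))
```

In this notebook, you could pass the `'compound'` entry of the dictionary returned by `analyzer.polarity_scores(...)` to such a helper.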

We can quickly test the Vader lexicon on a single sentence:

In [3]:
example = '''
The movie is terrible.
'''
vs = analyzer.polarity_scores(example)
print(str(vs))

{'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}
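
The compound score above can be reproduced by hand, which helps show what the lexicon contributes. The sketch below rests on two assumptions that are not stated in this notebook and should be checked against the Vader source: that "terrible" is the only scored word in the sentence, with a lexicon valence of -2.1, and that the summed valence x is normalized as x / sqrt(x² + alpha) with alpha = 15. The name `normalize_valence` is introduced here for illustration only.

```python
import math


def normalize_valence(total_valence, alpha=15):
    """Squash a summed valence into the open interval (-1, 1).

    Assumed form of Vader's normalization; alpha=15 is an assumption
    based on the vaderSentiment source, not on this notebook.
    """
    return total_valence / math.sqrt(total_valence ** 2 + alpha)


# Assumed lexicon valence for "terrible" (the only scored word here)
terrible_valence = -2.1
print(round(normalize_valence(terrible_valence), 4))
```

If both assumptions hold, the printed value should agree with the compound score of -0.4767 reported by the analyzer.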


## Evaluating an Entire Review

In this section, we examine a movie review from the NLTK movie reviews dataset. Run the cells below to obtain sentiment scores for this review.

If you wish to access the full dataset, you can download it from the following link or load it directly into your notebook: [NLTK Movie Reviews Dataset](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/movie_reviews.zip).

In [5]:
review = '''
no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mccabe's novel about young lad francie brady's ( eamonn owens ) descent into madness in 1960s ireland .
on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably venting his rage at his nosy but otherwise harmless neighbor mrs . nugent ( fiona shaw ) .
on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , needy child , having been raised by a drunken father ( stephen rea ) and a suicidal mother ( aisling o'sullivan ) .
on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of his , for lack of a better word , " bad " deeds are so incredibly shocking in their brutality and the malicious glee in which he performs them .
however , the butcher boy's power is undeniable , and the film as a whole is unforgettable--perhaps because it is so disturbing .
what makes it so unsettling is the francie's overall wink-wink yet matter-of-fact attitude about everything , expressed in a cheeky voiceover narration delivered by the adult francie ( rea again ) .
think heavenly creatures played largely for laughs , and you'll sort of understand .
anchoring the whole film is the astonishing debut performance of owens ; love francie or hate him , you cannot take your eyes off of owens .
the butcher boy truly is a twisted , unusual film that is bound to make just about anyone uncomfortable .
in the lobby after the screening , i overheard one man raving about how great yet disturbing it was ; i also heard one particularly offended woman say with disgust , " that movie was so unfunny ! "
 " i didn't know what to expect .
it's like something you chase for so long , but then you don't know how to react when you get it .
i still don't know how to react . "
--michael jordan , on winning his first nba championship in 1991 . . . or ,
my thoughts after meeting him on november 21 , 1997
'''

In [6]:
vs = analyzer.polarity_scores(review)
print(str(vs))

{'neg': 0.148, 'neu': 0.706, 'pos': 0.146, 'compound': 0.0831}


#### Evaluating Compound Scores: A Perspective on Accuracy

Compound scores are reasonably accurate in many cases, but their accuracy drops to around 65% on lengthy texts. Vader was designed for shorter texts and performs best on them. This leads us to some of the challenges and limitations of deriving an overall sentiment score with a lexicon-based approach.


## Examining Sentiment Scores on a Sentence Level

To gain deeper insights, let's consider the nuances of sentiment within individual sentences. To illustrate this, we will examine a review of Neil Jordan's film, "The Butcher Boy," specifically sourced from the file named `cv079_11933.txt`.

Within this review, we encounter two distinct frames of reference through which sentiments can be connected:

1. **Descriptive Statements:** These statements provide insights into the film's content. For instance, in sentence 3, we find the description: "Francie is a 'sick, needy child'"—this conveys what occurs in the film.

2. **Analytic Statements:** In contrast, analytic statements delve into the examination and analysis of the film's content. Sentence 3 also offers an example: "I found it difficult to laugh at some of Francie's darkly comic shenanigans." Here, the reviewer is critically analysing the film's impact.

These distinctions are not always clear-cut. Both types of statements can incorporate evaluative language and indicate the reviewer's perspective on the movie. Nevertheless, a key takeaway is that lexicon-based sentiment analysis can struggle when a review contains a substantial amount of "negative" content even though the reviewer's overall assessment is positive.

In [7]:
review = '''
no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mccabe's novel about young lad francie brady's ( eamonn owens ) descent into madness in 1960s ireland .
on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably venting his rage at his nosy but otherwise harmless neighbor mrs . nugent ( fiona shaw ) .
on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , needy child , having been raised by a drunken father ( stephen rea ) and a suicidal mother ( aisling o'sullivan ) .
on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of his , for lack of a better word , " bad " deeds are so incredibly shocking in their brutality and the malicious glee in which he performs them .
however , the butcher boy's power is undeniable , and the film as a whole is unforgettable--perhaps because it is so disturbing .
what makes it so unsettling is the francie's overall wink-wink yet matter-of-fact attitude about everything , expressed in a cheeky voiceover narration delivered by the adult francie ( rea again ) .
think heavenly creatures played largely for laughs , and you'll sort of understand .
anchoring the whole film is the astonishing debut performance of owens ; love francie or hate him , you cannot take your eyes off of owens .
the butcher boy truly is a twisted , unusual film that is bound to make just about anyone uncomfortable .
in the lobby after the screening , i overheard one man raving about how great yet disturbing it was ; i also heard one particularly offended woman say with disgust , " that movie was so unfunny ! "
 " i didn't know what to expect .
it's like something you chase for so long , but then you don't know how to react when you get it .
i still don't know how to react . "
--michael jordan , on winning his first nba championship in 1991 . . . or ,
my thoughts after meeting him on november 21 , 1997
'''

# Split the review into lines and drop any empty strings
sentences = [sentence for sentence in review.splitlines() if sentence]

sentences

["no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mccabe's novel about young lad francie brady's ( eamonn owens ) descent into madness in 1960s ireland . ",
 "on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably venting his rage at his nosy but otherwise harmless neighbor mrs . nugent ( fiona shaw ) . ",
 "on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , needy child , having been raised by a drunken father ( stephen rea ) and a suicidal mother ( aisling o'sullivan ) . ",
 'on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of his , for lack of a better word , " bad " deeds are so incredibly shocking in their brutality and the malicious glee in which he performs t

In [10]:
# Score each sentence and collect one row (a dict) per sentence.
# The analyzer created earlier in the notebook is reused here.
rows = []
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    vs['sentence'] = sentence
    rows.append(vs)

# Build the DataFrame once, rather than concatenating inside the loop
df = pd.DataFrame(rows, columns=['sentence', 'neg', 'neu', 'pos', 'compound'])

# df now contains the sentiment scores for each sentence
df

Unnamed: 0,sentence,neg,neu,pos,compound
0,"no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mcc...",0.181,0.719,0.101,-0.5994
1,"on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably ven...",0.126,0.777,0.097,-0.1027
2,"on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , need...",0.207,0.683,0.111,-0.7096
3,"on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of h...",0.217,0.613,0.17,-0.5233
4,"however , the butcher boy's power is undeniable , and the film as a whole is unforgettable--perhaps because it is so disturbing .",0.162,0.838,0.0,-0.6418
5,"what makes it so unsettling is the francie's overall wink-wink yet matter-of-fact attitude about everything , expressed in a cheeky voic...",0.0,1.0,0.0,0.0
6,"think heavenly creatures played largely for laughs , and you'll sort of understand .",0.0,0.534,0.466,0.8625
7,"anchoring the whole film is the astonishing debut performance of owens ; love francie or hate him , you cannot take your eyes off of owe...",0.112,0.76,0.128,0.128
8,"the butcher boy truly is a twisted , unusual film that is bound to make just about anyone uncomfortable .",0.111,0.766,0.123,0.0772
9,"in the lobby after the screening , i overheard one man raving about how great yet disturbing it was ; i also heard one particularly offe...",0.2,0.694,0.106,-0.6793
