# DIGI405 Lab Class 11: Sentiment Analysis

This week’s class will investigate lexicon-based sentiment analysis with Vader (‘Valence Aware Dictionary and sEntiment Reasoner’). Vader is open-source software, so you can inspect the code and modify it if you wish. In this week’s lab we will mainly refer to the lexicon.

The following cell imports the libraries and creates a `SentimentIntensityAnalyzer` object.

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
pd.set_option('display.max_colwidth', 140)
analyzer = SentimentIntensityAnalyzer()

Read the "About the Scoring" section of the Vader GitHub README, which explains the scores that Vader returns:  
https://github.com/cjhutto/vaderSentiment#about-the-scoring

**QUESTION:** What range of values of the Compound Score should be associated with a "neutral" classification?


## Score some text and understand Vader's lexicon and booster/negation rules

In the cell below is a short phrase to show you the output of Vader. 

First, run it on this text and make sure you understand what each number tells us. 

**ACTIVITY:** Try different text and make sure you understand the scores Vader returns.

Try:
1. A sentence that is obviously positive like "The movie is great"
2. A sentence that uses a "booster" e.g. "The movie is really terrible"
3. A sentence that uses negation e.g. "The movie is not great". 
4. Some sentences that attempt to fool Vader. 

Look at the lexicon and the booster/negation words in code so you get more insight into the scores. 

The main Vader module, as ported to NLTK (negations and booster words are on lines 48-181): https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py 
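
To make the booster and negation rules concrete, here is a simplified sketch of the arithmetic, not Vader's actual code path. The constants `B_INCR` and `N_SCALAR` and the `alpha=15` normalisation mirror values in the Vader module linked above; the valence of 3.1 for "great" is an assumption you should check against the lexicon, and `toy_compound` is a hypothetical helper. Real Vader also takes account of punctuation, capitalisation, "but" clauses and how far a booster sits from the sentiment word, all of which are ignored here.

```python
import math

B_INCR = 0.293     # amount a booster such as "really" adds to a valence
N_SCALAR = -0.74   # multiplier applied when a word is negated
ALPHA = 15         # normalisation constant used for the compound score


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the range -1 to 1."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(valence, boosted=False, negated=False):
    """Hypothetical helper: compound score for a single sentiment word."""
    if boosted:
        # a booster pushes the valence further away from zero
        valence += B_INCR if valence > 0 else -B_INCR
    if negated:
        # negation flips the sign and dampens the strength
        valence *= N_SCALAR
    return round(normalize(valence), 4)


GREAT = 3.1  # assumed lexicon valence for "great"

print(toy_compound(GREAT))                # "great"
print(toy_compound(GREAT, boosted=True))  # "really great"
print(toy_compound(GREAT, negated=True))  # "not great"
```

Compare the printed values with what `analyzer.polarity_scores` returns for the matching sentences; any differences point to rules that this sketch leaves out.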

The Vader lexicon, which you can search in your browser or download and use as a text file:
https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt 

Make sure you are clear about what the values in the Vader lexicon actually mean. Here are some examples for your reference:

    hope 	1.9 0.53852 [3, 2, 2, 1, 2, 2, 1, 2, 2, 2]
    hopeless -2.0 1.78885 [-3, -3, -3, -3, 3, -1, -3, -3, -2, -2]
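
Each lexicon line holds four tab-separated fields: the token, the mean of the human ratings, their standard deviation, and the raw ratings themselves. As a quick check of that reading, the sketch below parses the two example lines and recomputes the mean and standard deviation from the raw ratings. The helper `parse_lexicon_line` is hypothetical, and treating the standard deviation as a population standard deviation is an assumption that happens to reproduce the two values shown.

```python
import ast
import statistics

# The two example lines, written with explicit tab separators
lines = [
    "hope\t1.9\t0.53852\t[3, 2, 2, 1, 2, 2, 1, 2, 2, 2]",
    "hopeless\t-2.0\t1.78885\t[-3, -3, -3, -3, 3, -1, -3, -3, -2, -2]",
]


def parse_lexicon_line(line):
    """Hypothetical helper: split one lexicon line into typed fields."""
    token, mean, std, ratings = line.split("\t")
    return token, float(mean), float(std), ast.literal_eval(ratings)


for line in lines:
    token, mean, std, ratings = parse_lexicon_line(line)
    # Recompute the summary statistics from the raw human ratings
    recomputed_mean = statistics.mean(ratings)
    recomputed_std = statistics.pstdev(ratings)
    print(token, mean, round(recomputed_mean, 1), std, round(recomputed_std, 5))
```

A small standard deviation, as for "hope", means the raters largely agreed. The larger value for "hopeless" comes from one rater who scored it +3.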

In [None]:
example = '''
The movie is terrible.
'''
# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
vs = analyzer.polarity_scores(example)
print(vs)

## Scoring a whole review

This is a review from the movie reviews dataset we used last week. 

Run the cell below to get the scores for this movie review.

**ACTIVITY:**
Download the dataset here: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/movie_reviews.zip 

Try some different reviews from the dataset and see what scores Vader comes up with. 

**QUESTION:** Do the scores agree with the actual label (pos or neg) of each review?
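
If you would rather load reviews in code than copy and paste them, something like the following may help. It is a sketch that assumes you unzipped the download into a `movie_reviews` folder next to this notebook, with `pos` and `neg` subfolders of `.txt` files. The function name `load_reviews` and the folder location are assumptions, so adjust the path to wherever you extracted the archive.

```python
from pathlib import Path


def load_reviews(root):
    """Return a list of (label, filename, text) tuples, one per review file."""
    reviews = []
    for label in ("pos", "neg"):
        folder = Path(root, label)
        if not folder.is_dir():
            # skip quietly if the archive was extracted somewhere else
            continue
        # sorted() keeps the order stable between runs
        for path in sorted(folder.glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            reviews.append((label, path.name, text))
    return reviews


reviews = load_reviews("movie_reviews")  # assumed extraction folder
print(len(reviews), "reviews loaded")
```

Each tuple keeps the label alongside the text, so you can pass the text to `analyzer.polarity_scores` and compare the result with the label.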

In [None]:
review = '''
no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mccabe's novel about young lad francie brady's ( eamonn owens ) descent into madness in 1960s ireland . 
on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably venting his rage at his nosy but otherwise harmless neighbor mrs . nugent ( fiona shaw ) . 
on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , needy child , having been raised by a drunken father ( stephen rea ) and a suicidal mother ( aisling o'sullivan ) . 
on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of his , for lack of a better word , " bad " deeds are so incredibly shocking in their brutality and the malicious glee in which he performs them . 
however , the butcher boy's power is undeniable , and the film as a whole is unforgettable--perhaps because it is so disturbing . 
what makes it so unsettling is the francie's overall wink-wink yet matter-of-fact attitude about everything , expressed in a cheeky voiceover narration delivered by the adult francie ( rea again ) . 
think heavenly creatures played largely for laughs , and you'll sort of understand . 
anchoring the whole film is the astonishing debut performance of owens ; love francie or hate him , you cannot take your eyes off of owens . 
the butcher boy truly is a twisted , unusual film that is bound to make just about anyone uncomfortable . 
in the lobby after the screening , i overheard one man raving about how great yet disturbing it was ; i also heard one particularly offended woman say with disgust , " that movie was so unfunny ! " 
 " i didn't know what to expect . 
it's like something you chase for so long , but then you don't know how to react when you get it . 
i still don't know how to react . " 
--michael jordan , on winning his first nba championship in 1991 . . . or , 
my thoughts after meeting him on november 21 , 1997 
'''
print(review)

In [None]:
# score the whole review as a single text
vs = analyzer.polarity_scores(review)
print(vs)

The compound scores are accurate more often than not, but accuracy on these long texts is modest (around 65%). Software like Vader was designed for short texts and works better on them. We can still use it to understand some of the problems with deriving overall sentiment scores using a lexicon-based approach, and some of the challenges of measuring sentiment more generally.
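
One way to arrive at an accuracy figure like this is to turn each compound score into a predicted label and count how often it matches the dataset's label. The sketch below shows the bookkeeping only: the `scored` list holds made-up placeholder values rather than real Vader output, `predict_label` is a hypothetical helper, and treating every compound score above zero as positive is a simplifying assumption for a dataset that has only pos and neg labels.

```python
def predict_label(compound):
    """Hypothetical rule: a compound score above zero means a positive review."""
    return "pos" if compound > 0 else "neg"


def accuracy(scored):
    """Fraction of (compound, actual_label) pairs where the prediction matches."""
    correct = sum(1 for compound, actual in scored if predict_label(compound) == actual)
    return correct / len(scored)


# Placeholder (compound, actual label) pairs, invented for illustration only
scored = [(0.98, "pos"), (-0.75, "neg"), (0.60, "neg"), (0.10, "pos")]
print(f"accuracy: {accuracy(scored):.2f}")
```

In practice you would build `scored` by taking `analyzer.polarity_scores(text)['compound']` for each review in the dataset.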

## Looking at sentiment scores for each sentence

Let’s look at an example review to think about the different frames of reference to which sentiments might be connected. The example we will use is a review of Neil Jordan’s film The Butcher Boy (filename cv079_11933.txt). 

A descriptive statement describes the content of the film. E.g. sentence 3: Francie is a “sick, needy child”. This tells us about what happens in the film.

An analytic statement analyses the content of the film. E.g. sentence 3: “I found it difficult to laugh at some of Francie’s darkly comic shenanigans”. Here the reviewer is analysing the effects of the film.

It’s not a perfect distinction, but we can observe that negative content in the film doesn’t necessarily imply a negative review of the film. Both types of statements can include evaluative language and indications of the reviewer's point of view about the movie, but lexicon-based sentiment analysis will have difficulty when a review contains a lot of “negative” content but is nonetheless positive about the film.

**ACTIVITY:** Run the following cells to get scores for each sentence.

In [None]:
review = '''
no film in recent has left me with such conflicted feelings as neil jordan's harrowing , humorous , horrifying adaptation of patrick mccabe's novel about young lad francie brady's ( eamonn owens ) descent into madness in 1960s ireland . 
on one hand , it was difficult for me to become invested in francie's story because he is such an unsavory character , unjustifyably venting his rage at his nosy but otherwise harmless neighbor mrs . nugent ( fiona shaw ) . 
on another hand , i found it difficult to laugh at some of francie's darkly comic shenanigans because he obviously is such a sick , needy child , having been raised by a drunken father ( stephen rea ) and a suicidal mother ( aisling o'sullivan ) . 
on yet another hand , i also found it difficult to completely sympathize with francie during his more emotional scenes because some of his , for lack of a better word , " bad " deeds are so incredibly shocking in their brutality and the malicious glee in which he performs them . 
however , the butcher boy's power is undeniable , and the film as a whole is unforgettable--perhaps because it is so disturbing . 
what makes it so unsettling is the francie's overall wink-wink yet matter-of-fact attitude about everything , expressed in a cheeky voiceover narration delivered by the adult francie ( rea again ) . 
think heavenly creatures played largely for laughs , and you'll sort of understand . 
anchoring the whole film is the astonishing debut performance of owens ; love francie or hate him , you cannot take your eyes off of owens . 
the butcher boy truly is a twisted , unusual film that is bound to make just about anyone uncomfortable . 
in the lobby after the screening , i overheard one man raving about how great yet disturbing it was ; i also heard one particularly offended woman say with disgust , " that movie was so unfunny ! " 
 " i didn't know what to expect . 
it's like something you chase for so long , but then you don't know how to react when you get it . 
i still don't know how to react . " 
--michael jordan , on winning his first nba championship in 1991 . . . or , 
my thoughts after meeting him on november 21 , 1997 
'''

# split the review on newlines and drop any empty strings
sentences = [sentence for sentence in review.splitlines() if sentence]

sentences

In [None]:
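# score each sentence and collect the results as one dict per row
rows = []
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    vs['sentence'] = sentence
    rows.append(vs)

# DataFrame.append was removed in pandas 2.0, so build the frame from the list of rows
df = pd.DataFrame(rows, columns=['sentence','neg','neu','pos','compound'])

df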

**ACTIVITY:** Look closely at each sentence and work out which ones relate to the reviewer's evaluation of the movie. 

**QUESTION:** Is Vader doing a good job of scoring these sentences?

**ACTIVITY:** 
Try this with another review. You will need to replace the review text using one of the reviews from the movie reviews dataset you downloaded above and rerun the cells. Look carefully at the positively and negatively evaluated sentences using the compound score. 

**QUESTION:** 
From this analysis, what challenges do you see in correctly assigning overall sentiment scores to movie reviews?

**ACTIVITY:** 
In class this week we discussed how sentiment analysis might not be an appropriate technique for analysing some kinds of texts. For example, some texts are not primarily about presenting a point of view or evaluation (e.g. journalistic texts, scientific writing) and authors/speakers don't always present their evaluations in a straightforward way (e.g. some political texts).  

Take some time to explore some different kinds of texts (e.g. editorials, fiction, tweets, news articles, political speeches, texts from the corpus you built for the Corpus Building Project). Vader will tend to perform better with short texts, so make sure you try texts of different lengths.

**QUESTION:** 
How does Vader perform on different kinds of texts? What kinds of texts are challenging for a lexicon-based approach to sentiment analysis? What kinds of texts are not appropriate for sentiment analysis?

**This is it for the labs for DIGI405! Before you go today – make sure you thank your tutors for all their help and support during the course!**