# Challenges of Sentiment Analysis

Note to DIGI405 students: Make sure you use the Python 3.12 Kernel to run this notebook.  

This notebook introduces challenges of sentiment measurement using a lexicon-based sentiment analysis tool called VADER (‘Valence Aware Dictionary for sEntiment Reasoning’). VADER is open source software, so you can [inspect the code](https://github.com/cjhutto/vaderSentiment) and lexicon. [VADER's GitHub repository](https://github.com/cjhutto/vaderSentiment) and [Hutto and Gilbert's 2014 paper about VADER](https://ojs.aaai.org/index.php/ICWSM/article/view/14550) are the best explanations of VADER and of how the VADER lexicon and rules were derived.  

Although VADER is more than 10 years old, it is still commonly used. You can learn a lot about how language expresses sentiment by using VADER and understanding how it works, when it works, and when it doesn't. 

The following cell imports required libraries.

In [None]:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from IPython.display import display, HTML
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textplumber.vader import SentimentIntensityInterpreter

This is the basic setup required to start scoring texts with VADER.

In [None]:

analyzer = SentimentIntensityAnalyzer()

## 1. Learn about VADER scores

In the cell below is a short phrase to show you the output of VADER. Get VADER's scores for the provided text and make sure you understand what each number tells us.

In [None]:
example = '''
This movie is terrible.
'''
vs = analyzer.polarity_scores(example)
print(vs)

Read the "About the Scoring" section of the VADER GitHub README, which explains the scores that VADER returns:  
https://github.com/cjhutto/vaderSentiment#about-the-scoring
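
As a rough aid to reading the compound score, the sketch below shows the normalisation step that squashes a summed valence into the range -1 to +1. It assumes the formula and the default `alpha=15` found in the vaderSentiment source. The function is a stand-alone illustration rather than a call into the library, and it leaves out the other adjustments VADER makes before this step.

```python
import math

def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into (-1, 1).

    Assumption: this mirrors the normalisation in the vaderSentiment source,
    where alpha=15 approximates the largest expected sum.
    """
    return score / math.sqrt(score * score + alpha)

# Larger sums (in either direction) are pushed towards -1 or +1.
for valence_sum in (-8.0, -2.0, 0.0, 2.0, 8.0):
    print(valence_sum, round(normalize(valence_sum), 4))
```

Because the curve flattens as the sum grows, two texts with very different amounts of sentiment-bearing vocabulary can end up with similar compound scores.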

### 1.1 Questions

1. What do the 'neu', 'pos', and 'neg' scores represent?  
2. What range of values of the Compound Score should be associated with a "neutral" classification?  


## 2. Experiment with scoring some texts to understand VADER's lexicon and booster/negation rules

 

Here's another example - you can copy and paste this code into new code cells to test out different phrases.

In [None]:
example = '''
The movie was almost great.
'''
vs = analyzer.polarity_scores(example)
print(vs)

### 2.1 Activities

Try different texts and make sure you understand the scores VADER returns. Copy the code above into new cells below for each example you come up with.

Create examples for the following conditions:   

1. A sentence that is obviously positive e.g. "The movie is great".
2. A sentence that is obviously negative e.g. "The movie is terrible".
3. A sentence that uses a "booster" e.g. "The movie is really great".
4. A sentence that hedges e.g. "The movie is almost great".
5. A sentence that uses negation e.g. "The movie is not great". 
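
For conditions 3 and 4, a simplified sketch of how a booster word shifts a neighbouring word's valence may help. The constants are assumed to match `B_INCR` and `B_DECR` in the vaderSentiment source, and the valence values are illustrative only. The real rule also handles capitalisation and boosters further back in the sentence, which this sketch leaves out.

```python
B_INCR = 0.293   # assumed amplifier increment (e.g. "really", "very")
B_DECR = -0.293  # assumed dampener increment (e.g. "almost", "occasionally")

def apply_booster(valence: float, scalar: float) -> float:
    """Shift a word's valence by a booster placed directly before it.

    For negative words the scalar is flipped, so an amplifier always pushes
    the valence away from zero and a dampener always pulls it towards zero.
    """
    if valence < 0:
        scalar = -scalar
    return valence + scalar

positive_word = 3.1   # illustrative valence for a positive word
negative_word = -2.1  # illustrative valence for a negative word

print(round(apply_booster(positive_word, B_INCR), 3))  # amplified positive
print(round(apply_booster(positive_word, B_DECR), 3))  # dampened positive
print(round(apply_booster(negative_word, B_INCR), 3))  # amplified negative
```

Comparing these adjusted values with the scores you get from `analyzer.polarity_scores` is one way to check whether a word in your sentence is being treated as a booster.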

### 2.2 Aid your understanding of VADER

Look at the lexicon and the booster/negation words in the source code to get more insight into the scores.

* The VADER module code (as ported to NLTK) is here: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py  
* In that file, negations and booster words are on lines 48-181 (line numbers may shift as the file is updated).  
* The VADER lexicon is available here: https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt  
  Note: you can search the lexicon in your browser, or you can download it and inspect it in a text editor.  
* Make sure you are clear what the values in the VADER lexicon actually mean.  

Here are some examples for your reference: 

    hope 	1.9 0.53852 [3, 2, 2, 1, 2, 2, 1, 2, 2, 2]
    hopeless -2.0 1.78885 [-3, -3, -3, -3, 3, -1, -3, -3, -2, -2]

* The VADER paper itself is also helpful: https://ojs.aaai.org/index.php/ICWSM/article/view/14550
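
To check your reading of the lexicon entries shown above, the following sketch parses the two example lines and recomputes summary statistics from the raw ratings. It assumes the lexicon file is tab-delimited with four fields (token, mean rating, standard deviation, raw ratings) and that the standard deviation is the population form. `parse_entry` is a hypothetical helper written for this notebook, not part of vaderSentiment.

```python
import ast
import statistics

# The two example entries, written out with explicit tab separators.
lines = [
    "hope\t1.9\t0.53852\t[3, 2, 2, 1, 2, 2, 1, 2, 2, 2]",
    "hopeless\t-2.0\t1.78885\t[-3, -3, -3, -3, 3, -1, -3, -3, -2, -2]",
]

def parse_entry(line: str):
    """Split one lexicon line into (token, mean, std_dev, raw_ratings)."""
    token, mean, std_dev, ratings = line.split("\t")
    return token, float(mean), float(std_dev), ast.literal_eval(ratings)

for line in lines:
    token, mean, std_dev, ratings = parse_entry(line)
    # Recompute the statistics from the individual raters' scores.
    recomputed_mean = statistics.mean(ratings)
    recomputed_sd = statistics.pstdev(ratings)
    print(token, mean, round(recomputed_mean, 1), std_dev, round(recomputed_sd, 5))
```

The spread of the raw ratings is worth a look: for "hopeless", one rater gave a positive score, which is reflected in the larger standard deviation.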

## 3. A tool to aid in interpreting VADER scores

Textplumber includes a [tool to explain VADER scores](https://geoffford.nz/textplumber/vader.html#sentimentintensityinterpreter). Run the following code to apply the tool to the example texts below. You can change the examples if you like.  

In [None]:
interpreter = SentimentIntensityInterpreter()

In [None]:
examples = [
    'Marmite is good.',
    'Marmite is occasionally good.',
    'Marmite is not good.',
    'Marmite is a savoury food spread based on yeast extract. It is often used as a spread on toast. Marmite is controversial: it is loved by some and hated by others. ',
    'Marmite marmite marmite',
    'Marmite is very good.',
    '😀',
    ':) That Marmite was 😀!',
    'Marmite is bad.',
    'Marmite is good - and bad', 
    'I hate Marmite! :)',
    'I love Marmite :(',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

## 4. Challenges of sentiment analysis

Language does not always express sentiment in a straightforward way. VADER's word-level scoring allows us to understand some of the challenges of measuring sentiment. Below we explain a number of challenges with examples. For each challenge, replace the "Your example here" text with your own example to test it out. Rather than invent an example, think of some real-life examples that you have encountered over the last 24-48 hours (e.g. something you read online, something someone said to you, a message you received, etc) or go find something from a corpus we've used in the course. 

### Straightforward sentiment

Some texts are straightforward and unambiguous!

In [None]:
examples = [
    # from movie review (cv064_25842.txt)
    'all this leads to a finale that is so dumb , and so stupid that is it unbelievably dumb, and the stunts, dialogue and acting all ruin this movie',
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Topics

It matters what the sentiment is being expressed about. Words that are often used to express sentiment are also needed to discuss specific topics (e.g. military conflict, crime, horror movies, comedy movies). Understood in the appropriate context, these words may have nothing to do with evaluation.   

In [None]:
examples = [
    # a quote from a review of the movie Dr Strangelove in The New York Review of Books: https://www.nybooks.com/articles/1964/02/06/out-of-this-world/
    'The outline of the film is this: a psychotic right-wing general, convinced that the Communists are poisoning Americans through fluoridation, exercises emergency powers and sends a wing command to bomb the Soviet Union. ',
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### "Aspect"

Sentiment as it is expressed in texts is often about a specific aspect of what is being discussed. For example, in relation to a movie, this could relate to the performances of the actors, the plot, the ending, and so on. When measuring sentiment across a text we may not be able to identify what specifically is being evaluated. This means we have to be careful about the claims we make about the meaning of sentiment scores. There are aspect-based sentiment analysis tools that attempt to address this challenge. 
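
To make the aspect problem concrete, here is a deliberately naive sketch that splits a sentence at the word "but" and scores each clause on its own. The mini-lexicon and its values are illustrative assumptions, and `split_on_contrast` and `toy_clause_score` are hypothetical helpers. This is not how VADER or any aspect-based sentiment analysis tool works.

```python
import re

# Hypothetical mini-lexicon for illustration only;
# real valences live in vader_lexicon.txt.
TOY_LEXICON = {"good": 1.9, "terrible": -2.1}

def split_on_contrast(text: str) -> list:
    """Split a sentence into clauses around the conjunction 'but'."""
    clauses = re.split(r"\bbut\b", text, flags=re.IGNORECASE)
    return [clause.strip() for clause in clauses if clause.strip()]

def toy_clause_score(clause: str) -> float:
    """Sum the toy valences of the words in one clause."""
    words = re.findall(r"[a-z']+", clause.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)

review = "The hotel was good but the food terrible"
for clause in split_on_contrast(review):
    print(f"{clause!r}: {toy_clause_score(clause):+.1f}")
```

A single score for the whole sentence would blend the two clauses, hiding the fact that the hotel and the food are evaluated in opposite directions.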

In [None]:
examples = [
    'The hotel was good but the food terrible', # from an online hotel review
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Choice of words

A word-level scoring tool like VADER allows us to see examples where individual word choices do not add up straightforwardly to accurate sentiment scores. In each of the examples below, the combination of words and what is unsaid leads to sentiment predictions that do not match the sentiment expressed in the text. There are also words that can be used to express sentiment in positive *and* negative ways (e.g. "That's sick!", "That's mean!").

In [None]:
examples = [
    'I was really glad when this movie was over', # relies on understanding that wishing for the end of a movie ("over") is not a good thing
    'The service was a joke', # jokes are usually something fun, but the alignment of service and joke here expresses a negative evaluation
    'He has an attitude', # "an attitude" is a way to express "a bad attitude"
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Sarcasm

> "Sarcasm refers to the use of words that mean the opposite of what you really want to say, especially in order to insult someone, or to show irritation, or just to be funny. For example, saying "they're really on top of things" to describe a group of people who are very disorganized is using sarcasm. Most often, sarcasm is biting, and intended to cause pain. Irony can also refer to the use of words that mean the opposite of what you really want to say; the "they're really on top of things" statement about the very disorganized group of people can also be described as an ironic statement." ([merriam-webster.com](https://www.merriam-webster.com/dictionary/sarcasm))

Sarcasm is challenging because the intention of the speaker is to express the opposite of the literal meaning of the words. This is a challenge for all sentiment analysis tools. Sarcasm is also a challenge for humans asked to assess sentiment, particularly if the context is not clear. 

In [None]:
examples = [
    'The bathroom was a sight to behold, I loved the moldy sink', # the use of moldy gives away the intended meaning
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Negation

Negation reverses the meaning of words, but when applied to evaluation this is not always straightforward. For instance, "terrible" is the antonym of "great", but saying something is "not great" and saying it is "terrible" are qualitatively different evaluations. In the examples below, do "no problems" and "wasn't unhappy" express positivity or more neutral sentiment? 
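
One way to see why negation is not a simple sign flip is to look at how a rule-based tool might handle it. The sketch below assumes the negation constant `N_SCALAR = -0.74` from the vaderSentiment source and uses illustrative valence values. It covers only the case of a negator directly affecting one word.

```python
N_SCALAR = -0.74  # assumed negation multiplier from the vaderSentiment source

def negate(valence: float) -> float:
    """Flip and shrink a word's valence when it is negated."""
    return valence * N_SCALAR

great = 3.1      # illustrative valence for "great"
terrible = -2.1  # illustrative valence for "terrible"

print(f"'not great' -> {negate(great):.3f}")
print(f"'terrible'  -> {terrible:.3f}")
```

Whatever the exact values, the negated score is a scaled version of the original word's valence, so it need not coincide with the valence of the word's antonym.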

In [None]:
examples = [
    'I had no problems with this phone', 
    "I wasn't unhappy with the results", 
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Vagueness

Vagueness, whether intended or not, makes sentiment difficult to measure. Vagueness can be expressed in different ways (e.g. through complex linguistic constructions, talking in general abstract terms about specific things, through hedging), but the effect is that it makes it difficult to determine what a person really means. People may want to avoid committing to a strong position, or they may want to obscure their true position, or vagueness may be unintentional. In the example below, from New Zealand's Parliament, the speaker is implying the opposition member's account of a problem may not be the full story, but is also not denying a problem exists. What do they really think? That is left unsaid. 

In [None]:

examples = [
    'I must say to the member that given his description, there is something interesting, if not tricky, about the particular case if the circumstances are as he outlines.',
    'Your example here',
]

print('Hint: Hover your mouse over specific words or the bar chart to see more information.  ')

for example in examples:
    interpreter.explain(example)

### Addressing these issues and realising some limitations

Language models that have been pre-trained on very large corpora and then fine-tuned for sentiment analysis can address some of these issues. However, their performance on sentiment classification depends on the training data used and on how similar the texts you want to score are to that training data. 

## 5. It might be obvious, but remember: Not all texts express sentiment!

Sentiment analysis may not be an appropriate technique for analysing some kinds of texts. For example, some texts are not primarily about presenting a point of view or evaluation (e.g. journalistic texts, scientific writing) and, as already raised in the vagueness example, in some domains authors/speakers are motivated to express their point of view in an ambiguous way (e.g. a politician answering tricky questions). Take some time to explore some different kinds of texts (e.g. editorials, fiction, tweets, news articles, political speeches, texts from the corpora you've worked with through the course). VADER will tend to perform better with short texts, so make sure you try texts of different lengths to see the effect.  
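
When trying longer texts, it can help to score sentence by sentence as well as scoring the whole text. The sketch below uses a naive regular-expression splitter and accepts any scoring function. `split_sentences` and `score_by_sentence` are hypothetical helpers, and the `word_count` scorer is only a stand-in so the code runs on its own.

```python
import re
from typing import Callable, List, Tuple

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as 'Dr.' will be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def score_by_sentence(
    text: str, score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Apply score_fn to each sentence and keep the sentence with its score."""
    return [(sentence, score_fn(sentence)) for sentence in split_sentences(text)]

def word_count(sentence: str) -> float:
    """Stand-in scorer for demonstration; it is NOT a sentiment measure."""
    return float(len(sentence.split()))

sample = "This is the first sentence. Here is another one! And a third?"
for sentence, value in score_by_sentence(sample, word_count):
    print(value, sentence)

# In this notebook you could instead pass VADER's compound score:
# score_by_sentence(our_example, lambda s: analyzer.polarity_scores(s)["compound"])
```

Comparing the per-sentence scores with the single score for the whole text shows how much of the overall result is driven by a few sentences.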

In [None]:
# this is some text from New Zealand's Wikipedia page
our_example = '''
The islands of New Zealand were the last large habitable land to be settled by humans. Between about 1280 and 1350, Polynesians began to settle in the islands and subsequently developed a distinctive Māori culture. In 1642, the Dutch explorer Abel Tasman became the first European to sight and record New Zealand. In 1769 the British explorer Captain James Cook became the first European to set foot on and map New Zealand. In 1840, representatives of the United Kingdom and Māori chiefs signed the Treaty of Waitangi which paved the way for Britain's declaration of sovereignty later that year and the establishment of the Crown Colony of New Zealand in 1841. Subsequently, a series of conflicts between the colonial government and Māori tribes resulted in the alienation and confiscation of large amounts of Māori land. New Zealand became a dominion in 1907; it gained full statutory independence in 1947, retaining the monarch as head of state. Today, the majority of New Zealand's population of around 5.3 million is of European descent; the indigenous Māori are the largest minority, followed by Asians and Pasifika. Reflecting this, New Zealand's culture is mainly derived from Māori and early British settlers but has recently broadened from increased immigration. The official languages are English, Māori, and New Zealand Sign Language, with the local dialect of English being dominant. 
'''

your_example = '''
Your example here.
'''

interpreter.explain(our_example)
interpreter.explain(your_example)

## Wrapping up

One thing to take away from working through this notebook is that measuring sentiment can be challenging. Before you start measuring sentiment across a set of texts and making claims about it to others, do yourself a favour: look at the texts, examine how they express sentiment, and identify likely challenges.  

Here are some questions to consider in wrapping up this notebook: 

* How does VADER perform on different kinds of texts?  
* What kinds of texts are challenging for a lexicon-based approach to sentiment analysis?  
* What kinds of texts are not appropriate for sentiment analysis?  
* If you apply sentiment analysis to texts that are not appropriate for sentiment analysis, what are you measuring?  

