# Sentiment Analysis using VADER


Sentiment Analysis, or Opinion Mining, is a sub-field of Natural Language Processing (NLP) that tries to identify and extract opinions within a given text. The aim of sentiment analysis is to gauge the attitudes, sentiments, evaluations and emotions of a speaker or writer based on the computational treatment of subjectivity in a text.

Though it may seem easy on paper, Sentiment Analysis is actually a tricky subject. There are various reasons for that:

- Understanding emotions through text is not always easy. Sometimes even humans can be misled.


- A text may contain multiple sentiments all at once. For instance, “The intent behind the movie was great, but it could have been better”. This sentence consists of two polarities, i.e., Positive as well as Negative. So how do we conclude whether the review was Positive or Negative?


- Computers struggle to comprehend figurative speech. Figurative language uses words in a way that deviates from their conventionally accepted definitions in order to convey a more complicated meaning or heightened effect. Similes, metaphors, hyperbole and the like all qualify as figurative speech. For example: “The best I can say about the movie is that it was interesting.” Here, the word ‘interesting’ does not necessarily convey positive sentiment and can be confusing for algorithms.


- Heavy use of emoticons and slang with sentiment value in social media text, such as posts on Twitter and Facebook, also makes text analysis difficult. For example, “:)” denotes a smiley and generally signals positive sentiment, while “:(” signals negative sentiment. Acronyms like “LOL” and “OMG” and commonly used slang like “nah”, “meh” and “giggly” are also strong indicators of sentiment in a sentence.


**VADER (Valence Aware Dictionary and sEntiment Reasoner)** belongs to a type of sentiment analysis that is based on lexicons of sentiment-related words. In this approach, each of the words in the lexicon is rated as to whether it is positive or negative, and in many cases, how positive or negative. Below you can see an excerpt from VADER’s lexicon, where more positive words have higher positive ratings and more negative words have lower negative ratings.

```
Word        Sentiment rating
tragedy     -3.4
rejoiced     2.0
insane      -1.7
disaster    -3.1
great        3.1
```

To work out whether these words are positive or negative (and optionally, to what degree), the developers of these approaches need to get a bunch of people to manually rate them, which is obviously pretty expensive and time-consuming. In addition, the lexicon needs to have good coverage of the words in your text of interest, otherwise it won’t be very accurate. On the flipside, when there is a good fit between the lexicon and the text, this approach is accurate, and additionally quickly returns results even on large amounts of text.
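The lookup step of a lexicon-based approach can be sketched in a few lines. This is a deliberately simplified illustration, not VADER's actual algorithm: `TOY_LEXICON` and `lexicon_ratings` are hypothetical names, and the dictionary holds only the five ratings from the excerpt above.

```python
import re

# Ratings copied from the lexicon excerpt above; a real lexicon has thousands of entries.
TOY_LEXICON = {
    "tragedy": -3.4,
    "rejoiced": 2.0,
    "insane": -1.7,
    "disaster": -3.1,
    "great": 3.1,
}

def lexicon_ratings(text, lexicon=TOY_LEXICON):
    """Return the rating of every token in `text` that appears in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [lexicon[t] for t in tokens if t in lexicon]

ratings = lexicon_ratings("The launch was a disaster, but the recovery was great.")
print(ratings)       # [-3.1, 3.1]
print(sum(ratings))  # 0.0
```

The example sentence also shows the coverage and mixed-polarity problems in miniature: only two of its words are in the toy lexicon, and their ratings cancel out.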

In [1]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

For this example, we'll use a curated dataset of Amazon reviews for toys and games filtered to products with more than 500 reviews. This dataset was collected and cleaned in the `0.Data Scraping and Prep` notebook.

In [10]:
reviews = pd.read_csv('../data/reviews_amazon_small.csv')

When VADER analyses a piece of text, it checks whether any of the words in the text are present in the lexicon. For example, the sentence “The food is good and the atmosphere is nice” has two words in the lexicon (good and nice), with ratings of 1.9 and 1.8 respectively. If none of the words in the text appear in the lexicon, VADER does not raise an error; it scores the text as entirely neutral, with a compound score of 0.

VADER produces four sentiment metrics from these word ratings. The first three, positive, neutral and negative, represent the proportion of the text that falls into those categories. The final metric, the compound score, is the sum of all of the lexicon ratings (after VADER's rule-based adjustments for cues such as negation and emphasis), normalized to range between -1 and 1.
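To make the normalization concrete, here is a minimal sketch of how a summed rating could be squashed into the range -1 to 1. It assumes the formula `x / sqrt(x² + alpha)` with `alpha = 15`, which to my understanding matches the default in the `vaderSentiment` package, and it ignores the rule-based adjustments VADER applies before summing. The function name `normalize` is chosen here for illustration.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded summed rating onto the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# "good" (1.9) and "nice" (1.8) from the example sentence above
raw_sum = 1.9 + 1.8
compound = normalize(raw_sum)
print(round(compound, 4))  # 0.6908
```

Because of the square root in the denominator, very long texts with many rated words approach, but never reach, -1 or 1.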

In [16]:
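# Use float defaults so the score columns start out with a float dtype
reviews['negative'] = 0.0
reviews['neutral'] = 0.0
reviews['positive'] = 0.0
reviews['compound'] = 0.0

# Score each review; rows without usable text (e.g. NaN) keep the 0.0 defaults
for n in reviews.index:
    text = reviews.loc[n, 'text']
    if not isinstance(text, str):
        continue
    vs = analyzer.polarity_scores(text)
    reviews.loc[n, 'negative'] = vs['neg']
    reviews.loc[n, 'neutral'] = vs['neu']
    reviews.loc[n, 'positive'] = vs['pos']
    reviews.loc[n, 'compound'] = vs['compound']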

We can compare the compound scores from VADER against the ratings left by reviewers:

In [17]:
# Percentage of reviews with a positive compound score, within each rating
rating_vs_compound = pd.crosstab(reviews['rating'], reviews['compound'] > 0,
                                 normalize='index') * 100
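
The cell above computes, for each rating, the percentage of reviews whose compound score is positive. A plain-Python sketch of the same calculation may make the row-wise normalization easier to follow. The `(rating, compound)` pairs below are made up purely for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (rating, compound) pairs, for illustration only
toy_reviews = [(1, -0.6), (1, 0.7), (5, 0.9), (5, 0.8), (5, -0.1), (5, 0.95)]

# Count, per rating, how many reviews have compound > 0 (True) or not (False)
counts = defaultdict(Counter)
for rating, compound in toy_reviews:
    counts[rating][compound > 0] += 1

# Divide each count by its row total to get percentages within a rating
row_percent = {
    rating: {flag: 100 * n / sum(c.values()) for flag, n in c.items()}
    for rating, c in counts.items()
}
print(row_percent[1])  # {False: 50.0, True: 50.0}
print(row_percent[5])  # {True: 75.0, False: 25.0}
```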

In [18]:
import matplotlib.pyplot as plt
from matplotlib import colors

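def background_gradient(s, m, M, cmap='PuBu', low=0, high=0):
    """Return one CSS background colour per value in `s`, scaled between m and M."""
    rng = M - m
    # Widen the colour range so extreme values don't hit the ends of the colormap
    norm = colors.Normalize(m - (rng * low),
                            M + (rng * high))
    normed = norm(s.values)
    # plt.get_cmap replaces plt.cm.get_cmap, which newer Matplotlib releases no longer provide
    c = [colors.rgb2hex(x) for x in plt.get_cmap(cmap)(normed)]
    return ['background-color: %s' % color for color in c]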

rating_vs_compound.round(2).style.apply(background_gradient,
               cmap='YlGnBu',
               m=rating_vs_compound.min().min(),
               M=rating_vs_compound.max().max(),
               low=0.5,
               high=0.8)

| rating | compound > 0: False (%) | compound > 0: True (%) |
|---|---|---|
| 1.0 | 53.3 | 46.7 |
| 2.0 | 35.94 | 64.06 |
| 3.0 | 20.03 | 79.97 |
| 4.0 | 5.06 | 94.94 |
| 5.0 | 2.41 | 97.59 |


While reviews with high ratings (4 and 5 stars) overwhelmingly have positive sentiment (defined here as a compound score greater than zero), reviews with the lowest possible rating (1 star) are split almost evenly: 46.7% have a positive compound score and 53.3% do not. This suggests that the compound sentiment of any one review may not be enough to predict the corresponding rating.
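The cut-off at zero used above is only one possible choice. The VADER documentation suggests treating compound scores of 0.05 and above as positive, -0.05 and below as negative, and anything in between as neutral. A minimal sketch of that convention follows; the function name `label_sentiment` and its `threshold` parameter are hypothetical, not part of the library.

```python
def label_sentiment(compound, threshold=0.05):
    """Turn a compound score into a coarse label using a symmetric threshold."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for score in (0.69, 0.0, -0.04, -0.64):
    print(score, label_sentiment(score))
```

With a neutral band like this, reviews that contain no rated words at all are no longer lumped in with the negative ones.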

To understand the variability in sentiment among reviews, we can look at the statistics of compound scores for reviews of each rating:

In [36]:
def high_value_text_color(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: white'` for very high
    values, black otherwise.
    """
    color = 'white' if val > 1 else 'black'
    return 'color: %s' % color

desc = reviews[['rating', 'compound']].groupby('rating').describe()

desc.round(2).style.apply(background_gradient,
               cmap='YlGnBu',
               m=0,
               M=1,
               low=0.25,
               high=0.75,
           ).applymap(high_value_text_color)

Summary statistics of `compound` by rating:

| rating | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 1.0 | 182 | 0.0 | 0.68 | -0.98 | -0.64 | -0.04 | 0.69 | 1 |
| 2.0 | 256 | 0.23 | 0.67 | -0.98 | -0.39 | 0.43 | 0.87 | 1 |
| 3.0 | 724 | 0.52 | 0.57 | -0.99 | 0.24 | 0.79 | 0.95 | 1 |
| 4.0 | 1718 | 0.81 | 0.34 | -0.97 | 0.83 | 0.94 | 0.97 | 1 |
| 5.0 | 6839 | 0.87 | 0.24 | -0.99 | 0.88 | 0.95 | 0.98 | 1 |


The most striking feature of this table is the large imbalance among the rating classes: there are 6,839 five-star reviews but only 182 one-star reviews. While this doesn't affect our sentiment analysis, it would be a concern if we were to use these data for other purposes.

The mean compound score rises with the rating, from 0.00 for 1-star reviews to 0.87 for 5-star reviews, matching the pattern seen above. Interestingly, the standard deviation follows the opposite pattern, falling from 0.68 to 0.24, meaning that the sentiment of reviews with low ratings is more variable than that of reviews with high ratings.

Read through the text of reviews whose ratings and compound scores disagree. Even very negative reviews contain words with very positive sentiment like "enjoy", "love", "nice", and "great". You can also try splitting up a review into segments and exploring their scores.

In [46]:
# Print 1-star reviews that VADER nevertheless scores as strongly positive
mismatched = (reviews['rating'] == 1) & (reviews['compound'] > 0.96)
for t in reviews.loc[mismatched, 'text']:
    print(t, '\n')

A Dud... I just got word from my sis in law that the heli we sent my nephew for Christmas is a dud. How sad... He really was looking fwd to this gift. The box seal was broken upon arrival and my sis said it looks like it might have been opened, but she would wrap it and give it to him anyways and turns out it doesnt even work. We purchased this from SVM but it was fulfilled through Amazon (I recently just read some knock off Syma helicopters were being sold through here so I wonder if ours was one or if it was truly a dud) and they have initiated a refund for us and sent out a new one already. So I give Amazon 5 stars for the effort! I hope the new one works. 

Terrible for young would-be readers This toy would have been good for youngsters if they had simply used lower-case letters.  Unfortunately, focusing exclusively on the upper-case alphabet is disastrous for those learning to read.LeapFrog forces you to buy an expansion pack to get the lower-case letters, an extra hidden expense.
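
The suggestion above to split a review into segments and score each one could be sketched as follows. `score_segments` and `toy_scorer` are hypothetical helpers written for this illustration. The toy scorer just counts a handful of hand-picked words so that the block runs without any dependencies, and the review text is a shortened paraphrase of the second review above.

```python
import re

def score_segments(text, scorer):
    """Split `text` into sentence-like segments and score each one with `scorer`."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(segment, scorer(segment)) for segment in segments]

def toy_scorer(segment):
    """Stand-in scorer for illustration: +1 per positive word, -1 per negative word."""
    positive = {"good", "great", "love"}
    negative = {"terrible", "disastrous", "dud"}
    words = re.findall(r"[a-z]+", segment.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)

review = ("This toy would have been good for youngsters. "
          "Unfortunately, the upper-case focus is disastrous.")
segment_scores = score_segments(review, toy_scorer)
for segment, score in segment_scores:
    print(score, segment)
```

With the `analyzer` created at the top of this notebook, passing `lambda s: analyzer.polarity_scores(s)['compound']` as the `scorer` argument should give VADER's compound score for each segment instead.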