
# Sentiment Analysis

* Author: Amy Zhuang
* Last updated: October 2020

## Sentiment Analysis Types


- Lexicon-based
- Cloud API
- Machine Learning Model


## Lexicon Sentiment Analysis Packages

Lexicon-based sentiment analysis outputs a polarity score between -1 and 1, where -1 represents extremely negative sentiment and 1 represents extremely positive sentiment. A value near 0 indicates neutral sentiment.
- TextBlob is a popular Python library built on top of NLTK.
- VADER (Valence Aware Dictionary and sEntiment Reasoner) is available through NLTK and as the standalone `vaderSentiment` package used in the exercises below.

A key difference between TextBlob and VADER is that VADER focuses on social media. It puts considerable effort into scoring content that typically appears on social media, such as emojis, repeated words, and exclamation points.
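
A polarity score is often converted into a discrete label before reporting. The sketch below shows one way to do that. The helper name `label_polarity` is hypothetical, and the ±0.05 cutoff is an assumption borrowed from the convention commonly used with VADER's compound score; it is not something this notebook prescribes, and other cutoffs may suit other data better.

```python
def label_polarity(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    `threshold` is an assumed cutoff: scores within +/- threshold of 0
    are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity score must be in [-1, 1], got {score}")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Try the helper on a few illustrative scores.
for s in (0.7384, 0.0, -0.6):
    print(s, label_polarity(s))
```

Widening `threshold` sends more borderline sentences to the neutral class, which can help when near-zero scores are noisy.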


## Cloud API Providers

- Amazon Comprehend: Pricing is based on units of 100 characters. https://aws.amazon.com/comprehend/pricing/
- Azure Text Analytics API: Pricing is based on text records, which correspond to units of 1,000 characters. Several instance tiers are available. https://azure.microsoft.com/en-us/pricing/details/cognitive-services/text-analytics/
- Google Natural Language API: Pricing is based on units, where one unit corresponds to 1,000 characters. Characters include whitespace and any markup characters such as HTML or XML tags. https://cloud.google.com/natural-language/pricing
- IBM Watson Tone Analyzer: Pricing is based on the number of API calls. https://cloud.ibm.com/catalog/services/tone-analyzer
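
Because the first three providers bill by character-based units, the number of units for a document can be estimated from its length. The following is a rough sketch only: the dictionary keys and the function `billable_units` are hypothetical names, and the code assumes that partial units round up and that every document costs at least one unit. It does not model minimum charges, free tiers, or tiered prices, so check each provider's pricing page before relying on it.

```python
import math

# Characters per billing unit, taken from the provider list above.
CHARS_PER_UNIT = {
    "amazon_comprehend": 100,
    "azure_text_analytics": 1000,
    "google_natural_language": 1000,
}


def billable_units(text, provider):
    """Estimate billing units for one document.

    Assumptions: partial units are rounded up, and a document is
    billed as at least one unit. len() counts whitespace and markup too.
    """
    size = CHARS_PER_UNIT[provider]
    return max(1, math.ceil(len(text) / size))


review = "x" * 250  # stand-in for a 250-character review
for name in CHARS_PER_UNIT:
    print(name, billable_units(review, name))
```

Under these assumptions, a 250-character review counts as three units on the 100-character scheme and one unit on the 1,000-character schemes.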

## Comparison

- Cost: Cloud API > Machine Learning > Lexicon
- Level of Effort: Machine Learning > Lexicon > Cloud API
- Accuracy: Machine Learning > Cloud API > Lexicon


## Machine Learning Model

- Predictors: count-based vs. embedding-based
- Model: bag-of-words model vs. sequence model
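
To make the count-based, bag-of-words option concrete, here is a minimal sketch using only the standard library. The function `bag_of_words` and the sample sentences are illustrative assumptions; a real pipeline would more likely use a library vectorizer with proper tokenization.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a list of text documents.

    Tokenization here is a simple lowercase whitespace split, which is
    an assumption made to keep the example short.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; missing words count as 0.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "the program is awesome",
    "the program is not awesome",
    "awesome awesome program",
]
vocab, X = bag_of_words(docs)
print(vocab)
for row in X:
    print(row)
```

Each row records only how often each word occurs, so any reordering of the same words produces the same vector. A sequence model keeps the word order, which a bag-of-words representation discards.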

## Hands-on Exercises

### VADER

In [None]:
!pip install vaderSentiment


In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [None]:
analyser = SentimentIntensityAnalyzer()

In [None]:
def sentiment_analyzer_scores(sentence):
    """Print the sentence followed by its VADER neg/neu/pos/compound scores."""
    score = analyser.polarity_scores(sentence)
    print(f"{sentence} {score}")

In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so awesome.")

The Harvard Business Analytics Program is so awesome. {'neg': 0.0, 'neu': 0.572, 'pos': 0.428, 'compound': 0.7384}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME.")

The Harvard Business Analytics Program is so AWESOME. {'neg': 0.0, 'neu': 0.532, 'pos': 0.468, 'compound': 0.7996}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME!")

The Harvard Business Analytics Program is so AWESOME! {'neg': 0.0, 'neu': 0.52, 'pos': 0.48, 'compound': 0.8151}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is soooooooo AWESOME!")

The Harvard Business Analytics Program is soooooooo AWESOME! {'neg': 0.0, 'neu': 0.577, 'pos': 0.423, 'compound': 0.729}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so so so so AWESOME!")

The Harvard Business Analytics Program is so so so so AWESOME! {'neg': 0.0, 'neu': 0.584, 'pos': 0.416, 'compound': 0.8453}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME! {'neg': 0.0, 'neu': 0.296, 'pos': 0.704, 'compound': 0.9621}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!! {'neg': 0.0, 'neu': 0.289, 'pos': 0.711, 'compound': 0.965}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!👍")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!👍 {'neg': 0.0, 'neu': 0.343, 'pos': 0.657, 'compound': 0.965}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🤩")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🤩 {'neg': 0.0, 'neu': 0.317, 'pos': 0.683, 'compound': 0.965}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!❤️")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!❤️ {'neg': 0.0, 'neu': 0.343, 'pos': 0.657, 'compound': 0.965}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🙂")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🙂 {'neg': 0.0, 'neu': 0.311, 'pos': 0.689, 'compound': 0.9718}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄 {'neg': 0.0, 'neu': 0.305, 'pos': 0.695, 'compound': 0.977}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊 {'neg': 0.0, 'neu': 0.301, 'pos': 0.699, 'compound': 0.9782}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄😄")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄😄 {'neg': 0.0, 'neu': 0.315, 'pos': 0.685, 'compound': 0.9838}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊 {'neg': 0.0, 'neu': 0.308, 'pos': 0.692, 'compound': 0.9852}


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊😊😊😊😊😊😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊😊😊😊😊😊😊 {'neg': 0.0, 'neu': 0.322, 'pos': 0.678, 'compound': 0.9965}


### TextBlob

In [None]:
!pip install textblob



In [None]:
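from textblob import TextBlob

# Note: this redefines the VADER helper of the same name from the previous section.
def sentiment_analyzer_scores(sentence):
    """Print the sentence followed by its TextBlob polarity score (-1 to 1)."""
    score = TextBlob(sentence).sentiment.polarity
    print(f"{sentence} {score}")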

In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so awesome.")

The Harvard Business Analytics Program is so awesome. 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME.")

The Harvard Business Analytics Program is so AWESOME. 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME!")

The Harvard Business Analytics Program is so AWESOME! 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is soooooooo AWESOME!")

The Harvard Business Analytics Program is soooooooo AWESOME! 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so so so so AWESOME!")

The Harvard Business Analytics Program is so so so so AWESOME! 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME! 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!! 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!👍")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!👍 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🤩")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🤩 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!❤️")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!❤️ 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🙂")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!🙂 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄😄")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😄😄 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊 1.0


In [None]:
sentiment_analyzer_scores("The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊😊😊😊😊😊😊")

The Harvard Business Analytics Program is so AWESOME AWESOME AWESOME!!!😊😊😊😊😊😊😊😊 1.0
