
**Sentiment analysis without labels with VADER**

In this ICP we will quickly learn how to perform sentiment analysis without labels (negative, positive, and neutral).

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It combines a sentiment lexicon with a set of rules. A sentiment lexicon is a list of lexical features (e.g., words) that are generally labelled according to their semantic orientation as either positive or negative.

VADER is a model used for text sentiment analysis that is sensitive to both polarity (positive/negative) and intensity (strength) of emotion. It can be applied directly to unlabeled text data.

VADER sentiment analysis relies on a dictionary that maps lexical features to emotion intensities known as sentiment scores. The sentiment score of a text can be obtained by summing up the intensity of each word in the text.
For example, words like ‘love’, ‘enjoy’, ‘happy’, and ‘like’ all convey a positive sentiment. VADER also understands the basic context of these words, so it treats “did not love” as a negative statement. It also picks up on emphasis from capitalization and punctuation, such as “ENJOY”.
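
The idea of summing word intensities and flipping them after a negation can be sketched in a few lines. This is a toy illustration, not VADER's actual algorithm: the lexicon values, the negation list, the two-token look-back window, and the `NEGATION_SCALAR` factor are all made up for the example.

```python
# Toy lexicon: these valence values are illustrative, not VADER's real ratings.
TOY_LEXICON = {"love": 3.0, "enjoy": 2.0, "happy": 2.5, "like": 1.5, "hate": -2.5}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.75  # hypothetical factor that flips and dampens a valence


def toy_score(text):
    """Sum word valences, flipping a word's valence when a negation precedes it."""
    tokens = [token.strip(".,!?") for token in text.lower().split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token, 0.0)
        # Look back up to two tokens for a negation word.
        if valence and any(t in NEGATIONS for t in tokens[max(0, i - 2):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return total


print(toy_score("I love this"))          # positive total
print(toy_score("I did not love this"))  # negative total
```

The real library layers several more rules on top of this basic lookup, which the sections below demonstrate.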

**Polarity classification**

VADER is mainly used for polarity classification. In this approach we won’t try to determine whether a sentence is objective or subjective, fact or opinion. Rather, we care only whether the text expresses a positive, negative, or neutral opinion.

VADER has been found to be quite successful when dealing with social media texts, NY Times editorials, movie reviews, and product reviews. This is because VADER reports not only whether a text is positive or negative but also how positive or negative it is.
It is fully open-sourced under the MIT License.

**Advantages of using VADER**

VADER has a lot of advantages over traditional methods of Sentiment Analysis, including:

>It works exceedingly well on social media type text, yet readily generalizes to multiple domains

>It doesn’t require any training data but is constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon

>It is fast enough to be used online with streaming data, and

>It does not severely suffer from a speed-performance tradeoff.

(Note: The source of this information is the paper published by the creators of the VADER library. You can read the paper here: [VADER paper](http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf))

Let's begin our ICP by installing the library.

In [1]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


Once VADER is installed, let us import the SentimentIntensityAnalyzer class and create an analyser object.

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

**Working & Scoring**

Let us test our first sentiment using VADER now. We will use the polarity_scores() method to obtain the polarity indices for the given sentence.

In [3]:
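def sentiment_analyzer_scores(sentence):
    """Print the sentence followed by its VADER polarity scores."""
    score = analyser.polarity_scores(sentence)
    # "{:-<4}" left-aligns the sentence and pads it with "-" to at least 4 characters.
    print("{:-<4} {}".format(sentence, str(score)))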

Let us check how VADER performs on a given review:

In [4]:
sentiment_analyzer_scores("This is such a good problem to worry.")

This is such a good problem to worry. {'neg': 0.415, 'neu': 0.37, 'pos': 0.215, 'compound': -0.4019}


The positive, negative, and neutral scores represent the proportion of the text that falls into each category, so the three values add up to 1. Here the sentence was rated as roughly 22% positive, 37% neutral, and 42% negative.

The compound score is computed by summing the valence scores of the words in the text, adjusted according to VADER's rules, and then normalizing the sum to lie between -1 (most extreme negative) and +1 (most extreme positive). In the case above, the compound score is -0.4019, denoting a negative sentiment.
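
The normalization step can be sketched as follows. This is a minimal sketch, assuming the squashing function `x / sqrt(x^2 + alpha)` with `alpha = 15`; the function name `normalize` and the default value are assumptions here, so check the library source before relying on them.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    # alpha = 15 is an assumed default, not confirmed by this notebook.
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


# Larger raw sums approach -1 or +1 but never reach them.
for raw in (-10, -1.9, 0, 1.9, 10):
    print(raw, round(normalize(raw), 4))
```

Under these assumptions a raw sum of 1.9 maps to about 0.4404, and the output never reaches exactly -1 or +1 however large the raw sum becomes.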

**Compound score metric**
>Positive sentiment (compound score >= 0.05)

>Neutral sentiment (compound score > -0.05) and (compound score < 0.05)

>Negative sentiment (compound score <= -0.05)
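
As a quick check of these cut-offs, the sketch below applies them to the score dictionary printed for the example sentence earlier. The helper name `label_from_compound` and its `threshold` parameter are hypothetical; only the 0.05 cut-off comes from the list above.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using the cut-offs listed above."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dictionary copied from the example output earlier in this notebook.
scores = {'neg': 0.415, 'neu': 0.37, 'pos': 0.215, 'compound': -0.4019}

# The three proportions should add up to 1 (up to rounding).
proportions_total = scores['neg'] + scores['neu'] + scores['pos']
print(round(proportions_total, 3))
print(label_from_compound(scores['compound']))
```

Note that the boundary values themselves (0.05 and -0.05) count as positive and negative respectively, not neutral.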


Read [here](https://github.com/cjhutto/vaderSentiment#about-the-scoring) for more details on the VADER scoring methodology.

VADER analyses sentiment primarily based on the following key points:

>Punctuation: The use of an exclamation mark (!) increases the magnitude of the intensity without modifying the semantic orientation. For example, “The food here is good!” is more intense than “The food here is good.”, and increasing the number of exclamation marks increases the magnitude accordingly.

In [5]:
# Baseline sentence
sentiment_analyzer_scores("The food here is good")

The food here is good {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}


Let's see the results with punctuation. Notice how the overall compound score increases with the number of exclamation marks.

In [25]:
# Punctuation
# The helper prints its result and returns None, so it is called without print().
sentiment_analyzer_scores("The food here is good!")
sentiment_analyzer_scores("The food here is good!!")
sentiment_analyzer_scores("The food here is good!!!")
sentiment_analyzer_scores("The food here is good!!!!")


The food here is good! {'neg': 0.0, 'neu': 0.556, 'pos': 0.444, 'compound': 0.4926}
The food here is good!! {'neg': 0.0, 'neu': 0.534, 'pos': 0.466, 'compound': 0.5399}
The food here is good!!! {'neg': 0.0, 'neu': 0.514, 'pos': 0.486, 'compound': 0.5826}
The food here is good!!!! {'neg': 0.0, 'neu': 0.496, 'pos': 0.504, 'compound': 0.6209}
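
The compound scores above can be reproduced by hand with a little arithmetic. The sketch below is not the library's code: the constants (a rating of 1.9 for "good", a boost of 0.292 per exclamation mark, a cap of four marks, and a normalization constant of 15) are assumptions chosen because they reproduce the printed values.

```python
import math

GOOD_VALENCE = 1.9         # assumed lexicon rating for "good"
EXCLAMATION_BOOST = 0.292  # assumed boost per "!"
MAX_EXCLAMATIONS = 4       # assumed cap on how many "!" are counted
ALPHA = 15                 # assumed normalization constant


def compound_with_exclamations(valence, text):
    """Push the valence away from zero for each "!" and normalize the result."""
    boost = min(text.count("!"), MAX_EXCLAMATIONS) * EXCLAMATION_BOOST
    if valence > 0:
        raw = valence + boost
    elif valence < 0:
        raw = valence - boost
    else:
        raw = valence
    return raw / math.sqrt(raw * raw + ALPHA)


for marks in range(5):
    sentence = "The food here is good" + "!" * marks
    print(sentence, round(compound_with_exclamations(GOOD_VALENCE, sentence), 4))
```

Because the boost is added before normalization, each extra exclamation mark yields a slightly smaller increase in the compound score than the previous one.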


>Capitalization: Using upper case letters to emphasize a sentiment-relevant word in the presence of other non-capitalized words, increases the magnitude of the sentiment intensity. For example, “The food here is GREAT!” conveys more intensity than “The food here is great!”

In [7]:
# Baseline sentence
sentiment_analyzer_scores("The food here is great!")

The food here is great! {'neg': 0.0, 'neu': 0.477, 'pos': 0.523, 'compound': 0.6588}


Let's use capitalization. Notice how the overall compound score increases.

In [8]:
# Using a capitalized word
sentiment_analyzer_scores("The food here is GREAT!")

The food here is GREAT! {'neg': 0.0, 'neu': 0.438, 'pos': 0.562, 'compound': 0.729}


>Degree modifiers: Also called intensifiers, they impact the sentiment intensity by either increasing or decreasing the intensity. For example, “The service here is extremely good” is more intense than “The service here is good”, whereas “The service here is marginally good” reduces the intensity.

In [9]:
# Degree modifiers
sentiment_analyzer_scores("The service here is good")
sentiment_analyzer_scores("The service here is extremely good")
sentiment_analyzer_scores("The service here is marginally good")

The service here is good {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
The service here is extremely good {'neg': 0.0, 'neu': 0.61, 'pos': 0.39, 'compound': 0.4927}
The service here is marginally good {'neg': 0.0, 'neu': 0.657, 'pos': 0.343, 'compound': 0.3832}
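
The capitalization and degree-modifier results above follow the same pattern: an adjustment is added to the word's rating before normalization. In the sketch below every constant (the ratings for "good" and "great", the boosts of 0.733 for capitals, 0.293 for a modifier, and 0.292 per exclamation mark) is an assumption picked to match the printed outputs, and `adjusted_compound` is a hypothetical helper, not a library function.

```python
import math

ALPHA = 15                                             # assumed normalization constant
LEXICON = {"good": 1.9, "great": 3.1}                  # assumed ratings
BOOSTERS = {"extremely": 0.293, "marginally": -0.293}  # assumed modifier boosts
CAPS_BOOST = 0.733                                     # assumed boost for an ALL-CAPS word
EXCLAMATION_BOOST = 0.292                              # assumed boost per "!"


def adjusted_compound(word, modifier=None, all_caps=False, exclamations=0):
    """Apply emphasis adjustments to one sentiment word, then normalize."""
    valence = LEXICON[word]
    sign = 1 if valence > 0 else -1
    if all_caps:
        valence += sign * CAPS_BOOST
    if modifier is not None:
        valence += sign * BOOSTERS[modifier]
    valence += sign * exclamations * EXCLAMATION_BOOST
    return valence / math.sqrt(valence * valence + ALPHA)


print(round(adjusted_compound("great", exclamations=1), 4))                 # "great!"
print(round(adjusted_compound("great", all_caps=True, exclamations=1), 4))  # "GREAT!"
print(round(adjusted_compound("good", modifier="extremely"), 4))
print(round(adjusted_compound("good", modifier="marginally"), 4))
```

A dampening modifier such as "marginally" simply carries a negative boost, which is why it lowers the score without flipping its sign.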


>Conjunctions: Use of conjunctions like “but” signals a shift in sentiment polarity, with the sentiment of the text following the conjunction being dominant. “The food here is great, but the service is horrible” has mixed sentiment, with the latter half dictating the overall rating.

In [10]:
# Conjunctions
sentiment_analyzer_scores('The food here is great, but the service is horrible')

The food here is great, but the service is horrible {'neg': 0.31, 'neu': 0.523, 'pos': 0.167, 'compound': -0.4939}
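
One way to picture the "but" rule is as a reweighting: sentiment before the conjunction is dampened and sentiment after it is amplified. The weights (0.5 and 1.5), the word ratings (3.1 for "great", -2.5 for "horrible"), and the normalization constant are assumptions in this sketch; they were chosen because they reproduce the compound score printed above.

```python
import math

ALPHA = 15               # assumed normalization constant
BEFORE_BUT_WEIGHT = 0.5  # assumed dampening for sentiment before "but"
AFTER_BUT_WEIGHT = 1.5   # assumed amplification for sentiment after "but"


def compound_with_but(valences_before, valences_after):
    """Reweight valences on each side of "but", sum them, and normalize."""
    weighted = [v * BEFORE_BUT_WEIGHT for v in valences_before]
    weighted += [v * AFTER_BUT_WEIGHT for v in valences_after]
    raw = sum(weighted)
    return raw / math.sqrt(raw * raw + ALPHA)


# "great" (assumed 3.1) comes before "but"; "horrible" (assumed -2.5) comes after.
score = compound_with_but([3.1], [-2.5])
print(round(score, 4))
```

Without the reweighting, the plain sum 3.1 + (-2.5) would be positive, so the rule is what tips this sentence into negative territory.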


**Handling Emojis, Slang, and Emoticons**

VADER also handles emojis, slang, and emoticons in sentences. Let us see each with an example.

In [11]:
# Emojis
sentiment_analyzer_scores('I am 😄 today')
sentiment_analyzer_scores('😊')
sentiment_analyzer_scores('😥')
sentiment_analyzer_scores('☹️')

I am 😄 today {'neg': 0.0, 'neu': 0.522, 'pos': 0.478, 'compound': 0.6705}
😊--- {'neg': 0.0, 'neu': 0.333, 'pos': 0.667, 'compound': 0.7184}
😥--- {'neg': 0.275, 'neu': 0.268, 'pos': 0.456, 'compound': 0.3291}
☹️-- {'neg': 0.706, 'neu': 0.294, 'pos': 0.0, 'compound': -0.34}


In [12]:
# Slang
sentiment_analyzer_scores("Today SUX!")
sentiment_analyzer_scores("Today only kinda sux! But I'll get by, lol")

Today SUX! {'neg': 0.779, 'neu': 0.221, 'pos': 0.0, 'compound': -0.5461}
Today only kinda sux! But I'll get by, lol {'neg': 0.127, 'neu': 0.556, 'pos': 0.317, 'compound': 0.5249}


In [13]:
# Emoticons
sentiment_analyzer_scores("Make sure you :) or :D today!")

Make sure you :) or :D today! {'neg': 0.0, 'neu': 0.294, 'pos': 0.706, 'compound': 0.8633}


VADER can detect sentiment from emojis and slang, which form an important component of the social media environment. The results still deserve a spot check: the 😥 emoji above received a positive compound score.

**What else can we do with VADER?**

We can use VADER to generate labels for unlabeled data and then train traditional supervised models for sentiment analysis on those labels (today's ICP). This is one of the many things we can do with this model.

# ICP5
- Get the unlabeled data
- Define an analyser_sentence function that labels a given sentence
- Label all the unlabeled data
- Write the results to a CSV file

In [26]:
import csv

# List of sentences
sentences = [
    "The weather today is sunny.",
    "I love eating pizza!!",
    "He is studying for his exams.",
    "The concert was amazing!",
    "She goes to the gym every morning.",
    "This movie is really boring.",
    "They are planning a trip to Japan.",
    "I finished reading the book.",
    "The cake tastes delicious.",
    "We went hiking last weekend.",
    "She is learning to play the piano.",
    "The traffic was TERRIBLE this morning.",
    "He bought a new car.",
    "The lecture was very informative.",
    "They are organizing a charity event."
]

# Label a sentence based on its compound score
def analyser_sentence(sentence):
    sentiment = analyser.polarity_scores(sentence)['compound']
    if sentiment >= 0.05:
        return "positive"
    if sentiment <= -0.05:
        return "negative"
    # Anything strictly between -0.05 and 0.05 is neutral
    return "neutral"

# Label the data
label_data = [(sentence,analyser_sentence(sentence)) for sentence in sentences]
print(label_data)
# Write the labeled data to a CSV file
with open('labeled_sentences.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Sentence', 'Label'])
    writer.writerows(label_data)

print("\nSuccessfully stored sentences to file labeled_sentences.csv\n")

[('The weather today is sunny.', 'positive'), ('I love eating pizza!!', 'positive'), ('He is studying for his exams.', 'neutral'), ('The concert was amazing!', 'positive'), ('She goes to the gym every morning.', 'neutral'), ('This movie is really boring.', 'negative'), ('They are planning a trip to Japan.', 'neutral'), ('I finished reading the book.', 'neutral'), ('The cake tastes delicious.', 'positive'), ('We went hiking last weekend.', 'neutral'), ('She is learning to play the piano.', 'positive'), ('The traffic was TERRIBLE this morning.', 'negative'), ('He bought a new car.', 'neutral'), ('The lecture was very informative.', 'neutral'), ('They are organizing a charity event.', 'positive')]

Successfully stored sentences to file labeled_sentences.csv
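
Before handing a labeled file to a supervised model, it can help to read it back and check the label balance. The sketch below is self-contained: it writes a few of the rows shown above to a temporary file and tallies the labels. The helper `count_labels` and the file name `labeled_sentences_sample.csv` are hypothetical; only the `Sentence` and `Label` column names come from the cell above.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

# A few (sentence, label) pairs copied from the output above.
rows = [
    ("The weather today is sunny.", "positive"),
    ("He is studying for his exams.", "neutral"),
    ("This movie is really boring.", "negative"),
    ("The cake tastes delicious.", "positive"),
]


def count_labels(csv_path):
    """Read a Sentence/Label CSV file and count how often each label occurs."""
    with open(csv_path, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        return Counter(row["Label"] for row in reader)


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "labeled_sentences_sample.csv"
    with open(sample_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Sentence", "Label"])
        writer.writerows(rows)
    label_counts = count_labels(sample_path)

print(label_counts)
```

Pointing `count_labels` at `labeled_sentences.csv` from the cell above would give the balance for the full ICP data set.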

