# **Sentiment Analysis - Lab 4**

Sentiment analysis is useful for estimating the proportion of positive and negative opinions about a specific topic. For example, if you build an application and want to analyze customer reviews, you can label each review as positive or negative using sentiment analysis tools. Several tools exist for this task, including but not limited to:


1. TextBlob (better for formal language)
2. Stanza (No slight negative/slight positive)
3. VADER (via NLTK) (better for slang and most consistent)
4. Pattern
5. Flair


TextBlob, Pattern, and VADER are the fastest of these tools.

In [1]:
 !pip install -U textblob
 !pip install vader-sentiment



In [2]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from textblob import TextBlob
import nltk
nltk.download('punkt')



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

# **TextBlob**

TextBlob returns the polarity and subjectivity of a sentence. Polarity lies in the range [-1, 1], where -1 indicates a negative sentiment and 1 indicates a positive sentiment. Negation words reverse the polarity. TextBlob has semantic labels that help with fine-grained analysis, for example emoticons, exclamation marks, and emojis. Subjectivity lies in the range [0, 1] and quantifies how much of the text is personal opinion rather than factual information: the higher the subjectivity, the more the text expresses personal opinion.

In [3]:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Cairo is in Egypt")

In [4]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Cairo is in Egypt")]

In [5]:
for sen in zen.sentences:
    print(sen.sentiment.polarity, sen.sentiment.subjectivity)
    print(sen.sentiment)

0.2166666666666667 0.8333333333333334
Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
0.5 0.5
Sentiment(polarity=0.5, subjectivity=0.5)
0.0 0.0
Sentiment(polarity=0.0, subjectivity=0.0)
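To make the two numbers easier to read, the sketch below turns a (polarity, subjectivity) pair into a short description. The helper name `describe` and the 0.5 subjectivity cutoff are assumptions chosen for illustration; they are not part of TextBlob. The input values are the ones printed above.

```python
def describe(polarity, subjectivity, opinion_cutoff=0.5):
    """Summarize TextBlob-style scores in words.

    opinion_cutoff is an illustrative assumption, not a TextBlob default.
    """
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "opinion" if subjectivity >= opinion_cutoff else "factual"
    return f"{tone}, mostly {kind}"


# Scores copied from the output of the three sentences above.
scores = [
    (0.2166666666666667, 0.8333333333333334),
    (0.5, 0.5),
    (0.0, 0.0),
]
for polarity, subjectivity in scores:
    print(describe(polarity, subjectivity))
```

The factual sentence "Cairo is in Egypt" gets zero polarity and zero subjectivity, so it falls into the neutral, factual bucket.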


# **VADER**

VADER stands for *Valence Aware Dictionary and sEntiment Reasoner*. It is available through NLTK and can be used for sentiment analysis. It is especially useful when working with emojis and emoticons. It works by mapping each word in the input text to lexical features with emotional intensities, that is, scores that reflect how positive or negative the word is.


Every word in the VADER lexicon has a sentiment score, and the score of a sentence is obtained by summing the scores of its words. VADER also takes capitalization and exclamation marks into account.
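The summation idea can be sketched in a few lines. This is a simplified illustration, not VADER itself: the word scores in `toy_lexicon` are made-up values, and the sketch ignores capitalization, punctuation, negation, and intensifiers. It assumes the normalization used by the reference VADER implementation, which squashes the summed score x into the range (-1, 1) as x / sqrt(x^2 + alpha) with alpha = 15.

```python
import math


def normalize(total_valence, alpha=15):
    """Squash an unbounded sum of word scores into the range (-1, 1)."""
    return total_valence / math.sqrt(total_valence ** 2 + alpha)


# Hypothetical word scores for illustration only (not read from the real lexicon).
toy_lexicon = {"love": 3.0, "good": 2.0, "hate": -3.0}


def toy_compound(sentence):
    """Sum the scores of known words, then normalize the total."""
    words = sentence.lower().split()
    total = sum(toy_lexicon.get(word, 0.0) for word in words)
    return normalize(total)


print(round(toy_compound("I love this good phone"), 4))
print(round(toy_compound("I hate this phone"), 4))
```

Words that are missing from the lexicon contribute zero, so a sentence made only of unknown words ends up with a score of 0.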

In [6]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [7]:
vader = SentimentIntensityAnalyzer() 

In [8]:
sample = 'I really love NVIDIA'
vader.polarity_scores(sample)

{'compound': 0.6697, 'neg': 0.0, 'neu': 0.308, 'pos': 0.692}






The compound score is typically interpreted using the following thresholds:

*   positive sentiment: compound score >= 0.05
*   neutral sentiment: -0.05 < compound score < 0.05
*   negative sentiment: compound score <= -0.05


NOTE: The compound score is the metric most commonly used for sentiment analysis by researchers, including the authors of VADER.
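The thresholds listed above translate directly into a small helper. The function name `label_from_compound` is hypothetical, and the 0.05 cutoff is exposed as a parameter in case a different boundary suits your data.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a sentiment label using the thresholds above."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# 0.6697 is the compound score returned for 'I really love NVIDIA' above.
print(label_from_compound(0.6697))
```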

# **Lab Assignment #2**

Apply sentiment analysis to Amazon product reviews.

In [9]:
import numpy as np
import pandas as pd
df = pd.read_csv('amazonreviews.tsv', sep='\t')
df.head()

Unnamed: 0,label,review
0,pos,Stuning even for the non-gamer: This sound tra...
1,pos,The best soundtrack ever to anything.: I'm rea...
2,pos,Amazing!: This soundtrack is my favorite music...
3,pos,Excellent Soundtrack: I truly like this soundt...
4,pos,"Remember, Pull Your Jaw Off The Floor After He..."


To access a specific column of a DataFrame:

In [10]:
df['review']

0       Stuning even for the non-gamer: This sound tra...
1       The best soundtrack ever to anything.: I'm rea...
2       Amazing!: This soundtrack is my favorite music...
3       Excellent Soundtrack: I truly like this soundt...
4       Remember, Pull Your Jaw Off The Floor After He...
                              ...                        
9995    A revelation of life in small town America in ...
9996    Great biography of a very interesting journali...
9997    Interesting Subject; Poor Presentation: You'd ...
9998    Don't buy: The box looked used and it is obvio...
9999    Beautiful Pen and Fast Delivery.: The pen was ...
Name: review, Length: 10000, dtype: object

In [11]:
df.dropna(inplace=True)

In [12]:
df

Unnamed: 0,label,review
0,pos,Stuning even for the non-gamer: This sound tra...
1,pos,The best soundtrack ever to anything.: I'm rea...
2,pos,Amazing!: This soundtrack is my favorite music...
3,pos,Excellent Soundtrack: I truly like this soundt...
4,pos,"Remember, Pull Your Jaw Off The Floor After He..."
...,...,...
9995,pos,A revelation of life in small town America in ...
9996,pos,Great biography of a very interesting journali...
9997,neg,Interesting Subject; Poor Presentation: You'd ...
9998,neg,Don't buy: The box looked used and it is obvio...


**A.** Use `apply` with a lambda function to run VADER's `polarity_scores` on the `review` column. Store the result in a new column called `scores`.

In [13]:
#Your answer here

**B.** Extract only the *compound* score and save it in a new column.

Hint: Use a lambda function to extract the *compound* score.

In [14]:
#Your answer here
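If the `apply` plus lambda pattern is unfamiliar, the sketch below shows it on a toy DataFrame. Everything here is hypothetical and only illustrates the mechanics: `toy_scores` is a made-up stand-in that returns a dictionary (it is not VADER), and the column names `text`, `result`, and `value` are chosen to differ from the ones in the assignment.

```python
import pandas as pd


def toy_scores(text):
    """Hypothetical stand-in scorer that returns a dict, similar in shape to VADER's output."""
    positive_words = {"great", "love"}
    negative_words = {"bad", "broken"}
    words = text.lower().split()
    pos = sum(word in positive_words for word in words)
    neg = sum(word in negative_words for word in words)
    total = max(len(words), 1)  # avoid dividing by zero for empty text
    return {"pos": pos / total, "neg": neg / total, "compound": (pos - neg) / total}


toy = pd.DataFrame({"text": ["great pen love it", "bad box broken seal"]})

# Apply a function to every value of a column; each cell of `result` holds a dict.
toy["result"] = toy["text"].apply(lambda t: toy_scores(t))

# Pull a single key out of each dict into its own column.
toy["value"] = toy["result"].apply(lambda d: d["compound"])

print(toy[["text", "value"]])
```

The same two-step shape (score each row, then extract one field) carries over when the scorer is a real sentiment analyzer.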

**C.** Create a new column such that if the compound value is >= the column should contain "pos" as a value, otherwise "neg"

In [15]:
#Your answer here

**References**

https://realpython.com/python-nltk-sentiment-analysis/

https://medium.com/geekculture/what-nlp-library-you-should-use-for-your-sentimental-analysis-project-bef6b357a6db




