## Sentiment Analysis

Sentiment analysis is a text classification task and a common use case of natural language processing (NLP). Simply described, it categorizes a text by the emotion it expresses, such as happy, sad, or neutral. The ultimate goal is to determine the underlying tone, emotion, or sentiment of a document. Another name for this is opinion mining.

In [2]:
import warnings
warnings.filterwarnings('ignore')
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

### VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer that is tuned for social media text. It is not a trained model: it scores text with a hand-built sentiment lexicon plus a set of rules. Just like TextBlob, it is simple to use from Python.

In [3]:
sentiment = SentimentIntensityAnalyzer()

In [4]:
text = 'the book was perfect balance between writing style and plot'

In [5]:
sent = sentiment.polarity_scores(text)
sent

{'neg': 0.0, 'neu': 0.709, 'pos': 0.291, 'compound': 0.5719}

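VADER scored 'text' and returned four values. The neg, neu, and pos values are the proportions of the text that fall into the negative, neutral, and positive categories. The compound value is a single normalized score between -1 (most negative) and +1 (most positive). Here the compound score of 0.5719 indicates a positive sentiment.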
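As a rough sketch of where the compound value comes from: as far as I understand the reference implementation, VADER sums the valence of the scored words and squashes that sum x into [-1, 1] with x / sqrt(x² + α), where α = 15. The valence of 2.7 for 'perfect' is an assumption here, chosen because it reproduces the compound score reported above, and the function name `normalize` is only illustrative.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp defensively so the result never leaves [-1, 1]
    return max(-1.0, min(1.0, norm))

# Assumption: 'perfect' is the only scored word, with a valence of 2.7
compound = normalize(2.7)
print(round(compound, 4))
```

Because the function is odd and monotonic, a negative valence sum of the same size gives the same compound score with the opposite sign.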

### NLTK

The Natural Language Toolkit (NLTK) is a Python library used for working with human language data. Widely used in the field of Natural Language Processing (NLP), NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing and semantic reasoning.

In [6]:
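import nltk
# NLTK's VADER needs the lexicon resource; this is a one-time download
nltk.download('vader_lexicon')
# Note: this import reuses the name imported from vaderSentiment earlier
from nltk.sentiment.vader import SentimentIntensityAnalyzer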

In [11]:
sentiment_nltk = SentimentIntensityAnalyzer()

In [12]:
# Use a separate name so the vaderSentiment analyzer stored in `sentiment` is not overwritten
sent_nltk = sentiment_nltk.polarity_scores(text)
sent_nltk

{'neg': 0.0, 'neu': 0.709, 'pos': 0.291, 'compound': 0.5719}

### VADER on Large Dataset

In [25]:
url="https://raw.githubusercontent.com/keitazoumana/VADER_sentiment-Analysis/main/data/testdata.manual.2009.06.14.csv"
data = pd.read_csv(url)

In [14]:
data.head()

| | 4 | 3 | Mon May 11 03:17:40 UTC 2009 | kindle2 | tpryan | @stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right. |
|---|---|---|---|---|---|---|
| 0 | 4 | 4 | Mon May 11 03:18:03 UTC 2009 | kindle2 | vcu451 | Reading my kindle2... Love it... Lee childs i... |
| 1 | 4 | 5 | Mon May 11 03:18:54 UTC 2009 | kindle2 | chadfu | Ok, first assesment of the #kindle2 ...it fuck... |
| 2 | 4 | 6 | Mon May 11 03:19:04 UTC 2009 | kindle2 | SIX15 | @kenburbary You'll love your Kindle2. I've had... |
| 3 | 4 | 7 | Mon May 11 03:21:41 UTC 2009 | kindle2 | yamarama | @mikefish Fair enough. But i have the Kindle2... |
| 4 | 4 | 8 | Mon May 11 03:22:00 UTC 2009 | kindle2 | GeorgeVHulme | @richardebaker no. it is too big. I'm quite ha... |
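The column header in the preview above is itself a tweet record, which suggests the CSV has no header row and that `pd.read_csv` consumed the first record as column names. A minimal sketch of one way to keep that record is to pass `header=None` together with explicit names. The column names and the two sample rows below are made up for illustration and are not defined by the file. The results reported in this section come from the original call, so they would shift slightly if the file were loaded this way.

```python
import io
import pandas as pd

# Two made-up rows in the same six-field layout as the file (no header line)
sample = io.StringIO(
    '4,3,Mon May 11 03:17:40 UTC 2009,kindle2,user_a,"first tweet, with a comma"\n'
    '0,4,Mon May 11 03:18:03 UTC 2009,kindle2,user_b,second tweet\n'
)

# Hypothetical column names; the file itself does not define any
columns = ['polarity', 'id', 'date', 'query', 'user', 'tweet']

# header=None tells pandas that the first line is data, not column names
df = pd.read_csv(sample, header=None, names=columns)

print(len(df))
print(df[['polarity', 'tweet']])
```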


The next function renames the first and last columns, maps the numeric polarity codes (0, 2, 4) to the labels negative, neutral, and positive, and drops the remaining columns.

In [26]:
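def format_data(data):
    first_col = data.columns[0]   # numeric polarity code (0, 2, or 4)
    last_col = data.columns[-1]   # tweet text

    # rename() returns a new DataFrame, so the caller's data is left untouched
    data = data.rename(columns={last_col: 'tweet', first_col: 'polarity'})

    # map the numeric polarity codes to readable labels
    labels = {0: 'negative', 2: 'neutral', 4: 'positive'}
    data['polarity'] = data['polarity'].map(labels)

    # keep only the two columns needed; copy() avoids chained-assignment warnings later
    return data[['tweet', 'polarity']].copy()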

In [27]:
data= format_data(data)

In [28]:
data.head()

| | tweet | polarity |
|---|---|---|
| 0 | Reading my kindle2... Love it... Lee childs i... | positive |
| 1 | Ok, first assesment of the #kindle2 ...it fuck... | positive |
| 2 | @kenburbary You'll love your Kindle2. I've had... | positive |
| 3 | @mikefish Fair enough. But i have the Kindle2... | positive |
| 4 | @richardebaker no. it is too big. I'm quite ha... | positive |


In [29]:
def format_output(output_dict):
    # A compound score of +/-0.05 separates the three classes
    compound = output_dict['compound']

    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'
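The thresholding can be checked on a few hand-picked compound values without running VADER at all. This is only a sketch: the helper name `label_from_compound` and every sample value except 0.5719 (the score obtained earlier) are made up for illustration.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using a symmetric threshold."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Values at and around both boundaries, plus the score from the earlier example
samples = [0.5719, 0.05, 0.049, 0.0, -0.049, -0.05, -0.4]

for value in samples:
    print(value, label_from_compound(value))
```

Both boundaries are inclusive, so a score of exactly 0.05 counts as positive and exactly -0.05 as negative. Only scores strictly between the two are labeled neutral.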
    

In [32]:
def predict_sentiment(text):
    output_dict = sentiment_nltk.polarity_scores(text)
    return format_output(output_dict)

In [33]:
data['vader_prediction']=data['tweet'].apply(predict_sentiment)

In [34]:
data.sample(5)

| | tweet | polarity | vader_prediction |
|---|---|---|---|
| 412 | Tom Shanahan's latest column on SDSU and its N... | neutral | neutral |
| 53 | annoying new trend on the internets: people p... | negative | negative |
| 436 | @evelynbyrne have you tried Nike ? V. addictive. | positive | neutral |
| 311 | @Fraggle312 oh those are awesome! i so wish th... | negative | positive |
| 427 | eating breakfast and then school | neutral | neutral |


### VADER Performance

In [35]:
accuracy = accuracy_score(data['polarity'], data['vader_prediction'])

In [36]:
accuracy

0.716297786720322

In [38]:
#report
report=classification_report(data['polarity'],data['vader_prediction'])
print(report)

              precision    recall  f1-score   support

    negative       0.84      0.64      0.72       177
     neutral       0.67      0.70      0.68       139
    positive       0.67      0.81      0.73       181

    accuracy                           0.72       497
   macro avg       0.73      0.71      0.71       497
weighted avg       0.73      0.72      0.72       497
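To show how the two average rows relate to the per-class rows, the sketch below recomputes them from the figures printed in the report. The macro average treats every class equally, while the weighted average weights each class by its support. Because the inputs are already rounded to two decimals, a recomputed value can differ from the report in the last digit. The helper names are illustrative.

```python
# Per-class precision, recall, and support, copied from the report above (rounded)
report = {
    'negative': {'precision': 0.84, 'recall': 0.64, 'support': 177},
    'neutral':  {'precision': 0.67, 'recall': 0.70, 'support': 139},
    'positive': {'precision': 0.67, 'recall': 0.81, 'support': 181},
}

total = sum(row['support'] for row in report.values())

def macro_avg(metric):
    """Unweighted mean over classes."""
    return sum(row[metric] for row in report.values()) / len(report)

def weighted_avg(metric):
    """Mean over classes, weighted by the number of true samples per class."""
    return sum(row[metric] * row['support'] for row in report.values()) / total

print(total)
print(round(macro_avg('precision'), 2))
print(round(weighted_avg('precision'), 2))
print(round(weighted_avg('recall'), 2))
```

The support-weighted recall is the same quantity as overall accuracy, which is why it lines up with the accuracy row of the report.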

