Social Media Sentiment Analysis In Python With VADER
(https://zoumanakeita.medium.com/)


In [None]:
import nltk
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sent_analyzer = SentimentIntensityAnalyzer()

# Positive sentence example
sentence = "VADER is pretty good at identifying the underlying sentiment of a text!"
print(sent_analyzer.polarity_scores(sentence))

{'neg': 0.0, 'neu': 0.553, 'pos': 0.447, 'compound': 0.8057}


In [None]:
# Negative sentence example
sentence = "I do HATE those fake news on internet!!😡"
print(sent_analyzer.polarity_scores(sentence))

{'neg': 0.619, 'neu': 0.381, 'pos': 0.0, 'compound': -0.8449}
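
In the two outputs above, `neg`, `neu` and `pos` are proportions of the text and should add up to roughly 1, while `compound` is a separate normalized score between -1 and 1. A minimal sketch that checks this against the printed values; `proportions_total` is a hypothetical helper for illustration, not part of NLTK:

```python
# Scores copied from the two outputs printed above
positive_scores = {'neg': 0.0, 'neu': 0.553, 'pos': 0.447, 'compound': 0.8057}
negative_scores = {'neg': 0.619, 'neu': 0.381, 'pos': 0.0, 'compound': -0.8449}

def proportions_total(scores):
  """Sum the neg, neu and pos proportions (hypothetical helper for this sketch)."""
  return round(scores['neg'] + scores['neu'] + scores['pos'], 3)

for scores in (positive_scores, negative_scores):
  # compound is reported separately and stays within [-1, 1]
  print(proportions_total(scores), scores['compound'])
```

Because `compound` is a single signed number, it is the value to threshold when one label per text is needed.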


In [None]:
import pandas as pd
data_url = "https://raw.githubusercontent.com/keitazoumana/VADER_sentiment-Analysis/main/data/testdata.manual.2009.06.14.csv"
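# Note: this CSV has no header row, so read_csv uses the first record as the
# column names rather than as a data row (visible in the output below).
sentiment_data = pd.read_csv(data_url)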

sentiment_data.head(10)

Unnamed: 0,4,3,Mon May 11 03:17:40 UTC 2009,kindle2,tpryan,"@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right."
0,4,4,Mon May 11 03:18:03 UTC 2009,kindle2,vcu451,Reading my kindle2... Love it... Lee childs i...
1,4,5,Mon May 11 03:18:54 UTC 2009,kindle2,chadfu,"Ok, first assesment of the #kindle2 ...it fuck..."
2,4,6,Mon May 11 03:19:04 UTC 2009,kindle2,SIX15,@kenburbary You'll love your Kindle2. I've had...
3,4,7,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish Fair enough. But i have the Kindle2...
4,4,8,Mon May 11 03:22:00 UTC 2009,kindle2,GeorgeVHulme,@richardebaker no. it is too big. I'm quite ha...
5,0,9,Mon May 11 03:22:30 UTC 2009,aig,Seth937,Fuck this economy. I hate aig and their non lo...
6,4,10,Mon May 11 03:26:10 UTC 2009,jquery,dcostalis,Jquery is my new best friend.
7,4,11,Mon May 11 03:27:15 UTC 2009,twitter,PJ_King,Loves twitter
8,4,12,Mon May 11 03:29:20 UTC 2009,obama,mandanicole,how can you not love Obama? he makes jokes abo...
9,2,13,Mon May 11 03:32:42 UTC 2009,obama,jpeb,Check this video out -- President Obama at the...


In [None]:
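def format_data(data):
  """Rename the label and text columns and map numeric labels to names.

  Note: the rename is done in place, so the DataFrame passed in is modified.
  """
  # The columns carry the values of the first record, so pick them by position
  last_col = str(data.columns[-1])
  first_col = str(data.columns[0])

  data.rename(columns={last_col: 'tweet_text', first_col: 'polarity'}, inplace=True)

  # Map the numeric labels 0, 2, 4 to negative, neutral and positive
  labels = {0: 'negative', 2: 'neutral', 4: 'positive'}
  data['polarity'] = data['polarity'].map(labels)

  # Keep only the tweet text and its label
  return data[['tweet_text', 'polarity']]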

In [None]:
data = format_data(sentiment_data)
data.head(10)

Unnamed: 0,tweet_text,polarity
0,Reading my kindle2... Love it... Lee childs i...,positive
1,"Ok, first assesment of the #kindle2 ...it fuck...",positive
2,@kenburbary You'll love your Kindle2. I've had...,positive
3,@mikefish Fair enough. But i have the Kindle2...,positive
4,@richardebaker no. it is too big. I'm quite ha...,positive
5,Fuck this economy. I hate aig and their non lo...,negative
6,Jquery is my new best friend.,positive
7,Loves twitter,positive
8,how can you not love Obama? he makes jokes abo...,positive
9,Check this video out -- President Obama at the...,neutral


In [None]:
def format_output(output_dict):
  """Map VADER's compound score to a polarity label."""
  polarity = "neutral"

  if output_dict['compound'] >= 0.05:
    polarity = "positive"
  elif output_dict['compound'] <= -0.05:
    polarity = "negative"

  return polarity

def predict_sentiment(text):
  """Score the text with VADER and return its polarity label."""
  output_dict = sent_analyzer.polarity_scores(text)
  return format_output(output_dict)
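
The cutoffs in `format_output` put everything from -0.05 to 0.05, both ends excluded, in the neutral class. A minimal sketch of the same rule with the cutoff exposed as a parameter; `label_from_compound` and its `threshold` argument are hypothetical names for illustration and are not part of the notebook or of NLTK:

```python
def label_from_compound(compound, threshold=0.05):
  """Hypothetical variant of format_output with an adjustable cutoff."""
  if compound >= threshold:
    return "positive"
  if compound <= -threshold:
    return "negative"
  return "neutral"

# Compound scores from the two example sentences above, plus boundary values
for compound in (0.8057, -0.8449, 0.05, -0.05, 0.0, 0.049):
  print(compound, label_from_compound(compound))
```

Raising the threshold widens the neutral band, which may be worth trying if too many neutral tweets are labeled as positive or negative.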

In [None]:
# Work on an explicit copy: format_data() returns a slice of sentiment_data,
# and assigning a column to that slice triggers pandas' SettingWithCopyWarning.
data = data.copy()
data["vader_prediction"] = data["tweet_text"].apply(predict_sentiment)


In [None]:
data.sample(10)

Unnamed: 0,tweet_text,polarity,vader_prediction
170,NCAA Baseball Super Regional - Rams Club http:...,neutral,positive
220,@KarrisFoxy If you're being harassed by calls ...,negative,negative
360,saw night at the museum out of sheer desperati...,negative,negative
88,Why the hell is Pelosi in freakin China? and o...,negative,negative
377,Having the old Coca-Cola guy on the GM board i...,negative,negative
177,"@XPhile1908 I have three words for you: ""Safew...",neutral,neutral
78,Took the Graduate Field Exam for Computer Scie...,negative,negative
242,Obama is quite a good comedian! check out his ...,positive,positive
274,Lyx is cool.,positive,positive
480,@Iheartseverus we love you too and don't want ...,negative,negative


In [None]:
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(data['polarity'], data['vader_prediction'])

print(f"Accuracy: {accuracy}\n")

# Show the classification report
print(classification_report(data['polarity'], data['vader_prediction']))

Accuracy: 0.716297786720322

              precision    recall  f1-score   support

    negative       0.84      0.64      0.72       177
     neutral       0.67      0.70      0.68       139
    positive       0.67      0.81      0.73       181

    accuracy                           0.72       497
   macro avg       0.73      0.71      0.71       497
weighted avg       0.73      0.72      0.72       497
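
The report above summarizes each class but does not show which labels are confused with which. A minimal sketch of a confusion count, using only the ten rows from the `data.sample(10)` output shown earlier rather than the full dataset; the variable names are illustrative:

```python
from collections import Counter

# (true label, VADER prediction) pairs copied from the data.sample(10) output above
sampled_pairs = [
  ("neutral", "positive"),
  ("negative", "negative"),
  ("negative", "negative"),
  ("negative", "negative"),
  ("negative", "negative"),
  ("neutral", "neutral"),
  ("negative", "negative"),
  ("positive", "positive"),
  ("positive", "positive"),
  ("negative", "negative"),
]

# Count each (true, predicted) combination: a confusion matrix in dictionary form
confusion = Counter(sampled_pairs)

# Accuracy is the share of pairs where the prediction matches the true label
correct = sum(count for (true, pred), count in confusion.items() if true == pred)
sample_accuracy = correct / len(sampled_pairs)

for (true, pred), count in sorted(confusion.items()):
  print(f"true={true:<8} predicted={pred:<8} count={count}")
print(f"Sample accuracy: {sample_accuracy}")
```

Ten rows are far too few to judge the model, so the 0.72 accuracy on all 497 tweets remains the figure to rely on. For the full data, `sklearn.metrics.confusion_matrix` gives the same kind of breakdown.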

