# Sentiment analysis using NLTK

https://investigate.ai/investigating-sentiment-analysis/comparing-sentiment-analysis-tools/


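A demo comparing three sentiment analysis tools: NLTK's VADER, TextBlob's default analyzer, and TextBlob with `NaiveBayesAnalyzer`.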

| Technique | Word source | Word selection | How words are scored |
|-|-|-|-|
| NLTK (VADER) | everywhere | hand-picked | by people on the internet, word by word |
| TextBlob | product reviews | hand-picked, mostly adjectives | by people on the internet, word by word |
| TextBlob + NaiveBayesAnalyzer | movie reviews | all words | automatically, based on the review score |

In [1]:
import nltk

In [2]:
nltk.download('vader_lexicon')
nltk.download('movie_reviews')
nltk.download('punkt')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\emiel\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\emiel\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\emiel\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

sia = SIA()
print(sia.polarity_scores("The corona vaccine contains dangerous chemicals"))
print(sia.polarity_scores("The corona vaccine contains only safe ingredients and is harmless"))
print(sia.polarity_scores("The corona vaccine contains a compound called Formaldehyde"))

{'neg': 0.383, 'neu': 0.617, 'pos': 0.0, 'compound': -0.4767}
{'neg': 0.0, 'neu': 0.62, 'pos': 0.38, 'compound': 0.5994}
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
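
VADER's `compound` value is a single normalised score between -1 and 1. The VADER authors suggest reading values at or above 0.05 as positive, values at or below -0.05 as negative, and everything in between as neutral. A minimal sketch of that rule applied to the three outputs above; the helper name `label_from_compound` is made up for this notebook, and the threshold is a convention rather than something NLTK enforces:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to 'pos', 'neg' or 'neu'."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


# Compound scores copied from the cell output above.
for compound in (-0.4767, 0.5994, 0.0):
    print(compound, label_from_compound(compound))
```

The Formaldehyde sentence lands on `neu` because none of its words appear in the VADER lexicon.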


In [4]:
from textblob import TextBlob
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer

In [5]:
print((TextBlob("The corona vaccine contains dangerous chemicals")).sentiment)
print((TextBlob("The corona vaccine contains only safe ingredients and is harmless")).sentiment)
print((TextBlob("The corona vaccine contains a compound called Formaldehyde")).sentiment)

Sentiment(polarity=-0.6, subjectivity=0.9)
Sentiment(polarity=0.25, subjectivity=0.75)
Sentiment(polarity=0.0, subjectivity=0.0)


In [6]:
posneg = Blobber(analyzer=NaiveBayesAnalyzer())

print((posneg("The corona vaccine contains dangerous chemicals")).sentiment)
print((posneg("The corona vaccine contains only safe ingredients and is harmless")).sentiment)
print((posneg("The corona vaccine contains a compound called Formaldehyde")).sentiment)

Sentiment(classification='neg', p_pos=0.26831221732575683, p_neg=0.7316877826742444)
Sentiment(classification='pos', p_pos=0.6241247363042428, p_neg=0.3758752636957584)
Sentiment(classification='neg', p_pos=0.40799306089467274, p_neg=0.5920069391053251)
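
Unlike the other two tools, the Naive Bayes analyzer returns two class probabilities rather than one polarity value. One way to put it on the same -1 to 1 scale is to take the difference `p_pos - p_neg`. A small sketch using the probabilities printed above for the first sentence; `bayes_margin` is a hypothetical helper name:

```python
def bayes_margin(p_pos, p_neg):
    """Collapse two class probabilities into one score in [-1, 1]."""
    return p_pos - p_neg


# Probabilities copied from the first line of the output above.
margin = bayes_margin(0.26831221732575683, 0.7316877826742444)
print(round(margin, 6))
```

A margin near 0 means the classifier is close to undecided, which is the case for the Formaldehyde sentence even though it is labelled `neg`.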


In [7]:
import pandas as pd
pd.set_option("display.max_colwidth", 200)

df = pd.DataFrame({'content': [
    "I love love love love this kitten",
    "I hate hate hate hate this keyboard",
    "I'm not sure how I feel about toast",
    "Did you see the baseball game yesterday?",
    "The package was delivered late and the contents were broken",
    "Trashy television shows are some of my favorites",
    "I'm seeing a Kubrick film tomorrow, I hear not so great things about it.",
    "I find chirping birds irritating, but I know I'm not the only one",
    "--------------------------------------",
    "The corona vaccine contains dangerous chemicals",
    "The corona vaccine contains only safe ingredients and is harmless",
    "The corona vaccine contains a compound called Formaldehyde",
]})
df

Unnamed: 0,content
0,I love love love love this kitten
1,I hate hate hate hate this keyboard
2,I'm not sure how I feel about toast
3,Did you see the baseball game yesterday?
4,The package was delivered late and the contents were broken
5,Trashy television shows are some of my favorites
6,"I'm seeing a Kubrick film tomorrow, I hear not so great things about it."
7,"I find chirping birds irritating, but I know I'm not the only one"
8,--------------------------------------
9,The corona vaccine contains dangerous chemicals


In [8]:
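def get_scores(content):
    """Score one text with all three tools, each on a -1 to 1 scale."""
    blob = TextBlob(content)
    nb_blob = posneg(content)
    sia_scores = sia.polarity_scores(content)

    return pd.Series({
        'content': content,
        'textblob': blob.sentiment.polarity,
        # Naive Bayes gives two probabilities; their difference lies in [-1, 1]
        'textblob_bayes': nb_blob.sentiment.p_pos - nb_blob.sentiment.p_neg,
        'nltk': sia_scores['compound'],
    })

# One row per sentence, one column per tool
scores = df.content.apply(get_scores)
# Colour cells from red (negative) to green (positive) on one shared scale
scores.style.background_gradient(cmap='RdYlGn', axis=None, low=0.4, high=0.4)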

Unnamed: 0,content,textblob,textblob_bayes,nltk
0,I love love love love this kitten,0.5,-0.087933,0.9571
1,I hate hate hate hate this keyboard,-0.8,-0.214151,-0.9413
2,I'm not sure how I feel about toast,-0.25,0.394659,-0.2411
3,Did you see the baseball game yesterday?,-0.4,0.61305,0.0
4,The package was delivered late and the contents were broken,-0.35,-0.57427,-0.4767
5,Trashy television shows are some of my favorites,0.0,0.040076,0.4215
6,"I'm seeing a Kubrick film tomorrow, I hear not so great things about it.",0.8,0.717875,-0.6296
7,"I find chirping birds irritating, but I know I'm not the only one",-0.2,0.257148,-0.25
8,--------------------------------------,0.0,0.0,0.0
9,The corona vaccine contains dangerous chemicals,-0.6,-0.463376,-0.4767
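
The table shows that the tools often disagree, sometimes even on the direction of the sentiment. A quick way to flag those rows is to check whether the scores straddle zero. The sketch below does this for three rows copied from the table; `tools_disagree` is a hypothetical helper, and treating "opposite signs" as disagreement is only one possible definition:

```python
# Scores copied from the table above: (textblob, textblob_bayes, nltk)
rows = {
    "I hate hate hate hate this keyboard": (-0.8, -0.214151, -0.9413),
    "Did you see the baseball game yesterday?": (-0.4, 0.61305, 0.0),
    "I'm seeing a Kubrick film tomorrow, I hear not so great things about it.":
        (0.8, 0.717875, -0.6296),
}


def tools_disagree(scores):
    """True when at least one score is negative and at least one is positive."""
    return min(scores) < 0 < max(scores)


for text, row_scores in rows.items():
    print(tools_disagree(row_scores), text)
```

Only the keyboard sentence gets the same sign from all three tools; for the Kubrick sentence both TextBlob variants are positive while VADER is clearly negative.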


In [None]:
# TODO try the default sentiment parser, try some other tools, build one myself
# TODO line-by-line sentiment parser (scores between -1 and 1)
# TODO article sentiment (as a class?)
    # list of sentence sentiments, in order
    # article text
    # number of lines
    # statistical info for the learning model (args, kwargs?)
    # functions to:
        # analyse sentiments (takes a parser as input)
        # map a sentence to its score and back
        # set the statistics functions

# TODO visualise sentiment throughout an article

# TODO compare the sentiment lines of different articles; find the average line for fake and for non-fake articles

# TODO use the article sentiment statistics as input for Weka
# TODO combine article sentiment data with other content cues to improve the results
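
A possible starting point for the article-level TODOs above is sketched below. Everything in it is an assumption rather than a finished design: the class name `ArticleSentiment`, the regex sentence splitter (NLTK's `punkt` tokenizer would be the more robust choice), and `toy_scorer`, a stand-in so the sketch runs without any downloads. The scorer is passed in as a plain function, so any of the three tools can be plugged in.

```python
import re
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


@dataclass
class ArticleSentiment:
    text: str
    scorer: Callable[[str], float]  # should return a value in [-1, 1]
    sentences: List[str] = field(init=False)
    scores: List[float] = field(init=False)

    def __post_init__(self):
        # Keep sentences and scores in article order so they map to each other.
        self.sentences = split_sentences(self.text)
        self.scores = [self.scorer(s) for s in self.sentences]

    def stats(self) -> dict:
        """Summary statistics over the per-sentence scores."""
        if not self.scores:
            return {"n": 0, "mean": 0.0, "stdev": 0.0}
        return {
            "n": len(self.scores),
            "mean": mean(self.scores),
            "stdev": pstdev(self.scores),
        }


def toy_scorer(sentence: str) -> float:
    """Stand-in scorer: +1 for 'love', -1 for 'hate', 0 otherwise."""
    lowered = sentence.lower()
    return float("love" in lowered) - float("hate" in lowered)


article = ArticleSentiment(
    "I love this kitten. I hate this keyboard! Did you see the game?",
    toy_scorer,
)
print(list(zip(article.sentences, article.scores)))
print(article.stats())
```

To use VADER instead of the toy scorer, pass `lambda s: sia.polarity_scores(s)["compound"]` with the `sia` object created earlier in this notebook.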