We compare two sentiment analysis libraries for Python: TextBlob first, then VADER Sentiment.

### We start with TextBlob

With TextBlob, each text gets a polarity score and a subjectivity score. Polarity is the sentiment itself, ranging from -1 (most negative) to +1 (most positive). Subjectivity measures how opinionated the text is, ranging from 0 (objective) to 1 (subjective).

In [1]:
from textblob import TextBlob

In [4]:
analysis = TextBlob('TextBlob sure looks like it has some interesting features!')

In [6]:
#dir(analysis)

In [7]:
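# translate() called the Google Translate API; it was deprecated in TextBlob 0.16.0
# and dropped from later releases, so this cell needs an older TextBlob version.
analysis.translate(to='es')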

TextBlob("¡TextBlob seguro parece tener algunas características interesantes!")

In [8]:
analysis.translate(to='fr')

TextBlob("TextBlob semble avoir des fonctionnalités intéressantes!")

In [9]:
analysis.tags

[('TextBlob', 'NNP'),
 ('sure', 'JJ'),
 ('looks', 'VBZ'),
 ('like', 'IN'),
 ('it', 'PRP'),
 ('has', 'VBZ'),
 ('some', 'DT'),
 ('interesting', 'JJ'),
 ('features', 'NNS')]

In [10]:
analysis.sentiment

Sentiment(polarity=0.5625, subjectivity=0.6944444444444444)

So this sentence is fairly positive, but also highly subjective.

In [34]:
# Now, let's test this on a bit more data using the positive.txt and negative.txt datasets.

pos_count = 0
pos_correct = 0
with open('positive.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity > 0:
            pos_correct += 1
        pos_count += 1

neg_count = 0
neg_correct = 0

with open('negative.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity <= 0:
            neg_correct += 1
        neg_count += 1
        
# with > 0 and <= 0

print('Positive accuracy = {}% via {} samples'.format(pos_correct/pos_count*100.0, pos_count))
print('Negative accuracy = {}% via {} samples'.format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 71.11236165822548% via 5331 samples
Negative accuracy = 55.861939598574374% via 5331 samples
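
The counting loop above is repeated, with small changes, in every experiment that follows. As a minimal sketch, the pattern can be factored into one function that accepts any scoring callable. The helper names `nonempty_lines` and `accuracy` are hypothetical and not part of TextBlob or VADER. The sketch also skips blank lines: `f.read().split('\n')` yields an empty final string when the file ends with a newline, and that empty string would otherwise be counted as a sample.

```python
def nonempty_lines(text):
    """Split raw file text into stripped, non-blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def accuracy(lines, score, is_correct, keep=lambda s: True):
    """Return (percent_correct, samples_used) for one labelled file.

    score      -- callable mapping a line to a number (for example a polarity)
    is_correct -- callable deciding whether that number matches the file's label
    keep       -- optional filter; lines whose score fails it are not counted
    """
    used = 0
    correct = 0
    for line in lines:
        s = score(line)
        if not keep(s):
            continue
        used += 1
        if is_correct(s):
            correct += 1
    percent = 100.0 * correct / used if used else 0.0
    return percent, used


# Stand-in scorer so the sketch runs without any library installed.
# With TextBlob it would be: lambda line: TextBlob(line).sentiment.polarity
toy_scores = {"great film": 0.8, "dull plot": -0.4, "a film": 0.0}
lines = nonempty_lines("great film\ndull plot\na film\n")
pos_acc, pos_n = accuracy(lines, toy_scores.get, lambda s: s > 0)
print('Positive accuracy = {:.2f}% via {} samples'.format(pos_acc, pos_n))
```

The `keep` argument is where a subjectivity or confidence filter would go, so the later experiments differ only in the callables passed in.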


It looks like our positive accuracy is decent, but the negative accuracy isn't all that good. It could be that this classifier leans positive across the board, in which case our "zero" could be moved up a bit. The next cell uses 0.1 as the cutoff and also filters on subjectivity:

In [32]:
# What if we play with the subjectivity though? Maybe we can only look at reviews that we feel are more objective?

pos_count = 0
pos_correct = 0
with open('positive.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity > 0.1:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open('negative.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity <= 0.1:
                neg_correct += 1
            neg_count += 1
        
# with > 0.1 and <= 0.1, keeping only subjectivity < 0.3

print('Positive accuracy = {}% via {} samples'.format(pos_correct/pos_count*100.0, pos_count))
print('Negative accuracy = {}% via {} samples'.format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 18.574297188755022% via 996 samples
Negative accuracy = 87.27969348659003% via 1305 samples


In [33]:
# What if we relax the filter and only drop the most subjective lines (subjectivity >= 0.9)?

pos_count = 0
pos_correct = 0
with open('positive.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        
        if analysis.sentiment.subjectivity < 0.9:
            if analysis.sentiment.polarity > 0.1:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open('negative.txt', 'r') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        
        if analysis.sentiment.subjectivity < 0.9:
            if analysis.sentiment.polarity <= 0.1:
                neg_correct += 1
            neg_count += 1
        
# with > 0.1 and <= 0.1, keeping only subjectivity < 0.9

print('Positive accuracy = {}% via {} samples'.format(pos_correct/pos_count*100.0, pos_count))
print('Negative accuracy = {}% via {} samples'.format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 59.263092527427034% via 4831 samples
Negative accuracy = 68.27292974286293% via 4939 samples
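
Each cutoff tried above trades positive accuracy against negative accuracy. Rather than editing the cell by hand for every value, a sweep over candidate cutoffs makes the trade-off visible in one pass. This is a minimal sketch, assuming the polarity scores have already been collected into two lists; the function name `sweep_cutoffs` and the toy numbers are illustrative and are not results from the dataset.

```python
def sweep_cutoffs(pos_scores, neg_scores, cutoffs):
    """For each cutoff, report accuracy on both labelled score lists.

    A positive-file score counts as correct when it is above the cutoff;
    a negative-file score counts as correct when it is at or below it.
    """
    rows = []
    for cutoff in cutoffs:
        pos_acc = 100.0 * sum(s > cutoff for s in pos_scores) / len(pos_scores)
        neg_acc = 100.0 * sum(s <= cutoff for s in neg_scores) / len(neg_scores)
        rows.append((cutoff, pos_acc, neg_acc))
    return rows


# Toy polarity values, made up for illustration only.
pos_scores = [0.6, 0.3, 0.05, 0.0]
neg_scores = [-0.5, 0.0, 0.05, 0.2]

rows = sweep_cutoffs(pos_scores, neg_scores, [0.0, 0.1, 0.2])
for cutoff, pos_acc, neg_acc in rows:
    print('cutoff {:.1f}: positive {:.1f}%, negative {:.1f}%'.format(cutoff, pos_acc, neg_acc))
```

Raising the cutoff can only lower positive accuracy and raise negative accuracy, so the sweep shows where the two roughly balance.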


### Let us try VADER Sentiment and see if it's better than TextBlob

In [35]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [36]:
analyzer = SentimentIntensityAnalyzer()
vs = analyzer.polarity_scores('VADER Sentiment looks interesting, I have high hopes!')
print(vs)

{'neg': 0.0, 'neu': 0.463, 'pos': 0.537, 'compound': 0.6996}


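The VADER documentation suggests cutoffs on the compound score. The values below are the ones this walkthrough uses; current versions of the VADER README list 0.05 and -0.05 instead: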
- positive sentiment: compound score >= 0.5
- neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
- negative sentiment: compound score <= -0.5
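
The three rules above map directly onto a small function. This is a sketch only: the name `label_from_compound` is hypothetical, and the default `threshold` of 0.5 simply mirrors the list above, so it can be changed if a different cutoff is preferred.

```python
def label_from_compound(compound, threshold=0.5):
    """Map a VADER compound score to 'positive', 'neutral' or 'negative'."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# 0.6996 is the compound score of the example sentence scored above.
print(label_from_compound(0.6996))
print(label_from_compound(0.2))
print(label_from_compound(-0.5))
```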

In [39]:
# to properly compare, we should just start with 0.

pos_count = 0
pos_correct = 0

with open('positive.txt', 'r') as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] > 0:
            pos_correct += 1
        pos_count += 1


neg_count = 0
neg_correct = 0

with open('negative.txt', 'r') as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] <= 0:
            neg_correct += 1
        neg_count += 1

print('Positive accuracy = {}% via {} samples'.format(pos_correct/pos_count*100.0, pos_count))
print('Negative accuracy = {}% via {} samples'.format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 69.44288126055149% via 5331 samples
Negative accuracy = 57.75651847683362% via 5331 samples


Okay, now let's go with the 0.5 and -0.5 as suggested by the documentation:

In [41]:
pos_count = 0
pos_correct = 0

threshold = 0.5

with open('positive.txt', 'r') as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open('negative.txt', 'r') as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] <= -threshold:
                neg_correct += 1
            neg_count += 1

print('Positive accuracy = {}% via {} samples'.format(pos_correct/pos_count*100.0, pos_count))
print('Negative accuracy = {}% via {} samples'.format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 87.22179585571757% via 2606 samples
Negative accuracy = 50.0% via 1818 samples


We threw out a lot of samples here, and we aren't doing much better than TextBlob. Should we give up? Maybe, but what if we instead look for cases with no conflict, where the opposite signal is low or absent? For example, to classify something as positive, why not require the `neg` score to be at most 0.1:

In [42]:
import time

In [43]:
pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['neg'] <= 0.1:
            if vs['pos'] - vs['neg'] > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['pos'] <= 0.1:
            if vs['pos'] - vs['neg'] <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 80.71428571428572% via 3920 samples
Negative accuracy = 91.73643975245722% via 2747 samples
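
Note that the cell above picks its filter from the file's label: it checks `neg` for the positive file and `pos` for the negative file. On unlabeled text that choice is not available. Below is a hedged sketch of a label-independent variant of the same "no conflict" idea. The function name `no_conflict_label` and the `max_opposite` parameter are assumptions for illustration. It returns a label only when the opposing score is small, and `None` otherwise.

```python
def no_conflict_label(vs, max_opposite=0.1):
    """Classify a VADER-style score dict, abstaining when signals conflict.

    vs is expected to have 'pos' and 'neg' keys, as returned by
    SentimentIntensityAnalyzer.polarity_scores().
    """
    diff = vs['pos'] - vs['neg']
    if diff > 0 and vs['neg'] <= max_opposite:
        return 'positive'
    if diff < 0 and vs['pos'] <= max_opposite:
        return 'negative'
    return None  # mixed or neutral: no confident call


# Score dict copied from the VADER example sentence earlier in this section.
print(no_conflict_label({'neg': 0.0, 'neu': 0.463, 'pos': 0.537, 'compound': 0.6996}))
# Hand-made dicts that exercise the other two branches.
print(no_conflict_label({'neg': 0.4, 'neu': 0.6, 'pos': 0.0, 'compound': -0.6}))
print(no_conflict_label({'neg': 0.3, 'neu': 0.4, 'pos': 0.3, 'compound': 0.0}))
```

Accuracy would then be measured only on lines where the function returns a label, with the share of `None` results reported alongside it.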


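Recall the suggestion about -0.5 to 0.5 being "neutral" with VADER? What if we tried the same cutoffs with TextBlob? One caveat when reading the result: the cell keeps only positive-file lines with polarity >= 0.5 and negative-file lines with polarity <= -0.5, so the inner checks always pass. The 100% figures follow from the filter itself, and the sample counts are the informative part.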

In [44]:
pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)

        if analysis.sentiment.polarity >= 0.5:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count +=1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity <= -0.5:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count +=1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))

Positive accuracy = 100.0% via 766 samples
Negative accuracy = 100.0% via 282 samples
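
For a check that can actually fail, the filter has to ignore which file a line came from. One way to sketch it: keep any line whose absolute polarity is at least 0.5, then test whether the sign matches the label, and report how many samples survived. The function name `confident_accuracy` and the toy pairs are invented for illustration and are not results from the dataset.

```python
def confident_accuracy(scored, threshold=0.5):
    """Evaluate only confident predictions.

    scored is a list of (polarity, label) pairs, with label 'pos' or 'neg'.
    Returns (percent_correct, samples_kept, coverage_percent).
    """
    # The filter looks at the score only, never at the label.
    kept = [(p, label) for p, label in scored if abs(p) >= threshold]
    correct = sum((p > 0) == (label == 'pos') for p, label in kept)
    percent = 100.0 * correct / len(kept) if kept else 0.0
    coverage = 100.0 * len(kept) / len(scored) if scored else 0.0
    return percent, len(kept), coverage


# Toy (polarity, label) pairs, invented for illustration.
scored = [(0.8, 'pos'), (0.6, 'neg'), (-0.7, 'neg'), (0.1, 'pos'), (-0.55, 'pos')]
percent, kept, coverage = confident_accuracy(scored)
print('Accuracy = {:.1f}% via {} samples ({:.0f}% coverage)'.format(percent, kept, coverage))
```

Reporting coverage next to accuracy makes it clear how much data a stricter threshold discards.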


### Part 2: Sentiment Analysis GUI with Dash and Python