# SENTIMENT ANALYSIS with TextBlob

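Let's define the subjectivity score as the share of emotional (positive or negative) words in a text. This is a simplified working definition; TextBlob's own scores come from a weighted lexicon, so its numbers will generally differ from these formulas.
- subjectivity = (# of pos/neg words) / (# of words)
- If the text contains no emotional words, then subj = 0
- If all words are emotional, then subj = 1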
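
The simplified definition above can be sketched in a few lines of plain Python. The word sets `POSITIVE` and `NEGATIVE` and the function name `simple_subjectivity` are hypothetical, made up for illustration only; this mirrors the formula in this notebook and is not how TextBlob computes its score.

```python
# Hypothetical mini-lexicons, for illustration only
POSITIVE = {"great", "interesting", "good"}
NEGATIVE = {"miserable", "bad", "boring"}

def simple_subjectivity(text):
    """Share of words that are emotional (positive or negative)."""
    words = text.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    n_emotional = sum(w in POSITIVE or w in NEGATIVE for w in words)
    return n_emotional / len(words)

print(simple_subjectivity("this is a great course"))  # 1 of 5 words is emotional
print(simple_subjectivity("the sky is above"))        # no emotional words
print(simple_subjectivity("great good"))              # all words are emotional
```

The three calls cover the general case and the two boundary cases (subj = 0 and subj = 1) listed above.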

Then let's define the polarity score as follows:
- polarity = (# of pos words - # of neg words) / (# of words)
- If all words are positive, then pol = 1
- If all words are negative, then pol = -1
- If # of pos words == # of neg words, then pol = 0
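
As a minimal sketch of the polarity formula, assuming the same kind of tiny hand-made word sets (`POSITIVE`, `NEGATIVE`, and `simple_polarity` are hypothetical names, not part of TextBlob):

```python
# Hypothetical mini-lexicons, for illustration only
POSITIVE = {"great", "interesting", "good"}
NEGATIVE = {"miserable", "bad", "boring"}

def simple_polarity(text):
    """(# positive words - # negative words) / (# words)."""
    words = text.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    n_pos = sum(w in POSITIVE for w in words)
    n_neg = sum(w in NEGATIVE for w in words)
    return (n_pos - n_neg) / len(words)

print(simple_polarity("great good"))        # all words positive
print(simple_polarity("bad boring"))        # all words negative
print(simple_polarity("great but boring"))  # one positive, one negative
```

The three calls correspond to the three cases listed above: pol = 1, pol = -1, and pol = 0.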

Web resources:
- https://textblob.readthedocs.io/en/dev/
- http://textblob.readthedocs.io/en/dev/quickstart.html

## (1) Install textblob

In [None]:
!pip install textblob

## (2) Create an instance object of TextBlob

In [None]:
from textblob import TextBlob

In [None]:
s = 'DSA8640 is challenging but interesting lol'

# Create an instance object of TextBlob with a string variable s
tb = TextBlob(s)

## (3) Let's start Sentiment Analysis with the TextBlob instance object!

> **< instance object of TextBlob >.sentiment** performs sentiment analysis on the instance object and returns the outcome

In [None]:
# Perform sentiment analysis on the instance object and display the outcome
tb.sentiment

> **< instance object of TextBlob >.sentiment.polarity** returns the polarity of the text.

In [None]:
tb.sentiment.polarity

> **< instance object of TextBlob >.sentiment.subjectivity** returns the subjectivity of the text.

In [None]:
tb.sentiment.subjectivity

> **< instance object of TextBlob >.sentences** returns the text of the instance object split into a list of sentences

In [None]:
tb.sentences

## (4) Practice #1: Sentiment Analysis

In [None]:
s_pos = 'this is a great course!!!'
tb_pos = TextBlob(s_pos)
print(tb_pos.sentiment)

In [None]:
tb = TextBlob('Covid-19 makes my life miserable!!!')
print(tb.sentiment)

## (5) Practice #2: Now let's try to do sentiment analysis on real text!

In [None]:
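# Read the file line by line and close it automatically when done.
# Note: each element is one line of the file, not necessarily a full sentence.
with open('frankenstein.txt') as infile:
    sentences = infile.readlines()
len(sentences)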

In [None]:
sub_list = []
pol_list = []
for s in sentences:
    tb = TextBlob(s)
    sub_list.append(tb.sentiment.subjectivity)
    pol_list.append(tb.sentiment.polarity)
print(sub_list)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.hist(sub_list, bins=10)
plt.xlabel('subjectivity score')
plt.ylabel('sentence count')
plt.grid(True)
plt.savefig('subjectivity.pdf')
plt.show()

In [None]:
plt.hist(pol_list, bins=10)
plt.xlabel('polarity score')
plt.ylabel('sentence count')
plt.grid(True)
plt.savefig('polarity.pdf')
plt.show()
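
A histogram shows the shape of the distribution; a few summary numbers can complement it. The sketch below uses Python's standard `statistics` module on a short made-up list of scores. `example_pol` is a hypothetical stand-in for `pol_list`, and `summarize` is a helper name introduced here for illustration.

```python
from statistics import mean, median

# Made-up polarity scores standing in for pol_list
example_pol = [0.5, -0.25, 0.0, 0.0, 0.75]

def summarize(scores):
    """Return the mean, the median, and the share of exactly-zero scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "share_neutral": sum(s == 0 for s in scores) / len(scores),
    }

print(summarize(example_pol))
```

Calling `summarize(pol_list)` or `summarize(sub_list)` after the loop above would give the same three numbers for the real text.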

## (6) Summary

In [None]:
from textblob import TextBlob
import matplotlib.pyplot as plt

# Read the file line by line and close it automatically when done
with open('frankenstein.txt') as infile:
    sentences = infile.readlines()

sub_list = []
pol_list = []

for s in sentences:
    tb = TextBlob(s)
    sub_list.append(tb.sentiment.subjectivity)
    pol_list.append(tb.sentiment.polarity)

plt.hist(sub_list, bins=10)
plt.xlabel('subjectivity score')
plt.ylabel('sentence count')
plt.grid(True)
plt.savefig('subjectivity.pdf')
plt.show()

plt.hist(pol_list, bins=10)
plt.xlabel('polarity score')
plt.ylabel('sentence count')
plt.grid(True)
plt.savefig('polarity.pdf')
plt.show()

## (7) Practice #3: tweet_stream_COVID_1000.json

In [None]:
import json
from textblob import TextBlob
import matplotlib.pyplot as plt
%matplotlib inline

infile = open('tweet_stream_COVID_1000.json')
data = json.load(infile)
infile.close()

Tweets = []

for t in data:
    Tweets.append(t['text'])

sub_list = []
pol_list = []

for t in Tweets:
    tb = TextBlob(t)
    sub_list.append(tb.sentiment.subjectivity)
    pol_list.append(tb.sentiment.polarity)
    
plt.hist(sub_list, bins=10)
plt.xlabel('subjectivity score')
plt.ylabel('tweet count')
plt.grid(True)
plt.savefig('subjectivity.pdf')
plt.show()

plt.hist(pol_list, bins=10)
plt.xlabel('polarity score')
plt.ylabel('tweet count')
plt.grid(True)
plt.savefig('polarity.pdf')
plt.show()

## (8) Practice #4: removing objective tweets

In [None]:
import json
from textblob import TextBlob
import matplotlib.pyplot as plt
%matplotlib inline

infile = open('tweet_stream_COVID_1000.json')
data = json.load(infile)
infile.close()

Tweets = []

for t in data:
    Tweets.append(t['text'])

sub_list = []
pol_list = []

for t in Tweets:
    tb = TextBlob(t)
    if tb.sentiment.subjectivity != 0:
        sub_list.append(tb.sentiment.subjectivity)
        pol_list.append(tb.sentiment.polarity)

plt.hist(sub_list, bins=10)
plt.xlabel('subjectivity score')
plt.ylabel('tweet count')
plt.grid(True)
plt.savefig('subjectivity_2.pdf')
plt.show()

plt.hist(pol_list, bins=10)
plt.xlabel('polarity score')
plt.ylabel('tweet count')
plt.grid(True)
plt.savefig('polarity_2.pdf')
plt.show()
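
The filtering step can also be factored into a small helper that works on precomputed scores, which makes it easy to check without loading the tweet file. This is only a sketch: the function name `drop_objective` and the sample scores are hypothetical.

```python
def drop_objective(scores):
    """Keep only (subjectivity, polarity) pairs with nonzero subjectivity."""
    return [(sub, pol) for sub, pol in scores if sub != 0]

# Made-up (subjectivity, polarity) pairs; two of them are fully objective
sample = [(0.0, 0.0), (0.6, 0.4), (1.0, -1.0), (0.0, 0.0)]

kept = drop_objective(sample)
sub_kept = [sub for sub, _ in kept]
pol_kept = [pol for _, pol in kept]

print(sub_kept)  # [0.6, 1.0]
print(pol_kept)  # [0.4, -1.0]
```

Keeping the two scores paired until after filtering ensures the subjectivity and polarity lists stay aligned tweet by tweet.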