# Sentiment Analysis

**Introduction**
 
Sentiment analysis is the process of identifying the sentiment (the opinion or emotional tone) expressed in a piece of text.

In AI and data science, sentiment is typically modelled with either two or three classes: positive and negative in the two-class case, or positive, neutral and negative in the three-class case.
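Many sentiment tools, TextBlob included, output a continuous score rather than a class, so a decision rule is needed to obtain two or three classes. The sketch below shows one such rule; the function name `score_to_label` and the neutral band of ±0.05 are hypothetical choices for illustration, not a standard.

```python
def score_to_label(score: float, n_classes: int = 2, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment class.

    Hypothetical helper: the neutral_band width is an illustrative choice.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if n_classes == 2:
        # Two classes: anything strictly above zero counts as positive.
        return "positive" if score > 0 else "negative"
    if n_classes == 3:
        # Three classes: scores close to zero are treated as neutral.
        if score > neutral_band:
            return "positive"
        if score < -neutral_band:
            return "negative"
        return "neutral"
    raise ValueError("n_classes must be 2 or 3")


print(score_to_label(0.5))                # positive
print(score_to_label(0.02, n_classes=3))  # neutral
print(score_to_label(-0.3, n_classes=3))  # negative
```

Widening the neutral band moves more weakly opinionated texts into the neutral class.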


**TextBlob**

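TextBlob returns the polarity and subjectivity of a text. Polarity is a float in the range [-1, 1], where -1 indicates a strongly negative sentiment and 1 a strongly positive one; subjectivity is a float in the range [0, 1], where 0 is very objective and 1 is very subjective. Negation words such as "not" reverse the polarity of the words they modify.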
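TextBlob's default analyzer is lexicon-based: words carry prior polarity scores that are combined into a score for the text, and a preceding negation changes the sign. The toy sketch below illustrates only that idea; the tiny `LEXICON`, the `NEGATIONS` set and the plain averaging are hypothetical simplifications, not TextBlob's actual implementation (which applies further rules, such as for intensifiers).

```python
# Toy lexicon with hypothetical prior polarities in [-1, 1].
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7, "terrible": -1.0}
NEGATIONS = {"not", "no", "never"}


def toy_polarity(text: str) -> float:
    """Average the polarity of known words, flipping the sign after a negation."""
    scores = []
    negate = False
    for word in text.lower().split():
        if word in NEGATIONS:
            negate = True  # applies to the next sentiment-bearing word
            continue
        if word in LEXICON:
            score = LEXICON[word]
            scores.append(-score if negate else score)
            negate = False
    # No sentiment words at all: treat the text as neutral.
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("a good movie"))      # 0.7
print(toy_polarity("not a good movie"))  # -0.7
```

Because each word score lies in [-1, 1], their average stays in [-1, 1], matching the polarity range described above.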

Documentation - https://textblob.readthedocs.io/en/dev/

In [1]:
from textblob import TextBlob

In [8]:
import pandas as pd

df_train = pd.read_csv('../Datasets/imdb_review/train_data.csv')
df_train.head(5)

| Unnamed: 0 | ID | SentimentText | Sentiment |
|---|---|---|---|
| 0 | 0 | first think another disney movie might good it... | 1 |
| 1 | 1 | put aside dr house repeat missed desperate hou... | 0 |
| 2 | 2 | big fan stephen king s work film made even gre... | 1 |
| 3 | 3 | watched horrid thing tv needless say one movie... | 0 |
| 4 | 4 | truly enjoyed film acting terrific plot jeff c... | 1 |
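An `Unnamed: 0` column usually means the CSV was written together with its DataFrame index and then read back without telling pandas which column holds that index. A minimal sketch, using a hypothetical two-row frame and an in-memory buffer in place of the real file, shows that passing `index_col=0` to `pd.read_csv` restores the index instead of creating an extra data column.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the real training data.
df = pd.DataFrame(
    {"ID": [0, 1], "SentimentText": ["great film", "dull plot"], "Sentiment": [1, 0]}
)

buffer = io.StringIO()
df.to_csv(buffer)  # index=True by default: the index becomes an unnamed first column

buffer.seek(0)
without_index = pd.read_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

print(list(without_index.columns))  # ['Unnamed: 0', 'ID', 'SentimentText', 'Sentiment']
print(list(with_index.columns))     # ['ID', 'SentimentText', 'Sentiment']
```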


In [3]:
sample_text = df_train['SentimentText'][0]
sample_text

'first think another disney movie might good it s kids movie watch it can not help enjoy it ages love movie first saw movie years later still love it danny glover superb could play part better christopher lloyd hilarious perfect part tony danza believable mel clark can not help enjoy movie give'

In [4]:
blob = TextBlob(sample_text)
blob.tags

[('first', 'RB'),
 ('think', 'VB'),
 ('another', 'DT'),
 ('disney', 'NN'),
 ('movie', 'NN'),
 ('might', 'MD'),
 ('good', 'VB'),
 ('it', 'PRP'),
 ('s', 'JJ'),
 ('kids', 'NNS'),
 ('movie', 'NN'),
 ('watch', 'IN'),
 ('it', 'PRP'),
 ('can', 'MD'),
 ('not', 'RB'),
 ('help', 'VB'),
 ('enjoy', 'VB'),
 ('it', 'PRP'),
 ('ages', 'VBZ'),
 ('love', 'VB'),
 ('movie', 'NN'),
 ('first', 'RB'),
 ('saw', 'VBD'),
 ('movie', 'NN'),
 ('years', 'NNS'),
 ('later', 'RB'),
 ('still', 'RB'),
 ('love', 'VB'),
 ('it', 'PRP'),
 ('danny', 'JJ'),
 ('glover', 'NN'),
 ('superb', 'NN'),
 ('could', 'MD'),
 ('play', 'VB'),
 ('part', 'NN'),
 ('better', 'RBR'),
 ('christopher', 'NN'),
 ('lloyd', 'RB'),
 ('hilarious', 'JJ'),
 ('perfect', 'JJ'),
 ('part', 'NN'),
 ('tony', 'NN'),
 ('danza', 'NN'),
 ('believable', 'JJ'),
 ('mel', 'NN'),
 ('clark', 'NN'),
 ('can', 'MD'),
 ('not', 'RB'),
 ('help', 'VB'),
 ('enjoy', 'VB'),
 ('movie', 'NN'),
 ('give', 'VB')]

In [5]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)

0.5


In [6]:
def get_sentiment(x, threshold):
    """Return 1 (positive) if TextBlob's polarity for text x exceeds threshold, else 0 (negative)."""
    polarity = TextBlob(x).sentiment.polarity
    return 1 if polarity > threshold else 0
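`get_sentiment` turns the continuous polarity into a binary label, so the threshold is a tunable parameter; the cells below try 0 and 0.1. A minimal sketch of choosing it more systematically is a small sweep that keeps the candidate with the highest accuracy. The helpers `accuracy` and `best_threshold` and the toy scores are hypothetical; in practice the threshold should be chosen on a held-out split rather than on the data used for the final evaluation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_threshold(polarities, labels, candidates):
    """Return (threshold, accuracy) for the best-scoring candidate.

    Uses the same rule as get_sentiment: predict 1 when polarity > threshold.
    """
    results = []
    for t in candidates:
        preds = [1 if p > t else 0 for p in polarities]
        results.append((t, accuracy(labels, preds)))
    # max returns the first candidate among ties
    return max(results, key=lambda r: r[1])


# Toy, made-up polarity scores and gold labels for illustration only.
polarities = [0.50, 0.05, 0.12, -0.20, 0.08, 0.30]
labels = [1, 0, 1, 0, 0, 1]
print(best_threshold(polarities, labels, [0.0, 0.1, 0.2]))  # (0.1, 1.0)
```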

In [None]:
df_train['text_blob_sentiment_0'] = df_train['SentimentText'].apply(lambda x: get_sentiment(x, 0))
df_train['text_blob_sentiment_0.1'] = df_train['SentimentText'].apply(lambda x: get_sentiment(x, 0.1))

In [None]:
df_train

In [None]:
df_train['Sentiment'].value_counts()

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(df_train['Sentiment'], df_train['text_blob_sentiment_0'])

In [None]:
accuracy_score(df_train['Sentiment'], df_train['text_blob_sentiment_0.1'])
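Accuracy figures like these are only meaningful relative to a baseline. Since `value_counts()` shows the class distribution, a simple sanity check is the accuracy of always predicting the most frequent class. The helper `majority_baseline` and the toy labels below are hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Toy, made-up labels: 6 positive and 4 negative reviews.
labels = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
print(majority_baseline(labels))  # 0.6
```

With the real data, the same number is `df_train['Sentiment'].value_counts(normalize=True).max()`; a classifier that does not beat it has learned little beyond the class balance.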

## Sentiment Analysis using Transformers

In [9]:
from transformers import pipeline



In [10]:
nlp = pipeline('sentiment-analysis')

In [11]:
nlp('This was an absolutely terrible movie.')

[{'label': 'NEGATIVE', 'score': 0.999608039855957}]
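The pipeline returns one dictionary per input, holding a `label` string and a confidence `score`. A small sketch, assuming outputs shaped like the one above, converts them to the 0/1 encoding used in the `Sentiment` column. The helper name `to_binary`, its optional `min_score` cut-off (an "abstain" option that is not part of the pipeline itself) and the second example dictionary are hypothetical.

```python
def to_binary(outputs, min_score=None):
    """Map pipeline outputs like [{'label': 'POSITIVE', 'score': 0.99}] to 1/0.

    If min_score is given, predictions below that confidence become None.
    """
    labels = []
    for out in outputs:
        if min_score is not None and out["score"] < min_score:
            labels.append(None)  # low-confidence prediction: abstain
        else:
            labels.append(1 if out["label"] == "POSITIVE" else 0)
    return labels


example = [
    {"label": "NEGATIVE", "score": 0.999608039855957},  # output shown above
    {"label": "POSITIVE", "score": 0.62},               # made-up output
]
print(to_binary(example))                 # [0, 1]
print(to_binary(example, min_score=0.9))  # [0, None]
```

Because passing a list of texts to `nlp` returns one such dictionary per text, a batch of reviews could be converted with `to_binary(nlp(texts, truncation=True))`, where `texts` is a hypothetical list of strings.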

In [12]:
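def get_hf_sentiment(x):
    """Return 1 if the pipeline labels text x POSITIVE, else 0."""
    # Long reviews can exceed the model's maximum input length, which is measured
    # in tokens rather than characters; truncation=True lets the tokenizer clip them.
    label = nlp(x, truncation=True)[0]['label']
    return 1 if label == 'POSITIVE' else 0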

In [None]:
df_train['hf_sentiment'] = df_train['SentimentText'].apply(get_hf_sentiment)

In [None]:
accuracy_score(df_train['Sentiment'], df_train['hf_sentiment'])