# Unsupervised Lexicon-Based Models

## Import necessary dependencies

In [2]:
#Before we start our analysis, let’s load the necessary dependencies and configuration settings

import pandas as pd
import numpy as np
import text_normalizer as tn
import model_evaluation_utils as meu

np.set_printoptions(precision=2, linewidth=80)

## Load and normalize data

In [3]:
#Now, we can load our IMDb review dataset, subset out the last 15,000 reviews which will be used for our
#analysis, and normalize them using the following snippet

# forward slashes work on Windows, Linux, and macOS alike
dataset = pd.read_csv('data/movie_reviews.csv')

reviews = np.array(dataset['review'])
sentiments = np.array(dataset['sentiment'])

# extract data for model evaluation
test_reviews = reviews[35000:]
test_sentiments = sentiments[35000:]
sample_review_ids = [7626, 3533, 13010]

# normalize dataset
norm_test_reviews = tn.normalize_corpus(test_reviews)

## Sentiment Analysis with AFINN

In [5]:
#The AFINN lexicon is one of the simplest and most popular lexicons used for sentiment analysis.
#The current version of the lexicon is AFINN-en-165.txt, and it contains over 3,300 words with a
#polarity score associated with each word. You can find this lexicon at the author’s official GitHub
#repository, along with previous versions including AFINN-111, at
#https://github.com/fnielsen/afinn/blob/master/afinn/data/. The author has also created a Python
#wrapper library on top of this called afinn, which we will use for our analysis. You can import the
#library and instantiate an object using the following code

from afinn import Afinn

afn = Afinn(emoticons=True)

## Predict sentiment for sample reviews

In [6]:
#We can now use this object to compute the polarity of our three chosen sample reviews
for review, sentiment in zip(test_reviews[sample_review_ids], test_sentiments[sample_review_ids]):
    print('REVIEW:', review)
    print('Actual Sentiment:', sentiment)
    print('Predicted Sentiment polarity:', afn.score(review))
    print('-'*60)

REVIEW: no comment - stupid movie, acting average or worse... screenplay - no sense at all... SKIP IT!
Actual Sentiment: negative
Predicted Sentiment polarity: -7.0
------------------------------------------------------------
REVIEW: I don't care if some people voted this movie to be bad. If you want the Truth this is a Very Good Movie! It has every thing a movie should have. You really should Get this one.
Actual Sentiment: positive
Predicted Sentiment polarity: 3.0
------------------------------------------------------------
REVIEW: Worst horror film ever but funniest film ever rolled in one you have got to see this film it is so cheap it is unbeliaveble but you have to see it really!!!! P.s watch the carrot
Actual Sentiment: positive
Predicted Sentiment polarity: -3.0
------------------------------------------------------------


## Predict sentiment for test dataset

In [7]:
#We can compare the actual sentiment label for each review and also check out the predicted sentiment
#polarity score. A negative polarity typically denotes negative sentiment. To predict sentiment on our
#complete test dataset of 15,000 reviews (I used the raw text documents because AFINN takes into account
#other aspects like emoticons and exclamations), we can now use the following snippet. I used a threshold
#of >= 1.0 to label the overall sentiment as positive, and negative otherwise. You can choose your own
#threshold based on an analysis of your own corpora in the future.

sentiment_polarity = [afn.score(review) for review in test_reviews]
predicted_sentiments = ['positive' if score >= 1.0 else 'negative' for score in sentiment_polarity]
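
The text suggests choosing a threshold by analyzing your own corpus. One simple way to do that, sketched here with invented polarity scores and labels rather than the IMDb data, is to sweep a few candidate thresholds over a labeled sample and keep the one with the highest accuracy. The helper names `label_scores` and `best_threshold` are hypothetical and not part of afinn.

```python
def label_scores(scores, threshold=1.0):
    """Map polarity scores to labels using the same >= rule as above."""
    return ['positive' if s >= threshold else 'negative' for s in scores]


def best_threshold(scores, true_labels, candidates):
    """Return (threshold, accuracy) for the candidate with the highest accuracy."""
    best = None
    for t in candidates:
        preds = label_scores(scores, t)
        acc = sum(p == y for p, y in zip(preds, true_labels)) / len(true_labels)
        # keep the first candidate that strictly improves accuracy
        if best is None or acc > best[1]:
            best = (t, acc)
    return best


# toy data, invented for illustration only
toy_scores = [-7.0, 3.0, -3.0, 12.0, 2.0, 0.0]
toy_labels = ['negative', 'positive', 'positive', 'positive', 'negative', 'negative']

print(label_scores(toy_scores))
print(best_threshold(toy_scores, toy_labels, candidates=[-4.0, 0.0, 1.0, 3.0]))
```

On a real corpus the sweep should run on a held-out labeled sample, so the chosen threshold is not tuned on the same reviews used to report performance.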

## Evaluate model performance

In [8]:
#Now that we have our predicted sentiment labels, we can evaluate our model performance based on
#standard performance metrics using our utility function.
meu.display_model_performance_metrics(true_labels=test_sentiments, predicted_labels=predicted_sentiments, 
                                  classes=['positive', 'negative'])
#We get an overall F1-Score of 71%, which is quite decent considering it’s an unsupervised model.
#Looking at the confusion matrix we can clearly see that quite a number of negative sentiment based
#reviews have been misclassified as positive (3,189) and this leads to the lower recall of 57% for the negative
#sentiment class. Performance for positive class is better with regard to recall or hit-rate, where we correctly
#predicted 6,376 out of 7,510 positive reviews, but precision is 67% because of the many wrong positive
#predictions made in case of negative sentiment reviews.

Model Performance metrics:
------------------------------
Accuracy: 0.7118
Precision: 0.7289
Recall: 0.7118
F1 Score: 0.7062

Model Classification report:
------------------------------
             precision    recall  f1-score   support

   positive       0.67      0.85      0.75      7510
   negative       0.79      0.57      0.67      7490

avg / total       0.73      0.71      0.71     15000


Prediction Confusion Matrix:
------------------------------
                 Predicted:         
                   positive negative
Actual: positive       6376     1134
        negative       3189     4301
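
The per-class figures in the report follow directly from the confusion matrix. As a quick check, the sketch below recomputes them from the four counts shown above; `per_class_metrics` is a hypothetical helper, not a function from `model_evaluation_utils`.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class from its raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# counts copied from the confusion matrix above (rows = actual, columns = predicted)
pos_as_pos, pos_as_neg = 6376, 1134
neg_as_pos, neg_as_neg = 3189, 4301

total = pos_as_pos + pos_as_neg + neg_as_pos + neg_as_neg
accuracy = (pos_as_pos + neg_as_neg) / total

# for the positive class, negatives predicted as positive are the false positives
pos_metrics = per_class_metrics(tp=pos_as_pos, fp=neg_as_pos, fn=pos_as_neg)
# for the negative class, the roles of the off-diagonal cells swap
neg_metrics = per_class_metrics(tp=neg_as_neg, fp=pos_as_neg, fn=neg_as_pos)

print('accuracy:', round(accuracy, 4))
print('positive (precision, recall, f1):', [round(v, 2) for v in pos_metrics])
print('negative (precision, recall, f1):', [round(v, 2) for v in neg_metrics])
```

The 3,189 negative reviews predicted as positive appear twice: as false positives they pull down positive precision, and as false negatives they pull down negative recall.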


## Sentiment Analysis with SentiWordNet

In [11]:
#The WordNet corpus is definitely one of the most popular corpora for the English language used extensively
#in natural language processing and semantic analysis. WordNet gave us the concept of synsets or synonym
#sets. The SentiWordNet lexicon is based on WordNet synsets and can be used for sentiment analysis and
#opinion mining. The SentiWordNet lexicon typically assigns three sentiment scores for each WordNet synset.
#These include a positive polarity score, a negative polarity score and an objectivity score. Further details
#are available on the official web site http://sentiwordnet.isti.cnr.it, including research papers and
#download links for the lexicon. We will be using the nltk library, which provides a Pythonic interface into
#SentiWordNet. Consider we have the adjective awesome. We can get the sentiment scores associated with
#the synset for this word using the following snippet.

from nltk.corpus import sentiwordnet as swn

awesome = list(swn.senti_synsets('awesome', 'a'))[0]
print('Positive Polarity Score:', awesome.pos_score())
print('Negative Polarity Score:', awesome.neg_score())
print('Objective Score:', awesome.obj_score())

Positive Polarity Score: 0.875
Negative Polarity Score: 0.125
Objective Score: 0.0
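
In SentiWordNet the three scores of a synset sum to 1, so the objectivity score is whatever remains after the positive and negative scores are subtracted. The snippet below is a minimal sketch of that relationship using the values printed above; `objectivity` is a hypothetical helper, not part of nltk.

```python
def objectivity(pos_score, neg_score):
    """Objectivity implied by a synset's positive and negative scores."""
    return 1.0 - (pos_score + neg_score)


# values printed above for the first adjective synset of 'awesome'
print('Objective Score:', objectivity(0.875, 0.125))
# a synset with no polarity at all is fully objective
print('Objective Score:', objectivity(0.0, 0.0))
```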


## Build model

In [12]:
#Let’s now build a generic function to extract and aggregate sentiment scores for a complete textual
#document based on matched synsets in that document

#Our function basically takes in a movie review, tags each word with its corresponding POS tag, extracts
#out sentiment scores for any matched synset token based on its POS tag, and finally aggregates the scores.

def analyze_sentiment_sentiwordnet_lexicon(review,
                                           verbose=False):

    # tokenize and POS tag text tokens
    tagged_text = [(token.text, token.tag_) for token in tn.nlp(review)]
    pos_score = neg_score = token_count = obj_score = 0
    # get wordnet synsets based on POS tags
    # get sentiment scores if synsets are found
    for word, tag in tagged_text:
        ss_set = None
        if 'NN' in tag and list(swn.senti_synsets(word, 'n')):
            ss_set = list(swn.senti_synsets(word, 'n'))[0]
        elif 'VB' in tag and list(swn.senti_synsets(word, 'v')):
            ss_set = list(swn.senti_synsets(word, 'v'))[0]
        elif 'JJ' in tag and list(swn.senti_synsets(word, 'a')):
            ss_set = list(swn.senti_synsets(word, 'a'))[0]
        elif 'RB' in tag and list(swn.senti_synsets(word, 'r')):
            ss_set = list(swn.senti_synsets(word, 'r'))[0]
        # if senti-synset is found        
        if ss_set:
            # add scores for all found synsets
            pos_score += ss_set.pos_score()
            neg_score += ss_set.neg_score()
            obj_score += ss_set.obj_score()
            token_count += 1
    
    # aggregate final scores
    # guard against division by zero when no token matched a synset
    token_count = max(token_count, 1)
    final_score = pos_score - neg_score
    norm_final_score = round(float(final_score) / token_count, 2)
    final_sentiment = 'positive' if norm_final_score >= 0 else 'negative'
    if verbose:
        norm_obj_score = round(float(obj_score) / token_count, 2)
        norm_pos_score = round(float(pos_score) / token_count, 2)
        norm_neg_score = round(float(neg_score) / token_count, 2)
        # to display results in a nice table
        sentiment_frame = pd.DataFrame([[final_sentiment, norm_obj_score, norm_pos_score, 
                                         norm_neg_score, norm_final_score]],
                                       # from_product avoids the `labels` argument, which newer pandas no longer accepts
                                       columns=pd.MultiIndex.from_product([['SENTIMENT STATS:'],
                                                             ['Predicted Sentiment', 'Objectivity',
                                                              'Positive', 'Negative', 'Overall']]))
        print(sentiment_frame)
        
    return final_sentiment
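
To make the aggregation logic easier to follow without spaCy or the SentiWordNet corpus, here is a stripped-down sketch of the same steps: map each POS tag to a WordNet category, look up scores, sum them, and normalize by the number of matched tokens. The mini-lexicon and its scores are invented for illustration, and `penn_to_wordnet` and `toy_sentiment` are hypothetical names.

```python
# hypothetical mini-lexicon: (word, WordNet POS) -> (positive, negative) scores
TOY_LEXICON = {
    ('good', 'a'): (0.5, 0.0),
    ('stupid', 'a'): (0.0, 0.5),
    ('movie', 'n'): (0.0, 0.0),
}


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped."""
    for prefix, wn_pos in (('NN', 'n'), ('VB', 'v'), ('JJ', 'a'), ('RB', 'r')):
        if prefix in tag:
            return wn_pos
    return None


def toy_sentiment(tagged_tokens, lexicon=TOY_LEXICON):
    """Aggregate lexicon scores over (word, tag) pairs and return (label, score)."""
    pos_total = neg_total = 0.0
    matched = 0
    for word, tag in tagged_tokens:
        key = (word.lower(), penn_to_wordnet(tag))
        if key in lexicon:
            pos, neg = lexicon[key]
            pos_total += pos
            neg_total += neg
            matched += 1
    if matched == 0:
        # nothing matched: score 0, which the >= 0 rule labels positive
        return 'positive', 0.0
    score = round((pos_total - neg_total) / matched, 2)
    return ('positive' if score >= 0 else 'negative'), score


print(toy_sentiment([('stupid', 'JJ'), ('movie', 'NN')]))
print(toy_sentiment([('Very', 'RB'), ('Good', 'JJ'), ('Movie', 'NN')]))
```

Note that neutral tokens such as `movie` still count toward the denominator, so long reviews with many objective words end up with scores close to zero.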

## Predict sentiment for sample reviews

In [13]:
#This will be clearer when we run it on our sample documents.
for review, sentiment in zip(test_reviews[sample_review_ids], test_sentiments[sample_review_ids]):
    print('REVIEW:', review)
    print('Actual Sentiment:', sentiment)
    pred = analyze_sentiment_sentiwordnet_lexicon(review, verbose=True)    
    print('-'*60)

REVIEW: no comment - stupid movie, acting average or worse... screenplay - no sense at all... SKIP IT!
Actual Sentiment: negative
     SENTIMENT STATS:                                      
  Predicted Sentiment Objectivity Positive Negative Overall
0            negative        0.76     0.09     0.15   -0.06
------------------------------------------------------------
REVIEW: I don't care if some people voted this movie to be bad. If you want the Truth this is a Very Good Movie! It has every thing a movie should have. You really should Get this one.
Actual Sentiment: positive
     SENTIMENT STATS:                                      
  Predicted Sentiment Objectivity Positive Negative Overall
0            positive        0.76     0.19     0.06    0.13
------------------------------------------------------------
REVIEW: Worst horror film ever but funniest film ever rolled in one you have got to see this film it is so cheap it is unbeliaveble but you have to see it really!!!! P.s watch 

## Predict sentiment for test dataset

In [14]:
#We can clearly see the predicted sentiment along with sentiment polarity scores and an objectivity
#score for each sample movie review depicted in formatted dataframes. Let’s use this model now to predict
#the sentiment of all our test reviews and evaluate its performance. A threshold of >=0 has been used for the
#overall sentiment polarity to be classified as positive and < 0 for negative sentiment.
predicted_sentiments = [analyze_sentiment_sentiwordnet_lexicon(review, verbose=False) for review in norm_test_reviews]

## Evaluate model performance

In [15]:
meu.display_model_performance_metrics(true_labels=test_sentiments, predicted_labels=predicted_sentiments, 
                                  classes=['positive', 'negative'])

#We get an overall F1-Score of 68%, which is definitely a step down from our AFINN-based model. While
#fewer negative reviews are misclassified as positive (2,998 versus 3,189), the other aspects
#of the model performance have suffered.

Model Performance metrics:
------------------------------
Accuracy: 0.6851
Precision: 0.6905
Recall: 0.6851
F1 Score: 0.6827

Model Classification report:
------------------------------
             precision    recall  f1-score   support

   positive       0.66      0.77      0.71      7510
   negative       0.72      0.60      0.66      7490

avg / total       0.69      0.69      0.68     15000


Prediction Confusion Matrix:
------------------------------
                 Predicted:         
                   positive negative
Actual: positive       5784     1726
        negative       2998     4492


## Sentiment Analysis with VADER

In [16]:
#The VADER lexicon, developed by C.J. Hutto, is part of a rule-based sentiment analysis framework
#specifically tuned to analyze sentiment in social media.
#You can use it through nltk's interface under the nltk.sentiment.vader module. Besides this,
#you can also download the actual lexicon or install the framework from
#https://github.com/cjhutto/vaderSentiment, which also contains detailed information about VADER.
#The lexicon, in the file vader_lexicon.txt, contains sentiment scores associated with words,
#emoticons, and slang (like wtf, lol, nah, and so on). Out of over 9,000 candidate lexical features,
#over 7,500 curated features with validated valence scores were finally selected for the lexicon. Each
#feature was rated on a scale from "[-4] Extremely Negative" to "[4] Extremely Positive", with allowance
#for "[0] Neutral (or Neither, N/A)". Features were kept if they had a non-zero mean rating and a standard
#deviation of less than 2.5, both computed over the ratings of ten independent raters.

from nltk.sentiment.vader import SentimentIntensityAnalyzer

## Build model

In [17]:
#:(     -1.9     1.13578 [-2, -3, -2, 0, -1, -1, -2, -3, -1, -4]
#:)      2.0     1.18322 [2, 2, 1, 1, 1, 1, 4, 3, 4, 1]
#...
#terrorizing     -3.0    1.0     [-3, -1, -4, -4, -4, -3, -2, -3, -2, -4]
#thankful         2.7    0.78102 [4, 2, 2, 3, 2, 4, 3, 3, 2, 2]

#Each line in the preceding lexicon sample depicts a unique term, which can either be an emoticon or a
#word. The first token indicates the word/emoticon, the second token indicates the mean sentiment polarity
#score, the third token indicates the standard deviation, and the final token indicates a list of scores given
#by ten independent scorers. Now let’s use VADER to analyze our movie reviews! 

def analyze_sentiment_vader_lexicon(review, 
                                    threshold=0.1,
                                    verbose=False):
    # pre-process text
    review = tn.strip_html_tags(review)
    review = tn.remove_accented_chars(review)
    review = tn.expand_contractions(review)
    
    # analyze the sentiment for review
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(review)
    # get aggregate scores and final sentiment
    agg_score = scores['compound']
    final_sentiment = 'positive' if agg_score >= threshold\
                                   else 'negative'
    if verbose:
        # display detailed sentiment statistics
        positive = str(round(scores['pos'], 2)*100)+'%'
        final = round(agg_score, 2)
        negative = str(round(scores['neg'], 2)*100)+'%'
        neutral = str(round(scores['neu'], 2)*100)+'%'
        sentiment_frame = pd.DataFrame([[final_sentiment, final, positive,
                                        negative, neutral]],
                                        # from_product avoids the `labels` argument, which newer pandas no longer accepts
                                        columns=pd.MultiIndex.from_product([['SENTIMENT STATS:'],
                                                                      ['Predicted Sentiment', 'Polarity Score',
                                                                       'Positive', 'Negative', 'Neutral']]))
        print(sentiment_frame)
    
    return final_sentiment
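
The lexicon line format described earlier (term, mean score, standard deviation, list of ten ratings) can be checked with a few lines of standard-library Python. This is a minimal sketch that assumes the four fields are tab-separated. `parse_lexicon_line` and `keep_feature` are hypothetical helpers, and `keep_feature` simply restates the selection rule quoted in the text.

```python
import ast
import statistics


def parse_lexicon_line(line):
    """Split one tab-separated lexicon line into its four fields."""
    token, mean, std, ratings = line.rstrip('\n').split('\t')
    return token, float(mean), float(std), ast.literal_eval(ratings)


def keep_feature(mean, std, max_std=2.5):
    """Selection rule from the text: non-zero mean and standard deviation below 2.5."""
    return mean != 0 and std < max_std


# sample line from the lexicon excerpt above, written with tab separators (assumed)
line = 'thankful\t2.7\t0.78102\t[4, 2, 2, 3, 2, 4, 3, 3, 2, 2]'
token, mean, std, ratings = parse_lexicon_line(line)

print(token, mean, std, len(ratings))
# the stored mean and standard deviation can be recomputed from the raw ratings
print(round(statistics.mean(ratings), 1), round(statistics.pstdev(ratings), 5))
print(keep_feature(mean, std))
```

For this sample line the stored standard deviation matches the population standard deviation (`pstdev`) of the ten ratings, not the sample standard deviation.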

## Predict sentiment for sample reviews

In [19]:
#In our modeling function, we do some basic pre-processing but keep punctuation and emoticons
#intact. Besides this, we use VADER to get the sentiment polarity and also the proportion of the review text
#with regard to positive, neutral, and negative sentiment. We also predict the final sentiment based on a
#user-supplied threshold for the aggregated sentiment polarity. The VADER documentation typically suggests
#positive sentiment for a compound score >= 0.05, neutral between -0.05 and 0.05, and negative for
#<= -0.05. We use a threshold of >= 0.4 for positive and < 0.4 for negative in our corpus.

for review, sentiment in zip(test_reviews[sample_review_ids], test_sentiments[sample_review_ids]):
    print('REVIEW:', review)
    print('Actual Sentiment:', sentiment)
    pred = analyze_sentiment_vader_lexicon(review, threshold=0.4, verbose=True)    
    print('-'*60)

REVIEW: no comment - stupid movie, acting average or worse... screenplay - no sense at all... SKIP IT!
Actual Sentiment: negative
     SENTIMENT STATS:                                         
  Predicted Sentiment Polarity Score Positive Negative Neutral
0            negative           -0.8     0.0%    40.0%   60.0%
------------------------------------------------------------
REVIEW: I don't care if some people voted this movie to be bad. If you want the Truth this is a Very Good Movie! It has every thing a movie should have. You really should Get this one.
Actual Sentiment: positive
     SENTIMENT STATS:                                                     
  Predicted Sentiment Polarity Score Positive             Negative Neutral
0            negative          -0.16    16.0%  14.000000000000002%   69.0%
------------------------------------------------------------
REVIEW: Worst horror film ever but funniest film ever rolled in one you have got to see this film it is so cheap it is unb

## Predict sentiment for test dataset

In [20]:
#We can see the detailed statistics pertaining to the sentiment and polarity for each sample movie review.
#Let’s try out our model on the complete test movie review corpus now and evaluate the model performance.

predicted_sentiments = [analyze_sentiment_vader_lexicon(review, threshold=0.4, verbose=False) for review in test_reviews]
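
The thresholding step can be isolated from VADER itself. The sketch below shows the binary rule used in this notebook next to a three-way variant with a neutral band. `label_compound` and its `pos_cut` and `neg_cut` parameters are hypothetical, and the cutoff values are only examples.

```python
def label_compound(score, pos_cut, neg_cut=None):
    """Label a compound polarity score.

    With neg_cut=None this is the binary rule used above (>= pos_cut is positive,
    everything else negative). With neg_cut set, scores strictly between the two
    cutoffs are labeled 'neutral'.
    """
    if score >= pos_cut:
        return 'positive'
    if neg_cut is None or score <= neg_cut:
        return 'negative'
    return 'neutral'


# polarity scores taken from the two complete sample outputs above
for score in (-0.8, -0.16):
    print(score,
          label_compound(score, pos_cut=0.4),
          label_compound(score, pos_cut=0.4, neg_cut=-0.4))
```

With a binary rule, the second sample review (polarity -0.16) is forced into the negative class even though its score is close to zero. A neutral band makes that uncertainty visible, at the cost of no longer matching the two-class labels in this dataset.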

## Evaluate model performance

In [21]:
meu.display_model_performance_metrics(true_labels=test_sentiments, predicted_labels=predicted_sentiments, 
                                  classes=['positive', 'negative'])

#We get an overall F1-Score and model accuracy of 71%, which is quite similar to the AFINN based
#model. The AFINN based model only wins out on the average precision by 1%; otherwise, both models have
#a similar performance.

Model Performance metrics:
------------------------------
Accuracy: 0.711
Precision: 0.7236
Recall: 0.711
F1 Score: 0.7068

Model Classification report:
------------------------------
             precision    recall  f1-score   support

   positive       0.67      0.83      0.74      7510
   negative       0.78      0.59      0.67      7490

avg / total       0.72      0.71      0.71     15000


Prediction Confusion Matrix:
------------------------------
                 Predicted:         
                   positive negative
Actual: positive       6235     1275
        negative       3060     4430
