## Using the Amazon Comprehend Detect Sentiment service to score reviews

<a href="https://colab.research.google.com/github/peckjon/hosting-ml-as-microservice/blob/master/part1/score_reviews_via_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Obtain labelled reviews

In order to test any of the sentiment analysis APIs, we need a labelled dataset of reviews and their sentiment polarity. We'll use NLTK to download the movie_reviews corpus.

In [1]:
from nltk import download

download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/ozge/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.


True

### Load the data

The files in movie_reviews have already been divided into two sets: positive ('pos') and negative ('neg'), so we can load the raw text of the reviews into two lists, one for each polarity.

In [2]:
from nltk.corpus import movie_reviews

# load the raw text of each review into one list per polarity

reviews_pos = []
for fileid in movie_reviews.fileids('pos'):
    review = movie_reviews.raw(fileid)
    reviews_pos.append(review)

reviews_neg = []
for fileid in movie_reviews.fileids('neg'):
    review = movie_reviews.raw(fileid)
    reviews_neg.append(review)

### Connect to the scoring API

Fill in this function with code that connects to the Amazon Comprehend API, and uses it to score a single review:

* [Documentation - Amazon Comprehend: Detect Sentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_DetectSentiment.html)

Your function must return either 'pos' or 'neg', so you'll need to make some decisions about how to map the results of the API call to one of these values. Amazon Comprehend can return "NEUTRAL" or "MIXED" for the Sentiment - if this happens, you will need to inspect the numeric values under the SentimentScore to see whether it leans toward positive or negative.


In [3]:
import boto3

# Supply your own AWS credentials here; the empty strings are placeholders
client = boto3.client('comprehend', region_name='eu-central-1', aws_access_key_id='', aws_secret_access_key='')



In [22]:
text = """My order was delayed by several days without any updates or communication from the seller. Terrible shipping service."""

client.detect_sentiment(Text=text, LanguageCode='en')

{'Sentiment': 'NEGATIVE',
 'SentimentScore': {'Positive': 0.00015661792713217437,
  'Negative': 0.9997037053108215,
  'Neutral': 0.00011433633335400373,
  'Mixed': 2.5389967049704865e-05},
 }

In [17]:
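def score_review(review):
    # Comprehend limits the size of the submitted text, so only the first
    # 5000 characters of the review are scored here
    ret_dict = client.detect_sentiment(Text=review[:5000], LanguageCode='en')

    if ret_dict['Sentiment'] == "NEGATIVE":
        return 'neg'
    elif ret_dict['Sentiment'] == "POSITIVE":
        return 'pos'
    # NEUTRAL or MIXED: fall back to comparing the numeric scores
    elif ret_dict['SentimentScore']['Positive'] > ret_dict['SentimentScore']['Negative']:
        return 'pos'
    else:
        return 'neg'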
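
The `review[:5000]` slice counts characters, whereas the Detect Sentiment documentation linked above appears to state the size limit for `Text` in bytes of UTF-8. For plain ASCII reviews the two are the same, but text with accented or non-Latin characters could exceed the limit. A minimal sketch of a byte-aware truncation, assuming a hypothetical helper named `truncate_utf8` that is not part of the original notebook:

```python
def truncate_utf8(text, max_bytes=5000):
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper; the 5000-byte default is an assumption taken from
    the limit used elsewhere in this notebook.
    """
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit, then drop any partial multi-byte character
    # left dangling at the end of the slice
    return encoded[:max_bytes].decode('utf-8', errors='ignore')
```

It could then replace the slice in the call, for example `Text=truncate_utf8(review)`.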
    

### Score each review

Now we can use the function you defined to score each of the reviews.

In [20]:
# Create 2 smaller subsets for testing
subset_pos = reviews_pos[:10]
subset_neg = reviews_neg[:10]

results_pos = []
# When comfortable with the results, switch `subset_pos` to `reviews_pos`
for review in subset_pos:
    result = score_review(review)
    results_pos.append(result)

results_neg = []
# When comfortable with the results, switch `subset_neg` to `reviews_neg`
for review in subset_neg:
    result = score_review(review)
    results_neg.append(result)
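
Since every call to `score_review` goes to the API, it can be useful to dry-run the scoring loop with a stand-in scorer first. The sketch below assumes two hypothetical names, `score_all` and `fake_score_review`, neither of which is in the original notebook; the fake scorer is a crude keyword check and says nothing about how Comprehend behaves.

```python
def score_all(reviews, scorer):
    """Apply `scorer` to each review and collect the 'pos'/'neg' labels."""
    return [scorer(review) for review in reviews]


def fake_score_review(review):
    # Stand-in for score_review: a crude keyword check, NOT the Comprehend API
    return 'pos' if 'good' in review.lower() else 'neg'


# Made-up example reviews, only to exercise the loop
print(score_all(["A good film.", "Dull and slow."], fake_score_review))
```

Once the loop behaves as expected, passing the real `score_review` as `scorer` would give the same lists that the cell above builds.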

### Calculate accuracy

For our known positive reviews, we count how many the function scored as 'pos' and use that to calculate the percentage accuracy. We repeat this for the negative reviews, then compute the overall accuracy.

In [21]:
correct_pos = results_pos.count('pos')
accuracy_pos = float(correct_pos) / len(results_pos)
correct_neg = results_neg.count('neg')
accuracy_neg = float(correct_neg) / len(results_neg)
correct_all = correct_pos + correct_neg
accuracy_all = float(correct_all) / (len(results_pos)+len(results_neg))

print('Positive reviews: {}% correct'.format(accuracy_pos*100))
print('Negative reviews: {}% correct'.format(accuracy_neg*100))
print('Overall accuracy: {}% correct'.format(accuracy_all*100))

Positive reviews: 64.7% correct
Negative reviews: 68.0% correct
Overall accuracy: 66.35% correct
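
The per-class arithmetic above could also be wrapped in a small helper. This is only a sketch: `accuracy` is a hypothetical name, and the sample labels are made up rather than real Comprehend output.

```python
def accuracy(results, expected_label):
    """Fraction of entries in `results` equal to `expected_label`."""
    if not results:
        raise ValueError("results must not be empty")
    return results.count(expected_label) / len(results)


# Made-up labels for illustration only
sample = ['pos', 'neg', 'pos', 'pos']
print('{:.1f}% correct'.format(accuracy(sample, 'pos') * 100))
```

Using a `{:.1f}` format keeps the printed percentage to one decimal place, which avoids long floating-point tails in the output.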


## BONUS: Use the entire review

If we are not happy with the results from Comprehend, we can submit the whole review part by part instead of truncating it, then combine the results at the end. This is a bit more complex, but it _might_ give us a boost in accuracy.

In [None]:
import os

import boto3

# You need to obtain valid credentials to call the Comprehend APIs
#
# You can find more information about the different types of credential
# and how to obtain them on the following tutorial:
# https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
# 
# And you can learn how to configure credentials in the AWS Python SDK (boto3)
# in its official documentation:
# https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
comprehend = boto3.client(
    "comprehend",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID_HERE",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY_HERE"
)

def score_review(review):   
    # Comprehend has a limit of 5000 characters in the text we submit for sentiment
    # detection. If the text is longer, we will need to submit it in parts and
    # combine the results to obtain the overall score
    if len(review) <= 5000:
        return score_short_review(review)
    else:
        return score_long_review(review)
    

def score_short_review(review):
    response = comprehend.detect_sentiment(Text=review, LanguageCode="en")
    
    detected_sentiment = response["Sentiment"]
    if detected_sentiment == "POSITIVE":
        return 'pos'
    elif detected_sentiment == "NEGATIVE":
        return 'neg'
    else:
        # There isn't a clear sentiment, the result is NEUTRAL or MIXED
        # We need to compare the scores to detect if either positive or negative
        # is the dominant one
        sentiment_score = response["SentimentScore"]
        if sentiment_score["Positive"] >= sentiment_score["Negative"]:
            return 'pos'
        else:
            return 'neg'
    
def score_long_review(review):
    max_supported_length = 5000
    # We split the review in fragments of 5000 characters
    review_fragments = [review[i:i+max_supported_length] for i in range(0, len(review), max_supported_length)]
    
    accumulated_scores = {"Positive": 0.0, "Negative": 0.0, "Neutral": 0.0, "Mixed": 0.0}
    for fragment in review_fragments:
        # We need to ensure that shorter fragments don't overcompensate over larger ones
        # To do so, we'll use a coefficient representing the % of the max supported length
        fragment_weight = len(fragment)/max_supported_length
        
        response = comprehend.detect_sentiment(Text=fragment, LanguageCode="en")
        sentiment_scores = response["SentimentScore"]

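        for sentiment in sentiment_scores.keys():
            accumulated_scores[sentiment] += sentiment_scores[sentiment] * fragment_weight

    # Once every fragment has been scored, return 'pos' or 'neg' like
    # score_short_review, choosing whichever accumulated score is larger
    if accumulated_scores["Positive"] >= accumulated_scores["Negative"]:
        return 'pos'
    else:
        return 'neg'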
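
Fixed-width slicing can cut a word in half at a fragment boundary, which might change what the service sees at the edges. One possible alternative is to prefer breaking at whitespace. The sketch below assumes a hypothetical helper, `split_on_whitespace`, that is not part of the original notebook; like the code above, it counts characters rather than bytes.

```python
def split_on_whitespace(text, max_length=5000):
    """Split text into fragments of at most max_length characters,
    preferring to break just after whitespace rather than mid-word."""
    fragments = []
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        if end < len(text):
            # Find the last space, newline or tab inside the current window
            cut = max(text.rfind(ch, start, end) for ch in (' ', '\n', '\t'))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left fragment
            # Otherwise there is no usable break point: fall back to a hard cut
        fragments.append(text[start:end])
        start = end
    return fragments
```

The fragments concatenate back to the original text, so the length-based weighting used in `score_long_review` would still apply if this replaced the list comprehension there.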
