# 452 - Machine Learning - HW9 - Sentiment Analysis using Amazon Comprehend API

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Reading-the-input-file" data-toc-modified-id="Reading-the-input-file-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Reading the input file</a></span></li><li><span><a href="#Importing-the-required-packages" data-toc-modified-id="Importing-the-required-packages-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Importing the required packages</a></span></li><li><span><a href="#Connecting-to-boto" data-toc-modified-id="Connecting-to-boto-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Connecting to boto</a></span></li><li><span><a href="#Sentiment-analysis-on-one-record" data-toc-modified-id="Sentiment-analysis-on-one-record-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Sentiment analysis on one record</a></span></li><li><span><a href="#Sentiment-analysis-on-entire-document" data-toc-modified-id="Sentiment-analysis-on-entire-document-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Sentiment analysis on entire document</a></span></li></ul></div>

### Reading the input file

In [22]:
# Read the input file into a list of lines
with open("amazon_reviews.txt", "r") as text_file:
    lines = text_file.read().split('\n')
# Split each line on the tab character and drop corrupted rows:
# lines that are not tab-separated or that have an empty label
fields = [line.split("\t") for line in lines]
lines = [parts for parts in fields if len(parts) == 2 and parts[1] != '']
lines[0:10]

[['So there is no way for me to plug it in here in the US unless I go by a converter.',
  '0'],
 ['Good case, Excellent value.', '1'],
 ['Great for the jawbone.', '1'],
 ['Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!',
  '0'],
 ['The mic is great.', '1'],
 ['I have to jiggle the plug to get it to line up right to get decent volume.',
  '0'],
 ['If you have several dozen or several hundred contacts, then imagine the fun of sending each of them one by one.',
  '0'],
 ['If you are Razr owner...you must have this!', '1'],
 ['Needless to say, I wasted my money.', '0'],
 ['What a waste of money and time!.', '0']]

### Importing the required packages

In [23]:
import pandas as pd
import boto3
import json

### Connecting to boto

In [25]:
# Create an Amazon Comprehend client in the us-west-2 region
comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')

### Sentiment analysis on one record

In [28]:
text_ip = lines[1][0]
text_ip

'Good case, Excellent value.'

In [29]:
# Call the API for the single review, then pretty-print the JSON response
response = comprehend.detect_sentiment(Text=text_ip, LanguageCode='en')
print(json.dumps(response, sort_keys=True, indent=4))

{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "167",
            "content-type": "application/x-amz-json-1.1",
            "date": "Tue, 06 Mar 2018 08:10:47 GMT",
            "x-amzn-requestid": "e0315848-2115-11e8-9342-67a4d2892763"
        },
        "HTTPStatusCode": 200,
        "RequestId": "e0315848-2115-11e8-9342-67a4d2892763",
        "RetryAttempts": 0
    },
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Mixed": 0.002174907363951206,
        "Negative": 0.00012512893590610474,
        "Neutral": 0.0017619254067540169,
        "Positive": 0.9959380626678467
    }
}
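
The response is a plain dictionary, so the label and its score can be read out directly. The following is a minimal sketch, not part of the original notebook: `summarize_sentiment` is a hypothetical helper, and `sample_response` simply copies the values printed above (metadata omitted) so the snippet runs without calling AWS.

```python
def summarize_sentiment(response):
    """Return (label, confidence) from a detect_sentiment response dict."""
    label = response["Sentiment"]  # e.g. "POSITIVE"
    scores = response["SentimentScore"]
    # Labels are upper case ("POSITIVE"); score keys are capitalized ("Positive").
    confidence = scores[label.capitalize()]
    return label, confidence


# Values copied from the response printed above (ResponseMetadata omitted).
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Mixed": 0.002174907363951206,
        "Negative": 0.00012512893590610474,
        "Neutral": 0.0017619254067540169,
        "Positive": 0.9959380626678467,
    },
}

label, confidence = summarize_sentiment(sample_response)
print(label, round(confidence, 4))
```

With a live client, the same helper could be applied to the dictionary returned by `comprehend.detect_sentiment(...)`.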


### Sentiment analysis on entire document

In [48]:
# Keep only the review text (not the label) of the first 50 records, due to constraints
new_lines = [item[0] for item in lines[:50]]
new_lines[0:10]

['So there is no way for me to plug it in here in the US unless I go by a converter.',
 'Good case, Excellent value.',
 'Great for the jawbone.',
 'Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!',
 'The mic is great.',
 'I have to jiggle the plug to get it to line up right to get decent volume.',
 'If you have several dozen or several hundred contacts, then imagine the fun of sending each of them one by one.',
 'If you are Razr owner...you must have this!',
 'Needless to say, I wasted my money.',
 'What a waste of money and time!.']

In [49]:
# Join the 50 reviews into one comma-separated string (the items are already strings)
whole_doc = ', '.join(new_lines)
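
Because the joined text is sent in a single request, it can help to check its size first. The sketch below is an illustration only: `MAX_TEXT_BYTES`, `utf8_size`, and `fits_in_one_request` are hypothetical names, and the 5,000-byte figure is an assumption about the `detect_sentiment` request limit that should be verified against the current AWS documentation.

```python
# ASSUMPTION: per-request text limit in UTF-8 bytes; verify against the AWS docs.
MAX_TEXT_BYTES = 5000


def utf8_size(text):
    """Size of `text` in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_one_request(text, limit=MAX_TEXT_BYTES):
    """True if the encoded text is within the assumed request limit."""
    return utf8_size(text) <= limit


# Two reviews from the sample output, joined the same way as whole_doc.
sample_reviews = ["Good case, Excellent value.", "The mic is great."]
sample_doc = ", ".join(sample_reviews)
print(utf8_size(sample_doc), fits_in_one_request(sample_doc))
```

The size is measured in bytes rather than characters because non-ASCII characters take more than one byte in UTF-8.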

In [50]:
# Score the joined text as one document, then pretty-print the JSON response
response = comprehend.detect_sentiment(Text=whole_doc, LanguageCode='en')
print(json.dumps(response, sort_keys=True, indent=4))

{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "161",
            "content-type": "application/x-amz-json-1.1",
            "date": "Tue, 06 Mar 2018 08:19:59 GMT",
            "x-amzn-requestid": "28ebb387-2117-11e8-bcee-fbf5db9af180"
        },
        "HTTPStatusCode": 200,
        "RequestId": "28ebb387-2117-11e8-bcee-fbf5db9af180",
        "RetryAttempts": 0
    },
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Mixed": 0.4448610246181488,
        "Negative": 0.08887863159179688,
        "Neutral": 0.00491439551115036,
        "Positive": 0.4613458812236786
    }
}
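
Joining the reviews gives one label for the whole text, and here the Positive (0.46) and Mixed (0.44) scores are nearly tied. An alternative, shown only as a sketch and not as the notebook's method, is to score each review separately with the client's `batch_detect_sentiment` call and compare the results with the 0/1 labels in the file. The helper names are hypothetical, and the batch size of 25 is an assumption about the per-call document limit.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiments(client, texts, batch_size=25, language_code="en"):
    """Return one sentiment label per text (None where no result came back).

    ASSUMPTION: 25 documents per call is the batch limit; check the AWS docs.
    """
    labels = [None] * len(texts)
    offset = 0
    for batch in chunked(texts, batch_size):
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        for result in response["ResultList"]:
            # "Index" is the document's position within this batch.
            labels[offset + result["Index"]] = result["Sentiment"]
        offset += len(batch)
    return labels


def agreement_rate(predicted, gold):
    """Fraction of reviews whose predicted label matches the 1/0 file label."""
    if not gold:
        return 0.0
    expected = {"1": "POSITIVE", "0": "NEGATIVE"}
    hits = sum(1 for p, g in zip(predicted, gold) if p == expected[g])
    return hits / len(gold)


# Usage with the notebook's objects (requires AWS credentials, so left commented):
# predicted = batch_sentiments(comprehend, new_lines)
# gold = [item[1] for item in lines[:50]]
# print(agreement_rate(predicted, gold))
```

Reviews that come back as NEUTRAL or MIXED count as disagreements in this sketch, since the file only has positive and negative labels.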
