# Comprehend API Invocation Demo

## This notebook shows how to use the Amazon Comprehend Sentiment API

In [None]:
! pip install boto3==1.7.62

Configure the boto3 client for Comprehend

In [None]:
import boto3
import pprint
client = boto3.client('comprehend')


#### Test the Comprehend API that accepts a single document

In [None]:
text = "This is an amazing place"
language_code = 'en'  # Language code for English

In [None]:
response = client.detect_sentiment(Text=text, LanguageCode=language_code)
sentiment = response["Sentiment"]
confidence_score = response["SentimentScore"][sentiment.title()]

print('The sentiment for "{}" is: {}, with score {}'.format(text, sentiment, confidence_score))
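
The response carries the winning label in upper case under `Sentiment` and the per-label confidences under `SentimentScore`, keyed in title case (`Positive`, `Negative`, `Neutral`, `Mixed`), which is why the cell above calls `.title()`. As a minimal sketch, the helper below ranks all four scores. The name `rank_sentiments` and the sample dictionary are hypothetical; the numbers are made up to show the shape, not real API output.

```python
def rank_sentiments(response):
    """Return (label, score) pairs sorted from most to least likely."""
    scores = response["SentimentScore"]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hand-written dictionary mimicking the response shape; values are made up
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.9,
        "Negative": 0.02,
        "Neutral": 0.07,
        "Mixed": 0.01,
    },
}

for label, score in rank_sentiments(sample_response):
    print("{:<8} {:.2f}".format(label, score))
```

With a live client, the same helper could be called on the `response` returned by `detect_sentiment`.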

#### Now let's test the Comprehend batch API

In [None]:
list_of_doc = [
    "simplistic , silly and tedious",
    "it's so laddish and juvenile , only teenage boys could possibly find it funny .",
    "exploitative and largely devoid of the depth or sophistication that would make watching such a graphic treatment of the crimes bearable .",
    "perhaps no picture ever made has more literally showed that the road to hell is paved with good intentions .",
    "steers turns in a snappy screenplay that curls at the edges ; it's so clever you want to hate it . but he somehow pulls it off .",
]

In [None]:
response = client.batch_detect_sentiment(TextList=list_of_doc, LanguageCode=language_code)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)
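
The batch call accepts at most 25 documents per request and reports each outcome in `ResultList` or `ErrorList`, with an `Index` that points back into the input list. The sketch below shows one way to split a longer list into batches and to line results up with their documents. The helper names `chunked` and `pair_results`, the sample response, and the error code `EXAMPLE_ERROR` are all hypothetical and only illustrate the shape.

```python
def chunked(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def pair_results(docs, batch_response):
    """Match each document with its sentiment or error code via `Index`."""
    sentiments = {r["Index"]: r["Sentiment"]
                  for r in batch_response.get("ResultList", [])}
    errors = {e["Index"]: e["ErrorCode"]
              for e in batch_response.get("ErrorList", [])}
    return [(doc, sentiments.get(i), errors.get(i))
            for i, doc in enumerate(docs)]

# Hand-written response in the documented shape; the values are made up
docs = ["first document", "second document"]
sample_batch_response = {
    "ResultList": [{"Index": 0, "Sentiment": "NEUTRAL"}],
    "ErrorList": [{"Index": 1, "ErrorCode": "EXAMPLE_ERROR",
                   "ErrorMessage": "example"}],
}

for row in pair_results(docs, sample_batch_response):
    print(row)

# Sizes of the batches produced for a 60-item list
print([len(chunk) for chunk in chunked(list(range(60)))])
```

With a live client, each chunk would be passed as `TextList` to `batch_detect_sentiment` and the response handed to `pair_results` together with that chunk.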


#### Let's test the Comprehend async API

1. Create an S3 bucket in this account that you have access to.
2. Create an IAM service role that gives Comprehend read and write access to your bucket. The quickest way is to open the Comprehend console, go to Try Comprehend -> Analysis -> Create analysis job, submit a dummy job, and copy the ARN of the role it creates into the `role` variable below.
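
For readers who prefer to create the role by hand, the sketch below builds the two policy documents such a role typically needs: a trust policy that lets the Comprehend service assume the role, and an access policy for the bucket. The function name `comprehend_role_policies` and the exact set of S3 actions are assumptions; the role the console creates may be scoped differently.

```python
import json

def comprehend_role_policies(bucket):
    """Build a trust policy and an S3 access policy for a Comprehend data role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Lets the Comprehend service assume this role
            "Principal": {"Service": "comprehend.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    access_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::{}".format(bucket)],
            },
            {   # Reading input and writing output applies to the objects
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": ["arn:aws:s3:::{}/*".format(bucket)],
            },
        ],
    }
    return trust_policy, access_policy

# "example-bucket" is a placeholder name
trust, access = comprehend_role_policies("example-bucket")
print(json.dumps(trust, indent=2))
print(json.dumps(access, indent=2))
```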

In [None]:
bucket_name = "aegovansagemaker"
role = 'arn:aws:iam::<AccountId>:role/service-role/<RoleName>'

Download the public movie review dataset

In [None]:
! wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz
! tar -xf "rt-polaritydata.tar.gz" 
! ls rt-polaritydata

Define the local file names and S3 locations used to copy the data to S3

In [None]:
local_file='rt-polaritydata/rt-polarity.pos'
local_file_converted='rt-polaritydata/rt-polarity.pos.utf8.txt'
s3_input_dir = 'rt-polaritydata'
s3_input_key = '{}/{}'.format(s3_input_dir, 'rt-polarity.pos.utf8.txt')
s3_output_key = 's3://{}'.format(bucket_name) 



The Comprehend API only accepts UTF-8 encoded documents, so convert the file, which uses a Latin encoding, to UTF-8.
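
To see why the conversion matters, the short sketch below encodes a non-ASCII string as Latin-1 and shows that the resulting bytes are not valid UTF-8 until they are decoded with the right codec and re-encoded. The sample string is made up for illustration and is not taken from the dataset.

```python
text = "caf\u00e9"  # illustrative string containing one non-ASCII character

# The same text as it would be stored in a Latin-1 file
latin_bytes = text.encode("latin-1")

# Reading those bytes as UTF-8 fails, because 0xE9 alone is not valid UTF-8
try:
    latin_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Decode with the source encoding first, then encode as UTF-8
utf8_bytes = latin_bytes.decode("latin-1").encode("utf-8")

print(decodes_as_utf8)
print(latin_bytes, utf8_bytes)
```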

In [None]:
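def convert_to_utf8(sourcefile, destinationfile, source_encoding):
    # Read with the source encoding and rewrite every line as UTF-8
    with open(sourcefile, 'r', encoding=source_encoding) as source:
        with open(destinationfile, 'w', encoding="utf-8") as out:
            for line in source:
                # Normalize the line ending without dropping a character
                # when the last line has no trailing newline
                out.write(line.rstrip('\n') + '\n')

Upload the converted file to S3

In [None]:
s3 = boto3.resource('s3')
# Write the converted copy to a separate file so the source is not overwritten
convert_to_utf8(local_file, local_file_converted, 'latin-1')
with open(local_file_converted, 'rb') as data:
    s3.Bucket(bucket_name).put_object(Key=s3_input_key, Body=data)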

Submit a Comprehend sentiment analysis job

In [None]:
from time import gmtime, strftime
import uuid
job_name = 'start_sentiment_detection_job' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
request_id = str(uuid.uuid4())


response = client.start_sentiment_detection_job(
    InputDataConfig={
        'S3Uri': 's3://{}/{}'.format(bucket_name, s3_input_dir),
        'InputFormat': 'ONE_DOC_PER_LINE'
    },
    OutputDataConfig={
        'S3Uri': s3_output_key,
    },
    DataAccessRoleArn=role,
    JobName=job_name,
    LanguageCode='en',
    ClientRequestToken=request_id
)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)
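
The `S3Uri` values in the request are plain `s3://bucket/key` strings, while the boto3 S3 calls used earlier take the bucket and key separately. As far as the API documentation describes it, the job description later reports the final output location as another such URI under `OutputDataConfig`. The hypothetical helper `split_s3_uri` below is a minimal sketch for splitting a URI so it can be passed to S3 calls; the example URI uses a placeholder bucket name.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    # The path starts with '/', which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")

# Placeholder URI for illustration
bucket, key = split_s3_uri("s3://example-bucket/rt-polaritydata/rt-polarity.pos.utf8.txt")
print(bucket)
print(key)
```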

Poll the status of the job until it has completed or failed. This will take at least 10 minutes to complete.

In [None]:
%%timeit -n1 -r1

import time
is_complete = False
pp = pprint.PrettyPrinter(indent=4)
while not is_complete:
    response_describe = client.describe_sentiment_detection_job(JobId=response['JobId'])
    job_status = response_describe['SentimentDetectionJobProperties']['JobStatus']
    if job_status in ['SUBMITTED', 'IN_PROGRESS']:
        # Comprehend is still working on the job; sleep and try again
        print("{} job is in status {}...".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime()), job_status))
        time.sleep(10)
    else:
        # Any other status (for example COMPLETED or FAILED) ends the loop
        is_complete = True

print("The job finished with status {}".format(job_status))
pp.pprint(response_describe)
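
The loop above waits indefinitely if the job never leaves the in-flight states. As a hedged alternative, the sketch below wraps the same polling logic in a function with a timeout. The name `wait_for_job` and its parameters are hypothetical, and the status source is a stand-in iterator so the example runs without AWS access.

```python
import time

IN_FLIGHT = {"SUBMITTED", "IN_PROGRESS"}

def wait_for_job(get_status, poll_seconds=10, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it leaves the in-flight states or the timeout passes."""
    waited = 0
    while True:
        status = get_status()
        if status not in IN_FLIGHT:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("job still {} after {}s".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds

# Stand-in status source; the sleep is replaced so the demo returns at once
fake_statuses = iter(["SUBMITTED", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Against the real service, `get_status` could be a small function that calls `client.describe_sentiment_detection_job(JobId=response['JobId'])` and returns the `JobStatus` field, as the cell above does.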
