<h1>Motivation:</h1> 
<p>This Jupyter notebook is complementary material for my tutorial on how to use Amazon Comprehend.</p>


<h3>This notebook showcases how to use Amazon Comprehend for four tasks:</h3>
<ul>
    <li>Sentiment Analysis.</li>
    <li>Key Phrase Extraction.</li>
    <li>Named Entity Extraction.</li>
    <li>Language Detection.</li>
</ul>




Import Boto3 and create a client for the Comprehend service.

In [1]:
import boto3
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1',
                          aws_access_key_id='Your Access Key Id Goes Here',
                          aws_secret_access_key='Your Secret Key Goes Here')

You are all set to use Comprehend! 
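Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. Here is a minimal sketch of one alternative, assuming the keys are exported as the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. `build_client_kwargs` and `make_client` are hypothetical helper names of mine, not part of boto3.

```python
import os


def build_client_kwargs(service_name='comprehend', env=None):
    """Collect keyword arguments for boto3.client without hard-coding secrets."""
    env = os.environ if env is None else env
    kwargs = {
        'service_name': service_name,
        # Fall back to the region used in this notebook.
        'region_name': env.get('AWS_DEFAULT_REGION', 'us-east-1'),
    }
    key_id = env.get('AWS_ACCESS_KEY_ID')
    secret = env.get('AWS_SECRET_ACCESS_KEY')
    # Pass explicit keys only when both are set; otherwise boto3 uses
    # its own default credential lookup.
    if key_id and secret:
        kwargs['aws_access_key_id'] = key_id
        kwargs['aws_secret_access_key'] = secret
    return kwargs


def make_client(service_name='comprehend'):
    """Create the client; boto3 is imported here so the helper above has no dependency on it."""
    import boto3
    return boto3.client(**build_client_kwargs(service_name))
```

With this in place, `comprehend = make_client()` would replace the cell above. boto3 also reads these variables on its own, so the helper mainly makes the lookup explicit.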


<h1> 1) Comprehend for Sentiment Analysis.</h1>
<p> It is simple. Just use the __detect_sentiment__ method of your client.</p>

In [12]:
sent = comprehend.detect_sentiment(Text = 'Amazon is a great company. I love working there.', LanguageCode= 'en')
sent

{'Sentiment': 'POSITIVE',
 'SentimentScore': {'Positive': 0.9986692667007446,
  'Negative': 5.039777897763997e-05,
  'Neutral': 0.00023834838066250086,
  'Mixed': 0.0010420106118544936},
 'ResponseMetadata': {'RequestId': '83970793-e89e-11e8-97c2-ddc68243f710',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 15 Nov 2018 06:20:14 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '167',
   'connection': 'keep-alive',
   'x-amzn-requestid': '83970793-e89e-11e8-97c2-ddc68243f710'},
  'RetryAttempts': 0}}

<p>The output is in JavaScript Object Notation (JSON) format. In Python, this is equivalent to a nested dictionary. The output consists of three main parts:</p>
<ul>
    <li>__Sentiment__: The single label that best describes the sentiment expressed in the Text.</li>
    <li>__SentimentScore__: A dictionary that maps each sentiment to a confidence score between 0 and 1 for that sentiment being expressed in the Text.</li>
    <li>__ResponseMetadata__: The metadata about the call to the Comprehend service.</li>
</ul>
<p>There are four valid sentiments:</p>
## POSITIVE | NEGATIVE | NEUTRAL | MIXED

<p>Each Text is assigned exactly one sentiment.</p>
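To make the relationship between the two fields concrete, here is a small sketch that reads the highest-scoring entry from SentimentScore. The dictionary is copied from the output above (without ResponseMetadata), and `top_sentiment` is a hypothetical helper name of mine.

```python
# Response copied from the detect_sentiment output above, minus the metadata.
sent = {
    'Sentiment': 'POSITIVE',
    'SentimentScore': {
        'Positive': 0.9986692667007446,
        'Negative': 5.039777897763997e-05,
        'Neutral': 0.00023834838066250086,
        'Mixed': 0.0010420106118544936,
    },
}


def top_sentiment(response):
    """Return the highest-scoring label (upper-cased) and its score."""
    scores = response['SentimentScore']
    label = max(scores, key=scores.get)
    return label.upper(), scores[label]


label, score = top_sentiment(sent)
print(label, round(score, 4))
```

For this response, the highest-scoring entry agrees with the value stored under Sentiment.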

<h1> 2) Comprehend for Key Phrase Extraction.</h1>
<p> It is simple. Just use the __detect_key_phrases__ method of your client.</p>

In [5]:
# set the text that you want to get key phrases from
txt = "On the 10th of November 2016, at 6:30 pm, the Police arrested John Doe, the director of Free Health at his home in Madison. For several months now, he has been under investigation. John was accused of diverting company funds into his personal account. He has requested to speak with his lawyer as soon as possible."

In [6]:
keyPhrases = comprehend.detect_key_phrases(Text = txt, LanguageCode = 'en')

In [7]:
keyPhrases

{'KeyPhrases': [{'Score': 0.9808025360107422,
   'Text': 'the 10th',
   'BeginOffset': 3,
   'EndOffset': 11},
  {'Score': 0.9979576468467712,
   'Text': 'November 2016',
   'BeginOffset': 15,
   'EndOffset': 28},
  {'Score': 0.9829525351524353,
   'Text': '6:30 pm',
   'BeginOffset': 33,
   'EndOffset': 40},
  {'Score': 0.9958406090736389,
   'Text': 'the Police',
   'BeginOffset': 42,
   'EndOffset': 52},
  {'Score': 0.9991291761398315,
   'Text': 'John Doe',
   'BeginOffset': 62,
   'EndOffset': 70},
  {'Score': 0.999496579170227,
   'Text': 'the director',
   'BeginOffset': 72,
   'EndOffset': 84},
  {'Score': 0.9977135062217712,
   'Text': 'Free Health',
   'BeginOffset': 88,
   'EndOffset': 99},
  {'Score': 0.9993090629577637,
   'Text': 'his home',
   'BeginOffset': 103,
   'EndOffset': 111},
  {'Score': 0.9967743754386902,
   'Text': 'Madison',
   'BeginOffset': 115,
   'EndOffset': 122},
  {'Score': 0.9980320930480957,
   'Text': 'several months',
   'BeginOffset': 128,
   'En

<p>The key phrases are returned as JSON, which in Python is the equivalent of nested dictionaries. The output consists of KeyPhrases and ResponseMetadata. We are only interested in KeyPhrases. Each key phrase is a dictionary with four entries:
</p>
<ul>
    <li>__Score__: The confidence, between 0 and 1, that Comprehend has in the detection of this key phrase.</li>
    <li>__Text__: This is our subject of interest. It is the text that makes up the key phrase.</li>
    <li>__BeginOffset__: The zero-based index of the first character of the key phrase in the input Text.</li>
    <li>__EndOffset__: The zero-based index just past the last character of the key phrase in the input Text. For example, 'the 10th' has BeginOffset 3 and EndOffset 11, so slicing the input from 3 up to (but not including) 11 gives back the key phrase.</li>
</ul>
    
    
<p>Below, I print the extracted key phrases, one per line.</p>

In [11]:
for result in keyPhrases['KeyPhrases']:
    print(result['Text'])

the 10th
November 2016
6:30 pm
the Police
John Doe
the director
Free Health
his home
Madison
several months
investigation
John
company funds
his personal account
his lawyer
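The offsets can be checked against the input with ordinary Python slicing. The sketch below uses the first sentence of the text and a few of the key phrases from the output above. The 0.99 threshold and the helper name `phrases_above` are my own choices for illustration.

```python
# First sentence of the input text used above.
txt = ("On the 10th of November 2016, at 6:30 pm, the Police arrested John Doe, "
       "the director of Free Health at his home in Madison.")

# A few entries copied from the detect_key_phrases output above.
key_phrases = [
    {'Score': 0.9808025360107422, 'Text': 'the 10th', 'BeginOffset': 3, 'EndOffset': 11},
    {'Score': 0.9979576468467712, 'Text': 'November 2016', 'BeginOffset': 15, 'EndOffset': 28},
    {'Score': 0.9991291761398315, 'Text': 'John Doe', 'BeginOffset': 62, 'EndOffset': 70},
    {'Score': 0.9977135062217712, 'Text': 'Free Health', 'BeginOffset': 88, 'EndOffset': 99},
    {'Score': 0.9967743754386902, 'Text': 'Madison', 'BeginOffset': 115, 'EndOffset': 122},
]


def phrases_above(phrases, min_score):
    """Keep only the text of key phrases whose Score is at least min_score."""
    return [p['Text'] for p in phrases if p['Score'] >= min_score]


# Slicing from BeginOffset up to EndOffset recovers each key phrase.
for p in key_phrases:
    assert txt[p['BeginOffset']:p['EndOffset']] == p['Text']

print(phrases_above(key_phrases, 0.99))
```

Filtering on Score like this is one way to drop weaker detections such as 'the 10th' before using the phrases downstream.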


<h1> 3) Comprehend for Named Entity Extraction.</h1>
<p>It is simple. Just use the __detect_entities__ method of your client.</p>

In [9]:
namedEntities = comprehend.detect_entities(Text = txt, LanguageCode = 'en')

In [10]:
namedEntities

{'Entities': [{'Score': 0.984207272529602,
   'Type': 'DATE',
   'Text': '10th of November 2016',
   'BeginOffset': 7,
   'EndOffset': 28},
  {'Score': 0.9976122975349426,
   'Type': 'DATE',
   'Text': '6:30 pm',
   'BeginOffset': 33,
   'EndOffset': 40},
  {'Score': 0.9987821578979492,
   'Type': 'PERSON',
   'Text': 'John Doe',
   'BeginOffset': 62,
   'EndOffset': 70},
  {'Score': 0.7706741094589233,
   'Type': 'ORGANIZATION',
   'Text': 'Free Health',
   'BeginOffset': 88,
   'EndOffset': 99},
  {'Score': 0.9977292418479919,
   'Type': 'LOCATION',
   'Text': 'Madison',
   'BeginOffset': 115,
   'EndOffset': 122},
  {'Score': 0.7375115752220154,
   'Type': 'QUANTITY',
   'Text': 'several months',
   'BeginOffset': 128,
   'EndOffset': 142},
  {'Score': 0.9996035695075989,
   'Type': 'PERSON',
   'Text': 'John',
   'BeginOffset': 181,
   'EndOffset': 185}],
 'ResponseMetadata': {'RequestId': 'fcc4feb2-e8a0-11e8-b6e5-5dbbcf180c8c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'T

<p>Just like in key phrase extraction, the output is a JSON object. Each named entity consists of five parts:</p>
<ul>
    <li>__Score__: The confidence, between 0 and 1, that Comprehend has in the detection of this entity.</li>
    <li>__Type__: The type of the named entity. There are nine valid types.</li>
    <li>__Text__: This is our subject of interest. It is the text that makes up the entity.</li>
    <li>__BeginOffset__: The zero-based index of the first character of the entity in the input Text.</li>
    <li>__EndOffset__: The zero-based index just past the last character of the entity in the input Text.</li>
</ul>
    
<p>The valid named entity types are:</p>
###  PERSON | LOCATION | ORGANIZATION | COMMERCIAL_ITEM | EVENT | DATE | QUANTITY | TITLE | OTHER
<p>Below, I print the extracted named entities and their types, one per line.</p>

In [11]:
for result in namedEntities['Entities']:
    print('Entity: {} \t EntityType: {}'.format(result['Text'], result['Type']))

Entity: 10th of November 2016 	 EntityType: DATE
Entity: 6:30 pm 	 EntityType: DATE
Entity: John Doe 	 EntityType: PERSON
Entity: Free Health 	 EntityType: ORGANIZATION
Entity: Madison 	 EntityType: LOCATION
Entity: several months 	 EntityType: QUANTITY
Entity: John 	 EntityType: PERSON
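Once the entities are extracted, it is often handy to group them by type. Here is a minimal sketch using the entities from the output above (offsets left out for brevity). `group_by_type` and its `min_score` parameter are hypothetical names of mine.

```python
from collections import defaultdict

# Entities copied from the detect_entities output above.
entities = [
    {'Score': 0.984207272529602, 'Type': 'DATE', 'Text': '10th of November 2016'},
    {'Score': 0.9976122975349426, 'Type': 'DATE', 'Text': '6:30 pm'},
    {'Score': 0.9987821578979492, 'Type': 'PERSON', 'Text': 'John Doe'},
    {'Score': 0.7706741094589233, 'Type': 'ORGANIZATION', 'Text': 'Free Health'},
    {'Score': 0.9977292418479919, 'Type': 'LOCATION', 'Text': 'Madison'},
    {'Score': 0.7375115752220154, 'Type': 'QUANTITY', 'Text': 'several months'},
    {'Score': 0.9996035695075989, 'Type': 'PERSON', 'Text': 'John'},
]


def group_by_type(entities, min_score=0.0):
    """Map each entity Type to the list of entity texts with Score >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent['Score'] >= min_score:
            grouped[ent['Type']].append(ent['Text'])
    return dict(grouped)


for ent_type, texts in group_by_type(entities).items():
    print('{}: {}'.format(ent_type, ', '.join(texts)))
```

Calling `group_by_type(entities, min_score=0.9)` would leave out 'Free Health' and 'several months', the two detections Comprehend was least confident about.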


<h1> 4) Comprehend for Language Detection.</h1>
<p>It is simple. Just use the __detect_dominant_language__ method of your client.</p>

In [14]:
lang = comprehend.detect_dominant_language(Text = txt)

In [15]:
lang  # Can you interpret the output?

{'Languages': [{'LanguageCode': 'en', 'Score': 0.9971848130226135}],
 'ResponseMetadata': {'RequestId': 'f7b33f6a-e8a4-11e8-90dd-75ee00afd85e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 15 Nov 2018 07:06:26 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '64',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'f7b33f6a-e8a4-11e8-90dd-75ee00afd85e'},
  'RetryAttempts': 0}}
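One way to read this output: Languages is a list of candidate languages, each with a language code and a confidence score. Here the only candidate is 'en' (English) with a score of about 0.997. The sketch below picks the candidate with the highest score; `dominant_language` is a hypothetical helper name of mine.

```python
# Response copied from the detect_dominant_language output above, minus the metadata.
lang = {'Languages': [{'LanguageCode': 'en', 'Score': 0.9971848130226135}]}


def dominant_language(response):
    """Return the (LanguageCode, Score) pair with the highest Score."""
    best = max(response['Languages'], key=lambda item: item['Score'])
    return best['LanguageCode'], best['Score']


code, score = dominant_language(lang)
print(code, round(score, 3))
```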

<h1> 5) Comprehend for Medical Entity detection.</h1>
<p>It is simple. Just use the __detect_entities__ method of a Comprehend Medical client. Keep in mind that the `service_name` for Comprehend Medical is __comprehendmedical__.</p>

In [2]:
import boto3
comprehend_medical = boto3.client(service_name = 'comprehendmedical', region_name = 'us-east-1', 
                                  aws_access_key_id = 'YourAccessKeyGoesHere', aws_secret_access_key = 'YourSecretKeyGoesHere')

In [3]:
txt = '''A 62yo male states onset of chest pain with dizziness approximately 15 minutes before calling 9-1-1. Patient states he was mowing the yard when he had the onset of chest pain. Patient indicates no pain on palpation during examination of sternal and chest areas. Patient states chest pain radiates to right arm and denies difficulty breathing at this time. Patient’s skin is cool, pale and moist to touch. Pulse is equal on both wrists at 85, and is irregular.'''

In [5]:
medical_entities = comprehend_medical.detect_entities(Text = txt)
medical_entities

{'Entities': [{'Id': 0,
   'BeginOffset': 2,
   'EndOffset': 6,
   'Score': 0.9998487234115601,
   'Text': '62yo',
   'Category': 'PROTECTED_HEALTH_INFORMATION',
   'Type': 'AGE',
   'Traits': []},
  {'Id': 11,
   'BeginOffset': 28,
   'EndOffset': 33,
   'Score': 0.9947172999382019,
   'Text': 'chest',
   'Category': 'ANATOMY',
   'Type': 'SYSTEM_ORGAN_SITE',
   'Traits': []},
  {'Id': 1,
   'BeginOffset': 28,
   'EndOffset': 38,
   'Score': 0.95955890417099,
   'Text': 'chest pain',
   'Category': 'MEDICAL_CONDITION',
   'Type': 'DX_NAME',
   'Traits': [{'Name': 'SYMPTOM', 'Score': 0.8515509366989136}]},
  {'Id': 2,
   'BeginOffset': 44,
   'EndOffset': 53,
   'Score': 0.9893583059310913,
   'Text': 'dizziness',
   'Category': 'MEDICAL_CONDITION',
   'Type': 'DX_NAME',
   'Traits': [{'Name': 'SYMPTOM', 'Score': 0.8791248798370361}]},
  {'Id': 12,
   'BeginOffset': 164,
   'EndOffset': 169,
   'Score': 0.9959815740585327,
   'Text': 'chest',
   'Category': 'ANATOMY',
   'Type': 'SYSTE

<p>As you can see, the output is very similar to that from named entity extraction. Comprehend Medical describes its output using the following terms:</p>
<ul>
<li>__Entity:__ A textual reference to the unique name of a real-world object such as people, treatments, medications, and medical conditions, and to precise references to measures, such as dates and dosage. For example, "Ibuprofen."</li>

<li>__Category:__ The generalized grouping to which a detected entity belongs, for ease of understanding. For example, "Ibuprofen" is part of the MEDICATION category.</li>

<li>__Type:__ The type of entity detected, scoped to a category. For example, "Ibuprofen" is of the GENERIC_NAME type of entity.</li>

<li>__Attribute:__ Relevant information related to a detected entity. For example, dosage is an attribute of a medication, so "200mg" is an attribute of the "Ibuprofen" entity.</li>

<li>__Trait:__ Something we understand about an entity, based on context. For instance, a medication is negated (NEGATION trait) if a patient is not taking it.</li>
</ul>
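To see how Category and Traits show up in practice, here is a small sketch over the first four entities from the output above. `conditions_with_trait` is a hypothetical helper name of mine.

```python
# First four entities copied from the Comprehend Medical output above.
medical_sample = [
    {'Id': 0, 'Text': '62yo', 'Category': 'PROTECTED_HEALTH_INFORMATION',
     'Type': 'AGE', 'Traits': []},
    {'Id': 11, 'Text': 'chest', 'Category': 'ANATOMY',
     'Type': 'SYSTEM_ORGAN_SITE', 'Traits': []},
    {'Id': 1, 'Text': 'chest pain', 'Category': 'MEDICAL_CONDITION',
     'Type': 'DX_NAME', 'Traits': [{'Name': 'SYMPTOM', 'Score': 0.8515509366989136}]},
    {'Id': 2, 'Text': 'dizziness', 'Category': 'MEDICAL_CONDITION',
     'Type': 'DX_NAME', 'Traits': [{'Name': 'SYMPTOM', 'Score': 0.8791248798370361}]},
]


def conditions_with_trait(entities, trait_name):
    """Return the text of MEDICAL_CONDITION entities that carry the given trait."""
    return [
        ent['Text']
        for ent in entities
        if ent['Category'] == 'MEDICAL_CONDITION'
        and any(trait['Name'] == trait_name for trait in ent['Traits'])
    ]


print(conditions_with_trait(medical_sample, 'SYMPTOM'))
```

The same function would work for other trait names, such as NEGATION, on entities that carry them.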
    
<p>Check out the valid __Categories__ and __Attributes__ on __[this page](https://docs.aws.amazon.com/comprehend/latest/dg/extracted-med-info.html)__.</p>

It would be nice to see the output in a `pandas` DataFrame.

In [6]:
import pandas as pd
df = pd.DataFrame([(ent['Text'], ent['Category'], ent['Type']) for ent in medical_entities['Entities']], columns = 'MedicalEntity Category Type'.split())
df

Unnamed: 0,MedicalEntity,Category,Type
0,62yo,PROTECTED_HEALTH_INFORMATION,AGE
1,chest,ANATOMY,SYSTEM_ORGAN_SITE
2,chest pain,MEDICAL_CONDITION,DX_NAME
3,dizziness,MEDICAL_CONDITION,DX_NAME
4,chest,ANATOMY,SYSTEM_ORGAN_SITE
5,chest pain,MEDICAL_CONDITION,DX_NAME
6,pain,MEDICAL_CONDITION,DX_NAME
7,sternal,ANATOMY,SYSTEM_ORGAN_SITE
8,chest areas,ANATOMY,SYSTEM_ORGAN_SITE
9,chest,ANATOMY,SYSTEM_ORGAN_SITE


Get the text of every distinct entity in __medical_entities__ for our text.

In [7]:
' | '.join(set([ent['Text'] for ent in medical_entities['Entities']]))

'pain | pale | difficulty breathing | right | 62yo | chest | wrists | arm | chest areas | skin | dizziness | both | cool | Pulse is equal | moist to touch | chest pain | sternal'
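Because a `set` has no guaranteed order, the joined string above can come out in a different order from one run to the next. Here is a sketch that sorts the distinct entities and also counts entities per category, using the ten rows shown in the table above. `unique_texts` and `category_counts` are hypothetical helper names of mine.

```python
from collections import Counter

# (Text, Category) pairs copied from the ten table rows above.
rows = [
    ('62yo', 'PROTECTED_HEALTH_INFORMATION'),
    ('chest', 'ANATOMY'),
    ('chest pain', 'MEDICAL_CONDITION'),
    ('dizziness', 'MEDICAL_CONDITION'),
    ('chest', 'ANATOMY'),
    ('chest pain', 'MEDICAL_CONDITION'),
    ('pain', 'MEDICAL_CONDITION'),
    ('sternal', 'ANATOMY'),
    ('chest areas', 'ANATOMY'),
    ('chest', 'ANATOMY'),
]
sample_entities = [{'Text': text, 'Category': category} for text, category in rows]


def unique_texts(entities):
    """Distinct entity texts in sorted order, so the result is reproducible."""
    return sorted({ent['Text'] for ent in entities})


def category_counts(entities):
    """Number of detected entities in each Category."""
    return Counter(ent['Category'] for ent in entities)


print(' | '.join(unique_texts(sample_entities)))
print(category_counts(sample_entities).most_common())
```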

<h1> RESOURCES </h1>
<ul>
    <li>__Comprehend [Sentiment Analysis](https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html) how-to page.__</li>
    <li>__Comprehend [Key Phrase Extraction](https://docs.aws.amazon.com/comprehend/latest/dg/get-started-api-key-phrases.html) how-to page.__</li>
    <li>__Comprehend [Named Entities Extraction](https://docs.aws.amazon.com/comprehend/latest/dg/API_Entity.html) how-to page.__</li>
    <li>__Comprehend [Dominant Language Detection](https://docs.aws.amazon.com/comprehend/latest/dg/how-languages.html) how-to page.__</li>
    <li>__Comprehend [Custom Classification](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html) how-to page.__</li>
    <li>__Comprehend [Custom Named Entity Recognition](https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html) how-to page.__</li>
    <li>__Comprehend [Medical](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehendmedical.html) how-to page.__</li>
    <li>__My Tutorial - [The Power of Comprehend](https://www.sammywealth.com/the-power-of-comprehend)__.</li>
    
</ul>
    