## Easily extract text and data from virtually any document - Amazon Textract

**Image:**


![simple-document-image.jpg](attachment:simple-document-image.jpg)

In [2]:
import boto3

# Document
documentName = "simple-document-image.jpg"

# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

# Amazon Textract client
textract = boto3.client('textract')

# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes})

# print(response)

# Print detected text, one line per LINE block, in blue (ANSI escape codes)
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print('\033[94m' + item["Text"] + '\033[0m')

Amazon.com, Inc. is located in Seattle, WA
It was founded July 5th, 1994 by Jeff Bezos
Amazon.com allows customers to buy everything from books to blenders
Seattle is north of Portland and south of Vancouver, BC.

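The loop above prints every `LINE` block. As a minimal sketch, the same filtering can be wrapped in a helper that also drops low-confidence lines. The helper name `extract_lines`, the `min_confidence` parameter, and the sample dictionary (including its confidence values) are assumptions made for illustration, not Textract output from this notebook.

```python
def extract_lines(response, min_confidence=0.0):
    """Return the text of each LINE block whose confidence meets the threshold."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        # Textract reports Confidence as a percentage from 0 to 100.
        if block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block["Text"])
    return lines


# Hand-written stand-in for a detect_document_text response (hypothetical values).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Amazon.com, Inc. is located in Seattle, WA", "Confidence": 99.5},
        {"BlockType": "WORD", "Text": "Amazon.com,", "Confidence": 99.7},
        {"BlockType": "LINE", "Text": "smudged line", "Confidence": 42.0},
    ]
}

print(extract_lines(sample_response, min_confidence=90.0))
```

With a real call, `extract_lines(response)` could be applied directly to the `response` returned by `detect_document_text`.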


## Discover insights and relationships in text - Amazon Comprehend



In [14]:
import boto3

text = """Amazon.com, Inc. is located in Seattle, WA
It was founded July 5th, 1994 by Jeff Bezos
Amazon.com allows customers to buy everything from books to blenders
Seattle is north of Portland and south of Vancouver, BC."""

print("\nText\n========\n{}".format(text))

# Amazon Comprehend client
comprehend = boto3.client('comprehend')

# Detect sentiment
sentiment = comprehend.detect_sentiment(LanguageCode="en", Text=text)
print("\nSentiment\n========\n{}".format(sentiment.get('Sentiment')))

# Detect entities
entities = comprehend.detect_entities(LanguageCode="en", Text=text)
print("\nEntities\n========")
for entity in entities["Entities"]:
    print("{}\t=>\t{}".format(entity["Type"], entity["Text"]))


Text
========
Amazon.com, Inc. is located in Seattle, WA
It was founded July 5th, 1994 by Jeff Bezos
Amazon.com allows customers to buy everything from books to blenders
Seattle is north of Portland and south of Vancouver, BC.

Sentiment
========
NEUTRAL

Entities
========
ORGANIZATION	=>	Amazon.com, Inc.
LOCATION	=>	Seattle, WA
DATE	=>	July 5th, 1994
PERSON	=>	Jeff Bezos
ORGANIZATION	=>	Amazon.com
LOCATION	=>	Seattle
LOCATION	=>	Portland
LOCATION	=>	Vancouver, BC
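
The entity list above is flat. A small sketch of how it could be grouped by type and filtered by confidence follows. The helper name `group_entities`, the `min_score` parameter, and the scores in the sample dictionary are invented for illustration. Only the `Type`, `Text`, and `Score` fields of each entity are used.

```python
from collections import defaultdict


def group_entities(response, min_score=0.0):
    """Map each entity type to the distinct entity texts found for it."""
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        # Comprehend reports Score as a confidence between 0 and 1.
        if entity.get("Score", 0.0) < min_score:
            continue
        texts = grouped[entity["Type"]]
        if entity["Text"] not in texts:
            texts.append(entity["Text"])
    return dict(grouped)


# Hand-written stand-in for a detect_entities response (scores are made up).
sample_entities = {
    "Entities": [
        {"Type": "ORGANIZATION", "Text": "Amazon.com, Inc.", "Score": 0.99},
        {"Type": "LOCATION", "Text": "Seattle, WA", "Score": 0.98},
        {"Type": "LOCATION", "Text": "Seattle", "Score": 0.97},
        {"Type": "LOCATION", "Text": "Seattle", "Score": 0.97},
        {"Type": "PERSON", "Text": "Jeff Bezos", "Score": 0.40},
    ]
}

for entity_type, texts in group_entities(sample_entities, min_score=0.5).items():
    print("{}\t=>\t{}".format(entity_type, ", ".join(texts)))
```

Grouping makes repeated mentions such as `Seattle` appear once per type, which is easier to scan than the raw list.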


## Discover insights and relationships in domain-specific text - Amazon Comprehend Medical

In [15]:
import boto3

# Sample clinical note
text = """Patient visit notes
Pt is 40yo mother, high school teacher
HPI : Sleeping trouble on present dosage of Clonidine.
Severe Rash on face and leg, slightly itchy
Meds : Vyvanse 50 mgs po at breakfast daily,
Clonidine 0.2 mgs -- 1 and 1 / 2 tabs po qhs
HEENT : Boggy inferior turbinates, No oropharyngeal lesion
Lungs : clear
Heart : Regular rhythm
Skin : Mild erythematous eruption to hairline
Follow-up as scheduled"""

print("\nText\n========\n{}".format(text))

# Amazon Comprehend Medical client
comprehend = boto3.client('comprehendmedical')

# Detect medical entities
entities = comprehend.detect_entities(Text=text)
print("\nMedical Entities\n========")
for entity in entities["Entities"]:
    print("- {}".format(entity["Text"]))
    print("   Type: {}".format(entity["Type"]))
    print("   Category: {}".format(entity["Category"]))
    # Traits (for example SYMPTOM) are only present for some entities
    if entity["Traits"]:
        print("   Traits:")
        for trait in entity["Traits"]:
            print("    - {}".format(trait["Name"]))
    print("\n")


Text
========
Patient visit notes
Pt is 40yo mother, high school teacher
HPI : Sleeping trouble on present dosage of Clonidine.
Severe Rash on face and leg, slightly itchy
Meds : Vyvanse 50 mgs po at breakfast daily,
Clonidine 0.2 mgs -- 1 and 1 / 2 tabs po qhs
HEENT : Boggy inferior turbinates, No oropharyngeal lesion
Lungs : clear
Heart : Regular rhythm
Skin : Mild erythematous eruption to hairline
Follow-up as scheduled

Medical Entities
========
- 40yo
   Type: AGE
   Category: PROTECTED_HEALTH_INFORMATION


- teacher
   Type: PROFESSION
   Category: PROTECTED_HEALTH_INFORMATION


- Sleeping trouble
   Type: DX_NAME
   Category: MEDICAL_CONDITION
   Traits:
    - SYMPTOM


- Clonidine
   Type: GENERIC_NAME
   Category: MEDICATION


- Rash
   Type: DX_NAME
   Category: MEDICAL_CONDITION
   Traits:
    - SYMPTOM


- face
   Type: SYSTEM_ORGAN_SITE
   Category: ANATOMY


- leg
   Type: SYSTEM_ORGAN_SITE
   Category: ANATOMY


- itchy
   Type: DX_NAME
   Category: MEDICAL_CONDITION
   Traits:
    - SYM
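
Each medical entity carries a `Category`, so one possible follow-up is to pull out a single category, such as medications or protected health information. The sketch below assumes a hypothetical helper `entities_by_category` and uses a hand-written sample built from three of the entities printed above. It is not a full Comprehend Medical response.

```python
def entities_by_category(response, category):
    """Return (text, type, trait names) for every entity in the given category."""
    matches = []
    for entity in response.get("Entities", []):
        if entity.get("Category") != category:
            continue
        # Traits may be an empty list; keep only the trait names.
        traits = [trait["Name"] for trait in entity.get("Traits", [])]
        matches.append((entity["Text"], entity["Type"], traits))
    return matches


# Hand-written stand-in that mirrors three entities from the output above.
sample_medical = {
    "Entities": [
        {"Text": "40yo", "Type": "AGE",
         "Category": "PROTECTED_HEALTH_INFORMATION", "Traits": []},
        {"Text": "Sleeping trouble", "Type": "DX_NAME",
         "Category": "MEDICAL_CONDITION", "Traits": [{"Name": "SYMPTOM"}]},
        {"Text": "Clonidine", "Type": "GENERIC_NAME",
         "Category": "MEDICATION", "Traits": []},
    ]
}

print(entities_by_category(sample_medical, "MEDICAL_CONDITION"))
print(entities_by_category(sample_medical, "MEDICATION"))
```

Filtering on `PROTECTED_HEALTH_INFORMATION` in the same way would list the items to review before sharing a note.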

## Translating the text - Amazon Translate

In [28]:
import boto3

# Document
s3BucketName = "ki-textract-demo-docs"
documentName = "simple-document-image.jpg"

# Amazon Textract client
textract = boto3.client('textract')

# Call Amazon Textract
response = textract.detect_document_text(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': documentName
        }
    })

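# print(response)

# Note: the Textract response is not used below; the text detected
# earlier for this document is hard-coded instead.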

text = """Amazon.com, Inc. is located in Seattle, WA
It was founded July 5th, 1994 by Jeff Bezos
Amazon.com allows customers to buy everything from books to blenders
Seattle is north of Portland and south of Vancouver, BC."""
textLines = text.splitlines()

# Amazon Translate client
translate = boto3.client('translate')
print('')
for item in textLines:
    # Source line in blue, translation in green (ANSI escape codes)
    print('\033[94m' + item + '\033[0m')
    result = translate.translate_text(Text=item, SourceLanguageCode="en", TargetLanguageCode="es")
    print('\033[92m' + result.get('TranslatedText') + '\033[0m')
    print('')


Amazon.com, Inc. is located in Seattle, WA
Amazon.com, Inc. se encuentra en Seattle, WA

It was founded July 5th, 1994 by Jeff Bezos
Fue fundada el 5 de julio de 1994 por Jeff Bezos

Amazon.com allows customers to buy everything from books to blenders
Amazon.com permite a los clientes comprar de todo, desde libros hasta licuadoras

Seattle is north of Portland and south of Vancouver, BC.
Seattle está al norte de Portland y al sur de Vancouver, BC.

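The cell above translates a hard-coded string. A rough sketch of feeding the lines detected by Textract into Amazon Translate might look like the following. The helper name `translate_lines` and the `FakeTranslate` stub are assumptions for illustration. The stub only imitates the shape of a `translate_text` response so the example runs without AWS credentials.

```python
def translate_lines(lines, translate_client, source="en", target="es"):
    """Translate each non-empty line and return (original, translated) pairs."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than sending them to the service
        result = translate_client.translate_text(
            Text=line,
            SourceLanguageCode=source,
            TargetLanguageCode=target,
        )
        pairs.append((line, result["TranslatedText"]))
    return pairs


class FakeTranslate:
    """Hypothetical stand-in for boto3.client('translate'); it does not translate."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        return {"TranslatedText": "[{}] {}".format(TargetLanguageCode, Text)}


for original, translated in translate_lines(["Seattle is north of Portland", ""], FakeTranslate()):
    print(original)
    print(translated)

# With real clients, assuming `response` comes from detect_document_text:
#   lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
#   pairs = translate_lines(lines, boto3.client('translate'))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub and leaves the choice of region and credentials to the caller.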
