### Amazon Comprehend

In this notebook, we will take a look at Amazon Comprehend, an NLP service that can be used to find insights and relationships in text. Comprehend can detect the dominant language, entities, key phrases, and sentiment of a given text.

To start using the Comprehend API, we need to initialize a client:

In [1]:
import boto3
from pprint import pprint

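# The session picks up credentials and the default region from the standard
# AWS configuration (e.g. ~/.aws/config or environment variables).
session = boto3.session.Session()
comprehend_client = session.client('comprehend')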
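Each capability listed in the introduction corresponds to one method on this client. The sketch below assumes the boto3 Comprehend method names `detect_dominant_language`, `detect_entities`, `detect_key_phrases` and `detect_sentiment`; the helper name `analyze_text` and its default language code `'en'` are hypothetical choices for illustration. The client is passed in as a parameter, so the function can also be exercised with a stand-in object.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Run four Comprehend detectors on one text (hypothetical helper).

    `client` is expected to behave like a boto3 Comprehend client,
    e.g. boto3.session.Session().client('comprehend').
    """
    return {
        # Language detection needs only the text itself.
        "languages": client.detect_dominant_language(Text=text)["Languages"],
        # The other detectors also need the language of the text.
        "entities": client.detect_entities(
            Text=text, LanguageCode=language_code)["Entities"],
        "key_phrases": client.detect_key_phrases(
            Text=text, LanguageCode=language_code)["KeyPhrases"],
        "sentiment": client.detect_sentiment(
            Text=text, LanguageCode=language_code)["Sentiment"],
    }
```

In this notebook it would be called as `analyze_text(comprehend_client, text)`. Note that each of the four calls is a separate API request.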

#### Detect dominant language

In our first task, we will detect the dominant language of several texts. The first example analyzes the following English text:

In [2]:
text = """
Amazon Comprehend is a natural language processing (NLP) service
that uses machine learning to find insights and relationships in text.
Amazon Comprehend identifies the language of the text;
extracts key phrases, places, people, brands, or events;
understands how positive or negative the text is;
and automatically organizes a collection of text files by topic.
"""

response = comprehend_client.detect_dominant_language(
    Text=text
)

pprint(response)

{'Languages': [{'LanguageCode': 'en', 'Score': 0.9784398674964905}],
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '64',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sun, 25 Mar 2018 21:00:51 GMT',
                                      'x-amzn-requestid': '99ed77ec-306f-11e8-a4f8-cd2162c514fd'},
                      'HTTPStatusCode': 200,
                      'RequestId': '99ed77ec-306f-11e8-a4f8-cd2162c514fd',
                      'RetryAttempts': 0}}


The response contains a list of languages, each with a corresponding score (confidence level). In the first example, Comprehend is almost certain (a score of about 0.98) that the provided text is English.
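Because `Languages` is a list of candidates, a small helper that picks the highest-scoring one can be convenient. This is a minimal local sketch, not a Comprehend API call: `top_language` is a hypothetical name, and it assumes only the response shape printed above (a `Languages` list of dicts with `LanguageCode` and `Score` keys).

```python
from typing import Any, Dict, Optional, Tuple


def top_language(response: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (language_code, score) of the most likely language, or None."""
    languages = response.get("Languages", [])
    if not languages:
        return None
    # Pick the candidate with the highest confidence score.
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]


# Response body from the first example (ResponseMetadata omitted).
example = {"Languages": [{"LanguageCode": "en", "Score": 0.9784398674964905}]}
print(top_language(example))  # ('en', 0.9784398674964905)
```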

Let's now try to detect the language of a text written partly in Polish and partly in English.

In [8]:
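# The first line is Polish for "It can identify the language of the given text;",
# the remaining lines are English.
text = """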
Potrafi identyfikować język podanego tekstu;
extracts key phrases, places, people, brands, or events;
understands how positive or negative the text is;
and automatically organizes a collection of text files by topic.
"""

response = comprehend_client.detect_dominant_language(
    Text=text
)

pprint(response)

{'Languages': [{'LanguageCode': 'en', 'Score': 0.7801410555839539},
               {'LanguageCode': 'pl', 'Score': 0.10633250325918198}],
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '114',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sun, 25 Mar 2018 21:05:31 GMT',
                                      'x-amzn-requestid': '40eb6883-3070-11e8-9d52-a71b4437c023'},
                      'HTTPStatusCode': 200,
                      'RequestId': '40eb6883-3070-11e8-9d52-a71b4437c023',
                      'RetryAttempts': 0}}


In this example, Comprehend is still fairly confident (a score of about 0.78) that the dominant language of the provided text is English, but it also lists Polish (a score of about 0.11) as one of the possibilities.
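For mixed-language text, it can be more useful to keep every candidate above a confidence threshold rather than only the top one. The sketch below assumes the response shape shown above; the helper name `languages_above` and the threshold values used in the example are arbitrary, illustrative choices, not Comprehend defaults.

```python
from typing import Any, Dict, List, Tuple


def languages_above(response: Dict[str, Any], threshold: float) -> List[Tuple[str, float]]:
    """Return (code, score) pairs with score >= threshold, highest score first."""
    candidates = [
        (lang["LanguageCode"], lang["Score"])
        for lang in response.get("Languages", [])
        if lang["Score"] >= threshold
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Response body from the mixed Polish/English example (ResponseMetadata omitted).
mixed = {
    "Languages": [
        {"LanguageCode": "en", "Score": 0.7801410555839539},
        {"LanguageCode": "pl", "Score": 0.10633250325918198},
    ]
}
print([code for code, _ in languages_above(mixed, 0.5)])  # ['en']
print([code for code, _ in languages_above(mixed, 0.1)])  # ['en', 'pl']
```

A lower threshold surfaces secondary languages such as Polish here, at the cost of admitting weaker guesses; where to draw the line depends on the application.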