# AWS Comprehend Entity Extraction

Jupyter notebook to demonstrate how to set up and use AWS Comprehend for entity extraction.

* [Amazon Comprehend examples using SDK for Python (Boto3)](https://docs.aws.amazon.com/code-library/latest/ug/python_3_comprehend_code_examples.html)
* [Comprehend.Client.detect_entities(**kwargs)](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend/client/detect_entities.html#)
* [Permissions to allow all Amazon Comprehend actions](https://docs.aws.amazon.com/comprehend/latest/dg/security_iam_id-based-policy-examples.html#custom-policy-all-all-actions)
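
The linked policy allows every Comprehend action. For the two calls this notebook makes, a narrower identity-based policy is probably enough. The sketch below builds one as a Python dictionary. The choice of actions and the `"Resource": "*"` setting are assumptions to review against your own account rules, not a policy taken from this project.

```python
import json

# Assumption: only the two Comprehend APIs called in this notebook are needed.
comprehend_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectDominantLanguage",
                "comprehend:DetectEntities",
            ],
            "Resource": "*",
        }
    ],
}

# Serialize to the JSON document that IAM expects.
policy_document = json.dumps(comprehend_policy, indent=2)
print(policy_document)
```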

In [3]:
!apt -qq update -y
!apt -qq install zip

61 packages can be upgraded. Run 'apt list --upgradable' to see them.
zip is already the newest version (3.0-11+b1).
0 upgraded, 0 newly installed, 0 to remove and 61 not upgraded.


In [6]:
import json
import sys
import time

In [None]:
PATH_TO_LIB: str = "../../../../lib"

In [7]:
sys.path.append(PATH_TO_LIB) 

In [26]:
from util_aws.boto3.lambdas import (
    LambdaFunction
)

# Constant

In [4]:
AWS_SERVICE: str = "comprehend"
LAMBDA_FUNCTION_NAME: str = "tagging-poc-dev-comprehend"

# Data

In [5]:
example="""
在 SBS On Demand 可免费收看《世界上最幸福的国度》（The World's Happiest Country）。

澳大利亚人梅丽莎·乔治奥（Melissa Georgiou）十多年前移居芬兰，在地球上最寒冷、最黑暗的地方之一寻求幸福。

梅丽莎说：“在这里生活，我最喜欢的事情之一是，无论你是在住宅区还是在城市中间，都很容易接近大自然。”
她原本是一名教师，12年前，她从悉尼的海滩换到了芬兰的黑暗冬天和冰冷湖泊，此后再也没有回头。
梅丽莎说：“对芬兰人来说，幸福的概念与澳大利亚人的幸福概念非常不同。

她说，芬兰人乐于接受将自己描绘成忧郁、矜持的形象——当地流行的一句话是：“拥有幸福的人必须把它隐藏起来。”

“在这里，我注意到的第一件事是，你不会去参加晚宴或烧烤，也不会谈论房地产。没有人问你住在哪里，住在哪个郊区，你的孩子在哪里上学。” 
芬兰人似乎对现状相当满意，他们似乎并不总是想要更多。
梅丽莎·乔治奥
北欧的黑夜
在联合国最新发布的《世界幸福报告》中，芬兰连续第六年被评为世界上最幸福的国家。

幸福专家和研究员弗兰克·马特拉（Frank Martela）解释说：“北欧国家往往是有（良好）失业救济、养老金和其他福利的国家。”
但弗兰克说，芬兰在排名中的位置往往让其本国人民感到惊讶。

“芬兰人，他们几乎感到愤怒，因为他们觉得这不可能是真的。我们听的是悲伤的音乐，还有硬摇滚。”
“因此，幸福并不是芬兰人自我形象的一部分。”

芬兰人忧郁的另一面是对毅力的文化关注，弗兰克说这重新定义了芬兰人看待幸福的方式——一个被称为“sisu”的概念——这是芬兰文化的一部分，很难直译，但可理解为意志、决心、毅力和理性面对逆境。

他说，这在芬兰人最喜欢的消遣方式中得到了最好的体现——在冰点以下的气温中泡完海澡后，在桑拿房里取暖。

“这是关于这种矛盾——从一个极端到另一个极端，而这是相当有趣的体验......因为你需要毅力。”

梅丽莎说，但芬兰有很多东西是伟大的，可以为这个国家的人提供幸福。
芬兰是受新冠大流行影响最小的欧洲国家之一，专家们将此归功于对政府的高度信任和对遵循限制措施的较小阻力。

而对政府的信任则源于国家对其公民的投资。

公立学校系统很少对儿童进行测试，是世界上最好的学校之一。芬兰也有一个全民医疗保健系统，有民众负担得起的儿童保育和对父母的有力支持。

梅丽莎说：“整个国家都在照顾孩子的成长，这个制度设置得非常好。因此，从生下我儿子到在家抚养他，再到送他去日托，再到上学，这一切的每个方面都得到了很好的支持。”
芬兰VS中国 幸福感哪国最强？
自《世界幸福报告》发布以来，北欧国家一直在前十名中占主导地位。在今年的报告中，芬兰及其邻国丹麦（第2名）、冰岛（第3名）、瑞典（第6名）和挪威（第7名）在幸福指标中的得分都很高，包括健康的预期寿命、人均GDP、低腐败程度、社会支持、自由、信任与慷慨等。

其他位列前十的国家/地区包括，荷兰（第5名）、瑞士（第8名）、卢森堡（第9名）及新西兰（第10名）。

澳大利亚在这份报告中排名第12名，紧随其后的是加拿大（第13名）、爱尔兰（第14名）、美国（第15名）。

亚洲地区，新加坡排全球第25名，较去年上升两位，台湾较去年下跌一位到第27名，日本排名升至第47名，中国大陆排第64名，香港排名第82名。

 与此同时，民调机构益普索集团（Ipsos）业发布了一份有关全球幸福指数的调查报告，结果显示，在32个国家中，幸福感指数最高的国家是中国（91%），其后是沙特阿拉伯（86%）、荷兰（85%）、印度（84%）、巴西（83%）。

澳大利亚在这份报告中位列第9名。

调查报告指出，平均而言，中等收入国家（按照世界银行定义）的幸福感，比高收入国家的幸福感增长得更明显。
"""

# Directory Arrangement

In [6]:
%%bash -s "$AWS_SERVICE"
# Use a separate variable instead of overwriting the shell's own PWD.
CURRENT_DIR=$(basename "$(pwd)")
if [[ "${CURRENT_DIR}" != "$1" ]] ; then
    echo "make sure to be in ./${1} directory"
    exit 1
fi

In [7]:
%rm -rf ./package && mkdir ./package 
%cd ./package
PYTHON_DEPENDENCY_DIRECTORY=%pwd

/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend/package


In [8]:
PYTHON_DEPENDENCY_DIRECTORY

'/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend/package'

## PYTHONPATH

Make sure to use the directory of the intended dependency packages.

In [9]:
if PYTHON_DEPENDENCY_DIRECTORY != sys.path[0]:
    sys.path.insert(0, PYTHON_DEPENDENCY_DIRECTORY)
    print(sys.path)

['/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend/package', '/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend', '/opt/conda/lib/python37.zip', '/opt/conda/lib/python3.7', '/opt/conda/lib/python3.7/lib-dynload', '', '/opt/conda/lib/python3.7/site-packages', '/opt/conda/lib/python3.7/site-packages/IPython/extensions', '/root/.ipython']


## Python dependencies 

Keep all the Python dependencies inside the package directory and use only those, to avoid Python dependency hell. The dependencies are also packaged into the Lambda deployment package so that the same versions used in development are used at Lambda runtime.
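
One way to confirm that an import really resolves inside the package directory is to compare the module's file path with that directory. The helper below is a minimal sketch; `is_loaded_from` is a hypothetical name, and the demonstration uses the standard-library `json` module as a stand-in so that it runs without any installed package.

```python
import json
import os
from types import ModuleType


def is_loaded_from(module: ModuleType, directory: str) -> bool:
    """Return True if the module's source file lives under the directory."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:  # built-in modules have no file
        return False
    module_path = os.path.realpath(module_file)
    root = os.path.realpath(directory)
    return os.path.commonpath([module_path, root]) == root


# Stand-in demonstration with the standard-library json module.
stdlib_dir = os.path.dirname(os.path.dirname(json.__file__))
inside = is_loaded_from(json, stdlib_dir)
outside = is_loaded_from(json, os.path.join(stdlib_dir, "no-such-package-dir"))
print(inside, outside)
```

In this notebook the equivalent check would be `is_loaded_from(boto3, PYTHON_DEPENDENCY_DIRECTORY)` after `import boto3`.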


In [10]:
!echo $PYTHON_DEPENDENCY_DIRECTORY
!pip install --target $PYTHON_DEPENDENCY_DIRECTORY botocore boto3 --quiet

/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend/package
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.
conda 22.9.0 requires ruamel_yaml_conda>=0.11.14, which is not installed.
sagemaker 2.145.0 requires importlib-metadata<5.0,>=1.4.0, but you have importlib-metadata 6.3.0 which is incompatible.
sagemaker 2.145.0 requires PyYAML==5.4.1, but you have pyyaml 6.0 which is incompatible.
docker-compose 1.29.2 requires PyYAML<6,>=3.10, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requires botocore==1.29.111, but you have botocore 1.29.126 which is incompatible.
awscli 1.27.111 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requir

---
# Define Function


## Lambda Function Code

Define your Lambda function. In this example, the code calls Amazon Comprehend through the AWS SDK (Boto3) to detect the dominant language of the text and extract its entities.

* [Lambda function handler in Python](https://docs.aws.amazon.com/lambda/latest/dg/python-handler.html)
* [Boto3 Error handling](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/error-handling.html)
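
When the request carries no `language_code`, the handler asks Comprehend for the dominant language and uses the top result. The sketch below shows that selection on a made-up response in the shape returned by `detect_dominant_language`. The scores are illustrative, not real output, and `pick_dominant_language` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Illustrative response; the language codes and scores are invented.
response: Dict[str, List[Dict[str, Any]]] = {
    "Languages": [
        {"LanguageCode": "en", "Score": 0.03},
        {"LanguageCode": "zh", "Score": 0.96},
    ]
}


def pick_dominant_language(languages: List[Dict[str, Any]]) -> str:
    """Return the language code with the highest confidence score."""
    if not languages:
        raise ValueError("no language detected")
    return max(languages, key=lambda item: item["Score"])["LanguageCode"]


dominant = pick_dominant_language(response["Languages"])
print(dominant)  # zh
```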


In [11]:
%%writefile lambda_function.py
import os
import re
import sys
import json
import logging
from typing import (
    List,
    Dict,
    Any,
)

import botocore
import boto3


# --------------------------------------------------------------------------------
# Constant
# --------------------------------------------------------------------------------
DEFAULT_LOG_LEVEL_NAME: str = "ERROR"
SUPPORTED_LANGUAGES: List[str] = [  # As of 04MAY2023
    "de", 
    "en", 
    "es", 
    "it", 
    "pt", 
    "fr", 
    "ja", 
    "ko", 
    "hi", 
    "ar", 
    "zh", 
    "zh-TW"
]


# --------------------------------------------------------------------------------
# Utility
# --------------------------------------------------------------------------------
def is_valid_log_level_name(name: str) -> bool:
    """Check if the log level name is valid
    Args:
        name: log level name
    Returns: bool
    """
    return hasattr(logging, name)


def get_log_level(name: str) -> int:
    """Get log level integer from log level name
    Args:
        name: log level name
    Returns: logging level integer
    """
    assert is_valid_log_level_name(name), f"Invalid log level name {name}."
    return getattr(logging, name)


def get_log_level_from_environment_variable(
        log_level_variable_name: str = "LOG_LEVEL_NAME"
) -> int:
    """Get log level for the log level name specified in the environment variable
    Returns: log level int or DEFAULT_LOG_LEVEL
    """
    log_level_name: str = DEFAULT_LOG_LEVEL_NAME
    if log_level_variable_name in os.environ:
        # Validate the value of the environment variable, not the variable name.
        candidate: str = os.environ[log_level_variable_name].upper()
        if is_valid_log_level_name(candidate):
            log_level_name = candidate

    return get_log_level(name=log_level_name)


# --------------------------------------------------------------------------------
# AWS service class
# --------------------------------------------------------------------------------
class ComprehendDetect:
    """Encapsulates Comprehend detection functions."""
    def __init__(self, comprehend_client):
        """
        :param comprehend_client: A Boto3 Comprehend client.
        """
        self.comprehend_client = comprehend_client


    def detect_dominant_language(self, text):
        """
        Detect dominant languages in the text.
        Returns: Sorted list of {
            'LanguageCode': str,
            'Score': float
        }
        Raises: RuntimeError when API call fails
        """
        try:
            response = self.comprehend_client.detect_dominant_language(
                Text=text
            )
            detections: List[Dict[str, Any]] = sorted(
                response['Languages'], 
                key=lambda detection: detection['Score'],
                reverse=True    # highest score first so that detections[0] is the dominant language
            )
            logger.debug("detected languages %s", detections)
            
        except botocore.exceptions.ClientError as error:
            msg: str = f"Comprehend.detect_dominant_language() failed due to {error}\n" \
                       f"text=[{text}]."
            logger.error(msg)
            raise RuntimeError(msg) from error
        else:
            return detections
            

    def detect_entities(self, text, language_code):
        """
        Detects entities in a document. Entities can be things like people and places
        or other common terms.

        :param text: The document to inspect.
        :param language_code: The language of the document.
        :return: The list of entities along with their confidence scores.
        """
        try:
            response = self.comprehend_client.detect_entities(
                Text=text, LanguageCode=language_code
            )
            entities = response['Entities']
            logger.info("Detected %s entities.", len(entities))
            
        except botocore.exceptions.ClientError as error:
            msg: str = f"Comprehend.detect_entities() failed due to {error}\n" \
                       f"text=[{text}]\nlanguage_code=[{language_code}]."
            logger.error(msg)
            raise RuntimeError(msg) from error
        else:
            return entities


        
# --------------------------------------------------------------------------------
# Lambda utility
# --------------------------------------------------------------------------------
def get_palyload_from_event(event: dict):
    """Get the payload from the escaped JSON string in event['body']
    Returns: (text, language_code, entity_type)
    Raises: RuntimeError if expected elements are not in the event.
    """
    # --------------------------------------------------------------------------------
    # Validate 'text' element in event['body']
    # --------------------------------------------------------------------------------
    if not (isinstance(event, dict) and 'body' in event ):
        msg: str = f"'body' element does not exist in the event."
        logger.error("%s event:\n%s", msg, event)
        raise RuntimeError("invalid request payload")

    if not isinstance(event['body'], str):
        msg: str = f"expected event['body'] as escaped string, got {type(event['body'])}."
        logger.error("%s event:\n%s", msg, event)
        raise RuntimeError(msg)

    try:
        body = json.loads(event['body'])
    except (json.JSONDecodeError, TypeError) as error:
        # 'body' is unbound when parsing fails, so report the raw request body.
        msg: str = f"JSON is required as the request payload, got [{event['body']}]"
        logger.error(msg)
        raise RuntimeError(msg) from error
        
    if not isinstance(body, dict):
        msg: str = f"Dictionary format is required as the request payload, got {type(body)}"
        logger.error(msg)
        raise RuntimeError(msg)

    if 'text' not in body:
        msg: str = f"require 'text' in the event."
        logger.error("%s event:\n%s", msg, event)
        raise RuntimeError(msg)

    # if 'language_code' not in body:
    #     msg: str = f"require 'language_code' in the event."
    #     logger.error("%s event:\n%s", msg, event)
    #     raise RuntimeError(msg)

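    # --------------------------------------------------------------------------------
    # Validate the type of text before normalizing it (re.sub raises TypeError on non-strings)
    # --------------------------------------------------------------------------------
    if not isinstance(body['text'], str):
        msg: str = f"expected 'text' element as string in the payload."
        logger.error("%s got type:[%s] of value:[%s]", msg, type(body['text']), body['text'])
        raise RuntimeError(msg)

    # --------------------------------------------------------------------------------
    # Extract payload elements from JSON/Dictionary
    # --------------------------------------------------------------------------------
    text: str = re.sub(r'[\s\'\"\n\t\\]+', ' ', body['text'], flags=re.MULTILINE).strip()
    language_code: str = body['language_code'] if 'language_code' in body else None
    entity_type: str = body['entity_type'] if 'entity_type' in body else None

    # --------------------------------------------------------------------------------
    # Validate text
    # --------------------------------------------------------------------------------
    if len(text) == 0:
        msg: str = f"expected valid 'text' element in the payload."
        logger.error("%s got empty text after normalization.", msg)
        raise RuntimeError(msg)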
    
    return text, language_code, entity_type
    

def extract_entity_by_type(response: Dict, entity_type: str):
    """Extract entities of the entity type
    If entity type is not specified, return response as is.
    If entity type is specified, extract the matching entities and
    return values as a list
    """
    if entity_type is None or len(entity_type) == 0:
        return response
    
    entities = dict()
    for entity in response:
        if entity['Type'].lower() == entity_type.lower():
            # The same text can appear at multiple locations in the document,
            # so there can be multiple entities with the same text.
            # When the same text is found again, keep the higher score.
            text: str = entity['Text'].lower()
            if text not in entities or entities[text] < entity['Score']:
                entities[text] = entity['Score']
                
    return sorted(entities, key=entities.get)


# --------------------------------------------------------------------------------
# Global instances to avoid re-instantiations
# --------------------------------------------------------------------------------
logger: logging.Logger = logging.getLogger(__name__)
logger.setLevel(level=get_log_level_from_environment_variable())

comprehend: ComprehendDetect = ComprehendDetect(
    comprehend_client=boto3.client('comprehend')
)


# --------------------------------------------------------------------------------
# Lambda handler
# --------------------------------------------------------------------------------
def lambda_handler(event, context):
    """Lambda function to invoke comprehend"""
    # --------------------------------------------------------------------------------
    # Extract payload elements from JSON/Dictionary
    # --------------------------------------------------------------------------------
    try:
        text, language_code, entity_type = get_palyload_from_event(event=event)
        
    except RuntimeError as error:
        return {
            "statusCode": 400,
            "headers": {
                "Content-Type": "application/json"
            },
            "body": json.dumps(
                {
                    "error": str(error),
                    "event": event
                },
                default=str
            )
        }
            
    # --------------------------------------------------------------------------------
    # Entity detection
    # --------------------------------------------------------------------------------
    try:
        # --------------------------------------------------------------------------------
        # Language detection if not specified
        # --------------------------------------------------------------------------------
        if language_code is None:
            languages: List[Dict[str, Any]] = comprehend.detect_dominant_language(text=text)
            # Use the highest score language
            language_code = languages[0]['LanguageCode']
        
        # --------------------------------------------------------------------------------
        # Entity detection
        # --------------------------------------------------------------------------------
        if language_code not in SUPPORTED_LANGUAGES:
            msg: str = f"detected language_code [{language_code}] is not supported. " \
                       f"set language_code to one of {SUPPORTED_LANGUAGES}."
            raise RuntimeError(msg)

        response = comprehend.detect_entities(text=text, language_code=language_code)
        results = extract_entity_by_type(response=response, entity_type=entity_type)
        if len(results) == 0:
            msg: str = f"no entity detected in the text"
            msg = msg + (f" for the entity_type [{entity_type}]." if entity_type else ".")
            
            logger.error("%s response:\n%s", msg, response)
            return {
                "statusCode": 400,
                "headers": {
                    "Content-Type": "application/json"
                },
                "body": json.dumps(
                    {
                        "error": msg,
                        "response": response,
                        "event": event
                    },
                    default=str
                )
            }

        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "application/json"
            },
            "body": json.dumps(results, sort_keys=True, default=str)
        }

    except (botocore.exceptions.ClientError, RuntimeError) as error:
        return {
            "statusCode": 500,
            "headers": {
                "Content-Type": "application/json"
            },
            "body": json.dumps({"error": error}, sort_keys=True, default=str)
        }

        
if __name__ == "__main__":
    with open("example.txt", "r", encoding='utf-8') as example_text:
        example: str = example_text.read()

    body: dict = {
        "text": example,
        # "language_code": "zh",
        "entity_type": "location"
    }
        
    body_as_escaped_string: str = json.dumps(
        body, 
        default=str, 
        ensure_ascii=True    # ASCII is 100% network safe.
    )
    event = {
        "body": body_as_escaped_string
    }
    
    # --------------------------------------------------------------------------------
    # Test the lambda handler
    # --------------------------------------------------------------------------------
    response: dict = lambda_handler(
        event=event,
        context=None
    )
    
    # --------------------------------------------------------------------------------
    # Restore the JSON/Dictionary from the body as escaped string.
    # --------------------------------------------------------------------------------
    response_body_as_dictionary = json.loads(response['body'])
    print(json.dumps(response_body_as_dictionary, indent=4, default=str, ensure_ascii=False))

Writing lambda_function.py


###  Test the Code

In [12]:
%store example >example.txt

Writing 'example' (str) to file 'example.txt'.


In [13]:
!python ./lambda_function.py

[
    "日本",
    "爱尔兰",
    "加拿大",
    "美国",
    "荷兰",
    "欧洲",
    "瑞士",
    "香港",
    "新西兰",
    "新加坡",
    "中国大陆",
    "亚洲地区",
    "卢森堡",
    "全球",
    "日托",
    "地球上",
    "芬兰",
    "悉尼"
]
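
The run above filters the Comprehend entities down to one type and de-duplicates them. The sketch below repeats that filtering logic on a small hand-written entity list so that it can be checked without calling AWS. The sample entities and scores are invented for illustration.

```python
from typing import Any, Dict, List


def extract_entity_by_type(response: List[Dict[str, Any]], entity_type: str) -> List[str]:
    """Keep entities of one type, de-duplicated by lower-cased text."""
    entities: Dict[str, float] = {}
    for entity in response:
        if entity["Type"].lower() == entity_type.lower():
            text = entity["Text"].lower()
            # Keep the higher score when the same text is detected again.
            if text not in entities or entities[text] < entity["Score"]:
                entities[text] = entity["Score"]
    # Texts ordered by ascending score, as in lambda_function.py.
    return sorted(entities, key=entities.get)


# Invented sample in the shape of the 'Entities' list from detect_entities.
sample = [
    {"Type": "LOCATION", "Text": "Finland", "Score": 0.90},
    {"Type": "LOCATION", "Text": "finland", "Score": 0.99},
    {"Type": "PERSON", "Text": "Melissa", "Score": 0.95},
    {"Type": "LOCATION", "Text": "Sydney", "Score": 0.80},
]

locations = extract_entity_by_type(sample, "location")
print(locations)  # ['sydney', 'finland']
```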


---
# Package

* [Deployment package with dependencies](https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-create-package-with-dependency)


## Dependency Libraries

Package the exact Python library versions you have used for development. Do not rely on the libraries installed in the AWS Lambda runtime to avoid Python package dependency hell.

## Zip File

Package the source code file and libraries into a zip file.


* [How do I troubleshoot "permission denied" or "unable to import module" errors when uploading a Lambda deployment package?](https://repost.aws/knowledge-center/lambda-deployment-package-errors)

> The correct permissions for all executable files within a Lambda deployment package is **644** in Unix permissions numeric notation. For folders within a deployment package, the correct permissions setting is **755**.
> ```
> $ chmod 644 $(find /tmp/package_contents -type f)
> $ chmod 755 $(find /tmp/package_contents -type d)
> $ zip -r new-lambda-package.zip *
> ```

In [14]:
!chmod -R u=rwX,go=rX .
!rm -rf ../lambda-deployment-package.zip
!zip -q -r ../lambda-deployment-package.zip .
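
The handler setting `lambda_function.lambda_handler` expects `lambda_function.py` at the root of the archive, so it can be worth checking the zip before uploading. The following is a minimal sketch with a hypothetical helper, `handler_at_archive_root`, demonstrated on throwaway archives rather than the real package.

```python
import os
import tempfile
import zipfile


def handler_at_archive_root(zip_path: str, handler_file: str = "lambda_function.py") -> bool:
    """Return True if the handler file is a top-level entry of the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return handler_file in archive.namelist()


# Demonstration on temporary archives instead of the real deployment package.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "lambda_function.py")
    with open(source, "w", encoding="utf-8") as f:
        f.write("def lambda_handler(event, context):\n    return {}\n")

    # Archive with the handler at the root.
    good_zip = os.path.join(workdir, "good.zip")
    with zipfile.ZipFile(good_zip, "w") as archive:
        archive.write(source, arcname="lambda_function.py")

    # Archive with the handler nested one directory down.
    nested_zip = os.path.join(workdir, "nested.zip")
    with zipfile.ZipFile(nested_zip, "w") as archive:
        archive.write(source, arcname="package/lambda_function.py")

    good_result = handler_at_archive_root(good_zip)
    nested_result = handler_at_archive_root(nested_zip)

print(good_result, nested_result)  # True False
```

For this notebook the call would be `handler_at_archive_root("../lambda-deployment-package.zip")` while still inside the package directory.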

In [15]:
%cd ..
%pwd

/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend


'/root/home/repository/git/sbs/tagging-poc/notebook/aws/comprehend'

In [16]:
%%bash -s "$AWS_SERVICE"
# Use a separate variable instead of overwriting the shell's own PWD.
CURRENT_DIR=$(basename "$(pwd)")
if [[ "${CURRENT_DIR}" != "$1" ]] ; then
    echo "make sure to be in ./${1} directory"
    exit 1
fi

---
# Deploy Function

Deploy the package using AWS CLI.

* [AWS CLI lambda - create-function](https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html)
* [Using Lambda with the AWS CLI](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-awscli.html)

> ### Create the execution role
> Create the execution role that gives your function permission to access AWS resources. 
> ```
> aws iam create-role --role-name lambda-ex --assume-role-policy-document file://trust-policy.json
> aws iam attach-role-policy --role-name lambda-ex --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
> ```
> **trust-policy.json**:
> ```
> {
>   "Version": "2012-10-17",
>   "Statement": [
>     {
>       "Effect": "Allow",
>       "Principal": {
>         "Service": "lambda.amazonaws.com"
>       },
>       "Action": "sts:AssumeRole"
>     }
>   ]
> }
> ```



We need to ask the AWS team to run the AWS CLI create-function command to deploy the function because we do not have the permission.

Once the Lambda function has been created, update its code as below.

In [17]:
%%bash -s "$LAMBDA_FUNCTION_NAME"
aws lambda update-function-code \
    --function-name $1 \
    --zip-file fileb://lambda-deployment-package.zip

{
    "FunctionName": "tagging-poc-dev-comprehend",
    "FunctionArn": "arn:aws:lambda:ap-southeast-2:755863699032:function:tagging-poc-dev-comprehend",
    "Runtime": "python3.9",
    "Role": "arn:aws:iam::755863699032:role/service-role/tagging-poc-lambda-for-sagemaker-role-0t1rnggf",
    "Handler": "lambda_function.lambda_handler",
    "CodeSize": 12439467,
    "Description": "",
    "Timeout": 60,
    "MemorySize": 128,
    "LastModified": "2023-05-04T07:29:20.000+0000",
    "CodeSha256": "vAmZp2kcvhF5nGWNYvZ0OD+GaFBUUGlif/7Xw3bdKjU=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "31c77b51-8c53-486c-aa36-7dd9c9d5bb7b",
    "State": "Active",
    "LastUpdateStatus": "InProgress",
    "LastUpdateStatusReason": "The function is being created.",
    "LastUpdateStatusReasonCode": "Creating",
    "PackageType": "Zip",
    "Architectures": [
        "x86_64"
    ],
    "EphemeralStorage": {
        "Size": 512
    },
    "SnapStart

Wait a while for the Lambda function update to complete; otherwise you can get the error:

```
{
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
```
when invoking the function.

In [18]:
time.sleep(5)
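
A fixed sleep may be too short for a large package. A hedged alternative is to poll the function's `LastUpdateStatus` (visible in the update output above) until it leaves `InProgress`. The sketch below keeps the polling loop independent of AWS by taking a callable. `wait_until_updated` and `get_status` are hypothetical names, and a fake status source stands in for the real API call.

```python
import time
from typing import Callable, Iterator


def wait_until_updated(
        get_status: Callable[[], str],
        timeout_seconds: float = 60.0,
        interval_seconds: float = 1.0,
) -> str:
    """Poll get_status() until it stops returning 'InProgress'."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "InProgress":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Lambda update still in progress")
        time.sleep(interval_seconds)


# Fake status source standing in for the real API call.
fake_statuses: Iterator[str] = iter(["InProgress", "InProgress", "Successful"])
final_status = wait_until_updated(lambda: next(fake_statuses), interval_seconds=0.01)
print(final_status)  # Successful
```

With Boto3, the callable could be something like `lambda: client.get_function_configuration(FunctionName=LAMBDA_FUNCTION_NAME)["LastUpdateStatus"]`, assuming `client = boto3.client("lambda")` and sufficient permissions.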

# Invoke Function

You can invoke the function through the console, the CLI, an SDK, or another AWS service. Under the hood, a Lambda function is invoked via an AWS HTTPS API call.

* [AWS API Lambda - Invoke](https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html)

```
POST /2015-03-31/functions/FunctionName/invocations?Qualifier=Qualifier HTTP/1.1
X-Amz-Invocation-Type: InvocationType
X-Amz-Log-Type: LogType
X-Amz-Client-Context: ClientContext

Payload
```

### Request Body

The ```Payload``` HTTP request body is a JSON document that is passed to the function as the ```event``` argument.

### Response

```
HTTP/1.1 StatusCode
X-Amz-Function-Error: FunctionError
X-Amz-Log-Result: LogResult
X-Amz-Executed-Version: ExecutedVersion

Payload
```

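A Lambda function can return the status, headers, and response body as a JSON document. With a direct Invoke API call this document comes back as the response payload; an HTTP front end such as an API Gateway proxy integration maps it onto the HTTP response.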

```
import json
        
def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json"
        },
        "body": json.dumps({
            "key": "value"
        })
    }
```
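
Because the handler wraps its result in this structure, a caller has to check `statusCode` and then decode the `body` string. The helper below is a minimal sketch; `parse_handler_response` is a hypothetical name and the sample responses are hand-written.

```python
import json
from typing import Any, Dict


def parse_handler_response(response: Dict[str, Any]) -> Any:
    """Decode the JSON body, raising if the handler reported an error."""
    body = json.loads(response["body"])
    if response.get("statusCode") != 200:
        raise RuntimeError(f"handler failed with {response.get('statusCode')}: {body}")
    return body


# Hand-written success response.
ok = {"statusCode": 200, "body": json.dumps(["芬兰", "悉尼"])}
parsed = parse_handler_response(ok)
print(parsed)

# Hand-written error response.
failed = {"statusCode": 400, "body": json.dumps({"error": "require 'text' in the event."})}
error_message = ""
try:
    parse_handler_response(failed)
except RuntimeError as error:
    error_message = str(error)
print(error_message)
```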


## Invoke via CLI

Use AWS CLI to invoke the deployed function.

* [AWS CLI lambda - invoke](https://docs.aws.amazon.com/cli/latest/reference/lambda/invoke.html)

In [22]:
with open("./package/example.txt", "r", encoding='utf-8') as example_text:
    text: str = example_text.read()

body: dict = {
    "text": text,
    # "language_code": "ru",
    "entity_type": "location"
}

body_as_escaped_string: str = json.dumps(
    body, 
    default=str, 
    ensure_ascii=True    # ASCII is 100% network safe.
)
payload = {
    "body": body_as_escaped_string
}

with open("payload.json", "w", encoding='utf-8') as payload_json:
    payload_json.write(json.dumps(payload, indent=4, default=str))
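
The payload is JSON nested inside JSON: the inner document is serialized to a string and stored under `body`. With `ensure_ascii=True`, non-ASCII characters are written as `\uXXXX` escapes and restored on decoding. The round trip can be sketched with a shortened sample text (the `sample_*` names are for illustration only):

```python
import json

sample_body = {"text": "芬兰 Finland", "entity_type": "location"}

# Inner document as an escaped string, then the outer event document.
sample_body_as_string = json.dumps(sample_body, ensure_ascii=True)
sample_payload_text = json.dumps({"body": sample_body_as_string})

# Every character on the wire is plain ASCII.
is_ascii_only = all(ord(ch) < 128 for ch in sample_payload_text)
print(is_ascii_only)

# Decoding twice restores the original dictionary, including the Chinese text.
restored = json.loads(json.loads(sample_payload_text)["body"])
print(restored == sample_body)
```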

In [23]:
%%bash -s "$LAMBDA_FUNCTION_NAME"
aws lambda invoke \
    --function-name $1 \
    --payload file://payload.json \
    response.json

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
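
The same invocation can be made from Python. The sketch below assumes a Boto3 Lambda client is passed in and uses its `invoke` call. `invoke_and_decode` is a hypothetical helper, and a small fake client stands in for AWS so that the flow can be run offline.

```python
import io
import json
from typing import Any, Dict


def invoke_and_decode(lambda_client: Any, function_name: str, event: Dict[str, Any]) -> Any:
    """Invoke the function and return the decoded 'body' of its result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(event).encode("utf-8"),
    )
    if "FunctionError" in response:
        raise RuntimeError(f"function error: {response['FunctionError']}")
    # The response Payload is a stream holding the handler's return value.
    result = json.loads(response["Payload"].read())
    return json.loads(result["body"])


class FakeLambdaClient:
    """Offline stand-in that mimics the shape of a Boto3 invoke response."""

    def invoke(self, FunctionName: str, Payload: bytes) -> Dict[str, Any]:
        result = {"statusCode": 200, "body": json.dumps(["芬兰"])}
        return {
            "StatusCode": 200,
            "Payload": io.BytesIO(json.dumps(result).encode("utf-8")),
        }


sample_event = {"body": json.dumps({"text": "芬兰", "entity_type": "location"})}
decoded = invoke_and_decode(FakeLambdaClient(), "tagging-poc-dev-comprehend", sample_event)
print(decoded)  # ['芬兰']
```

With real credentials, `boto3.client("lambda")` would be passed in place of the fake client.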


In [25]:
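with open(file="response.json", mode="r", encoding="utf-8") as f:
    content: dict = json.loads(f.read())
    body = json.loads(content['body'])
    
    results = set()
    print(body)
    
    # Without entity_type the body is a list of entity dictionaries to filter here.
    # With entity_type (as in this request) it is already a list of strings,
    # so nothing is added and the set stays empty.
    if isinstance(body, list):
        for entity in body:
            if isinstance(entity, dict) and entity.get('Type') == "LOCATION":
                results.add(entity['Text'])

    print(results)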

['日本', '爱尔兰', '加拿大', '美国', '荷兰', '欧洲', '瑞士', '香港', '新西兰', '新加坡', '中国大陆', '亚洲地区', '卢森堡', '全球', '日托', '地球上', '芬兰', '悉尼']
set()


# Cleanup

In [26]:
!rm -rf lambda-deployment-package.zip payload.json response.json 