# Named Entity Recognition

Named Entity Recognition (NER) is a subtask of information extraction that locates and classifies named entities in text into predefined categories such as person names, locations, times, and quantities.

Chatbot NER is a heuristic-based system that uses several NLP techniques to extract the necessary entities from a chat interface. A chatbot has to identify several entities, and each one must be distinguished by its type because different entity types have different detection logic.

For the first version, we only support the date, time, time_with_range, number and email detectors on SageMaker. We will add support for other detectors and languages on SageMaker soon.

In [None]:
! pip install boto3 sagemaker

In [20]:
import os
import json
import boto3
import sagemaker

In [21]:
# Consider using IAM roles instead

AWS_ACCESS_KEY = os.environ.get('AWS_ACCESS_KEY')
AWS_SECRET_KEY = os.environ.get('AWS_SECRET_KEY')
AWS_REGION = os.environ.get('AWS_REGION')

In [22]:
boto_session = boto3.session.Session(
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name=AWS_REGION,
)
sagemaker_session = sagemaker.session.Session(boto_session=boto_session)
real_time_predictor = sagemaker.predictor.RealTimePredictor(
    endpoint='chatbot-ner',
    sagemaker_session=sagemaker_session,
    content_type='application/json',
    accept='application/json',
)
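
Every detector in this notebook follows the same three steps: serialize the payload, call the endpoint, and parse the JSON response. A minimal sketch of a wrapper for those steps is shown here. The name `detect_entities` is hypothetical and not part of Chatbot NER or the SageMaker SDK; the only assumption about the predictor is that it exposes a `predict` method returning JSON text or bytes.

```python
import json


def detect_entities(predictor, payload):
    """Send one detection payload and return the list stored under 'data'.

    `predictor` is any object with a `predict(str)` method that returns
    JSON as text or bytes (hypothetical helper, for illustration only).
    """
    raw = predictor.predict(json.dumps(payload))
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # Fall back to an empty list if the response has no 'data' key.
    return json.loads(raw).get("data", [])
```

With a helper like this, a call such as `detect_entities(real_time_predictor, {"message": "...", "entity_name": "email", "entity_type": "email"})` would return the list of detections directly.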

## Date Detection

The Date Detector detects dates in a message. Use the timezone parameter to pass your current timezone to date detection. Check pytz.all_timezones for the list of all valid timezone values.

In [23]:
date_detector_payload = {
    "message": "Set a reminder for 25th December", 
    "entity_name": "date", 
    "entity_type": "date", 
    "timezone": "UTC"
}
date_predictions = real_time_predictor.predict(json.dumps(date_detector_payload))
date_predictions = json.loads(date_predictions)
print(date_predictions)

{'data': [{'detection': 'message', 'original_text': '25th december', 'entity_value': {'end_range': False, 'from': True, 'normal': False, 'value': {'mm': 12, 'yy': 2018, 'dd': 25, 'type': 'date'}, 'to': False, 'start_range': False}}]}
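
The date comes back as separate `dd`, `mm` and `yy` fields nested under `entity_value['value']`. A small sketch of turning that structure into a standard `datetime.date` follows. The response dictionary is copied from the output above; `to_date` is a hypothetical helper name.

```python
from datetime import date

# Response copied from the output printed above.
date_predictions = {
    'data': [{
        'detection': 'message',
        'original_text': '25th december',
        'entity_value': {
            'end_range': False, 'from': True, 'normal': False,
            'value': {'mm': 12, 'yy': 2018, 'dd': 25, 'type': 'date'},
            'to': False, 'start_range': False,
        },
    }]
}


def to_date(entity_value):
    """Build a datetime.date from the dd/mm/yy fields of a date entity."""
    value = entity_value['value']
    return date(value['yy'], value['mm'], value['dd'])


detected_dates = [to_date(item['entity_value']) for item in date_predictions['data']]
print([d.isoformat() for d in detected_dates])  # ['2018-12-25']
```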


## Time Detection

The Time Detector detects times in a message. Use the timezone parameter to pass your current timezone to time detection. Check pytz.all_timezones for the list of all valid timezone values.

In [9]:
time_detector_payload =  {
    "message": "John arrived at the bus stop at 13:50 hrs, expecting the bus to be there in 15 mins. But the bus was scheduled for 12:30 pm", 
    "entity_name": "time", 
    "entity_type": "time", 
    "timezone": "UTC"
}
time_predictions = real_time_predictor.predict(json.dumps(time_detector_payload))
time_predictions = json.loads(time_predictions)
print(time_predictions)

{'data': [{'detection': 'message', 'original_text': '12:30 pm', 'entity_value': {'mm': 30, 'hh': 12, 'nn': 'pm'}, 'language': 'en'}, {'detection': 'message', 'original_text': 'in 15 mins', 'entity_value': {'mm': '15', 'hh': 0, 'nn': 'df'}, 'language': 'en'}, {'detection': 'message', 'original_text': '13:50', 'entity_value': {'mm': 50, 'hh': 13, 'nn': 'hrs'}, 'language': 'en'}]}
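
The three detections above use different `nn` markers (`pm`, `df`, `hrs`), and `mm` is a string in one case and an integer in the others. The sketch below normalizes them into standard Python objects. It rests on assumptions that this notebook does not confirm: `df` is read as an offset from the current time ("in 15 mins"), and `hrs` as a 24-hour clock value. `normalize_time` is a hypothetical helper name.

```python
from datetime import time, timedelta

# entity_value dictionaries copied from the output printed above.
time_entities = [
    {'mm': 30, 'hh': 12, 'nn': 'pm'},
    {'mm': '15', 'hh': 0, 'nn': 'df'},
    {'mm': 50, 'hh': 13, 'nn': 'hrs'},
]


def normalize_time(entity_value):
    """Return a timedelta for 'df' entities and a datetime.time otherwise."""
    hours = int(entity_value['hh'])
    minutes = int(entity_value['mm'])  # 'mm' may arrive as a string
    marker = entity_value['nn']
    if marker == 'df':
        # Assumption: 'df' marks a difference relative to the current time.
        return timedelta(hours=hours, minutes=minutes)
    if marker == 'pm' and hours < 12:
        hours += 12
    elif marker == 'am' and hours == 12:
        hours = 0
    # Assumption: any other marker (such as 'hrs') is already in 24-hour form.
    return time(hours, minutes)


for entity in time_entities:
    print(normalize_time(entity))
```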


## Time With Range Detection

The Time With Range Detector detects time ranges in a message. Use the timezone parameter to pass your current timezone to time range detection. Check pytz.all_timezones for the list of all valid timezone values.

In [12]:
time_with_range_detector_payload = {
    "message": "Set a drink water reminder for tomorrow from 7:00 AM to 6:00 PM", 
    "entity_name": "time_with_range", 
    "entity_type": "time_with_range", 
    "timezone": "UTC"
}
time_with_range_predictions = real_time_predictor.predict(json.dumps(time_with_range_detector_payload))
time_with_range_predictions = json.loads(time_with_range_predictions)
print(time_with_range_predictions)

{'data': [{'detection': 'message', 'original_text': '7:00 am to 6:00 pm', 'entity_value': {'mm': 0, 'hh': 7, 'range': 'start', 'nn': 'am', 'time_type': None}, 'language': 'en'}, {'detection': 'message', 'original_text': '7:00 am to 6:00 pm', 'entity_value': {'mm': 0, 'hh': 6, 'range': 'end', 'nn': 'pm', 'time_type': None}, 'language': 'en'}]}
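
A range is returned as two separate items that share the same `original_text` and are told apart by `range: 'start'` and `range: 'end'`. The following sketch pairs them into a single `(start, end)` tuple of `datetime.time` values. The helper names `to_24_hour` and `pair_range` are hypothetical, and the code assumes `nn` is either `am` or `pm`, as in the output above.

```python
from datetime import time

# entity_value dictionaries copied from the output printed above.
range_entities = [
    {'mm': 0, 'hh': 7, 'range': 'start', 'nn': 'am', 'time_type': None},
    {'mm': 0, 'hh': 6, 'range': 'end', 'nn': 'pm', 'time_type': None},
]


def to_24_hour(entity_value):
    """Convert an am/pm entity into a datetime.time on the 24-hour clock."""
    hours = int(entity_value['hh'])
    minutes = int(entity_value['mm'])
    if entity_value['nn'] == 'pm' and hours < 12:
        hours += 12
    elif entity_value['nn'] == 'am' and hours == 12:
        hours = 0
    return time(hours, minutes)


def pair_range(entity_values):
    """Return (start, end); either side is None if it was not detected."""
    bounds = {ev['range']: to_24_hour(ev) for ev in entity_values}
    return bounds.get('start'), bounds.get('end')


start, end = pair_range(range_entities)
print(start.isoformat(), end.isoformat())  # 07:00:00 18:00:00
```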


## Number Detection

The Number Detector detects numbers and numerals, including numbers written as words. You can configure min_digit and max_digit as per your requirement.

In [14]:
number_detector_payload = {
    "message": "I want to reserve a table for three people", 
    "entity_name": "number_of_people", 
    "entity_type": "number", 
    "min_digit": 1, 
    "max_digit": 2
}
number_predictions = real_time_predictor.predict(json.dumps(number_detector_payload))
number_predictions = json.loads(number_predictions)
print(number_predictions)

{'data': [{'detection': 'message', 'original_text': 'three', 'entity_value': {'value': '3'}, 'language': 'en'}]}
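
Note that the detected value is returned as the string `'3'`, not an integer. The sketch below casts it to `int` and re-checks it against digit bounds on the client side. The interpretation of `min_digit` and `max_digit` as lower and upper limits on the number of digits is an assumption based on the parameter names, not something this notebook states; `in_digit_bounds` is a hypothetical helper.

```python
# Response copied from the output printed above.
number_predictions = {
    'data': [{
        'detection': 'message',
        'original_text': 'three',
        'entity_value': {'value': '3'},
        'language': 'en',
    }]
}


def in_digit_bounds(value, min_digit, max_digit):
    """Check that an integer value has between min_digit and max_digit digits.

    Assumption: this mirrors what min_digit/max_digit mean on the server.
    """
    digits = len(str(abs(int(value))))
    return min_digit <= digits <= max_digit


numbers = [int(item['entity_value']['value']) for item in number_predictions['data']]
print(numbers)  # [3]
print(all(in_digit_bounds(n, 1, 2) for n in numbers))  # True
```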


## Email Detection

The Email Detector detects email addresses in a message.

In [17]:
email_detector_payload = {
    "message": "my email id is hello@haptik.ai", 
    "entity_name": "email", 
    "entity_type": "email"
}
email_predictions = real_time_predictor.predict(json.dumps(email_detector_payload))
email_predictions = json.loads(email_predictions)
print(email_predictions)

{'data': [{'detection': 'message', 'original_text': 'hello@haptik.ai', 'entity_value': {'value': 'hello@haptik.ai'}, 'language': 'en'}]}
