# AWS Marketplace Product Usage Demonstration - 7Park Data chat and informal text NER

**7Park Data** chat and informal text NER helps you improve compliance and extract more value from your Slack, Bloomberg, and other chat data by identifying and extracting the companies and organizations mentioned in it.

Our chat classifier (NER) identifies companies, stock tickers, and governmental and non-governmental institutions.

The solution has been optimized on millions of Slack messages and achieves an F1 score of 88% on our data.

# Pre-requisites

This sample notebook requires a subscription to the following pre-trained machine learning model package from AWS Marketplace:

**[Chat and informal text NER](https://aws.amazon.com/marketplace/pp/prodview-64zsbbhzwijeo)**
    
If your AWS account is not yet subscribed to this listing, follow this process:

1. Open the listing from AWS Marketplace.
1. Read the **Highlights** section and then the **Product overview** section of the listing.
1. View **Usage information** and then **Additional resources**.
1. Note the supported instance types.
1. Next, click **Continue to subscribe**.
1. Review the **End user license agreement**, the **support terms**, and the **pricing information**.
1. Click the **Accept Offer** button if your organization agrees with the EULA, the pricing information, and the support terms.

**Notes:**

If the **Continue to configuration** button is active, your account already has a subscription to this listing.
Once you click **Continue to configuration** and choose a region, a Product ARN appears. This is the model package ARN that you need to specify when creating a deployable model. For this notebook, however, the model package ARN is already specified in the **src/model_package_arns.py** file, so you do not need to specify it explicitly.
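
For orientation, a region-to-ARN lookup of the kind that **src/model_package_arns.py** provides could be sketched as follows. The file's contents are not shown in this notebook, so the class name `ExampleArnProvider`, the placeholder ARN strings, and the error handling are all assumptions; only the `get_model_package_arn(region)` call shape is taken from the deployment cell later in this notebook.

```python
class ExampleArnProvider:
    """Hypothetical sketch: maps an AWS region name to a model package ARN."""

    # Placeholder values only (assumption). The real ARN for each region is
    # shown on the Marketplace configuration page after subscribing.
    _ARNS = {
        "us-east-1": "arn:aws:sagemaker:us-east-1:<account-id>:model-package/<package-name>",
        "eu-west-1": "arn:aws:sagemaker:eu-west-1:<account-id>:model-package/<package-name>",
    }

    @staticmethod
    def get_model_package_arn(region):
        # Fail loudly for regions without a configured ARN instead of
        # returning None and failing later at deployment time.
        try:
            return ExampleArnProvider._ARNS[region]
        except KeyError:
            raise ValueError("No model package ARN configured for region " + repr(region))


example_arn = ExampleArnProvider.get_model_package_arn("us-east-1")
```

Keeping the lookup in one place means the rest of the notebook only needs the session's region name to find the right package.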

# Set up environment

In this section, we will import the necessary libraries and define the IAM role and SageMaker session to be used.

In [1]:
import json
from pprint import pprint
import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker import ModelPackage

from src.model_package_arns import ModelPackageArnProvider

role = get_execution_role()

sagemaker_session = sage.Session()

# Live Inference Endpoint

## Step 1: Deploy the model for performing real-time inference.

In [2]:
# Get the model_package_arn
modelpackage_arn = ModelPackageArnProvider.get_model_package_arn(sagemaker_session.boto_region_name)

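# Define a predictor factory; ModelPackage calls it with the endpoint name and session.
# Note: RealTimePredictor is the SageMaker Python SDK v1 name (renamed to Predictor in v2).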
def ner_detection_predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type='application/json')

# Create a deployable model package
ner_model = ModelPackage(role=role,
                         model_package_arn=modelpackage_arn,
                         sagemaker_session=sagemaker_session,
                         predictor_cls=ner_detection_predict_wrapper)

# Deploy the model
ner_predictor = ner_model.deploy(initial_instance_count=1, 
                                 instance_type='ml.m5.xlarge',
                                 endpoint_name='chat-ner-endpoint')

-------------------!

## Step 2: Perform a prediction on the Amazon SageMaker endpoint created.

In [3]:
sample = {'instance': 'Just checked, and the netflix data looks pretty postive'}

# Perform a prediction
ner_result = ner_predictor.predict(json.dumps(sample)).decode('utf-8')

# View the prediction
pprint(json.loads(ner_result))

{'ner': [{'end_pos': 29,
          'key': 'netflix',
          'start_pos': 22,
          'type': 'NE_TICKER_COMPANY'}]}
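
The response reports each entity as a character span. A minimal sketch of turning those spans back into text is shown here, assuming `start_pos` is a 0-based offset and `end_pos` is exclusive, which is consistent with the sample output above (positions 22 to 29 cover `netflix`). The helper name `extract_entities` and the `sample_text` and `sample_response` variables are illustrative, not part of the product's API.

```python
def extract_entities(text, response):
    """Return (surface_text, entity_type) pairs from an NER response dict."""
    entities = []
    for ent in response.get("ner", []):
        # Slice the original message with the reported offsets
        # (assumed 0-based start, exclusive end).
        span = text[ent["start_pos"]:ent["end_pos"]]
        entities.append((span, ent["type"]))
    return entities


# Input and response copied from the cell output above.
sample_text = "Just checked, and the netflix data looks pretty postive"
sample_response = {"ner": [{"end_pos": 29,
                            "key": "netflix",
                            "start_pos": 22,
                            "type": "NE_TICKER_COMPANY"}]}

entities = extract_entities(sample_text, sample_response)
print(entities)
```

Slicing by offset rather than reading `key` directly is a quick way to check that the offsets line up with the text you sent.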


# Batch Transform Job

Now let's use the model to run a batch inference job and verify that it works.

The model supports data in [jsonlines](http://jsonlines.org/) format.

In [4]:
# review input file
SAMPLE_FILE = 'data/samples.jl'

with open(SAMPLE_FILE) as f:
    print(f.read())

{"id": 0, "instance": "Just checked, and the nflx data looks pretty postive"}
{"id": 1, "instance": "we should definitely boost Apple weight"}
{"id": 2, "instance": "Uber just published its lastest revenue number"}
{"id": 3, "instance": "The report of tsla will be released today"}
{"id": 4, "instance": "atvi got trapped and people just traded it"}


## Step 1: Upload the input file to S3

In [None]:
transform_input = sagemaker_session.upload_data(
    SAMPLE_FILE, 
    key_prefix='chat_ner/' + SAMPLE_FILE)
print("Transform input uploaded to " + transform_input)

## Step 2: Run a new transform job

In [None]:
# Create a transformer from the model package: one ml.m5.xlarge instance
transformer = ner_model.transformer(1, 'ml.m5.xlarge',
                                    accept='application/jsonlines',
                                    assemble_with='Line')

# Run the job; join_source='Input' merges each prediction with its input record
transformer.transform(
    transform_input,
    content_type='application/jsonlines',
    join_source='Input',
    split_type='Line'
)
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

## Step 3: Inspect the Batch Transform Output in S3

In [7]:
from urllib.parse import urlparse

# Split the s3://bucket/prefix output path into a bucket and a key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# The output object is named after the input file with '.out' appended
file_key = '{}/{}.out'.format(parsed_url.path[1:], "samples.jl")

s3_client = sagemaker_session.boto_session.client('s3')

# Read from the bucket parsed out of the output path
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')

print(response_text)

{"SageMakerOutput":{"ner":[{"end_pos":26,"key":"nflx","start_pos":22,"type":"NE_TICKER_COMPANY"}]},"id":0,"instance":"Just checked, and the nflx data looks pretty postive"}
{"SageMakerOutput":{"ner":[{"end_pos":32,"key":"Apple","start_pos":27,"type":"NE_TICKER_COMPANY"}]},"id":1,"instance":"we should definitely boost Apple weight"}
{"SageMakerOutput":{"ner":[{"end_pos":4,"key":"Uber","start_pos":0,"type":"NE_TICKER_COMPANY"}]},"id":2,"instance":"Uber just published its lastest revenue number"}
{"SageMakerOutput":{"ner":[{"end_pos":18,"key":"tsla","start_pos":14,"type":"NE_TICKER_COMPANY"}]},"id":3,"instance":"The report of tsla will be released today"}
{"SageMakerOutput":{"ner":[{"end_pos":4,"key":"atvi","start_pos":0,"type":"NE_TICKER_COMPANY"}]},"id":4,"instance":"atvi got trapped and people just traded it"}



# Cleanup

In [8]:
ner_predictor.delete_endpoint()
ner_predictor.delete_model()

Finally, if the AWS Marketplace subscription was created just for an experiment and you would like to unsubscribe, follow the steps below. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace:**

1. Navigate to the **Machine Learning** tab on your [Software subscriptions page](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=lbr_tab_ml).
1. Locate the listing that you need to cancel, and click **Cancel Subscription**.