# AWS Marketplace Product Usage Demonstration - 7Park Data Drug NER

**7Park Data** drug NER allows you to identify and surface drug (trade) names in unstructured text data such as patient records, customer orders, news and reports, and more. 

Use this solution for competitive intelligence, better customer service, more targeted prospecting, and more data-driven decisions.

Our drug classifier (NER) has been trained and optimized on 200 thousand healthcare articles and blogs from the LexisNexis database and achieves an F1 score of 88% on our data.

# Pre-requisites

This sample notebook requires a subscription to the following pre-trained machine learning model package from AWS Marketplace:

**[Drug NER](https://aws.amazon.com/marketplace/pp/prodview-47xyf7s3glruu?qid=1575415914406&sr=0-5&ref_=srh_res_product_title)**
    
If your AWS account is not yet subscribed to this listing, follow these steps:

1. Open the listing from AWS Marketplace
1. Read the **Highlights** section and then **product overview** section of the listing.
1. View **usage information** and then **additional resources.**
1. Note the supported instance types.
1. Next, click on **Continue to subscribe.**
1. Review **End user license agreement, support terms**, as well as **pricing information.**
1. Click the **Accept Offer** button if your organization agrees with the EULA, pricing information, and support terms.

**Notes:**

If the **Continue to configuration** button is active, your account already has a subscription to this listing.
Once you click **Continue to configuration** and choose a region, a Product ARN appears. This is the model package ARN that you need to specify when creating a deployable model. For this notebook, however, the model package ARN is already specified in the **src/model_package_arns.py** file, so you do not need to specify it explicitly.
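
The contents of **src/model_package_arns.py** are not shown in this notebook. As a rough sketch only, a region-to-ARN lookup of this kind could be a small dictionary wrapper. The class name `ExampleModelPackageArnProvider` and the ARN values below are placeholders invented for illustration; they are not the real listing ARNs.

```python
class ExampleModelPackageArnProvider:
    """Hypothetical sketch of a region -> model package ARN lookup."""

    # Placeholder ARNs for illustration only. Copy the real Product ARN
    # for your region from the listing's configuration page.
    _ARNS = {
        "us-east-1": "arn:aws:sagemaker:us-east-1:000000000000:model-package/example-drug-ner",
        "eu-west-1": "arn:aws:sagemaker:eu-west-1:000000000000:model-package/example-drug-ner",
    }

    @staticmethod
    def get_model_package_arn(current_region):
        """Return the ARN configured for the region, or raise ValueError."""
        try:
            return ExampleModelPackageArnProvider._ARNS[current_region]
        except KeyError:
            raise ValueError(
                "No model package ARN configured for region " + current_region
            ) from None


print(ExampleModelPackageArnProvider.get_model_package_arn("us-east-1"))
```

Failing loudly on an unknown region is a design choice in this sketch: it surfaces an unsupported region before any deployment call is made.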

# Set up environment

In this section, we import the necessary libraries and define the IAM role and SageMaker session to be used.

In [1]:
import json
from pprint import pprint
import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker import ModelPackage

from src.model_package_arns import ModelPackageArnProvider

role = get_execution_role()

sagemaker_session = sage.Session()

# Live Inference Endpoint

## Step 1: Deploy the model for performing real-time inference.

In [3]:
# Get the model_package_arn
modelpackage_arn = ModelPackageArnProvider.get_model_package_arn(sagemaker_session.boto_region_name)

# Define a predictor wrapper function that sends requests as JSON
def ner_detection_predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type='application/json')

# Create a deployable model package
ner_model = ModelPackage(role=role,
                         model_package_arn=modelpackage_arn,
                         sagemaker_session=sagemaker_session,
                         predictor_cls=ner_detection_predict_wrapper)

# Deploy the model
ner_predictor = ner_model.deploy(initial_instance_count=1, 
                                 instance_type='ml.t2.2xlarge',
                                 endpoint_name='drug-ner-endpoint')

---------------------------------------------------------------------------------------------------------------!

## Step 2: Perform a prediction on the Amazon SageMaker endpoint created.

In [5]:
sample = {'instance': 
          'Subsequently, eligible patients who have not experienced disease progression '
          'at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be '
          'administered once every 12 weeks until disease progression.'
         }

# Perform a prediction
ner_result = ner_predictor.predict(json.dumps(sample)).decode('utf-8')

# View the prediction
pprint(json.loads(ner_result))

{'ner': [{'end_pos': 158,
          'key': 'ipilimumab',
          'start_pos': 148,
          'type': 'NE_DRUG'}]}
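
The `start_pos` and `end_pos` values in the response look like zero-based character offsets with an exclusive end. That reading is inferred from this one example rather than from documented behaviour. A minimal sketch that slices the input text with the reported offsets (the helper name `extract_spans` is made up for illustration):

```python
text = ('Subsequently, eligible patients who have not experienced disease progression '
        'at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be '
        'administered once every 12 weeks until disease progression.')

# Response copied from the prediction output above
result = {'ner': [{'end_pos': 158,
                   'key': 'ipilimumab',
                   'start_pos': 148,
                   'type': 'NE_DRUG'}]}


def extract_spans(text, result):
    """Return (surface_text, entity_type) pairs by slicing text with the reported offsets."""
    return [(text[e['start_pos']:e['end_pos']], e['type']) for e in result['ner']]


spans = extract_spans(text, result)
print(spans)  # [('ipilimumab', 'NE_DRUG')]
```

Comparing each slice with the entity's `key` is a cheap sanity check that the offsets still line up after any preprocessing of the text.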


# Batch Transform Job

Now let's use the model to run a batch inference job and verify that it works.

The model supports data in [jsonlines](http://jsonlines.org/) format.
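
In JSON Lines, every line is one complete JSON object. As a minimal sketch using only the standard library, records with the `id` and `instance` fields used by the sample file can be written and read back like this (the helper names `dump_jsonlines` and `load_jsonlines` are illustrative, not part of any SDK):

```python
import json

# Two records taken from the sample input file
records = [
    {"id": 0, "instance": "Subsequently, eligible patients who have not experienced disease progression at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be administered once every 12 weeks until disease progression."},
    {"id": 3, "instance": "Carefully weigh the risks and benefits of treatment with OTEZLA for patients with a history of depression and/or suicidal thoughts/behavior, or in patients who develop such symptoms while on OTEZLA."},
]


def dump_jsonlines(records):
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def load_jsonlines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


payload = dump_jsonlines(records)
print(len(payload.split("\n")) - 1, "lines written")
print(load_jsonlines(payload) == records)
```

Splitting on `"\n"` rather than calling `str.splitlines()` is deliberate: `splitlines()` also breaks on other Unicode line separators, which may appear unescaped inside a JSON string.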

In [6]:
# review input file
SAMPLE_FILE = 'data/samples.jl'

with open(SAMPLE_FILE) as f:
    print(f.read())

{"id": 0, "instance": "Subsequently, eligible patients who have not experienced disease progression at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be administered once every 12 weeks until disease progression."}
{"id": 1, "instance": "The investigational new drug （ IND ） application for IBI306 was approved by the National Medical Products Administration (NMPA) in September 2017."}
{"id": 2, "instance": "In June 2005, UCB announced significant positive results for the two pivotal phase III trials (PRECiSE 1 and 2) of CIMZIATM in the induction and maintenance of clinical response in moderate to severe active Crohn's disease."}
{"id": 3, "instance": "Carefully weigh the risks and benefits of treatment with OTEZLA for patients with a history of depression and/or suicidal thoughts/behavior, or in patients who develop such symptoms while on OTEZLA."}
{"id": 4, "instance": "In contrast, pre-clinical data indicate that PRTX-100 may have the potential to 

## Step 1: Upload the input file to S3

In [7]:
transform_input = sagemaker_session.upload_data(
    SAMPLE_FILE, 
    key_prefix='drug_ner/' + SAMPLE_FILE)
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-1-084888172679/drug_ner/data/samples.jl/samples.jl
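
Judging from the URI printed above, `upload_data` places a single file at `key_prefix` followed by the file's base name. That is why `samples.jl` appears twice when the prefix already contains the file path. The following sketch only mimics that string composition locally; `uploaded_s3_uri` is a hypothetical helper and makes no AWS calls.

```python
import posixpath


def uploaded_s3_uri(bucket, key_prefix, local_path):
    """Mimic the object URI for a single-file upload: <key_prefix>/<file name>."""
    file_name = posixpath.basename(local_path)
    return "s3://{}/{}/{}".format(bucket, key_prefix, file_name)


SAMPLE_FILE = 'data/samples.jl'
bucket = 'sagemaker-us-east-1-084888172679'

# Prefix that already includes the file path, as in the cell above
print(uploaded_s3_uri(bucket, 'drug_ner/' + SAMPLE_FILE, SAMPLE_FILE))
# s3://sagemaker-us-east-1-084888172679/drug_ner/data/samples.jl/samples.jl

# Prefix that names only the folder
print(uploaded_s3_uri(bucket, 'drug_ner/data', SAMPLE_FILE))
# s3://sagemaker-us-east-1-084888172679/drug_ner/data/samples.jl
```

Either layout works for the transform job, because the job reads whatever URI `upload_data` returns.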


## Step 2: Run a new transform job

In [8]:
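# Create a transformer that assembles the output as newline-delimited records
transformer = ner_model.transformer(1, 'ml.m4.xlarge',
                                    accept='application/jsonlines',
                                    assemble_with='Line')

# Split the input by line and join each input record with its prediction
transformer.transform(
    transform_input,
    content_type='application/jsonlines',
    join_source='Input',
    split_type='Line'
)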
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

Using already existing model: ner-drugs-2019-11-22-20-00-09--099d95f0-2019-12-04-16-23-23-620


................................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2019/12/04 16:39:16 [crit] 21#21: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [04/Dec/2019:16:39:16 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2019/12/04 16:39:16 [crit] 21#21: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [04/Dec/2019:16:39:16 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2019/12/04 16:39:

## Step 3: Inspect the Batch Transform Output in S3

In [9]:
from urllib.parse import urlparse

# Locate the output object; its key is the input file name with '.out' appended
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
file_key = '{}/{}.out'.format(parsed_url.path[1:], 'samples.jl')

s3_client = sagemaker_session.boto_session.client('s3')

# Read from the bucket named in the output path
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')

print(response_text)

{"SageMakerOutput":{"ner":[{"end_pos":158,"key":"ipilimumab","start_pos":148,"type":"NE_DRUG"}]},"id":0,"instance":"Subsequently, eligible patients who have not experienced disease progression at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be administered once every 12 weeks until disease progression."}
{"SageMakerOutput":{"ner":[{"end_pos":59,"key":"IBI306","start_pos":53,"type":"NE_DRUG"}]},"id":1,"instance":"The investigational new drug （ IND ） application for IBI306 was approved by the National Medical Products Administration (NMPA) in September 2017."}
{"SageMakerOutput":{"ner":[{"end_pos":123,"key":"CIMZIATM","start_pos":115,"type":"NE_DRUG"}]},"id":2,"instance":"In June 2005, UCB announced significant positive results for the two pivotal phase III trials (PRECiSE 1 and 2) of CIMZIATM in the induction and maintenance of clinical response in moderate to severe active Crohn's disease."}
{"SageMakerOutput":{"ner":[{"end_pos":63,"key":"OTEZLA",
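
Because the job was run with `join_source='Input'`, each output line carries the original `id` and `instance` fields alongside the prediction under `SageMakerOutput`. A minimal sketch of collecting the detected drug names per record follows; the helper name `drugs_by_id` is illustrative, and the two lines are copied from the output above.

```python
import json

# Two complete lines copied from the batch transform output
batch_output = """\
{"SageMakerOutput":{"ner":[{"end_pos":158,"key":"ipilimumab","start_pos":148,"type":"NE_DRUG"}]},"id":0,"instance":"Subsequently, eligible patients who have not experienced disease progression at week 24 will continue in a maintenance phase where a single dose of ipilimumab will be administered once every 12 weeks until disease progression."}
{"SageMakerOutput":{"ner":[{"end_pos":123,"key":"CIMZIATM","start_pos":115,"type":"NE_DRUG"}]},"id":2,"instance":"In June 2005, UCB announced significant positive results for the two pivotal phase III trials (PRECiSE 1 and 2) of CIMZIATM in the induction and maintenance of clinical response in moderate to severe active Crohn's disease."}
"""


def drugs_by_id(jsonlines_text):
    """Map each record id to the list of entity keys found in that record."""
    found = {}
    for line in jsonlines_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines such as the trailing newline
        record = json.loads(line)
        found[record["id"]] = [e["key"] for e in record["SageMakerOutput"]["ner"]]
    return found


print(drugs_by_id(batch_output))  # {0: ['ipilimumab'], 2: ['CIMZIATM']}
```

Keying the result by `id` keeps predictions traceable to the input records even if lines are later filtered or reordered.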

# Cleanup

In [10]:
ner_predictor.delete_endpoint()
ner_predictor.delete_model()

Finally, if you created the AWS Marketplace subscription only for an experiment and would like to unsubscribe, follow the steps below. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace:**

1. Navigate to the **Machine Learning** tab on your [Software subscriptions page](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=lbr_tab_ml).
1. Locate the listing that you need to cancel, and click **Cancel Subscription**.