# Relationship Extraction

In this notebook, we'll train, deploy and use a relationship extraction
model built with the
[transformers](https://huggingface.co/transformers/) library, which uses
PyTorch.

**Note**: When running this notebook on SageMaker Studio, you should make
sure the 'SageMaker JumpStart PyTorch 1.0' image/kernel is used. When
running this notebook on SageMaker Notebook Instance, you should make
sure the 'sagemaker-soln' kernel is used.

This solution relies on a config file that describes the provisioned AWS resources. Run the cell below to generate that file.

In [None]:
import boto3
import os
import json

client = boto3.client('servicecatalog')

# The provisioned product name is the folder directly under `S3Downloads`.
cwd = os.getcwd().split('/')
i = cwd.index('S3Downloads')
pp_name = cwd[i + 1]

# Look up the outputs of the last successful provisioning record.
pp = client.describe_provisioned_product(Name=pp_name)
record_id = pp['ProvisionedProductDetail']['LastSuccessfulProvisioningRecordId']
record = client.describe_record(Id=record_id)

# Keep only the outputs that have both a key and a value.
stack_output = {
    x['OutputKey']: x['OutputValue']
    for x in record['RecordOutputs']
    if 'OutputKey' in x and 'OutputValue' in x
}

with open(f'/root/S3Downloads/{pp_name}/stack_outputs.json', 'w') as f:
    json.dump(stack_output, f)
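
The cell above keeps only the record outputs that carry both an
`OutputKey` and an `OutputValue`. The following is a minimal sketch of
that filtering step on hypothetical records; the helper name
`collect_outputs` and the sample keys and values are assumptions made for
illustration, not part of the solution.

```python
# Hypothetical record outputs, shaped like the `RecordOutputs` list above.
example_records = [
    {'OutputKey': 'S3Bucket', 'OutputValue': 'example-bucket'},
    {'OutputKey': 'IamRole', 'OutputValue': 'example-role'},
    {'OutputKey': 'NoValue'},    # skipped: no 'OutputValue'
    {'OutputValue': 'orphan'},   # skipped: no 'OutputKey'
]


def collect_outputs(outputs):
    """Map each OutputKey to its OutputValue, skipping incomplete entries."""
    return {
        x['OutputKey']: x['OutputValue']
        for x in outputs
        if 'OutputKey' in x and 'OutputValue' in x
    }


example_outputs = collect_outputs(example_records)
print(example_outputs)
```

Both membership tests are written out in full because an expression such
as `'OutputKey' and 'OutputValue' in x` only tests the second key: a
non-empty string literal is always truthy.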

We start by importing a variety of packages that will be used throughout
the notebook. One of the most important packages is the Amazon SageMaker
Python SDK (i.e. `import sagemaker`). We also import modules from our own
custom (and editable) package that can be found at `../package`.

In [None]:
import boto3
from pathlib import Path
import sagemaker
from sagemaker.pytorch import PyTorch
import sys

sys.path.insert(0, '../package')
from package import config, utils

Up next, we define the current folder and create a SageMaker client (from
`boto3`). We can use the SageMaker client to call SageMaker APIs
directly, as an alternative to using the Amazon SageMaker SDK. We'll use
it at the end of the notebook to delete certain resources that are
created in this notebook.

In [None]:
current_folder = utils.get_current_folder(globals())
sagemaker_client = boto3.client('sagemaker')
sagemaker_session = sagemaker.Session()

In [None]:
# Download the solution's datasets from the source bucket to the local `../data` folder.
!aws s3 cp --recursive --quiet $config.SOURCE_S3_PATH/data ../data

In [None]:
# Upload the datasets to the solution's own S3 bucket so that the training job can read them.
!aws s3 cp --recursive --quiet ../data s3://$config.S3_BUCKET/$config.DATASETS_S3_PREFIX

In [None]:
# SageMaker passes these hyperparameters to the entry point script.
hyperparameters = {
    "learning-rate": 0.0007
}

# `current_folder` and `sagemaker_session` were defined in an earlier cell.
estimator = PyTorch(
    framework_version='1.5.0',
    py_version='py3',
    entry_point='entry_point.py',
    source_dir=str(Path(current_folder, '../containers/relationship_extraction').resolve()),
    hyperparameters=hyperparameters,
    role=config.IAM_ROLE,
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    output_path='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX)),
    code_location='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX)),
    base_job_name=config.SOLUTION_PREFIX,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sagemaker_session,
    volume_size=30
)

In [None]:
estimator.fit({
    'train': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'semeval', 'train')),
    'test': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'semeval', 'test'))
})
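
The training channels and output locations above are all built by joining
a bucket name and key parts into an `s3://` URI. The following is a
minimal sketch of that pattern as one helper; the name `s3_uri` and the
bucket and prefix values are hypothetical. It uses `PurePosixPath` so that
the parts are joined with forward slashes on any operating system.

```python
from pathlib import PurePosixPath


def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an `s3://` URI."""
    # PurePosixPath always joins with '/', which is what S3 keys expect.
    return 's3://' + str(PurePosixPath(bucket, *parts))


# Hypothetical bucket and prefix, standing in for the values in `config`.
train_uri = s3_uri('example-bucket', 'datasets', 'semeval', 'train')
test_uri = s3_uri('example-bucket', 'datasets', 'semeval', 'test')
print(train_uri)
print(test_uri)
```

The key parts should not start with a slash, because `PurePosixPath`
treats such a part as an absolute path and discards everything before it.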

We'll use the unique solution prefix to name the model and endpoint.

In [None]:
model_name = "{}-relationship-extraction".format(config.SOLUTION_PREFIX)

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = estimator.deploy(
    endpoint_name=model_name,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    initial_instance_count=1,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

When calling our new endpoint from the notebook, we use an Amazon
SageMaker SDK
[`Predictor`](https://sagemaker.readthedocs.io/en/stable/predictors.html).
A `Predictor` sends data to an endpoint (as part of a request) and
interprets the response. Our `estimator.deploy` command returned a
`Predictor` but, by default, it would send and receive numpy arrays. Our
endpoint expects to receive (and also sends) JSON formatted objects, so
we passed a `JSONSerializer` and a `JSONDeserializer` to
`estimator.deploy` to use JSON instead of the PyTorch endpoint default of
numpy arrays. JSON is used here because it is a standard endpoint format
and the endpoint response can contain nested data structures.
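
To make the JSON round trip concrete, here is a minimal sketch that uses
only the standard `json` module to approximate what a JSON serializer and
deserializer do with a payload. It does not call the SageMaker SDK, and
the nested `example_response` is a hypothetical shape chosen for
illustration, not the real response format of this endpoint.

```python
import json

# A request payload with the same fields that the endpoint call uses.
request = {
    'sequence': 'Amazon SageMaker is a fully managed service.',
    'entity_one_start': 0,
    'entity_one_end': 6,
    'entity_two_start': 7,
    'entity_two_end': 16,
}

# Serializing turns the dictionary into a JSON string for the request body.
request_body = json.dumps(request)

# Deserializing turns a JSON string back into Python objects.
decoded_request = json.loads(request_body)

# Nested structures survive the round trip (hypothetical response shape).
example_response = {'label': 'example', 'scores': {'a': 0.75, 'b': 0.25}}
decoded_response = json.loads(json.dumps(example_response))
print(decoded_response['scores']['a'])
```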

With our model successfully deployed and our predictor configured, we can
try out the relationship extraction model on example inputs.

In [None]:
predictor.predict(
    data={
        'sequence': 'Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly.',
        'entity_one_start': 0,
        'entity_one_end': 6,
        'entity_two_start': 7,
        'entity_two_end': 16
    }
)
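
The start and end values in the request above match the positions of
`Amazon` and `SageMaker` in the sequence. The following is a minimal
sketch of computing such offsets instead of counting them by hand. It
assumes the offsets are character indices with an exclusive end, as the
example values suggest; the helpers `entity_span` and `build_request` are
hypothetical and not part of the solution's package.

```python
def entity_span(sequence, entity):
    """Return (start, end) character offsets of the first match of `entity`.

    The end offset is exclusive, so sequence[start:end] == entity.
    Raises ValueError if `entity` does not occur in `sequence`.
    """
    start = sequence.index(entity)
    return start, start + len(entity)


def build_request(sequence, entity_one, entity_two):
    """Build a request dictionary with the same fields as the call above."""
    one_start, one_end = entity_span(sequence, entity_one)
    two_start, two_end = entity_span(sequence, entity_two)
    return {
        'sequence': sequence,
        'entity_one_start': one_start,
        'entity_one_end': one_end,
        'entity_two_start': two_start,
        'entity_two_end': two_end,
    }


sequence = 'Amazon SageMaker is a fully managed service.'
example_request = build_request(sequence, 'Amazon', 'SageMaker')
print(example_request)
```

Because `entity_span` returns the first match only, an entity that occurs
more than once in the sequence would need its offsets set explicitly.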

## Clean Up

When you've finished with the relationship extraction endpoint (and associated
endpoint-config), make sure that you delete them to avoid accidental
charges.

In [None]:
sagemaker_client.delete_endpoint(EndpointName=model_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=model_name)