##### Create the session

The session remembers our connection parameters to Amazon SageMaker. We'll use it to perform all of our SageMaker operations.

In [None]:
import os
from sagemaker import get_execution_role
import sagemaker as sage
import pandas as pd
import boto3
import json
smmp = boto3.client("sagemaker")

# TODO: Replace with algorithm ARN from your subscription
algorithm_arn = "arn:aws:sagemaker:us-west-2:512418328296:algorithm/neopoly-molecule-1727104855"

role = get_execution_role()
sess = sage.Session()
account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
common_prefix = "neopoly-molecule-input"

## Part 1 : Train your Algorithm

A number of files are laid out for your use, under the `/opt/ml` directory:

##### The input

* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values will always be strings, so you may need to convert them. 
* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to `CreateTrainingJob`, but they must generally match what the algorithm expects. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure.
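
Because every hyperparameter value arrives as a string, the training program has to convert each one before use. The following is a minimal sketch of that step, not the algorithm's actual code: the helper names `load_hyperparameters` and `parse_hyperparameters` are hypothetical, and the target types and default values for `epochs`, `batch_size`, and `structure_lr` are assumptions based on the values passed to the estimator in this notebook.

```python
import json
from pathlib import Path


def load_hyperparameters(config_dir="/opt/ml/input/config"):
    """Read hyperparameters.json and return the raw, string-valued dict."""
    path = Path(config_dir) / "hyperparameters.json"
    with path.open() as f:
        return json.load(f)


def parse_hyperparameters(raw):
    """Convert string values to typed values.

    The keys, types, and defaults below are assumptions for illustration.
    """
    return {
        "epochs": int(raw.get("epochs", "100")),
        "batch_size": int(raw.get("batch_size", "16")),
        "structure_lr": float(raw.get("structure_lr", "0.01")),
    }
```

Inside a training container, `parse_hyperparameters(load_hyperparameters())` would return the typed values. Passing a different `config_dir` makes the helper easy to exercise locally.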

##### The output

* `/opt/ml/model/` is the directory where you write the model that your algorithm generates. Your model can be in any format that you want. It can be a single file or a whole directory tree. SageMaker will package any files in this directory into a compressed tar archive file. This file will be available at the S3 location returned in the `DescribeTrainingJob` result.
* `/opt/ml/output` is a directory where the algorithm can write a file `failure` that describes why the job failed. The contents of this file will be returned in the `FailureReason` field of the `DescribeTrainingJob` result. For jobs that succeed, there is no reason to write this file as it will be ignored.
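
The output contract can be sketched as a small wrapper around the training routine. This is an illustrative sketch only: `run_training` and the `train_fn` callback are hypothetical names, and the exit codes are an assumption. The wrapper lets `train_fn` write into the model directory and, if it raises, records the traceback in the `failure` file.

```python
import traceback
from pathlib import Path


def run_training(train_fn, model_dir="/opt/ml/model", output_dir="/opt/ml/output"):
    """Run train_fn(model_dir); on error, write the traceback to <output_dir>/failure.

    Returns 0 on success and 1 on failure (assumed exit-code convention).
    """
    model_path = Path(model_dir)
    model_path.mkdir(parents=True, exist_ok=True)
    try:
        # train_fn is expected to save its model files under model_path.
        train_fn(model_path)
        return 0
    except Exception:
        out_path = Path(output_dir)
        out_path.mkdir(parents=True, exist_ok=True)
        # The contents of this file surface as FailureReason in DescribeTrainingJob.
        (out_path / "failure").write_text(traceback.format_exc())
        return 1
```

A training entry point could finish with `sys.exit(run_training(my_train_fn))`, where `my_train_fn` stands for whatever function saves the model.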

In [None]:
# Upload data to S3; prefix is the S3 bucket path and workdir is the local path
training_input_prefix = common_prefix + "/training-input-data"
TRAINING_WORKDIR = "data/training"
training_input = sess.upload_data(
    TRAINING_WORKDIR, key_prefix=training_input_prefix
)

In [None]:
# Create an algorithm estimator from the algorithm product ARN
neopoly = sage.AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    base_job_name='neopoly-molecule-training',
    role=role,
    sagemaker_session=sess,
    instance_count=1,
    instance_type='ml.r5.8xlarge',
    output_path="s3://{}/neopoly-molecule-model".format(sess.default_bucket()),
    hyperparameters={
        "epochs": "100",
        "t_0": "10",
        "batch_size": "16",
        "structure_lr": "0.01",
        "property_lr": "0.005",
        "alpha": "0.1",
        "edge_threshold": "1.0",
        "depth": "4",
        "interval": "8"
    }
)

neopoly.fit({'training': training_input})

## Part 2 : Batch Transform Inference

The training job produces an S3 model artifact containing the trained weights. We can load this artifact into a model package and use it to run batch transform jobs. In contrast to real-time inference on an endpoint (Part 3), batch transform handles larger datasets and is billed per job rather than for a continuously running endpoint.

#### Running your container during hosting

Hosting has a very different model from training because hosting responds to inference requests that arrive over HTTP. In this example, we use our recommended Python serving stack to provide robust and scalable serving of inference requests.

Amazon SageMaker uses two URLs in the container:
* `/ping` will receive `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplies `ContentType` and `Accept` headers, these are passed in as well.
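
To make the two routes concrete, here is a minimal sketch that uses only the Python standard library. It is not the serving stack this algorithm ships with. The `predict` function is a hypothetical placeholder that just counts input rows, and the fallback response type of `text/csv` is an assumption. SageMaker sends requests to port 8080 inside the container.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical placeholder model: return the number of non-empty input lines."""
    rows = [line for line in payload.decode("utf-8").splitlines() if line.strip()]
    return str(len(rows)).encode("utf-8")


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: answer 200 when the container can accept requests.
        self._reply(200 if self.path == "/ping" else 404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, b"")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Echo the client's Accept header if present (fallback is an assumption).
        accept = self.headers.get("Accept") or "text/csv"
        self._reply(200, predict(body), content_type=accept)

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging in this sketch.
        pass


def serve(port=8080):
    """Block and serve requests on the given port."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Calling `serve()` starts the server. A production container would typically put a proper WSGI server and reverse proxy in front of the same two routes.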

In [None]:
# Upload data to S3; prefix is the S3 bucket path and workdir is the local path
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"
TRANSFORM_WORKDIR = "data/transform"
transform_input = (
    sess.upload_data(TRANSFORM_WORKDIR, key_prefix=batch_inference_input_prefix)
    + "/transform_test.csv"
)

In [None]:
# TODO: Replace with S3 model artifact from your training job
model_path = "s3://sagemaker-us-west-2-512418328296/neopoly-molecule-model/neopoly-molecule-training-2024-09-27-15-16-51-494/output/model.tar.gz"

# Instantiate a model from the model artifact
model = sage.ModelPackage(
    role=role, 
    model_data=model_path, 
    sagemaker_session=sess, 
    algorithm_arn=algorithm_arn
)
# Instantiate a transformer from the model
transformer = model.transformer(
    instance_count=1,
    instance_type='ml.r7i.8xlarge'
)
# Run the batch transformation job
transformer.transform(
    transform_input, 
    job_name="neopoly-molecule-transform",
    content_type="text/csv"
)
transformer.wait()

# Output is available in the following path
transformer.output_path
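
Batch transform writes one result object per input object, named after the input file with `.out` appended. The following sketch builds the expected result URI from the transformer's output path and the input URI. The helper name `expected_transform_output` is hypothetical, and it assumes a single input object, as in this notebook.

```python
import posixpath


def expected_transform_output(output_path, input_uri):
    """Return the S3 URI where the result for a single input object should appear."""
    # Assumes input_uri points at one object, not a prefix with nested keys.
    name = posixpath.basename(input_uri)
    return output_path.rstrip("/") + "/" + name + ".out"
```

With the variables from this notebook, `expected_transform_output(transformer.output_path, transform_input)` would point at the result for `transform_test.csv`.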