## Extending Amazon SageMaker Autopilot to Custom Code

Amazon SageMaker Autopilot generates a series of artifacts during the AutoML job, allowing you to download, explore, re-use, or customize any part of the Autopilot pipeline.

In this notebook, we will learn how to re-use the artifacts that an Autopilot job generates for a given model.

The following diagram illustrates, at a high level, the steps Autopilot follows and the artifacts it generates and stores in Amazon S3.

<img src="images/Autopilot_diagram.png" width="1000"/>


-----

### 1. Setting up Libraries and Variables

Let's start by ensuring we have an updated SageMaker SDK in our kernel, and importing some libraries.

In [2]:
!pip install -qU awscli boto3 sagemaker sagemaker-experiments


In [3]:
import boto3, sagemaker
import pandas as pd
import re, os

**Replace the following variables with the artifact locations shown in the Model Details of your Autopilot job**

You can find this information in SageMaker Studio as follows:
* Open the "SageMaker Resources" tab (left menu in Studio)
* Select "Experiments and trials" from the dropdown
* Right-click your Autopilot job name and choose "Describe AutoML Job"
* In the window that opens, right-click the model you want to use as a base and choose "Open in model details"
* In the new tab, select the "Artifacts" tab and copy the URLs shown into the variables in the following cell

<img src="images/studio.png" width="1000"/>

In [125]:
#Input artifacts...
automl_job_name = 'verisure-02' #This one is the "Autopilot Job" in the "Describe AutoML Job" pane
input_data = 'https://eu-west-1.console.aws.amazon.com/s3/object/rodzanto2021ml/verisure/all_data/output_1659100501/part-00000-f967beaa-76d8-45cc-ba3a-989f1938acbe-c000.csv'
shuffled_split_data = 'https://console.aws.amazon.com/s3/buckets/rodzanto2021ml/verisure/output_1659100501/verisure-02/preprocessed-data/tuning_data/'
transformed_data = 'https://console.aws.amazon.com/s3/buckets/rodzanto2021ml/verisure/output_1659100501/verisure-02/transformed-data/dpp4/rpb/'
feature_engineering_code = 'https://eu-west-1.console.aws.amazon.com/s3/object/rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/generated_module/candidate_data_processors/dpp4.py'
feature_engineering_model = 'https://eu-west-1.console.aws.amazon.com/s3/object/rodzanto2021ml/verisure/output_1659100501/verisure-02/data-processor-models/verisure-02-dpp4-1-78a0076f02bd4f9c8871d70155985ee554cc6b54e7e0/output/model.tar.gz'
algorithm_model = 'https://eu-west-1.console.aws.amazon.com/s3/object/rodzanto2021ml/verisure/output_1659100501/verisure-02/tuning/verisure-0-dpp4-xgb/verisure-02v7z745kGnyMtSvlKTHFsP-006-aeea7101/output/model.tar.gz'
explainability = 'https://console.aws.amazon.com/s3/buckets/rodzanto2021ml/verisure/output_1659100501/verisure-02/documentation/explainability/output/verisure-02v7z745kGnyMtSvlKTHFsP-006-aeea7101/'


------

### 2. Explore data

Autopilot shuffles the original input dataset and splits it into training and validation folders. It also splits the data into CSV chunks for better performance.

In [5]:
#Local folder for the downloaded artifacts...
os.makedirs('./artifacts', exist_ok=True)

s3 = boto3.client('s3')

bucket = input_data[input_data.index('object/')+len('object/') : input_data.index('/', input_data.index('object/')+len('object/')+1)]
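
The index arithmetic above depends on the exact substring `object/` appearing in the URL. As a minimal sketch of an alternative, assuming path-style console URLs like the ones in the variables cell (`/s3/object/<bucket>/<key>` or `/s3/buckets/<bucket>/<prefix>/`), the standard library's `urllib.parse` can split a URL into bucket and key. The helper name `console_url_to_s3` is hypothetical and not part of any AWS SDK.

```python
from urllib.parse import urlparse, unquote


def console_url_to_s3(url):
    """Split a path-style S3 console URL into (bucket, key_or_prefix).

    Assumes the URL path looks like /s3/object/<bucket>/<key> or
    /s3/buckets/<bucket>/<prefix>/, as in the variables defined earlier.
    """
    parts = unquote(urlparse(url).path).lstrip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "s3" or parts[1] not in ("object", "buckets"):
        raise ValueError("Not a path-style S3 console URL: {}".format(url))
    bucket = parts[2]
    key = parts[3] if len(parts) > 3 else ""
    return bucket, key


# Same value as the shuffled_split_data variable defined earlier.
example_url = (
    "https://console.aws.amazon.com/s3/buckets/rodzanto2021ml/verisure/"
    "output_1659100501/verisure-02/preprocessed-data/tuning_data/"
)
example_bucket, example_key = console_url_to_s3(example_url)
print("s3://{}/{}".format(example_bucket, example_key))
```

Console URLs that carry the key in a query string (for example `?prefix=...`) are not handled by this sketch.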

The results of the data exploration performed by SageMaker Autopilot can be checked directly from the **Data Exploration Notebook**. Let's download this notebook for review:

In [6]:
#Data Exploration notebook...
s3_exploration_notebook = feature_engineering_code[feature_engineering_code.index(bucket)+len(bucket)+1 : feature_engineering_code.index('generated_module')] + 'notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb'
print('data_exploration_notebook:\ns3://{}/{}'.format(bucket, s3_exploration_notebook))
s3.download_file(bucket, s3_exploration_notebook, 'artifacts/SageMakerAutopilotDataExplorationNotebook.ipynb')

data_exploration_notebook:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb


Also, the whole Autopilot process can be reproduced from the generated **Candidate Definition Notebook**. Let's download this notebook for further exploration as well:

In [7]:
#Candidate Definition notebook...
s3_candidates_notebook = feature_engineering_code[feature_engineering_code.index(bucket)+len(bucket)+1 : feature_engineering_code.index('generated_module')] + 'notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb'
print('candidate_definition_notebook:\ns3://{}/{}'.format(bucket, s3_candidates_notebook))
s3.download_file(bucket, s3_candidates_notebook, 'artifacts/SageMakerAutopilotCandidateDefinitionNotebook.ipynb')

candidate_definition_notebook:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb


In [8]:
#Shuffled & split data sample...
s3_shuffled_split = shuffled_split_data[shuffled_split_data.index(bucket)+len(bucket)+1 : shuffled_split_data.index('tuning_data')+12]
print('bucket: {}'.format(bucket))
print('s3_shuffled_split: s3://{}/{}'.format(bucket, s3_shuffled_split))

bucket: rodzanto2021ml
s3_shuffled_split: s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/preprocessed-data/tuning_data/


In [9]:
s3.download_file(bucket, s3_shuffled_split + 'train/chunk_0.csv', 'artifacts/train_chunk_0.csv')
train_data = pd.read_csv('artifacts/train_chunk_0.csv', header=None)
train_data.head()

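| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 11875800 | 4291 | 499.0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 4291.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 95.0 | 46.0 | 44.0 | 185.0 |
| 1 | 0 | 14304458 | 577 | 199.0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 577.0 | 0.0 | 0.0 | 29.0 | 23.0 | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0 | 14108665 | 730 | 79.0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 730.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 213.0 | 99.0 | 82.0 | 394.0 |
| 3 | 0 | 14323153 | 546 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 546.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 2.0 | 2.0 | 11.0 |
| 4 | 0 | 14297649 | 577 | 49.0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 577.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 3.0 | 5.0 |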


------

### 3. Pre-processing

Autopilot performs the feature engineering required for each candidate, as shown in the generated notebooks.

As we have chosen a specific model, let's explore the processing script and output data generated after this step.

#### 3.1 Processing script

This is the processing code used for feature engineering by the candidate selected:

In [10]:
#Pre-processing script...
s3_processing_code = feature_engineering_code[feature_engineering_code.index(bucket)+len(bucket)+1 : feature_engineering_code.index('.py')+3]
print('feature_engineering_code:\ns3://{}/{}'.format(bucket, s3_processing_code))
s3.download_file(bucket, s3_processing_code, 'artifacts/processing.py')

feature_engineering_code:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/generated_module/candidate_data_processors/dpp4.py


In [11]:
!pygmentize artifacts/processing.py

[34mfrom[39;49;00m [04m[36mnumpy[39;49;00m [34mimport[39;49;00m nan
[34mfrom[39;49;00m [04m[36msagemaker_sklearn_extension[39;49;00m[04m[36m.[39;49;00m[04m[36mexternals[39;49;00m [34mimport[39;49;00m Header
[34mfrom[39;49;00m [04m[36msagemaker_sklearn_extension[39;49;00m[04m[36m.[39;49;00m[04m[36mimpute[39;49;00m [34mimport[39;49;00m RobustImputer
[34mfrom[39;49;00m [04m[36msagemaker_sklearn_extension[39;49;00m[04m[36m.[39;49;00m[04m[36mpreprocessing[39;49;00m [34mimport[39;49;00m RobustLabelEncoder
[34mfrom[39;49;00m [04m[36msagemaker_sklearn_extension[39;49;00m[04m[36m.[39;49;00m[04m[36mpreprocessing[39;49;00m [34mimport[39;49;00m RobustStandardScaler
[34mfrom[39;49;00m [04m[36msagemaker_sklearn_extension[39;49;00m[04m[36m.[39;49;00m[04m[36mpreprocessing[39;49;00m [34mimport[39;49;00m ThresholdOneHotEncoder
[34mfrom[39;49;00m [04m[36msklearn[39;49;00m[04m[36m.[39;49;00m[04m[36mcompose[39;49;00m [34mim

#### 3.2 Processing Pipeline

This is the pipeline definition. Remember that SageMaker Autopilot relies on scikit-learn Pipelines to perform the feature engineering:

In [12]:
#Pre-processing pipeline...
s3_processing_pipeline = feature_engineering_code[feature_engineering_code.index(bucket)+len(bucket)+1 : feature_engineering_code.index('.py')-4] + 'trainer.py'
print('feature_engineering_pipeline:\ns3://{}/{}'.format(bucket, s3_processing_pipeline))
s3.download_file(bucket, s3_processing_pipeline, 'artifacts/trainer.py')

feature_engineering_pipeline:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/generated_module/candidate_data_processors/trainer.py


In [13]:
!pygmentize artifacts/trainer.py

# This code is auto-generated.

import argparse
import importlib
import logging
import os
import shutil

from joblib import dump

from sagemaker_sklearn_extension.externals import AutoMLTransformer
from sagemaker_sklearn_extension.externals.read_data import read_csv_data



def train(X, y, header, feature_transformer, label_transformer):
    """Trains the data processing model.

    Splits training data to features and labels based on the he

#### 3.3 Processing Execution

Once the processing model is generated, Autopilot runs a Transform Job to obtain the training data. This is done through the serving script below:

In [14]:
#Pre-processing serving code...
s3_processing_serving = feature_engineering_code[feature_engineering_code.index(bucket)+len(bucket)+1 : feature_engineering_code.index('.py')-4] + 'sagemaker_serve.py'
print('feature_engineering_serving:\ns3://{}/{}'.format(bucket, s3_processing_serving))
s3.download_file(bucket, s3_processing_serving, 'artifacts/sagemaker_serve.py')

feature_engineering_serving:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/sagemaker-automl-candidates/verisure-02-pr-1-5f3f81f452c545f29ec1075d7ff1211c0d06d394097340/generated_module/candidate_data_processors/sagemaker_serve.py


In [15]:
!pygmentize artifacts/sagemaker_serve.py

# This code is auto-generated.
import http.client as http_client
import io
import json
import logging
import os

import numpy as np
from joblib import load
from scipy import sparse

from sagemaker_containers.beta.framework import encoders
from sagemaker_containers.beta.framework

#### 3.4 Generating Training Data

The result of this job is the training data, delivered in file chunks for performance efficiency. Let's explore one of the files below:

In [16]:
s3_transformed = transformed_data[transformed_data.index(bucket)+len(bucket)+1 : transformed_data.index('rpb')+3]
print('transformed_data:\ns3://{}/{}'.format(bucket, s3_transformed))

transformed_data:
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/transformed-data/dpp4/rpb


In [17]:
s3.download_file(bucket, s3_transformed+'/train/chunk_0.csv.out', 'artifacts/transform_chunk_0.out')

In [106]:
import sagemaker.amazon.common as smac
from google.protobuf.json_format import MessageToJson

import json

def read_recordio_file(filename):
    """Return the first record of a RecordIO-protobuf file as a JSON string."""
    with open(filename, 'rb') as f:
        for record in smac.read_records(f):
            #Returning the record as JSON to make it easier to read...
            return MessageToJson(record)

#Writing a sample record as CSV to make it easier to read...
data = json.loads(read_recordio_file('artifacts/transform_chunk_0.out'))
features = data['features']['values']['float32Tensor']
#The keys are the column indices of the non-zero (sparse) feature values...
header = 'label,' + ','.join(features['keys'])
print(header)
row = str(data['label']['values']['float32Tensor']['values'][0]) + ',' + ','.join(map(str, features['values']))
print(row)

#The with block closes the file, so no explicit close() is needed...
with open('artifacts/transform_chunk_0.out.csv', 'w') as file:
    file.write(header)
    file.write('\n')
    file.write(row)

label,0,1,2,4,8,17,21,28,30,34,37,38,40,42,46,50,51,53,56,57,62,63,68,90,91,92,94,96,99,100,101,112,118,119,120,121,264,278,281,282,284,286,289,290,292,294,296,298,300,302,304,306,307,309,311,314,315,317,319,321,323,325,328,329,403,405,407,451,454,458,520,524,526,604,607,610,713,715,717,740,747,754,759,771,775,893,894,899,903,907,997,1005,1014,1015,1022,1024,1031,1066,1098,1118,1120,1121,1122,1124,1126,1127,1128,1130,1132,1133,1134,1135,1137,1139,1141,1143,1145,1162,1263,1348,1350,1353,1360,1368,1378,1386,1395,1402,1408,1410,1412,1414,1416,1418,1420,1422,1424,1526,1527,1531,1538,1545,1645,1694,1783
0.0,5.0615807,2.3908212,2.205823,2.006275,2.659607,2.0958915,10.305701,10.696403,3.12097,0.1308629,4.0373816,0.04754876,5.053326,1.1885871,1.1604565,1.0153844,1.9062184,0.4521069,4.7541547,5.0775275,2.3733115,0.19784242,6.4867163,0.70173097,0.0297471,0.03002761,20.231672,0.61492497,0.5566159,0.54265845,1.0630093,2.7586436,1.1653721,1.1389959,1.1026819,1.1696364,10.3769655,5.916064,2.006275,2
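
The header printed above contains non-consecutive column indices (0, 1, 2, 4, 8, ...): the record stores only the non-zero features as `keys` with their matching `values`. The following is a minimal sketch of how such a sparse pair could be expanded into a dense row. The helper name `densify`, the toy data, and the `width` argument are assumptions for illustration; for a real record the total number of columns would have to come from the record itself (for example its tensor shape) or from your knowledge of the feature space.

```python
def densify(keys, values, width):
    """Expand a sparse (keys, values) pair into a dense list of floats.

    keys   -- column indices of the non-zero entries (the JSON conversion
              renders them as strings, so they are converted with int())
    values -- the non-zero values, in the same order as keys
    width  -- total number of feature columns (assumed known by the caller)
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    dense = [0.0] * width
    for key, value in zip(keys, values):
        dense[int(key)] = float(value)
    return dense


# Hypothetical toy record: 3 non-zero entries in a 6-column row.
toy_keys = ["0", "2", "5"]
toy_values = [5.06, 2.39, 1.17]
print(densify(toy_keys, toy_values, width=6))
```

Positions that do not appear in `keys` are filled with `0.0`, which is the usual convention for sparse feature vectors.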

#### 3.5 Processing with your own script

If you want to customize this processing code, either by adapting it to your own transformations or by adding them to the existing pipeline, follow the instructions in this blog post:

https://aws.amazon.com/blogs/machine-learning/customizing-and-reusing-models-generated-by-amazon-sagemaker-autopilot/

Following those instructions, you will either replace the processing code with your own script, add your own transformations to the scikit-learn Pipeline definition, or both.

You can also simplify the process by taking the processing code above and running it directly, e.g. with a SageMaker Processing job. For this task, see examples like this one:

https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing.ipynb

Also, consider using the Candidate Definition Notebook generated by Autopilot to reproduce any part of the process you are interested in.

-------

### 4. Training

In this section, let's assume we have already processed our data and just want to use it in our own Training Job.

We can use any example as a reference, for instance:

https://github.com/aws/amazon-sagemaker-examples/blob/main/aws_sagemaker_studio/getting_started/xgboost_customer_churn_studio.ipynb

In [107]:
sess = boto3.Session()
sm = sess.client("sagemaker")
role = sagemaker.get_execution_role()

In [108]:
from time import strftime, gmtime
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

docker_image_name = sagemaker.image_uris.retrieve("xgboost", boto3.Session().region_name, "1.3-1", image_scope="training")
print(docker_image_name)

INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-xgboost:1.3-1


In [113]:
s3_input_train = sagemaker.TrainingInput(
    s3_data="s3://{}/{}/train/".format(bucket, s3_transformed),
    content_type="application/x-recordio-protobuf"
)
print("s3://{}/{}/train/".format(bucket, s3_transformed))

s3_input_validation = sagemaker.TrainingInput(
    s3_data="s3://{}/{}/validation/".format(bucket, s3_transformed),
    content_type="application/x-recordio-protobuf"
)
print("s3://{}/{}/validation/".format(bucket, s3_transformed))

s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/transformed-data/dpp4/rpb/train/
s3://rodzanto2021ml/verisure/output_1659100501/verisure-02/transformed-data/dpp4/rpb/validation/


In [114]:
sess = sagemaker.session.Session()

create_date = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
customer_experiment = Experiment.create(
    experiment_name="automl-to-custom-{}".format(create_date),
    description="Reusing Autopilot generated training",
    sagemaker_boto_client=boto3.client("sagemaker"),
)

In [115]:
hyperparams = {
    "max_depth": 8,
    "subsample": 0.7855482055675881,
    "num_round": 195,
    "eta": 0.091189676348378,
    "gamma": 1.248783399604081,
    "min_child_weight": 8.8593717950024e-05,
    "objective": "binary:logistic",
}

In [116]:
trial = Trial.create(
    trial_name="algorithm-mode-trial-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime())),
    experiment_name=customer_experiment.experiment_name,
    sagemaker_boto_client=boto3.client("sagemaker"),
)

xgb = sagemaker.estimator.Estimator(
    image_uri=docker_image_name,
    role=role,
    hyperparameters=hyperparams,
    instance_count=1,
    instance_type="ml.m5.12xlarge",
    output_path="s3://{}/{}/output".format(bucket, customer_experiment.experiment_name),
    base_job_name="automl-to-custom",
    sagemaker_session=sess,
)

xgb.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    experiment_config={
        "ExperimentName": customer_experiment.experiment_name,
        "TrialName": trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: automl-to-custom-2022-09-12-14-45-05-690


2022-09-12 14:45:05 Starting - Starting the training job...
2022-09-12 14:45:28 Starting - Preparing the instances for trainingProfilerReport-1662993905: InProgress
......
2022-09-12 14:46:29 Downloading - Downloading input data...
2022-09-12 14:46:56 Training - Downloading the training image...
2022-09-12 14:47:29 Training - Training image download completed. Training in progress.
[2022-09-12 14:47:29.974 ip-10-0-186-183.eu-west-1.compute.internal:1 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-09-12:14:47:30:INFO] Imported framework sagemaker_xgboost_container.training
[2022-09-12:14:47:30:INFO] Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
[2022-09-12:14:47:30:INFO] No GPUs detected (normal if no gpus installed)
[2022-09-12:14:47:30:INFO] Running XGBoost Sagemaker in algorithm mode
[2022-09-12:14:47:30:INFO] files path: /opt/ml/input/data/train
[2022-

### 5. Hosting

#### 5.1 Option 1: Re-using Autopilot's best model

In [131]:
print(f'AutoML Job Name: {automl_job_name}')
best_candidate = sm.describe_auto_ml_job(AutoMLJobName=automl_job_name)["BestCandidate"]
best_candidate_name = best_candidate["CandidateName"]
print(f"CandidateName: {best_candidate_name}")
print(f'FinalAutoMLJobObjectiveMetricName: {best_candidate["FinalAutoMLJobObjectiveMetric"]["MetricName"]}')
print(f'FinalAutoMLJobObjectiveMetricValue: {best_candidate["FinalAutoMLJobObjectiveMetric"]["Value"]}')

AutoML Job Name: verisure-02
CandidateName: verisure-02v7z745kGnyMtSvlKTHFsP-006-aeea7101
FinalAutoMLJobObjectiveMetricName: validation:f1_binary
FinalAutoMLJobObjectiveMetricValue: 0.10154999792575836


In [139]:
current_timestamp = strftime('%Y-%m-%d-%H-%M-%S', gmtime())
model_name = f'autopilot-best-model-{current_timestamp}'
best_model = sm.create_model(Containers=best_candidate["InferenceContainers"], 
                                      ModelName=model_name, 
                                      ExecutionRoleArn=role)

print(f'Model ARN corresponding to the best candidate is : {best_model["ModelArn"]}')

Model ARN corresponding to the best candidate is : arn:aws:sagemaker:eu-west-1:889960878219:model/autopilot-best-model-2022-09-12-15-30-24


##### 5.1.1 Real-time Endpoint - with Autopilot best model

In [145]:
endpoint_name = "automl-to-custom-best-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
endpoint_config_name = "automl-to-custom-best-config-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName = {}".format(endpoint_name))

EndpointName = automl-to-custom-best-2022-09-12-15-35-02


In [146]:
endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name, 
            "InstanceType": 'ml.m5.xlarge',
            "InitialInstanceCount": 1
        }
    ]
)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

In [149]:
print(create_endpoint_response)

{'EndpointArn': 'arn:aws:sagemaker:eu-west-1:889960878219:endpoint/automl-to-custom-best-2022-09-12-15-35-02', 'ResponseMetadata': {'RequestId': '8a113933-347b-4a3b-a0d4-fa66c1eed97a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '8a113933-347b-4a3b-a0d4-fa66c1eed97a', 'content-type': 'application/x-amz-json-1.1', 'content-length': '109', 'date': 'Mon, 12 Sep 2022 15:35:02 GMT'}, 'RetryAttempts': 0}}


Wait 3-5 minutes for the endpoint to reach the `InService` status before invoking it...
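
Instead of waiting by hand, the status can be polled. Below is a minimal sketch of a generic polling helper; the name `wait_for_status` and its parameters are hypothetical, and the status source is passed in as a function so the helper does not depend on any particular client. With the `sm` client from this notebook, that function could read the `EndpointStatus` field returned by `sm.describe_endpoint`. boto3 also provides a built-in waiter for this, `sm.get_waiter("endpoint_in_service")`.

```python
import time


def wait_for_status(get_status, target="InService", failed=("Failed",),
                    poll_seconds=30, timeout_seconds=900, sleep=time.sleep):
    """Poll get_status() until it returns target, a failed status, or times out."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError("Resource ended in status {}".format(status))
        if waited >= timeout_seconds:
            raise TimeoutError(
                "Gave up after {} seconds (last status: {})".format(waited, status)
            )
        sleep(poll_seconds)
        waited += poll_seconds


# With the clients defined in this notebook, the call could look like:
# wait_for_status(
#     lambda: sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
# )
```

The default timeout of 15 minutes is an arbitrary choice for this sketch; adjust it to the instance type and model size you deploy.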

In [158]:
sm_runtime = boto3.client("sagemaker-runtime")

body = '5.0615807,2.3908212,2.205823,2.006275,2.659607,2.0958915,10.305701,10.696403,3.12097,0.1308629,4.0373816,0.04754876,5.053326,1.1885871,1.1604565,1.0153844,1.9062184,0.4521069,4.7541547,5.0775275,2.3733115,0.19784242,6.4867163,0.70173097,0.0297471,0.03002761,20.231672,0.61492497,0.5566159,0.54265845,1.0630093,2.7586436,1.1653721,1.1389959,1.1026819,1.1696364,10.3769655,5.916064,2.006275,2.2784512,2.5009794,5.3195143,2.659607,4.490695,13.133049,3.7834327,6.9444585,4.6643033,6.171723,3.6821976,27.068035,2.0958915,5.0687222,3.3187938,2.1268005,10.305701,20.200344,18.479351,2.44193,2.532028,14.392318,11.242263,10.696403,7.782227,7.592284,7.9242706,8.055726,4.8538,4.8758473,5.1323023,5.8792024,6.3343806,6.1338434,7.3920064,7.439455,8.015664,9.487159,9.550146,11.387362,3.639573,2.1880996,3.3596904,2.0415332,4.243878,3.2398567,5.047744,5.0429554,4.1884437,4.34925,6.304137,21.338745,3.2073255,3.2073643,5.83536,5.835922,5.83536,2.0678313,2.1786182,2.3137763,6.33898,7.756326,53.23693,8.225585,7.1055393,499.31955,135.90097,11.848583,9.212663,998.6376,706.14374,39.86251,9.575442,9.357657,11.173505,17.715107,17.334522,18.336351,11.423981,11.926623,24.406563,23.053587,2.823134,2.3217738,2.2719762,2.89872,2.9089823,4.680155,2.0805886,2.934598,3.5071137,3.8282945,4.319693,10.996861'

result = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body,
    ContentType='text/csv',
    Accept="text/csv"
)

response_body = result['Body']
#The response is plain CSV text, so it is printed as-is rather than evaluated...
response_str = response_body.read().decode('utf-8').strip()
print(response_str)

0


##### 5.1.2 Batch Transform - with Autopilot best model

In [None]:
# Define your batch_input S3 location for running the Batch Transform...
batch_input = ''

# creates a Batch Transform job from Autopilot's best model
transform_job_name = f'autopilot-batch-job-{current_timestamp}'
transform_input = {
    "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": batch_input}},
    "ContentType": "text/csv",
    "CompressionType": "None",
    "SplitType": "Line",
}

transform_output = {
    "S3OutputPath": 's3://{}/automl-to-custom/best-batch-output'.format(bucket),
}

transform_resources = {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1}

# uses the SageMaker client (sm) created in the Training section
sm.create_transform_job(
    TransformJobName=transform_job_name,
    ModelName=model_name,
    TransformInput=transform_input,
    TransformOutput=transform_output,
    TransformResources=transform_resources,
)
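
The transform job above reads `text/csv` input split by line, so every line of the input file is sent to the model as one record. As a hedged sketch, assuming your source file still carries a header row and the target column (which the model should not receive at inference time), the standard library can prepare such a file. The helper name `strip_header_and_target`, the toy data, and the column name `label` are hypothetical.

```python
import csv
import io


def strip_header_and_target(csv_text, target_column):
    """Return csv_text without its header row and without target_column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    drop = header.index(target_column)  # raises ValueError if the column is missing
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:drop] + row[drop + 1:])
    return out.getvalue()


# Hypothetical toy data; "label" stands in for your real target column.
sample = "label,age,plan\n0,34,basic\n1,51,premium\n"
print(strip_header_and_target(sample, "label"))
```

The resulting text can then be uploaded to the S3 location you set as the batch input.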

#### 5.2 Option 2: Hosting your own trained model (from steps above)

##### 5.2.1 Real-time Endpoint - with the model trained above

In [126]:
endpoint_name = "automl-to-custom-own-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName = {}".format(endpoint_name))

EndpointName = automl-to-custom-own-2022-09-12-15-21-31


In [127]:
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
    #data_capture_config=DataCaptureConfig(
    #    enable_capture=True,
    #    sampling_percentage=100,
    #    destination_s3_uri="s3://{}/{}".format(bucket, data_capture_prefix),
    #),
)

INFO:sagemaker:Creating model with name: automl-to-custom-2022-09-12-15-21-37-009
INFO:sagemaker:Creating endpoint-config with name automl-to-custom-own-2022-09-12-15-21-31
INFO:sagemaker:Creating endpoint with name automl-to-custom-own-2022-09-12-15-21-31


-----!

Perform some test prediction...

In [128]:
result = predictor.predict(
    '5.0615807,2.3908212,2.205823,2.006275,2.659607,2.0958915,10.305701,10.696403,3.12097,0.1308629,4.0373816,0.04754876,5.053326,1.1885871,1.1604565,1.0153844,1.9062184,0.4521069,4.7541547,5.0775275,2.3733115,0.19784242,6.4867163,0.70173097,0.0297471,0.03002761,20.231672,0.61492497,0.5566159,0.54265845,1.0630093,2.7586436,1.1653721,1.1389959,1.1026819,1.1696364,10.3769655,5.916064,2.006275,2.2784512,2.5009794,5.3195143,2.659607,4.490695,13.133049,3.7834327,6.9444585,4.6643033,6.171723,3.6821976,27.068035,2.0958915,5.0687222,3.3187938,2.1268005,10.305701,20.200344,18.479351,2.44193,2.532028,14.392318,11.242263,10.696403,7.782227,7.592284,7.9242706,8.055726,4.8538,4.8758473,5.1323023,5.8792024,6.3343806,6.1338434,7.3920064,7.439455,8.015664,9.487159,9.550146,11.387362,3.639573,2.1880996,3.3596904,2.0415332,4.243878,3.2398567,5.047744,5.0429554,4.1884437,4.34925,6.304137,21.338745,3.2073255,3.2073643,5.83536,5.835922,5.83536,2.0678313,2.1786182,2.3137763,6.33898,7.756326,53.23693,8.225585,7.1055393,499.31955,135.90097,11.848583,9.212663,998.6376,706.14374,39.86251,9.575442,9.357657,11.173505,17.715107,17.334522,18.336351,11.423981,11.926623,24.406563,23.053587,2.823134,2.3217738,2.2719762,2.89872,2.9089823,4.680155,2.0805886,2.934598,3.5071137,3.8282945,4.319693,10.996861,16.290886,7.240019,7.78474,8.901506,15.605523,4.752111,4.5808287,3.6082551,4.64917,15.999891,11.1707535,11.120221',
    initial_args={'ContentType': 'text/csv'}
)
print(f'There is a {float(result)*100} % probability of... whatever this model was predicting :-)')

There is a 18.577253818511963 % probability of... whatever this model was predicting :-)
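
The endpoint returns a probability because the training job used the `binary:logistic` objective. If a class label is needed instead, the probability can be compared against a threshold. This is a minimal sketch: the helper name `to_label` is hypothetical and the 0.5 default is an assumption, not a value taken from the Autopilot job, so you may want to tune it against your validation data.

```python
def to_label(probability, threshold=0.5):
    """Map a binary:logistic probability to a 0/1 class label."""
    probability = float(probability)  # accepts float, str, or bytes such as b"0.18"
    if not 0.0 <= probability <= 1.0:
        raise ValueError("Expected a probability in [0, 1], got {}".format(probability))
    return int(probability >= threshold)


# Value taken from the test prediction printed above.
print(to_label(b"0.18577253818511963"))
```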


##### 5.2.2 Batch Transform - with the model trained above

In [None]:
# Define your s3_batch_input file location for running the Batch Transform...
s3_batch_input = ''
s3_batch_output = 's3://{}/automl-to-custom/own-batch-output'.format(bucket)

# creates a transformer object from the trained model
transformer = xgb.transformer(
                          instance_count=1,
                          instance_type='ml.m5.xlarge',
                          output_path=s3_batch_output)

# calls that object's transform method to create a transform job
transformer.transform(data=s3_batch_input, data_type='S3Prefix', content_type='text/csv', split_type='Line')
transformer.wait()