# SageMaker Bring Your Own Algorithm Container


## Test on Local Machine

### Build Docker Image

Build the image using the Dockerfile in the `container` folder.

```sh
cd container
docker build -t sagemaker_bring_your_own . 
```

### Local Test

To test the algorithm and the Docker image locally, use the three shell scripts in the **`test`** folder. They run the image in a container to train the model, serve it, and send test predictions, mounting a local directory structure that mimics the one used in production.

#### `train_local.sh`

- Run the script with the name of the image.
- It mounts the local `test_dir` folder at `/opt/ml` inside the container.
- Place the input data in `test_dir/input/data`.
- (Optional) Edit `test_dir/input/config/hyperparameters.json` to set the hyperparameters you want to test. All values must be strings.
- The trained model is saved to the `test_dir/model` folder (`/opt/ml/model` inside the container).

```sh
./train_local.sh sagemaker_bring_your_own
```
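Because hyperparameters reach the container as strings, it can be less error-prone to generate `hyperparameters.json` than to edit it by hand. The following is a minimal Python sketch, assuming the `test_dir` layout described above; the helper `write_hyperparameters` and the hyperparameter name `max_leaf_nodes` are hypothetical and only illustrate the format.

```python
import json
from pathlib import Path


def write_hyperparameters(test_dir: str, params: dict) -> Path:
    """Write test_dir/input/config/hyperparameters.json with every value as a string."""
    config_dir = Path(test_dir) / "input" / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "hyperparameters.json"
    # Hyperparameters are passed to the container as strings, so convert before writing
    path.write_text(json.dumps({key: str(value) for key, value in params.items()}, indent=2))
    return path
```

For example, `write_hyperparameters("test_dir", {"max_leaf_nodes": 5})` stores the value as the string `"5"`, so the training code has to convert it back to an integer.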

#### `serve_local.sh`

- After the model is trained, run this script with the name of the image to serve the model locally on port 8080.

```sh
./serve_local.sh sagemaker_bring_your_own
```
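Before sending predictions, it helps to confirm that the server is up. SageMaker-compatible inference containers answer health checks on `GET /ping`. A minimal standard-library sketch, assuming the default `http://localhost:8080` address; `is_healthy` is a hypothetical helper, not part of the test scripts.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the container answers GET /ping with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx status), and socket timeouts are all OSError subclasses
        return False
```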

#### `predict.sh`

- Run this script with the name of a payload file and, optionally, the HTTP content type. The content type defaults to `text/csv`. For example:

```sh
./predict.sh payload.csv text/csv
```

- Alternatively, run `curl` directly to test the prediction. Pass the full path to the payload file after `@` (the example below uses a Windows path):

```sh
curl --data-binary @D:/tmp/sagemaker_bring_your_own/container/local_test/payload.csv -H "Content-Type: text/csv" -v http://localhost:8080/invocations
```
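The same request can be sent from Python, which sidesteps shell quoting and path differences between operating systems. A minimal standard-library sketch, assuming the container is served locally on port 8080; `invoke` is a hypothetical helper that mirrors the `curl` command.

```python
import urllib.request
from pathlib import Path


def invoke(payload_path: str, content_type: str = "text/csv",
           base_url: str = "http://localhost:8080", timeout: float = 30.0) -> str:
    """POST a payload file to /invocations and return the decoded response body."""
    body = Path(payload_path).read_bytes()  # raw bytes, like curl --data-binary
    request = urllib.request.Request(
        f"{base_url}/invocations",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(invoke("payload.csv"))` resolves the relative path from the current working directory, so no absolute path is needed.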

### Publish Image to ECR

From the `container` folder, run the `build_and_push.sh` script with the image name to build the image and push it to Amazon ECR.

```sh
./build_and_push.sh sagemaker_bring_your_own
```


#### Manual Steps

For debugging, you can also run the following commands one at a time.

1. With AWS CLI v2, log in to Amazon ECR.

```sh
aws ecr get-login-password --region ap-southeast-1 | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com
```

2. Tag the local image with the full ECR image name.

```sh
docker tag sagemaker_bring_your_own <ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker_bring_your_own:latest
```

3. Push the image to ECR.

```sh
docker push <ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker_bring_your_own:latest
```
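The manual steps assume that the `sagemaker_bring_your_own` repository already exists in ECR. A minimal Python sketch that builds the full image URI (the same format used for `full_image_name` in the training section) and creates the repository only when it is missing. The helper names are hypothetical, and `ecr_client` is expected to be a boto3 ECR client.

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Full ECR image name: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def ensure_repository(ecr_client, repo: str) -> None:
    """Create the ECR repository if describe_repositories reports it missing."""
    try:
        ecr_client.describe_repositories(repositoryNames=[repo])
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repo)
```

Usage, assuming boto3 is configured with credentials: `ensure_repository(boto3.client("ecr", region_name="ap-southeast-1"), "sagemaker_bring_your_own")` before running the `docker push` step.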

## Train Model in SageMaker

After testing locally, use SageMaker to train the model, then use the trained model for real-time hosting or batch transform.

### Set up the environment

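- Import libraries
- Create a SageMaker session and get the SageMaker execution role
- Get the current AWS account ID and region

```python
import boto3
import pandas as pd
import sagemaker
from time import gmtime, strftime

# SageMaker session, used later for the default bucket, training, and hosting
sess = sagemaker.Session()
# IAM role that SageMaker assumes to run jobs; it must trust sagemaker.amazonaws.com
role = sagemaker.get_execution_role()
account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name
```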

- Set up the S3 paths for the input training data and the output model artifacts

```python
s3_client = boto3.client('s3')

IMAGE_NAME = 'sagemaker_bring_your_own'

# Where the training data is located
data_bucket = 'temp-305326993135'
data_prefix = f'{IMAGE_NAME}/input/data'
data_bucket_path = f's3://{data_bucket}'

# Where to save code and model artifacts: normally the session's default bucket...
output_bucket = sagemaker.Session().default_bucket()
# ...overridden here with a specific bucket for testing
output_bucket = 'temp-305326993135'
output_prefix = f'sagemaker/{IMAGE_NAME}'
output_bucket_path = f's3://{output_bucket}'
```


- Copy the training data from the input path to the designated folder under the output path

```python
from botocore.exceptions import ClientError

for data_category in ['train', 'test', 'validation']:
    data_key = f'{data_prefix}/{data_category}/{data_category}.csv'
    output_key = f'{output_prefix}/{data_category}/{data_category}.csv'
    data_filename = f'{data_category}.csv'
    try:
        # Download to the local working directory, then upload to the output location
        s3_client.download_file(data_bucket, data_key, data_filename)
        s3_client.upload_file(data_filename, output_bucket, output_key)
        print(f'Copied file: {data_key}')
    except ClientError:
        # Usually a missing object (HTTP 404); skip this data category
        print(f'File not found: {data_key}')
```

```
Copied file: sagemaker_bring_your_own/input/data/train/train.csv
File not found: sagemaker_bring_your_own/input/data/test/test.csv
File not found: sagemaker_bring_your_own/input/data/validation/validation.csv
```


### Train Model

```python
full_image_name = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{IMAGE_NAME}:latest"
```

```python
# Training job names allow only letters, digits, and hyphens, so replace "_" with "-"
job_name = f"{IMAGE_NAME.replace('_', '-')}-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
print("Training job", job_name)
```

```
Training job sagemaker-bring-your-own-2022-01-18-06-09-33
```
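The underscore replacement is needed because training job names must match the pattern `[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}` and are limited to 63 characters. A slightly more defensive standard-library sketch that also strips other invalid characters and truncates long prefixes; `make_job_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Name pattern from the CreateTrainingJob API; the API also caps names at 63 characters
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def make_job_name(base: str, now: Optional[datetime] = None) -> str:
    """Return '<sanitized base>-<timestamp>' that satisfies JOB_NAME_PATTERN."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")  # 19 characters
    # Replace characters SageMaker rejects (such as "_") with "-"
    prefix = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    # Leave room for "-" plus the timestamp within the 63-character limit
    prefix = prefix[: 63 - len(stamp) - 1].rstrip("-")
    return f"{prefix}-{stamp}" if prefix else stamp
```

Passing `now` explicitly makes the result reproducible; leaving it out uses the current UTC time, as the `gmtime()` call above does.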


```python
create_training_params = {
    "AlgorithmSpecification": {"TrainingImage": full_image_name, "TrainingInputMode": "File"},
    "RoleArn": role,
    "OutputDataConfig": {"S3OutputPath": f"{output_bucket_path}/{output_prefix}/temp"},
    "ResourceConfig": {"InstanceCount": 1, "InstanceType": "ml.c4.2xlarge", "VolumeSizeInGB": 5},
    "TrainingJobName": job_name,
    "HyperParameters": {},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    "InputDataConfig": [
        {
            # The container reads this channel from /opt/ml/input/data/train
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"{output_bucket_path}/{output_prefix}/train",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            # The training data copied above is CSV
            "ContentType": "text/csv",
            "CompressionType": "None",
        },
    ],
}
```

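```python
import time

sage_client = boto3.client("sagemaker", region_name=region)
sage_client.create_training_job(**create_training_params)

# Poll until the job reaches a terminal state
status = sage_client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
while status not in ("Completed", "Failed", "Stopped"):
    print(status)
    time.sleep(30)
    status = sage_client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
print(status)
```

```
ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: Could not assume role arn:aws:iam::305326993135:role/u-role-prog-access-cloudadmin. Please ensure that the role exists and allows principal 'sagemaker.amazonaws.com' to assume the role.
```

This error means SageMaker cannot assume the role returned by `get_execution_role()`. When you run the notebook outside a SageMaker notebook instance, set `role` to the ARN of an IAM role whose trust policy allows the `sagemaker.amazonaws.com` principal to assume it.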
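To check whether a role can serve as the SageMaker execution role, inspect its trust policy. A minimal Python sketch, assuming the policy is available as a dict (boto3's IAM `get_role` returns `Role["AssumeRolePolicyDocument"]` in decoded form). The helper names are hypothetical, and the check deliberately ignores `Condition` blocks and wildcard actions such as `sts:*`.

```python
import json


def sagemaker_trust_policy() -> dict:
    """Trust policy that lets the SageMaker service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value) -> list:
    # IAM policy fields may hold a single string or a list of strings
    return [value] if isinstance(value, str) else list(value)


def trusts_sagemaker(policy: dict) -> bool:
    """Return True if an Allow statement lets sagemaker.amazonaws.com call sts:AssumeRole."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        principal = statement.get("Principal", {})
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        services = _as_list(principal.get("Service", []))
        actions = _as_list(statement.get("Action", []))
        if "sagemaker.amazonaws.com" in services and "sts:AssumeRole" in actions:
            return True
    return False


if __name__ == "__main__":
    # Print the policy, for example to paste into the IAM console
    print(json.dumps(sagemaker_trust_policy(), indent=2))
```

Usage: `trusts_sagemaker(boto3.client("iam").get_role(RoleName="my-sagemaker-role")["Role"]["AssumeRolePolicyDocument"])`, where `my-sagemaker-role` is a placeholder. When creating a new role, `json.dumps(sagemaker_trust_policy())` can be passed as `AssumeRolePolicyDocument` to `iam.create_role`.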

### Create an estimator and fit the model

In order to use SageMaker to fit our algorithm, we'll create an `Estimator` that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:

* The __container name__. This is constructed as in the shell commands above.
* The __role__. As defined above.
* The __instance count__ which is the number of machines to use for training.
* The __instance type__ which is the type of machine to use for training.
* The __output path__ determines where the model artifact will be written.
* The __session__ is the SageMaker session object that we defined above.

Then we call `fit()` on the estimator to train on the data that we uploaded above.

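```python
from sagemaker.estimator import Estimator

# S3 prefix that holds the training data copied above
data_location = f"{output_bucket_path}/{output_prefix}/train"

tree = Estimator(
    image_uri=full_image_name,
    role=role,
    instance_count=1,
    instance_type="ml.c4.2xlarge",
    output_path=f"s3://{sess.default_bucket()}/output",
    sagemaker_session=sess,
)

# Use the same channel name ("train") as the CreateTrainingJob call above
tree.fit({"train": data_location})
```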

## Host Model in SageMaker
You can use a trained model to get real-time predictions from an HTTP endpoint. The following steps walk through the process.

Deploying the model to SageMaker hosting just requires a `deploy` call on the fitted model. This call takes an instance count, an instance type, and optionally a serializer and a deserializer, which the resulting predictor uses when it sends requests to the endpoint and reads the responses.

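```python
from sagemaker.serializers import CSVSerializer

# Deploy one instance behind a real-time endpoint; requests are sent as CSV
predictor = tree.deploy(
    initial_instance_count=1, instance_type="ml.m4.xlarge", serializer=CSVSerializer()
)
```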

### Choose some data and use it for a prediction

To make some predictions, we'll take some of the data we used for training and predict against it. This is, of course, bad statistical practice, but a good way to see how the mechanism works.

```python
shape = pd.read_csv("data/iris.csv", header=None)
shape.sample(3)
```

```python
# drop the label column in the training set
shape.drop(shape.columns[[0]], axis=1, inplace=True)
shape.sample(3)
```

```python
import itertools

# Take rows 40-49 of each block of 50 (rows 40-49, 90-99, 140-149), then drop the last one
a = [50 * i for i in range(3)]
b = [40 + i for i in range(10)]
indices = [i + j for i, j in itertools.product(a, b)]

test_data = shape.iloc[indices[:-1]]
```

Prediction is as easy as calling `predict()` on the predictor returned by `deploy()`, passing the data we want predictions for. The serializer takes care of converting the data to CSV for us.

```python
print(predictor.predict(test_data.values).decode("utf-8"))
```
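Under the hood, the predictor sends the rows to the endpoint as `text/csv`. The sketch below shows roughly the same request made with the low-level `invoke_endpoint` call of a boto3 `sagemaker-runtime` client. The CSV formatting is an approximation of what the serializer produces, not the SDK's implementation, and `to_csv_payload` and `predict_csv` are hypothetical helpers.

```python
def to_csv_payload(rows) -> str:
    """Join each row's values with commas and the rows with newlines."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def predict_csv(runtime_client, endpoint_name: str, rows) -> str:
    """Send rows to an endpoint as text/csv and return the decoded response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")
```

Usage: `predict_csv(boto3.client("sagemaker-runtime"), predictor.endpoint_name, test_data.values)`.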

### Optional cleanup
When you're done with the endpoint, you'll want to clean it up.

```python
# Deletes the endpoint (and, by default, its endpoint configuration)
predictor.delete_endpoint()
```

## Run Batch Transform Job
You can use a trained model to get inferences on large data sets by using [Amazon SageMaker Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html). A batch transform job reads your input data from an S3 location and writes the predictions to the specified S3 output folder. As with hosting, you can run inference on the training data to test batch transform.

### Create a Transform Job
We'll create a `Transformer` that defines how to use the container to get inference results on a data set. This includes the configuration we need to invoke SageMaker batch transform:

* The __instance count__ which is the number of machines to use to extract inferences
* The __instance type__ which is the type of machine to use to extract inferences
* The __output path__ determines where the inference results will be written

```python
transform_output_folder = "batch-transform-output"
output_path = "s3://{}/{}".format(sess.default_bucket(), transform_output_folder)

transformer = tree.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path=output_path,
    assemble_with="Line",
    accept="text/csv",
)
```

We call `transform()` on the transformer to get inference results for the data that we uploaded. You can use these options when invoking the transformer:

* The __data_location__, which is the S3 location of the input data
* The __content_type__, which is the content type set on the HTTP request sent to the container to get a prediction
* The __split_type__, which is the delimiter used to split the input data into records
* The __input_filter__, which drops the first column (ID) of each input record before the HTTP request is sent to the container

```python
transformer.transform(
    data_location, content_type="text/csv", split_type="Line", input_filter="$[1:]"
)
transformer.wait()
```
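To see what the container receives with `split_type="Line"` and `input_filter="$[1:]"`, the sketch below simulates both steps locally on CSV text: split the input into one record per line, then keep every column from index 1 onward. It is an illustration under those assumptions, not SageMaker's implementation, and `simulate_line_filter` is a hypothetical helper.

```python
import csv
import io
from typing import List


def simulate_line_filter(text: str) -> List[str]:
    """Mimic split_type="Line" followed by input_filter="$[1:]" on CSV text."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # "$[1:]" keeps every column except the first (the ID)
        records.append(",".join(row[1:]))
    return records
```

Each returned string corresponds to one request body sent to `/invocations`, which is why the container never sees the ID column.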

For more information on the configuration options, see the [CreateTransformJob API](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html).

### View Output
Let's read the results of the transform job from S3 and print them.

```python
s3_client = sess.boto_session.client("s3")
# Batch Transform names each output object after its input file, with ".out" appended
s3_client.download_file(
    sess.default_bucket(), "{}/train.csv.out".format(transform_output_folder), "/tmp/train.csv.out"
)
with open("/tmp/train.csv.out") as f:
    results = f.readlines()
print("Transform results: \n{}".format("".join(results)))
```
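Because `input_filter` dropped the ID column, the output lines contain only predictions. Assuming one output line per input record, in the same order as the input file, the IDs can be re-attached locally. A minimal sketch; `pair_ids_with_predictions` is a hypothetical helper, and it raises an error when the counts differ rather than guessing.

```python
import csv
import io
from typing import Dict, Iterable


def pair_ids_with_predictions(input_text: str, output_lines: Iterable[str]) -> Dict[str, str]:
    """Map the first column of each input record to the matching output line."""
    ids = [row[0] for row in csv.reader(io.StringIO(input_text)) if row]
    predictions = [line.strip() for line in output_lines if line.strip()]
    if len(ids) != len(predictions):
        raise ValueError(f"{len(ids)} input records but {len(predictions)} predictions")
    return dict(zip(ids, predictions))
```

Usage: `pair_ids_with_predictions(open("train.csv").read(), results)`, using the `train.csv` downloaded by the copy step and the `results` read above.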