# Simple SageMaker Sample: XGBoost Hyperparameter Tuning, Training & Deployment

## Contents

1. [Introduction](#Introduction)
2. [Preparation](#Preparation)
3. [Get the Data](#Download-and-prepare-the-data)
4. [Simple Model Training](#Perform-a-simple-training-job)
5. [Hyperparameter Optimization Model Training](#Hyper-parameter-tuning)
6. [Deploy Model](#Deploy-our-model)

## Introduction
This notebook walks through building a SageMaker model using the SageMaker Python APIs.  We use XGBoost as the algorithm that drives our training jobs.  The samples here have been copied and modified from the various examples and notebooks published by AWS on the AWS GitHub site.

NEED TO INCLUDE SOME PRIMITIVE DIAGRAMS AND IMAGES THAT DEFINE WHAT'S HAPPENING

## Preparation

In [None]:
import os

import boto3
import numpy as np
import pandas as pd
import sagemaker
from sagemaker.image_uris import retrieve
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
)

# Region and low-level SageMaker client for the current session.
region = boto3.Session().region_name
sagemaker_client = boto3.Session().client("sagemaker")

# IAM role that SageMaker assumes when it runs training jobs.
role = sagemaker.get_execution_role()

# S3 location for training data and model artifacts.
bucket = sagemaker.Session().default_bucket()
prefix = "sagemaker/DEMO-hpo-xgboost-dm"

# Request the algorithm image URI and version for this region.
container = retrieve("xgboost", region, "latest")

## Download and Prepare the Data

## Perform a Simple Training Job

In this section we'll discuss the various methods of training models in SageMaker, using XGBoost as the running example.  There are several ways to train models in SageMaker; refer to the following outline for guidance.

https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#xgboost-modes

1. Framework Mode vs. Script Mode
2. Train XGBoost in Framework Mode
3. Train XGBoost in Script Mode
4. Train XGBoost as your own container
5. Additional Considerations for Training XGBoost
    - [Training with Spot Instances](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html)
    - Distributed Training
    - [SageMaker Debugger](https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/sagemaker_studio_image_build/xgboost_bring_your_own/Batch_Transform_BYO_XGB.html)
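
Training with Spot Instances adds timing settings that are worth checking before a job is submitted: `max_wait` has to be at least `max_run`, and a checkpoint location lets an interrupted job resume instead of starting over.  The sketch below is plain Python and is not part of the SageMaker SDK; the helper names `check_spot_settings` and `spot_savings_percent` and the sample numbers are assumptions made for illustration.  The savings formula is the ratio of billable seconds to training seconds that SageMaker reports for a finished spot job.

```python
def check_spot_settings(use_spot_instances, max_run, max_wait, checkpoint_s3_uri):
    """Return a list of human-readable problems with a spot configuration."""
    problems = []
    if not use_spot_instances:
        return problems  # nothing to check for on-demand training
    if max_wait is None:
        problems.append("max_wait must be set when spot instances are used")
    elif max_wait < max_run:
        problems.append("max_wait must be greater than or equal to max_run")
    if not checkpoint_s3_uri:
        problems.append("no checkpoint_s3_uri: an interrupted job restarts from scratch")
    return problems


def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved relative to paying for every second of training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical values, mirroring the settings used in the training cell.
print(check_spot_settings(True, 3600, 7200, "s3://example-bucket/checkpoints/job"))
print(check_spot_settings(True, 3600, 1800, None))
print(spot_savings_percent(training_seconds=1000, billable_seconds=250))
```

An empty list means the settings pass these two checks; it says nothing about whether spot capacity will actually be available.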

In [7]:
# https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html
# https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/tensorflow_moving_from_framework_mode_to_script_mode/tensorflow_moving_from_framework_mode_to_script_mode.html
# USE THIS FOR THE TRAINING-ONLY SECTION
# There's a conversation around training in framework vs. script mode.
# Additionally there's a conversation around migrating from framework to script mode.
# Spot training is nuanced as well, since it requires checkpointing.
# Finally let's address distributed training for framework mode.

# If possible let's stick to framework mode entirely.
# https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/using_xgboost.html#host-multiple-models-with-multi-model-endpoints

hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "reg:squarederror",
    "num_round": "50",
    "verbosity": "2",
}

instance_type = "ml.m5.2xlarge"
output_path = "s3://{}/{}/{}/output".format(bucket, prefix, "abalone-xgb")
content_type = "libsvm"

import time
from sagemaker.inputs import TrainingInput

job_name = "DEMO-xgboost-spot-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print("Training job", job_name)

use_spot_instances = True
max_run = 3600
max_wait = 7200 if use_spot_instances else None
checkpoint_s3_uri = (
    "s3://{}/{}/checkpoints/{}".format(bucket, prefix, job_name) if use_spot_instances else None
)
print("Checkpoint path:", checkpoint_s3_uri)

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    hyperparameters=hyperparameters,
    instance_count=1,
    instance_type=instance_type,
    volume_size=5,  # 5 GB
    output_path=output_path,
    sagemaker_session=sagemaker.Session(),
    use_spot_instances=use_spot_instances,
    max_run=max_run,
    max_wait=max_wait,
    checkpoint_s3_uri=checkpoint_s3_uri,
)
# Point the "train" channel at the training data, using the content type set above.
train_input = TrainingInput(
    s3_data="s3://{}/{}/{}".format(bucket, prefix, "train"), content_type=content_type
)
estimator.fit({"train": train_input}, job_name=job_name)


## Hyper Parameter Tuning

Hyperparameter Tuning is a feature of Amazon SageMaker that searches for the best hyperparameter values for a training job by running many training jobs across the ranges you specify.  This feature is available in the SageMaker UI within the AWS Console, the AWS CLI, and via the SageMaker Python API.  This document describes how to submit an HTJ (Hyperparameter Tuning Job) via API calls.

The steps to submit an HTJ via the SageMaker Python API are as follows:
1. Decide which algorithm you want to tune.
2. Set initial values for the hyperparameters of the selected algorithm.
3. Define the range of hyperparameters you want to tune against.
4. Define your hyperparameter tuning job.
5. Submit your hyperparameter tuning job.
6. Evaluate your results.

### Decide which algorithm to tune
Amazon SageMaker Hyperparameter Tuning allows you to tune built-in SageMaker algorithms as well as custom algorithms that you bring to the training environment.  To tune a built-in algorithm, you first look up its container image using the `sagemaker.image_uris.retrieve` function call.  This function takes a framework name, region and version as parameters and returns the URI of the container image.

In [None]:
from sagemaker.image_uris import retrieve

sagemaker_session = sagemaker.Session()

# Create an Estimator for XGBoost from the retrieved image URI (`container`).
# The API call tells SageMaker what the algorithm is, what permissions it has,
#   what kind and how many instances to train with, where output should go,
#   and the session to use for this algorithm.
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://{}/{}/output".format(bucket, prefix),
    sagemaker_session=sagemaker_session,
)

### Set Initial Values on Hyperparameters

Every algorithm has different hyperparameters.  Before you run a hyperparameter tuning job, you need to set initial values on the estimator for these hyperparameters; any hyperparameter that is not given a tuning range keeps this value in every training job.  During this phase we will also set the objective metric for the HTJ.

The objective metric is the metric Amazon SageMaker uses to compare the training jobs launched by a hyperparameter tuning job against each other.  The training job with the best value for this metric produces the model that should be used for deployment.
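
To make the objective metric concrete: `validation:auc` is the area under the ROC curve on the validation channel, which equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one.  The sketch below computes that quantity in plain Python by pairwise comparison.  The function name `auc` and the sample labels and scores are assumptions for illustration; during training the algorithm container computes the metric itself.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up validation labels and predicted probabilities.
sample_labels = [0, 0, 1, 1]
sample_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(sample_labels, sample_scores))  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a tuning job on this metric maximizes it.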

It is important to understand what these hyperparameters are and how they work at a high level before running an HTJ.  [XGBoost Hyperparameters & Objective Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost-tuning.html) are listed in the AWS Documentation for Amazon SageMaker.

In [None]:
# Set the initial hyperparameter values on XGBoost
xgb.set_hyperparameters(
    eval_metric="auc",
    objective="binary:logistic",
    num_round=10,
    rate_drop=0.3,
    tweedie_variance_power=1.4,
)

# Set the Objective Metric to compare job runs.
# This validation metric is Area Under the Curve.
objective_metric_name = "validation:auc"

### Define the range of Hyperparameters you want to Tune

Amazon SageMaker Hyperparameter Tuning Jobs require a specific range of values to use when testing hyperparameters.  You need to specify these ranges as `sagemaker.parameter.ParameterRange` objects in a dictionary keyed by hyperparameter name.  The concrete subclasses `IntegerParameter`, `CategoricalParameter` and `ContinuousParameter` can be imported from the `sagemaker.tuner` module.  In this example, we'll tune continuous parameters.

When specifying a continuous parameter to tune, you pass a minimum value, a maximum value, and a scaling type.  Amazon SageMaker Hyperparameter Tuning Jobs support several scaling types, including `Linear` and `Logarithmic`.  Test both and see which performs best for you.
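
The difference between the two scaling types is easiest to see by sampling.  The sketch below is plain Python and does not call SageMaker; `sample_linear` and `sample_logarithmic` are hypothetical helpers that mimic uniform sampling on the raw range and on the log-transformed range, respectively.  For a range of 0.01 to 10, linear sampling places roughly 90% of its draws above 1, while logarithmic sampling spends about a third of its draws in each decade (0.01 to 0.1, 0.1 to 1, 1 to 10).

```python
import math
import random


def sample_linear(low, high, rng):
    """Draw uniformly on the raw interval [low, high]."""
    return rng.uniform(low, high)


def sample_logarithmic(low, high, rng):
    """Draw uniformly on [log(low), log(high)], then map back with exp."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(draws, threshold):
    """Fraction of draws that fall below the threshold."""
    return sum(d < threshold for d in draws) / len(draws)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
linear_draws = [sample_linear(0.01, 10, rng) for _ in range(10_000)]
log_draws = [sample_logarithmic(0.01, 10, rng) for _ in range(10_000)]

print("linear, share below 1.0:     ", share_below(linear_draws, 1.0))
print("logarithmic, share below 1.0:", share_below(log_draws, 1.0))
```

Logarithmic scaling is therefore the usual choice when a hyperparameter such as a regularization term spans several orders of magnitude.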

In [2]:
from sagemaker.tuner import ContinuousParameter

# "alpha" and "lambda" are the L1 and L2 regularization terms in XGBoost.
hyperparameter_ranges_logarithmic = {
    "alpha": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
    "lambda": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
}

hyperparameter_ranges_linear = {
    "alpha": ContinuousParameter(0.01, 10, scaling_type="Linear"),
    "lambda": ContinuousParameter(0.01, 10, scaling_type="Linear"),
}

### Define the Hyperparameter Tuning Job

A Hyperparameter Tuning Job is represented in the SageMaker API as a `HyperparameterTuner` object.  To create this object we must pass in many of the objects we've created up until this point.  We supply the algorithm we will tune, the objective metric, and the hyperparameter ranges.  Furthermore, we will set the number of jobs to run and the parallelization of these jobs.  

Finally, we must specify our tuning strategy.  Hyperparameter Tuning Jobs take a `Random` or `Bayesian` strategy, and we will showcase both in this example.
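
One practical consequence of the strategy choice is how the number of jobs and their parallelization interact.  Bayesian search uses the results of completed training jobs to choose later candidates, so it can only improve between successive groups of jobs; Random search has no such dependency.  The sketch below is plain Python with a hypothetical helper, `sequential_rounds`, that gives a rough count of how many groups a tuning job runs in.  Treat it as an approximation: the service starts a new training job as soon as a slot frees up rather than in strict batches.

```python
import math


def sequential_rounds(max_jobs, max_parallel_jobs):
    """Approximate number of successive groups of training jobs in a tuning job."""
    if max_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError("max_jobs and max_parallel_jobs must be at least 1")
    return math.ceil(max_jobs / max_parallel_jobs)


# All jobs at once: a single round, so nothing is learned between jobs.
print(sequential_rounds(max_jobs=5, max_parallel_jobs=5))
# Fewer parallel jobs: more rounds for a Bayesian tuner to learn from.
print(sequential_rounds(max_jobs=5, max_parallel_jobs=2))
```

With a single round, a Bayesian tuner has no earlier results to draw on, so lowering the parallelization is worth considering when comparing the two strategies.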

In [3]:
from sagemaker.tuner import HyperparameterTuner

# Random Search Hyperparameter Tuning Job that scales Linearly.
random_linear_tuner_log = HyperparameterTuner(
    xgb,
    objective_metric_name,
    hyperparameter_ranges_linear,
    max_jobs=5, max_parallel_jobs=5,
    strategy="Random",
)

# Random Search Hyperparameter Tuning Job that scales Logarithmically.
random_logarithmic_tuner_log = HyperparameterTuner(
    xgb,
    objective_metric_name,
    hyperparameter_ranges_logarithmic,
    max_jobs=5, max_parallel_jobs=5,
    strategy="Random",
)

# Bayesian Search Hyperparameter Tuning Job that scales Linearly.
bayesian_linear_tuner_log = HyperparameterTuner(
    xgb,
    objective_metric_name,
    hyperparameter_ranges_linear,
    max_jobs=5, max_parallel_jobs=5,
    strategy="Bayesian",
)

# Bayesian Search Hyperparameter Tuning Job that scales Logarithmically.
bayesian_logarithmic_tuner_log = HyperparameterTuner(
    xgb,
    objective_metric_name,
    hyperparameter_ranges_logarithmic,
    max_jobs=5, max_parallel_jobs=5,
    strategy="Bayesian",
)


### Submit the Hyperparameter Tuning Job

The `fit()` method submits your Hyperparameter Tuning Job to Amazon SageMaker.  The hardware specified in the estimator is provisioned and the training jobs are run in accordance with the settings defined on the `HyperparameterTuner` object.

The hyperparameter ranges we've specified will be explored across all of the training jobs that run; the values chosen depend on the search strategy and scaling type you use.
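
Tuning job names are validated by the service before anything runs.  As described in the `CreateHyperParameterTuningJob` API reference at the time of writing, a name may contain only letters, digits, and hyphens, must start and end with a letter or digit, and may be at most 32 characters long.  The sketch below checks those rules locally; `is_valid_tuning_job_name` is a hypothetical helper, not an SDK function.

```python
import re

# Letters and digits, optionally separated by hyphens; no leading or trailing hyphen.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,31}")
MAX_NAME_LENGTH = 32


def is_valid_tuning_job_name(name):
    """Return True if the name satisfies the pattern and the length limit."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["random-linear-tuner", "random_linear_tuner", "-starts-with-hyphen"]:
    print(candidate, is_valid_tuning_job_name(candidate))
```

Underscores are the most common cause of a rejected name, so hyphenated names are the safer convention.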

In [None]:
# s3_input_train and s3_input_validation are assumed to be the training and
# validation inputs created in the data preparation step.
# Job names may contain only letters, digits, and hyphens.
random_linear_tuner_log.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    include_cls_metadata=False,
    job_name="random-linear-tuner",
)

random_logarithmic_tuner_log.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    include_cls_metadata=False,
    job_name="random-logarithmic-tuner",
)

bayesian_linear_tuner_log.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    include_cls_metadata=False,
    job_name="bayesian-linear-tuner",
)

bayesian_logarithmic_tuner_log.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    include_cls_metadata=False,
    job_name="bayesian-logarithmic-tuner",
)

### Evaluate the results

In this section we will use the Hyperparameter Tuning Job API to retrieve performance statistics for the four tuning jobs we have built in this demo.  We start by checking if the jobs have completed.  Once they have, we retrieve the results of each tuning job as a pandas dataframe, tag each dataframe with metadata about the run (strategy and scaling type), and combine them into a single dataframe.  Finally, we plot these values using seaborn and matplotlib.
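
Checking for completion usually means polling until a job reaches a terminal status.  The sketch below shows that loop in isolation.  It accepts any zero-argument callable that returns the current status string, so it could wrap a `describe_hyper_parameter_tuning_job` call or, as here, a stand-in that replays canned statuses.  `wait_for_terminal_status`, the polling interval, and the canned sequence are assumptions for illustration; `Completed`, `Failed`, and `Stopped` are the terminal values of `HyperParameterTuningJobStatus`.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal_status(get_status, poll_seconds=30.0, max_polls=240, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return that status."""
    status = None
    for attempt in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # pause before asking again
    raise TimeoutError(f"job still {status!r} after {max_polls} polls")


# Stand-in for a real status lookup: replays a fixed sequence of statuses.
canned_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_status(lambda: next(canned_statuses), poll_seconds=0.0)
print(final_status)  # Completed
```

Passing the status lookup in as a callable keeps the loop testable without an AWS account; a `Failed` or `Stopped` result still needs to be handled by the caller before any analytics are read.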

In [4]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

client = boto3.client("sagemaker")

####################################
# Check if the jobs have completed #
####################################

random_linear_status_log = client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=random_linear_tuner_log.latest_tuning_job.job_name
)["HyperParameterTuningJobStatus"]

random_logarithmic_status_log = client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=random_logarithmic_tuner_log.latest_tuning_job.job_name
)["HyperParameterTuningJobStatus"]

bayesian_linear_status_log = client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=bayesian_linear_tuner_log.latest_tuning_job.job_name
)["HyperParameterTuningJobStatus"]

bayesian_logarithmic_status_log = client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=bayesian_logarithmic_tuner_log.latest_tuning_job.job_name
)["HyperParameterTuningJobStatus"]


In [5]:

############################################################################
# Gather the HyperparameterTuningJobAnalytics for each of the jobs we ran. #
############################################################################

# The analytics class takes the tuning job name, which comes from the tuner
# objects (the *_status_log variables only hold status strings).
random_linear_df_log = sagemaker.HyperparameterTuningJobAnalytics(
    random_linear_tuner_log.latest_tuning_job.job_name
).dataframe()
random_linear_df_log["strategy"] = "random"
random_linear_df_log["scaling"] = "linear"

random_logarithmic_df_log = sagemaker.HyperparameterTuningJobAnalytics(
    random_logarithmic_tuner_log.latest_tuning_job.job_name
).dataframe()
random_logarithmic_df_log["strategy"] = "random"
random_logarithmic_df_log["scaling"] = "logarithmic"

bayesian_linear_df_log = sagemaker.HyperparameterTuningJobAnalytics(
    bayesian_linear_tuner_log.latest_tuning_job.job_name
).dataframe()
bayesian_linear_df_log["strategy"] = "bayesian"
bayesian_linear_df_log["scaling"] = "linear"

bayesian_logarithmic_df_log = sagemaker.HyperparameterTuningJobAnalytics(
    bayesian_logarithmic_tuner_log.latest_tuning_job.job_name
).dataframe()
bayesian_logarithmic_df_log["strategy"] = "bayesian"
bayesian_logarithmic_df_log["scaling"] = "logarithmic"



In [6]:

############################################################################
# Combine the results into one pandas dataframe and plot the sampled values #
############################################################################

df = pd.concat([
    random_linear_df_log,
    random_logarithmic_df_log,
    bayesian_linear_df_log,
    bayesian_logarithmic_df_log
], ignore_index=True)

g = sns.FacetGrid(df, col="scaling", palette="viridis")
g = g.map(plt.scatter, "alpha", "lambda", alpha=0.6)


This concludes our segment on Hyperparameter Tuning.  The next task is to select the model we want to work with and deploy that model.
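
Selecting a model comes down to picking the training job with the best objective value; for `validation:auc`, higher is better.  The sketch below works on a list of dictionaries shaped like rows of the analytics dataframe.  The `TrainingJobName` and `FinalObjectiveValue` keys match columns that dataframe provides, while the job names and values are made-up sample data and `best_training_job` is a hypothetical helper.

```python
import math


def best_training_job(rows, maximize=True):
    """Return the row with the best FinalObjectiveValue, skipping missing values."""

    def has_score(row):
        value = row.get("FinalObjectiveValue")
        return value is not None and not math.isnan(value)

    scored = [row for row in rows if has_score(row)]
    if not scored:
        raise ValueError("no training job reported an objective value")
    chooser = max if maximize else min
    return chooser(scored, key=lambda row: row["FinalObjectiveValue"])


# Made-up sample rows; a job that did not finish has no usable objective value.
sample_rows = [
    {"TrainingJobName": "sample-job-001", "FinalObjectiveValue": 0.71},
    {"TrainingJobName": "sample-job-002", "FinalObjectiveValue": 0.78},
    {"TrainingJobName": "sample-job-003", "FinalObjectiveValue": float("nan")},
]
best = best_training_job(sample_rows)
print(best["TrainingJobName"])  # sample-job-002
```

For an objective that should be minimized, such as an error metric, pass `maximize=False`.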

## Deploy Our Model
In this section we will discuss two separate deployment paradigms.  We will deploy our model using a SageMaker hosted endpoint first, followed by a local deployment.  The steps for local deployment can be reused to deploy to a machine in a private or self-hosted environment.