# Amazon SageMaker Autopilot Candidate Definition Notebook

This notebook was automatically generated by the AutoML job **Canvas1711471556444**.
This notebook allows you to customize the [AutoGluon](https://auto.gluon.ai/stable/index.html) trials and execute the SageMaker Autopilot workflow.

The dataset has **53** columns and the column named **total01_10_2022** is used as
the target column. This is being treated as a **Regression** problem. 
This notebook will build a **[Regression](https://en.wikipedia.org/wiki/Regression_analysis)** model that
**minimizes** the "**RMSE**" quality metric of the trained models.
The "**RMSE**" metric stands for root mean squared error. It measures the typical distance between the model's predictions and the true values, so lower values are better.
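
As a small worked example of the metric, the sketch below computes RMSE in plain Python. The `rmse` helper and the sample values are made up for illustration; they are not part of the generated notebook or of the Autopilot job.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values, for illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(actual, predicted))  # about 1.29
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.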

As part of the AutoML job, the input dataset has been randomly split into two pieces, one for **training** and one for
**validation**. Given an input dataset, Amazon SageMaker Autopilot runs a number of trials with different base models and
metaparameter settings. This notebook helps you inspect and modify the metaparameters proposed by Amazon SageMaker Autopilot.
You can interactively select one of the configurations proposed by Amazon SageMaker Autopilot, modify it and execute a processing job
to train models as per the selected configuration.
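
To make the idea of a random training/validation split concrete, here is a minimal sketch. It is not Autopilot's implementation: the `random_split` helper, the 20% validation fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def random_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (training, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


train_rows, validation_rows = random_split(range(10))
print(len(train_rows), len(validation_rows))  # 8 2
```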


---

## Contents

1. [Sagemaker Setup](#Sagemaker-Setup)
    1. [Downloading Generated Modules](#Downloading-Generated-Modules)
    1. [SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration](#SageMaker-Autopilot-Job-and-Amazon-Simple-Storage-Service-(Amazon-S3)-Configuration)
1. [Candidate Trials](#Candidate-Trials)
    1. [Select Candidate to Train](#Select-Candidate-to-Train)
    1. [Update Selected Candidate](#Update-Selected-Candidate)
    1. [Selected Candidate Metaparameters](#Selected-Candidate-Metaparameters)
1. [Executing the Candidate Trial](#Executing-the-Candidate-Trial)
    1. [Run Processing Job](#Run-Processing-Job)
1. [Model Deployment](#Model-Deployment)
    1. [Deploying the Trained Model](#Deploy-the-Trained-Model)

---

## Sagemaker Setup

Before you launch the SageMaker Autopilot jobs, set up the environment for Amazon SageMaker:
- Check the environment and dependencies.
- Create a few helper objects/functions to organize input/output data and SageMaker sessions.

**Minimal Environment Requirements**

- Jupyter: Tested on `JupyterLab 1.0.6`, `jupyter_core 4.5.0` and `IPython 6.4.0`
- Kernel: `conda_python3`
- Dependencies required
  - `sagemaker-python-sdk>=2.40.0`
    - Use `!pip install "sagemaker>=2.40.0"` to install this dependency.
    - The kernel may need to be restarted after installation.
- Expected Execution Role/permission
  - S3 access to the bucket that stores the notebook.
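
One way to check the SDK requirement from inside the kernel is sketched below. `version_tuple` and `meets_minimum` are hypothetical helpers written for this example, and the comparison is deliberately simple (numeric components only), so treat it as a convenience check rather than a full version parser.

```python
from importlib import metadata

MINIMUM_SDK_VERSION = "2.40.0"  # taken from the requirements listed above


def version_tuple(version):
    """Turn '2.40.0' into (2, 40, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.40' compares equal to '2.40.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_SDK_VERSION):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("sagemaker")
    print(f"sagemaker {installed}: ok={meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("sagemaker is not installed in this kernel")
```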

### Downloading Generated Modules
Download the generated trial configurations and a SageMaker Autopilot helper module used by this notebook.
These artifacts will be downloaded to the **Canvas1711471556444-artifacts** folder.

In [2]:
!mkdir -p Canvas1711471556444-artifacts
!aws s3 sync s3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444/sagemaker-automl-candidates/notebooks/sagemaker_automl_ensemble Canvas1711471556444-artifacts/sagemaker_automl_ensemble --only-show-errors
!aws s3 sync s3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444/sagemaker-automl-candidates/notebooks/trial_configs Canvas1711471556444-artifacts/trial_configs --only-show-errors

import sys
sys.path.append("Canvas1711471556444-artifacts")

### SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration

The following configuration has been derived from the SageMaker Autopilot job. These items configure where this notebook will
look for generated candidates, and where input and output data is stored on Amazon S3.

In [3]:
from sagemaker_automl_ensemble import AutoMLLocalEnsembleRunConfig, uid

# Where the existing AutoML job is stored
BASE_AUTOML_JOB_NAME = 'Canvas1711471556444'
BASE_AUTOML_JOB_CONFIG = {
    'automl_job_name': BASE_AUTOML_JOB_NAME,
    'automl_output_s3_base_path': 's3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444'
}

# Path conventions of the output data storage path from the local AutoML job run of this notebook
LOCAL_AUTOML_JOB_NAME = 'Canvas1711-notebook-run-{}'.format(uid())
LOCAL_AUTOML_JOB_CONFIG = {
    'local_automl_job_name': LOCAL_AUTOML_JOB_NAME,
    'local_automl_job_output_s3_base_path': 's3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444/{}'.format(LOCAL_AUTOML_JOB_NAME),
}

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG = AutoMLLocalEnsembleRunConfig(
    test_artifacts_path = 'Canvas1711471556444-artifacts',
    base_automl_job_config = BASE_AUTOML_JOB_CONFIG,
    local_automl_job_config = LOCAL_AUTOML_JOB_CONFIG
)

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


This notebook is initialized to use the following configuration: 
        <table>
        <tr><th rowspan=2>Base AutoML Job</th><th>Job Name</th><td>Canvas1711471556444</td></tr>
        <tr><th>Base Output S3 Path</th><td>s3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444</td></tr>
        <tr><th rowspan=5>Interactive Job</th><th>Job Name</th><td>Canvas1711-notebook-run-31-18-22-39</td></tr>
        <tr><th>Base Output S3 Path</th><td>s3://sagemaker-us-east-1-590183834230/Canvas/default-1709579454037/Training/output/Canvas1711471556444/Canvas1711-notebook-run-31-18-22-39</td></tr>
        </table>
        

## Candidate Trials

### Select Candidate to Train

The SageMaker Autopilot Job has analyzed the dataset and has generated a number of trial configurations with different metaparameter settings. You can select a trial configuration that you wish to train:

In [5]:
from ipywidgets import interact

trials_dropdown = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.dropdown
interact(AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.select_trial, trials_dropdown=trials_dropdown)

interactive(children=(Dropdown(description='trials_dropdown', options=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), value=1…

<function ipywidgets.widgets.interaction._InteractFactory.__call__.<locals>.<lambda>(*args, **kwargs)>

### Update Selected Candidate

By editing and saving the [metaparameters.json file](Canvas1711471556444-artifacts/metaparameters.json), you can update the metaparameters that will be used for training.  *(To edit the file use Right Click->Open With->Editor.)* \
*If you want to select a different trial from the dropdown, close and reopen the metaparameters.json file tab before editing.*
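
If you prefer to change the file from code instead of the editor, a sketch like the following could work. It assumes that `metaparameters.json` holds a single flat JSON object whose keys are the metaparameter names; that layout is an assumption, so inspect the file before relying on it. `update_metaparameters` is a hypothetical helper, not part of the generated `sagemaker_automl_ensemble` module.

```python
import json
from pathlib import Path


def update_metaparameters(path, **updates):
    """Load a JSON object from path, apply updates, and write it back."""
    path = Path(path)
    params = json.loads(path.read_text())
    if not isinstance(params, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    params.update(updates)
    path.write_text(json.dumps(params, indent=2))
    return params


# Usage in this notebook (path taken from the link above):
# update_metaparameters(
#     "Canvas1711471556444-artifacts/metaparameters.json",
#     num_bag_sets=2,
# )
```

As with manual edits, rerun the cell that displays the selected candidate afterwards to confirm the change was picked up.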

The following are the metaparameters that can be updated. You can update the metaparameters of your choice.
The updated parameters will be passed to the AutoGluon predictor for training. For a detailed description of the parameters,
refer to the [description of each argument in the AutoGluon predictor](https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html).

<div class="alert alert-info"> 💡 <strong> Available Knobs</strong>

1. num_bag_sets: Number of repeats of kfold bagging to perform. Valid values: integer
1. included_model_types: List of models to train. Valid values: any subset of following list: ["XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"]
    1. "XGB" (XGBoost)
    1. "GBM" (LightGBM)
    1. "CAT" (CatBoost)
    1. "FASTAI" (neural network with FastAI backend)
    1. "NN_TORCH" (neural network implemented in PyTorch)
    1. "LR" (linear regression)
    1. "RF" (random forest)
    1. "XT" (extremely randomized trees)
1. presets: List of preset configurations for various arguments. Valid values: a list drawn from ['best_quality', 'high_quality', 'good_quality', 'medium_quality', 'optimize_for_deployment', 'interpretable', 'ignore_text']
    - It is recommended to use only one `quality`-based preset in a given call to `fit()`, because they alter many of the same arguments and are not compatible with each other.
1. auto_stack: Whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy. Valid values: boolean
1. num_stack_levels: Number of stacking levels to use in stack ensemble. Valid values: integer
1. refit_full: Whether to retrain all models on all of the data (training + validation) after the normal training procedure. Valid values: boolean
1. set_best_to_refit_full: If True, AutoGluon will change the default model that Predictor uses for prediction when model is not specified to the refit_full version
    of the model that exhibited the highest validation score. Only valid if refit_full is set. Valid values: boolean
1. save_bag_folds: Whether bagged models will save their fold models. Valid values: boolean

</div>
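
Before launching a job, it can help to check edited values against the constraints listed above. The sketch below is one hypothetical way to do that. `check_metaparameters` is not part of the generated helper module, and treating the four `*_quality` names as the mutually exclusive "quality" presets is an assumption based on their names.

```python
# Allowed values copied from the list of knobs above.
QUALITY_PRESETS = {"best_quality", "high_quality", "good_quality", "medium_quality"}
ALL_PRESETS = QUALITY_PRESETS | {"optimize_for_deployment", "interpretable", "ignore_text"}
MODEL_TYPES = {"XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"}
INTEGER_KEYS = ("num_bag_sets", "num_stack_levels")
BOOLEAN_KEYS = ("auto_stack", "refit_full", "set_best_to_refit_full", "save_bag_folds")


def check_metaparameters(params):
    """Return a list of problems found in params; an empty list means none."""
    problems = []
    unknown_models = set(params.get("included_model_types", [])) - MODEL_TYPES
    if unknown_models:
        problems.append(f"unknown model types: {sorted(unknown_models)}")
    presets = set(params.get("presets", []))
    unknown_presets = presets - ALL_PRESETS
    if unknown_presets:
        problems.append(f"unknown presets: {sorted(unknown_presets)}")
    if len(presets & QUALITY_PRESETS) > 1:
        problems.append("more than one quality preset selected")
    for key in INTEGER_KEYS:
        value = params.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} must be an integer")
    for key in BOOLEAN_KEYS:
        if key in params and not isinstance(params[key], bool):
            problems.append(f"{key} must be a boolean")
    return problems


print(check_metaparameters({"presets": ["best_quality", "good_quality"]}))
```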

### Selected Candidate Metaparameters

You have selected the following metaparameters for your trial (run the cell below to load and display your selection):

In [6]:
AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display_candidate()

| Metaparameter | Value |
|---|---|
| num_bag_sets | 1 |
| included_model_types | ['XGB', 'GBM', 'CAT', 'XT', 'RF', 'NN_TORCH', 'FASTAI'] |
| presets | ['good_quality'] |
| auto_stack | True |
| num_stack_levels | 0 |


## Executing the Candidate Trial
### Run Processing Job
Now you are ready to create a processing job with the updated trial configuration.

#### Prepare Processor and Processing Job Inputs

In [7]:
from sagemaker.processing import Processor

processor_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processor_args()
processor = Processor(**processor_args, tags=[{'Key': 'sagemaker:is-canvas-resource', 'Value': 'True'}, {'Key': 'sagemaker:service:source:additionalMetadata', 'Value': 'canvas:notebook:tabular'}])

processing_inputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_inputs()
processing_outputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_outputs()
processing_job_name = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.local_automl_job_name

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


#### Run Processing Job for the Selected Trial

In [None]:
from IPython.display import display, Markdown

display(
    Markdown(
        f"Creating Processing Job {processing_job_name}, please track the progress from "
        f"[here](https://{AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}.console.aws.amazon.com/sagemaker/home"
        f"?region={AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}#/processing-jobs/{processing_job_name})."
    )
)

processor.run(
    job_name = processing_job_name,
    inputs = processing_inputs,
    outputs = processing_outputs,
    logs = False
)
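
The `processor.run` call above passes `logs = False`, so the job's logs are not streamed into the notebook. If you want to check on the job from another cell or session, one option is to poll its status through the SageMaker API. The sketch below is an illustration, not part of the generated notebook: `wait_for_processing_job` is a hypothetical helper, and it expects a boto3 SageMaker client that you create yourself.

```python
import time

# Statuses after which describe_processing_job will no longer change.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_processing_job(sagemaker_client, job_name, poll_seconds=30, max_polls=240):
    """Poll describe_processing_job until the job reaches a terminal status."""
    for _ in range(max_polls):
        response = sagemaker_client.describe_processing_job(ProcessingJobName=job_name)
        status = response["ProcessingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# Usage in this notebook (requires boto3 and valid AWS credentials):
# import boto3
# client = boto3.client("sagemaker", region_name=AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region)
# print(wait_for_processing_job(client, processing_job_name))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a real account.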

## Model Deployment
Now, you can deploy the trained model from the processing job. After the deployment completes, you will get an endpoint that's ready to serve online inference.

<div class="alert alert-info"> 💡 <strong> Available Knobs</strong>

1. You can customize the initial instance count and instance type used to deploy this model.
2. You can change the endpoint name to avoid conflicts with existing endpoints.

</div>

In [None]:
from sagemaker.model import Model

model_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_model_args()
model = Model(**model_args)

model.deploy(initial_instance_count=2,
             instance_type='ml.m5.12xlarge',
             endpoint_name="AutoML-{}".format(processing_job_name),
             wait=True,
             tags=[{'Key': 'sagemaker:is-canvas-resource', 'Value': 'True'}, {'Key': 'sagemaker:service:source:additionalMetadata', 'Value': 'canvas:notebook:tabular'}])

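Congratulations! You can now visit the SageMaker
[endpoint console page](https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/endpoints) to find the deployed endpoint (it takes a few minutes to come into service). This link opens the `us-west-2` console; if the endpoint was deployed in another Region (the S3 bucket name used above refers to `us-east-1`), switch the console Region to match.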
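
Once the endpoint is in service, you can send it a request through the SageMaker runtime API. The sketch below is a rough illustration under stated assumptions: `predict_csv` is a hypothetical helper, and the `text/csv` content type and row layout (feature columns only, in training order) are assumptions about what the deployed container accepts, so check the model's inference contract before using it.

```python
def predict_csv(runtime_client, endpoint_name, csv_rows):
    """Send CSV rows to the endpoint and return the decoded response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the container accepts CSV input
        Body="\n".join(csv_rows).encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode it as text.
    return response["Body"].read().decode("utf-8")


# Usage in this notebook (requires boto3 and valid AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime", region_name=AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region)
# print(predict_csv(runtime, "AutoML-{}".format(processing_job_name), ["<comma-separated feature values>"]))
```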

<div class="alert alert-warning">
    <strong>To rerun this notebook, delete or change the name of your endpoint!</strong> <br>
    If you rerun this notebook, you'll run into an error on the last step because the endpoint already exists. You can either delete the endpoint from the endpoint console page or you can change the <code>endpoint_name</code> in the previous code block.
</div>
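
As an alternative to deleting the endpoint from the console, the sketch below removes it through the SageMaker API. `delete_endpoint_if_exists` is a hypothetical helper that expects a boto3 SageMaker client you create yourself. It only deletes the endpoint; the endpoint configuration and model that were created alongside it are left in place.

```python
def delete_endpoint_if_exists(sagemaker_client, endpoint_name):
    """Delete endpoint_name if it is listed; return True if a delete was issued."""
    # NameContains narrows the listing; only the first page of results is checked,
    # which is enough when the endpoint name is specific.
    response = sagemaker_client.list_endpoints(NameContains=endpoint_name)
    names = {item["EndpointName"] for item in response.get("Endpoints", [])}
    if endpoint_name not in names:
        return False
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return True


# Usage in this notebook (requires boto3 and valid AWS credentials):
# import boto3
# client = boto3.client("sagemaker", region_name=AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region)
# delete_endpoint_if_exists(client, "AutoML-{}".format(processing_job_name))
```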