# [Module 3.1] Developing the HPO Step (SageMaker Tuning Step)

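This notebook follows the outline below. Running every cell takes roughly 5 to 10 minutes.

- 0. Model tuning overview
- 1. Load the dataset and set basic training variables
- 2. Review the training script
- 3. Run the HPO code
- 4. Develop and run the model tuning step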
    
---

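# 0. Model Tuning Overview

Amazon SageMaker automatic model tuning, also called hyperparameter tuning, finds the best version of a model by running multiple training jobs on your dataset with the algorithm and hyperparameter ranges you specify. It then selects the hyperparameter values that produced the best-performing model, as measured by the metric you choose.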
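
The idea of "run several training jobs, then keep the best one" can be sketched without any AWS calls. The example below is only an illustration: it uses plain random search (SageMaker's default strategy is Bayesian optimization), and `toy_objective` is a hypothetical stand-in for the validation metric a real training job would report.

```python
import random

def toy_objective(eta, max_depth):
    """Hypothetical stand-in for a training job's validation metric (higher is better)."""
    return 1.0 - (eta - 0.2) ** 2 - 0.01 * abs(max_depth - 3)

def random_search(n_jobs, seed=0):
    """Sample hyperparameters n_jobs times and keep the best-scoring trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_jobs):
        params = {"eta": rng.uniform(0.0, 1.0), "max_depth": rng.randint(1, 10)}
        trials.append((toy_objective(**params), params))
    # Select the trial with the highest objective value
    best_score, best_params = max(trials, key=lambda t: t[0])
    return best_score, best_params, trials

best_score, best_params, trials = random_search(n_jobs=5)
print(best_score, best_params)
```

Each loop iteration plays the role of one training job, and the final `max` corresponds to choosing the best training job by the objective metric.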



- References
    - Developer guide: [Perform automatic model tuning with SageMaker](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/automatic-model-tuning.html)
    - Official SageMaker sample: [HPO starter code](https://github.com/aws/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/xgboost_direct_marketing/hpo_xgboost_direct_marketing_sagemaker_python_sdk.ipynb)




# 1. Load the Dataset and Set Basic Training Variables
- Load the output file from the previous (preprocessing) step to inspect the data that is actually passed to training.
---

In [8]:
import boto3
import sagemaker
import pandas as pd
import os

#region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
sm_client = boto3.client("sagemaker")
# region = sagemaker.Session().boto_region_name
# print("Using AWS Region: {}".format(region))

%store -r 
# To list the variables stored by the notebook, uncomment the line below and run it.
# %store  

In [9]:
! aws s3 ls {train_preproc_data_uri} --recursive

2021-07-24 06:51:43     767663 sagemaker-pipeline-step-by-step-phase01/preporc/train.csv


In [10]:
train_prep_df = pd.read_csv(train_preproc_data_uri)
train_prep_df

Unnamed: 0,customer_age,months_as_customer,num_claims_past_year,num_insurers_past_5_years,policy_deductable,policy_annual_premium,auto_year,customer_gender_Female,customer_gender_Male,policy_state_AZ,...,authorities_contacted_Fire,authorities_contacted_None,authorities_contacted_Police,incident_severity_Major,incident_severity_Minor,incident_severity_Totaled,incident_severity_nan,police_report_available_No,police_report_available_Yes,police_report_available_nan
0,54,94,0,1,750,3000,2006,0,0,0,...,0,1,0,0,1,0,0,1,0,0
1,41,165,0,1,750,2950,2012,0,1,0,...,0,0,1,0,0,1,0,0,1,0
2,57,155,0,1,750,3000,2017,1,0,0,...,0,0,1,0,1,0,0,0,1,0
3,39,80,0,1,750,3000,2020,1,0,1,...,0,1,0,0,1,0,0,1,0,0
4,39,60,0,1,750,3000,2018,1,0,0,...,0,0,1,1,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4495,55,151,0,1,750,3000,2017,0,1,0,...,0,0,1,1,0,0,0,1,0,0
4496,47,155,0,1,750,3000,2015,1,0,0,...,0,0,1,1,0,0,0,0,1,0
4497,34,177,0,1,750,2800,2014,0,1,0,...,0,0,1,1,0,0,0,1,0,0
4498,61,268,0,1,750,2550,2014,0,0,0,...,0,1,0,0,1,0,0,1,0,0


# 2. Review the Training Script

---

In [11]:
# !pygmentize src/xgboost_starter_script.py

# 3. Run the HPO Code
---



### Set basic training variables and hyperparameters

In [12]:
from sagemaker.xgboost.estimator import XGBoost

bucket = sagemaker_session.default_bucket()
prefix = project_prefix

estimator_output_path = f's3://{bucket}/{prefix}/training_jobs'
train_instance_count = 1

def get_pos_scale_weight(df, label):
    '''
    Compute a class weight from the distribution of 1/0 labels.
    Example: with 10 positives (1) and 90 negatives (0), returns 90/10 = 9.
    Usage:
        class_weight = get_pos_scale_weight(train_prep_df, label='fraud')
    '''
    fraud_sum = df[df[label] == 1].shape[0]
    non_fraud_sum = df[df[label] == 0].shape[0]
    class_weight = int(non_fraud_sum / fraud_sum)
    print(f"fraud_sum: {fraud_sum} , non_fraud_sum: {non_fraud_sum}, class_weight: {class_weight}")
    return class_weight
    
class_weight = get_pos_scale_weight(train_prep_df, label='fraud')

hyperparameters = {
    "scale_pos_weight": class_weight,
    "max_depth": "3",
    "eta": "0.2",
    "objective": "binary:logistic",
    "num_round": "100",
}


fraud_sum: 149 , non_fraud_sum: 4351, class_weight: 29
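
The class-weight arithmetic can be checked by hand. The sketch below is a pandas-free restatement of the same ratio, using the label counts printed above; `pos_scale_weight` is a hypothetical helper, not part of the notebook, and it adds a guard for the case with no positive samples.

```python
def pos_scale_weight(labels):
    """Return floor(#negatives / #positives) for a sequence of 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives // positives

# Same label counts as the output above: 149 fraud, 4351 non-fraud
labels = [1] * 149 + [0] * 4351
weight = pos_scale_weight(labels)
print(weight)  # 4351 // 149 = 29
```

For positive counts, integer floor division gives the same result as `int(non_fraud_sum / fraud_sum)` in the notebook's function.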


### Configure and create the tuner
- `xgb_estimator`: the estimator to tune
- `objective_metric_name = "validation:auc"`: the metric to optimize
    - This metric must be defined and logged by the training code.
- `hyperparameter_ranges`: the range to search for each hyperparameter
- `max_jobs`
    - The total number of training jobs to run.
- `max_parallel_jobs`
    - The number of training jobs to run in parallel. Resource limits may cause an error; if so, lower this value.
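
To make the "defined and logged by the training code" requirement concrete: when custom metric definitions are supplied (the `metric_definitions` argument), SageMaker extracts metric values from the training logs with a regular expression. The sketch below mimics that extraction locally. The log line format and the regex are assumptions for illustration, not the actual contents of `xgboost_script.py`.

```python
import re

# Hypothetical metric definition: a name plus a regex with one capture group
# for the numeric value.
metric_definition = {"Name": "validation:auc", "Regex": r"validation-auc:([0-9\.]+)"}

# Hypothetical log lines a training script might print
log_lines = [
    "[0] train-auc:0.7512 validation-auc:0.7011",
    "[1] train-auc:0.8123 validation-auc:0.7834",
    "[2] train-auc:0.8630 validation-auc:0.8085",
]

def extract_metric(lines, regex):
    """Return every value the regex captures, in log order."""
    pattern = re.compile(regex)
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(float(match.group(1)))
    return values

values = extract_metric(log_lines, metric_definition["Regex"])
print(values)
```

If the script never prints a line the regex can match, the tuner has no objective value to compare, so it is worth testing the pattern against a sample log line first.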


In [13]:
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)


xgb_estimator = XGBoost(
    entry_point = "xgboost_script.py",
    source_dir = "src",
    output_path = estimator_output_path,
    code_location = estimator_output_path,
    hyperparameters = hyperparameters,
    role = role,
    instance_count = train_instance_count,
    instance_type = 'ml.m4.xlarge',
    framework_version = "1.0-1")

hyperparameter_ranges = {
    "eta": ContinuousParameter(0, 1),
    "min_child_weight": ContinuousParameter(1, 10),
    "alpha": ContinuousParameter(0, 2),
    "max_depth": IntegerParameter(1, 10),
}

objective_metric_name = "validation:auc"

tuner = HyperparameterTuner(
    xgb_estimator, objective_metric_name, hyperparameter_ranges, 
    max_jobs=5,
    max_parallel_jobs=5,
)

In [14]:
tuner.fit(
    inputs={"train": train_preproc_data_uri},
    wait=False,
)


### Check the tuner's run status
- Takes about 5 minutes.

In [15]:
import time

tuning_job_name = tuner.latest_tuning_job.job_name

def show_hpo_status(sm_client):
    status = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )["HyperParameterTuningJobStatus"]
    return status

status = show_hpo_status(sm_client)
while status == 'InProgress':
    status = show_hpo_status(sm_client)    
    print("HPO status: ", status)    
    time.sleep(30)    

HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  InProgress
HPO status:  Completed
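
The polling loop above runs until the status leaves `InProgress`, with no upper bound on the wait. A possible variant with a timeout is sketched below. `wait_for_terminal_status` is a hypothetical helper; the status source and the sleep function are passed in so the sketch runs without AWS access.

```python
import time

def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it stops returning 'InProgress' or the timeout elapses."""
    waited = 0
    status = get_status()
    while status == "InProgress":
        if waited >= timeout_seconds:
            raise TimeoutError(f"still InProgress after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status

# Usage with a fake status source, so the sketch runs without AWS access
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda s: None)
print(final_status)
```

In the notebook this could be called as `wait_for_terminal_status(lambda: show_hpo_status(sm_client))`, assuming `show_hpo_status` and `sm_client` are defined as in the cell above.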


### Print the best training job
- Describes the best-performing training job among those run and shows the final hyperparameter values it used.

In [16]:
from pprint import pprint

# run this cell to check current status of hyperparameter tuning job
tuning_job_result = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name
)

status = tuning_job_result["HyperParameterTuningJobStatus"]
if status != "Completed":
    print("Reminder: the tuning job has not been completed.")

job_count = tuning_job_result["TrainingJobStatusCounters"]["Completed"]
print("%d training jobs have completed" % job_count)
is_minimize = (
    tuning_job_result["HyperParameterTuningJobConfig"]["HyperParameterTuningJobObjective"]["Type"] != "Maximize"
)
objective_name = tuning_job_result["HyperParameterTuningJobConfig"]["HyperParameterTuningJobObjective"]["MetricName"]

if tuning_job_result.get("BestTrainingJob", None):
    print("Best model found so far:")
    pprint(tuning_job_result["BestTrainingJob"])
else:
    print("No training jobs have reported results yet.")


5 training jobs have completed
Best model found so far:
{'CreationTime': datetime.datetime(2021, 7, 25, 9, 14, 13, tzinfo=tzlocal()),
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:auc',
                                                 'Value': 0.8084999918937683},
 'ObjectiveStatus': 'Succeeded',
 'TrainingEndTime': datetime.datetime(2021, 7, 25, 9, 18, 9, tzinfo=tzlocal()),
 'TrainingJobArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:training-job/sagemaker-xgboost-210725-0913-004-90f9d6c6',
 'TrainingJobName': 'sagemaker-xgboost-210725-0913-004-90f9d6c6',
 'TrainingJobStatus': 'Completed',
 'TrainingStartTime': datetime.datetime(2021, 7, 25, 9, 16, 46, tzinfo=tzlocal()),
 'TunedHyperParameters': {'alpha': '1.3452164783637666',
                          'eta': '0.16285209106156806',
                          'max_depth': '3',
                          'min_child_weight': '8.441234615433405'}}
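
Note that `TunedHyperParameters` comes back as strings. If the values are reused in code, they need converting first. A minimal sketch follows; `parse_tuned_hyperparameters` is a hypothetical helper, and the choice of which names are integers is an assumption based on the ranges defined earlier.

```python
def parse_tuned_hyperparameters(tuned, integer_names=("max_depth",)):
    """Convert string values to int for names in integer_names, float otherwise."""
    parsed = {}
    for name, value in tuned.items():
        parsed[name] = int(value) if name in integer_names else float(value)
    return parsed

# Values copied from the BestTrainingJob output above
tuned = {
    "alpha": "1.3452164783637666",
    "eta": "0.16285209106156806",
    "max_depth": "3",
    "min_child_weight": "8.441234615433405",
}
best_params = parse_tuned_hyperparameters(tuned)
print(best_params)
```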


### Review the results of all training jobs in the tuning run
- Sorted by `FinalObjectiveValue`.

In [17]:
import pandas as pd

tuner_df = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner_df.dataframe()

if len(full_df) > 0:
    df = full_df[full_df["FinalObjectiveValue"] > -float("inf")]
    if len(df) > 0:
        df = df.sort_values("FinalObjectiveValue", ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest": min(df["FinalObjectiveValue"]), "highest": max(df["FinalObjectiveValue"])})
        pd.set_option("display.max_colwidth", None)  # Don't truncate TrainingJobName
    else:
        print("No training jobs have reported valid results yet.")

df

Number of training jobs with valid objective: 5
{'lowest': 0.7833999991416931, 'highest': 0.8084999918937683}




| | alpha | eta | max_depth | min_child_weight | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.345216 | 0.162852 | 3.0 | 8.441235 | sagemaker-xgboost-210725-0913-004-90f9d6c6 | Completed | 0.8085 | 2021-07-25 09:16:46+00:00 | 2021-07-25 09:18:09+00:00 | 83.0 |
| 3 | 0.905327 | 0.519599 | 6.0 | 6.73515 | sagemaker-xgboost-210725-0913-002-a110507c | Completed | 0.7931 | 2021-07-25 09:16:30+00:00 | 2021-07-25 09:17:56+00:00 | 86.0 |
| 4 | 1.213865 | 0.473219 | 8.0 | 2.534293 | sagemaker-xgboost-210725-0913-001-7bc4aaa3 | Completed | 0.786 | 2021-07-25 09:16:22+00:00 | 2021-07-25 09:17:45+00:00 | 83.0 |
| 0 | 0.625873 | 0.204358 | 6.0 | 4.143218 | sagemaker-xgboost-210725-0913-005-64fe0486 | Completed | 0.7852 | 2021-07-25 09:16:40+00:00 | 2021-07-25 09:17:59+00:00 | 79.0 |
| 2 | 1.337186 | 0.596923 | 10.0 | 7.694334 | sagemaker-xgboost-210725-0913-003-96c41b33 | Completed | 0.7834 | 2021-07-25 09:16:29+00:00 | 2021-07-25 09:17:52+00:00 | 83.0 |
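
The sort direction depends on whether the objective is maximized or minimized, which is what `ascending=is_minimize` encodes in the cell above. The same logic can be sketched with plain Python, using the objective values from the table; `rank_jobs` is a hypothetical helper.

```python
# (job name suffix, FinalObjectiveValue) pairs copied from the table above
results = [
    ("005-64fe0486", 0.7852),
    ("004-90f9d6c6", 0.8085),
    ("003-96c41b33", 0.7834),
    ("002-a110507c", 0.7931),
    ("001-7bc4aaa3", 0.7860),
]

def rank_jobs(results, objective_type="Maximize"):
    """Order jobs best-first: descending for Maximize, ascending for Minimize."""
    is_minimize = objective_type != "Maximize"
    return sorted(results, key=lambda r: r[1], reverse=not is_minimize)

ranked = rank_jobs(results)
print(ranked[0])
```

For `validation:auc` the objective type is `Maximize`, so the job with the highest value comes first.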


# 4. Develop and Run the Model Tuning Step
---
- See the tuning step section of the developer guide: [Tuning step](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning)



### Define the basic tuner components
- The tuning configuration from above is repeated here. Change the values as needed.

In [18]:
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)


xgb_estimator = XGBoost(
    entry_point = "xgboost_script.py",
    source_dir = "src",
    output_path = estimator_output_path,
    code_location = estimator_output_path,
    hyperparameters = hyperparameters,
    role = role,
    instance_count = train_instance_count,
    instance_type = 'ml.m4.xlarge',
    framework_version = "1.0-1")

hyperparameter_ranges = {
    "eta": ContinuousParameter(0, 1),
    "min_child_weight": ContinuousParameter(1, 10),
    "alpha": ContinuousParameter(0, 2),
    "max_depth": IntegerParameter(1, 10),
}

objective_metric_name = "validation:auc"

pipeline_tuner = HyperparameterTuner(
    xgb_estimator, objective_metric_name, hyperparameter_ranges, 
    max_jobs=5,
    max_parallel_jobs=5,
)

### Define the tuning step



In [20]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TuningStep

# Wrap the tuner in a pipeline step; the "train" channel points at the preprocessed CSV
step_tuning = TuningStep(
    name="HPTuning",
    tuner=pipeline_tuner,
    inputs={
        "train": TrainingInput(
            s3_data=train_preproc_data_uri,
            content_type="text/csv",
        ),
    },
)


### Define the model building pipeline

In [21]:
from sagemaker.workflow.pipeline import Pipeline

from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig

project_hpo_prefix = project_prefix + "-HPO-step"

pipeline_name = project_prefix
pipeline = Pipeline(
    name=project_hpo_prefix,
    pipeline_experiment_config=PipelineExperimentConfig(
        ExecutionVariables.PIPELINE_NAME,
        ExecutionVariables.PIPELINE_EXECUTION_ID,
    ),
    steps=[step_tuning],
)

In [22]:
import json

definition = json.loads(pipeline.definition())
# definition
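
Printing the full definition can be noisy. A quick way to sanity-check it is to list only the step names and types. The sketch below runs against a simplified, hand-written JSON string; the real output of `pipeline.definition()` contains more fields, and the exact keys shown here are an assumption about its shape.

```python
import json

# Simplified, hand-written stand-in for pipeline.definition(); the real JSON has more fields
definition_json = json.dumps({
    "Version": "2020-12-01",
    "Steps": [
        {"Name": "HPTuning", "Type": "Tuning", "Arguments": {}},
    ],
})

def summarize_steps(definition_text):
    """Return (name, type) for every step in a pipeline definition JSON string."""
    definition = json.loads(definition_text)
    return [(step["Name"], step["Type"]) for step in definition.get("Steps", [])]

summary = summarize_steps(definition_json)
print(summary)
```

In the notebook, `summarize_steps(pipeline.definition())` would be the corresponding call, under the same assumption about the keys.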

### Submit the pipeline to SageMaker and run it

Submit the pipeline definition to the Pipelines service. SageMaker uses the role passed along with it to create the pipeline in AWS and run each step of the job.

In [23]:
pipeline.upsert(role_arn=role)
execution = pipeline.start()


In [24]:
execution.describe()

{'PipelineArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:pipeline/sagemaker-pipeline-step-by-step-phase01-hpo-step',
 'PipelineExecutionArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:pipeline/sagemaker-pipeline-step-by-step-phase01-hpo-step/execution/vmnf37qlk7j6',
 'PipelineExecutionDisplayName': 'execution-1627204882998',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2021, 7, 25, 9, 21, 22, 935000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 7, 25, 9, 21, 22, 935000, tzinfo=tzlocal()),
 'CreatedBy': {},
 'LastModifiedBy': {},
 'ResponseMetadata': {'RequestId': '43324919-3610-4877-9a62-84fdf8172b45',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '43324919-3610-4877-9a62-84fdf8172b45',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '471',
   'date': 'Sun, 25 Jul 2021 09:21:22 GMT'},
  'RetryAttempts': 0}}

In [25]:
execution.wait()

In [26]:
execution.list_steps()

[{'StepName': 'HPTuning',
  'StartTime': datetime.datetime(2021, 7, 25, 9, 21, 24, 28000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 7, 25, 9, 26, 13, 266000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {}}]
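
A simple way to confirm the run is to check that no step reports a status other than `Succeeded`. The sketch below works on a trimmed copy of the `list_steps()` output above; `failed_steps` is a hypothetical helper.

```python
def failed_steps(steps):
    """Return the names of steps whose StepStatus is not 'Succeeded'."""
    return [s["StepName"] for s in steps if s.get("StepStatus") != "Succeeded"]

# Trimmed copy of the list_steps() output above
steps = [{"StepName": "HPTuning", "StepStatus": "Succeeded", "Metadata": {}}]

not_ok = failed_steps(steps)
print(not_ok)
```

An empty list means every step succeeded; in the notebook the input would be `execution.list_steps()`.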