# [Module 4.1] Developing the Model Training Step (SageMaker Model Building Pipeline Training Step)

This notebook proceeds in the order below. Running the whole notebook takes about 5 to 10 minutes.

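- 0. Model training overview
- 1. Loading the dataset and setting basic training variables
- 2. Reviewing the model training code
- 3. Developing and running the model training step
    - The training step for SageMaker Model Building Pipeline is developed in the three stages below. Stages (1) and (2) are optional, but we recommend running them because they are needed in real-world development.
        - (1) Run the training code in a Docker container on the **[local notebook instance]** (known as local mode)
        - (2) Run the training code in SageMaker host mode (SageMaker-managed training instances) with Experiments
        - (3) [Required] Develop and run the model training step in SageMaker Model Building Pipeline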
    
---

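# 0. Model Training Overview

# 1. Loading the Dataset and Setting Basic Training Variables
- Load the output file from the previous (preprocessing) step to check the data that is actually fed to training.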
---

In [44]:
import boto3
import sagemaker
import pandas as pd
import os

#region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
sm_client = boto3.client("sagemaker")

# region = sagemaker.Session().boto_region_name
# print("Using AWS Region: {}".format(region))

%store -r 
# To see the variables stored in the notebook, uncomment the line below and run it.
# %store  

In [45]:
! aws s3 ls {train_preproc_data_uri} --recursive

2021-07-23 00:38:10     767663 sagemaker-pipeline-step-by-step-phase01/preporc/train.csv


In [46]:
train_prep_df = pd.read_csv(train_preproc_data_uri)
train_prep_df

Unnamed: 0,customer_age,months_as_customer,num_claims_past_year,num_insurers_past_5_years,policy_deductable,policy_annual_premium,auto_year,customer_gender_Female,customer_gender_Male,policy_state_AZ,policy_state_CA,policy_state_ID,policy_state_NV,policy_state_OR,policy_state_WA,policy_state_nan,customer_education_Advanced Degree,customer_education_Associate,customer_education_Bachelor,customer_education_Below High School,customer_education_High School,customer_education_nan,policy_liability_100/200,policy_liability_15/30,policy_liability_25/50,...,incident_hour,fraud,driver_relationship_Child,driver_relationship_Other,driver_relationship_Self,driver_relationship_Spouse,driver_relationship_nan,incident_type_Break-in,incident_type_Collision,incident_type_Theft,collision_type_Front,collision_type_Rear,collision_type_Side,collision_type_nan,authorities_contacted_Ambulance,authorities_contacted_Fire,authorities_contacted_None,authorities_contacted_Police,incident_severity_Major,incident_severity_Minor,incident_severity_Totaled,incident_severity_nan,police_report_available_No,police_report_available_Yes,police_report_available_nan
0,54,94,0,1,750,3000,2006,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,...,8,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0
1,41,165,0,1,750,2950,2012,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,...,11,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4498,61,268,0,1,750,2550,2014,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,...,15,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0
4499,23,26,0,1,750,2900,2017,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,...,10,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1,0


### Setting Basic Training Variables and Hyperparameters

In [47]:
from sagemaker.xgboost.estimator import XGBoost

bucket = sagemaker_session.default_bucket()
prefix = project_prefix

estimator_output_path = f's3://{bucket}/{prefix}/training_jobs'
train_instance_count = 1

def get_pos_scale_weight(df, label):
    '''
    Compute the class weight from the distribution of the 1 and 0 labels.
    Example: with 10 positives (1) and 90 negatives (0), this returns 90/10 = 9.
    Usage:
        class_weight = get_pos_scale_weight(train_prep_df, label='fraud')
    '''
    fraud_sum = df[df[label] == 1].shape[0]
    non_fraud_sum = df[df[label] == 0].shape[0]
    class_weight = int(non_fraud_sum / fraud_sum)
    print(f"fraud_sum: {fraud_sum} , non_fraud_sum: {non_fraud_sum}, class_weight: {class_weight}")
    return class_weight
    
class_weight = get_pos_scale_weight(train_prep_df, label='fraud')


hyperparameters = {
    "scale_pos_weight": class_weight,
    "max_depth": "3",
    "alpha": "0.2",
    "eta": "0.333",
    "min_child_weight": "7",
    "objective": "binary:logistic",
    "num_round": "100",
}



fraud_sum: 149 , non_fraud_sum: 4351, class_weight: 29
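
As a quick sanity check of the printed value, the ratio can be reproduced without pandas. This is a minimal sketch: the helper name `pos_scale_weight` and the synthetic label list are illustrative assumptions, built only to match the counts printed above.

```python
def pos_scale_weight(labels):
    """Return int(negatives / positives) for a sequence of 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive labels; the ratio is undefined")
    # int() truncates the ratio, matching get_pos_scale_weight above
    return int(negatives / positives)

# Synthetic labels with the same counts as the printed output (149 fraud, 4351 non-fraud)
labels = [1] * 149 + [0] * 4351
weight = pos_scale_weight(labels)
print(weight)
```

Because the ratio is truncated rather than rounded, 4351 / 149 (about 29.2) becomes 29.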


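# 2. Reviewing the Model Training Code

The training code is organized roughly as follows.
- Check the variables passed in as command-line arguments.
- Load the training data.
- Train with xgboost cross-validation (cv).
- Save the metrics that describe training performance.
- Save the trained model artifact.
    - [Note] In general, unless the xgboost algorithm needs substantial changes, you would use the SageMaker built-in xgboost algorithm. Here, the training code was written separately to show an example of user-defined training code.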
---
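
The first item in the outline, reading the command-line arguments, might look like the following in a script-mode entry point. This is a sketch under assumptions, not the contents of the actual training script: the function name `parse_args`, the argument name `--train_data_path`, and the default values are hypothetical. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are the environment variables the SageMaker training container sets for the model output directory and the `train` input channel.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths (hypothetical defaults)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments, e.g. --max_depth 3
    parser.add_argument("--max_depth", type=int, default=3)
    parser.add_argument("--eta", type=float, default=0.3)
    parser.add_argument("--num_round", type=int, default=100)
    parser.add_argument("--scale_pos_weight", type=float, default=1.0)
    parser.add_argument("--objective", type=str, default="binary:logistic")
    # Paths provided by the training container, with the conventional /opt/ml fallbacks
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train_data_path", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    # parse_known_args ignores any extra arguments passed to the script
    args, _ = parser.parse_known_args(argv)
    return args

args = parse_args(["--max_depth", "5", "--eta", "0.2"])
print(args.max_depth, args.eta, args.model_dir)
```

Arguments that are not supplied fall back to their defaults, so the same script can be run locally for a quick check before it is submitted as a training job.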

In [48]:
# !pygmentize src/xgboost_starter_script.py

# 3. Developing and Running the Model Training Step
---



## (1) Running the Training Code in Local Mode (Local Docker Container) on the Local Notebook Instance

In [49]:

xgb_estimator_local = XGBoost(
    entry_point = "xgboost_script.py",
    source_dir = "src",
    output_path = estimator_output_path,
    code_location = estimator_output_path,
    hyperparameters = hyperparameters,
    role = role,
    instance_count = train_instance_count,
    instance_type = 'local',
    framework_version = "1.0-1"
)
    
xgb_estimator_local.fit(
    inputs={'train': train_preproc_data_uri},
    wait=True,
)


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2021-07-24-02-53-49-823
INFO:sagemaker.local.local_session:Starting training job
INFO:sagemaker.local.image:No AWS credentials found in session but credentials from EC2 Metadata Service are available.
INFO:sagemaker.local.image:docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-rvisb:
    command: train
    container_name: nv4hml61rk-algo-1-rvisb
    environment:
    - '[Masked]'
    - '[Masked]'
    image: 366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3
    networks:
      sagemaker-local:
        aliases:
        - algo-1-rvisb
    stdin_open: true
    tty: true
    volumes:
    - /tmp/tmp0azdg660/algo-1-rvisb/output/data:/opt/ml/output/data
    - /tmp/tmp0azdg660/algo-1-rvisb/inp

Creating nv4hml61rk-algo-1-rvisb ... 
Creating nv4hml61rk-algo-1-rvisb ... done
Attaching to nv4hml61rk-algo-1-rvisb
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
nv4hml61rk-algo-1-rvisb | INFO:sagemaker_xgboost_container.training:Invoking user training script.
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:Module xgboost_script does not provide a setup.py. 
nv4hml61rk-algo-1-rvisb | Generating setup.py
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:Generating setup.cfg
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:Generating MANIFEST.in
nv4hml61rk-algo-1-rvisb | INFO:sagemaker-containers:Installing module with the following command:
nv4hml61rk-algo-1-rvisb | /miniconda3/bin/python3 -m pip install . 
nv4hml61rk-algo-1-rvisb |



===== Job Complete =====


## (2) Running the Training Code in SageMaker Host Mode (SageMaker-Managed Training Instances) with Experiments

### Experiment Setup
- Amazon SageMaker Experiments is a capability of Amazon SageMaker that lets you organize, track, compare, and evaluate machine learning experiments.
- For details, see the developer guide. --> [Manage Machine Learning with Amazon SageMaker Experiments](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/experiments.html)
- SageMaker Experiments requires an additional package. If 0.0.Setup-Environment has not been run, install it with `!pip install --upgrade sagemaker-experiments`.
- Here the experiment is created through the boto3 API. It can also be created through the SageMaker Python SDK.


In [50]:
# !pip install --upgrade sagemaker-experiments

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

from datetime import datetime

sm = boto3.client('sagemaker')


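# Create a name for the experiment.
experiment_name = project_prefix + '-single-train'

# Create the experiment if it does not exist; otherwise skip creation.
try:
    response = sm_client.describe_experiment(ExperimentName=experiment_name)
    print(f"Experiment:{experiment_name} already exists")    
    
# Catch only the "not found" error so that other failures (e.g. permissions) are not hidden.
except sm_client.exceptions.ResourceNotFound: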
    response = sm_client.create_experiment(
        ExperimentName = experiment_name,
        Description = 'Experiment for fraud detection',
    )
    print(f"Experiment:{experiment_name} is created")        


Experiment:sagemaker-pipeline-step-by-step-phase01-single-train already exists


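### Hyperparameter Variation Experiment
- Run five training jobs, one for each of five max_depth values.
    - ```for i, max_depth_num in enumerate([1,3,5,7,10]):```
    - If a resource limit error occurs, reduce the five values to about two.
- Create five Trials inside the Experiment created above.
- Pass each set of hyperparameters to xgb_estimator as an argument.
- Pass the Experiment configuration to xgb_estimator.fit().
    - Training starts with one experiment and a separate trial configured for each job.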
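
The relationship between one experiment and its per-job trials can be sketched without calling AWS. The helper `make_experiment_config` is a hypothetical name, and `demo-single-train` is a placeholder experiment name. The function only builds the trial name and the dictionary that the next cell passes to `fit(experiment_config=...)`, and it assumes the trial itself is created separately with `create_trial`.

```python
from datetime import datetime

def make_experiment_config(experiment_name, now=None):
    """Build a timestamped trial name and the matching experiment_config dict."""
    now = now or datetime.now()
    # A microsecond timestamp makes name collisions between loop iterations unlikely
    ts = now.strftime("%Y-%m-%d-%H-%M-%S-%f")
    trial_name = f"{experiment_name}-{ts}"
    return {
        "ExperimentName": experiment_name,
        "TrialName": trial_name,
        "TrialComponentDisplayName": "Training",
    }

# Fixed timestamp so the result is reproducible
config = make_experiment_config("demo-single-train", datetime(2021, 7, 24, 2, 53, 57, 783000))
print(config["TrialName"])
```

Every training job shares the same `ExperimentName` but gets its own `TrialName`, which is what lets the analytics cells later group the jobs under one experiment.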

In [51]:
instance_type = 'ml.m5.xlarge'
for i, max_depth_num in enumerate([1,3,5,7,10]):
    hyperparameters = {
        "scale_pos_weight": class_weight,
        "max_depth": f"{max_depth_num}",
        "alpha": "0",
        "eta": "0.3",
        "min_child_weight": "1",
        "objective": "binary:logistic",
        "num_round": "100",
    }
    
    ts = datetime.now().strftime('%Y-%m-%d-%H-%M-%S-%f')
    trial_name = experiment_name + f"-{ts}"

    response = sm_client.create_trial(
        ExperimentName = experiment_name,
        TrialName = trial_name,
    )    
    
    experiment_config = {
        'ExperimentName' : experiment_name,
        'TrialName' : trial_name,
        "TrialComponentDisplayName" : 'Training',
    }    

    
    xgb_estimator = XGBoost(
        entry_point = "xgboost_script.py",
        source_dir = "src",
        output_path = estimator_output_path,
        code_location = estimator_output_path,
        hyperparameters = hyperparameters,
        role = role,
        instance_count = train_instance_count,
        instance_type = instance_type,
        framework_version = "1.0-1")

    xgb_estimator.fit(inputs = {'train': train_preproc_data_uri},
                          experiment_config = experiment_config,
                          wait=False,                        
                     )

    



INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2021-07-24-02-53-57-783
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2021-07-24-02-53-58-195
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2021-07-24-02-54-02-151
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2021-07-

In [52]:
## Print the logs of the last estimator
xgb_estimator.logs()

2021-07-24 02:54:08 Starting - Starting the training job...
2021-07-24 02:54:10 Starting - Launching requested ML instancesProfilerReport-1627095248: InProgress
...
2021-07-24 02:55:01 Starting - Preparing the instances for training.........
2021-07-24 02:56:35 Downloading - Downloading input data
2021-07-24 02:56:35 Training - Downloading the training image...
2021-07-24 02:57:02 Uploading - Uploading generated training model
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Invoking user training script.
INFO:sagemaker-containers:Module xgboost_script does not provide a setup.py. 
Generating setup.py
INFO:sagemaker-containers:Generating setup.cfg
INFO:sagemaker-containers:Generating MANIFEST.in
INFO:sagemaker-containers:Installing module with the following command:

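### Viewing the Experiment Results
Check the results of the experiment above.
- For each training job trial, you can check the training data used, the input hyperparameters, the model evaluation metrics, the location of the model artifact, and so on.
- **Everything below can also be viewed intuitively in SageMaker Studio.**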

In [53]:
from sagemaker.analytics import ExperimentAnalytics
import pandas as pd
pd.options.display.max_columns = 50
pd.options.display.max_rows = 5
pd.options.display.max_colwidth = 50

search_expression = {
    "Filters": [
        {
            "Name": "DisplayName",
            "Operator": "Equals",
            "Value": "Training",
        }
    ],
}


trial_component_analytics = ExperimentAnalytics(
    sagemaker_session= sagemaker_session,
    experiment_name= experiment_name,
    search_expression=search_expression,
)

trial_component_analytics.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,SourceArn,SageMaker.ImageUri,SageMaker.InstanceCount,SageMaker.InstanceType,SageMaker.VolumeSizeInGB,alpha,eta,max_depth,min_child_weight,num_round,objective,sagemaker_container_log_level,sagemaker_job_name,sagemaker_program,sagemaker_region,sagemaker_submit_directory,scale_pos_weight,validation:auc - Min,validation:auc - Max,validation:auc - Avg,validation:auc - StdDev,validation:auc - Last,validation:auc - Count,train:auc - Min,train:auc - Max,train:auc - Avg,train:auc - StdDev,train:auc - Last,train:auc - Count,train - MediaType,train - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value,Trials,Experiments
0,sagemaker-xgboost-2021-07-24-02-54-08-343-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,366743142698.dkr.ecr.ap-northeast-2.amazonaws....,1.0,ml.m5.xlarge,30.0,"""0""","""0.3""","""10""","""1""","""100""","""binary:logistic""",20.0,"""sagemaker-xgboost-2021-07-24-02-54-08-343""","""xgboost_script.py""","""ap-northeast-2""","""s3://sagemaker-ap-northeast-2-057716757052/sa...",29.0,0.7835,0.7835,0.7835,0.0,0.7835,1,0.9430,0.9430,0.9430,0.0,0.9430,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
1,sagemaker-xgboost-2021-07-24-02-54-02-633-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,366743142698.dkr.ecr.ap-northeast-2.amazonaws....,1.0,ml.m5.xlarge,30.0,"""0""","""0.3""","""7""","""1""","""100""","""binary:logistic""",20.0,"""sagemaker-xgboost-2021-07-24-02-54-02-633""","""xgboost_script.py""","""ap-northeast-2""","""s3://sagemaker-ap-northeast-2-057716757052/sa...",29.0,0.7850,0.7850,0.7850,0.0,0.7850,1,0.9156,0.9156,0.9156,0.0,0.9156,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18,sagemaker-xgboost-2021-07-23-02-12-54-542-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,366743142698.dkr.ecr.ap-northeast-2.amazonaws....,1.0,ml.m5.2xlarge,30.0,"""0""","""0.3""","""6""","""1""","""100""","""binary:logistic""",20.0,"""sagemaker-xgboost-2021-07-23-02-12-54-542""","""xgboost_script.py""","""ap-northeast-2""","""s3://sagemaker-ap-northeast-2-057716757052/sa...",29.0,0.7845,0.7845,0.7845,0.0,0.7845,1,0.9014,0.9014,0.9014,0.0,0.9014,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[Trial-2021-07-23-01-41-41-524747],[sagemaker-pipeline-step-by-step-phase01-singl...
19,sagemaker-xgboost-2021-07-23-01-43-58-289-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,366743142698.dkr.ecr.ap-northeast-2.amazonaws....,1.0,ml.m5.xlarge,30.0,"""0""","""0.3""","""6""","""1""","""100""","""binary:logistic""",20.0,"""sagemaker-xgboost-2021-07-23-01-43-58-289""","""xgboost_script.py""","""ap-northeast-2""","""s3://sagemaker-ap-northeast-2-057716757052/sa...",29.0,0.7845,0.7845,0.7845,0.0,0.7845,1,0.9014,0.9014,0.9014,0.0,0.9014,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[Trial-2021-07-23-01-41-41-524747],[sagemaker-pipeline-step-by-step-phase01-singl...


### Viewing Trials Ordered by the Model Evaluation Metric
- The following lists the trials ordered by the model evaluation metric.

In [54]:

trial_component_training_analytics = ExperimentAnalytics(
    sagemaker_session= sagemaker_session,
    experiment_name= experiment_name,
    search_expression=search_expression,
    sort_by="metrics.validation:auc.max",        
    sort_order="Descending",
    metric_names=["validation:auc"],
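    # NOTE: these names do not match any XGBoost hyperparameter set above, so the output shows no hyperparameter columns.
    parameter_names=["hidden_channels", "epochs", "dropout", "optimizer"],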
)

trial_component_training_analytics.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,SourceArn,validation:auc - Min,validation:auc - Max,validation:auc - Avg,validation:auc - StdDev,validation:auc - Last,validation:auc - Count,train - MediaType,train - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value,Trials,Experiments
0,sagemaker-xgboost-2021-07-24-02-53-57-783-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,0.8160,0.8160,0.8160,0.0,0.8160,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
1,sagemaker-xgboost-2021-07-23-23-35-25-417-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,0.8160,0.8160,0.8160,0.0,0.8160,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18,sagemaker-xgboost-2021-07-24-01-09-56-586-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,0.7835,0.7835,0.7835,0.0,0.7835,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
19,sagemaker-xgboost-2021-07-23-23-35-31-070-aws-...,Training,arn:aws:sagemaker:ap-northeast-2:057716757052:...,0.7835,0.7835,0.7835,0.0,0.7835,1,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,,s3://sagemaker-ap-northeast-2-057716757052/sag...,[sagemaker-pipeline-step-by-step-phase01-singl...,[sagemaker-pipeline-step-by-step-phase01-singl...
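
The ordering above is done by the service through `sort_by` and `sort_order`. The same idea can be sketched locally on plain records. The three `(max_depth, validation:auc)` pairs below are read from the first results table; the helper name `sort_by_metric` and the key name `validation_auc_max` are assumptions made for this illustration.

```python
# (max_depth, validation:auc - Max) pairs read from the first results table above
records = [
    {"max_depth": 10, "validation_auc_max": 0.7835},
    {"max_depth": 7, "validation_auc_max": 0.7850},
    {"max_depth": 6, "validation_auc_max": 0.7845},
]

def sort_by_metric(rows, key, descending=True):
    """Return a new list of rows ordered by the given metric key."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)

ranked = sort_by_metric(records, "validation_auc_max")
for row in ranked:
    print(row["max_depth"], row["validation_auc_max"])
```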


## (3) Running in SageMaker Pipeline
- Run two steps: the model training step and the model registration step.

---



### Creating the Model Building Pipeline Parameters



In [55]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
processing_instance_type = ParameterString(
    name="ProcessingInstanceType",
    default_value="ml.m5.xlarge"
)

training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.m5.xlarge"
)


training_instance_count = ParameterInteger(
    name="TrainInstanceCount",
    default_value=1
)

model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)


input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)


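### Defining the Training Step for Model Training

This step trains the model with the SageMaker [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) algorithm, and the Estimator is configured accordingly. A general-purpose training script loads the training data from the input channel, configures training through the hyperparameters, trains the model, and saves the trained model under `model_dir`. The saved model is used later for hosting.

The path where the trained model is exported and stored is also specified.

Note that the `training_instance_type` parameter is used. This value is used several times in this example's pipeline. In this step it is passed when the estimator is declared.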


In [56]:
xgb_train = XGBoost(
    entry_point = "xgboost_script.py",
    source_dir = "src",
    output_path = estimator_output_path,
    code_location = estimator_output_path,
    hyperparameters = hyperparameters,
    role = role,
    instance_count = train_instance_count,
    instance_type = training_instance_type,
    framework_version = "1.0-1")

Provide the preprocessed training dataset from the previous (processing) step as input.

In [57]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep


step_train = TrainingStep(
    name="FraudScratchTrain",
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(
            s3_data= train_preproc_dir_artifact,
            content_type="text/csv"
        ),
    },
)

### Model Registration Step
- Developer guides for the model registration step
    - [RegisterModel step](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model)
    - [Register and Deploy Models with Model Registry](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/model-registry.html)
- Model group listing API:  [ListModelPackageGroups](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListModelPackageGroups.html)   

In [58]:
model_package_group_name = f"{project_prefix}"
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}

model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)

from sagemaker.workflow.step_collections import RegisterModel

step_register = RegisterModel(
    name= f"{project_prefix}-XgboostRegisterModel",
    estimator=xgb_train,
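    # image_uri and train_model_artifact are restored by `%store -r` from an earlier run of this notebook.
    # To register the model trained in this same execution, pass
    # step_train.properties.ModelArtifacts.S3ModelArtifacts as model_data instead.
    image_uri = image_uri,
    model_data= train_model_artifact,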
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    #model_metrics=model_metrics,
)

### Defining the Model Building Pipeline

In [59]:
from sagemaker.workflow.pipeline import Pipeline

from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig


pipeline_name = project_prefix
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type, 
        processing_instance_count,
        training_instance_type,        
        input_data,
        model_approval_status,
    ],
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      ExecutionVariables.PIPELINE_EXECUTION_ID
    ),    
    steps=[step_train, step_register],
)

In [60]:
import json

definition = json.loads(pipeline.definition())
# definition

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


### Submitting the Pipeline to SageMaker and Running It

Submit the pipeline definition to the pipeline service. AWS uses the role passed along with it to create the pipeline and run each step of the job.

In [61]:
pipeline.upsert(role_arn=role)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


{'PipelineArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:pipeline/sagemaker-pipeline-step-by-step-phase01',
 'ResponseMetadata': {'RequestId': 'df9cef36-fd84-4de6-94b1-fa357da71409',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'df9cef36-fd84-4de6-94b1-fa357da71409',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '112',
   'date': 'Sat, 24 Jul 2021 02:58:21 GMT'},
  'RetryAttempts': 0}}

Run the pipeline with the default values.

In [62]:
execution = pipeline.start()
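
To run with non-default values instead, `pipeline.start()` accepts a `parameters` dictionary keyed by the parameter names declared in the pipeline definition. The sketch below only builds and validates such a dictionary. The helper `build_overrides` is hypothetical, `ml.m5.2xlarge` is just an example value, and the final call is left commented out because it needs the `pipeline` object and AWS credentials.

```python
# Parameter names declared in the pipeline definition earlier in this notebook
DECLARED = {
    "ProcessingInstanceType",
    "ProcessingInstanceCount",
    "TrainingInstanceType",
    "InputData",
    "ModelApprovalStatus",
}

def build_overrides(**overrides):
    """Return the overrides, rejecting names the pipeline does not declare."""
    unknown = set(overrides) - DECLARED
    if unknown:
        raise KeyError(f"undeclared pipeline parameters: {sorted(unknown)}")
    return dict(overrides)

overrides = build_overrides(TrainingInstanceType="ml.m5.2xlarge")
print(overrides)
# execution = pipeline.start(parameters=overrides)
```

Validating the names locally gives a quicker failure than waiting for the service to reject an unknown parameter.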

### Pipeline Operations: Waiting for the Pipeline and Checking Execution Status

Check the execution status of the workflow.

In [63]:
execution.describe()

{'PipelineArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:pipeline/sagemaker-pipeline-step-by-step-phase01',
 'PipelineExecutionArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:pipeline/sagemaker-pipeline-step-by-step-phase01/execution/8hv10r0bknf2',
 'PipelineExecutionDisplayName': 'execution-1627095502393',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2021, 7, 24, 2, 58, 22, 325000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 7, 24, 2, 58, 22, 325000, tzinfo=tzlocal()),
 'CreatedBy': {},
 'LastModifiedBy': {},
 'ResponseMetadata': {'RequestId': '14527613-775d-4cdc-9dd1-5d444a69b474',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '14527613-775d-4cdc-9dd1-5d444a69b474',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '453',
   'date': 'Sat, 24 Jul 2021 02:58:21 GMT'},
  'RetryAttempts': 0}}

In [64]:

execution.wait()

Wait until the execution completes.

List the executed steps. This shows the steps that were started or completed by the pipeline's step execution service.

In [65]:
execution.list_steps()

[{'StepName': 'FraudScratchTrain',
  'StartTime': datetime.datetime(2021, 7, 24, 2, 58, 22, 783000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 7, 24, 3, 1, 36, 732000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:training-job/pipelines-8hv10r0bknf2-fraudscratchtrain-5ne4kvchds'}}},
 {'StepName': 'sagemaker-pipeline-step-by-step-phase01-XgboostRegisterModel',
  'StartTime': datetime.datetime(2021, 7, 24, 2, 58, 22, 783000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 7, 24, 2, 58, 24, 330000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:model-package/sagemaker-pipeline-step-by-step-phase01/2'}}}]
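
The `StartTime` and `EndTime` fields make it easy to see how long each step took. The sketch below works on a hand-copied subset of the output above (time zones dropped, ARNs omitted); the helper name `step_durations` is an assumption.

```python
from datetime import datetime

# Subset of the list_steps() output above, copied by hand
steps = [
    {"StepName": "FraudScratchTrain",
     "StartTime": datetime(2021, 7, 24, 2, 58, 22, 783000),
     "EndTime": datetime(2021, 7, 24, 3, 1, 36, 732000),
     "StepStatus": "Succeeded"},
    {"StepName": "sagemaker-pipeline-step-by-step-phase01-XgboostRegisterModel",
     "StartTime": datetime(2021, 7, 24, 2, 58, 22, 783000),
     "EndTime": datetime(2021, 7, 24, 2, 58, 24, 330000),
     "StepStatus": "Succeeded"},
]

def step_durations(step_list):
    """Map each step name to its duration in seconds."""
    return {
        s["StepName"]: (s["EndTime"] - s["StartTime"]).total_seconds()
        for s in step_list
        if "EndTime" in s  # skip entries without an end time (e.g. still running)
    }

durations = step_durations(steps)
for name, seconds in durations.items():
    print(f"{name}: {seconds:.1f}s")
```

Both steps start at the same moment here. That is consistent with the registration step taking `model_data` from a stored variable rather than from the training step's output, so the pipeline sees no dependency between the two steps.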

## Checking Model Registration in the Model Registry
- Viewing registered model versions --> [View the Details of a Model Version](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/model-registry-details.html)

In [66]:
# Pass the model_package_group_name created above as the argument.
response = sm_client.list_model_packages(ModelPackageGroupName= model_package_group_name)
response

{'ModelPackageSummaryList': [{'ModelPackageGroupName': 'sagemaker-pipeline-step-by-step-phase01',
   'ModelPackageVersion': 2,
   'ModelPackageArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:model-package/sagemaker-pipeline-step-by-step-phase01/2',
   'CreationTime': datetime.datetime(2021, 7, 24, 2, 58, 24, 213000, tzinfo=tzlocal()),
   'ModelPackageStatus': 'Completed',
   'ModelApprovalStatus': 'PendingManualApproval'},
  {'ModelPackageGroupName': 'sagemaker-pipeline-step-by-step-phase01',
   'ModelPackageVersion': 1,
   'ModelPackageArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:model-package/sagemaker-pipeline-step-by-step-phase01/1',
   'CreationTime': datetime.datetime(2021, 7, 24, 1, 17, 56, 82000, tzinfo=tzlocal()),
   'ModelPackageStatus': 'Completed',
   'ModelApprovalStatus': 'Approved'}],
 'ResponseMetadata': {'RequestId': '3d164d5d-b5fa-4b1d-b578-eb792a233521',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '3d164d5d-b5fa-4b1d-b578-eb792a233521'

Retrieve the details of the registered model version.

In [67]:
ModelPackageArn = response['ModelPackageSummaryList'][0]['ModelPackageArn']
sm_client.describe_model_package(ModelPackageName=ModelPackageArn)

{'ModelPackageGroupName': 'sagemaker-pipeline-step-by-step-phase01',
 'ModelPackageVersion': 2,
 'ModelPackageArn': 'arn:aws:sagemaker:ap-northeast-2:057716757052:model-package/sagemaker-pipeline-step-by-step-phase01/2',
 'CreationTime': datetime.datetime(2021, 7, 24, 2, 58, 24, 213000, tzinfo=tzlocal()),
 'InferenceSpecification': {'Containers': [{'Image': '366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',
    'ImageDigest': 'sha256:04889b02181f14632e19ef6c2a7d74bfe699ff4c7f44669a78834bc90b77fe5a',
    'ModelDataUrl': 's3://sagemaker-ap-northeast-2-057716757052/sagemaker-pipeline-step-by-step-phase01/training_jobs/pipelines-ho3rqw49tfm2-FraudScratchTrain-DSlKffhKYY/output/model.tar.gz'}],
  'SupportedTransformInstanceTypes': ['ml.m5.xlarge'],
  'SupportedRealtimeInferenceInstanceTypes': ['ml.t2.medium', 'ml.m5.xlarge'],
  'SupportedContentTypes': ['text/csv'],
  'SupportedResponseMIMETypes': ['text/csv']},
 'ModelPackageStatus': 'Completed',
 'ModelPa

### Extracting the Artifact Path
Run this after the training step above has completed.

In [68]:
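def get_train_artifact(execution, client, job_type, kind=0):
    '''
    Return the S3 URI of the model artifact produced by the training step.
    job_type: key of the step metadata to read (here, 'TrainingJob')
    kind: unused; kept for compatibility with existing callers
    '''
    response = execution.list_steps()
    # print("response: ", response)
    # Assumes the first listed step is the training step
    proc_arn = response[0]['Metadata'][job_type]['Arn']
    # The training job name is the last segment of the ARN
    train_job_name = proc_arn.split('/')[-1]
    # print("train_job_name: ", train_job_name)
    response = client.describe_training_job(TrainingJobName = train_job_name)
    # print("\nresponse: ", response)    
    train_model_artifact = response['ModelArtifacts']['S3ModelArtifacts']    
    
    return train_model_artifact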

import boto3
client = boto3.client("sagemaker")
    
train_model_artifact = get_train_artifact(execution, client,job_type='TrainingJob', kind=0)
print(" train_model_artifact: ", train_model_artifact)


 train_model_artifact:  s3://sagemaker-ap-northeast-2-057716757052/sagemaker-pipeline-step-by-step-phase01/training_jobs/pipelines-8hv10r0bknf2-FraudScratchTrain-5nE4KvCHdS/output/model.tar.gz
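
The function above reads `response[0]`, which assumes the training step is the first entry returned by `list_steps()`. A slightly more defensive variant looks the step up by its metadata key. This is a sketch on a hand-made list shaped like the `list_steps()` output shown earlier; the helper name `find_training_job_name` is hypothetical.

```python
def find_training_job_name(step_list):
    """Return the job name of the first step that ran a training job."""
    for step in step_list:
        metadata = step.get("Metadata", {})
        if "TrainingJob" in metadata:
            # The job name is the last path segment of the ARN
            return metadata["TrainingJob"]["Arn"].split("/")[-1]
    raise LookupError("no step with TrainingJob metadata found")

# Hand-made sample with the registration step listed first on purpose
steps = [
    {"StepName": "sagemaker-pipeline-step-by-step-phase01-XgboostRegisterModel",
     "Metadata": {"RegisterModel": {"Arn": "arn:aws:sagemaker:ap-northeast-2:057716757052:model-package/sagemaker-pipeline-step-by-step-phase01/2"}}},
    {"StepName": "FraudScratchTrain",
     "Metadata": {"TrainingJob": {"Arn": "arn:aws:sagemaker:ap-northeast-2:057716757052:training-job/pipelines-8hv10r0bknf2-fraudscratchtrain-5ne4kvchds"}}},
]
job_name = find_training_job_name(steps)
print(job_name)
```

With this lookup, the order in which the service returns the steps no longer matters.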


In [69]:
image_uri = xgb_train.image_uri

In [70]:
%store train_model_artifact
%store image_uri

Stored 'train_model_artifact' (str)
Stored 'image_uri' (str)
