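# [Module 5.1] Developing a Model Building Pipeline with HPO (All SageMaker Model Building Pipeline Steps)

This notebook follows the outline below. Running it end to end takes **about 30 minutes**.

- 0. SageMaker Model Building Pipeline Overview
- 1. Pipeline Variables and Environment Setup
- 2. Defining the Pipeline Steps

    - (1) Defining the Preprocessing Step
    - (2) Defining the Training Step for Model Training
    - (3) Model Evaluation Step
    - (4) Model Registration Step
    - (5) Creating the SageMaker Model Step
    - (6) HPO Step
    - (7) Condition Step
- 3. Defining and Running the Model Building Pipeline
- 4. Running the Pipeline with Caching and Parameters
- 5. Cleanup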
    
---

# 0. SageMaker Model Building Pipeline Overview
- If needed, refer to the earlier notebook: scratch/8.5.All-Pipeline.ipynb

# 1. Pipeline Variables and Environment Setup



In [1]:
import boto3
import sagemaker
import pandas as pd

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()

sm_client = boto3.client('sagemaker', region_name=region)

%store -r 

## Setting Pipeline Variables

In [2]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
processing_instance_type = ParameterString(
    name="ProcessingInstanceType",
    default_value="ml.m5.xlarge"
)

training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.m5.xlarge"
)

training_instance_count = ParameterInteger(
    name="TrainingInstanceCount",
    default_value=1
)

model_eval_threshold = ParameterFloat(
    name="model2eval2threshold",
    default_value=0.85
)

input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)

model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)
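Each pipeline parameter carries a default that a later run can override by name. The following is a minimal sketch of that resolution logic in plain Python, not the SDK's implementation. `resolve_parameters` is a hypothetical helper, rejecting unknown names is an assumption of this sketch, and `InputData` is left out because its default comes from the stored `input_data_uri`.

```python
# Defaults copied from the parameter definitions above (InputData omitted).
DEFAULTS = {
    "ProcessingInstanceCount": 1,
    "ProcessingInstanceType": "ml.m5.xlarge",
    "TrainingInstanceType": "ml.m5.xlarge",
    "TrainingInstanceCount": 1,
    "model2eval2threshold": 0.85,
    "ModelApprovalStatus": "PendingManualApproval",
}


def resolve_parameters(defaults, overrides=None):
    """Hypothetical helper: merge per-run overrides over the declared defaults."""
    overrides = overrides or {}
    # Assumption of this sketch: names that were never declared are rejected.
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown pipeline parameters: {unknown}")
    return {**defaults, **overrides}


default_run = resolve_parameters(DEFAULTS)
tuned_run = resolve_parameters(DEFAULTS, {"model2eval2threshold": 0.8})
print(default_run["model2eval2threshold"], tuned_run["model2eval2threshold"])
```

A run started without overrides uses every default, while a run that passes only `model2eval2threshold` changes that single value and keeps the rest.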


## Defining Caching

- Reference: [Caching Pipeline Steps](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/pipelines-caching.html)

In [3]:
from sagemaker.workflow.steps import CacheConfig

cache_config = CacheConfig(enable_caching=True, 
                           expire_after="7d")
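The idea behind step caching can be sketched as follows: a step is reused only when an earlier run had the same arguments and finished within the expiry window. This is a simplified model under stated assumptions. The names `step_signature` and `can_reuse` are hypothetical, and hashing the arguments is an illustrative stand-in for however the service actually compares steps.

```python
import hashlib
import json
from datetime import datetime, timedelta


def step_signature(arguments):
    """Hash a step's arguments so identical configurations compare equal."""
    payload = json.dumps(arguments, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def can_reuse(previous_run, arguments, now, expire_after=timedelta(days=7)):
    """Reuse only if the arguments match and the earlier run is recent enough."""
    same_signature = previous_run["signature"] == step_signature(arguments)
    fresh = now - previous_run["finished_at"] <= expire_after
    return same_signature and fresh


args = {"code": "src/preprocessing.py", "split_rate": "0.2"}
previous = {"signature": step_signature(args), "finished_at": datetime(2021, 8, 27, 7, 17)}

soon_after = can_reuse(previous, args, now=datetime(2021, 8, 27, 7, 27))
two_weeks_later = can_reuse(previous, args, now=datetime(2021, 9, 10, 7, 27))
changed_args = can_reuse(previous, {**args, "split_rate": "0.3"}, now=datetime(2021, 8, 27, 7, 27))
print(soon_after, two_weeks_later, changed_args)
```

Only the first case qualifies for reuse: the second is older than the seven-day window, and the third changes an argument.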


# 2. Defining the Pipeline Steps

## (1) Defining the Preprocessing Step
- Runs preprocessing on the input data at input_data_uri.

In [4]:
from sagemaker.sklearn.processing import SKLearnProcessor

split_rate = 0.2
framework_version = "0.23-1"

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name="sklearn-fraud-process",
    role=role,
)
print("input_data: \n", input_data)

input_data: 
 s3://sagemaker-us-east-1-028703291518/sagemaker-pipeline-step-by-step-phase01/input


In [5]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
    
step_process = ProcessingStep(
    name="FraudScratchProcess",
    processor=sklearn_processor,
    inputs=[
#         ProcessingInput(source=input_data_uri,destination='/opt/ml/processing/input'),
        ProcessingInput(source=input_data, destination='/opt/ml/processing/input'),        
         ],
    outputs=[ProcessingOutput(output_name="train",
                              source='/opt/ml/processing/output/train'),
             ProcessingOutput(output_name="test",
                              source='/opt/ml/processing/output/test')],
    job_arguments=["--split_rate", f"{split_rate}"],        
    code='src/preprocessing.py',
    cache_config=cache_config,  # cache configuration
)
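The step passes `--split_rate` to `src/preprocessing.py`, whose contents are not shown in this notebook. As a hedged illustration only, a script of that kind might parse the argument and split rows roughly like this. `parse_args` and `split_rows` are hypothetical names, and the shuffle-then-slice split is an assumption rather than the author's method.

```python
import argparse
import random


def parse_args(argv):
    """Parse the job arguments that the processing step passes to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--split_rate", type=float, default=0.2)
    return parser.parse_args(argv)


def split_rows(rows, split_rate, seed=0):
    """Shuffle the rows and hold out a split_rate fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * split_rate))
    return shuffled[n_test:], shuffled[:n_test]


# Same argument list as job_arguments in the step definition.
args = parse_args(["--split_rate", "0.2"])
train_rows, test_rows = split_rows(range(10), args.split_rate)
print(len(train_rows), len(test_rows))
```

In the real job the two splits would be written under `/opt/ml/processing/output/train` and `/opt/ml/processing/output/test`, the sources declared in the step's outputs.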


## (2) Defining the Training Step for Model Training



### Basic Training Variables and Hyperparameters

In [6]:
from sagemaker.xgboost.estimator import XGBoost

bucket = sagemaker_session.default_bucket()
prefix = 'fraud2train'
estimator_output_path = f's3://{bucket}/{prefix}/training_jobs'

base_hyperparameters = {
       "scale_pos_weight" : "29",        
        "max_depth": "6",
        "alpha" : "0", 
        "eta": "0.3",
        "min_child_weight": "1",
        "objective": "binary:logistic",
        "num_round": "100",
}


In [7]:
xgb_train = XGBoost(
    entry_point = "xgboost_script.py",
    source_dir = "src",
    output_path = estimator_output_path,
    code_location = estimator_output_path,
    hyperparameters = base_hyperparameters,
    role = role,
    instance_count = training_instance_count,
    instance_type = training_instance_type,
    framework_version = "1.0-1")

The training input is the output of the earlier preprocessing step, referenced through its step property:
- `step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri`
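A step property like the one above is not a concrete S3 path when the notebook runs. It acts as a placeholder that the service resolves during execution. The toy class below sketches that idea by recording the access path instead of holding a value. `PropertyRef` is a hypothetical stand-in, not the SDK class, and the `{"Get": ...}` shape is meant to resemble the references that appear in a pipeline definition.

```python
class PropertyRef:
    """Toy placeholder: records how a value will be looked up at run time."""

    def __init__(self, path):
        self.path = path

    def __getattr__(self, name):
        # Only called for attributes that do not exist; extend the recorded path.
        if name.startswith("_"):
            raise AttributeError(name)
        return PropertyRef(f"{self.path}.{name}")

    def __getitem__(self, key):
        # Indexing (for example by output name) also extends the path.
        return PropertyRef(f"{self.path}['{key}']")

    @property
    def expr(self):
        """Render the reference as a small JSON-like expression."""
        return {"Get": self.path}


process_props = PropertyRef("Steps.FraudScratchProcess")
train_uri = process_props.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri
print(train_uri.expr)
```

Because the reference names the producing step, the pipeline can also infer that training must run after preprocessing.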

In [8]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep


step_train = TrainingStep(
    name="FraudScratchTrain",
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            # s3_data= train_preproc_dir_artifact,            
            content_type="text/csv"
        ),
    },
    cache_config=cache_config,  # cache configuration
)

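## (3) Model Evaluation Step

### Specifying the Base Docker Container for ScriptProcessor
The evaluation job uses the Scikit-learn image as its base Docker container (here through `SKLearnProcessor`).
- You can also use a custom Docker container.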

In [9]:
from sagemaker.sklearn.processing import SKLearnProcessor

# Processor that runs the evaluation script on the Scikit-learn image
script_eval = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type=processing_instance_type,
    instance_count=1,
    base_job_name="script-fraud-scratch-eval",
)



In [10]:
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.steps import ProcessingStep


evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json"
)



step_eval = ProcessingStep(
    name="FraudEval",
    processor=script_eval,
    inputs=[
        ProcessingInput(
            source= step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model"
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs[
                "test"
            ].S3Output.S3Uri,
        destination="/opt/ml/processing/test"
        )
    ],
    outputs=[
        ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
    ],
    code="src/evaluation.py",
    cache_config=cache_config,  # cache configuration
    property_files=[evaluation_report],  # author's note: adding this line currently raises an error
)
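The property file above points at `evaluation.json`, and the condition step later reads `binary_classification_metrics.auc.value` from it. The script `src/evaluation.py` is not shown, so the following is only a sketch of a report with that shape. `auc_score` and `build_report` are hypothetical names, and the pairwise AUC computation is a simple stand-in for whatever metric code the real script uses.

```python
import json


def auc_score(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def build_report(labels, scores):
    """Nest the metric so the path binary_classification_metrics.auc.value resolves."""
    return {"binary_classification_metrics": {"auc": {"value": auc_score(labels, scores)}}}


report = build_report([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
report_json = json.dumps(report)
print(report_json)
```

In the processing job this JSON would be written to `/opt/ml/processing/evaluation/evaluation.json`, matching the output source and the property file path declared in the step.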

## (4) Model Registration Step

### Creating the Model Group

- References
    - Model group listing API: [ListModelPackageGroups](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListModelPackageGroups.html)
    - Registering model metrics: [Model Quality Metrics](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/model-monitor-model-quality-metrics.html)

In [11]:
model_package_group_name = f"{project_prefix}"
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}
response = sm_client.list_model_package_groups(NameContains=model_package_group_name)
if len(response['ModelPackageGroupSummaryList']) == 0:
    print("No model group exists")
    print("Create model group")    
    
    create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
    print('ModelPackageGroup Arn : {}'.format(create_model_package_group_response['ModelPackageGroupArn']))
else:
    print(f"{model_package_group_name} exists")

sagemaker-pipeline-step-by-step-phase01 exists
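The existence check in this cell filters with `NameContains`, which matches substrings, so a differently named group that merely contains the prefix would also make the list non-empty. A stricter check might compare names exactly, as in the sketch below. `group_exists` is a hypothetical helper, and the sample response is hand-written for illustration with a made-up second group name.

```python
def group_exists(response, group_name):
    """Return True only if a group with exactly this name is in the response."""
    return any(
        summary["ModelPackageGroupName"] == group_name
        for summary in response["ModelPackageGroupSummaryList"]
    )


# Hand-written sample in the shape of a list_model_package_groups response.
sample_response = {
    "ModelPackageGroupSummaryList": [
        {"ModelPackageGroupName": "sagemaker-pipeline-step-by-step-phase01-old"},  # hypothetical
    ]
}

substring_match = len(sample_response["ModelPackageGroupSummaryList"]) > 0
exact_match = group_exists(sample_response, "sagemaker-pipeline-step-by-step-phase01")
print(substring_match, exact_match)
```

With this sample, the length check reports a match even though no group has the exact name, while the exact comparison does not.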


In [12]:
from sagemaker.workflow.step_collections import RegisterModel

from sagemaker.model_metrics import MetricsSource, ModelMetrics 


model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="{}/evaluation.json".format(
            step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
        ),
        content_type="application/json"
    )
)


step_register = RegisterModel(
    name= "FraudScratcRegisterhModel",
    estimator=xgb_train,
    image_uri= step_train.properties.AlgorithmSpecification.TrainingImage,
    model_data= step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)

## (5) Creating the SageMaker Model Step
- The two parameters below take their values from the results of the earlier training step.
    - image_uri= step_train.properties.AlgorithmSpecification.TrainingImage,
    - model_data= step_train.properties.ModelArtifacts.S3ModelArtifacts,



In [13]:
from sagemaker.model import Model
    
model = Model(
    image_uri= step_train.properties.AlgorithmSpecification.TrainingImage,
    model_data= step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    role=role,
)

In [14]:
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.steps import CreateModelStep


inputs = CreateModelInput(
    instance_type="ml.m5.large",
    # accelerator_type="ml.eia1.medium",
)
step_create_model = CreateModelStep(
    name="FraudScratchModel",
    model=model,
    inputs=inputs,
)

## (6) HPO Step

In [15]:
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

hyperparameter_ranges = {
    "eta": ContinuousParameter(0, 1),
    "min_child_weight": ContinuousParameter(1, 10),
    "alpha": ContinuousParameter(0, 2),
    "max_depth": IntegerParameter(1, 10),
}

objective_metric_name = "validation:auc"

tuner = HyperparameterTuner(
    xgb_train, objective_metric_name, hyperparameter_ranges, 
    max_jobs=5,
    max_parallel_jobs=5,
)

from sagemaker.workflow.steps import TuningStep
    
step_tuning = TuningStep(
    name = "HPTuning",
    tuner = tuner,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            # s3_data= train_preproc_dir_artifact,            
            content_type="text/csv"
        ),
    },    
    cache_config=cache_config,  # cache configuration
)
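To get a feel for what the tuner explores, the sketch below draws five candidate configurations from the same ranges, one per job allowed by `max_jobs=5`. Plain random sampling is swapped in here purely for illustration. SageMaker's tuner defaults to a Bayesian strategy, which this sketch does not implement. `RANGES` and `sample_candidate` are hypothetical names.

```python
import random

# Same bounds as hyperparameter_ranges above, written as plain tuples.
RANGES = {
    "eta": ("continuous", 0.0, 1.0),
    "min_child_weight": ("continuous", 1.0, 10.0),
    "alpha": ("continuous", 0.0, 2.0),
    "max_depth": ("integer", 1, 10),
}


def sample_candidate(ranges, rng):
    """Draw one value per hyperparameter, uniformly within its bounds."""
    candidate = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "integer":
            candidate[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


rng = random.Random(42)
candidates = [sample_candidate(RANGES, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each candidate would correspond to one training job, and the tuner keeps the one with the best value of the objective metric.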

## (7) Condition Step

In [16]:
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import (
    ConditionStep,
    JsonGet,
)


cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step=step_eval,
        property_file=evaluation_report,
        json_path="binary_classification_metrics.auc.value",
    ),
    # right=8.0
    right = model_eval_threshold
)

step_cond = ConditionStep(
    name="FruadScratchCond",
    conditions=[cond_lte],
    if_steps=[step_tuning],        
    else_steps=[step_register, step_create_model], 
)

The class JsonGet has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
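The condition step reads the AUC from the evaluation report and compares it with the threshold: if the AUC is less than or equal to the threshold, the tuning branch runs, otherwise the model is registered and created. The logic can be sketched in plain Python as below. `json_get` and `choose_branch` are hypothetical helpers, and the AUC of 0.83 is an assumed value chosen only to illustrate both branches.

```python
def json_get(document, json_path):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value


def choose_branch(report, threshold):
    """Mirror ConditionLessThanOrEqualTo: low AUC goes to if_steps (tuning)."""
    auc = json_get(report, "binary_classification_metrics.auc.value")
    return "if_steps" if auc <= threshold else "else_steps"


# Assumed report contents; the real value comes from evaluation.json.
report = {"binary_classification_metrics": {"auc": {"value": 0.83}}}

with_default_threshold = choose_branch(report, 0.85)
with_lower_threshold = choose_branch(report, 0.8)
print(with_default_threshold, with_lower_threshold)
```

Lowering the threshold is therefore enough to switch the same evaluation result from the HPO branch to the registration branch.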


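# 3. Defining and Running the Model Building Pipeline
The pipeline is defined from the four top-level steps below; `step_cond` then branches to `step_tuning` or to `step_register` and `step_create_model`.
-     steps=[step_process, step_train, step_eval, step_cond],
- The run below takes about 20 minutes.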

In [17]:
from sagemaker.workflow.pipeline import Pipeline

project_prefix = 'sagemaker-pipeline-phase2-step-by-step'

pipeline_name = project_prefix
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type, 
        processing_instance_count,
        training_instance_type,        
        training_instance_count,                
        input_data,
        model_eval_threshold,
        model_approval_status,        
    ],
    # steps=[step_process, step_train, step_register, step_eval, step_cond],
    steps=[step_process, step_train, step_eval, step_cond],
)



In [18]:
import json

definition = json.loads(pipeline.definition())
# definition

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config
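Once the definition is parsed into a dictionary, it can be inspected before anything is submitted. The sketch below summarizes step names and types from a heavily abbreviated, hand-written sample. The sample is an assumption about the general shape of a definition (a `Steps` list whose entries have `Name` and `Type`) and is not the actual output of `pipeline.definition()`. `summarize_steps` is a hypothetical helper.

```python
def summarize_steps(definition):
    """Return (name, type) pairs for the top-level steps of a definition dict."""
    return [(step["Name"], step["Type"]) for step in definition.get("Steps", [])]


# Hand-written, abbreviated sample; a real definition carries many more fields.
sample_definition = {
    "Parameters": [{"Name": "model2eval2threshold", "Type": "Float", "DefaultValue": 0.85}],
    "Steps": [
        {"Name": "FraudScratchProcess", "Type": "Processing"},
        {"Name": "FraudScratchTrain", "Type": "Training"},
        {"Name": "FraudEval", "Type": "Processing"},
        {"Name": "FruadScratchCond", "Type": "Condition"},
    ],
}

summary = summarize_steps(sample_definition)
for name, step_type in summary:
    print(f"{step_type:<12}{name}")
```

The same helper could be applied to the `definition` variable from the cell above, assuming its top-level layout matches the sample.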


### Submitting the Pipeline to SageMaker and Running It


In [19]:
pipeline.upsert(role_arn=role)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config
No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


{'PipelineArn': 'arn:aws:sagemaker:us-east-1:028703291518:pipeline/sagemaker-pipeline-phase2-step-by-step',
 'ResponseMetadata': {'RequestId': '0edb75d1-48e3-4424-b8f9-7cc84dd76b9e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '0edb75d1-48e3-4424-b8f9-7cc84dd76b9e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '106',
   'date': 'Fri, 27 Aug 2021 07:08:22 GMT'},
  'RetryAttempts': 0}}

Start the pipeline using the default parameter values.

In [20]:
execution = pipeline.start()

### Pipeline Operations: Waiting for the Pipeline and Checking Execution Status

Check the execution status of the workflow.

In [21]:
execution.describe()

{'PipelineArn': 'arn:aws:sagemaker:us-east-1:028703291518:pipeline/sagemaker-pipeline-phase2-step-by-step',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:028703291518:pipeline/sagemaker-pipeline-phase2-step-by-step/execution/s20mgt50msys',
 'PipelineExecutionDisplayName': 'execution-1630048102956',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2021, 8, 27, 7, 8, 22, 879000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 8, 27, 7, 8, 22, 879000, tzinfo=tzlocal()),
 'CreatedBy': {},
 'LastModifiedBy': {},
 'ResponseMetadata': {'RequestId': '3eed2cf1-c0c4-4bc4-9aa1-5377372bb6f5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '3eed2cf1-c0c4-4bc4-9aa1-5377372bb6f5',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '441',
   'date': 'Fri, 27 Aug 2021 07:08:22 GMT'},
  'RetryAttempts': 0}}

In [22]:
execution.wait()

Wait until the execution completes.

List the executed steps. The result shows the steps that the pipeline's step execution service has started or completed.

In [23]:
execution.list_steps()

[{'StepName': 'HPTuning',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 22, 16, 323000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 26, 32, 426000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'TuningJob': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:hyper-parameter-tuning-job/s20mgt50msys-hptunin-70xkmgxd1y'}}},
 {'StepName': 'FruadScratchCond',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 22, 15, 588000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 22, 15, 960000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Condition': {'Outcome': 'True'}}},
 {'StepName': 'FraudEval',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 17, 56, 405000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 22, 15, 359000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:processing-job/pipelines-s20mgt50msys-fraudeval-ncoqe
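The start and end times returned by `list_steps()` make it easy to see where the run spent its time. The sketch below computes per-step durations from the three entries visible in the output above. The timestamps are copied from that output with the time zone omitted, and `step_durations` is a hypothetical helper.

```python
from datetime import datetime

# Entries copied from the list_steps() output above (tzinfo omitted).
steps = [
    {"StepName": "HPTuning",
     "StartTime": datetime(2021, 8, 27, 7, 22, 16, 323000),
     "EndTime": datetime(2021, 8, 27, 7, 26, 32, 426000)},
    {"StepName": "FruadScratchCond",
     "StartTime": datetime(2021, 8, 27, 7, 22, 15, 588000),
     "EndTime": datetime(2021, 8, 27, 7, 22, 15, 960000)},
    {"StepName": "FraudEval",
     "StartTime": datetime(2021, 8, 27, 7, 17, 56, 405000),
     "EndTime": datetime(2021, 8, 27, 7, 22, 15, 359000)},
]


def step_durations(steps):
    """Map each finished step to its duration in seconds."""
    return {
        step["StepName"]: (step["EndTime"] - step["StartTime"]).total_seconds()
        for step in steps
        if "EndTime" in step  # steps still running have no end time yet
    }


durations = step_durations(steps)
for name, seconds in sorted(durations.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {seconds:.1f} s")
```

For this run the evaluation and tuning steps each take a little over four minutes, while the condition step completes in under a second.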

# 4. Running the Pipeline with Caching and Parameters
- As of July 2021, caching applies to Training, Processing, and Transform steps.
- For details, see [Caching Pipeline Steps](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/pipelines-caching.html)


In [24]:
is_cache = True

In [25]:
%%time 

from IPython.display import display as dp
import time

if is_cache:
    execution = pipeline.start(
        parameters=dict(
            model2eval2threshold=0.8,
        )
    )    
    
    # execution = pipeline.start()
    time.sleep(10)
    dp(execution.list_steps())    
    execution.wait()


[{'StepName': 'FraudScratchModel',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 750000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 3, 683000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:model/pipelines-ezxbhph3n6s5-fraudscratchmodel-66rcgc1wby'}}},
 {'StepName': 'FraudScratcRegisterhModel',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 687000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 3, 543000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:model-package/sagemaker-pipeline-step-by-step-phase01/3'}}},
 {'StepName': 'FruadScratchCond',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 204000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 535000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Condition': {

CPU times: user 25.1 ms, sys: 439 µs, total: 25.5 ms
Wall time: 10.5 s
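When a step is served from the cache, its entry in the step list is expected to carry a `CacheHitResult` with the ARN of the execution that originally produced the result. The sketch below filters for such entries. The input is a made-up sample with placeholder ARNs, not the output of this run (which is truncated above), and `cache_hits` is a hypothetical helper.

```python
def cache_hits(steps):
    """Return the names of steps whose result was reused from an earlier execution."""
    return [
        step["StepName"]
        for step in steps
        if step.get("CacheHitResult", {}).get("SourcePipelineExecutionArn")
    ]


# Made-up sample for illustration; the ARN is a placeholder.
PREVIOUS_ARN = "arn:aws:sagemaker:REGION:ACCOUNT:pipeline/EXAMPLE/execution/PREVIOUS"
sample_steps = [
    {"StepName": "FraudScratchProcess", "CacheHitResult": {"SourcePipelineExecutionArn": PREVIOUS_ARN}},
    {"StepName": "FraudScratchTrain", "CacheHitResult": {"SourcePipelineExecutionArn": PREVIOUS_ARN}},
    {"StepName": "FruadScratchCond"},
]

reused = cache_hits(sample_steps)
print(reused)
```

Applied to `execution.list_steps()`, a check like this would indicate which steps were skipped, which is consistent with the cached run finishing within seconds.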


In [26]:
if is_cache:
    dp(execution.list_steps())

[{'StepName': 'FraudScratchModel',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 750000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 3, 683000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:model/pipelines-ezxbhph3n6s5-fraudscratchmodel-66rcgc1wby'}}},
 {'StepName': 'FraudScratcRegisterhModel',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 687000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 3, 543000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-1:028703291518:model-package/sagemaker-pipeline-step-by-step-phase01/3'}}},
 {'StepName': 'FruadScratchCond',
  'StartTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 204000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2021, 8, 27, 7, 27, 2, 535000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Condition': {