# Training with a Custom Container on SageMaker
Build a custom container with sagemaker-training-toolkit and use it to run a training job on SageMaker.

## Building the Image
Install `sagemaker-studio-image-build` and build the image with the `sm-docker` command.

In [1]:
# Dockerfile that includes sagemaker-training
%cat Dockerfile

FROM python:3.9.13-bullseye

RUN pip install -U pip && \
    pip install sagemaker-training scikit-learn pandas

COPY train.py /opt/ml/code/train.py

ENV SAGEMAKER_PROGRAM train.py
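
The Dockerfile above copies `train.py` to `/opt/ml/code` and names it in `SAGEMAKER_PROGRAM`, so the toolkit runs it as the entry point. The following is a minimal sketch of how such an entry point might collect its inputs, assuming the toolkit passes hyperparameters as command-line flags and exposes paths through the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables. The helper name `parse_training_config` and the returned dictionary keys are hypothetical and are not taken from the original `train.py`.

```python
import argparse
import os


def parse_training_config(argv, environ):
    """Collect hyperparameters and SageMaker paths for a training script.

    Hypothetical helper for illustration only.
    """
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the Estimator are assumed to arrive as flags.
    parser.add_argument("--max_leaf_nodes", type=int, default=-1)
    # Ignore flags this sketch does not know about instead of failing.
    args, _unknown = parser.parse_known_args(argv)
    return {
        "max_leaf_nodes": args.max_leaf_nodes,
        # Fall back to the standard container paths when the variables are unset.
        "train_dir": environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
        "model_dir": environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


# Example: simulate the flags a job with max_leaf_nodes=30 might receive.
print(parse_training_config(["--max_leaf_nodes", "30"], os.environ))
```

Taking `argv` and `environ` as parameters keeps the function easy to exercise locally, outside a SageMaker container.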


In [2]:
# Script used for training
%pycat train.py

import argparse
import os

import joblib
import pandas as pd
from sklearn import tree

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_leaf_nodes", type=int, default=-1)
    args = parser.parse_args()
    channel_train =

In [3]:
!pip install -U pip
!pip install sagemaker-studio-image-build
!sm-docker build .

...........[Container] 2022/08/26 20:57:20 Waiting for agent ping

[Container] 2022/08/26 20:57:21 Waiting for DOWNLOAD_SOURCE
[Container] 2022/08/26 20:57:24 Phase is DOWNLOAD_SOURCE
[Container] 2022/08/26 20:57:24 CODEBUILD_SRC_DIR=/codebuild/output/src573399070/src
[Container] 2022/08/26 20:57:24 YAML location is /codebuild/output/src573399070/src/buildspec.yml
[Container] 2022/08/26 20:57:24 Setting HTTP client timeout to higher timeout for S3 source
[Container] 2022/08/26 20:57:24 Processing environment variables
[Container] 2022/08/26 20:57:24 No runtime version selected in buildspec.
[Container] 2022/08/26 20:57:24 Moving to directory /codebuild/output/src573399070/src
[Container] 2022/08/26 20:57:24 Configuring ssm agent with target id: codebuild:37f8ae1c-b488-4938-85a9-87730ab6a1cf
[Container] 2022/08/26 20:57:24 Successfully updated ssm agent configuration
[Container] 2022/08/26 20:57:24 Registering with agent
[Container] 2022/08/26 20:57:24 Phases found in YAML: 3
[Conta

## Running the Training Job
Apart from specifying the image pushed to ECR via `image_uri`, the implementation is the same as for a regular SageMaker training job.
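
The value passed as `image_uri` follows the usual ECR form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a rough sketch, the URI could be assembled from its parts instead of being hard-coded. The helper name `build_ecr_image_uri` is hypothetical, and the sketch assumes the standard AWS partition (the `amazonaws.com` domain).

```python
def build_ecr_image_uri(account_id, region, repository, tag):
    """Compose an ECR image URI in the registry/repository:tag form.

    Hypothetical helper; assumes the standard `amazonaws.com` domain.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# Example using the account, repository, and tag that appear in this notebook.
image_uri = build_ecr_image_uri(
    account_id="980831117329",
    region="us-east-1",
    repository="sagemaker-studio-d-2fqm8ncc55ww",
    tag="default-1643235874544",
)
print(image_uri)
```

Keeping the parts separate makes it easier to reuse the same notebook in another account or region.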

In [4]:
import boto3
import sagemaker

sagemaker_session = sagemaker.Session()

client = boto3.client("sagemaker-runtime")

instance_count = 1
instance_type = "ml.m4.xlarge"
train_input_s3_path = (
    "s3://sagemaker-us-east-1-980831117329/workshop/sklearn-endpoint/data"
)
train_output_s3_path = (
    "s3://sagemaker-us-east-1-980831117329/workshop/custom-container/output"
)
role = sagemaker.get_execution_role()

image_uri = "980831117329.dkr.ecr.us-east-1.amazonaws.com/sagemaker-studio-d-2fqm8ncc55ww:default-1643235874544"

In [5]:
sm = sagemaker.estimator.Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=instance_count,
    instance_type=instance_type,
    max_run=86400,
    output_path=train_output_s3_path,
    sagemaker_session=sagemaker_session,
    hyperparameters={"max_leaf_nodes": 30},
)

sm.fit({"train": train_input_s3_path})

2022-08-26 21:01:30 Starting - Starting the training job...ProfilerReport-1661547690: InProgress
...
2022-08-26 21:02:10 Starting - Preparing the instances for training.........
2022-08-26 21:03:54 Downloading - Downloading input data...
2022-08-26 21:04:25 Training - Training image download completed. Training in progress.
2022-08-26 21:04:25 Uploading - Uploading generated training model
2022-08-26 21:04:17,333 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-08-26 21:04:17,356 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-08-26 21:04:17,369 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-08-26 21:04:17,380 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "current_host"