# [Module 6.1] Model Training on SageMaker Component in Kubeflow


Below is the Kubeflow pipeline.

![kubeflowPipeline](img/kubeflow-pipeline.png)

Below are the training step, the model creation step, and the resulting SageMaker training job.

![kubeflow-training](img/kubeflow-training.png)

![kubeflow-creating-model](img/kubeflow-creating-model.png)

![kubeflow-sagemaker-train-job](img/kubeflow-sagemaker-train-job.png)

**If `pip install boto3` below fails, restart the kernel and run it again.**

In [86]:
! pip install boto3 --user



## Install Kubeflow Pipelines SDK

In [87]:
!pip install https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz --upgrade --user

Collecting https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz
  Using cached https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz (88 kB)
Building wheels for collected packages: kfp
  Building wheel for kfp (setup.py) ... done
  Created wheel for kfp: filename=kfp-0.1.29-py3-none-any.whl size=122731 sha256=45ffc40c41e7105b865aecf6904f0d0edf2d90c1244c7b9b137ee612639f094a
  Stored in directory: /tmp/pip-ephem-wheel-cache-s3etbspg/wheels/88/a3/ee/c4fcecb08dc7a40d1a893262178176ff83fa48e4caa1ce66b6
Successfully built kfp
Installing collected packages: kfp
  Attempting uninstall: kfp
    Found existing installation: kfp 0.1.29
    Uninstalling kfp-0.1.29:
      Successfully uninstalled kfp-0.1.29
Successfully installed kfp-0.1.29


In [88]:
import boto3

#################################
#################################
# Set AWS_REGION to your current region,
# surrounded by single quotes.
AWS_REGION='ap-northeast-2'

AWS_ACCOUNT_ID=boto3.client('sts').get_caller_identity().get('Account')
print('Account ID: {}'.format(AWS_ACCOUNT_ID))

S3_BUCKET='sagemaker-{}-{}'.format(AWS_REGION, AWS_ACCOUNT_ID)
print('S3 Bucket: {}'.format(S3_BUCKET))

Account ID: 343441690612
S3 Bucket: sagemaker-ap-northeast-2-343441690612


# Build Pipeline 

## 1. Run the following command to load the Kubeflow Pipelines SDK

In [89]:
import kfp
from kfp import components
from kfp import dsl
from kfp.aws import use_aws_secret

## 2. Load reusable SageMaker components

In [90]:
# sagemaker_train_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/0ad6c28d32e2e790e6a129b7eb1de8ec59c1d45f/components/aws/sagemaker/train/component.yaml')
sagemaker_train_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/cb36f87b727df0578f4c1e3fe9c24a30bb59e5a2/components/aws/sagemaker/train/component.yaml')
sagemaker_model_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/0ad6c28d32e2e790e6a129b7eb1de8ec59c1d45f/components/aws/sagemaker/model/component.yaml')
sagemaker_deploy_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/0ad6c28d32e2e790e6a129b7eb1de8ec59c1d45f/components/aws/sagemaker/deploy/component.yaml')
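
The URLs above pin each component definition to a specific commit of `kubeflow/pipelines`, so the component interface does not change underneath the notebook. As a minimal sketch, the shared URL pattern can be factored into a small helper. `component_url` is a hypothetical name used here for illustration, not part of the Kubeflow Pipelines SDK.

```python
# Hypothetical helper: builds the raw GitHub URL of a SageMaker component
# definition pinned to a given commit of kubeflow/pipelines.
BASE = "https://raw.githubusercontent.com/kubeflow/pipelines"


def component_url(commit: str, component: str) -> str:
    """Return the component.yaml URL for `component` at `commit`."""
    return f"{BASE}/{commit}/components/aws/sagemaker/{component}/component.yaml"


# Commit hashes copied from the cell above.
TRAIN_COMMIT = "cb36f87b727df0578f4c1e3fe9c24a30bb59e5a2"
MODEL_COMMIT = "0ad6c28d32e2e790e6a129b7eb1de8ec59c1d45f"

train_url = component_url(TRAIN_COMMIT, "train")
model_url = component_url(MODEL_COMMIT, "model")
deploy_url = component_url(MODEL_COMMIT, "deploy")
print(train_url)
```

Each resulting URL can then be passed to `components.load_component_from_url` exactly as in the cell above.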


In [91]:
train_prefix = 'sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-train'
s3_train = "s3://{}/{}".format(S3_BUCKET, train_prefix)
print("s3_train: \n", s3_train)

validation_prefix = 'sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-validation'
s3_validation = "s3://{}/{}".format(S3_BUCKET, validation_prefix)
print("s3_validation: \n", s3_validation)

test_prefix = 'sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-test'
s3_test = "s3://{}/{}".format(S3_BUCKET, test_prefix)
print("s3_test: \n", s3_test)

s3_train: 
 s3://sagemaker-ap-northeast-2-343441690612/sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-train
s3_validation: 
 s3://sagemaker-ap-northeast-2-343441690612/sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-validation
s3_test: 
 s3://sagemaker-ap-northeast-2-343441690612/sagemaker-scikit-learn-2020-08-02-11-44-02-899/output/bert-test


In [92]:
channels='[ \
                    { \
                        "ChannelName": "train", \
                        "DataSource": { \
                            "S3DataSource": { \
                                "S3DataType": "S3Prefix", \
                                "S3Uri": "'+s3_train+'", \
                                "S3DataDistributionType": "ShardedByS3Key" \
                            } \
                        }, \
                        "CompressionType": "None", \
                        "RecordWrapperType": "None" \
                    }, \
                    { \
                        "ChannelName": "validation", \
                        "DataSource": { \
                            "S3DataSource": { \
                                "S3DataType": "S3Prefix", \
                                "S3Uri": "'+s3_validation+'", \
                                "S3DataDistributionType": "ShardedByS3Key" \
                            } \
                        }, \
                        "CompressionType": "None", \
                        "RecordWrapperType": "None" \
                    }, \
                    { \
                        "ChannelName": "test", \
                        "DataSource": { \
                            "S3DataSource": { \
                                "S3DataType": "S3Prefix", \
                                "S3Uri": "'+s3_test+'", \
                                "S3DataDistributionType": "ShardedByS3Key" \
                            } \
                        }, \
                        "CompressionType": "None", \
                        "RecordWrapperType": "None" \
                    } \
                ]'
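
The channel specification above is assembled by concatenating string fragments, which is easy to break with a stray quote or comma. A sketch of an equivalent approach builds the same structure as Python dictionaries and serializes it with `json.dumps`. `make_channel` is a hypothetical helper, and the S3 URIs below are placeholders; in the notebook you would pass `s3_train`, `s3_validation`, and `s3_test` instead.

```python
import json


def make_channel(name, s3_uri):
    """Build one input channel entry with the same fields as the cell above."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "ShardedByS3Key",
            }
        },
        "CompressionType": "None",
        "RecordWrapperType": "None",
    }


# Placeholder URIs for illustration only (assumption, not real buckets).
example_uris = {
    "train": "s3://example-bucket/output/bert-train",
    "validation": "s3://example-bucket/output/bert-validation",
    "test": "s3://example-bucket/output/bert-test",
}

# Serialize the list of channels to a JSON string.
channels_json = json.dumps(
    [make_channel(name, uri) for name, uri in example_uris.items()]
)
print(channels_json)
```

Because the result is produced by a JSON serializer, it is guaranteed to parse, which the hand-built string is not.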

In [93]:
epochs= "10"
train_steps_per_epoch= "100"

max_seq_length = "32"
learning_rate= "1e-5"
epsilon= "0.00000001"
train_batch_size= "128"
validation_batch_size= "128"
test_batch_size= "128"

validation_steps= "100"
test_steps= "100"

train_instance_count= "2" 
train_instance_type='ml.p3.2xlarge'
train_volume_size= "1024"

use_xla= "True"
use_amp= "True"
freeze_bert_layer= "False"
enable_checkpointing= "True"
input_mode='Pipe'
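
The values above are written as strings because SageMaker training jobs take hyperparameters as a map of string keys to string values. If you would rather keep native Python types while editing, a minimal sketch is to convert them just before passing them on. `stringify_hyperparameters` is a hypothetical helper, and the dictionary below contains only a few of the values from the cell above.

```python
def stringify_hyperparameters(params):
    """Convert every hyperparameter value to its string form."""
    return {key: str(value) for key, value in params.items()}


# A subset of the settings above, kept as native Python types.
native = {
    "epochs": 10,
    "learning_rate": 1e-5,
    "max_seq_length": 32,
    "use_xla": True,
    "freeze_bert_layer": False,
}

hyperparameters = stringify_hyperparameters(native)
print(hyperparameters)
```

Note that `str()` decides the exact text of each value, so the training script must parse whatever representation it receives, for example by calling `float()` on the learning rate.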

## 3. Create Pipeline

In [94]:
SAGEMAKER_ROLE_ARN = 'arn:aws:iam::343441690612:role/service-role/AmazonSageMaker-ExecutionRole-20200801T163342'

In [95]:

# Configure your s3 bucket
# S3_PIPELINE_PATH= 's3://{}/'.format(S3_BUCKET)
# processed_train_data_s3_uri = 's3://sagemaker-us-west-2-057716757052/sagemaker-scikit-learn-2020-06-28-05-08-39-660/output/bert-train'
# processed_validation_data_s3_uri = 's3://sagemaker-us-west-2-057716757052/sagemaker-scikit-learn-2020-06-28-05-08-39-660/output/bert-validation'
# processed_test_data_s3_uri = 's3://sagemaker-us-west-2-057716757052/sagemaker-scikit-learn-2020-06-28-05-08-39-660/output/bert-test'

if AWS_REGION == 'ap-northeast-2':
    AWS_ECR_TRAIN_REGISTRY = "343441690612.dkr.ecr.ap-northeast-2.amazonaws.com/bert2tweet:latest"
    

# TF_INFER_IMAGE = "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:1.12.0-cpu"
# TF_INFER_IMAGE = "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:1.14.0-cpu"
# TF_INFER_IMAGE = "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu"
# TF_INFER_IMAGE = '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:1.13.1-gpu'
TF_INFER_IMAGE = '520713654638.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-tensorflow-serving:1.14.0-gpu'

model_output_prefix = 'sagemaker-scikit-learn-2020-08-02-01-14-52-546/model'
model_output_path = 's3://{}/{}'.format(S3_BUCKET,model_output_prefix )
# model_output_path = 's3://sagemaker-us-west-2-057716757052/sagemaker-scikit-learn-2020-06-28-05-08-39-660/model'

In [96]:
@dsl.pipeline(
    name='Tweet BERT Classification pipeline',
    description='Tweet BERT Classification in SageMaker'
)
def tweet_BERT(
    region = AWS_REGION,
    image = AWS_ECR_TRAIN_REGISTRY,
    dataset_path = channels,
    instance_type = 'ml.p3.2xlarge',
    instance_count = 2,
    volume_size = '50',
    model_output_path = model_output_path,
    role_arn = SAGEMAKER_ROLE_ARN,
    network_isolation='False',
    traffic_encryption='False',
    spot_instance='False'
    ):
    # Component 1: SageMaker training job
    training = sagemaker_train_op(
        region = region,
        image = image,
        channels=dataset_path,  # use the pipeline parameter, not the global
        instance_type = instance_type,
        instance_count = instance_count,
        volume_size = volume_size,
        model_artifact_path=model_output_path,  # now refers to the pipeline parameter
        role=role_arn,
        network_isolation=network_isolation,
        traffic_encryption=traffic_encryption,
        spot_instance=spot_instance,        
        hyperparameters={'epochs': epochs,
                        'learning_rate': learning_rate,
                        'epsilon': epsilon,
                        'train_batch_size': train_batch_size,
                        'validation_batch_size': validation_batch_size,
                        'test_batch_size': test_batch_size,                                             
                        'train_steps_per_epoch': train_steps_per_epoch,
                        'validation_steps': validation_steps,
                        'test_steps': test_steps,
                        'use_xla': use_xla,
                        'use_amp': use_amp,                                             
                        'max_seq_length': max_seq_length,
                        'freeze_bert_layer': freeze_bert_layer,
                        'enable_checkpointing': enable_checkpointing
                        },        
    ).apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))        
    # Component 2
    create_model = sagemaker_model_op(
        region = region,
        image = TF_INFER_IMAGE,
        model_artifact_url = training.outputs['model_artifact_url'],
        model_name = training.outputs['job_name'],
        role = role_arn
    ).apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
    
#     # Component 3
#     prediction = sagemaker_deploy_op(
#         region=region,
#         model_name=create_model.output
#     ).apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))


In [97]:
kfp.compiler.Compiler().compile(tweet_BERT, 'tweet_BERT.zip')



In [98]:
!unzip -o ./tweet_BERT.zip

Archive:  ./tweet_BERT.zip
  inflating: pipeline.yaml           
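
The compiled package is an ordinary zip archive, as the `unzip` output above shows. As a sketch of an alternative to the shell command, its contents can be listed from Python with the standard `zipfile` module. `list_package` is a hypothetical helper name.

```python
import os
import zipfile


def list_package(path):
    """Return the file names stored in a compiled pipeline package."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Only inspect the package if the compile step has already produced it.
if os.path.exists("tweet_BERT.zip"):
    print(list_package("tweet_BERT.zip"))
```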


In [99]:
# !cat pipeline.yaml

In [100]:
import time

In [101]:
client = kfp.Client()
aws_experiment = client.create_experiment(name='aws')

# The second argument of run_pipeline is the run (job) name, so name it accordingly.
run_name = f'tweet-BERT-train-deploy-kfp-{time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())}'
my_run = client.run_pipeline(aws_experiment.id, run_name, 'tweet_BERT.zip')



Failed to load kube config.


MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff017d21358>: Failed to establish a new connection: [Errno 111] Connection refused',))
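
The error above shows the client trying `localhost` on port 80 after it failed to load a kube config, and nothing is listening there. A minimal sketch, assuming the Kubeflow Pipelines API is exposed at a host and port you know (for example through a port-forward), is to check reachability first and then pass the endpoint explicitly through the `host` argument of `kfp.Client`. `endpoint_reachable` and `make_client` are hypothetical helpers, and the `http://host:port` form of the endpoint is an assumption about your setup.

```python
import socket


def endpoint_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def make_client(host, port):
    """Create a kfp.Client for an explicit endpoint, failing early if it is down."""
    if not endpoint_reachable(host, port):
        raise RuntimeError(
            f"Kubeflow Pipelines API not reachable at {host}:{port}"
        )
    import kfp  # imported lazily so the check above runs without the SDK

    # Assumption: the API is served over plain HTTP at this host and port.
    return kfp.Client(host=f"http://{host}:{port}")
```

Calling `make_client` with your own endpoint then replaces the bare `kfp.Client()` call, and a wrong endpoint fails with a short message instead of a long retry trace.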