## Automate Model Retraining & Deployment Using the AWS Step Functions Data Science SDK

**This sample is provided for demonstration purposes. Make sure to conduct appropriate testing if you derive code from it for your own use cases!**

This notebook describes how to use the AWS Step Functions Data Science SDK to create a machine learning model retraining workflow. The Step Functions SDK is an open source library that allows data scientists to easily create and execute machine learning workflows using AWS Step Functions and Amazon SageMaker. For more information, please see the following resources:
* [AWS Step Functions](https://aws.amazon.com/step-functions/)
* [AWS Step Functions Developer Guide](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html)
* [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io)


### Step 0: Get Admin Setup Results
Bucket names, CodeCommit repo, Docker image, IAM roles, ...

To keep things organized, we will save our `Source Code` (data processing, model training/serving scripts), `datasets`, as well as our trained `model(s) binaries` and their `test-performance metrics` on S3, **versioned by the date/time of each update.**
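
One way to picture this versioned layout is as a set of S3 prefixes keyed by the workflow timestamp. The sketch below is illustrative only: the helper `versioned_prefixes`, the bucket name, and the folder names (`source-code`, `datasets`, `models`, `metrics`) are assumptions, not the layout that this project's Lambda functions actually use.

```python
from time import gmtime, strftime


def versioned_prefixes(bucket, workflow_name, date_time):
    """Return hypothetical S3 URIs for one workflow run, keyed by its timestamp."""
    root = "s3://{}/{}/{}".format(bucket, workflow_name, date_time)
    # Folder names are illustrative assumptions, not the project's real layout.
    return {
        "source_code": root + "/source-code",
        "datasets": root + "/datasets",
        "models": root + "/models",
        "metrics": root + "/metrics",
    }


# Same timestamp format that the notebook uses for WORKFLOW_DATE_TIME.
run_time = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
prefixes = versioned_prefixes("my-project-bucket", "my-project-2", run_time)
for name, uri in prefixes.items():
    print(name, uri)
```

Because every run gets its own timestamped root, artifacts from earlier runs are never overwritten and can be compared or rolled back to.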

In [1]:
# Upgrade the stepfunctions library
import sys
!{sys.executable} -m pip install --upgrade stepfunctions

Requirement already up-to-date: stepfunctions in /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages (1.1.1)
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/tensorflow_p36/bin/python -m pip install --upgrade pip' command.


In [1]:
import json
import boto3
import logging
import stepfunctions
from stepfunctions import steps
from time import gmtime, strftime
from stepfunctions.steps.choice_rule import ChoiceRule
from stepfunctions.steps import TrainingStep, ModelStep
from stepfunctions.inputs import ExecutionInput
from stepfunctions.workflow import Workflow
stepfunctions.set_stream_logger(level=logging.INFO)
session = boto3.session.Session()


# Set project bucket, IAM Roles and Docker Image for Training
with open('admin_setup.txt', 'r') as filehandle:
    admin_setup = json.load(filehandle)

WORKFLOW_DATE_TIME = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

WORKFLOW_NAME = "my-project-2"
SOURCE_DATA = admin_setup["raw_data_path"]
BUCKET = admin_setup["project_bucket"]
REGION = session.region_name

REPO = admin_setup["repo_name"]
BRANCH = "master"
TRAINING_IMAGE = admin_setup["docker_image"]
WORKFLOW_EXECUTION_ROLE = admin_setup["workflow_execution_role"]

### Define Workflow Schema

In [2]:
my_workflow_input_schema = {
    #ADMIN
    "REGION":str,
    "ROLE_ARN":str,
    "BUCKET":str,
    "WORKFLOW_NAME":str,
    "WORKFLOW_DATE_TIME":str,
    "DATA_SOURCE":str,
    
    # CodeCommit
    "REPO":str,
    "BRANCH":str,
    "DATA_PROCESSING_DIR":str,
    "ML_DIR":str,
    
    # SM Processing
    "PROCESSING_SCRIPT":str,
    "PROCESSING_IMAGE":str,
    "PROCESSING_INSTANCE_TYPE":str,
    "PROCESSING_INSTANCE_COUNT":int,
    "PROCESSING_VOLUME_SIZE_GB":int,
    
    # SM TRAINING
    "TRAINING_SCRIPT":str,
    "TRAINING_IMAGE":str,
    "TRAINING_INSTANCE_TYPE":str,
    "TRAINING_INSTANCE_COUNT":int,
    "TRAINING_VOLUME_SIZE_GB":int,
    
    # SM SERVING
    "SERVING_SCRIPT":str,
    "SERVING_IMAGE":str,
    "SERVING_INSTANCE_TYPE":str,
    "SERVING_INSTANCE_COUNT":int,
    "SERVING_VOLUME_SIZE_GB":int,
}
my_execution_input = ExecutionInput(schema=my_workflow_input_schema)
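
The schema maps each input name to a Python type. As a rough illustration of what such a schema lets you check before starting an execution, here is a plain-Python sketch. The helper `check_inputs` is hypothetical and is not part of the Step Functions Data Science SDK; it only mimics the idea of comparing values against a `{name: type}` mapping.

```python
def check_inputs(schema, values):
    """Return a list of problems found when comparing values to a {key: type} schema."""
    problems = []
    for key, expected_type in schema.items():
        if key not in values:
            problems.append("missing key: {}".format(key))
        elif not isinstance(values[key], expected_type):
            problems.append("{} should be {}, got {}".format(
                key, expected_type.__name__, type(values[key]).__name__))
    for key in values:
        if key not in schema:
            problems.append("unexpected key: {}".format(key))
    return problems


# A reduced schema in the same {name: type} style as my_workflow_input_schema.
example_schema = {"BUCKET": str, "TRAINING_INSTANCE_COUNT": int}

good = {"BUCKET": "my-project-bucket", "TRAINING_INSTANCE_COUNT": 1}
bad = {"BUCKET": "my-project-bucket", "TRAINING_INSTANCE_COUNT": "1"}

print(check_inputs(example_schema, good))  # no problems expected
print(check_inputs(example_schema, bad))   # the count is a str, not an int
```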

In [3]:
# StepN: Create Fail State
fail_step = steps.states.Fail(
    'Workflow Failed',
    comment='Either validation accuracy is lower than the threshold or one of the processing, training, or deployment jobs has failed.'
)

# Step1: Copy source code from CodeCommit to S3
codecommit_to_s3_step = steps.compute.LambdaStep(
    state_id = 'Put SourceCode on S3',
    parameters={ 
        "FunctionName": WORKFLOW_NAME + '-codecommit-to-s3',
        'Payload':{
            "REGION": my_execution_input["REGION"],
            "BUCKET": my_execution_input["BUCKET"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"],
            "REPO": my_execution_input["REPO"],
            "BRANCH": my_execution_input["BRANCH"],
            "ML_DIR": my_execution_input["ML_DIR"],
            "DATA_PROCESSING_DIR": my_execution_input["DATA_PROCESSING_DIR"]
        }
    }
)

# Step2: Run SageMaker Data Processing Job
data_processing_step = steps.compute.LambdaStep(
    state_id = 'Run SageMaker Processing',
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-create-sagemaker-prcoessing-job',
        'Payload':{
            "DATA_SOURCE":SOURCE_DATA,
            "BUCKET": my_execution_input["BUCKET"],
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"],
            "PROCESSING_INSTANCE_TYPE": my_execution_input["PROCESSING_INSTANCE_TYPE"],
            "PROCESSING_INSTANCE_COUNT": my_execution_input["PROCESSING_INSTANCE_COUNT"],
            "PROCESSING_VOLUME_SIZE_GB": my_execution_input["PROCESSING_VOLUME_SIZE_GB"],
            "PROCESSING_IMAGE": my_execution_input["PROCESSING_IMAGE"],
            "PROCESSING_SCRIPT": my_execution_input["PROCESSING_SCRIPT"],
            "ROLE_ARN": my_execution_input["ROLE_ARN"]
        }
    }
)

# Step3: Wait a little bit
wait_for_data_processing = steps.states.Wait(
    state_id = "Wait 30 Seconds",
    seconds = 30
)

# Step4: Check if processing job has finished
get_processing_status = steps.compute.LambdaStep(
    state_id = "Get SageMaker Processing Status",
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-query-data-processing-status',
        'Payload':{
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"]
        }
    }
)

# Step5: If processing job is not done, go back to waiting (Step3), if done go to Step6, else go to failure
# We will author this step later
# ...

# Step6: Start SageMaker Training Job
model_training_step = steps.compute.LambdaStep(
    'Run Model Training Job',
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-create-sagemaker-training-job',
        'Payload':{
            "BUCKET": my_execution_input["BUCKET"],
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"],
            "TRAINING_INSTANCE_TYPE": my_execution_input["TRAINING_INSTANCE_TYPE"],
            "TRAINING_INSTANCE_COUNT": my_execution_input["TRAINING_INSTANCE_COUNT"],
            "TRAINING_VOLUME_SIZE_GB": my_execution_input["TRAINING_VOLUME_SIZE_GB"],
            "TRAINING_IMAGE": my_execution_input["TRAINING_IMAGE"],
            "TRAINING_SCRIPT": my_execution_input["TRAINING_SCRIPT"],
            "ROLE_ARN": my_execution_input["ROLE_ARN"]
        }
    }
)

# Step5: If processing job is not done, go back to waiting (Step3), if done go to Step6, else go to failure
check_pocessing_status = steps.states.Choice(
    state_id = "Processing Job Complete?",
)

processing_job_output = get_processing_status.output()['Payload']['ProcessingJobStatus']

completed_rule = ChoiceRule.StringEquals(variable=processing_job_output, value="Completed")
in_progress_rule = ChoiceRule.StringEquals(variable=processing_job_output, value="InProgress")

check_pocessing_status.add_choice(rule=completed_rule, next_step=model_training_step)
check_pocessing_status.add_choice(rule=in_progress_rule, next_step=wait_for_data_processing)
check_pocessing_status.default_choice(fail_step)



# Step7: Wait a little bit
wait_for_training = steps.states.Wait(
    state_id = "Wait 60 Seconds",
    seconds = 60
)

# Step8: Check if training job has finished
get_training_status = steps.compute.LambdaStep(
    state_id = "Get Training Job Status",
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-query-training-status',
        'Payload':{
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"]
        }
    }
)


# Step9: If training job is not done, go back to waiting (Step7), if done go to Step10, else go to failure
# We will author this step later
# ...

# Step10: Get model accuracy (custom print to logs during training)
get_model_accuracy = steps.compute.LambdaStep(
    state_id = "Get Model Median Abs. Err.",
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-query-model-accuracy',
        'Payload':{
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"]
        }
    }
)

# Step9: If training job is not done, go back to waiting (Step7), if done go to Step10, else go to failure
check_training_status = steps.states.Choice(
    state_id = "Training Job Complete?",
)

training_job_output = get_training_status.output()['Payload']['TrainingJobStatus']

completed_rule = ChoiceRule.StringEquals(variable=training_job_output, value="Completed")
in_progress_rule = ChoiceRule.StringEquals(variable=training_job_output,value="InProgress")

check_training_status.add_choice(rule=completed_rule, next_step=get_model_accuracy)
check_training_status.add_choice(rule=in_progress_rule, next_step=wait_for_training)
check_training_status.default_choice(fail_step)


# Step11: If model's Median Abs. Err. is less than 3, proceed to the next step (deployment), else go to failure
# We will author this step later
# ...

# Step12: Create Endpoint (or update it if it exists)
deploy_model_step = steps.compute.LambdaStep(
    'Deploy Model',
    parameters={  
        "FunctionName": WORKFLOW_NAME + '-deploy-sagemaker-model-job',
        'Payload':{
            "REGION": my_execution_input["REGION"],
            "BUCKET": my_execution_input["BUCKET"],
            "WORKFLOW_NAME": my_execution_input["WORKFLOW_NAME"],
            "WORKFLOW_DATE_TIME": my_execution_input["WORKFLOW_DATE_TIME"],
            "SERVING_INSTANCE_TYPE": my_execution_input["SERVING_INSTANCE_TYPE"],
            "SERVING_INSTANCE_COUNT": my_execution_input["SERVING_INSTANCE_COUNT"],
            "SERVING_IMAGE": my_execution_input["SERVING_IMAGE"],
            "SERVING_SCRIPT": my_execution_input["SERVING_SCRIPT"],
            "ROLE_ARN": my_execution_input["ROLE_ARN"]
        }
    }
)


# Step11: If model's Median Abs. Err. is less than 3, proceed to the next step (deployment), else go to failure
check_accuracy_step = steps.states.Choice(
    'Median-AE < 3'
)
mae = get_model_accuracy.output()['Payload']['trainingMetrics'][0]['Value']
threshold_rule = ChoiceRule.NumericLessThan(variable=mae, value=3)
check_accuracy_step.add_choice(rule=threshold_rule, next_step=deploy_model_step)
check_accuracy_step.default_choice(next_step=fail_step)
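
The `Choice` states above implement two patterns: a polling loop (Wait, query status, then branch on `Completed` / `InProgress` / anything else) and a metric gate (`Median-AE < 3`). The following plain-Python sketch mimics that control flow locally so the branching is easier to follow. It is not how Step Functions executes the workflow; the function names, the simulated status sequence, and the `max_polls` safety cap are assumptions made for this illustration.

```python
import time


def run_poll_loop(get_status, wait_seconds=0, max_polls=100):
    """Mimic Wait -> Get Status -> Choice until the job leaves 'InProgress'."""
    for _ in range(max_polls):
        time.sleep(wait_seconds)       # Wait state
        status = get_status()          # status-query Lambda step
        if status == "Completed":      # completed_rule
            return "next"
        if status != "InProgress":     # default_choice: anything else fails
            return "fail"
        # in_progress_rule: loop back to the Wait state
    return "fail"  # safety cap; an assumption of this sketch, not part of the workflow


def accuracy_gate(payload, threshold=3):
    """Mimic the 'Median-AE < 3' Choice state."""
    mae = payload["trainingMetrics"][0]["Value"]
    return "deploy" if mae < threshold else "fail"


# Simulated status sequence standing in for repeated Lambda responses.
statuses = iter(["InProgress", "InProgress", "Completed"])
print(run_poll_loop(lambda: next(statuses)))
print(accuracy_gate({"trainingMetrics": [{"Value": 2.4}]}))
print(accuracy_gate({"trainingMetrics": [{"Value": 3.0}]}))
```

Note that the gate uses a strict comparison, matching `ChoiceRule.NumericLessThan`: a median absolute error of exactly 3 goes to the failure state.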

### Link all the Steps Together
We create a workflow definition by chaining together all of the steps we created above. See [Chain](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.states.Chain) in the AWS Step Functions Data Science SDK documentation to learn more.

In [4]:
# Link Steps 1-12 (the Choice states already point to their own next steps)
codecommit_to_s3_step.next(data_processing_step)
data_processing_step.next(wait_for_data_processing)
wait_for_data_processing.next(get_processing_status)
get_processing_status.next(check_pocessing_status)
model_training_step.next(wait_for_training)
wait_for_training.next(get_training_status)
get_training_status.next(check_training_status)
get_model_accuracy.next(check_accuracy_step)

# Chain the whole workflow: only the first step is listed here,
# the remaining steps are reached through the .next() links and choice rules above
workflow_definition = steps.Chain([
    codecommit_to_s3_step
])
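
The cell above passes only the first step to `Chain`. This appears to be enough because the earlier `.next()` calls and choice rules already connect every later state to it. The toy model below illustrates that idea with a hypothetical `ToyStep` class and a `reachable` helper; it is a simplified sketch, not the SDK's implementation, and it covers only the first half of the workflow.

```python
class ToyStep:
    """Toy stand-in for a workflow state; not the SDK's implementation."""

    def __init__(self, name):
        self.name = name
        self.targets = []  # every state this one can transition to

    def next(self, step):
        self.targets.append(step)
        return step  # returning the target allows calls to be chained


def reachable(start):
    """Collect state names reachable from start, depth-first, visiting each once."""
    seen, order, stack = set(), [], [start]
    while stack:
        step = stack.pop()
        if step.name in seen:
            continue
        seen.add(step.name)
        order.append(step.name)
        stack.extend(reversed(step.targets))
    return order


copy_code = ToyStep("Put SourceCode on S3")
process = ToyStep("Run SageMaker Processing")
wait = ToyStep("Wait 30 Seconds")
status = ToyStep("Get SageMaker Processing Status")
choice = ToyStep("Processing Job Complete?")
train = ToyStep("Run Model Training Job")
fail = ToyStep("Workflow Failed")

copy_code.next(process).next(wait).next(status).next(choice)
# A choice state has several outgoing transitions, including the loop back to Wait.
for target in (train, wait, fail):
    choice.next(target)

print(reachable(copy_code))
```

Starting from the first step alone, the traversal finds all seven states, including the loop back to the wait state, which is visited only once.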

In [5]:
workflow = Workflow(
    name=WORKFLOW_NAME+'-sep23',
    definition=workflow_definition,
    role=WORKFLOW_EXECUTION_ROLE,
    execution_input=my_execution_input
)

Create your workflow using the workflow definition above, and render the graph with [render_graph](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.render_graph):

In [6]:
workflow.render_graph()

Create the workflow in AWS Step Functions with [create](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.create):

In [7]:
workflow.create()

[INFO] Workflow created successfully on AWS Step Functions.


'arn:aws:states:us-east-1:227921966468:stateMachine:my-project-2-sep23'

Run the workflow with [execute](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.execute):

In [8]:
my_execution_input_values = {
    #ADMIN
    "REGION":REGION,
    "ROLE_ARN":WORKFLOW_EXECUTION_ROLE,
    "BUCKET":BUCKET,
    "WORKFLOW_NAME": WORKFLOW_NAME,
    "WORKFLOW_DATE_TIME":WORKFLOW_DATE_TIME,
    "DATA_SOURCE":SOURCE_DATA,

    # CodeCommit
    "REPO":REPO,
    "BRANCH":BRANCH,
    "DATA_PROCESSING_DIR": "sagemaker-processing-src",
    "ML_DIR": "sagemaker-train-serve-src",
    
    # SM Processing
    "PROCESSING_SCRIPT":"processing.py",
    "PROCESSING_IMAGE":TRAINING_IMAGE,
    "PROCESSING_INSTANCE_TYPE":"ml.c5.xlarge",
    "PROCESSING_INSTANCE_COUNT":1,
    "PROCESSING_VOLUME_SIZE_GB":10,
    
    # SM TRAINING
    "TRAINING_SCRIPT":"train.py",
    "TRAINING_IMAGE":TRAINING_IMAGE,
    "TRAINING_INSTANCE_TYPE":"ml.c5.xlarge",
    "TRAINING_INSTANCE_COUNT":1,
    "TRAINING_VOLUME_SIZE_GB":10,
    
    # SM SERVING
    "SERVING_SCRIPT":"serve.py",
    "SERVING_IMAGE":TRAINING_IMAGE,
    "SERVING_INSTANCE_TYPE":"ml.c5.xlarge",
    "SERVING_INSTANCE_COUNT":1,
    "SERVING_VOLUME_SIZE_GB":10,
}

execution = workflow.execute(inputs=my_execution_input_values)

[INFO] Workflow execution started successfully on AWS Step Functions.


Render workflow progress with [render_progress](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Execution.render_progress). This generates a snapshot of the current state of your workflow as it executes. Because this is a static image, you must run the cell again to check progress:

In [11]:
execution.render_progress()

In [10]:
execution.list_events(html=True)

ID,Type,Step,Resource,Elapsed Time (ms),Timestamp
1,ExecutionStarted,,-,0.0,"Sep 23, 2020 06:10:18.253 PM"
"{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }","{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  
""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }","{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": 
""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }","{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }","{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": 
""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }","{  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": 
""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  },  ""roleArn"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"" }"
2,TaskStateEntered,Put SourceCode on S3,-,35.0,"Sep 23, 2020 06:10:18.288 PM"
"{  ""name"": ""Put SourceCode on S3"",  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }","{  ""name"": ""Put SourceCode on S3"",  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  
""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }","{  ""name"": ""Put SourceCode on S3"",  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }","{  ""name"": ""Put SourceCode on S3"",  ""input"": {  
""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }","{  ""name"": ""Put SourceCode on S3"",  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 
1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }","{  ""name"": ""Put SourceCode on S3"",  ""input"": {  ""REGION"": ""us-east-1"",  ""ROLE_ARN"": ""arn:aws:iam::227921966468:role/My-StepFunction-Workflow-Role"",  ""BUCKET"": ""my-project-227921966468"",  ""WORKFLOW_NAME"": ""my-project-2"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_SOURCE"": ""s3://my-datalake-227921966468/data/boston.csv"",  ""REPO"": ""my-project"",  ""BRANCH"": ""master"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""PROCESSING_SCRIPT"": ""processing.py"",  ""PROCESSING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""PROCESSING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""PROCESSING_INSTANCE_COUNT"": 1,  ""PROCESSING_VOLUME_SIZE_GB"": 10,  ""TRAINING_SCRIPT"": ""train.py"",  ""TRAINING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""TRAINING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""TRAINING_INSTANCE_COUNT"": 1,  ""TRAINING_VOLUME_SIZE_GB"": 10,  ""SERVING_SCRIPT"": ""serve.py"",  ""SERVING_IMAGE"": ""683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3"",  ""SERVING_INSTANCE_TYPE"": ""ml.c5.xlarge"",  ""SERVING_INSTANCE_COUNT"": 1,  ""SERVING_VOLUME_SIZE_GB"": 10  } }"
3,TaskScheduled,Put SourceCode on S3,Step Functions execution,35.0,"Sep 23, 2020 06:10:18.288 PM"
"{  ""resourceType"": ""lambda"",  ""resource"": ""invoke"",  ""region"": ""us-east-1"",  ""parameters"": {  ""FunctionName"": ""my-project-2-codecommit-to-s3"",  ""Payload"": {  ""BRANCH"": ""master"",  ""WORKFLOW_DATE_TIME"": ""2020-09-23-18-10-06"",  ""DATA_PROCESSING_DIR"": ""sagemaker-processing-src"",  ""REPO"": ""my-project"",  ""BUCKET"": ""my-project-227921966468"",  ""ML_DIR"": ""sagemaker-train-serve-src"",  ""REGION"": ""us-east-1""  }  } }"
4,TaskStarted,Put SourceCode on S3,Step Functions execution,73.0,"Sep 23, 2020 06:10:18.326 PM"
"{  ""resourceType"": ""lambda"",  ""resource"": ""invoke"" }"


In [16]:
event = {"WORKFLOW_NAME":WORKFLOW_NAME,
         "WORKFLOW_DATE_TIME":WORKFLOW_DATE_TIME
}
event

{'WORKFLOW_NAME': 'my-project-2', 'WORKFLOW_DATE_TIME': '2020-09-23-18-25-47'}

In [18]:
import boto3
import logging
import json

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

def lambda_handler(event, context=None):
    # The training job name follows the "<workflow name>-<workflow date/time>" convention.
    JOB_NAME = "{}-{}".format(event["WORKFLOW_NAME"], event["WORKFLOW_DATE_TIME"])

    try:
        response = sm_client.describe_training_job(TrainingJobName=JOB_NAME)
        logger.info("Training job:{} has status:{}.".format(JOB_NAME,
            response['TrainingJobStatus']))

    except Exception as e:
        message = ('Failed to read training status!'+
                   ' The training job may not exist or the job name may be incorrect.'+
                   ' Check SageMaker to confirm the job name.')
        print(e)
        print('{} Attempted to read job name: {}.'.format(message, JOB_NAME))
        # Return early: without a job description there are no metrics to read.
        return {
            'statusCode': 500,
            'trainingMetrics': []
        }

    # We can't marshal datetime objects in the JSON response, so convert
    # all returned datetime objects to Unix time.
    metrics = response.get('FinalMetricDataList', [])
    for metric in metrics:
        metric['Timestamp'] = metric['Timestamp'].timestamp()

    return {
        'statusCode': 200,
        'trainingMetrics': metrics
    }
lambda_handler(event)

{'statusCode': 200, 'trainingMetrics': []}
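
The timestamp conversion in the handler above can be checked without calling SageMaker. The following is a minimal sketch, assuming a metric entry shaped like one item of `FinalMetricDataList`; the metric name, value, and the helper `to_json_safe` are made up for illustration and are not part of the original notebook.

```python
import json
from datetime import datetime, timezone

# Hypothetical metric entry, shaped like one item of 'FinalMetricDataList'.
metric = {
    "MetricName": "test:accuracy",  # made-up name, for illustration only
    "Value": 0.5,                   # made-up value
    "Timestamp": datetime(2020, 9, 23, 17, 36, 0, tzinfo=timezone.utc),
}

def to_json_safe(metrics):
    """Return copies of the metric dicts with 'Timestamp' as Unix seconds."""
    return [{**m, "Timestamp": m["Timestamp"].timestamp()} for m in metrics]

# json.dumps raises TypeError on datetime objects, which is why the
# handler converts them before returning.
try:
    json.dumps([metric])
    serializable_before = True
except TypeError:
    serializable_before = False

converted = to_json_safe([metric])
payload = json.dumps({"statusCode": 200, "trainingMetrics": converted})
print(serializable_before)
print(payload)
```

Unlike the handler, this helper returns new dictionaries instead of modifying the response in place. Either approach should work here, since the response is not reused afterwards.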

## Local Dev

In [None]:
%run -i sagemaker-processing-src/processing.py \
    --local_path ./data/

!ls data/train/ 

In [None]:
%run -i sagemaker-train-serve-src/train.py \
    --model-dir ./models \
    --train ./data/train/ \
    --test ./data/test/
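
The contents of `train.py` are not shown here, so the following is only a hypothetical sketch of how a script-mode training script could accept the same `--model-dir`, `--train` and `--test` flags both locally and inside a SageMaker training container. It assumes the defaults fall back to the `SM_MODEL_DIR` and `SM_CHANNEL_<NAME>` environment variables that SageMaker framework containers set; the local fallback paths are assumptions taken from the commands above.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the flags used in the local run above (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    # Inside a SageMaker container these environment variables are set;
    # locally they are absent, so the relative paths are used instead.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "./models"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "./data/train/"))
    parser.add_argument("--test",
                        default=os.environ.get("SM_CHANNEL_TEST", "./data/test/"))
    return parser.parse_args(argv)

# Same arguments as the local %run invocation.
args = parse_args(["--model-dir", "./models",
                   "--train", "./data/train/",
                   "--test", "./data/test/"])
print(args.model_dir, args.train, args.test)
```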


In [26]:
from sagemaker.sklearn.estimator import SKLearn
import sagemaker

TRAINING_DATA_PATH = 's3://{}/{}'.format(BUCKET, WORKFLOW_DATE_TIME + '/data/train/train.csv')
TESTING_DATA_PATH = 's3://{}/{}'.format(BUCKET, WORKFLOW_DATE_TIME + '/data/test/test.csv')

train_estimator = SKLearn(#base_job_name = train_job_name,
                          sagemaker_session = sagemaker.Session(),
                          role = sagemaker.get_execution_role(),
                          source_dir = './sagemaker-train-serve-src/',
                          entry_point = 'train.py',
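                          # These are SageMaker Python SDK v1 argument names; SDK v2
                          # renames them to instance_type / instance_count.
                          train_instance_type = 'ml.m5.2xlarge',#"local",
                          train_instance_count = 1,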
                          #framework_version = '0.20.0',
                          #hyperparameters = best_params_dict,
                          #metric_definitions = validation_metric_defs,
                          output_path = 's3://{}/{}'.format(BUCKET, WORKFLOW_DATE_TIME + '/model-artifacts'),
                          code_location = 's3://{}/{}'.format(BUCKET, WORKFLOW_DATE_TIME + '/source-code')
                          )

train_estimator.fit(job_name = "{}-{}-sdk".format(WORKFLOW_NAME, WORKFLOW_DATE_TIME),
                    inputs = {"train" : TRAINING_DATA_PATH,
                              "test" : TESTING_DATA_PATH
                             },
                    wait = True
                   )

2020-09-23 17:32:38 Starting - Starting the training job...
2020-09-23 17:32:44 Starting - Launching requested ML instances......
2020-09-23 17:34:03 Starting - Preparing the instances for training......
2020-09-23 17:34:57 Downloading - Downloading input data
2020-09-23 17:34:57 Training - Downloading the training image..
2020-09-23 17:35:11,719 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2020-09-23 17:35:11,722 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-09-23 17:35:11,731 sagemaker_sklearn_container.training INFO     Invoking user training script.

2020-09-23 17:35:11 Training - Training image download completed. Training in progress.
2020-09-23 17:35:27,313 sagemaker-containers INFO     Module train does not provide a setup.py. 
Generating setup.py
2020-09-23 17:35:27,313 sagemaker-containers INFO     Generating setup.cfg
2020-09-23 17:35:27,313 s

  Building wheel for thrift (setup.py): finished with status 'done'
  Created wheel for thrift: filename=thrift-0.13.0-cp37-cp37m-linux_x86_64.whl size=285410 sha256=a6013149acec15758186e2dc5f7b157a0ae27239332d53fa82543080c1aab366
  Stored in directory: /root/.cache/pip/wheels/02/a2/46/689ccfcf40155c23edc7cdbd9de488611c8fdf49ff34b1706e
  Building wheel for docopt (setup.py): started
  Building wheel for docopt (setup.py): finished with status 'done'
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13704 sha256=0690f6efeaea15ff999040000e8a99f8a4452e4334d454133f7820171f216400
  Stored in directory: /root/.cache/pip/wheels/9b/04/dd/7daf4150b6d9b12949298737de9431a324d4b797ffd63f526e
  Building wheel for PyYAML (setup.py): started
  Building wheel for PyYAML (setup.py): finished with status 'done'
  Created wheel for PyYAML: filename=PyYAML-5.3.1-cp37-cp37m-linux_x86_64.whl size=44620 sha256=d227c69603c5bedd978643c635a62f5224da89fcd5c9a9e4e5de2bad4b4


2020-09-23 17:36:11 Completed - Training job completed
Training seconds: 81
Billable seconds: 81
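
The training cell builds every S3 location (data, model artifacts, source code) under a prefix named after `WORKFLOW_DATE_TIME`, which is what keeps each run versioned by date and time. Below is a minimal sketch of that layout. The helper `versioned_s3_uri` and the bucket name `my-bucket` are hypothetical, and the `strftime` format is an assumption inferred from values such as `2020-09-23-18-10-06` seen in the outputs.

```python
from time import gmtime, strftime

def versioned_s3_uri(bucket, workflow_date_time, suffix):
    """Build 's3://<bucket>/<date-time>/<suffix>' (hypothetical helper)."""
    return "s3://{}/{}/{}".format(bucket, workflow_date_time, suffix.strip("/"))

# Assumed timestamp format, inferred from the values shown in the outputs.
WORKFLOW_DATE_TIME = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# Fixed example so that the resulting layout is easy to read.
example_run = "2020-09-23-18-10-06"
train_uri = versioned_s3_uri("my-bucket", example_run, "data/train/train.csv")
model_uri = versioned_s3_uri("my-bucket", example_run, "model-artifacts")
code_uri = versioned_s3_uri("my-bucket", example_run, "source-code")

for uri in (train_uri, model_uri, code_uri):
    print(uri)
```

Keeping everything for one run under a single prefix makes it easy to list, compare, or delete the data, code, and model of a given retraining run together.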
