## Build end-to-end ML AI workflows with AWS Step Functions Data Science SDK
### Use Case #1 - Sample data transformation and XGBoost regression for price prediction

This code assumes you are running it in an Amazon SageMaker notebook.

If you are running the notebook for the first time, uncomment and run the following lines to install the AWS Step Functions Data Science SDK.

(Note: the SDK should already be installed if you are using an Amazon SageMaker Studio notebook with the Python 3 (Data Science) kernel.)

In [414]:
#import sys
#!{sys.executable} -m pip install --upgrade stepfunctions

### Preparation

Let us load the libraries we need throughout the notebook.

In [540]:
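import boto3, sagemaker, time, random, uuid, logging, stepfunctions, io

# Note: this notebook uses the SageMaker Python SDK v1 API (get_image_uri, s3_input, train_instance_*);
# these names changed in SageMaker Python SDK v2
from sagemaker.amazon.amazon_estimator import get_image_uri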
from stepfunctions import steps
from stepfunctions.steps import TrainingStep, ModelStep, TransformStep
from stepfunctions.inputs import ExecutionInput
from stepfunctions.workflow import Workflow
from stepfunctions.template import TrainingPipeline
from stepfunctions.template.utils import replace_parameters_with_jsonpath

# SageMaker Execution Role
sagemaker_execution_role = sagemaker.get_execution_role()

# REPLACE with your Step Functions WorkflowExecutionRole ARN
# For instructions on configuring this role and its permissions, see the Step Functions Data Science SDK documentation
workflow_execution_role = "arn:aws:iam::889960878219:role/StepFunctionsWorkflowExecutionRole" #REPLACE

session = sagemaker.Session()
stepfunctions.set_stream_logger(level=logging.INFO)

bucket = 'rodzanto2019ml' #REPLACE with your S3 bucket name, or use the session default with 'session.default_bucket()'
prefix = 'ml-pipelines/sample-price-estimation'

Let us retrieve the sample dataset from UCI - Online Retail (https://archive.ics.uci.edu/ml/datasets/Online+Retail)

In [416]:
from urllib.request import urlretrieve 
urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx", "Online%20Retail.xlsx")

('Online%20Retail.xlsx', <http.client.HTTPMessage at 0x7f1f9cc22cc0>)

The original dataset file is in 'xlsx' format. Let us convert it to 'csv' format, turning the Excel serial values of the 'InvoiceDate' attribute into readable date-time strings.

In [417]:
import csv
import xlrd  # NOTE: xlrd >= 2.0 can only read legacy .xls files; reading this .xlsx file needs xlrd < 2.0

def csv_from_excel(xlsx_path='Online%20Retail.xlsx', csv_path='retail.csv'):
    wb = xlrd.open_workbook(xlsx_path, on_demand=True)
    sh = wb.sheet_by_name('Online Retail')
    # newline='' stops the csv module from writing extra blank lines on some platforms
    with open(csv_path, 'w', newline='') as csv_file:
        wr = csv.writer(csv_file, quoting=csv.QUOTE_MINIMAL)
        for rownum in range(sh.nrows):
            row = sh.row_values(rownum)
            date = row[4]  # InvoiceDate column
            if isinstance(date, (float, int)):
                # Convert the Excel serial date into a "MM/DD/YYYY HH:MM" string
                year, month, day, hour, minute, sec = xlrd.xldate_as_tuple(date, wb.datemode)
                py_date = "%02d/%02d/%04d %02d:%02d" % (month, day, year, hour, minute)
                wr.writerow(row[0:4] + [py_date] + row[5:8])
            else:
                # Header row: write it unchanged
                wr.writerow(row)

csv_from_excel()

FILE_DATA = 'retail.csv'
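Excel stores dates as serial numbers (whole days since an epoch, with the time of day as the fractional part), and xlrd hands them to us as floats. To illustrate what `xlrd.xldate_as_tuple` does for us above, here is a minimal pure-Python sketch of the same conversion. It assumes the default 1900 date system (workbook `datemode == 0`) and dates from 1 March 1900 onwards; `excel_serial_to_str` is a hypothetical helper, not part of xlrd.

```python
from datetime import datetime, timedelta

# Hypothetical helper: convert an Excel serial date (1900 date system) into the
# "MM/DD/YYYY HH:MM" format used in retail.csv.
# Using 1899-12-30 as day 0 compensates for Excel's fictitious 29 Feb 1900,
# so results are only valid for serials from 61 (1 March 1900) onwards.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_str(serial):
    # Round to whole seconds to avoid floating-point drift in the fractional day
    seconds = round(serial * 86400)
    return (EXCEL_EPOCH + timedelta(seconds=seconds)).strftime("%m/%d/%Y %H:%M")

# 40513 is 1 Dec 2010, and 506/1440 of a day is 08:26
print(excel_serial_to_str(40513 + 506 / 1440))  # 12/01/2010 08:26
```

Rounding to seconds first keeps values such as 08:26 from turning into 08:25 because of float representation error.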

Let us upload the resulting 'csv' dataset file to S3 and have a look at it...

In [459]:
import pandas
import os

# Upload the CSV file to our S3 bucket, so that the upcoming pipeline steps can use it
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'retail.csv')).upload_file('retail.csv')

df = pandas.read_csv("retail.csv")
df.head(5)

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365.0 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6.0 | 12/01/2010 08:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365.0 | 71053.0 | WHITE METAL LANTERN | 6.0 | 12/01/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365.0 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8.0 | 12/01/2010 08:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365.0 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6.0 | 12/01/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365.0 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6.0 | 12/01/2010 08:26 | 3.39 | 17850.0 | United Kingdom |


### Data transformation

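For the purpose of this sample exercise, we drop the fields that are not needed for our simple regression on 'UnitPrice', the item price we want to predict.
We also encode the categorical variable 'StockCode' as integer codes, so that the model only receives numerical values. Finally, we move 'UnitPrice' to the first column, because the built-in SageMaker XGBoost algorithm expects the target variable in the first column of CSV input (with no header row).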

In [419]:
df['StockCode'] = df['StockCode'].astype('category')
df['StockCodeEnc'] = df['StockCode'].cat.codes

df = df.drop(['InvoiceNo', 'Description', 'InvoiceDate', 'Country', 'StockCode'], axis=1)
df = df[['UnitPrice', 'StockCodeEnc', 'Quantity', 'CustomerID']]

df.head(5)

|   | UnitPrice | StockCodeEnc | Quantity | CustomerID |
|---|---|---|---|---|
| 0 | 2.55 | 3536 | 6.0 | 17850.0 |
| 1 | 3.39 | 2794 | 6.0 | 17850.0 |
| 2 | 2.75 | 3044 | 8.0 | 17850.0 |
| 3 | 3.39 | 2985 | 6.0 | 17850.0 |
| 4 | 3.39 | 2984 | 6.0 | 17850.0 |
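The encoding above relies on pandas categoricals: `astype('category')` collects the distinct values as categories (sorted, when the values are sortable) and `cat.codes` replaces each value with the integer position of its category. A toy example with made-up stock codes:

```python
import pandas as pd

# Made-up stock codes, for illustration only
stock_codes = pd.Series(['85123A', '71053', '84406B', '71053']).astype('category')

print(stock_codes.cat.categories.tolist())  # ['71053', '84406B', '85123A']
print(stock_codes.cat.codes.tolist())       # [2, 0, 1, 0]
```

Because the codes depend on the full set of categories present, the notebook and the Lambda function further below produce the same encoding only because they both process the same 'retail.csv'.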


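Now we shuffle the data and split it into training (70%), validation (20%) and test (10%) datasets for the SageMaker training and batch inference jobs, and upload the files to our Amazon S3 bucket. The test file uploaded to S3 omits the 'UnitPrice' label, while 'test_real.csv' keeps the labels locally for later comparison.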

In [420]:
import numpy as np
import os

train_data, validation_data, test_data = np.split(df.sample(frac=1, random_state=1729), [int(0.7 * len(df)), int(0.9 * len(df))])
train_data.to_csv('train.csv', header=False, index=False)
validation_data.to_csv('validation.csv', header=False, index=False)
test_data.to_csv('test_real.csv', header=False, index=False)
test_data.drop(['UnitPrice'], axis=1).to_csv('test.csv', header=False, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')
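`np.split` with the boundaries `[int(0.7 * len(df)), int(0.9 * len(df))]` cuts the shuffled rows at the 70% and 90% marks, giving a 70/20/10 split. The sketch below reproduces the same split with positional slicing (`iloc`) on a synthetic DataFrame; `split_70_20_10` is a hypothetical helper, shown as an equivalent alternative because calling `np.split` directly on a DataFrame may emit deprecation warnings with recent pandas releases.

```python
import numpy as np
import pandas as pd

def split_70_20_10(df, seed=1729):
    # Shuffle all rows reproducibly, then cut at the 70% and 90% marks
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    train_end, validation_end = int(0.7 * n), int(0.9 * n)
    return (shuffled.iloc[:train_end],
            shuffled.iloc[train_end:validation_end],
            shuffled.iloc[validation_end:])

# Synthetic data with the label in the first column, like our dataset
demo = pd.DataFrame({'UnitPrice': np.arange(1000.0), 'StockCodeEnc': np.arange(1000)})
train, validation, test = split_70_20_10(demo)
print(len(train), len(validation), len(test))  # 700 200 100
```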

#### Automating data preparation with AWS Lambda

Let us now package this same data transformation as an AWS Lambda function, so that it can run as the first step of our pipeline... automate, automate, automate!

In [None]:
# Create data transformation lambda:

# First, download pandas and numpy wheels for our Lambda package, as these libraries are not included in the Lambda Python runtime
!mkdir -p lambda
urlretrieve("https://files.pythonhosted.org/packages/7b/fd/41698f20fd297cef2dc43a72a8ca42d149eaf7d954f1fb2bd3fc366a658d/pandas-0.25.3-cp38-cp38-manylinux1_x86_64.whl", "lambda/pandas-0.25.3-cp38-cp38-manylinux1_x86_64.whl")
urlretrieve("https://files.pythonhosted.org/packages/d7/6a/3fed132c846d1e47963f30376cc041e9dd586d286d931055ad06ff65c6c7/numpy-1.17.4-cp38-cp38-manylinux1_x86_64.whl", "lambda/numpy-1.17.4-cp38-cp38-manylinux1_x86_64.whl")
!unzip -o lambda/pandas-0.25.3-cp38-cp38-manylinux1_x86_64.whl -d lambda
!unzip -o lambda/numpy-1.17.4-cp38-cp38-manylinux1_x86_64.whl -d lambda

# then install the pytz dependency locally...
!pip install -t lambda pytz

# and remove the files no longer needed...
!rm -rf lambda/*.whl lambda/*.dist-info lambda/__pycache__

# finally we prepare the lambda function code...
file_name = 'lambda/lambda_function.py'
def MakeFile(file_name):
    with open(file_name, 'w') as f:
        f.write('''\
import json
import boto3
import pandas
import numpy as np
import os
bucket = 'rodzanto2019ml' #REPLACE with your S3 bucket name (the same bucket used in this notebook)
prefix = 'ml-pipelines/sample-price-estimation'
filename = 'retail.csv'
def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    s3.Bucket(bucket).download_file(prefix + "/" + filename, '/tmp/retail.csv')
    df = pandas.read_csv("/tmp/retail.csv")
    df['StockCode'] = df['StockCode'].astype('category')
    df['StockCodeEnc'] = df['StockCode'].cat.codes
    df = df.drop(['InvoiceNo', 'Description', 'InvoiceDate', 'Country', 'StockCode'], axis=1)
    df = df[['UnitPrice', 'StockCodeEnc', 'Quantity', 'CustomerID']]
    train_data, validation_data, test_data = np.split(df.sample(frac=1, random_state=1729), [int(0.7 * len(df)), int(0.9 * len(df))])
    train_data.to_csv('/tmp/train.csv', header=False, index=False)
    validation_data.to_csv('/tmp/validation.csv', header=False, index=False)
    test_data.to_csv('/tmp/test_real.csv', header=False, index=False)
    test_data.drop(['UnitPrice'], axis=1).to_csv('/tmp/test.csv', header=False, index=False)
    s3.Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('/tmp/train.csv')
    s3.Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('/tmp/validation.csv')
    s3.Bucket(bucket).Object(os.path.join(prefix, 'test/test.csv')).upload_file('/tmp/test.csv')
    return {
        'statusCode': 200,
        'body': ('Data transformation complete for retail.csv - ' + bucket + '/' + prefix + '/' + filename)
    }
        ''')
MakeFile(file_name)
!cd lambda; zip -rm ../lambda_function.zip *

# Check that the deployment package exists (os was imported earlier in the notebook)
if os.path.exists("lambda_function.zip"):
    print("Lambda file created: lambda_function.zip")
else:
    print("Error - Lambda file not created")

# REPLACE the --role value with your own Lambda execution role (it needs read/write access to your S3 bucket)
!aws lambda create-function --function-name 'ml-pipelines-data-transformation-lambda' \
    --runtime python3.8 --role 'arn:aws:iam::889960878219:role/LambdaDynamo' \
    --handler lambda_function.lambda_handler \
    --zip-file 'fileb://lambda_function.zip' \
    --description 'Sample ML pipeline data transformation lambda'  \
    --timeout 600  \
    --memory-size 256  \
    --publish
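With `--handler lambda_function.lambda_handler`, Lambda imports the module `lambda_function` from the root of the deployment package, which is why the archive is built from inside the `lambda` folder. A check such as the hypothetical `has_handler_at_root` helper below (standard library only) can confirm that layout before uploading; the demo uses throwaway archives so it runs anywhere.

```python
import os
import tempfile
import zipfile

def has_handler_at_root(zip_path, handler='lambda_function.lambda_handler'):
    # The module part of the handler string must exist as <module>.py at the archive root
    module = handler.rsplit('.', 1)[0]
    with zipfile.ZipFile(zip_path) as zf:
        return f'{module}.py' in zf.namelist()

# Self-contained demo; in the notebook you would call has_handler_at_root('lambda_function.zip')
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'good.zip')
    bad = os.path.join(tmp, 'bad.zip')
    with zipfile.ZipFile(good, 'w') as zf:
        zf.writestr('lambda_function.py', 'def lambda_handler(event, context): pass\n')
    with zipfile.ZipFile(bad, 'w') as zf:
        # Zipping the folder itself would nest the module one level too deep
        zf.writestr('lambda/lambda_function.py', 'def lambda_handler(event, context): pass\n')
    good_ok = has_handler_at_root(good)
    bad_ok = has_handler_at_root(bad)

print(good_ok, bad_ok)  # True False
```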

### Building our ML pipeline with the Step Functions Data Science SDK

Now we are ready to create our ML pipeline. We start by defining the SageMaker XGBoost estimator and its hyperparameters.

In [541]:
xgb = sagemaker.estimator.Estimator(
    get_image_uri(boto3.Session().region_name, 'xgboost'),
    sagemaker_execution_role, 
    train_instance_count = 1, 
    train_instance_type = 'ml.m5.large',
    train_volume_size = 5,
    output_path = 's3://{}/{}/output'.format(bucket, prefix),
    sagemaker_session = session
)

xgb.set_hyperparameters(
    objective = 'reg:linear',  # deprecated in newer XGBoost releases in favor of 'reg:squarederror'
    num_round = 50,
    max_depth = 5,
    eta = 0.2,
    gamma = 4,
    min_child_weight = 6,
    subsample = 0.7,
    silent = 0
)

In [542]:
# SageMaker expects unique names for jobs/models/endpoints. Pass these for each execution via placeholders:
execution_input = ExecutionInput(schema={
    'JobName': str, 
    'ModelName': str
})

In [543]:
preparation_step = steps.LambdaStep(
    'Preparing data (Lambda)',
    parameters={  
        "FunctionName": "ml-pipelines-data-transformation-lambda",
        "Payload": {  
           "JobName": execution_input['JobName']
        }
    }
)

preparation_step.add_retry(steps.Retry(
    error_equals=["States.TaskFailed"],
    interval_seconds=15,
    max_attempts=2,
    backoff_rate=4.0
))
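In a Step Functions `Retry` policy, `IntervalSeconds` is the wait before the first retry, each later wait is multiplied by `BackoffRate`, and `MaxAttempts` caps the number of retries. The sketch below (a hypothetical helper, not part of the SDK) computes the resulting schedule for the policy above:

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    # Wait before retry n (1-based) = interval_seconds * backoff_rate ** (n - 1)
    return [interval_seconds * backoff_rate ** (n - 1) for n in range(1, max_attempts + 1)]

print(retry_delays(15, 2, 4.0))  # [15.0, 60.0]
```

So a failing data preparation Lambda is retried after 15 seconds and again after a further 60 seconds before the step fails.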

In [544]:
training_step = steps.TrainingStep(
    'Training (SageMaker)', 
    estimator=xgb,
    data={
        'train': sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv'),
        'validation': sagemaker.s3_input(s3_data='s3://{}/{}/validation'.format(bucket, prefix), content_type='csv')
    },
    job_name=execution_input['JobName']  
)

In [545]:
model_step = steps.ModelStep(
    'Save model (SageMaker)',
    model=training_step.get_expected_model(),
    model_name=execution_input['ModelName']  
)

In [None]:
# Create validation lambda:
file_name = 'lambda_function.py'
def MakeFile(file_name):
    with open(file_name, 'w') as f:
        f.write('''\
import boto3
def lambda_handler(event, context):
    sm = boto3.client('sagemaker')
    metrics = sm.describe_training_job(TrainingJobName=event['JobName'])['FinalMetricDataList']
    # The list can also contain 'train:rmse', so select the validation metric by name
    rmse = next(m['Value'] for m in metrics if m['MetricName'] == 'validation:rmse')
    print(rmse)
    return {
        'statusCode': 200,
        'rmse': rmse  # a number, so that the Choice state can compare it numerically
    }
        ''')
MakeFile(file_name)
# Remove the data transformation package first, otherwise zip would add to the existing archive
!rm -f lambda_function.zip
!zip -rm lambda_function.zip lambda_function.py
if os.path.exists("lambda_function.zip"):
    print("Lambda file created: lambda_function.zip")
else:
    print("Error - Lambda file not created")

# REPLACE the --role value with your own Lambda execution role (it needs permission for sagemaker:DescribeTrainingJob)
!aws lambda create-function --function-name 'ml-pipelines-validation-lambda' \
    --runtime python3.8 --role 'arn:aws:iam::889960878219:role/LambdaDynamo' \
    --handler lambda_function.lambda_handler \
    --zip-file 'fileb://lambda_function.zip' \
    --description 'Sample ML pipeline validation metric lambda'  \
    --timeout 60  \
    --memory-size 128  \
    --publish

In [547]:
validation_lambda_step = steps.LambdaStep(
    'Validating RMSE (Lambda)',
    parameters={  
        "FunctionName": "ml-pipelines-validation-lambda",
        "Payload": {  
           "JobName": execution_input['JobName']
        }
    }
)

validation_lambda_step.add_retry(steps.Retry(
    error_equals=["States.TaskFailed"],
    interval_seconds=15,
    max_attempts=2,
    backoff_rate=4.0
))

In [548]:
transform_step = steps.TransformStep(
    'Batch inference (SageMaker)',
    transformer=xgb.transformer(
        instance_count=1,
        instance_type='ml.m5.large'
    ),
    job_name=execution_input['JobName'],     
    model_name=execution_input['ModelName'], 
    data='s3://{}/{}/test'.format(bucket, prefix),
    content_type='text/csv'
)

In [549]:
worse_step = steps.Pass(
    'Worse model',
    parameters={
        "Error": "The new model is not accurate enough.",
        # str() of a placeholder does not give the runtime value; pass it as its own field so Step Functions resolves it
        "RMSE": validation_lambda_step.output()["Payload"]["rmse"]
    }
)

In [550]:
choice_state = steps.Choice(
    state_id='RMSE < 50 ?' #REPLACE with your desired threshold for RMSE
)

In [551]:
#REPLACE the value 50 with your desired RMSE threshold (in 'UnitPrice' units)
# A lower RMSE means a more accurate model, so only models below the threshold go on to batch inference
choice_state.add_choice(
    rule=steps.ChoiceRule.NumericLessThan(variable=validation_lambda_step.output()["Payload"]["rmse"], value=50),
    next_step=transform_step
)
choice_state.add_choice(
    rule=steps.ChoiceRule.NumericGreaterThanEquals(variable=validation_lambda_step.output()["Payload"]["rmse"], value=50),
    next_step=worse_step
)
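Choice rules are typed: `String*` rules compare text lexicographically, character by character, while `Numeric*` rules compare numbers. The plain-Python sketch below shows why an RMSE compared as a string can take the wrong branch, and mirrors the intended decision (lower RMSE is better). `next_state` is a hypothetical helper and 50 is just the example threshold used in this notebook.

```python
THRESHOLD = 50  # example RMSE threshold, in 'UnitPrice' units

def next_state(rmse, threshold=THRESHOLD):
    # Lower RMSE means a more accurate model: only then run batch inference
    return 'Batch inference (SageMaker)' if rmse < threshold else 'Worse model'

# Compared as text, '9.5' sorts after '50' because the character '9' > '5'
print('9.5' >= '50')     # True, although 9.5 < 50
print(next_state(9.5))   # Batch inference (SageMaker)
print(next_state(97.3))  # Worse model
```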

Note that in our case we are not creating an 'Endpoint Configuration' or an 'Endpoint', because we are running batch predictions on our test dataset with Amazon SageMaker Batch Transform.

Should you need to serve real-time inferences from an Amazon SageMaker endpoint, you can follow the AWS Step Functions Data Science SDK examples and add the endpoint configuration and endpoint steps to the workflow, as in the commented lines below.

In [552]:
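# NOTE: to use these steps, also add 'EndpointName': str to the ExecutionInput schema above and append them to the workflow chain
#endpoint_config_step = steps.EndpointConfigStep(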
#    "Create Endpoint Config",
#    endpoint_config_name=execution_input['ModelName'],
#    model_name=execution_input['ModelName'],
#    initial_instance_count=1,
#    instance_type='ml.m5.large'
#)

#endpoint_step = steps.EndpointStep(
#    "Create Endpoint",
#    endpoint_name=execution_input['EndpointName'],
#    endpoint_config_name=execution_input['ModelName']
#)
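The validation metric driving the Choice state is the root-mean-square error, RMSE = sqrt(mean((y - ŷ)²)), expressed in the same units as 'UnitPrice'. Once test-set predictions are available (for example after downloading the batch transform output, which SageMaker names after the input file with a `.out` suffix), they can be compared with the labels we kept in 'test_real.csv'. A minimal NumPy sketch of the metric:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error: square the residuals, average them, take the square root
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([2.55, 3.39, 2.75], [2.55, 3.39, 2.75]))  # 0.0
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))        # about 1.1547, i.e. sqrt(4/3)
```

In the notebook, `y_true` could be the first column of `pandas.read_csv('test_real.csv', header=None)` and `y_pred` the values of a locally downloaded `test.csv.out`; that file name is an assumption based on the Batch Transform naming convention.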

In [553]:
workflow_definition = steps.Chain([
    preparation_step,
    training_step,
    model_step,
    validation_lambda_step,
    choice_state
])

In [554]:
workflow = Workflow(
    name='ml-pipelines-sample-price-estimation_v1',
    definition=workflow_definition,
    role=workflow_execution_role,
    execution_input=execution_input
)

In [563]:
workflow.render_graph(portrait=False)

In [556]:
workflow.create()

[INFO] Workflow created successfully on AWS Step Functions.


'arn:aws:states:eu-west-1:889960878219:stateMachine:ml-pipelines-sample-price-estimation_v1'

In [557]:
jobname = 'regression-{}'.format(uuid.uuid1().hex)

execution = workflow.execute(
    inputs={
        'JobName': jobname, # Each SageMaker job requires a unique name
        'ModelName': 'regression-{}'.format(uuid.uuid1().hex), # Each model requires a unique name
    }
)
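SageMaker training job, transform job and model names must be unique within an account and Region, may contain only alphanumeric characters and hyphens, and are limited to 63 characters; 'regression-' plus a 32-character hex UUID is 43 characters, well within that limit. A hypothetical helper that builds and checks such names (it uses `uuid4`, which is random, instead of the notebook's time-based `uuid1`; either gives unique names here):

```python
import re
import uuid

# Allowed characters and length limit (63) for SageMaker job and model names
NAME_PATTERN = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$')

def unique_name(prefix='regression'):
    name = '{}-{}'.format(prefix, uuid.uuid4().hex)
    if not NAME_PATTERN.match(name):
        raise ValueError('Invalid SageMaker name: ' + name)
    return name

job_name = unique_name()
model_name = unique_name()
print(len(job_name), job_name != model_name)  # 43 True
```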

[INFO] Workflow execution started successfully on AWS Step Functions.


In [565]:
execution.render_progress()

In [566]:
execution.list_events(html=True)

| ID | Type | Step | Resource | Elapsed Time (ms) | Timestamp |
|---|---|---|---|---|---|
| 1 | ExecutionStarted |  | - | 0.0 | Dec 18, 2019 02:32:04.619 AM |
| 2 | TaskStateEntered | Preparing data (Lambda) | - | 20.0 | Dec 18, 2019 02:32:04.639 AM |
| 3 | TaskScheduled | Preparing data (Lambda) | Step Functions execution | 20.0 | Dec 18, 2019 02:32:04.639 AM |
| 4 | TaskStarted | Preparing data (Lambda) | Step Functions execution | 68.0 | Dec 18, 2019 02:32:04.687 AM |
| 5 | TaskSucceeded | Preparing data (Lambda) | Step Functions execution | 734.0 | Dec 18, 2019 02:32:35.353 AM |

Event details:

- 1: {"input": {"JobName": "regression-942c3e06213e11eaa4a5f516d0c8402c", "ModelName": "regression-942c436a213e11eaa4a5f516d0c8402c"}, "roleArn": "arn:aws:iam::889960878219:role/StepFunctionsWorkflowExecutionRole"}
- 2: {"name": "Preparing data (Lambda)", "input": {"JobName": "regression-942c3e06213e11eaa4a5f516d0c8402c", "ModelName": "regression-942c436a213e11eaa4a5f516d0c8402c"}}
- 3: {"resourceType": "lambda", "resource": "invoke", "region": "eu-west-1", "parameters": {"FunctionName": "ml-pipelines-data-transformation-lambda", "Payload": {"JobName": "regression-942c3e06213e11eaa4a5f516d0c8402c"}, "LogType": "Tail"}}
- 4: {"resourceType": "lambda", "resource": "invoke"}
- 5 (LogResult and SdkHttpMetadata omitted): {"resourceType": "lambda", "resource": "invoke", "output": {"ExecutedVersion": "$LATEST", "Payload": {"statusCode": 200, "body": "Date transformation complete for retail.csv - rodzanto2019ml/ml-pipelines/sample-price-estimation/retail.csv"}, "SdkResponseMetadata": {"RequestId": "4f0c1714-4a99-4b57-be74-c39c2ac095c0"}, "StatusCode": 200}}


In [567]:
workflow.list_executions(html=True)

| Name | Status | Started | End Time |
|---|---|---|---|
| 10990850-1271-42bf-847f-c745146f0c5a | SUCCEEDED | Dec 18, 2019 02:32:04.619 AM | Dec 18, 2019 02:38:38.410 AM |


In [568]:
workflow.list_workflows(html=True)

| Name | Creation Date |
|---|---|
| WorkshopMgmtManageAccounts | Oct 14, 2019 02:05:27.801 PM |
| ml-pipelines-sample-price-estimation_v1 | Dec 18, 2019 02:32:04.540 AM |
| training-pipeline-2019-11-15-13-57-27 | Nov 15, 2019 01:58:50.548 PM |


In [569]:
print(workflow.get_cloudformation_template())

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for AWS Step Functions - State Machine
Resources:
  StateMachineComponent:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: ml-pipelines-sample-price-estimation_v1
      DefinitionString: |-
        {
          "StartAt": "Preparing data (Lambda)",
          "States": {
            "Preparing data (Lambda)": {
              "Parameters": {
                "FunctionName": "ml-pipelines-data-transformation-lambda",
                "Payload": {
                  "JobName.$": "$$.Execution.Input['JobName']"
                }
              },
              "Resource": "arn:aws:states:::lambda:invoke",
              "Type": "Task",
              "Next": "Training (SageMaker)",
              "Retry": [
                {
                  "ErrorEquals": [
                    "States.TaskFailed"
                  ],
                  "IntervalSeconds": 15,
                  "MaxAttempts