# Managing Machine Learning Workflows and Deployments

![](https://drive.google.com/uc?id=1ectGaJhXUbii8go0lbrbWoCyNdqldrAF&authuser=recohut.data.001%40gmail.com&usp=drive_fs)

We will cover the following recipes in this tutorial:

- Working with Hugging Face models
- Preparing the prerequisites of a multi-model endpoint deployment
- Hosting multiple models with multi-model endpoints
- Setting up A/B testing on multiple models with production variants
- Preparing the Step Functions execution role
- Managing ML workflows with AWS Step Functions and the Data Science SDK
- Managing ML workflows with SageMaker Pipelines

## Setup

In [None]:
# install tree command (helpful in printing folder structures)
!apt-get install tree

# setup AWS cli
!mkdir -p ~/.aws && cp /content/drive/MyDrive/AWS/d01_admin/* ~/.aws
!chmod 600 ~/.aws/credentials
!pip install awscli

# install boto3 and sagemaker
!pip install boto3
!pip install sagemaker

# install dependencies
!pip install pyathena
!pip install awswrangler
!pip install smclarify
!pip install sagemaker-experiments
!pip install sagemaker-tensorflow
!pip install smclarify
!pip install stepfunctions

# install nlp libs
!pip install transformers

In [1]:
# imports
import boto3
import sagemaker
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import os
import sys
import time
import json
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.serializers import JSONSerializer
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

In [2]:
# global variables
role = "sagemakerRole"
prefix = "sagemaker-exp11060345"
training_instance_type = "ml.m5.xlarge"

In [3]:
# setup sagemaker session
sess = sagemaker.Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role_arn = "arn:aws:iam::{}:role/{}".format(account_id, role)
sm = boto3.Session().client(service_name="sagemaker", region_name=region)
s3 = boto3.Session().client(service_name="s3", region_name=region)
runtime_sm_client = boto3.client('sagemaker-runtime')

In [9]:
!git clone https://github.com/RecoHut-Datasets/synthetic_text.git

Cloning into 'synthetic_text'...
remote: Enumerating objects: 13, done.[K
remote: Counting objects: 100% (13/13), done.[K
remote: Compressing objects: 100% (11/11), done.[K
remote: Total 13 (delta 1), reused 13 (delta 1), pack-reused 0[K
Unpacking objects: 100% (13/13), done.


## Working with Hugging Face models

In [6]:
s3_train_data = 's3://{}/{}/input/{}'.format(
    bucket, 
    prefix, 
    "synthetic.train.txt"
)
s3_validation_data = 's3://{}/{}/input/{}'.format(
    bucket, 
    prefix, 
    "synthetic.validation.txt"
)

In [None]:
!aws s3 cp synthetic_text/synthetic.train.txt {s3_train_data}
!aws s3 cp synthetic_text/synthetic.validation.txt {s3_validation_data}

upload: synthetic_text/synthetic.train.txt to s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/input/synthetic.train.txt
upload: synthetic_text/synthetic.validation.txt to s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/input/synthetic.validation.txt


In [None]:
from sagemaker.huggingface import HuggingFace

hyperparameters = {
    'epochs': 1,
    'train_batch_size': 32,
    'model_name':'distilbert-base-uncased'
}

estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./synthetic_text',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.4',
    pytorch_version='1.6',
    py_version='py36',
    hyperparameters=hyperparameters
)

In [None]:
train_data = TrainingInput(s3_train_data)
validation_data = TrainingInput(s3_validation_data)

data_channels = {
    'train': train_data, 
    'valid': validation_data
}

In [None]:
%%time

estimator.fit(data_channels)

In [None]:
from sagemaker.pytorch.model import PyTorchModel

model_data = estimator.model_data

model = PyTorchModel(
    model_data=model_data, 
    role=role, 
    source_dir="synthetic_text",
    entry_point='inference.py', 
    framework_version='1.6.0',
    py_version="py3"
)

In [None]:
%%time

predictor = model.deploy(
    instance_type='ml.m5.xlarge', 
    initial_instance_count=1
)

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

test_data = {
    "text": "This tastes bad. I hate this place."
}

predictor.predict(test_data)

test_data = {
    "text": "Very delicious. I would recommend this to my friends"
}

predictor.predict(test_data)

In [None]:
predictor.delete_endpoint()

## Preparing the prerequisites of a multi-model endpoint deployment

In [10]:
model_a_s3_path = f"s3://{bucket}/{prefix}/files/model.a.tar.gz"
model_b_s3_path = f"s3://{bucket}/{prefix}/files/model.b.tar.gz"

!aws s3 cp synthetic_text/model.a.tar.gz {model_a_s3_path}
!aws s3 cp synthetic_text/model.b.tar.gz {model_b_s3_path}

upload: synthetic_text/model.a.tar.gz to s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/files/model.a.tar.gz
upload: synthetic_text/model.b.tar.gz to s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/files/model.b.tar.gz


## Hosting multiple models with multi-model endpoints

In [11]:
image_uri = sagemaker.image_uris.retrieve("xgboost", region, "0.90-2")
image_uri

'683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3'

In [13]:
models_path = f"s3://{bucket}/model-artifacts/"

multi_model = MultiDataModel(
    name=prefix+"-multi",
    model_data_prefix=models_path, 
    image_uri=image_uri,
    role=role
)

multi_model.add_model(model_a_s3_path)
multi_model.add_model(model_b_s3_path)

model_a, model_b = list(
    multi_model.list_models()
)

print(model_a)
print(model_b)

sagemaker-exp11060345/files/model.a.tar.gz
sagemaker-exp11060345/files/model.b.tar.gz


In [15]:
%%time

endpoint_name = prefix+"-mma"

multi_model.deploy(
    initial_instance_count=1, 
    instance_type='ml.t2.medium', 
    endpoint_name=endpoint_name
)

-------------------!CPU times: user 2.29 s, sys: 402 ms, total: 2.69 s
Wall time: 9min 39s


![](https://drive.google.com/uc?id=1egWbE0IKoJEQbee7gENUff4a_ayTyhet&authuser=recohut.data.001%40gmail.com&usp=drive_fs)

In [16]:
predictor = Predictor(
    endpoint_name=endpoint_name
)

predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()

In [17]:
predictor.predict(data="10,-5", target_model=model_a)

[0.895996630191803]

In [18]:
predictor.predict(data="10,-5", target_model=model_b)

[0.8308258652687073]

## Setting up AB testing on multiple models with production variants

In [20]:
image_uri = sagemaker.image_uris.retrieve("xgboost", region, "0.90-2")

image_uri_a = image_uri
image_uri_b = image_uri

container1 = { 
    'Image': image_uri_a,
    'ContainerHostname': 'containerA',
    'ModelDataUrl': model_a_s3_path
}

container2 = { 
    'Image': image_uri_b,
    'ContainerHostname': 'containerB',
    'ModelDataUrl': model_b_s3_path
}

In [22]:
model_name_a = "ab-model-a"
model_name_b = "ab-model-b"
endpoint_config_name = 'ab-endpoint-config'
endpoint_name = 'ab-endpoint'

try:
    sm.delete_model(ModelName=model_name_a)
    sm.delete_model(ModelName=model_name_b)
except:
    pass

response = sm.create_model(
    ModelName = model_name_a,
    ExecutionRoleArn = role_arn,
    Containers = [container1])

print(response)

response = sm.create_model(
    ModelName = model_name_b,
    ExecutionRoleArn = role_arn,
    Containers = [container2])

print(response)

{'ModelArn': 'arn:aws:sagemaker:us-east-1:390354360073:model/ab-model-a', 'ResponseMetadata': {'RequestId': 'ff521792-ee40-4eed-a4b1-58793d036966', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'ff521792-ee40-4eed-a4b1-58793d036966', 'content-type': 'application/x-amz-json-1.1', 'content-length': '72', 'date': 'Sat, 11 Jun 2022 03:52:18 GMT'}, 'RetryAttempts': 0}}
{'ModelArn': 'arn:aws:sagemaker:us-east-1:390354360073:model/ab-model-b', 'ResponseMetadata': {'RequestId': '700bd215-5f1f-48cc-831e-a868b56ce156', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '700bd215-5f1f-48cc-831e-a868b56ce156', 'content-type': 'application/x-amz-json-1.1', 'content-length': '72', 'date': 'Sat, 11 Jun 2022 03:52:20 GMT'}, 'RetryAttempts': 2}}


In [23]:
from sagemaker.session import production_variant

variant1 = production_variant(
    model_name=model_name_a,
    instance_type="ml.t2.medium",
    initial_instance_count=1,
    variant_name='VariantA',
    initial_weight=0.5
)
                              
variant2 = production_variant(
    model_name=model_name_b,
    instance_type="ml.t2.medium",
    initial_instance_count=1,
    variant_name='VariantB',
    initial_weight=0.5
)

In [24]:
sess.endpoint_from_production_variants(
    name=endpoint_name,
    production_variants=[variant1, variant2]
)

-------------------!

'ab-endpoint'

In [26]:
body = "10,-5"

def test_ab_testing_setup():
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=body
    )
    
    variant = response['InvokedProductionVariant']
    b = response['Body'].read()
    prediction = b.decode("utf-8")

    print(variant + " - "+ prediction)

for _ in range(0,10):
    test_ab_testing_setup()
    time.sleep(1)

VariantA - 0.895996630191803
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantA - 0.895996630191803
VariantA - 0.895996630191803
VariantB - 0.8308258652687073
VariantA - 0.895996630191803
VariantA - 0.895996630191803
VariantA - 0.895996630191803


In [27]:
def test_direct_call():
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        TargetVariant='VariantB',
        Body=body
    )

    variant = response['InvokedProductionVariant']
    b = response['Body'].read()
    prediction = b.decode("utf-8")

    print(variant + " - "+ prediction)

for _ in range(0,10):
    test_direct_call()
    time.sleep(1)

VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073
VariantB - 0.8308258652687073


## Managing ML workflows with AWS Step Functions and the Data Science SDK

In [28]:
%%writefile management_experience_and_salary.csv
last_name,management_experience_months,monthly_salary
Taylor,65,1630
Wang,61,1330
Brown,38,1290
Harris,71,1480
Jones,94,1590
Garcia,93,1750
Williams,15,1020
Lee,56,1290
White,59,1430
Tan,7,960
Chen,14,1090
Kim,67,1340
Davis,29,1170
James,49,1390
Perez,46,1240
Cruz,73,1390
Smith,19,960
Thompson,22,1040
Joseph,32,1090
Singh,37,1300

Writing management_experience_and_salary.csv


In [4]:
df_all_data = pd.read_csv("management_experience_and_salary.csv")
df_all_data

Unnamed: 0,last_name,management_experience_months,monthly_salary
0,Taylor,65,1630
1,Wang,61,1330
2,Brown,38,1290
3,Harris,71,1480
4,Jones,94,1590
5,Garcia,93,1750
6,Williams,15,1020
7,Lee,56,1290
8,White,59,1430
9,Tan,7,960


In [5]:
from sklearn.model_selection import train_test_split

dad = df_all_data

X = dad['management_experience_months'].values 
y = dad['monthly_salary'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.3, random_state=0
)

df_training_data = pd.DataFrame({ 
    'monthly_salary': y_train, 
    'management_experience_months': X_train
})

df_training_data

Unnamed: 0,monthly_salary,management_experience_months
0,1020,15
1,1390,49
2,1590,94
3,1290,38
4,1750,93
5,1240,46
6,960,7
7,1290,56
8,960,19
9,1340,67


In [32]:
tn = "training_data.csv"

df_training_data.to_csv(tn, header=False, index=False)
dest = f"s3://{bucket}/{prefix}/input/{tn}"
!aws s3 cp {tn} {dest}

Completed 109 Bytes/109 Bytes (226 Bytes/s) with 1 file(s) remainingupload: ./training_data.csv to s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/input/training_data.csv


In [6]:
training_s3_input_location = f"s3://{bucket}/{prefix}/input/training_data.csv" 
training_s3_output_location = f"s3://{bucket}/{prefix}/output/"

train = TrainingInput(
    training_s3_input_location, 
    content_type="text/csv"
)

In [7]:
container = sagemaker.image_uris.retrieve("linear-learner", region, "1")
container

'382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:1'

In [8]:
estimator = sagemaker.estimator.Estimator(
    container,
    role, 
    instance_count=1, 
    instance_type='ml.m5.xlarge',
    output_path=training_s3_output_location,
    sagemaker_session=sess
)

In [9]:
estimator.set_hyperparameters(
    predictor_type='regressor', 
    mini_batch_size=4
)

In [10]:
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import TrainingStep
from stepfunctions.steps import ModelStep
from stepfunctions.steps import EndpointConfigStep
from stepfunctions.steps import EndpointStep
from stepfunctions.steps import Chain

execution_input = ExecutionInput(
    schema={ 
        'ModelName': str,
        'EndpointName': str,
        'JobName': str
    }
)

ei = execution_input

training_step = TrainingStep(
    'Training Step', 
    estimator=estimator,
    data={
        'train': train
    },
    job_name=ei['JobName']
)

model_step = ModelStep(
    'Model Step',
    model=training_step.get_expected_model(),
    model_name=ei['ModelName']  
)

endpoint_config_step = EndpointConfigStep(
    "Create Endpoint Configuration",
    endpoint_config_name=ei['ModelName'],
    model_name=ei['ModelName'],
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

endpoint_step = EndpointStep(
    "Deploy Endpoint",
    endpoint_name=ei['EndpointName'],
    endpoint_config_name=ei['ModelName']
)

workflow_definition = Chain([
    training_step,
    model_step,
    endpoint_config_step,
    endpoint_step
])

In [12]:
import uuid

def generate_random_string():
    return uuid.uuid4().hex

grs = generate_random_string
grs

<function __main__.generate_random_string>

In [18]:
from stepfunctions.workflow import Workflow

workflow = Workflow(
    name='{}-{}'.format('Workflow', grs()),
    definition=workflow_definition,
    role=role_arn,
    execution_input=execution_input
)

workflow.create()

'arn:aws:states:us-east-1:390354360073:stateMachine:Workflow-70f09e896cd14cb6bdc9de19865a128e'

In [19]:
execution = workflow.execute(
    inputs={
        'JobName': 'll-{}'.format(grs()),
        'ModelName': 'll-{}'.format(grs()),
        'EndpointName': 'll-{}'.format(grs())
    }
)

execution.list_events()

[{'executionStartedEventDetails': {'input': '{\n    "JobName": "ll-7ddb8f23e086492597c55cc212d2b114",\n    "ModelName": "ll-3f3a9209f5d04c64b47650998978b580",\n    "EndpointName": "ll-4ff5949f09c9458284d8acb606056397"\n}',
   'inputDetails': {'truncated': False},
   'roleArn': 'arn:aws:iam::390354360073:role/sagemakerRole'},
  'id': 1,
  'previousEventId': 0,
  'timestamp': datetime.datetime(2022, 6, 11, 4, 37, 54, 815000, tzinfo=tzlocal()),
  'type': 'ExecutionStarted'},
 {'id': 2,
  'previousEventId': 0,
  'stateEnteredEventDetails': {'input': '{\n    "JobName": "ll-7ddb8f23e086492597c55cc212d2b114",\n    "ModelName": "ll-3f3a9209f5d04c64b47650998978b580",\n    "EndpointName": "ll-4ff5949f09c9458284d8acb606056397"\n}',
   'inputDetails': {'truncated': False},
   'name': 'Training Step'},
  'timestamp': datetime.datetime(2022, 6, 11, 4, 37, 54, 856000, tzinfo=tzlocal()),
  'type': 'TaskStateEntered'},
 {'id': 3,
  'previousEventId': 2,
  'taskScheduledEventDetails': {'parameters': '{"

In [20]:
events = execution.list_events()
pd.json_normalize(events)

Unnamed: 0,timestamp,type,id,previousEventId,executionStartedEventDetails.input,executionStartedEventDetails.inputDetails.truncated,executionStartedEventDetails.roleArn,stateEnteredEventDetails.name,stateEnteredEventDetails.input,stateEnteredEventDetails.inputDetails.truncated,...,taskScheduledEventDetails.region,taskScheduledEventDetails.parameters,taskStartedEventDetails.resourceType,taskStartedEventDetails.resource,taskSubmitFailedEventDetails.resourceType,taskSubmitFailedEventDetails.resource,taskSubmitFailedEventDetails.error,taskSubmitFailedEventDetails.cause,executionFailedEventDetails.error,executionFailedEventDetails.cause
0,2022-06-11 04:37:54.815000+00:00,ExecutionStarted,1,0,"{\n ""JobName"": ""ll-7ddb8f23e086492597c55cc2...",False,arn:aws:iam::390354360073:role/sagemakerRole,,,,...,,,,,,,,,,
1,2022-06-11 04:37:54.856000+00:00,TaskStateEntered,2,0,,,,Training Step,"{\n ""JobName"": ""ll-7ddb8f23e086492597c55cc2...",False,...,,,,,,,,,,
2,2022-06-11 04:37:54.856000+00:00,TaskScheduled,3,2,,,,,,,...,us-east-1,"{""AlgorithmSpecification"":{""TrainingImage"":""38...",,,,,,,,
3,2022-06-11 04:37:55.028000+00:00,TaskStarted,4,3,,,,,,,...,,,sagemaker,createTrainingJob.sync,,,,,,
4,2022-06-11 04:37:55.122000+00:00,TaskSubmitFailed,5,4,,,,,,,...,,,,,sagemaker,createTrainingJob.sync,SageMaker.AmazonSageMakerException,2 validation errors detected: Value 'sagemaker...,,
5,2022-06-11 04:37:55.122000+00:00,ExecutionFailed,6,5,,,,,,,...,,,,,,,,,SageMaker.AmazonSageMakerException,2 validation errors detected: Value 'sagemaker...


In [21]:
workflow.__dict__

{'client': <botocore.client.SFN at 0x7f30ec16ae90>,
 'comment': None,
 'definition': Graph(timeout_seconds=None, comment=None, version=None),
 'format_json': True,
 'name': 'Workflow-70f09e896cd14cb6bdc9de19865a128e',
 'role': 'arn:aws:iam::390354360073:role/sagemakerRole',
 'state_machine_arn': 'arn:aws:states:us-east-1:390354360073:stateMachine:Workflow-70f09e896cd14cb6bdc9de19865a128e',
 'tags': [],
 'timeout_seconds': None,
 'version': None,
 'workflow_input': <stepfunctions.inputs.placeholders.ExecutionInput at 0x7f30ede5df50>}

In [22]:
print(workflow.definition.to_json(pretty=True))

{
    "StartAt": "Training Step",
    "States": {
        "Training Step": {
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {
                "AlgorithmSpecification": {
                    "TrainingImage": "382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:1",
                    "TrainingInputMode": "File"
                },
                "OutputDataConfig": {
                    "S3OutputPath": "s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/output/"
                },
                "StoppingCondition": {
                    "MaxRuntimeInSeconds": 86400
                },
                "ResourceConfig": {
                    "InstanceCount": 1,
                    "InstanceType": "ml.m5.xlarge",
                    "VolumeSizeInGB": 30
                },
                "RoleArn": "sagemakerRole",
                "InputDataConfig": [
                    {
                        "DataSource": {
    

## Managing ML workflows with SageMaker Pipelines

In [23]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

processing_instance_type = ParameterString(
    name="ProcessingInstanceType", 
    default_value="ml.m5.xlarge"
)

training_instance_type = ParameterString(
    name="TrainingInstanceType", 
    default_value="ml.m5.xlarge"
)

input_data_uri = f"s3://{bucket}/{prefix}/input/management_experience_and_salary.csv"

!aws s3 cp management_experience_and_salary.csv {input_data_uri}

input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)

In [41]:
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
)

In [42]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

step_process = ProcessingStep(
    name="ProcessingStep",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(
            source=input_data, 
            destination="/opt/ml/processing/input"
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="output", 
            source="/opt/ml/processing/output"
        ),
    ],
    code="synthetic_text/preprocessing.py",
)

In [43]:
model_path = f"s3://{bucket}/{prefix}/model"

container = sagemaker.image_uris.retrieve("linear-learner", region, "1")

estimator = sagemaker.estimator.Estimator(
    container,
    role, 
    instance_count=1, 
    instance_type='ml.m5.xlarge',
    output_path=model_path,
    sagemaker_session=sess
)

estimator.set_hyperparameters(
    predictor_type='regressor', 
    mini_batch_size=4
)

In [44]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

s3_input_data = step_process.properties.ProcessingOutputConfig.Outputs["output"].S3Output.S3Uri

step_train = TrainingStep(
    name="TrainStep",
    estimator=estimator,
    inputs={
        "train": TrainingInput(
            s3_data=s3_input_data,
            content_type="text/csv",
        )
    },
)

In [45]:
from sagemaker.workflow.pipeline import Pipeline

pipeline_name = f"Pipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        training_instance_type,
        input_data,
    ],
    steps=[step_process, step_train],
)

In [46]:
pipeline.upsert(role_arn=role_arn)

{'PipelineArn': 'arn:aws:sagemaker:us-east-1:390354360073:pipeline/pipeline',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '76',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Sat, 11 Jun 2022 05:46:43 GMT',
   'x-amzn-requestid': 'e2e927e5-0dfe-4a5d-a3d1-3591fdb1a958'},
  'HTTPStatusCode': 200,
  'RequestId': 'e2e927e5-0dfe-4a5d-a3d1-3591fdb1a958',
  'RetryAttempts': 0}}

In [47]:
execution = pipeline.start()

In [48]:
execution.describe()

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2022, 6, 11, 5, 46, 51, 466000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2022, 6, 11, 5, 46, 51, 466000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:390354360073:pipeline/pipeline',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:390354360073:pipeline/pipeline/execution/tn1sme3j6jdv',
 'PipelineExecutionDisplayName': 'execution-1654926411537',
 'PipelineExecutionStatus': 'Executing',
 'PipelineExperimentConfig': {'ExperimentName': 'pipeline',
  'TrialName': 'tn1sme3j6jdv'},
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '465',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Sat, 11 Jun 2022 05:46:53 GMT',
   'x-amzn-requestid': '663a8938-80ca-4b2e-bcd6-1a5c8bc9f631'},
  'HTTPStatusCode': 200,
  'RequestId': '663a8938-80ca-4b2e-bcd6-1a5c8bc9f631',
  'RetryAttempts': 0}}

In [49]:
execution.wait()

In [50]:
execution.list_steps()

[{'AttemptCount': 0,
  'EndTime': datetime.datetime(2022, 6, 11, 5, 53, 52, 430000, tzinfo=tzlocal()),
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:390354360073:training-job/pipelines-tn1sme3j6jdv-trainstep-ntcvurrm57'}},
  'StartTime': datetime.datetime(2022, 6, 11, 5, 51, 7, 972000, tzinfo=tzlocal()),
  'StepName': 'TrainStep',
  'StepStatus': 'Succeeded'},
 {'AttemptCount': 0,
  'EndTime': datetime.datetime(2022, 6, 11, 5, 51, 7, 261000, tzinfo=tzlocal()),
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:390354360073:processing-job/pipelines-tn1sme3j6jdv-processingstep-cr5uebwizj'}},
  'StartTime': datetime.datetime(2022, 6, 11, 5, 46, 52, 872000, tzinfo=tzlocal()),
  'StepName': 'ProcessingStep',
  'StepStatus': 'Succeeded'}]

![](https://drive.google.com/uc?id=1elEIgg3uKX7kopkXI4mplNcClONFmKIf&authuser=recohut.data.001%40gmail.com&usp=drive_fs)

In [52]:
from sagemaker.lineage.visualizer import LineageTableVisualizer

session = sagemaker.session.Session()
viz = LineageTableVisualizer(session)
ess = reversed(execution.list_steps())

for execution_step in ess:
    print(execution_step)
    display(viz.show(
        pipeline_execution_step=execution_step
    ))
    time.sleep(5)

{'StepName': 'ProcessingStep', 'StartTime': datetime.datetime(2022, 6, 11, 5, 46, 52, 872000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 6, 11, 5, 51, 7, 261000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:390354360073:processing-job/pipelines-tn1sme3j6jdv-processingstep-cr5uebwizj'}}}


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,s3://...7d401254ebc5/input/code/preprocessing.py,Input,DataSet,ContributedTo,artifact
1,s3://...put/management_experience_and_salary.csv,Input,DataSet,ContributedTo,artifact
2,68331...om/sagemaker-scikit-learn:0.23-1-cpu-py3,Input,Image,ContributedTo,artifact
3,s3://...497b00271bc6817d401254ebc5/output/output,Output,DataSet,Produced,artifact


{'StepName': 'TrainStep', 'StartTime': datetime.datetime(2022, 6, 11, 5, 51, 7, 972000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2022, 6, 11, 5, 53, 52, 430000, tzinfo=tzlocal()), 'StepStatus': 'Succeeded', 'AttemptCount': 0, 'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:390354360073:training-job/pipelines-tn1sme3j6jdv-trainstep-ntcvurrm57'}}}


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,s3://...497b00271bc6817d401254ebc5/output/output,Input,DataSet,ContributedTo,artifact
1,38241...us-east-1.amazonaws.com/linear-learner:1,Input,Image,ContributedTo,artifact
2,s3://...TrainStep-NTCVUrRm57/output/model.tar.gz,Output,Model,Produced,artifact


In [53]:
from pprint import pprint

pprint(pipeline.describe())

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2022, 6, 11, 4, 53, 56, 280000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2022, 6, 11, 5, 53, 52, 991000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:390354360073:pipeline/pipeline',
 'PipelineDefinition': '{"Version": "2020-12-01", "Metadata": {}, '
                       '"Parameters": [{"Name": "ProcessingInstanceType", '
                       '"Type": "String", "DefaultValue": "ml.m5.xlarge"}, '
                       '{"Name": "TrainingInstanceType", "Type": "String", '
                       '"DefaultValue": "ml.m5.xlarge"}, {"Name": "InputData", '
                       '"Type": "String", "DefaultValue": '
                       '"s3://sagemaker-us-east-1-390354360073/sagemaker-exp11060345/input/management_experience_and_salary.csv"}], '
                       '"PipelineExperimentConfig": {"ExperimentName": {"Get": '
                       '"Execution.PipelineName"}, 

That's all.

Thanks for your attention.