* changed by nov05 on 2024-11-24
* [Udacity solution video](https://youtu.be/rKUxX033KaU)  

# UDACITY Designing Your First Workflow - Tying it All Together

AWS comprises many services, and one of the main skills you'll develop as an ML Engineer working in AWS is in chaining these services together to accomplish specific data engineering goals. With Lambda, you've learned how to launch serverless jobs, and with Step Functions, you've learned how to create a workflow that chains jobs together. Now, you'll learn how to launch a Step Function using a Lambda job. 

Before starting this, it's important to highlight that this is not the only way to accomplish something like this. Multiple services integrate with Step Functions, and so it follows that there are multiple ways to launch Step Functions. These services, among others, include `API Gateway`, `EventBridge`, and even other `Step Functions`. 

Your task is to `create a new lambda function that will launch the state machine` you created in the **last exercise**. You'll then launch this lambda function from the command line. To find the definition of the step function you've made, click into the step function and look for the definition under the 'Definition' tab. 

First, create a new Lambda role and attach the **AWSStepFunctionsFullAccess** managed policy to it. Then create a new Lambda function from the default template and attach this new role to it. Use the starter code below to help you modify the Lambda handler to accomplish your task. 
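If you prefer to script the role setup instead of clicking through the console, the following is a minimal sketch using the boto3 IAM client. The role name in the usage comment is hypothetical, and the policy ARN assumes the AWS-managed policy is published as `AWSStepFunctionsFullAccess`; verify both in your account before relying on it.

```python
import json

# Assumption: ARN of the AWS-managed Step Functions full-access policy.
STEP_FUNCTIONS_POLICY_ARN = "arn:aws:iam::aws:policy/AWSStepFunctionsFullAccess"


def lambda_trust_policy():
    """Build the trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(iam_client, role_name):
    """Create the role, attach the Step Functions policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=STEP_FUNCTIONS_POLICY_ARN,
    )
    return role["Role"]["Arn"]


# Usage against a real account (hypothetical role name):
#   import boto3
#   print(create_lambda_role(boto3.client("iam"), "udacity-lambda-step-role"))
```

The IAM client is passed in as an argument so the helper can be exercised with a stub before it touches a real account.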

A Step Function with hard-coded job names cannot execute more than once, because SageMaker requires every processing job and training job name to be unique. You must therefore update the definition with new names before each run. You can find the existing definition of a Step Function in the AWS Console under 'Step Functions'. In the Lambda function code below, update the 'definition' with the Step Function definition from your last exercise; the only differences should be the step function name, the processing-job name, and the training-job name. 

## Exercise: Create the Lambda Function

In [None]:
## copy the code in this cell to the lambda function
import boto3
import json
import time
from datetime import datetime
import random


client = boto3.client('stepfunctions')
state_machine_arn = "arn:aws:states:us-east-1:807711953667:stateMachine:udacity_step_handson"


def unique_name(prefix):
    ## append a timestamp and 3 random digits to make the name unique
    return f"{prefix}-{datetime.now().strftime('%Y%m%d%H%M%S')}-{random.randint(100, 999)}"


def lambda_handler(event, context):
    ## note: this handler sleeps for 5 seconds, so raise the Lambda timeout
    ## above the 3-second default before testing it
    ## fetch the current definition instead of pasting it in by hand
    response = client.describe_state_machine(
        stateMachineArn=state_machine_arn
    )
    definition = json.loads(response['definition'])
    ## rename the jobs inside the handler: module-level code runs only on a
    ## cold start, so names created there would be reused by warm invocations
    definition['States']['SageMaker Preprocessing Step']['Parameters']['ProcessingJobName'] \
        = unique_name('udacity-step-preprocess')
    definition['States']['SageMaker Training Step']['Parameters']['TrainingJobName'] \
        = unique_name('udacity-step-train')
    client.update_state_machine(
        definition=json.dumps(definition),
        stateMachineArn=state_machine_arn)
    ## give AWS time to register the definition
    time.sleep(5)
    client.start_execution(
        input='{}',  ## the minimal valid input is an empty JSON object
        # name=unique_name('udacity-lambda-step'),  ## optional
        stateMachineArn=state_machine_arn)
    return {
        'statusCode': 200,
        'body': 'The step function launched successfully!'
    }

In [9]:
## test code; do not copy to the lambda function
import boto3
import json
from datetime import datetime
import random

client = boto3.client('stepfunctions')
state_machine_arn = "arn:aws:states:us-east-1:807711953667:stateMachine:udacity_step_handson"
response = client.describe_state_machine(
    stateMachineArn=state_machine_arn
)
print(type(response['definition']))  ## str
definition = json.loads(response['definition'])
print(definition['States']['SageMaker Preprocessing Step']['Parameters']['ProcessingJobName'])
definition['States']['SageMaker Preprocessing Step']['Parameters']['ProcessingJobName'] \
    = f"udacity-step-preprocess-{datetime.now().strftime('%Y%m%d%H%M%S')}-{random.randint(100, 999)}"
definition['States']['SageMaker Training Step']['Parameters']['TrainingJobName'] \
    = f"udacity-step-train-{datetime.now().strftime('%Y%m%d%H%M%S')}-{random.randint(100, 999)}"
print(json.dumps(definition))

<class 'str'>
udacity-step-preprocess-20241124035109-152
{"StartAt": "SageMaker Preprocessing Step", "States": {"SageMaker Preprocessing Step": {"Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync", "Parameters": {"ProcessingJobName": "udacity-step-preprocess-20241124085631-791", "ProcessingInputs": [{"InputName": "input_data", "AppManaged": false, "S3Input": {"S3Uri": "s3://sagemaker-studio-807711953667-mmx0am1bt28/step_upload/reviews_Musical_Instruments_5.json.zip", "LocalPath": "/opt/ml/processing/input", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "input_code", "AppManaged": false, "S3Input": {"S3Uri": "s3://sagemaker-studio-807711953667-mmx0am1bt28/step_upload/HelloBlazePreprocess.py", "LocalPath": "/opt/ml/processing/input/code", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}], "ProcessingOutputCon

## Exercise: Launch the Lambda Function


Launch the Lambda function and confirm that the Step Function execution starts successfully.
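One way to do this from the command line is a small Python script. The sketch below is an illustration rather than the course's reference solution: it invokes the function synchronously through the boto3 Lambda client, then asks Step Functions for the status of the most recent execution. The function name in the usage comment is hypothetical; substitute the name you gave your Lambda function.

```python
import json


def invoke_lambda(lambda_client, function_name, payload=None):
    """Invoke the function synchronously and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload or {}).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


def latest_execution_status(sfn_client, state_machine_arn):
    """Return the status of the most recent execution, or None if there is none."""
    response = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        maxResults=1,
    )
    executions = response["executions"]
    return executions[0]["status"] if executions else None


# Usage against a real account (hypothetical function name):
#   import boto3
#   result = invoke_lambda(boto3.client("lambda"), "udacity-launch-step-function")
#   print(result["body"])
#   arn = "arn:aws:states:us-east-1:807711953667:stateMachine:udacity_step_handson"
#   print(latest_execution_status(boto3.client("stepfunctions"), arn))
```

A status of `RUNNING` shortly after the invocation suggests the launch worked; the SageMaker jobs themselves take longer to finish, so `SUCCEEDED` will only appear later.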

## Conceptual Exercise: What are next steps? 

Right now, the Step Function that we made in the prior exercise has a hard-coded location for the input dataset, as well as hard-coded locations for all of the intermediate steps. In what ways could you modify the Step Function to make it more generalizable? If you could input an S3 location, how could you integrate it with Lambda so that it could be called asynchronously? 
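As one possible direction (a sketch under stated assumptions, not the expected answer), the hard-coded `S3Uri` could be replaced with a JSONPath reference so that the dataset location comes from the execution input, and a Lambda function triggered by an S3 upload could supply that input. The input key `input_s3_uri` and the helper names are hypothetical. The sketch also assumes the dataset is the first entry in `ProcessingInputs`, and that Step Functions resolves keys ending in `.$` at this nesting depth.

```python
import copy
import json
from urllib.parse import unquote_plus


def parameterize_input_uri(definition, state_name="SageMaker Preprocessing Step"):
    """Return a copy of the definition whose first processing input takes its
    S3 URI from the execution input instead of a hard-coded value."""
    updated = copy.deepcopy(definition)
    s3_input = updated["States"][state_name]["Parameters"]["ProcessingInputs"][0]["S3Input"]
    s3_input.pop("S3Uri", None)
    # A key ending in ".$" marks the value as a JSONPath into the state input.
    s3_input["S3Uri.$"] = "$.input_s3_uri"
    return updated


def execution_input_from_s3_event(event):
    """Build the execution input JSON from the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])
    return json.dumps({"input_s3_uri": f"s3://{bucket}/{key}"})


def handle_s3_upload(event, sfn_client, state_machine_arn):
    """Start an execution for the uploaded object and return the execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=execution_input_from_s3_event(event),
    )
    return response["executionArn"]
```

Because `start_execution` returns as soon as the execution is accepted, the Lambda function does not wait for the SageMaker jobs to finish. The job names would still need to be made unique per run, for example by passing them through the execution input in the same way.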