# UDACITY Designing Your First Workflow - Tying it All Together

AWS comprises many services, and one of the main skills you'll develop as an ML Engineer working in AWS is chaining these services together to accomplish specific data engineering goals. With Lambda, you've learned how to launch serverless jobs, and with Step Functions, you've learned how to create a workflow that chains jobs together. Now, you'll learn how to launch a Step Function from a Lambda function.

Before you start, note that this is not the only way to launch a Step Function. Many services integrate with Step Functions, so there are several ways to start an execution. These services include API Gateway, EventBridge, and even other Step Functions, among others.

Your task is to create a new Lambda function that launches the state machine you created in the **last exercise**. You'll then invoke this Lambda function from the command line. To find the definition of the step function you made, open it in the Step Functions console and look under the 'Definition' tab.
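If you'd rather not copy the definition from the console, the sketch below fetches it with the boto3 Step Functions `describe_state_machine` call. The helper name `fetch_definition` is hypothetical, and it assumes you pass in a client created with `boto3.client("stepfunctions")`.

```python
import json


def fetch_definition(sfn_client, state_machine_arn):
    """Return a state machine's definition as a parsed dict.

    `sfn_client` is assumed to be a boto3 Step Functions client.
    """
    response = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    # The API returns the Amazon States Language definition as a JSON string
    return json.loads(response["definition"])
```

For example, `fetch_definition(boto3.client("stepfunctions"), "arn:aws:states:us-east-1:002427974286:stateMachine:workflow-stepfunction-processing")` would return the definition as a dict that you can edit and serialize again with `json.dumps`.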

First, create a new IAM role for Lambda and attach the AWSStepFunctionsFullAccess managed policy to it. Then create a new Lambda function from the default template and attach this new role to it. Use the starter code below to modify the Lambda handler to accomplish your task.
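The role setup can also be scripted. A minimal sketch, assuming you pass in a boto3 IAM client (`boto3.client("iam")`); the helper `create_lambda_role` and any role name you give it are hypothetical. Besides AWSStepFunctionsFullAccess, it attaches AWSLambdaBasicExecutionRole so that the handler's `print` output can be written to CloudWatch Logs.

```python
import json

# Trust policy that lets the Lambda service assume the role
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies: Step Functions access, plus CloudWatch Logs access
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AWSStepFunctionsFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def create_lambda_role(iam_client, role_name):
    """Create an IAM role Lambda can assume, attach the policies, and return its ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
        Description="Lets a Lambda function update and start Step Functions state machines",
    )
    for policy_arn in POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```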

SageMaker processing and training job names must be unique, as must the names of executions of a state machine, so you cannot rerun the same definition unchanged. You can find the existing definition of your state machine in the AWS Console under 'Step Functions'. In the code below, replace 'definition' with the step function definition from your last exercise; the only differences should be new processing job and training job names. The Lambda handler generates a fresh execution name on every run.
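The code below appends `int(time.time())` to the job names. A timestamp alone can collide if two runs start within the same second, so a slightly more defensive sketch adds a random suffix. It assumes SageMaker's 63-character limit on processing and training job names; the helper `unique_name` is hypothetical.

```python
import time
import uuid

# SageMaker processing/training job names allow at most 63 characters
MAX_SAGEMAKER_NAME_LENGTH = 63


def unique_name(prefix, max_length=MAX_SAGEMAKER_NAME_LENGTH):
    """Build a name like 'TrainingJob-<unix time>-<8 hex chars>' that differs per call."""
    suffix = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    # Truncate the prefix, not the suffix, so the unique part always survives
    keep = max_length - len(suffix) - 1
    return f"{prefix[:keep]}-{suffix}"
```

For example, `unique_name("PreprocessingJob")` and `unique_name("TrainingJob")` could replace the two `%s` substitutions.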

## Exercise: Create the Lambda Function

In [38]:
import json
import boto3
import time

client = boto3.client('stepfunctions')

# TODO: copy the definition from the last exercise and paste it below.
# Then replace the processing job and training job names with %s placeholders
# so that each run gets unique names.

run_time = int(time.time())


definition = """{
  "StartAt": "SageMaker pre-processing step 4",
  "States": {
    "SageMaker pre-processing step 4": {
      "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
      "Parameters": {
        "ProcessingJobName": "PreprocessingJob-%s",
        "ProcessingInputs": [
          {
            "InputName": "input-1",
            "AppManaged": false,
            "S3Input": {
              "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/input/Toys_and_Games_5.json.zip",
              "LocalPath": "/opt/ml/processing/input",
              "S3DataType": "S3Prefix",
              "S3InputMode": "File",
              "S3DataDistributionType": "FullyReplicated",
              "S3CompressionType": "None"
            }
          },
          {
            "InputName": "code",
            "AppManaged": false,
            "S3Input": {
              "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/script/HelloBlazePreprocess.py",
              "LocalPath": "/opt/ml/processing/input/script",
              "S3DataType": "S3Prefix",
              "S3InputMode": "File",
              "S3DataDistributionType": "FullyReplicated",
              "S3CompressionType": "None"
            }
          }
        ],
        "ProcessingOutputConfig": {
          "Outputs": [
            {
              "OutputName": "train_data",
              "AppManaged": false,
              "S3Output": {
                "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/output/Toys_and_Games_5.json.zip_train",
                "LocalPath": "/opt/ml/processing/output/train",
                "S3UploadMode": "EndOfJob"
              }
            },
            {
              "OutputName": "test_data",
              "AppManaged": false,
              "S3Output": {
                "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/output/Toys_and_Games_5.json.zip_test",
                "LocalPath": "/opt/ml/processing/output/test",
                "S3UploadMode": "EndOfJob"
              }
            }
          ]
        },
        "AppSpecification": {
          "ImageUri": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3",
          "ContainerEntrypoint": [
            "python3",
            "/opt/ml/processing/input/script/HelloBlazePreprocess.py"
          ]
        },
        "RoleArn": "arn:aws:iam::002427974286:role/UdacitySageMakerStepFunctionExecutionRole",
        "ProcessingResources": {
          "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "VolumeSizeInGB": 30
          }
        }
      },
      "Type": "Task",
      "Next": "SageMaker Training Step"
    },
    "SageMaker Training Step": {
      "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
      "Parameters": {
        "AlgorithmSpecification": {
          "TrainingImage": "811284229777.dkr.ecr.us-east-1.amazonaws.com/blazingtext:1",
          "TrainingInputMode": "File"
        },
        "OutputDataConfig": {
          "S3OutputPath": "s3://udacity-landingzone/lesson3-stepfunction/workflow/"
        },
        "StoppingCondition": {
          "MaxRuntimeInSeconds": 360000
        },
        "ResourceConfig": {
          "VolumeSizeInGB": 30,
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large"
        },
        "RoleArn": "arn:aws:iam::002427974286:role/UdacitySageMakerStepFunctionExecutionRole",
        "InputDataConfig": [
          {
            "DataSource": {
              "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/output/Toys_and_Games_5.json.zip_train",
                "S3DataDistributionType": "FullyReplicated"
              }
            },
            "ContentType": "text/plain",
            "ChannelName": "train"
          },
          {
            "DataSource": {
              "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://udacity-landingzone/lesson3-stepfunction/output/Toys_and_Games_5.json.zip_test",
                "S3DataDistributionType": "FullyReplicated"
              }
            },
            "ContentType": "text/plain",
            "ChannelName": "validation"
          }
        ],
        "HyperParameters": {
          "mode": "supervised"
        },
        "TrainingJobName": "TrainingJob-%s",
        "DebugHookConfig": {
          "S3OutputPath": "s3://udacity-landingzone/lesson3-stepfunction/workflow/"
        }
      },
      "Type": "Task",
      "End": true
    }
  }
}
""" % (run_time, run_time)


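# Write the payload for the AWS CLI invocation below. The definition is embedded
# as a JSON object; the Lambda handler converts it back to a string.
with open("definition_payload.txt", "w") as file:
    json.dump({"definition": json.loads(definition)}, file, indent=2)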
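The `%s` substitution above works, but it relies on the pasted definition containing exactly two `%s` placeholders and no other `%` characters. An alternative sketch, assuming you paste the definition unchanged with its original job names, parses the JSON and overwrites the names directly. `with_job_names` is a hypothetical helper, and it only looks at top-level states like the two in this definition.

```python
import json


def with_job_names(definition_str, processing_job_name, training_job_name):
    """Return a copy of the definition JSON with new SageMaker job names.

    Assumes the states use the same Parameters keys as the definition above:
    ProcessingJobName for createProcessingJob, TrainingJobName for createTrainingJob.
    """
    definition = json.loads(definition_str)  # parsing creates an independent copy
    for state in definition["States"].values():
        params = state.get("Parameters", {})
        if "ProcessingJobName" in params:
            params["ProcessingJobName"] = processing_job_name
        if "TrainingJobName" in params:
            params["TrainingJobName"] = training_job_name
    return json.dumps(definition, indent=2)
```

Usage would look like `definition = with_job_names(pasted_definition, f"PreprocessingJob-{run_time}", f"TrainingJob-{run_time}")`, where `pasted_definition` is a hypothetical variable holding the definition copied from the console.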

In [28]:
%%writefile lambda_function.py
import json
import time
import uuid

import boto3

# Connect to the Step Functions service
client = boto3.client("stepfunctions")

# ARN of the state machine created in the last exercise
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:002427974286:stateMachine:workflow-stepfunction-processing"


def lambda_handler(event, context):
    print("DEBUG event", event)

    # The definition arrives as a JSON string when invoked from the notebook, but
    # as a parsed object when sent from the CLI payload file; the API needs a string.
    definition = event["definition"]
    if not isinstance(definition, str):
        definition = json.dumps(definition)
    print("DEBUG definition", definition)

    # Execution names must be unique per state machine, so generate a fresh one
    execution_name = "LambdaExecution-" + str(uuid.uuid4())

    # Update the existing state machine with the new definition
    client.update_state_machine(stateMachineArn=STATE_MACHINE_ARN, definition=definition)

    # Give AWS time to register the definition
    time.sleep(5)

    # Start the execution
    response = client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN, name=execution_name, input="{}"
    )

    return {
        "statusCode": 200,
        "body": "The step function has successfully launched! Execution ARN: " + response["executionArn"],
    }


Overwriting lambda_function.py


In [29]:
!sudo apt install -y zip

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zip is already the newest version (3.0-12build2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


In [30]:
!zip lambda_function.zip lambda_function.py

updating: lambda_function.py (deflated 44%)
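If `apt` is unavailable (for example, without `sudo` rights), Python's standard `zipfile` module can build the same deployment package. A minimal sketch; `build_deployment_package` is a hypothetical helper name.

```python
import os
import zipfile


def build_deployment_package(source_path="lambda_function.py", zip_path="lambda_function.zip"):
    """Zip the handler file at the archive root, where Lambda expects it."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops any directories, so the handler setting
        # 'lambda_function.lambda_handler' still resolves
        archive.write(source_path, arcname=os.path.basename(source_path))
    return zip_path
```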


## Exercise: Launch the Lambda Function


Launch the Lambda function and confirm that the step function execution starts successfully.

In [31]:
import boto3
import json

lambda_client = boto3.client("lambda")

role_arn = "arn:aws:iam::002427974286:role/UdacityLambdaS3FullAccess"  # Use the ARN of the role with Step Functions access
function_name = "SepFunctionExecutor"

# Check if the Lambda function already exists
def lambda_exists(function_name):
    try:
        lambda_client.get_function(FunctionName=function_name)
        return True
    except lambda_client.exceptions.ResourceNotFoundException:
        return False

# Invoke the Lambda function
def invoke_lambda(function_name, payload):
    try:
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType='Event',  # Asynchronous; use 'RequestResponse' to wait for the result
            Payload=json.dumps(payload)
        )
        # The payload holds the handler's return value only for 'RequestResponse'
        # invocations; for 'Event' invocations it is empty
        response_payload = response['Payload'].read()
        print(f"Lambda function {function_name} invoked successfully.")
        print(f"Response: {response_payload}")
    except Exception as e:
        print(f"Error invoking Lambda function: {e}")

# Create Lambda function
def create_lambda():
    try:
        with open('lambda_function.zip', 'rb') as f:
            zip_bytes = f.read()
        response = lambda_client.create_function(
            FunctionName=function_name,
            Runtime='python3.12',  # python3.8 is deprecated in Lambda; pick a supported runtime
            Role=role_arn,  # The role that Lambda assumes when it executes the function
            Handler='lambda_function.lambda_handler',  # <file name>.<handler function name>
            Code={'ZipFile': zip_bytes},
            Description='Updates and starts the Step Functions state machine',
            Timeout=15,  # Maximum run time in seconds; must cover the handler's 5-second sleep
            MemorySize=128,  # Memory size (in MB)
            Publish=True,  # Publish a numbered version right away
        )

        print(f"Lambda function {function_name} created successfully")
        print(response)
    except Exception as e:
        print(f"Error creating Lambda function: {e}")

# Main flow
if lambda_exists(function_name):
    print(f"Lambda function {function_name} already exists. Invoking the function...")
    # Example payload to send to the Lambda function
    payload = {
        "definition": definition
    }
    invoke_lambda(function_name, payload)
else:
    print(f"Lambda function {function_name} does not exist. Creating the function...")
    create_lambda()  # Rerun this cell after the function is created to invoke it


Lambda function SepFunctionExecutor already exists. Invoking the function...
Lambda function SepFunctionExecutor invoked successfully.
Response: b''
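The empty `Response: b''` above is expected: with `InvocationType='Event'`, Lambda queues the request and returns no payload. To see the handler's return value or its error, a sketch like the following invokes the function synchronously. `invoke_and_read` is a hypothetical helper, and it assumes a client created with `boto3.client("lambda")`.

```python
import json


def invoke_and_read(lambda_client, function_name, payload):
    """Invoke synchronously and return the handler's parsed return value."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler to finish
        Payload=json.dumps(payload),
    )
    body = json.loads(response["Payload"].read())
    # An unhandled exception still yields StatusCode 200, but sets FunctionError
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda raised an error: {body}")
    return body
```

For example, `invoke_and_read(lambda_client, function_name, {"definition": definition})["body"]` would print the handler's success message.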


In [32]:
print(definition[:300])

{
  "StartAt": "SageMaker pre-processing step 4",
  "States": {
    "SageMaker pre-processing step 4": {
      "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
      "Parameters": {
        "ProcessingJobName": "PreprocessingJob-1728147394",
        "ProcessingInputs": [
          


In [40]:
!aws lambda invoke --function-name SepFunctionExecutor --payload file://definition_payload.txt response.json --cli-binary-format raw-in-base64-out

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
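The CLI output above shows that the Lambda function ran (the handler's return value is written to `response.json`). To confirm from the Step Functions side that an execution actually started, a minimal sketch reads the newest execution with `list_executions`, which returns the most recent executions first. An execution named `LambdaExecution-<uuid>` with status `RUNNING` (and later `SUCCEEDED`) indicates the handler worked. `latest_execution` is a hypothetical helper that assumes a client created with `boto3.client("stepfunctions")`.

```python
def latest_execution(sfn_client, state_machine_arn):
    """Return (name, status) of the newest execution, or None if there are none."""
    # ListExecutions returns the most recent executions first
    response = sfn_client.list_executions(stateMachineArn=state_machine_arn, maxResults=1)
    executions = response["executions"]
    if not executions:
        return None
    newest = executions[0]
    return newest["name"], newest["status"]
```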


## Conceptual Exercise: What are the next steps?

Right now, the Step Function that we made in the prior exercise hard-codes the location of the input dataset, as well as the locations of all the intermediate outputs. How could you modify the Step Function to make it more generalizable? If you could pass in an S3 location, how could you integrate it with Lambda so that the workflow could be called asynchronously?
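One possible direction, sketched under assumptions: replace the hard-coded input `S3Uri` with a JsonPath reference so the location comes from the execution input (in Amazon States Language, a `Parameters` key ending in `.$` is resolved against the state's input at run time), then have a Lambda function triggered by S3 uploads pass the uploaded object's URI as that input. S3 event notifications invoke Lambda asynchronously. The helpers `parameterize_input` and `execution_input_from_s3_event` and the field `input_s3_uri` are hypothetical, and the output and training-data locations would need the same treatment.

```python
import copy
import json
from urllib.parse import unquote_plus


def parameterize_input(definition, state_name, input_name="input-1", path="$.input_s3_uri"):
    """Return a copy of the definition whose processing input S3Uri comes from the execution input."""
    updated = copy.deepcopy(definition)
    for processing_input in updated["States"][state_name]["Parameters"]["ProcessingInputs"]:
        if processing_input["InputName"] == input_name:
            s3_input = processing_input["S3Input"]
            # Swap the literal URI for a JsonPath reference resolved at run time
            s3_input.pop("S3Uri", None)
            s3_input["S3Uri.$"] = path
    return updated


def execution_input_from_s3_event(event):
    """Build execution input JSON from an S3 'object created' notification."""
    record = event["Records"][0]["s3"]
    # Object keys in S3 event notifications are URL-encoded
    key = unquote_plus(record["object"]["key"])
    return json.dumps({"input_s3_uri": f"s3://{record['bucket']['name']}/{key}"})
```

Inside an S3-triggered handler, the result could be passed as `client.start_execution(stateMachineArn=..., name=execution_name, input=execution_input_from_s3_event(event))`.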