AWS Batch memory auto-scaling with Step Functions

This is a sample implementation for running an AWS Batch job that automatically scales up the memory it requests, built with the Serverless Framework. A simple AWS Step Functions state machine orchestrates the job run, and an AWS Lambda function processes failed jobs: when a job fails because of a memory error, the function creates a job specification with more memory available and the job is executed again.
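The memory scaling itself happens in the failure-handling Lambda. Below is a minimal sketch, in Python, of what such a handler could look like; the event fields (job_id, job_memory, ...) and the doubling policy are illustrative assumptions, not the repository's exact contract:

```python
import boto3

batch = boto3.client("batch")

# Illustrative policy for this sketch: double the requested memory on every retry.
MEMORY_SCALE_FACTOR = 2

def handler(event, context):
    """Inspect a failed Batch job; if it ran out of memory, return an
    updated job specification with more memory so the state machine can
    submit the job again."""
    job = batch.describe_jobs(jobs=[event["job_id"]])["jobs"][0]
    reason = job.get("statusReason", "")

    # Batch reports container OOM kills with "OutOfMemoryError" in the
    # status reason (the container exits with code 137).
    if "OutOfMemoryError" not in reason:
        raise RuntimeError(f"Job failed for a non-memory reason: {reason}")

    retry_spec = dict(event)
    retry_spec["job_memory"] = str(int(event["job_memory"]) * MEMORY_SCALE_FACTOR)
    return retry_spec
```

The state machine can then feed the returned specification back into its Batch submit state.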

All components are serverless, which keeps costs down for scenarios like data science and ETL/ELT pipelines.

This example also shows how to run multiple jobs on different schedules. Scheduling relies on Amazon EventBridge rules, so keep in mind that there is currently a limit of 300 rules per event bus (which can be increased).

The state machine works as follows: the Batch job is submitted, and if it fails the Lambda function inspects the failure; when the cause was a memory error, the job is submitted again with more memory.

In addition to the Step Functions state machine and the Lambda function, the following CloudFormation resources are created as part of this implementation (under resources.yml):

  • Step Functions IAM Role
  • Batch Compute Environment
  • Batch Job Queue
  • Batch Job Definition

Getting started

  1. Install the Serverless Framework
  2. Install the required Serverless plugins:
> serverless plugin install -n serverless-step-functions
✔ Plugin "serverless-step-functions" installed  (17s)
  3. Choose a subnet and security group, then deploy a new stack:
> sls deploy --param="subnet_id=subnet-12345678" --param="security_group_id=sg-12345678"
Running "serverless" from node_modules

Deploying memory-autoscaling-example to stage dev (us-east-1)
✓ State machine "JobRunner" definition is valid

✔ Service deployed to stack memory-autoscaling-example-dev (162s)

functions:
  processFailedBatchJob: memory-autoscaling-example-dev-processFailedBatchJob (137 kB)
  4. Retrieve the ARN of the state machine and start an execution manually (this can also be done from the AWS Management Console):
> aws stepfunctions start-execution --state-machine-arn arn:aws:states:us-east-1:ACCOUNT_ID:stateMachine:JobRunnerStepFunctionsStateMachine-eB75gE2choxG --input "{
  \"job_name\": \"bonjour\",
  \"job_vcpus\": \"0.25\",
  \"job_memory\": \"512\",
  \"job_command\": [
    \"echo\",
    \"bonjour\"
  ]
}"
{
    "executionArn": "arn:aws:states:us-east-1:ACCOUNT_ID:execution:JobRunnerStepFunctionsStateMachine-eB75gE2choxG:6bb5676c-4fe0-4754-8eac-6cca606d1476",
    "startDate": 1702081624.749
}
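If you are starting executions from code (for example, from a data pipeline) rather than from the CLI, the equivalent call can be made with boto3. A sketch, assuming the state machine ARN printed by the deploy step:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Replace with the ARN of your deployed state machine.
STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:ACCOUNT_ID:stateMachine:"
    "JobRunnerStepFunctionsStateMachine-eB75gE2choxG"
)

response = sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    input=json.dumps({
        "job_name": "bonjour",
        "job_vcpus": "0.25",
        "job_memory": "512",
        "job_command": ["echo", "bonjour"],
    }),
)
print(response["executionArn"])
```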

Add scheduled jobs

To schedule a new job to run automatically, add a new entry to the end of the file jobs.yml with these required parameters:

  - schedule:
      rate: rate(1 hour)
      input:
        job_name: the-aws-batch-job-name
        job_vcpus: "0.25"
        job_memory: "512"
        job_command: ["echo", "howdy"]

After that, re-deploy the changes with serverless, using the same sls deploy command and parameters as in the getting-started steps.
