
Benchmarking with AWS Batch and CDK

Welcome

This repository demonstrates how to provision the infrastructure needed to run a benchmarking job on AWS Batch using the AWS Cloud Development Kit (CDK). The AWS Batch job reads image training data from an internal dataset package in an S3 bucket, trains AutoGluon's MultiModalPredictor, and makes predictions on the test set. The code can easily be modified to fit other benchmarking jobs you might want to perform.
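
As a rough sketch of what each batch job does internally (the column name, file paths, and hyperparameters below are illustrative placeholders, not this repository's actual configuration):

# Minimal sketch of the train/predict cycle the benchmarking job performs.
# "label" and the CSV paths are placeholders for the internal dataset package.
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_data = pd.read_csv("train.csv")   # image paths + labels
test_data = pd.read_csv("test.csv")

predictor = MultiModalPredictor(label="label")
predictor.fit(train_data, hyperparameters={"model.names": ["timm_image"]})

scores = predictor.evaluate(test_data)  # e.g. accuracy on the test set
predictions = predictor.predict(test_data)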

Pre-requisites

  1. Create and source a Python virtualenv (on macOS and Linux) and install the Python dependencies:
$ python3 -m venv .env
$ source .env/bin/activate
$ pip install -r requirements.txt
  2. Install the latest version of the AWS CDK CLI:
$ npm i -g aws-cdk

Usage

The current code creates the AWS Batch infrastructure, grants read access to the pre-existing S3 bucket that batch jobs pull data from, and creates a DynamoDB table to store the benchmarking results. Once the infrastructure is provisioned through AWS CDK, you can go to the created AWS Lambda function and submit a job. This triggers job executions on AWS Batch, and the results appear in the created DynamoDB table.

To deploy and run the batch inference, follow these steps:

  1. Make sure the AWS CDK is installed and working, that all of this project's dependencies defined in the requirements.txt file are installed, and that Docker is installed and configured in your environment;
  2. Set the CDK_DEPLOY_ACCOUNT environment variable to the AWS account you want to use (pre-configured with the AWS CLI);
  3. Set the CDK_DEPLOY_REGION environment variable to the region you want to deploy the infrastructure in (e.g. 'us-west-2');
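For reference, a CDK app typically consumes these two variables when defining the stack environment, along the lines of the sketch below (this repository's app.py may be organized differently):

# Sketch of the common CDK pattern for picking up the deployment account/region
# from the environment; the stack class name here is illustrative.
import os
import aws_cdk as cdk

app = cdk.App()
env = cdk.Environment(
    account=os.environ.get("CDK_DEPLOY_ACCOUNT"),
    region=os.environ.get("CDK_DEPLOY_REGION"),
)
# The benchmarking stack would then be instantiated with this environment, e.g.:
# BatchJobStack(app, "automm-cv-bench-batch-job-stack", env=env)
app.synth()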
  4. Make sure you are logged in to ECR in order to pull the DLC (Deep Learning Containers) images:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
  5. Have all the permissions set up in case you need to download dependencies from a private source inside the Dockerfile;
  6. Run cdk deploy in the root of this project and wait for the deployment to finish successfully;
  7. Increase the shared memory of the container manually (issue) in the ./cdk.out/automm-cv-bench-batch-job-stack.template.json file:
"LinuxParameters": { "SharedMemorySize": 20000 },
  8. Deploy again with CloudFormation:
aws cloudformation deploy \
--template-file PATH_TO/automm-cv-bench-batch-job-stack.template.json \
--stack-name automm-cv-bench-batch-job-stack \
--capabilities CAPABILITY_NAMED_IAM
  9. Go to the created AWS Lambda function and execute it with one of the following JSON payloads under the Test tab.

A sample HPO job:

{
    "datasets": [
        "bayer",
        "belgalogos"
    ],
    "lr_range": "0.00005,0.005",
    "models": "timm_image",
    "timm_chkpts": "swin_base_patch4_window7_224,convnext_base_in22ft1k",
    "num_trials": 2,
    "searcher": "bayes",
    "scheduler": "ASHA",
    "hpo": "true"
}

A sample non-HPO job:

{
    "datasets": [
        "bayer",
        "belgalogos"
    ],
    "max_epochs": [
        5, 10
    ],
    "per_gpu_batch_size": [
        32
    ]
}
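
If you prefer to submit jobs programmatically instead of using the console Test tab, a boto3 sketch along these lines should work; the function name below is a placeholder, so substitute the name of the Lambda function the stack actually created:

# Sketch: invoke the job-submission Lambda with a benchmarking payload.
# "automm-cv-bench-lambda" is a placeholder function name.
import json
import boto3

payload = {
    "datasets": ["bayer", "belgalogos"],
    "max_epochs": [5, 10],
    "per_gpu_batch_size": [32],
}

lambda_client = boto3.client("lambda", region_name="us-west-2")
response = lambda_client.invoke(
    FunctionName="automm-cv-bench-lambda",   # placeholder name
    InvocationType="RequestResponse",
    Payload=json.dumps(payload).encode("utf-8"),
)
print(response["StatusCode"], response["Payload"].read().decode("utf-8"))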
 10. In the AWS console, go to AWS Batch and make sure the jobs are submitted and running successfully;
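You can also check job status programmatically; a minimal sketch, assuming a placeholder job queue name that you replace with the queue created by the stack:

# Sketch: list jobs in the Batch job queue created by the stack.
# "automm-cv-bench-job-queue" is a placeholder queue name.
import boto3

batch = boto3.client("batch", region_name="us-west-2")
for status in ("SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED", "FAILED"):
    jobs = batch.list_jobs(jobQueue="automm-cv-bench-job-queue", jobStatus=status)
    for job in jobs["jobSummaryList"]:
        print(status, job["jobName"], job["jobId"])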

 11. Open the created DynamoDB table (in our example, automm-cv-bench-dynamodb-table) and validate that the results are there. A sample record looks like:

    dataset | batch_size | is_hpo | max_epoch | score | training_time | updated_at
    --------|------------|--------|-----------|-------|---------------|--------------------
    bayer   | 32         | true   | 10        | 0.964 | 46m42s        | 2022-10-31 17:44:40
 12. You can now use a DynamoDB client to read and consume the results, as in the sketch below.
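
A minimal boto3 sketch for reading the results, assuming the table name from the example above and that a full scan is acceptable for a small result set:

# Sketch: read benchmarking results from the DynamoDB table created by the stack.
# The table name matches the example above; adjust it if your stack uses another name.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
table = dynamodb.Table("automm-cv-bench-dynamodb-table")

response = table.scan()
for item in response["Items"]:
    print(item["dataset"], item.get("score"), item.get("training_time"))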
