# Lab1 (deployment): Provision resources for running the Progen2 model on AWS Batch

This notebook will guide you through setting up AWS Batch infrastructure to run Progen2 jobs with parameters. We'll create all necessary resources step by step.

#### Prerequisites
- Progen2 Docker image pushed to ECR
- IAM roles configured with appropriate permissions


## Step 1: Setup and Configuration

First, let's get our AWS account information and set up variables we'll use throughout the notebook.

In [11]:
import boto3
import json
import time
import datetime
from utils.iam_helper import IamHelper

##########################################################

# Get AWS account information
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()['Account']
region = boto3.Session().region_name

# Define S3 bucket and folder names
S3_BUCKET = f'workshop-data-{account_id}'
LAB1_FOLDER = 'lab1-progen'
LAB2_FOLDER = 'lab2-amplify'
LAB3_FOLDER = 'lab3-esmfold'

print(f"Account ID: {account_id}")
print(f"Region: {region}")
print(f"S3 Bucket: {S3_BUCKET}")

##########################################################

# Define the URI of the Progen2 image registered in ECR
ECR_IMAGE_URI = f"{account_id}.dkr.ecr.{region}.amazonaws.com/models/progen2:latest"
print(f"ECR Image URI: {ECR_IMAGE_URI}")

# Retrieve ARNs of IAM roles required for provisioning Batch resources
iam_helper = IamHelper()
batch_service_role_arn = iam_helper.find_role_arn_by_pattern('BatchServiceRole')
instance_profile_arn = f"arn:aws:iam::{account_id}:instance-profile/EcsInstanceProfile"
job_role_arn = iam_helper.find_role_arn_by_pattern('BatchJobRole')

print()
print(f"BatchServiceRole ARN: {batch_service_role_arn}")
print(f"Instance Profile ARN: {instance_profile_arn}")
print(f"BatchJobRole ARN : {job_role_arn}")


Account ID: 973884802842
Region: us-east-1
S3 Bucket: workshop-data-973884802842
ECR Image URI: 973884802842.dkr.ecr.us-east-1.amazonaws.com/models/progen2:latest

BatchServiceRole ARN: arn:aws:iam::973884802842:role/main-BatchServiceRole-nIjzojJyYSL9
Instance Profile ARN: arn:aws:iam::973884802842:instance-profile/EcsInstanceProfile
BatchJobRole ARN : arn:aws:iam::973884802842:role/main-BatchJobRole-o4hwBWbniQnH


In [4]:
# Create S3 bucket and local data folders
!aws s3 mb s3://$S3_BUCKET
!mkdir -p data/$LAB1_FOLDER
!mkdir -p data/$LAB2_FOLDER
!mkdir -p data/$LAB3_FOLDER

make_bucket: workshop-data-973884802842
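
The `aws s3 mb` command above reports an error if the bucket already exists, which can happen when the notebook is re-run. As an alternative, here is a minimal sketch of an idempotent bucket creation with boto3. The helper name `ensure_bucket` is hypothetical, and the sketch assumes that a `404` error code from `head_bucket` means the bucket does not exist yet.

```python
def ensure_bucket(s3_client, bucket_name, region):
    """Create the bucket unless it already exists; return True if it was created."""
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        return False  # bucket exists and is accessible
    except s3_client.exceptions.ClientError as err:
        # A missing bucket is reported as a 404; anything else is re-raised
        if err.response["Error"]["Code"] not in ("404", "NoSuchBucket"):
            raise
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return True
```

In this notebook it could be called as `ensure_bucket(boto3.client('s3'), S3_BUCKET, region)`.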


## Step 2: Get VPC Information

We need to identify the VPC and subnets where our Batch compute environment will run. We'll use the default VPC for simplicity.

In [12]:
ec2_client = boto3.client('ec2')

# Get default VPC
vpcs = ec2_client.describe_vpcs(Filters=[{'Name': 'isDefault', 'Values': ['true']}])
if not vpcs['Vpcs']:
    raise Exception("No default VPC found. Please create one or specify a custom VPC.")

default_vpc_id = vpcs['Vpcs'][0]['VpcId']
print(f"Default VPC ID: {default_vpc_id}")

# Get subnets in default VPC
subnets = ec2_client.describe_subnets(Filters=[{'Name': 'vpc-id', 'Values': [default_vpc_id]}])
subnet_ids = [subnet['SubnetId'] for subnet in subnets['Subnets']]

print(f"Found {len(subnet_ids)} subnets:")
for subnet_id in subnet_ids:
    print(f"  - {subnet_id}")

if not subnet_ids:
    raise Exception("No subnets found in default VPC")

Default VPC ID: vpc-05e46f49bf64c4ee8
Found 6 subnets:
  - subnet-08b80f0d6d2f3f636
  - subnet-06c2d73394f3924d3
  - subnet-02eb39e09662d67c8
  - subnet-0e7d04b9ceced455c
  - subnet-09d5659b96d84625b
  - subnet-09a7d52122326c3dc


## Step 3: Create Batch Resources

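AWS Batch is a managed service that runs containerized jobs on demand. In this step we create the three resources it needs: a compute environment, a job queue, and a job definition.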

### Step 3.1: Create Batch Compute Environment

The compute environment defines the compute resources (EC2 instances) that will run Progen2 jobs. We'll create a managed compute environment that automatically scales based on job demand.

In [13]:
batch_client = boto3.client('batch')

# Get default security group for the VPC
security_groups = ec2_client.describe_security_groups(
    Filters=[
        {'Name': 'vpc-id', 'Values': [default_vpc_id]},
        {'Name': 'group-name', 'Values': ['default']}
    ]
)
default_sg_id = security_groups['SecurityGroups'][0]['GroupId']
print(f"Default Security Group ID: {default_sg_id}")

# Create compute environment
compute_env_name = 'progen2-batch-compute-env'
try:
    response = batch_client.create_compute_environment(
        computeEnvironmentName=compute_env_name,
        type='MANAGED',
        state='ENABLED',
        computeResources={
            'type': 'EC2',
            'minvCpus': 0,
            'maxvCpus': 50,
            'desiredvCpus': 0,
            'instanceTypes': ['c6i.xlarge'],
            'subnets': subnet_ids,
            'securityGroupIds': [default_sg_id],  
            'instanceRole': instance_profile_arn,
            'tags': {
                'Name': 'BatchComputeEnvironment',
                'Purpose': 'Inference'
            }
        },
        serviceRole=batch_service_role_arn
    )
    
    print(f"Created compute environment: {response['computeEnvironmentName']}")
    print(f"   ARN: {response['computeEnvironmentArn']}")
    
except batch_client.exceptions.ClientException as e:
    if 'already exists' in str(e):
        print(f"Compute environment {compute_env_name} already exists")
    else:
        raise

compute_env_arn = f"arn:aws:batch:{region}:{account_id}:compute-environment/{compute_env_name}"

Default Security Group ID: sg-0c7252e70a23ed5ee
Created compute environment: progen2-batch-compute-env
   ARN: arn:aws:batch:us-east-1:973884802842:compute-environment/progen2-batch-compute-env
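
A compute environment is created asynchronously, and it has to reach the `VALID` status before a job queue can be attached to it. The cell above does not wait for that, so the following is a minimal polling sketch that could be run before creating the job queue. The helper name `wait_for_compute_environment` and the timeout and polling defaults are assumptions, not part of the workshop code.

```python
import time


def wait_for_compute_environment(batch_client, name, timeout_s=300, poll_s=5):
    """Poll until the compute environment reports status VALID, or raise."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = batch_client.describe_compute_environments(
            computeEnvironments=[name]
        )
        envs = result["computeEnvironments"]
        status = envs[0]["status"] if envs else "NOT_FOUND"
        if status == "VALID":
            return envs[0]
        if status == "INVALID":
            # statusReason usually explains what is misconfigured
            raise RuntimeError(f"{name} is INVALID: {envs[0].get('statusReason')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still {status} after {timeout_s}s")
        time.sleep(poll_s)
```

In this notebook it could be called as `wait_for_compute_environment(batch_client, compute_env_name)`.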


### Step 3.2: Create Batch Job Queue

The job queue connects jobs to compute environments. Jobs submitted to this queue will run on the compute environment we just created.

In [14]:
job_queue_name = 'progen2-batch-job-queue'

try:
    response = batch_client.create_job_queue(
        jobQueueName=job_queue_name,
        state='ENABLED',
        priority=1,
        computeEnvironmentOrder=[
            {
                'order': 1,
                'computeEnvironment': compute_env_name
            }
        ]
    )
    
    print(f"Created job queue: {response['jobQueueName']}")
    print(f"   ARN: {response['jobQueueArn']}")
    
except batch_client.exceptions.ClientException as e:
    if 'already exists' in str(e):
        print(f"Job queue {job_queue_name} already exists")
    else:
        raise

Created job queue: progen2-batch-job-queue
   ARN: arn:aws:batch:us-east-1:973884802842:job-queue/progen2-batch-job-queue


### Step 3.3: Create Batch Job Definition

The job definition specifies how Progen2 jobs should run, including:
- Which container image to use
- Resource requirements (CPU, memory)
- Parameter placeholders
- Environment variables

Note the `Ref::` syntax for parameters - these will be replaced with actual values when you submit jobs.
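
Because every `Ref::` placeholder needs a value at submission time, it can help to check a parameter dictionary before submitting. The following is a small sketch of such a check. The helper name `missing_parameters` and the example values are hypothetical, and the sketch assumes each placeholder occupies a whole command argument, as in the job definition used in this lab.

```python
def missing_parameters(command, parameters):
    """Return the names of Ref:: placeholders in command with no value in parameters."""
    prefix = "Ref::"
    names = [arg[len(prefix):] for arg in command if arg.startswith(prefix)]
    return [name for name in names if name not in parameters]


command = ["Ref::hfModelId", "Ref::s3InputParamsPath", "Ref::batchId",
           "Ref::batchSize", "Ref::batchNumber", "Ref::s3OutputPath"]

# Hypothetical values for illustration only
parameters = {"hfModelId": "example-model-id", "batchId": "batch-001", "batchSize": "10"}

print(missing_parameters(command, parameters))
# → ['s3InputParamsPath', 'batchNumber', 's3OutputPath']
```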

In [15]:
job_definition_name = 'progen2-job-definition'

try:
    response = batch_client.register_job_definition(
        jobDefinitionName=job_definition_name,
        type='container',
        containerProperties={
            'image': ECR_IMAGE_URI,
            'vcpus': 2,
            'memory': 4096,
            'command': [
                'Ref::hfModelId',
                'Ref::s3InputParamsPath',
                'Ref::batchId',
                'Ref::batchSize',
                'Ref::batchNumber',
                'Ref::s3OutputPath'
            ],
            'environment': [
                {'name': 'AWS_DEFAULT_REGION', 'value': region},
                {'name': 'S3_BUCKET', 'value': S3_BUCKET}
            ],
            'jobRoleArn': job_role_arn
        },
        retryStrategy={'attempts': 2},
        timeout={'attemptDurationSeconds': 3600}  # 1 hour timeout
    )
    
    print(f"Created job definition: {response['jobDefinitionName']}")
    print(f"   ARN: {response['jobDefinitionArn']}")
    print(f"   Revision: {response['revision']}")
    
except Exception as e:
    print(f"Error creating job definition: {e}")
    raise

# Use the ARN returned by the API: the revision number increases each time the definition is re-registered
job_definition_arn = response['jobDefinitionArn']

Created job definition: progen2-job-definition
   ARN: arn:aws:batch:us-east-1:973884802842:job-definition/progen2-job-definition:1
   Revision: 1
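
With the job definition registered, a job can be submitted by supplying a value for each `Ref::` placeholder. The following is a minimal sketch of what that might look like. The helper name `submit_progen2_job` is hypothetical, and the job name and parameter values are up to you. The values are converted to strings because Batch parameters are string-to-string mappings.

```python
def submit_progen2_job(batch_client, job_name, job_queue, job_definition, parameters):
    """Submit one job, passing values for the Ref:: placeholders; return the job ID."""
    response = batch_client.submit_job(
        jobName=job_name,
        jobQueue=job_queue,
        jobDefinition=job_definition,
        # Batch parameters must be strings, so numeric values are converted here
        parameters={key: str(value) for key, value in parameters.items()},
    )
    return response["jobId"]
```

In this notebook it could be called with `batch_client`, `job_queue_name`, and `job_definition_name`, plus a dictionary with one entry per placeholder (`hfModelId`, `s3InputParamsPath`, `batchId`, `batchSize`, `batchNumber`, `s3OutputPath`). Passing the job definition name without a revision selects the latest active revision.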
