# Lab2 (deployment): Provision resources for running the Amplify model on AWS SageMaker

This notebook guides you through setting up the AWS SageMaker infrastructure needed to run inference on the Amplify model. We'll create all necessary resources step by step.



## Step 1: Setup and Configuration

First, let's get our AWS account information and set up variables we'll use throughout the notebook.

In [15]:
import boto3
from sagemaker import get_execution_role, Session, image_uris
from sagemaker.pytorch import PyTorchModel
import numpy as np


##########################################################

# Get AWS account information
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()['Account']
region = boto3.Session().region_name

# Define S3 bucket and folder names
S3_BUCKET = f'workshop-data-{account_id}'
LAB1_FOLDER = 'lab1-progen'
LAB2_FOLDER = 'lab2-amplify'
LAB3_FOLDER = 'lab3-esmfold'

print(f"Account ID: {account_id}")
print(f"Region: {region}")
print(f"S3 Bucket: {S3_BUCKET}")

##########################################################


Account ID: 973884802842
Region: us-east-1
S3 Bucket: workshop-data-973884802842


## Step 2: Define variables and resources required for deployment of the Amplify model on SageMaker

### Step 2.1: Define the version of the Amplify model, instance type, and execution role

In [16]:
# Define model
model_version = 'AMPLIFY_120M'
model_id = f'chandar-lab/{model_version}'

# Define instance type
instance_type = 'ml.c6i.xlarge'

# Retrieve execution role
execution_role = get_execution_role()

print(f"Model: {model_id}")
print(f"EC2 Instance Type: {instance_type}") 
print(f"Role: {execution_role}")

Model: chandar-lab/AMPLIFY_120M
EC2 Instance Type: ml.c6i.xlarge
Role: arn:aws:iam::973884802842:role/sagemaker-domain-SageMakerExecutionRole-90tUww4TYQqS


### Step 2.2: Retrieve the URI of a base container image
SageMaker will use this base image for inference.

In [17]:
image = image_uris.retrieve(
    region=region, 
    framework='pytorch', 
    image_scope='inference', 
    version='2.3', 
    base_framework_version='pytorch2.0.0', 
    instance_type=instance_type)

print(f'Image URI: {image}')

Image URI: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.3-cpu-py311
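
The returned URI follows the usual Amazon ECR layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a quick sanity check, the sketch below splits a URI into those parts. `parse_ecr_image_uri` is a hypothetical helper written for this illustration; it is not part of the SageMaker SDK.

```python
def parse_ecr_image_uri(uri):
    """Split an ECR image URI into registry account, region, repository and tag."""
    registry, _, remainder = uri.partition("/")
    repository, _, tag = remainder.partition(":")
    host_parts = registry.split(".")
    # Expected host layout: <account>.dkr.ecr.<region>.amazonaws.com
    if len(host_parts) < 6 or host_parts[1:3] != ["dkr", "ecr"]:
        raise ValueError(f"Not an ECR image URI: {uri}")
    return {
        "account": host_parts[0],
        "region": host_parts[3],
        "repository": repository,
        "tag": tag,
    }

parts = parse_ecr_image_uri(
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.3-cpu-py311"
)
print(parts)
```

Checking that `parts["region"]` matches the notebook's `region` variable is a cheap way to catch a mismatched region before deployment.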


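### Step 2.3: Create `.tar.gz` model artifact
`PyTorchModel` requires a `model_data` archive location in S3. The inference script in this lab downloads the model weights from the Hugging Face Hub when the endpoint starts, so the archive only needs to contain a placeholder file.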

In [18]:
model_data_location = f"s3://{S3_BUCKET}/{LAB2_FOLDER}/model_data.tar.gz"

!mkdir -p data/$LAB2_FOLDER/model_data
!touch data/$LAB2_FOLDER/model_data/data.txt
!tar -czvf data/$LAB2_FOLDER/model_data.tar.gz data/$LAB2_FOLDER/model_data/
!aws s3 cp data/$LAB2_FOLDER/model_data.tar.gz $model_data_location

data/lab2-amplify/model_data/
data/lab2-amplify/model_data/data.txt
upload: data/lab2-amplify/model_data.tar.gz to s3://workshop-data-973884802842/lab2-amplify/model_data.tar.gz
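
The same placeholder archive can also be built from Python with the standard `tarfile` module, which makes it easy to control the paths stored inside the archive and is safe to re-run. This is a minimal sketch that works in a temporary directory; the helper name `build_placeholder_archive` is illustrative, and the upload to S3 is not shown.

```python
import tarfile
import tempfile
from pathlib import Path

def build_placeholder_archive(work_dir):
    """Create model_data.tar.gz containing a single empty placeholder file."""
    work_dir = Path(work_dir)
    model_dir = work_dir / "model_data"
    model_dir.mkdir(parents=True, exist_ok=True)  # does not fail if it already exists
    (model_dir / "data.txt").touch()

    archive_path = work_dir / "model_data.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores the file at the archive root instead of under the local path
        tar.add(model_dir / "data.txt", arcname="data.txt")
    return archive_path

with tempfile.TemporaryDirectory() as tmp:
    archive = build_placeholder_archive(tmp)
    with tarfile.open(archive, "r:gz") as tar:
        member_names = tar.getnames()

print(member_names)
```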


### Step 2.4: Create custom inference script and requirements

The custom inference script defines four functions:
* ```model_fn``` - loads the model and tokenizer
* ```input_fn``` - pre-processes the input data
* ```predict_fn``` - tokenizes the input protein sequence and runs inference on the model
* ```output_fn``` - post-processes the output, returning the model's predictions

The custom script and the requirements file are stored locally and will be used in the deployment process.
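
Conceptually, the serving container loads the model once with `model_fn` and then, for each request, chains `input_fn`, `predict_fn`, and `output_fn`. The sketch below imitates that flow with stand-in handlers so it runs without a model. `handle_request` and the toy handlers are hypothetical; they only illustrate the call order and are not the actual serving code.

```python
import json

def model_fn(model_dir):
    # Stand-in "model": simply returns the length of the sequence
    return len

def input_fn(request_body, content_type="application/json"):
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)["sequence"]

def predict_fn(input_data, model):
    return model(input_data)

def output_fn(prediction, accept="application/json"):
    return json.dumps({"output": prediction})

def handle_request(body, model, content_type="application/json", accept="application/json"):
    """Chain the three per-request handlers in order."""
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, accept)

model = model_fn("/opt/ml/model")  # loaded once, reused for every request
response = handle_request(json.dumps({"sequence": "MSVVG"}), model)
print(response)
```

Running the handlers locally like this, with a small stub in place of the real model, can catch serialization mistakes before waiting for an endpoint deployment.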

In [19]:
!mkdir -p data/$LAB2_FOLDER/code


In [20]:
%%writefile data/$LAB2_FOLDER/code/inference.py
import logging
import os
import json  
from transformers import AutoModel, AutoTokenizer
import torch

logging.basicConfig(level=logging.INFO)

def model_fn(model_dir):
    logging.info("[custom] model_fn: Starting the model loading process...")
    try:
        model_id = os.getenv('AMPLIFY_MODEL_ID', 'chandar-lab/AMPLIFY_120M')
        logging.info(f"[custom] model_fn: Model id is {model_id}")

        model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the model: {model}")

        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the tokenizer: {tokenizer}")

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = model.to(device)
        logging.info(f"[custom] model_fn: Moved model to {device} device")

        return model, tokenizer, device
    except Exception as e:        
        logging.error(f"[custom] model_fn: Error occurred while loading the model and tokenizer: {str(e)}", exc_info=True)
        raise e  

def input_fn(request_body, content_type='application/json'):
    logging.info("input_fn: Received input")
    if content_type == 'application/json':
        input_data = json.loads(request_body)  
        sequence = input_data['sequence']
        return sequence
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model_artifacts):
    logging.info("predict_fn: Running inference")
    model, tokenizer, device = model_artifacts

    inputs = tokenizer.encode(input_data, return_tensors="pt")

    inputs = inputs.to(device)

    with torch.no_grad():
        output = model(inputs)

    return output

def output_fn(prediction, accept='application/json'):
    logging.info("output_fn: Formatting output")
    if accept == 'application/json':
        if hasattr(prediction, 'logits'):
            output = prediction.logits
        else:
            raise ValueError(f"Unknown prediction format: {type(prediction)}")

        return json.dumps({"output": output.tolist()}), accept
    else:
        raise ValueError(f"Unsupported accept type: {accept}")


Overwriting data/lab2-amplify/code/inference.py


In [21]:
%%writefile data/$LAB2_FOLDER/code/requirements.txt
transformers==4.37.0
xformers==0.0.28.post1
sentencepiece==0.2.0

Overwriting data/lab2-amplify/code/requirements.txt


## Step 4: Deploy and test the Amplify model

### Step 4.1: Deploy the Amplify model with PyTorchModel and the Custom Inference Script 

In [22]:
# Define SageMaker endpoint name
base_name = model_id.split("/")[-1].lower().replace("_", "-").replace(".", "-")
endpoint_name = f"{base_name}-endpoint"

print(f'Endpoint name: {endpoint_name}')

# Create endpoint definition
sm_model = PyTorchModel(
    model_data=model_data_location,
    role=execution_role,
    image_uri=image,
    source_dir=f'data/{LAB2_FOLDER}/code/',
    entry_point="inference.py",
    env={
        'AMPLIFY_MODEL_ID': model_id
    }
)

# Deploy the model
predictor = sm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name
)

Endpoint name: amplify-120m-endpoint
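
SageMaker endpoint names may contain only letters, digits, and hyphens and are limited to 63 characters, which is why the model ID is lowercased and its underscores and dots are replaced above. A slightly more defensive version of that derivation might look like the sketch below; `to_endpoint_name` is a hypothetical helper, not an SDK function.

```python
import re

MAX_ENDPOINT_NAME_LENGTH = 63  # SageMaker limit for endpoint names

def to_endpoint_name(model_id, suffix="endpoint"):
    """Derive an endpoint name (letters, digits, hyphens) from a Hugging Face model ID."""
    base = model_id.split("/")[-1].lower()
    # Replace every run of characters other than letters and digits with one hyphen
    base = re.sub(r"[^a-z0-9]+", "-", base).strip("-")
    if not base:
        raise ValueError(f"Cannot derive an endpoint name from: {model_id!r}")
    name = f"{base}-{suffix}"
    if len(name) > MAX_ENDPOINT_NAME_LENGTH:
        raise ValueError(f"Endpoint name too long ({len(name)} characters): {name}")
    return name

print(to_endpoint_name("chandar-lab/AMPLIFY_120M"))
print(to_endpoint_name("chandar-lab/AMPLIFY_120M", suffix="endpoint-embeddings"))
```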


### Step 4.2: Test the deployed model

In [11]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()  
predictor.deserializer = JSONDeserializer()  

sequence = "MSVVGIDLGFQSCYVAVARAGGIETIANEYSDRCTPACISFGPKNR"

result = predictor.predict({"sequence": sequence})

print(f"Logits: {result}")

Logits: {'output': [[[3.6029605865478516, 3.604310989379883, 3.603405714035034, 3.604494571685791, 3.603926420211792, 9.026841163635254, 21.575468063354492, 23.35076141357422, 22.799560546875, 22.219818115234375, 22.1911678314209, 22.642969131469727, 21.981033325195312, 22.41286849975586, 21.51314926147461, 22.321937561035156, 21.817138671875, 21.944259643554688, 21.615201950073242, 21.540863037109375, 21.17413330078125, 22.455562591552734, 21.91006851196289, 21.293737411499023, 19.359338760375977, 20.684070587158203, 3.6044747829437256], [6.29001522064209, 6.290866374969482, 6.290349960327148, 6.290815353393555, 6.290607452392578, 10.12349796295166, 20.438907623291016, 20.94894027709961, 20.45217514038086, 21.437747955322266, 20.80021095275879, 20.44017791748047, 20.083988189697266, 20.67687225341797, 20.081541061401367, 20.451784133911133, 20.06969451904297, 19.504045486450195, 20.253231048583984, 19.706085205078125, 19.989582061767578, 19.157560348510742, 29.237415313720703, 18.6237
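
The response is a nested list of raw logits with one row per token position and one column per vocabulary entry. To turn it into probabilities or most-likely token IDs, a softmax over the last axis is enough. The sketch below uses a small synthetic array in place of `result["output"]`; mapping the IDs back to amino acids would require the tokenizer and is not shown.

```python
import numpy as np

def logits_to_probabilities(logits):
    """Apply a numerically stable softmax over the last (vocabulary) axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Synthetic stand-in for result["output"]: 1 sequence, 3 positions, 4 vocabulary entries
example = {"output": [[[0.0, 2.0, 1.0, 0.0], [5.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 4.0]]]}

probs = logits_to_probabilities(example["output"])
top_ids = probs.argmax(axis=-1)  # most likely vocabulary index at each position

print(probs.shape)
print(top_ids)
```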

## Step 5: Create and test the second SageMaker inference endpoint that will support embeddings

### Step 5.1: Modify the custom inference script

* ```input_fn``` - accepts an additional parameter *mode*, which is either *logits* (the default) or *embeddings*
* ```predict_fn``` - passes the additional argument *output_hidden_states=True* to the model when *mode* is *embeddings*
* ```output_fn``` - returns the last layer of hidden_states from the output when *mode* is *embeddings*
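
The request handling described in this list can be sketched in isolation, without loading the model. The helper names `parse_request` and `forward_kwargs` are hypothetical. Note that this variant rejects an unknown *mode* while parsing the request, whereas the script in this step checks it later, in `predict_fn`.

```python
import json

SUPPORTED_MODES = ("logits", "embeddings")

def parse_request(request_body):
    """Extract the sequence and mode from a JSON request body, defaulting to logits."""
    payload = json.loads(request_body)
    sequence = payload["sequence"]
    mode = payload.get("mode", "logits")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"Unknown mode: {mode}")
    return sequence, mode

def forward_kwargs(mode):
    """Extra keyword arguments to pass to the model for a given mode."""
    return {"output_hidden_states": True} if mode == "embeddings" else {}

print(parse_request('{"sequence": "MSVVG"}'))
print(parse_request('{"sequence": "MSVVG", "mode": "embeddings"}'))
print(forward_kwargs("embeddings"))
```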

In [13]:
%%writefile data/$LAB2_FOLDER/code/inference-embeddings.py

import logging
import os
import json  
from transformers import AutoModel, AutoTokenizer
import torch

logging.basicConfig(level=logging.INFO)

def model_fn(model_dir):
    logging.info("[custom] model_fn: Starting the model loading process...")

    try:
        model_id = os.getenv('AMPLIFY_MODEL_ID', 'chandar-lab/AMPLIFY_120M')
        logging.info(f"[custom] model_fn: Model id is {model_id}")

        model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the model: {model}")

        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the tokenizer: {tokenizer}")

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = model.to(device)
        logging.info(f"[custom] model_fn: Moved model to {device} device")

        return model, tokenizer, device

    except Exception as e:        
        logging.error(f"[custom] model_fn: Error occurred while loading the model and tokenizer: {str(e)}", exc_info=True)
        raise e

def input_fn(request_body, content_type='application/json'):
    logging.info("input_fn: Received input")
    if content_type == 'application/json':
        input_data = json.loads(request_body)  
        sequence = input_data['sequence']
        mode = input_data.get('mode','logits')
        return sequence, mode
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model_artifacts):
    logging.info("predict_fn: Running inference")
    sequence, mode = input_data
    model, tokenizer, device = model_artifacts
    
    inputs = tokenizer.encode(sequence, return_tensors="pt")
    inputs = inputs.to(device)

    with torch.no_grad():
        if mode == 'logits':
            output = model(inputs)
        elif mode == 'embeddings':
            output = model(inputs, output_hidden_states=True)
        else:
            raise ValueError(f"Unknown mode: {mode}")            

    return output, mode

def output_fn(prediction, accept='application/json'):
    logging.info("output_fn: Formatting output")
    output, mode = prediction
    
    if accept == 'application/json':
        if mode == 'logits':
            if hasattr(output, 'logits'):
                result = output.logits
            else:
                raise ValueError(f"Unknown prediction format: {type(output)}")
        elif mode == 'embeddings':
            if hasattr(output, 'hidden_states'):
                result = output.hidden_states[-1]
            else:
                raise ValueError(f"Unknown prediction format: {type(output)}")
        else:
            raise ValueError(f"Unknown mode: {mode}")
        return json.dumps({mode: result.tolist()}), accept
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

Overwriting data/lab2-amplify/code/inference-embeddings.py


### Step 5.2: Deploy the Amplify model with modified Custom Inference Script

In [14]:
# Define SageMaker endpoint name
base_name = model_id.split("/")[-1].lower().replace("_", "-").replace(".", "-")
endpoint_name_embeddings = f"{base_name}-endpoint-embeddings"

print(f'Endpoint name: {endpoint_name_embeddings}')

# Create endpoint definition
sm_model_embeddings = PyTorchModel(
    model_data=model_data_location,
    role=execution_role,
    image_uri=image,
    source_dir=f'data/{LAB2_FOLDER}/code/',
    entry_point="inference-embeddings.py",
    env={
        'AMPLIFY_MODEL_ID': model_id
    }
)

# Deploy the model
predictor_embeddings = sm_model_embeddings.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name_embeddings
)

Endpoint name: amplify-120m-endpoint-embeddings
---------!

### Step 5.3: Test generation of embeddings (and logits) using the second endpoint

In [None]:
predictor_embeddings.serializer = JSONSerializer()  
predictor_embeddings.deserializer = JSONDeserializer()  

sequence = "MSVVGIDLGFQSCYVAVARAGGIETIANEYSDRCTPACISFGPKNR"

# generate logits
result = predictor_embeddings.predict({"sequence": sequence, "mode": "logits"})
logits = np.array(result['logits'])

print(result)
print(f'Logits shape: {logits.shape}')
print()

# generate embeddings
result = predictor_embeddings.predict({"sequence": sequence, "mode": "embeddings"})
embeddings = np.array(result['embeddings'])
print(result)
print(f'Embeddings shape: {embeddings.shape}')
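
The embeddings come back as one vector per token. Downstream tasks often need a single fixed-size vector per sequence, and mean pooling is one common way to get it. The sketch below uses synthetic data in place of `result['embeddings']`. It assumes that the first and last positions are special tokens added by the tokenizer; verify this against the actual token count before relying on it.

```python
import numpy as np

def mean_pool(embeddings, drop_special_tokens=True):
    """Average per-token embeddings of shape (1, tokens, hidden) into one vector."""
    emb = np.asarray(embeddings, dtype=np.float64)[0]  # drop the batch axis
    if drop_special_tokens:
        # Assumption: the first and last positions are special tokens
        emb = emb[1:-1]
    return emb.mean(axis=0)

# Synthetic stand-in for result["embeddings"]: 1 sequence, 4 tokens, hidden size 2
fake = [[[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
vector = mean_pool(fake)
print(vector)
```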



### [Optional] Delete the unused endpoint

In [None]:
# Remove the first endpoint; the second endpoint serves both logits and embeddings
sm_client = boto3.client('sagemaker')
try:
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    print(f"Successfully deleted endpoint: {endpoint_name}")
except Exception as e:
    print(f"Error deleting endpoint: {str(e)}")
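
Deleting an endpoint does not remove its endpoint configuration or the SageMaker model created during deployment. The sketch below looks both up and deletes them together with the endpoint. `delete_endpoint_resources` is a hypothetical helper, and `RecordingClient` is an offline stand-in used only to show the order of calls; in the notebook you would pass `boto3.client('sagemaker')` instead.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint configuration and models."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names


class RecordingClient:
    """Offline stand-in for the SageMaker client that records delete calls."""

    def __init__(self):
        self.calls = []

    def describe_endpoint(self, EndpointName):
        return {"EndpointConfigName": f"{EndpointName}-config"}

    def describe_endpoint_config(self, EndpointConfigName):
        return {"ProductionVariants": [{"ModelName": "demo-model"}]}

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


client = RecordingClient()
delete_endpoint_resources(client, "amplify-120m-endpoint")
print(client.calls)
```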