# Deploy ESM Embeddings Server on Amazon SageMaker

Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

---
## 1. Setup

### 1.1. Create clients

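```python
import boto3
import numpy
import sagemaker

boto_session = boto3.session.Session()
sagemaker_session = sagemaker.session.Session(boto_session=boto_session)
s3 = boto_session.resource("s3")
region = boto_session.region_name
# On SageMaker this returns the notebook's execution role; outside SageMaker
# you may need to set `role` to an IAM role ARN yourself.
role = sagemaker.get_execution_role()
```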



### 1.2. Build BioNeMo-Inference Container Image

If you don't already have access to a BioNeMo inference container image, run the following cell to build it and push it to Amazon ECR in your AWS account. Take note of the image URI: you'll use it in the deployment steps below.

Here is an example shell script you can use in your environment (including SageMaker Notebook Instances) to build the container.

Once you have built and pushed the container, we strongly recommend using [ECR image scanning](https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html) to ensure that it meets your security requirements.

```bash
%%bash

set -e

REPOSITORY_NAME=bionemo-inference
DOCKERIMAGE_TAG=latest
# Region to push to; use the same region you deploy the endpoint in
AWS_REGION="${AWS_REGION:-us-east-1}"

# Derive the registry and image names from the current account instead of hard-coding them
account=$(aws sts get-caller-identity --query Account --output text)
REGISTRY="${account}.dkr.ecr.${AWS_REGION}.amazonaws.com"
DOCKERIMAGE_FULLNAME="${REGISTRY}/${REPOSITORY_NAME}"

# Assumes the notebook runs from the repository root, where container/inference holds the Dockerfile
pushd container/inference

# Create the ECR repository if it doesn't exist yet
if ! aws ecr describe-repositories --region "${AWS_REGION}" --repository-names "${REPOSITORY_NAME}" > /dev/null 2>&1; then
    aws ecr create-repository --region "${AWS_REGION}" --repository-name "${REPOSITORY_NAME}" > /dev/null
fi

# Authenticate Docker against the ECR registry
aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${REGISTRY}"

# Build the image locally; SageMaker GPU instances such as ml.g5 are x86_64
docker build --platform linux/amd64 -t "${REPOSITORY_NAME}:${DOCKERIMAGE_TAG}" .

# Remove dangling images
docker image prune -f

# Tag and then push to ECR
docker tag "${REPOSITORY_NAME}:${DOCKERIMAGE_TAG}" "${DOCKERIMAGE_FULLNAME}:${DOCKERIMAGE_TAG}"
docker push "${DOCKERIMAGE_FULLNAME}:${DOCKERIMAGE_TAG}"
echo " ----- Docker image pushed to ECR ----- ${DOCKERIMAGE_FULLNAME}:${DOCKERIMAGE_TAG}"

popd
```

When the cell runs, Docker builds the image on top of `nvcr.io/nvidia/clara/bionemo-framework:1.5`, copies `serve`, `inference.py`, `wsgi.py`, and `nginx.conf` into it, installs system packages (including `nginx`) with `apt-get` and Python packages with `pip3`, and pushes the tagged image to your ECR repository.
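The script above pushes the image to a URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/bionemo-inference:latest`. If you'd rather compute that URI in Python than copy it from the script output, here is a minimal sketch. Both helper names are hypothetical, and the sketch assumes the image was pushed from the same AWS account that owns the SageMaker execution role, so the account ID can be read from the `role` ARN created in section 1.1.

```python
def account_id_from_role_arn(role_arn: str) -> str:
    """Return the 12-digit account ID from an IAM role ARN.

    ARNs have the form arn:partition:service:region:account-id:resource,
    so the account ID is the fifth colon-separated field.
    """
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or not parts[4].isdigit():
        raise ValueError(f"Not an IAM role ARN: {role_arn!r}")
    return parts[4]


def ecr_image_uri(account_id: str, region: str,
                  repository: str = "bionemo-inference",
                  tag: str = "latest") -> str:
    """Build a private ECR image URI (commercial AWS partition only;
    China regions use amazonaws.com.cn instead)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Example with a made-up ARN; in the notebook you would pass `role` and `region`
example_arn = "arn:aws:iam::123456789012:role/service-role/MySageMakerRole"
print(ecr_image_uri(account_id_from_role_arn(example_arn), "us-east-1"))
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/bionemo-inference:latest
```

In the notebook, `ecr_image_uri(account_id_from_role_arn(role), region)` would give a value suitable for `BIONEMO_IMAGE_URI`, provided the script pushed to the same region as your SageMaker session.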

---
## 2. Deploy Real-Time Inference Endpoint

### 2.1. Create esm1nv model

```python
from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Replace this with the ECR image URI you pushed in section 1.2
BIONEMO_IMAGE_URI = (
    "<ACCOUNT ID>.dkr.ecr.<REGION>.amazonaws.com/bionemo-inference:latest"
)

esm_embeddings = Model(
    image_uri=BIONEMO_IMAGE_URI,
    name="esm-embeddings",
    model_data=None,  # no S3 model artifact is attached
    role=role,
    predictor_cls=Predictor,
    sagemaker_session=sagemaker_session,
    env={
        # Secret the container reads NVIDIA NGC credentials from
        # (assumed to be an AWS Secrets Manager secret name)
        "SM_SECRET_NAME": "NVIDIA_NGC_CREDS",
        # Model the container serves
        "MODEL_NAME": "esm1nv",
    },
)
```
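`Model(...)` only stores this configuration in the notebook, so a forgotten `<ACCOUNT ID>` or `<REGION>` placeholder typically surfaces only later, when `deploy()` asks SageMaker to create the model. A small guard can fail fast instead. This is a sketch: `check_image_uri` is a hypothetical helper, and it assumes a private ECR image referenced by tag (not by digest).

```python
import re

# Matches private ECR image URIs such as
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/bionemo-inference:latest
# (the optional ".cn" suffix covers the China regions).
ECR_IMAGE_URI_PATTERN = re.compile(
    r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?"
    r"/[a-z0-9._/-]+:[A-Za-z0-9_.-]+$"
)


def check_image_uri(uri: str) -> str:
    """Return the URI unchanged, or raise if it doesn't look like an ECR image URI."""
    if not ECR_IMAGE_URI_PATTERN.match(uri):
        raise ValueError(
            f"{uri!r} does not look like an ECR image URI; "
            "replace <ACCOUNT ID> and <REGION> with your values."
        )
    return uri
```

Calling `check_image_uri(BIONEMO_IMAGE_URI)` before deploying raises `ValueError` while the placeholder is still in place.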

### 2.2. Deploy model to SageMaker endpoint

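```python
from sagemaker.deserializers import NumpyDeserializer
from sagemaker.serializers import CSVSerializer

esm_embeddings_predictor = esm_embeddings.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    serializer=CSVSerializer(),  # sends the request body as text/csv
    deserializer=NumpyDeserializer(),  # returns the response as a NumPy array
)
```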
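The `serializer` and `deserializer` arguments determine the request and response formats: `CSVSerializer` sends the input as `text/csv`, and `NumpyDeserializer` requests `application/x-npy` by default and turns the response body into a NumPy array. The sketch below sends the same kind of request with the low-level boto3 `sagemaker-runtime` client, which can help when calling the endpoint from code that doesn't use the SageMaker Python SDK. The helper names are hypothetical, and the decoding step is an assumption: it branches on the response content type, roughly the way `NumpyDeserializer` handles `application/x-npy`, `application/json`, and `text/csv`, because this notebook doesn't state which format the container returns.

```python
import io
import json

import numpy as np


def build_csv_body(sequences):
    """Join protein sequences into one comma-separated text/csv payload,
    the same string format the test cell passes to predict()."""
    return ",".join(sequences)


def decode_response(body: bytes, content_type: str) -> np.ndarray:
    """Turn a response body into a NumPy array based on its content type.

    Roughly mirrors the formats NumpyDeserializer accepts; which one the
    BioNeMo container actually returns is an assumption here.
    """
    if content_type.startswith("application/x-npy"):
        return np.load(io.BytesIO(body), allow_pickle=False)
    if content_type.startswith("application/json"):
        return np.array(json.loads(body))
    if content_type.startswith("text/csv"):
        return np.genfromtxt(io.StringIO(body.decode("utf-8")), delimiter=",")
    raise ValueError(f"Unsupported response content type: {content_type}")


def embed_with_boto3(endpoint_name, sequences, region_name=None):
    """Call the endpoint with the low-level SageMaker Runtime client."""
    import boto3  # imported here so the helpers above work without boto3

    runtime = boto3.client("sagemaker-runtime", region_name=region_name)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # what CSVSerializer sends
        Accept="application/x-npy",  # NumpyDeserializer's default Accept value
        Body=build_csv_body(sequences).encode("utf-8"),
    )
    return decode_response(response["Body"].read(), response["ContentType"])
```

For example, `embed_with_boto3(esm_embeddings_predictor.endpoint_name, sequences, region_name=region)`, where `sequences` is a list of protein sequence strings.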

### 2.3. Test model

```python
# Two protein sequences, comma-separated, sent as a single CSV request
esm_embeddings_predictor.predict(
    "MSLKRKNIALIPAAGIGVRFGADKPKQYVEIGSKTVLEHVL,MIQSQINRNIRLDLADAILLSKAKKDLSFAEIADGTGLA"
)
```
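The `ml.g5.xlarge` endpoint keeps running, and billing, until you delete it. A minimal teardown sketch using the SageMaker Python SDK's `Predictor.delete_model()` and `Predictor.delete_endpoint()` methods (the wrapper function name is hypothetical):

```python
def delete_esm_endpoint(predictor):
    """Delete the model(s) and endpoint created by deploy()."""
    # delete_model() looks up model names through the endpoint configuration,
    # so run it before delete_endpoint(), which removes that configuration
    # by default (delete_endpoint_config=True).
    predictor.delete_model()
    predictor.delete_endpoint()
```

When you're finished testing, run `delete_esm_endpoint(esm_embeddings_predictor)`.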