# Lab2 (deployment): Provision resources for running the AMPLIFY model on Amazon SageMaker

This notebook guides you through setting up the Amazon SageMaker infrastructure needed to generate protein embeddings (and masked-language-model logits) with the AMPLIFY model. We'll create each required resource step by step.



## Step 1: Setup and Configuration

First, let's get our AWS account information and set up variables we'll use throughout the notebook.

In [None]:
import boto3
from sagemaker import get_execution_role, Session, image_uris
from sagemaker.pytorch import PyTorchModel
import numpy as np


##########################################################

# Get AWS account information
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()['Account']
region = boto3.Session().region_name

# Define S3 bucket and folder names
S3_BUCKET = f'workshop-data-{account_id}'
LAB1_FOLDER = 'lab1-progen'
LAB2_FOLDER = 'lab2-amplify'
LAB3_FOLDER = 'lab3-esmfold'

print(f"Account ID: {account_id}")
print(f"Region: {region}")
print(f"S3 Bucket: {S3_BUCKET}")

##########################################################


## Step 2: Define the variables and resources required to deploy the AMPLIFY model on Amazon SageMaker

### Step 2.1: Define the version of the AMPLIFY model, instance type, and execution role

In [None]:
# Define model
model_version = 'AMPLIFY_120M'
model_id = f'chandar-lab/{model_version}'

# Define instance type
instance_type = 'ml.c6i.xlarge'

# Retrieve execution role
execution_role = get_execution_role()

print(f"Model: {model_id}")
print(f"Instance type: {instance_type}")
print(f"Role: {execution_role}")

### Step 2.2: Retrieve the URI of a base container image
SageMaker uses this prebuilt PyTorch inference image as the base container for the endpoint.

In [None]:
# framework + version select the PyTorch 2.3 inference image;
# instance_type determines whether the CPU or GPU variant is returned
image = image_uris.retrieve(
    framework='pytorch',
    region=region,
    version='2.3',
    image_scope='inference',
    instance_type=instance_type)

print(f'Image URI: {image}')

### Step 2.3: Create and upload model artifact
Package the model files into a .tar.gz archive and upload it to S3 for SageMaker endpoint deployment. In our case, the archive contains only a placeholder file, because the model weights are downloaded from Hugging Face when the container starts. If you had trained or fine-tuned the model, this artifact would instead contain your custom model weights and configuration files.

In [None]:
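model_data_location = f"s3://{S3_BUCKET}/{LAB2_FOLDER}/model_data.tar.gz"

# Create a placeholder artifact (-p also creates missing parent folders and makes re-runs safe)
!mkdir -p data/$LAB2_FOLDER/model_data
!touch data/$LAB2_FOLDER/model_data/data.txt
# -C stores data.txt at the archive root instead of under data/<folder>/model_data/
!tar -czvf data/$LAB2_FOLDER/model_data.tar.gz -C data/$LAB2_FOLDER/model_data .
!aws s3 cp data/$LAB2_FOLDER/model_data.tar.gz $model_data_location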
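If you prefer to avoid shell magics, the same packaging step can be sketched with Python's standard-library `tarfile` module. This version stores each file relative to the source directory, so the files land at the root of `/opt/ml/model` when SageMaker extracts the archive. `package_model_artifact` is a hypothetical helper name, not part of the SageMaker SDK.

```python
import tarfile
from pathlib import Path


def package_model_artifact(src_dir: str, archive_path: str) -> list:
    """Gzip-tar every file under src_dir, storing paths relative to src_dir.

    Returns the member names actually written, as a quick sanity check.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # arcname drops the local prefix, e.g. data/lab2-amplify/model_data/
                tar.add(str(path), arcname=path.relative_to(src).as_posix())
    # Re-open the archive to report what was stored
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, `package_model_artifact(f"data/{LAB2_FOLDER}/model_data", f"data/{LAB2_FOLDER}/model_data.tar.gz")` should report `['data.txt']`; you could then upload the archive with `boto3.client("s3").upload_file(archive_path, S3_BUCKET, f"{LAB2_FOLDER}/model_data.tar.gz")`, which writes to the same key as `model_data_location`.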

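### Step 2.4: Create the custom inference script and requirements

The custom inference script defines four handler functions that tell the SageMaker PyTorch serving container how to:
* ```model_fn``` - load the model and tokenizer (called once, when the container starts)
* ```input_fn``` - deserialize the JSON request into a protein sequence and an output mode
* ```predict_fn``` - tokenize the sequence and run inference on the model
* ```output_fn``` - serialize the logits or embeddings into a JSON response

Both the script and ```requirements.txt``` are written to a local ```code/``` directory. At deployment time, the SageMaker Python SDK repackages this directory together with the model artifact, and the container installs the listed requirements when it starts.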
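To make the call order concrete, here is a minimal, self-contained sketch of how the serving container chains these handlers: ```model_fn``` runs once at startup, and each request then flows through ```input_fn```, ```predict_fn```, and ```output_fn```. The toy tokenizer and model and the `handle_request` helper are hypothetical stand-ins so the sketch runs without PyTorch or a model download; only the call order and the JSON request/response contract mirror the real script.

```python
import json
from types import SimpleNamespace


def model_fn(model_dir):
    # Hypothetical toy tokenizer/model; the real script loads AMPLIFY instead.
    def tokenizer(sequence):
        return [ord(residue) for residue in sequence]

    def model(tokens, output_hidden_states=False):
        logits = [[float(t)] for t in tokens]
        hidden = [[[float(t), 0.0] for t in tokens]] if output_hidden_states else None
        return SimpleNamespace(logits=logits, hidden_states=hidden)

    return model, tokenizer, "cpu"


def input_fn(request_body, content_type):
    # Deserialize the request into (sequence, mode), as the real input_fn does.
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    data = json.loads(request_body)
    return data["sequence"], data.get("mode", "logits")


def predict_fn(input_data, model_artifacts):
    sequence, mode = input_data
    model, tokenizer, _device = model_artifacts
    tokens = tokenizer(sequence)
    if mode == "logits":
        return model(tokens), mode
    if mode == "embeddings":
        return model(tokens, output_hidden_states=True), mode
    raise ValueError(f"Unknown mode: {mode}")


def output_fn(prediction, accept):
    # Keep only the last hidden layer for embeddings, mirroring the real output_fn.
    output, mode = prediction
    result = output.logits if mode == "logits" else output.hidden_states[-1]
    return json.dumps({mode: result}), accept


def handle_request(model_artifacts, body, content_type, accept):
    # Per-request chain: input_fn -> predict_fn -> output_fn.
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model_artifacts)
    return output_fn(prediction, accept)


# model_fn runs once; its return value is reused for every request.
artifacts = model_fn("/opt/ml/model")
response, response_type = handle_request(
    artifacts, '{"sequence": "MSV", "mode": "embeddings"}', "application/json", "application/json"
)
print(response)  # {"embeddings": [[77.0, 0.0], [83.0, 0.0], [86.0, 0.0]]}
```

Loading the model once in ```model_fn``` matters here because downloading and initializing AMPLIFY on every request would dominate latency.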

In [None]:
!mkdir -p data/$LAB2_FOLDER/code

In [None]:
%%writefile data/$LAB2_FOLDER/code/inference.py

import logging
import os
import json  
from transformers import AutoModel, AutoTokenizer
import torch

logging.basicConfig(level=logging.INFO)

def model_fn(model_dir):
    logging.info("[custom] model_fn: Starting the model loading process...")

    try:
        model_id = os.getenv('AMPLIFY_MODEL_ID', 'chandar-lab/AMPLIFY_120M')
        logging.info(f"[custom] model_fn: Model id is {model_id}")

        model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the model: {model}")

        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        logging.info(f"[custom] model_fn: Successfully loaded the tokenizer: {tokenizer}")

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = model.to(device)
        logging.info(f"[custom] model_fn: Moved model to {device} device")

        return model, tokenizer, device

    except Exception as e:
        logging.error(f"[custom] model_fn: Error occurred while loading the model and tokenizer: {str(e)}", exc_info=True)
        # Bare raise re-raises the original exception with its traceback intact
        raise

def input_fn(request_body, content_type='application/json'):
    logging.info("input_fn: Received input")
    if content_type == 'application/json':
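        # request_body may arrive as str or bytes; json.loads accepts either
        input_data = json.loads(request_body)
        sequence = input_data['sequence']
        # 'logits' (default) or 'embeddings'
        mode = input_data.get('mode', 'logits')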
        return sequence, mode
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model_artifacts):
    logging.info("predict_fn: Running inference")
    sequence, mode = input_data
    model, tokenizer, device = model_artifacts

    # Encode the amino-acid string as a batch containing one sequence of token IDs
    inputs = tokenizer.encode(sequence, return_tensors="pt")
    inputs = inputs.to(device)

    # Inference only: disable gradient tracking to save memory and time
    with torch.no_grad():
        if mode == 'logits':
            output = model(inputs)
        elif mode == 'embeddings':
            # Also return the per-layer hidden states; output_fn keeps the last one
            output = model(inputs, output_hidden_states=True)
        else:
            raise ValueError(f"Unknown mode: {mode}")

    return output, mode

def output_fn(prediction, accept='application/json'):
    logging.info("output_fn: Formatting output")
    output, mode = prediction
    
    if accept == 'application/json':
        if mode == 'logits':
            if hasattr(output, 'logits'):
                result = output.logits
            else:
                raise ValueError(f"Unknown prediction format: {type(output)}")
        elif mode == 'embeddings':
            if hasattr(output, 'hidden_states'):
                result = output.hidden_states[-1]
            else:
                raise ValueError(f"Unknown prediction format: {type(output)}")
        else:
            raise ValueError(f"Unknown mode: {mode}")
        # Convert the tensor to nested Python lists so it can be serialized to JSON
        return json.dumps({mode: result.tolist()}), accept
    else:
        raise ValueError(f"Unsupported accept type: {accept}")


In [None]:
%%writefile data/$LAB2_FOLDER/code/requirements.txt
transformers==4.37.0
xformers==0.0.28.post1
sentencepiece==0.2.0

## Step 3: Deploy and test the AMPLIFY model

### Step 3.1: Deploy the AMPLIFY model with PyTorchModel and the custom inference script

In [None]:
# Define SageMaker endpoint name
base_name = model_id.split("/")[-1].lower().replace("_", "-").replace(".", "-")
endpoint_name = f"{base_name}-endpoint"

print(f'Endpoint name: {endpoint_name}')

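# Define the SageMaker model: container image, placeholder artifact, and inference code
sm_model = PyTorchModel(
    model_data=model_data_location,
    role=execution_role,
    image_uri=image,
    source_dir=f'data/{LAB2_FOLDER}/code/',
    entry_point="inference.py",  # must match the script filename inside source_dir
    env={
        # Read by model_fn to choose which AMPLIFY checkpoint to download
        'AMPLIFY_MODEL_ID': model_id
    }
)

# Deploy the model to a real-time endpoint (this can take several minutes)
predictor = sm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name
)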
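SageMaker endpoint names may contain only ASCII letters, digits, and hyphens, and must start and end with a letter or digit, which is why the deployment cell replaces underscores and dots with hyphens. A small validator makes that rule explicit. The regular expression and the 63-character limit follow the `EndpointName` constraint in the SageMaker `CreateEndpoint` API reference as we understand it; `make_endpoint_name` is a hypothetical helper, so check the current reference before relying on it.

```python
import re

# Assumed EndpointName constraint (CreateEndpoint API reference):
# alphanumerics and hyphens, starting and ending with an alphanumeric, max 63 characters.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_ENDPOINT_NAME_LENGTH = 63


def make_endpoint_name(model_id: str, suffix: str = "endpoint") -> str:
    # Same normalization as the deployment cell: keep the repository name,
    # lowercase it, and replace characters that endpoint names do not allow.
    base = model_id.split("/")[-1].lower().replace("_", "-").replace(".", "-")
    name = f"{base}-{suffix}"
    if len(name) > MAX_ENDPOINT_NAME_LENGTH or not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid SageMaker endpoint name: {name!r}")
    return name


print(make_endpoint_name("chandar-lab/AMPLIFY_120M"))  # amplify-120m-endpoint
```

Validating before calling ```deploy``` surfaces a bad name immediately instead of after the SDK has started repackaging and uploading the model.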

### Step 3.2: Test the deployed model

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Send requests as JSON and parse JSON responses back into Python dicts
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

sequence = "MSVVGIDLGFQSCYVAVARAGGIETIANEYSDRCTPACISFGPKNR"

# generate logits
result = predictor.predict({"sequence": sequence, "mode": "logits"})
logits = np.array(result['logits'])

print(result)
print(f'Logits shape: {logits.shape}')
print()

# generate embeddings
result = predictor.predict({"sequence": sequence, "mode": "embeddings"})
embeddings = np.array(result['embeddings'])
print(result)
print(f'Embeddings shape: {embeddings.shape}')
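The endpoint returns one embedding vector per token. Many downstream uses (similarity search, clustering, simple classifiers) need a single vector per protein, and mean pooling over residue positions is a common, simple choice; it is a swapped-in technique, not something this lab prescribes. The sketch assumes the tokenizer adds exactly one special token at each end of the sequence, which you can check by comparing `embeddings.shape[1]` with `len(sequence)`: a difference of two is consistent with that assumption. `mean_pool` is a hypothetical helper.

```python
import numpy as np


def mean_pool(embeddings, trim_special_tokens=True):
    """Average per-token embeddings of shape (1, seq_len, hidden) into a (hidden,) vector."""
    embeddings = np.asarray(embeddings, dtype=float)
    if embeddings.ndim != 3 or embeddings.shape[0] != 1:
        raise ValueError(f"Expected shape (1, seq_len, hidden), got {embeddings.shape}")
    tokens = embeddings[0]
    if trim_special_tokens:
        # Assumption: exactly one special token at the start and one at the end.
        tokens = tokens[1:-1]
    if tokens.shape[0] == 0:
        raise ValueError("No residue positions left to pool")
    return tokens.mean(axis=0)
```

For example, `protein_vector = mean_pool(embeddings)` yields a 1-D array whose length is the model's hidden size; pass `trim_special_tokens=False` if the token count matches the sequence length.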
