# Deploy LLaVA on Amazon SageMaker

Amazon SageMaker is a popular platform for running AI models. This notebook deploys a LLaVA model, built on [Hugging Face Transformers](https://github.com/huggingface/transformers), to a real-time endpoint on [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).

![llava](https://i.imgur.com/YNVG140.png)

Install the SageMaker Python SDK and the dependencies listed in `code/requirements.txt`:

In [43]:
!pip install sagemaker --upgrade
!pip install -r code/requirements.txt

Collecting sagemaker
  Using cached sagemaker-2.221.1-py3-none-any.whl.metadata (14 kB)
Using cached sagemaker-2.221.1-py3-none-any.whl (1.5 MB)
Installing collected packages: sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.219.0
    Uninstalling sagemaker-2.219.0:
      Successfully uninstalled sagemaker-2.219.0
Successfully installed sagemaker-2.221.1
Collecting llava@ git+https://github.com/haotian-liu/LLaVA@v1.1.1 (from -r code/requirements.txt (line 1))
  Cloning https://github.com/haotian-liu/LLaVA (to revision v1.1.1) to /tmp/pip-install-sfvrcqmh/llava_dd56adf0c7fb4518932e199a75659d1c
  Running command git clone --filter=blob:none --quiet https://github.com/haotian-liu/LLaVA /tmp/pip-install-sfvrcqmh/llava_dd56adf0c7fb4518932e199a75659d1c
  Running command git checkout -q 1619889c712e347be1cb4f78ec66e7cf414ac1a6
  Resolved https://github.com/haotian-liu/LLaVA to commit 1619889c712e347be1cb4f78ec66e7cf414ac1a6
  Installing build dependenci

Bundle the LLaVA model weights and code into a `model.tar.gz`:

In [2]:
# download the model checkpoint from S3
!aws s3 cp s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/ ./ --recursive

download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/generation_config.json to ./generation_config.json
download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/config.json to ./config.json
download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/model-00005-of-00006.safetensors to ./model-00005-of-00006.safetensors
download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/model.safetensors.index.json to ./model.safetensors.index.json
download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/model-00004-of-00006.safetensors to ./model-00004-of-00006.safetensors
download: s3://sagemaker-us-west-2-726335585155/sagemaker-checkpoint-test/checkpoints-klook-0529-v2-10/special_tokens_map.json to ./special_tokens_map.json
download: s3://sagemaker-us-west-2-726335585

In [104]:
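# Create the SageMaker model.tar.gz artifact.
# Note: the archive is written into the directory being archived, so on a re-run `*` also
# matches model.tar.gz itself and tar warns that the file changed while it was being read.
# Adding --exclude=model.tar.gz to the command avoids this.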
!tar -cf model.tar.gz --use-compress-program=pigz *

tar: model.tar.gz: file changed as we read it
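
The same packaging step can be sketched in pure Python, which makes it easy to skip the archive itself and so avoid the "file changed as we read it" warning. This is a minimal sketch, not the notebook's own method: `build_model_archive` is a hypothetical helper name, and it uses the standard library's single-threaded gzip instead of `pigz`, so it will be slower on large checkpoints. It assumes the model files and the `code/` folder should sit at the root of the archive.

```python
import tarfile
from pathlib import Path


def build_model_archive(source_dir: str, archive_path: str) -> list[str]:
    """Pack every file under source_dir into a gzip-compressed tar archive.

    Returns the archive member names that were added (paths relative to source_dir).
    """
    source = Path(source_dir).resolve()
    archive = Path(archive_path).resolve()
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself, which may live in source_dir.
            if path.is_dir() or path.resolve() == archive:
                continue
            arcname = path.relative_to(source).as_posix()
            tar.add(path, arcname=arcname)
            added.append(arcname)
    return added
```

For example, `build_model_archive(".", "model.tar.gz")` would archive the current directory while leaving `model.tar.gz` out of its own contents.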


After creating the `model.tar.gz` archive, we can upload it to Amazon S3. We will use the `sagemaker` SDK to upload the model to our SageMaker session bucket.

First, initialize the SageMaker session:

In [105]:
import sagemaker
import boto3
sess = sagemaker.Session()
# SageMaker session bucket -> used for uploading data, models and logs
# SageMaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    # replace with the name of your own SageMaker execution role
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::726335585155:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole
sagemaker bucket: sagemaker-us-west-2-726335585155
sagemaker session region: us-west-2


In [106]:
!aws s3 rm s3://sagemaker-us-west-2-726335585155/llava-v1.5-13b/model.tar.gz

delete: s3://sagemaker-us-west-2-726335585155/llava-v1.5-13b/model.tar.gz


Upload the `model.tar.gz` to our sagemaker session bucket:

In [None]:
from sagemaker.s3 import S3Uploader

# upload model.tar.gz to s3
s3_model_uri = S3Uploader.upload(local_path="./model.tar.gz", desired_s3_uri=f"s3://{sess.default_bucket()}/llava-v1.5-13b-v1")

print(f"model uploaded to: {s3_model_uri}")

We will use `HuggingFaceModel` to create our real-time inference endpoint:

In [None]:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_model_uri,      # path to your model and script
   role=role,                    # iam role with permissions to create an Endpoint
   transformers_version="4.28.1",  # transformers version used
   pytorch_version="2.0.0",       # pytorch version used
   py_version='py310',            # python version used
   model_server_workers=1
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600, # increase timeout for large models
    model_data_download_timeout=600, # increase timeout for large models
)

The `.deploy()` method returns a `HuggingFacePredictor` object, which can be used to request inference via its `.predict()` method. Our endpoint expects a JSON payload with at least the `image` and `question` keys.
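
A small helper can make the payload contract explicit before a request is sent. The sketch below is illustrative only: `build_payload` is a hypothetical name, and the optional keys (`max_new_tokens`, `temperature`, `stop_str`) are taken from the commented-out fields in the request cell. Whether the endpoint honors them depends on the inference script packaged in `model.tar.gz`, which is not shown here.

```python
from typing import Any

# Keys the endpoint is described as requiring.
REQUIRED_KEYS = ("image", "question")
# Assumed optional keys, copied from the commented-out fields in the request cell.
OPTIONAL_KEYS = ("max_new_tokens", "temperature", "stop_str")


def build_payload(image: str, question: str, **options: Any) -> dict:
    """Return a request dict with the required keys plus any recognized options."""
    if not image or not question:
        raise ValueError("both 'image' and 'question' must be non-empty")
    unknown = sorted(set(options) - set(OPTIONAL_KEYS))
    if unknown:
        raise ValueError(f"unrecognized option(s): {unknown}")
    payload = {"image": image, "question": question}
    payload.update(options)
    return payload
```

The result can be passed straight to the predictor, for example `predictor.predict(build_payload(url, "Describe the image", temperature=0.2))`.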

In [None]:
%%time
data = {
    "image" : 'https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png', 
    "question" : "Describe the image and color details.",
    # "max_new_tokens" : 1024,
    # "temperature" : 0.2,
    # "stop_str" : "###"
}

# request
output = predictor.predict(data)
print(output)

To run inference with `llava` special token:
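
A later cell calls a `get_prompt` helper that returns a `(prompt, stop_str)` pair, but its definition is not included in this notebook. The sketch below is a self-contained stand-in, not the original helper. The system prompt, the `<image>` placeholder, the `USER`/`ASSISTANT` roles and the separators are assumptions modeled on LLaVA's v1 conversation template and should be checked against the `llava` package version you installed.

```python
# Assumed values, modeled on the LLaVA "llava_v1" conversation template.
# Verify them against the installed llava package before relying on them.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
IMAGE_TOKEN = "<image>"          # assumed placeholder for the image features
ROLES = ("USER", "ASSISTANT")    # assumed role names
SEP, SEP2 = " ", "</s>"          # assumed turn separator and stop string


def get_prompt(raw_prompt: str) -> tuple[str, str]:
    """Wrap a plain question in the assumed chat template.

    Returns (prompt, stop_str), matching how the helper is used later on.
    """
    user_turn = f"{IMAGE_TOKEN}\n{raw_prompt}"
    prompt = f"{SYSTEM_PROMPT}{SEP}{ROLES[0]}: {user_turn}{SEP}{ROLES[1]}:"
    return prompt, SEP2
```

In a real deployment it is probably safer to build the prompt from the conversation templates shipped with the `llava` package, so the template stays in sync with the model checkpoint.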

## Inference test

In [87]:
%%time
# Load the hotel image URLs used for the inference test
# (endpoint instance types tried: g5.2xlarge, g5.4xlarge)

import pandas as pd

df = pd.read_csv('../../../../klook/data0527/original_hotel_image_data.csv')

df.head(10)

CPU times: user 4.04 ms, sys: 0 ns, total: 4.04 ms
Wall time: 3.2 ms


| Unnamed: 0 | label | url |
|---|---|---|
| 0 | Exterior | https://i.travelapi.com/lodging/1000000/920000... |
| 1 | Exterior | https://res.klook.com/klook-hotel/image/upload... |
| 2 | Exterior | http://photos.hotelbeds.com/giata/bigger/75/75... |
| 3 | Exterior | https://res.klook.com/klook-hotel/image/upload... |
| 4 | Exterior | https://res.klook.com/klook-hotel/image/upload... |
| 5 | Exterior | https://q-xx.bstatic.com/xdata/images/hotel/ma... |
| 6 | Exterior | https://res.klook.com/klook-hotel/image/upload... |
| 7 | Exterior | https://res.klook.com/image/upload/v1662706162... |
| 8 | Exterior | https://i.travelapi.com/lodging/104000000/1030... |
| 9 | Exterior | https://res.klook.com/klook-hotel/image/upload... |


In [100]:
%%time

res = []
for i in range(10):
    # select the image URL by column name; position 1 is the `label` column
    img_path = df["url"].iloc[i]
    data = {
        "image": img_path,
        "question": "Describe the image",
        # "max_new_tokens" : 1024,
        # "temperature" : 0.2,
        # "stop_str" : "###"
    }

    # send the requests one at a time
    output = predictor.predict(data)
    res.append(output)


CPU times: user 20.1 ms, sys: 1.1 ms, total: 21.2 ms
Wall time: 10.1 s


In [101]:
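# build the list of request payloads up front
input_ls = []
for i in range(10):
    # select the image URL by column name; position 1 is the `label` column
    img_path = df["url"].iloc[i]
    data = {
        "image": img_path,
        "question": "Describe the image",
    }
    input_ls.append(data)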

In [102]:
%%time
import multiprocessing



def process_function(data):
    output = predictor.predict(data)
    return output

pool = multiprocessing.Pool(processes=4)  # Create a pool of 4 worker processes

results = pool.map(process_function, input_ls)  # Apply process_function to every payload in input_ls in parallel

pool.close()
pool.join()  # Wait for all worker processes to finish


Process ForkPoolWorker-78:
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/multiprocessing/pool.py", line 695, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/multiprocessing/pool.py", line 675, in _help_stuff_finish
    inqueue._rlock.acquire()
KeyboardInterrupt: 
Process ForkPoolWorker-76:
Process ForkPoolWorker-75:
Process ForkPoolWorker-77:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/

CPU times: user 50.1 ms, sys: 630 ms, total: 680 ms
Wall time: 28min 9s
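
The multiprocessing run above was interrupted after roughly 28 minutes. Since each `predict` call mostly waits on the network, a thread pool is one alternative worth trying. The sketch below is offered under assumptions rather than as a diagnosis of the hang: `predict_concurrently` is a hypothetical helper, and it assumes the predictor's `predict` method can safely be called from several threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def predict_concurrently(
    predict_fn: Callable[[dict], Any],
    payloads: Iterable[dict],
    max_workers: int = 4,
) -> list:
    """Call predict_fn on each payload using a pool of threads.

    Results are returned in the same order as the input payloads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if calls finish out of order.
        return list(pool.map(predict_fn, payloads))
```

With the objects defined in this notebook, the call would look like `results = predict_concurrently(predictor.predict, input_ls)`. Keep in mind that the endpoint was deployed with `model_server_workers=1` on a single instance, so concurrent requests may still be processed one at a time on the server side.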



The inference `predictor` can also be initialized from the `endpoint_name` of an endpoint that is already deployed:

In [22]:
import sagemaker
import boto3
sess = sagemaker.Session()
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    # replace with the name of your own SageMaker execution role
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']

from sagemaker.huggingface.model import HuggingFacePredictor
# initialize the predictor from an existing endpoint
predictor2 = HuggingFacePredictor(
    endpoint_name="huggingface-pytorch-inference-2024-06-04-10-51-51-665",
    sagemaker_session=sess
)

In [24]:
raw_prompt = "Describe the image and color details."
prompt, stop_str = get_prompt(raw_prompt)
image_path = "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png"
data = {"image" : image_path, "question" : prompt}
output = predictor2.predict(data)
print(output)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "model_fn definition takes 1 or 2 arguments but 3 were given."
}
". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2024-06-04-10-51-51-665 in account 726335585155 for more information.

To clean up, we can delete the model and endpoint with `delete_model()` and `delete_endpoint()`, or use the SageMaker console:

In [54]:
# delete the SageMaker model, then the endpoint
predictor.delete_model()
predictor.delete_endpoint()