## Deploy BLIP2 Endpoint on SageMaker

In this notebook, we deploy a BLIP-2 endpoint using the DJL Serving container image.

This notebook has been tested in the SageMaker Studio Notebook environment with the Python 3 Data Science environment.

### Setup

In [1]:
!pip install sagemaker boto3 huggingface_hub --upgrade --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.29.75 requires botocore==1.31.75, but you have botocore 1.31.85 which is incompatible.
tokenizers 0.14.1 requires huggingface_hub<0.18,>=0.16.4, but you have huggingface-hub 0.19.1 which is incompatible.

In [2]:
import sagemaker
import jinja2
from sagemaker import image_uris
import boto3
import os
import time
import json
from pathlib import Path
import base64

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [3]:
role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # bucket to house artifacts

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [4]:
model_bucket = sess.default_bucket()  # bucket to house artifacts
s3_code_prefix = "blip2"  # folder within bucket where code artifact will go
s3_model_prefix = "model_blip2"  # folder within bucket where model artifacts will go
region = sess.boto_region_name
account_id = sess.account_id()

s3_client = boto3.client("s3")
sm_client = boto3.client("sagemaker")
smr_client = boto3.client("sagemaker-runtime")

jinja_env = jinja2.Environment()

# define a variable to contain the s3url of the location that has the model
pretrained_model_location = f"s3://{model_bucket}/{s3_model_prefix}/"
print(f"Pretrained model will be uploaded to ---- > {pretrained_model_location}")

Pretrained model will be uploaded to ---- > s3://sagemaker-us-west-2-691188012938/model_blip2/


## Prepare inference script and container image

In [5]:
inference_image_uri = image_uris.retrieve(
    framework="djl-deepspeed", region=sess.boto_session.region_name, version="0.22.1"
)
inference_image_uri

'763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.22.1-deepspeed0.9.2-cu118'

In [6]:
blip_model_version = "blip2-flan-t5-xl"
model_names = {
    "caption_model_name": blip_model_version, #@param ["blip-base", "blip-large", "blip2-flan-t5-xl"]
}
with open("blip2/model_name.json",'w') as file:
    json.dump(model_names, file)

This notebook provides two ways to load the model when deploying to an endpoint:
- Load the model directly from Hugging Face
- Store the model artifacts on S3 and load the model from S3

The [Large Model Inference (LMI)](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-dlc.html) container uses [s5cmd](https://github.com/peak/s5cmd) to download data from S3, which significantly reduces the time needed to load the model during deployment. We therefore recommend loading the model from S3: follow the section below to download the model from Hugging Face and upload it to S3.

If you choose to load the model directly from Hugging Face during model deployment, you can skip the section below and jump to the section **Prepare the model tarball file and upload to S3**.

### [OPTIONAL] Download the model from Hugging Face and upload the model artifacts on Amazon S3
If you want to download your own copy of the model and upload it to an S3 location in your AWS account, follow the steps below; otherwise, skip to the next step.

In [7]:
from huggingface_hub import snapshot_download
from pathlib import Path

CAPTION_MODELS = {
    'blip-base': 'Salesforce/blip-image-captioning-base',   # 990MB
    'blip-large': 'Salesforce/blip-image-captioning-large', # 1.9GB
    'blip2-2.7b': 'Salesforce/blip2-opt-2.7b',              # 15.5GB
    'blip2-flan-t5-xl': 'Salesforce/blip2-flan-t5-xl',      # 15.77GB
}

# - This downloads the model into a folder under the directory where the Jupyter notebook is running
local_model_path = Path("./blip2-model")
local_model_path.mkdir(exist_ok=True)
model_name = CAPTION_MODELS[blip_model_version]
# Only download pytorch checkpoint files
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model"]

# - Use snapshot_download to download the model, since the repository stores the model files using Git LFS
model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_model_path,
    allow_patterns=allow_patterns,
)

Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.44G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/6.33G [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

(…)d4d1c37753c7e9c05a443a226614/config.json:   0%|          | 0.00/7.68k [00:00<?, ?B/s]

(…)e9c05a443a226614/special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

(…)9c05a443a226614/preprocessor_config.json:   0%|          | 0.00/432 [00:00<?, ?B/s]

(…)a443a226614/pytorch_model.bin.index.json:   0%|          | 0.00/128k [00:00<?, ?B/s]

(…)1c37753c7e9c05a443a226614/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

(…)c7e9c05a443a226614/tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Before running the next cell, make sure the download succeeded by checking that the files exist in the newly created folder `blip2-model/models--Salesforce--<model-name>/snapshots/...`.
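The check described above can also be scripted. The following is a minimal sketch, not part of the original notebook: the helper name `missing_snapshot_files` and the list of required file names are assumptions chosen for illustration, and the demo runs on a temporary directory instead of the real download.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_snapshot_files(snapshot_dir: str, required: List[str]) -> List[str]:
    """Return the names in `required` that are not present in `snapshot_dir`."""
    root = Path(snapshot_dir)
    return [name for name in required if not (root / name).exists()]


# Demo on a throwaway directory so the sketch runs without the real download.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    demo_missing = missing_snapshot_files(tmp, ["config.json", "tokenizer.json"])

print(demo_missing)  # ['tokenizer.json']
```

In the notebook, you would pass `model_download_path` as `snapshot_dir` together with the file names you expect, for example `config.json` and `pytorch_model.bin.index.json`, and continue only if the returned list is empty.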

In [8]:
# upload the model artifacts to s3
model_artifact = sess.upload_data(path=model_download_path, key_prefix=s3_model_prefix)
print(f"Model uploaded to --- > {model_artifact}")
print(f"We will set option.s3url={model_artifact}")

Model uploaded to --- > s3://sagemaker-us-west-2-691188012938/model_blip2
We will set option.s3url=s3://sagemaker-us-west-2-691188012938/model_blip2


In [9]:
!rm -rf {local_model_path}

SageMaker Large Model Inference containers can be used to host models without providing your own inference code. This is extremely useful when there is no custom pre-processing of the input data or post-processing of the model's predictions.

However, in this notebook, we demonstrate how to deploy a model with custom inference code.

SageMaker needs the model artifacts to be in tarball format. In this example, we provide the following files: `serving.properties`, `model.py`, and `requirements.txt`.
- `serving.properties` is the configuration file that tells DJL Serving which model parallelization and inference optimization libraries to use. Set the configuration that fits your needs. For details and an exhaustive list of configuration options, refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-configuration.html).
- `model.py` is the script that handles serving requests.
- `requirements.txt` is a text file listing any additional pip packages that need to be installed.

The `serving.properties` file in this notebook sets the following options:
- `option.model_id`: The model id of a pretrained model hosted in a model repository on huggingface.co (https://huggingface.co/models), or an S3 URL. Set a Hugging Face model id if you want the container to download the corresponding model repository from huggingface.co. If you set `model_id` to an S3 URL, DJL Serving downloads the model artifacts from S3 and swaps `model_id` for the actual location of the model artifacts. In your script, you can point to this value to load the pretrained model.
- `option.tensor_parallel_degree`: Set to the number of GPU devices over which the model needs to be partitioned. This parameter also controls the number of workers per model that are started when DJL Serving runs. For example, on an 8-GPU machine with 8 partitions, there is 1 worker per model to serve requests.
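As a rough illustration of the options above, the sketch below computes the worker count from the GPU count and renders the text of a `serving.properties` file. Both helper names are hypothetical. The rule that the tensor parallel degree must divide the GPU count evenly is an assumption made for this sketch, not something the notebook states.

```python
def workers_per_model(num_gpus: int, tensor_parallel_degree: int) -> int:
    """Workers per model, assuming each worker occupies `tensor_parallel_degree` GPUs."""
    if tensor_parallel_degree < 1 or num_gpus % tensor_parallel_degree != 0:
        # Assumption for this sketch: only even splits of the GPUs are considered.
        raise ValueError("tensor_parallel_degree must evenly divide the GPU count")
    return num_gpus // tensor_parallel_degree


def render_serving_properties(model_id: str, tensor_parallel_degree: int = 1) -> str:
    """Build the text of a serving.properties file like the one used in this notebook."""
    lines = [
        "engine = Python",
        f"option.tensor_parallel_degree = {tensor_parallel_degree}",
        f"option.model_id = {model_id}",
    ]
    return "\n".join(lines) + "\n"


# The example from the text: 8 GPUs split into 8 partitions gives 1 worker.
demo_workers = workers_per_model(num_gpus=8, tensor_parallel_degree=8)
print(demo_workers)

# `model_id` can be a Hugging Face model id or an S3 URL.
demo_properties = render_serving_properties("Salesforce/blip2-flan-t5-xl")
print(demo_properties)
```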


In [10]:
%%writefile blip2/serving.properties
engine = Python
option.tensor_parallel_degree = 1
option.model_id = {{s3url}}

Overwriting blip2/serving.properties


In [11]:
# we plug in the appropriate model location into our `serving.properties` file based on the region in which this notebook is running
template = jinja_env.from_string(Path("blip2/serving.properties").read_text())
Path("blip2/serving.properties").write_text(
    template.render(s3url=pretrained_model_location)
)
!pygmentize blip2/serving.properties | cat -n

     1	engine = Python
     2	option.tensor_parallel_degree = 1
     3	option.model_id = s3://sagemaker-us-west-2-691188012938/model_blip2/


## Prepare the model tarball file and upload to S3

In [12]:
%%sh
tar czvf model.tar.gz blip2/

blip2/
blip2/.ipynb_checkpoints/
blip2/.ipynb_checkpoints/model-checkpoint.py
blip2/.ipynb_checkpoints/serving-checkpoint.properties
blip2/.ipynb_checkpoints/requirements-checkpoint.txt
blip2/serving.properties
blip2/requirements.txt
blip2/model_name.json
blip2/model.py
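The listing above shows that the `.ipynb_checkpoints` folder was packed into the archive as well. If you prefer to leave it out, the tarball can be built with Python's `tarfile` module instead. This is a minimal sketch under that assumption; the helper names are hypothetical and the demo runs on a temporary directory.

```python
import tarfile
import tempfile
from pathlib import Path


def _skip_checkpoints(info: tarfile.TarInfo):
    """Drop anything under a `.ipynb_checkpoints` directory; keep everything else."""
    return None if ".ipynb_checkpoints" in Path(info.name).parts else info


def make_model_tarball(source_dir: str, output_path: str) -> list:
    """Create a gzipped tarball of `source_dir` and return the archived member names."""
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(source_dir, arcname=Path(source_dir).name, filter=_skip_checkpoints)
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a throwaway folder that mimics the layout of `blip2/`.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "blip2"
    (src / ".ipynb_checkpoints").mkdir(parents=True)
    (src / ".ipynb_checkpoints" / "model-checkpoint.py").write_text("# stale copy")
    (src / "model.py").write_text("# handler")
    demo_names = make_model_tarball(str(src), str(Path(tmp) / "model.tar.gz"))

print(sorted(demo_names))  # ['blip2', 'blip2/model.py']
```

In the notebook, the equivalent call would be `make_model_tarball("blip2", "model.tar.gz")`.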


In [13]:
s3_code_artifact = sess.upload_data("model.tar.gz", bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}")

S3 Code or Model tar ball uploaded to --- > s3://sagemaker-us-west-2-691188012938/blip2/model.tar.gz


## Deploy model

In [14]:
from sagemaker.model import Model
from sagemaker.utils import name_from_base

model_name = name_from_base(blip_model_version)
model = Model(
    image_uri=inference_image_uri,
    model_data=s3_code_artifact,
    role=role,
    name=model_name,
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [None]:
%%time
endpoint_name = "endpoint-" + model_name
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=endpoint_name
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
------------

In [None]:
%store endpoint_name

## 2. Generate content descriptions for key frames

## 2.1 Extract frames from the video and upload them to S3

In [17]:
!pip install opencv-python



In [47]:
import cv2
import os
import time
import datetime
import sagemaker
from sagemaker import image_uris
import boto3
import json

In [74]:
s3_frame_prefix = 'videokeyframe'
video_name='bird_going_inside_a_bird_house'
video_path='../video/4/'+video_name+'.mp4'
frame_path='../video_frame/4/'

In [75]:
print(video_path)

../video/4/bird_going_inside_a_bird_house.mp4


In [76]:
def extract_and_upload_img(video_path,frame_path,video_name):
    # Saves sampled frames locally and returns their file names;
    # the upload to S3 happens in a later cell.
    frame_list=[]
    vc = cv2.VideoCapture(video_path)  # open the video file
    c = 1  # frame counter
    if vc.isOpened():  # check that the video opened successfully
        rval, frame = vc.read()
        print('yes')
    else:
        rval = False
    timeF = 35  # sampling interval, in frames
    while rval:  # keep reading frames until the video ends
        rval, frame = vc.read()
        if (c % timeF == 0):  # store one frame every timeF frames
            if frame is not None:
                #timestr = time.strftime("%Y%m%d-%H%M%S")
                filename = video_name+str(c)+'.jpg'
                cv2.imwrite(frame_path + filename, frame)
                frame_list.append(filename)
        c = c + 1
    vc.release()
    return frame_list
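To make the sampling rule concrete, the sketch below lists the file names that result when one frame is stored for every counter value that is a multiple of the interval. The helper name and the `last_counter` argument are hypothetical; the interval of 35 and the naming pattern are taken from the function above.

```python
def sampled_frame_names(video_name: str, last_counter: int, interval: int = 35) -> list:
    """File names for counter values interval, 2*interval, ... up to last_counter."""
    return [f"{video_name}{c}.jpg" for c in range(interval, last_counter + 1, interval)]


demo_frames = sampled_frame_names("bird_going_inside_a_bird_house", last_counter=105)
print(demo_frames)
```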

In [77]:
frame_list= extract_and_upload_img(video_path,frame_path,video_name)

yes


In [78]:
print(frame_list)

['bird_going_inside_a_bird_house35.jpg', 'bird_going_inside_a_bird_house70.jpg', 'bird_going_inside_a_bird_house105.jpg', 'bird_going_inside_a_bird_house140.jpg', 'bird_going_inside_a_bird_house175.jpg', 'bird_going_inside_a_bird_house210.jpg', 'bird_going_inside_a_bird_house245.jpg', 'bird_going_inside_a_bird_house280.jpg', 'bird_going_inside_a_bird_house315.jpg', 'bird_going_inside_a_bird_house350.jpg', 'bird_going_inside_a_bird_house385.jpg', 'bird_going_inside_a_bird_house420.jpg', 'bird_going_inside_a_bird_house455.jpg', 'bird_going_inside_a_bird_house490.jpg', 'bird_going_inside_a_bird_house525.jpg', 'bird_going_inside_a_bird_house560.jpg', 'bird_going_inside_a_bird_house595.jpg', 'bird_going_inside_a_bird_house630.jpg', 'bird_going_inside_a_bird_house665.jpg', 'bird_going_inside_a_bird_house700.jpg', 'bird_going_inside_a_bird_house735.jpg', 'bird_going_inside_a_bird_house770.jpg', 'bird_going_inside_a_bird_house805.jpg', 'bird_going_inside_a_bird_house840.jpg', 'bird_going_insid

In [79]:
!aws s3 cp --recursive {frame_path} s3://{bucket}/{s3_frame_prefix}

upload: ../video_frame/4/bird_going_inside_a_bird_house1015.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house1015.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house175.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house175.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house105.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house105.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house1120.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house1120.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house1085.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house1085.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house280.jpg to s3://sagemaker-us-west-2-691188012938/videokeyframe/bird_going_inside_a_bird_house280.jpg
upload: ../video_frame/4/bird_going_inside_a_bird_house105

## 2.2 Generate content descriptions with the VLM

In [80]:
from PIL import Image
import base64
import json
import boto3

smr_client = boto3.client("sagemaker-runtime")
endpoint_name = model.endpoint_name

In [81]:
def encode_image(img_file):
    with open(img_file, "rb") as image_file:
        img_str = base64.b64encode(image_file.read())
        base64_string = img_str.decode("latin1")
    return base64_string

def run_inference(endpoint_name, inputs):
    response = smr_client.invoke_endpoint(
        EndpointName=endpoint_name, Body=json.dumps(inputs)
    )
    return response["Body"].read().decode('utf-8')
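The sketch below shows the shape of the request body without calling the endpoint: the image bytes are base64-encoded into a JSON string, and the receiver can recover them exactly. The helper names are hypothetical, and the decoding side is an assumption about what the handler in `model.py` does, since that script is not shown here.

```python
import base64
import json


def build_payload(image_bytes: bytes, prompt: str, parameters: dict) -> str:
    """Serialize an image and prompt into a JSON request body."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")  # base64 output is plain ASCII
    return json.dumps({"prompt": prompt, "image": image_b64, "parameters": parameters})


def decode_image(payload: str) -> bytes:
    """Recover the raw image bytes from the request body (assumed server-side step)."""
    return base64.b64decode(json.loads(payload)["image"])


fake_image = b"\xff\xd8\xff\xe0 not a real JPEG"  # placeholder bytes for the demo
demo_body = build_payload(fake_image, "please describe this image.", {"max_new_tokens": 70})
demo_roundtrip = decode_image(demo_body)
print(demo_roundtrip == fake_image)  # True
```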

In [82]:
content_result_list=[]
length_output_params = {
    "max_new_tokens": 70,
    "min_new_tokens": 30,
    "early_stopping": True
}

# Parameters that control the generation strategy used
gen_strategy_params = {
    "do_sample": False,
    "num_beams": 2,
    "num_beam_groups": 1,
    "use_cache": True
}

gen_strategy_params.update(length_output_params)

for item in frame_list:
    image_path=frame_path+item
    raw_image = Image.open(image_path).convert('RGB')
    
    base64_string = encode_image(image_path)
    inputs = {"prompt": "please describe this image.", "image": base64_string,"parameters": gen_strategy_params}
    result=run_inference(endpoint_name, inputs)
    content_result_list.append(result)
    



#test_image = "../video_frame/2/33632_315.jpg.jpeg"
#raw_image = Image.open(test_image).convert('RGB')
#display(raw_image)

In [83]:
print(content_result_list)

['a bird is sitting in a birdhouse in a garden with a tree in the foreground and bushes in the back ground', 'a blue bird is sitting in a birdhouse in a garden surrounded by trees and shrubs a birdhouse in a garden', 'a birdhouse with a small bird inside of it and a tree in the back yard with a bird in the birdhouse and a tree in the back yard', 'a wooden birdhouse with a hole in the side and a bird in the nest on the roof of the birdhouse - birdhouses', 'a wooden birdhouse with a hole in the side and a bird in the nest on the roof of the birdhouse - birdhouses', 'a birdhouse with a small bird inside of it and a tree behind the birdhouse - a birdhouse with a small bird inside', 'a birdhouse with a bird in the hole and a tree in the back yard with a bird in the hole and a tree in the back yard with a bird in the hole', 'a wooden birdhouse with a hole in the side and a bird in the nest on the roof of the birdhouse - birdhouses', "a wooden birdhouse with a hole in the side and a bird on i

## 3. Use Bedrock Claude for different tasks

In [62]:
!pip install langchain
!pip install anthropic



In [63]:
from langchain.llms.bedrock import Bedrock
from langchain import LLMChain, PromptTemplate
import json
import os
import sys

import boto3
import botocore

from utils import bedrock

boto3_bedrock = bedrock.get_bedrock_client()

Create new client
  Using region: None
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)


In [64]:
from langchain.llms.bedrock import Bedrock

inference_modifier = {'max_tokens_to_sample':4096, 
                      "temperature":0.5,
                      "top_k":250,
                      "top_p":1,
                      "stop_sequences": ["\n\nHuman"]
                     }

textgen_llm = Bedrock(model_id = "anthropic.claude-v2",
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )

### 3.1 Video summary

In [91]:
with open('./utils/video_summary.txt', 'r', encoding="utf-8") as task:
    lines = task.readlines()
    multi_var_prompt = PromptTemplate(
    input_variables=["input"], 
    template="""

        Human: {}

        Assistant: Here is a simple video summary: """.format(lines)
        )
    prompt = multi_var_prompt.format(input=str(content_result_list))
    num_tokens = textgen_llm.get_num_tokens(prompt)
    print(f"Our prompt has {num_tokens} tokens")
    
    response = textgen_llm(prompt)
    content = response[1:]

Our prompt has 1099 tokens
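For reference, the prompt assembled above follows the `Human:` / `Assistant:` layout, with the frame captions substituted for `{input}` and a short prefill after `Assistant:`. The sketch below rebuilds that layout in plain Python, treating the task text as a single string. The helper name is hypothetical, and the task text is a made-up stand-in because the content of `./utils/video_summary.txt` is not shown.

```python
def build_claude_prompt(task_text: str, captions: list, prefill: str = "") -> str:
    """Fill `{input}` in the task text and wrap it in the Human/Assistant layout."""
    task = task_text.replace("{input}", str(captions))
    return f"\n\nHuman: {task}\n\nAssistant: {prefill}"


# Hypothetical task text; the real one lives in ./utils/video_summary.txt.
task_text = "Summarize the video described by these frame captions: {input}"
captions = ["a bird is sitting in a birdhouse", "a wooden birdhouse with a hole in the side"]
demo_prompt = build_claude_prompt(task_text, captions, prefill="Here is a simple video summary:")
print(demo_prompt)
```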


In [92]:
print(content)


This video shows a birdhouse in a garden with trees and bushes in the background. A small bird is sitting inside the birdhouse or on its roof. The birdhouse is made of wood and has holes on the sides and top for the bird to enter and exit. The video focuses on capturing the birdhouse from different angles with the bird inside its nest.


### 3.2 Generate a social media post

In [93]:
with open('./utils/social_media_post.txt', 'r', encoding="utf-8") as task:
    lines = task.readlines()
    multi_var_prompt = PromptTemplate(
    input_variables=["input"], 
    template="""

        Human: {}

        Assistant: ok """.format(lines)
        )
    prompt = multi_var_prompt.format(input=str(content_result_list))
    num_tokens = textgen_llm.get_num_tokens(prompt)
    print(f"Our prompt has {num_tokens} tokens")
    
    response = textgen_llm(prompt)
    content = response[response.index('\n')+1:]

Our prompt has 1355 tokens
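The cell above drops the first line of the response with `response.index('\n')`, which raises `ValueError` if the response contains no newline. A slightly more defensive variant, sketched here with a hypothetical helper name, uses `str.partition` and returns the text unchanged in that case.

```python
def drop_first_line(text: str) -> str:
    """Return everything after the first newline, or the text itself if there is none."""
    _head, sep, tail = text.partition("\n")
    return tail if sep else text


demo_multi = drop_first_line(" here is the post:\nWhat a lovely bird!")
demo_single = drop_first_line("What a lovely bird!")
print(demo_multi)
print(demo_single)
```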


In [94]:
print(content)


😊 This video shows some precious moments with my little feathered friend! Over the past year, I've had the joy of watching a sweet bird make its home in the wooden birdhouse in my backyard. 

Through changing seasons, my new neighbor stuck around, peeking its head out of the round hole in its cozy abode. I loved seeing its bright blue feathers and hearing its cheerful chirps on sunny mornings. 

Some of my favorite memories were watching it dart in and out of its home, perched up high on the birdhouse roof or nestled inside. No matter how much time passed, it always recognized me when I came outside. 

I feel so lucky to have made a new friend! This little bird brought me comfort and made me smile during difficult days. I hope it continues to stay in the neighborhood so we can enjoy more special moments together. 🐦


### 3.3 Image-based question answering

In [95]:
with open('./utils/VQA.txt', 'r', encoding="utf-8") as task:
    lines = task.readlines()
    multi_var_prompt = PromptTemplate(
    input_variables=["input","question"], 
    template="""

        Human: {}

        Assistant: """.format(lines)
        )
    prompt = multi_var_prompt.format(input=str(content_result_list),question="What kind of bird appeared today? ")
    num_tokens = textgen_llm.get_num_tokens(prompt)
    print(f"Our prompt has {num_tokens} tokens")
    
    response = textgen_llm(prompt)
    content = response[1:]

Our prompt has 1068 tokens


In [96]:
print(content)

Blue bird


In [None]:
# sm_client.delete_model(ModelName=model_name)
# sm_client.delete_endpoint(EndpointName=endpoint_name)
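The commented-out calls above remove the model and the endpoint but leave the endpoint configuration behind. The sketch below deletes all three in order. The function name is hypothetical, and `_RecordingClient` is a stand-in used only so the example runs without AWS access; in the notebook you would pass `sm_client` instead.

```python
def delete_endpoint_resources(sm_client, endpoint_name: str, model_name: str) -> None:
    """Delete the endpoint, its endpoint configuration, and the model."""
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    sm_client.delete_model(ModelName=model_name)


class _RecordingClient:
    """Stand-in for boto3.client("sagemaker") that only records the calls made."""

    def __init__(self):
        self.calls = []

    def describe_endpoint(self, EndpointName):
        self.calls.append(("describe_endpoint", EndpointName))
        return {"EndpointConfigName": EndpointName + "-config"}

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


demo_client = _RecordingClient()
delete_endpoint_resources(demo_client, "endpoint-demo", "model-demo")
print(demo_client.calls)
```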