# Deploying the Vicuna-13B Model

This lab deploys the Vicuna-13B model to an Amazon SageMaker endpoint using the Hugging Face Text Generation Inference (TGI) container.

In [None]:
!pip install -U sagemaker

### Initialize the environment

In [1]:
import sagemaker
import boto3
sess = sagemaker.Session()
# SageMaker session bucket: used for uploading data, models, and logs
# SageMaker creates this bucket automatically if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::169088282855:role/AmazonSageMaker-ExecutionRole-20191205T170039
sagemaker session region: us-west-2


### Retrieve the TGI image URI for SageMaker

In [2]:
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.8.2"
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.0-tgi0.8.2-gpu-py39-cu118-ubuntu20.04


### Configure the model

In [3]:
import json
from sagemaker.huggingface import HuggingFaceModel

# LLM config
model_id = "lmsys/vicuna-13b-v1.3"
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 600

# create HuggingFaceModel
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env={
    'HF_MODEL_ID': model_id,
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    # 'HF_MODEL_QUANTIZE': "bitsandbytes", # uncomment to quantize with bitsandbytes
  }
)
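The choice of a four-GPU instance can be motivated with a rough memory estimate. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes about 13 billion parameters stored in fp16 (2 bytes each) and about 24 GB of memory per GPU on `ml.g5.12xlarge`. The helper name `weights_gb` is hypothetical, and the estimate ignores activations and the KV cache, which need additional memory.

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumptions (not taken from the notebook): 13e9 parameters, fp16 weights,
# and roughly 24 GB of memory on each of the 4 GPUs.
num_params = 13e9
num_gpus = 4
gpu_mem_gb = 24

total = weights_gb(num_params)
per_gpu = total / num_gpus

print(f"weights: {total:.1f} GB total, {per_gpu:.1f} GB per GPU when sharded")
print(f"fits on a single GPU: {total < gpu_mem_gb}")
```

Under these assumptions the weights alone exceed the memory of one GPU, which is why the model is sharded across all four by setting `SM_NUM_GPUS` to 4.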


Start the deployment!

In [4]:
# Deploy model to an endpoint
# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  # volume_size=400, # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3
  container_startup_health_check_timeout=health_check_timeout, # 10 minutes to be able to load the model
)

-------------!

### Run inference

In [8]:
# define payload
prompt = """You are a helpful Assistant.

User: Give me 5 tips about how to run faster.
Assistant:"""

# hyperparameters for llm
payload = {
  "inputs": prompt,
  "parameters": {
    "do_sample": True,
    "top_p": 0.9,
    "temperature": 0.8,
    "max_new_tokens": 512,
    "repetition_penalty": 1.03,
    "stop": ["\nUser:","<|endoftext|>","</s>"]
  }
}

# send request to endpoint
response = llm.predict(payload)

# extract the assistant's response by dropping the echoed prompt
assistant = response[0]["generated_text"][len(prompt):]
assistant

'Here are five tips that may help you run faster:\n\n1. Incorporate regular stretching and strengthening exercises into your training routine to improve flexibility and muscular strength.\n2. Make sure you have proper running form, including a slight forward lean, a high cadence (steps per minute), and a short stride length.\n3. Run at a comfortable pace that allows you to maintain good form and rhythm. As you get fitter, you can gradually increase the intensity of your runs.\n4. Incorporate interval training into your routine, where you alternate between periods of high-intensity running and periods of rest or low-intensity running.\n5. Make sure you get enough rest and recovery time between runs. This will allow your body to repair and rebuild itself, which will ultimately help you run faster over time.'
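Slicing with `len(prompt)` works only when the endpoint echoes the prompt back verbatim. A slightly more defensive variant is sketched below; `extract_reply` is a hypothetical helper, not part of the SageMaker SDK, and it assumes the response text may or may not start with the prompt and may end with one of the stop sequences.

```python
from typing import Sequence


def extract_reply(generated_text: str, prompt: str, stop: Sequence[str] = ()) -> str:
    """Return only the model's reply from a generated string."""
    # Drop the echoed prompt only if the text actually starts with it.
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text

    # Cut at the earliest stop sequence found in the reply.
    cut = len(reply)
    for s in stop:
        idx = reply.find(s) if s else -1  # skip empty stop strings
        if idx != -1:
            cut = min(cut, idx)
    return reply[:cut].strip()


# Local demonstration with made-up strings; no endpoint is called.
demo_prompt = "User: Hi\nAssistant:"
demo_output = demo_prompt + " Hello there!\nUser:"
print(extract_reply(demo_output, demo_prompt, ["\nUser:", "</s>"]))
```

With the notebook's variables, the call would look like `extract_reply(response[0]["generated_text"], prompt, payload["parameters"]["stop"])`.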

In [9]:
%%time
new_prompt = f"""{prompt}{assistant}
User: How would you recommend I start learning Machine Learning?
Assistant:"""
# update payload
payload["inputs"] = new_prompt

# send request to endpoint
response = llm.predict(payload)

# extract and print the assistant's response
new_assistant = response[0]["generated_text"][len(new_prompt):]
print(new_assistant)

There are several ways to start learning machine learning, here are a few recommendations:

1. Start with the basics: Learn the fundamental concepts of machine learning such as supervised and unsupervised learning, regression, classification, clustering, and neural networks.
2. Choose a programming language: Python is a popular choice for machine learning due to its simplicity and versatility. You can use libraries like NumPy, Pandas, Scikit-learn, TensorFlow, and PyTorch to implement machine learning algorithms.
3. Practice with real-world datasets: Use publicly available datasets to practice your machine learning skills and gain hands-on experience.
4. Read books and online tutorials: There are many excellent books and online tutorials available that can provide a comprehensive introduction to machine learning.
5. Join a community or take a course: Joining a machine learning community or taking a course can provide you with additional support, resources, and guidance as you learn.
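The second request keeps the conversation going by appending the previous reply and a new user turn to the prompt. For longer conversations, the same concatenation can be sketched as a small helper. `build_prompt` is a hypothetical name, and the `User:` / `Assistant:` layout simply mirrors the prompts used in this notebook rather than any official template.

```python
from typing import List, Tuple


def build_prompt(system: str, turns: List[Tuple[str, str]], next_user_msg: str) -> str:
    """Assemble a multi-turn prompt in the User:/Assistant: layout used above."""
    lines = [system, ""]  # system message followed by a blank line
    for user_msg, assistant_msg in turns:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    # Leave the final assistant turn open so the model completes it.
    lines.append(f"User: {next_user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)


history = [("Hi", "Hello!")]
print(build_prompt("You are a helpful Assistant.", history, "How are you?"))
```

Because every earlier turn is resent on each request, the prompt grows with the conversation, so long histories may need truncation to stay within the model's context window.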


### Clean up resources

In [10]:
llm.delete_model()
llm.delete_endpoint()
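An endpoint keeps incurring charges until it is deleted, so it can help to make sure the endpoint deletion is still attempted when the model deletion fails. The sketch below shows one way to do that; `cleanup` and `FakePredictor` are hypothetical names, and the fake class exists only so the example runs without AWS access. With the real predictor, the call would be `cleanup(llm)`.

```python
def cleanup(predictor) -> list:
    """Delete the model, then the endpoint; return the names of failed steps."""
    failed = []
    for step in ("delete_model", "delete_endpoint"):
        try:
            getattr(predictor, step)()
        except Exception as exc:  # continue so the endpoint is still removed
            print(f"{step} failed: {exc}")
            failed.append(step)
    return failed


class FakePredictor:
    """Stand-in used only to exercise cleanup() locally."""

    def __init__(self):
        self.calls = []

    def delete_model(self):
        self.calls.append("delete_model")
        raise RuntimeError("simulated failure")

    def delete_endpoint(self):
        self.calls.append("delete_endpoint")


fake = FakePredictor()
print(cleanup(fake))
```

The model is deleted before the endpoint, in the same order as the cell above.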