# Deploying a Large Language Model Published on Hugging Face to SageMaker


* Target model
  
  meta-llama/Llama-2-13b-chat-hf
  
  https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

* HuggingFace Text Generation Inference Containers

  https://huggingface.co/blog/sagemaker-huggingface-llm

  https://aws.amazon.com/jp/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/


### Installing the SageMaker library

In [6]:
%pip install sagemaker transformers --upgrade


Requirement already up-to-date: sagemaker in /home/vscode/.local/lib/python3.8/site-packages (2.182.0)
Requirement already up-to-date: transformers in /home/vscode/.local/lib/python3.8/site-packages (4.32.1)
Note: you may need to restart the kernel to use updated packages.


---

### Specifying parameters

In [7]:
model_id = 'meta-llama/Llama-2-13b-chat-hf'
instance_type = 'ml.g5.12xlarge'
gpus = '4'

hugging_face_hub_token = 'hf_...' # Your Hugging Face access token (required because the Llama 2 models are gated)
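Hard-coding the token in the notebook makes it easy to leak when the notebook is shared or committed. A minimal sketch, assuming the token has been exported in an environment variable before the kernel starts; the variable name `HF_TOKEN` and the helper `load_hf_token` are illustrative choices, not part of the original notebook.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from an environment variable.

    HF_TOKEN is an assumed variable name, not something the notebook defines.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early instead of letting the container fail to download the gated model.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return token


# Usage, in place of the hard-coded value:
# hugging_face_hub_token = load_hf_token()
```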

### Imports

In [8]:
import sagemaker
import boto3


### Getting the IAM role

In [9]:
try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role_name = 'AmazonSageMaker-ExecutionRole-20230617T201891' # Role name with `AmazonSageMakerFullAccess` policy attached
	role = iam.get_role(RoleName=role_name)['Role']['Arn']


Couldn't call 'get_role' to get Role ARN from role name inspiron14 to get Role path.
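The fallback above calls `iam.get_role`, which needs IAM read permissions. When those are not available, the ARN can be assembled directly from the account ID and role name. This is a sketch under two assumptions: the role lives in the standard `aws` partition, and you know its path (roles created from the SageMaker console are often under `/service-role/`, so check it in the IAM console). `build_role_arn` is a hypothetical helper.

```python
def build_role_arn(account_id: str, role_name: str, path: str = "/") -> str:
    """Assemble an IAM role ARN of the form arn:aws:iam::<account>:role<path><name>.

    Assumes the standard "aws" partition (not China or GovCloud).
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    # Normalize the path so it always starts and ends with "/".
    if not path.startswith("/"):
        path = "/" + path
    if not path.endswith("/"):
        path += "/"
    return f"arn:aws:iam::{account_id}:role{path}{role_name}"


# Usage: the account ID can be read from STS without IAM permissions, e.g.
# account_id = boto3.client("sts").get_caller_identity()["Account"]
# role = build_role_arn(account_id, role_name, path="/service-role/")
```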


### Deploying to SageMaker

In [10]:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri

image_uri = get_huggingface_llm_image_uri(
  backend='huggingface', # or lmi
  # region=region
)

# Hub model configuration <https://huggingface.co/models>
hub = {
  'HF_MODEL_ID': model_id, # model_id from hf.co/models
  'HF_TASK':'text-generation',          # NLP task you want to use for predictions
  'HUGGING_FACE_HUB_TOKEN': hugging_face_hub_token,
  # 'HF_MODEL_QUANTIZE':'bitsandbytes',
  'SM_NUM_GPUS': gpus,
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
  env=hub,                            # configuration for loading model from Hub
  role=role,                          # IAM role with permissions to create an endpoint
  image_uri=image_uri
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=600,
)


----------!
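`SM_NUM_GPUS` sets how many GPUs the container spreads the model across, so it should match the instance; ml.g5.12xlarge has 4 GPUs, which is why the notebook uses `gpus = '4'`. A sketch that derives the value from the instance type instead of setting it by hand. The lookup table and `build_tgi_env` are hypothetical and cover only a few common instance types.

```python
# GPU counts per instance type (NVIDIA A10G on g5, A100 on p4d).
# Only a few common types are listed; extend as needed.
GPUS_PER_INSTANCE = {
    "ml.g5.xlarge": 1,
    "ml.g5.2xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
    "ml.p4d.24xlarge": 8,
}


def build_tgi_env(model_id: str, instance_type: str, hf_token: str) -> dict:
    """Build the container environment, deriving SM_NUM_GPUS from the instance type."""
    if instance_type not in GPUS_PER_INSTANCE:
        raise ValueError(f"unknown GPU count for {instance_type}")
    return {
        "HF_MODEL_ID": model_id,
        "HF_TASK": "text-generation",
        "HUGGING_FACE_HUB_TOKEN": hf_token,
        # Container environment variable values must be strings.
        "SM_NUM_GPUS": str(GPUS_PER_INSTANCE[instance_type]),
    }


# Usage: hub = build_tgi_env(model_id, instance_type, hugging_face_hub_token)
```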

### Inference

In [11]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
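# System prompt: "You are a sincere and excellent Japanese assistant."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"
# User request: "Write a short story whose plot is that a bear goes to the seaside, makes friends with a seal, and finally returns home."
text = "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"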

prompt = "{bos_token}{b_inst} {system}{prompt} {e_inst} ".format(
    bos_token=tokenizer.bos_token,
    b_inst=B_INST,
    system=f"{B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}",
    prompt=text,
    e_inst=E_INST,
)


data = {
   'inputs': prompt,
   'parameters': {
        'max_new_tokens': 256,
        'pad_token_id': tokenizer.pad_token_id,
        'eos_token_id': tokenizer.eos_token_id,
   }
}

# request
result = predictor.predict(data)

result


Downloading (…)okenizer_config.json: 100%|██████████| 776/776 [00:00<00:00, 104kB/s]
Downloading tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 7.56MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 1.84M/1.84M [00:00<00:00, 2.24MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 414/414 [00:00<00:00, 155kB/s]


[{'generated_text': "<s>[INST] <<SYS>>\nあなたは誠実で優秀な日本人のアシスタントです。\n<</SYS>>\n\nクマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。 [/INST] Ah, I see! As a sincere and excellent Japanese assistant, I would be delighted to help you with your request. Here's a short story based on the plot you provided:\n\n---\n\nKuma's Adventure\n\nKuma, a curious and adventurous young bear, lived in a cozy little den in the heart of the forest. One sunny day, he decided to explore the world beyond his home and set out towards the seaside. As he wandered along the sandy beach, he spotted a group of sea lions basking in the sun.\n\nKuma was fascinated by the sea lions and their playful antics. He approached them cautiously, not wanting to scare them away. To his surprise, the sea lions welcomed him with open flippers! They introduced themselves as Azuma, Rinko, and Taro, and invited Kuma to join them in their games.\n\nKuma was thrilled to have made some new friends and eagerly participated in their activiti
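The inference cell can be split into small, testable helpers. This is a sketch, and `build_llama2_prompt`, `build_payload` and `extract_completion` are hypothetical names. The prompt layout reproduces the cell above. `build_payload` drops parameters whose value is `None`, which assumes that unset values such as a missing pad token (`tokenizer.pad_token_id` is `None` when the tokenizer defines no pad token) are better left out of the request. The output above shows the endpoint echoing the prompt at the start of `generated_text`, so `extract_completion` removes it.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(system: str, user: str, bos: str = "<s>") -> str:
    """Single-turn Llama 2 chat prompt, same layout as the notebook cell."""
    return f"{bos}{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST} "


def build_payload(prompt: str, max_new_tokens: int = 256, **params) -> dict:
    """Request body for the endpoint; parameters whose value is None are dropped."""
    parameters = {"max_new_tokens": max_new_tokens, **params}
    return {
        "inputs": prompt,
        "parameters": {k: v for k, v in parameters.items() if v is not None},
    }


def extract_completion(result: list, prompt: str) -> str:
    """Return the first generated_text with the echoed prompt removed, if present."""
    text = result[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Usage with the notebook's objects:
# prompt = build_llama2_prompt(DEFAULT_SYSTEM_PROMPT, text, bos=tokenizer.bos_token)
# data = build_payload(prompt, pad_token_id=tokenizer.pad_token_id, eos_token_id=tokenizer.eos_token_id)
# print(extract_completion(predictor.predict(data), prompt))
```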

### Deleting the endpoint

In [12]:
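# Delete the endpoint but keep its endpoint configuration (the default, True, deletes it as well)
predictor.delete_endpoint(delete_endpoint_config=False)
# Delete the model registered in SageMaker
predictor.delete_model()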
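If the kernel was restarted and `predictor` no longer exists, the same resources can be removed with the low-level boto3 SageMaker client, whose `delete_endpoint`, `delete_endpoint_config` and `delete_model` calls take the resource names as keyword arguments. A sketch; `cleanup_endpoint` is a hypothetical helper, and the resource names must be looked up in the SageMaker console. The SDK's `deploy` usually gives the endpoint configuration the same name as the endpoint, but verify this before deleting.

```python
def cleanup_endpoint(sm_client, endpoint_name: str, endpoint_config_name: str, model_name: str) -> None:
    """Delete the endpoint, its endpoint configuration and the model, in that order.

    sm_client is expected to be boto3.client("sagemaker").
    """
    # Deleting the endpoint first stops the billable instances.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)


# Usage:
# sm = boto3.client("sagemaker")
# cleanup_endpoint(sm, "<endpoint-name>", "<endpoint-config-name>", "<model-name>")
```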
