# Deploying a Large Language Model Published on Hugging Face to SageMaker


* Target model
  
  matsuo-lab/weblab-10b-instruction-sft
  
  https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft
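
The "10b" in the model name suggests roughly 10 billion parameters, which largely determines how much GPU memory the hosting instance needs. The sketch below is a back-of-the-envelope estimate of the weights-only footprint. `estimate_weight_gb` is a hypothetical helper, the parameter count is an assumption taken from the model name rather than a measured figure, and the 24 GB comparison value assumes a single NVIDIA A10G GPU such as the one on the `ml.g5.8xlarge` instance used later in this notebook. Activations, the KV cache, and framework overhead are not counted.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_weight_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the approximate size of the model weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 10e9  # assumption: "10b" in the model name
GPU_MEMORY_GB = 24  # assumption: one NVIDIA A10G

for dtype in BYTES_PER_PARAM:
    size_gb = estimate_weight_gb(NUM_PARAMS, dtype)
    verdict = "fits" if size_gb < GPU_MEMORY_GB else "does not fit"
    print(f"{dtype}: {size_gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, half-precision weights leave only a few GB of headroom on a 24 GB GPU, so memory pressure is worth ruling out if the container fails to start.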


### Installing the SageMaker library

In [6]:
%pip install sagemaker --upgrade


Requirement already up-to-date: sagemaker in /home/vscode/.local/lib/python3.8/site-packages (2.180.0)
Note: you may need to restart the kernel to use updated packages.


### Imports

In [7]:
import sagemaker
import boto3


### Getting the IAM role

In [8]:
try:
    # Works when running inside SageMaker (Studio or a notebook instance)
    role = sagemaker.get_execution_role()
except ValueError:
    # Outside SageMaker, look up the role ARN by name instead
    iam = boto3.client('iam')
    role_name = 'AmazonSageMaker-ExecutionRole-20230617T201891'  # role with the `AmazonSageMakerFullAccess` policy attached
    role = iam.get_role(RoleName=role_name)['Role']['Arn']


Couldn't call 'get_role' to get Role ARN from role name inspiron14 to get Role path.


### Specifying the model name and other parameters

In [9]:
model_id = 'matsuo-lab/weblab-10b-instruction-sft'
instance_type = 'ml.g5.8xlarge'
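
Beyond the model ID and instance type, a model of this size usually needs a few more settings: the Hugging Face LLM container reads values such as `SM_NUM_GPUS`, `MAX_INPUT_LENGTH`, and `MAX_TOTAL_TOKENS` from its environment, and `deploy()` accepts `container_startup_health_check_timeout` to allow more time for the weights to load before health checks count. The sketch below only assembles these settings. `build_tgi_env` and `build_deploy_kwargs` are hypothetical helpers, and the token limits and the 600-second timeout are illustrative values, not tuned ones.

```python
import json


def build_tgi_env(model_id, num_gpus=1, max_input_length=1024, max_total_tokens=2048):
    """Build container environment variables; every value must be a string."""
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def build_deploy_kwargs(instance_type, startup_timeout_s=600):
    """Build keyword arguments for HuggingFaceModel.deploy()."""
    return {
        "initial_instance_count": 1,
        "instance_type": instance_type,
        # Seconds the container is given to start and pass health checks.
        "container_startup_health_check_timeout": startup_timeout_s,
    }


env = build_tgi_env("matsuo-lab/weblab-10b-instruction-sft")
deploy_kwargs = build_deploy_kwargs("ml.g5.8xlarge")
print(env)
print(deploy_kwargs)
```

These dictionaries would then be passed as `HuggingFaceModel(env=env, role=role, image_uri=image_uri)` and `huggingface_model.deploy(**deploy_kwargs)`.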


### Deploying to SageMaker

In [10]:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri

image_uri = get_huggingface_llm_image_uri(
  backend='huggingface', # or lmi
  # region=region
)

# Hub model configuration <https://huggingface.co/models>
hub = {
  'HF_MODEL_ID': model_id, # model_id from hf.co/models
  'HF_TASK':'text-generation',          # NLP task you want to use for predictions
  # 'HF_MODEL_QUANTIZE':'bitsandbytes',
  # 'SM_NUM_GPUS': '4',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
  env=hub,                            # configuration for loading model from Hub
  role=role,                          # IAM role with permissions to create an endpoint
  # transformers_version='4.28',        # Transformers version used
  # pytorch_version='2.0',              # PyTorch version used
  # py_version='py310',                 # Python version used
  image_uri=image_uri
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
)


---------------------------------------------------*

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-08-27-01-23-53-255: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..
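
The error message points to CloudWatch, where SageMaker writes endpoint container logs under the log group `/aws/sagemaker/Endpoints/<endpoint-name>`. The following is a minimal sketch for reading the most recent log lines. `tail_endpoint_logs` is a hypothetical helper that takes a boto3 CloudWatch Logs client (`boto3.client('logs')`) as an argument, and it assumes the caller is allowed to read that log group.

```python
def tail_endpoint_logs(logs_client, endpoint_name, limit=50):
    """Return the latest messages from the most recently active log stream."""
    log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

    # Find the log stream that received events most recently.
    streams = logs_client.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )["logStreams"]
    if not streams:
        return []

    # Read the newest events from that stream.
    events = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=streams[0]["logStreamName"],
        limit=limit,
        startFromHead=False,
    )["events"]
    return [event["message"] for event in events]
```

With AWS credentials configured, it could be called as `tail_endpoint_logs(boto3.client('logs'), 'huggingface-pytorch-tgi-inference-2023-08-27-01-23-53-255')`, and the returned lines usually show whether the container ran out of memory, failed to download the model, or was still loading when the health check gave up.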

### Inference

In [None]:
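# Instruction (in Japanese): "Please explain large language models."
text = "大規模言語モデルについて説明してください。"
# Wrap the instruction in the prompt template. The Japanese header reads "Below is an instruction that describes a task. Write a response that appropriately satisfies the request."; "### 指示:" marks the instruction and "### 応答:" marks the response.
text = f'以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n\n### 指示:\n{text}\n\n### 応答:'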

data = {
   'inputs': text,
   'parameters': {
        'max_new_tokens': 100,
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.95  
   }
}

# request
result = predictor.predict(data)

result
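
The prompt template above can be wrapped in a small function so that every request uses the same format. This is a sketch: `build_prompt` and `extract_response` are hypothetical helpers, and `extract_response` assumes the endpoint returns a list shaped like `[{'generated_text': '...'}]`, which is the usual output of Hugging Face text-generation containers but is not shown in this notebook.

```python
PROMPT_HEADER = "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。"
RESPONSE_MARKER = "### 応答:"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the instruction/response template."""
    return f"{PROMPT_HEADER}\n\n### 指示:\n{instruction}\n\n{RESPONSE_MARKER}"


def extract_response(result) -> str:
    """Return only the model's answer from a text-generation result."""
    text = result[0]["generated_text"]
    # If the container echoes the prompt, drop everything up to the
    # first response marker; otherwise keep the text as returned.
    if RESPONSE_MARKER in text:
        text = text.split(RESPONSE_MARKER, 1)[1]
    return text.strip()
```

A request payload would then be built with `{'inputs': build_prompt(text), 'parameters': {...}}`, and the answer read with `extract_response(predictor.predict(data))`.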


### Deleting the endpoint

In [None]:
# Delete the endpoint but keep its endpoint configuration
predictor.delete_endpoint(delete_endpoint_config=False)
# Delete the SageMaker model resource
predictor.delete_model()
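
If `deploy()` raises, as it did in the deployment cell above, `predictor` is never assigned and these calls fail with a `NameError`, while the endpoint that was created may still exist in a failed state. The sketch below deletes the resources by name through a boto3 SageMaker client instead. `delete_by_name` is a hypothetical helper. It assumes the endpoint configuration shares the endpoint's name (pass `endpoint_config_name` explicitly if yours differs), and the model name has to be looked up separately, for example in the SageMaker console.

```python
def delete_by_name(sm_client, endpoint_name, model_name=None, endpoint_config_name=None):
    """Delete an endpoint, its configuration, and optionally its model."""
    # Assumption: the configuration was created under the endpoint's name.
    config_name = endpoint_config_name or endpoint_name
    deleted = []

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    deleted.append(("endpoint", endpoint_name))

    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    deleted.append(("endpoint_config", config_name))

    # The model name is only known to the caller, so it is optional.
    if model_name is not None:
        sm_client.delete_model(ModelName=model_name)
        deleted.append(("model", model_name))

    return deleted
```

It could be called as `delete_by_name(boto3.client('sagemaker'), 'huggingface-pytorch-tgi-inference-2023-08-27-01-23-53-255')`, using the endpoint name from the error message.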
