# Deploying Flan-T5-XXL in SageMaker

Setting up the role and S3 bucket that we will need later

In [2]:
import sagemaker
import boto3

sess = sagemaker.Session()
sagemaker_session_bucket = sess.default_bucket()
role = sagemaker.get_execution_role() 

[Optional] Deleting the `model.tar.gz` file, if one exists

In [3]:
import os
filePath = 'model/model.tar.gz'

if os.path.exists(filePath):
    os.remove(filePath)

In [4]:
model_name = "flan-t5-xxl"

Creating a new `model.tar.gz` file and uploading it to S3

In [5]:
%cd model
!tar zcvf model.tar.gz *
s3_location = f"s3://{sagemaker_session_bucket}/{model_name}/model.tar.gz"
!aws s3 cp model.tar.gz $s3_location
%cd ..

/home/ec2-user/SageMaker/deploy-flan-t5-sagemaker/model
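
For readers who prefer to stay in Python rather than shell out to `tar`, the packaging step above can be sketched with the standard `tarfile` module. This is a minimal sketch, not the notebook's own method: the helper name `build_model_archive` is hypothetical, and it assumes the archive should contain every non-hidden entry of the `model` directory, which is what `tar zcvf model.tar.gz *` picks up.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir, archive_name="model.tar.gz"):
    """Pack the contents of model_dir into model_dir/archive_name (gzip-compressed).

    Hypothetical helper; mirrors `tar zcvf model.tar.gz *` run inside model_dir.
    """
    model_dir = Path(model_dir)
    archive_path = model_dir / archive_name

    # Remove a stale archive first so it is never packed into the new one.
    archive_path.unlink(missing_ok=True)

    # List the entries before the archive file exists. Hidden entries are
    # skipped because the shell glob `*` does not match them either.
    entries = sorted(p for p in model_dir.iterdir() if not p.name.startswith("."))

    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in entries:
            # arcname keeps paths relative to model_dir instead of absolute.
            tar.add(str(entry), arcname=entry.name)
    return archive_path


# Possible usage in the notebook (the upload needs AWS credentials):
# archive = build_model_archive("model")
# boto3.client("s3").upload_file(
#     str(archive), sagemaker_session_bucket, f"{model_name}/model.tar.gz"
# )
```

Because the helper deletes any existing archive before listing the directory, it also covers the optional clean-up step from earlier in the notebook.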


Creating the Hugging Face Model, indicating the package versions we want to use and the S3 location with the inference code

In [10]:
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version='py38',
)

Deploying the model to an endpoint

In [11]:
from sagemaker.utils import name_from_base

endpoint_name = name_from_base(model_name)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    endpoint_name=endpoint_name,
)

---------!

!!!NOTE: Even after the endpoint has been deployed, we still need to wait 1-2 minutes before we can start using it. The model is downloaded from the Hugging Face Model Hub, and because of its size the download will not be quite finished when the endpoint is reported as deployed.
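
Instead of waiting a fixed amount of time, the first request can be retried until the model has finished loading. The following is a minimal sketch under stated assumptions: `predict_with_retry` is a hypothetical helper, the retry count and delay are arbitrary defaults, and it assumes that a request sent too early raises an exception (the exact exception type is not assumed, so the handler is deliberately broad).

```python
import time


def predict_with_retry(predict, payload, attempts=10, delay_seconds=15.0):
    """Call predict(payload) until it succeeds or the attempts run out.

    Hypothetical helper: `predict` is any callable that raises while the
    endpoint is not ready yet, e.g. `predictor.predict`.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return predict(payload)
        except Exception as error:  # broad on purpose, see the note above
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"endpoint not ready after {attempts} attempts") from last_error


# Possible usage once `predictor` and `data` are defined:
# res = predict_with_retry(predictor.predict, data)
```

With the defaults above the helper gives up after roughly two and a half minutes, which is slightly longer than the 1-2 minutes mentioned in the note.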

In [12]:
predictor.endpoint_name

'flan-t5-xxl-2023-03-10-07-09-14-864'

In [13]:
prompt = """Answer the following question by reasoning step by step.
The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have now?"""

In [19]:
data = {
    "inputs": prompt,
    "min_length": 20,
    "max_length": 50,
    "do_sample": True,
    "temperature": 0.6,
}

res = predictor.predict(data=data)
print(res)

They used 20 apples, so they have 23 - 20 = 3 apples now. They bought 6, so they have 3 + 6 = 9 apples now. Therefore, the answer is 9.
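
The same request can also be sent without the `predictor` object, for example from an application that only has the endpoint name. The sketch below uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. The helper name `invoke_endpoint_json` is hypothetical, and it assumes the endpoint accepts and returns JSON, which should be checked against the inference code packaged in `model.tar.gz`.

```python
import json


def invoke_endpoint_json(runtime_client, endpoint_name, payload):
    """Send payload as JSON to a SageMaker endpoint and decode the JSON reply.

    Hypothetical helper; `runtime_client` is expected to be a boto3
    "sagemaker-runtime" client.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Possible usage (needs AWS credentials and a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(invoke_endpoint_json(runtime, predictor.endpoint_name, data))
```

Passing the client in as an argument keeps the helper free of credentials and makes it easy to exercise with a stand-in client.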
