# Deploying Flan-T5-XXL in SageMaker

Setting up the SageMaker session, execution role, and default S3 bucket that we will need later

In [None]:
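import sagemaker

sess = sagemaker.Session()
# Default bucket is named sagemaker-{region}-{account_id}; it is created if it does not exist
sagemaker_session_bucket = sess.default_bucket()
# Works inside SageMaker notebooks/Studio; elsewhere, pass an IAM role ARN explicitly
role = sagemaker.get_execution_role()
model_name = "flan-t5-xxl"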

Creating the Hugging Face model, specifying the library versions we want to use and the S3 location of the archive that contains the inference code

In [None]:
from sagemaker.huggingface.model import HuggingFaceModel

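huggingface_model = HuggingFaceModel(
    # Archive with the inference code, in the bucket defined above
    model_data=f"s3://{sagemaker_session_bucket}/{model_name}/model.tar.gz",
    role=role,  # IAM role the endpoint runs under
    # These three versions select the Hugging Face inference container image
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)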
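
The `model_data` archive above appears to hold the custom inference code rather than the model weights. The Hugging Face inference toolkit picks up user code from a `code/` directory inside `model.tar.gz` (`code/inference.py`, plus an optional `code/requirements.txt`). The following is a minimal sketch of building such an archive with the standard library. The file contents are placeholders, not the actual handler used in this notebook, `build_model_archive` is a hypothetical helper, and the upload step is commented out because it needs AWS credentials; it assumes the SDK's `sagemaker.s3.S3Uploader` helper.

```python
import io
import os
import tarfile
import tempfile

# Placeholder contents (hypothetical): a real inference.py would define the
# handler overrides, and requirements.txt would list extra pip packages.
FILES = {
    "code/inference.py": "# model_fn / predict_fn overrides go here\n",
    "code/requirements.txt": "# extra pip dependencies, one per line\n",
}


def build_model_archive(files: dict, out_path: str) -> str:
    """Write `files` (path inside archive -> text) into a gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for arcname, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=arcname)
            info.size = len(data)  # size must be set before addfile reads the bytes
            tar.addfile(info, io.BytesIO(data))
    return out_path


out_dir = tempfile.mkdtemp()
archive = build_model_archive(FILES, os.path.join(out_dir, "model.tar.gz"))

with tarfile.open(archive, "r:gz") as tar:
    print(sorted(tar.getnames()))  # ['code/inference.py', 'code/requirements.txt']

# Upload to the location used for model_data (requires AWS credentials; not run here):
# from sagemaker.s3 import S3Uploader
# S3Uploader.upload(archive, f"s3://{sagemaker_session_bucket}/{model_name}")
```

Building the tarball in memory keeps the layout explicit: only the `code/` directory is packaged, which is consistent with the weights being fetched from the Hub at startup.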

Deploying the model to a real-time endpoint on a single `ml.g5.4xlarge` instance (one NVIDIA A10G GPU with 24 GB of memory)

In [None]:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    endpoint_name=model_name,
)

NOTE: Even after the endpoint has been deployed, we still need to wait 1-2 minutes before we can start using it. The model is downloaded from the Hugging Face Model Hub, and because of its size that download usually won't have finished by the time the endpoint reports as deployed.
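
Rather than guessing how long the download takes, one option is to poll the endpoint with a small request until it answers. The following is a generic retry loop under stated assumptions: `wait_until_ready` is a hypothetical helper, the attempt count and 15-second delay are arbitrary, and it catches any exception because the exact error the SDK raises while the model is still loading is not assumed here.

```python
import time


def wait_until_ready(predict, payload, max_attempts=10, delay_s=15.0, sleep=time.sleep):
    """Call predict(payload) until it succeeds; raise RuntimeError if it never does."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return predict(payload)
        except Exception as err:  # exact SDK exception type is deliberately not assumed
            last_error = err
            print(f"attempt {attempt}/{max_attempts} failed: {err}")
            if attempt < max_attempts:
                sleep(delay_s)  # injectable so the loop can be tested without waiting
    raise RuntimeError(f"endpoint not ready after {max_attempts} attempts") from last_error


# In the notebook (not run here):
# wait_until_ready(lambda d: predictor.predict(data=d), {"inputs": "Hello"})
```

Passing `predict` as a callable keeps the helper independent of the SageMaker SDK, so the same loop works for any client object.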

In [None]:
predictor.endpoint_name

In [None]:
prompt = """Who is the best soccer player?"""

In [None]:
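data = {
    "inputs": prompt,
    "min_length": 20,    # minimum length of the generated sequence, in tokens
    "max_length": 50,    # maximum length of the generated sequence, in tokens
    "do_sample": True,   # sample from the distribution instead of greedy decoding
    "temperature": 0.6,  # values below 1.0 make sampling less random
}

# Send the request to the endpoint and print the raw response
res = predictor.predict(data=data)
print(res)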
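
The generation settings in `data` sit at the top level of the request, whereas the toolkit's default handler reads them from a nested `parameters` key, so the custom inference code presumably separates `inputs` from everything else before calling `generate`. The following is a hypothetical illustration of that step: `split_payload` and the allowed-key set are assumptions, not taken from the actual inference code.

```python
# Hypothetical whitelist; all of these are valid transformers generate() kwargs.
ALLOWED_GENERATION_KEYS = {
    "min_length", "max_length", "do_sample", "temperature", "top_k", "top_p", "num_beams",
}


def split_payload(data: dict):
    """Separate the prompt from generation kwargs (hypothetical handler logic)."""
    if "inputs" not in data:
        raise KeyError("payload must contain an 'inputs' field")
    inputs = data["inputs"]
    gen_kwargs = {k: v for k, v in data.items() if k != "inputs"}
    unknown = set(gen_kwargs) - ALLOWED_GENERATION_KEYS
    if unknown:
        raise ValueError(f"unsupported generation parameters: {sorted(unknown)}")
    if gen_kwargs.get("min_length", 0) > gen_kwargs.get("max_length", float("inf")):
        raise ValueError("min_length must not exceed max_length")
    return inputs, gen_kwargs


# In a real handler (not run here), the split would feed transformers, e.g.:
#   encoded = tokenizer(inputs, return_tensors="pt").to(model.device)
#   output_ids = model.generate(**encoded, **gen_kwargs)
#   text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

inputs, gen_kwargs = split_payload(
    {"inputs": "Who is the best soccer player?", "min_length": 20,
     "max_length": 50, "do_sample": True, "temperature": 0.6}
)
print(gen_kwargs)  # {'min_length': 20, 'max_length': 50, 'do_sample': True, 'temperature': 0.6}
```

Rejecting unknown keys up front turns a typo in the request into a clear error instead of a silently ignored setting.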