ModelError while deploying FlanT5-xl #21402
Comments
Hello @RonLek Thanks for the issue!
Hi @younesbelkada and @RonLek! I have the same issue deploying. This is the script I've tried:

```python
# Hub Model configuration: https://huggingface.co/models
hub: dict = {"HF_MODEL_ID": "google/flan-t5-xxl", "HF_TASK": "text2text-generation"}

# Create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
    model_data="s3://sagemaker-eu-north-1-***/model.tar.gz",
    env=hub,
    role=role,
)
```

Observing the AWS logs I can see that
But I got the same error when trying to do an inference:
AWS logs:
Hello @valentinboyanov I can see in your script that:

```python
HuggingFaceModel(
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
    model_data="s3://sagemaker-eu-north-1-***/model.tar.gz",
    env=hub,
    role=role,
)
```

can you update
@younesbelkada if I change it, I'm unable to deploy at all:
This is why I've followed the instructions by Heiko Hotz (marshmellow77) in this comment to provide a
@valentinboyanov what is the content for your
@philschmid yes, here it goes:
When you provide a
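For context on the `model.tar.gz` approach discussed in this thread: the SageMaker Hugging Face inference container picks up custom inference code from a `code/` directory inside the archive. A hedged sketch of that layout (all file names here are placeholders for illustration, not the actual files from this issue):

```shell
# Hedged sketch: weights/config sit at the archive root, custom handler code
# goes under code/ where the inference toolkit looks for it.
mkdir -p model/code
touch model/config.json            # placeholder for the model config/weights
touch model/code/inference.py      # custom model_fn / predict_fn overrides
touch model/code/requirements.txt  # extra pip dependencies for the endpoint
tar -czf model.tar.gz -C model .
tar -tzf model.tar.gz              # inspect the archive contents
```

The `-C model .` is what keeps the paths relative to the archive root, which is what the container expects.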
@philschmid what should be the contents of the

@valentinboyanov I confirm getting the same as well. From the CW logs it seems that
I'm having the same issue. The code works without Docker, but fails if I build and run the container.

Dockerfile is the following:

The code of the app:

```python
from fastapi import FastAPI, Request
from fastapi.logger import logger
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, T5ForConditionalGeneration, pipeline
import json
import logging
import numpy as np
import os
import torch

app = FastAPI()

# Route application logs through gunicorn's handlers
gunicorn_logger = logging.getLogger('gunicorn.error')
logger.handlers = gunicorn_logger.handlers
if __name__ != "main":
    logger.setLevel(gunicorn_logger.level)
else:
    logger.setLevel(logging.INFO)

logger.info(f"Is CUDA available: {torch.cuda.is_available()}")
logger.info(f"CUDA device: {torch.cuda.get_device_name(torch.cuda.current_device())}")

logger.info("Loading model")
# error is in this line
pipe_flan = pipeline("text2text-generation", model="../flan-t5-xxl-sharded-fp16", model_kwargs={"load_in_8bit": True, "device_map": "auto"})
# extra code removed
```
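One thing worth noting for anyone hitting this in a container: `load_in_8bit=True` depends on the optional `bitsandbytes` and `accelerate` packages being installed in the image (plus a CUDA-capable GPU at runtime). A small sketch to verify the importable prerequisites before building; the helper name is mine, not from this thread:

```python
import importlib.util

def has_8bit_deps():
    """Return True if the optional packages needed for load_in_8bit are importable."""
    required = ("bitsandbytes", "accelerate")
    return all(importlib.util.find_spec(mod) is not None for mod in required)

print(has_8bit_deps())
```

Running this inside the built image (rather than on the host) is the useful check, since the host environment working is exactly the symptom described above.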
@philschmid @younesbelkada just wanted to follow up on this.
@RonLek I am planning to create an example. I'll post it here once it is ready.
This works! Thanks a ton @philschmid for the prompt response 🚀
@philschmid just curious. Would there be a similar sharded model repo for flan-t5-xl?
If you check this blog post: https://www.philschmid.de/deploy-t5-11b there is a code snippet on how to do this:

```python
import torch
from transformers import AutoModelWithLMHead
from huggingface_hub import HfApi

# load model as float16
model = AutoModelWithLMHead.from_pretrained("t5-11b", torch_dtype=torch.float16, low_cpu_mem_usage=True)

# shard model and push to hub
model.save_pretrained("sharded", max_shard_size="2000MB")
```
Thanks! This worked 🔥
@philschmid thanks for the guidance here. While deploying your solution on SageMaker I noticed that it works great on g5 instances but not on p3 instances (p3.8xlarge). Also, do we know when the direct deploy from the HF Hub would work out of the box?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers_version==4.17.0
Platform = SageMaker Notebook
python==3.9.0
Who can help?
@ArthurZucker @younesbelkada
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Amazon SageMaker deployment script in AWS for flan-t5-xl
Results in
From an existing issue, I suspected this might be due to the use of transformers==4.17.0; however, when I use the exact same script to deploy the flan-t5-large model, it works without any issues.
Expected behavior
The model should get deployed on AWS SageMaker without any issues.