Describe the bug
Hi team!
I am using Diffusers and specifically looking at this documentation, however the link to open a notebook in Studio Lab doesn't work (the notebook is not present in the GitHub repo).
I am trying to deploy https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt to a SageMaker endpoint. I can deploy and run inference against the model, but it only works for the first request; subsequent calls fail with an out-of-memory error.
Is there anything I could tweak on the Diffusers side? Thank you!
Reproduction
In the notebook, I am using the following workflow:

```python
import json

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=s3_path_join("s3://", sagemaker_session_bucket, "async_svd_inference/output"),  # where our results will be stored
)

hub = {
    "SM_NUM_GPUS": json.dumps(8),
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    # model_data=s3_model_uri,  # path to your model and script
    model_data="s3://BUCKET/model.tar.gz",
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.37.0",  # transformers version used
    pytorch_version="2.1.0",  # pytorch version used
    py_version="py310",  # python version used
    env=hub,
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.48xlarge",
    async_inference_config=async_config,
)
```
Logs

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB. GPU 0 has a total capacty of 22.20 GiB of which 199.12 MiB is free. Process 22167 has 17.79 GiB memory in use. Process 22175 has 4.21 GiB memory in use. Of the allocated memory 2.68 GiB is allocated by PyTorch, and 157.72 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
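The allocator hint at the end of the log can be forwarded to the container through the same `env` dict used above; a sketch, assuming the Hugging Face DLC passes environment variables through to the inference process (the `512` value is an arbitrary starting point, not a tuned one):

```python
# hypothetical: extend the env dict passed to HuggingFaceModel
hub = {
    "SM_NUM_GPUS": "8",
    # cap the size of splittable blocks in the CUDA caching allocator,
    # as suggested by the OOM message, to reduce fragmentation
    "PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:512",
}
```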
### System Info
Platform: AWS
Instance: ml.g5.48xlarge
Diffusers==0.26.2
Transformers==4.37
Accelerate==0.27.0
### Who can help?
@sayakpaul @DN6
@krokoko You could try reducing the decode chunk size to 4. Looking at the traceback, it seems like multiple processes are trying to use the same GPU; is that the case here?
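For reference, a minimal sketch of wiring the suggestion above into a custom handler. `decode_chunk_size` is a real `StableVideoDiffusionPipeline.__call__` argument; the `free_gpu_memory` helper, the `predict_fn` signature, and the request/response shapes are assumptions for illustration:

```python
import gc


def free_gpu_memory():
    """Release Python references and cached CUDA blocks between requests."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # keeps the helper usable on hosts without torch


def predict_fn(data, pipe):
    """Hypothetical SVD handler: a smaller decode_chunk_size trades speed
    for a lower peak-VRAM spike during VAE decoding."""
    try:
        frames = pipe(data["image"], decode_chunk_size=4).frames[0]
        return {"frames": frames}
    finally:
        free_gpu_memory()  # run after every request, success or failure
```

Calling `free_gpu_memory()` in a `finally` block means cached allocations are returned after failed requests too, so one OOM does not poison the endpoint for subsequent calls.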
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.