OOM Error when deploying BLOOM-3B on 16GB GPU via MII #103
Comments
I'm having the same issue. Would like to know the answer too.
I can confirm that I'm able to reproduce this on an A6000 as well. With MII, VRAM usage is ~18GB; with the standard HF pipeline it is considerably lower. The difference is unexpectedly large, and we are investigating the cause. I'll also note that this is not the case for all models. I just tested a few and many have the same memory usage.
@marshmellow77 it appears that the OOM you are seeing when using MII is due to the need for extra VRAM when injecting kernels with DeepSpeed-Inference. This can be avoided by loading the model onto system memory rather than GPU memory before DeepSpeed-Inference is applied. #105 adds an option that allows users to do this. If you could give it a try and let me know the results: install that version of MII and add the new `load_with_sys_mem` option to your deployment config.
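A minimal sketch of the relevant config fragment. The `load_with_sys_mem` flag name is taken from this thread (PR #105); the other keys are illustrative assumptions and the exact config schema should be checked against the MII docs for your version. This dict would typically be passed to `mii.deploy` as its MII config argument:

```python
# Hypothetical config fragment: load_with_sys_mem comes from PR #105 in this
# thread; the other keys are illustrative and may differ by MII version.
mii_config = {
    "dtype": "fp32",             # match the fp32 deployment from the issue
    "load_with_sys_mem": True,   # stage weights in CPU RAM before kernel injection
}
print(mii_config)
```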
I can confirm that the model now loads onto the GPU. I can't test the text generation because of #102, but I believe this issue can be closed.
@mrwyattii Can the extra GPU memory used when `load_with_sys_mem` is False not be released later?
DeepSpeed-Inference will release the extra memory after kernel injection happens.
When deploying the `bigscience/bloom-3b` model (in fp32) via MII on a T4 GPU, I receive a `CUDA out of memory` error; see this notebook. When deploying the same model (also in fp32) via the standard HF Pipeline API, it works; see this notebook.

My expectation would be that if I can deploy a model via HF Pipelines, it should also be possible to deploy it via MII. If this is not possible, it would be good to explain why and set expectations with users.