Improvements to VRAM usage when loading HF models #105

mrwyattii · 2022-11-21T19:36:09Z

MII can now optionally load a model into system memory first, using the new load_with_sys_mem option in the config. This fixes a problem that occurs when the model takes up nearly all of the GPU memory: kernel injection needs some additional space, which causes OOM errors even though the GPU has enough memory to hold the model itself.

This PR also adds the DTypeEnum from DeepSpeed to the MII config, along with unit tests for the new option.
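
To make the intent of the new option concrete, here is a minimal sketch of how a loader might choose where the weights land first. It is not the code from this PR: `DType`, `LoadConfig`, `initial_device`, and `build_mii_config` are hypothetical names made up for illustration, and the only identifiers taken from the description above are the `load_with_sys_mem` option and the idea of a dtype enum.

```python
from dataclasses import dataclass
from enum import Enum


class DType(Enum):
    # Hypothetical stand-in for the DTypeEnum mentioned in the description.
    FP32 = "fp32"
    FP16 = "fp16"
    INT8 = "int8"


@dataclass
class LoadConfig:
    # Hypothetical mirror of the two config fields this PR talks about.
    dtype: DType = DType.FP32
    load_with_sys_mem: bool = False


def initial_device(config: LoadConfig, local_rank: int = 0) -> str:
    """Return the device the weights would be loaded onto first.

    With load_with_sys_mem set, the weights start in system memory ("cpu"),
    leaving GPU memory free for the extra space kernel injection needs.
    Otherwise they go straight to the GPU for this rank.
    """
    if config.load_with_sys_mem:
        return "cpu"
    return f"cuda:{local_rank}"


def build_mii_config(config: LoadConfig) -> dict:
    """Flatten the config into a plain dict of option names and values."""
    return {
        "dtype": config.dtype.value,
        "load_with_sys_mem": config.load_with_sys_mem,
    }


if __name__ == "__main__":
    cfg = LoadConfig(dtype=DType.FP16, load_with_sys_mem=True)
    print(initial_device(cfg))
    print(build_mii_config(cfg))
```

A dict of this shape is the kind of thing one would expect to hand to MII as its config when deploying a model, though the exact keys and accepted values should be checked against the MII version in use.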

load HF models with correct dtype, add option to load with sys memory

9cefab6

mrwyattii requested review from jeffra, awan-10 and samyam as code owners November 21, 2022 19:36

This was referenced Nov 21, 2022

CUDA OOM when loading large models #99

Closed

OOM Error when deploying BLOOM-3B on 16GB GPU via MII #103

Closed

OPT in TP or PP mode #71

Closed

jeffra approved these changes Dec 1, 2022

mrwyattii added 3 commits December 1, 2022 13:18

Merge branch 'main' into mrwyattii/address-poor-vram-usage

387993f

Update load_models.py

ad3b9ee

added unit test

5e6f5cb

mrwyattii merged commit 06714bb into main Dec 1, 2022

mrwyattii deleted the mrwyattii/address-poor-vram-usage branch December 1, 2022 22:58

mrwyattii mentioned this pull request Dec 7, 2022

[BUG] DS-inference possible memory duplication microsoft/DeepSpeed#2578

Closed
