Large model instantiation using DeepSpeed.zero.Init under ZeRO-3 #1189
Labels
feature request
Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization when using ZeRO-3, which causes OOM errors in most cases.
Describe the solution you'd like
A simple fix like the one sketched below, applied inside get_model, will do the trick.
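Roughly, the idea is to wrap model construction in the deepspeed.zero.Init context manager so that parameters are partitioned across the data-parallel group as they are created, rather than being fully materialized on every rank first. A minimal sketch, assuming illustrative neox_args attribute names and a placeholder model (not the actual GPT-NeoX construction code):

```python
import deepspeed
import torch.nn as nn


def get_model(neox_args):
    # Under deepspeed.zero.Init, each parameter is sharded across the
    # data-parallel ranks at construction time, so the full model never
    # needs to fit on a single rank.
    with deepspeed.zero.Init(
        config_dict_or_path=neox_args.deepspeed_config,  # illustrative: same ZeRO-3 config passed to deepspeed.initialize
        enabled=neox_args.zero_stage == 3,               # illustrative: only partition when ZeRO stage 3 is enabled
    ):
        # Stand-in for the real GPT-NeoX model construction (e.g. GPT2ModelPipe).
        model = nn.Sequential(*[nn.Linear(8192, 8192) for _ in range(4)])
    return model
```

Since the context manager is a no-op when enabled is False, the non-ZeRO-3 code path would be left unchanged.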
Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works; please let me know if any other testing is needed.
Additional context
Related issue: huggingface/accelerate#922