Large model instantiation using DeepSpeed.zero.Init under ZeRO-3 #1189

Closed
R0n12 opened this issue Mar 18, 2024 · 1 comment · Fixed by #1190
Labels
feature request New feature or request

Comments

@R0n12
Contributor

R0n12 commented Mar 18, 2024

Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization under ZeRO-3, which causes out-of-memory (OOM) errors in most cases.

Describe the solution you'd like
A simple fix like this inside get_model will do the trick:

    if neox_args.zero_stage == 3:
        with deepspeed.zero.Init():
            model = GPT2ModelPipe(
                neox_args=neox_args,
                num_tokentypes=0,
                parallel_output=True,
                topology=mpu.get_topology(),
                use_cache=use_cache,
            )
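For context, deepspeed.zero.Init works by intercepting module construction inside the `with` block so parameters can be partitioned across ranks as they are created, instead of being fully materialized first. Below is a minimal stdlib-only sketch of that interception pattern; it is an illustration, not the real DeepSpeed code, and `Param` / `PartitionOnInit` are made-up names:

```python
# Toy sketch of the "partition on construction" pattern used by
# context managers like deepspeed.zero.Init. Illustration only;
# Param and PartitionOnInit are hypothetical, not DeepSpeed APIs.

class Param:
    def __init__(self, size):
        self.size = size
        self.partitioned = False  # a real impl would hold a full tensor here


class PartitionOnInit:
    """Context manager that patches Param.__init__ so every parameter
    created inside the `with` block is marked as partitioned at birth."""

    def __enter__(self):
        self._orig_init = Param.__init__

        def patched(param, size):
            self._orig_init(param, size)
            # In ZeRO-3 this step would shard the tensor across ranks
            # immediately, so no rank ever holds the full parameter.
            param.partitioned = True

        Param.__init__ = patched
        return self

    def __exit__(self, *exc):
        # Restore normal construction outside the context.
        Param.__init__ = self._orig_init
        return False


with PartitionOnInit():
    inside = Param(10)   # created under the context: partitioned

outside = Param(10)      # created normally: not partitioned
```

The point is that the wrapping must happen around model *construction* (as in the get_model change above), since by the time the model object exists the full parameters have already been allocated.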

Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works; please let me know if other testing is needed.

Additional context
Related issue: huggingface/accelerate#922

@R0n12 R0n12 added the feature request New feature or request label Mar 18, 2024
@R0n12
Contributor Author

R0n12 commented Mar 18, 2024

I am working on a branch addressing this issue

@R0n12 R0n12 changed the title Large model instantiation using DeepSpeed.zero.Init for extremely large model under ZeRO-3 Large model instantiation using DeepSpeed.zero.Init under ZeRO-3 Mar 18, 2024