
ZeRO++ support in Accelerate's DeepSpeed integration #2020

Closed
SumanthRH opened this issue Oct 2, 2023 · 3 comments
@SumanthRH (Contributor) commented Oct 2, 2023:

Hi,

I've been going over multi-node training strategies with Accelerate + DeepSpeed and had a question: does the Accelerate integration support ZeRO++? With ZeRO++, one can, for example, use a hybrid sharding strategy where DeepSpeed ZeRO-3 runs within each machine and data parallelism runs across machines. I couldn't find any information on whether this is supported. It's a simple config change in DeepSpeed (`zero_hpz_partition_size`), so I'm guessing it works out of the box, but I wanted to confirm. I also see that the corresponding hybrid sharding strategy is supported by the FSDP integration.

It would be great if this could be clarified in the docs as well!

cc @pacman100
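For reference, a minimal sketch of what the DeepSpeed config change might look like, based on DeepSpeed's ZeRO++ documentation (the partition size of 8 is just an example, typically set to the number of GPUs per node; the quantization flags are optional ZeRO++ features and not required for hierarchical partitioning alone):

```json
{
  "zero_optimization": {
    "stage": 3,
    "zero_hpz_partition_size": 8,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true
  }
}
```

With `zero_hpz_partition_size` set to the per-node GPU count, parameters are sharded ZeRO-3-style within each node while gradients are reduced across nodes, trading memory for reduced inter-node communication.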

@SumanthRH (Contributor, Author) commented:
Ping @pacman100 @muellerzr

@pacman100 (Contributor) commented:
Hello @SumanthRH, yes, as you suggested, it is a simple config change and should be supported by the current DeepSpeed integration. If you already have a PR in mind with the doc updates, it would be much appreciated. Thank you!
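Since the integration passes a user-supplied DeepSpeed config file through unchanged, one way to wire this up is via `deepspeed_config_file` in the `accelerate` config. A rough sketch, assuming a two-node setup with 8 GPUs each (the filename `ds_zeropp_config.json` is hypothetical, and the surrounding values are illustrative, not prescriptive):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  # Path to the JSON file containing the ZeRO++ settings
  # (e.g. zero_hpz_partition_size under zero_optimization)
  deepspeed_config_file: ds_zeropp_config.json
  zero3_init_flag: true
num_machines: 2
num_processes: 16
machine_rank: 0
mixed_precision: bf16
```

Launching with `accelerate launch --config_file <this file> train.py` should then hand the ZeRO++ options straight to DeepSpeed.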

@SumanthRH (Contributor, Author) commented:
Hi @pacman100 , great! I'll put up a PR soon for the documentation!
