I've been going over multi-node training strategies with Accelerate + DeepSpeed and have a question: does the Accelerate integration support ZeRO++? With ZeRO++ one can, for example, use a hybrid sharding strategy, with DeepSpeed ZeRO-3 sharding within each machine and data parallelism across machines. I couldn't find any information on whether this is supported. It is a simple config change in DeepSpeed (`zero_hpz_partition_size`), so I'm guessing it is supported, but I wanted clarity. I also see that the corresponding hybrid sharding strategy is supported by the FSDP integration.
It would be great if this can be clarified in the docs as well!
Hello @SumanthRH, yes, as you suggested it is a simple config change and should be supported by the current integration of DeepSpeed. If you already have a PR in mind with the updates to the docs, it would be much appreciated. Thank you!
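For anyone landing here, a minimal sketch of what that config change might look like. The helper name `build_zeropp_config`, the output filename, and the choice of 8 GPUs per node are illustrative assumptions, not something taken from the Accelerate docs. The `"auto"` values assume the file is consumed through Accelerate's DeepSpeed config-file path, which fills them in at runtime. I have not verified this end to end on a multi-node cluster.

```python
import json


def build_zeropp_config(gpus_per_node: int) -> dict:
    """Build a DeepSpeed config dict with ZeRO-3 and ZeRO++ hierarchical partitioning (hpZ).

    Assumption: setting zero_hpz_partition_size to the number of GPUs per node
    keeps the secondary parameter shards inside each machine.
    """
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be a positive integer")
    return {
        "zero_optimization": {
            "stage": 3,
            # ZeRO++ hpZ: size of the secondary (intra-node) partition group.
            "zero_hpz_partition_size": gpus_per_node,
        },
        # "auto" is assumed to be resolved by Accelerate from the training setup.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


if __name__ == "__main__":
    # Hypothetical filename; point your Accelerate DeepSpeed config file setting at it.
    with open("ds_zeropp_config.json", "w") as f:
        json.dump(build_zeropp_config(gpus_per_node=8), f, indent=2)
```

The resulting JSON would then be referenced as the DeepSpeed config file in the Accelerate setup, with `gpus_per_node` adjusted to match the actual hardware.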