Describe the bug
Recently I have been integrating DeepSpeed into Megatron v3.
However, I find that the new DeepSpeed communication backend introduced in the PR linked below does not seem compatible with the external torch.distributed module. Is there any plan to support that compatibility?
Currently, the DeepSpeed comm backend (deepspeed.comm, from #1985) is a full wrapper around torch.distributed and is fully compatible with external calls to torch.distributed.
Please open an issue if you encounter errors when mixing torch.distributed and deepspeed.comm.
I've tried v0.6.5 and it runs fine. However, the issue occurs with the master branch built from source.
I'm not sure whether a difference between the two versions causes this, but I can use the release version to avoid it. Thanks.
Referenced PR: DeepSpeed Comm. Backend v1 (#1985)