[BUG]DeepSpeed Comm. Backend not compatible with outside torch.distributed module #2063

Open
kisseternity opened this issue Jun 28, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@kisseternity
Contributor

Describe the bug
Recently I have been integrating DeepSpeed into Megatron v3.
However, the new DeepSpeed Comm. Backend introduced in the PR linked below does not seem to be compatible with external use of the torch.distributed module. Are there any plans to make the two compatible?

DeepSpeed Comm. Backend v1 #1985

kisseternity added the bug (Something isn't working) label Jun 28, 2022
@Quentin-Anthony
Contributor

Quentin-Anthony commented Jun 28, 2022

Currently, the DeepSpeed comm. backend deepspeed.comm from #1985 is a full wrapper around torch.distributed, and is fully compatible with external calls to torch.distributed.

Please open an issue if you face errors when mixing torch.distributed and deepspeed.comm.
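
A minimal single-process sketch of the kind of mixing described above, offered as a starting point for a repro rather than a definitive test. It assumes `torch` and `deepspeed` are installed and uses the CPU `gloo` backend. The helper name `mixed_all_reduce` and the environment defaults are illustrative assumptions, not part of DeepSpeed. The process group is created externally with `torch.distributed` first (roughly as Megatron would), and `deepspeed.comm` is then asked to run a collective on it.

```python
import os
import socket


def _free_port() -> str:
    # Ask the OS for an unused TCP port for the rendezvous.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return str(s.getsockname()[1])


def mixed_all_reduce(value: float = 1.0):
    """Run one all_reduce through torch.distributed and one through deepspeed.comm.

    Hypothetical helper for illustration. Returns (torch_result, deepspeed_result),
    or None when torch or deepspeed cannot be imported.
    """
    try:
        import torch
        import torch.distributed as torch_dist
        import deepspeed
        import deepspeed.comm as ds_dist
    except ImportError:
        return None

    # Single-process defaults (assumption); a real launcher would set these.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", _free_port())
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("LOCAL_RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    # External initialization, outside of DeepSpeed.
    if not torch_dist.is_initialized():
        torch_dist.init_process_group(backend="gloo")

    # DeepSpeed should pick up the already-initialized process group.
    deepspeed.init_distributed(dist_backend="gloo")

    a = torch.tensor([value])
    b = torch.tensor([value])
    torch_dist.all_reduce(a)  # direct torch.distributed call
    ds_dist.all_reduce(b)     # same collective through deepspeed.comm
    return a.item(), b.item()
```

Calling `mixed_all_reduce()` on the release build and again on a source build of master would show whether the two paths diverge. Any traceback from the second path is the useful thing to attach to this issue.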

@kisseternity
Contributor Author

> Currently, the DeepSpeed comm. backend deepspeed.comm from #1985 is a full wrapper around torch.distributed, and is fully compatible with external calls to torch.distributed.
>
> Please open an issue if you face errors when mixing torch.distributed and deepspeed.comm.

I've tried v0.6.5 and it runs fine. However, the issue occurs when I use the master branch built from source.
I'm not sure whether a difference between the two versions causes this, but I can use the release version to avoid it. Thanks.

@mrwyattii
Contributor

@kisseternity could you please share what issue you are having when building from source?

3 participants