Compatibility between ORTModule and DeepSpeed #108
Comments
Hi @JingyaHuang .
Thanks for opening this issue. Looking at the source code, it seems we have not added support for bfloat16 for Aten op execution yet.
@iK1D for reference.
After syncing internally, I can confirm that we will create a work item to add support for bfloat16 for Aten op execution and plan to have it completed in the near future. I'll leave this issue open and will update it as and when we initiate/complete the work.
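For context on what BF16 support entails: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, so each op needs an explicit kernel rather than silently reusing a float32 path. A quick stdlib-only sketch of the precision involved (the `to_bfloat16` helper below is illustrative, not an onnxruntime API, and it truncates rather than rounding to nearest-even):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 rounding by truncating a float32
    bit pattern to its top 16 bits (sign + 8 exponent + 7 mantissa)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    bits &= 0xFFFF0000  # drop the low 16 mantissa bits
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(to_bfloat16(1.0))         # 1.0 -- exactly representable
print(to_bfloat16(3.14159265))  # 3.140625 -- only ~2-3 decimal digits survive
```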
Hi @baijumeswani,
This has been addressed in the pull request microsoft/onnxruntime#11546. Please try out the nightly onnxruntime-training build to evaluate whether the fix works for you. Closing this issue now; please re-open it, or open another one, if you need further help.
Hi @baijumeswani, thanks for adding the BF16 support for the Aten op execution.
This time, it seems to be good with the nightly build.
If I am not mistaken, ... Besides, one thing I cannot understand well is the training by ... Thanks!
Hi folks,

I am currently working on validating distributed training features while using ORTModule. Here are some incompatibilities that I found during my tests:

[With DeepSpeed]
Warnings:
Error Message:
[With Fairscale]
Environment
I would like to confirm with you folks whether these behaviors are intended. Also, concerning compatibility with DeepSpeed stage 3 and BF16, could you share some insight into whether it will be supported in the future?
Thanks a lot!
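For anyone trying to reproduce this, a DeepSpeed configuration exercising the combination in question (ZeRO stage 3 with BF16) would look roughly like the sketch below. Field names follow the DeepSpeed config schema as I understand it; the batch size is a placeholder, and this fragment has not been validated against the setup in this issue:

```json
{
  "train_batch_size": 32,
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3
  }
}
```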