feat: support uint8 and int8 allreduce in tensorflow #3649
Thanks for taking a stab at this, @kvignesh1420!
I've triggered the CI pipeline and have left one comment regarding the test script.
Unit Test Results (with flaky tests): 1 371 files (-63), 1 371 suites (-63), 12h 31m 31s (-26m 1s). Results for commit a02bddc. ± Comparison against base commit 867741e. This comment has been updated with latest results.
@maxhgerlach I reverted the sampling range back to […]
Thank you @kvignesh1420, that looks great! There are a number of additional TensorFlow allreduce tests, scattered over a couple of separate files: […]
I think it would be good to extend the […] There would be less to gain from the various tests with multiple process sets in the two other files, so I'm not sure if those are worth the busy work.
update CHANGELOG
fix int8, uint8 tests when averaging in allreduce
extend uint8 and int8 allreduce tests to xla and process sets
Signed-off-by: Vignesh Kothapalli <k.vignesh1420@gmail.com>
@maxhgerlach I modified the tests in those 3 files as well. Please let me know if this looks good 🙂.
LGTM, let's just wait for the tests to pass. Thank you for the contribution!
Tests are all good. The failure at "CI / Build docker image horovod-ray" is unrelated.
Description
This PR adds support for `int8` and `uint8` allreduce in TensorFlow, along with the relevant tests.

Corresponding issue: #3642
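
For readers unfamiliar with why 8-bit allreduce tests need some care, here is a minimal pure-Python sketch. It is not Horovod's implementation and does not reflect the actual test code in this PR. The helper names (`wrap_int8`, `wrap_uint8`, `simulated_allreduce_sum`, `safe_sampling_bound`) are hypothetical, and the sketch assumes that an overflowing 8-bit sum wraps around in the usual two's-complement fashion.

```python
def wrap_int8(x):
    """Map an arbitrary Python int into the int8 range [-128, 127] (assumed wraparound)."""
    return (x + 128) % 256 - 128


def wrap_uint8(x):
    """Map an arbitrary Python int into the uint8 range [0, 255] (assumed wraparound)."""
    return x % 256


def simulated_allreduce_sum(per_rank_values, wrap):
    """Element-wise sum over ranks, with the result wrapped to the 8-bit type.

    per_rank_values: one equal-length list of ints per (simulated) rank.
    wrap: wrap_int8 or wrap_uint8.
    """
    return [wrap(sum(column)) for column in zip(*per_rank_values)]


def safe_sampling_bound(dtype_max, num_ranks):
    """Largest per-rank value such that a sum over num_ranks cannot exceed dtype_max."""
    return dtype_max // num_ranks


# Two simulated ranks: the first element overflows int8, the second does not.
print(simulated_allreduce_sum([[100, 1], [100, 2]], wrap_int8))
# With 4 ranks, per-rank int8 test values up to this bound cannot overflow a sum.
print(safe_sampling_bound(127, 4))
```

Under these assumptions, a test that compares the reduced tensor against an expected sum only stays meaningful if the sampled per-rank values are small enough for the number of ranks in use.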
Signed-off-by: Vignesh Kothapalli <k.vignesh1420@gmail.com>