Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support uint8 and int8 allreduce in tensorflow #3649

Merged
merged 1 commit into from Aug 17, 2022

Conversation

kvignesh1420
Copy link
Contributor

@kvignesh1420 kvignesh1420 commented Aug 15, 2022

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR adds support for int8 and uint8 allreduce in tensorflow along with the relevant tests.
Corresponding issue: #3642

Signed-off-by: Vignesh Kothapalli k.vignesh1420@gmail.com

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this, @kvignesh1420!

I've triggered the CI pipeline and have left one comment regarding the test script.

test/parallel/test_tensorflow.py Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Aug 15, 2022

Unit Test Results

  1 193 files   -   38    1 193 suites   - 38   11h 41m 28s ⏱️ - 17m 32s
     816 tests ±    0       765 ✔️ ±    0       51 💤 ±    0  0 ±0 
23 443 runs   - 816  16 691 ✔️  - 506  6 752 💤  - 310  0 ±0 

Results for commit a02bddc. ± Comparison against base commit 867741e.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Aug 15, 2022

Unit Test Results (with flaky tests)

  1 371 files   -   63    1 371 suites   - 63   12h 31m 31s ⏱️ - 26m 1s
     816 tests ±    0       765 ✔️ +    2       51 💤 ±    0  0  - 2 
26 889 runs   - 926  18 645 ✔️  - 528  8 244 💤  - 396  0  - 2 

Results for commit a02bddc. ± Comparison against base commit 867741e.

♻️ This comment has been updated with latest results.

@kvignesh1420
Copy link
Contributor Author

kvignesh1420 commented Aug 15, 2022

@maxhgerlach I reverted the sampling range back to (-100, 100) and fixed the tests for int8, uint8 when average=True. Please take a look.

@maxhgerlach
Copy link
Collaborator

Thank you @kvignesh1420, that looks great!

There is a bunch of additional TensorFlow allreduce tests, scattered over a couple of separate files:

  • test_tensorflow_process_sets.py
  • test_xla.py
  • test_xla_process_sets.py

I think it would be good to extend the (u)int8 test coverage at least to test_xla.py as that one uses a different backend than test_tensorflow.py.

There would be less to gain from the various tests with multiple process sets in the two other files, so I'm not sure if those are worth the busy work.

update CHANGELOG
fix int8, uint8 tests when averaging in allreduce
extend uint8 and int8 allreduce tests to xla and process sets

Signed-off-by: Vignesh Kothapalli <k.vignesh1420@gmail.com>
@kvignesh1420
Copy link
Contributor Author

Thank you @kvignesh1420, that looks great!

There is a bunch of additional TensorFlow allreduce tests, scattered over a couple of separate files:

  • test_tensorflow_process_sets.py
  • test_xla.py
  • test_xla_process_sets.py

I think it would be good to extend the (u)int8 test coverage at least to test_xla.py as that one uses a different backend than test_tensorflow.py.

There would be less to gain from the various tests with multiple process sets in the two other files, so I'm not sure if those are worth the busy work.

@maxhgerlach I modified the tests in those 3 files as well. Please let me know if this looks good 🙂 .

@kvignesh1420 kvignesh1420 changed the title support uint8 and int8 allreduce in tensorflow feat: support uint8 and int8 allreduce in tensorflow Aug 16, 2022
Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's just wait for the tests to pass. Thank you for the contribution!

@maxhgerlach
Copy link
Collaborator

Tests are all good. The failure at "CI / Build docker image horovod-ray" is unrelated.

@maxhgerlach maxhgerlach merged commit 87b5fc7 into horovod:master Aug 17, 2022
@kvignesh1420 kvignesh1420 deleted the tf-int8-allreduce branch August 17, 2022 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants