[functional collective] change the Python APIs to only use the native funcol ops #123777
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123777

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures
As of commit 655f2ba with merge base 585cd11.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang [ghstack-poisoned]
… the native funcol ops"

## Summary

After this PR, the functional collective Python APIs will stop honoring `TORCH_DISABLE_NATIVE_FUNCOL` and only use native funcol ops. Specifically, this PR:

- Removed `use_native_funcol()`.
- Removed the code path in the Python APIs when `use_native_funcol()` is `False`.
- Changed the CI tests that run on both native funcol and legacy funcol through the Python API to only run with native funcol.

## Test Changes

`test_functional_api.py`
- Removed the tests where only one of `output_split_sizes` or `input_split_sizes` is specified. This behavior is unreliable and has been removed from the native funcol.
- Removed `TestWaitiness`, which tests an implementation detail of the legacy funcol. We have equivalent tests for native funcol in `test/distributed/test_c10d_functional_native.py`: https://github.com/pytorch/pytorch/blob/b7fac76fc259394136bc77b3e39d5705919e5c4c/test/distributed/test_c10d_functional_native.py#L114-L116

`test/distributed/_tensor/test_dtensor.py`
`test/distributed/_tensor/test_dtensor_compile.py`
`test/distributed/test_device_mesh.py`
`test/distributed/_tensor/experimental/test_tp_transform.py`
`test/distributed/_tensor/test_matrix_ops.py`
`test/distributed/test_inductor_collectives.py`
- All these tests were double running with both native funcol and legacy funcol. Changed to only run with native funcol.

`test/distributed/test_c10d_functional_native.py`
- Removed the `run_with_native_funcol` decorators.

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang [ghstack-poisoned]
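To make "stop honoring `TORCH_DISABLE_NATIVE_FUNCOL`" concrete, here is a minimal, torch-free sketch of the before/after dispatch shape. Everything except the environment variable name and `use_native_funcol()` is a hypothetical stand-in (`native_all_reduce`, `legacy_all_reduce`, `all_reduce_before`, `all_reduce_after`), and the body of `use_native_funcol()` is an assumption about how such a flag check might look, not the removed PyTorch code.

```python
import os


def use_native_funcol() -> bool:
    # Assumption: the flag counts as "set" only when it equals "1".
    return os.environ.get("TORCH_DISABLE_NATIVE_FUNCOL") != "1"


def native_all_reduce(values):
    # Hypothetical stand-in for the native funcol op.
    return ("native", sum(values))


def legacy_all_reduce(values):
    # Hypothetical stand-in for the legacy funcol code path.
    return ("legacy", sum(values))


def all_reduce_before(values):
    # Before the PR: the Python API branches on the environment flag.
    if use_native_funcol():
        return native_all_reduce(values)
    return legacy_all_reduce(values)


def all_reduce_after(values):
    # After the PR: the flag is ignored and only the native path remains.
    return native_all_reduce(values)
```

With the variable set to `"1"`, `all_reduce_before` in this sketch takes the legacy branch, while `all_reduce_after` takes the native branch regardless of the environment.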
Awesome to see this happen!
@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team

Merge failed
Reason: Command
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
… funcol ops (pytorch#123777)

Pull Request resolved: pytorch#123777
Approved by: https://github.com/wanchaol
ghstack dependencies: pytorch#123776
Stack from ghstack (oldest at bottom):

## Summary

After this PR, the functional collective Python APIs will stop honoring `TORCH_DISABLE_NATIVE_FUNCOL` and only use native funcol ops. Specifically, this PR:

- Removed `use_native_funcol()`.
- Removed the code path in the Python APIs when `use_native_funcol()` is `False`.
- Changed the CI tests that run on both native funcol and legacy funcol through the Python API to only run with native funcol.

## Test Changes

`test_functional_api.py`
- Removed the tests where only one of `output_split_sizes` or `input_split_sizes` is specified. This behavior is unreliable and has been removed from the native funcol.
- Removed `TestWaitiness`, which tests an implementation detail of the legacy funcol. We have equivalent tests for native funcol in `test/distributed/test_c10d_functional_native.py` (`pytorch/test/distributed/test_c10d_functional_native.py`, lines 114 to 116 in b7fac76).

`test/distributed/_tensor/test_dtensor.py`
`test/distributed/_tensor/test_dtensor_compile.py`
`test/distributed/test_device_mesh.py`
`test/distributed/_tensor/experimental/test_tp_transform.py`
`test/distributed/_tensor/test_matrix_ops.py`
`test/distributed/test_inductor_collectives.py`
- All these tests were double running with both native funcol and legacy funcol. Changed to only run with native funcol.

`test/distributed/test_c10d_functional_native.py`
- Removed the `run_with_native_funcol` decorators.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang
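Since the description says specifying only one of `output_split_sizes` or `input_split_sizes` is no longer supported by the native funcol, callers may want to validate their arguments up front. Below is a rough sketch of such a check. The helper `check_split_sizes` is hypothetical and not part of PyTorch, and the even-split default when both arguments are omitted is an assumption made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def check_split_sizes(
    output_split_sizes: Optional[Sequence[int]],
    input_split_sizes: Optional[Sequence[int]],
    dim0: int,
    world_size: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical caller-side check: both split lists or neither."""
    # Reject the case the PR description calls unreliable: exactly one given.
    if (output_split_sizes is None) != (input_split_sizes is None):
        raise ValueError(
            "specify both output_split_sizes and input_split_sizes, or neither"
        )
    if output_split_sizes is None:
        # Assumption: with no split sizes, dim 0 is divided evenly across ranks.
        if dim0 % world_size != 0:
            raise ValueError("dim0 must be divisible by world_size")
        even = [dim0 // world_size] * world_size
        return list(even), list(even)
    if len(output_split_sizes) != world_size or len(input_split_sizes) != world_size:
        raise ValueError("split size lists must have one entry per rank")
    if sum(input_split_sizes) != dim0:
        raise ValueError("input_split_sizes must sum to dim0")
    return list(output_split_sizes), list(input_split_sizes)
```

A caller would run this before invoking the collective, so that a half-specified split fails loudly on the Python side instead of relying on behavior that has been removed.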