
[c10d] Ensure collectives are called with the same dtype for all tensor params. #84664

Closed
wants to merge 5 commits

Conversation

@kumpera (Contributor) commented Sep 7, 2022

While passing tensors with different dtypes doesn't crash, it doesn't produce sensible results.

We see data tearing instead of casting.

It's not clear that we want to support transparent casting, so for now we fail when such input is presented.

Fixes #84525

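To make the failure mode concrete, here is a hedged repro sketch of the mismatched-dtype case from #84525. It assumes a process group has already been initialized, and it is an illustration of the reported behavior, not code from this PR.

```python
# Hypothetical repro sketch: assumes torch.distributed.init_process_group
# has already set up a process group across the participating ranks.
import torch
import torch.distributed as dist

def gather_with_mismatched_dtypes() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    inp = torch.full((4,), float(rank), dtype=torch.float32)
    # The output list deliberately uses a different dtype than the input.
    # Before this PR the collective completed and reinterpreted raw bytes
    # ("data tearing"); with this PR the call fails up front instead.
    out = [torch.zeros(4, dtype=torch.float64) for _ in range(world_size)]
    dist.all_gather(out, inp)
```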

@pytorch-bot bot commented Sep 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84664

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7d31ac2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the release notes: distributed (c10d) label Sep 7, 2022
@facebook-github-bot added the cla signed and oncall: distributed labels Sep 7, 2022
@numpee commented Sep 8, 2022

Thanks for addressing the issue. This would be a great QoL update.

Rodrigo Kumpera added 3 commits September 12, 2022 18:59
[c10d] Ensure collectives are called with the same dtype for all tensor params.

Fixes pytorch#84525
@rohan-varma (Member) left a comment

LGTM overall, but I guess it is technically BC breaking?

Review comment on torch/distributed/distributed_c10d.py (resolved)
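For context, a minimal sketch of the kind of dtype-consistency check this PR adds to distributed_c10d.py. The helper name, signature, and error message below are assumptions for illustration, not the actual implementation.

```python
# Hypothetical validation helper; names and messages are assumptions.
from typing import List, Union

import torch

def _check_single_dtype(*args: Union[torch.Tensor, List[torch.Tensor]]) -> None:
    """Raise if the tensors passed to a collective do not share one dtype."""
    seen_dtype = None
    for arg in args:
        tensors = arg if isinstance(arg, list) else [arg]
        for t in tensors:
            if seen_dtype is None:
                seen_dtype = t.dtype
            elif t.dtype != seen_dtype:
                raise ValueError(
                    "Invalid usage of tensors with different dtypes: "
                    f"found {seen_dtype} and {t.dtype}"
                )
```

A collective such as all_gather would then run this check on its output list and input tensor before dispatching to the backend, so mismatched dtypes fail fast instead of tearing data.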
@kumpera (Contributor, Author) commented Sep 15, 2022

@pytorchmergebot merge

@kumpera added the topic: bc breaking and topic: improvements labels Sep 15, 2022
@pytorchmergebot (Collaborator) commented
@pytorchbot successfully started a merge job. Check the current status here and land check progress here.
The merge job was triggered with the land checks (-l) flag. If you did not specify this flag yourself, you are likely enrolled in the land checks rollout. This means that your change will be merged once all checks on your PR and the land checks have passed (ETA 4 hours).

If you need to coordinate lands between different changes and cannot risk a land race, please add the ciflow/trunk label to your PR, wait for signal to complete, and then land your changes in the proper order. Having trunk, pull, and Lint pre-run on a PR will bypass land checks, and the ETA should be immediate.

If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

pytorchmergebot pushed a commit that referenced this pull request Sep 15, 2022
[c10d] Ensure collectives are called with the same dtype for all tensor params. (#84664)
Pull Request resolved: #84664
Approved by: https://github.com/rohan-varma
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
[c10d] Ensure collectives are called with the same dtype for all tensor params. (#84664)
alvgaona pushed a commit to alvgaona/pytorch that referenced this pull request Oct 11, 2022
[c10d] Ensure collectives are called with the same dtype for all tensor params. (pytorch#84664)
Labels
cla signed, Merged, oncall: distributed, release notes: distributed (c10d), topic: bc breaking, topic: improvements
Development

Successfully merging this pull request may close these issues.

torch.distributed.all_gather on wrong type of tensor list should raise a TypeError
5 participants