[C10d][NCCL] Refactor complex all_reduce and broadcast #121045
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121045
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9a304f7 with merge base 8861507.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Naive question: what happens when we view complex tensors as real here? Is the elementwise computation still valid?

Yes, the "elementwise" computation must be valid there, as long as the "pre-mul factor" is of the real type.
Would it make sense to add a minimal test, e.g. in test_c10d_nccl.py?
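Something along these lines could serve as a starting point. This is only a sketch: it assumes an already-initialized NCCL process group with one CUDA device per rank, and the helper name `_test_all_reduce_complex` is hypothetical rather than existing test scaffolding.

```python
import torch
import torch.distributed as dist

def _test_all_reduce_complex(rank: int, world_size: int) -> None:
    # Each rank contributes (rank + rank*1j); the default SUM reduction
    # should produce the same complex total on every rank.
    t = torch.full(
        (8,), complex(rank, rank), dtype=torch.cfloat, device=f"cuda:{rank}"
    )
    dist.all_reduce(t)
    total = sum(range(world_size))
    expected = torch.full(
        (8,), complex(total, total), dtype=torch.cfloat, device=f"cuda:{rank}"
    )
    torch.testing.assert_close(t, expected)
```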
LGTM.
Re: moving complex support down from Python to C++: we might need to give it more thought, as today all C++ backends rely on the view_as_real conversion at the Python level. If we move it, we'd need to add it back in every backend (like you did here). Another way to do it is at the dispatcher:
https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/Ops.cpp
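For context, the Python-level conversion referred to above looks roughly like this. It is a paraphrase, not the verbatim code from torch/distributed/distributed_c10d.py, and the deny list below is approximate.

```python
import torch
import torch.distributed as dist

# Reductions that are not well-defined elementwise on the real view of a
# complex tensor (approximate deny list).
_COMPLEX_UNSAFE_OPS = [dist.ReduceOp.MAX, dist.ReduceOp.MIN, dist.ReduceOp.PRODUCT]

def all_reduce_with_complex_view(tensor, op=dist.ReduceOp.SUM, group=None):
    if tensor.is_complex():
        if op in _COMPLEX_UNSAFE_OPS:
            raise ValueError(f"all_reduce does not support {op} on complex tensors")
        # The C++ backend only ever sees real data; the in-place collective
        # on the real view updates the underlying complex tensor as well.
        tensor = torch.view_as_real(tensor)
    dist.all_reduce(tensor, op=op, group=group)
```

Because this conversion happens before the backend is reached, any caller that enters the collective directly from C++ (such as DDP's reducer) bypasses it entirely, which is exactly the failure mode this PR addresses.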
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The necessity of this PR lies in the fact that the autograd engine + DDP calls `all_reduce` from C++, so the changes must be made in C++.

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "~/complex_ddp.py", line 72, in <module>
[rank0]:     main()
[rank0]:   File "~/complex_ddp.py", line 64, in main
[rank0]:     loss.backward()
[rank0]:   File "/home/usr/pytorch/torch/_tensor.py", line 525, in backward
[rank0]:     torch.autograd.backward(
[rank0]:   File "/home/usr/pytorch/torch/autograd/__init__.py", line 267, in backward
[rank0]:     _engine_run_backward(
[rank0]:   File "/home/usr/pytorch/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]: TypeError: Input tensor data type is not supported for NCCL process group: ComplexFloat
```

I believe that, to minimize the Python overhead, the same could be done for the rest of the ops. What do you think @kwen2501?

Pull Request resolved: #121045
Approved by: https://github.com/eqy, https://github.com/kwen2501
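A condensed sketch of the kind of script (like the `complex_ddp.py` in the traceback) that exercises this path: DDP's reducer calls `all_reduce` from C++ during backward, so complex gradients never pass through the Python wrapper. The model and setup details here are assumptions, not the original script.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class ComplexLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Complex parameter -> complex gradient buckets in DDP.
        self.w = torch.nn.Parameter(torch.randn(8, 8, dtype=torch.cfloat))

    def forward(self, x):
        return x @ self.w

def main():
    rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group("nccl")
    torch.cuda.set_device(rank)
    model = DDP(ComplexLinear().cuda(rank), device_ids=[rank])
    x = torch.randn(4, 8, dtype=torch.cfloat, device=f"cuda:{rank}")
    loss = model(x).abs().sum()  # real-valued loss over complex outputs
    loss.backward()  # gradient-bucket all_reduce happens in C++ here
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with e.g.: torchrun --nproc_per_node=<N> this_script.py
```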
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @eqy @ptrblck