
Conversation

@pytorch-bot

pytorch-bot bot commented Jun 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155070

Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures, 1 Unrelated Failure

As of commit 68feaac with merge base 065c446:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD albanD removed their request for review June 4, 2025 00:09
@janeyx99 janeyx99 removed their request for review June 4, 2025 20:11
@qingyi-yan qingyi-yan force-pushed the main branch 2 times, most recently from d9dd5d1 to 3453f31 on June 5, 2025 18:48
@mikaylagawarecki mikaylagawarecki added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 6, 2025
@mikaylagawarecki mikaylagawarecki requested a review from yf225 June 6, 2025 22:03
@RabbitWhite1
Contributor

@qingyi-yan Hi, this is great work! May I kindly ask whether isend and irecv will be supported?

@qingyi-yan
Author

> @qingyi-yan Hi, this is great work! May I kindly ask whether isend and irecv will be supported?

Thanks @RabbitWhite1 for the compliment. Right now I am focusing on getting this pull request merged. Supporting isend and irecv is certainly doable, but it depends on the availability of my resources for this work, which is currently uncertain.

@qingyi-yan
Author

Hi, just checking: I believe this pull request is ready for review and possibly merging. It has been a long time since I made my last update. Is there anything I need to do? Thanks.

Contributor

@wconstab wconstab left a comment


This is an interesting case because, by definition, if we compile a send or recv op, our graph is non-SPMD. We have been designing how compiler optimizations should behave for distributed programs, and making sure that the resulting program is still valid is very difficult unless we assume/enforce that it is SPMD (the same graph on every rank).

I think we should support capturing send/recv, but we should have some rules. If we capture a p2p op, we need to also make sure we are not doing unsafe collective optimizations, for example. For now I think all we need to do is raise an error if any of the spmd-mode flags or compiler passes are registered and we encounter a p2p op during tracing. What do folks think?

@bdhirsh would be the best person to advise on this issue though he is out for a week or two. Also cc @xmfan @ezyang
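
A minimal sketch of the kind of guard proposed above, assuming a hypothetical spmd_mode flag in torch._dynamo.config (the real flag name and the exact hook point are not settled in this thread):

import torch._dynamo.config as dynamo_config

# Fully qualified names of the point-to-point ops that would be captured.
_P2P_OPS = {
    "torch.distributed.distributed_c10d.send",
    "torch.distributed.distributed_c10d.recv",
}


def check_p2p_capture_allowed(qualified_name: str) -> None:
    # Raise during tracing if a p2p op is captured while SPMD-only compiler
    # passes are active. `spmd_mode` is an assumed flag, not a real config today.
    if qualified_name not in _P2P_OPS:
        return
    if getattr(dynamo_config, "spmd_mode", False):
        raise RuntimeError(
            f"Capturing {qualified_name} is unsafe while SPMD-mode compiler "
            "passes are enabled: the traced graph would differ across ranks."
        )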

@ezyang
Contributor

ezyang commented Aug 3, 2025

I mean we just have to actually implement spmd mode IMO.

@qingyi-yan
Author

Agreed with @wconstab that more consistency checking would make support for p2p ops safer. I am waiting for more detailed feedback on the relevant rules (checks) that are needed.

@ezyang
Contributor

ezyang commented Sep 2, 2025

I think this is basically reasonable. To address wconstab's concern, I suggest we only enable this if a config flag is set, and set it to False by default. I haven't reviewed the rest of the PR carefully, but if you're willing to add the config flag I'll do the rest of the review.
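
For reference, a sketch of what such a flag might look like, assuming it lives in torch/_dynamo/config.py; the name capture_distributed_p2p_ops is a placeholder, not a settled spelling:

# torch/_dynamo/config.py (sketch; the flag name is a placeholder)
# When False (the default), Dynamo keeps today's behavior and graph-breaks on
# torch.distributed send/recv instead of capturing them into the graph.
capture_distributed_p2p_ops = False

The capture path would then consult the flag, e.g. if not config.capture_distributed_p2p_ops, fall back to a graph break as before.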

@qingyi-yan
Author

Yes, I agree with adding a config flag that is set to False by default. I will work on this and hope to have it ready in a week or so. Thanks for the feedback!

"torch.sparse_compressed_tensor": SkipFunctionVariable,
# Specially handle system-level communication functions
"torch.distributed.distributed_c10d.send": CommunicationFunctionVariable,
"torch.distributed.distributed_c10d.recv": CommunicationFunctionVariable,
Contributor


Help me understand why these aren't handled the same way as other traceable collectives?

Author


The recv function has an API that does not fit the functional paradigm: specifically, the recv(variable) interface modifies the given variable in place. Another concern is that system-level communications may modify the underlying system state, which means they may have unknown side effects. So I assumed it is not safe to treat them as functional collectives.
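
To illustrate the in-place semantics described above (this snippet assumes an already initialized process group and only shows the shape of the existing torch.distributed API):

import torch
import torch.distributed as dist

# dist.recv fills the provided tensor in place and returns the sender's rank,
# so there is no purely functional output to thread through a traced graph.
buf = torch.empty(4)
src_rank = dist.recv(buf, src=0)  # mutates `buf`; the return value is just an int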

use_fallback = False

import torch.distributed.distributed_c10d as c10d
# Fall back to disabling autograd if mutation has to be supported.
Contributor


Do the tests you added fail due to this? It feels like this is just a DCE problem? It should work to run AOTAutograd here.

Author


The issue was mutation of the input parameter in the recv function. Because this mutation violates the functional assumption, the parameter fails to actually be modified when autograd is enabled, due to the use of functionalized objects. I am not aware of a way around it, other than adopting an alternative functional API for the recv function.
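
One possible shape for an out-of-place alternative, sketched under the assumption that the receiver knows the shape and dtype ahead of time (and, like recv itself, that a process group is initialized); functional_recv is a hypothetical helper, not an existing API:

import torch
import torch.distributed as dist


def functional_recv(shape, dtype, src, group=None, tag=0):
    # Allocate the output inside the op and return it instead of mutating a
    # caller-provided buffer, which is the pattern functionalization expects.
    out = torch.empty(shape, dtype=dtype)
    dist.recv(out, src=src, group=group, tag=tag)
    return out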

fn = fn_var.fn
return variables.TorchInGraphFunctionVariable(fn, nonstrict_traceable=True)
name = self.fn.__name__
print (f"name={name}")
Contributor


don't forget to remove

Author


Definitely. Thanks for catching my carelessness!

tx.output.create_proxy("call_function", self.fn,
*proxy_args_kwargs(args, kwargs))
return variables.ConstantVariable(None)
return super().call_function(tx, args, kwargs)
Contributor


This is probably not quite right. I think it would be better to use something analogous to traceable collectives remapping to support this. See this:

def _traceable_collective_remaps():
    # We can't rely on importing from distributed, since it's not always built
    if torch.distributed.is_available():
        from torch.distributed._functional_collectives import (
            traceable_collective_remaps,
        )

        return traceable_collective_remaps
    return {}


def _traceable_collectives_source(tx: "InstructionTranslator", fn):
    assert torch.distributed.is_available(), "Illegal invocation."
    assert fn in _traceable_collective_remaps().values()

    inner_name = fn.__name__
    path_source = tx.import_source("torch.distributed._functional_collectives")
    return AttrSource(path_source, inner_name)

Essentially, we need functional versions of send and recv. Then you can use the CollectiveFunctionRewriteVariable apparatus to get to the functional collective.
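
For illustration, the remap entries might end up looking roughly like this, assuming functional send/recv wrappers were added to torch.distributed._functional_collectives (they do not exist there today, so funcol.send and funcol.recv are placeholders):

import torch.distributed.distributed_c10d as c10d
import torch.distributed._functional_collectives as funcol

# Hypothetical additions to the existing remap table; funcol.send / funcol.recv
# are assumed functional wrappers, not current APIs.
funcol.traceable_collective_remaps.update({
    c10d.send: funcol.send,
    c10d.recv: funcol.recv,
})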

@qingyi-yan
Author

Thank you @ezyang for the helpful feedback! I will try what you suggested and get back to you.

@github-actions
Contributor

github-actions bot commented Nov 9, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.


Labels

ciflow/inductor, fx, module: compiled autograd, module: cpu, module: dynamo, module: inductor, module: rocm, oncall: distributed, oncall: jit, open source, release notes: distributed (checkpoint), release notes: inductor (aoti), release notes: quantization, release notes: releng, Stale, triaged


Development

Successfully merging this pull request may close these issues.

supporting dynamo compilation of end-to-end send/recv in distributed

7 participants