Conversation

@larryliu0820 larryliu0820 commented Oct 1, 2025

This pull request introduces several improvements to the CUDA backend. The main changes include adding a new graph pass to replace unnecessary slice_copy operations, improving how method names are tracked in compilation artifacts, and making the preprocessing pipeline more robust and accurate.

Key changes:

Graph optimization and preprocessing

  • Introduced ReplaceSliceCopyWithSlicePass, a new export pass that replaces non-mutated slice_copy operations with more efficient slice view operations in the computational graph (replace_slice_copy_with_slice.py, used in cuda_backend.py).
  • Added context management for attention kernel selection and no-grad mode during AOT compilation to ensure correct backend selection for decomposition. This is needed in the short term until we have a flash attention CUDA kernel.
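The slice_copy rewrite described above can be pictured with a toy pass. This is an illustrative sketch over a simplified node list, not the actual ExecuTorch ExportPass; the `Node` class and its `mutated` flag are assumptions made for illustration:

```python
# Toy sketch of the idea behind ReplaceSliceCopyWithSlicePass (assumed
# structure; the real pass operates on the exported program's FX graph).
# A slice whose result is never written to in place can be a view
# ("aten.slice") instead of a materialized copy ("aten.slice_copy").
from dataclasses import dataclass


@dataclass
class Node:
    op: str              # operator name, e.g. "aten.slice_copy"
    mutated: bool = False  # True if some downstream user mutates the result


def replace_slice_copy_with_slice(nodes: list) -> list:
    """Retarget non-mutated slice_copy nodes to the view op slice."""
    for n in nodes:
        if n.op == "aten.slice_copy" and not n.mutated:
            n.op = "aten.slice"  # view: no extra allocation or copy
    return nodes
```

The mutation check is the key safety condition: a view aliases the source tensor, so rewriting a slice whose result is later mutated in place would change program semantics.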

Method name and compile specification handling

  • Added a COMPILE_SPEC_KEYS enum and utility methods (generate_method_name_compile_spec, method_name_from_compile_specs) to consistently embed and retrieve the method name in compile specs and as a key in the data store, improving traceability of compiled artifacts.
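A minimal sketch of the round trip these helpers provide, assuming a simplified key/value `CompileSpec` stand-in rather than ExecuTorch's actual class:

```python
# Hedged sketch of the method-name compile-spec helpers named above.
# CompileSpec here is a simplified stand-in (key plus bytes value);
# the function names follow the PR description.
from dataclasses import dataclass
from enum import Enum


class COMPILE_SPEC_KEYS(Enum):
    METHOD_NAME = "method_name"


@dataclass
class CompileSpec:
    key: str
    value: bytes


def generate_method_name_compile_spec(method_name: str) -> CompileSpec:
    """Embed the method name so it travels with the compiled artifact."""
    return CompileSpec(COMPILE_SPEC_KEYS.METHOD_NAME.value,
                       method_name.encode("utf-8"))


def method_name_from_compile_specs(specs: list) -> str:
    """Recover the method name embedded by the generator above."""
    for spec in specs:
        if spec.key == COMPILE_SPEC_KEYS.METHOD_NAME.value:
            return spec.value.decode("utf-8")
    raise RuntimeError("Method name not found in compile specs")
```

Keying the data store by the recovered method name is what lets a multi-method program map each compiled blob back to the method it belongs to.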

Code cleanup and maintainability

  • Minor refactor in cuda_partitioner.py to clarify delegation tag assignment.
  • Improved imports and code organization for clarity in cuda_backend.py.

These changes collectively improve the reliability, performance, and maintainability of the CUDA backend pipeline.

pytorch-bot bot commented Oct 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14715

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 85 Pending

As of commit 5a40be7 with merge base 96dfa9c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 1, 2025
@larryliu0820 larryliu0820 added the release notes: desktop for desktop/laptop workstream label Oct 1, 2025
cuda_edge_program = move_to_device_pass(edge_program, "cuda")

# replace slice_copy with slice
ReplaceSliceCopyWithSlicePass()(cuda_edge_program.graph_module)
Contributor

Seems pretty hacky to run the force functionalization pass and then come through and undo it (but only for slice). Won't you in practice have to do this for all view ops?

Does AOTI lowering typically happen on functionalized IR?

Contributor Author

Inductor's reinplace pass reverts most of the functionalization. I don't think it handles slice_copy though, since it comes from this pass: https://github.com/pytorch/executorch/blob/main/exir/passes/replace_broken_ops_with_function_ops_pass.py#L13

The other option is to optionally run this pass in to_edge().


- with collect_unsupported_fallback_kernels():
+ with collect_unsupported_fallback_kernels(), torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
torch._logging.set_logs(post_grad_graphs=True)
Contributor

Should this be landed?

Contributor Author

I'll revert

@larryliu0820 larryliu0820 force-pushed the aoti_support_multi_method branch from 01096a2 to 5a40be7 Compare October 1, 2025 22:30
@larryliu0820 larryliu0820 merged commit b1309e7 into main Oct 1, 2025
126 of 127 checks passed
@larryliu0820 larryliu0820 deleted the aoti_support_multi_method branch October 1, 2025 22:52
facebook-github-bot pushed a commit that referenced this pull request Oct 2, 2025
Summary: Fix

Reviewed By: abhinaykukkadapu

Differential Revision: D83765135
larryliu0820 added a commit that referenced this pull request Oct 2, 2025
Summary:
Pull Request resolved: #14753

Fix

Reviewed By: abhinaykukkadapu

Differential Revision: D83765135

cccclai commented Oct 7, 2025

I think the method name should be in the preprocess blob instead of the compile spec. If you use a compile spec, users need to pass in the graph name as part of the compile spec every time, while with the preprocess blob they don't need to.
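The alternative being suggested can be sketched as follows: prepend the method name to the preprocessed blob itself, so callers never have to supply it separately. The length-prefixed layout here is an assumption for illustration, not ExecuTorch's actual blob format:

```python
# Toy sketch of embedding the method name inside the preprocessed blob
# (layout assumed: 4-byte little-endian name length, name, then payload).
import struct


def pack_blob(method_name: str, payload: bytes) -> bytes:
    """Prefix the payload with its method name so the blob is self-describing."""
    name = method_name.encode("utf-8")
    return struct.pack("<I", len(name)) + name + payload


def unpack_blob(blob: bytes):
    """Split a packed blob back into (method_name, payload)."""
    (n,) = struct.unpack_from("<I", blob, 0)
    name = blob[4:4 + n].decode("utf-8")
    return name, blob[4 + n:]
```

The trade-off versus compile specs: the blob becomes self-describing at load time, at the cost of the backend owning a small serialization format.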
