Conversation

@larryliu0820 larryliu0820 commented Oct 1, 2025

This pull request introduces several improvements to the CUDA backend. The main changes include adding a new graph pass to replace unnecessary slice_copy operations, improving how method names are tracked in compilation artifacts, and making the preprocessing pipeline more robust and accurate.

Key changes:

Graph optimization and preprocessing

  • Introduced ReplaceSliceCopyWithSlicePass, a new export pass that replaces non-mutated slice_copy operations with more efficient slice view operations in the computational graph (replace_slice_copy_with_slice.py, used in cuda_backend.py).
  • Added context management for attention kernel selection and no-grad mode during AOT compilation to ensure correct backend selection for decomposition. This is needed in the short term until we have a flash attention CUDA kernel.
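The slice_copy rewrite described above can be pictured with a toy pass. This is an illustrative sketch over a simplified node list, not the actual ExecuTorch ExportPass; the `Node` class and its `mutated` flag are assumptions made for illustration:

```python
# Toy sketch of the idea behind ReplaceSliceCopyWithSlicePass (assumed
# structure; the real pass operates on the exported program's FX graph).
# A slice whose result is never written to in place can be a view
# ("aten.slice") instead of a materialized copy ("aten.slice_copy").
from dataclasses import dataclass


@dataclass
class Node:
    op: str              # operator name, e.g. "aten.slice_copy"
    mutated: bool = False  # True if some downstream user mutates the result


def replace_slice_copy_with_slice(nodes: list) -> list:
    """Retarget non-mutated slice_copy nodes to the view op slice."""
    for n in nodes:
        if n.op == "aten.slice_copy" and not n.mutated:
            n.op = "aten.slice"  # view: no extra allocation or copy
    return nodes
```

The mutation check is the key safety condition: a view aliases the source tensor, so rewriting a slice whose result is later mutated in place would change program semantics.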

Method name and compile specification handling

  • Added a COMPILE_SPEC_KEYS enum and utility methods (generate_method_name_compile_spec, method_name_from_compile_specs) to consistently embed and retrieve the method name in compile specs and as a key in the data store, improving traceability of compiled artifacts.
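A minimal sketch of the round trip these helpers provide, assuming a simplified key/value `CompileSpec` stand-in rather than ExecuTorch's actual class:

```python
# Hedged sketch of the method-name compile-spec helpers named above.
# CompileSpec here is a simplified stand-in (key plus bytes value);
# the function names follow the PR description.
from dataclasses import dataclass
from enum import Enum


class COMPILE_SPEC_KEYS(Enum):
    METHOD_NAME = "method_name"


@dataclass
class CompileSpec:
    key: str
    value: bytes


def generate_method_name_compile_spec(method_name: str) -> CompileSpec:
    """Embed the method name so it travels with the compiled artifact."""
    return CompileSpec(COMPILE_SPEC_KEYS.METHOD_NAME.value,
                       method_name.encode("utf-8"))


def method_name_from_compile_specs(specs: list) -> str:
    """Recover the method name embedded by the generator above."""
    for spec in specs:
        if spec.key == COMPILE_SPEC_KEYS.METHOD_NAME.value:
            return spec.value.decode("utf-8")
    raise RuntimeError("Method name not found in compile specs")
```

Keying the data store by the recovered method name is what lets a multi-method program map each compiled blob back to the method it belongs to.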

Code cleanup and maintainability

  • Minor refactor in cuda_partitioner.py to clarify delegation tag assignment.
  • Improved imports and code organization for clarity in cuda_backend.py.

These changes collectively improve the reliability, performance, and maintainability of the CUDA backend pipeline.

pytorch-bot bot commented Oct 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14715

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 85 Pending

As of commit 5a40be7 with merge base 96dfa9c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 1, 2025
@larryliu0820 larryliu0820 added the release notes: desktop for desktop/laptop workstream label Oct 1, 2025
cuda_edge_program = move_to_device_pass(edge_program, "cuda")

# replace slice_copy with slice
ReplaceSliceCopyWithSlicePass()(cuda_edge_program.graph_module)
Contributor

Seems pretty hacky to run the force functionalization pass and then come through and undo it (but only for slice). Won't you in practice have to do this for all view ops?

Does AOTI lowering typically happen on functionalized IR?

Contributor Author

Inductor's reinplace pass reverts most of the functionalization. I don't think it handles slice_copy though, since it comes from this pass: https://github.com/pytorch/executorch/blob/main/exir/passes/replace_broken_ops_with_function_ops_pass.py#L13

The other option is to optionally run this pass in to_edge().


- with collect_unsupported_fallback_kernels():
+ with collect_unsupported_fallback_kernels(), torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
torch._logging.set_logs(post_grad_graphs=True)
Contributor

Should this be landed?

Contributor Author

I'll revert

@larryliu0820 larryliu0820 force-pushed the aoti_support_multi_method branch from 01096a2 to 5a40be7 Compare October 1, 2025 22:30
@larryliu0820 larryliu0820 merged commit b1309e7 into main Oct 1, 2025
126 of 127 checks passed
@larryliu0820 larryliu0820 deleted the aoti_support_multi_method branch October 1, 2025 22:52
facebook-github-bot pushed a commit that referenced this pull request Oct 2, 2025
Summary: Fix

Reviewed By: abhinaykukkadapu

Differential Revision: D83765135
larryliu0820 added a commit that referenced this pull request Oct 2, 2025
Summary:
Pull Request resolved: #14753

Fix

Reviewed By: abhinaykukkadapu

Differential Revision: D83765135

cccclai commented Oct 7, 2025

I think the method name should be in the preprocess blob instead of the compile spec. If you use a compile spec, users need to pass in the graph name as part of the compile spec every time, while with the preprocess blob they don't need to.
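The alternative being suggested can be sketched as follows: prepend the method name to the preprocessed blob itself, so callers never have to supply it separately. The length-prefixed layout here is an assumption for illustration, not ExecuTorch's actual blob format:

```python
# Toy sketch of embedding the method name inside the preprocessed blob
# (layout assumed: 4-byte little-endian name length, name, then payload).
import struct


def pack_blob(method_name: str, payload: bytes) -> bytes:
    """Prefix the payload with its method name so the blob is self-describing."""
    name = method_name.encode("utf-8")
    return struct.pack("<I", len(name)) + name + payload


def unpack_blob(blob: bytes):
    """Split a packed blob back into (method_name, payload)."""
    (n,) = struct.unpack_from("<I", blob, 0)
    name = blob[4:4 + n].decode("utf-8")
    return name, blob[4 + n:]
```

The trade-off versus compile specs: the blob becomes self-describing at load time, at the cost of the backend owning a small serialization format.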
