Aoti support multi method #14715
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14715
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 85 Pending as of commit 5a40be7 with merge base 96dfa9c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cuda_edge_program = move_to_device_pass(edge_program, "cuda")

# replace slice_copy with slice
ReplaceSliceCopyWithSlicePass()(cuda_edge_program.graph_module)
Seems pretty hacky to run the force functionalization pass and then come through and undo it (but only for slice). Won't you in practice have to do this for all view ops?
Does AOTI lowering typically happen on functionalized IR?
Inductor's reinplace pass reverts most of the functionalization. I don't think it handles slice_copy, though, since it comes from this pass: https://github.com/pytorch/executorch/blob/main/exir/passes/replace_broken_ops_with_function_ops_pass.py#L13
The other option is to optionally run this pass in to_edge().
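Roughly, the rewrite under discussion can be sketched as a plain torch.fx pass that swaps the out-of-place copy op for its view counterpart. This is an illustrative sketch only, not the actual ReplaceSliceCopyWithSlicePass, which additionally has to verify that the sliced tensor is never mutated downstream before converting a copy into a view:

import torch
from torch.fx import GraphModule


def replace_slice_copy_with_slice(gm: GraphModule) -> GraphModule:
    """Rewrite aten.slice_copy.Tensor calls into aten.slice.Tensor view calls."""
    for node in gm.graph.nodes:
        if (
            node.op == "call_function"
            and node.target == torch.ops.aten.slice_copy.Tensor
        ):
            # slice.Tensor takes the same (self, dim, start, end, step) arguments
            # as slice_copy.Tensor but returns a view instead of a new tensor.
            node.target = torch.ops.aten.slice.Tensor
    gm.graph.lint()
    gm.recompile()
    return gm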
backends/cuda/cuda_backend.py (Outdated)
with collect_unsupported_fallback_kernels():
with collect_unsupported_fallback_kernels(), torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    torch._logging.set_logs(post_grad_graphs=True)
Should this be landed?
I'll revert
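For context on the hunk above: wrapping the compilation in torch.nn.attention.sdpa_kernel([SDPBackend.MATH]) pins scaled_dot_product_attention to the decomposed math backend, and torch.no_grad() disables autograd tracking during lowering. A minimal standalone illustration (the tensors and shapes are made up, and this is not the backend code itself):

import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

with sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    # Inside this block SDPA uses the decomposed math implementation rather than
    # a fused attention kernel, which may otherwise surface as an unsupported
    # fallback during AOTI compilation.
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)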
Force-pushed from 01096a2 to 5a40be7.

Summary: Fix
Reviewed By: abhinaykukkadapu
Differential Revision: D83765135
I think the method name should be in the preprocess blob instead of the compile spec. If you use a compile spec, users need to pass in the graph name as part of the compile spec every time, while with the preprocess blob they don't need to.
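For reference, a minimal sketch of the compile-spec side of this trade-off, following the helper names in the PR description. The import path and the string key are assumptions for illustration; the actual implementation keys off a COMPILE_SPEC_KEYS enum:

from typing import List

from executorch.exir.backend.compile_spec_schema import CompileSpec

METHOD_NAME_KEY = "method_name"  # illustrative key; the real code uses an enum value


def generate_method_name_compile_spec(method_name: str) -> CompileSpec:
    # Embed the method name in a compile spec so the backend can recover it
    # at preprocess time and use it to key the compiled artifact.
    return CompileSpec(METHOD_NAME_KEY, method_name.encode("utf-8"))


def method_name_from_compile_specs(compile_specs: List[CompileSpec]) -> str:
    # Retrieve the method name embedded by generate_method_name_compile_spec.
    for spec in compile_specs:
        if spec.key == METHOD_NAME_KEY:
            return spec.value.decode("utf-8")
    raise RuntimeError("method name not found in compile specs")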
This pull request introduces several improvements to the CUDA backend. The main changes include adding a new graph pass to replace unnecessary slice_copy operations, improving how method names are tracked in compilation artifacts, and making the preprocessing pipeline more robust and accurate.

Key changes:

Graph optimization and preprocessing
- Added ReplaceSliceCopyWithSlicePass, a new export pass that replaces non-mutated slice_copy operations with more efficient slice view operations in the computational graph (replace_slice_copy_with_slice.py, used in cuda_backend.py). [1] [2]

Method name and compile specification handling
- Added a COMPILE_SPEC_KEYS enum and utility methods (generate_method_name_compile_spec, method_name_from_compile_specs) to consistently embed and retrieve the method name in compile specs and as a key in the data store, improving traceability of compiled artifacts. [1] [2] [3]

Code cleanup and maintainability
- Updated cuda_partitioner.py to clarify delegation tag assignment.
- General cleanup in cuda_backend.py.

These changes collectively improve the reliability, performance, and maintainability of the CUDA backend pipeline.
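To make the multi-method angle concrete, here is a hypothetical end-to-end sketch of lowering a named method to the CUDA backend. The CudaPartitioner import path and constructor arguments are assumptions, not the confirmed API; only torch.export and to_edge_transform_and_lower are standard:

import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.cuda.cuda_partitioner import CudaPartitioner  # assumed path


class TinyModel(torch.nn.Module):
    def forward(self, x):
        # The slice below becomes slice_copy in the edge dialect, which is what
        # ReplaceSliceCopyWithSlicePass targets.
        return x[:, :8] * 2


model = TinyModel().eval()
example_inputs = (torch.randn(2, 16),)

# Export each method under its own name; the backend then needs to track which
# compiled artifact belongs to which method, which is the subject of this PR.
exported = {"forward": torch.export.export(model, example_inputs)}

edge = to_edge_transform_and_lower(
    exported,
    partitioner=[CudaPartitioner([])],  # assumed constructor: a list of compile specs
)
executorch_program = edge.to_executorch()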