CUDA partitioner supported #14477

Gasoonjia · 2025-09-22T19:49:22Z

Summary:
This diff introduces a partitioner for the CUDA delegate, which is driven by the AOTI library.

The partitioner places the input into exactly one partitioned graph that contains all operators from the input graph. It also keeps all operators (except those the AOTI-CUDA library cannot handle) out of ExecuTorch operator decomposition. Operators are instead decomposed in the CUDA backend using an AOTI-CUDA-specific decomposition table.
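To make the behavior described above concrete, here is a minimal toy model in plain Python. It is only a sketch: `Node`, `partition_all`, `ops_to_keep`, the `delegation_tag` key, and the operator names are all hypothetical stand-ins chosen for illustration, not the ExecuTorch or AOTI API, and the "unsupported" set is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not torch.fx.Node)."""
    name: str
    op: str            # "placeholder", "call_function", or "output"
    target: str = ""   # operator name, only meaningful for call_function
    meta: Dict[str, str] = field(default_factory=dict)


def partition_all(nodes: List[Node], tag: str = "tag0") -> Dict[str, List[str]]:
    """Give every call_function node the same tag, so there is one partition."""
    partitions: Dict[str, List[str]] = {tag: []}
    for node in nodes:
        if node.op == "call_function":
            node.meta["delegation_tag"] = tag
            partitions[tag].append(node.name)
    return partitions


def ops_to_keep(nodes: List[Node], unsupported: Set[str]) -> Set[str]:
    """Collect operator targets that should skip the generic decomposition."""
    return {
        node.target
        for node in nodes
        if node.op == "call_function" and node.target not in unsupported
    }


# A tiny example graph; "example.unsupported_op" is an invented name.
graph = [
    Node("x", "placeholder"),
    Node("linear", "call_function", "aten.linear"),
    Node("relu", "call_function", "aten.relu"),
    Node("odd", "call_function", "example.unsupported_op"),
    Node("out", "output"),
]
partitions = partition_all(graph)
kept = ops_to_keep(graph, unsupported={"example.unsupported_op"})
```

In this model every operator lands in the single partition, while only the operators outside the unsupported set are shielded from the generic decomposition.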

Differential Revision: D82987193

pytorch-bot · 2025-09-22T19:49:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14477

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit e8a30b0 with merge base 7b33035:

NEW FAILURE - The following job has failed:

pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 041a9f9b2f0faa8a7555fce80e81fa5e406ddad0c399d887bc5b6d3b6b10bf9a /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-binary-size-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-setup-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-arm-backend-with-no-fvp (test_pytest_models) / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-22T19:49:31Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.


github-actions · 2025-09-22T19:50:02Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


facebook-github-bot · 2025-09-22T20:23:54Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.

Reviewed By: larryliu0820

facebook-github-bot · 2025-09-22T23:28:30Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.


JacobSzwejbka · 2025-09-23T17:11:18Z

backends/cuda/cuda_partitioner.py

+
+
+@final
+class CudaPartitioner(Partitioner):


Mark as experimental
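One possible way to act on this request is a class decorator that emits a warning on instantiation. The sketch below uses only the standard library; the `experimental` decorator and the placeholder `Partitioner` base class are hypothetical and written for illustration, not the helper or base class ExecuTorch itself provides.

```python
import warnings
from typing import final


def experimental(cls):
    """Hypothetical class decorator: warn whenever the class is instantiated."""
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        warnings.warn(
            f"{cls.__name__} is experimental and may change without notice.",
            FutureWarning,
            stacklevel=2,
        )
        original_init(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls


class Partitioner:
    """Placeholder base class for this sketch only."""


@final
@experimental
class CudaPartitioner(Partitioner):
    pass
```

With this in place, constructing `CudaPartitioner()` raises a `FutureWarning`, which callers can surface or silence through the standard `warnings` filters.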

JacobSzwejbka · 2025-09-23T17:14:49Z

backends/cuda/cuda_partitioner.py

+        do_not_decompose = set()
+
+        for node in ep.graph.nodes:
+            if node.op == "call_function" and isinstance(


Can AOTI handle control-flow HOPs (higher-order operators)?

Gasoonjia (reply, Sep 24, 2025): I haven't tested that. Let me add a test later.


facebook-github-bot · 2025-09-23T22:38:09Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.


facebook-github-bot · 2025-09-23T23:31:56Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.


Gasoonjia requested review from JacobSzwejbka and larryliu0820 as code owners September 22, 2025 19:49

meta-cla bot added the CLA Signed label Sep 22, 2025

facebook-github-bot added fb-exported meta-exported labels Sep 22, 2025

larryliu0820 approved these changes Sep 22, 2025


Gasoonjia force-pushed the export-D82987193 branch from 7154f1f to c32854e Compare September 22, 2025 20:23

Gasoonjia force-pushed the export-D82987193 branch from c32854e to 56480a6 Compare September 22, 2025 23:28

JacobSzwejbka reviewed Sep 23, 2025


Gasoonjia force-pushed the export-D82987193 branch from 56480a6 to 9945631 Compare September 23, 2025 22:37

Gasoonjia force-pushed the export-D82987193 branch from 9945631 to e8a30b0 Compare September 23, 2025 23:31

facebook-github-bot merged commit 44f3740 into pytorch:main Sep 24, 2025
126 of 132 checks passed
