cuda partitioner supported #14477
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14477
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 3 Unrelated Failures as of commit e8a30b0 with merge base 7b33035.
NEW FAILURE: The following job has failed.
BROKEN TRUNK: The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating diff in D82987193.
Summary: This diff introduces a partitioner for the CUDA delegate, driven by the AOTI library. The partitioner partitions the input into exactly one partitioned graph containing all operators from the input graph, and keeps all operators (except the ops that cannot be handled by the aoti-cuda library) away from ExecuTorch operator decomposition. Those operators are instead decomposed in the CUDA backend using an aoti-cuda-specific decomposition table. Differential Revision: D82987193
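The "exactly one partitioned graph" behavior described in the summary can be sketched in plain Python. This is a toy illustration, not the actual ExecuTorch `Partitioner` API: the `Node` class and `tag_all_nodes` helper are illustrative stand-ins for `torch.fx` graph nodes and the partitioner's tagging pass.

```python
# Toy sketch: tag every call_function node in a graph with one shared
# partition tag, mirroring a partitioner that emits a single partition
# containing all compute operators. Structures here are illustrative,
# not ExecuTorch APIs.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    op: str  # e.g. "placeholder", "call_function", "output"
    tag: Optional[str] = None


def tag_all_nodes(nodes, tag="cuda_partition_0"):
    """Assign every compute (call_function) node the same delegation tag."""
    partition_tags = {}
    for node in nodes:
        if node.op == "call_function":
            node.tag = tag
            partition_tags[node.name] = tag
    return partition_tags


nodes = [
    Node("x", "placeholder"),
    Node("add", "call_function"),
    Node("mul", "call_function"),
    Node("out", "output"),
]
tags = tag_all_nodes(nodes)
print(tags)  # both compute nodes share the single partition tag
```

Because every compute node carries the same tag, the downstream delegation step lowers the whole graph as one unit rather than splitting it into multiple delegated subgraphs.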
@final
class CudaPartitioner(Partitioner):
Mark as experimental
do_not_decompose = set()

for node in ep.graph.nodes:
    if node.op == "call_function" and isinstance(
can aoti eat control flow hops?
didn't test that. Let me add a test later.
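The concern raised above — whether AOTI can handle control-flow higher-order ops — could be addressed defensively when building the do-not-decompose set. The sketch below is a hypothetical approach, not the PR's actual code: the op names and the `collect_do_not_decompose` helper are illustrative.

```python
# Toy sketch: build a do-not-decompose set while excluding control-flow
# higher-order ops (e.g. cond / while_loop), which the review thread
# flags as untested with AOTI. Op names here are illustrative strings,
# not real OpOverload objects.

CONTROL_FLOW_OPS = {"higher_order.cond", "higher_order.while_loop"}


def collect_do_not_decompose(ops_seen):
    """Keep ordinary ops; leave control-flow ops to the default path."""
    do_not_decompose = set()
    for op in ops_seen:
        if op in CONTROL_FLOW_OPS:
            continue  # untested with AOTI, so don't claim these ops
        do_not_decompose.add(op)
    return do_not_decompose


ops = ["aten.add", "higher_order.cond", "aten.mul"]
print(sorted(collect_do_not_decompose(ops)))  # ['aten.add', 'aten.mul']
```

Excluding control-flow ops this way keeps them on the default decomposition path until a dedicated test (as promised in the reply above) confirms AOTI handles them.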