[Plugin EP] Make compiled kernel names "more" unique#27608
[Plugin EP] Make compiled kernel names "more" unique#27608apwojcik wants to merge 2 commits intomicrosoft:mainfrom
Conversation
| uint64_t model_hash = 0; | ||
| int id = generator_.GenerateId(graph_viewer_, model_hash); | ||
| return MakeString(prefix_, "_", model_hash, "_", id); | ||
| return MakeString(prefix_, "_", graph_viewer_.Name(), "_", model_hash, "_", id); |
There was a problem hiding this comment.
Alternatively, could this be fixed by making the ModelMetadefIdGenerator local variable that is currently declared in PluginExecutionProvider::GetCapability() into a member of PluginExecutionProvider? I suspect that this would properly make every call to generator_.GenerateId() return a monotonically increasing number. It would also prevent redundant regenerations of the model_hash as a nice bonus.
I can try to experiment on my side.
Aside: for a future release of ORT, perhaps we can allow EPs to provide a custom metadef name via OrtNodeFusionOptions.
There was a problem hiding this comment.
Also, I'm worried about two things with this approach:
- Two different subgraphs (e.g., branches of an If node) may have the same name. This approach would still lead to duplicate metadef names.
- External tools may be depending on the current format of this metadef name.
|
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
Hi @apwojcik, The PR also adds testing. Could you please verify on your end. @tianleiwu, I'm removing the 1.24.4 release tag from this PR. |
…fused nodes in different GraphViews (#27666) ### Description Fixes a bug where `PluginExecutionProvider::GetCapability()` incorrectly assigned duplicate MetaDef IDs to fused nodes that live in different GraphViewer instances (e.g., the then/else branches of an If node). The root cause was that `GetCapability()` created a new `ModelMetadefIdGenerator` on every invocation. Since the graph partitioner calls `GetCapability()` once per subgraph, the generator's monotonic counter reset each time, producing colliding IDs across subgraphs. This caused session creation to fail with: > Failed to add kernel for example_ep_9433721956998717990_0 example_ep example_ep: Conflicting with a registered kernel with op versions. the since version is: 1 #### Fix - Promoted `ModelMetadefIdGenerator` to an instance member of `PluginExecutionProvider` so the same generator is reused across all `GetCapability()` calls, ensuring unique MetaDef IDs. - This is also consistent with how existing provider-bridge EPs create and use a single generator instance. - **Bonus perf improvement**: No longer recomputes the entire model's hash on every call to `GetCapability()`. #### Testing Example EP changes: - Refactored `SaveConstantInitializers()` → `TrySaveConstantInitializer()` to save initializers per-node-input instead of via `graph.GetInitializers()`, which doesn't return initializers defined in parent or sibling subgraphs. - Extracted `CopiesConstantInitializers()` helper to deduplicate the condition for drop_constant_initializers. Unit testing: - Added unit test called `CompilingPluginEp_MultiSubgraphs_DuplicateMetaDefIdBug` — runs an If model with Mul nodes in both branches, verifying that both fused nodes receive unique MetaDef IDs and the session creates/runs successfully. Credit to @apwojcik for [finding the bug.](#27608)
…fused nodes in different GraphViews (#27666) Fixes a bug where `PluginExecutionProvider::GetCapability()` incorrectly assigned duplicate MetaDef IDs to fused nodes that live in different GraphViewer instances (e.g., the then/else branches of an If node). The root cause was that `GetCapability()` created a new `ModelMetadefIdGenerator` on every invocation. Since the graph partitioner calls `GetCapability()` once per subgraph, the generator's monotonic counter reset each time, producing colliding IDs across subgraphs. This caused session creation to fail with: > Failed to add kernel for example_ep_9433721956998717990_0 example_ep example_ep: Conflicting with a registered kernel with op versions. the since version is: 1 - Promoted `ModelMetadefIdGenerator` to an instance member of `PluginExecutionProvider` so the same generator is reused across all `GetCapability()` calls, ensuring unique MetaDef IDs. - This is also consistent with how existing provider-bridge EPs create and use a single generator instance. - **Bonus perf improvement**: No longer recomputes the entire model's hash on every call to `GetCapability()`. Example EP changes: - Refactored `SaveConstantInitializers()` → `TrySaveConstantInitializer()` to save initializers per-node-input instead of via `graph.GetInitializers()`, which doesn't return initializers defined in parent or sibling subgraphs. - Extracted `CopiesConstantInitializers()` helper to deduplicate the condition for drop_constant_initializers. Unit testing: - Added unit test called `CompilingPluginEp_MultiSubgraphs_DuplicateMetaDefIdBug` — runs an If model with Mul nodes in both branches, verifying that both fused nodes receive unique MetaDef IDs and the session creates/runs successfully. Credit to @apwojcik for [finding the bug.](#27608)
|
Closing this PR in favour of #27666 |
Description
Plugin EPs have no control over the compiled kernel names, unlike Host API EPs, which use ComputeCapability and MetaDef directly. With this change, I want to add more uniqueness to the compiled kernel names.
Motivation and Context
Currently, the name is chosen based on the graph view, which represents the model graph. For subgraphs of the same graph, the names are identical, and the Kernel Registry refuses to register the subgraph kernel, citing that the same-name kernels must have different op sets.