Skip to content

[Plugin EP] Make compiled kernel names "more" unique#27608

Closed
apwojcik wants to merge 2 commits intomicrosoft:mainfrom
ROCm:compiled_kernel_name
Closed

[Plugin EP] Make compiled kernel names "more" unique#27608
apwojcik wants to merge 2 commits intomicrosoft:mainfrom
ROCm:compiled_kernel_name

Conversation

@apwojcik
Copy link
Contributor

Description

Plugin EPs have no control over the compiled kernel names, unlike Host API EPs, which use ComputeCapability and MetaDef directly. With this change, I want to add more uniqueness to the compiled kernel names.

Motivation and Context

Currently, the name is chosen based on the graph view, which represents the model graph. For subgraphs of the same graph, the names are identical, and the Kernel Registry refuses to register the subgraph kernel, citing that the same-name kernels must have different op sets.

@apwojcik apwojcik changed the title [Plugin EP] Make compiled kernel names more unique [Plugin EP] Make compiled kernel names "more" unique Mar 10, 2026
@adrianlizarraga adrianlizarraga self-requested a review March 13, 2026 21:08
uint64_t model_hash = 0;
int id = generator_.GenerateId(graph_viewer_, model_hash);
return MakeString(prefix_, "_", model_hash, "_", id);
return MakeString(prefix_, "_", graph_viewer_.Name(), "_", model_hash, "_", id);
Copy link
Contributor

@adrianlizarraga adrianlizarraga Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could this be fixed by making the ModelMetadefIdGenerator local variable that is currently declared in PluginExecutionProvider::GetCapability() into a member of PluginExecutionProvider? I suspect that this would properly make every call to generator_.GenerateId() return a monotonically increasing number. It would also prevent redundant regenerations of the model_hash as a nice bonus.

I can try to experiment on my side.

Aside: for a future release of ORT, perhaps we can allow EPs to provide a custom metadef name via OrtNodeFusionOptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm worried about two things with this approach:

  • Two different subgraphs (e.g., branches of an If node) may have the same name. This approach would still lead to duplicate metadef names.
  • External tools may be depending on the current format of this metadef name.

@adrianlizarraga
Copy link
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@adrianlizarraga
Copy link
Contributor

adrianlizarraga commented Mar 15, 2026

Hi @apwojcik,
Thank you very much for bringing this bug to our attention. I created an alternative PR with the fix described in this comment: #27608 (comment)

The PR also adds testing. Could you please verify on your end.

@tianleiwu, I'm removing the 1.24.4 release tag from this PR.

adrianlizarraga added a commit that referenced this pull request Mar 16, 2026
…fused nodes in different GraphViews (#27666)

### Description
Fixes a bug where `PluginExecutionProvider::GetCapability()` incorrectly
assigned duplicate MetaDef IDs to fused nodes that live in different
GraphViewer instances (e.g., the then/else branches of an If node).

The root cause was that `GetCapability()` created a new
`ModelMetadefIdGenerator` on every invocation. Since the graph
partitioner calls `GetCapability()` once per subgraph, the generator's
monotonic counter reset each time, producing colliding IDs across
subgraphs. This caused session creation to fail with:

> Failed to add kernel for example_ep_9433721956998717990_0 example_ep
example_ep: Conflicting with a registered kernel with op versions. the
since version is: 1

#### Fix
- Promoted `ModelMetadefIdGenerator` to an instance member of
`PluginExecutionProvider` so the same generator is reused across all
`GetCapability()` calls, ensuring unique MetaDef IDs.
- This is also consistent with how existing provider-bridge EPs create
and use a single generator instance.
- **Bonus perf improvement**: No longer recomputes the entire model's
hash on every call to `GetCapability()`.

#### Testing

Example EP changes:
- Refactored `SaveConstantInitializers()` →
`TrySaveConstantInitializer()` to save initializers per-node-input
instead of via `graph.GetInitializers()`, which doesn't return
initializers defined in parent or sibling subgraphs.
- Extracted `CopiesConstantInitializers()` helper to deduplicate the
condition for drop_constant_initializers.

Unit testing:
- Added unit test called
`CompilingPluginEp_MultiSubgraphs_DuplicateMetaDefIdBug` — runs an If
model with Mul nodes in both branches, verifying that both fused nodes
receive unique MetaDef IDs and the session creates/runs successfully.


Credit to @apwojcik for [finding the
bug.](#27608)
tianleiwu pushed a commit that referenced this pull request Mar 16, 2026
…fused nodes in different GraphViews (#27666)

Fixes a bug where `PluginExecutionProvider::GetCapability()` incorrectly
assigned duplicate MetaDef IDs to fused nodes that live in different
GraphViewer instances (e.g., the then/else branches of an If node).

The root cause was that `GetCapability()` created a new
`ModelMetadefIdGenerator` on every invocation. Since the graph
partitioner calls `GetCapability()` once per subgraph, the generator's
monotonic counter reset each time, producing colliding IDs across
subgraphs. This caused session creation to fail with:

> Failed to add kernel for example_ep_9433721956998717990_0 example_ep
example_ep: Conflicting with a registered kernel with op versions. the
since version is: 1

- Promoted `ModelMetadefIdGenerator` to an instance member of
`PluginExecutionProvider` so the same generator is reused across all
`GetCapability()` calls, ensuring unique MetaDef IDs.
- This is also consistent with how existing provider-bridge EPs create
and use a single generator instance.
- **Bonus perf improvement**: No longer recomputes the entire model's
hash on every call to `GetCapability()`.

Example EP changes:
- Refactored `SaveConstantInitializers()` →
`TrySaveConstantInitializer()` to save initializers per-node-input
instead of via `graph.GetInitializers()`, which doesn't return
initializers defined in parent or sibling subgraphs.
- Extracted `CopiesConstantInitializers()` helper to deduplicate the
condition for drop_constant_initializers.

Unit testing:
- Added unit test called
`CompilingPluginEp_MultiSubgraphs_DuplicateMetaDefIdBug` — runs an If
model with Mul nodes in both branches, verifying that both fused nodes
receive unique MetaDef IDs and the session creates/runs successfully.

Credit to @apwojcik for [finding the
bug.](#27608)
@apwojcik
Copy link
Contributor Author

Closing this PR in favour of #27666

@apwojcik apwojcik closed this Mar 18, 2026
@apwojcik apwojcik deleted the compiled_kernel_name branch March 18, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants