
[Plugin EP] Port graph capture/replay APIs #27958

Merged
tianleiwu merged 27 commits into main from adrianl/PluginEp_CudaGraphCaptureReplay on Apr 10, 2026

@adrianlizarraga adrianlizarraga commented Apr 3, 2026

NOTE: This PR cannot be merged until the ORT version is updated to 1.26.0 in the main branch

Description

Ports graph capture/replay APIs (e.g., CUDA Graph) to the Plugin EP (OrtEp) C API so that plugin-based execution providers can participate in ORT-managed graph capture and replay.

What changed

New Plugin EP C API functions (onnxruntime_ep_c_api.h):

  • OrtEp::IsGraphCaptureEnabled — indicates whether the EP has graph capture enabled.
  • OrtEp::IsGraphCaptured — indicates whether a graph has been captured for a given annotation ID.
  • OrtEp::ReplayGraph — replays a previously captured graph.
  • OrtEp::GetGraphCaptureNodeAssignmentPolicy — returns the node assignment validation policy for graph capture.

All four are optional (NULL defaults to safe behavior) and version-gated (ort_version_supported >= 26).
If IsGraphCaptureEnabled returns true, IsGraphCaptured and ReplayGraph must also be implemented;
otherwise PluginExecutionProvider logs a warning and disables graph capture for that EP.
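
The gating rules above can be sketched roughly as follows. This is a minimal illustration, not the actual PluginExecutionProvider code: `MockEp`, `HostIsGraphCaptureEnabled`, `kMinGraphCaptureVersion`, and the simplified callback signatures are all hypothetical stand-ins, and the real implementation also logs a warning in the last case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OrtEp. The real struct is declared in
// onnxruntime_ep_c_api.h and its members have different signatures.
struct MockEp {
  std::uint32_t ort_version_supported;
  bool (*IsGraphCaptureEnabled)(const MockEp* ep);
  bool (*IsGraphCaptured)(const MockEp* ep, int annotation_id);
  int (*ReplayGraph)(MockEp* ep, int annotation_id);  // 0 means success in this sketch
};

constexpr std::uint32_t kMinGraphCaptureVersion = 26;

// Host-side wrapper: treats old versions and NULL callbacks as "disabled".
inline bool HostIsGraphCaptureEnabled(const MockEp& ep) {
  // EPs built against an older API version do not have these members at all.
  if (ep.ort_version_supported < kMinGraphCaptureVersion) return false;
  // Optional callback: NULL means graph capture is not enabled.
  if (ep.IsGraphCaptureEnabled == nullptr) return false;
  if (!ep.IsGraphCaptureEnabled(&ep)) return false;
  // An EP that enables capture must also provide the two companion callbacks.
  if (ep.IsGraphCaptured == nullptr || ep.ReplayGraph == nullptr) return false;
  return true;
}

// Trivial callbacks used only for the example below.
inline bool AlwaysEnabled(const MockEp*) { return true; }
inline bool NeverCaptured(const MockEp*, int) { return false; }
inline int ReplayOk(MockEp*, int) { return 0; }

inline void ExampleGating() {
  MockEp ep = {26, AlwaysEnabled, nullptr, nullptr};
  assert(!HostIsGraphCaptureEnabled(ep));  // companions missing: disabled
  ep.IsGraphCaptured = NeverCaptured;
  ep.ReplayGraph = ReplayOk;
  assert(HostIsGraphCaptureEnabled(ep));   // fully implemented: enabled
}
```

Checking the version before touching any function pointer matters because an EP compiled against an older header may not have allocated those struct members.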

New OrtGraphCaptureNodeAssignmentPolicy enum (onnxruntime_ep_c_api.h):
Replaces the hardcoded EP-name checks in InferenceSession::Initialize() with a policy-based approach:

  • ALL_NODES_ON_EP — all nodes must be on the target EP (e.g., TensorRT).
  • ALLOW_CPU_FOR_SHAPES — CPU nodes allowed for shape computation if no memcpy nodes exist (e.g., CUDA, WebGPU, DML).

Refactored InferenceSession graph capture selection (inference_session.cc):

  • Removed the hardcoded graph_support_ep_list and per-EP strcmp checks.
  • Now iterates over all registered EPs and uses IsGraphCaptureEnabled() + GetGraphCaptureNodeAssignmentPolicy() to select and validate the graph-capturing EP.
  • AreAllComputeNodesAssignedToCudaOrJsOrDmlEpWebGpuEp() is generalized to AreAllComputeNodesAssignedToEpOrCpu(), which additionally requires at least one node on the target EP.
  • IExecutionProvider::GetGraphCaptureNodeAssignmentPolicy() added to the base class (defaults to ALL_NODES_ON_EP).
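
A rough sketch of how a policy-driven assignment check might look is shown below. It is a simplification under stated assumptions: `MockNode`, `NodeAssignmentPolicy`, and `ValidateAssignment` are hypothetical names, the sketch does not verify that CPU nodes actually perform shape computation, and it applies the "at least one node on the target EP" requirement to both policies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the two policies described above.
enum class NodeAssignmentPolicy { kAllNodesOnEp, kAllowCpuForShapes };

// Hypothetical, heavily reduced view of a graph node.
struct MockNode {
  std::string ep;   // name of the EP the node is assigned to, e.g. "GpuEp" or "CPU"
  bool is_memcpy;   // true for a host/device copy node
};

inline bool ValidateAssignment(const std::vector<MockNode>& nodes,
                               const std::string& target_ep,
                               NodeAssignmentPolicy policy) {
  bool has_node_on_target = false;
  for (const MockNode& node : nodes) {
    // The relaxed policy only holds when the graph contains no memcpy nodes.
    if (policy == NodeAssignmentPolicy::kAllowCpuForShapes && node.is_memcpy) {
      return false;
    }
    if (node.ep == target_ep) {
      has_node_on_target = true;
      continue;
    }
    // Any node off the target EP violates the strict policy.
    if (policy == NodeAssignmentPolicy::kAllNodesOnEp) return false;
    // The relaxed policy tolerates CPU nodes only.
    if (node.ep != "CPU") return false;
  }
  // A graph with no node on the target EP is not a capture candidate.
  return has_node_on_target;
}

inline void ExamplePolicy() {
  std::vector<MockNode> mixed = {{"GpuEp", false}, {"CPU", false}};
  assert(ValidateAssignment(mixed, "GpuEp", NodeAssignmentPolicy::kAllowCpuForShapes));
  assert(!ValidateAssignment(mixed, "GpuEp", NodeAssignmentPolicy::kAllNodesOnEp));
}
```

The `has_node_on_target` flag is what prevents a CPU-only graph from passing the relaxed policy by accident.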

Bounded graph capture recursion (inference_session.cc/h):

  • Run() now delegates to RunImpl() with a graph_capture_depth parameter.
  • Caps internal run attempts at kMaxGraphCaptureRunAttempts = 8, returning a clear error if the EP never reports IsGraphCaptured() == true.
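
The depth cap can be illustrated with a small mock. Only the names `RunImpl`, `graph_capture_depth`, and `kMaxGraphCaptureRunAttempts` come from the description above; `MockCaptureEp`, `RunResult`, the error text, and the exact control flow are assumptions made for this sketch.

```cpp
#include <cassert>
#include <string>

constexpr int kMaxGraphCaptureRunAttempts = 8;

// Hypothetical EP that reports "captured" only after a fixed number of runs.
struct MockCaptureEp {
  int runs_until_captured;
  int runs_seen;
  explicit MockCaptureEp(int n) : runs_until_captured(n), runs_seen(0) {}
  bool IsGraphCaptured() const { return runs_seen >= runs_until_captured; }
  void Execute() { ++runs_seen; }
};

struct RunResult {
  bool ok;
  int attempts;
  std::string error;
};

inline RunResult RunImpl(MockCaptureEp& ep, int graph_capture_depth) {
  // Stop recursing once the attempt budget is spent.
  if (graph_capture_depth >= kMaxGraphCaptureRunAttempts) {
    return {false, graph_capture_depth,
            "EP did not report a captured graph within the attempt limit"};
  }
  ep.Execute();
  // Not captured yet: run again (warm-up), one level deeper.
  if (!ep.IsGraphCaptured()) {
    return RunImpl(ep, graph_capture_depth + 1);
  }
  return {true, graph_capture_depth + 1, ""};
}

// Public entry point starts at depth 0.
inline RunResult Run(MockCaptureEp& ep) { return RunImpl(ep, 0); }

inline void ExampleBoundedRun() {
  MockCaptureEp well_behaved(2);
  assert(Run(well_behaved).ok);
  MockCaptureEp broken(1000);  // never reports capture within the budget
  assert(!Run(broken).ok);
}
```

Without the depth parameter, an EP that never reports a captured graph would make this loop recurse indefinitely.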

EP implementations:

  • WebGPU plugin EP: Fully implements all four graph capture APIs by forwarding to the underlying IExecutionProvider.
  • CUDA plugin EP: Stubs with TODOs (they return disabled/not-implemented).
  • NvTensorRTRTX EP: IsGraphCaptureEnabled() now returns false since this EP manages graph capture internally (not via ORT).

C++ wrapper (onnxruntime_cxx_api.h / onnxruntime_cxx_inline.h):

  • Added Ort::Env::CopyTensor() convenience overload for copying a single tensor (wraps CopyTensors with num_tensors=1).

Tests

  • ep_plugin_provider_test.cc: Unit tests for each new PluginExecutionProvider graph capture method, including NULL function pointer defaults, version < 26 backward compatibility, and validation that IsGraphCaptureEnabled() returns false when IsGraphCaptured or ReplayGraph are NULL.
  • test_graph_capture.cc: End-to-end test for WebGPU plugin EP graph capture/replay using IO binding (warm-up + capture run, then replay with different inputs).
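
The capture-then-replay pattern exercised by the end-to-end test can be pictured with a toy model. This is only an illustration of why fixed, pre-bound buffers matter for replay: `MockGraphSession` is hypothetical, uses no ORT or WebGPU APIs, and folds warm-up and capture into a single first run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical session that records its work on the first run and replays it afterwards.
class MockGraphSession {
 public:
  // Buffers are bound once, loosely analogous to IO binding.
  MockGraphSession(std::vector<float>* input, std::vector<float>* output)
      : input_(input), output_(output), replays_(0) {}

  bool IsGraphCaptured() const { return !captured_.empty(); }
  int replays() const { return replays_; }

  void Run() {
    if (IsGraphCaptured()) {
      // Replay: re-execute the recorded operations against the same buffers.
      for (std::function<void()>& op : captured_) op();
      ++replays_;
      return;
    }
    // First run: execute the work and record it for later replay.
    std::function<void()> op = [this] {
      output_->resize(input_->size());
      for (std::size_t i = 0; i < input_->size(); ++i) {
        (*output_)[i] = (*input_)[i] * 2.0f;  // toy "model": double every element
      }
    };
    op();
    captured_.push_back(op);
  }

 private:
  std::vector<float>* input_;
  std::vector<float>* output_;
  std::vector<std::function<void()>> captured_;
  int replays_;
};

inline void ExampleCaptureReplay() {
  std::vector<float> in = {1.0f, 2.0f};
  std::vector<float> out;
  MockGraphSession session(&in, &out);
  session.Run();                 // capture run
  in[0] = 5.0f;                  // new input values, same buffer
  session.Run();                 // replay
  assert(out[0] == 10.0f);
}
```

New input values are written into the already-bound buffer rather than passed as fresh tensors, which is the reason the real test relies on IO binding.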

Motivation and Context

Previously, graph capture support was limited to a hardcoded list of EPs (kCudaExecutionProvider, kTensorrtExecutionProvider, kJsExecutionProvider, kWebGpuExecutionProvider, kDmlExecutionProvider) with EP-specific validation logic in InferenceSession. This made it impossible for plugin EPs to participate in ORT-managed graph capture/replay without modifying the core session code.

This PR makes graph capture/replay extensible to any EP, including out-of-tree plugin EPs, by exposing it through the OrtEp C API.

Copilot AI left a comment

Pull request overview

This PR (draft) ports graph capture/replay support to the Plugin EP pathway by extending the OrtEp C API, wiring those callbacks through the plugin EP provider wrapper, and updating session initialization logic to validate/capture based on an EP-provided node-assignment policy.

Changes:

  • Bump ORT version/API version to 1.26.0 / ORT_API_VERSION=26 and add new OrtEp graph capture/replay callbacks plus OrtGraphCaptureNodeAssignmentPolicy.
  • Update InferenceSession::Initialize() to select any EP with graph capture enabled, validate graph assignment via EP policy, and cache a single EP for replay.
  • Add/extend tests for plugin EP graph capture APIs and add an end-to-end autoep WebGPU plugin EP graph capture/replay test.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| VERSION_NUMBER | Bumps runtime version to 1.26.0. |
| include/onnxruntime/core/session/onnxruntime_c_api.h | Bumps ORT_API_VERSION to 26. |
| onnxruntime/core/session/onnxruntime_c_api.cc | Updates version string static assert to 1.26.0. |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Adds graph capture/replay callbacks to OrtEp and introduces OrtGraphCaptureNodeAssignmentPolicy. |
| include/onnxruntime/core/framework/execution_provider.h | Adds IExecutionProvider::GetGraphCaptureNodeAssignmentPolicy() with a strict default. |
| onnxruntime/core/session/inference_session.cc | Generalizes graph-capture EP selection/validation and uses EP-specified node assignment policy. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Exposes graph capture/replay APIs on PluginExecutionProvider. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Implements plugin-side forwarding for graph capture/replay and policy query with version gating. |
| onnxruntime/core/providers/webgpu/ep/ep.h | Declares plugin adapter entrypoints for graph capture/replay and assignment policy. |
| onnxruntime/core/providers/webgpu/ep/ep.cc | Wires WebGPU plugin EP adapter function pointers and forwards to EP impl. |
| onnxruntime/core/providers/webgpu/webgpu_execution_provider.h | Returns ALLOW_CPU_FOR_SHAPES policy for WebGPU EP. |
| onnxruntime/core/providers/js/js_execution_provider.h | Returns ALLOW_CPU_FOR_SHAPES policy for JS EP. |
| onnxruntime/core/providers/dml/DmlExecutionProvider/src/ExecutionProvider.h | Returns ALLOW_CPU_FOR_SHAPES policy for DML EP wrapper. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.h | Returns ALLOW_CPU_FOR_SHAPES policy for CUDA EP. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.h | Declares plugin CUDA EP adapter entrypoints for graph capture/replay and policy. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Wires plugin CUDA EP adapter function pointers (currently stubbed). |
| onnxruntime/test/framework/ep_plugin_provider_test.cc | Adds unit tests for plugin EP graph capture/replay function-pointer behavior and version gating. |
| onnxruntime/test/autoep/test_graph_capture.cc | Adds end-to-end test exercising WebGPU plugin EP graph capture + replay via public APIs. |

…ionProvider::IsGraphCaptureEnabled() as ORT previously never managed graph capture for this EP; Update use of webgpu name with constant

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.


@adrianlizarraga adrianlizarraga requested a review from Copilot April 9, 2026 21:14
@adrianlizarraga adrianlizarraga changed the title [DRAFT] [Plugin EP] Port graph capture/replay APIs [Plugin EP] Port graph capture/replay APIs Apr 9, 2026
@adrianlizarraga adrianlizarraga marked this pull request as ready for review April 9, 2026 21:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.


@tianleiwu tianleiwu left a comment

Review Summary

Well-structured PR that generalizes the hardcoded EP-specific graph capture logic in InferenceSession into a policy-based, extensible design driven by IExecutionProvider virtual methods. The C API additions are thoroughly documented, version-gated, and defensively validated. The recursion-depth guard in Run() is a valuable safety net.

Highlights

  • AreAllComputeNodesAssignedToEpOrCpu() now requires has_node_on_provider, which is a correctness improvement — avoids false positives where no nodes are on the target EP.
  • PluginExecutionProvider::IsGraphCaptureEnabled() validation that checks IsGraphCaptured/ReplayGraph are non-null is a valuable defensive measure — without it, ORT would silently hang in the recursive warm-up loop.
  • NvTensorRTRTX change to return false from IsGraphCaptureEnabled() is correct: it already returned false from the public IsGraphCaptured() and manages capture internally.
  • Excellent API documentation on the four new OrtEp members — exactly the level of detail an out-of-tree EP author needs.

Minor Observations (not blocking)

  • CopyTensor API consistency: The new single-tensor CopyTensor(const OrtValue*, OrtValue*, OrtSyncStream*) takes raw pointers, while the existing CopyTensors takes const std::vector<Value>& wrappers. This works correctly, but the style is inconsistent.
  • CUDA plugin stubs: Since IsGraphCaptureEnabledImpl returns false, the other three stub registrations are dead code. Fine as scaffolding for upcoming work.

Overall: clean, well-tested, ready to merge. Two inline suggestions below.

@tianleiwu tianleiwu merged commit 7afe4c2 into main Apr 10, 2026
103 of 106 checks passed
@tianleiwu tianleiwu deleted the adrianl/PluginEp_CudaGraphCaptureReplay branch April 10, 2026 00:16
sanaa-hamel-microsoft added a commit that referenced this pull request Apr 10, 2026