CUDA Plugin EP: NHWC Cleanup & Hardening #28612

Merged
tianleiwu merged 5 commits into main from tlwu/cuda_plugin_nhwc_cleanup
May 29, 2026

@tianleiwu
Contributor

Summary

Unifies the NHWC-eligible op allowlist between the bundled CUDA EP and the CUDA plugin EP into a single shared header, adds kernel-miss diagnostics, and expands NHWC test coverage from 4 ops to 11.

Motivation

The bundled EP (cuda_execution_provider.cc) and the plugin EP (plugin/cuda_ep.cc) each maintained their own copy of the NHWC allowlist. This was a maintenance hazard: an op could be added to one list but not the other, and the two would diverge silently. There was also no runtime diagnostic when the framework rewrote a node to the NHWC domain but the plugin EP lacked a matching kernel; such nodes silently fell back to CPU.

Key Changes

Shared NHWC Allowlist (cuda_nhwc_ops.h)

| Item | Detail |
| --- | --- |
| New file | onnxruntime/core/providers/cuda/cuda_nhwc_ops.h |
| Contents | IsNhwcEligibleOnnxOp(), IsNhwcEligibleMsOp(), IsNhwcEligible() inline functions |
| Ops covered | AveragePool, BatchNormalization, Conv, ConvTranspose, DepthToSpace, GlobalAveragePool, GlobalMaxPool, GridSample, LRN, MaxPool, SpaceToDepth (+ MS-domain GridSample) |
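
For readers without the diff at hand, here is a minimal sketch of what a shared allowlist header of this shape could look like. It is not the actual contents of cuda_nhwc_ops.h. The three function names and the op list come from the table above, and the "com.microsoft" literal follows the later fix commit. The `nhwc_sketch` namespace, the linear search, and the treatment of the empty string and "ai.onnx" as the ONNX domain are assumptions made for illustration.

```cpp
// Illustrative sketch only; the real cuda_nhwc_ops.h may differ.
#include <array>
#include <string_view>

namespace nhwc_sketch {  // hypothetical namespace, not the one used in the PR

// ONNX-domain ops listed as NHWC-eligible in the PR description.
inline bool IsNhwcEligibleOnnxOp(std::string_view op_type) {
  static constexpr std::array<std::string_view, 11> kOps = {
      "AveragePool", "BatchNormalization", "Conv", "ConvTranspose",
      "DepthToSpace", "GlobalAveragePool", "GlobalMaxPool", "GridSample",
      "LRN", "MaxPool", "SpaceToDepth"};
  for (std::string_view op : kOps) {
    if (op == op_type) return true;
  }
  return false;
}

// MS-domain ops: the description lists only GridSample.
inline bool IsNhwcEligibleMsOp(std::string_view op_type) {
  return op_type == "GridSample";
}

// Single entry point that both EPs can call.
inline bool IsNhwcEligible(std::string_view domain, std::string_view op_type) {
  // Assumption: the ONNX domain is spelled as "" or "ai.onnx".
  if (domain.empty() || domain == "ai.onnx") {
    return IsNhwcEligibleOnnxOp(op_type);
  }
  // String literal instead of a named constant, so no constants header is needed.
  if (domain == "com.microsoft") {
    return IsNhwcEligibleMsOp(op_type);
  }
  return false;
}

}  // namespace nhwc_sketch
```

With a header like this, each EP's layout-conversion check reduces to one call such as `nhwc_sketch::IsNhwcEligible(node_domain, node_op_type)`, and adding an op means editing one list.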

Bundled EP Refactor (cuda_execution_provider.cc)

  • Removed the static std::unordered_set<std::string_view> cuda_nhwc_onnx_ops and the inline domain check logic.
  • Replaced with a single call to cuda::IsNhwcEligible(node_domain, node_op_type).

Plugin EP Refactor & Diagnostics (plugin/cuda_ep.cc)

  • ShouldConvertDataLayoutForOpImpl: Replaced ~20 lines of static set + domain checks with a single cuda::IsNhwcEligible() call.
  • GetCapabilityImpl: Added a WARNING-level diagnostic in the else branch (kernel not found). When a node in the com.ms.internal.nhwc domain has no registered kernel, the log emits the op type, domain, version, and node name — making future NHWC registration gaps immediately visible at session creation.
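
The kernel-miss diagnostic described above can be sketched roughly as follows. This is a standalone illustration, not the code added to GetCapabilityImpl. `NodeInfo`, `NhwcKernelMissWarning`, and the message wording are hypothetical; only the com.ms.internal.nhwc domain and the four reported fields (op type, domain, version, node name) come from the description. The real code would emit the text through the EP's WARNING-level logger rather than return a string.

```cpp
// Illustrative sketch only; types and names here are hypothetical.
#include <optional>
#include <sstream>
#include <string>
#include <string_view>

struct NodeInfo {  // hypothetical stand-in for the node data the EP sees
  std::string op_type;
  std::string domain;
  int since_version;
  std::string name;
};

// Returns warning text when a node in the internal NHWC domain has no
// registered kernel; returns nullopt for nodes in any other domain.
inline std::optional<std::string> NhwcKernelMissWarning(const NodeInfo& node) {
  constexpr std::string_view kInternalNhwcDomain = "com.ms.internal.nhwc";
  if (std::string_view(node.domain) != kInternalNhwcDomain) {
    return std::nullopt;
  }
  std::ostringstream msg;
  msg << "NHWC node has no registered kernel: op_type=" << node.op_type
      << " domain=" << node.domain
      << " version=" << node.since_version
      << " name=" << node.name;
  return msg.str();
}
```

A caller would invoke this only on the kernel-not-found path, so nodes that do resolve to a kernel never produce a log entry.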

Expanded NHWC Test Coverage (test_cuda_plugin_ep.py)

  • Added _assert_nhwc_domain_assigned() helper that verifies NHWC layout transformation occurred by checking for framework-inserted Transpose nodes in the EP's assignment info.
  • Added _run_nhwc_model_test() helper combining domain assertion + numerical validation.
  • Updated 4 existing NHWC tests (Conv, BatchNormalization, MaxPool, AveragePool) to include structural assertions.
  • Added 7 new NHWC test methods:
    • test_nhwc_conv_transpose
    • test_nhwc_global_max_pool
    • test_nhwc_global_average_pool
    • test_nhwc_depth_to_space
    • test_nhwc_space_to_depth
    • test_nhwc_lrn
    • test_nhwc_grid_sample

Testing Notes

Run the full CUDA plugin EP test suite with NHWC enabled:

bash .env/cuda13_plugin.sh --build --install --test_plugin

Or run only the NHWC tests directly:

cd onnxruntime/test/python/transformers
ORT_TEST_CUDA_PLUGIN_EP=1 python -m unittest \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_batch_normalization \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_maxpool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_avgpool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv_transpose \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_max_pool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_average_pool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_depth_to_space \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_space_to_depth \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_lrn \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_grid_sample

All 86 tests in the suite pass (11 NHWC + 75 others), with no regressions.

Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates the CUDA NHWC layout-conversion allowlist into a shared header used by both the in-tree CUDA EP and the CUDA plugin EP, adds a plugin-side warning diagnostic for NHWC-domain kernel misses, and expands CUDA plugin EP NHWC test coverage with new structural assertions plus additional operator tests.

Changes:

  • Introduce a shared NHWC eligibility helper (cuda_nhwc_ops.h) and replace duplicated allowlists in both CUDA EP implementations with cuda::IsNhwcEligible(...).
  • Add a warning diagnostic in the CUDA plugin EP when an internal NHWC-domain node has no matching registered kernel.
  • Expand and harden NHWC-focused plugin EP tests, adding helpers and new op coverage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/python/transformers/test_cuda_plugin_ep.py | Adds NHWC structural assertions and expands NHWC operator coverage in CUDA plugin EP tests. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Switches layout-conversion logic to shared helper and adds NHWC kernel-miss warning diagnostics. |
| onnxruntime/core/providers/cuda/cuda_nhwc_ops.h | New shared header providing unified NHWC eligibility allowlist helpers. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Refactors NHWC eligibility check to use the shared helper. |

Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/core/providers/cuda/cuda_nhwc_ops.h
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Contributor

@github-actions github-actions Bot left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Fixed
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@tianleiwu tianleiwu requested a review from yuslepukhin May 22, 2026 05:46
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
tianleiwu added 3 commits May 27, 2026 16:15
The header included core/graph/constants.h which defines kOnnxDomain, kMSDomain,
etc. When cuda_execution_provider.cc includes both provider_api.h (which also
defines these constants for the shared library build) and cuda_nhwc_ops.h, it
causes 'redefinition' errors on all platforms.

Fix: use the string literal "com.microsoft" directly instead of kMSDomain,
removing the problematic include. The header is self-contained and works in both
the bundled EP and plugin EP build modes.
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc
Member

@yuslepukhin yuslepukhin left a comment

Approving with comments.

@tianleiwu tianleiwu enabled auto-merge (squash) May 29, 2026 19:47
@tianleiwu tianleiwu merged commit d165fba into main May 29, 2026
86 checks passed
@tianleiwu tianleiwu deleted the tlwu/cuda_plugin_nhwc_cleanup branch May 29, 2026 19:48

4 participants