CUDA Plugin EP: NHWC Cleanup & Hardening #28612
Merged
Pull request overview
This PR consolidates the CUDA NHWC layout-conversion allowlist into a shared header used by both the in-tree CUDA EP and the CUDA plugin EP, adds a plugin-side warning diagnostic for NHWC-domain kernel misses, and expands CUDA plugin EP NHWC test coverage with new structural assertions plus additional operator tests.
Changes:
- Introduce a shared NHWC eligibility helper (`cuda_nhwc_ops.h`) and replace the duplicated allowlists in both CUDA EP implementations with `cuda::IsNhwcEligible(...)`.
- Add a warning diagnostic in the CUDA plugin EP when an internal NHWC-domain node has no matching registered kernel.
- Expand and harden NHWC-focused plugin EP tests, adding helpers and new op coverage.
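As a rough illustration of what the shared helper centralizes, the Python sketch below mirrors the shape of `IsNhwcEligibleOnnxOp()`, `IsNhwcEligibleMsOp()`, and `IsNhwcEligible()`. It is not the C++ header: the snake_case function names are hypothetical, the ONNX op set is assumed to be the ops exercised by the NHWC tests, and the single `com.microsoft` entry is a guess to be checked against `cuda_nhwc_ops.h`.

```python
# Hypothetical Python mirror of the C++ helpers in cuda_nhwc_ops.h.
# Both op sets below are illustrative assumptions, not the real allowlist.

ONNX_DOMAIN = ""             # ONNX Runtime represents the default ONNX domain as ""
MS_DOMAIN = "com.microsoft"  # spelled as a literal, as the header does after the build fix

# Assumption: the ONNX ops covered by the 11 NHWC plugin EP tests.
NHWC_ELIGIBLE_ONNX_OPS = frozenset({
    "AveragePool", "BatchNormalization", "Conv", "ConvTranspose",
    "DepthToSpace", "GlobalAveragePool", "GlobalMaxPool", "GridSample",
    "LRN", "MaxPool", "SpaceToDepth",
})

# Assumption: GridSample also exists as a com.microsoft contrib op; verify in the header.
NHWC_ELIGIBLE_MS_OPS = frozenset({"GridSample"})


def is_nhwc_eligible_onnx_op(op_type: str) -> bool:
    return op_type in NHWC_ELIGIBLE_ONNX_OPS


def is_nhwc_eligible_ms_op(op_type: str) -> bool:
    return op_type in NHWC_ELIGIBLE_MS_OPS


def is_nhwc_eligible(domain: str, op_type: str) -> bool:
    """Single predicate that both EPs would call (in the spirit of cuda::IsNhwcEligible)."""
    if domain == ONNX_DOMAIN:
        return is_nhwc_eligible_onnx_op(op_type)
    if domain == MS_DOMAIN:
        return is_nhwc_eligible_ms_op(op_type)
    return False  # ops from any other domain are never converted


if __name__ == "__main__":
    for domain, op in [("", "Conv"), ("", "Relu"), ("com.microsoft", "GridSample")]:
        print(f"{domain or '<onnx>'}:{op} -> {is_nhwc_eligible(domain, op)}")
```

Because both EPs would call the same predicate, adding an op to one shared set updates the bundled and plugin EPs together, which removes the divergence risk the PR describes.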
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/test/python/transformers/test_cuda_plugin_ep.py | Adds NHWC structural assertions and expands NHWC operator coverage in CUDA plugin EP tests. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Switches layout-conversion logic to shared helper and adds NHWC kernel-miss warning diagnostics. |
| onnxruntime/core/providers/cuda/cuda_nhwc_ops.h | New shared header providing unified NHWC eligibility allowlist helpers. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Refactors NHWC eligibility check to use the shared helper. |
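The kernel-miss warning can be sketched in the same hedged way. This assumes a simplified `Node` record and a registry keyed by an exact `(domain, op_type, since_version)` tuple; real ONNX Runtime kernel lookup matches version ranges and goes through the plugin EP API, so `Node`, `claim_nodes`, and the registry shape are all hypothetical. Only the `com.ms.internal.nhwc` domain and the logged fields (op type, domain, version, node name) come from the PR.

```python
import logging
from typing import Iterable, List, NamedTuple, Set, Tuple

NHWC_INTERNAL_DOMAIN = "com.ms.internal.nhwc"  # domain named in the PR

logger = logging.getLogger("cuda_plugin_ep_sketch")  # hypothetical logger name


class Node(NamedTuple):
    """Hypothetical, simplified view of a graph node."""
    op_type: str
    domain: str
    since_version: int
    name: str


# Hypothetical registry key; real lookups match version ranges, not exact versions.
KernelKey = Tuple[str, str, int]


def claim_nodes(nodes: Iterable[Node], registry: Set[KernelKey]) -> List[str]:
    """Return names of nodes the EP can run; warn when an NHWC-domain node has no kernel."""
    claimed: List[str] = []
    for node in nodes:
        if (node.domain, node.op_type, node.since_version) in registry:
            claimed.append(node.name)
        elif node.domain == NHWC_INTERNAL_DOMAIN:
            # Kernel-miss branch: the node was rewritten to NHWC but nothing is
            # registered for it, so without this log it would fall back silently.
            logger.warning(
                "No NHWC kernel registered: op_type=%s domain=%s version=%d node=%s",
                node.op_type, node.domain, node.since_version, node.name,
            )
    return claimed
```

Restricting the warning to the internal NHWC domain keeps ordinary unsupported ops quiet while still surfacing the registration gaps that layout rewriting can create.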
tianleiwu commented on May 27, 2026
The header included `core/graph/constants.h`, which defines `kOnnxDomain`, `kMSDomain`, etc. Because `cuda_execution_provider.cc` includes both `provider_api.h` (which also defines these constants for the shared-library build) and `cuda_nhwc_ops.h`, this caused redefinition errors on all platforms. Fix: use the string literal `"com.microsoft"` directly instead of `kMSDomain` and remove the problematic include. The header is now self-contained and works in both the bundled EP and plugin EP build modes.
yuslepukhin reviewed on May 29, 2026
Summary
Unifies the NHWC-eligible op allowlist between the bundled CUDA EP and the CUDA plugin EP into a single shared header, adds kernel-miss diagnostics, and expands NHWC test coverage from 4 ops to 11.
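For context on the term "layout conversion": NCHW and NHWC hold the same values in different memory orders, and the framework's NHWC rewrite brackets a converted node with Transpose nodes, presumably using permutations `[0, 2, 3, 1]` (NCHW to NHWC) and `[0, 3, 1, 2]` (back). The sketch below shows the flat-buffer index arithmetic behind those permutations; `nchw_to_nhwc` and `nhwc_to_nchw` are hypothetical helpers, not ONNX Runtime APIs.

```python
# Hypothetical helpers: flat-buffer index math for NCHW <-> NHWC.


def nchw_to_nhwc(data, n, c, h, w):
    """Reorder a flat NCHW buffer to NHWC (same effect as Transpose perm=[0, 2, 3, 1])."""
    out = [None] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    dst = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    out[dst] = data[src]
    return out


def nhwc_to_nchw(data, n, c, h, w):
    """Inverse reorder (same effect as Transpose perm=[0, 3, 1, 2])."""
    out = [None] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = ((ni * h + hi) * w + wi) * c + ci
                    out[src] = data[dst]
    return out


# One image, 3 channels, 2x2 pixels: channel planes become per-pixel channel triples.
x = list(range(12))
print(nchw_to_nhwc(x, 1, 3, 2, 2))  # -> [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Those inserted Transpose nodes are what the new structural test assertions look for.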
Motivation
The bundled EP (`cuda_execution_provider.cc`) and the plugin EP (`plugin/cuda_ep.cc`) each maintained their own copy of the NHWC allowlist. This created a maintenance hazard: an op could be added to one list but not the other, and the two would silently diverge. In addition, there was no runtime diagnostic when the framework rewrote a node to the NHWC domain but the plugin EP lacked a matching kernel; such gaps surfaced only as silent fallbacks to CPU.

Key Changes

Shared NHWC Allowlist (`cuda_nhwc_ops.h`)
- New header: `onnxruntime/core/providers/cuda/cuda_nhwc_ops.h`
- Provides the inline functions `IsNhwcEligibleOnnxOp()`, `IsNhwcEligibleMsOp()`, and `IsNhwcEligible()`.

Bundled EP Refactor (`cuda_execution_provider.cc`)
- Removed `std::unordered_set<std::string_view> cuda_nhwc_onnx_ops` and the inline domain-check logic.
- Replaced them with `cuda::IsNhwcEligible(node_domain, node_op_type)`.

Plugin EP Refactor & Diagnostics (`plugin/cuda_ep.cc`)
- `ShouldConvertDataLayoutForOpImpl`: Replaced ~20 lines of static set plus domain checks with a single `cuda::IsNhwcEligible()` call.
- `GetCapabilityImpl`: Added a WARNING-level diagnostic in the `else` branch (kernel not found). When a node in the `com.ms.internal.nhwc` domain has no registered kernel, the log emits the op type, domain, version, and node name, so future NHWC registration gaps are immediately visible at session creation.

Expanded NHWC Test Coverage (`test_cuda_plugin_ep.py`)
- `_assert_nhwc_domain_assigned()` helper: verifies that the NHWC layout transformation occurred by checking for framework-inserted Transpose nodes in the EP's assignment info.
- `_run_nhwc_model_test()` helper: combines the domain assertion with numerical validation.
- New tests: `test_nhwc_conv_transpose`, `test_nhwc_global_max_pool`, `test_nhwc_global_average_pool`, `test_nhwc_depth_to_space`, `test_nhwc_space_to_depth`, `test_nhwc_lrn`, `test_nhwc_grid_sample`.

Testing Notes
Run the full CUDA plugin EP test suite with NHWC enabled:
Or run only the NHWC tests directly:
```bash
cd onnxruntime/test/python/transformers
ORT_TEST_CUDA_PLUGIN_EP=1 python -m unittest \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_batch_normalization \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_maxpool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_avgpool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv_transpose \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_max_pool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_average_pool \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_depth_to_space \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_space_to_depth \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_lrn \
  test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_grid_sample
```

All 86 tests in the suite pass (11 NHWC + 75 existing), with no regressions.
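The two NHWC test helpers described above can be approximated as follows. The input format (a list of `(op_type, assigned_ep)` pairs), the function names, and the tolerances are assumptions; the real helpers read the EP's assignment info from an ONNX Runtime session, which this stdlib-only sketch does not attempt.

```python
import math
from typing import Sequence, Tuple

# Hypothetical input shape: (op_type, assigned_ep) pairs taken from the session's
# node-assignment info. The PR does not describe the real format.
Assignment = Tuple[str, str]


def assert_nhwc_domain_assigned(assignments: Sequence[Assignment], ep_name: str) -> None:
    """Fail unless `ep_name` was assigned at least one Transpose node.

    Assumes the test model has no Transpose of its own, so any Transpose
    assigned to the EP must have been inserted by the NHWC layout rewrite.
    """
    if not any(op == "Transpose" and ep == ep_name for op, ep in assignments):
        raise AssertionError(f"no Transpose assigned to {ep_name}; NHWC rewrite did not happen")


def assert_outputs_close(actual: Sequence[float], expected: Sequence[float],
                         rtol: float = 1e-3, atol: float = 1e-3) -> None:
    """Element-wise tolerance check (the real suite may use NumPy instead)."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(f"mismatch at index {i}: {a} vs {e}")


def run_nhwc_model_test(assignments: Sequence[Assignment], ep_name: str,
                        actual: Sequence[float], expected: Sequence[float]) -> None:
    """Structural check first, then numerical check, in the spirit of _run_nhwc_model_test()."""
    assert_nhwc_domain_assigned(assignments, ep_name)
    assert_outputs_close(actual, expected)
```

Running the structural check before the numerical one matters: a model that silently stays in NCHW can still produce correct outputs, so numeric agreement alone would not prove the NHWC kernels ran.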