tianleiwu added a commit that referenced this pull request on Mar 20, 2026.
### Description

Extends GRU CUDA kernel registration from opset 14 to opset 22, following the same pattern as other recent opset gap fills (e.g., ConvTranspose in #27710).

- **`gru.cc`**: Cap the existing opset-14 non-versioned kernel to versioned 14–21; add a new non-versioned kernel at opset 22+.
- **`cuda_execution_provider.cc`**: Update forward declarations and `BuildKernelCreateInfo` entries for versioned 14–21 and non-versioned 22+.
- **`deep_cpu_gru_op_test.cc`**: Add a CUDA-specific test for GRU at opset 22 with `linear_before_reset=1` (cuDNN requirement).
- **`docs/OperatorKernels.md`**: Update the CUDA provider GRU entry to reflect the `22+`, `[14, 21]`, and `[7, 13]` version ranges.

No functional changes to the kernel implementation: the GRU spec is unchanged between opsets 14 and 22.

### Motivation and Context

The CUDA EP registered GRU only up to opset 14, while ONNX defines GRU through opset 22. Models exported at opset ≥15 would fail to find a matching CUDA kernel and fall back to CPU. This is one of the P1 gaps tracked in #27729.

### Limitation

A BF16 version is not added for GRU-22. It can be added later if needed.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
This PR extends the CUDA `ConvTranspose` operator registration to support ONNX opset 22. The CUDA implementation already shares the same attribute and output-shape handling used by earlier supported opsets, so this change primarily exposes the existing kernel path for opset 22 and adds regression coverage to keep that path working.

### Summary of Changes
#### CUDA Kernel Registration
- `onnxruntime/core/providers/cuda/nn/conv_transpose.cc`: split the `ConvTranspose` kernel registration into `1-10`, `11-21`, and `22` so the CUDA kernel can be selected for opset 22.
- `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`: register `ConvTranspose(22)` on CUDA.
- `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc`: register the NHWC `ConvTranspose(22)`.

#### Test Coverage
- `onnxruntime/test/providers/cpu/nn/conv_transpose_op_test.cc`: opset-22 coverage for `ConvTranspose` with `output_shape`.
- `onnxruntime/test/providers/cuda/nhwc/conv_transpose_test.cc`: NHWC test for `ConvTranspose` at opset 22.

### Testing
- `ninja` object builds:
  - `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`
  - `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc`
  - `onnxruntime/core/providers/cuda/nn/conv_transpose.cc`
  - `onnxruntime/test/providers/cpu/nn/conv_transpose_op_test.cc`
  - `onnxruntime/test/providers/cuda/nhwc/conv_transpose_test.cc`
- `git diff --check` to confirm the patch is formatting-clean.
- The `onnxruntime_test_all` gtest runtime pass was not completed locally because the build tree regenerated and expanded into a broad rebuild; runtime verification of the new CUDA and NHWC tests should still be run in CI or a focused local test build.

### Motivation and Context
Related issue: #26393
The ONNX `ConvTranspose-22` schema keeps the same core padding and output-shape semantics used by the earlier CUDA-supported opsets, but CUDA registration stopped at opset 11. That meant models using `ConvTranspose` at opset 22 could miss CUDA kernel assignment even though the implementation path was already compatible.

This PR closes that gap by updating kernel registration and test coverage without changing the underlying CUDA compute logic.
### Checklist