Fill CUDA EP opset gap for GRU operator (14 → 22) #27738
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
/azp run Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@copilot, please update docs/OperatorKernels.md, and merge latest main branch to this branch. |
Updated
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Pull request overview
Extends CUDA Execution Provider kernel registration for the ONNX GRU operator to cover the opset gap between 14 and 22, so models exported with opset ≥15 can continue to use the CUDA kernel instead of falling back to CPU.
Changes:
- Split CUDA GRU registration into versioned opset 14–21 and non-versioned 22+.
- Update CUDA EP kernel declarations/registry entries to match the new version ranges.
- Add a CUDA-focused GRU opset 22 regression test; update kernel docs to reflect the new ranges.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/rnn/deep_cpu_gru_op_test.cc | Adds a CUDA-targeted opset 22 GRU test case (linear_before_reset=1). |
| onnxruntime/core/providers/cuda/rnn/gru.cc | Splits GRU kernel registration into opset 7–13, 14–21, and 22+. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Adds matching forward declarations and registry entries for GRU 14–21 and 22+. |
| docs/OperatorKernels.md | Updates the CUDA GRU documentation to show opset 22+ and the versioned ranges. |
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Description
Extends GRU CUDA kernel registration from opset 14 to opset 22, following the same pattern as other recent opset gap fills (e.g., ConvTranspose in #27710).
- `gru.cc`: Cap existing opset-14 non-versioned kernel to versioned 14–21; add new non-versioned kernel at opset 22+
- `cuda_execution_provider.cc`: Update forward declarations and `BuildKernelCreateInfo` entries for versioned 14–21 and non-versioned 22+
- `deep_cpu_gru_op_test.cc`: Add CUDA-specific test for GRU at opset 22 with `linear_before_reset=1` (cuDNN requirement)
- `docs/OperatorKernels.md`: Update CUDA provider GRU entry to reflect `22+`, `[14, 21]`, and `[7, 13]` version ranges

No functional changes to the kernel implementation; the GRU spec is unchanged between opsets 14 and 22.
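The registration split described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not ONNX Runtime code: `KernelRange`, `ResolveSinceVersion`, `Matches`, and `HasCudaKernel` are hypothetical names, and the matching rule is an assumption (an open-ended registration matches only the schema version it was registered at, while a bounded registration matches any schema version inside its range). The schema versions 7, 14, and 22 are taken from the ranges listed in this PR.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Simplified, hypothetical model of opset-ranged kernel registration.
// None of these names come from ONNX Runtime.
constexpr int kOpenEnded = std::numeric_limits<int>::max();

struct KernelRange {
  int start;  // first opset the kernel is registered for
  int end;    // last opset (inclusive), or kOpenEnded for "start+"
};

// GRU schema versions implied by the ranges in this PR.
constexpr std::array<int, 3> kGruSchemaVersions{{7, 14, 22}};

// Returns the newest schema version that is <= the model's opset,
// or -1 if the opset predates every version listed above.
constexpr int ResolveSinceVersion(int model_opset) {
  int resolved = -1;
  for (std::size_t i = 0; i < kGruSchemaVersions.size(); ++i) {
    if (kGruSchemaVersions[i] <= model_opset) resolved = kGruSchemaVersions[i];
  }
  return resolved;
}

// Assumed matching rule (see the lead-in): exact match on the start version,
// or a bounded range that contains the schema version.
constexpr bool Matches(KernelRange k, int since_version) {
  if (k.start == since_version) return true;
  return k.end != kOpenEnded && k.start < since_version &&
         since_version <= k.end;
}

// True if any registration in `registry` matches the schema version that a
// model at `model_opset` resolves to.
template <std::size_t N>
constexpr bool HasCudaKernel(const std::array<KernelRange, N>& registry,
                             int model_opset) {
  const int since_version = ResolveSinceVersion(model_opset);
  for (std::size_t i = 0; i < N; ++i) {
    if (Matches(registry[i], since_version)) return true;
  }
  return false;
}

// Before this PR: versioned [7, 13] and non-versioned 14+.
constexpr std::array<KernelRange, 2> kBefore{{{7, 13}, {14, kOpenEnded}}};

// After this PR: versioned [7, 13], versioned [14, 21], non-versioned 22+.
constexpr std::array<KernelRange, 3> kAfter{
    {{7, 13}, {14, 21}, {22, kOpenEnded}}};
```

Under these assumptions, `HasCudaKernel(kBefore, 22)` is false and `HasCudaKernel(kAfter, 22)` is true, which is the case the new opset 22 test exercises. Capping the old registration at 21 keeps the earlier schema version covered while the new open-ended entry picks up opset 22 and later.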
Motivation and Context
CUDA EP registered GRU only up to opset 14, while ONNX defines GRU through opset 22. Models exported at opset ≥15 would fail to find a matching CUDA kernel and fall back to CPU. This is one of the P1 gaps tracked in #27729.
Limitation
A BF16 version is not added for GRU-22. It can be added later if needed.