
Fill CUDA EP opset gap for GRU operator (14 → 22) #27738

Merged
tianleiwu merged 6 commits into main from copilot/update-gru-cuda-operator on Mar 20, 2026


Copilot AI commented Mar 18, 2026

Description

Extends GRU CUDA kernel registration from opset 14 to opset 22, following the same pattern as other recent opset gap fills (e.g., ConvTranspose in #27710).

  • gru.cc: Cap existing opset-14 non-versioned kernel to versioned 14–21; add new non-versioned kernel at opset 22+
  • cuda_execution_provider.cc: Update forward declarations and BuildKernelCreateInfo entries for versioned 14–21 and non-versioned 22+
  • deep_cpu_gru_op_test.cc: Add CUDA-specific test for GRU at opset 22 with linear_before_reset=1 (cuDNN requirement)
  • docs/OperatorKernels.md: Update CUDA provider GRU entry to reflect 22+, [14, 21], and [7, 13] version ranges

No functional changes to the kernel implementation—the GRU spec is unchanged between opsets 14 and 22.
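The version ranges described above can be pictured with a small toy registry. This is only a sketch under stated assumptions: `KernelEntry`, `FindKernel`, and `GruRangesAfterPr` are hypothetical names invented for illustration, and the plain range-containment lookup is a simplification, not ONNX Runtime's actual kernel resolution logic.

```cpp
#include <cassert>
#include <climits>
#include <optional>
#include <string>
#include <vector>

// Toy model of a kernel registration (hypothetical, not an ORT type):
// an operator name plus an inclusive opset range.
// end == INT_MAX stands for a "non-versioned" (open-ended) kernel.
struct KernelEntry {
  std::string op;
  int start;
  int end;
};

// Hypothetical lookup: returns the first entry whose range contains `opset`.
// Assumption: simple containment; the real matching rules are more involved.
std::optional<KernelEntry> FindKernel(const std::vector<KernelEntry>& registry,
                                      const std::string& op, int opset) {
  for (const auto& entry : registry) {
    if (entry.op == op && opset >= entry.start && opset <= entry.end) {
      return entry;
    }
  }
  return std::nullopt;
}

// The three CUDA GRU ranges listed in this PR: [7, 13], [14, 21], and 22+.
std::vector<KernelEntry> GruRangesAfterPr() {
  return {{"GRU", 7, 13}, {"GRU", 14, 21}, {"GRU", 22, INT_MAX}};
}
```

In this model, a lookup for opset 21 lands in the capped [14, 21] entry and a lookup for opset 22 lands in the open-ended entry, which mirrors the 22+, [14, 21], [7, 13] rows added to docs/OperatorKernels.md.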

Motivation and Context

CUDA EP registered GRU only up to opset 14, while ONNX defines GRU through opset 22. Models exported at opset ≥15 would fail to find a matching CUDA kernel and fall back to CPU. This is one of the P1 gaps tracked in #27729.

Limitation

BF16 version is not added for GRU-22. It can be added later if needed.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu

/azp run Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu

@copilot, please update docs/OperatorKernels.md, and merge latest main branch to this branch.

Copilot AI and others added 2 commits March 18, 2026 23:14

Copilot AI commented Mar 18, 2026

@copilot, please update docs/OperatorKernels.md, and merge latest main branch to this branch.

Updated docs/OperatorKernels.md with the CUDA GRU opset 22 entry (showing 22+, [14, 21], [7, 13]) and merged latest main into this branch in commit 032eaa6.

Copilot AI requested a review from tianleiwu March 18, 2026 23:15
tianleiwu marked this pull request as ready for review March 18, 2026 23:45

github-actions bot left a comment


You can commit the suggested changes from lintrunner.

tianleiwu and others added 2 commits March 19, 2026 11:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@tianleiwu

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).


Copilot AI left a comment


Pull request overview

Extends CUDA Execution Provider kernel registration for the ONNX GRU operator to cover the opset gap between 14 and 22, so models exported with opset ≥15 can continue to use the CUDA kernel instead of falling back to CPU.

Changes:

  • Split CUDA GRU registration into versioned opset 14–21 and non-versioned 22+.
  • Update CUDA EP kernel declarations/registry entries to match the new version ranges.
  • Add a CUDA-focused GRU opset 22 regression test; update kernel docs to reflect the new ranges.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/providers/cpu/rnn/deep_cpu_gru_op_test.cc | Adds a CUDA-targeted opset 22 GRU test case (linear_before_reset=1). |
| onnxruntime/core/providers/cuda/rnn/gru.cc | Splits GRU kernel registration into opset 7–13, 14–21, and 22+. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Adds matching forward declarations and registry entries for GRU 14–21 and 22+. |
| docs/OperatorKernels.md | Updates the CUDA GRU documentation to show opset 22+ and the versioned ranges. |
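For readers unfamiliar with the linear_before_reset attribute used by the new test, the following scalar sketch shows where it changes the GRU equations, based on my reading of the ONNX GRU operator definition with default sigmoid and tanh activations. `GruParams` and `GruStep` are hypothetical names made up for this illustration; this is not the CUDA kernel or the test code from the PR.

```cpp
#include <cassert>
#include <cmath>

// Scalar GRU parameters (input size 1, hidden size 1); hypothetical type.
// W* are input weights, R* recurrent weights, Wb*/Rb* the two bias sets,
// for the update (z), reset (r), and hidden (h) gates.
struct GruParams {
  double Wz, Wr, Wh;
  double Rz, Rr, Rh;
  double Wbz, Wbr, Wbh;
  double Rbz, Rbr, Rbh;
};

inline double Sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// One GRU time step for a single scalar input and hidden state.
double GruStep(const GruParams& p, double x, double h_prev,
               bool linear_before_reset) {
  const double z = Sigmoid(x * p.Wz + h_prev * p.Rz + p.Wbz + p.Rbz);
  const double r = Sigmoid(x * p.Wr + h_prev * p.Rr + p.Wbr + p.Rbr);
  // linear_before_reset != 0: the reset gate scales the recurrent linear
  // term including its bias. Otherwise the reset gate scales h_prev first
  // and the recurrent bias is added outside.
  const double h_cand =
      linear_before_reset
          ? std::tanh(x * p.Wh + r * (h_prev * p.Rh + p.Rbh) + p.Wbh)
          : std::tanh(x * p.Wh + (r * h_prev) * p.Rh + p.Rbh + p.Wbh);
  return (1.0 - z) * h_cand + z * h_prev;
}
```

In this scalar case the two variants coincide when the recurrent hidden bias `Rbh` is zero and diverge otherwise, so a test that wants to distinguish them needs a nonzero recurrent bias.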


tianleiwu enabled auto-merge (squash) March 20, 2026 17:19
@tianleiwu

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

tianleiwu merged commit 830a29e into main Mar 20, 2026
90 of 91 checks passed
tianleiwu deleted the copilot/update-gru-cuda-operator branch March 20, 2026 18:51
4 participants