
[ Cuda] ConvTranspose-22 #27710

Draft

tianleiwu wants to merge 1 commit into main from tlwu/20260317/cuda_ConvTranspose

Conversation


@tianleiwu tianleiwu commented Mar 17, 2026

Description

This PR extends the CUDA ConvTranspose operator registration to support ONNX opset 22. The CUDA implementation already shares the same attribute and output-shape handling used by earlier supported opsets, so this change primarily exposes the existing kernel path for opset 22 and adds regression coverage to keep that path working.
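The shared output-shape handling mentioned above follows the arithmetic in the ONNX ConvTranspose spec, which is unchanged between the earlier opsets and opset 22. As an illustrative sketch (not the actual onnxruntime helper; the function name is hypothetical):

```cpp
#include <cstdint>

// Hypothetical sketch of the ONNX ConvTranspose output-size rule shared by
// all registered opsets: output = stride * (in - 1) + output_padding
//                                 + dilated_kernel - pad_head - pad_tail.
int64_t ConvTransposeOutputSize(int64_t input_size, int64_t stride,
                                int64_t kernel, int64_t dilation,
                                int64_t pad_head, int64_t pad_tail,
                                int64_t output_padding) {
  // Effective kernel extent after dilation.
  const int64_t dilated_kernel = (kernel - 1) * dilation + 1;
  return stride * (input_size - 1) + output_padding + dilated_kernel -
         pad_head - pad_tail;
}
```

For example, a 1-D input of size 3 with stride 2, kernel 3, and no padding yields an output of size 7.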

Summary of Changes

CUDA Kernel Registration

| File | Change |
| --- | --- |
| `onnxruntime/core/providers/cuda/nn/conv_transpose.cc` | Split `ConvTranspose` kernel registration into opset ranges 1-10, 11-21, and 22 so the CUDA kernel can be selected for opset 22. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Added matching provider-side kernel declarations and registry entries for `ConvTranspose(22)` on CUDA. |
| `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc` | Added matching NHWC CUDA declarations and registry entries for `ConvTranspose(22)`. |
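The effect of splitting registration into version ranges can be mimicked with a small self-contained sketch. This is not onnxruntime's actual registry API (the struct and names here are hypothetical); it only illustrates how an opset resolves to one of the registered ranges after this change:

```cpp
#include <climits>
#include <string>
#include <vector>

// Hypothetical stand-in for a versioned kernel registry entry.
struct KernelEntry {
  int since_version;
  int end_version;  // INT_MAX marks the open-ended (non-versioned) entry
  std::string name;
};

// ConvTranspose CUDA entries after this PR: versioned 1-10 and 11-21,
// plus a non-versioned entry starting at opset 22.
const std::vector<KernelEntry> kConvTransposeEntries = {
    {1, 10, "ConvTranspose_1_10"},
    {11, 21, "ConvTranspose_11_21"},
    {22, INT_MAX, "ConvTranspose_22"},
};

// Return the entry covering the model's opset, or note the CPU fallback.
std::string Resolve(int opset) {
  for (const auto& e : kConvTransposeEntries) {
    if (opset >= e.since_version && opset <= e.end_version) return e.name;
  }
  return "no CUDA kernel; falls back to CPU";
}
```

Before this PR the registry effectively stopped at the 11+ entry's prior upper bound, so an opset-22 model could miss CUDA kernel assignment.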

Test Coverage

| File | Change |
| --- | --- |
| `onnxruntime/test/providers/cpu/nn/conv_transpose_op_test.cc` | Added a CUDA-only opset 22 regression test that validates `ConvTranspose` with `output_shape`. |
| `onnxruntime/test/providers/cuda/nhwc/conv_transpose_test.cc` | Updated existing NHWC comparison coverage to instantiate `ConvTranspose` at opset 22. |
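The `output_shape` path the new regression test exercises has a specific rule in the ONNX spec: when `output_shape` is given, total padding is derived so the kernel produces exactly that size. A hedged scalar sketch of that rule (the function is illustrative, not ORT code; the head/tail split shown is the spec's `SAME_UPPER`-style split, with the extra unit going to the tail):

```cpp
#include <cstdint>
#include <utility>

// Hypothetical sketch: derive (pad_head, pad_tail) from a requested
// ConvTranspose output_shape, per the ONNX rule
//   total = stride * (in - 1) + output_padding + dilated_kernel - requested.
std::pair<int64_t, int64_t> PadsForOutputShape(int64_t input_size,
                                               int64_t stride, int64_t kernel,
                                               int64_t dilation,
                                               int64_t output_padding,
                                               int64_t requested_output) {
  const int64_t dilated_kernel = (kernel - 1) * dilation + 1;
  const int64_t total = stride * (input_size - 1) + output_padding +
                        dilated_kernel - requested_output;
  // Split with the extra unit on the tail (SAME_UPPER-style convention).
  return {total / 2, total - total / 2};
}
```

For example, input 3, stride 2, kernel 3: the natural output is 7, so requesting `output_shape = 6` implies one unit of total padding, placed at the tail under this convention.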

Testing

  • Built the touched translation units successfully with targeted ninja object builds:
    • onnxruntime/core/providers/cuda/cuda_execution_provider.cc
    • onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc
    • onnxruntime/core/providers/cuda/nn/conv_transpose.cc
    • onnxruntime/test/providers/cpu/nn/conv_transpose_op_test.cc
    • onnxruntime/test/providers/cuda/nhwc/conv_transpose_test.cc
  • Ran git diff --check to confirm the patch is formatting-clean.
  • A full onnxruntime_test_all / gtest run was not completed locally because the build tree regenerated and triggered a broad rebuild; the new CUDA and NHWC tests should still be exercised in CI or a focused local test build.

Motivation and Context

Related issue: #26393

The ONNX ConvTranspose-22 schema keeps the same core padding and output-shape semantics used by the earlier CUDA-supported opsets, but CUDA registration stopped at opset 11. That meant models using ConvTranspose at opset 22 could miss CUDA kernel assignment even though the implementation path was already compatible.

This PR closes that gap by updating kernel registration and test coverage without changing the underlying CUDA compute logic.

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)
  • No breaking changes (or documented in description)
  • CI passes

@tianleiwu tianleiwu marked this pull request as draft March 17, 2026 19:26
@tianleiwu tianleiwu changed the title [ Cuda] Enable ConvTranspose opset 22 [ Cuda] ConvTranspose-22 Mar 17, 2026
tianleiwu added a commit that referenced this pull request Mar 20, 2026
### Description

Extends GRU CUDA kernel registration from opset 14 to opset 22,
following the same pattern as other recent opset gap fills (e.g.,
ConvTranspose in #27710).

- **`gru.cc`**: Cap existing opset-14 non-versioned kernel to versioned
14–21; add new non-versioned kernel at opset 22+
- **`cuda_execution_provider.cc`**: Update forward declarations and
`BuildKernelCreateInfo` entries for versioned 14–21 and non-versioned
22+
- **`deep_cpu_gru_op_test.cc`**: Add CUDA-specific test for GRU at opset
22 with `linear_before_reset=1` (cuDNN requirement)
- **`docs/OperatorKernels.md`**: Update CUDA provider GRU entry to
reflect `22+`, `[14, 21]`, and `[7, 13]` version ranges

No functional changes to the kernel implementation—the GRU spec is
unchanged between opsets 14 and 22.
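The `linear_before_reset=1` requirement called out above reflects the one place the attribute changes GRU arithmetic: where the reset gate `r` is applied when computing the candidate hidden state. A scalar illustration (an assumed simplification of the ONNX GRU equations, not ORT or cuDNN code):

```cpp
#include <cmath>

// Scalar sketch of the ONNX GRU candidate-hidden-state computation.
// wx   = W_h * x contribution, rh_h = R_h * h_prev contribution,
// rbh  = recurrence bias, wbh = input bias, r = reset gate value.
double CandidateHidden(double wx, double rh_h, double rbh, double wbh,
                       double r, bool linear_before_reset) {
  if (linear_before_reset) {
    // cuDNN-compatible form: r gates the full recurrent linear term + bias.
    return std::tanh(wx + r * (rh_h + rbh) + wbh);
  }
  // Default (attribute = 0): r is applied to h_prev before the linear term.
  return std::tanh(wx + r * rh_h + rbh + wbh);
}
```

With a nonzero recurrence bias the two forms produce different values, which is why cuDNN-backed CUDA GRU tests pin the attribute to 1.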

### Motivation and Context

CUDA EP registered GRU only up to opset 14, while ONNX defines GRU
through opset 22. Models exported at opset ≥15 would fail to find a
matching CUDA kernel and fall back to CPU. This is one of the P1 gaps
tracked in #27729.

### Limitation

BF16 version is not added for GRU-22. It can be added later if needed.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
