
[CUDA EP] Add pad op version from 19 to 23 support for CUDA #27416

Closed
ShirasawaSama wants to merge 1 commit into microsoft:main from
ShirasawaSama:feature/add-pad-op-version-19-to-23-support-for-CUDA

Conversation

@ShirasawaSama
Contributor

Description

Add CUDA support for the Pad op in opset versions 19 through 23.

Motivation and Context

The CUDA execution provider currently does not support the Pad op in opsets 19 through 23. When an ONNX model exported with one of these opsets runs on the CUDA execution provider, the Pad op falls back to the CPU, which causes significant performance degradation.
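The fallback described above can be pictured with a toy model of versioned kernel lookup. This is only a sketch under stated assumptions: `KernelRange`, `PickProvider`, `RegistryBeforePr`, and `RegistryAfterPr` are hypothetical names, the CPU ranges are assumed for illustration, and the real ONNX Runtime registry and matching rules are more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one kernel registration.
struct KernelRange {
  std::string provider;  // execution provider that owns the kernel
  int start;             // first opset version covered (inclusive)
  int end;               // last opset version covered (inclusive)
};

// Returns the provider of the first registration whose range covers
// op_version, or "none". Entries are scanned in priority order, so CUDA
// entries are listed before CPU entries in the registries below.
std::string PickProvider(const std::vector<KernelRange>& registry, int op_version) {
  for (const auto& k : registry) {
    if (k.start <= op_version && op_version <= k.end) return k.provider;
  }
  return "none";
}

// Assumed state before this PR: CUDA Pad stops at opset 18.
// Ranges below opset 18 are omitted for brevity.
std::vector<KernelRange> RegistryBeforePr() {
  return {{"CUDA", 18, 18},
          {"CPU", 18, 18}, {"CPU", 19, 20}, {"CPU", 21, 22}, {"CPU", 23, 23}};
}

// State this PR aims for: CUDA Pad registered for 18, 19-20, 21-22, and 23.
std::vector<KernelRange> RegistryAfterPr() {
  return {{"CUDA", 18, 18}, {"CUDA", 19, 20}, {"CUDA", 21, 22}, {"CUDA", 23, 23},
          {"CPU", 18, 18}, {"CPU", 19, 20}, {"CPU", 21, 22}, {"CPU", 23, 23}};
}
```

In this model, a Pad node at opset 19 resolves to the CPU entry with the first registry and to the CUDA entry with the second, which is the behavior change this PR is after.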


@ShirasawaSama changed the title from "Add pad op version from 19 to 23 support for CUDA" to "[CUDA EP] Add pad op version from 19 to 23 support for CUDA" on Feb 23, 2026
@tianleiwu tianleiwu requested a review from Copilot February 24, 2026 19:05
Contributor

Copilot AI left a comment


Pull request overview

Adds CUDA Execution Provider coverage for ONNX Pad in opset 19–23 (previously only registered up to opset 18), including implementing wrap mode behavior so models exported with newer opsets no longer force a CPU fallback for Pad.

Changes:

  • Register CUDA Pad kernels for opset 19–20, 21–22, and 23 (and make opset 18 explicitly versioned).
  • Add CUDA kernel support for wrap mode, including handling negative pads via slicing metadata.
  • Update an existing wrap padding test comment now that CUDA is expected to support opset 19.
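To make the wrap-mode bullet concrete, here is a minimal host-side sketch of a per-axis index mapping. It assumes negative pads are applied as a slice of the input before wrapping, so the wrap period is the sliced ("effective") extent. `WrapSourceIndex` and `WrapPad1D` are hypothetical helpers written for illustration; they are not the kernels from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Maps an output coordinate to an input coordinate along one axis.
// Assumption: a negative pad trims the input on that side first, and the
// remaining (effective) extent is what repeats.
int64_t WrapSourceIndex(int64_t out_idx, int64_t dim,
                        int64_t pad_begin, int64_t pad_end) {
  const int64_t slice_start = std::max<int64_t>(0, -pad_begin);  // trimmed at front
  const int64_t slice_trim_end = std::max<int64_t>(0, -pad_end); // trimmed at back
  const int64_t effective_dim = dim - slice_start - slice_trim_end;
  const int64_t lower_pad = std::max<int64_t>(0, pad_begin);     // elements added at front
  assert(effective_dim > 0);
  // C++ '%' truncates toward zero, so shift negative remainders into range.
  int64_t rel = (out_idx - lower_pad) % effective_dim;
  if (rel < 0) rel += effective_dim;
  return slice_start + rel;
}

// Reference 1-D wrap padding built on the mapping above.
std::vector<float> WrapPad1D(const std::vector<float>& in,
                             int64_t pad_begin, int64_t pad_end) {
  const int64_t dim = static_cast<int64_t>(in.size());
  const int64_t out_dim = dim + pad_begin + pad_end;
  std::vector<float> out;
  if (out_dim <= 0) return out;  // nothing left after trimming
  out.reserve(static_cast<size_t>(out_dim));
  for (int64_t i = 0; i < out_dim; ++i) {
    out.push_back(in[static_cast<size_t>(WrapSourceIndex(i, dim, pad_begin, pad_end))]);
  }
  return out;
}
```

Under these assumptions, padding `{1, 2, 3, 4}` with pads `(2, 1)` yields `{3, 4, 1, 2, 3, 4, 1}`, and pads `(-1, 2)` first drop the leading element and then wrap over `{2, 3, 4}`, giving `{2, 3, 4, 2, 3}`. A GPU kernel would evaluate the same mapping per output element rather than in a loop.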

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/test/providers/cpu/tensor/pad_test.cc | Updates wrap-mode test context now that CUDA can register opset 19+ Pad. |
| onnxruntime/core/providers/cuda/tensor/pad_impl.h | Extends CUDA pad kernel APIs to accept slice/effective-dim metadata needed for wrap + negative pads. |
| onnxruntime/core/providers/cuda/tensor/pad_impl.cu | Implements wrap mode in CUDA kernels and wires new parameters through launch paths. |
| onnxruntime/core/providers/cuda/tensor/pad.cc | Adds CUDA kernel registrations for opset 19–23 and passes slice/effective dims into CUDA implementations. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Declares/registers the additional versioned CUDA Pad kernels in the EP registry. |
Comments suppressed due to low confidence (1)

onnxruntime/test/providers/cpu/tensor/pad_test.cc:1401

  • This test previously avoided CUDA by using an opset version CUDA didn’t register for. Now that CUDA is expected to support opset 19+, it would be good to make the test actually fail if Pad falls back to CPU (otherwise a future regression could silently reintroduce CPU offload while still passing). Consider running this case with session.disable_cpu_ep_fallback=1 and restricting execution providers to CUDA for this test so it validates the new CUDA registration/support for opset 19–23.
  OpTester test("Pad", 19);
  test.AddInput<float>("data", input_shape, input_data);
  test.AddInput<int64_t>("pads", {static_cast<int64_t>(pads.size())}, pads, true);
  test.AddOutput<float>("output", expected_shape, expected_data);
  test.AddAttribute("mode", "wrap");
  test.ConfigExcludeEps({kDmlExecutionProvider, kQnnExecutionProvider,
                         kTensorrtExecutionProvider, kWebGpuExecutionProvider});
  test.RunWithConfig();



// CUDA registers only up to 18 and does not impl wrap mode
// so we force version to 19 to automatically exclude EPs that do not
// implement wrap mode similar to the above tests.
Member


I am guessing there are already wrap-mode Pad tests in place?

Contributor Author


yes

@hariharans29
Member

Can you please resolve the conflicts?

@ShirasawaSama
Contributor Author

ShirasawaSama commented Feb 24, 2026

Sorry, I think I found some errors in my math formula during my final code review. I'll add more unit tests to cover them.

@ShirasawaSama

This comment was marked as outdated.

@ShirasawaSama force-pushed the feature/add-pad-op-version-19-to-23-support-for-CUDA branch 3 times, most recently from 3b7e80a to 37eabee, on March 2, 2026 19:23
@ShirasawaSama
Contributor Author

The algorithm now uses the same formula as the CPU implementation. It passes local testing with no noticeable performance degradation.

@ShirasawaSama force-pushed the feature/add-pad-op-version-19-to-23-support-for-CUDA branch from 37eabee to 3dc473e on March 9, 2026 14:36
@tianleiwu
Contributor

This PR is superseded by #27774

@tianleiwu tianleiwu closed this Mar 19, 2026