Add LabelEncoder CUDA execution provider for numeric types#28045

Draft
Copilot wants to merge 4 commits into main from copilot/add-labelencoder-on-cuda-provider

Conversation

Contributor

Copilot AI commented Apr 13, 2026

Description

Implements ai.onnx.ml.LabelEncoder on the CUDA execution provider for numeric key/value types using sorted arrays + binary search (O(log n) per element).
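The per-element lookup can be sketched on the host as plain C++ (a minimal illustration; `lookup_one` and its signature are hypothetical, not the PR's actual identifiers — the real kernel runs equivalent logic once per thread on device memory):

```cpp
#include <algorithm>

// Illustrative host-side version of the per-thread lookup. keys must be
// sorted ascending and values[i] pairs with keys[i]; each CUDA thread
// would execute this once for its assigned input element.
template <typename TKey, typename TValue>
TValue lookup_one(const TKey* keys, const TValue* values, int n,
                  TKey query, TValue default_value) {
  const TKey* end = keys + n;
  const TKey* it = std::lower_bound(keys, end, query);  // O(log n)
  return (it != end && *it == query) ? values[it - keys] : default_value;
}
```

The sorted-array approach trades a one-time host-side sort at construction for a branch-light, divergence-friendly lookup on the GPU, avoiding the need for a device-side hash table.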

New files (onnxruntime/core/providers/cuda/ml/):

  • label_encoder_impl.cu / .h — CUDA kernel: per-thread binary search on sorted keys, NaN-aware for float/double
  • label_encoder.cc / .h — Host-side op classes (CudaLabelEncoder for opset 2-3, CudaLabelEncoder_4 for opset 4+). The constructor sorts the keys and copies them to the GPU; ComputeInternal launches the kernel.
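The constructor's setup step has to sort the key array while keeping each value paired with its key before the two arrays are copied to the GPU. A sketch of that pairing (the helper name `sort_pairs` is hypothetical):

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Illustrative setup: sort keys ascending via an index permutation so
// that values[i] stays paired with keys[i] after the sort.
template <typename TKey, typename TValue>
void sort_pairs(std::vector<TKey>& keys, std::vector<TValue>& values) {
  std::vector<size_t> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](size_t a, size_t b) { return keys[a] < keys[b]; });
  std::vector<TKey> k(keys.size());
  std::vector<TValue> v(values.size());
  for (size_t i = 0; i < order.size(); ++i) {
    k[i] = keys[order[i]];
    v[i] = values[order[i]];
  }
  keys = std::move(k);
  values = std::move(v);
}
```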

Modified files:

  • cuda_execution_provider.cc — Register 11 kernel variants (4 versioned kernels for opset 2-3, 7 for opset 4+)
  • provider_api.h — Add missing kMLDomain constant (first ML-domain op on CUDA EP)
  • docs/OperatorKernels.md — Add ai.onnx.ml section to CUDA provider table

Supported type combinations:

  • Opset 2-3: int64↔float, int64↔int64, float↔float
  • Opset 4+: all of the above, plus double↔double, double↔int64, int64↔double

String types remain CPU-only. NaN keys are placed at the end of the sorted array, and NaN queries are short-circuited before the binary search.
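The NaN handling can be sketched as follows (illustrative names, not the PR's actual classes): because NaN compares false against everything, it would break the binary search's ordering assumptions, so it is kept out of the sorted range and handled by an early check.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical sketch of a NaN-aware float-keyed encoder. Setup places
// any NaN key after the finite keys; a NaN query then skips the binary
// search entirely, since NaN < x is false for all x and would otherwise
// violate std::lower_bound's ordering precondition.
struct NanAwareEncoder {
  std::vector<float> keys;        // sorted ascending, finite keys only
  std::vector<long long> values;  // values[i] pairs with keys[i]
  bool has_nan_key = false;
  long long nan_value = 0;
  long long default_value = -1;

  long long lookup(float q) const {
    if (std::isnan(q))  // short-circuit before the binary search
      return has_nan_key ? nan_value : default_value;
    auto it = std::lower_bound(keys.begin(), keys.end(), q);
    if (it != keys.end() && *it == q) return values[it - keys.begin()];
    return default_value;
  }
};
```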

Tests: 5 new test cases covering NaN-key-to-numeric-value mappings and double type combinations. Existing numeric tests (FloatToInt64Opset2, Int64ToFloatOpset2, etc.) will automatically run on CUDA via OpTester::Run().

Motivation and Context

Models with large LabelEncoder nodes (>100k entries) force a CPU round-trip when all other nodes run on GPU. This adds the CUDA implementation to eliminate that data transfer bottleneck.

Copilot AI and others added 3 commits April 13, 2026 02:54
Implements LabelEncoder for the CUDA execution provider supporting
numeric types (int64, float, double). Uses sorted arrays and binary
search on GPU for efficient O(log n) per-element lookup.

Supports:
- Opset 2-3: int64↔float, int64↔int64, float↔float
- Opset 4+: above plus double↔double, double↔int64, int64↔double

String types remain CPU-only as they cannot run on GPU.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Add tests for:
- Float NaN keys to int64 values (opset 4)
- Float NaN keys to float values (opset 4)
- Double NaN keys to int64 values (opset 4)
- Int64 to double conversion (opset 4)
- Double to double conversion (opset 4)

These tests exercise the CUDA binary search with NaN handling
and double type support.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d17c0a15-3bf2-4ac4-bc57-255876153271

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI changed the title [WIP] Add LabelEncoder support for CUDA provider Add LabelEncoder CUDA execution provider for numeric types Apr 13, 2026
Copilot AI requested a review from tianleiwu April 13, 2026 03:00


Development

Successfully merging this pull request may close these issues.

[Feature Request] LabelEncoder on Cuda provider
