
[CoreML EP] Add QuickGelu support#28184

Merged
yuslepukhin merged 4 commits into microsoft:main from maxwbuckley:coreml-quickgelu
Apr 28, 2026

Conversation


maxwbuckley (Contributor) commented Apr 22, 2026

Description

Adds support for com.microsoft:QuickGelu (x * Sigmoid(alpha * x)) to the CoreML Execution Provider's MLProgram path. The builder decomposes QuickGelu into three MIL ops (mul / sigmoid / mul), matching the op's own schema function-body in contrib_defs.cc:605-631 and the approach the QNN EP already uses in qnn/builder/opbuilder/quick_gelu_op_builder.cc. Only the MLProgram path is implemented; NeuralNetwork is deprecated on Apple Silicon.
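For reference, the op's semantics and the mul / sigmoid / mul decomposition can be sketched in scalar form (a hypothetical standalone sketch; the actual builder wires MIL ops rather than computing values):

```cpp
#include <cmath>

// QuickGelu reference semantics: x * Sigmoid(alpha * x).
float Sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

float QuickGelu(float x, float alpha) {
  return x * Sigmoid(alpha * x);
}

// The three-op decomposition the builder emits, step by step.
float QuickGeluDecomposed(float x, float alpha) {
  float t0 = alpha * x;    // MIL mul(x, alpha)
  float t1 = Sigmoid(t0);  // MIL sigmoid
  return x * t1;           // MIL mul(x, t1)
}
```

Both forms compute the same value, which is why the decomposition is semantics-preserving.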

Adds CoreMLExecutionProviderTest.QuickGeluTest which builds a single com.microsoft:QuickGelu node with non-default alpha=1.5 and verifies the entire graph is claimed by the CoreML EP via ExpectedEPNodeAssignment::All. Verified with a negative test: temporarily removing the CreateQuickGeluOpBuilder registration causes the new test to fail with a VerifyEPNodeAssignment fatal failure, proving it genuinely exercises the CoreML path.

Also updates coreml_supported_mlprogram_ops.md.

Motivation and Context

Fixes #28183.

QuickGelu is produced by ORT's own QuickGeluFusion optimizer pass (onnxruntime/core/optimizer/quick_gelu_fusion.cc), which runs at ORT_ENABLE_EXTENDED and therefore also at ORT_ENABLE_ALL, the default session optimization level. Any model containing the x * sigmoid(alpha * x) pattern (CLIP, several mobile transformers, the DWPose pose estimator) is silently rewritten by ORT into a graph with QuickGelu nodes that the CoreML EP then rejects. The fusion turns 3 supported primitives into 1 unsupported op, making it strictly harmful for CoreML.

On the DWPose dw-ll_ucoco_384.onnx model with batch=1 and ORT_ENABLE_EXTENDED, 76 QuickGelu nodes get produced. Running the result on the CoreML EP:

| ORT build | CoreML subgraphs | Inference (ms) |
| --- | --- | --- |
| main (QuickGelu rejected) | ~80 (each QuickGelu is a graph break) | 54.77 |
| this PR (QuickGelu supported) | 10 | 13.91 |

The remaining breaks come from other ops (see "Related CoreML EP gaps" below). This patch alone yields a ~4× speedup at the EXTENDED level.

Even at the default ORT_ENABLE_ALL with a symbolic batch dim (where partial shape inference inhibits most fusions), 3 QuickGelu nodes still get produced — so this patch helps any CoreML user who hasn't explicitly downgraded to ORT_ENABLE_BASIC.

Related CoreML EP gaps observed (out of scope for this PR)

With QuickGelu fixed, the remaining 9 CPU-fallback nodes on the EXTENDED-optimized DWPose pose model are:

  • com.microsoft:FusedConv (×4) — produced by ConvActivationFusion. Fuses Conv + activation into one node. Same failure mode as QuickGelu: Conv and the activations (Relu, Sigmoid, HardSigmoid, etc.) are individually CoreML-supported, but the fused form isn't. Decomposition is straightforward — emit the underlying conv MIL op, then the corresponding activation.
  • com.microsoft:FusedMatMul (×2, from MatMulScaleFusion) — MatMul * alpha with an optional transpose. Decomposition: matmul + scalar mul.
  • ai.onnx:Split (×2) — pre-existing CoreML EP gap unrelated to fusion. CoreML MIL has a native split op; this one is a straight op-builder omission.
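The FusedMatMul decomposition suggested above can be illustrated in scalar form on 2×2 matrices (a hypothetical sketch with illustrative names, not EP code, and ignoring the optional transpose attribute):

```cpp
#include <array>

using Mat2 = std::array<std::array<float, 2>, 2>;

// FusedMatMul(A, B, alpha) decomposed as a plain matmul followed by a
// scalar mul, as proposed above.
Mat2 MatMulScale(const Mat2& a, const Mat2& b, float alpha) {
  Mat2 c{};
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < 2; ++k) acc += a[i][k] * b[k][j];  // matmul
      c[i][j] = alpha * acc;                                 // scalar mul
    }
  }
  return c;
}
```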

Happy to send follow-up PRs for any of these after this one lands, following the same pattern. Flagging here so they're on the EP coverage roadmap.


Copilot AI left a comment


Pull request overview

Adds CoreML EP MLProgram support for the com.microsoft:QuickGelu contrib op by lowering it into existing MIL primitives, improving CoreML graph-claim coverage for models affected by ORT’s QuickGeluFusion.

Changes:

  • Register a new CoreML op builder for com.microsoft:QuickGelu and decompose it into mul -> sigmoid -> mul in the MLProgram path.
  • Add a CoreML EP unit test that builds a single-node QuickGelu model (with non-default alpha) and verifies full graph assignment to CoreML.
  • Update the MLProgram supported-ops documentation to include com.microsoft:QuickGelu.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents QuickGelu as supported in the MLProgram path. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds CoreMLExecutionProviderTest.QuickGeluTest for EP assignment + output verification. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.h | Declares CreateQuickGeluOpBuilder. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers the QuickGelu builder in the factory. |
| onnxruntime/core/providers/coreml/builders/impl/quick_gelu_op_builder.cc | Implements QuickGelu decomposition into MIL ops for MLProgram. |


Comment thread on onnxruntime/test/providers/coreml/coreml_basic_test.cc (outdated)
@yuslepukhin
Member

The PR may require rebase from main when the pipelines are fixed.

@maxwbuckley
Contributor Author

Thanks for catching that — rebased on main and dropped the misleading comment along with the redundant params.fp32_abs_err assignment (1e-5f was already the default). The test passes with the plain EPVerificationParams{ExpectedEPNodeAssignment::All} constructor now.

@yuslepukhin yuslepukhin requested a review from Copilot April 23, 2026 19:20
@maxwbuckley
Contributor Author

Note: I force-pushed this branch too (one commit, same "rebase + address Copilot's inline comment" amendment pattern). Realized on #28182 that force-pushing wipes the "changes since last review" view — sorry for the same here. Going forward I'll stack follow-up commits instead of amending.

Delta since the original commit (275498df6a):

  1. Fix misleading tolerance comment — drop the params.fp32_abs_err = 1e-5f line (1e-5f was already the default so it wasn't loosening anything) and the comment claiming otherwise. Test now uses EPVerificationParams{ExpectedEPNodeAssignment::All} inline.
  2. Rebase onto current main — no code changes, fast-forward over 4265122712.


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



Comment thread on onnxruntime/core/providers/coreml/builders/impl/quick_gelu_op_builder.cc (outdated)
@yuslepukhin
Member

Apart from the comments above: the core QuickGelu implementation is functionally correct, spec-compliant, and exception-safe. Recommended actions before merge:

  • Add a FLOAT16 test (the most impactful gap)
  • Split out the unrelated NuGet/pipeline changes into their own PR
  • Consider making the non-MLProgram path return an error instead of silent OK (optional, matches existing convention)
  • Consider the alpha ≈ 1.0 optimization (optional)

maxwbuckley added a commit to maxwbuckley/onnxruntime that referenced this pull request Apr 24, 2026
Adds `CoreMLExecutionProviderTest.QuickGeluTestFp16` — same single-node
model and non-default alpha=1.5 as the existing QuickGeluTest, but with
FLOAT16 input/output. Exercises the MLFloat16 branch of the alpha-scalar
wiring in `QuickGeluOpBuilder::AddToModelBuilderImpl`.

Tolerance widened to 2e-2 (fp16 ulp at magnitude 20 is ~0.01).

Addresses review feedback on microsoft#28184.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maxwbuckley
Contributor Author

Addressing feedback in stacked commits.

Just pushed (7f36cfcf56): added the FLOAT16 test, CoreMLExecutionProviderTest.QuickGeluTestFp16. Same single-node model and non-default alpha=1.5 as the fp32 variant, with FLOAT16 input/output. Exercises the MLFloat16 branch of the alpha-scalar wiring. Tolerance 2e-2 (fp16 ulp at magnitude 20 is ~0.01, with headroom for the 3-op decomposition).
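As a sanity check on that tolerance: IEEE binary16 has 10 explicit mantissa bits, so the ulp at magnitude m is 2^(floor(log2 m) − 10). A hypothetical helper (not ORT code):

```cpp
#include <cmath>

// Ulp of an IEEE binary16 value at the given (normal, positive) magnitude:
// 10 explicit mantissa bits, so ulp = 2^(exponent - 10).
float Fp16Ulp(float magnitude) {
  int exponent = static_cast<int>(std::floor(std::log2(magnitude)));
  return std::ldexp(1.0f, exponent - 10);
}
```

At magnitude 20 this gives 2^-6 = 0.015625, so the 2e-2 tolerance sits just above one ulp, leaving modest headroom for the 3-op decomposition's accumulated rounding.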

Still on my list for follow-up commits:

  • Shape availability check in IsOpSupportedImpl (Copilot's line-45 comment)
  • Fail-fast on the non-MLProgram path in AddToModelBuilderImpl (Copilot's line-48 comment + your "silent OK" note)
  • alpha ≈ 1.0 skip optimization (optional)

One clarification needed: your review mentioned "Split out the unrelated NuGet/pipeline changes into their own PR" — but this PR only touches 5 files, all CoreML-EP-related (quick_gelu_op_builder.cc, op_builder_factory.{cc,h}, coreml_basic_test.cc, coreml_supported_mlprogram_ops.md). I don't see any NuGet or pipeline changes on the branch. Could you point me at what you're seeing? It's possible you're thinking of a different PR, or there's something showing up for you that isn't showing up in gh pr view 28184 --json files for me.

maxwbuckley added a commit to maxwbuckley/onnxruntime that referenced this pull request Apr 24, 2026
When `alpha` is within 1e-6 of 1.0 (e.g. CLIP's `x * sigmoid(x)`), skip
the leading `mul(x, alpha)` in `QuickGeluOpBuilder::AddToModelBuilderImpl`
and feed `x` straight into `sigmoid`. Saves one MIL op per QuickGelu
node and avoids the rounding it would introduce. Mirrors the same
optimization in the QNN builder
(`qnn/builder/opbuilder/quick_gelu_op_builder.cc:42-49`).

Adds `CoreMLExecutionProviderTest.QuickGeluTestAlphaOne` covering the
`alpha=1.0` branch with `ExpectedEPNodeAssignment::All`. Verified via
negative test: temporarily forcing `skip_alpha_mul` for all alphas
causes the alpha=1.5 tests (fp32 + fp16) to fail with a tolerance
mismatch while alpha=1.0 still passes, proving both branches are
exercised.

Addresses optional review feedback on microsoft#28184.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maxwbuckley
Contributor Author

Just pushed a2781ee6e1: alpha ≈ 1.0 skip optimization.

When |alpha - 1.0| < 1e-6 (CLIP's x * sigmoid(x)), skip the leading mul(x, alpha) and feed x straight into sigmoid. Saves one MIL op per node and avoids its rounding. Same logic the QNN builder uses (quick_gelu_op_builder.cc:42-49 there).

Added CoreMLExecutionProviderTest.QuickGeluTestAlphaOne covering that branch. Verified via the same negative-test discipline as the other tests: temporarily forcing skip_alpha_mul = true for all alphas causes the alpha=1.5 tests (fp32 + fp16) to fail while alpha=1.0 still passes — confirming both branches are genuinely exercised.
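The branch structure can be sketched in scalar form (hypothetical; the real builder decides which MIL ops to emit, and skip_alpha_mul mirrors the flag named in the commit message):

```cpp
#include <cmath>

float Sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// alpha ~= 1.0 fast path: elide the leading mul and feed x into sigmoid.
float QuickGeluWithSkip(float x, float alpha) {
  const bool skip_alpha_mul = std::fabs(alpha - 1.0f) < 1e-6f;
  const float scaled = skip_alpha_mul ? x : alpha * x;  // leading mul elided
  return x * Sigmoid(scaled);
}
```

Both branches agree at alpha=1.0 (1.0f * x is exact in float), which is why the skip is safe as well as cheaper.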

Remaining items from Copilot's review (shape availability check in IsOpSupportedImpl, fail-fast for non-MLProgram path in AddToModelBuilderImpl) coming in one more stacked commit.

maxwbuckley added a commit to maxwbuckley/onnxruntime that referenced this pull request Apr 24, 2026
…LProgram

Two defensive checks in `QuickGeluOpBuilder`:

1. `IsOpSupportedImpl` now calls `GetShape(...)` on input 0 and returns
   false (with VERBOSE log) if shape info is unavailable, matching the
   hard requirement in `AddToModelBuilderImpl`. Previously the EP could
   claim a QuickGelu node and then fail at model-build time if shape
   inference was incomplete upstream. Matches the pattern used in e.g.
   `conv_op_builder.cc` and `batch_norm_op_builder.cc`.

2. `AddToModelBuilderImpl` replaces the `if (CreateMLProgram()) { ... }`
   guard with an `ORT_RETURN_IF_NOT` at the top. The old form silently
   returned `Status::OK()` without emitting any op if called in
   NeuralNetwork mode — an invalid CoreML model. `IsOpSupportedImpl`
   gates this, but defense-in-depth is cheap here.

Addresses Copilot's two inline review comments on microsoft#28184.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yuslepukhin
Member

I am seeing lint failures. Please ensure you install lintrunner, run lintrunner init, then run lintrunner -a on all files prior to a push.

@yuslepukhin
Member

One clarification needed: your review mentioned "Split out the unrelated NuGet/pipeline changes into their own PR" — but this PR only touches 5 files, all CoreML-EP-related

This turns out to be a residual from my local merge.

@yuslepukhin
Member

LGTM. Need to resolve conflicts.

@maxwbuckley
Contributor Author

Conflicts resolved via merge commit 2a85e2cf12 (kept the branch append-only so the 'changes since last review' view stays intact). The conflict was only in coreml_basic_test.cc where #28182's HardSigmoidTest landed in the same spot I was appending the QuickGelu* tests — both now present side-by-side. Other files auto-merged clean.

Also re-ran lintrunner locally with the pinned versions from requirements-lintrunner.txt — reports clean on all 5 files. All 4 CoreML-EP tests (HardSigmoidTest, QuickGeluTest, QuickGeluTestAlphaOne, QuickGeluTestFp16) pass against the rebuilt binary.

@maxwbuckley
Contributor Author

Thanks for the approval! Quick note on the React Native CI Android failure — that ran after the approval and looks unrelated to this PR (the diff touches only onnxruntime/core/providers/coreml/builders/..., onnxruntime/test/providers/coreml/coreml_basic_test.cc, and the CoreML supported-ops doc — no Android / React Native code). The job log shows it failed in the React Native pipeline itself, not in anything our patch could affect. Happy to retry or rebase if it'd help, but otherwise hopefully it's just a transient.

maxwbuckley and others added 4 commits April 26, 2026 20:25
Adds support for `com.microsoft:QuickGelu` (`x * Sigmoid(alpha * x)`) to
the CoreML Execution Provider's MLProgram path. QuickGelu is produced by
ORT's own `QuickGeluFusion` optimizer pass (`ORT_ENABLE_EXTENDED` and
above, which includes the default `ORT_ENABLE_ALL`), so any model with
the `x * sigmoid(alpha * x)` pattern in it ends up with an op CoreML
rejects — turning 3 supported primitives into 1 unsupported op and
making the fusion a net negative for CoreML.

The builder decomposes QuickGelu into three MIL ops (`mul` / `sigmoid` /
`mul`), matching the op's own schema function-body in `contrib_defs.cc`
and the approach the QNN EP already uses in
`qnn/builder/opbuilder/quick_gelu_op_builder.cc`. Only the MLProgram
path is implemented; NeuralNetwork is deprecated on Apple Silicon.

Adds `CoreMLExecutionProviderTest.QuickGeluTest` which builds a single
`com.microsoft:QuickGelu` node with non-default alpha=1.5 and verifies
the entire graph is claimed by the CoreML EP via
`ExpectedEPNodeAssignment::All`. Verified via negative test: temporarily
removing the `CreateQuickGeluOpBuilder` registration causes the new test
to fail with a `VerifyEPNodeAssignment` fatal failure, proving it
genuinely exercises the CoreML path.

Also updates `coreml_supported_mlprogram_ops.md`.

Fixes microsoft#28183.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(The remaining three commit messages are reproduced verbatim earlier in this thread.)
@yuslepukhin yuslepukhin merged commit a53d6d7 into microsoft:main Apr 28, 2026
94 of 102 checks passed
