[CoreML EP] Add QuickGelu support #28184
Conversation
Pull request overview
Adds CoreML EP MLProgram support for the com.microsoft:QuickGelu contrib op by lowering it into existing MIL primitives, improving CoreML graph-claim coverage for models affected by ORT’s QuickGeluFusion.
Changes:
- Register a new CoreML op builder for `com.microsoft:QuickGelu` and decompose it into `mul -> sigmoid -> mul` in the MLProgram path (see the reference sketch after this list).
- Add a CoreML EP unit test that builds a single-node QuickGelu model (with non-default `alpha`) and verifies full graph assignment to CoreML.
- Update the MLProgram supported-ops documentation to include `com.microsoft:QuickGelu`.
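For reference, the lowering computes the same function as the contrib op's definition, y = x * sigmoid(alpha * x). A minimal scalar sketch of that math (plain C++ written for this review, not the PR's MIL builder code):

```cpp
#include <cmath>
#include <cstdio>

// Scalar reference for com.microsoft:QuickGelu: y = x * sigmoid(alpha * x).
// The MLProgram lowering expresses the same computation as three MIL ops:
//   scaled = mul(x, alpha); gate = sigmoid(scaled); y = mul(x, gate)
float QuickGeluRef(float x, float alpha) {
  const float scaled = x * alpha;                        // mul
  const float gate = 1.0f / (1.0f + std::exp(-scaled));  // sigmoid
  return x * gate;                                       // mul
}

int main() {
  std::printf("%f\n", QuickGeluRef(2.0f, 1.5f));  // ~1.905148
}
```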
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents QuickGelu as supported in the MLProgram path. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds CoreMLExecutionProviderTest.QuickGeluTest for EP assignment + output verification. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.h | Declares CreateQuickGeluOpBuilder. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers the QuickGelu builder in the factory. |
| onnxruntime/core/providers/coreml/builders/impl/quick_gelu_op_builder.cc | Implements QuickGelu decomposition into MIL ops for MLProgram. |
The PR may require a rebase from main when the pipelines are fixed.
Force-pushed from 275498d to f1b8842.
Thanks for catching that — rebased on main and dropped the misleading comment along with the redundant …
Note: I force-pushed this branch too (one commit, same "rebase + address Copilot's inline comment" amendment pattern). Realized on #28182 that force-pushing wipes the "changes since last review" view — sorry for the same here. Going forward I'll stack follow-up commits instead of amending. Delta since the original commit (…).
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Apart from the comments above, the core QuickGelu implementation is functionally correct, spec-compliant, and exception-safe. Recommended actions before merge: …
Adds `CoreMLExecutionProviderTest.QuickGeluTestFp16` — same single-node model and non-default alpha=1.5 as the existing QuickGeluTest, but with FLOAT16 input/output. Exercises the MLFloat16 branch of the alpha-scalar wiring in `QuickGeluOpBuilder::AddToModelBuilderImpl`.

Tolerance widened to 2e-2 (fp16 ulp at magnitude 20 is ~0.016, so 2e-2 is just over one ulp).

Addresses review feedback on microsoft#28184.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
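The tolerance rationale checks out arithmetically: fp16 carries 10 explicit mantissa bits, so the value spacing in the binade [16, 32) is 2^(4-10) = 0.015625. A standalone check (the helper name is ours, not from the PR):

```cpp
#include <cmath>
#include <cstdio>

// ulp of an IEEE fp16 value at magnitude |x|: 2^(floor(log2|x|) - 10),
// since fp16 has 10 explicit mantissa bits.
double Fp16Ulp(double x) {
  return std::ldexp(1.0, static_cast<int>(std::floor(std::log2(std::fabs(x)))) - 10);
}

int main() {
  std::printf("%g\n", Fp16Ulp(20.0));  // 0.015625 -> 2e-2 is ~1.28 ulp
}
```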
Addressing feedback in stacked commits. Just pushed (…). Still on my list for follow-up commits: …

One clarification needed: your review mentioned "Split out the unrelated NuGet/pipeline changes into their own PR" — but this PR only touches 5 files, all CoreML-EP-related (…).
When `alpha` is within 1e-6 of 1.0 (e.g. CLIP's `x * sigmoid(x)`), skip the leading `mul(x, alpha)` in `QuickGeluOpBuilder::AddToModelBuilderImpl` and feed `x` straight into `sigmoid`. Saves one MIL op per QuickGelu node and avoids the rounding it would introduce. Mirrors the same optimization in the QNN builder (`qnn/builder/opbuilder/quick_gelu_op_builder.cc:42-49`).

Adds `CoreMLExecutionProviderTest.QuickGeluTestAlphaOne` covering the `alpha=1.0` branch with `ExpectedEPNodeAssignment::All`. Verified via negative test: temporarily forcing `skip_alpha_mul` for all alphas causes the alpha=1.5 tests (fp32 + fp16) to fail with a tolerance mismatch while alpha=1.0 still passes, proving both branches are exercised.

Addresses optional review feedback on microsoft#28184.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
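The branch amounts to choosing sigmoid's input name at build time. A compilable sketch of its shape (`skip_alpha_mul` is the commit's own name; everything else here is illustrative, not the PR's actual builder code):

```cpp
#include <cmath>
#include <string>

// Illustrative only: when alpha is (numerically) 1, the leading mul(x, alpha)
// is a no-op, so sigmoid can read the original input directly.
std::string PickSigmoidInput(float alpha, const std::string& input_name,
                             const std::string& scaled_output_name) {
  const bool skip_alpha_mul = std::fabs(alpha - 1.0f) < 1e-6f;
  return skip_alpha_mul ? input_name           // feed x straight into sigmoid
                        : scaled_output_name;  // output of mul(x, alpha)
}
```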
Just pushed (…). When `alpha` is within 1e-6 of 1.0, the leading `mul(x, alpha)` is now skipped. Added `CoreMLExecutionProviderTest.QuickGeluTestAlphaOne` covering the `alpha=1.0` branch. Remaining items from Copilot's review (shape availability check in `IsOpSupportedImpl`, …).
…LProgram
Two defensive checks in `QuickGeluOpBuilder`:
1. `IsOpSupportedImpl` now calls `GetShape(...)` on input 0 and returns
false (with VERBOSE log) if shape info is unavailable, matching the
hard requirement in `AddToModelBuilderImpl`. Previously the EP could
claim a QuickGelu node and then fail at model-build time if shape
inference was incomplete upstream. Matches the pattern used in e.g.
`conv_op_builder.cc` and `batch_norm_op_builder.cc`.
2. `AddToModelBuilderImpl` replaces the `if (CreateMLProgram()) { ... }`
guard with an `ORT_RETURN_IF_NOT` at the top. The old form silently
returned `Status::OK()` without emitting any op if called in
NeuralNetwork mode — an invalid CoreML model. `IsOpSupportedImpl`
gates this, but defense-in-depth is cheap here.
Addresses Copilot's two inline review comments on microsoft#28184.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
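For check 1, the established pattern in the CoreML builders looks roughly like the fragment below (`GetShape` and the `LOGS` macro are the existing ORT helpers the commit refers to; the body's exact wording is our approximation, not the committed code):

```cpp
// Inside QuickGeluOpBuilder::IsOpSupportedImpl — reject the node early
// if shape inference did not produce a shape for input 0, mirroring
// conv_op_builder.cc / batch_norm_op_builder.cc.
std::vector<int64_t> input_shape;
if (!GetShape(*node.InputDefs()[0], input_shape, logger)) {
  LOGS(logger, VERBOSE) << "QuickGelu: input shape is not known";
  return false;
}
```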
I am seeing lint failures. Please ensure that you install lintrunner, then …
This turns out to be a leftover from my local merge.
LGTM. Need to resolve conflicts.
Conflicts resolved via merge commit (…). Also re-ran lintrunner locally with the pinned versions from …
Thanks for the approval! Quick note on the …
Adds support for `com.microsoft:QuickGelu` (`x * Sigmoid(alpha * x)`) to the CoreML Execution Provider's MLProgram path.

QuickGelu is produced by ORT's own `QuickGeluFusion` optimizer pass (`ORT_ENABLE_EXTENDED` and above, which includes the default `ORT_ENABLE_ALL`), so any model with the `x * sigmoid(alpha * x)` pattern in it ends up with an op CoreML rejects — turning 3 supported primitives into 1 unsupported op and making the fusion a net negative for CoreML.

The builder decomposes QuickGelu into three MIL ops (`mul` / `sigmoid` / `mul`), matching the op's own schema function-body in `contrib_defs.cc` and the approach the QNN EP already uses in `qnn/builder/opbuilder/quick_gelu_op_builder.cc`. Only the MLProgram path is implemented; NeuralNetwork is deprecated on Apple Silicon.

Adds `CoreMLExecutionProviderTest.QuickGeluTest` which builds a single `com.microsoft:QuickGelu` node with non-default alpha=1.5 and verifies the entire graph is claimed by the CoreML EP via `ExpectedEPNodeAssignment::All`. Verified via negative test: temporarily removing the `CreateQuickGeluOpBuilder` registration causes the new test to fail with a `VerifyEPNodeAssignment` fatal failure, proving it genuinely exercises the CoreML path.

Also updates `coreml_supported_mlprogram_ops.md`.

Fixes microsoft#28183.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 2a85e2c to 3df4235.
Description
Adds support for `com.microsoft:QuickGelu` (`x * Sigmoid(alpha * x)`) to the CoreML Execution Provider's MLProgram path. The builder decomposes QuickGelu into three MIL ops (`mul` / `sigmoid` / `mul`), matching the op's own schema function-body in `contrib_defs.cc:605-631` and the approach the QNN EP already uses in `qnn/builder/opbuilder/quick_gelu_op_builder.cc`. Only the MLProgram path is implemented; NeuralNetwork is deprecated on Apple Silicon.
Adds `CoreMLExecutionProviderTest.QuickGeluTest`, which builds a single `com.microsoft:QuickGelu` node with non-default `alpha=1.5` and verifies the entire graph is claimed by the CoreML EP via `ExpectedEPNodeAssignment::All`. Verified with a negative test: temporarily removing the `CreateQuickGeluOpBuilder` registration causes the new test to fail with a `VerifyEPNodeAssignment` fatal failure, proving it genuinely exercises the CoreML path. Also updates `coreml_supported_mlprogram_ops.md`.
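As a numeric sanity check on what the op computes at alpha=1.5, here is a hedged sketch using ORT's generic `OpTester`. The PR's actual test lives in `coreml_basic_test.cc` and verifies EP node assignment rather than using `OpTester`; the test name and inputs below are illustrative:

```cpp
#include "test/providers/provider_test_utils.h"

namespace onnxruntime {
namespace test {

// Illustrative numerics-only check; expected values follow
// y = x * sigmoid(1.5 * x).
TEST(QuickGeluNumerics, Alpha15) {
  OpTester test("QuickGelu", 1, kMSDomain);
  test.AddAttribute<float>("alpha", 1.5f);
  test.AddInput<float>("X", {4}, {-1.0f, 0.0f, 1.0f, 2.0f});
  test.AddOutput<float>("Y", {4}, {-0.182426f, 0.0f, 0.817574f, 1.905148f});
  test.Run();
}

}  // namespace test
}  // namespace onnxruntime
```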
Motivation and Context
Fixes #28183.
QuickGelu is produced by ORT's own `QuickGeluFusion` optimizer pass (`onnxruntime/core/optimizer/quick_gelu_fusion.cc`), which runs at `ORT_ENABLE_EXTENDED` — and therefore also at `ORT_ENABLE_ALL`, the default session optimization level. So any model that contains the `x * sigmoid(alpha * x)` pattern (CLIP, several mobile transformers, the DWPose pose estimator) gets silently mutated by ORT into a graph with `QuickGelu` nodes that the CoreML EP then rejects — turning 3 supported primitives into 1 unsupported op, making the fusion strictly harmful for CoreML.
On the DWPose `dw-ll_ucoco_384.onnx` model with batch=1 and `ORT_ENABLE_EXTENDED`, 76 `QuickGelu` nodes get produced. Running the result on the CoreML EP: … The remaining breaks are other ops — see "Related gaps" below. This patch alone yields a ~4× speedup at EXTENDED level.
Even at the default `ORT_ENABLE_ALL` with a symbolic batch dim (where partial shape inference inhibits most fusions), 3 `QuickGelu` nodes still get produced — so this patch helps any CoreML user who hasn't explicitly downgraded to `ORT_ENABLE_BASIC`.

Related CoreML EP gaps observed (out of scope for this PR)
With QuickGelu fixed, the remaining 9 CPU-fallback nodes on the EXTENDED-optimized DWPose pose model are:
- `com.microsoft:FusedConv` (×4) — produced by `ConvActivationFusion`. Fuses `Conv` + activation into one node. Same failure mode as QuickGelu: `Conv` and the activations (`Relu`, `Sigmoid`, `HardSigmoid`, etc.) are individually CoreML-supported, but the fused form isn't. Decomposition is straightforward — emit the underlying `conv` MIL op, then the corresponding activation.
- `com.microsoft:FusedMatMul` (×2, from `MatMulScaleFusion`) — `MatMul * alpha` with an optional transpose. Decomposition: `matmul` + scalar `mul` (see the sketch after this list).
- `ai.onnx:Split` (×2) — pre-existing CoreML EP gap unrelated to fusion. CoreML MIL has a native `split` op; this one is a straight op-builder omission.

Happy to send follow-up PRs for any of these after this one lands, following the same pattern. Flagging here so they're on the EP coverage roadmap.
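To make the FusedMatMul item concrete: its semantics reduce to an alpha-scaled MatMul, so the proposed `matmul` + `mul` lowering is just a scale applied after the product. A small scalar reference (plain C++ written for this comment, ignoring the optional transposes):

```cpp
#include <cstdio>
#include <vector>

// Scalar reference for com.microsoft:FusedMatMul without transposes:
// Y = alpha * (A @ B), i.e. a matmul followed by a scalar mul.
std::vector<float> FusedMatMulRef(const std::vector<float>& a,  // m x k
                                  const std::vector<float>& b,  // k x n
                                  int m, int k, int n, float alpha) {
  std::vector<float> y(static_cast<size_t>(m) * n, 0.0f);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      y[i * n + j] = alpha * acc;  // the trailing scalar mul
    }
  return y;
}

int main() {
  // A = [[1,2],[3,4]], B = identity, alpha = 0.5 -> Y = 0.5 * A.
  auto y = FusedMatMulRef({1, 2, 3, 4}, {1, 0, 0, 1}, 2, 2, 2, 0.5f);
  std::printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);  // 0.5 1 1.5 2
}
```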