Update CoreML docs #13120
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13120
Note: Links to docs will display an error until the docs builds have been completed.
❌ 8 New Failures, 2 Unrelated Failures
As of commit ffc7040 with merge base ec35f56:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
docs/source/backends-coreml.md
Outdated
)
```

Both of the above examples will export and lower to CoreML with the to_edge_transform_and_lower API.
How does the codebook one actually lower to CoreML? I tried looking up choose_qparams_and_quantize_codebook in ET and coremltools but didn't find anything.
From a user's perspective, it should just lower: after quantize_, you can run torch.export.export, and then to_edge_transform_and_lower.
In terms of how it works, I added the ability to register custom MIL ops in ET CoreML, and I used that to register the dequantize_codebook quant primitive that is produced by CodebookWeightOnlyConfig.
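For anyone following along, here is a minimal sketch of that user-facing flow. The toy model, the CodebookWeightOnlyConfig import path, and its arguments are illustrative assumptions (they may differ across torchao versions); the ExecuTorch pieces follow the pattern in the docs being updated here.

```python
import torch
import coremltools as ct
from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner
from executorch.exir import to_edge_transform_and_lower
from torchao.quantization import quantize_
# Assumed import path; check your torchao version for the exact location.
from torchao.prototype.quantization.codebook import CodebookWeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 64)).eval()
example_inputs = (torch.randn(1, 64),)

# 1) Codebook (palettization) weight-only quantization; arguments are illustrative.
quantize_(model, CodebookWeightOnlyConfig(dtype=torch.uint4, block_size=[-1, 16]))

# 2) Standard export + lowering; the dequantize_codebook primitive is handled by
#    the custom MIL op registered in the ET CoreML backend, so no extra user steps.
ep = torch.export.export(model, example_inputs)
compile_specs = CoreMLBackend.generate_compile_specs(
    minimum_deployment_target=ct.target.iOS18,  # codebook quantization needs iOS18+
)
lowered = to_edge_transform_and_lower(
    ep, partitioner=[CoreMLPartitioner(compile_specs=compile_specs)]
)
et_program = lowered.to_executorch()
```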
See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.

### LLM quantization with quantize_
@metascroy Is there a minimum_deployment_target required/published for torchao quantization and PT2E quantization? I remember you mentioned it is None by default, but how do we enforce it if one is using a quantization recipe?
CoreML should select the required minimum_deployment_target automatically. For PT2E, it should select iOS17.
But for quantize_, I noticed it is only working for iOS18 right now (need to investigate further): #13122
In terms of how we enforce it: it should work automatically for PT2E, but let me know if it doesn't. For quantize_, I'll try to make it work automatically, but as an intermediate stop-gap, we can explicitly set it to iOS18 if quantize_ is used in the recipe.
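As a stop-gap along those lines, pinning the target explicitly in the compile specs would look roughly like this (a sketch only; the wiring follows the docs in this PR):

```python
import coremltools as ct
from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Force iOS18 when quantize_ is part of the recipe, until automatic selection
# is fixed for that path (see #13122).
compile_specs = CoreMLBackend.generate_compile_specs(
    minimum_deployment_target=ct.target.iOS18,
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)
```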
docs/source/backends-coreml.md
Outdated
- `coremltools.ComputeUnit.CPU_AND_NE` (uses both the CPU and ANE, but not the GPU)
- `minimum_deployment_target`: The minimum iOS deployment target (e.g., `coremltools.target.iOS18`). The default value is `coremltools.target.iOS15`.
- `minimum_deployment_target`: The minimum iOS deployment target (e.g., `coremltools.target.iOS18`). By default, the smallest deployment target needed to deploy the model is selected. During export, you will see a warning about the "CoreML specification version" that was used for the model, which maps onto a deployment target as discussed [here](https://apple.github.io/coremltools/mlmodel/Format/Model.html#model). If you need to control the deployment target, please specify it explicitly.
- `compute_precision`: The compute precision used by CoreML (`coremltools.precision.FLOAT16` or `coremltools.precision.FLOAT32`). The default value is `coremltools.precision.FLOAT16`. Note that the compute precision is applied no matter what dtype is specified in the exported PyTorch model. For example, an FP32 PyTorch model will be converted to FP16 when delegating to the CoreML backend by default. Also note that the ANE only supports FP16 precision.
Noob question: it seems we publish the default as FLOAT16 in the generate_compile_specs function. What happens when a quantizer is used? Does the backend ignore this, or is it up to the user to make sure there is no compute_precision in the compile specs?
Even for a quantized model, there is a compute precision. Compute precision controls the precision of the non-quantized ops in the model.
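In other words, even a quantized model carries a compute_precision for its non-quantized ops; a user who wants those kept in FP32 sets it explicitly. A rough sketch:

```python
import coremltools as ct
from executorch.backends.apple.coreml.compiler import CoreMLBackend

# The default produced by generate_compile_specs is FLOAT16; this keeps the
# non-quantized ops in FP32 instead (quantized weights are unaffected).
compile_specs = CoreMLBackend.generate_compile_specs(
    compute_precision=ct.precision.FLOAT32,
)
```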
@YifanShenSZ can I get a review on these doc updates for the CoreML backend?
docs/source/backends-coreml.md
Outdated
# CoreML Backend

Core ML delegate is the ExecuTorch solution to take advantage of Apple's [CoreML framework](https://developer.apple.com/documentation/coreml) for on-device ML. With CoreML, a model can run on CPU, GPU, and the Apple Neural Engine (ANE).
CoreML delegate is the ExecuTorch solution to take advantage of Apple's [CoreML framework](https://developer.apple.com/documentation/coreml) for on-device ML. With CoreML, a model can run on CPU, GPU, and the Apple Neural Engine (ANE).
Official name is "Core ML" not "CoreML" https://developer.apple.com/documentation/coreml
# When using an enumerated shape compile spec, you must specify lower_full_graph=True
# in the CoreMLPartitioner. We do not support using enumerated shapes
# for partially exported models
Is there any error or warning thrown, or does it just silently fail to lower?
It throws an error asking users to set lower_full_graph=True
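So the failure mode is explicit. A sketch of the required pairing follows; the construction of the enumerated-shape compile spec itself is omitted because it isn't shown in this thread.

```python
import coremltools as ct
from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

compile_specs = CoreMLBackend.generate_compile_specs(
    minimum_deployment_target=ct.target.iOS18,
)
# ... append the enumerated-shape compile spec to compile_specs here ...

# lower_full_graph=True is required with enumerated shapes; leaving it out
# raises an error rather than silently falling back to partial lowering.
partitioner = CoreMLPartitioner(
    compile_specs=compile_specs,
    lower_full_graph=True,
)
```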
docs/source/backends-coreml.md
Outdated
* Quantize embedding/linear layers with IntxWeightOnlyConfig (with weight_dtype torch.int4 or torch.int8, using PerGroup or PerAxis granularity). Using 4-bit or PerGroup quantization requires exporting with minimum_deployment_target >= ct.target.iOS18. Using 8-bit quantization with per-axis granularity is supported on ct.target.iOS16+. See [CoreML `CompileSpec`](#coreml-compilespec) for more information on setting the deployment target.
* Quantize embedding/linear layers with CodebookWeightOnlyConfig (with dtype torch.uint1 through torch.uint8, using various block sizes). Quantizing with CodebookWeightOnlyConfig requires exporting with minimum_deployment_target >= ct.target.iOS18; see [CoreML `CompileSpec`](#coreml-compilespec) for more information on setting the deployment target.
@abhinaykukkadapu are these part of some coreml recipe?
> @abhinaykukkadapu are these part of some coreml recipe?
@kimishpatel yes, these are already covered: https://github.com/pytorch/executorch/blob/main/backends/apple/coreml/recipes/coreml_recipe_types.py
This diff has been out for a while. Are you planning to land it?
Yes, I hope to land this week. @YifanShenSZ approved the changes last week, but I need a stamp from someone in PyTorch to land.
Force-pushed from 4f7ccc9 to ffc7040 (Compare)
@mergennachin @abhinaykukkadapu @kimishpatel @digantdesai can I get a stamp on this doc update?
The Core ML backend also supports quantizing models with the [torchao](https://github.com/pytorch/ao) quantize_ API. This is most commonly used for LLMs, requiring more advanced quantization. Since quantize_ is not backend aware, it is important to use a config that is compatible with Core ML:

* Quantize embedding/linear layers with IntxWeightOnlyConfig (with weight_dtype torch.int4 or torch.int8, using PerGroup or PerAxis granularity). Using 4-bit or PerGroup quantization requires exporting with minimum_deployment_target >= ct.target.iOS18. Using 8-bit quantization with per-axis granularity is supported on ct.target.iOS16+. See [Core ML `CompileSpec`](#coreml-compilespec) for more information on setting the deployment target.
@metascroy do you have to update minimum_deployment_target to iOS16 after this PR: #13896?
It’s already updated in the docs. They say iOS16 for 8-bit per channel, and iOS18 for everything else
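For reference, the pairing the docs describe looks roughly like this; the toy model is an assumption and the torchao import paths may vary by version:

```python
import torch
import coremltools as ct
from torchao.quantization import quantize_, IntxWeightOnlyConfig
from torchao.quantization.granularity import PerAxis, PerGroup

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()

# 8-bit weights with per-axis (per-channel) granularity: iOS16+ is enough.
quantize_(model, IntxWeightOnlyConfig(weight_dtype=torch.int8, granularity=PerAxis(0)))
minimum_deployment_target = ct.target.iOS16

# 4-bit weights and/or per-group granularity would instead require iOS18+:
# quantize_(model, IntxWeightOnlyConfig(weight_dtype=torch.int4, granularity=PerGroup(32)))
# minimum_deployment_target = ct.target.iOS18
```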
This updates CoreML docs to:
* Discuss the new partitioner options
* Discuss quantize_ support
* Discuss backward compatibility guarantees