Arm backend: Add 16A8W linear ops support and test #13754
Conversation
Pull Request resolved: #13641

This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch Arm backend, following the feedback from D79746479.

## Key Changes

**1. New Quantization Configuration Function**
- Add `get_16a8w_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py`
- Provides 16-bit activations with HistogramObserver (better precision than 8A8W)
- Keeps weights at 8 bits with MinMaxObserver/PerChannelMinMaxObserver (memory efficient)
- Supported by TOSA through the [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)

## Benefits

- **Better precision**: 16-bit activations provide higher precision than 8-bit ones, which is useful for carrying precision through recurrent neural networks.

ghstack-source-id: 305891620
@exported-using-ghexport

Differential Revision: [D79763381](https://our.internmc.facebook.com/intern/diff/D79763381/)
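For reference, a minimal sketch of what such a helper can look like, assuming the torch.ao PT2E quantizer primitives (`QuantizationSpec`/`QuantizationConfig`) that the Arm quantizer builds on; the observer choices mirror the bullets above, but the exact ranges and arguments in `arm_quantizer.py` may differ:

```python
# Hedged sketch, not the PR's code: a 16A8W config built from the
# torch.ao PT2E quantizer primitives the Arm quantizer is based on.
import torch
from torch.ao.quantization.observer import (
    HistogramObserver,
    MinMaxObserver,
    PerChannelMinMaxObserver,
)
from torch.ao.quantization.quantizer import QuantizationSpec
from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import (
    QuantizationConfig,
)


def get_16a8w_quantization_config(is_per_channel: bool = True) -> QuantizationConfig:
    # 16-bit activations: HistogramObserver estimates ranges more
    # accurately than MinMax, which matters for the wider int16 grid.
    act_spec = QuantizationSpec(
        dtype=torch.int16,
        quant_min=-32768,
        quant_max=32767,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
    )
    # Weights stay at 8 bits, so the memory footprint matches 8A8W.
    weight_observer = PerChannelMinMaxObserver if is_per_channel else MinMaxObserver
    weight_spec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-127,
        quant_max=127,
        qscheme=torch.per_channel_symmetric
        if is_per_channel
        else torch.per_tensor_symmetric,
        ch_axis=0 if is_per_channel else None,
        observer_or_fake_quant_ctr=weight_observer.with_args(eps=2**-12),
    )
    # Same spec for input and output activations; bias left unquantized.
    return QuantizationConfig(act_spec, act_spec, weight_spec, None)
```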
Pull Request resolved: #13658

- Adds a linear ops test using the 16A8W config in the INT16 profile.
- Adds support for the INT16 dtype in view ops validation.
- Validated with the TOSA pipeline test.
- Checked that tests previously marked flaky are no longer flaky and removed the markers.

Note: not verified with a TOSA reference model run.

ghstack-source-id: 305897251
Differential Revision: [D80308822](https://our.internmc.facebook.com/intern/diff/D80308822/)
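Roughly how the config plugs into a linear test, as a sketch only: this assumes the backend's `TOSAQuantizer` entry point and the standard PT2E prepare/calibrate/convert flow, whereas the actual test goes through the Arm TOSA test-pipeline helpers:

```python
# Hedged sketch, not the actual test: wiring the 16A8W config into a
# PT2E quantization flow for a single nn.Linear. TOSAQuantizer's real
# constructor may require a TOSA specification argument.
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,  # assumed quantizer entry point
    get_16a8w_quantization_config,
)


class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(32, 16)

    def forward(self, x):
        return self.fc(x)


model = LinearModel().eval()
example_inputs = (torch.randn(1, 32),)

# Capture the graph for PT2E quantization.
graph_module = torch.export.export(model, example_inputs).module()

quantizer = TOSAQuantizer()
quantizer.set_global(get_16a8w_quantization_config())

prepared = prepare_pt2e(graph_module, quantizer)
prepared(*example_inputs)  # calibration pass
quantized = convert_pt2e(prepared)
```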
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13754.

❌ 9 new failures, 3 unrelated failures as of commit ee58c9b with merge base 9053089 (the lists of failed jobs and of jobs flaky on trunk are omitted here).

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Ninja91 Nice change! The changes in op_transpose.py and op_view.py result in some test failures because we partition a few ops incorrectly. @per and @agrima1304 have patches that fix these failures, but their patches are blocked by the Vela pin update in #13282. If you move the changes in op_transpose.py and op_view.py to a separate PR, I believe we should be able to merge this one.
@Ninja91 Arm tests started failing after this PR, see this dashboard.
Reverting here: #13895
…13895) This reverts commit f8156fb.
@per @mergennachin @oscarandersson8218 The PR was reverted, and I am pushing it again here: #13899. Validated that no Arm tests are failing.
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #13899

- Adds a linear ops test using the 16A8W config in the INT16 profile.
- Adds support for the INT16 dtype in view ops validation (see the sketch after this description).
- Validated with the TOSA pipeline test.
- Checked that tests previously marked flaky are no longer flaky and removed the markers.

Note: not verified with a TOSA reference model run.

Differential Revision: [D81550511](https://our.internmc.facebook.com/intern/diff/D81550511/)

Reattempt to land #13754
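As context for the view-ops bullet, an illustrative sketch of what accepting INT16 in view validation could look like; the function name and dtype set are assumptions, and the real check lives in the backend's op_view visitor:

```python
# Illustrative only: the actual check lives in the Arm backend's view-op
# visitor (op_view.py); the dtype set and function name are assumptions.
import torch

_SUPPORTED_VIEW_DTYPES = {
    torch.int8,
    torch.int16,  # newly accepted for the INT16 (16A8W) profile
    torch.int32,
    torch.float32,
}


def validate_view_dtype(dtype: torch.dtype) -> None:
    if dtype not in _SUPPORTED_VIEW_DTYPES:
        raise ValueError(f"view: unsupported dtype {dtype}")
```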
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #13658 by @Ninja91
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/Ninja91/3/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/Ninja91/3/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/Ninja91/1/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/Ninja91/3/orig
@diff-train-skip-merge
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218