Cortex-M backend: Improve int8-portable operator support by AdrianLundell · Pull Request #17812 · pytorch/executorch

AdrianLundell · 2026-03-03T09:08:48Z

The int8 portable ops are an option for most ops that do not require rescaling, such as data-movement ops, max/min ops, and logic ops. Even though they are not accelerated, running them in int8 is more efficient than running them in fp32 with surrounding dequantize/quantize ops.
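To illustrate why such ops need no rescale, here is a minimal sketch in plain Python (not ExecuTorch code). It assumes both inputs and the output share one scale and zero point, which is the situation a shared qspec sets up; the helper names and the values are hypothetical. Because affine quantization with a positive scale is monotonic, taking the elementwise maximum directly on the int8 values gives the same result as dequantizing, taking the maximum in floating point, and quantizing again.

```python
def quantize(x, scale, zero_point):
    """Affine-quantize a float to the int8 range [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to a float."""
    return (q - zero_point) * scale


# Hypothetical shared quantization parameters for both inputs and the output.
scale, zero_point = 0.05, -3

a = [0.4, -1.2, 2.5, 0.0]
b = [0.1, 0.7, 2.45, -0.3]
qa = [quantize(v, scale, zero_point) for v in a]
qb = [quantize(v, scale, zero_point) for v in b]

# Path 1: run the op directly on the int8 values.
int8_result = [max(x, y) for x, y in zip(qa, qb)]

# Path 2: dequantize, run the op in floating point, quantize again.
roundtrip = [
    quantize(
        max(dequantize(x, scale, zero_point), dequantize(y, scale, zero_point)),
        scale,
        zero_point,
    )
    for x, y in zip(qa, qb)
]

assert int8_result == roundtrip
```

The second path does strictly more work per element for the same result, which is the efficiency argument made above.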

This patch adds a large number of such ops to the set quantized by the SharedQspecQuantizer, together with tests. It also modifies the quantizer to broaden support:

- If multiple qspecs are found, use the top one rather than falling back to fp32, since this is most likely what users want.
- Reject nodes with non-float inputs/outputs, which would previously crash.
- Let the BFS algorithm search through non-float edges to support indexing ops.
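The three quantizer changes above can be pictured with a toy model. This is only a sketch of one possible reading, not the actual SharedQspecQuantizer: the graph, the op names, the qspec labels, and the `find_cluster` helper are all hypothetical, and "top one" is assumed here to mean the first qspec the search encounters.

```python
from collections import deque

# Toy graph: node name -> (op, dtype, neighbouring node names). Hypothetical.
GRAPH = {
    "conv": ("conv2d", "float", ["cat"]),
    "idx": ("arange", "int", ["gather"]),
    "cat": ("cat", "float", ["conv", "gather"]),
    "gather": ("index_select", "float", ["cat", "idx", "relu"]),
    "relu": ("relu", "float", ["gather"]),
}
# Ops assumed to share one qspec between inputs and outputs.
SHARED_OPS = {"cat", "index_select"}
# Qspecs assumed to be assigned already by other annotation steps.
ANNOTATED = {"conv": "int8_A", "relu": "int8_B"}


def find_cluster(start):
    """Breadth-first search for connected shared-qspec ops and adjacent qspecs."""
    cluster, qspecs, seen = [], [], {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        op, dtype, neighbours = GRAPH[name]
        if name in ANNOTATED:
            # Boundary of the cluster: remember its qspec, do not expand.
            if ANNOTATED[name] not in qspecs:
                qspecs.append(ANNOTATED[name])
            continue
        if dtype != "float":
            # Non-float node (e.g. an index tensor): skipped, never annotated,
            # and it does not abort the search.
            continue
        if op not in SHARED_OPS:
            continue
        cluster.append(name)
        for neighbour in neighbours:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return cluster, qspecs


cluster, qspecs = find_cluster("cat")
# Several qspecs were found: pick the first instead of falling back to fp32.
chosen = qspecs[0] if qspecs else None
```

In this toy graph the search starting at `cat` reaches `gather` despite its integer index input, sees two different qspecs at the cluster boundary, and settles on the first one.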

cc @digantdesai @SS-JIA @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Change-Id: I7a5964d5924496480e965724b4f130f56a43f538

pytorch-bot · 2026-03-03T09:08:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17812

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Awaiting Approval, 3 New Failures

As of commit 3771a54 with merge base dae7a02:

AWAITING APPROVAL - The following workflow needs approval before CI can run:

periodic (gh)

NEW FAILURES - The following jobs have failed:

pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t f3135b9c72550e156eea4aed1d917a23bef85d5da65f1a03a8eb34b6ad4b7343 /exec failed with exit code 1
pull / test-samsung-quantmodels-linux / linux-job (gh)
RuntimeError: Command docker exec -t 81d9b0d1993013361fcc6513b10f8eee0a72e6cb0e3f684497ca5256ac6ef897 /exec failed with exit code 1
trunk / test-arm-backend-vkml (test_pytest_ops_vkml) / linux-job (gh)
RuntimeError: Command docker exec -t 93bb80811a5c4ed585393f615df07eafab00ec516aea70ab4f8022dc43a8f796 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

AdrianLundell · 2026-03-04T08:15:43Z

The failures are unrelated to this change.

AdrianLundell requested a review from psiddh March 3, 2026 09:08

AdrianLundell requested a review from rascani as a code owner March 3, 2026 09:08

AdrianLundell added the labels partner: arm, ciflow/trunk, and release notes: none on Mar 3, 2026

meta-cla Bot added the CLA Signed label on Mar 3, 2026

rascani approved these changes Mar 3, 2026

AdrianLundell merged commit 4d39ae5 into pytorch:main Mar 4, 2026
315 of 321 checks passed

AdrianLundell merged 1 commit into pytorch:main from AdrianLundell:change-1207883

2 participants
