Cortex-M backend: Improve int8-portable operator support #17812

Merged
AdrianLundell merged 1 commit into pytorch:main from AdrianLundell:change-1207883
Mar 4, 2026

@AdrianLundell
Collaborator

@AdrianLundell AdrianLundell commented Mar 3, 2026

Using the int8 portable ops is an option for most ops that do not require rescales, such as data-movement ops, max/min ops and logic ops. Even though these ops are not accelerated, quantizing them to int8 is more efficient than running them in fp32 with dequantize/quantize steps around them.
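To illustrate why an op such as max needs no rescale, here is a minimal sketch in plain Python. It assumes both inputs and the output share one scale and zero-point, which is the situation a shared qspec sets up. The `quantize`/`dequantize` helpers and the parameter values are hypothetical and are not taken from the ExecuTorch code base.

```python
def quantize(x, scale, zero_point):
    """Affine-quantize a float to int8 (hypothetical helper)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to float (hypothetical helper)."""
    return (q - zero_point) * scale


# Assumed shared quantization parameters for both inputs and the output.
scale, zero_point = 0.25, -3

a_q = [-128, -7, 0, 42, 127]
b_q = [5, -90, 0, 41, -128]

# int8 path: apply max directly to the quantized values.
int8_result = [max(a, b) for a, b in zip(a_q, b_q)]

# fp32 path: dequantize, take max in float, then quantize again.
fp32_result = [
    quantize(
        max(dequantize(a, scale, zero_point), dequantize(b, scale, zero_point)),
        scale,
        zero_point,
    )
    for a, b in zip(a_q, b_q)
]

# With a positive scale the affine map is monotonic, so both paths agree.
assert int8_result == fp32_result
```

The same argument covers pure data-movement ops, which only copy or reorder values and so never change the quantization parameters.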

This patch adds a large number of such ops to the set quantized by the SharedQspecQuantizer, together with tests. It also modifies the quantizer to broaden its support:

  • If multiple qspecs are found, use the top one rather than falling back to fp32, since this is what users most likely want.
  • Reject nodes with non-float inputs/outputs, which would previously crash.
  • Let the BFS algorithm search through non-float edges to support indexing ops.
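The three changes above can be sketched on a toy graph. This is only an illustrative model, not the SharedQspecQuantizer implementation: `ToyNode`, `find_cluster`, `annotate` and the qspec names are hypothetical, and "top one" is modeled here as the first qspec discovered by the search, which is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyNode:
    name: str
    is_float: bool = True        # False for e.g. an integer index tensor
    qspec: Optional[str] = None  # set when another quantizer already annotated it


def find_cluster(graph: Dict[str, List[str]], nodes: Dict[str, ToyNode],
                 start: str) -> Tuple[List[str], List[str]]:
    """BFS from `start`; returns (float nodes to annotate, qspecs in discovery order)."""
    seen = {start}
    queue = deque([start])
    cluster: List[str] = []
    qspecs: List[str] = []
    while queue:
        node = nodes[queue.popleft()]
        if node.qspec is not None:
            # Already-annotated neighbour: record its qspec, do not search past it.
            if node.qspec not in qspecs:
                qspecs.append(node.qspec)
            continue
        if node.is_float:
            cluster.append(node.name)
        # Non-float nodes are not annotated, but the search continues through them.
        for neighbour in graph[node.name]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return cluster, qspecs


def annotate(graph: Dict[str, List[str]], nodes: Dict[str, ToyNode],
             start: str) -> List[str]:
    """Annotate the cluster around `start`; returns the names that were annotated."""
    if not nodes[start].is_float:
        return []  # reject non-float nodes up front instead of crashing
    cluster, qspecs = find_cluster(graph, nodes, start)
    if not qspecs:
        return []  # no qspec to share, so the cluster stays in fp32
    chosen = qspecs[0]  # several qspecs found: take the first, no fp32 fallback
    for name in cluster:
        nodes[name].qspec = chosen
    return cluster


def build_example() -> Tuple[Dict[str, List[str]], Dict[str, ToyNode]]:
    """conv(int8_A) - permute - indices(non-float) - gather - linear(int8_B)."""
    nodes = {
        "conv": ToyNode("conv", qspec="int8_A"),
        "permute": ToyNode("permute"),
        "indices": ToyNode("indices", is_float=False),
        "gather": ToyNode("gather"),
        "linear": ToyNode("linear", qspec="int8_B"),
    }
    graph = {
        "conv": ["permute"],
        "permute": ["conv", "indices"],
        "indices": ["permute", "gather"],
        "gather": ["indices", "linear"],
        "linear": ["gather"],
    }
    return graph, nodes


graph, nodes = build_example()
annotated = annotate(graph, nodes, "permute")
```

In this toy graph the search started at `permute` reaches `gather` only through the non-float `indices` node, finds both `int8_A` and `int8_B`, and annotates `permute` and `gather` with the first one, leaving `indices` untouched.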

cc @digantdesai @SS-JIA @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Change-Id: I7a5964d5924496480e965724b4f130f56a43f538
@AdrianLundell AdrianLundell requested a review from psiddh March 3, 2026 09:08
@AdrianLundell AdrianLundell requested a review from rascani as a code owner March 3, 2026 09:08
@AdrianLundell AdrianLundell added the labels partner: arm (for backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm), ciflow/trunk, and release notes: none (do not include this in the release notes) Mar 3, 2026
@pytorch-bot

pytorch-bot Bot commented Mar 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17812

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Awaiting Approval, 3 New Failures

As of commit 3771a54 with merge base dae7a02 (image):

AWAITING APPROVAL - The following workflow needs approval before CI can run:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 3, 2026
@AdrianLundell
Collaborator Author

The failures are unrelated to this change.

@AdrianLundell AdrianLundell merged commit 4d39ae5 into pytorch:main Mar 4, 2026
315 of 321 checks passed
jpiat pushed a commit to jpiat/executorch that referenced this pull request Mar 17, 2026