Arm backend: Fix int8 TABLE domain for sigmoid LUTs #18973

Merged
xingguo01 merged 2 commits into pytorch:main from
xingguo01:arm-backend-fix-luts-softmax
Apr 27, 2026

Conversation

@xingguo01
Collaborator

@xingguo01 xingguo01 commented Apr 17, 2026

  • Build 8-bit TOSA TABLE inputs from the canonical int8 code range [-128, 127] instead of using integer linspace.

  • This avoids the duplicated zero and off-by-one LUT shift seen when qmin=-127 and keeps quantized sigmoid TABLE values aligned with the PT2E q/dq eager reference.

  • Add pass-level regression tests for the full int8 domain and the reported qmin=-127 sigmoid quantization case.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell
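The duplicated-zero and off-by-one behavior described above can be illustrated with a small torch-free sketch (pure Python mimicking a float linspace followed by a truncating int8 cast; this is an illustration, not the backend's actual code):

```python
# Sketch of why an integer linspace over [qmin, qmax] breaks the LUT domain
# when qmin = -127: truncation toward zero duplicates the 0 code and the
# -128 code is never produced.

def linspace_int8_codes(qmin=-127, qmax=127, steps=256):
    # float linspace, then a truncating cast (as a float -> int8 cast does)
    return [int(qmin + (i * (qmax - qmin)) / (steps - 1)) for i in range(steps)]

def canonical_int8_codes():
    # full int8 code range, 256 distinct values: -128, -127, ..., 127
    return list(range(-128, 128))

lin = linspace_int8_codes()
can = canonical_int8_codes()

assert len(lin) == 256 and len(set(lin)) == 255  # one duplicated code
assert lin.count(0) == 2                         # zero appears twice
assert -128 not in lin                           # -128 is never produced
assert len(set(can)) == 256 and can[0] == -128 and can[-1] == 127
```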

Copilot AI review requested due to automatic review settings April 17, 2026 12:43
@xingguo01 xingguo01 requested a review from digantdesai as a code owner April 17, 2026 12:43
@pytorch-bot

pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18973

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 3 Unrelated Failures

As of commit b84fb93 with merge base e638059:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2026
@xingguo01 xingguo01 added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: arm Changes to the ARM backend delegate labels Apr 17, 2026
Contributor

Copilot AI left a comment


Pull request overview

Fixes the int8 TOSA TABLE domain generation for sigmoid LUTs by constructing the LUT input domain from the canonical int8 code range [-128, 127] (instead of using an int8 linspace), and adds regression tests to prevent off-by-one / duplicated-zero LUT issues (notably for qmin=-127).

Changes:

  • Add a canonical int8 TABLE domain helper (_get_8bit_table_domain) and use it for 8-bit LUT generation.
  • Replace torch.linspace(..., steps=256, dtype=torch.int8) with an explicit full int8 domain torch.arange(-128, 128, dtype=torch.int8).
  • Add pass-level regression tests covering the full int8 domain and the reported qmin=-127 sigmoid quantization case.
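For illustration, here is a torch-free sketch of LUT generation over the canonical domain (the quantization parameters below are made up, and the real pass builds the table with torch ops; this only models the dequantize, evaluate, requantize flow over `torch.arange(-128, 128)`):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dequantize(q, scale, zp):
    return (q - zp) * scale

def quantize(x, scale, zp, qmin, qmax):
    return max(qmin, min(qmax, round(x / scale) + zp))

def build_sigmoid_lut(in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    # Canonical int8 domain, mirroring torch.arange(-128, 128, dtype=torch.int8)
    return [
        quantize(sigmoid(dequantize(c, in_scale, in_zp)), out_scale, out_zp, qmin, qmax)
        for c in range(-128, 128)
    ]

# Illustrative (made-up) quantization parameters
lut = build_sigmoid_lut(in_scale=0.05, in_zp=0, out_scale=1 / 255, out_zp=-128)
assert len(lut) == 256
assert all(-128 <= v <= 127 for v in lut)
assert lut == sorted(lut)  # sigmoid is monotonic, so the LUT is non-decreasing
```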

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
backends/arm/_passes/insert_table_ops.py Switches 8-bit TABLE input domain generation to the canonical full int8 range and uses it when generating LUT values.
backends/arm/test/passes/test_insert_table_ops_pass.py Adds regression tests validating the full int8 TABLE domain and sigmoid LUT correctness for qmin=-127.


@zingo
Collaborator

zingo commented Apr 17, 2026

I spotted the problem below in the tests; I'll try a rerun to see if it's just a random, unrelated error.

FAILED backends/arm/test/models/test_conformer.py::test_conformer_tosa_INT - AssertionError: Output 0 does not match reference output.
Given atol: 0.42308734320104124, rtol: 0.4.
Output tensor shape: torch.Size([10, 95, 16]), dtype: torch.float32
Difference: max: 0.507921576499939, abs: 0.7387949824333191, mean abs error: 0.019569561319964887.
-- Model vs. Reference --
Numel: 15200, 15200
Median: 0.0, 0.0
Mean: 0.0007245174780684083, 0.000347828720573728
Max: 2.8166558742523193, 2.8166558742523193
Min: -3.0706167221069336, -3.0706167221069336

@xingguo01
Collaborator Author

@zingo thanks for checking; I should have triggered the extended tests first. Anyway, it looks like the threshold is now too tight for conformer. This failure lines up directly with the change in insert_table_ops.py (line 142): the INT8 TABLE LUT is now built over the canonical [-128, 127] code domain instead of linspace(qmin, qmax, 256). That fixes the reported qmin=-127 sigmoid mapping bug, but test_conformer_tosa_INT in test_conformer.py (line 68) is a model-level accumulation test with already-loose tolerances (atol = rtol = 0.4 at test_conformer.py, line 39). The reported numbers look like drift, not a hard correctness break:

  • max diff 0.5079 vs. allowed 0.4231

  • mean abs error only 0.0196

  • output min/max unchanged

So the most likely situation is: the sigmoid fix is correct, but the conformer tolerance is now slightly too tight for the new LUT behavior after many repeated sigmoid/table applications.

Lowest-risk next step:

  • Keep the LUT fix.

  • Bump the conformer INT tolerance slightly in test_conformer.py (line 39), for example atol = 0.5 or 0.55, and rerun that test. I can submit a chained fix to update the threshold.
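Assuming an allclose-style criterion (|out - ref| <= atol + rtol * |ref|; the exact comparison code in the Arm test pipeline is not shown here), the effect of an atol bump can be sketched as:

```python
def within_tolerance(diff, ref, atol, rtol):
    # allclose-style check: |out - ref| <= atol + rtol * |ref|
    return diff <= atol + rtol * abs(ref)

# Reported worst-case difference from the failing run; the reference
# magnitude 0.3 at the worst element is hypothetical, for illustration only.
max_diff = 0.5079

assert not within_tolerance(max_diff, 0.0, atol=0.4231, rtol=0.4)  # bound is atol alone at ref=0
assert within_tolerance(max_diff, 0.3, atol=0.45, rtol=0.4)        # 0.45 + 0.4*0.3 = 0.57 bound
```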

@zingo
Collaborator

zingo commented Apr 17, 2026

Interesting, yeah, we seem to be very "close". I did a new re-run to get more numbers :) Let's look at it more next week at work. I hope you have a really good weekend. :)

- Build INT8 TOSA TABLE values from the canonical int8 code range
[-128, 127], but clamp the effective input codes to the quantized
[qmin, qmax] range before dequantizing and evaluating the op.

- This preserves the 256-entry TOSA TABLE layout while keeping folded
TABLE behavior aligned with PT2E q/dq eager semantics for reduced
ranges such as qmin=-127, where eager quantization never produces
the -128 code.

- Update the pass-level regression tests to cover the full int8 table
domain and the qmin=-127 sigmoid case under the clamped-input
behavior.

Signed-off-by: Xingguo Li <xingguo.li@arm.com>
Change-Id: I382cdae5aa89a27192d956834e32865033391b64
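A minimal sketch of the clamped-input TABLE behavior this commit describes (pure Python; reusing the input quantization parameters for the output is an illustrative simplification, not the backend's actual code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def build_clamped_sigmoid_lut(scale, zp, qmin, qmax):
    # 256-entry table over the full int8 code range, but each input code is
    # clamped into [qmin, qmax] before dequantize -> sigmoid -> requantize,
    # so codes that eager quantization never produces (e.g. -128 when
    # qmin = -127) reuse the nearest valid entry.
    lut = []
    for code in range(-128, 128):
        clamped = max(qmin, min(qmax, code))
        x = (clamped - zp) * scale
        q = round(sigmoid(x) / scale) + zp  # illustrative output qparams
        lut.append(max(qmin, min(qmax, q)))
    return lut

# Made-up quantization parameters with the reduced qmin = -127 range
lut = build_clamped_sigmoid_lut(scale=0.02, zp=0, qmin=-127, qmax=127)
assert len(lut) == 256           # the 256-entry TOSA TABLE layout is preserved
assert lut[0] == lut[1]          # code -128 clamps to -127, duplicating its entry
```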
- Increase the Conformer INT test absolute tolerance from 0.4 to
0.45.

- The clamped INT8 TABLE update keeps folded LUT behavior aligned
with PT2E q/dq eager semantics, but the Conformer INT path still
shows slightly higher accumulated model-level error with the new
TABLE behavior.

Change-Id: I09dcd4ea6593cbe37fcfb2bca3cea446a2115423
Signed-off-by: Xingguo Li <xingguo.li@arm.com>
@xingguo01 xingguo01 force-pushed the arm-backend-fix-luts-softmax branch from 3b65eb7 to b84fb93 on April 24, 2026 13:15
@github-actions github-actions Bot added the module: arm Issues related to arm backend label Apr 24, 2026
@xingguo01 xingguo01 merged commit 32a6cec into pytorch:main Apr 27, 2026
422 of 431 checks passed

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: arm Issues related to arm backend partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: arm Changes to the ARM backend delegate

Development

Successfully merging this pull request may close these issues.

Arm backend: Quantized Arm sigmoid TABLE generation uses an invalid int8 input code sequence when qmin=-127

4 participants