Add a16w8 per-op test for bmm (#19599)
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.exp` on Ethos-U55 and Ethos-U85.
## Context
The `exp` op is part of the softmax decomposition (`softmax(x) = exp(x) / sum(exp(x))`), which is used in the attention mechanism of EMG2Pose Conformer models. This op was identified as the root cause of the U85 SNR regression investigated in SEV T267939669 — without dedicated a16w8 per-op coverage, the numerics issue was only visible at the full-model level. Adding per-op tests allows us to catch int16 precision regressions at operator granularity before they propagate to end-to-end model accuracy.
## Changes
- Add `a16w8_exp_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors
- Add `test_exp_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_exp_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_exp.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532358
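For readers skimming the stack, a minimal sketch of the shape such a per-op test takes. The import paths, the `Exp` wrapper module, the tensor shapes, and the `aten_ops` parameter name are assumptions modeled on the existing Arm per-op test layout; only the pipeline class name and the a16w8 kwargs come from this diff:

```python
import torch

# Assumed import paths, following the existing Arm per-op test conventions.
from executorch.backends.arm.test import common
from executorch.backends.arm.test.tester.test_pipeline import EthosU55PipelineINT


class Exp(torch.nn.Module):
    """Thin wrapper so the pipeline sees a single aten.exp op."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(x)


# Three configurations covering rank-1, rank-2, and rank-3 tensors,
# as described in the summary; the concrete shapes are illustrative.
a16w8_exp_test_parameters = {
    "rank1": lambda: (torch.randn(10),),
    "rank2": lambda: (torch.randn(5, 10),),
    "rank3": lambda: (torch.randn(2, 5, 10),),
}


@common.parametrize("test_data", a16w8_exp_test_parameters)
def test_exp_a16w8_u55_INT(test_data):
    pipeline = EthosU55PipelineINT(
        Exp(),
        test_data(),
        aten_ops="torch.ops.aten.exp.default",  # assumed parameter name
        a16w8_quantization=True,
        symmetric_io_quantization=True,
        qtol=128,
        epsilon=2**-16,
    )
    pipeline.run()
```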
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.reciprocal` on Ethos-U55 and Ethos-U85.
## Context
The `reciprocal` op is the second half of the softmax decomposition (`softmax(x) = exp(x) * reciprocal(sum(exp(x)))`), paired with `exp`. Together they form the attention mechanism in EMG2Pose Conformer models. Like `exp`, this op was implicated in the U85 SNR regression (SEV T267939669) — the division-by-reciprocal path can amplify quantization error when the denominator is itself quantized at int16. Adding dedicated a16w8 coverage isolates reciprocal numerics from the rest of the softmax pipeline.
## Changes
- Add `a16w8_reciprocal_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors (all shifted by +0.1 to avoid division near zero)
- Add `test_reciprocal_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_reciprocal_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_reciprocal.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532357
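The +0.1 shift mentioned above might look like this in the parameter dict; the shapes are illustrative, and the test otherwise follows the same pattern as the exp sketch above:

```python
import torch

# Inputs shifted by +0.1 so reciprocal never divides near zero;
# torch.rand yields [0, 1), so shifted values lie in [0.1, 1.1).
a16w8_reciprocal_test_parameters = {
    "rank1": lambda: (torch.rand(10) + 0.1,),
    "rank2": lambda: (torch.rand(5, 10) + 0.1,),
    "rank3": lambda: (torch.rand(2, 5, 10) + 0.1,),
}
```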
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.mean.dim` on Ethos-U55 and Ethos-U85.
## Context
The `mean_dim` op is a core component of the LayerNorm decomposition (`LayerNorm = (x - mean) / sqrt(var + eps) * gamma + beta`). It is used across multiple EMG production models including CC, CASCADE, HW, WAKE, and BTD. Despite this wide usage, no a16w8 per-op coverage existed — the int16 quantization path was only exercised indirectly through end-to-end model tests, making it difficult to isolate mean-specific numerics issues from other LayerNorm components.
## Changes
- Add `a16w8_mean_test_parameters` dict with 11 test configurations covering keepdim/no-keepdim, positive/negative dims, dim=None, and ranks 1-4
- Add `test_mean_dim_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_mean_dim_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_mean_dim.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532361
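A hedged sketch of a few entries from such a parameter dict; the 11 real configurations live in the diff, and the shapes, dim choices, and `(input, dim, keepdim)` tuple layout here are illustrative:

```python
import torch

# Illustrative subset of the keepdim / dim-sign / dim=None / rank coverage.
# Each entry: (example input, dim argument, keepdim flag).
a16w8_mean_test_parameters = {
    "rank1_dim0": lambda: (torch.randn(10), 0, False),
    "rank2_keepdim": lambda: (torch.randn(5, 10), 1, True),
    "rank3_negative_dim": lambda: (torch.randn(2, 5, 10), -1, False),
    "rank4_dim_none": lambda: (torch.randn(2, 3, 4, 5), None, False),
}
```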
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.var` on Ethos-U55 and Ethos-U85.
## Context
The `var` op is the second component of the LayerNorm decomposition, paired with `mean_dim`. Together they compute the normalization statistics used in every LayerNorm layer across EMG models including EMG2Pose. Variance computation is particularly sensitive to int16 quantization because it involves squaring differences — small quantization errors in the mean subtraction are amplified quadratically. Dedicated a16w8 coverage isolates variance numerics from the rest of the LayerNorm pipeline.
## Changes
- Add `a16w8_var_test_parameters` dict with 4 test configurations covering keepdim/no-keepdim and correction values 0, 0.5, and 1
- Add `test_var_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_var_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_var.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532362
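A sketch of how the correction values might be parameterized; the tuple layout and shapes are illustrative, while the correction values 0, 0.5, and 1 come from the summary above (`torch.var` accepts a fractional `correction`, with 1 being Bessel's correction):

```python
import torch

# Illustrative entries crossing keepdim with correction 0, 0.5, and 1.
# Each entry: (example input, dim, keepdim flag, correction).
a16w8_var_test_parameters = {
    "correction_0": lambda: (torch.randn(2, 5, 10), -1, True, 0),
    "correction_0_5": lambda: (torch.randn(2, 5, 10), -1, True, 0.5),
    "correction_1": lambda: (torch.randn(2, 5, 10), -1, True, 1),
    "correction_1_no_keepdim": lambda: (torch.randn(2, 5, 10), -1, False, 1),
}
```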
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.conv1d` on Ethos-U55 and Ethos-U85.
## Context
`conv1d` is the most critical op in the EMG stack — it is used by ALL 8 production EMG models (CC, CASCADE, HW, WAKE, BTD, AUTH, EMG2Pose, EMG2Touch) for temporal feature extraction from raw EMG signals. Despite this, only `conv2d` had a16w8 test coverage; `conv1d` was completely uncovered at the int16 activation precision. This gap meant that any Vela or quantizer regression affecting 1D convolutions at int16 IO would go undetected until full-model validation, making root-cause analysis significantly harder.
The test matrix is the largest in this stack because conv1d has the most configuration surface: kernel sizes (1, 3, 5), strides, padding, dilation, depthwise groups, and bias/no-bias variants are all crossed with per-channel vs. per-tensor quantization.
## Changes
- Add `a16w8_conv1d_test_parameters` dict with 14 test configurations (7 conv configs × {per_channel_quant=True, False}) covering kernel sizes 1/3/5, stride 1/2, dilation, depthwise, and no-bias variants
- Add `test_conv1d_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, per_channel_quantization=<varied>, qtol=128, epsilon=2**-16`
- Add `test_conv1d_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_conv1d.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532360
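Since the crossing is the bulk of the conv1d test surface, here is a hedged sketch of how 7 base configurations might be expanded into the 14 parametrizations via a dict comprehension; the config tuples, channel counts, and names are illustrative rather than taken from the diff:

```python
# Seven illustrative base conv1d configs:
# (in_ch, out_ch, kernel, stride, padding, dilation, groups, bias).
# Depthwise convolution is groups == in_channels.
_base_conv1d_configs = {
    "k1": (8, 16, 1, 1, 0, 1, 1, True),
    "k3": (8, 16, 3, 1, 1, 1, 1, True),
    "k5": (8, 16, 5, 1, 2, 1, 1, True),
    "k3_stride2": (8, 16, 3, 2, 1, 1, 1, True),
    "k3_dilation2": (8, 16, 3, 1, 2, 2, 1, True),
    "k3_depthwise": (8, 8, 3, 1, 1, 1, 8, True),
    "k3_no_bias": (8, 16, 3, 1, 1, 1, 1, False),
}

# Cross each base config with per-channel True/False -> 14 parametrizations,
# threading the flag through to per_channel_quantization in the pipeline.
a16w8_conv1d_test_parameters = {
    f"{name}_per_channel_{per_channel}": (cfg, per_channel)
    for name, cfg in _base_conv1d_configs.items()
    for per_channel in (True, False)
}
```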
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.gelu` on Ethos-U55 and Ethos-U85.
## Context
The `gelu` activation is used in the feed-forward blocks of the EMG2Pose Conformer architecture. GELU is a non-linear activation that involves erf/tanh approximations — these are decomposed into multiple primitive ops during lowering to Ethos-U, making the int16 quantization path particularly susceptible to accumulated rounding error. Without dedicated a16w8 per-op coverage, GELU numerics issues could only be detected at the full Conformer block level.
## Changes
- Add `a16w8_gelu_test_parameters` dict with 3 test configurations covering rank-1, rank-2, and rank-3 tensors
- Add `test_gelu_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_gelu_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Register `ops/test_gelu.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532359
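Because the U55 and U85 tests share the same kwargs throughout this stack, one illustrative way to keep them in lockstep is a shared kwargs dict. A minimal sketch, assuming the U85 pipeline import path and the `aten_ops` parameter name from the exp sketch above; the `Gelu` wrapper and shape are illustrative:

```python
import torch

# Assumed import path, mirroring the U55 pipeline used elsewhere in the stack.
from executorch.backends.arm.test.tester.test_pipeline import EthosU85PipelineINT


class Gelu(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.gelu(x)


# Shared across the U55 and U85 tests so the two never drift apart.
A16W8_KWARGS = dict(
    a16w8_quantization=True,
    symmetric_io_quantization=True,
    qtol=128,
    epsilon=2**-16,
)


def test_gelu_a16w8_u85_INT_sketch():
    pipeline = EthosU85PipelineINT(
        Gelu(),
        (torch.randn(2, 5, 10),),
        aten_ops="torch.ops.aten.gelu.default",  # assumed parameter name
        **A16W8_KWARGS,
    )
    pipeline.run()
```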
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19599
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Cancelled Job
As of commit 6a8da35 with merge base 58b4f26:
- NEW FAILURE - The following job has failed:
- CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed c0c8f38 to 1854072
This PR needs a `release notes:` label.
Summary:
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.bmm` on Ethos-U55 and Ethos-U85.
## Changes
- Add `a16w8_bmm_test_parameters` dict with 5 test configurations covering same-shape, different-shape, rectangular, batch-10, and negative-value tensors
- Add `test_bmm_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_bmm_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Remove unused `aten_op_mm` and `exir_op_mm` variables
- Register `ops/test_bmm.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532363
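For reference, a sketch of plausible shapes behind the five configurations. `torch.bmm` multiplies `(B, N, M)` by `(B, M, P)` batches, so each pair below keeps batch and inner dimensions matched; the concrete sizes are illustrative:

```python
import torch

# torch.bmm computes (B, N, M) @ (B, M, P) -> (B, N, P) per batch element.
a16w8_bmm_test_parameters = {
    "same_shape": lambda: (torch.randn(2, 4, 4), torch.randn(2, 4, 4)),
    "different_shape": lambda: (torch.randn(2, 4, 8), torch.randn(2, 8, 4)),
    "rectangular": lambda: (torch.randn(2, 3, 5), torch.randn(2, 5, 7)),
    "batch_10": lambda: (torch.randn(10, 4, 4), torch.randn(10, 4, 4)),
    "negative_values": lambda: (-torch.rand(2, 4, 4), -torch.rand(2, 4, 4)),
}
```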
Force-pushed 1854072 to d5349b3
Force-pushed d5349b3 to 77a9a41
@christine-long-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104532363.
Summary:
Pull Request resolved: pytorch#19599
Add int16 activation / int8 weight (a16w8) quantization tests for `aten.bmm` on Ethos-U55 and Ethos-U85.
## Changes
- Add `a16w8_bmm_test_parameters` dict with 5 test configurations covering same-shape, different-shape, rectangular, batch-10, and negative-value tensors
- Add `test_bmm_a16w8_u55_INT` using `EthosU55PipelineINT` with `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`
- Add `test_bmm_a16w8_u85_INT` using `EthosU85PipelineINT` with same kwargs
- Remove unused `aten_op_mm` and `exir_op_mm` variables
- Register `ops/test_bmm.py` in `fbcode/` and `xplat/` `targets.bzl`
bypass-pytorch-oss-checks
Differential Revision: D104532363
Force-pushed 77a9a41 to 6a8da35