feat(quantization): add ActivationRestrictedAsymmetric option #28237
Rishi-Dave wants to merge 3 commits into microsoft:main
…t8 zero-point snapping
When extra_options={"ActivationRestrictedAsymmetric": True} is passed to
quantize_static (or a QDQ config), uint8 activation zero-points are snapped
to 0 when rmin >= 0 (e.g. post-ReLU tensors) or 128 when rmin < 0. Scale
is recomputed so the dequantized range still covers [rmin, rmax] without
clipping.
- quant_utils: add snap_zero_point_to_uint8() helper (~28 LOC)
- base_quantizer: parse ActivationRestrictedAsymmetric extra-option flag
- onnx_quantizer: apply snap after compute_scale_zp in calc_quant_params
(uint8, non-symmetric activations only)
- qdq_quantizer: same snap in QDQ calc_quant_params path
- quantize: document new option in all four extra_options docstrings
- test_symmetric_flag: add TestRestrictedAsymmetricFlag (3 test methods)
Refs microsoft#21398
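The snapping rule described in this commit message can be illustrated with a minimal sketch. The function name `snap_uint8_zero_point_sketch` is hypothetical, and the arithmetic below is one plausible reading of "scale is recomputed so the range is covered", not the PR's actual code. The fallback scale of 1.0 for an all-zero range follows the behavior a later commit attributes to the first version of the helper.

```python
def snap_uint8_zero_point_sketch(rmin: float, rmax: float):
    """Illustrative only: pick zp in {0, 128} for uint8 and recompute scale.

    Hypothetical stand-in for the PR's helper, hardcoded to the full
    uint8 range [0, 255].
    """
    if rmin >= 0:
        # Non-negative tensor (e.g. post-ReLU): zp = 0, so the representable
        # real range is [0, 255 * scale] and must reach rmax.
        zp = 0
        scale = rmax / 255.0
    else:
        # Signed tensor: zp = 128, so the representable real range is
        # [-128 * scale, 127 * scale]; take the larger scale so that
        # neither rmin nor rmax is clipped.
        zp = 128
        scale = max(-rmin / 128.0, rmax / 127.0)
    if scale <= 0:
        # All-zero range: fall back to a fixed scale (assumption).
        scale = 1.0
    return zp, scale


# Usage: a post-ReLU style range and a signed range.
relu_zp, relu_scale = snap_uint8_zero_point_sketch(0.0, 6.0)
signed_zp, signed_scale = snap_uint8_zero_point_sketch(-1.0, 3.0)
```

With zp = 128 the positive side has one fewer quantization step than the negative side, which is why the two candidate scales use different denominators.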
tianleiwu
left a comment
Thanks for the focused change. The new option is consistently wired through the QOperator and QDQ paths, and the basic snap behavior is covered. I found one correctness issue that should be fixed before merge: the restricted asymmetric path recomputes scale after the existing quant-param helper and drops the MinimumRealRange guarantee. I also left a smaller test-discovery note.
Pull request overview
Adds a new Python quantization extra_options mode (ActivationRestrictedAsymmetric) to support uint8 activation zero-points restricted to {0, 128}, as required by some accelerators.
Changes:
- Add `snap_zero_point_to_uint8(rmin, rmax)` helper to recompute (zp, scale) with zp snapped to 0 or 128.
- Parse/propagate the new `ActivationRestrictedAsymmetric` option and apply it in both QOperator and QDQ quantization activation parameter calculation.
- Document the option in `quantize.py` and add unit tests covering the expected snapping behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/quant_utils.py | Adds snapping helper for restricted-asymmetric uint8 activations. |
| onnxruntime/python/tools/quantization/base_quantizer.py | Parses new ActivationRestrictedAsymmetric extra option. |
| onnxruntime/python/tools/quantization/onnx_quantizer.py | Applies snapping in QOperator activation quant-param calculation. |
| onnxruntime/python/tools/quantization/qdq_quantizer.py | Applies snapping in QDQ activation quant-param calculation. |
| onnxruntime/python/tools/quantization/quantize.py | Documents the new extra-option in public docstrings. |
| onnxruntime/test/python/quantization/test_symmetric_flag.py | Adds tests validating zp snapping behavior for positive/signed activation ranges. |
…c snap
Address review feedback on PR microsoft#28237:
- snap_zero_point_to_uint8 now accepts qmin/qmax and min_real_range, so the
  helper preserves the MinimumRealRange floor (matching compute_scale_zp
  behavior) and handles reduce_range=True correctly. The midpoint and scale
  formulas are derived from qmin/qmax instead of hardcoded UINT8 constants.
- Both call sites in onnx_quantizer.py and qdq_quantizer.py now pass qmin,
  qmax, and self.min_real_range into the helper.
- Move the unittest.main() guard to the end of test_symmetric_flag.py so
  TestRestrictedAsymmetricFlag is discovered when the file is run directly
  with python test_symmetric_flag.py.
…int_to_uint8
snap_zero_point_to_uint8 hardcoded uint8-asymmetric bounds (0/255/128/127)
and returned scale=1.0 on degenerate ranges, which discarded any
reduce_range or MinimumRealRange settings already applied by the caller.
- Parameterize the helper on qmin, qmax, min_real_range. Default arg values
  reproduce the prior 0/255 math exactly.
- Compute the snap pivot as mid = (qmin + qmax + 1) // 2 instead of
  hardcoding 128, so reduce_range (qmax=127) yields a valid in-range zp.
- In the degenerate (rmax <= rmin) branch, derive scale from
  max(|rmin|, |rmax|) instead of returning 1.0; honor the min_real_range
  floor when provided.
- Forward qmin, qmax, and self.min_real_range from both call sites in
  onnx_quantizer.py and qdq_quantizer.py to keep
  ActivationRestrictedAsymmetric consistent with compute_scale_zp.
- Add tests for the reduce_range and min_real_range paths.
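A rough sketch of what the parameterized helper might look like after these two commits is shown here. The name `snap_zero_point_sketch` is hypothetical. The degenerate branch follows the diff snippet quoted later in this review; applying the same `min_real_range / (qmax - qmin)` floor to the non-degenerate branches, and using `qmin` as the snapped value for non-negative ranges, are assumptions made for illustration.

```python
def snap_zero_point_sketch(rmin, rmax, qmin=0, qmax=255, min_real_range=None):
    """Illustrative only: snap zp to qmin or the midpoint of [qmin, qmax]."""
    # Pivot derived from the quantized range: 128 for [0, 255],
    # 64 for the reduce_range case [0, 127].
    mid = (qmin + qmax + 1) // 2

    if rmax <= rmin:
        # Degenerate range: derive scale from the largest magnitude
        # instead of returning a hardcoded 1.0.
        abs_max = max(abs(rmin), abs(rmax))
        zp = mid
        scale = (abs_max if abs_max > 0 else 1.0) / max(1, (qmax - qmin) // 2)
    elif rmin >= 0:
        # Non-negative range: zp at the bottom of the quantized range
        # (0 for uint8), full range available for [0, rmax].
        zp = qmin
        scale = rmax / (qmax - qmin)
    else:
        # Signed range: zp at the pivot; the scale must cover both sides.
        zp = mid
        scale = max(-rmin / (mid - qmin), rmax / (qmax - mid))

    if min_real_range is not None:
        # Keep the representable real range at least min_real_range wide.
        scale = max(scale, min_real_range / (qmax - qmin))
    return zp, scale


# Usage: default uint8 range versus reduce_range (qmax=127).
full_zp, full_scale = snap_zero_point_sketch(-1.0, 3.0)
reduced_zp, reduced_scale = snap_zero_point_sketch(-1.0, 3.0, qmax=127)
```

Deriving the pivot from `qmin` and `qmax` keeps the zero-point inside the quantized range when `reduce_range` shrinks it, which a hardcoded 128 would not.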
Thanks for the careful review. Pushed d184733 addressing the points:
tianleiwu
left a comment
Follow-up review (round 2)
All high-priority concerns from the previous round (3969e7c) have been addressed:
- `snap_zero_point_to_uint8` now accepts `qmin`, `qmax`, and `min_real_range` parameters, properly honoring `reduce_range` and `MinimumRealRange` settings.
- Both QOperator and QDQ call sites forward these parameters.
- The test class is correctly placed before the `if __name__` guard.
- Degenerate range handling is improved.
All 7 previous threads have been resolved. Two minor suggestions remain (see inline comments), neither blocking.
key value pair dictionary for various options in different case. Current used:
extra.Sigmoid.nnapi = True/False (Default is False)
ActivationSymmetric = True/False: symmetrize calibration data for activations (default is False).
ActivationRestrictedAsymmetric = True/False: (uint8 activations only) snap zero-point to 0
Suggestion: the docstring says "128" but with reduce_range=True on uint8, qmax=127 and mid = (0 + 127 + 1) // 2 = 64, not 128. Consider saying "midpoint of the quantized range" instead of hardcoding "128" to avoid confusion. Same applies to the three other extra_options docstrings in this file.
# the symmetry (i.e., signed integer types will use symmetric quantization). See `def is_weight_symmetric()`
self._is_weight_symmetric: bool | None = self.extra_options.get("WeightSymmetric", None)
self.is_activation_symmetric = self.extra_options.get("ActivationSymmetric", False)
self.is_activation_restricted_asymmetric = self.extra_options.get("ActivationRestrictedAsymmetric", False)
Suggestion (non-blocking): if a user sets both ActivationSymmetric=True and ActivationRestrictedAsymmetric=True, the restricted path silently does nothing because symmetric=True fails the not symmetric guard in both quantizers. This is almost certainly a misconfiguration. Consider logging a warning here when both flags are enabled:
if self.is_activation_symmetric and self.is_activation_restricted_asymmetric:
    logger.warning(
        "ActivationSymmetric and ActivationRestrictedAsymmetric are mutually exclusive; "
        "ActivationRestrictedAsymmetric will be ignored."
    )
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
# Degenerate range - compute a meaningful scale rather than a hardcoded 1.0.
abs_max = max(abs(rmin), abs(rmax))
scale_val = (abs_max if abs_max > 0 else 1.0) / max(1, (qmax_val - qmin_val) // 2)
if min_real_range is not None and scale_val < min_real_range / (qmax_val - qmin_val):
    scale_val = min_real_range / (qmax_val - qmin_val)
return numpy.array(mid, dtype=numpy.uint8), numpy.array(scale_val, dtype=numpy.float32)

ActivationRestrictedAsymmetric = True/False: (uint8 activations only) snap zero-point to 0
    (rmin>=0) or 128 (rmin<0); recompute scale accordingly (default is False).

def test_positive_activations_zp_is_zero(self):
    """All-positive range (rmin >= 0): zero-point must snap to 0."""
    act_zp, act_sc = self._quantize(
        self.positive_activations,
        extra_options={"ActivationRestrictedAsymmetric": True},
    )
    self.assertEqual(act_zp, 0, f"Expected zp=0 for rmin>=0, got {act_zp}")
Description
Adds a new `ActivationRestrictedAsymmetric` extra-option to the Python
quantization tools. When enabled, uint8 activation zero-points are snapped
to either 0 (when `rmin >= 0`, e.g. post-ReLU/Sigmoid tensors) or 128
(when `rmin < 0`). The scale is recomputed so the dequantized range still
covers `[rmin, rmax]` without clipping.

This restricted asymmetric mode is required by some hardware accelerators
that only support these two zero-point values for uint8 quantization,
without requiring the full restriction to symmetric (zero-point = 128 for
all tensors).
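As a usage sketch, the option would be passed through `extra_options` like the existing flags. The wrapper name `quantize_with_restricted_asymmetric` is hypothetical, the `ActivationRestrictedAsymmetric` key is only recognized with this PR applied, and the model paths and calibration reader are supplied by the caller.

```python
# Option dictionary as described in this PR; the key is only honored
# by a build of onnxruntime that includes the change.
EXTRA_OPTIONS = {"ActivationRestrictedAsymmetric": True}


def quantize_with_restricted_asymmetric(model_input, model_output, data_reader):
    """Illustrative only: static QDQ quantization with uint8 activations.

    `data_reader` is expected to be a CalibrationDataReader supplied by
    the caller.
    """
    # Imported lazily so the sketch can be loaded without onnxruntime installed.
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        model_input,
        model_output,
        data_reader,
        quant_format=QuantFormat.QDQ,
        activation_type=QuantType.QUInt8,  # the snap applies to uint8 activations only
        weight_type=QuantType.QInt8,
        extra_options=EXTRA_OPTIONS,
    )
```

Activations are set to `QUInt8` explicitly here because the description states that int8 and weight paths are left untouched by the new option.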
Motivation and Context
Fixes #21398.
Existing options cover only fully symmetric (`ActivationSymmetric` →
zero-point fixed at 128) or unrestricted asymmetric. There was no mode
that picks the closer of {0, 128} per tensor based on its observed range.
Changes
- `quant_utils.py`: new `snap_zero_point_to_uint8(rmin, rmax)` helper.
- `base_quantizer.py`: parse new `ActivationRestrictedAsymmetric` extra-option.
- `onnx_quantizer.py` and `qdq_quantizer.py`: apply snap after `compute_scale_zp` in the activation path. Guarded on `quant_type == UINT8 and not symmetric`. Weight and int8 paths are untouched.
- `quantize.py`: document the new option in the four `extra_options` docstrings.
- `test_symmetric_flag.py`: new `TestRestrictedAsymmetricFlag` covering three cases (positive range → zp=0, signed range → zp=128, and option-disabled regression).
Testing
```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.