Skip to content

feat(quantization): add ActivationRestrictedAsymmetric option#28237

Open
Rishi-Dave wants to merge 3 commits intomicrosoft:mainfrom
Rishi-Dave:rishidave/feat/restricted-asymmetric-quant
Open

feat(quantization): add ActivationRestrictedAsymmetric option#28237
Rishi-Dave wants to merge 3 commits intomicrosoft:mainfrom
Rishi-Dave:rishidave/feat/restricted-asymmetric-quant

Conversation

@Rishi-Dave
Copy link
Copy Markdown
Contributor

Description

Adds a new ActivationRestrictedAsymmetric extra-option to the Python
quantization tools. When enabled, uint8 activation zero-points are snapped
to either 0 (when rmin >= 0, e.g. post-ReLU/Sigmoid tensors) or 128
(when rmin < 0). The scale is recomputed so the dequantized range still
covers [rmin, rmax] without clipping.

This restricted asymmetric mode is required by some hardware accelerators
that only support these two zero-point values for uint8 quantization,
without requiring the full restriction to symmetric (zero-point = 128 for
all tensors).

Motivation and Context

Fixes #21398.

Existing options cover only fully symmetric (ActivationSymmetric
zero-point fixed at 128) or unrestricted asymmetric. There was no mode
that picks the closer of {0, 128} per tensor based on its observed range.

Changes

  • quant_utils.py: new snap_zero_point_to_uint8(rmin, rmax) helper.
  • base_quantizer.py: parse new ActivationRestrictedAsymmetric extra-option.
  • onnx_quantizer.py and qdq_quantizer.py: apply snap after
    compute_scale_zp in the activation path. Guarded on
    quant_type == UINT8 and not symmetric. Weight and int8 paths are
    untouched.
  • quantize.py: document the new option in the four extra_options
    docstrings.
  • test_symmetric_flag.py: new TestRestrictedAsymmetricFlag covering
    three cases (positive range → zp=0, signed range → zp=128, and
    option-disabled regression).

Testing

```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.

…t8 zero-point snapping

When extra_options={"ActivationRestrictedAsymmetric": True} is passed to
quantize_static (or a QDQ config), uint8 activation zero-points are snapped
to 0 when rmin >= 0 (e.g. post-ReLU tensors) or 128 when rmin < 0.  Scale
is recomputed so the dequantized range still covers [rmin, rmax] without
clipping.

- quant_utils: add snap_zero_point_to_uint8() helper (~28 LOC)
- base_quantizer: parse ActivationRestrictedAsymmetric extra-option flag
- onnx_quantizer: apply snap after compute_scale_zp in calc_quant_params
  (uint8, non-symmetric activations only)
- qdq_quantizer: same snap in QDQ calc_quant_params path
- quantize: document new option in all four extra_options docstrings
- test_symmetric_flag: add TestRestrictedAsymmetricFlag (3 test methods)

Refs microsoft#21398
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the focused change. The new option is consistently wired through the QOperator and QDQ paths, and the basic snap behavior is covered. I found one correctness issue that should be fixed before merge: the restricted asymmetric path recomputes scale after the existing quant-param helper and drops the MinimumRealRange guarantee. I also left a smaller test-discovery note.

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Comment thread onnxruntime/test/python/quantization/test_symmetric_flag.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Python quantization extra_options mode (ActivationRestrictedAsymmetric) to support uint8 activation zero-points restricted to {0, 128}, as required by some accelerators.

Changes:

  • Add snap_zero_point_to_uint8(rmin, rmax) helper to recompute (zp, scale) with zp snapped to 0 or 128.
  • Parse/propagate the new ActivationRestrictedAsymmetric option and apply it in both QOperator and QDQ quantization activation parameter calculation.
  • Document the option in quantize.py and add unit tests covering the expected snapping behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
onnxruntime/python/tools/quantization/quant_utils.py Adds snapping helper for restricted-asymmetric uint8 activations.
onnxruntime/python/tools/quantization/base_quantizer.py Parses new ActivationRestrictedAsymmetric extra option.
onnxruntime/python/tools/quantization/onnx_quantizer.py Applies snapping in QOperator activation quant-param calculation.
onnxruntime/python/tools/quantization/qdq_quantizer.py Applies snapping in QDQ activation quant-param calculation.
onnxruntime/python/tools/quantization/quantize.py Documents the new extra-option in public docstrings.
onnxruntime/test/python/quantization/test_symmetric_flag.py Adds tests validating zp snapping behavior for positive/signed activation ranges.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/python/tools/quantization/onnx_quantizer.py Outdated
Comment thread onnxruntime/python/tools/quantization/qdq_quantizer.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Comment thread onnxruntime/test/python/quantization/test_symmetric_flag.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Rishi-Dave added 2 commits May 3, 2026 11:20
…c snap

Address review feedback on PR microsoft#28237:

- snap_zero_point_to_uint8 now accepts qmin/qmax and min_real_range, so
  the helper preserves the MinimumRealRange floor (matching
  compute_scale_zp behavior) and handles reduce_range=True correctly.
  The midpoint and scale formulas are derived from qmin/qmax instead of
  hardcoded UINT8 constants.
- Both call sites in onnx_quantizer.py and qdq_quantizer.py now pass
  qmin, qmax, and self.min_real_range into the helper.
- Move the unittest.main() guard to the end of test_symmetric_flag.py
  so TestRestrictedAsymmetricFlag is discovered when the file is run
  directly with python test_symmetric_flag.py.
…int_to_uint8

snap_zero_point_to_uint8 hardcoded uint8-asymmetric bounds (0/255/128/127)
and returned scale=1.0 on degenerate ranges, which discarded any
reduce_range or MinimumRealRange settings already applied by the caller.

- Parameterize the helper on qmin, qmax, min_real_range. Default arg
  values reproduce the prior 0/255 math exactly.
- Compute the snap pivot as mid = (qmin + qmax + 1) // 2 instead of
  hardcoding 128, so reduce_range (qmax=127) yields a valid in-range zp.
- In the degenerate (rmax <= rmin) branch, derive scale from
  max(|rmin|, |rmax|) instead of returning 1.0; honor the
  min_real_range floor when provided.
- Forward qmin, qmax, and self.min_real_range from both call sites in
  onnx_quantizer.py and qdq_quantizer.py to keep
  ActivationRestrictedAsymmetric consistent with compute_scale_zp.
- Add tests for the reduce_range and min_real_range paths.
@Rishi-Dave
Copy link
Copy Markdown
Contributor Author

Thanks for the careful review. Pushed d184733 addressing the points:

  • snap_zero_point_to_uint8 now takes qmin, qmax, and min_real_range. The snap pivot is computed as mid = (qmin + qmax + 1) // 2 (still 128 for the default 0/255 path), and the negative-rmin / non-negative-rmin branches use mid - qmin and qmax - mid as denominators. Defaults preserve the prior numerics exactly.
  • Degenerate rmax <= rmin no longer returns scale=1.0. It derives a meaningful scale from max(|rmin|, |rmax|) and applies the min_real_range floor when provided.
  • Both call sites in onnx_quantizer.py and qdq_quantizer.py now forward qmin, qmax, and self.min_real_range, so per-tensor reduce_range and MinimumRealRange flow through correctly when ActivationRestrictedAsymmetric is enabled.
  • Added two tests covering the reduce_range (qmax=127) and min_real_range floor paths.

__main__ block was already at the end of the file; left as-is.

Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (round 2)

All high-priority concerns from the previous round (3969e7c) have been addressed:

  • snap_zero_point_to_uint8 now accepts qmin, qmax, and min_real_range parameters, properly honoring reduce_range and MinimumRealRange settings.
  • Both QOperator and QDQ call sites forward these parameters.
  • The test class is correctly placed before the if __name__ guard.
  • Degenerate range handling is improved.

All 7 previous threads have been resolved. Two minor suggestions remain (see inline comments), neither blocking.

key value pair dictionary for various options in different case. Current used:
extra.Sigmoid.nnapi = True/False (Default is False)
ActivationSymmetric = True/False: symmetrize calibration data for activations (default is False).
ActivationRestrictedAsymmetric = True/False: (uint8 activations only) snap zero-point to 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: the docstring says "128" but with reduce_range=True on uint8, qmax=127 and mid = (0 + 127 + 1) // 2 = 64, not 128. Consider saying "midpoint of the quantized range" instead of hardcoding "128" to avoid confusion. Same applies to the three other extra_options docstrings in this file.

# the symmetry (i.e., signed integer types will use symmetric quantization). See `def is_weight_symmetric()`
self._is_weight_symmetric: bool | None = self.extra_options.get("WeightSymmetric", None)
self.is_activation_symmetric = self.extra_options.get("ActivationSymmetric", False)
self.is_activation_restricted_asymmetric = self.extra_options.get("ActivationRestrictedAsymmetric", False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (non-blocking): if a user sets both ActivationSymmetric=True and ActivationRestrictedAsymmetric=True, the restricted path silently does nothing because symmetric=True fails the not symmetric guard in both quantizers. This is almost certainly a misconfiguration. Consider logging a warning here when both flags are enabled:

if self.is_activation_symmetric and self.is_activation_restricted_asymmetric:
    logger.warning("ActivationSymmetric and ActivationRestrictedAsymmetric are mutually exclusive; "
                   "ActivationRestrictedAsymmetric will be ignored.")

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +326 to +331
# Degenerate range - compute a meaningful scale rather than a hardcoded 1.0.
abs_max = max(abs(rmin), abs(rmax))
scale_val = (abs_max if abs_max > 0 else 1.0) / max(1, (qmax_val - qmin_val) // 2)
if min_real_range is not None and scale_val < min_real_range / (qmax_val - qmin_val):
scale_val = min_real_range / (qmax_val - qmin_val)
return numpy.array(mid, dtype=numpy.uint8), numpy.array(scale_val, dtype=numpy.float32)
Comment on lines +123 to +124
ActivationRestrictedAsymmetric = True/False: (uint8 activations only) snap zero-point to 0
(rmin>=0) or 128 (rmin<0); recompute scale accordingly (default is False).
Comment on lines +207 to +213
def test_positive_activations_zp_is_zero(self):
"""All-positive range (rmin >= 0): zero-point must snap to 0."""
act_zp, act_sc = self._quantize(
self.positive_activations,
extra_options={"ActivationRestrictedAsymmetric": True},
)
self.assertEqual(act_zp, 0, f"Expected zp=0 for rmin>=0, got {act_zp}")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New restricted asymmetric quantization mode in QDQ mode with zero_point restricted to either 128 or 0

3 participants