Quantize: Implement functional e2e test cases and fix issues found during test #608

Merged
zhenchaoni merged 2 commits into main from private/zhenni/quantize_e2e
May 14, 2026

@zhenchaoni zhenchaoni commented May 13, 2026

Fixes #504

Fix issues

Issue 1

When the user specifies --precision int16 together with a config.json that sets neither weight_type nor activation_type, quantization still uses uint8 for both weights and activations.

This happens because the config resolution logic applies the default values of the data classes instead of the values the user passed on the CLI. The fix loads only the values the user actually specified in the config JSON file.
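
The layering described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `QuantSettings`, `PRECISION_PRESETS`, `resolve_settings`, and the default values are hypothetical names and numbers, and the relative order of preset versus config file is an assumption. The point it shows is that only keys physically present in the JSON are applied, so data class defaults can no longer mask a CLI preset.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class QuantSettings:
    """Hypothetical stand-in for the quantize config data class."""
    weight_type: str = "uint8"
    activation_type: str = "uint8"
    calibration_method: str = "minmax"  # assumed default
    samples: int = 100                  # assumed default


# Illustrative presets. The PR states int8 -> uint8/uint8 and an INT16 weight
# type for int16; the int16 activation type here is an assumption.
PRECISION_PRESETS = {
    "int8": {"weight_type": "uint8", "activation_type": "uint8"},
    "int16": {"weight_type": "int16", "activation_type": "uint16"},
}


def resolve_settings(cli: dict, config_quant: dict,
                     precision: Optional[str] = None) -> QuantSettings:
    """Merge preset, config file, and CLI values, lowest priority first."""
    known = {f.name for f in fields(QuantSettings)}
    merged: dict[str, Any] = {}
    # 1. Precision preset, if any.
    merged.update(PRECISION_PRESETS.get(precision or "", {}))
    # 2. Only the keys the user actually wrote in the config JSON.
    merged.update({k: v for k, v in config_quant.items() if k in known})
    # 3. Explicit CLI values win; None means the flag was not passed.
    merged.update({k: v for k, v in cli.items() if k in known and v is not None})
    # Anything still missing falls back to the data class defaults.
    return QuantSettings(**merged)


# --precision int16 with an empty "quant" section keeps the int16 weight type.
settings = resolve_settings({"samples": 4}, {}, precision="int16")
print(settings.weight_type, settings.samples)
```

Building the data class only at the end is the design choice that matters: defaults fill gaps instead of being loaded as if the user had typed them.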

Issue 2

The winml quantize command did not accept a model ID, so calibration always used RandomDataset. The fix adds the --model-id option.

The build command is not affected, because it calls the internal quantize_onnx Python function directly, and that function already has a model ID parameter.

Issue 3

When the user provides an output directory that does not exist, the command fails. The fix creates that directory.
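
A minimal sketch of this kind of fix, using only the standard library. The helper name `prepare_output_path` is hypothetical and not taken from the PR; it only shows the usual way to create missing parent directories before writing.

```python
from pathlib import Path


def prepare_output_path(output: str) -> Path:
    """Return the output path after making sure its parent directory exists."""
    out = Path(output)
    # parents=True creates every missing level; exist_ok=True makes the call
    # a no-op when the directory is already there.
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

Calling it with something like `<tmp>/missing/nested/custom.onnx` creates `missing/nested` and leaves the file itself to be written by the quantizer.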

Add e2e test

Precision routing

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --samples 4` | tiny FP32 | weight zp dtype = UINT8; input metadata survives |
| `winml quantize -m <tiny> --precision int8 --samples 4` | tiny | weight zp dtype = UINT8 (preset → uint8/uint8) |
| `winml quantize -m <tiny> --precision int16 --samples 4` | tiny | weight zp dtype = INT16 (skip ORT inference) |
| `winml quantize -m <tiny> --precision w8a16 --samples 4` | tiny | activation zp dtype = UINT16 (skip ORT inference) |
| `winml quantize -m <tiny> --precision int8 --weight-type int8 --activation-type uint8 --samples 4` | tiny | weight zp dtype = INT8 (explicit beats preset) |
| `winml quantize -m <tiny> --precision fp16 --samples 4` | tiny | weight zp dtype = UINT8 (documented silent fallback) |
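
One way a "weight zp dtype" assertion like the ones above could be written is sketched below. It is an illustration rather than the helper used in the test suite: the function name is hypothetical, and treating a DequantizeLinear node as a weight DQ when its first input is an initializer is an assumption. The function only relies on the attributes an ONNX `GraphProto` exposes (`node`, `initializer`, `op_type`, `input`, `name`, `data_type`), so the demo uses plain stand-in objects instead of a real model.

```python
from types import SimpleNamespace

# Integer values of onnx.TensorProto.DataType for the types named in the table.
UINT8, INT8, UINT16, INT16 = 2, 3, 4, 5


def weight_zero_point_dtypes(graph) -> set:
    """Collect the data_type of every weight DequantizeLinear zero point."""
    inits = {t.name: t for t in graph.initializer}
    dtypes = set()
    for node in graph.node:
        if node.op_type != "DequantizeLinear" or len(node.input) < 3:
            continue
        x, _scale, zp = node.input[:3]
        # Weight DQ (assumed): the quantized tensor itself is an initializer.
        if x in inits and zp in inits:
            dtypes.add(inits[zp].data_type)
    return dtypes


# Stand-in graph: one weight DQ (INT16 zp) and one activation DQ (UINT16 zp).
graph = SimpleNamespace(
    initializer=[
        SimpleNamespace(name="w_q", data_type=INT16),
        SimpleNamespace(name="w_scale", data_type=1),
        SimpleNamespace(name="w_zp", data_type=INT16),
        SimpleNamespace(name="a_scale", data_type=1),
        SimpleNamespace(name="a_zp", data_type=UINT16),
    ],
    node=[
        SimpleNamespace(op_type="DequantizeLinear", input=["w_q", "w_scale", "w_zp"]),
        SimpleNamespace(op_type="DequantizeLinear", input=["act_q", "a_scale", "a_zp"]),
    ],
)
print(weight_zero_point_dtypes(graph) == {INT16})
```

With the `onnx` package installed, the same function should accept `onnx.load(path).graph` in place of the stand-in.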

Calibration method

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --method minmax --samples 4` | tiny | shared structural checks |
| `winml quantize -m <tiny> --method entropy --samples 4` | tiny | shared structural checks |
| `winml quantize -m <tiny> --method percentile --samples 4` | tiny | shared structural checks |

Quant options

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --per-channel --samples 4` | tiny | weight DQ scale init has > 1 element (per-channel vector) |
| `winml quantize -m <tiny> --symmetric --weight-type int8 --samples 4` | tiny | every weight zp value == 0 |
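
The two assertions above follow from how the parameters are computed. The sketch below uses the textbook symmetric int8 formula (scale = max|w| / 127, zero point = 0) on a tiny hand-written weight matrix; it is an arithmetic illustration, not the quantizer's implementation, and the function names are hypothetical.

```python
def symmetric_int8_params(values):
    """Scale and zero point for symmetric int8 quantization of one slice."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    # A symmetric range is centred on zero, so the zero point is always 0.
    return scale, 0


def per_channel_params(weight_rows):
    """One (scale, zero_point) pair per output channel (one row each)."""
    return [symmetric_int8_params(row) for row in weight_rows]


weights = [[0.5, -1.27], [2.54, 0.1]]  # two output channels
params = per_channel_params(weights)
scales = [s for s, _ in params]
zero_points = [z for _, z in params]
print(len(scales), zero_points)
```

Per-channel quantization yields one scale per channel, which is why the scale initializer has more than one element, and symmetric quantization pins every zero point to 0.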

Per-task calibration datasets (one per TASK_DATASET_MAPPING class)

Each row constructs the real DatasetCalibrationReader instance for the given task using the
HuggingFace preprocessor of the supplied model.

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --task random --samples 4` | tiny ONNX (no model_id) | shared structural checks (RandomDataset path) |
| `winml quantize -m <onnx_imgcls> --task image-classification --model-id microsoft/resnet-50 --samples 4` | exported ResNet-50 ONNX + ResNet AutoImageProcessor | shared structural checks (ImageDataset path) |
| `winml quantize -m <onnx_txtcls> --task text-classification --model-id Intel/bert-base-uncased-mrpc --samples 4` | exported BERT-MRPC ONNX + BERT AutoTokenizer | shared structural checks (TextDataset path) |
| `winml quantize -m <onnx_objdet> --task object-detection --model-id hustvl/yolos-small --samples 4` | exported YOLOS ONNX + YOLOS image processor | shared structural checks (ObjectDetectionDataset path) |
| `winml quantize -m <onnx_imgseg> --task image-segmentation --model-id nvidia/segformer-b0-finetuned-ade-512-512 --samples 4` | exported SegFormer ONNX + SegFormer image processor | shared structural checks (ImageSegmentationDataset path) |
| `winml quantize -m <tiny> --task automatic-speech-recognition --samples 4` | tiny ONNX, unsupported task | CLI output contains `falling back to RandomDataset` |
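
The routing these rows exercise can be pictured roughly as below. This is a sketch under assumptions: the dataset class names and the phrase `falling back to RandomDataset` come from the rows above, but the mapping contents, the `select_dataset` function, the rest of the message text, and the rule that a missing model ID also triggers the fallback are all hypothetical.

```python
from typing import Optional, Tuple

# Dataset class names as strings to keep the sketch self-contained; the real
# TASK_DATASET_MAPPING may hold classes and cover different tasks.
TASK_DATASET_MAPPING = {
    "image-classification": "ImageDataset",
    "text-classification": "TextDataset",
    "object-detection": "ObjectDetectionDataset",
    "image-segmentation": "ImageSegmentationDataset",
}


def select_dataset(task: Optional[str],
                   model_id: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (dataset_name, warning); warning is None when nothing fell back."""
    if task in (None, "random"):
        return "RandomDataset", None
    dataset = TASK_DATASET_MAPPING.get(task)
    # Task datasets need the model's preprocessor, hence the model ID check.
    if dataset is None or model_id is None:
        return "RandomDataset", (
            f"No calibration dataset for task {task!r}; "
            "falling back to RandomDataset"
        )
    return dataset, None


print(select_dataset("image-classification", "microsoft/resnet-50"))
print(select_dataset("automatic-speech-recognition", None))
```

Under these assumptions, a CLI without a model ID option could only ever reach the RandomDataset branches, which matches the behavior described in Issue 2.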

Output behavior

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --samples 4` | tiny | output exists at `<tiny.parent>/<tiny.stem>_qdq.onnx` |
| `winml quantize -m <tiny> -o <tmp>/out/custom.onnx --samples 4` | tiny | file exists at exact `-o` path |
| `winml quantize -m <tiny> -o <tmp>/missing/nested/custom.onnx --samples 4` | tiny, missing parent dir | command auto-creates parent dirs; file exists (regression: previously crashed with FileNotFoundError from os.chdir) |
| `winml quantize -m <tiny_ext>.onnx -o <tmp>/out/quant_ext.onnx --samples 4` | tiny + external data sidecar | both quant_ext.onnx and quant_ext.onnx.data exist |
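
The path conventions asserted above can be expressed with `pathlib`. The function names are hypothetical; only the `_qdq.onnx` suffix and the `.data` sidecar name are taken from the rows above.

```python
from pathlib import Path


def default_output_path(model: str) -> Path:
    """<model.parent>/<model.stem>_qdq.onnx, used when -o is not given."""
    src = Path(model)
    return src.with_name(f"{src.stem}_qdq.onnx")


def external_data_path(output: Path) -> Path:
    """Sidecar file expected next to a model saved with external data."""
    return output.with_name(output.name + ".data")


print(default_output_path("models/tiny.onnx"))
print(external_data_path(Path("out/quant_ext.onnx")))
```

A test can then check `default_output_path(tiny).exists()` after running the command without `-o`.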

Build-config precedence (CLI vs config file)

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --config bc.json --samples 4` | tiny + bc.json=`{"quant":{"samples":50,"calibration_method":"entropy"}}` | stdout shows `Samples: 4` (CLI wins) and `Method: entropy` (config used) |
| `winml quantize -m <tiny> --config bc.json --precision int16 --samples 4` | tiny + bc.json=`{"quant":{}}` | weight zp dtype = INT16 (regression: explicit `--precision` beats empty config) |

Build-config key absorption sweep

For each quant.* key, assert that it is consumed when the CLI omits it. Verified by structural
inspection of the produced model (not stdout).

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> --config bc.json --samples 4` | bc.json=`{"quant":{"weight_type":"int8"}}` | weight DQ zp dtype = INT8 |
| `winml quantize -m <tiny> --config bc.json --samples 4` | bc.json=`{"quant":{"per_channel":true}}` | weight DQ scale init has > 1 element |
| `winml quantize -m <tiny> --config bc.json --samples 4` | bc.json=`{"quant":{"symmetric":true,"weight_type":"int8"}}` | every weight zp value == 0 |
| `winml quantize -m <tiny> --config bc.json --samples 4` | bc.json=`{"quant":{"task":"automatic-speech-recognition"}}` | CLI output contains `falling back to RandomDataset` (config task flowed to dataset selection) |

Verbose

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize -m <tiny> -o <out> --samples 4 -v` vs same without `-v` | tiny | verbose output strictly longer than default |

Errors

| Command | Input | Assertion |
| --- | --- | --- |
| `winml quantize` | | exit ≠ 0; stderr matches `Missing option .*--model` |
| `winml quantize -m <tmp>/nope.onnx` | non-existent path | exit ≠ 0; stderr matches `does not exist` |
| `winml quantize -m <tiny> --method gaussian` | tiny | exit ≠ 0; stderr matches `Invalid value for '--method'` |
| `winml quantize -m <tiny> --weight-type float8` | tiny | exit ≠ 0; stderr matches `Invalid value for '--weight-type'` |
| `winml quantize -m <bad>.onnx --samples 4` | random bytes | exit ≠ 0; stderr contains `Quantization failed` AND a parse-related substring (parse/protobuf/decode/load/invalid) |

Total: 6 + 3 + 2 + 6 + 4 + 2 + 4 + 1 + 5 = 33 cases. All passing.

@zhenchaoni zhenchaoni requested a review from a team as a code owner May 13, 2026 07:19
Comment thread tests/e2e/test_quantize_e2e.py Fixed
Comment thread tests/unit/commands/test_compile_quantize_flags.py Fixed
Comment thread src/winml/modelkit/commands/quantize.py

@timenick timenick left a comment

Review of #608. Comments inline.

🤖 Generated with Claude Code

Comment thread src/winml/modelkit/commands/quantize.py
Comment thread src/winml/modelkit/commands/quantize.py
Comment thread src/winml/modelkit/commands/quantize.py Outdated
Comment thread src/winml/modelkit/quant/quantizer.py
Comment thread tests/unit/commands/test_compile_quantize_flags.py
Comment thread tests/e2e/test_quantize_e2e.py Outdated
Comment thread tests/e2e/test_quantize_e2e.py
Comment thread tests/e2e/test_quantize_e2e.py
@zhenchaoni zhenchaoni merged commit 6835c01 into main May 14, 2026
9 checks passed
@zhenchaoni zhenchaoni deleted the private/zhenni/quantize_e2e branch May 14, 2026 05:55
Development

Successfully merging this pull request may close these issues.

Identify functional scenarios and implement E2E tests for winml quantize

4 participants