Quantize: Implement functional e2e test cases and fix issues found during test by zhenchaoni · Pull Request #608 · microsoft/winml-cli

zhenchaoni · 2026-05-13T07:19:03Z

Fixes #504

Fix issues

Issue 1

When the user specifies `--precision int16` together with a config.json that sets neither `weight_type` nor `activation_type`, quantization uses uint8 for both weights and activations.

This happens because the config resolution logic takes the default values of the data classes over the value the user specified on the CLI. The fix is to load only the values the user actually specified in the config JSON file.
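A minimal sketch of the layering this fix implies. The names `DEFAULTS`, `PRESETS`, and `resolve` are hypothetical and not taken from the PR, the default values are illustrative, and the relative order of preset and config-file keys is an assumption. The point it shows is that only keys actually present in the file are overlaid, so data-class defaults can no longer mask a CLI value:

```python
import json
from typing import Any, Dict, Optional

# Illustrative defaults standing in for the data-class defaults (values assumed).
DEFAULTS: Dict[str, Any] = {"weight_type": "uint8", "samples": 100}

# Hypothetical preset table; only "int16 gives INT16 weights" is confirmed by this PR.
PRESETS: Dict[str, Dict[str, Any]] = {"int16": {"weight_type": "int16"}}


def resolve(config_text: str, precision: Optional[str], cli: Dict[str, Any]) -> Dict[str, Any]:
    """Layer settings: defaults < preset < keys present in the file < explicit CLI flags."""
    user_config = json.loads(config_text).get("quant", {})
    resolved = dict(DEFAULTS)
    resolved.update(PRESETS.get(precision or "", {}))
    resolved.update(user_config)  # only keys the user wrote, never data-class defaults
    resolved.update({k: v for k, v in cli.items() if v is not None})
    return resolved
```

With `{"quant": {}}` and `--precision int16`, the preset's weight type survives in this sketch because the empty config contributes no keys.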

Issue 2

The `winml quantize` command doesn't accept a model ID, so calibration always uses RandomDataset. The fix is to add a `--model-id` option.

This doesn't affect the build command, because it calls the internal quantize_onnx Python method directly, which already has a model ID parameter.

Issue 3

When the user provides a nonexistent output directory, the command fails. The fix is to create that directory.
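One way to implement this kind of fix, sketched with the standard library only. The helper name `ensure_parent_dir` is hypothetical; the PR does not show the actual change:

```python
from pathlib import Path


def ensure_parent_dir(output_path: str) -> Path:
    """Create the output file's parent directories if they do not exist yet."""
    path = Path(output_path)
    # parents=True builds every missing level; exist_ok=True makes reruns harmless.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Calling it before any step that writes to, or changes into, the output directory avoids a `FileNotFoundError` for paths such as `<tmp>/missing/nested/custom.onnx`.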

Add e2e test

Precision routing

Command	Input	Assertion
`winml quantize -m <tiny> --samples 4`	tiny FP32	weight zp dtype = `UINT8`; input metadata survives
`winml quantize -m <tiny> --precision int8 --samples 4`	tiny	weight zp dtype = `UINT8` (preset → uint8/uint8)
`winml quantize -m <tiny> --precision int16 --samples 4`	tiny	weight zp dtype = `INT16` (skip ORT inference)
`winml quantize -m <tiny> --precision w8a16 --samples 4`	tiny	activation zp dtype = `UINT16` (skip ORT inference)
`winml quantize -m <tiny> --precision int8 --weight-type int8 --activation-type uint8 --samples 4`	tiny	weight zp dtype = `INT8` (explicit beats preset)
`winml quantize -m <tiny> --precision fp16 --samples 4`	tiny	weight zp dtype = `UINT8` (documented silent fallback)

Calibration method

Command	Input	Assertion
`winml quantize -m <tiny> --method minmax --samples 4`	tiny	shared structural checks
`winml quantize -m <tiny> --method entropy --samples 4`	tiny	shared structural checks
`winml quantize -m <tiny> --method percentile --samples 4`	tiny	shared structural checks

Quant options

Command	Input	Assertion
`winml quantize -m <tiny> --per-channel --samples 4`	tiny	weight DQ scale init has > 1 element (per-channel vector)
`winml quantize -m <tiny> --symmetric --weight-type int8 --samples 4`	tiny	every weight zp value == 0
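For context on why these two assertions hold, here is a small worked sketch using the textbook quantization formulas. It is not ONNX Runtime's implementation, and the function names are illustrative. Symmetric quantization centers the range on zero, so the zero point is always 0; per-channel quantization computes one scale per output channel, so the scale initializer becomes a vector:

```python
from typing import List, Tuple


def symmetric_int8(values: List[float]) -> Tuple[float, int]:
    """Return (scale, zero_point) for symmetric int8; the zero point is always 0."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return scale, 0


def asymmetric_uint8(values: List[float]) -> Tuple[float, int]:
    """Return (scale, zero_point) for asymmetric uint8 over a range that includes 0."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    return scale, round(-lo / scale)


def per_channel(weights: List[List[float]]) -> List[Tuple[float, int]]:
    """Quantization parameters per output channel (one row of weights per channel)."""
    return [symmetric_int8(row) for row in weights]
```

A weight matrix with three output channels yields three scales here, which corresponds to the "> 1 element" check on the DQ scale initializer.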

Per-task calibration datasets (one per `TASK_DATASET_MAPPING` class)

Each row constructs the real DatasetCalibrationReader instance for the given task using the
HuggingFace preprocessor of the supplied model.

Command	Input	Assertion
`winml quantize -m <tiny> --task random --samples 4`	tiny ONNX (no model_id)	shared structural checks — `RandomDataset` path
`winml quantize -m <onnx_imgcls> --task image-classification --model-id microsoft/resnet-50 --samples 4`	exported ResNet-50 ONNX + ResNet `AutoImageProcessor`	shared structural checks — `ImageDataset` path
`winml quantize -m <onnx_txtcls> --task text-classification --model-id Intel/bert-base-uncased-mrpc --samples 4`	exported BERT-MRPC ONNX + BERT `AutoTokenizer`	shared structural checks — `TextDataset` path
`winml quantize -m <onnx_objdet> --task object-detection --model-id hustvl/yolos-small --samples 4`	exported YOLOS ONNX + YOLOS image processor	shared structural checks — `ObjectDetectionDataset` path
`winml quantize -m <onnx_imgseg> --task image-segmentation --model-id nvidia/segformer-b0-finetuned-ade-512-512 --samples 4`	exported SegFormer ONNX + SegFormer image processor	shared structural checks — `ImageSegmentationDataset` path
`winml quantize -m <tiny> --task automatic-speech-recognition --samples 4`	tiny ONNX, unsupported task	CLI output contains `falling back to RandomDataset`
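The dataset selection these rows exercise can be pictured roughly as follows. This is a hypothetical stand-in, not the project's code: `TASK_TO_DATASET` maps to class names as strings, `select_dataset` is an invented helper, and the message wording beyond the substring `falling back to RandomDataset` is assumed:

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for TASK_DATASET_MAPPING; values are class names, not classes.
TASK_TO_DATASET: Dict[str, str] = {
    "image-classification": "ImageDataset",
    "text-classification": "TextDataset",
    "object-detection": "ObjectDetectionDataset",
    "image-segmentation": "ImageSegmentationDataset",
}


def select_dataset(task: Optional[str], model_id: Optional[str]) -> Tuple[str, str]:
    """Return (dataset name, note); the note is non-empty when a fallback happened."""
    if task in (None, "random"):
        return "RandomDataset", ""
    dataset = TASK_TO_DATASET.get(task)
    if dataset is None:
        return "RandomDataset", f"Task '{task}' is not supported, falling back to RandomDataset"
    if model_id is None:
        # Assumption: without a model ID there is no preprocessor to build real samples.
        return "RandomDataset", "No model ID given, falling back to RandomDataset"
    return dataset, ""
```

Returning the note instead of printing it keeps the sketch easy to assert on; a real CLI would more likely log it.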

Output behavior

Command	Input	Assertion
`winml quantize -m <tiny> --samples 4`	tiny	output exists at `<tiny.parent>/<tiny.stem>_qdq.onnx`
`winml quantize -m <tiny> -o <tmp>/out/custom.onnx --samples 4`	tiny	file exists at exact `-o` path
`winml quantize -m <tiny> -o <tmp>/missing/nested/custom.onnx --samples 4`	tiny, missing parent dir	command auto-creates parent dirs; file exists (regression: previously crashed with `FileNotFoundError` from `os.chdir`)
`winml quantize -m <tiny_ext>.onnx -o <tmp>/out/quant_ext.onnx --samples 4`	tiny + external data sidecar	both `quant_ext.onnx` and `quant_ext.onnx.data` exist
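The path conventions asserted above can be expressed with `pathlib`. This is a sketch; `resolve_output_path` and `sidecar_path` are hypothetical names, and only the `_qdq.onnx` suffix and the `.data` sidecar name come from the table:

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(model_path: str, output: Optional[str] = None) -> Path:
    """Use the explicit -o path if given, else <model.parent>/<model.stem>_qdq.onnx."""
    if output is not None:
        return Path(output)
    model = Path(model_path)
    return model.parent / f"{model.stem}_qdq.onnx"


def sidecar_path(output: Path) -> Path:
    """External-data file expected next to the model, e.g. quant_ext.onnx.data."""
    return output.with_name(output.name + ".data")
```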

Build-config precedence (CLI vs config file)

Command	Input	Assertion
`winml quantize -m <tiny> --config bc.json --samples 4`	tiny + `bc.json={"quant":{"samples":50,"calibration_method":"entropy"}}`	stdout shows `Samples: 4` (CLI wins) and `Method: entropy` (config used)
`winml quantize -m <tiny> --config bc.json --precision int16 --samples 4`	tiny + `bc.json={"quant":{}}`	weight zp dtype = `INT16` (regression: explicit `--precision` beats empty config)

Build-config key absorption sweep

For each `quant.*` key, assert that it is consumed when the CLI omits it. Verified by structural
inspection of the produced model (not stdout), except for the `task` row, which checks CLI output.

Command	Input	Assertion
`winml quantize -m <tiny> --config bc.json --samples 4`	`bc.json={"quant":{"weight_type":"int8"}}`	weight DQ zp dtype = `INT8`
`winml quantize -m <tiny> --config bc.json --samples 4`	`bc.json={"quant":{"per_channel":true}}`	weight DQ scale init has > 1 element
`winml quantize -m <tiny> --config bc.json --samples 4`	`bc.json={"quant":{"symmetric":true,"weight_type":"int8"}}`	every weight zp value == 0
`winml quantize -m <tiny> --config bc.json --samples 4`	`bc.json={"quant":{"task":"automatic-speech-recognition"}}`	CLI output contains `falling back to RandomDataset` (config task flowed to dataset selection)

Verbose

Command	Input	Assertion
`winml quantize -m <tiny> -o <out> --samples 4 -v` vs same without `-v`	tiny	verbose output strictly longer than default

Errors

Command	Input	Assertion
`winml quantize`	—	exit ≠ 0; stderr matches `Missing option .*--model`
`winml quantize -m <tmp>/nope.onnx`	non-existent path	exit ≠ 0; stderr matches `does not exist`
`winml quantize -m <tiny> --method gaussian`	tiny	exit ≠ 0; stderr matches `Invalid value for '--method'`
`winml quantize -m <tiny> --weight-type float8`	tiny	exit ≠ 0; stderr matches `Invalid value for '--weight-type'`
`winml quantize -m <bad>.onnx --samples 4`	random bytes	exit ≠ 0; stderr contains `Quantization failed` AND a parse-related substring (parse/protobuf/decode/load/invalid)
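The error assertions follow one pattern: a non-zero exit code plus a match on stderr. A minimal sketch of such helpers, assuming the CLI is invoked as a subprocess (the actual test harness is not shown in this PR, and both helper names are hypothetical):

```python
import re
import subprocess
import sys
from typing import Sequence

# Substrings accepted as evidence of a parse failure, taken from the last table row.
PARSE_HINTS = ("parse", "protobuf", "decode", "load", "invalid")


def assert_cli_failure(cmd: Sequence[str], stderr_pattern: str) -> str:
    """Run cmd, require a non-zero exit and a stderr regex match, return stderr."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    assert result.returncode != 0, "expected a non-zero exit code"
    assert re.search(stderr_pattern, result.stderr), result.stderr
    return result.stderr


def mentions_parse_failure(stderr: str) -> bool:
    """True if stderr reports 'Quantization failed' together with a parse-related hint."""
    lowered = stderr.lower()
    return "Quantization failed" in stderr and any(hint in lowered for hint in PARSE_HINTS)
```

A call such as `assert_cli_failure(["winml", "quantize"], r"Missing option .*--model")` would correspond to the first row.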

Total: 6 + 3 + 2 + 6 + 4 + 2 + 4 + 1 + 5 = 33 cases. All passing.

timenick

Review of #608. Comments inline.

🤖 Generated with Claude Code

Implement quantize e2e

c212927

zhenchaoni requested a review from a team as a code owner May 13, 2026 07:19

github-advanced-security AI found potential problems May 13, 2026

Comment thread tests/e2e/test_quantize_e2e.py Fixed

Comment thread tests/unit/commands/test_compile_quantize_flags.py Fixed

xieofxie reviewed May 13, 2026

Comment thread src/winml/modelkit/commands/quantize.py

timenick reviewed May 13, 2026

zhenchaoni mentioned this pull request May 13, 2026

feat(eval): SA eval pipeline with per-stage perf, quantize step, and workflow HTML report #599 (Open)

Resolve comments

55cdcc6

xieofxie approved these changes May 14, 2026

timenick approved these changes May 14, 2026

zhenchaoni merged commit 6835c01 into main May 14, 2026
9 checks passed

zhenchaoni deleted the private/zhenni/quantize_e2e branch May 14, 2026 05:55

zhenchaoni merged 2 commits into main from private/zhenni/quantize_e2e
