Summary
When disk space is exhausted during winml build, the optimize step writes a corrupted or truncated ONNX file. The subsequent quantize step then fails with ValueError: Failed to find proper ai.onnx domain. That message gives no indication that the real cause is an out-of-disk-space condition, so users end up chasing a phantom code bug.
Repro
# Fill disk to near-zero free space, then:
winml build -c config.json -m microsoft/resnet-50 -o model_a_config/
Error
ValueError: Failed to find proper ai.onnx domain
Raised in onnxruntime.quantization.quant_utils.get_opset_version.
Stack trace: quantizer.py:166 → quantize.py:909 → quantize.py:702 → quant_utils.py:984 → quant_utils.py:977 → ValueError
Root Cause
The optimize step leaves a truncated or zero-byte ONNX output because the write fails with an OSError (disk full). The corrupted file is then passed to the quantizer, which fails while trying to parse the opset domain. The OSError is either swallowed or never surfaced to the user.
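One way to keep a truncated file from ever reaching the quantizer is to write the optimized model to a temporary file, then rename it into place only after the write and fsync succeed. The sketch below is hypothetical. write_model_atomically and InsufficientDiskSpaceError are illustrative names, not existing ModelKit or winml APIs. It also assumes the serialized model is available as bytes and that the platform reports a full disk as errno.ENOSPC.

```python
import errno
import os
import tempfile


class InsufficientDiskSpaceError(RuntimeError):
    """Hypothetical error type for a disk-full condition during a model write."""


def write_model_atomically(data: bytes, dest: str) -> None:
    """Write serialized model bytes to dest without leaving a partial file behind.

    The bytes go to a temp file in the same directory first. Only after a
    successful write and fsync is the temp file renamed over dest, so a
    disk-full failure never leaves a truncated file at dest.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".onnx.tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk so ENOSPC surfaces here
        os.replace(tmp_path, dest)  # atomic rename on the same filesystem
    except OSError as exc:
        # Clean up the partial temp file instead of handing it downstream.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        if exc.errno == errno.ENOSPC:
            # Assumption: the OS reports a full disk as ENOSPC.
            raise InsufficientDiskSpaceError(
                f"Insufficient disk space: unable to write output file {dest}"
            ) from exc
        raise
```

Because the final path only ever holds a complete file, the quantize step would either see a valid model or no model at all. It would never see a truncated one that triggers the misleading opset-domain error.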
Expected behavior
ModelKit should either:
- Check available disk space before winml build starts and warn if it is below a threshold, or
- Catch OSError during ONNX file writes and surface a clear error: "Insufficient disk space — unable to write output file."
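The first option could look roughly like the pre-flight check below, built on the standard library's shutil.disk_usage. The function name check_free_space and the default threshold are hypothetical. The 20 GB default is borrowed from the bug-bash recommendation in the Notes, and a single-model build may need far less.

```python
import os
import shutil
import warnings

# Hypothetical default, based on the ~20 GB recommended for a full bug bash.
DEFAULT_MIN_FREE_BYTES = 20 * 1024**3


def check_free_space(output_dir: str, min_free_bytes: int = DEFAULT_MIN_FREE_BYTES) -> int:
    """Warn before a build if the volume holding output_dir is low on space.

    Returns the number of free bytes so the caller can log it.
    """
    # disk_usage needs an existing path, so walk up to the nearest existing parent
    # (the output directory usually does not exist yet before a build).
    path = os.path.abspath(output_dir)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        warnings.warn(
            f"Only {free / 1024**3:.1f} GB free on the volume for {output_dir}; "
            f"at least {min_free_bytes / 1024**3:.1f} GB is recommended. "
            "The build may fail with a misleading error if the disk fills up.",
            RuntimeWarning,
            stacklevel=2,
        )
    return free
```

This sketch only warns. A stricter variant could raise instead. The HuggingFace cache may also live on a different volume than the output directory, so a real check might need to cover both paths.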
Notes
- Severity: P2
- Reporter: Agency bug bash feedback
- Context: A full bug bash consumes significant disk space: the venv (~1.4 GB), the HuggingFace cache (~14.4 GB), and per-model artifacts (~200–400 MB each). At least ~20 GB of free disk space is recommended.
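As a rough budget check using the figures above, 20 GB leaves room for roughly 10 to 21 models' worth of artifacts. This report does not state how many models a full bug bash actually builds, and the calculation treats 200–400 MB as 0.2–0.4 GB.

```latex
\text{fixed} = 1.4 + 14.4 = 15.8\ \text{GB}, \qquad
20 - 15.8 = 4.2\ \text{GB}, \qquad
N_{\max} \approx \frac{4.2}{0.4} \approx 10
\ \text{to}\ \frac{4.2}{0.2} = 21\ \text{models}
```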