Closed
Description
Summary
MVP qualification for Qwen2.5-Coder-1.5B-Instruct fails with a 15.9% pass rate. The SafeTensors (ground truth) tests pass, but all format conversion tests fail.
Environment
- Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
- Playbook: qwen2.5-coder-1.5b-mvp.playbook.yaml
- Test matrix: 3 formats × 2 backends × 3 modalities + contract tests
Results
| Category | Status |
|---|---|
| SafeTensors (ground truth) | ✅ 7/7 passed |
| GGUF/APR conversions | ❌ 25 failed |
| Skipped (dependencies) | 12 |
| Total pass rate | 15.9% |
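The reported rate matches the table if skipped tests are counted in the denominator. That is an assumption about how the harness computes it; a quick check:

```python
# Counts taken from the Results table above.
passed, failed, skipped = 7, 25, 12

# Assumption: skipped tests count toward the total.
total = passed + failed + skipped
pass_rate = passed / total

print(f"{passed}/{total} = {pass_rate:.1%}")  # 7/44 = 15.9%
```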
Failure Categories
1. File Discovery Bug
Tests look for converted files in the gguf/ and apr/ subdirectories, but apr rosetta convert writes them to the model root with .converted inserted before the extension:
Expected: /path/to/model/gguf/model.gguf
Actual: /path/to/model/model.converted.gguf
Evidence:
F-CONV-G-A: No gguf file found in .../gguf/ or .../
F-CONV-A-G: No apr file found in .../apr/ or .../
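One possible fix on the test side is a lookup that accepts both layouts. The sketch below is illustrative only: `find_converted` is a hypothetical helper, not part of apr-qa, and the glob pattern is inferred from the file names in this report.

```python
from pathlib import Path

def find_converted(model_dir, ext):
    """Return converted files for `ext` (e.g. "gguf") under either layout."""
    root = Path(model_dir)
    found = []
    # Layout the tests expect: <model_dir>/<ext>/*.<ext>
    subdir = root / ext
    if subdir.is_dir():
        found += sorted(subdir.glob(f"*.{ext}"))
    # Layout observed in this report: <model_dir>/<name>.converted[...].<ext>
    found += sorted(root.glob(f"*.converted*.{ext}"))
    return found
```

Checking the subdirectory first keeps the existing expectation working while also picking up root-level output.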
2. 95% Output Difference (Possible LAYOUT-002)
Converted formats produce inference outputs that diverge almost completely from the expected output:
F-CONV-S-G: Conversion SafeTensors → Gguf produced different output (diff: 9.55e-1, ε: 1.00e-6)
F-CONV-S-A: Conversion SafeTensors → Apr produced different output (diff: 9.50e-1, ε: 1.00e-6)
F-CONV-RT-002: Round-trip conversion produced different output
F-CONV-IDEM-001: Idempotency failure: Gguf→Apr produced different output on second conversion
A diff of 0.95 (95%) suggests the outputs are almost completely different, not just precision loss. This pattern matches LAYOUT-002 (row-major vs column-major mismatch).
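To illustrate why a layout mismatch produces errors far beyond rounding, the toy example below reads the same flat weight buffer once as row-major and once as column-major. It is a generic demonstration in plain Python, not the apr code path, and the matrix values are made up.

```python
def matvec(flat, rows, cols, x, layout):
    """Multiply a rows x cols matrix stored in `flat` by vector x."""
    def at(r, c):
        # Row-major: rows are contiguous. Column-major: columns are contiguous.
        return flat[r * cols + c] if layout == "row" else flat[c * rows + r]
    return [sum(at(r, c) * x[c] for c in range(cols)) for r in range(rows)]

flat = [1, 2, 3, 4, 5, 6]      # 2x3 matrix written row-major
x = [1, 0, -1]

y_correct = matvec(flat, 2, 3, x, "row")   # [-2, -2]
y_wrong = matvec(flat, 2, 3, x, "col")     # [-4, -4]
print(y_correct, y_wrong)
```

Every weight is still present in the buffer, but most land in the wrong position, so the error is on the order of the output itself rather than near ε.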
Converted Files Generated
model.converted.apr 6.7G (SafeTensors→APR)
model.converted.converted.gguf 6.7G (APR→GGUF)
model.converted.converted.converted.apr 6.7G (GGUF→APR round-trip)
model.converted.converted.idem1.apr 6.7G (Idempotency test 1)
model.converted.converted.idem2.apr 6.7G (Idempotency test 2)
model.converted.converted.com_direct.apr 6.7G (Direct comparison)
model.converted.converted.com_via.com_indirect.apr 6.7G
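The stacked .converted segments in the first three names are consistent with each conversion replacing only the final extension. The helper below reproduces that pattern; `converted_name` is hypothetical and the rule is inferred from the listing, not taken from the apr source.

```python
from pathlib import PurePath

def converted_name(source, target_ext):
    """Inferred naming rule: drop the last extension, append .converted.<ext>."""
    stem = PurePath(source).stem   # "model.converted.apr" -> "model.converted"
    return f"{stem}.converted.{target_ext}"

name = "model.safetensors"
for ext in ("apr", "gguf", "apr"):
    name = converted_name(name, ext)
    print(name)
```

Under this rule each hop adds one more segment, which is why glob-based discovery needs to tolerate repeated .converted parts.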
Reproduction
cd apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-mvp.playbook.yaml
Expected Behavior
- Converted files should be discoverable by qualification tests
- Format conversions should preserve inference output within tolerance (ε: 1.00e-6)
- Round-trip conversions should be idempotent
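The tolerance requirement can be expressed as a simple comparison. This sketch assumes the diff is a maximum absolute difference over output values; the report does not say which metric the harness uses, and both function names are hypothetical.

```python
EPSILON = 1e-6  # tolerance quoted in the failure messages

def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two output vectors."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)

def within_tolerance(reference, candidate, eps=EPSILON):
    # Assumed metric: max absolute difference (not confirmed by the report).
    return max_abs_diff(reference, candidate) <= eps
```

With this check, a diff near 0.95 fails by roughly six orders of magnitude, which is why precision loss alone is an unlikely explanation.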
Cross-References
- LAYOUT-002 specification in aprender/CLAUDE.md
- Tensor Layout Contract in aprender/docs/specifications/qwen2.5-coder-showcase-demo.md Section E.8
- F-ROSETTA-CONVERT-001: SafeTensors conversion fails with NaN in tensor blk.0.attn_k.weight (#190) / apr bench: Support APR and SafeTensors formats (#191): Five-Whys analysis
Labels
P0, conversion, LAYOUT-002, qualification