
CONV-001: MVP qualification fails with 95% output difference after format conversion #205

@noahgift


Summary

MVP qualification for Qwen2.5-Coder-1.5B-Instruct fails with a 15.9% pass rate. The SafeTensors (ground truth) tests pass, but every format conversion test fails.

Environment

  • Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
  • Playbook: qwen2.5-coder-1.5b-mvp.playbook.yaml
  • Test matrix: 3 formats × 2 backends × 3 modalities + contract tests

Results

Category                     Status
SafeTensors (ground truth)   ✅ 7/7 passed
GGUF/APR conversions         ❌ 25 failed
Skipped (dependencies)       12
Total pass rate              15.9%
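The reported 15.9% is consistent with 7 passes out of 44 tests in total (7 passed + 25 failed + 12 skipped), which suggests that skipped tests count against the pass rate. Excluding them would give 7/32, about 21.9%. A minimal check of that arithmetic, assuming the pass rate is simply passed divided by total (pass_rate and its count_skipped flag are hypothetical, not part of apr-qa):

```rust
/// Pass rate in percent. `count_skipped` controls whether skipped tests are
/// part of the denominator (an assumption about how apr-qa computes it).
fn pass_rate(passed: u32, failed: u32, skipped: u32, count_skipped: bool) -> f64 {
    let counted_skipped = if count_skipped { skipped } else { 0 };
    let total = passed + failed + counted_skipped;
    if total == 0 {
        return 0.0;
    }
    100.0 * f64::from(passed) / f64::from(total)
}

fn main() {
    // Counts taken from the Results table above.
    println!("with skipped:    {:.1}%", pass_rate(7, 25, 12, true)); // 15.9%
    println!("without skipped: {:.1}%", pass_rate(7, 25, 12, false)); // 21.9%
}
```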

Failure Categories

1. File Discovery Bug

Tests look for converted files in the gguf/ and apr/ subdirectories, but apr rosetta convert writes its output to the model root, with .converted inserted before the file extension:

Expected: /path/to/model/gguf/model.gguf
Actual:   /path/to/model/model.converted.gguf

Evidence:

F-CONV-G-A: No gguf file found in .../gguf/ or .../
F-CONV-A-G: No apr file found in .../apr/ or .../
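A discovery routine that accepts both layouts could look like the sketch below. It is hypothetical: find_converted and is_converted_artifact are illustrative names, not apr-qa functions, and the .converted. naming rule is inferred from the paths in this issue. Because several conversions can leave files with the same extension at the root (see the Converted Files list), it returns every match sorted and leaves disambiguation to the caller; having tests use the output path the converter actually wrote would likely be more robust than pattern matching.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// True if `file_name` looks like a root-level conversion output,
/// e.g. "model.converted.gguf" for ext = "gguf" (naming inferred from this issue).
fn is_converted_artifact(file_name: &str, ext: &str) -> bool {
    file_name.ends_with(&format!(".{ext}")) && file_name.contains(".converted.")
}

/// Hypothetical discovery helper: first look in the `<ext>/` subdirectory the
/// tests currently expect, then fall back to `.converted.` files at the model root.
fn find_converted(model_dir: &Path, ext: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let subdir = model_dir.join(ext);
    if subdir.is_dir() {
        for entry in fs::read_dir(&subdir)? {
            let path = entry?.path();
            if path.extension().and_then(|e| e.to_str()) == Some(ext) {
                found.push(path);
            }
        }
    }
    if found.is_empty() {
        for entry in fs::read_dir(model_dir)? {
            let path = entry?.path();
            let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
            if path.is_file() && is_converted_artifact(name, ext) {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}

fn main() -> io::Result<()> {
    // Model directory from the command line, defaulting to the current directory.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for ext in ["gguf", "apr"] {
        let hits = find_converted(Path::new(&dir), ext)?;
        println!("{ext}: {hits:?}");
    }
    Ok(())
}
```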

2. 95% Output Difference (Possible LAYOUT-002)

Inference on the converted models produces outputs that are almost entirely different from the SafeTensors baseline:

F-CONV-S-G: Conversion SafeTensors → Gguf produced different output (diff: 9.55e-1, ε: 1.00e-6)
F-CONV-S-A: Conversion SafeTensors → Apr produced different output (diff: 9.50e-1, ε: 1.00e-6)
F-CONV-RT-002: Round-trip conversion produced different output
F-CONV-IDEM-001: Idempotency failure: Gguf→Apr produced different output on second conversion

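A diff of 0.95 (95%) is roughly six orders of magnitude above the tolerance (ε = 1e-6) and means the outputs are almost completely different; precision loss alone would be expected to produce far smaller deviations. This pattern is consistent with LAYOUT-002 (a row-major vs column-major mismatch), although the root cause is not yet confirmed.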
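To illustrate why a layout mismatch produces a near-total difference while precision loss does not, the sketch below multiplies the same weight buffer by a vector twice: once reading it row-major, and once misreading it as column-major (effectively using the transpose). It is a toy 3×3 example, not the model's actual code path, and rel_l2_diff is a hypothetical metric that may differ from the diff apr-qa reports.

```rust
/// y = W x with W stored row-major: element (r, c) lives at w[r * cols + c].
fn matvec_row_major(w: &[f64], rows: usize, cols: usize, x: &[f64]) -> Vec<f64> {
    (0..rows)
        .map(|r| (0..cols).map(|c| w[r * cols + c] * x[c]).sum::<f64>())
        .collect()
}

/// The same buffer misread as column-major: element (r, c) taken from w[c * rows + r].
fn matvec_misread_col_major(w: &[f64], rows: usize, cols: usize, x: &[f64]) -> Vec<f64> {
    (0..rows)
        .map(|r| (0..cols).map(|c| w[c * rows + r] * x[c]).sum::<f64>())
        .collect()
}

/// Relative L2 difference ||a - b|| / ||a|| (hypothetical metric, not necessarily apr-qa's).
fn rel_l2_diff(a: &[f64], b: &[f64]) -> f64 {
    let num = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt();
    let den = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    num / den
}

fn main() {
    let w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    let x = [1.0, 2.0, 3.0];
    let reference = matvec_row_major(&w, 3, 3, &x); // [14, 32, 50]
    // Same weights read with the wrong layout.
    let misread = matvec_misread_col_major(&w, 3, 3, &x); // [30, 36, 42]
    // Same layout, but a small uniform relative error (simulated precision loss).
    let perturbed: Vec<f64> = w.iter().map(|v| v * (1.0 + 1e-4)).collect();
    let noisy = matvec_row_major(&perturbed, 3, 3, &x);
    println!("layout mismatch: {:.2e}", rel_l2_diff(&reference, &misread));
    println!("precision loss:  {:.2e}", rel_l2_diff(&reference, &noisy));
}
```

The layout mismatch yields a relative difference around 0.3 even in this tiny example, while the perturbation stays near 1e-4; in a full network the mismatch compounds across layers.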

Converted Files Generated

model.converted.apr                      6.7G  (SafeTensors→APR)
model.converted.converted.gguf           6.7G  (APR→GGUF)
model.converted.converted.converted.apr  6.7G  (GGUF→APR round-trip)
model.converted.converted.idem1.apr      6.7G  (Idempotency test 1)
model.converted.converted.idem2.apr      6.7G  (Idempotency test 2)
model.converted.converted.com_direct.apr 6.7G  (Direct comparison)
model.converted.converted.com_via.com_indirect.apr 6.7G
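The first three entries suggest how the names chain: each conversion appears to drop the input's last extension and append .converted plus the target extension, which is why several files with the same extension pile up at the root. A sketch of that inferred rule; converted_name is a hypothetical helper, the starting name model.safetensors is an assumption, and the idem/com entries use extra tags this rule does not cover:

```rust
/// Output name produced by one conversion, as inferred from the listing above:
/// strip the last extension, then append ".converted.<target_ext>".
fn converted_name(input: &str, target_ext: &str) -> String {
    let stem = match input.rfind('.') {
        Some(i) => &input[..i],
        None => input,
    };
    format!("{stem}.converted.{target_ext}")
}

fn main() {
    // Assumed original file name; the issue does not state it.
    let mut name = String::from("model.safetensors");
    // SafeTensors -> APR -> GGUF -> APR, matching the first three listed files.
    for target in ["apr", "gguf", "apr"] {
        let next = converted_name(&name, target);
        println!("{name} -> {next}");
        name = next;
    }
}
```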

Reproduction

cd apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-mvp.playbook.yaml

Expected Behavior

  1. Converted files should be discoverable by qualification tests
  2. Format conversions should preserve inference output within tolerance (ε: 1.00e-6)
  3. Round-trip conversions should reproduce the original output, and repeating a conversion should be idempotent
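Expectations 2 and 3 amount to three pairwise comparisons: converted vs. SafeTensors output, round-trip vs. SafeTensors output, and first vs. second conversion. A minimal sketch, assuming outputs are compared element-wise by absolute difference against ε (the metric behind the diff values above is not stated in this issue); within_tolerance and failed_checks are hypothetical helpers.

```rust
/// True if both outputs have the same length and every element differs by at most `eps`.
/// A NaN difference fails the check because `NaN <= eps` is false.
fn within_tolerance(a: &[f32], b: &[f32], eps: f32) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| (x - y).abs() <= eps)
}

/// Names of the expectations that fail (hypothetical helper).
fn failed_checks(
    baseline: &[f32],
    converted: &[f32],
    round_trip: &[f32],
    first: &[f32],
    second: &[f32],
    eps: f32,
) -> Vec<&'static str> {
    let mut failed = Vec::new();
    if !within_tolerance(baseline, converted, eps) {
        failed.push("conversion");
    }
    if !within_tolerance(baseline, round_trip, eps) {
        failed.push("round-trip");
    }
    if !within_tolerance(first, second, eps) {
        failed.push("idempotency");
    }
    failed
}

fn main() {
    let eps = 1.0e-6;
    let baseline = [0.10_f32, 0.20, 0.30];
    let drifted = [0.10_f32, 0.20, 0.90];
    // Conversion and round trip drift; two repeated conversions agree with each other.
    let failed = failed_checks(&baseline, &drifted, &drifted, &drifted, &drifted, eps);
    println!("failed: {failed:?}"); // ["conversion", "round-trip"]
}
```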

Cross-References

Labels

P0, conversion, LAYOUT-002, qualification
