CONV-001: MVP qualification fails with 95% output difference after format conversion

## Summary

MVP qualification for Qwen2.5-Coder-1.5B-Instruct fails with a 15.9% pass rate (7 of 44 tests). The SafeTensors (ground truth) tests pass, but all 25 GGUF/APR conversion tests fail.

## Environment

- Model: `Qwen/Qwen2.5-Coder-1.5B-Instruct`
- Playbook: `qwen2.5-coder-1.5b-mvp.playbook.yaml`
- Test matrix: 3 formats × 2 backends × 3 modalities + contract tests

## Results

| Category | Status |
|----------|--------|
| SafeTensors (ground truth) | ✅ 7/7 passed |
| GGUF/APR conversions | ❌ 25 failed |
| Skipped (dependencies) | 12 |
| **Total pass rate** | **15.9%** |
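
The 15.9% figure only works out if skipped tests are counted in the denominator, which the table implies but does not state. A quick arithmetic check (variable names are illustrative and not taken from the harness):

```python
# Counts copied from the Results table.
passed, failed, skipped = 7, 25, 12

total = passed + failed + skipped                   # 44 results reported
pass_rate = passed / total                          # skipped tests count against the rate
rate_excluding_skipped = passed / (passed + failed)

print(f"{pass_rate:.1%}")                # 15.9%
print(f"{rate_excluding_skipped:.1%}")   # 21.9%
```

Excluding the skipped tests would give 21.9%, so the reported rate treats a skip as a non-pass.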

## Failure Categories

### 1. File Discovery Bug

Tests look for converted files in `gguf/` and `apr/` subdirectories, but `apr rosetta convert` writes them to the model root with a `.converted.` infix in the filename:

```
Expected: /path/to/model/gguf/model.gguf
Actual:   /path/to/model/model.converted.gguf
```

Evidence:
```
F-CONV-G-A: No gguf file found in .../gguf/ or .../
F-CONV-A-G: No apr file found in .../apr/ or .../
```
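
One possible shape for a more tolerant lookup is sketched below in Python. It is not the harness's code: `find_converted_file`, the search order (format subdirectory first, then the model root), and the shortest-name tie-break are all assumptions made for illustration. The elided `...` paths in the log are left as they are, so the directories probed by the real tests may differ.

```python
from pathlib import Path
from typing import Optional


def find_converted_file(model_dir: Path, ext: str) -> Optional[Path]:
    """Return a `*.<ext>` file from `<model_dir>/<ext>/` or `<model_dir>/`.

    Hypothetical helper: search order and tie-breaking are assumptions.
    """
    for directory in (model_dir / ext, model_dir):
        if not directory.is_dir():
            continue
        # Matching on the extension alone accepts both `model.gguf` and
        # `model.converted.gguf`. Shortest name first, so a single
        # conversion wins over `model.converted.converted.gguf`.
        candidates = sorted(
            directory.glob(f"*.{ext}"),
            key=lambda p: (len(p.name), p.name),
        )
        if candidates:
            return candidates[0]
    return None
```

Called as `find_converted_file(Path("/path/to/model"), "gguf")`, this would pick up `model.converted.gguf` at the root when no `gguf/` subdirectory exists. The alternative fix is to have the converter write into the subdirectory the tests expect.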

### 2. 95% Output Difference (Possible LAYOUT-002)

Converted formats produce inference outputs that differ almost completely from the reference output:

```
F-CONV-S-G: Conversion SafeTensors → Gguf produced different output (diff: 9.55e-1, ε: 1.00e-6)
F-CONV-S-A: Conversion SafeTensors → Apr produced different output (diff: 9.50e-1, ε: 1.00e-6)
F-CONV-RT-002: Round-trip conversion produced different output
F-CONV-IDEM-001: Idempotency failure: Gguf→Apr produced different output on second conversion
```

A diff of 0.95 (95%) is nearly six orders of magnitude above the tolerance (ε = 1e-6), so the outputs are almost completely different; this is not precision loss. The pattern is consistent with **LAYOUT-002** (row-major vs column-major mismatch).
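
To illustrate why a layout mismatch produces this kind of error rather than a small numerical drift, here is a toy Python sketch. It reads one flat weight buffer once as row-major and once as column-major and compares the resulting matrix-vector products. The 3×4 matrix, the input vector, and the relative L2 metric are all made up for the example; the issue does not say how the harness computes `diff`.

```python
import math


def matvec_row_major(flat, rows, cols, x):
    """y = W @ x, reading `flat` as a rows x cols matrix in row-major order."""
    return [sum(flat[r * cols + c] * x[c] for c in range(cols)) for r in range(rows)]


def matvec_col_major(flat, rows, cols, x):
    """y = W @ x, reading the same buffer as column-major instead."""
    return [sum(flat[c * rows + r] * x[c] for c in range(cols)) for r in range(rows)]


def relative_diff(a, b):
    """L2 distance between a and b, normalised by the norm of a (assumed metric)."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    den = math.sqrt(sum(p * p for p in a))
    return num / den


rows, cols = 3, 4
# Toy weights, written by the producer in row-major order.
flat = [0.5, -1.0, 2.0, 0.25,
        1.5, 0.75, -0.5, 1.0,
        -2.0, 0.1, 0.3, -0.7]
x = [1.0, -2.0, 0.5, 3.0]

reference = matvec_row_major(flat, rows, cols, x)
misread = matvec_col_major(flat, rows, cols, x)          # layout bug
rounded = [v * (1 + 1e-8) for v in reference]            # precision-level noise

print(relative_diff(reference, misread))   # order 1, far above 1e-6
print(relative_diff(reference, rounded))   # around 1e-8, below 1e-6
```

The misread buffer yields an error comparable in size to the output itself, while simulated rounding noise stays under the 1e-6 tolerance. That gap is what separates a layout bug from quantisation or dtype loss.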

## Converted Files Generated

```
model.converted.apr                      6.7G  (SafeTensors→APR)
model.converted.converted.gguf           6.7G  (APR→GGUF)
model.converted.converted.converted.apr  6.7G  (GGUF→APR round-trip)
model.converted.converted.idem1.apr      6.7G  (Idempotency test 1)
model.converted.converted.idem2.apr      6.7G  (Idempotency test 2)
model.converted.converted.com_direct.apr 6.7G  (Direct comparison)
model.converted.converted.com_via.com_indirect.apr 6.7G
```

## Reproduction

```bash
cd apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-mvp.playbook.yaml
```

## Expected Behavior

1. Converted files should be discoverable by qualification tests
2. Format conversions should preserve inference output within tolerance (ε: 1.00e-6)
3. Round-trip conversions should reproduce the original output, and repeating a conversion should yield the same output (idempotency)
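
The output-preservation, round-trip, and idempotency expectations can be written as small predicates. The Python sketch below is hypothetical: the function names, the max-absolute-difference metric, and the toy "model" are assumptions, and idempotency is read as "the same conversion run twice gives the same output", which is what the `idem1`/`idem2` files suggest.

```python
from typing import Callable, List, Sequence

EPSILON = 1.0e-6  # tolerance quoted in the failure messages

Weights = List[float]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(p - q) for p, q in zip(a, b)), default=0.0)


def preserves_output(reference, candidate, eps: float = EPSILON) -> bool:
    return max_abs_diff(reference, candidate) <= eps


def round_trip_ok(model, there: Callable, back: Callable, infer: Callable) -> bool:
    """A -> B -> A should infer the same output as the original A."""
    return preserves_output(infer(model), infer(back(there(model))))


def repeat_ok(model, convert: Callable, infer: Callable) -> bool:
    """Running the same conversion twice should give the same output."""
    return preserves_output(infer(convert(model)), infer(convert(model)))


# Toy stand-ins: a "model" is a list of weights; inference is a weighted sum.
def toy_infer(weights: Weights) -> List[float]:
    return [sum(w * i for i, w in enumerate(weights))]


def lossless(weights: Weights) -> Weights:
    return list(weights)


def reversing(weights: Weights) -> Weights:
    return list(reversed(weights))  # stands in for a layout bug


model = [0.5, -1.0, 2.0]
print(round_trip_ok(model, lossless, lossless, toy_infer))    # True
print(round_trip_ok(model, reversing, lossless, toy_infer))   # False
print(repeat_ok(model, reversing, toy_infer))                 # True
```

The last line shows that a deterministic but wrong conversion still passes a repeat check, so the idempotency test alone cannot detect a layout bug; the comparison against the SafeTensors reference is the one that does.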

## Cross-References

- LAYOUT-002 specification in `aprender/CLAUDE.md`
- Tensor Layout Contract in `aprender/docs/specifications/qwen2.5-coder-showcase-demo.md` Section E.8
- GH-190/GH-191 Five-Whys analysis

## Labels

P0, conversion, LAYOUT-002, qualification
