Summary
GGUF inference produces garbage output like "Domainuster random_ random random.Mult" instead of correct answers.
Evidence
apr trace hf://Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF/qwen2.5-coder-0.5b-instruct-q4_k_m.gguf --payload
Test prompt: "What is 2+2?"
Expected: "4" or "The answer is 4"
Actual: "Domainuster random_ random random.Mult"
Root Cause
LAYOUT-001: Column-major vs row-major kernel mismatch in quantized matmul.
GGUF/APR use ROW-MAJOR layout but some kernels assume COLUMN-MAJOR.
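To illustrate the failure mode, here is a minimal sketch of how the same weight buffer yields different results under the two indexing conventions. It uses plain f32 weights rather than Q4K/Q6K blocks, and the function names are hypothetical; this is not the code in realizar or trueno, only a model of the mismatch described above.

```rust
/// Reference mat-vec. `w` holds a `rows x cols` matrix in ROW-MAJOR order,
/// so element (r, c) lives at index `r * cols + c`.
fn matvec_row_major(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| (0..cols).map(|c| w[r * cols + c] * x[c]).sum::<f32>())
        .collect()
}

/// Hypothetical buggy kernel: same buffer, but it assumes COLUMN-MAJOR
/// storage and reads element (r, c) from index `c * rows + r`.
fn matvec_assuming_col_major(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| (0..cols).map(|c| w[c * rows + r] * x[c]).sum::<f32>())
        .collect()
}
```

For the 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored row-major as [1, 2, 3, 4, 5, 6] and x = [1, 1, 1], the first function returns [6, 15] while the second returns [9, 12]. Neither call panics or goes out of bounds, so a mismatch of this kind would show up only as wrong logits, which is consistent with the garbage tokens reported here.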
Acceptance Criteria
apr trace --payload on GGUF shows correct "4" output
Files to Investigate
realizar/src/gguf_monolith.rs - forward_single_with_cache
realizar/src/quantize/ - Q4K/Q6K kernels
trueno/src/backends/q4k.rs - matmul kernels
Labels
bug, P0, LAYOUT-001, quantization