v0.41.0

@noahgift noahgift released this 11 Jun 04:50
1f6b21a

Fixed

  • Q2_K dequantization now matches ggml, fixing corrupt Q2_K output. Both
    Q2_K dequant implementations used a "16 sub-blocks reading qs[j*4]" scheme
    that applied the wrong super-block scale to the wrong 2-bit lanes,
    producing corrupt F32 output (185/256 elements wrong vs ggml/candle on a
    representative block). This silently corrupted every Q2_K/Q2_K_S model
    (common on HF) on both the format path (apr tensors/inspect/validate/convert)
    and the inference path (apr run/serve). Both implementations (the
    aprender-core format dequant and the aprender-serve inference dequant) now
    match ggml dequantize_row_q2_K / candle BlockQ2K::to_float byte-for-byte
    (golden falsifiers FT-Q2K-001/002, contract
    contracts/q2k-dequant-parity-v1.yaml).
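
For readers unfamiliar with the layout, the following is a minimal Rust sketch of how a single Q2_K super-block is traversed in the ggml reference routine `dequantize_row_q2_K`. It is not the aprender code: the function name `dequantize_q2k_block` and its signature are hypothetical, and it assumes the f16 fields `d` and `dmin` have already been converted to `f32`. The point it illustrates is the ordering: each 32-byte half of `qs` is read four times (bit shifts 0, 2, 4, 6), and every group of 16 outputs consumes the next scale byte in sequence.

```rust
/// Elements per Q2_K super-block (QK_K in ggml).
const QK_K: usize = 256;

/// Illustrative sketch (hypothetical name and signature, not the aprender API).
/// `scales`: 16 bytes, low nibble = 4-bit scale, high nibble = 4-bit min.
/// `qs`: 64 bytes holding 256 two-bit quants.
/// `d`, `dmin`: super-block scale and min, assumed already converted f16 -> f32.
fn dequantize_q2k_block(scales: &[u8; 16], qs: &[u8; 64], d: f32, dmin: f32) -> [f32; QK_K] {
    let mut out = [0.0f32; QK_K];
    let mut y = 0; // next output index
    let mut is = 0; // next scale byte, consumed strictly in order

    // The 64 quant bytes are processed as two halves of 32 bytes (128 outputs each).
    for half in 0..2 {
        let q = &qs[half * 32..half * 32 + 32];
        // Each half is read four times, once per 2-bit lane.
        for shift in [0u32, 2, 4, 6] {
            // Bytes 0..16 and 16..32 of the half each get their own scale byte.
            for sub in 0..2 {
                let sc = scales[is];
                is += 1;
                let dl = d * f32::from(sc & 0x0F);
                let ml = dmin * f32::from(sc >> 4);
                for l in 0..16 {
                    let v = (q[sub * 16 + l] >> shift) & 3;
                    out[y] = dl * f32::from(v) - ml;
                    y += 1;
                }
            }
        }
    }
    out
}
```

Because consecutive outputs come from consecutive bytes at a fixed bit shift, rather than from the four lanes of a single byte, a loop that indexes `qs` by sub-block number pairs quants with the wrong scale byte, which is consistent with the corruption described above.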