v0.37.0

@noahgift noahgift released this 11 Jun 04:50
· 65 commits to main since this release
792d69b

Added

  • Sharded GGUF pull (#1893, pull side): apr pull now detects and downloads
    complete split-GGUF model sets (<prefix>-NNNNN-of-MMMMM.gguf). Modern 7B+
    GGUFs ship split with no index.json (unlike sharded SafeTensors). Previously,
    apr pull ran them through select_best_gguf, which silently grabbed a single
    part and produced a broken, incomplete model. Now resolve_hf_model detects
    the complete -of- set (rejecting single-file, multi-quant, and incomplete
    sets), and run_sharded_gguf downloads all parts via a no-index path (no
    SafeTensors conversion), pointing usage at the first part. Contract:
    contracts/sharded-gguf-pull-v1.yaml (6 falsifiers, FT-SHGGUF-001..006, plus
    2 kani harnesses, all passing).
    • Scope: this is the pull side only. Cross-shard inference in
      aprender-serve (reading split.count and loading tensors across parts so
      that apr run and apr serve work on a split GGUF) is the documented
      follow-up (#1893 criterion 2), targeted for the next release.
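
The detection rule described above can be illustrated with a minimal sketch. This is not the code in resolve_hf_model: the function names parse_shard and complete_split_set are hypothetical, and the sketch assumes five-digit, 1-indexed part numbers and that non-shard files in the listing are simply ignored.

```rust
/// Hypothetical helper: parse `<prefix>-NNNNN-of-MMMMM.gguf` into
/// (prefix, part index, part count). Assumes five-digit numeric fields.
fn parse_shard(name: &str) -> Option<(&str, u32, u32)> {
    let stem = name.strip_suffix(".gguf")?;
    let (rest, total) = stem.rsplit_once("-of-")?;
    let (prefix, index) = rest.rsplit_once('-')?;
    let is_num = |s: &str| s.len() == 5 && s.bytes().all(|b| b.is_ascii_digit());
    if prefix.is_empty() || !is_num(index) || !is_num(total) {
        return None;
    }
    Some((prefix, index.parse().ok()?, total.parse().ok()?))
}

/// Hypothetical helper: return the part names in order if `files` contains
/// exactly one complete split set, otherwise `None`.
fn complete_split_set(files: &[&str]) -> Option<Vec<String>> {
    // Files that do not match the shard pattern are ignored (an assumption).
    let shards: Vec<(&str, u32, u32)> =
        files.iter().copied().filter_map(|f| parse_shard(f)).collect();

    // No shard at all means a single-file (or non-GGUF) listing: reject.
    let (prefix, _, total) = *shards.first()?;

    // Reject multi-quant listings: all shards must share one prefix and count.
    if shards.iter().any(|&(p, _, t)| p != prefix || t != total) {
        return None;
    }

    // Reject incomplete sets: indices 1..=total must each appear exactly once.
    let mut indices: Vec<u32> = shards.iter().map(|s| s.1).collect();
    indices.sort_unstable();
    if total < 2 || indices != (1..=total).collect::<Vec<u32>>() {
        return None;
    }

    // Rebuild the names in part order; the first entry is the entry point.
    Some(
        (1..=total)
            .map(|i| format!("{prefix}-{i:05}-of-{total:05}.gguf"))
            .collect(),
    )
}
```

In this sketch, a listing with both parts of a two-part set yields the parts sorted by index, while a single .gguf file, a listing with a missing part, or a listing that mixes two quantizations yields None.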