v0.6.0

PCfVW tagged this 21 May 09:07
Phase 6 ships:
  * a unified `amn convert` dispatch that covers every format pair
    available in v0.6.0 through a single CLI subcommand;
  * `write_gguf`, the format-symmetric inverse of `parse_gguf`
    (scalar dtypes today; quantised dtypes are reserved for Phase 7.5
    and will go through the same scaffold);
  * the end-to-end BF16 -> BnB-NF4 safetensors path, using the
    four-tensor companion layout (`weight`, `weight.absmax`,
    `weight.quant_map`, `weight.quant_state.bitsandbytes__nf4`).

New library helpers:
  * `write_gguf` / `write_gguf_to_writer` / `GgufWriteTensor`
  * `npz_to_safetensors` / `npz_to_safetensors_bytes`
  * `write_bnb_nf4_safetensors` / `write_bnb_nf4_safetensors_bytes`
  * `BnbWriteInput` / `BnbNf4WriteStats` / `is_eligible_for_nf4` / `classify_inputs`
  * `NF4_BLOCK_SIZE`
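For orientation on what a GGUF writer has to emit, here is a hedged, std-only sketch of the GGUF v3 container for scalar F32 tensors: a header (magic `GGUF`, version, tensor count, metadata count), one info record per tensor (name, dims, ggml type id, data offset), then zero-padded, 32-byte-aligned tensor data. `F32Tensor` and `minimal_gguf_bytes` are hypothetical and are not the crate's `GgufWriteTensor` / `write_gguf`. The sketch writes no metadata keys, so loaders that expect keys such as `general.architecture` would not accept its output as a model.

```rust
/// GGUF magic and version written by this sketch (GGUF v3, little-endian).
const GGUF_MAGIC: &[u8; 4] = b"GGUF";
const GGUF_VERSION: u32 = 3;
/// Default data alignment when no `general.alignment` key is present.
const ALIGNMENT: usize = 32;
/// ggml type id for F32.
const GGML_TYPE_F32: u32 = 0;

/// Hypothetical tensor description (F32 only).
struct F32Tensor<'a> {
    name: &'a str,
    /// GGUF dimension order: innermost (fastest-varying) dimension first,
    /// i.e. the reverse of a row-major shape such as numpy's.
    dims: Vec<u64>,
    data: &'a [f32],
}

fn align_up(x: usize) -> usize {
    (x + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT
}

/// GGUF string: u64 byte length followed by UTF-8 bytes, no terminator.
fn push_gguf_string(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u64).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

fn minimal_gguf_bytes(tensors: &[F32Tensor]) -> Vec<u8> {
    let mut buf = Vec::new();
    // Header: magic, version, tensor count, metadata key/value count.
    buf.extend_from_slice(GGUF_MAGIC);
    buf.extend_from_slice(&GGUF_VERSION.to_le_bytes());
    buf.extend_from_slice(&(tensors.len() as u64).to_le_bytes());
    buf.extend_from_slice(&0u64.to_le_bytes()); // no metadata in this sketch

    // Tensor infos: offsets are relative to the start of the data section.
    let mut offset = 0usize;
    for t in tensors {
        push_gguf_string(&mut buf, t.name);
        buf.extend_from_slice(&(t.dims.len() as u32).to_le_bytes());
        for d in &t.dims {
            buf.extend_from_slice(&d.to_le_bytes());
        }
        buf.extend_from_slice(&GGML_TYPE_F32.to_le_bytes());
        buf.extend_from_slice(&(offset as u64).to_le_bytes());
        offset = align_up(offset + t.data.len() * 4);
    }

    // Zero-pad so the data section starts on an aligned boundary.
    buf.resize(align_up(buf.len()), 0);
    let data_start = buf.len();
    for t in tensors {
        for v in t.data {
            buf.extend_from_slice(&v.to_le_bytes());
        }
        // Pad after each tensor so the next one begins at its recorded offset.
        let rel = buf.len() - data_start;
        buf.resize(data_start + align_up(rel), 0);
    }
    buf
}

fn main() {
    let data = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    // A 3x2 row-major matrix is described with GGUF dims [2, 3].
    let t = F32Tensor { name: "w", dims: vec![2, 3], data: &data };
    let bytes = minimal_gguf_bytes(&[t]);
    println!("{} bytes, magic {:?}", bytes.len(), &bytes[..4]);
}
```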

New CLI surface:
  * `amn convert <input> --to {safetensors|gguf|bnb-nf4}`
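A single subcommand covering several format pairs usually comes down to parsing the `--to` value and matching it against the input format. A minimal sketch, assuming the input format is inferred from the file extension (an assumption; `amn` may inspect file contents instead). `Target`, `Route` and `route` are hypothetical names, the `pt` extension alias is an assumption, and the route table lists only the four pairs from the performance table, so the real dispatch may accept more.

```rust
use std::path::Path;
use std::str::FromStr;

/// Values accepted by `--to` in these notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Safetensors,
    Gguf,
    BnbNf4,
}

impl FromStr for Target {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "safetensors" => Ok(Target::Safetensors),
            "gguf" => Ok(Target::Gguf),
            "bnb-nf4" => Ok(Target::BnbNf4),
            other => Err(format!("unsupported --to value: {other}")),
        }
    }
}

/// Hypothetical conversion routes (the pairs in the performance table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    NpzToSafetensors,
    PthToSafetensors,
    SafetensorsToGguf,
    SafetensorsToBnbNf4,
}

/// Pick a route from the input file extension and the requested target.
fn route(input: &Path, to: Target) -> Result<Route, String> {
    let ext = input.extension().and_then(|e| e.to_str()).unwrap_or("");
    match (ext, to) {
        ("npz", Target::Safetensors) => Ok(Route::NpzToSafetensors),
        // "pt" as an alias for "pth" is an assumption of this sketch.
        ("pth" | "pt", Target::Safetensors) => Ok(Route::PthToSafetensors),
        ("safetensors", Target::Gguf) => Ok(Route::SafetensorsToGguf),
        ("safetensors", Target::BnbNf4) => Ok(Route::SafetensorsToBnbNf4),
        _ => Err(format!("no conversion from '.{ext}' to {to:?}")),
    }
}

fn main() {
    let to: Target = "bnb-nf4".parse().expect("valid target");
    println!("{:?}", route(Path::new("model.safetensors"), to));
}
```

Keeping the `(input, target)` match in one function is what lets new targets slot into the same subcommand without touching the CLI parsing.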

13 byte-exact integration tests in `tests/cross_validation_convert.rs`
cover every v0.6.0 conversion pair, in both directions where the
conversion is reversible. They are joined by a size-matched perf
comparison (`t14`, `#[ignore]`d; opt in with `--ignored`) against six
checked-in Python sidecars (numpy / safetensors-py / torch.load +
safetensors.torch / gguf-py / bitsandbytes-CPU, plus two PyTorch-CPU
equivalents for the non-PyTorch paths).
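Byte-exact checks compare serialized bytes, not just decoded tensor values. As a hedged illustration of the kind of container such checks look at, here is a minimal std-only sketch of the safetensors layout (an 8-byte little-endian header length, a JSON header padded with spaces to 8 bytes, then the raw tensor data) with a round-trip check. `to_safetensors` and `split_safetensors` are hypothetical helpers, not the crate's `npz_to_safetensors_bytes`; the sketch handles F32 only, keeps tensors in input order and assumes names need no JSON escaping, so it makes no claim to match safetensors-py byte-for-byte.

```rust
/// Serialize named F32 tensors (name, shape, values) into safetensors bytes.
fn to_safetensors(tensors: &[(&str, Vec<usize>, Vec<f32>)]) -> Vec<u8> {
    let mut entries = Vec::new();
    let mut data = Vec::new();
    for (name, shape, values) in tensors {
        let begin = data.len();
        for v in values {
            data.extend_from_slice(&v.to_le_bytes());
        }
        let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
        // Assumes `name` needs no JSON escaping.
        entries.push(format!(
            "\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            name,
            dims.join(","),
            begin,
            data.len()
        ));
    }
    let mut header = format!("{{{}}}", entries.join(",")).into_bytes();
    // Pad the header with spaces so the data section starts 8-byte aligned.
    while header.len() % 8 != 0 {
        header.push(b' ');
    }
    let mut out = Vec::with_capacity(8 + header.len() + data.len());
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(&header);
    out.extend_from_slice(&data);
    out
}

/// Split safetensors bytes into (header JSON without padding, data section).
fn split_safetensors(bytes: &[u8]) -> Option<(&str, &[u8])> {
    let mut len = [0u8; 8];
    len.copy_from_slice(bytes.get(..8)?);
    let end = (u64::from_le_bytes(len) as usize).checked_add(8)?;
    let header = std::str::from_utf8(bytes.get(8..end)?).ok()?;
    Some((header.trim_end_matches(' '), &bytes[end..]))
}

fn main() {
    let bytes = to_safetensors(&[("w", vec![2], vec![1.0, 2.0])]);
    let (header, data) = split_safetensors(&bytes).expect("well-formed");
    println!("{header}");
    // Round trip: the data section is the little-endian payload, unchanged.
    let expected = [1.0f32.to_le_bytes(), 2.0f32.to_le_bytes()].concat();
    assert_eq!(data, expected.as_slice());
}
```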

Measured CPU performance vs Python at 4096x4096 (release build,
`target-cpu=native`):

  npz -> safetensors          11.2 ms  vs 75.7 ms numpy   (6.75x)  /  92.5 ms torch (8.24x)
  pth -> safetensors           5.7 ms  vs 29.6 ms torch   (5.18x)
  safetensors-BF16 -> GGUF    13.6 ms  vs 15.1 ms gguf-py (1.11x)  /  29.6 ms torch+gguf-py (2.17x)
  safetensors-BF16 -> BnB-NF4  141 ms  vs 377 ms bnb-CPU  (2.67x)

Quantised GGUF output (`gguf-q4km`, etc.) lands in v0.7.5 / Phase 7.5
through the same dispatch.