AbstractCore v2.13.22

@github-actions github-actions released this 20 May 20:20

Added

  • Provider-wide durable memory bloc caches: unified exact durable bloc KV artifacts across MLX, HuggingFace Transformers, and HuggingFace GGUF, including shared Python/server APIs, provider-native artifact formats, manifest validation, and request-time prompt_cache_binding proof.
  • Durable cache validation tooling and reports: added the durable bloc cache benchmark script plus real-provider validation reports covering processing-phase speedups, correct cached answers, artifact sizes, and provider compatibility limits.
  • HuggingFace cache-state coverage: expanded Transformers prompt-cache save/load coverage for standard dynamic caches, sliding-window caches, Qwen3.5 hybrid cache state, and Mamba-style tensor state; expanded GGUF persistence around llama.cpp RAM-cache state.
  • Prompt-cache planning records: completed the unified bloc-cache, HF Transformers, and HF GGUF backlog items; accepted ADR 0007 for durable memory bloc cache binding; left the speculative superbloc/exact-prefix recipe and live snapshot persistence work in proposed status.
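
As a rough illustration of what manifest validation for a durable cache artifact might involve, the sketch below checks a saved artifact against a manifest before reuse. The manifest fields (`provider`, `model`, `sha256`, `size`) and both function names are hypothetical and are not taken from the AbstractCore API; only the Python standard library is used.

```python
import hashlib


def build_manifest(provider: str, model: str, artifact: bytes) -> dict:
    """Describe an artifact so it can be checked before reuse (hypothetical schema)."""
    return {
        "provider": provider,
        "model": model,
        "sha256": hashlib.sha256(artifact).hexdigest(),
        "size": len(artifact),
    }


def validate_manifest(manifest: dict, provider: str, model: str, artifact: bytes) -> list:
    """Return a list of problems; an empty list means the artifact matches the manifest."""
    problems = []
    if manifest.get("provider") != provider:
        problems.append("provider mismatch")
    if manifest.get("model") != model:
        problems.append("model mismatch")
    if manifest.get("size") != len(artifact):
        problems.append("size mismatch")
    if manifest.get("sha256") != hashlib.sha256(artifact).hexdigest():
        problems.append("digest mismatch")
    return problems


# Usage: a cache written for one provider/model is rejected for another.
blob = b"serialized-kv-state"
manifest = build_manifest("mlx", "model-a", blob)
print(validate_manifest(manifest, "mlx", "model-a", blob))
print(validate_manifest(manifest, "huggingface", "model-a", blob))
```

Returning a list of problems rather than raising on the first one lets a caller report every reason a cached artifact was rejected in a single pass.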

Changed

  • Generation defaults: providers now consume inference_parameters from model/architecture metadata for omitted sampling knobs such as temperature, top_p, and top_k; Hugging Face Transformers also applies loaded generation_config.json defaults when present.
  • MLX sampling controls: MLX generation now builds an mlx-lm sampler from unified temperature, top_p, and top_k values instead of ignoring those controls at decode time.
  • Prompt-cache compatibility metadata: architecture and model capability assets now capture cache, reasoning/thinking, quantization, and generation-parameter defaults used by provider capability discovery.
  • Voice/audio compatibility floors: optional voice/audio install profiles now target abstractvoice>=0.10.11 and omnivoice>=0.1.5.
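
The generation-defaults change can be pictured as a layered lookup for each omitted sampling knob. The sketch below is a minimal illustration, not the library's implementation: the function name `resolve_sampling` is hypothetical, and the precedence shown (explicit request value, then `generation_config.json`, then model/architecture metadata) is an assumption, since these notes do not state the order.

```python
from typing import Any, Dict, Mapping, Optional

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def resolve_sampling(
    request: Mapping[str, Any],
    metadata_defaults: Mapping[str, Any],
    generation_config: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Fill omitted sampling knobs from fallback sources (assumed precedence)."""
    sources = (request, generation_config or {}, metadata_defaults)
    resolved: Dict[str, Any] = {}
    for key in SAMPLING_KEYS:
        for source in sources:
            value = source.get(key)
            if value is not None:  # first source that sets the knob wins
                resolved[key] = value
                break
    return resolved


# Usage: the caller sets only temperature; the other knobs come from defaults.
print(resolve_sampling(
    {"temperature": 0.2},
    {"temperature": 0.7, "top_p": 0.9, "top_k": 40},
    {"top_p": 0.8},
))
```

A knob that no source defines is left out of the result, so the provider's own built-in default still applies.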

Fixed

  • HuggingFace greedy decoding: Transformers pipeline generation now treats temperature=0 as greedy decoding (do_sample=False) instead of forwarding an invalid sampling temperature.
  • HuggingFace model compatibility failures: unsupported FP8-on-MPS and broken quantized Transformers load paths now fail explicitly instead of being mistaken for prompt-cache failures.
  • Prompt-cache abstraction boundaries: live prompt-cache snapshot persistence is now documented as a proposed local-admin decision, not as a durable bloc or thin-client binding surface.
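
The greedy-decoding fix above amounts to translating a unified temperature into Transformers generation arguments. A minimal sketch follows, assuming a hypothetical helper named `to_generate_kwargs`; the keys it emits (`do_sample`, `temperature`, `top_p`, `top_k`) are standard Transformers generation parameters, but the mapping logic is illustrative rather than AbstractCore's actual code.

```python
from typing import Any, Dict, Optional


def to_generate_kwargs(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> Dict[str, Any]:
    """Map unified sampling controls to generation kwargs (illustrative only)."""
    if temperature is not None and temperature <= 0:
        # temperature=0 is not a valid sampling temperature, so request
        # greedy decoding and drop the sampling-only knobs.
        return {"do_sample": False}

    kwargs: Dict[str, Any] = {}
    if temperature is not None:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    if top_k is not None:
        kwargs["top_k"] = top_k
    return kwargs


# Usage
print(to_generate_kwargs(temperature=0))
print(to_generate_kwargs(temperature=0.7, top_p=0.9))
```

Dropping `top_p` and `top_k` in the greedy branch is a design choice made for this sketch: those knobs have no effect when sampling is disabled.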