v1.0.4 — KV prefix-reuse on by default, recall diagnostics, note typing, and memory edges

@maeddesg maeddesg released this 18 Jun 17:54
· 2 commits to main since this release

This release turns KV prefix-reuse on by default and adds recall diagnostics, note typing, and memory edges. Recall stays byte-identical when no edges exist and no opt-ins are active. Memory is opt-in (--features memory + serve --memory); without it the inference path is unchanged.

  • KV prefix reuse is now on by default (VF_KV_PREFIX_REUSE=0 to disable) — removes the within-turn double-prefill on memory-augmented turns. The reused KV is logit byte-identical to a fresh prefill (standing gate tests/kv_reuse_ident.rs, F16 + FP8-KV paths). Measured (Qwen3-8B, warm steady-state median, isolated GPU with no competing load): the redundant within-turn re-prefill of the shared ~1.5k-token prefix (~460 ms) is eliminated.
  • recall --explain — diagnostic view: returned hits, near-misses, score separation, and the cut reason per near-miss (superseded / type / threshold / top-k).
  • Relevance threshold — opt-in via VF_RECALL_MARGIN, off by default (adaptive, relative-to-top).
  • Note typing: --type on remember, /retype, and a --type filter on recall (invariant/working/episodic/decision/failure, default untyped).
  • SUPERSEDES edges: /supersede / /unsupersede; superseded notes are suppressed from recall by default (--include-superseded to show), chains resolve to the current head, and recall backfills to k after suppression. Notes are suppressed, never deleted.
  • DERIVES_FROM edges + /why — explicit derivation links and a why-graph trace (cycle-guarded, depth-capped); never alters recall results.
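
The SUPERSEDES behaviour described above (suppression by default, chains resolving to the current head, and backfill to k) can be illustrated with a minimal sketch. This is not the engine's implementation: the `Hit` type, the `superseded_by` map, and the `resolve_head` and `recall` functions are hypothetical names chosen for illustration, and the scores are made up.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical scored note; ids and scores are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: u32,
    score: f32,
}

/// Follow SUPERSEDES links (old -> new) until a note with no successor is
/// reached. The `seen` set stops the walk if the links accidentally form a cycle.
fn resolve_head(id: u32, superseded_by: &HashMap<u32, u32>) -> u32 {
    let mut seen = HashSet::new();
    let mut cur = id;
    while let Some(&next) = superseded_by.get(&cur) {
        if !seen.insert(cur) {
            break; // already visited: cycle guard
        }
        cur = next;
    }
    cur
}

/// Return up to `k` hits from `ranked` (assumed sorted best first).
/// Superseded notes are skipped unless `include_superseded` is set. Filtering
/// happens before truncation, so lower-ranked notes backfill the result to `k`.
/// Nothing is removed from `ranked`: notes are suppressed, not deleted.
fn recall(
    ranked: &[Hit],
    superseded_by: &HashMap<u32, u32>,
    k: usize,
    include_superseded: bool,
) -> Vec<Hit> {
    ranked
        .iter()
        .filter(|h| include_superseded || !superseded_by.contains_key(&h.id))
        .take(k)
        .cloned()
        .collect()
}

fn main() {
    let ranked = vec![
        Hit { id: 1, score: 0.9 },
        Hit { id: 2, score: 0.8 },
        Hit { id: 3, score: 0.7 },
        Hit { id: 4, score: 0.6 },
    ];
    // Chain: note 1 was superseded by 3, which was superseded by 4.
    let mut superseded_by = HashMap::new();
    superseded_by.insert(1, 3);
    superseded_by.insert(3, 4);

    println!("head of 1: {}", resolve_head(1, &superseded_by));
    println!("default:   {:?}", recall(&ranked, &superseded_by, 2, false));
    println!("included:  {:?}", recall(&ranked, &superseded_by, 2, true));
}
```

With an empty `superseded_by` map the filter passes every hit, so the sketch reduces to a plain top-k, which mirrors the guarantee that recall is unchanged when no edges exist.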

Engine 1.0.3 → 1.0.4, vf-clide 0.3.2 → 0.3.3.