v1.0.4 — KV prefix-reuse on by default, recall diagnostics, note typing, and memory edges
Recall results stay byte-identical to 1.0.3 when no edges exist and no opt-ins are active. Memory is opt-in (`--features memory` plus `serve --memory`); without it, the inference path is unchanged.
- KV prefix reuse is now on by default (set `VF_KV_PREFIX_REUSE=0` to disable). This removes the within-turn double prefill on memory-augmented turns. The reused KV produces logits byte-identical to a fresh prefill (standing gate `tests/kv_reuse_ident.rs`, covering the F16 and FP8-KV paths). Measured on Qwen3-8B (warm steady-state median, isolated GPU with no competing load): the redundant within-turn re-prefill of the shared ~1.5k-token prefix (~460 ms) is eliminated.
- `recall --explain`: a diagnostic view of the returned hits, near-misses, score separation, and the cut reason for each near-miss (superseded, type, threshold, or top-k).
- Relevance threshold: opt-in via `VF_RECALL_MARGIN`, off by default. The threshold is adaptive: the cutoff is set relative to the top score.
- Note typing: `--type` on `remember`, `/retype`, and a `--type` filter on `recall`. Types are `invariant`, `working`, `episodic`, `decision`, and `failure`; the default is `untyped`.
- `SUPERSEDES` edges: `/supersede` and `/unsupersede`. Superseded notes are suppressed from recall by default (pass `--include-superseded` to show them), chains resolve to the current head, and recall backfills to `k` after suppression. Notes are suppressed, never deleted.
- `DERIVES_FROM` edges and `/why`: explicit derivation links plus a why-graph trace (cycle-guarded, depth-capped). Neither alters recall results.
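What prefix reuse saves can be sketched as a longest-common-prefix match between the tokens already in the KV cache and the new prompt, so that only the differing suffix is prefilled. This is a minimal sketch under stated assumptions: `CachedPrefix`, `PrefillPlan`, `plan_prefill`, and `reuse_enabled` are hypothetical names, and treating every value of `VF_KV_PREFIX_REUSE` other than `0` as "on" is an assumption, not documented behavior.

```rust
// Hypothetical sketch: these types and functions are illustrative names,
// not the engine's API. Assumes the cache remembers the token IDs it holds K/V for.

/// Token IDs whose K/V entries are already resident from an earlier prefill.
struct CachedPrefix {
    tokens: Vec<u32>,
}

/// How many cached positions to keep and which prompt tokens still need prefill.
#[derive(Debug, PartialEq)]
struct PrefillPlan {
    reused: usize,
    to_prefill: Vec<u32>,
}

/// Assumption: an unset variable or any value other than "0" leaves reuse on.
fn reuse_enabled(var: Option<&str>) -> bool {
    var.map_or(true, |v| v != "0")
}

fn plan_prefill(cache: &CachedPrefix, prompt: &[u32], enabled: bool) -> PrefillPlan {
    let reused = if enabled {
        // Longest common prefix of the cached and new token sequences...
        let lcp = cache
            .tokens
            .iter()
            .zip(prompt)
            .take_while(|(a, b)| a == b)
            .count();
        // ...capped so at least one token is prefilled to produce next-token logits.
        lcp.min(prompt.len().saturating_sub(1))
    } else {
        0
    };
    PrefillPlan {
        reused,
        to_prefill: prompt[reused..].to_vec(),
    }
}

fn main() {
    let enabled = reuse_enabled(std::env::var("VF_KV_PREFIX_REUSE").ok().as_deref());
    // Shared system/memory prefix [1, 2, 3, 4] followed by a new user turn [7, 8].
    let cache = CachedPrefix { tokens: vec![1, 2, 3, 4] };
    let plan = plan_prefill(&cache, &[1, 2, 3, 4, 7, 8], enabled);
    println!("reuse {} cached positions, prefill {:?}", plan.reused, plan.to_prefill);
}
```

Stopping one token short of a full match is a common design choice so the final position is recomputed and yields logits; whether the engine does exactly this is not stated in these notes.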
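The cut reasons reported by `recall --explain` suggest a staged filter. The sketch below assumes the stages run in the order the notes list them (superseded, type, threshold, top-k). That ordering is also what makes backfill to `k` fall out naturally, because suppressed notes are removed before top-k is applied. The margin is read as "top eligible score minus `VF_RECALL_MARGIN`"; that interpretation of "relative to the top score", and every type and field name here, are assumptions for illustration.

```rust
// Hypothetical sketch of a recall filter pipeline; all names are illustrative.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum NoteType { Invariant, Working, Episodic, Decision, Failure, Untyped }

#[derive(Debug, Clone, Copy, PartialEq)]
enum CutReason { Superseded, Type, Threshold, TopK }

#[derive(Debug, Clone)]
struct Candidate { id: u32, score: f32, note_type: NoteType, superseded: bool }

struct RecallOpts {
    k: usize,
    type_filter: Option<NoteType>,
    include_superseded: bool,
    margin: Option<f32>, // None = threshold off (the default)
}

#[derive(Debug, Default)]
struct Explained {
    hits: Vec<(u32, f32)>,
    near_misses: Vec<(u32, f32, CutReason)>,
}

fn recall(mut cands: Vec<Candidate>, opts: &RecallOpts) -> Explained {
    cands.sort_by(|a, b| b.score.total_cmp(&a.score)); // best score first
    let mut out = Explained::default();

    // Suppression and type filtering run before top-k, so later candidates backfill.
    let mut eligible = Vec::new();
    for c in cands {
        if c.superseded && !opts.include_superseded {
            out.near_misses.push((c.id, c.score, CutReason::Superseded));
        } else if opts.type_filter.map_or(false, |t| t != c.note_type) {
            out.near_misses.push((c.id, c.score, CutReason::Type));
        } else {
            eligible.push(c);
        }
    }

    // Assumption: the adaptive floor is the top eligible score minus the margin.
    let floor = match (opts.margin, eligible.first()) {
        (Some(m), Some(top)) => Some(top.score - m),
        _ => None,
    };

    for c in eligible {
        if floor.map_or(false, |f| c.score < f) {
            out.near_misses.push((c.id, c.score, CutReason::Threshold));
        } else if out.hits.len() < opts.k {
            out.hits.push((c.id, c.score));
        } else {
            out.near_misses.push((c.id, c.score, CutReason::TopK));
        }
    }
    out
}

fn sample() -> Vec<Candidate> {
    vec![
        Candidate { id: 1, score: 0.90, note_type: NoteType::Decision, superseded: true },
        Candidate { id: 2, score: 0.85, note_type: NoteType::Decision, superseded: false },
        Candidate { id: 3, score: 0.80, note_type: NoteType::Working, superseded: false },
        Candidate { id: 4, score: 0.50, note_type: NoteType::Decision, superseded: false },
        Candidate { id: 5, score: 0.78, note_type: NoteType::Decision, superseded: false },
    ]
}

fn main() {
    let opts = RecallOpts { k: 2, type_filter: Some(NoteType::Decision), include_superseded: false, margin: Some(0.2) };
    let r = recall(sample(), &opts);
    for (id, score) in &r.hits { println!("hit  #{id} {score:.2}"); }
    for (id, score, why) in &r.near_misses { println!("miss #{id} {score:.2} ({why:?})"); }
}
```

With these inputs, note 1 is cut as superseded and note 5 backfills the second slot.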
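Resolving a `SUPERSEDES` chain to its current head can be sketched as following "superseded by" links until reaching a note that nothing supersedes. This assumes edges are stored as a map from each old note to its replacement and reports a loop as `None`; how the engine actually stores edges or treats cycles is not stated here, and `SupersedeGraph` is a hypothetical type. Removing an edge only un-suppresses the note, consistent with "suppressed, never deleted".

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch: `superseded_by[old] = new` records "new SUPERSEDES old".
// Notes themselves live elsewhere and are never deleted here.
#[derive(Default)]
struct SupersedeGraph {
    superseded_by: HashMap<u32, u32>,
}

impl SupersedeGraph {
    fn supersede(&mut self, old: u32, new: u32) {
        self.superseded_by.insert(old, new);
    }

    /// Dropping the edge un-suppresses `old`; nothing is deleted.
    fn unsupersede(&mut self, old: u32) {
        self.superseded_by.remove(&old);
    }

    /// A note is suppressed from default recall if anything supersedes it.
    fn is_superseded(&self, id: u32) -> bool {
        self.superseded_by.contains_key(&id)
    }

    /// Follows old -> new -> newer to the current head.
    /// Assumption: a loop is reported as `None` rather than resolved.
    fn head(&self, mut id: u32) -> Option<u32> {
        let mut seen = HashSet::new();
        while let Some(&next) = self.superseded_by.get(&id) {
            if !seen.insert(id) {
                return None; // revisited a note: the chain loops
            }
            id = next;
        }
        Some(id)
    }
}

fn main() {
    let mut g = SupersedeGraph::default();
    g.supersede(1, 2); // note 2 replaces note 1
    g.supersede(2, 3); // note 3 replaces note 2
    println!("head(1) = {:?}, 1 suppressed = {}", g.head(1), g.is_superseded(1));
    g.unsupersede(2);
    println!("after /unsupersede 2: head(1) = {:?}", g.head(1));
}
```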
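A `/why` trace over `DERIVES_FROM` edges can be sketched as a read-only depth-first walk with a visited set (the cycle guard) and a depth limit (the cap). The adjacency map, the `why` function, and its `(depth, id)` output shape are hypothetical. Because the walk only reads the graph, it cannot change recall results.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch: `derives_from[note]` lists the notes it was derived from.
// The walk only reads the map, so it cannot affect recall.

/// Returns (depth, note id) pairs in depth-first order, starting at `root`.
fn why(derives_from: &HashMap<u32, Vec<u32>>, root: u32, max_depth: usize) -> Vec<(usize, u32)> {
    let mut visited = HashSet::new();
    let mut out = Vec::new();
    walk(derives_from, root, 0, max_depth, &mut visited, &mut out);
    out
}

fn walk(
    g: &HashMap<u32, Vec<u32>>,
    id: u32,
    depth: usize,
    max_depth: usize,
    visited: &mut HashSet<u32>,
    out: &mut Vec<(usize, u32)>,
) {
    // Cycle guard: each note is reported at most once per trace.
    if !visited.insert(id) {
        return;
    }
    out.push((depth, id));
    // Depth cap: stop expanding sources past `max_depth`.
    if depth >= max_depth {
        return;
    }
    if let Some(sources) = g.get(&id) {
        for &src in sources {
            walk(g, src, depth + 1, max_depth, visited, out);
        }
    }
}

fn main() {
    // 1 derives from 2 and 3; 2 from 4; 4 (erroneously) from 1, forming a cycle.
    let g = HashMap::from([(1, vec![2, 3]), (2, vec![4]), (4, vec![1])]);
    for (depth, id) in why(&g, 1, 8) {
        println!("{}#{}", "  ".repeat(depth), id);
    }
}
```

A shared visited set also means a source reached by two paths is shown once; a per-path set would instead repeat it under each parent.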
Engine 1.0.3 → 1.0.4, vf-clide 0.3.2 → 0.3.3.