Memory
Server-side, persistent, project-scoped semantic memory — shipped in v1.0, opt-in (off by default); client access & curation in v1.0.2; agent-side archive and reversible un-archive in v1.0.3; recall diagnostics (--explain), an opt-in relevance threshold, note typing, and memory edges (SUPERSEDES / DERIVES_FROM + /why) in v1.0.4; a conflict edge (CONTRADICTS), opt-in frontier retrieval, edge-type priors, and cross-process recall determinism (on SQLiteGraph 3.3.1) in v1.0.5.
VulkanForge's memory is a deliberate notebook, not a recording device.
It exists so that a decision you made three weeks ago, a benchmark you ran last sprint, or a bug you finally
cornered doesn't evaporate when the session ends or the model is swapped out. You write to it on purpose —
remember — and you read from it by meaning — recall. What comes back is what was put in, surfaced because it
is relevant to what you asked, not because something guessed you might want it.
The full design this memory builds toward — sleep consolidation, the six layers, watermark-based trust — is on Memory Design.
A few things it deliberately is:
- Yours, and your project's — not the model's. The store lives on your disk, in your process, on your hardware. Any model that connects reads the same memory; swap one model for another and the record is unchanged. The model is the lens; the memory is what's written down. When a model's context window closes, this doesn't.
- Local and single-user, all the way down. No cloud, no telemetry, nothing leaving your machine. The embeddings are computed on your own CPU; the vectors and the graph sit in one SQLite file. That's the whole surface.
- Scoped by construction. Each coding project gets its own index, and recall in one project physically cannot return another's notes — the partitions never touch. The general, non-project things live in a shared global scope. Isolation isn't a filter that has to be applied correctly; it's the shape of the thing.
- Curated, not hoarded. It holds what's worth keeping — decisions, learnings, benchmarks, bugs — written as deliberate notes. Quality is something you tend, not something assumed.
And, just as deliberately, a few things it is not:
- Not a transcript. It does not sit in the background recording everything you say. There is no conversation log, no ambient capture, no surveillance. Silence is the default; you choose what to keep.
- Not auto-injected. Recall is something you (or the agent) invoke, and its results are visible. The memory never quietly stuffs your context behind your back — control over what comes back stays with you.
- Not an oracle. It stores what was written. A stale or wrong note stays wrong until someone curates it. It will remember your mistakes as faithfully as your insights — that's a memory's honesty, not a flaw to paper over.
- Not magic. Underneath it's vector search over embeddings and a SQLite graph — understood, inspectable mechanics, not a black box. If a recall surprises you, you can trace why.
- Not a stand-in for thinking. It surfaces relevant past context so you don't have to hold everything in your head. It doesn't decide for you, and it isn't a substitute for judgment — yours or the agent's.
The short version: it's the project's long-term memory — kept on purpose, owned by you, and honest about being a tool rather than a mind.
Enabling memory
Memory is opt-in and off by default — fitting for a deliberate notebook: you switch it on, it doesn't switch itself on. There are two gates.
1. Build it in (compile-time). The subsystem and its two native deps (SQLiteGraph + the ONNX embedder) sit behind a Cargo feature, so the default build stays lean and free of them:
cargo build --release # lean default: no memory, ~25 MB binary
cargo build --release --features memory # memory compiled in, ~58 MB binary
The memory build needs a newer toolchain — Rust 1.89+ (the sqlitegraph dep is edition-2024 and declares
rust-version = 1.89; ort declares 1.88). The lean default still builds on Rust 1.85+. See Installation.
2. Turn it on (runtime). Even a memory-enabled binary starts with memory off. Activate it per run with the flag or the env alias:
vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf --memory
# or:
VULKANFORGE_MEMORY=1 vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf
Without --memory, /memory/* returns 503 and the server runs inference only — no embedder is loaded and
no database is opened, so an inference-only run carries zero memory overhead. Pass --memory to a lean binary
(built without the feature) and it stops immediately with a clear "rebuild with --features memory" message, before
the model loads.
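The two gates combine in a fixed order: only a run that actually requests memory can trip the compile-time check, which is why an inference-only run of a lean binary is fine. A minimal Rust sketch of that decision, assuming a hypothetical `resolve_memory_mode` helper (not VulkanForge's actual code) and assuming that only the value `1` enables `VULKANFORGE_MEMORY`:

```rust
/// Outcome of the two memory gates (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
pub enum MemoryMode {
    /// Inference only: no embedder loaded, no database opened, /memory/* answers 503.
    Off,
    /// Embedder and store are opened at startup.
    On,
}

/// `flag` is `--memory`, `env` is the value of `VULKANFORGE_MEMORY`,
/// `compiled_in` stands for `cfg!(feature = "memory")` in a real build.
pub fn resolve_memory_mode(
    flag: bool,
    env: Option<&str>,
    compiled_in: bool,
) -> Result<MemoryMode, String> {
    // Runtime gate. Treating only "1" as "on" is an assumption; the page shows `=1`.
    let requested = flag || env == Some("1");
    if !requested {
        return Ok(MemoryMode::Off);
    }
    // Compile-time gate: a lean binary refuses before any model is loaded.
    if !compiled_in {
        return Err("memory requested but not compiled in: rebuild with --features memory".into());
    }
    Ok(MemoryMode::On)
}
```

In a real binary, `env` would come from `std::env::var("VULKANFORGE_MEMORY")`; passing both inputs as parameters keeps the decision testable without rebuilding.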
Where it lives. One SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), with the embedding
model cached in the sibling ~/.vulkanforge/embed-cache/. The first --memory start downloads the Nomic model
there once (needs network that one time); every start after is offline. The embedder runs on the CPU — it never
touches VRAM (see Hardware and Compatibility). Flags are summarized in Configuration.
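Path resolution as described above, sketched in Rust. The function names are hypothetical; treating an empty `VF_MEMORY_DB` as unset, and keeping the embed cache under `~/.vulkanforge/` even when the database path is overridden, are assumptions the page does not settle:

```rust
use std::path::{Path, PathBuf};

/// `VF_MEMORY_DB` wins; otherwise `~/.vulkanforge/memory.db`.
/// `home` is passed in so the function stays pure and testable.
pub fn memory_db_path(home: &Path, vf_memory_db: Option<&str>) -> PathBuf {
    match vf_memory_db {
        // Assumption: an empty override counts as "not set".
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => home.join(".vulkanforge").join("memory.db"),
    }
}

/// Embedding-model cache next to the default store (assumed not to follow an override).
pub fn embed_cache_dir(home: &Path) -> PathBuf {
    home.join(".vulkanforge").join("embed-cache")
}
```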
The memory subsystem is embedded in the vulkanforge serve process (opt-in — see Enabling memory above), with
no separate daemon. When activated it exposes VF-native endpoints under /memory/* (a namespace separate from
the OpenAI-compatible /v1/*).
POST /memory/remember
{ "project_key": "vf", "kind": "Learning",
"text": "Dispatch reduction doesn't help on gfx1201",
"name": "optional label", "metadata": { ... } }
→ { "id": 1, "deduped": false }
POST /memory/recall
{ "project_key": "vf", "query": "do fewer barriers help performance?", "k": 3,
"type": "decision", "explain": true, "include_superseded": false, "frontier": false }
// all optional; type/explain/include_superseded v1.0.4, frontier v1.0.5
→ { "hits": [ { "id": 1, "kind": "Learning", "name": "...", "text": "...",
"status": "active", "type": "untyped", "score": 0.72 } ],
"explain": { "top_k": 3, "threshold": null, "query_dim": 768,
"near_miss": [ { ..., "cut": "top-k" } ], "separation": 0.06 } } // only when explain
POST /memory/projects { "project_key": "vf", "name": "optional" } → { "id", "project_key" }
GET /memory/projects → { "projects": [ ... ] }
POST /memory/archive { "project_key": "vf", "id": 1 } → { "id", "status": "archived" }
POST /memory/unarchive { "project_key": "vf", "id": 1 } → { "id", "status": "active" }
POST /memory/delete { "project_key": "vf", "id": 1 } → { "id", "deleted": true }
POST /memory/retype { "project_key": "vf", "id": 1, "type": "decision" } → { "id", "type" } // v1.0.4
POST /memory/supersede { "project_key": "vf", "new_id": 9, "old_id": 5 } → { "new_id", "old_id", "superseded": true } // v1.0.4
POST /memory/unsupersede { "project_key": "vf", "new_id": 9, "old_id": 5 } → { ..., "superseded": false }
POST /memory/derive { "project_key": "vf", "from_id": 9, "to_ids": [5, 7] } → { "from_id", "to_ids", "derived": true } // v1.0.4
POST /memory/underive { "project_key": "vf", "from_id": 9, "to_id": 5 } → { ..., "derived": false }
POST /memory/why { "project_key": "vf", "id": 9 } → { tree: id/type/text/derives_from[…] } // why-graph trace
POST /memory/contradict { "project_key": "vf", "a": 9, "b": 5 } → { "a", "b", "contradicts": true } // symmetric // v1.0.5
POST /memory/uncontradict { "project_key": "vf", "a": 9, "b": 5 } → { ..., "contradicts": false }
project_key is optional everywhere — omit it and the call uses a shared global scope (__global__). score
is cosine similarity in [0,1] (higher = closer). A curation or connection call (archive / unarchive /
delete / retype / supersede / unsupersede / derive / underive / why / contradict / uncontradict)
for an id that doesn't exist returns 404 Not Found — distinct from a real server fault, which is still a 500.
A self-edge (supersede / derive / contradict with two equal ids) and an unknown type are rejected with
400.
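The scoping and error rules above amount to a small validation layer. This is a sketch under assumptions: the helper names are hypothetical, type names are assumed to be accepted in lower case as in the examples, and the self-edge check is assumed to run before the existence check. The 503 (memory off) and 500 (server fault) cases sit outside it:

```rust
use std::collections::HashSet;

/// Scope used when a request omits `project_key`.
pub const GLOBAL_SCOPE: &str = "__global__";

pub fn scope(project_key: Option<&str>) -> &str {
    project_key.unwrap_or(GLOBAL_SCOPE)
}

/// Layer types from "Note typing"; lower-case spelling is assumed.
pub fn is_known_type(t: &str) -> bool {
    matches!(t, "untyped" | "invariant" | "working" | "episodic" | "decision" | "failure")
}

#[derive(Debug, PartialEq)]
pub enum ApiError {
    BadRequest(&'static str), // self-edge or unknown type
    NotFound(u64),            // id does not exist in this scope
}

impl ApiError {
    pub fn status(&self) -> u16 {
        match self {
            ApiError::BadRequest(_) => 400,
            ApiError::NotFound(_) => 404,
        }
    }
}

/// Validate a supersede / derive / contradict request between `a` and `b`.
pub fn check_edge(known_ids: &HashSet<u64>, a: u64, b: u64) -> Result<(), ApiError> {
    if a == b {
        return Err(ApiError::BadRequest("self-edge"));
    }
    for id in [a, b] {
        if !known_ids.contains(&id) {
            return Err(ApiError::NotFound(id));
        }
    }
    Ok(())
}
```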
Curation. remember de-duplicates a near-identical note instead of storing it twice — it returns the existing
id with "deduped": true (v1.0.2). /memory/archive drops a note from recall but keeps the node and its trace,
and /memory/unarchive restores it — archiving is reversible (v1.0.3); /memory/delete removes it outright
(node + vector). Delete and un-archive are user-driven (surfaced in vf-clide as /forget and /unarchive);
archive the agent may also do — but only behind a confirmation that shows the note's real stored text (see
vf-clide).
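A minimal sketch of de-duplication on write, assuming the check compares embeddings by cosine similarity against a fixed threshold. Both the comparison method and `DEDUP_THRESHOLD` are assumptions (the page only says near-identical notes are not stored twice), and `Store` is a hypothetical in-memory stand-in that ignores project scope:

```rust
/// Assumed value; the real threshold is not documented here.
pub const DEDUP_THRESHOLD: f32 = 0.98;

pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

pub struct Store {
    notes: Vec<(u64, Vec<f32>)>, // (note id, embedding)
    next_id: u64,
}

impl Store {
    pub fn new() -> Self {
        Store { notes: Vec::new(), next_id: 1 }
    }

    /// Returns `(id, deduped)`, mirroring `{ "id": ..., "deduped": ... }`.
    pub fn remember(&mut self, embedding: Vec<f32>) -> (u64, bool) {
        for (id, existing) in &self.notes {
            if cosine(existing, &embedding) >= DEDUP_THRESHOLD {
                return (*id, true); // near-identical: hand back the existing note
            }
        }
        let id = self.next_id;
        self.next_id += 1;
        self.notes.push((id, embedding));
        (id, false)
    }
}
```

Returning the existing id instead of an error lets a client call `remember` without first checking whether the note already exists.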
Recall grew a diagnostic lens, a relevance gate, and a connection layer — all additive: with no edges and no opt-ins active, recall is byte-identical to before.
- recall --explain (diagnostics). A read-only view of why recall returned what it did: the returned hits, the near_miss candidates that fell just outside, the cut reason per near-miss (superseded → type → threshold → top-k), the cutoff values, and the score separation between the last hit and the first near-miss. Default recall is unchanged — the explain block is opt-in.
- Relevance threshold (opt-in, off by default). Set VF_RECALL_MARGIN=<f> on the server to keep only notes scoring within <f> (cosine) of the top hit. It's adaptive (relative-to-top — the cosine scale isn't calibrated across queries) and errs toward inclusion. Unset → pure top-k, unchanged. When active, --explain labels the trimmed notes cut: threshold.
- Note typing. A note carries a layer type — invariant / working / episodic / decision / failure, default untyped (old notes need no backfill). Set it with --type on remember or /retype <id> <type>, and filter recall with --type <T>. The type is an explicit, user-set, non-embedding signal — it disambiguates reliably where similarity can't (adjacent-domain notes share a score band but differ in type). It's pure metadata: retyping never re-embeds or re-ranks.
- SUPERSEDES edges (versioning). /supersede <new> <old> marks old stale; a superseded note is suppressed from default recall, chains resolve to the un-superseded head, and recall backfills to k so suppression never silently shrinks the result. It is suppressed, never deleted — /unsupersede releases it and --include-superseded shows it (mark-not-destroy, like archive).
- DERIVES_FROM edges + /why (the why-graph). /derive <A> from <B> [<C> …] records that a note is anchored in its evidence; /why <id> walks those links backward into a justification tree (cycle-guarded, depth-capped). DERIVES_FROM never changes recall results — it is additive awareness, surfaced only in --explain (derives from #B) and /why. Default recall is byte-identical even with DERIVES_FROM edges present.
- The agent doesn't draw edges or set types. Edges and types are user curation (/supersede, /derive, /retype are user-only) — the agent contributes notes, you curate the structure.
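The cut order that `--explain` reports (superseded, then type, then threshold, then top-k) also describes a simple filtering pass. The sketch below is illustrative only: `Candidate`, `Cut`, and `recall` are hypothetical names, candidates are assumed to arrive sorted by score, and the "top hit" for `VF_RECALL_MARGIN` is assumed to be the best candidate that survives the superseded and type filters. Because suppressed notes never count toward `k`, scanning past them is what backfills the result:

```rust
/// Why a candidate was left out, in the order `--explain` lists the checks.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cut {
    Superseded,
    Type,
    Threshold,
    TopK,
}

#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u64,
    pub score: f32,              // cosine similarity to the query
    pub note_type: &'static str, // e.g. "decision", "untyped"
    pub superseded: bool,        // true if a newer note SUPERSEDES it
}

pub struct Recall {
    pub hits: Vec<Candidate>,
    pub near_miss: Vec<(Candidate, Cut)>, // this sketch keeps every cut candidate
}

/// `cands` sorted best-first; `margin` plays the role of VF_RECALL_MARGIN.
pub fn recall(
    cands: &[Candidate],
    k: usize,
    type_filter: Option<&str>,
    margin: Option<f32>,
    include_superseded: bool,
) -> Recall {
    let mut hits: Vec<Candidate> = Vec::new();
    let mut near_miss: Vec<(Candidate, Cut)> = Vec::new();
    let mut top: Option<f32> = None; // score of the best surviving candidate
    for c in cands {
        let cut = if c.superseded && !include_superseded {
            Some(Cut::Superseded)
        } else if type_filter.map_or(false, |t| t != c.note_type) {
            Some(Cut::Type)
        } else {
            // The first candidate to get this far is the top hit.
            let best = *top.get_or_insert(c.score);
            if margin.map_or(false, |m| c.score < best - m) {
                Some(Cut::Threshold)
            } else if hits.len() >= k {
                Some(Cut::TopK)
            } else {
                None
            }
        };
        match cut {
            // Cut candidates never count toward k, so later ones backfill.
            None => hits.push(c.clone()),
            Some(reason) => near_miss.push((c.clone(), reason)),
        }
    }
    Recall { hits, near_miss }
}
```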
(Memory-augmented turns also reuse the shared KV prefix by default in v1.0.4 — that's an inference-engine change, not a memory one; see the release notes and Configuration.)
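The `/why` walk from the DERIVES_FROM item above can be sketched as a depth-first traversal that refuses to revisit a note already on the current branch (the cycle guard) and stops at a fixed depth (the cap). The names and the adjacency-map representation are assumptions, not VulkanForge's storage layout:

```rust
use std::collections::{HashMap, HashSet};

/// One node of a justification tree, shaped like the `/why` response.
#[derive(Debug, PartialEq)]
pub struct WhyNode {
    pub id: u64,
    pub derives_from: Vec<WhyNode>,
}

/// `edges[a]` lists the notes that `a` DERIVES_FROM.
pub fn why(edges: &HashMap<u64, Vec<u64>>, id: u64, max_depth: usize) -> WhyNode {
    walk(edges, id, 0, max_depth, &mut HashSet::new())
}

fn walk(
    edges: &HashMap<u64, Vec<u64>>,
    id: u64,
    depth: usize,
    max_depth: usize,
    on_path: &mut HashSet<u64>,
) -> WhyNode {
    let mut children = Vec::new();
    if depth < max_depth {
        on_path.insert(id);
        let premises: &[u64] = edges.get(&id).map(|v| v.as_slice()).unwrap_or(&[]);
        for &premise in premises {
            // Cycle guard: never descend into a note already on this branch.
            if !on_path.contains(&premise) {
                children.push(walk(edges, premise, depth + 1, max_depth, on_path));
            }
        }
        on_path.remove(&id);
    }
    WhyNode { id, derives_from: children }
}
```

Guarding only the current branch (rather than every visited note) lets a shared premise appear under two different conclusions, which is what a justification tree usually wants.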
Three more connection features — all still additive and opt-in, all leaving default recall byte-identical — over a determinism guarantee underneath:
- CONTRADICTS edge (conflict awareness). A third, symmetric edge marking two notes as in conflict: /contradict <id> <id> and /uncontradict (order doesn't matter). It is awareness only — no suppression, no winner, no auto-resolution. The conflict surfaces in --explain (⚠ conflicts with #X, plus conflict pairs ⚠ #A ↔ #B); you decide the outcome with the existing /supersede. Default recall is unchanged.
- Opt-in frontier retrieval (--frontier, off by default). Normally recall is pure top-k. With --frontier it reserves a few slots (VF_FRONTIER_SLOTS, default 2) for evidence linked to a top-k hit (a seed) by DERIVES_FROM — one hop — pulling a below-cut premise up next to the hit it supports. --explain labels each pick seed or frontier. Unset → byte-identical to plain top-k.
- Edge-type priors (a categorical negative signal). Edge types carry roles, not tunable weights: DERIVES_FROM pulls into the frontier, CONTRADICTS withholds. A frontier candidate that CONTRADICTS a seed is held back from the reserved slots — surfaced transparently in --explain as "frontier withheld — contested by #seed" — and the freed slot goes to the next clean candidate. With no CONTRADICTS edge, --frontier is identical to the plain frontier. The frontier never amplifies evidence that a more relevant hit disputes.
- Cross-process recall determinism. The HNSW index is built with a pinned seed (VF_HNSW_SEED), honored by SQLiteGraph 3.3.1, so two separate processes that build the same store recall byte-identically — recall is reproducible across restarts, not only within one run. A committed integration test enforces it.
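The frontier and the CONTRADICTS prior can be sketched together. Assumptions to flag: the reserved slots are treated here as additions on top of the `k` seeds (the page does not say whether they come out of `k`), evidence is visited seed by seed in rank order, and all names are hypothetical. A candidate that contradicts any seed is withheld without consuming a slot, so the next clean candidate takes it:

```rust
use std::collections::{HashMap, HashSet};

/// Each returned note, labelled the way `--explain` labels picks.
#[derive(Debug, PartialEq)]
pub enum Pick {
    Seed(u64),
    Frontier { id: u64, via: u64 }, // `via` is the seed whose evidence it is
}

/// CONTRADICTS is symmetric, so look the pair up in both directions.
fn conflicts(contradicts: &HashSet<(u64, u64)>, a: u64, b: u64) -> bool {
    contradicts.contains(&(a, b)) || contradicts.contains(&(b, a))
}

/// `ranked`: note ids best-first. `derives[x]`: what x DERIVES_FROM (one hop).
/// `slots`: the VF_FRONTIER_SLOTS budget. Returns picks and (withheld id, contesting seed).
pub fn frontier_recall(
    ranked: &[u64],
    k: usize,
    derives: &HashMap<u64, Vec<u64>>,
    contradicts: &HashSet<(u64, u64)>,
    slots: usize,
) -> (Vec<Pick>, Vec<(u64, u64)>) {
    let seeds: Vec<u64> = ranked.iter().take(k).cloned().collect();
    let mut picks: Vec<Pick> = seeds.iter().map(|&s| Pick::Seed(s)).collect();
    let mut withheld: Vec<(u64, u64)> = Vec::new();
    let mut seen: HashSet<u64> = seeds.iter().cloned().collect();
    let mut used = 0;
    'seeds: for &seed in &seeds {
        let evidence: &[u64] = derives.get(&seed).map(|v| v.as_slice()).unwrap_or(&[]);
        for &cand in evidence {
            if used == slots {
                break 'seeds; // frontier budget spent
            }
            if !seen.insert(cand) {
                continue; // already a seed, already picked, or already withheld
            }
            // Edge-type prior: evidence a seed disputes is held back, slot stays free.
            if let Some(&rival) = seeds.iter().find(|&&s| conflicts(contradicts, cand, s)) {
                withheld.push((cand, rival));
                continue;
            }
            picks.push(Pick::Frontier { id: cand, via: seed });
            used += 1;
        }
    }
    (picks, withheld)
}
```

With `slots = 0` the result is plain top-k, and with an empty conflict set it is the plain frontier, matching the two identities stated above.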
- One store. SQLiteGraph (3.3.1, GPL-3.0) holds the nodes (projects + content), the edges (CONTAINS, the SUPERSEDES / DERIVES_FROM connections from v1.0.4, and the symmetric CONTRADICTS edge from v1.0.5), and the per-project HNSW vector indexes — all in one SQLite file (default ~/.vulkanforge/memory.db, override with VF_MEMORY_DB). Edges use the native SQLiteGraph edge API; the SUPERSEDES / DERIVES_FROM / CONTRADICTS lookups are index-backed (the conflict lookup is a symmetric union over both edge directions). Since 3.3.1 the HNSW level distributor honors a fixed seed (VF_HNSW_SEED), so index builds — and therefore recall — are reproducible across processes.
- CPU embedder. fastembed (5.16.2, ONNX Runtime) runs Nomic-Embed v1.5-Q (768-dim, INT8 → AVX-512/VNNI on Zen4). The task prefix the model needs (search_document: for writes, search_query: for reads) is applied for you. Embedding runs on the CPU and never takes the GPU permit, so a recall never waits behind a generation.
- Per-project index. Each project_key gets its own persistent HNSW index (768-dim, cosine, m=16, ef_construction=200). A recall searches only that index, so project isolation is structural, not a filter.
- Persistence. The store survives restarts — on reopen the vectors are restored from the SQLite store with no re-embedding. Shutdown flushes the HNSW topology and WAL-checkpoints.
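Why a pinned seed makes recall reproducible: in HNSW, each inserted vector is assigned a top layer by a random draw, `floor(-ln(U) * mL)` with `mL = 1/ln(m)`, and those draws shape the graph that search walks. Fix the generator's seed and, given the same insertion order, the levels repeat. The sketch uses SplitMix64 as a stand-in generator; SQLiteGraph's actual generator is not documented here, and in VulkanForge the seed would come from `VF_HNSW_SEED`:

```rust
/// SplitMix64, a small seedable generator used here as a stand-in.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform in (0, 1]; excluding 0 keeps ln() finite.
    pub fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Top layer for each of `n` insertions: floor(-ln(U) * mL), mL = 1/ln(m).
pub fn draw_levels(seed: u64, m: usize, n: usize) -> Vec<usize> {
    let ml = 1.0 / (m as f64).ln();
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| (-rng.next_unit().ln() * ml).floor() as usize).collect()
}
```

With `m=16`, a vector lands above layer 0 exactly when `U <= 1/16`, so roughly one in sixteen does. Equal levels are necessary but not sufficient: the insertion order must also match for two builds to produce the same graph.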
- First start downloads the Nomic ONNX model from HuggingFace into ~/.vulkanforge/embed-cache, then runs offline. The two native deps (static ONNX Runtime + bundled SQLite) add ~34 MB to the vulkanforge binary.
- Honest mechanics. The HNSW index assigns its own vector ids, so the graph node id is carried in the vector's metadata and recovered on recall (search → get_vector → node_id → get_entity); project lookups use plain SQL on the graph's graph_entities table. If the embedder can't be loaded, /memory/* returns 503 and the inference server still runs.
The mechanics above are the foundation; the fuller design they're heading toward — six memory layers, sleep-style consolidation, watermark-based trust — is laid out on Memory Design.
v1.0.2 reads, writes, and curates — and the client now reaches it. Since v1.0.1 the store has written and read;
v1.0.2 adds curation and the vf-clide client layer. Shipped on top of the foundation:
- Curation (v1.0.2 – v1.0.3). Notes are no longer write-only — a near-duplicate remember is de-duplicated on write, /archive <id> drops a note from recall while keeping the trace, /unarchive <id> restores it (v1.0.3 — archiving is reversible), and /forget <id> removes it outright. Delete and un-archive are user actions; the agent may archive a note it recalled this session — behind a confirmation showing the note's real text — but never deletes or un-archives on its own.
- Client integration (v1.0.2 – v1.0.5). vf-clide's REPL exposes the full set: /project [key] · /recall <query> [--explain] [--frontier] [--type <T>] [--include-superseded] · /remember [--type <T>] <text> · /retype <id> <T> · /supersede <new> <old> · /unsupersede <new> <old> · /derive <id> from <id…> · /underive <id> from <id> · /why <id> · /contradict <id> <id> · /uncontradict <id> <id> · /archive <id> · /unarchive <id> · /forget <id>. In the agent loop only the recall / remember / archive tools are exposed. They run on their own axis — direct calls to the server's /memory/*, touching neither files nor the shell — so they stay visible (every call prints a marker) but sit outside the file/shell permission ceiling, available whenever the server has memory on; archive adds an always-on confirmation. Curation and the connection layer (/unarchive, /forget, /retype, /supersede, /derive, …) stay user-only — the agent never deletes, un-archives, types, or draws edges.
- Diagnostics, typing, and the connection layer (v1.0.4). recall --explain opens up why recall returned what it did; an opt-in VF_RECALL_MARGIN relevance threshold (off by default) trims to the top band; note typing (invariant / working / episodic / decision / failure) adds a non-embedding signal that disambiguates where similarity can't; and the graph gained two typed edges — SUPERSEDES (stale-suppression with backfill to k, reversible) and DERIVES_FROM + /why (the why-graph, which never alters recall). All additive: with no edges and no opt-ins active, recall is byte-identical to before.
- Conflict edges, the opt-in frontier, and determinism (v1.0.5). A third edge, CONTRADICTS (symmetric, awareness-only, resolved with /supersede); an opt-in retrieval frontier (--frontier, off by default) that pulls a hit's DERIVES_FROM evidence into a few reserved slots; edge-type priors that let CONTRADICTS hold contested evidence out of the frontier; and — on SQLiteGraph 3.3.1 — a pinned HNSW seed making recall reproducible across processes. Still all additive: default recall byte-identical, the frontier off by default, the conflict edge never changes ranking.
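The agent-side rules reduce to a small allow-list. A hypothetical sketch (the enum and function are illustrative, not vf-clide's code) that defaults to user-only, so any command not explicitly granted stays out of the agent's reach:

```rust
/// How a memory command is treated when the agent, not the user, issues it.
#[derive(Debug, PartialEq)]
pub enum AgentAccess {
    /// Exposed as an agent tool; each call prints a visible marker.
    Tool,
    /// Exposed, but gated by a confirmation showing the note's stored text.
    ToolWithConfirmation,
    /// Never exposed to the agent.
    UserOnly,
}

/// Default-deny allow-list over command names (spelled without the leading slash).
pub fn agent_access(command: &str) -> AgentAccess {
    match command {
        "recall" | "remember" => AgentAccess::Tool,
        "archive" => AgentAccess::ToolWithConfirmation,
        // unarchive, forget, retype, supersede, derive, contradict, project, and
        // anything added later: user-only unless explicitly granted above.
        _ => AgentAccess::UserOnly,
    }
}
```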
What's deliberately still not here:
- Lifecycle — typing labels a note's layer, but notes are still written active and stay there: no draft → confirmed → active → stale → archived transitions, no touch-on-read aging, no per-layer decay policy yet.
- Graph — growing. CONTAINS plus SUPERSEDES, DERIVES_FROM (v1.0.4), and CONTRADICTS (v1.0.5); auto-derivation (edges drawn for you, including auto-detected contradictions) is still to come, as is the why-graph's reverse "what depends on this?".
- No auto-injection — by design, not a gap. recall is always an explicit call; the memory never loads itself into a prompt behind your back. DERIVES_FROM keeps this even with edges present — it surfaces premises in --explain / /why, never injects them. This one stays.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0