maeddesg edited this page Jun 20, 2026 · 7 revisions

Memory

Server-side, persistent, project-scoped semantic memory — shipped in v1.0, opt-in (off by default); client access & curation in v1.0.2; agent-side archive and reversible un-archive in v1.0.3; recall diagnostics (--explain), an opt-in relevance threshold, note typing, and memory edges (SUPERSEDES / DERIVES_FROM + /why) in v1.0.4; a conflict edge (CONTRADICTS), opt-in frontier retrieval, edge-type priors, and cross-process recall determinism (on SQLiteGraph 3.3.1) in v1.0.5.

What this memory is — and what it isn't

VulkanForge's memory is a deliberate notebook, not a recording device.

It exists so that a decision you made three weeks ago, a benchmark you ran last sprint, or a bug you finally cornered doesn't evaporate when the session ends or the model is swapped out. You write to it on purpose — remember — and you read from it by meaning — recall. What comes back is what was put in, surfaced because it is relevant to what you asked, not because something guessed you might want it.

The full design this memory builds toward (sleep consolidation, the six layers, watermark-based trust) is on Memory Design.

A few things it deliberately is:

  • Yours, and your project's — not the model's. The store lives on your disk, in your process, on your hardware. Any model that connects reads the same memory; swap one model for another and the record is unchanged. The model is the lens; the memory is what's written down. When a model's context window closes, this doesn't.
  • Local and single-user, all the way down. No cloud, no telemetry, nothing leaving your machine. The embeddings are computed on your own CPU; the vectors and the graph sit in one SQLite file. That's the whole surface.
  • Scoped by construction. Each coding project gets its own index, and recall in one project physically cannot return another's notes — the partitions never touch. The general, non-project things live in a shared global scope. Isolation isn't a filter that has to be applied correctly; it's the shape of the thing.
  • Curated, not hoarded. It holds what's worth keeping — decisions, learnings, benchmarks, bugs — written as deliberate notes. Quality is something you tend, not something assumed.

And, just as deliberately, a few things it is not:

  • Not a transcript. It does not sit in the background recording everything you say. There is no conversation log, no ambient capture, no surveillance. Silence is the default; you choose what to keep.
  • Not auto-injected. Recall is something you (or the agent) invoke, and its results are visible. The memory never quietly stuffs your context behind your back — control over what comes back stays with you.
  • Not an oracle. It stores what was written. A stale or wrong note stays wrong until someone curates it. It will remember your mistakes as faithfully as your insights — that's a memory's honesty, not a flaw to paper over.
  • Not magic. Underneath it's vector search over embeddings and a SQLite graph — understood, inspectable mechanics, not a black box. If a recall surprises you, you can trace why.
  • Not a stand-in for thinking. It surfaces relevant past context so you don't have to hold everything in your head. It doesn't decide for you, and it isn't a substitute for judgment — yours or the agent's.

The short version: it's the project's long-term memory — kept on purpose, owned by you, and honest about being a tool rather than a mind.

Enabling memory

Memory is opt-in and off by default — fitting for a deliberate notebook: you switch it on, it doesn't switch itself on. There are two gates.

1. Build it in (compile-time). The subsystem and its two native deps (SQLiteGraph + the ONNX embedder) sit behind a Cargo feature, so the default build stays lean and free of them:

cargo build --release                    # lean default: no memory, ~25 MB binary
cargo build --release --features memory  # memory compiled in, ~58 MB binary

The memory build needs a newer toolchain — Rust 1.89+ (the sqlitegraph dep is edition-2024 and declares rust-version = 1.89; ort declares 1.88). The lean default still builds on Rust 1.85+. See Installation.

2. Turn it on (runtime). Even a memory-enabled binary starts with memory off. Activate it per run with the flag or the env alias:

vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf --memory
# or:
VULKANFORGE_MEMORY=1 vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf

Without --memory, /memory/* returns 503 and the server runs inference only: no embedder is loaded and no database is opened, so an inference-only run carries zero memory overhead. Pass --memory to a lean binary (built without the feature) and it stops immediately, before the model loads, with a clear "rebuild with --features memory" message.

Where it lives. One SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), with the embedding model cached in the sibling ~/.vulkanforge/embed-cache/. The first --memory start downloads the Nomic model there once (needs network that one time); every start after is offline. The embedder runs on the CPU — it never touches VRAM (see Hardware and Compatibility). Flags are summarized in Configuration.
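The two runtime gates and the storage locations described above can be sketched as a small resolver. This is an illustrative Python model, not VulkanForge's own code: the function names are hypothetical, it assumes only the literal value 1 enables the env alias (the only value this page shows), and it assumes the embed cache stays under ~/.vulkanforge even when VF_MEMORY_DB points elsewhere.

```python
import os
from pathlib import Path


def memory_enabled(argv, env=None):
    """True if memory is switched on for this run (flag or env alias).

    Assumption: only VULKANFORGE_MEMORY=1 counts as "on"; other truthy
    spellings are not documented on this page.
    """
    env = os.environ if env is None else env
    return "--memory" in argv or env.get("VULKANFORGE_MEMORY") == "1"


def resolve_memory_paths(env=None, home=None):
    """Return (db_path, embed_cache_dir) following the documented defaults."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    base = home / ".vulkanforge"
    override = env.get("VF_MEMORY_DB")
    db_path = Path(override) if override else base / "memory.db"
    # Assumption: the cache directory does not move with VF_MEMORY_DB.
    return db_path, base / "embed-cache"
```

A launcher script could call memory_enabled first and only then look at the paths, mirroring the page's point that an inference-only run opens no database at all.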

How it works

The memory subsystem is embedded in the vulkanforge serve process (opt-in — see Enabling memory above), with no separate daemon. When activated it exposes VF-native endpoints under /memory/* (a namespace separate from the OpenAI-compatible /v1/*).

Endpoints

POST /memory/remember
  { "project_key": "vf", "kind": "Learning",
    "text": "Dispatch reduction does not help on gfx1201",
    "name": "optional label", "metadata": { ... } }
  → { "id": 1, "deduped": false }

POST /memory/recall
  { "project_key": "vf", "query": "do fewer barriers help performance?", "k": 3,
    "type": "decision", "explain": true, "include_superseded": false, "frontier": false }
    // all optional; type/explain/include_superseded v1.0.4, frontier v1.0.5
  → { "hits": [ { "id": 1, "kind": "Learning", "name": "...", "text": "...",
                  "status": "active", "type": "untyped", "score": 0.72 } ],
      "explain": { "top_k": 3, "threshold": null, "query_dim": 768,
                   "near_miss": [ { ..., "cut": "top-k" } ], "separation": 0.06 } }   // only when explain

POST /memory/projects   { "project_key": "vf", "name": "optional" }   → { "id", "project_key" }
GET  /memory/projects                                                 → { "projects": [ ... ] }

POST /memory/archive    { "project_key": "vf", "id": 1 }   → { "id", "status": "archived" }
POST /memory/unarchive  { "project_key": "vf", "id": 1 }   → { "id", "status": "active" }
POST /memory/delete     { "project_key": "vf", "id": 1 }   → { "id", "deleted": true }
POST /memory/retype     { "project_key": "vf", "id": 1, "type": "decision" }   → { "id", "type" }     // v1.0.4

POST /memory/supersede    { "project_key": "vf", "new_id": 9, "old_id": 5 }   → { "new_id", "old_id", "superseded": true }   // v1.0.4
POST /memory/unsupersede  { "project_key": "vf", "new_id": 9, "old_id": 5 }   → { ..., "superseded": false }
POST /memory/derive       { "project_key": "vf", "from_id": 9, "to_ids": [5, 7] }   → { "from_id", "to_ids", "derived": true }   // v1.0.4
POST /memory/underive     { "project_key": "vf", "from_id": 9, "to_id": 5 }   → { ..., "derived": false }
POST /memory/why          { "project_key": "vf", "id": 9 }   → { tree: id/type/text/derives_from[] }   // why-graph trace

POST /memory/contradict   { "project_key": "vf", "a": 9, "b": 5 }   → { "a", "b", "contradicts": true }   // symmetric  // v1.0.5
POST /memory/uncontradict { "project_key": "vf", "a": 9, "b": 5 }   → { ..., "contradicts": false }

project_key is optional everywhere — omit it and the call uses a shared global scope (__global__). score is cosine similarity in [0,1] (higher = closer). A curation or connection call (archive / unarchive / delete / retype / supersede / unsupersede / derive / underive / why / contradict / uncontradict) for an id that doesn't exist returns 404 Not Found — distinct from a real server fault, which is still a 500. A self-edge (supersede / derive / contradict with two equal ids) and an unknown type are rejected with 400.
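For callers scripting against these endpoints, a minimal Python sketch of a request builder and a status-code reader might look like the following. The base URL (host and port) is an assumption, since this page does not state where the server listens, and the JSON Content-Type header is likewise assumed. The helper names are hypothetical. The builder only constructs the request; sending it is left to the caller.

```python
import json
import urllib.request

# Assumption: host and port are placeholders, not documented on this page.
BASE_URL = "http://127.0.0.1:8080"


def memory_request(endpoint, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for a /memory/* endpoint.

    A payload of None produces a GET (e.g. listing projects);
    anything else is sent as a JSON POST body.
    """
    url = f"{base_url}/memory/{endpoint}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def describe_status(code):
    """Map the status codes named on this page to their documented meaning."""
    meanings = {
        400: "rejected: self-edge or unknown type",
        404: "no note with that id",
        500: "server fault",
        503: "memory is off or the embedder could not be loaded",
    }
    return meanings.get(code, "not described on this page")
```

Omitting project_key from the payload leaves the call in the shared global scope, so a payload such as {"query": "..."} is enough for a global recall.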

Curation. remember de-duplicates a near-identical note instead of storing it twice — it returns the existing id with "deduped": true (v1.0.2). /memory/archive drops a note from recall but keeps the node and its trace, and /memory/unarchive restores it — archiving is reversible (v1.0.3); /memory/delete removes it outright (node + vector). Delete and un-archive are user-driven (surfaced in vf-clide as /forget and /unarchive); archive the agent may also do — but only behind a confirmation that shows the note's real stored text (see vf-clide).

Diagnostics, typing, and edges (v1.0.4)

Recall grew a diagnostic lens, a relevance gate, and a connection layer — all additive: with no edges and no opt-ins active, recall is byte-identical to before.

  • recall --explain (diagnostics). A read-only view of why recall returned what it did: the returned hits, the near_miss candidates that fell just outside, the cut reason per near-miss (superseded / type / threshold / top-k), the cutoff values, and the score separation between the last hit and the first near-miss. Default recall is unchanged; the explain block is opt-in.
  • Relevance threshold (opt-in, off by default). Set VF_RECALL_MARGIN=<f> on the server to keep only notes scoring within <f> (cosine) of the top hit. It's adaptive (relative-to-top — the cosine scale isn't calibrated across queries) and errs toward inclusion. Unset → pure top-k, unchanged. When active, --explain labels the trimmed notes cut: threshold.
  • Note typing. A note carries a layer type: invariant / working / episodic / decision / failure, with untyped as the default (old notes need no backfill). Set it with --type on remember or /retype <id> <type>, and filter recall with --type <T>. The type is an explicit, user-set, non-embedding signal, so it disambiguates reliably where similarity can't (adjacent-domain notes share a score band but differ in type). It's pure metadata: retyping never re-embeds or re-ranks.
  • SUPERSEDES edges (versioning). /supersede <new> <old> marks old stale; a superseded note is suppressed from default recall, chains resolve to the un-superseded head, and recall backfills to k so suppression never silently shrinks the result. The note is suppressed, never deleted: /unsupersede releases it and --include-superseded shows it (mark-not-destroy, like archive).
  • DERIVES_FROM edges + /why (the why-graph). /derive <A> from <B> [<C> …] records that a note is anchored in its evidence; /why <id> walks those links backward into a justification tree (cycle-guarded, depth-capped). DERIVES_FROM never changes recall results — it is additive awareness, surfaced only in --explain (derives from #B) and /why. Default recall is byte-identical even with DERIVES_FROM edges present.
  • The agent doesn't draw edges or set types. Edges and types are user curation (/supersede, /derive, /retype are user-only) — the agent contributes notes, you curate the structure.
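To make the cut reasons, the margin threshold, and the backfill behaviour from the list above concrete, here is a rough Python model of the selection step. It is a sketch under stated assumptions, not the server's implementation: the order in which the cuts are checked, the use of the first kept hit as the "top" score, and both function names are hypothetical. Candidates are assumed to arrive already ranked by score.

```python
def resolve_head(note_id, superseded_by):
    """Follow old -> new SUPERSEDES links to the un-superseded head."""
    seen = set()
    while note_id in superseded_by and note_id not in seen:
        seen.add(note_id)  # guard against an accidental cycle
        note_id = superseded_by[note_id]
    return note_id


def select_hits(ranked, k, superseded=frozenset(), type_filter=None, margin=None):
    """Split ranked candidates into (hits, near_miss).

    ranked: list of dicts with "id", "score", "type", sorted by score
    descending. Walking past suppressed notes until k hits are kept is
    what models "backfill to k".
    """
    hits, near_miss = [], []
    top_score = None
    for note in ranked:
        if note["id"] in superseded:
            cut = "superseded"
        elif type_filter is not None and note["type"] != type_filter:
            cut = "type"
        elif (margin is not None and top_score is not None
              and note["score"] < top_score - margin):
            cut = "threshold"
        elif len(hits) >= k:
            cut = "top-k"
        else:
            cut = None
        if cut is None:
            hits.append(note)
            if top_score is None:
                top_score = note["score"]
        else:
            near_miss.append({**note, "cut": cut})
    return hits, near_miss
```

With margin left as None and no superseded notes or type filter, the function degenerates to plain top-k, which matches the page's claim that the opt-ins are purely additive.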

(Memory-augmented turns also reuse the shared KV prefix by default in v1.0.4 — that's an inference-engine change, not a memory one; see the release notes and Configuration.)
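The /why walk over DERIVES_FROM edges described in the list above can be pictured as a small recursive traversal. The Python sketch below is illustrative only: the in-memory dictionaries stand in for the graph store, the depth cap of 8 is a made-up value (the real cap is not given on this page), and the output shape follows the id/type/text/derives_from[] tree shown for POST /memory/why.

```python
def why_tree(note_id, derives_from, notes, max_depth=8, _path=()):
    """Build a justification tree by walking DERIVES_FROM links backward.

    derives_from: dict mapping a note id to the ids it derives from.
    notes: dict mapping a note id to {"type": ..., "text": ...}.
    max_depth: hypothetical cap on how many hops are followed.
    """
    node = {
        "id": note_id,
        "type": notes[note_id]["type"],
        "text": notes[note_id]["text"],
        "derives_from": [],
    }
    if len(_path) >= max_depth:
        return node  # depth cap reached: stop expanding
    for parent in derives_from.get(note_id, []):
        if parent == note_id or parent in _path:
            continue  # cycle guard: never revisit a note on the current path
        node["derives_from"].append(
            why_tree(parent, derives_from, notes, max_depth, _path + (note_id,))
        )
    return node
```

Because the walk only reads edges and never touches ranking, it is consistent with the statement that DERIVES_FROM leaves recall results unchanged.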

Conflict edges, the opt-in frontier, and determinism (v1.0.5)

Three more connection features — all still additive and opt-in, all leaving default recall byte-identical — over a determinism guarantee underneath:

  • CONTRADICTS edge (conflict awareness). A third, symmetric edge marking two notes as in conflict: /contradict <id> <id> and /uncontradict (order doesn't matter). It is awareness only — no suppression, no winner, no auto-resolution. The conflict surfaces in --explain (⚠ conflicts with #X, plus conflict pairs ⚠ #A ↔ #B); you decide the outcome with the existing /supersede. Default recall is unchanged.
  • Opt-in frontier retrieval (--frontier, off by default). Normally recall is pure top-k. With --frontier it reserves a few slots (VF_FRONTIER_SLOTS, default 2) for evidence linked to a top-k hit (a seed) by DERIVES_FROM — one hop — pulling a below-cut premise up next to the hit it supports. --explain labels each pick seed or frontier. Unset → byte-identical to plain top-k.
  • Edge-type priors (a categorical negative signal). Edge types carry roles, not tunable weights: DERIVES_FROM pulls into the frontier, CONTRADICTS withholds. A frontier candidate that CONTRADICTS a seed is held back from the reserved slots — surfaced transparently in --explain as frontier withheld — contested by #seed — and the freed slot goes to the next clean candidate. With no CONTRADICTS edge, --frontier is identical to the plain frontier. The frontier never amplifies evidence that a more relevant hit disputes.
  • Cross-process recall determinism. The HNSW index is built with a pinned seed (VF_HNSW_SEED), honored by SQLiteGraph 3.3.1, so two separate processes that build the same store recall byte-identically — recall is reproducible across restarts, not only within one run. A committed integration test enforces it.
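The frontier and the CONTRADICTS prior from the list above can be modelled in a few lines of Python. Treat this as a sketch under loud assumptions: it assumes the reserved slots come out of k (so the result never exceeds k), that unused slots fall back to plain top-k entries, and that a frontier pick is placed directly after its seed. The function name and data shapes are hypothetical; only the default of 2 slots comes from this page.

```python
def frontier_recall(ranked_ids, k, derives_from, contradicts=(), slots=2):
    """Return (picks, withheld) for an opt-in frontier recall.

    ranked_ids: note ids sorted by score, best first.
    derives_from: dict mapping a note id to its one-hop evidence ids.
    contradicts: iterable of (a, b) pairs; treated as symmetric.
    picks: list of (id, "seed" | "frontier"); withheld: list of (id, seed).
    """
    slots = min(slots, max(k - 1, 0))  # always leave room for one seed
    seeds = ranked_ids[: k - slots]
    conflict = {frozenset(pair) for pair in contradicts}
    chosen, withheld, seen = {}, [], set()  # chosen: evidence id -> seed id
    for seed in seeds:
        for ev in derives_from.get(seed, []):
            if len(chosen) >= slots:
                break
            if ev in seeds or ev in seen:
                continue
            seen.add(ev)
            contested = [s for s in seeds if frozenset((ev, s)) in conflict]
            if contested:
                withheld.append((ev, contested[0]))  # held back, slot stays free
                continue
            chosen[ev] = seed
    picks = []
    for seed in seeds:
        picks.append((seed, "seed"))
        picks.extend((ev, "frontier") for ev, s in chosen.items() if s == seed)
    for nid in ranked_ids[len(seeds):]:  # unused slots fall back to top-k
        if len(picks) >= k:
            break
        if nid not in chosen:
            picks.append((nid, "seed"))
    return picks, withheld
```

With no DERIVES_FROM edges the function returns exactly the first k ranked ids, which is the property the page calls byte-identical to plain top-k.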

Under the hood

  • One store. SQLiteGraph (3.3.1, GPL-3.0) holds the nodes (projects + content), the edges (CONTAINS, the SUPERSEDES / DERIVES_FROM connections from v1.0.4, and the symmetric CONTRADICTS edge from v1.0.5), and the per-project HNSW vector indexes — all in one SQLite file (default ~/.vulkanforge/memory.db, override with VF_MEMORY_DB). Edges use the native SQLiteGraph edge API; the SUPERSEDES / DERIVES_FROM / CONTRADICTS lookups are index-backed (the conflict lookup is a symmetric union over both edge directions). Since 3.3.1 the HNSW level distributor honors a fixed seed (VF_HNSW_SEED), so index builds — and therefore recall — are reproducible across processes.
  • CPU embedder. fastembed (5.16.2, ONNX Runtime) runs Nomic-Embed v1.5-Q (768-dim, INT8 → AVX-512/VNNI on Zen4). The task prefix the model needs (search_document: for writes, search_query: for reads) is applied for you. Embedding runs on the CPU and never takes the GPU permit, so a recall never waits behind a generation.
  • Per-project index. Each project_key gets its own persistent HNSW index (768-dim, cosine, m=16, ef_construction=200). A recall searches only that index, so project isolation is structural, not a filter.
  • Persistence. The store survives restarts — on reopen the vectors are restored from the SQLite store with no re-embedding. Shutdown flushes the HNSW topology and WAL-checkpoints.
  • First start downloads the Nomic ONNX model from HuggingFace into ~/.vulkanforge/embed-cache, then runs offline. The two native deps (static ONNX Runtime + bundled SQLite) add ~34 MB to the vulkanforge binary.
  • Honest mechanics. The HNSW index assigns its own vector ids, so the graph node id is carried in the vector's metadata and recovered on recall (search → get_vector → node_id → get_entity); project lookups use plain SQL on the graph's graph_entities table. If the embedder can't be loaded, /memory/* returns 503 and the inference server still runs.
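Two of the mechanics listed above, the task prefix and the per-project index, can be illustrated with a toy Python model. It deliberately swaps the HNSW index for a brute-force cosine scan, so it shows the isolation property rather than the real search structure. The class and function names are hypothetical, and the single space after the prefix colon follows the usual Nomic convention rather than anything stated on this page.

```python
import math

GLOBAL_SCOPE = "__global__"


def with_task_prefix(text, for_write):
    """Prepend the task prefix: search_document for writes, search_query for reads."""
    prefix = "search_document: " if for_write else "search_query: "
    return prefix + text


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ProjectIndexes:
    """One vector list per project_key; a brute-force stand-in for HNSW."""

    def __init__(self):
        self._by_project = {}

    def add(self, note_id, vector, project_key=None):
        key = project_key or GLOBAL_SCOPE
        self._by_project.setdefault(key, []).append((note_id, vector))

    def search(self, query_vec, k, project_key=None):
        # Only this project's entries are ever scanned: isolation is structural.
        entries = self._by_project.get(project_key or GLOBAL_SCOPE, [])
        scored = sorted(
            ((cosine(query_vec, vec), nid) for nid, vec in entries), reverse=True
        )
        return [(nid, score) for score, nid in scored[:k]]
```

Because search never looks outside one project's list, there is no filter step that could be applied incorrectly, which is the design point the per-project index bullet makes.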

The mechanics above are the foundation; the fuller design they're heading toward — six memory layers, sleep-style consolidation, watermark-based trust — is laid out on Memory Design.

Today vs. roadmap

v1.0.2 reads, writes, and curates — and the client now reaches it. Since v1.0.1 the store has written and read; v1.0.2 adds curation and the vf-clide client layer. Shipped on top of the foundation:

  • Curation (v1.0.2 – v1.0.3). Notes are no longer write-only — a near-duplicate remember is de-duplicated on write, /archive <id> drops a note from recall while keeping the trace, /unarchive <id> restores it (v1.0.3 — archiving is reversible), and /forget <id> removes it outright. Delete and un-archive are user actions; the agent may archive a note it recalled this session — behind a confirmation showing the note's real text — but never deletes or un-archives on its own.

  • Client integration (v1.0.2 – v1.0.5). vf-clide's REPL exposes the full set: /project [key] · /recall <query> [--explain] [--frontier] [--type <T>] [--include-superseded] · /remember [--type <T>] <text> · /retype <id> <T> · /supersede <new> <old> · /unsupersede <new> <old> · /derive <id> from <id…> · /underive <id> from <id> · /why <id> · /contradict <id> <id> · /uncontradict <id> <id> · /archive <id> · /unarchive <id> · /forget <id>. In the agent loop only the recall / remember / archive tools are exposed. They run on their own axis — direct calls to the server's /memory/*, touching neither files nor the shell — so they stay visible (every call prints a marker) but sit outside the file/shell permission ceiling, available whenever the server has memory on; archive adds an always-on confirmation. Curation and the connection layer (/unarchive, /forget, /retype, /supersede, /derive, …) stay user-only — the agent never deletes, un-archives, types, or draws edges.

  • Diagnostics, typing, and the connection layer (v1.0.4). recall --explain opens up why recall returned what it did; an opt-in VF_RECALL_MARGIN relevance threshold (off by default) trims to the top band; note typing (invariant/working/episodic/decision/failure) adds a non-embedding signal that disambiguates where similarity can't; and the graph gained two typed edges — SUPERSEDES (stale-suppression with backfill to k, reversible) and DERIVES_FROM + /why (the why-graph, which never alters recall). All additive: with no edges and no opt-ins active, recall is byte-identical to before.

  • Conflict edges, the opt-in frontier, and determinism (v1.0.5). A third edge, CONTRADICTS (symmetric, awareness-only, resolved with /supersede); an opt-in retrieval frontier (--frontier, off by default) that pulls a hit's DERIVES_FROM evidence into a few reserved slots; edge-type priors that let CONTRADICTS hold contested evidence out of the frontier; and — on SQLiteGraph 3.3.1 — a pinned HNSW seed making recall reproducible across processes. Still all additive: default recall byte-identical, the frontier off by default, the conflict edge never changes ranking.

What's deliberately still not here:

  • Lifecycle — typing labels a note's layer, but notes are still written active and stay there: no draft → confirmed → active → stale → archived transitions, no touch-on-read aging, no per-layer decay policy yet.
  • Graph — growing. CONTAINS plus SUPERSEDES, DERIVES_FROM (v1.0.4), and CONTRADICTS (v1.0.5); auto-derivation (edges drawn for you, including auto-detected contradictions) is still to come, as is the why-graph's reverse "what depends on this?".
  • No auto-injection: by design, not a gap. recall is always an explicit call; the memory never loads itself into a prompt behind your back. DERIVES_FROM keeps this even with edges present: it surfaces premises in --explain and /why, never injects them. This one stays.
