Memory
Server-side, persistent, project-scoped semantic memory — shipped in v1.0, opt-in (off by default); client access & curation in v1.0.2; agent-side archive and reversible un-archive in v1.0.3.
VulkanForge's memory is a deliberate notebook, not a recording device.
It exists so that a decision you made three weeks ago, a benchmark you ran last sprint, or a bug you finally
cornered doesn't evaporate when the session ends or the model is swapped out. You write to it on purpose —
remember — and you read from it by meaning — recall. What comes back is what was put in, surfaced because it
is relevant to what you asked, not because something guessed you might want it.
The full design this builds toward (sleep consolidation, the six layers, watermark-based trust) is described on Memory Design.
A few things it deliberately is:
- Yours, and your project's — not the model's. The store lives on your disk, in your process, on your hardware. Any model that connects reads the same memory; swap one model for another and the record is unchanged. The model is the lens; the memory is what's written down. When a model's context window closes, this doesn't.
- Local and single-user, all the way down. No cloud, no telemetry, nothing leaving your machine. The embeddings are computed on your own CPU; the vectors and the graph sit in one SQLite file. That's the whole surface.
- Scoped by construction. Each coding project gets its own index, and recall in one project physically cannot return another's notes — the partitions never touch. The general, non-project things live in a shared global scope. Isolation isn't a filter that has to be applied correctly; it's the shape of the thing.
- Curated, not hoarded. It holds what's worth keeping — decisions, learnings, benchmarks, bugs — written as deliberate notes. Quality is something you tend, not something assumed.
And, just as deliberately, a few things it is not:
- Not a transcript. It does not sit in the background recording everything you say. There is no conversation log, no ambient capture, no surveillance. Silence is the default; you choose what to keep.
- Not auto-injected. Recall is something you (or the agent) invoke, and its results are visible. The memory never quietly stuffs your context behind your back — control over what comes back stays with you.
- Not an oracle. It stores what was written. A stale or wrong note stays wrong until someone curates it. It will remember your mistakes as faithfully as your insights — that's a memory's honesty, not a flaw to paper over.
- Not magic. Underneath it's vector search over embeddings and a SQLite graph — understood, inspectable mechanics, not a black box. If a recall surprises you, you can trace why.
- Not a stand-in for thinking. It surfaces relevant past context so you don't have to hold everything in your head. It doesn't decide for you, and it isn't a substitute for judgment — yours or the agent's.
The short version: it's the project's long-term memory — kept on purpose, owned by you, and honest about being a tool rather than a mind.
Memory is opt-in and off by default — fitting for a deliberate notebook: you switch it on, it doesn't switch itself on. There are two gates.
1. Build it in (compile-time). The subsystem and its two native deps (SQLiteGraph + the ONNX embedder) sit behind a Cargo feature, so the default build stays lean and free of them:
cargo build --release # lean default: no memory, ~25 MB binary
cargo build --release --features memory # memory compiled in, ~58 MB binary
The memory build needs a newer toolchain: Rust 1.89+ (the sqlitegraph dep is edition-2024 and declares
rust-version = 1.89; ort declares 1.88). The lean default still builds on Rust 1.85+. See Installation.
2. Turn it on (runtime). Even a memory-enabled binary starts with memory off. Activate it per run with the flag or the env alias:
vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf --memory
# or:
VULKANFORGE_MEMORY=1 vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf
Without --memory, /memory/* returns 503 and the server runs inference only: no embedder is loaded and
no database is opened, so an inference-only run carries zero memory overhead. Pass --memory to a lean binary
(built without the feature) and it stops immediately, before the model loads, with a clear message telling you
to rebuild with --features memory.
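The two gates can be sketched as one startup check that runs before any model is loaded. This is a minimal illustration, not VulkanForge's source: the names `Gate`, `memory_requested`, and `startup_gate` are hypothetical, treating only the exact value `1` of `VULKANFORGE_MEMORY` as "on" is an assumption, and the error wording is illustrative. In a real binary `compiled_in` would come from `cfg!(feature = "memory")`; it is a parameter here so both builds can be exercised.

```rust
/// Outcome of the (hypothetical) startup check, decided before any model loads.
#[derive(Debug, PartialEq)]
pub enum Gate {
    /// Memory not requested: inference only, no embedder, no database opened.
    InferenceOnly,
    /// Memory requested and compiled in: open the store and load the embedder.
    MemoryOn,
}

/// True if memory was requested via `--memory` or the `VULKANFORGE_MEMORY` alias.
/// Assumption: only the exact value "1" counts as on.
pub fn memory_requested(flag: bool, env_value: Option<&str>) -> bool {
    flag || env_value == Some("1")
}

/// `compiled_in` stands in for `cfg!(feature = "memory")` in a real build.
pub fn startup_gate(requested: bool, compiled_in: bool) -> Result<Gate, String> {
    match (requested, compiled_in) {
        (false, _) => Ok(Gate::InferenceOnly),
        (true, true) => Ok(Gate::MemoryOn),
        // Illustrative wording; returning early means the model is never loaded.
        (true, false) => Err(
            "memory is not compiled into this binary; rebuild with --features memory".to_string(),
        ),
    }
}
```

Wired up, the call might read `startup_gate(memory_requested(flag, std::env::var("VULKANFORGE_MEMORY").ok().as_deref()), cfg!(feature = "memory"))`. Checking this first is what lets a lean binary fail fast instead of after a multi-gigabyte GGUF load.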
Where it lives. One SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), with the embedding
model cached in the sibling ~/.vulkanforge/embed-cache/. The first --memory start downloads the Nomic model
there once (needs network that one time); every start after is offline. The embedder runs on the CPU — it never
touches VRAM (see Hardware and Compatibility). Flags are summarized in Configuration.
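Path resolution as described above, sketched under stated assumptions: the function names are hypothetical, an empty `VF_MEMORY_DB` is treated as unset, the embed cache is assumed to stay under `~/.vulkanforge` even when the database is overridden, and "cache directory exists and is non-empty" stands in for whatever check actually decides that the one-time download already happened.

```rust
use std::path::{Path, PathBuf};

/// Database location: `VF_MEMORY_DB` if set (assumption: empty counts as unset),
/// otherwise `~/.vulkanforge/memory.db`.
pub fn memory_db_path(home: &Path, vf_memory_db: Option<&str>) -> PathBuf {
    match vf_memory_db {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => home.join(".vulkanforge").join("memory.db"),
    }
}

/// Embedding-model cache. Assumption: fixed under `~/.vulkanforge`,
/// independent of any `VF_MEMORY_DB` override.
pub fn embed_cache_dir(home: &Path) -> PathBuf {
    home.join(".vulkanforge").join("embed-cache")
}

/// Hypothetical "first start?" check: a missing or empty cache means the
/// one-time download (the only step that needs network) has not happened yet.
pub fn needs_download(cache: &Path) -> bool {
    match std::fs::read_dir(cache) {
        Ok(mut entries) => entries.next().is_none(),
        Err(_) => true,
    }
}
```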
The memory subsystem is embedded in the vulkanforge serve process (opt-in — see Enabling memory above), with
no separate daemon. When activated it exposes VF-native endpoints under /memory/* (a namespace separate from
the OpenAI-compatible /v1/*).
POST /memory/remember
{ "project_key": "vf", "kind": "Learning",
"text": "Dispatch reduction doesn't help on gfx1201",
"name": "optional label", "metadata": { ... } }
→ { "id": 1, "deduped": false }
POST /memory/recall
{ "project_key": "vf", "query": "do fewer barriers help performance?", "k": 3 }
→ { "hits": [ { "id": 1, "kind": "Learning", "name": "...", "text": "...",
"status": "active", "score": 0.72 } ] }
POST /memory/projects { "project_key": "vf", "name": "optional" } → { "id", "project_key" }
GET /memory/projects → { "projects": [ ... ] }
POST /memory/archive { "project_key": "vf", "id": 1 } → { "id", "status": "archived" }
POST /memory/unarchive { "project_key": "vf", "id": 1 } → { "id", "status": "active" }
POST /memory/delete { "project_key": "vf", "id": 1 } → { "id", "deleted": true }
project_key is optional everywhere: omit it and the call uses a shared global scope (__global__). score
is cosine similarity in [0,1] (higher = closer). A curation call (archive / unarchive / delete) for an id
that doesn't exist returns 404 Not Found (v1.0.3), distinct from a real server fault, which is still a 500.
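A client-side sketch of the request shapes and status handling above, using only the standard library. The helper names (`json_str`, `recall_body`, `http_post`, `classify`, `CurationOutcome`) are hypothetical, and treating any 2xx as success is an assumption, since the page only documents 404, 500, and 503. Omitting `project_key` simply leaves the field out of the body; the server then falls back to the global scope.

```rust
/// Minimal JSON string encoder (quotes, backslashes, control characters).
pub fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Body for POST /memory/recall. With `project_key: None` the field is omitted
/// and the server uses the shared global scope (__global__).
pub fn recall_body(project_key: Option<&str>, query: &str, k: u32) -> String {
    let mut fields = Vec::new();
    if let Some(p) = project_key {
        fields.push(format!("\"project_key\": {}", json_str(p)));
    }
    fields.push(format!("\"query\": {}", json_str(query)));
    fields.push(format!("\"k\": {}", k));
    format!("{{ {} }}", fields.join(", "))
}

/// Raw HTTP/1.1 POST with a JSON body; Content-Length counts bytes, not chars.
pub fn http_post(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nConnection: close\r\nContent-Length: {}\r\n\r\n{}",
        path,
        host,
        body.len(),
        body
    )
}

/// Client-side reading of a curation call's status (archive / unarchive / delete).
#[derive(Debug, PartialEq)]
pub enum CurationOutcome {
    Done,        // any 2xx (assumed): the change was applied
    NoSuchNote,  // 404: the id does not exist (v1.0.3)
    MemoryOff,   // 503: server running without memory, or embedder failed to load
    ServerFault, // 500 or anything else unexpected
}

pub fn classify(status: u16) -> CurationOutcome {
    match status {
        200..=299 => CurationOutcome::Done,
        404 => CurationOutcome::NoSuchNote,
        503 => CurationOutcome::MemoryOff,
        _ => CurationOutcome::ServerFault,
    }
}
```

The string from `http_post` can be written to a `std::net::TcpStream` connected to wherever `vulkanforge serve` is listening; that address is deployment-specific and not assumed here. Keeping 404 separate from 500 lets a client report "no such note" without treating it as a server failure.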
Curation. remember de-duplicates a near-identical note instead of storing it twice — it returns the existing
id with "deduped": true (v1.0.2). /memory/archive drops a note from recall but keeps the node and its trace,
and /memory/unarchive restores it — archiving is reversible (v1.0.3); /memory/delete removes it outright
(node + vector). Delete and un-archive are user-driven (surfaced in vf-clide as /forget and /unarchive). The
agent may also archive, but only behind a confirmation that shows the note's real stored text (see
vf-clide).
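The curation rules can be modeled with a small in-memory store. This is a hypothetical sketch, not the SQLiteGraph-backed implementation: `dedup_threshold` is a made-up knob (the page says "near-identical" without giving a number), dedup here compares against every stored note regardless of status, and recall is a brute-force cosine scan rather than HNSW. It does show the three behaviors described: dedup returns the existing id, archive hides a note from recall until it is un-archived, and delete removes it, with an unknown id mapped to 404.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Active,
    Archived,
}

struct Note {
    text: String,
    vec: Vec<f32>,
    status: Status,
}

/// Hypothetical in-memory model of the curation rules (not the real store).
pub struct Store {
    next_id: u64,
    notes: BTreeMap<u64, Note>,
    dedup_threshold: f32, // made-up knob: cosine at or above this counts as "near-identical"
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl Store {
    pub fn new(dedup_threshold: f32) -> Self {
        Store { next_id: 1, notes: BTreeMap::new(), dedup_threshold }
    }

    /// Returns (id, deduped). A near-identical note returns the existing id.
    pub fn remember(&mut self, text: &str, vec: Vec<f32>) -> (u64, bool) {
        for (id, note) in &self.notes {
            if cosine(&note.vec, &vec) >= self.dedup_threshold {
                return (*id, true);
            }
        }
        let id = self.next_id;
        self.next_id += 1;
        self.notes.insert(id, Note { text: text.to_string(), vec, status: Status::Active });
        (id, false)
    }

    /// Top-k notes by cosine score; archived notes are skipped but still stored.
    pub fn recall(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut hits: Vec<(u64, f32)> = self
            .notes
            .iter()
            .filter(|(_, note)| note.status == Status::Active)
            .map(|(id, note)| (*id, cosine(&note.vec, query)))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }

    /// The stored text survives archiving (the "trace" is kept).
    pub fn text(&self, id: u64) -> Option<&str> {
        self.notes.get(&id).map(|n| n.text.as_str())
    }

    /// Err(404) mirrors the server's Not Found for an unknown id.
    fn set_status(&mut self, id: u64, status: Status) -> Result<(), u16> {
        self.notes.get_mut(&id).map(|n| n.status = status).ok_or(404)
    }

    pub fn archive(&mut self, id: u64) -> Result<(), u16> {
        self.set_status(id, Status::Archived)
    }

    pub fn unarchive(&mut self, id: u64) -> Result<(), u16> {
        self.set_status(id, Status::Active)
    }

    /// Removes the note and its vector outright; not reversible.
    pub fn delete(&mut self, id: u64) -> Result<(), u16> {
        self.notes.remove(&id).map(|_| ()).ok_or(404)
    }
}
```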
- One store. SQLiteGraph (3.2.5, GPL-3.0) holds the nodes (projects + content), the CONTAINS edges, and the per-project HNSW vector indexes, all in one SQLite file (default ~/.vulkanforge/memory.db, override with VF_MEMORY_DB).
- CPU embedder. fastembed (5.16.2, ONNX Runtime) runs Nomic-Embed v1.5-Q (768-dim, INT8 → AVX-512/VNNI on Zen4). The task prefix the model needs (search_document: for writes, search_query: for reads) is applied for you. Embedding runs on the CPU and never takes the GPU permit, so a recall never waits behind a generation.
- Per-project index. Each project_key gets its own persistent HNSW index (768-dim, cosine, m=16, ef_construction=200). A recall searches only that index, so project isolation is structural, not a filter.
- Persistence. The store survives restarts: on reopen, the vectors are restored from the SQLite store with no re-embedding. Shutdown flushes the HNSW topology and checkpoints the SQLite WAL.
- First start downloads the Nomic ONNX model from HuggingFace into ~/.vulkanforge/embed-cache, then runs offline. The two native deps (static ONNX Runtime + bundled SQLite) add ~34 MB to the vulkanforge binary.
- Honest mechanics. The HNSW index assigns its own vector ids, so the graph node id is carried in the vector's metadata and recovered on recall (search → get_vector → node_id → get_entity); project lookups use plain SQL on the graph's graph_entities table. If the embedder can't be loaded, /memory/* returns 503 and the inference server still runs.
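The recall path in the list above (task prefixes, one index per project, node ids carried in vector metadata) is sketched below with hypothetical names. The embedder is passed in as a function so any stand-in can be used, a brute-force scan replaces HNSW, and the exact prefix strings, trailing space included, are an assumption about what VulkanForge sends to the Nomic model.

```rust
use std::collections::HashMap;

// Task prefixes for Nomic-Embed v1.5 (assumed exact strings, trailing space included).
const DOC_PREFIX: &str = "search_document: ";
const QUERY_PREFIX: &str = "search_query: ";

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical model of the store: one vector index per project_key plus a node table.
pub struct Memory {
    /// project_key -> index; a vector's id is its position, and the graph
    /// node id travels alongside it as metadata.
    indexes: HashMap<String, Vec<(Vec<f32>, u64)>>,
    /// graph node id -> note text (stands in for get_entity).
    entities: HashMap<u64, String>,
    next_node: u64,
}

impl Memory {
    pub fn new() -> Self {
        Memory { indexes: HashMap::new(), entities: HashMap::new(), next_node: 1 }
    }

    /// Write path: create the node, embed with the document prefix, add to the project's index.
    pub fn remember(&mut self, project: &str, text: &str, embed: &dyn Fn(&str) -> Vec<f32>) -> u64 {
        let node_id = self.next_node;
        self.next_node += 1;
        self.entities.insert(node_id, text.to_string());
        let input = format!("{}{}", DOC_PREFIX, text);
        let vector = embed(input.as_str());
        self.indexes.entry(project.to_string()).or_default().push((vector, node_id));
        node_id
    }

    /// Read path: search only this project's index (brute force stands in for HNSW),
    /// then map vector id -> metadata node id -> entity text.
    pub fn recall(
        &self,
        project: &str,
        query: &str,
        k: usize,
        embed: &dyn Fn(&str) -> Vec<f32>,
    ) -> Vec<(u64, String, f32)> {
        let Some(index) = self.indexes.get(project) else {
            return Vec::new(); // unknown project: nothing to search, no fallback to others
        };
        let input = format!("{}{}", QUERY_PREFIX, query);
        let q = embed(input.as_str());
        let mut scored: Vec<(usize, f32)> = index
            .iter()
            .enumerate()
            .map(|(vector_id, (v, _))| (vector_id, cosine(v, &q)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
            .into_iter()
            .map(|(vector_id, score)| {
                let node_id = index[vector_id].1; // like get_vector -> metadata node_id
                (node_id, self.entities[&node_id].clone(), score) // like get_entity
            })
            .collect()
    }
}
```

Because each project's vectors live in a separate index, a query for one project never scores another project's notes at all; there is no filter step that could be forgotten.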
The mechanics above are the foundation; the fuller design they're heading toward — six memory layers, sleep-style consolidation, watermark-based trust — is laid out on Memory Design.
v1.0.2 reads, writes, and curates, and the client now reaches it. The store has both written and read since v1.0.1;
v1.0.2 adds curation and the vf-clide client layer. Shipped on top of the foundation:
- Curation (v1.0.2 – v1.0.3). Notes are no longer write-only: a near-duplicate remember is de-duplicated on write, /archive <id> drops a note from recall while keeping the trace, /unarchive <id> restores it (v1.0.3: archiving is reversible), and /forget <id> removes it outright. Delete and un-archive are user actions; the agent may archive a note it recalled this session, behind a confirmation showing the note's real text, but never deletes or un-archives on its own.
- Client integration (v1.0.2 – v1.0.3). vf-clide exposes /project, /recall, and /remember plus the curation commands /archive, /unarchive, and /forget (REPL), and in the agent loop the recall, remember, and archive tools. They run on their own axis, as direct calls to the server's /memory/* that touch neither files nor the shell, so they stay visible (every call prints a marker) but sit outside the file/shell permission ceiling, available whenever the server has memory on; archive adds an always-on confirmation, and /unarchive and /forget stay user-only.
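The agent-side rules (recall and remember freely, archive only a note recalled this session and only after a confirmation that quotes its stored text, never un-archive or forget) fit in a small policy check. This is a hypothetical sketch: the type names, the `check` method, and the prompt wording are illustrative, and vf-clide may structure this differently.

```rust
use std::collections::HashMap;

/// Memory actions as an agent loop might see them (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tool {
    Recall,
    Remember,
    Archive,
    Unarchive,
    Forget,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// May run without a prompt.
    Allowed,
    /// Runs only if the user accepts this prompt, which quotes the stored text.
    NeedsConfirmation(String),
    /// Never agent-initiated: reserved for the user's REPL commands.
    UserOnly,
    /// The agent may only archive notes it recalled in this session.
    NotRecalledThisSession,
}

/// Notes recalled this session: id -> text exactly as the server returned it.
pub struct Session {
    recalled: HashMap<u64, String>,
}

impl Session {
    pub fn new() -> Self {
        Session { recalled: HashMap::new() }
    }

    pub fn note_recalled(&mut self, id: u64, stored_text: &str) {
        self.recalled.insert(id, stored_text.to_string());
    }

    pub fn check(&self, tool: Tool, id: Option<u64>) -> Decision {
        match tool {
            Tool::Recall | Tool::Remember => Decision::Allowed,
            Tool::Unarchive | Tool::Forget => Decision::UserOnly,
            Tool::Archive => match id.and_then(|i| self.recalled.get(&i).map(|t| (i, t))) {
                // Quote the stored text, not the agent's own description of the note.
                Some((i, text)) => Decision::NeedsConfirmation(format!(
                    "Archive note {}? Stored text: {}",
                    i, text
                )),
                None => Decision::NotRecalledThisSession,
            },
        }
    }
}
```

Quoting the server's stored text rather than the agent's summary means the user confirms what will actually disappear from recall, even if the agent misread the note.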
What's deliberately still not here:
- Lifecycle: notes are written active and stay there until someone curates them by hand. No draft → confirmed → active → stale → archived transitions, no touch-on-read aging.
- Flat graph: only Project and CONTAINS; no SUPERSEDES / CONTRADICTS / DERIVES_FROM edges yet.
- No auto-injection, by design, not a gap. recall is always an explicit call; the memory never loads itself into a prompt behind your back. This one stays.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases