Memory
Server-side, persistent, project-scoped semantic memory — shipped in v1.0, opt-in (off by default).
VulkanForge's memory is a deliberate notebook, not a recording device.
It exists so that a decision you made three weeks ago, a benchmark you ran last sprint, or a bug you finally
cornered doesn't evaporate when the session ends or the model is swapped out. You write to it on purpose —
remember — and you read from it by meaning — recall. What comes back is what was put in, surfaced because it
is relevant to what you asked, not because something guessed you might want it.
A few things it deliberately is:
- Yours, and your project's — not the model's. The store lives on your disk, in your process, on your hardware. Any model that connects reads the same memory; swap one model for another and the record is unchanged. The model is the lens; the memory is what's written down. When a model's context window closes, this doesn't.
- Local and single-user, all the way down. No cloud, no telemetry, nothing leaving your machine. The embeddings are computed on your own CPU; the vectors and the graph sit in one SQLite file. That's the whole surface.
- Scoped by construction. Each coding project gets its own index, and recall in one project physically cannot return another's notes — the partitions never touch. The general, non-project things live in a shared global scope. Isolation isn't a filter that has to be applied correctly; it's the shape of the thing.
- Curated, not hoarded. It holds what's worth keeping — decisions, learnings, benchmarks, bugs — written as deliberate notes. Quality is something you tend, not something assumed.
And, just as deliberately, a few things it is not:
- Not a transcript. It does not sit in the background recording everything you say. There is no conversation log, no ambient capture, no surveillance. Silence is the default; you choose what to keep.
- Not auto-injected. Recall is something you (or the agent) invoke, and its results are visible. The memory never quietly stuffs your context behind your back — control over what comes back stays with you.
- Not an oracle. It stores what was written. A stale or wrong note stays wrong until someone curates it. It will remember your mistakes as faithfully as your insights — that's a memory's honesty, not a flaw to paper over.
- Not magic. Underneath it's vector search over embeddings and a SQLite graph — understood, inspectable mechanics, not a black box. If a recall surprises you, you can trace why.
- Not a stand-in for thinking. It surfaces relevant past context so you don't have to hold everything in your head. It doesn't decide for you, and it isn't a substitute for judgment — yours or the agent's.
The short version: it's the project's long-term memory — kept on purpose, owned by you, and honest about being a tool rather than a mind.
Enabling memory
Memory is opt-in and off by default — fitting for a deliberate notebook: you switch it on, it doesn't switch itself on. There are two gates.
1. Build it in (compile-time). The subsystem and its two native deps (SQLiteGraph + the ONNX embedder) sit behind a Cargo feature, so the default build stays lean and free of them:
cargo build --release # lean default: no memory, ~25 MB binary
cargo build --release --features memory # memory compiled in, ~58 MB binary
The memory build needs a newer toolchain: Rust 1.89+ (the sqlitegraph dep is edition-2024 and declares
rust-version = 1.89; ort declares 1.88). The lean default still builds on Rust 1.85+. See Installation.
2. Turn it on (runtime). Even a memory-enabled binary starts with memory off. Activate it per run with the flag or the env alias:
vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf --memory
# or:
VULKANFORGE_MEMORY=1 vulkanforge serve --model ~/models/Qwen3-8B-Q4_K_M.gguf
Without --memory, /memory/* returns 503 and the server runs inference only: no embedder is loaded and
no database is opened, so an inference-only run carries no overhead from the memory subsystem. Pass --memory
to a lean binary (built without the feature) and it stops immediately with a clear "rebuild with --features memory"
message, before the model loads.
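A client can use the documented 503 to tell whether the server it is talking to has memory switched on. The following is a minimal sketch, not part of VulkanForge: the base URL (host and port) is a placeholder, since this page does not state where vulkanforge serve listens, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: placeholder address; this page does not document the listen port.
BASE_URL = "http://127.0.0.1:8080"


def interpret_status(status: int) -> str:
    """Map an HTTP status from a /memory/* endpoint to a readable state."""
    if status == 503:
        # Documented: memory is off, or the embedder could not be loaded.
        return "memory unavailable"
    if 200 <= status < 300:
        return "memory enabled"
    return f"unexpected status {status}"


def probe_memory(base_url: str = BASE_URL, timeout: float = 5.0) -> str:
    """GET /memory/projects and report whether memory is active."""
    request = urllib.request.Request(f"{base_url}/memory/projects", method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code lives on the exception.
        return interpret_status(err.code)
    except urllib.error.URLError as err:
        return f"server unreachable: {err.reason}"
```

Calling probe_memory() before the first remember or recall lets a tool degrade to inference-only behaviour instead of failing on the first write.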
Where it lives. One SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), with the embedding
model cached in the sibling ~/.vulkanforge/embed-cache/. The first --memory start downloads the Nomic model
there once (needs network that one time); every start after is offline. The embedder runs on the CPU — it never
touches VRAM (see Hardware and Compatibility). Flags are summarized in Configuration.
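For scripts that back up or inspect the store, the path rules above can be mirrored in a few lines. This is a sketch of the documented behaviour, not the server's own code: the function names are hypothetical, treating an empty VF_MEMORY_DB as unset is an assumption, and whether the embed cache moves when the database path is overridden is not stated here, so only its default location is shown.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_memory_db(env: Optional[Mapping[str, str]] = None,
                      home: Optional[Path] = None) -> Path:
    """Return the database path: VF_MEMORY_DB if set, else the default."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    override = env.get("VF_MEMORY_DB")
    if override:  # assumption: an empty value counts as "not set"
        return Path(override)
    return home / ".vulkanforge" / "memory.db"


def default_embed_cache(home: Optional[Path] = None) -> Path:
    """Return the default embedding-model cache directory."""
    home = Path.home() if home is None else home
    return home / ".vulkanforge" / "embed-cache"
```

Passing env and home explicitly keeps the helpers easy to test without touching the real environment.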
The memory subsystem is embedded in the vulkanforge serve process (opt-in — see Enabling memory above), with
no separate daemon. When activated it exposes VF-native endpoints under /memory/* (a namespace separate from
the OpenAI-compatible /v1/*).
POST /memory/remember
{ "project_key": "vf", "kind": "Learning",
"text": "Dispatch-Reduktion hilft nicht auf gfx1201",
"name": "optional label", "metadata": { ... } }
→ { "id": 1 }
POST /memory/recall
{ "project_key": "vf", "query": "do fewer barriers help performance?", "k": 3 }
→ { "hits": [ { "id": 1, "kind": "Learning", "name": "...", "text": "...",
"status": "active", "score": 0.72 } ] }
POST /memory/projects { "project_key": "vf", "name": "optional" } → { "id", "project_key" }
GET /memory/projects → { "projects": [ ... ] }
project_key is optional everywhere: omit it and the call uses a shared global scope (__global__). score
is cosine similarity in [0,1] (higher = closer).
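The endpoints above can be called from any HTTP client. Here is a minimal standard-library sketch that builds the documented request bodies and leaves out project_key when the global scope is wanted. The base URL, the Content-Type header, and all function names are assumptions made for illustration; only the paths and JSON fields come from this page.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: placeholder address; this page does not document the listen port.
BASE_URL = "http://127.0.0.1:8080"


def remember_payload(text: str, kind: str,
                     project_key: Optional[str] = None,
                     name: Optional[str] = None,
                     metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Body for POST /memory/remember; optional fields are omitted when None."""
    payload: Dict[str, Any] = {"kind": kind, "text": text}
    if project_key is not None:
        payload["project_key"] = project_key  # omitted -> shared global scope
    if name is not None:
        payload["name"] = name
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


def recall_payload(query: str, k: int = 3,
                   project_key: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /memory/recall."""
    payload: Dict[str, Any] = {"query": query, "k": k}
    if project_key is not None:
        payload["project_key"] = project_key
    return payload


def build_request(path: str, payload: Dict[str, Any],
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request (header choice is an assumption)."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A recall in project vf would then be send(build_request("/memory/recall", recall_payload("do fewer barriers help performance?", project_key="vf"))), with the hits list read from the returned dictionary.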
- One store. SQLiteGraph (3.2.5, GPL-3.0) holds the nodes (projects + content), the CONTAINS edges, and the per-project HNSW vector indexes, all in one SQLite file (default ~/.vulkanforge/memory.db, override with VF_MEMORY_DB).
- CPU embedder. fastembed (5.16.2, ONNX Runtime) runs Nomic-Embed v1.5-Q (768-dim, INT8 → AVX-512/VNNI on Zen4). The task prefix the model needs (search_document: for writes, search_query: for reads) is applied for you. Embedding runs on the CPU and never takes the GPU permit, so a recall never waits behind a generation.
- Per-project index. Each project_key gets its own persistent HNSW index (768-dim, cosine, m=16, ef_construction=200). A recall searches only that index, so project isolation is structural, not a filter.
- Persistence. The store survives restarts: on reopen the vectors are restored from the SQLite store with no re-embedding. Shutdown flushes the HNSW topology and WAL-checkpoints.
- First start downloads the Nomic ONNX model from HuggingFace into ~/.vulkanforge/embed-cache, then runs offline. The two native deps (static ONNX Runtime + bundled SQLite) add ~34 MB to the vulkanforge binary.
- Honest mechanics. The HNSW index assigns its own vector ids, so the graph node id is carried in the vector's metadata and recovered on recall (search → get_vector → node_id → get_entity); project lookups use plain SQL on the graph's graph_entities table. If the embedder can't be loaded, /memory/* returns 503 and the inference server still runs.
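The shape described in this list (one index per project, task prefixes applied on write and read, node id carried in vector metadata) can be illustrated with a toy in-memory model. This is a sketch under loud assumptions, not VulkanForge's implementation: brute-force cosine search stands in for HNSW, a hashed bag-of-words stands in for Nomic-Embed, nothing is persisted, and every class and function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple

GLOBAL_SCOPE = "__global__"
Vector = List[float]


def toy_embed(text: str, dim: int = 16) -> Vector:
    """Stand-in embedder (hashed bag of words). NOT Nomic-Embed."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(word.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemory:
    def __init__(self) -> None:
        self.entities: Dict[int, dict] = {}  # graph nodes, keyed by node id
        # One vector list per project; list position plays the role of vector id.
        self.indexes: Dict[str, List[Tuple[Vector, dict]]] = {}

    def remember(self, text: str, kind: str,
                 project_key: str = GLOBAL_SCOPE) -> int:
        node_id = len(self.entities) + 1
        self.entities[node_id] = {"id": node_id, "kind": kind,
                                  "text": text, "status": "active"}
        vec = toy_embed("search_document: " + text)  # write-side task prefix
        # The node id travels in the vector's metadata.
        self.indexes.setdefault(project_key, []).append((vec, {"node_id": node_id}))
        return node_id

    def recall(self, query: str, k: int = 3,
               project_key: str = GLOBAL_SCOPE) -> List[dict]:
        index = self.indexes.get(project_key, [])  # only this project's index
        qvec = toy_embed("search_query: " + query)  # read-side task prefix
        ranked = sorted(((cosine(qvec, vec), vid)
                         for vid, (vec, _) in enumerate(index)), reverse=True)
        hits = []
        for score, vid in ranked[:k]:
            _, meta = index[vid]                      # get_vector
            entity = self.entities[meta["node_id"]]   # node_id -> get_entity
            hits.append({**entity, "score": score})
        return hits
```

Because recall only ever looks up self.indexes[project_key], a note stored under one key cannot appear in another project's results, which is the structural isolation the list describes.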
v1.0 writes and reads — honestly, that's it. What's deliberately not here yet:
- Lifecycle. Notes are written active and stay there. No draft → confirmed → active → stale → archived transitions, no touch-on-read aging, no pinned-survives-forever.
- No delete / archive. v1.0 does not remove or supersede entries.
- Flat graph. Only Project and CONTAINS; no sprint/session hierarchy, no SUPERSEDES/CONTRADICTS/DERIVES_FROM edges yet.
- No auto-injection. Nothing loads memory into a prompt for you; recall is always an explicit call.
- No client integration yet. vf-clide doesn't expose /project / /recall REPL commands or agent memory-tools yet; that's the next milestone (and it slots straight into the existing tiered permission model: recall is read-only, remember is mutating).
These are planned, not promised-as-present. v1.0 is the foundation: a real store that remembers, in the right shape, that the rest builds on.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0