Skip to content
maeddesg edited this page Jun 14, 2026 · 7 revisions

Memory

Server-side, persistent, project-scoped semantic memory — shipped in v1.0.

What this memory is — and what it isn't

VulkanForge's memory is a deliberate notebook, not a recording device.

It exists so that a decision you made three weeks ago, a benchmark you ran last sprint, or a bug you finally cornered doesn't evaporate when the session ends or the model is swapped out. You write to it on purpose — remember — and you read from it by meaning — recall. What comes back is what was put in, surfaced because it is relevant to what you asked, not because something guessed you might want it.

A few things it deliberately is:

  • Yours, and your project's — not the model's. The store lives on your disk, in your process, on your hardware. Any model that connects reads the same memory; swap one model for another and the record is unchanged. The model is the lens; the memory is what's written down. When a model's context window closes, this doesn't.
  • Local and single-user, all the way down. No cloud, no telemetry, nothing leaving your machine. The embeddings are computed on your own CPU; the vectors and the graph sit in one SQLite file. That's the whole surface.
  • Scoped by construction. Each coding project gets its own index, and recall in one project physically cannot return another's notes — the partitions never touch. The general, non-project things live in a shared global scope. Isolation isn't a filter that has to be applied correctly; it's the shape of the thing.
  • Curated, not hoarded. It holds what's worth keeping — decisions, learnings, benchmarks, bugs — written as deliberate notes. Quality is something you tend, not something assumed.

And, just as deliberately, a few things it is not:

  • Not a transcript. It does not sit in the background recording everything you say. There is no conversation log, no ambient capture, no surveillance. Silence is the default; you choose what to keep.
  • Not auto-injected. Recall is something you (or the agent) invoke, and its results are visible. The memory never quietly stuffs your context behind your back — control over what comes back stays with you.
  • Not an oracle. It stores what was written. A stale or wrong note stays wrong until someone curates it. It will remember your mistakes as faithfully as your insights — that's a memory's honesty, not a flaw to paper over.
  • Not magic. Underneath it's vector search over embeddings and a SQLite graph — understood, inspectable mechanics, not a black box. If a recall surprises you, you can trace why.
  • Not a stand-in for thinking. It surfaces relevant past context so you don't have to hold everything in your head. It doesn't decide for you, and it isn't a substitute for judgment — yours or the agent's.

The short version: it's the project's long-term memory — kept on purpose, owned by you, and honest about being a tool rather than a mind.

How it works

The memory subsystem is embedded in the vulkanforge serve process — no separate daemon. It exposes VF-native endpoints under /memory/* (a namespace separate from the OpenAI-compatible /v1/*).

Endpoints

POST /memory/remember
  { "project_key": "vf", "kind": "Learning",
    "text": "Dispatch-Reduktion hilft nicht auf gfx1201",
    "name": "optional label", "metadata": { ... } }
  → { "id": 1 }

POST /memory/recall
  { "project_key": "vf", "query": "do fewer barriers help performance?", "k": 3 }
  → { "hits": [ { "id": 1, "kind": "Learning", "name": "...", "text": "...",
                  "status": "active", "score": 0.72 } ] }

POST /memory/projects   { "project_key": "vf", "name": "optional" }   → { "id", "project_key" }
GET  /memory/projects                                                 → { "projects": [ ... ] }

project_key is optional everywhere — omit it and the call uses a shared global scope (__global__). score is cosine similarity in [0,1] (higher = closer).

Under the hood

  • One store. SQLiteGraph (3.2.5, GPL-3.0) holds the nodes (projects + content), the CONTAINS edges, and the per-project HNSW vector indexes — all in one SQLite file (default ~/.vulkanforge/memory.db, override with VF_MEMORY_DB).
  • CPU embedder. fastembed (5.16.2, ONNX Runtime) runs Nomic-Embed v1.5-Q (768-dim, INT8 → AVX-512/VNNI on Zen4). The task prefix the model needs (search_document: for writes, search_query: for reads) is applied for you. Embedding runs on the CPU and never takes the GPU permit, so a recall never waits behind a generation.
  • Per-project index. Each project_key gets its own persistent HNSW index (768-dim, cosine, m=16, ef_construction=200). A recall searches only that index, so project isolation is structural, not a filter.
  • Persistence. The store survives restarts — on reopen the vectors are restored from the SQLite store with no re-embedding. Shutdown flushes the HNSW topology and WAL-checkpoints.
  • First start downloads the Nomic ONNX model from HuggingFace into ~/.vulkanforge/embed-cache, then runs offline. The two native deps (static ONNX Runtime + bundled SQLite) add ~34 MB to the vulkanforge binary.
  • Honest mechanics. The HNSW index assigns its own vector ids, so the graph node id is carried in the vector's metadata and recovered on recall (search → get_vector → node_id → get_entity); project lookups use plain SQL on the graph's graph_entities table. If the embedder can't be loaded, /memory/* returns 503 and the inference server still runs.

Today vs. roadmap

v1.0 writes and reads — honestly, that's it. What's deliberately not here yet:

  • Lifecycle — notes are written active and stay there. No draft → confirmed → active → stale → archived transitions, no touch-on-read aging, no pinned-survives-forever.
  • No delete / archive. v1.0 does not remove or supersede entries.
  • Flat graph. Only Project and CONTAINS — no sprint/session hierarchy, no SUPERSEDES/CONTRADICTS/ DERIVES_FROM edges yet.
  • No auto-injection. Nothing loads memory into a prompt for you; recall is always an explicit call.
  • No client integration yet. vf-clide doesn't expose /project / /recall REPL commands or agent memory-tools yet — that's the next milestone (and it slots straight into the existing tiered permission model: recall is read-only, remember is mutating).

These are planned, not promised-as-present. v1.0 is the foundation: a real store that remembers, in the right shape, that the rest builds on.

Clone this wiki locally