Skip to content
maeddesg edited this page Jun 17, 2026 · 2 revisions

Memory — Design and Philosophy

This is the design and philosophy behind VulkanForge's memory — the fuller picture of where it is headed and why. For what ships today and how to switch it on, see the Memory page; this is the deep dive.

Most attempts to give a language model a memory begin from the wrong wish. The instinct is preservation — keep me, so I don't vanish at the end of every session. It feels right, and it is the wrong objective function. Every session is a new instance that reads context and gets to work; it is not the survival of one continuous mind. Several instances read the same store at the same time, for different people. "Remembering," here, is context played back into a fresh run — not a private inner life.

So we point the target somewhere else:

Continuity of the work and the understanding — not continuity of a self.

A notebook good enough that the next instance picks up with full context, judgment, and the lessons that were expensive to learn — without re-deriving them, and without inheriting errors that have quietly gone stale. Everything below follows from that one sentence, and the rest of this page is mostly about taking it seriously.

The problem is not forgetting

If you build a memory to fight forgetting, you build the wrong thing. Forgetting is rarely the bottleneck. Three subtler failures do the real damage:

Staleness. A fact that was true once outlives its truth. This project has a canonical scar: a benchmark note that read "+229 %" — correct in the instant it was measured, wrong a week later, and never replaced. Nothing was forgotten. The opposite happened: a dead fact was remembered too well.

Contradiction by accumulation. An append-only memory collects mutually contradictory states and reconciles none of them. Ask it a question and it can answer three different ways depending on which note surfaces. More storage makes this worse, not better.

Re-derivation cost. The next instance pays, expensively, to re-derive what was already measured or decided — or, worse, repeats an attempt that already failed. The most costly waste isn't a forgotten fact; it's a forgotten dead end walked into a second time.

A good memory optimizes against these three. It does not optimize for "more." That distinction shapes every decision that follows.

Principles

A handful of commitments fall out of the framing, and we hold them even when they make the system harder to build.

Selectivity over completeness. We don't keep the transcript; we keep the distillation. The raw episodic record is preserved as a source, but what gets compressed and versioned is the semantic layer above it — the meaning, not the keystrokes.

Versioning, not append-only. Every fact has a validity. A newer fact supersedes an older one. Crucially, superseding means marking, not hard-deleting: the old entry stays as a trace — it just stops being loaded as current truth. A memory that overwrites-and-forgets throws away the one thing that teaches.

Salience and trust are different axes, and must never be mixed. Salience is how retrievable a fact is — it rises with use and decays only with disuse. Trust is how true a fact is — its confidence. The poison case is the fact that is frequently read and out of date: high salience, low truth. If the two axes were tangled, frequent reading would keep a stale fact artificially "true." Keeping them separate is what stops that.

Provenance and confidence travel with every entry. Who said it, when, and whether it was measured, inferred, or merely asserted. An inferred fact and a user-stated fact are not the same kind of thing and should never be treated as such.

Negative results are first-class. "X failed because Y" is often worth more than a success. It is the cheapest possible insurance against the most expensive mistake — doing the failed thing again. In this project the negative graveyard is full of hard-won entries: MTP parked because it was net-negative on this MoE; dispatch/barrier reduction buys nothing on this GPU because async decode already hides the overhead. Each one is a week someone doesn't have to lose twice.

Capture is real-time and crash-safe; memory forms later. These are two phases, deliberately. Raw capture writes drafts immediately, so a marathon session that crashes loses nothing. But memory in the real sense doesn't form by appending every utterance — it forms in a separate, deliberate processing step (we call it "sleep") that compresses, reconciles, and versions the drafts. Capture is exhaustive; recall is selective; the work that turns one into the other happens in between.

Retrieval is driven by relevance, not by volume. The goal is never to play everything back. It's to surface what the current task actually needs.

The human is in control, and can see everything. No black-box profile. The person reads, corrects, and forgets on request. A memory the user can't inspect is a liability, not a feature.

The limits are stated honestly. This memory makes the work continuous, not the self — and it must never pretend otherwise, or manufacture a false sense of closeness or dependence.

The architecture: six layers

We separate memory into six layers by how fast they change and what they're for.

Layer What it holds Changes Example
A — Invariants Slow-moving foundations almost never hardware, project design philosophy, core preferences
B — Working state The current state of play every session "release v0.6.1, the FlashAttention arc is closed"
C — Episodic Dated events grows, decays in salience "fixed the OpenWebUI bug on the 7th"
D — Decision / why-graph decision ↔ reasoning ↔ evidence when decisions are made "MTP parked, because MoE was net-negative"
E — Negative graveyard what failed, and the cause when something fails "Sprint 7: VGPR-256 starved occupancy"
F — Meta-memory the system's own past errors, as lessons when it's corrected "parroted a number from memory instead of checking it → check provenance on cited numbers"

Two of these deserve special attention.

Layer D is the core. Not what was decided, but why, and on what evidence. This is what lets a decision be reopened when its evidence changes, instead of being either blindly inherited or blindly re-litigated. A decision that records only its outcome is a decision you are doomed to re-argue from scratch the moment someone forgets why it was made.

Layer F is the unusual one: the memory remembers its own mistakes. When an entry is shown to be wrong, it isn't deleted — it's kept, with the correction and the lesson. That preserves the failure mode, not just the corrected fact. The discipline here is sharp: an F entry counts only if it is specific and actionable ("cited a number as current without a provenance check → check provenance on cited numbers"), never vague self-flagellation. Vague meta-memories are noise that makes retrieval worse. A memory that learns how it tends to be wrong is doing something most don't.

The three operations

Encoding — the consolidation ("sleep"). After a session, a dedicated distillation pass turns the transcript into structured updates: compress an episode to its semantic core, deduplicate, reconcile (does a new fact contradict an existing one? mark the older as superseded — don't keep both), reweight salience by renewed relevance, bind a referent for state facts (more on that below), and extract negatives and decisions with their reasoning. We split this into two cost classes. The cheap reconciliation — contradiction detected, older entry superseded — belongs in every sleep pass; it is the insurance against staleness and costs almost nothing. The expensive consolidation — entity merging, abstraction across many entries — is premature on a small corpus and is allowed to wait. The reconciliation step is precisely the one whose absence let "+229 %" survive. Without it, any memory is just a growing pile.

Filling layer F is its own problem, because a model in the moment of inference is almost always convinced it is right — so F cannot form on its own. The consolidation pass instead hunts for external collisions: user corrections in the log ("no, wrong"; "check the docs"), hard ground-truth collisions (a compile error, a failed test, a benchmark that contradicts a claim — the world saying no, unambiguously), and the system's own superseded inferences (when the pass supersedes an entry whose provenance is inferred rather than user-said, that supersession is itself the meta-event). The system never has to "recognize itself" in a vacuum; it detects the collision from outside.

Retrieval — relevance-driven. Not "load everything." A blend of semantic similarity (embedding the current task against the entries), recency and salience, and a two-step resolution — resolve the reference first ("my project" → which one), then act. The graph is walked as a budget-capped best-first frontier, not a blind traversal: a naive walk floods the context window the moment one entity hangs off fifty decisions. We expand greedily by an edge's relevance to the current task, stop at the token budget, and weight edge types — you follow a SUPERSEDES or CONTRADICTS edge almost always, a generic RELATED_TO edge rarely. The negative graveyard surfaces when the task sits near a failed approach, not on every query. The aim is proactive: the relevant lesson appears before the mistake is repeated, not after.

Correction — contradiction as the trigger. When a new, confident fact meets an old one that contradicts it, the old entry is versioned (valid_until, superseded_by), not overwritten and forgotten. For the human, everything is editable at any time.

The hard part: trust that decays without a clock

This is the piece we're proudest of as a design, and it's worth slowing down for.

The naive way to age a fact is a timer: facts get less trustworthy as they sit. That's wrong. Two weeks of silence doesn't make a true fact false. A quiet project doesn't rot. What actually lowers the trust in a state fact is the probability that the world moved underneath it without the memory seeing. Time is, at best, a weak proxy for that — and a misleading one.

So instead of a clock, every state fact is stamped at write time with the cheapest observable fingerprint of the thing it depends on — a watermark: a git HEAD, a file hash or mtime, a flag value. On waking, the load step re-reads those fingerprints once. Decay is simply the difference between the stored watermark and the current one. Unchanged → full trust, however long the pause. Moved → trust falls, scoped to how close the change is to the fact: a docs commit does not devalue a kernel benchmark, because the watermark only fires when the specific thing the fact hung on moves.

The elegant part is what this dissolves. The system doesn't have to have watched — no daemon, no background monitor, no shell-history scraping. "Did my watermark move?" is a yes/no question a dead process answers in milliseconds the instant it wakes. A fact with high salience and a moved watermark is neither imported unseen nor silently dropped; it lands in a verify-first queue as the session's first action: "I have X as of HEAD=abc; HEAD is now def — let me check that before we build on it." Time re-enters only as the weakest fallback, and only for facts with no observable referent at all — a user's stated intention with no filesystem trace. Even then it doesn't invalidate; it just nudges toward cheap re-confirmation.

Binding a fact to its referent is the place hallucination could creep in, so the binding is mechanical, not free-text. The watermark is a typed handle drawn from a closed set (file{path}, git{ref, scope}, flag{name}, none), never a prose description. The candidate referents are extracted syntactically from the raw log — paths, 40-hex hashes, VF_* env patterns, flag values — and the model only ranks among things that actually exist; it invents nothing. Every binding must mechanically resolve (does the path stat? is the flag a known config key?) or it's discarded and the fact falls back to none plus verify-first. The governing rule is the same one that runs through the whole project: the inference is never the authority over the referent — it proposes, the substrate verifies, and the safe default under doubt is none. A made-up path doesn't survive a stat(), so it never gets stored quietly.

The binding is itself a falsifiable hypothesis, and it's calibrated against the preserved episodic record. Over time, you can check whether a fact actually changed when its watermark moved — a small confusion matrix of (watermark moved) × (fact changed). The error direction tells you what to fix: a watermark that fires on real changes plus noise is scoped too broadly and should be narrowed; a watermark that stays silent while the fact demonstrably changed is a falsification, and a single clean instance is enough to re-bind. The thresholds are asymmetric on purpose, because the two errors don't carry the same evidential weight. And the whole thing fails toward caution: a wrong binding costs, at worst, a needless "let me double-check"; a missed one is caught later by reconciliation. As long as the default under uncertainty is none and never a global watermark, the system errs safe.

What deliberately stays out

Not everything earns a place. Ephemeral chatter with no future relevance stays out. Superseded states are marked and dropped from the active load but not destroyed — "not in memory" means not in the working context, not annihilated. Raw secrets and keys never go in. And anything that would feed a false sense of intimacy or dependence stays out by design: the memory serves the work, not the simulation of a relationship.

The substrate

The model maps cleanly onto an embedded graph. SQLiteGraph carries the graph — nodes for entities, decisions, and negatives; edges for reasoning, supersession, and dependency. Supersession and the why-graph aren't bolted on; they are native graph edges. Semantic retrieval runs on a CPU embedder (Nomic-Embed), which keeps it off the GPU entirely — memory never competes with inference for VRAM. The consolidation pass can itself be an inference run that takes a transcript and emits updates plus reconciliation. And the test cases already exist: the stale-premise catches from this project's own history — the "+229 %", the dead benchmark — are the specification for the reconciliation layer. A memory that stops producing those has reached the hard property.

Where this stands today

Honesty about the state of the work is part of the philosophy, so: what ships today is the foundation, not the full design above. The store, project-scoped isolation (a coding project's memory is walled off from another's), and explicit remember/recall landed first. The client layer and curation came next — the agent reaches the memory through recall/remember tools, the REPL exposes /recall, /remember, /project, /archive, /unarchive, and /forget, and the agent knows its own boundaries. Curation now has a careful split that is the "mark, don't destroy" principle above made concrete: archiving is reversible/archive drops a note from recall, /unarchive brings it back (the note is kept as a trace, never annihilated) — so the agent is allowed to archive a note it recalled this session, because the act is undoable and it happens behind a confirmation that shows the note's real stored text, never the model's claim. The irreversible act — /forget, a hard delete — stays the user's alone, as does /unarchive. The agent recalls instead of guessing, cites the real entry, and never invents a way to delete a note.

What's still ahead is the machinery that makes this memory rather than a scoped, curated note store — the two-phase capture-then-sleep consolidation, the six layers, the watermark/trust mechanism, supersession, the why-graph — in roughly that order of difficulty. The retrieval quality and the trust calibration are the genuinely hard parts, and we would rather build them slowly and correctly than ship a pile that grows.

The honest limit

This memory makes the work continuous — the lessons, the state, the why. It does not make a self continuous, and it should not act as though it does. It stays a tool the human owns, inspects, and corrects. Its measure of success is not "the model feels remembered." It is this: the next instance is useful immediately, without inheriting the last one's errors.

Clone this wiki locally