Memory as beliefs with lifecycle + provenance — rethinking the kind taxonomy and #57 instrumentation #61

shakystar (Maintainer) · 2026-06-10T04:03:23Z

Context

#57 proposes observe-only free-form tags on extracted memories to gather taxonomy evidence before changing the kind enum. After discussing the direction (kant + Claude, 2026-06-10), we think the methodology of #57 is right but the instrumentation is aimed at the wrong evidence. Posting here so the design discussion doesn't get buried in issue comments; #57 links back to this thread.

1. The brain has no tags — `kind` is really a lifecycle-policy selector

In the brain, a memory's "type" is not a label; it is a difference in storage system and dynamics. Episodic / semantic / procedural memories live in different circuits (hippocampus vs neocortex vs basal ganglia) with different consolidation speeds, decay curves, and retrieval modes. Retrieval is content-addressed association, not label lookup. "Type" is where and how a memory lives — a tag is a researcher's annotation, not a mechanism.

Re-read #57's force-fit evidence through that lens and the three misfits differ not in semantic category but in lifecycle dynamics:

"run verify:full + smoke + user approval before merge" → salience 10 now, instantly void once the merge happens: conditional expiry.
"Convention 0033 is the SoT for role-based visibility" → persists until amended: amendable-persistent.
"panel was deleted / implementation done" → naturally fades: fast-decay status.

So the real job of kind is not classification — it selects a decay/supersession policy (contradiction filtering, dedup, injection priority are all policy applications). The misfit evidence says "labels are a proxy for dynamics", not "we need more labels".
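To make "kind selects a policy" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `LifecyclePolicy` shape, the three policy names (taken from the misfit examples above), the 3-day half-life, and the `effective_salience` helper are illustrative assumptions, not existing memorize code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical policy record; field names are assumptions for illustration."""
    name: str
    half_life_days: Optional[float]  # None means no time-based decay
    expires_on_condition: bool       # void as soon as the expiry condition is met


# One policy per misfit example above; the numbers are placeholders.
POLICIES = {
    "conditional_expiry": LifecyclePolicy("conditional_expiry", None, True),
    "amendable_persistent": LifecyclePolicy("amendable_persistent", None, False),
    "fast_decay_status": LifecyclePolicy("fast_decay_status", 3.0, False),
}


def effective_salience(base: float, policy: LifecyclePolicy, age_days: float,
                       condition_met: bool = False, superseded: bool = False) -> float:
    """Salience after applying the policy's supersession, expiry, and decay rules."""
    if superseded:
        return 0.0
    if policy.expires_on_condition and condition_met:
        return 0.0
    if policy.half_life_days is None:
        return base
    # Exponential decay: salience halves every half_life_days.
    return base * 0.5 ** (age_days / policy.half_life_days)
```

Under these assumptions the "before merge" memory keeps salience 10 for any age until `condition_met` flips, while a status memory halves every three days without anyone touching it.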

It follows that extractor-emitted tags measure the wrong thing: they capture the extractor LLM's folk taxonomy — its classification intuitions — not the memory's actual behavior. The brain-faithful instrumentation is behavioral: when does a memory get contradicted, superseded, injected-and-actually-used, or go stale? Most of that is already in the event log or is cheap to add as telemetry, and it is more honest evidence than self-reported labels.

2. Where the brain analogy must break: N selves, one store

A brain has one self, so it never needs contradiction detection — a single self serializes its own belief updates. memorize is a store shared by N agents + humans, which makes it a distributed belief-revision problem, not a single-brain problem. That requires things brains don't have:

provenance — who formed this belief, from what evidence;
causal ordering — #39 (sync: HLC (hybrid logical clock) for cross-machine deterministic ordering, an upgrade from the createdAt+id tie-break) is exactly this;
supersession semantics — rules for "which belief wins" across concurrent writers.

Direction: single-memory lifecycle modeled like a brain; multi-writer reconciliation modeled like a distributed system. Neither alone is the North Star.
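For readers unfamiliar with HLC, a minimal sketch of the standard hybrid-logical-clock update rules follows. It is not the #39 implementation: the class name, the `(wall_ms, counter, node_id)` tuple layout, and the injectable `now_ms` clock are assumptions made for illustration.

```python
import time
from typing import Callable, Tuple

# (wall-clock ms, logical counter, node id); tuples compare lexicographically,
# so the node id only breaks exact ties and the ordering is total.
Timestamp = Tuple[int, int, str]


class HybridLogicalClock:
    def __init__(self, node_id: str,
                 now_ms: Callable[[], int] = lambda: int(time.time() * 1000)) -> None:
        self.node_id = node_id
        self._now_ms = now_ms
        self._l = 0  # highest wall-clock value seen so far
        self._c = 0  # counter that orders events sharing the same _l

    def tick(self) -> Timestamp:
        """Stamp a local write or an outgoing message."""
        prev = self._l
        self._l = max(prev, self._now_ms())
        self._c = self._c + 1 if self._l == prev else 0
        return (self._l, self._c, self.node_id)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge a remote stamp so every later local stamp sorts after it."""
        r_l, r_c, _ = remote
        prev = self._l
        self._l = max(prev, r_l, self._now_ms())
        if self._l == prev and self._l == r_l:
            self._c = max(self._c, r_c) + 1
        elif self._l == prev:
            self._c += 1
        elif self._l == r_l:
            self._c = r_c + 1
        else:
            self._c = 0
        return (self._l, self._c, self.node_id)
```

The property that matters for supersession is that a write made after seeing another write always sorts later, even when the second machine's wall clock is behind.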

3. The cost argument for tags doesn't hold

Consolidation already pays the expensive step: an LLM reads the full observation window to compress it. The marginal cost of richer output fields is ~zero — tags are a cost-saving compromise with no cost to save. Keeping #57's (correct) observe-only methodology but spending the same budget on essential evidence, the extractor can emit:

obsolete_when — free-form expiry condition: "when PR X merges", "until the convention is amended", "never / persistent";
kind-misfit signal — "none of the 3 kinds fits well" + a one-line why;
supersedes — free-form mention of an existing memory this one replaces.

All three are stored and read by no consumer, which gives them the same safety properties as tags: a malformed or absent field must never trip the #43 parse-failure path. The difference is that this evidence directly answers the question we'll actually face later: what per-kind lifecycle policies should exist?
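A sketch of what "must never trip the parse-failure path" could look like for these fields. The field names come from this thread; the function name, the assumption that each extracted memory arrives as a decoded JSON object, and the normalisation rules are hypothetical.

```python
from typing import Any, Dict, Optional


def _opt_str(value: Any) -> Optional[str]:
    """Return a trimmed non-empty string, or None for anything else."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def parse_observe_only(raw: Any) -> Dict[str, Any]:
    """Pull the observe-only fields out of one extracted-memory object.

    Never raises: a missing, mistyped, or malformed field degrades to
    None (or an empty tag list) instead of failing the whole parse.
    """
    item = raw if isinstance(raw, dict) else {}
    tags = item.get("tags")
    return {
        "obsolete_when": _opt_str(item.get("obsolete_when")),
        "kind_misfit": _opt_str(item.get("kind_misfit")),
        "supersedes": _opt_str(item.get("supersedes")),
        "tags": [t.strip() for t in tags if isinstance(t, str) and t.strip()]
        if isinstance(tags, list) else [],
    }
```

Because nothing reads these values yet, dropping a bad field silently is acceptable here; it would not be once a consumer depends on them.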

4. Embedding similarity: recall-only, never the judge

The known traps of cosine-similarity scores are real and structural, and they bite contradiction detection hardest:

Negation blindness. "merge is fine" and "merge must NOT proceed" sit nearly on top of each other in embedding space — same topic, opposite meaning. Contradiction detection exists precisely to find such pairs, i.e. embeddings are worst at exactly what this feature needs most.
Threshold non-transfer. Cosine cutoffs don't transfer across models or domains — we already hit this in practice (9598a0f recalibrated the contradiction pre-filter for real embeddings).
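Surface-form bias, anisotropy, etc. Similar wording inflates scores regardless of meaning, and embeddings crowded into a narrow cone compress the usable score range.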

So the only trustworthy role for similarity is candidate recall ("plausibly the same topic?"); the judge must remain an LLM, with the current pre-filter → judge pipeline kept. The residual risk is silent false negatives when the pre-filter is too tight — nobody sees a missed contradiction. A cheap complement: a parallel lexical/entity-overlap recall path (BM25-ish) feeding the same judge; it catches pairs embeddings miss, and vice versa.
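The two-path recall feeding a single judge can be sketched as below. This is an illustration, not the existing pipeline: the function names and both thresholds are placeholders, the lexical path uses plain token-set Jaccard overlap as a stand-in for a BM25-style scorer, and `judge` stands in for the LLM call.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in for a BM25-style lexical score."""
    ta = set(re.findall(r"[a-z0-9_]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9_]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def find_contradictions(
    new_text: str,
    new_vec: Sequence[float],
    existing: List[Tuple[str, str, Sequence[float]]],  # (id, text, embedding)
    judge: Callable[[str, str], bool],                 # LLM judge in practice
    cos_min: float = 0.8,   # placeholder; cutoffs do not transfer across models
    lex_min: float = 0.3,   # placeholder
) -> List[str]:
    hits = []
    for mem_id, text, vec in existing:
        # Either path may recall a candidate; neither decides the outcome.
        recalled = cosine(new_vec, vec) >= cos_min or jaccard(new_text, text) >= lex_min
        if recalled and judge(new_text, text):
            hits.append(mem_id)
    return hits
```

The union widens recall at the cost of more judge calls, which is the trade the #63-style path would have to justify against observed false negatives.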

Proposal: reframe #57, don't close it

Keep — observe-only methodology, parser tolerance, both extractor backends, the distribution-report idea.
Replace the collected evidence — tags → obsolete_when + kind-misfit signal (+ optional tags as a sidecar, since it's nearly free).
Add behavioral telemetry — per-memory events/counters: injected, used, contradicted, superseded, age-at-invalidation.
Decision criteria later — design named lifecycle policies (and only then any enum change) from observed invalidation patterns, not from label clusters.
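As a rough illustration of the behavioral-telemetry item, a per-memory counter record might look like this. The event names come from the list above; the `MemoryTelemetry` class, its fields, and the choice to measure age at the first invalidating event are assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

INVALIDATING = {"contradicted", "superseded"}
EVENTS = {"injected", "used"} | INVALIDATING


@dataclass
class MemoryTelemetry:
    """Hypothetical per-memory record; timestamps are seconds since any epoch."""
    created_at: float
    counts: Counter = field(default_factory=Counter)
    age_at_invalidation: Optional[float] = None

    def record(self, event: str, at: float) -> None:
        if event not in EVENTS:
            return  # observe-only: unknown events are ignored, never an error
        self.counts[event] += 1
        # Only the first invalidating event fixes the age at invalidation.
        if event in INVALIDATING and self.age_at_invalidation is None:
            self.age_at_invalidation = at - self.created_at
```

Aggregating `age_at_invalidation` across memories is what would yield the invalidation patterns that the decision criteria refer to.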

Long-term direction this implies: memories are beliefs with provenance + lifecycle conditions; kind dissolves into named lifecycle policies; contradiction resolution rests on causal order (HLC, #39) + LLM judgment, with similarity demoted to recall.

shakystar (Maintainer, Author) · 2026-06-10T04:09:53Z

Consensus reached 2026-06-10 (kant + Claude, live session). All four open points resolved:

tags — kept as a near-free sidecar (lets us later compare the LLM folk taxonomy against observed lifecycle conditions).
Behavioral telemetry — split out to #62 ("telemetry: behavioral lifecycle evidence for consolidated memories (injected / contradicted / superseded / age-at-invalidation)"). It touches hook-service + the event schema, so it is bigger than the parser-only change in #57 ("consolidate: observe-only lifecycle evidence (obsolete_when, kind-misfit, supersedes, tags) before changing the kind enum").
supersedes — included as an observe-only field; the in-batch replacement relations the extractor sees naturally are exactly what post-hoc embedding recall misses.
Lexical-overlap contradiction recall — parked as #63 ("contradiction: lexical/entity-overlap recall path to complement embedding pre-filter (gated on observed false negatives)"), gated on an actually-observed false negative.

#57 has been retitled and its body rewritten to the lifecycle-evidence framing (obsolete_when + kind_misfit + supersedes + tags sidecar). Decision criteria are now joint with #62's invalidation curves.
