Plugin idea: optional yantrik_memory — semantic memory + supersession #464

spranab · 2026-06-10T21:59:50Z

Proposal: an optional `yantrik_memory` plugin for Code Puppy

Asking for a thumbs-up before I open a PR. Everything I've built and measured is
laid out below — including where it does not beat the current design.

1. Context

I use Code Puppy daily, and puppy_kennel is a genuinely good memory design — the
SQLite/FTS5 store plus the packer's tiered prefs band already covers brand/
convention recall well. This is not a change to the kennel. It's an optional
sibling plugin for users who want a semantic/learning memory layer, off by
default, with the kennel remaining the default.

(For background: I filed the Windows cp1252 console fix, PR #389.)

2. The gap I keep hitting

For longer-running, non-coder workflows (business users building charts and
shareable pages over weeks), an append-only BM25 store — even with a prefs band —
has two structural limits:

- It can't supersede an outdated fact. User says "we rebranded, brand color
  is now green." The old "blue" stays in the store and keeps surfacing, so the
  agent receives contradictory context.
- It can't learn from outcomes. Nothing records that "the minimalist style
  the user accepted three times" should outrank an alternative.

3. What I built and tested (standalone, before proposing)

A single-substrate memory layer on YantrikDB (one store; three memory types —
episodic verbatim turns, semantic distilled facts, procedural skills — plus
importance, decay, a knowledge graph, and outcome reinforcement). Pipeline:

natural turn
  ├─ logged verbatim as EPISODIC  (so artifacts/events are recallable: "like last time")
  └─ distiller (local LLM) extracts durable facts → SEMANTIC
        • new fact            → store
        • update of a fact    → correct()  (supersede the stale value)
        • accepted outcome    → reinforce()
recall = CURRENT band (authoritative semantic facts, post-correction)
       + HISTORY band (query-relevant episodic)

The distiller reads each message and returns only durable facts (ignores chatter),
and it flags when a message updates a prior fact. Example (real output):

"Our brand color is #1F4E79, logo top-right."   -> LEARN  brand color fact
"Did you catch the game last night?"             -> [] (chatter, nothing stored)
"Actually we rebranded, color is now green ..."  -> UPDATE prior brand fact -> green
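The routing above can be sketched with a toy in-memory store. This is an illustration only: `ToyMemory`, its method bodies, and the decision dict shape (`op` / `key` / `value`) are assumptions made for the example, not YantrikDB's API or the plugin's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Toy in-memory stand-in for the store; NOT YantrikDB's API."""
    semantic: dict = field(default_factory=dict)     # key -> current value
    superseded: list = field(default_factory=list)   # (key, stale value) pairs
    weights: dict = field(default_factory=dict)      # key -> reinforcement weight
    episodic: list = field(default_factory=list)     # verbatim turns

    def store(self, key, value):
        self.semantic[key] = value
        self.weights.setdefault(key, 1.0)

    def correct(self, key, value):
        # Supersede: move the stale value out of the current band.
        if key in self.semantic:
            self.superseded.append((key, self.semantic[key]))
        self.store(key, value)

    def reinforce(self, key, amount=0.5):
        if key in self.weights:
            self.weights[key] += amount


def apply_turn(memory, text, decisions):
    """Log the turn verbatim, then apply each (hypothetical) distiller decision."""
    memory.episodic.append(text)
    for d in decisions:
        if d["op"] == "LEARN":
            memory.store(d["key"], d["value"])
        elif d["op"] == "UPDATE":
            memory.correct(d["key"], d["value"])
        elif d["op"] == "REINFORCE":
            memory.reinforce(d["key"])


memory = ToyMemory()
apply_turn(memory, "Our brand color is #1F4E79, logo top-right.",
           [{"op": "LEARN", "key": "brand_color", "value": "#1F4E79"}])
apply_turn(memory, "Did you catch the game last night?", [])  # chatter
apply_turn(memory, "Actually we rebranded, color is now green",
           [{"op": "UPDATE", "key": "brand_color", "value": "green"}])
print(memory.semantic)
```

All three turns land in the episodic log, but only the durable fact reaches the semantic band, and the update replaces the old value instead of sitting next to it.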

4. How it integrates (already built + tested on a branch)

I've built the plugin: branch feat/yantrik-memory-plugin, directory
code_puppy/plugins/yantrik_memory/ (11 files, zero changes to any existing
file). It mirrors puppy_kennel's exact plugin contract:

| Hook | Behavior |
| --- | --- |
| `load_prompt` | inject the banded recall block into the system prompt |
| `agent_run_end` | distill the turn, store episodic + semantic, correct/reinforce |
| `register_tools` / `register_agent_tools` | memory tools for the agent |
| `custom_command` / `_help` | `/yantrik` (status, enable/disable, stats) |

- Opt-in (default off), fail-soft (if yantrikdb isn't installed the plugin
  disables itself, so there is zero impact on existing users), and zero changes
  to existing code (a new plugin directory, like the kennel).
- Tested: an integration test simulates a conversation through the hooks and
  confirms supersession (after "we rebranded to green", the recall block's current
  band shows green and the stale blue is gone), that chatter adds no durable facts,
  and that the plugin is inert when disabled or when yantrikdb is absent.
- Configurable: backend = kennel | yantrikdb | hybrid; embedder selectable
  from a bundled no-ONNX option up to MiniLM-384 / bge-small; distiller on/off and
  which local model; prefs-band size; reinforcement on/off; namespace per project.
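To make the opt-in and fail-soft behavior concrete, here is a minimal sketch of how the activation check could look. The field names and default values in `YantrikConfig` are illustrative assumptions (the post lists the knobs, not their names), and `plugin_is_active` is a hypothetical helper rather than the plugin's real entry point. The only real API used is the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class YantrikConfig:
    """Hypothetical config shape; names and defaults are placeholders."""
    enabled: bool = False              # opt-in: off unless the user turns it on
    backend: str = "kennel"            # "kennel" | "yantrikdb" | "hybrid"
    embedder: str = "bundled"          # placeholder name for the no-ONNX option
    distiller_enabled: bool = True
    reinforcement_enabled: bool = True
    prefs_band_size: int = 20          # placeholder value
    namespace: str = "default"         # one namespace per project


def dependency_available(module_name: str = "yantrikdb") -> bool:
    """True if the module can be imported, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


def plugin_is_active(cfg: YantrikConfig, module_name: str = "yantrikdb") -> bool:
    """Fail-soft gate: any missing precondition leaves the plugin inert."""
    if not cfg.enabled:
        return False
    if cfg.backend == "kennel":
        return False                   # kennel-only: nothing for this plugin to do
    return dependency_available(module_name)


print(plugin_is_active(YantrikConfig()))  # default config keeps the plugin off
```

Each hook would check this gate first and return without side effects when it is false, which is what keeps existing users unaffected.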

5. Evidence — including the honest parts

5a. Recall vs the kennel's prefs band

The kennel already does the hard part of basic recall well: its P0 tier
always-includes the user's stated preferences in the prompt. So the honest
comparison isn't "vs keyword search" — it's "vs an always-include prefs band like
the kennel's." I tested 8 non-coder scenarios (each with ~40 distractor memories)
against exactly that baseline.

Result: on "is the fact recalled," the prefs-band baseline ties YantrikDB. I'm
not claiming better recall; the kennel's P0 design already handles "remember
my brand color." That's the right framing: claiming a win over bare keyword
search would be a strawman, since the kennel isn't bare keyword search.

5b. What YantrikDB adds that the kennel's design doesn't do

- Supersession. The kennel's P0 is append-only: after "we rebranded
  blue→green" it keeps both facts, so the agent gets contradictory context.
  YantrikDB supersedes (correct()) → only green survives. (Measured at scale below.)
- Passive semantic recall. The kennel's passive layer is "all prefs + recent"
  (not query-relevant); surfacing the right past artifact for the current request
  relies on recency, or on the agent explicitly invoking the kennel's BM25 search
  tool. YantrikDB surfaces query-relevant memories passively, every turn.
- Reinforcement. Ranks options by outcome, so the one the user kept accepting
  outranks the alternatives.
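As a rough illustration of passive, query-relevant recall with outcome reinforcement, the sketch below ranks past episodes against the current request. Token overlap (Jaccard) is a crude stand-in for embedding similarity, and the `0.25` boost per accepted outcome is an arbitrary assumption; neither reflects how YantrikDB actually scores memories.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, memory_text):
    """Jaccard overlap; a crude stand-in for embedding similarity."""
    q, m = tokens(query), tokens(memory_text)
    return len(q & m) / len(q | m) if q and m else 0.0


def history_band(query, episodes, accept_counts, k=2):
    """Return up to k query-relevant episodes, boosted by accepted outcomes."""
    def score(episode):
        boost = 1.0 + 0.25 * accept_counts.get(episode, 0)  # assumed weighting
        return relevance(query, episode) * boost

    ranked = sorted(episodes, key=score, reverse=True)
    return [ep for ep in ranked if score(ep) > 0][:k]


episodes = [
    "built Q1 revenue dashboard in ornate style",
    "built Q1 revenue dashboard in minimalist style",
    "chatted about weekend plans",
]
accepted = {"built Q1 revenue dashboard in minimalist style": 3}
for episode in history_band("redo the Q1 dashboard for Q2", episodes, accepted):
    print(episode)
```

The two dashboard episodes tie on relevance, so the accepted minimalist one is listed first, and the unrelated chatter scores zero and is left out.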

5c. Scale (1k → 10k memories) — does the supersession edge hold?

Real users accumulate thousands of memories. Across stores of 1k/2k/5k/10k
(durable facts + a rebrand update buried in N background memories):

| store | recall (YantrikDB) | stale facts in the always-include band | recall latency |
| --- | --- | --- | --- |
| 1k | 6/6 | YantrikDB: none · append-only prefs band: both blue+green | 6–16ms |
| 5k | 6/6 | YantrikDB: none · append-only prefs band: both | ~48ms |
| 10k | 6/6 | YantrikDB: none · append-only prefs band: both | 64–103ms |

Recall stays at 6/6 at every store size shown, and latency stays interactive
(roughly 100ms at 10k). The point: an append-only prefs band accumulates
contradictions as it grows. At 10k it's still carrying both "blue" and "green";
YantrikDB drops the stale value. (Bulk ingest in the harness is ~10/sec due to
pacing; live use adds ~1 memory/turn, so this is a bulk-import note, not a
runtime concern.)
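The structural difference behind the stale-facts column can be reproduced with plain Python containers. This is a toy model, not the scenario harness: a list stands in for an append-only band, a dict for a superseding store, and it makes no attempt to reproduce the recall or latency figures above.

```python
def build_append_only(n_background):
    """Append-only band: every stated value is kept, in arrival order."""
    half = n_background // 2
    band = [("brand_color", "blue")]
    band += [(f"note_{i}", f"background memory {i}") for i in range(half)]
    band.append(("brand_color", "green"))  # the rebrand, buried mid-stream
    band += [(f"note_{i}", f"background memory {i}")
             for i in range(half, n_background)]
    return band


def build_superseding(n_background):
    """Keyed store: a later value for the same key replaces the earlier one."""
    store = {}
    for key, value in build_append_only(n_background):
        store[key] = value
    return store


def values_in_band(key, band):
    """All values an append-only band would surface for one attribute."""
    return [value for k, value in band if k == key]


def measure(n_background):
    append_only = build_append_only(n_background)
    superseding = build_superseding(n_background)
    return {
        "append_only_values": values_in_band("brand_color", append_only),
        "superseding_values": [superseding["brand_color"]],
    }


for n in (1_000, 10_000):
    print(n, measure(n))
```

At every size the list carries both colors while the keyed store carries only the latest one, which is the contradiction the table describes.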

5d. Long-session demo (`demo_long_session.py`)

A non-coder ("Priya") across 4 sessions spanning roughly a month, all natural
conversation, with nothing re-told between sessions:

| Session | what happens | memory behavior |
| --- | --- | --- |
| 1 | states brand/logo/footer/fiscal-year/currency + chatter; builds a Q1 dashboard | distills facts, ignores chatter |
| 2 (new session) | "make a chart"; "redo Q1 for Q2"; likes minimalist | cross-session recall; artifact reuse; learns style |
| 3 | "we rebranded to green (drop blue)"; investor page | supersession (blue→green); current band stays correct |
| 4 | "remind me what you know" | full accumulated memory (green, footer, logo, minimalist, INR, FY=Feb) |

6. Honest caveats

- The 8 scenarios are synthetic and author-written. They isolate capabilities;
  they are not a third-party benchmark. Real-data validation (LongMemEval-style)
  is a planned next step.
- N=8, single run.
- The distiller adds a local-LLM call per turn (latency/cost) and JSON-parse
  risk; it's optional and the model is configurable.
- One dependency (yantrikdb), kept fail-soft.
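One way to contain the JSON-parse risk is to treat any unusable distiller output as "nothing to store". The sketch below assumes a hypothetical output schema (a JSON list of objects with `op`, `key`, and `value`); the real distiller's format may differ.

```python
import json

# Hypothetical operation names; the actual distiller schema may differ.
ALLOWED_OPS = {"LEARN", "UPDATE", "REINFORCE"}


def parse_distiller_output(raw):
    """Return the valid decisions, or [] if the output is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []                      # malformed or non-string output
    if not isinstance(data, list):
        return []                      # expected a JSON list of decisions
    decisions = []
    for item in data:
        if not isinstance(item, dict):
            continue
        op = item.get("op")
        if not isinstance(op, str) or op not in ALLOWED_OPS:
            continue
        if not isinstance(item.get("key"), str):
            continue
        decisions.append(item)
    return decisions


print(parse_distiller_output("Sure! Here are the facts: ..."))
print(parse_distiller_output('[{"op": "UPDATE", "key": "brand_color", "value": "green"}]'))
```

A failed parse then costs one missed fact for that turn rather than an exception inside the `agent_run_end` hook.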

7. An engine quirk I found (full disclosure)

YantrikDB's automatic conflict detection (think()) does not currently fire on
free-text attribute-value updates — it needs claim edges that the extractor
doesn't emit for such facts. So supersession in this plugin uses an explicit
correct() at the app layer (the distiller detects the update and calls it). The
YantrikDB engine maintainers are addressing the extractor side; either way the
plugin behaves correctly today via correct().

8. The ask

Would you be open to this as an optional, off-by-default plugin? The working,
tested branch is feat/yantrik-memory-plugin (I can open the PR whenever you're
good with it). If yes, any constraints I should respect first — dependency policy,
plugin conventions, where the config knobs should live? Happy to walk through the
plugin or the scenario harness + long-session demo so you can run it yourself
before reviewing code.
