Proposal: an optional yantrik_memory plugin for Code Puppy
Asking for a thumbs-up before I open a PR. Everything I've built and measured is
laid out below — including where it does not beat the current design.
1. Context
I use Code Puppy daily, and puppy_kennel is a genuinely good memory design — the
SQLite/FTS5 store plus the packer's tiered prefs band already covers brand/
convention recall well. This is not a change to the kennel. It's an optional
sibling plugin for users who want a semantic/learning memory layer, off by
default, with the kennel remaining the default.
(For background: I filed the Windows cp1252 console fix, PR #389.)
2. The gap I keep hitting
For longer-running, non-coder workflows (business users building charts and
shareable pages over weeks), an append-only BM25 store — even with a prefs band —
has two structural limits:
- It can't supersede an outdated fact. User says "we rebranded, brand color
  is now green." The old "blue" stays in the store and keeps surfacing, so the
  agent receives contradictory context.
- It can't learn from outcomes. Nothing records that "the minimalist style
  the user accepted three times" should outrank an alternative.
3. What I built and tested (standalone, before proposing)
A single-substrate memory layer on YantrikDB (one store; three memory types — episodic verbatim turns, semantic distilled facts, procedural skills — plus
importance, decay, a knowledge graph, and outcome reinforcement). Pipeline:
natural turn
  ├─ logged verbatim as EPISODIC (so artifacts/events are recallable: "like last time")
  └─ distiller (local LLM) extracts durable facts → SEMANTIC
       • new fact → store
       • update of a fact → correct() (supersede the stale value)
       • accepted outcome → reinforce()

recall = CURRENT band (authoritative semantic facts, post-correction)
       + HISTORY band (query-relevant episodic)
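
The pipeline above can be sketched with a toy in-memory store. This is an illustration only: `ToyMemory`, `apply_distilled`, the `superseded` audit list, and the `{"op", "key", "value"}` record shape are hypothetical names invented for this sketch, not the plugin's or YantrikDB's actual API. The distiller output is hard-coded, and keyword overlap stands in for semantic retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical stand-in for the memory layer; NOT the YantrikDB API."""
    episodic: list = field(default_factory=list)    # verbatim turns
    semantic: dict = field(default_factory=dict)    # key -> current value
    superseded: list = field(default_factory=list)  # (key, old value) audit trail

    def log_turn(self, text):
        self.episodic.append(text)

    def apply_distilled(self, facts):
        # Assumed record shape: {"op": "learn" | "update", "key": ..., "value": ...}
        for fact in facts:
            key, value = fact["key"], fact["value"]
            if fact["op"] == "update" and key in self.semantic:
                # Supersede: remember the stale value, then overwrite it.
                self.superseded.append((key, self.semantic[key]))
            self.semantic[key] = value

    def recall(self, query):
        # Keyword overlap is a crude placeholder for query-relevant retrieval.
        words = set(query.lower().split())
        current = [f"{k}: {v}" for k, v in self.semantic.items()]
        history = [t for t in self.episodic if words & set(t.lower().split())]
        return {"CURRENT": current, "HISTORY": history}


# Distiller output is hard-coded here; in the proposal a local LLM produces it.
turns = [
    ("Our brand color is #1F4E79, logo top-right.",
     [{"op": "learn", "key": "brand_color", "value": "#1F4E79"}]),
    ("Did you catch the game last night?", []),  # chatter: nothing durable
    ("Actually we rebranded, color is now green",
     [{"op": "update", "key": "brand_color", "value": "green"}]),
]

mem = ToyMemory()
for text, facts in turns:
    mem.log_turn(text)
    mem.apply_distilled(facts)

bands = mem.recall("brand color")
print(bands["CURRENT"])   # only the post-correction value remains
print(mem.superseded)     # the stale value is kept out of the prompt band
```

All three turns stay in the episodic log, so history remains recallable even though the CURRENT band carries a single value per fact.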
The distiller reads each message and returns only durable facts (ignores chatter),
and it flags when a message updates a prior fact. Example (real output):
"Our brand color is #1F4E79, logo top-right." -> LEARN brand color fact
"Did you catch the game last night?" -> [] (chatter, nothing stored)
"Actually we rebranded, color is now green ..." -> UPDATE prior brand fact -> green
4. How it integrates (already built + tested on a branch)
I've built the plugin — branch feat/yantrik-memory-plugin,
directory code_puppy/plugins/yantrik_memory/ (11 files, zero changes to any
existing file). It mirrors puppy_kennel's exact plugin contract:
| Hook | Behavior |
| --- | --- |
| load_prompt | inject the banded recall block into the system prompt |
| agent_run_end | distill the turn, store episodic + semantic, correct/reinforce |
| register_tools / register_agent_tools | memory tools for the agent |
| custom_command / _help | /yantrik (status, enable/disable, stats) |
Opt-in (default off), fail-soft (if yantrikdb isn't installed it
disables itself — zero impact on existing users), zero changes to existing
code (a new plugin directory, like the kennel).
Tested: an integration test simulates a conversation through the hooks and
confirms supersession (after "we rebranded to green", the recall block's current
band shows green and the stale blue is gone), that chatter adds no durable facts,
and that the plugin is inert when disabled or when yantrikdb is absent.
Configurable: backend = kennel | yantrikdb | hybrid; embedder selectable
from a bundled no-ONNX option up to MiniLM-384 / bge-small; distiller on/off and
which local model; prefs-band size; reinforcement on/off; namespace per project.
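
A minimal sketch of how the opt-in and fail-soft behavior could fit together, assuming the dependency is importable as `yantrikdb`. The `YantrikConfig` fields and the `resolve_backend` helper are hypothetical names for illustration, not the plugin's real config schema.

```python
import importlib.util
from dataclasses import dataclass


@dataclass
class YantrikConfig:
    """Hypothetical config shape; field names are assumptions for this sketch."""
    enabled: bool = False      # opt-in: off by default
    backend: str = "kennel"    # "kennel" | "yantrikdb" | "hybrid"
    distiller: bool = True
    reinforcement: bool = True
    namespace: str = "default"


def resolve_backend(cfg, module_name="yantrikdb"):
    """Pick the backend, falling back to the kennel when the plugin is
    disabled or the optional dependency cannot be found."""
    if not cfg.enabled or cfg.backend == "kennel":
        return "kennel"
    if importlib.util.find_spec(module_name) is None:
        return "kennel"  # fail-soft: dependency missing, plugin stays inert
    return cfg.backend


# Default config: nothing changes for existing users.
print(resolve_backend(YantrikConfig()))
```

Checking with `importlib.util.find_spec` avoids importing the package just to learn whether it is installed; a `try: import ... except ImportError` guard would work equally well.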
5. Evidence — including the honest parts
5a. Recall vs the kennel's prefs band
The kennel already does the hard part of basic recall well: its P0 tier
always-includes the user's stated preferences in the prompt. So the honest
comparison isn't "vs keyword search" — it's "vs an always-include prefs band like
the kennel's." I tested 8 non-coder scenarios (each with ~40 distractor memories)
against exactly that baseline.
Result: on "is the fact recalled," the prefs-band baseline ties YantrikDB. I'm
not claiming better recall: the kennel's P0 design already handles "remember
my brand color." That's the right framing; a win over bare keyword search would
be a strawman, since the kennel isn't bare keyword search.
5b. What YantrikDB adds that the kennel's design doesn't do
- Supersession. The kennel's P0 is append-only: after "we rebranded
  blue→green" it keeps both facts, so the agent gets contradictory context.
  YantrikDB supersedes (correct()) → only green survives. (Measured at scale below.)
- Passive semantic recall. The kennel's passive layer is "all prefs + recent"
  (not query-relevant); surfacing the right past artifact for the current request
  relies on recency, or on the agent explicitly invoking the kennel's BM25 search
  tool. YantrikDB surfaces query-relevant memories passively, every turn.
- Reinforcement. Ranks the option that kept getting accepted, by outcome.
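
To make the reinforcement point concrete, here is a rough sketch of outcome-based ranking. `OutcomeRanker` and its methods are hypothetical and only count acceptances; the real reinforce() presumably interacts with importance and decay, which this sketch ignores.

```python
from collections import Counter


class OutcomeRanker:
    """Hypothetical sketch: rank options by how often the user accepted them."""

    def __init__(self):
        self.accepted = Counter()

    def reinforce(self, option):
        # Record one accepted outcome for this option.
        self.accepted[option] += 1

    def rank(self, candidates):
        # sorted() is stable, so ties keep the caller's original order.
        return sorted(candidates, key=lambda c: -self.accepted[c])


ranker = OutcomeRanker()
for _ in range(3):
    ranker.reinforce("minimalist")  # the style the user accepted three times

ranked = ranker.rank(["detailed", "minimalist", "playful"])
print(ranked)
```

An append-only store has no equivalent signal: every stored style preference looks equally valid, regardless of which one the user kept choosing.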
5c. Scale (1k → 10k memories) — does the supersession edge hold?
Real users accumulate thousands of memories. Across stores of 1k/2k/5k/10k
(durable facts + a rebrand update buried in N background memories):
| store | recall (YantrikDB) | stale facts in the always-include band | recall latency |
| --- | --- | --- | --- |
| 1k | 6/6 | YantrikDB: none · append-only prefs band: both blue+green | 6–16ms |
| 5k | 6/6 | YantrikDB: none · append-only prefs band: both | ~48ms |
| 10k | 6/6 | YantrikDB: none · append-only prefs band: both | 64–103ms |
Recall accuracy is scale-invariant (6/6 at every size shown) and latency stays interactive (~100ms at 10k). The point:
an append-only prefs band accumulates contradictions as it grows — at 10k it's
still carrying both "blue" and "green"; YantrikDB drops the stale value. (Bulk
ingest in the harness is ~10/sec due to pacing; live use adds ~1 memory/turn, so
this is a bulk-import note, not a runtime concern.)
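
The shape of that check can be sketched as below. This is a toy, not the actual harness: `build_stores` and `brand_values` are hypothetical helpers, the stores are plain Python containers, and the timing covers only list and dict operations, so it says nothing about YantrikDB's real latency.

```python
import time


def build_stores(n_background):
    """Feed the same events to an append-only list and a superseding dict."""
    events = [("brand_color", "blue")]
    events += [(f"note_{i}", f"background memory {i}") for i in range(n_background)]
    events.append(("brand_color", "green"))  # the rebrand, arriving last

    append_only = []   # keeps every (key, value) ever stated
    superseding = {}   # keeps only the latest value per key
    for key, value in events:
        append_only.append((key, value))
        superseding[key] = value
    return append_only, superseding


def brand_values(append_only, superseding):
    """Return what each design would surface for the brand color."""
    from_append_only = [v for k, v in append_only if k == "brand_color"]
    return from_append_only, [superseding["brand_color"]]


for n in (1_000, 10_000):
    start = time.perf_counter()
    append_only, superseding = build_stores(n)
    both, current = brand_values(append_only, superseding)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(n, both, current, f"{elapsed_ms:.1f}ms")
```

Whatever the store size, the append-only side still returns both values, which is the contradiction the table reports.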
5d. Long-session demo (demo_long_session.py)
A non-coder ("Priya") across 4 sessions / ~a month, all natural conversation,
nothing re-told between sessions:
| Session | what happens | memory behavior |
| --- | --- | --- |
| 1 | states brand/logo/footer/fiscal-year/currency + chatter; builds a Q1 dashboard | distills facts, ignores chatter |
| 2 (new session) | "make a chart"; "redo Q1 for Q2"; likes minimalist | |
| 3 | "we rebranded to green (drop blue)"; investor page | supersession (blue→green); current band stays correct |
| 4 | "remind me what you know" | full accumulated memory (green, footer, logo, minimalist, INR, FY=Feb) |
6. Honest caveats
The 8 scenarios are synthetic and author-written — they isolate capabilities;
they are not a third-party benchmark. Real-data validation (LongMemEval-style)
is a planned next step.
N=8, single run.
The distiller adds a local-LLM call per turn (latency/cost) and JSON-parse
risk; it's optional and the model is configurable.
One dependency (yantrikdb) — kept fail-soft.
7. An engine quirk I found (full disclosure)
YantrikDB's automatic conflict detection (think()) does not currently fire on
free-text attribute-value updates — it needs claim edges that the extractor
doesn't emit for such facts. So supersession in this plugin uses an explicit correct() at the app layer (the distiller detects the update and calls it). The
YantrikDB engine maintainers are addressing the extractor side; either way the
plugin behaves correctly today via correct().
8. The ask
Would you be open to this as an optional, off-by-default plugin? The working,
tested branch is feat/yantrik-memory-plugin (I can open the PR whenever you're
good with it). If yes, any constraints I should respect first — dependency policy,
plugin conventions, where the config knobs should live? Happy to walk through the
plugin or the scenario harness + long-session demo so you can run it yourself
before reviewing code.