fix(memory): harden curation tier — no silent LLM no-op + lossless update guard#1242
Conversation
…date guard 1. Silent LLM no-op: the curation call used MaxOutputTokens=50, so a reasoning model burned the budget on hidden thinking and returned empty content — the tier then silently fell through to deterministic resolution. Bump to 512, strip inline <think>...</think> in ParseResponse, and log curation_llm_no_decision instead of returning null silently. 2. Destructive UPDATE: the update path overwrote markdown_body wholesale, so a narrower/divergent proposal destroyed the existing memory's content. GuardDestructiveUpdate now only allows an UPDATE when the proposal preserves (is a superset of) the existing content; otherwise it downgrades to Skip and keeps the existing memory. Applied to both rules- and LLM-tier decisions. Refs netclaw-dev#1241
Aaronontheweb
left a comment
There was a problem hiding this comment.
Relatively targeted change and preventing useful memories from being damaged by updates and for keeping garbage tokens out of the memory observer stream in the first place.
| return null; | ||
| } | ||
|
|
||
| private static string StripThinkBlocks(string text) |
There was a problem hiding this comment.
Only impacts thinking models, obviously, but will probably need to be hardened against more variants (i.e. incomplete thinking blocks) in the future. This is more of an issue for self-hosted inference than it is for cloud-hosted.
| /// divergent proposal's new detail, which is recoverable; destroying accumulated | ||
| /// content is not. Non-UPDATE decisions pass through unchanged. | ||
| /// </summary> | ||
| public static CurationDecision GuardDestructiveUpdate( |
There was a problem hiding this comment.
What this does is essentially stops old memories from being overwritten by something different unless the new content is a word for word superset (proposed by the LLM) of the old memory. This is designed to just prevent valuable information from being lost.
| // real signal (empty/garbled output, or a reasoning model that consumed | ||
| // its token budget). Surface it so the deterministic fallback that | ||
| // follows is observable rather than invisible. | ||
| log.Warning( |
There was a problem hiding this comment.
not a lot a user can do it about this other than let us know - it's a model-alignment problem rather than a technical / code issue.
Two safety fixes to the memory curation tier (
MemoryCurationActor+ helpers), both about the tier failing quietly. Surfaced while measuring the dedup path against a real store — context and the broader plan in #1241.1. The LLM curation tier silently no-ops on reasoning models
TryLlmEvaluationAsynccalled the model withMaxOutputTokens = 50. A reasoning model spends that budget on hidden thinking and returns empty content, soParseResponsegot nothing and the tier silently fell through to deterministic auto-resolution — the curation LLM effectively disabled, with no signal. That's a "no silent fallback" violation.Fix
MaxOutputTokens = 512so a thinking model has room to emit the bare keyword the prompt asks for; non-reasoning models stop after the keyword and pay nothing extra.ParseResponsestrips inline<think>…</think>traces before matching, for stacks that inline reasoning in the content.curation_llm_no_decision(Warning) instead of returningnullsilently — the deterministic fallback is now observable.Fully suppressing reasoning at the provider level is a follow-up; this hardens the failure mode.
2. UPDATE can silently overwrite richer memories with narrower proposals
The UPDATE path does
markdown_body = excluded.markdown_body— a wholesale replace. When a proposal is narrower or divergent than the memory it updates, the existing content is destroyed. I measured this clobbering dated point-in-time readings (newer overwriting older) — real, irreversible data loss.Fix —
CurationRulesEvaluator.GuardDestructiveUpdate, applied to both rules- and LLM-tier UPDATE decisions: an UPDATE only proceeds when the proposal preserves the existing content (the existing body is wholly contained in the proposal). Otherwise it downgrades toSkipand keeps the existing memory.Deliberate tradeoff (Option A): when a proposal isn't a superset, its new detail is dropped in favor of keeping the existing memory intact — a dropped line is recoverable, destroyed content is not. Note this also means a genuine value-replace ("v1.0" → "v2.0") will now
Skiprather than overwrite (keeping the old value), since "2.0" doesn't contain "1.0". Clean handling of value-replacements and lossless enrich both need the LLM-synthesis / record-classification work tracked in #1241; this is the minimal stop-the-data-loss guard until then.Relation to #1241 — partial step, does not close it
This is intentionally not a resolution of #1241; it stays open after this merges. It ships only the data-loss stopgap from #1241's Problem 2 — the destructive-overwrite guard (Option A), which keeps the existing memory rather than letting a narrower auto-curation proposal clobber it.
Still open in #1241:
(Scope note: the guard applies only to the automatic curation pipeline. Explicit
update_memoryedits go through a separate find-and-replace path and are not affected.)Tests / gates
CurationPromptBuilderTests(think-block stripping, unclosed think),CurationRulesEvaluatorTests(guard: drops-content→skip, superset→allow, whitespace/case-insensitive, non-update passthrough, missing-target passthrough).slopwatch analyzeclean; copyright headers present../evals/run-evals.shshould run in CI before merge.