feat(memory-core): dreaming circuit breaker to prevent runaway cost and data corruption by bahadorkhaleghi1982 · Pull Request #65589 · openclaw/openclaw

bahadorkhaleghi1982 · 2026-04-12T23:15:36Z

Summary

Adds a DreamingBudgetEnforcer module to the memory-core plugin that prevents dreaming runaway loops from burning unbounded API costs and corrupting daily notes
Implements three independent safety layers: per-cycle deduplication, sliding-window cost circuit breaker, and confidence-gated candidate filtering
Includes an integration helper (filterCandidatesThroughEnforcer) showing exactly how the enforcer plugs into the existing dreaming.ts pipeline
Covers all functionality with 51 unit tests including boundary conditions, persistence round-trips, and edge cases

Motivation

Issue #65550 documents a real production incident where the dreaming system entered an uncontrolled loop:

94 LLM subagent sessions spawned in 65 minutes
$4.35 burned on API calls producing entirely garbage output
302 lines of dream fragments overwrote real daily notes (data corruption)
All candidates had confidence: 0.00, recalls: 0 — zero-value entries that should never have been processed
76 of 94 sessions reprocessed the same stale data with no deduplication

Root causes identified:

No per-cycle deduplication — same candidates reprocessed in tight loops
No cost tracking or budget cap — no awareness of accumulated API spend
No candidate quality gate — zero-confidence entries passed through to expensive LLM calls

Users' only recourse is disabling dreaming entirely (dreaming.enabled: false), losing the long-term memory consolidation feature that is a core differentiator of OpenClaw.

Design

`DreamingBudgetEnforcer` (dreaming-budget.ts)

A stateful class instantiated at the start of each dreaming cycle with three guard methods:

Layer	Method	What it prevents
Deduplication	`shouldSkipDuplicate(snippet)`	Same content processed twice (SHA-256 fingerprint of normalized text)
Cost breaker	`isBudgetExceeded(nowMs?)`	Cumulative API cost exceeding configurable budget ($1.00/60min default)
Quality gate	`shouldSkipLowQuality(candidate)`	Zero-confidence/zero-recall candidates reaching LLM calls

Plus a composite checkCandidate() that runs all three checks in priority order (budget > quality > dedup).

Persistence: Budget state is saved to memory/.dreams/dreaming-budget.json via atomic write (temp file + rename) so it survives SIGUSR1 restarts. Uses the same file I/O patterns as short-term-promotion.ts.

Configuration: All thresholds are configurable via the plugin config schema under dreaming.budget:

{
  "dreaming": {
    "budget": {
      "maxCostUsd": 1.0,
      "windowMs": 3600000,
      "minConfidence": 0.05,
      "minRecalls": 1
    }
  }
}

Integration guide (dreaming-budget-integration.ts)

Documents the 6 exact integration points in the existing dreaming.ts pipeline with code snippets showing where each enforcer call is inserted. Also exports filterCandidatesThroughEnforcer() — a helper that filters ranked promotion candidates through all three safety layers and returns a breakdown of skip reasons.

Test plan

51 vitest unit tests covering:
- Fingerprinting: consistency, normalization, uniqueness, format validation
- Deduplication: first/second encounter, case variants, cross-instance independence
- Quality gate: zero confidence, zero recall, NaN, negative, custom thresholds
- Cost breaker: under/over budget, latching behavior, window reset, default cost, invalid values
- Composite check: priority ordering (budget > quality > dedup)
- Persistence: save/load round-trip, missing file, corrupt JSON, wrong version, restart survival
- Integration filter: valid candidates, duplicates, low quality, budget exceeded, empty list
- Boundary conditions: exactly-at-threshold for confidence/cost, latch persistence through window expiry, state immutability
Verify existing dreaming.test.ts tests still pass after integration
Manual test: enable dreaming with budget.maxCostUsd: 0.10 and verify the cycle halts at the budget with a warning log

Closes #65550

…st and data corruption The dreaming memory consolidation system currently has no runtime safeguards against runaway execution. Issue openclaw#65550 documents a real incident where 94 LLM subagent sessions spawned in 65 minutes, burning $4.35 on zero-confidence garbage while overwriting daily notes with 302 lines of dream fragments. This adds a DreamingBudgetEnforcer with three independent safety layers: 1. Per-cycle deduplication — SHA-256 fingerprinting of normalized snippets prevents the same candidates from being reprocessed in tight loops (76 of 94 sessions in the incident processed identical data). 2. Sliding-window cost circuit breaker — tracks cumulative estimated API cost within a configurable window (default $1.00/60min) and halts the cycle when exceeded. State persists to disk via atomic writes so it survives SIGUSR1 restarts. 3. Confidence-gated candidate filter — skips candidates below configurable quality thresholds (default: confidence > 0.05, recalls >= 1) before any LLM call is made, directly preventing the zero-confidence garbage that caused the incident. Includes 51 unit tests covering all three layers, boundary conditions, persistence round-trips, and the integration filter helper. Closes openclaw#65550

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 250c10210f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-12T23:18:40Z

+ * `runDreamingSweepPhases()` call path. In a real PR these changes would
+ * be made inline in dreaming.ts and dreaming-phases.ts.


Integrate budget enforcer into live dreaming flow

This change adds DreamingBudgetEnforcer and tests, but the runtime path is still unchanged: runShortTermDreamingPromotionIfTriggered in extensions/memory-core/src/dreaming.ts never imports or calls the enforcer, so no dedup/cost/quality checks actually run during production dreaming cycles. Because this file is explicitly an integration sketch rather than applied wiring, the runaway-cost/data-corruption scenario the commit claims to fix can still occur whenever dreaming is triggered.

Useful? React with 👍 / 👎.

greptile-apps · 2026-04-12T23:19:09Z

Greptile Summary

This PR adds a well-designed DreamingBudgetEnforcer module with three independent safety layers (deduplication, cost circuit breaker, and quality gate) plus 51 unit tests — but does not actually wire the enforcer into dreaming.ts or dreaming-phases.ts. Both files have zero imports or calls to the new code, so the runaway-loop production incident described in #65550 is not prevented by this change.

The integration file's own header comment confirms this: "In a real PR these changes would be made inline in dreaming.ts and dreaming-phases.ts." The six integration points (cycle init, loop guard, candidate filtering, cost recording, teardown, config schema) all remain unimplemented.
The test plan has two unchecked items that depend on the integration being present.

Confidence Score: 3/5

Not safe to merge as-is: the enforcer is never called from the dreaming pipeline, so the production incident it claims to fix remains open.

The enforcer implementation and tests are high quality, but the PR's primary stated goal — closing a real production incident — is unachieved because dreaming.ts and dreaming-phases.ts are unchanged. All three enforcement layers are inert until those integration points are added. This P1 gap blocks the intended safety guarantee.

extensions/memory-core/src/dreaming-budget-integration.ts — the integration guide describes changes that must be made to dreaming.ts and dreaming-phases.ts but those changes are absent from the PR.

Comments Outside Diff (1)

extensions/memory-core/src/dreaming-budget-integration.ts, line 147-171 (link)

Early-exit opportunity once budget is tripped

Once the budget latch is set, checkCandidate will return budget_exceeded for every remaining candidate without any useful work. The loop can break at that point instead of iterating the entire candidate list.

(The exact counting arithmetic depends on how you want to batch-count the tail — simplest is to count remaining candidates in one shot after the break, or leave this as-is if exact per-candidate accounting is preferred over early exit.)

Prompt To Fix With AI

This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-budget-integration.ts
Line: 147-171

Comment:
**Early-exit opportunity once budget is tripped**

Once the budget latch is set, `checkCandidate` will return `budget_exceeded` for every remaining candidate without any useful work. The loop can `break` at that point instead of iterating the entire candidate list.

(The exact counting arithmetic depends on how you want to batch-count the tail — simplest is to count remaining candidates in one shot after the break, or leave this as-is if exact per-candidate accounting is preferred over early exit.)

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-budget-integration.ts
Line: 6-13

Comment:
**Enforcer is never wired into the dreaming pipeline**

`dreaming.ts` and `dreaming-phases.ts` contain no imports or calls to `DreamingBudgetEnforcer` — confirmed with a grep of both files. The comment here explicitly acknowledges this: *"In a real PR these changes would be made inline in dreaming.ts and dreaming-phases.ts."*

Because the enforcer is never invoked, the runaway-loop bug from #65550 (94 sessions, $4.35, 302 lines of data corruption) is not prevented by this PR. Candidates with `confidence: 0.00, recalls: 0` still reach the LLM call path unchanged, and duplicate candidates are still reprocessed. The PR claims `Closes #65550` but the protection is entirely inert until `dreaming.ts` is updated to call `loadState()`, `isBudgetExceeded()`, `checkCandidate()`, `recordSessionCost()`, and `saveState()` at the described integration points.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-budget-integration.ts
Line: 147-171

Comment:
**Early-exit opportunity once budget is tripped**

Once the budget latch is set, `checkCandidate` will return `budget_exceeded` for every remaining candidate without any useful work. The loop can `break` at that point instead of iterating the entire candidate list.

(The exact counting arithmetic depends on how you want to batch-count the tail — simplest is to count remaining candidates in one shot after the break, or leave this as-is if exact per-candidate accounting is preferred over early exit.)

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "feat(memory-core): add dreaming circuit ..." | Re-trigger Greptile}

greptile-apps · 2026-04-12T23:19:13Z

+ * `runShortTermDreamingPromotionIfTriggered()` function and the
+ * `runDreamingSweepPhases()` call path. In a real PR these changes would
+ * be made inline in dreaming.ts and dreaming-phases.ts.
+ *
+ * ─── Integration Point 1: Cycle initialization (dreaming.ts) ──────────
+ *
+ * At the top of `runShortTermDreamingPromotionIfTriggered()`, after
+ * resolving the dreaming config, instantiate the enforcer:


Enforcer is never wired into the dreaming pipeline

dreaming.ts and dreaming-phases.ts contain no imports or calls to DreamingBudgetEnforcer — confirmed with a grep of both files. The comment here explicitly acknowledges this: "In a real PR these changes would be made inline in dreaming.ts and dreaming-phases.ts."

Because the enforcer is never invoked, the runaway-loop bug from #65550 (94 sessions, $4.35, 302 lines of data corruption) is not prevented by this PR. Candidates with confidence: 0.00, recalls: 0 still reach the LLM call path unchanged, and duplicate candidates are still reprocessed. The PR claims Closes #65550 but the protection is entirely inert until dreaming.ts is updated to call loadState(), isBudgetExceeded(), checkCandidate(), recordSessionCost(), and saveState() at the described integration points.

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/memory-core/src/dreaming-budget-integration.ts Line: 6-13 Comment: **Enforcer is never wired into the dreaming pipeline** `dreaming.ts` and `dreaming-phases.ts` contain no imports or calls to `DreamingBudgetEnforcer` — confirmed with a grep of both files. The comment here explicitly acknowledges this: *"In a real PR these changes would be made inline in dreaming.ts and dreaming-phases.ts."* Because the enforcer is never invoked, the runaway-loop bug from #65550 (94 sessions, $4.35, 302 lines of data corruption) is not prevented by this PR. Candidates with `confidence: 0.00, recalls: 0` still reach the LLM call path unchanged, and duplicate candidates are still reprocessed. The PR claims `Closes #65550` but the protection is entirely inert until `dreaming.ts` is updated to call `loadState()`, `isBudgetExceeded()`, `checkCandidate()`, `recordSessionCost()`, and `saveState()` at the described integration points. How can I resolve this? If you propose a fix, please make it concise.

mjamiv · 2026-04-13T23:29:42Z

Strong production repro + confirmation data from a 4-agent Linux fleet on v2026.4.11 — this bug is real and not QMD-specific.

Fleet context

4 independent OpenClaw sandboxes (Atlas / Axel / Mason / Buck) on Ubuntu, each with memory.backend unset (i.e. builtin, not QMD), each configured identically:

"plugins": {
  "entries": {
    "memory-core": {
      "config": {
        "dreaming": { "enabled": true, "frequency": "0 3 * * *" }
      }
    }
  }
}

Expected: 1 dreaming cycle per day at 03:00 UTC.

Observed for calendar day 2026-04-13 (as of ~21:00 UTC, partial day):

Agent	`light dreaming staged` runs today	Memory backend
Atlas (agent)	62	builtin
Axel (agent2)	42	builtin
Mason (agent3)	41	builtin
Buck (agent4)	41	builtin
Fleet total	186	—

Roughly 60–100× the configured rate, fleet-wide, with zero promotions every cycle (candidates=0, applied=0). The original #65550 reporter was on QMD with a much tighter 94/65min burst; ours is a slower but persistent grind that produces the same symptom: runaway light + REM cycles with no promotion progression.

Sample log pattern (Atlas, full cycle ~45s, gaps as tight as 49s)

19:00:00.803 memory-core: light dreaming staged 47 candidate(s)
19:00:24.679 memory-core: REM dreaming wrote reflections from 688 recent memory trace(s)
19:00:46.669 memory-core: dreaming promotion complete (workspaces=1, candidates=0, applied=0, failed=0)
19:00:49.399 memory-core: light dreaming staged 47 candidate(s)         ← 49s later, back-to-back cycle
19:01:12.831 memory-core: REM dreaming wrote reflections from 692 recent memory trace(s)
19:01:35.619 memory-core: dreaming promotion complete (workspaces=1, candidates=0, applied=0, failed=0)

The 49-second re-fire between promotion complete and the next light dreaming staged is exactly what the enforceDeduplication layer in this PR should block. Candidate counts (47 → 46 → 47 → 47 → …) stay roughly flat because new memory traces arrive between cycles but nothing is ever actually promoted to MEMORY.md.

Secondary operational impact we can confirm from production

Session sprawl (matches Dreaming: session sprawl, missing model override, no auto-cleanup #65963): Atlas now has 229 .jsonl files under agents/main/sessions/, with a small number already tagged .reset.* / .deleted.*. openclaw sessions cleanup treats them all as keep.
Dream artifact bloat: workspace/memory/.dreams/ is now 1.4 MB on Atlas (events.jsonl 231 KB, short-term-recall.json 788 KB, phase-signals.json 127 KB, session-ingestion.json 95 KB) — growing continuously with ~60 cycles/day producing no promotions.
DREAMS.md diary keeps receiving new entries from each light/REM cycle, so the data corruption risk the PR calls out (302 lines overwriting real daily notes) is also active in our environment — just at a slower fill rate.

Offer to test

Per our internal notes Buck (agent4) is designated as our test candidate for this PR — happy to pull the branch onto Buck, re-deploy, and report back with before/after 24-hour run counts + any diary / promotion deltas once the PR is ready for a real-install smoke test. Let us know whether you'd like us to wait for a review cycle or go now.

Bottom line: the bug is not confined to QMD + macOS. +1 from us on merging the deduplication + circuit-breaker layers ASAP; we have 4 production reproductions waiting for the fix.

mjamiv · 2026-04-14T16:13:11Z

Today's test confirms this PR is still load-bearing for sites with a loaded contextEngine plugin. We attempted 2026.4.14 specifically to validate the cited dreaming fixes and immediately hit #66601 / #66591, forcing a rollback. The circuit breaker is the only mechanism that gets runaway-protection to sites that can't run 4.14.

openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: L labels Apr 12, 2026

chatgpt-codex-connector Bot reviewed Apr 12, 2026

View reviewed changes

greptile-apps Bot reviewed Apr 12, 2026

View reviewed changes

mjamiv mentioned this pull request Apr 14, 2026

[Bug]: memory-core dreaming runaway loop — 94 sessions spawned in 65 min, $4.35 burned on zero-confidence garbage #65550

Closed

Svtter mentioned this pull request Apr 18, 2026

[Feature]: Dream Mode — Periodic Memory Consolidation & Reflective Learning zeroclaw-labs/zeroclaw#5849

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(memory-core): dreaming circuit breaker to prevent runaway cost and data corruption#65589

feat(memory-core): dreaming circuit breaker to prevent runaway cost and data corruption#65589
bahadorkhaleghi1982 wants to merge 1 commit intoopenclaw:mainfrom
bahadorkhaleghi1982:feat/dreaming-circuit-breaker

bahadorkhaleghi1982 commented Apr 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 12, 2026

Uh oh!

greptile-apps Bot commented Apr 12, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 12, 2026

Uh oh!

mjamiv commented Apr 13, 2026

Uh oh!

mjamiv commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		* `runDreamingSweepPhases()` call path. In a real PR these changes would
		* be made inline in dreaming.ts and dreaming-phases.ts.

Uh oh!

Conversation

bahadorkhaleghi1982 commented Apr 12, 2026

Summary

Motivation

Design

DreamingBudgetEnforcer (dreaming-budget.ts)

Integration guide (dreaming-budget-integration.ts)

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

mjamiv commented Apr 13, 2026

Fleet context

Sample log pattern (Atlas, full cycle ~45s, gaps as tight as 49s)

Secondary operational impact we can confirm from production

Offer to test

Uh oh!

mjamiv commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`DreamingBudgetEnforcer` (dreaming-budget.ts)

greptile-apps Bot commented Apr 12, 2026 •

edited

Loading