fix(memory-core): retry dreaming cron registration after startup deferral [AI-assisted]#73170

Closed
luyao618 wants to merge 1 commit into openclaw:main from luyao618:fix/memory-core-dreaming-cron-startup-race

Conversation

@luyao618
Contributor

🤖 AI-assisted (built with Claude Code via Hermes orchestration). Test level: fully tested. Prompt summary available on request.

Summary

  • Problem: The memory-core dreaming cron job fails to register at gateway startup because of a timing race: the gateway_start hook fires before the cron service is available. The runtime reconciliation path only runs on heartbeat/cron events, so when the cron job was never registered, nothing triggers the retry and registration is deadlocked.
  • Why it matters: The Memory Dreaming Promotion task never executes, breaking the dreaming feature entirely for affected users.
  • What changed: Added a deferred retry (5-second setTimeout) after the startup deferral so the managed dreaming cron job registers once sidecars have settled.
  • What did NOT change (scope boundary): No changes to core gateway code, no changes to the cron service lifecycle, no changes to the runtime reconciliation logic.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Plugin / extension

Linked Issue/PR

Root Cause

  • Root cause: The gateway_start hook fires via void hookRunner.runGatewayStart() (fire-and-forget) before sidecars fully start, so getCron() returns null. The existing runtime retry path in before_agent_reply only fires on heartbeat/cron triggers — if the dreaming cron never registered and no heartbeat fires, the reconciliation never runs, creating a deadlock.
  • Missing detection / guardrail: No fallback mechanism to retry cron registration independent of incoming heartbeat/cron events.
  • Contributing context: The fix for #67362 ("[Bug]: memory-core dreaming cron reconciliation depends on stale one-time startup cron reference — fails at runtime") added a runtime retry via gatewayContext.getCron() in before_agent_reply, but that path depends on heartbeat/cron triggers, which may never fire if the dreaming cron job was never registered.
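
The race and the deferred retry described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `createDreamingPlugin`, `CronService`, the injected `Scheduler`, and the job name `memory-dreaming-promotion` are hypothetical stand-ins, and only the 5-second delay comes from this PR. The scheduler is injected so the example runs without waiting.

```typescript
// Hypothetical stand-ins; none of these names are taken from the OpenClaw source.
type CronService = { register: (jobName: string) => void };
type Scheduler = (fn: () => void, delayMs: number) => void;

const STARTUP_CRON_RETRY_DELAY_MS = 5_000; // the 5-second delay described in this PR

function createDreamingPlugin(getCron: () => CronService | null, schedule: Scheduler) {
  let registered = false;

  // Try to register the managed job; returns false when cron is not ready yet.
  function reconcile(): boolean {
    if (registered) return true;
    const cron = getCron();
    if (!cron) return false;
    cron.register("memory-dreaming-promotion"); // hypothetical job name
    registered = true;
    return true;
  }

  return {
    // Startup hook: may run before the cron service exists.
    onGatewayStart(): void {
      if (!reconcile()) {
        // Deferred retry that does not depend on any heartbeat/cron event.
        schedule(() => {
          reconcile();
        }, STARTUP_CRON_RETRY_DELAY_MS);
      }
    },
    isRegistered: (): boolean => registered,
  };
}

// Manual scheduler: collects callbacks instead of waiting on a real timer.
const pending: Array<() => void> = [];
const manualSchedule: Scheduler = (fn) => {
  pending.push(fn);
};

let cronService: CronService | null = null;
const registeredJobs: string[] = [];
const plugin = createDreamingPlugin(() => cronService, manualSchedule);

plugin.onGatewayStart(); // cron not ready yet: registration is deferred
cronService = {
  register: (name: string) => {
    registeredJobs.push(name);
  },
};
pending.forEach((fn) => fn()); // the deferred retry fires and registers the job
```

Without the `schedule(...)` call, `reconcile()` would only run again on a heartbeat/cron trigger, which is the deadlock this PR describes.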

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: extensions/memory-core/src/dreaming.test.ts
  • Scenario the test should lock in: When cron service is unavailable at startup but becomes available during the deferred retry window, the managed dreaming cron job is successfully registered.
  • Why this is the smallest reliable guardrail: The unit test mocks the cron service availability timing and verifies the deferred retry path fires and succeeds.
  • Existing test that already covers this (if any): N/A — existing tests only covered the startup success and runtime heartbeat-triggered reconciliation paths.
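
As a rough illustration of the scenario this test plan locks in, the sketch below drives a retry with a hand-rolled fake clock. It is not the content of dreaming.test.ts: `FakeClock`, `startDreaming`, and `runScenario` are hypothetical, and a real test would more likely use the test runner's own fake timers.

```typescript
// Minimal fake clock so the retry window can be advanced without real waiting.
class FakeClock {
  private now = 0;
  private timers: Array<{ at: number; fn: () => void }> = [];

  setTimeout(fn: () => void, delayMs: number): void {
    this.timers.push({ at: this.now + delayMs, fn });
  }

  // Move time forward and run every timer that has become due.
  advance(ms: number): void {
    this.now += ms;
    const due = this.timers.filter((t) => t.at <= this.now);
    this.timers = this.timers.filter((t) => t.at > this.now);
    due.forEach((t) => t.fn());
  }
}

const RETRY_DELAY_MS = 5_000; // matches the 5-second delay described in the PR

// Hypothetical system under test: register now, or retry once after the delay.
function startDreaming(clock: FakeClock, cronReady: () => boolean): { registered: () => boolean } {
  let registered = false;
  const attempt = (): boolean => {
    if (cronReady()) registered = true;
    return registered;
  };
  if (!attempt()) {
    clock.setTimeout(() => {
      attempt();
    }, RETRY_DELAY_MS);
  }
  return { registered: () => registered };
}

// Runs one scenario in which cron becomes available `cronReadyAtMs` after startup.
function runScenario(cronReadyAtMs: number): boolean {
  const clock = new FakeClock();
  let elapsed = 0;
  const sut = startDreaming(clock, () => elapsed >= cronReadyAtMs);
  elapsed = RETRY_DELAY_MS;
  clock.advance(RETRY_DELAY_MS);
  return sut.registered();
}

const retrySucceeds = runScenario(1_000); // cron ready within the retry window
const retryFails = runScenario(60_000); // cron still unavailable when the retry fires
```

The first scenario corresponds to the retry-success path and the second to the retry-failure path.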

User-visible / Behavior Changes

  • The Memory Dreaming Promotion cron job now registers successfully even when the cron service is briefly unavailable at gateway startup.

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? No
  • This fix adds a deferred retry timer within the existing memory-core plugin scope. No new permissions, no security surface changes, no credential handling.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Adds a 5-second setTimeout deferred retry in the gateway_start handler so the managed dreaming cron job registers even when the cron service is unavailable at startup, breaking the deadlock described in #72841. Two regression tests cover the retry-success and retry-failure paths.

Confidence Score: 4/5

Safe to merge — the fix is narrowly scoped and the deferred retry is well-guarded by existing throttle logic.

No P0 or P1 findings. Single P2: the timer handle is not stored, preventing future cancellation. Throttle in reconcileManagedDreamingCron prevents double-registration. Tests cover both retry-success and retry-failure paths.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(memory-core): retry dreaming cron re..."

Comment on lines +797 to +803
setTimeout(() => {
  void reconcileManagedDreamingCron({ reason: "runtime" }).catch((err) => {
    api.logger.error(
      `memory-core: deferred dreaming cron retry failed: ${formatErrorMessage(err)}`,
    );
  });
}, STARTUP_CRON_RETRY_DELAY_MS);
P2 No handle stored for the deferred setTimeout

The timer ID is discarded, so there is no way to cancel it if the gateway shuts down or the plugin is torn down within the 5-second window. This is harmless in production (it fires once and exits), but it can leave a dangling async operation in tests that don't advance the fake timers past the delay, and it prevents any future cleanup hook from cancelling it.

If a plugin-level teardown / cleanup hook exists, storing the handle and calling clearTimeout(retryHandle) there would be the complete fix.
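
One way the suggestion could look, assuming a teardown hook is available to call `dispose()`: the helper below keeps the timer handle and clears it on disposal. `createDeferredRetry`, `dispose`, and `isPending` are hypothetical names for illustration and do not come from the plugin API.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

// Hypothetical helper: keeps the timer handle so a teardown hook can cancel it.
function createDeferredRetry(
  retry: () => Promise<void>,
  delayMs: number,
  onError: (err: unknown) => void,
) {
  let retryHandle: TimerHandle | undefined;

  return {
    schedule(): void {
      if (retryHandle !== undefined) return; // a retry is already pending
      retryHandle = setTimeout(() => {
        retryHandle = undefined;
        void retry().catch(onError);
      }, delayMs);
    },
    // Call from the plugin's teardown / cleanup hook, if one exists.
    dispose(): void {
      if (retryHandle !== undefined) {
        clearTimeout(retryHandle);
        retryHandle = undefined;
      }
    },
    isPending: (): boolean => retryHandle !== undefined,
  };
}

const state = { fired: false };
const deferred = createDeferredRetry(
  async () => {
    state.fired = true;
  },
  5_000,
  () => {},
);

deferred.schedule();
const pendingBeforeDispose = deferred.isPending();
deferred.dispose(); // cancels the timer, so the retry callback never runs
const pendingAfterDispose = deferred.isPending();
```

Because the handle is cleared on disposal, no timer is left behind in tests that never advance past the delay.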

…rral

When the cron service is unavailable at gateway_start (startup timing
race), schedule a deferred retry after 5 seconds so the managed
dreaming cron job registers once sidecars have settled. This prevents
a deadlock where the runtime reconciliation path (heartbeat/cron
triggered) never fires because the cron was never registered.

Closes openclaw#72841

AI-assisted (built with Claude Code via Hermes orchestration).
@luyao618 force-pushed the fix/memory-core-dreaming-cron-startup-race branch from bb52527 to 68462d5 on April 28, 2026, 15:07
@clawsweeper
Contributor

clawsweeper Bot commented Apr 28, 2026

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Current main already contains a more complete maintainer implementation of this PR's intended memory-core dreaming cron startup retry. The fix landed through merged PR #73493 as commit e84ebea, with bounded retries, timer cleanup, gateway_stop disposal, tests, and an Unreleased changelog entry for #72841.

Best possible solution:

Close this PR as superseded by the merged maintainer implementation in #73493. Keep the bounded retry and cleanup behavior already on main, and let the fix ship with the next OpenClaw release.
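
For readers unfamiliar with the pattern, "bounded retries with cleanup on stop" can be sketched generically as below. This is not the code from #73493: `createBoundedRetry`, `ScheduleFn`, and the retry count of 3 are assumptions chosen for illustration, and the scheduler is injected so the example runs without real timers.

```typescript
// Hypothetical sketch of a bounded retry with cancellation; not the merged implementation.
type Cancel = () => void;
type ScheduleFn = (fn: () => void, delayMs: number) => Cancel;

function createBoundedRetry(opts: {
  attempt: () => boolean; // returns true once registration succeeds
  maxRetries: number;
  delayMs: number;
  schedule: ScheduleFn;
}) {
  let retriesLeft = opts.maxRetries;
  let cancelPending: Cancel | undefined;
  let stopped = false;

  const tick = (): void => {
    cancelPending = undefined;
    // Stop when disposed, when the attempt succeeds, or when retries run out.
    if (stopped || opts.attempt() || retriesLeft <= 0) return;
    retriesLeft -= 1;
    cancelPending = opts.schedule(tick, opts.delayMs);
  };

  return {
    start: tick,
    // Call on shutdown (for example from a gateway_stop handler) to drop any pending retry.
    stop(): void {
      stopped = true;
      cancelPending?.();
      cancelPending = undefined;
    },
  };
}

// Manual scheduler: queued callbacks stand in for timers.
const queue: Array<() => void> = [];
const manual: ScheduleFn = (fn) => {
  queue.push(fn);
  return () => {
    const i = queue.indexOf(fn);
    if (i >= 0) queue.splice(i, 1);
  };
};

// Scenario 1: cron never becomes available, so the retries are exhausted.
let attempts = 0;
const exhausted = createBoundedRetry({
  attempt: () => {
    attempts += 1;
    return false;
  },
  maxRetries: 3,
  delayMs: 5_000,
  schedule: manual,
});
exhausted.start();
while (queue.length > 0) queue.shift()!();

// Scenario 2: stop() cancels the pending retry before it runs.
let attemptsBeforeStop = 0;
const cancelled = createBoundedRetry({
  attempt: () => {
    attemptsBeforeStop += 1;
    return false;
  },
  maxRetries: 3,
  delayMs: 5_000,
  schedule: manual,
});
cancelled.start();
cancelled.stop();
const queuedAfterStop = queue.length;
```

In the first scenario the attempt runs once at start plus once per retry; in the second, `stop()` removes the queued retry so nothing runs after shutdown.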

What I checked:

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f256eeba431b; fix evidence: commit e84ebeafbd67.

@clawsweeper clawsweeper Bot closed this Apr 28, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: memory-core: dreaming cron never registers — startup timing + reconciliation skip bug
