fix(heartbeat): clamp scheduler delay to Node setTimeout cap (#71414) #71478
Greptile Summary

Confidence Score: 4/5. Safe to merge; the crash-loop is fixed and no regressions are introduced. One P2 finding: the warned flag is module-level rather than per-runner, so a second `startHeartbeatRunner` instance with an oversized delay in the same process silently clamps without logging. No P0 or P1 issues found.

Finding: src/infra/heartbeat-runner.ts, module-level `heartbeatTimeoutOverflowWarned` flag.
```ts
// every ~24.85 days instead of crash-loop. Warn once so misconfig is visible
// without flooding logs. (#71414)
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;
let heartbeatTimeoutOverflowWarned = false;
```
**Module-level flag shared across all runner instances**

`heartbeatTimeoutOverflowWarned` is a module-level singleton. If `startHeartbeatRunner` is called more than once in the same process (e.g., on config reload or in back-to-back tests without module isolation), a second runner with an oversized delay will silently clamp without warning. In production a config reload that preserves the oversized delay produces no signal.

A per-runner flag would scope the warning to each runner lifetime:

```ts
// inside startHeartbeatRunner, alongside other state:
let timeoutOverflowWarned = false;
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80ba6b27d1
```ts
// 2_147_483_647 ms cap. Without clamping, setTimeout would fire after
// 1ms and re-arm in a tight loop, exhausting the runner.
const runner = startHeartbeatRunner({
  cfg: heartbeatConfig([{ id: "main", heartbeat: { every: "365d" } }]),
```
Exercise actual overflow branch in regression test
This test case does not currently trigger the setTimeout overflow condition it claims to guard. scheduleNext() only overflows when nextDue - now > HEARTBEAT_MAX_TIMEOUT_MS (src/infra/heartbeat-runner.ts), but with TEST_SCHEDULER_SEED, agent main, and now=0, the phase-based first due time is about 8 days, so setTimeout is scheduled normally and the test passes even on the pre-fix behavior. That means this regression test won’t catch a future removal/break of the clamp logic; it should force rawDelay above the Node cap (e.g., by controlling phase/clock so next due is >24.85 days).
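One way to make that branch deterministic is to drive the arming step directly instead of relying on the phase-based due time. The sketch below is illustrative only: `scheduleDelay` is a stand-in for the real `scheduleNext()`, and only `HEARTBEAT_MAX_TIMEOUT_MS` and the clamp-before-arming behavior come from the PR.

```typescript
// Sketch: force the overflow branch by injecting a timer stub and choosing a
// due time more than 24.85 days out. scheduleDelay is a made-up stand-in for
// scheduleNext(); only the cap and the clamp itself reflect the PR.
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647; // Node's setTimeout cap

type TimerFn = (fn: () => void, ms: number) => unknown;

function scheduleDelay(nextDueMs: number, nowMs: number, setTimer: TimerFn): number {
  const raw = Math.max(0, nextDueMs - nowMs);
  const delay = Math.min(raw, HEARTBEAT_MAX_TIMEOUT_MS); // the clamp under test
  setTimer(() => {}, delay);
  return delay;
}

// Next due 365 days out (~31.5e9 ms), well past the cap; record what gets
// armed through the stub instead of the real setTimeout.
const armedDelays: number[] = [];
const armed = scheduleDelay(365 * 24 * 3600 * 1000, 0, (_fn, ms) => armedDelays.push(ms));
```

A real regression test would do the same with the runner's clock and seed, asserting the armed delay equals the cap rather than the raw difference.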
steipete left a comment:
Codex maintainer review: good fix, approved.
What I checked:
- #71414 is a real scheduler bug: Node turns any `setTimeout` delay greater than `2_147_483_647` ms into a 1 ms timer. I reproduced that directly with Node; it emitted `TimeoutOverflowWarning` and fired immediately. `heartbeat.every: "365d"` flows through `resolveHeartbeatIntervalMs()` as a valid positive duration, then into `resolveNextHeartbeatDueMs()` / `scheduleNext()`, so the scheduler layer is the implicated path.
- The fix clamps only the timer arm delay. It does not reject the config and does not change the stored heartbeat interval.
- Early clamp ticks are safe: the wake handler checks `isInterval && now < agent.nextDueMs` and returns `not-due`, then `scheduleNext()` re-arms. So a 365d heartbeat does not run at ~24.85d; the clamp effectively becomes a long-timer chain.
- The new regression covers the original tight loop: fake time advances 60s with `365d`, and `runOnce` stays uncalled.
- Greptile's module-level warning flag note is real but non-blocking. It only affects repeated warning visibility across multiple runner instances in the same process; the scheduling safety and user behavior are correct.
I pushed one maintainer fixup commit to the PR branch, `style: format heartbeat scheduler clamp`, because local `pnpm format:check` caught formatter drift in `src/infra/heartbeat-runner.ts`.
Verification:

```sh
pnpm format:check src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts
pnpm test src/infra/heartbeat-runner.scheduler.test.ts src/infra/heartbeat-runner.returns-default-unset.test.ts
node -e 'let fired=false; const t=setTimeout(()=>{fired=true; console.log("fired");},2147483648); setTimeout(()=>{clearTimeout(t); console.log({fired});},20)'
```

Results: format passed; infra heartbeat tests passed (2 files / 42 tests); Node repro confirmed the overflow warning plus immediate timer fire.
Force-pushed 3455d2f to 0d3ceec.
…w#71414)

When `agents.defaults.heartbeat.every` resolves to >2_147_483_647 ms (~24.85d), the previous scheduleNext() called setTimeout with the raw delay. Node clamps any delay > 2^31-1 to 1 ms, fires the callback, and the heartbeat re-arms with the same oversized value - a tight loop that floods the log with TimeoutOverflowWarning and crashes the gateway with exit code 1.

Clamp the computed delay to HEARTBEAT_MAX_TIMEOUT_MS (2_147_483_647) before calling setTimeout. The worst case is now one heartbeat every ~24.85d instead of crash-loop. Warn once per process when clamping fires, so a misconfigured "365d" remains visible without flooding.

This is a defense-in-depth fix at the scheduler layer; loadConfig-level rejection is a broader change with more blast radius and a separate question (some users may legitimately want "every: 365d" to mean "effectively never"). The clamped behaviour is closer to that intent than the crash is.

Test: new scheduler test sets heartbeat.every="365d" with fake timers, advances 60s, and asserts runSpy was never called (with the bug, it would be called ~60_000 times).
Force-pushed 0d3ceec to 4e7ccb7.
…w#71414) (openclaw#71478)

Squash of three commits (full fix message as in the commit above):

* fix(heartbeat): clamp scheduler delay to Node setTimeout cap (openclaw#71414)
* style: format heartbeat scheduler clamp
* fix: share safe timeout delay clamp (openclaw#71478) (thanks @hclsys)

Co-authored-by: Peter Steinberger <steipete@gmail.com>
…#71478)

Same squash as above, with repo-local issue references (#71414, #71478).

Co-authored-by: Peter Steinberger <steipete@gmail.com>
(cherry picked from commit fd74fc5)
Closes #71414.
Bug

When `agents.defaults.heartbeat.every` resolves to more than `2_147_483_647` ms (~24.85 days), `scheduleNext()` in `src/infra/heartbeat-runner.ts` called `setTimeout(fn, delay)` with the raw oversized delay. Node clamps any delay greater than 2^31 - 1 to 1 ms, fires the callback, and the heartbeat re-arms with the same oversized value: a tight loop that floods logs with `TimeoutOverflowWarning: ... Timeout duration was set to 1.` and crashes the gateway with exit code 1.

Reproduces with the reporter's recipe: `{ "agents": { "defaults": { "heartbeat": { "every": "365d" } } } }`.

Fix
Clamp the computed delay to `HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647` ms before calling `setTimeout`. Worst case is now one heartbeat every ~24.85d instead of crash-loop. Warn once per process when the clamp fires, so a misconfigured `365d` is still visible without flooding logs.

This is a defense-in-depth fix at the scheduler layer. `loadConfig`-level rejection (suggested in the issue) is a broader change with more blast radius and a separate semantic question: some users likely want `every: 365d` to mean "effectively never", and the clamped behaviour matches that intent better than a hard error does.
Test

New `src/infra/heartbeat-runner.scheduler.test.ts` case: sets `heartbeat.every: "365d"` with fake timers, advances 60s, and asserts `runSpy` was never invoked. With the bug present, `runSpy` would have been called tens of thousands of times during the advance.

Lint clean: `pnpm oxlint src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts` reports 0 warnings, 0 errors.

Out of scope (deliberately)
🤖 generated with assistance from Claude Code
Co-authored-by: HCL <chenglunhu@gmail.com>