fix(heartbeat): clamp scheduler delay to Node setTimeout cap (#71414)#71478

Merged
steipete merged 3 commits into openclaw:main from
hclsys:fix/heartbeat-clamp-setTimeout-overflow-71414
Apr 25, 2026
Conversation

Contributor

@hclsys hclsys commented Apr 25, 2026

Closes #71414.

Bug

When agents.defaults.heartbeat.every resolves to >2_147_483_647 ms (~24.85d), scheduleNext() in src/infra/heartbeat-runner.ts called setTimeout(fn, delay) with the raw oversized delay. Node clamps any delay > 2^31-1 to 1 ms, fires the callback, and the heartbeat re-arms with the same oversized value — a tight loop that floods logs with TimeoutOverflowWarning: ... Timeout duration was set to 1. and crashes the gateway with exit code 1.

Reproduces with the reporter's recipe: { "agents": { "defaults": { "heartbeat": { "every": "365d" } } } }.

Fix

Clamp the computed delay to HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647 ms before calling setTimeout. Worst case is now one heartbeat every ~24.85d instead of crash-loop. Warn once per process when the clamp fires, so a misconfigured 365d is still visible without flooding logs.

This is a defense-in-depth fix at the scheduler layer. loadConfig-level rejection (suggested in the issue) is a broader change with more blast radius and a separate semantic question — some users likely want every: 365d to mean "effectively never", and the clamped behaviour matches that intent better than a hard error does.
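A minimal sketch of the clamp described above (the constant name matches the PR; the helper and its surroundings are illustrative, not the repo's actual code):

```typescript
// Node converts any setTimeout delay above 2^31 - 1 ms into a 1 ms timer.
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;

let timeoutOverflowWarned = false;

// Hypothetical helper: clamp the computed delay, warning once per process.
function clampHeartbeatDelay(rawDelayMs: number): number {
  if (rawDelayMs <= HEARTBEAT_MAX_TIMEOUT_MS) return rawDelayMs;
  if (!timeoutOverflowWarned) {
    timeoutOverflowWarned = true;
    console.warn(
      `heartbeat delay ${rawDelayMs}ms exceeds the setTimeout cap; clamping to ${HEARTBEAT_MAX_TIMEOUT_MS}ms`,
    );
  }
  return HEARTBEAT_MAX_TIMEOUT_MS;
}

// "365d" resolves to 31_536_000_000 ms, well above the cap.
const delayMs = clampHeartbeatDelay(365 * 24 * 60 * 60 * 1000);
```

The scheduler would then call `setTimeout(fn, delayMs)` with the clamped value instead of the raw one.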

Test

New src/infra/heartbeat-runner.scheduler.test.ts case: sets heartbeat.every: "365d" with fake timers, advances 60s, and asserts runSpy was never invoked. With the bug present, runSpy would have been called tens of thousands of times during the advance.

Lint clean: pnpm oxlint src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts — 0 warnings, 0 errors.

Out of scope (deliberately)

🤖 generated with assistance from Claude Code
Co-authored-by: HCL &lt;chenglunhu@gmail.com&gt;

@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

Clamps the scheduleNext delay to 2_147_483_647 ms before passing it to setTimeout, preventing the Node crash-loop that occurred when heartbeat.every exceeded the signed-32-bit cap. The fix is well-scoped, the comment explains the Node behavior clearly, and the regression test correctly catches the original bug.

Confidence Score: 4/5

Safe to merge; the crash-loop is fixed and no regressions are introduced.

One P2 finding: the warned flag is module-level rather than per-runner, so a second startHeartbeatRunner instance with an oversized delay in the same process silently clamps without logging. No P0 or P1 issues found.

src/infra/heartbeat-runner.ts — module-level heartbeatTimeoutOverflowWarned flag


Comment thread: src/infra/heartbeat-runner.ts (outdated)

```ts
// every ~24.85 days instead of crash-loop. Warn once so misconfig is visible
// without flooding logs. (#71414)
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;
let heartbeatTimeoutOverflowWarned = false;
```

P1 Module-level flag shared across all runner instances

heartbeatTimeoutOverflowWarned is a module-level singleton. If startHeartbeatRunner is called more than once in the same process (e.g., on config reload or in back-to-back tests without module isolation), a second runner with an oversized delay will silently clamp without warning. In production a config reload that preserves the oversized delay produces no signal.

A per-runner flag would scope the warning to each runner lifetime:

```ts
// inside startHeartbeatRunner, alongside other state:
let timeoutOverflowWarned = false;
```


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80ba6b27d1


```ts
// 2_147_483_647 ms cap. Without clamping, setTimeout would fire after
// 1ms and re-arm in a tight loop, exhausting the runner.
const runner = startHeartbeatRunner({
  cfg: heartbeatConfig([{ id: "main", heartbeat: { every: "365d" } }]),
```

P2 Exercise actual overflow branch in regression test

This test case does not currently trigger the setTimeout overflow condition it claims to guard. scheduleNext() only overflows when nextDue - now > HEARTBEAT_MAX_TIMEOUT_MS (src/infra/heartbeat-runner.ts), but with TEST_SCHEDULER_SEED, agent main, and now=0, the phase-based first due time is about 8 days, so setTimeout is scheduled normally and the test passes even on the pre-fix behavior. That means this regression test won’t catch a future removal/break of the clamp logic; it should force rawDelay above the Node cap (e.g., by controlling phase/clock so next due is >24.85 days).
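The overflow condition the finding says the test must hit can be expressed as simple arithmetic (the function name here is illustrative, not a repo helper):

```typescript
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;

// The clamp only matters when the gap to the next due time exceeds the cap.
function overflows(nextDueMs: number, nowMs: number): boolean {
  return nextDueMs - nowMs > HEARTBEAT_MAX_TIMEOUT_MS;
}

const DAY_MS = 24 * 60 * 60 * 1000;
overflows(8 * DAY_MS, 0);  // false: a ~8-day first due time stays under the cap
overflows(30 * DAY_MS, 0); // true: anything past ~24.85 days hits the clamp
```

So with a phase-based first due time of ~8 days and now=0, the pre-fix code never overflows, and the test needs to pin the first due time past ~24.85 days to exercise the branch.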

Contributor

@steipete steipete left a comment


Codex maintainer review: good fix, approved.

What I checked:

  • #71414 is a real scheduler bug: Node turns any setTimeout delay greater than 2_147_483_647 ms into a 1 ms timer. I reproduced that directly with Node; it emitted TimeoutOverflowWarning and fired immediately.
  • heartbeat.every: "365d" flows through resolveHeartbeatIntervalMs() as a valid positive duration, then into resolveNextHeartbeatDueMs() / scheduleNext(), so the scheduler layer is the implicated path.
  • The fix clamps only the timer arm delay. It does not reject the config and does not change the stored heartbeat interval.
  • Early clamp ticks are safe: the wake handler checks isInterval && now < agent.nextDueMs and returns not-due, then scheduleNext() re-arms. So a 365d heartbeat does not run at ~24.85d; the clamp effectively becomes a long-timer chain.
  • The new regression covers the original tight loop: fake time advances 60s with 365d, and runOnce stays uncalled.
  • Greptile's module-level warning flag note is real but non-blocking. It only affects repeated warning visibility across multiple runner instances in the same process; the scheduling safety and user behavior are correct.
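The "long-timer chain" in the fourth bullet can be sketched as arithmetic (illustrative, not repo code): each clamped arm wakes early, sees `now < agent.nextDueMs`, returns not-due, and re-arms.

```typescript
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;

// Illustrative helper: how many clamped timer arms a long interval needs
// before the heartbeat actually comes due.
function clampedArms(intervalMs: number): number {
  return Math.ceil(intervalMs / HEARTBEAT_MAX_TIMEOUT_MS);
}

const YEAR_MS = 365 * 24 * 60 * 60 * 1000; // 31_536_000_000 ms
clampedArms(YEAR_MS); // ~15 arms; the heartbeat still fires only at the 365d mark
```

This is why a 365d heartbeat does not run every ~24.85 days: the intermediate wakes are no-ops that only re-arm the timer.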

I pushed one maintainer fixup commit to the PR branch: style: format heartbeat scheduler clamp, because local pnpm format:check caught formatter drift in src/infra/heartbeat-runner.ts.

Verification:

pnpm format:check src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts
pnpm test src/infra/heartbeat-runner.scheduler.test.ts src/infra/heartbeat-runner.returns-default-unset.test.ts
node -e 'let fired=false; const t=setTimeout(()=>{fired=true; console.log("fired");},2147483648); setTimeout(()=>{clearTimeout(t); console.log({fired});},20)'

Results: format passed; infra heartbeat tests passed (2 files / 42 tests); Node repro confirmed the overflow warning plus immediate timer fire.

steipete added a commit to hclsys/moltbot that referenced this pull request Apr 25, 2026
@steipete steipete force-pushed the fix/heartbeat-clamp-setTimeout-overflow-71414 branch from 3455d2f to 0d3ceec on April 25, 2026 08:49
@steipete steipete requested a review from a team as a code owner April 25, 2026 08:49
@openclaw-barnacle openclaw-barnacle Bot added the gateway (Gateway runtime), agents (Agent runtime and tooling), and size: S labels and removed the size: XS label, Apr 25, 2026
hclsys and others added 3 commits April 25, 2026 09:56
…w#71414)

When `agents.defaults.heartbeat.every` resolves to >2_147_483_647 ms
(~24.85d), the previous scheduleNext() called setTimeout with the raw
delay. Node clamps any delay > 2^31-1 to 1 ms, fires the callback, and
the heartbeat re-arms with the same oversized value - a tight loop that
floods the log with TimeoutOverflowWarning and crashes the gateway with
exit code 1.

Clamp the computed delay to HEARTBEAT_MAX_TIMEOUT_MS (2_147_483_647)
before calling setTimeout. The worst case is now one heartbeat every
~24.85d instead of crash-loop. Warn once per process when clamping
fires, so a misconfigured "365d" remains visible without flooding.

This is a defense-in-depth fix at the scheduler layer; loadConfig-level
rejection is a broader change with more blast radius and a separate
question (some users may legitimately want "every: 365d" to mean
"effectively never"). The clamped behaviour is closer to that intent
than the crash is.

Test: new scheduler test sets heartbeat.every="365d" with fake timers,
advances 60s, and asserts runSpy was never called (with the bug, it
would be called ~60_000 times).
@steipete steipete force-pushed the fix/heartbeat-clamp-setTimeout-overflow-71414 branch from 0d3ceec to 4e7ccb7 on April 25, 2026 08:57
@steipete steipete merged commit fd74fc5 into openclaw:main Apr 25, 2026
65 checks passed
steipete added a commit to MonkeyLeeT/openclaw that referenced this pull request Apr 25, 2026
…w#71414) (openclaw#71478)

* fix(heartbeat): clamp scheduler delay to Node setTimeout cap (openclaw#71414)


* style: format heartbeat scheduler clamp

* fix: share safe timeout delay clamp (openclaw#71478) (thanks @hclsys)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
steipete added a commit that referenced this pull request Apr 25, 2026
(cherry picked from commit fd74fc5)
Angfr95 pushed a commit to Angfr95/openclaw that referenced this pull request Apr 25, 2026
ayesha-aziz123 pushed a commit to ayesha-aziz123/openclaw that referenced this pull request Apr 26, 2026

Labels

agents (Agent runtime and tooling) · gateway (Gateway runtime) · size: S

Development

Successfully merging this pull request may close these issues.

Heartbeat duration every >24.85d overflows Node setTimeout, crashes gateway with no auto-respawn
