fix(cron): re-arm timer when onTimer rejects unexpectedly #7
Before this fix, if onTimer rejected unexpectedly (e.g. a Node.js internal error or GC pressure causing an exception in the finally block's armTimer call), the .catch() handler only logged the error. The scheduler chain was then permanently broken with no timer set, silently halting all cron jobs until the next gateway restart.
Fix: call armTimer(state) inside the .catch() handler so a rare unexpected rejection does not permanently stop the scheduler.
Regression test exercises the path by making nowMs() throw on the 4th call (inside the finally block's armTimer), which causes onTimer to reject; the .catch() re-arm is then verified via state.timer.
Closes openclaw#73166.
https://claude.ai/code/session_01NHHoPHTrH4F9qFJBJHqjTk
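The failure mode and the fix can be sketched in isolation. The TypeScript below is a simplified illustration, not the code from src/cron/service/timer.ts: the CronState shape, the fireTimer wrapper, the errors array standing in for the logger, and the 60-second reschedule are all assumptions made for the example. Only the armTimer call inside .catch() corresponds to the change described above.

```typescript
// Hypothetical, simplified stand-ins for the real scheduler state.
type TimerHandle = ReturnType<typeof setTimeout>;

interface CronState {
  timer: TimerHandle | null;
  nowMs: () => number; // injectable clock (assumed shape)
  nextWakeAtMs: number; // when the next job is due
  errors: unknown[]; // stand-in for log.error
}

function armTimer(state: CronState): void {
  if (state.timer !== null) {
    clearTimeout(state.timer);
  }
  state.timer = null; // cleared before anything below can throw
  const delay = Math.max(0, state.nextWakeAtMs - state.nowMs()); // may throw
  state.timer = setTimeout(() => void fireTimer(state), delay);
}

async function onTimer(state: CronState): Promise<void> {
  try {
    // Running the due jobs is omitted; only rescheduling is modelled.
    state.nextWakeAtMs = state.nowMs() + 60000;
  } finally {
    armTimer(state); // if this throws, onTimer rejects with no timer set
  }
}

// Body of the setTimeout callback, returned as a promise so it can be awaited.
function fireTimer(state: CronState): Promise<void> {
  return onTimer(state).catch((err: unknown) => {
    state.errors.push(err); // previously the handler stopped here
    armTimer(state); // the fix: re-arm so the chain survives the rejection
  });
}
```

If armTimer throws a second time inside the .catch() handler, this sketch does not recover; it only covers the single unexpected rejection the fix targets.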
🟡 armRunningRecheckTimer's .catch() doesn't re-arm, leaving scheduler vulnerable to the same permanent death the PR aims to fix
The PR adds re-arm logic in armTimer's .catch() handler (lines 553-559) to prevent permanent scheduler death when onTimer rejects unexpectedly. However, armRunningRecheckTimer at src/cron/service/timer.ts:572-576 has the same void onTimer(state).catch(...) pattern without the re-arm fix. If onTimer is called from armRunningRecheckTimer's callback with state.running = false (possible via a race where the watchdog fires in the same event-loop tick in which the primary onTimer completes), and armTimer throws in onTimer's finally block (line 736), the .catch() at lines 573-574 only logs the error without re-arming. At that point, armRunningRecheckTimer's callback has already cleared the timer that armTimer had set (line 597 calls armRunningRecheckTimer, which clears state.timer), and the finally block's armTimer has cleared the watchdog before throwing (lines 508-509). No active timer remains, which permanently kills the scheduler chain: the exact failure mode this PR intends to prevent.
(Refers to lines 572-576)
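One way to keep the two callbacks from drifting apart is to route both through a shared wrapper. This is a sketch of that idea rather than a proposed patch: runTickWithRearm and the SchedulerState shape are hypothetical, and the try/catch around the re-arm is an extra guard not mentioned in the comment above.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

// Hypothetical minimal state; the real scheduler state carries more fields.
interface SchedulerState {
  timer: TimerHandle | null;
  logError: (msg: string, err: unknown) => void;
}

// Runs one tick and guarantees a re-arm attempt if the tick rejects.
// `tick` plays the role of onTimer and `rearm` the role of armTimer.
function runTickWithRearm(
  state: SchedulerState,
  tick: (s: SchedulerState) => Promise<void>,
  rearm: (s: SchedulerState) => void,
  label: string,
): Promise<void> {
  return tick(state).catch((err: unknown) => {
    state.logError(`${label}: timer tick failed`, err);
    try {
      rearm(state);
    } catch (rearmErr: unknown) {
      // Assumed extra guard: a failing re-arm is logged instead of
      // surfacing as an unhandled rejection.
      state.logError(`${label}: re-arm failed`, rearmErr);
    }
  });
}
```

Under this sketch, the setTimeout callbacks in both armTimer and armRunningRecheckTimer would call void runTickWithRearm(state, onTimer, armTimer, label), so a rejection on either path leaves a timer armed.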
Closes openclaw#73166
Problem
When onTimer rejects unexpectedly (e.g. a transient error thrown from inside the finally block's armTimer call due to Node.js internals or GC pressure), the .catch() handler in armTimer's setTimeout callback only logs the error. No new timer is registered, permanently breaking the scheduler chain with no recovery path until the next gateway restart.
Root cause
In src/cron/service/timer.ts, the setTimeout callback inside armTimer invokes onTimer and attaches a .catch() handler. If onTimer rejects, the catch block logs but does not re-arm. state.timer is left as null (it is set to null at the top of armTimer, before the throw).
Fix
Call armTimer(state) inside the .catch() handler so the scheduler chain survives an unexpected rejection.
Regression test
Added to src/cron/service.armtimer-tight-loop.test.ts: makes nowMs() throw on the 4th call (inside the finally block's armTimer), which causes onTimer to reject. Verifies that log.error is called and state.timer is non-null after the .catch() re-arm. All 4 tests in the file pass.
Generated by Claude Code
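The "throw on the 4th call" setup used by the regression test can be produced with a small counting wrapper. A minimal sketch, assuming nowMs is injected as a zero-argument function; throwOnNthCall is a hypothetical helper name, and the actual test file may use its test framework's mocking utilities instead.

```typescript
// Hypothetical test helper: returns a wrapper around fn whose n-th call
// (1-based) throws instead of delegating; every other call passes through.
function throwOnNthCall<T>(
  n: number,
  fn: () => T,
  message = "injected failure",
): () => T {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls === n) {
      throw new Error(message);
    }
    return fn();
  };
}

// Example: a clock that fails only on its 4th read.
const flakyNowMs = throwOnNthCall(4, () => Date.now());
```

Because the wrapper fails exactly once, later reads succeed again, which is what allows a re-arm in the .catch() handler to complete after the injected failure.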