fix(cron): finalize task ledger before clearing active-jobs flag (#71963) by Sanjays2402 · Pull Request #71968 · openclaw/openclaw

Sanjays2402 · 2026-04-26T04:47:51Z

Fixes part of #71963.

What

applyOutcomeToStoredJob currently clears the in-memory active-jobs flag before finalizing the cron run's task-ledger record. The registry maintenance sweep (60s timer) treats cron tasks as lost / backing session missing when they are still running AND isCronJobActive(jobId) returns false. The clear-then-finalize order opens a sub-millisecond window where an in-flight sweep can observe exactly that state and mark a successful cron run lost.

Swap the order: finalize the task ledger first, then clear the active-jobs flag. The active flag now vouches for the task right up until it has reached a terminal status, so the sweep can never observe the false-lost state in-process.
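Below is a minimal sketch of the ordering argument. Everything in it is a hypothetical in-memory model. `activeJobs`, `ledger`, `sweep`, `freshModel`, and the simplified signatures of `isCronJobActive`, `clearCronJobActive`, and `tryFinishCronTaskRun` are assumptions that stand in for the real code in src/cron/service/timer.ts and the task registry. The sweep is called explicitly between the two steps to model the window described above, so this does not reproduce a real timer race. The PR also does not say whether the real tryFinishCronTaskRun would overwrite a `lost` status. In this model it only finalizes tasks that are still running.

```typescript
// Hypothetical, simplified model of the cron bookkeeping described in this PR.
type TaskStatus = "running" | "succeeded" | "lost";

interface Model {
  activeJobs: Set<string>;         // stands in for the in-memory active-jobs flag
  ledger: Map<string, TaskStatus>; // stands in for the task-ledger record, keyed by jobId
}

function isCronJobActive(m: Model, jobId: string): boolean {
  return m.activeJobs.has(jobId);
}

function clearCronJobActive(m: Model, jobId: string): void {
  m.activeJobs.delete(jobId);
}

// Moves a running task to a terminal status; leaves terminal tasks untouched.
function tryFinishCronTaskRun(m: Model, jobId: string): void {
  if (m.ledger.get(jobId) === "running") {
    m.ledger.set(jobId, "succeeded");
  }
}

// Maintenance-sweep rule from the PR: still running AND not active => lost.
function sweep(m: Model): void {
  m.ledger.forEach((status, jobId) => {
    if (status === "running" && !isCronJobActive(m, jobId)) {
      m.ledger.set(jobId, "lost");
    }
  });
}

function freshModel(jobId: string): Model {
  return {
    activeJobs: new Set<string>([jobId]),
    ledger: new Map<string, TaskStatus>([[jobId, "running"]]),
  };
}

// Old order: clear, then a sweep lands in the window, then finalize.
function clearThenFinalize(jobId: string): TaskStatus | undefined {
  const m = freshModel(jobId);
  clearCronJobActive(m, jobId);
  sweep(m);                       // sees running && !active, marks the task lost
  tryFinishCronTaskRun(m, jobId); // no-op in this model: the task is no longer running
  return m.ledger.get(jobId);
}

// New order from this PR: finalize, then a sweep lands in the window, then clear.
function finalizeThenClear(jobId: string): TaskStatus | undefined {
  const m = freshModel(jobId);
  tryFinishCronTaskRun(m, jobId);
  sweep(m);                       // the task is already terminal, so the sweep skips it
  clearCronJobActive(m, jobId);
  return m.ledger.get(jobId);
}

console.log(clearThenFinalize("job-1")); // prints: lost
console.log(finalizeThenClear("job-1")); // prints: succeeded
```

The model reduces the fix to a single invariant: a task is only ever observed as `running` while the active flag still vouches for it.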

Why this matches the report

The reporter (#71963) sees Task error lost ... backing session missing errors accumulating against sessionTarget: "isolated" cron jobs whose lastRunStatus is ok. Their proposal #2 — "write the terminal task state before the isolated session is reaped" — is exactly this change at the cron-side bookkeeping layer.

Scope

7 lines, comment included.
Existing cron timer/ops tests still pass (timer.test.ts, timer.regression.test.ts, ops.test.ts, task-registry.maintenance.issue-60299.test.ts).
No public API change.

What this does NOT fix

A second cause in the same report — cron tasks left in running state across gateway restarts — is not addressed here. After a restart the in-memory active-jobs set is empty, so any pre-existing running cron task records hit the same condition on the first sweep and get marked lost. Reconciling those needs a startup-time pass (e.g. cancel-with-reason or promote-to-succeeded based on cronJob.lastRunAtMs) and intentionally lives in a follow-up PR to keep this change reviewable.
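Below is a sketch of what such a startup pass might look like. The `CronTaskRecord` and `CronJobState` shapes and the `reconcileCronTasksAtStartup` helper are hypothetical and do not come from the codebase. The sketch also assumes that a `lastRunAtMs` at or after the task's start, combined with a `lastRunStatus` of `ok`, means the interrupted run actually completed. The PR does not say whether `lastRunAtMs` records the start or the end of a run.

```typescript
// Hypothetical shapes; the real task-record and cron-job types are not shown in this PR.
interface CronTaskRecord {
  taskId: string;
  jobId: string;
  status: "running" | "succeeded" | "cancelled" | "lost";
  startedAtMs: number;
  reason?: string;
}

interface CronJobState {
  lastRunAtMs?: number;
  lastRunStatus?: "ok" | "error";
}

// Reconcile cron tasks still "running" at startup, before the first maintenance sweep.
function reconcileCronTasksAtStartup(
  tasks: CronTaskRecord[],
  jobs: Map<string, CronJobState>,
): CronTaskRecord[] {
  return tasks.map((task): CronTaskRecord => {
    if (task.status !== "running") return task;
    const job = jobs.get(task.jobId);
    const lastRunAtMs = job?.lastRunAtMs;
    // Assumption: job state recording a run at or after this task's start means the run finished.
    const finishedRun = lastRunAtMs !== undefined && lastRunAtMs >= task.startedAtMs;
    if (finishedRun && job?.lastRunStatus === "ok") {
      return { ...task, status: "succeeded" }; // promote-to-succeeded
    }
    // cancel-with-reason: no evidence that this run completed
    return { ...task, status: "cancelled", reason: "interrupted by gateway restart" };
  });
}

const restored = reconcileCronTasksAtStartup(
  [
    { taskId: "t1", jobId: "a", status: "running", startedAtMs: 1000 },
    { taskId: "t2", jobId: "b", status: "running", startedAtMs: 5000 },
  ],
  new Map<string, CronJobState>([
    ["a", { lastRunAtMs: 1000, lastRunStatus: "ok" }],
    ["b", { lastRunAtMs: 2000, lastRunStatus: "ok" }],
  ]),
);
console.log(restored.map((t) => t.status)); // prints: [ 'succeeded', 'cancelled' ]
```

Running such a pass before the first sweep would leave no `running` cron records for the sweep to misjudge. Cancelling instead of marking `lost` keeps an explicit reason on records that cannot be proven complete.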

The 883 inconsistent_timestamps warnings the reporter mentions are also a separate issue.

Test

pnpm test src/cron/service/timer.test.ts src/cron/service/ops.test.ts \
  src/cron/service/timer.regression.test.ts \
  src/tasks/task-registry.maintenance.issue-60299.test.ts -- --run
# Test Files  4 passed (4)
# Tests       40 passed (40)

…nclaw#71963) When a cron run completes, applyOutcomeToStoredJob currently clears the in-memory active-jobs flag *before* finalizing the task ledger. The task registry maintenance sweep runs every 60s on a separate timer and treats cron tasks as lost when they are still in 'running' state AND isCronJobActive(jobId) returns false. The clear-then-finalize order opens a window where an in-flight sweep can observe exactly that state and mis-mark a successful cron run as 'lost' with detail 'backing session missing'.

Swapping the order eliminates the race for in-process sweeps: the task moves to a terminal status while the active-jobs flag still vouches for it, so the sweep can never observe the false-lost state.

This addresses the steady accumulation of false-positive 'lost / backing session missing' findings for sessionTarget=isolated cron jobs reported in openclaw#71963.

A follow-up is still warranted to reconcile cron tasks left in 'running' state across gateway restarts (no in-memory active-jobs state to vouch for them on first sweep), but that touches startup reconciliation and is intentionally out of scope for this change.

greptile-apps · 2026-04-26T04:49:08Z

Greptile Summary

Swaps the order of tryFinishCronTaskRun and clearCronJobActive in applyOutcomeToStoredJob so the task ledger reaches a terminal state before the in-memory active-jobs flag is cleared. Both called functions are synchronous, making this a hard ordering guarantee that closes the race window described in #71963.

Confidence Score: 5/5

Safe to merge — a 7-line, correctly-ordered fix with no API changes and all existing tests passing.

The change is minimal and targeted, both affected functions are synchronous (no new async risk), the comment accurately describes the invariant being enforced, and the described test suite covers the relevant code paths.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(cron): finalize task ledger before c..."

clawsweeper · 2026-04-26T10:45:22Z

Closing this as implemented after Codex automated review.

Current main already fixes the #71963 user problem that #71968 targeted, but through task-registry reconciliation rather than the PR's timer-order-only patch. Main recovers completed cron task records from durable run logs or cron job state before projecting or marking them lost, and regression tests cover the successful-run and restart/interrupted paths.

Best possible solution:

Close #71968 as handled on current main. Keep the broader task-registry recovery implementation and its issue-specific tests; only revisit the timer-order change if a new reproducer shows a remaining in-process ordering bug after the current main fix.

What I checked:

Current implementation recovers completed cron tasks before lost handling: resolveDurableCronTaskRecovery maps finished cron run-log entries or matching cron job state into terminal task states, and runTaskRegistryMaintenance applies that recovery before shouldMarkLost can mark the task lost. (src/tasks/task-registry.maintenance.ts:223, 6d60b035b4e7)
Audit/inspection path projects recovered cron tasks: reconcileTaskRecordForOperatorInspection applies the same durable cron recovery before projecting a task as lost, so openclaw tasks audit no longer needs to report completed isolated cron runs as lost / backing session missing. (src/tasks/task-registry.maintenance.ts:442, 6d60b035b4e7)
Regression tests cover the #71963 recovery behavior: The issue-specific maintenance test verifies that a stale cron task with an ok durable run-log entry is shown and persisted as succeeded, with maintenance reporting recovered: 1 and reconciled: 0. The next test covers recovery from durable cron job state when run logs are absent. (src/tasks/task-registry.maintenance.issue-60299.test.ts:214, 6d60b035b4e7)
Changelog records the main fix for the linked report: The current changelog entry explicitly says cron/tasks now recover completed cron task ledger records from durable run logs and job state before marking them lost, reducing false backing session missing audit errors for isolated cron runs, and marks it as fixing #71963. (CHANGELOG.md:143, 6d60b035b4e7)
PR patch is a narrower alternate implementation: The PR patch only swaps clearCronJobActive(result.jobId) and tryFinishCronTaskRun(state, result) in applyOutcomeToStoredJob; current main still has the old order, confirming the shipped/main solution is the broader reconciliation fix rather than this exact diff. (src/cron/service/timer.ts:585, 6d60b035b4e7)
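The recover-before-lost ordering described above can be sketched as follows. This is a hypothetical model loosely patterned on resolveDurableCronTaskRecovery and runTaskRegistryMaintenance, not the actual implementation on main. The types, the `resolveRecoveredStatus` and `maintainTasks` helpers, and the rule that a run-log entry matches a task when it shares the jobId and has a run timestamp no earlier than the task's start are all assumptions. The returned report counts `recovered` and `lost`, not the real `reconciled` field.

```typescript
// Hypothetical shapes; the real types in src/tasks/task-registry.maintenance.ts are not shown here.
type Status = "running" | "succeeded" | "failed" | "lost";

interface Task { taskId: string; jobId: string; status: Status; startedAtMs: number; }
interface RunLogEntry { jobId: string; runAtMs: number; status: "ok" | "error"; }
interface JobState { lastRunAtMs?: number; lastRunStatus?: "ok" | "error"; }

const toTerminal = (s: "ok" | "error"): Status => (s === "ok" ? "succeeded" : "failed");

// Prefer a finished run-log entry; fall back to cron job state. undefined => no durable evidence.
function resolveRecoveredStatus(
  task: Task,
  runLog: RunLogEntry[],
  jobs: Map<string, JobState>,
): Status | undefined {
  const entry = runLog.find((e) => e.jobId === task.jobId && e.runAtMs >= task.startedAtMs);
  if (entry) return toTerminal(entry.status);
  const job = jobs.get(task.jobId);
  const lastRunAtMs = job?.lastRunAtMs;
  const lastRunStatus = job?.lastRunStatus;
  if (lastRunAtMs !== undefined && lastRunAtMs >= task.startedAtMs && lastRunStatus !== undefined) {
    return toTerminal(lastRunStatus);
  }
  return undefined;
}

// Apply durable recovery before the lost rule (running && !active).
function maintainTasks(
  tasks: Task[],
  isActive: (jobId: string) => boolean,
  runLog: RunLogEntry[],
  jobs: Map<string, JobState>,
): { recovered: number; lost: number } {
  let recovered = 0;
  let lost = 0;
  for (const task of tasks) {
    if (task.status !== "running") continue;
    const recoveredStatus = resolveRecoveredStatus(task, runLog, jobs);
    if (recoveredStatus !== undefined) {
      task.status = recoveredStatus;
      recovered++;
    } else if (!isActive(task.jobId)) {
      task.status = "lost";
      lost++;
    }
  }
  return { recovered, lost };
}

const tasks: Task[] = [
  { taskId: "t1", jobId: "a", status: "running", startedAtMs: 1000 }, // ok run-log entry exists
  { taskId: "t2", jobId: "b", status: "running", startedAtMs: 1000 }, // only job state
  { taskId: "t3", jobId: "c", status: "running", startedAtMs: 1000 }, // no evidence at all
];
const report = maintainTasks(
  tasks,
  () => false, // e.g. right after a restart, nothing is marked active
  [{ jobId: "a", runAtMs: 1000, status: "ok" }],
  new Map<string, JobState>([["b", { lastRunAtMs: 1500, lastRunStatus: "ok" }]]),
);
console.log(report, tasks.map((t) => t.status));
// prints: { recovered: 2, lost: 1 } [ 'succeeded', 'succeeded', 'lost' ]
```

Because recovery runs first, the lost rule only fires for tasks with no durable evidence of completion. That is why this approach does not depend on the in-memory active flag and also applies after a restart.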

So I’m closing this as already implemented rather than keeping a duplicate PR open.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against 6d60b035b4e7; fix evidence: commit 6d60b035b4e7.

openclaw-barnacle Bot added the size: XS label Apr 26, 2026

Sanjays2402 mentioned this pull request Apr 26, 2026: Cron jobs with sessionTarget=isolated produce false lost errors after successful runs #71963 (Closed)

clawsweeper Bot closed this Apr 26, 2026

clawsweeper Bot mentioned this pull request May 11, 2026: [Bug] Cron tasks marked as lost after gateway restart — activeJobIds not persisted #79196 (Closed)


Sanjays2402 wants to merge 1 commit into openclaw:main from Sanjays2402:fix/71963-cron-isolated-false-lost
