fix: cron failureAlert fires correctly on error and skipped runs by lml2468 · Pull Request #60876 · openclaw/openclaw

lml2468 · 2026-04-04T13:28:43Z

Summary

Two related cron monitoring failures fixed in applyJobResult (src/cron/service/timer.ts).

Fix: #60845 — `failureAlert` never fires on error runs

Root cause: The local fork added an extra condition to the isBestEffort guard in applyJobResult:

// Before (broken):
const isBestEffort =
  job.delivery?.bestEffort === true ||
  (job.payload.kind === "agentTurn" && job.payload.bestEffortDeliver === true);

payload.bestEffortDeliver is a legacy field that gates output delivery, not failure alerting. Any agentTurn job that had bestEffortDeliver: true in its payload (set during migration from the old top-level format) would silently skip failureAlert forever — consecutiveErrors incremented correctly but emitFailureAlert was never called.

Fix: Revert to the upstream guard:

// After (correct):
const isBestEffort = job.delivery?.bestEffort === true;

Fix: #60846 — `failureAlert` never evaluated for "skipped" runs

Root cause: The else branch in applyJobResult handled both "ok" and "skipped" results identically — resetting consecutiveErrors to 0 and never calling resolveFailureAlert. A job that is permanently stuck in "skipped" state (e.g. gateway-restart health-check jobs, jobs with empty systemEvent text) generated zero alerts regardless of failureAlert configuration.

Fix: Split "skipped" into its own branch with:

A new consecutiveSkips counter (mirrors consecutiveErrors)
A new lastSkipAlertAtMs cooldown timestamp (mirrors lastFailureAlertAtMs)
resolveFailureAlert evaluation against consecutiveSkips >= alertConfig.after
emitFailureAlert with isSkip: true for a distinct message ("skipped N times / Reason: ...")
consecutiveErrors + lastFailureAlertAtMs still reset on "skipped" (a skip is not an error)
Both counters reset on "ok" (clean run clears all alert state)

Files Changed

src/cron/service/timer.ts — emitFailureAlert + applyJobResult
src/cron/types.ts — added consecutiveSkips?: number and lastSkipAlertAtMs?: number to CronJobState
src/gateway/protocol/schema/cron.ts — added consecutiveSkips and lastSkipAlertAtMs to CronJobStateSchema

Fixes #60845
Fixes #60846

The config tamper detection audit log was recording full CLI argv including plaintext gateway tokens, bot tokens, and API keys. Now strips/redacts known secret patterns from argv before writing to config-audit.jsonl to prevent credential exposure at rest. Added redactArgv() helper function and SENSITIVE_ARGV_FLAGS set that covers: --token, --bot-token, --app-token, --access-token, --gateway-token, --password, --api-key, --secret, --secret-key, --secret-input. Both --flag value and --flag=value forms are handled. Fixes openclaw#60826

openclaw#60845: Fixed isBestEffort check in applyJobResult — the local fork had incorrectly included payload.bestEffortDeliver in the best-effort guard, which prevented failureAlert from firing on any agentTurn job with the legacy bestEffortDeliver=true payload field. The correct guard is job.delivery?.bestEffort only (gates output delivery, not failure alerting). openclaw#60846: Added consecutiveSkips tracking and failureAlert evaluation for 'skipped' run results. Previously, every 'skipped' run reset consecutiveErrors to zero and did not evaluate failureAlert at all, so jobs permanently stuck in 'skipped' state generated zero alerts. New state fields lastSkipAlertAtMs and consecutiveSkips mirror the error-alert mechanism; emitFailureAlert now produces a descriptive message for skip-based alerts ('skipped N times / Reason: ...'). Fixes openclaw#60845 Fixes openclaw#60846

greptile-apps · 2026-04-04T13:32:30Z

Greptile Summary

This PR fixes two failureAlert bugs in applyJobResult (src/cron/service/timer.ts): the isBestEffort guard no longer incorrectly checks payload.bestEffortDeliver (which gates output delivery, not alerting), and a new "skipped" branch tracks consecutiveSkips / lastSkipAlertAtMs so permanently-skipped jobs can now fire emitFailureAlert with a distinct "skipped N times" message. Corresponding type and schema additions are correctly placed in src/cron/types.ts and src/gateway/protocol/schema/cron.ts. The logic is sound, with only minor style suggestions around test coverage and parameter naming.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style and test-coverage suggestions.

Both bug fixes are logically correct: the isBestEffort guard is now properly scoped, and the new skip-tracking branch correctly mirrors the error-tracking pattern with its own cooldown timestamp. Type and schema additions are aligned. The only gaps are missing regression tests for the two fixed bugs and a misleading parameter name in emitFailureAlert, both P2.

No files require special attention; the gateway protocol schema addition is additive and backward-compatible.

Comments Outside Diff (2)

src/cron/service.failure-alert.test.ts, line 270 (link)

Missing regression tests for both fixes

The PR fixes two distinct behavioral bugs, but neither has a covering test. The existing service.failure-alert.test.ts already tests the delivery.bestEffort: true suppression path, but there is no test for the payload.bestEffortDeliver: true case that was incorrectly suppressing alerts (cron: failureAlert never fires — all error jobs show deliveryStatus 'not-requested' #60845), and no test that asserts a "skipped" run eventually fires sendCronFailureAlert (cron: 'skipped' status bypasses failureAlert — persistently skipped jobs go undetected #60846). Without these, both regressions could silently reappear.

CLAUDE.md explicitly asks: "For narrowly scoped changes, prefer narrowly scoped tests that directly validate the touched behavior." Two cases worth adding:

it("#60845: alerts even when payload.bestEffortDeliver is true", async () => {
  // add job with payload: { kind: "agentTurn", message: "...", bestEffortDeliver: true }
  // delivery.bestEffort must be false (or omitted)
  // run until consecutiveErrors >= alertConfig.after → expect sendCronFailureAlert called
});

it("#60846: alerts after configured consecutive skips", async () => {
  // add job whose runIsolatedAgentJob returns { status: "skipped" }
  // run N times (>= alertConfig.after) → expect sendCronFailureAlert called with isSkip text
});

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/cron/service.failure-alert.test.ts
Line: 270

Comment:
**Missing regression tests for both fixes**

The PR fixes two distinct behavioral bugs, but neither has a covering test. The existing `service.failure-alert.test.ts` already tests the `delivery.bestEffort: true` suppression path, but there is no test for the `payload.bestEffortDeliver: true` case that was incorrectly suppressing alerts (#60845), and no test that asserts a `"skipped"` run eventually fires `sendCronFailureAlert` (#60846). Without these, both regressions could silently reappear.

`CLAUDE.md` explicitly asks: *"For narrowly scoped changes, prefer narrowly scoped tests that directly validate the touched behavior."* Two cases worth adding:

```ts
it("#60845: alerts even when payload.bestEffortDeliver is true", async () => {
  // add job with payload: { kind: "agentTurn", message: "...", bestEffortDeliver: true }
  // delivery.bestEffort must be false (or omitted)
  // run until consecutiveErrors >= alertConfig.after → expect sendCronFailureAlert called
});

it("#60846: alerts after configured consecutive skips", async () => {
  // add job whose runIsolatedAgentJob returns { status: "skipped" }
  // run N times (>= alertConfig.after) → expect sendCronFailureAlert called with isSkip text
});
```

How can I resolve this? If you propose a fix, please make it concise.

src/cron/service/timer.ts, line 248-258 (link)

Overloaded consecutiveErrors parameter name

The consecutiveErrors field in the emitFailureAlert params interface doubles as the skip count when isSkip: true (see line 390: consecutiveErrors: job.state.consecutiveSkips). The generated message text happens to be correct because isSkip selects the right template, but the parameter name implies error semantics and will confuse future readers/maintainers.

Consider renaming it to consecutiveCount (or count) to keep it neutral:

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 248-258

Comment:
**Overloaded `consecutiveErrors` parameter name**

The `consecutiveErrors` field in the `emitFailureAlert` params interface doubles as the skip count when `isSkip: true` (see line 390: `consecutiveErrors: job.state.consecutiveSkips`). The generated message text happens to be correct because `isSkip` selects the right template, but the parameter name implies error semantics and will confuse future readers/maintainers.

Consider renaming it to `consecutiveCount` (or `count`) to keep it neutral:



How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/cron/service.failure-alert.test.ts
Line: 270

Comment:
**Missing regression tests for both fixes**

The PR fixes two distinct behavioral bugs, but neither has a covering test. The existing `service.failure-alert.test.ts` already tests the `delivery.bestEffort: true` suppression path, but there is no test for the `payload.bestEffortDeliver: true` case that was incorrectly suppressing alerts (#60845), and no test that asserts a `"skipped"` run eventually fires `sendCronFailureAlert` (#60846). Without these, both regressions could silently reappear.

`CLAUDE.md` explicitly asks: *"For narrowly scoped changes, prefer narrowly scoped tests that directly validate the touched behavior."* Two cases worth adding:

```ts
it("#60845: alerts even when payload.bestEffortDeliver is true", async () => {
  // add job with payload: { kind: "agentTurn", message: "...", bestEffortDeliver: true }
  // delivery.bestEffort must be false (or omitted)
  // run until consecutiveErrors >= alertConfig.after → expect sendCronFailureAlert called
});

it("#60846: alerts after configured consecutive skips", async () => {
  // add job whose runIsolatedAgentJob returns { status: "skipped" }
  // run N times (>= alertConfig.after) → expect sendCronFailureAlert called with isSkip text
});
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 248-258

Comment:
**Overloaded `consecutiveErrors` parameter name**

The `consecutiveErrors` field in the `emitFailureAlert` params interface doubles as the skip count when `isSkip: true` (see line 390: `consecutiveErrors: job.state.consecutiveSkips`). The generated message text happens to be correct because `isSkip` selects the right template, but the parameter name implies error semantics and will confuse future readers/maintainers.

Consider renaming it to `consecutiveCount` (or `count`) to keep it neutral:

```suggestion
function emitFailureAlert(
  state: CronServiceState,
  params: {
    job: CronJob;
    error?: string;
    consecutiveCount: number;
    channel: CronMessageChannel;
    to?: string;
    mode?: "announce" | "webhook";
    accountId?: string;
    /** When true, the alert is for consecutive skipped runs rather than errors. */
    isSkip?: boolean;
  },
) {
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix: cron failureAlert now fires on erro..." | Re-trigger Greptile}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0482bb526

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-04T13:34:06Z

+      const flag = arg.slice(0, eqIdx);
+      if (SENSITIVE_ARGV_FLAGS.has(flag)) {
+        out.push(`${flag}=[REDACTED]`);


Redact secret-suffixed CLI flags in argv audit logging

The redaction path only matches exact flag names, so sensitive options like --openai-api-key or --gateway-password are not sanitized and their plaintext values still get written to config-audit.jsonl. This happens because redactArgv checks SENSITIVE_ARGV_FLAGS.has(flag)/has(arg) with a fixed set containing --api-key and --password, but not the many real secret-bearing variants used by commands (for example onboarding/provider flags), which leaves a credential-at-rest exposure in the audit log.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-04T13:34:06Z

+  // Track consecutive errors/skips for backoff / auto-disable and failure alerts.
  if (result.status === "error") {
    job.state.consecutiveErrors = (job.state.consecutiveErrors ?? 0) + 1;
+    job.state.consecutiveSkips = 0;


Clear skip-alert cooldown when a run is not skipped

When an error run interrupts a skip streak, consecutiveSkips is reset to 0 but lastSkipAlertAtMs is left intact, so the next skip streak can be incorrectly suppressed by the previous streak's cooldown. For example with after: 1, a skipped -> error -> skipped sequence within cooldown will not alert on the second skip even though it is a new streak; this differs from the error-alert path, where non-error runs clear the cooldown timestamp.

Useful? React with 👍 / 👎.

X-MerlinLee added 2 commits April 4, 2026 21:13

openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime size: S labels Apr 4, 2026

chatgpt-codex-connector Bot reviewed Apr 4, 2026

View reviewed changes

lml2468 closed this Apr 12, 2026

steipete mentioned this pull request Apr 25, 2026

cron: failureAlert never fires — all error jobs show deliveryStatus 'not-requested' #60845

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: cron failureAlert fires correctly on error and skipped runs#60876

fix: cron failureAlert fires correctly on error and skipped runs#60876
lml2468 wants to merge 2 commits into
openclaw:mainfrom
lml2468:fix/issue-60845-60846

lml2468 commented Apr 4, 2026

Uh oh!

greptile-apps Bot commented Apr 4, 2026 •

edited

Loading

Comments Outside Diff (2)

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lml2468 commented Apr 4, 2026

Summary

Fix: #60845 — failureAlert never fires on error runs

Fix: #60846 — failureAlert never evaluated for "skipped" runs

Files Changed

Uh oh!

greptile-apps Bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Comments Outside Diff (2)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix: #60845 — `failureAlert` never fires on error runs

Fix: #60846 — `failureAlert` never evaluated for "skipped" runs

greptile-apps Bot commented Apr 4, 2026 •

edited

Loading