feat: session idle nudge — prevent silent hangs (#71)#29199
RecursiveRabbit wants to merge 1 commit into openclaw:main
Greptile Summary
Implements a session idle nudge feature to prevent silent hangs in subagent, cron, and ticket sessions. When a non-main session is idle for 5+ minutes (configurable), the system triggers a new agent turn with a nudge message. The implementation includes comprehensive test coverage (16 tests), flexible configuration options, and intelligent filtering (it skips sessions with active runs and sessions ending in END, and respects the maxNudges limit). Key implementation details:
Note: PR includes unrelated changes to
Confidence Score: 4/5
Last reviewed commit: e84a88f
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e84a88f830
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if (options?.allowFinalSymlink && isLast) {
      return;
    }
    const target = await tryRealpath(current);
    if (!isPathInside(rootReal, target)) {
      throw new Error(
        `Symlink escapes sandbox root (${shortPath(rootReal)}): ${shortPath(current)}`,
      );
    }
    current = target;
    current = await tryRealpath(current);
Restore sandbox root check after resolving symlinks
This change resolves symlinks and keeps walking without verifying that the resolved path is still inside root, which reopens a workspace escape: with workspaceOnly enabled, tools that call assertSandboxPath (for example apply_patch) can follow a symlink in the workspace and write/delete files outside the workspace tree. That breaks the documented workspace-only isolation and allows unintended host file modification whenever such a symlink exists.
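A minimal sketch of the containment check this comment asks to restore. `isPathInside` here is a hypothetical stand-in for the helper of the same name in the diff (its real implementation is not shown on this page), and `assertInsideRoot` is an illustrative name, not a function from the PR.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the PR's isPathInside helper: true when
// `candidate` is `root` itself or lies somewhere beneath it.
function isPathInside(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  if (rel === "") return true; // same directory
  if (rel === ".." || rel.startsWith(".." + path.sep)) return false; // climbs out of root
  return !path.isAbsolute(rel); // absolute result means a different drive on Windows
}

// Illustrative wrapper: call this on every resolved symlink target
// before continuing the walk, so an escape is caught at the first hop.
function assertInsideRoot(rootReal: string, resolved: string): string {
  if (!isPathInside(rootReal, resolved)) {
    throw new Error(`Symlink escapes sandbox root (${rootReal}): ${resolved}`);
  }
  return resolved;
}
```

The point of checking after each resolution, rather than once at the end, is that an intermediate symlink can leave the root and a later path segment can make the final string look harmless again.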
    if (!params.force && now - lastSweepAtMs < MIN_SWEEP_INTERVAL_MS) {
      return { swept: false, nudged: 0 };
    }
    lastSweepAtMs = now;
Scope idle-nudge throttle per session store path
The sweep throttle uses a single global lastSweepAtMs, but onTimer invokes sweepIdleSessions once per storePath in the same tick. After the first store is processed, subsequent stores in that loop immediately hit the throttle and return unswept, so in multi-agent setups only the first store path gets nudged while other agents' session stores are effectively starved.
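One way to scope the throttle per store, sketched under assumptions: the value of `MIN_SWEEP_INTERVAL_MS` is not shown in the diff, so the number below is a placeholder, and `shouldSweep` is a hypothetical helper name.

```typescript
// Placeholder value; the PR's actual constant is not visible on this page.
const MIN_SWEEP_INTERVAL_MS = 60_000;

// One timestamp per session store path instead of a single global.
const lastSweepAtMsByStore = new Map<string, number>();

// Returns true (and records the sweep time) when this store is due.
function shouldSweep(storePath: string, now: number, force = false): boolean {
  const last = lastSweepAtMsByStore.get(storePath);
  if (!force && last !== undefined && now - last < MIN_SWEEP_INTERVAL_MS) {
    return false; // this store was swept recently; other stores are unaffected
  }
  lastSweepAtMsByStore.set(storePath, now);
  return true;
}
```

With this shape, sweeping store A in a timer tick no longer causes store B to be skipped in the same tick.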
    for (const key of nudgeCounts.keys()) {
      if (!store[key]) {
        nudgeCounts.delete(key);
Avoid deleting nudge counters for other stores
The nudge counter map is global across calls, but this cleanup pass compares keys only against the currently loaded store and deletes everything not present there. When multiple session stores are swept, counts for sessions in other stores get wiped, so maxNudges is not enforced consistently and those sessions can be nudged more times than configured.
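A possible fix is to keep one counter map per store, so cleanup only touches counters that belong to the store being swept. The helper names below (`countsFor`, `recordNudge`, `pruneCounts`) are illustrative and not taken from the PR.

```typescript
// Outer key: session store path. Inner key: session key within that store.
const nudgeCountsByStore = new Map<string, Map<string, number>>();

function countsFor(storePath: string): Map<string, number> {
  let counts = nudgeCountsByStore.get(storePath);
  if (!counts) {
    counts = new Map<string, number>();
    nudgeCountsByStore.set(storePath, counts);
  }
  return counts;
}

// Increments and returns the nudge count for one session.
function recordNudge(storePath: string, sessionKey: string): number {
  const counts = countsFor(storePath);
  const next = (counts.get(sessionKey) ?? 0) + 1;
  counts.set(sessionKey, next);
  return next;
}

// Drops counters for sessions that no longer exist in THIS store only.
function pruneCounts(storePath: string, store: Record<string, unknown>): void {
  const counts = countsFor(storePath);
  for (const key of Array.from(counts.keys())) {
    if (!store[key]) counts.delete(key);
  }
}
```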
Periodic sweep in the cron timer tick detects non-main sessions (subagent, cron, ticket) that have been idle longer than 5 minutes with no active run. Triggers a new agent turn with a nudge message prompting the agent to wrap up or continue.
- Skips sessions with active embedded runs
- Skips sessions where the last assistant message ends with END
- Caps at 3 nudges per session to avoid infinite loops
- Nudge message lives in prompts/idle-nudge.md for easy iteration
- Configurable via agents.defaults.idleNudge in openclaw.json
- Session key prefix stripping ensures nudge runs on the existing session, not a new one
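The filtering rules in this commit message can be read as a single predicate. The sketch below is an illustration only: `SessionSnapshot` and its field names are hypothetical, and the defaults (5 minutes, 3 nudges) are the ones stated in the commit message.

```typescript
// Hypothetical per-session view; the PR's real session record is not shown here.
interface SessionSnapshot {
  isMain: boolean;
  lastActivityAtMs: number;
  hasActiveRun: boolean;
  endedWithEND: boolean;
  nudgeCount: number;
}

function shouldNudge(
  s: SessionSnapshot,
  now: number,
  idleMs = 5 * 60 * 1000,
  maxNudges = 3,
): boolean {
  if (s.isMain) return false; // only subagent, cron, and ticket sessions
  if (s.hasActiveRun) return false; // a run is still in progress
  if (s.endedWithEND) return false; // the agent signalled completion
  if (s.nudgeCount >= maxNudges) return false; // cap to avoid infinite loops
  return now - s.lastActivityAtMs >= idleMs;
}
```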
Force-pushed: e84a88f to cd3c1d4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd3c1d4372
    if (entry.type !== "message" || entry.message?.role !== "assistant") {
      continue;
Gate END suppression on the latest transcript message
sessionEndedWithEND skips over newer non-assistant records and returns based on the first assistant message it finds while scanning backward, so an older END still suppresses nudges even after a later user message reopened the session. In that case, if the follow-up turn hangs, the idle nudge never fires because the stale END is treated as terminal state forever.
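A sketch of the suggested behavior: decide based on the newest message record only, and treat a newer non-assistant message as reopening the session. The `TranscriptEntry` shape, including the `text` field, is an assumption made for illustration; only `type`, `message`, and `role` appear in the diff.

```typescript
// Assumed transcript shape; `text` is hypothetical.
interface TranscriptEntry {
  type: string;
  message?: { role: string; text?: string };
}

function sessionEndedWithEND(entries: TranscriptEntry[]): boolean {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    // Non-message records (for example tool events) do not decide the outcome.
    if (entry.type !== "message" || !entry.message) continue;
    // The latest message is not from the assistant: the session was reopened.
    if (entry.message.role !== "assistant") return false;
    return (entry.message.text ?? "").trimEnd().endsWith("END");
  }
  return false; // no messages at all
}
```

The difference from the reviewed code is the early `return false` on a non-assistant message, which stops an older END from being treated as terminal.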
    path.dirname(new URL(import.meta.url).pathname),
    "../../prompts/idle-nudge.md",
Resolve prompt path with fileURLToPath
Building the prompt path from new URL(import.meta.url).pathname is platform-fragile: pathname keeps URL encoding and produces malformed drive-rooted paths on Windows, so prompts/idle-nudge.md can fail to load when the install path contains spaces/non-ASCII or on Windows hosts. That silently forces the hardcoded fallback and prevents configured prompt-file wording from being applied.
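The suggested change could look roughly like this. `fileURLToPath` is the Node.js `node:url` function; `resolveFromModule` is a hypothetical helper that takes the module URL as a parameter so the sketch does not depend on how the file is compiled.

```typescript
import * as path from "node:path";
import { fileURLToPath } from "node:url";

// Resolves `relativePath` against the directory of the module at `moduleUrl`.
// fileURLToPath decodes percent-escapes and handles Windows drive letters,
// which URL.pathname does not.
function resolveFromModule(moduleUrl: string, relativePath: string): string {
  return path.resolve(path.dirname(fileURLToPath(moduleUrl)), relativePath);
}

// Inside the module this would be called as:
//   resolveFromModule(import.meta.url, "../../prompts/idle-nudge.md")
```

For an install path containing a space, `new URL(...).pathname` yields `my%20app` while `fileURLToPath` yields `my app`, which is why the original lookup can miss the file.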
    for (const storePath of storePaths) {
      try {
        await sweepIdleSessions({
          sessionStorePath: storePath,
Sweep idle sessions for agents without cron jobs
The idle-nudge sweep iterates only the storePaths set assembled from cron job agent IDs (plus default), so agent session stores that currently have no cron jobs are never scanned. In multi-agent deployments this leaves idle subagent/ticket sessions in those agents permanently unnudged, which defeats the feature for exactly the sessions most likely to hang silently.
This pull request has been automatically marked as stale due to inactivity.
Add escalating stall detection for sub-agent runs:
- Nudge: after configurable idle time (default 90s), inject a system
message prompting the agent to resume or report blockers
- Kill: after continued inactivity post-nudge (default 180s), auto-kill
the run and notify the parent session with stall context
Config (agents.defaults.subagents):
- stallNudgeAfterSeconds (default: 90, 0 = disabled)
- stallKillAfterSeconds (default: 180, 0 = disabled)
Both available as per-spawn overrides on sessions_spawn.
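To illustrate how the two settings and their per-spawn overrides might combine, here is a small sketch. The precedence (per-spawn, then agents.defaults.subagents, then built-in default) is an assumption; the commit message only says both levels exist. `resolveStallConfig` is a hypothetical name.

```typescript
interface StallConfig {
  stallNudgeAfterSeconds?: number;
  stallKillAfterSeconds?: number;
}

// Defaults as stated in the commit message; 0 means disabled.
const STALL_DEFAULTS: Required<StallConfig> = {
  stallNudgeAfterSeconds: 90,
  stallKillAfterSeconds: 180,
};

// `??` is used rather than `||` so that an explicit 0 (disabled) is kept.
function resolveStallConfig(
  defaults: StallConfig = {},
  perSpawn: StallConfig = {},
): Required<StallConfig> {
  return {
    stallNudgeAfterSeconds:
      perSpawn.stallNudgeAfterSeconds ??
      defaults.stallNudgeAfterSeconds ??
      STALL_DEFAULTS.stallNudgeAfterSeconds,
    stallKillAfterSeconds:
      perSpawn.stallKillAfterSeconds ??
      defaults.stallKillAfterSeconds ??
      STALL_DEFAULTS.stallKillAfterSeconds,
  };
}
```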
Implementation:
- Track tool activity (start/end events) on SubagentRunRecord via
ensureListener() in subagent-registry
- checkStallRecovery() called every 60s from the existing sweeper
- Nudge delivery: queueEmbeddedPiMessage (preferred), fallback to
callGateway({ method: 'agent' })
- Kill uses completeSubagentRun with SUBAGENT_ENDED_REASON_KILLED for
proper announce flow + abortEmbeddedPiRun for cleanup
- Tool activity resets stall timer and clears nudge state (recovery)
- Stall state reset on steer-restart (replaceSubagentRunAfterSteer)
- Sweeper auto-starts when stall config is non-zero
Tests: 14 cases covering config validation, timing, tool activity
reset, nudge/kill lifecycle, disabled config, and delivery fallback.
Refs: openclaw#23867, openclaw#5551, openclaw#29199, openclaw#38303, openclaw#39127, openclaw#39141
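The nudge-then-kill escalation described in this commit message can be sketched as a small state machine. This is an illustration under assumptions: the kill timer is measured from the moment of the nudge, a run is never killed without being nudged first, and `StallState`, `onToolActivity`, and `checkStall` are hypothetical names rather than the referenced PR's `checkStallRecovery`.

```typescript
type StallAction = "none" | "nudge" | "kill";

interface StallState {
  lastActivityAtMs: number;
  nudgedAtMs?: number; // set once a nudge has been sent
}

// Tool start/end events reset the stall timer and clear the nudge state.
function onToolActivity(state: StallState, now: number): void {
  state.lastActivityAtMs = now;
  delete state.nudgedAtMs;
}

// Called periodically by a sweeper; a threshold of 0 disables that step.
function checkStall(
  state: StallState,
  now: number,
  nudgeAfterSeconds: number,
  killAfterSeconds: number,
): StallAction {
  if (state.nudgedAtMs === undefined) {
    const idleMs = now - state.lastActivityAtMs;
    if (nudgeAfterSeconds > 0 && idleMs >= nudgeAfterSeconds * 1000) {
      state.nudgedAtMs = now;
      return "nudge";
    }
    return "none";
  }
  if (killAfterSeconds > 0 && now - state.nudgedAtMs >= killAfterSeconds * 1000) {
    return "kill";
  }
  return "none";
}
```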
This pull request has been automatically marked as stale due to inactivity.
Closing due to inactivity.
Session idle nudge
When a non-main session (subagent, cron, ticket) goes idle for 5+ minutes with no active run, a new agent turn is triggered with a nudge message prompting the agent to wrap up or continue.
How it works
Files
- prompts/idle-nudge.md: nudge message text (edit and restart to revise wording)
- src/cron/idle-nudge.ts: sweep logic + config resolution
- src/cron/idle-nudge.test.ts: 16 tests
- src/cron/service/timer.ts: sweep call after session reaper
- src/cron/service/state.ts: new deps
- src/gateway/server-cron.ts: wires deps with session key parsing
- src/config/types.agent-defaults.ts: idleNudge config type
Config
Enabled by default (5 min). Override via agents.defaults.idleNudge:
    { "idleNudge": false }
    { "idleNudge": 600000 }
    { "idleNudge": { "idleMs": 300000, "message": "...", "maxNudges": 5 } }
Closes #71
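The three accepted shapes of idleNudge (boolean, number of milliseconds, or object) could be normalized along these lines. This is a sketch, not the code in src/cron/idle-nudge.ts: `IdleNudgeSetting`, `ResolvedIdleNudge`, and `resolveIdleNudge` are hypothetical names, and treating `true` the same as an unset value is an assumption.

```typescript
type IdleNudgeSetting =
  | boolean
  | number
  | { idleMs?: number; message?: string; maxNudges?: number };

interface ResolvedIdleNudge {
  enabled: boolean;
  idleMs: number;
  maxNudges: number;
  message?: string; // undefined means "use prompts/idle-nudge.md"
}

const DEFAULT_IDLE_MS = 5 * 60 * 1000; // "Enabled by default (5 min)"
const DEFAULT_MAX_NUDGES = 3; // cap stated in the commit message

function resolveIdleNudge(setting?: IdleNudgeSetting): ResolvedIdleNudge {
  const base: ResolvedIdleNudge = {
    enabled: true,
    idleMs: DEFAULT_IDLE_MS,
    maxNudges: DEFAULT_MAX_NUDGES,
  };
  if (setting === undefined || setting === true) return base;
  if (setting === false) return { ...base, enabled: false };
  if (typeof setting === "number") return { ...base, idleMs: setting };
  return {
    ...base,
    idleMs: setting.idleMs ?? base.idleMs,
    maxNudges: setting.maxNudges ?? base.maxNudges,
    message: setting.message,
  };
}
```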