feat: session idle nudge — prevent silent hangs (#71) #29199

Closed
RecursiveRabbit wants to merge 1 commit into openclaw:main from RecursiveRabbit:voss/idle-nudge-71

Conversation

@RecursiveRabbit
Session idle nudge

When a non-main session (subagent, cron, ticket) goes idle for 5+ minutes with no active run, a new agent turn is triggered with a nudge message prompting the agent to wrap up or continue.

How it works

  • Periodic sweep piggybacked on the cron timer tick (like the session reaper)
  • Scans session store for eligible sessions past the idle threshold
  • Skips sessions with active embedded runs
  • Skips sessions where the last assistant message ends with END
  • Caps at 3 nudges per session
  • Session key prefix is stripped so the nudge runs on the existing session, not a new one
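
The filtering rules above can be sketched as a single eligibility check. This is a minimal illustration, not the PR's code: the `SessionEntry` shape, `shouldNudge`, and `sweep` are hypothetical names, and the real sweep reads the session store and transcripts rather than an in-memory array.

```typescript
// Hypothetical session shape; the PR's real store types are not shown on this page.
interface SessionEntry {
  key: string;
  isMain: boolean;
  lastActivityAtMs: number;
  hasActiveRun: boolean;
  lastAssistantText?: string;
}

interface NudgePolicy {
  idleMs: number;
  maxNudges: number;
}

// Nudge counts per session key (in-memory, as an assumption).
const nudgeCounts = new Map<string, number>();

function shouldNudge(entry: SessionEntry, policy: NudgePolicy, nowMs: number): boolean {
  if (entry.isMain) return false; // only subagent, cron, and ticket sessions
  if (entry.hasActiveRun) return false; // an embedded run is still working
  if (nowMs - entry.lastActivityAtMs < policy.idleMs) return false; // not idle long enough
  if (/END\s*$/.test(entry.lastAssistantText ?? "")) return false; // session finished
  if ((nudgeCounts.get(entry.key) ?? 0) >= policy.maxNudges) return false; // cap reached
  return true;
}

// Returns the keys that would receive a nudge turn in this sweep.
function sweep(entries: SessionEntry[], policy: NudgePolicy, nowMs: number): string[] {
  const nudged: string[] = [];
  for (const entry of entries) {
    if (!shouldNudge(entry, policy, nowMs)) continue;
    nudgeCounts.set(entry.key, (nudgeCounts.get(entry.key) ?? 0) + 1);
    nudged.push(entry.key);
  }
  return nudged;
}
```

With `maxNudges: 3`, a session that stays idle is returned by the first three sweeps and skipped from the fourth onward.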

Files

  • prompts/idle-nudge.md — nudge message text (edit and restart to revise wording)
  • src/cron/idle-nudge.ts — sweep logic + config resolution
  • src/cron/idle-nudge.test.ts — 16 tests
  • src/cron/service/timer.ts — sweep call after session reaper
  • src/cron/service/state.ts — new deps
  • src/gateway/server-cron.ts — wires deps with session key parsing
  • src/config/types.agent-defaults.ts — idleNudge config type

Config

Enabled by default (5 min). Override via agents.defaults.idleNudge:

{ "idleNudge": false }
{ "idleNudge": 600000 }
{ "idleNudge": { "idleMs": 300000, "message": "...", "maxNudges": 5 } }
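
One way the three config shapes could be normalized is sketched below. The function name `resolveIdleNudge` and the `ResolvedIdleNudge` type are hypothetical, and accepting `true` as "use defaults" is an assumption; only the `false`, number, and object forms and the defaults (5 minutes, 3 nudges) come from the description above.

```typescript
// Accepted shapes for agents.defaults.idleNudge, per the examples above.
type IdleNudgeConfig =
  | boolean
  | number
  | { idleMs?: number; message?: string; maxNudges?: number };

interface ResolvedIdleNudge {
  enabled: boolean;
  idleMs: number;
  maxNudges: number;
  message?: string; // undefined means "use prompts/idle-nudge.md or the fallback"
}

const DEFAULT_IDLE_MS = 5 * 60 * 1000; // enabled by default at 5 minutes
const DEFAULT_MAX_NUDGES = 3; // default cap stated in the description

function resolveIdleNudge(raw: IdleNudgeConfig | undefined): ResolvedIdleNudge {
  const defaults: ResolvedIdleNudge = {
    enabled: true,
    idleMs: DEFAULT_IDLE_MS,
    maxNudges: DEFAULT_MAX_NUDGES,
  };
  if (raw === undefined || raw === true) return defaults; // `true` handling is assumed
  if (raw === false) return { ...defaults, enabled: false };
  if (typeof raw === "number") return { ...defaults, idleMs: raw };
  return {
    enabled: true,
    idleMs: raw.idleMs ?? DEFAULT_IDLE_MS,
    maxNudges: raw.maxNudges ?? DEFAULT_MAX_NUDGES,
    message: raw.message,
  };
}
```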

Closes #71

@openclaw-barnacle openclaw-barnacle Bot added the gateway (Gateway runtime), docker (Docker and sandbox tooling), agents (Agent runtime and tooling), and size: L labels Feb 27, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Feb 27, 2026

Greptile Summary

Implements session idle nudge feature to prevent silent hangs in subagent, cron, and ticket sessions. When a non-main session is idle for 5+ minutes (configurable), the system triggers a new agent turn with a nudge message. The implementation includes comprehensive test coverage (16 tests), flexible configuration options, and intelligent filtering (skips sessions with active runs, sessions ending in END, and respects maxNudges limit).

Key implementation details:

  • Piggybacks on existing cron timer sweep for efficiency
  • Tracks nudge counts per session to prevent spam
  • Parses session transcripts to detect completed sessions (ending in END)
  • Strips session key prefix when nudging to run on existing session context
  • Includes fallback hardcoded message if prompt file is unavailable

Note: PR includes unrelated changes to src/agents/sandbox-paths.ts removing symlink escape validation. While these changes appear intentional based on commit history, they're not mentioned in the PR description and seem unrelated to the idle nudge feature.

Confidence Score: 4/5

  • This PR is safe to merge with low risk
  • Score reflects high-quality implementation with comprehensive test coverage (16 tests) and thorough edge case handling. The feature is well-integrated into existing infrastructure and includes proper error handling. Minor deduction due to unrelated sandbox security changes included without explicit mention in PR description.
  • Review src/agents/sandbox-paths.ts — contains unrelated symlink security changes that should be documented in PR description

Last reviewed commit: e84a88f

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e84a88f830

Comment thread src/agents/sandbox-paths.ts Outdated
Comment on lines +120 to +123
  if (options?.allowFinalSymlink && isLast) {
    return;
  }
- const target = await tryRealpath(current);
- if (!isPathInside(rootReal, target)) {
-   throw new Error(
-     `Symlink escapes sandbox root (${shortPath(rootReal)}): ${shortPath(current)}`,
-   );
- }
- current = target;
+ current = await tryRealpath(current);

P1: Restore sandbox root check after resolving symlinks

This change resolves symlinks and keeps walking without verifying that the resolved path is still inside root, which reopens a workspace escape: with workspaceOnly enabled, tools that call assertSandboxPath (for example apply_patch) can follow a symlink in the workspace and write/delete files outside the workspace tree. That breaks the documented workspace-only isolation and allows unintended host file modification whenever such a symlink exists.
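
The containment check the comment asks to restore can be sketched as follows. `isPathInside` appears by name in the diff, but its body here is an assumption, and `assertInsideRoot` is a hypothetical helper; the real code also resolves the path with `tryRealpath` before checking it.

```typescript
import * as path from "node:path";

// Assumed implementation: true when `candidate` is `root` itself or lies beneath it.
function isPathInside(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

// Hypothetical helper: call with the real (symlink-resolved) root and target.
function assertInsideRoot(rootReal: string, resolved: string): string {
  if (!isPathInside(rootReal, resolved)) {
    throw new Error(`Symlink escapes sandbox root (${rootReal}): ${resolved}`);
  }
  return resolved;
}
```

Checking after every resolution step, rather than once at the end, keeps an intermediate symlink from redirecting the walk outside the workspace.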

Comment thread src/cron/idle-nudge.ts
Comment on lines +199 to +202
if (!params.force && now - lastSweepAtMs < MIN_SWEEP_INTERVAL_MS) {
  return { swept: false, nudged: 0 };
}
lastSweepAtMs = now;

P1: Scope idle-nudge throttle per session store path

The sweep throttle uses a single global lastSweepAtMs, but onTimer invokes sweepIdleSessions once per storePath in the same tick. After the first store is processed, subsequent stores in that loop immediately hit the throttle and return unswept, so in multi-agent setups only the first store path gets nudged while other agents' session stores are effectively starved.
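
A per-store throttle, as the comment suggests, might look like the sketch below. `shouldSweep` and `lastSweepAtMsByStore` are hypothetical names, and the interval value is assumed because the PR's `MIN_SWEEP_INTERVAL_MS` is not shown on this page.

```typescript
const MIN_SWEEP_INTERVAL_MS = 60_000; // assumed value, for illustration only

// One timestamp per session store path instead of a single global.
const lastSweepAtMsByStore = new Map<string, number>();

function shouldSweep(storePath: string, now: number, force = false): boolean {
  const last = lastSweepAtMsByStore.get(storePath);
  if (!force && last !== undefined && now - last < MIN_SWEEP_INTERVAL_MS) {
    return false; // this store was swept recently; other stores are unaffected
  }
  lastSweepAtMsByStore.set(storePath, now);
  return true;
}
```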

Comment thread src/cron/idle-nudge.ts
Comment on lines +271 to +273
for (const key of nudgeCounts.keys()) {
  if (!store[key]) {
    nudgeCounts.delete(key);

P2: Avoid deleting nudge counters for other stores

The nudge counter map is global across calls, but this cleanup pass compares keys only against the currently loaded store and deletes everything not present there. When multiple session stores are swept, counts for sessions in other stores get wiped, so maxNudges is not enforced consistently and those sessions can be nudged more times than configured.
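
One possible fix is to keep a separate counter map per store path, so cleanup only touches the store being swept. The names `countsFor`, `recordNudge`, and `pruneCounts` below are hypothetical.

```typescript
// Outer key: session store path. Inner key: session key within that store.
const nudgeCountsByStore = new Map<string, Map<string, number>>();

function countsFor(storePath: string): Map<string, number> {
  let counts = nudgeCountsByStore.get(storePath);
  if (!counts) {
    counts = new Map<string, number>();
    nudgeCountsByStore.set(storePath, counts);
  }
  return counts;
}

function recordNudge(storePath: string, sessionKey: string): number {
  const counts = countsFor(storePath);
  const next = (counts.get(sessionKey) ?? 0) + 1;
  counts.set(sessionKey, next);
  return next;
}

// Drops counters only for sessions that disappeared from THIS store.
function pruneCounts(storePath: string, store: Record<string, unknown>): void {
  const counts = countsFor(storePath);
  for (const key of Array.from(counts.keys())) {
    if (!store[key]) counts.delete(key);
  }
}
```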

Periodic sweep in the cron timer tick detects non-main sessions
(subagent, cron, ticket) that have been idle longer than 5 minutes
with no active run. Triggers a new agent turn with a nudge message
prompting the agent to wrap up or continue.

- Skips sessions with active embedded runs
- Skips sessions where the last assistant message ends with END
- Caps at 3 nudges per session to avoid infinite loops
- Nudge message lives in prompts/idle-nudge.md for easy iteration
- Configurable via agents.defaults.idleNudge in openclaw.json
- Session key prefix stripping ensures nudge runs on the existing
  session, not a new one
@openclaw-barnacle openclaw-barnacle Bot removed the docker (Docker and sandbox tooling) and agents (Agent runtime and tooling) labels Feb 27, 2026
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd3c1d4372

Comment thread src/cron/idle-nudge.ts
Comment on lines +148 to +149
if (entry.type !== "message" || entry.message?.role !== "assistant") {
  continue;

P1: Gate END suppression on the latest transcript message

sessionEndedWithEND skips over newer non-assistant records and returns based on the first assistant message it finds while scanning backward, so an older END still suppresses nudges even after a later user message reopened the session. In that case, if the follow-up turn hangs, the idle nudge never fires because the stale END is treated as terminal state forever.
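
A version that lets the newest message decide could look like this sketch. Only `sessionEndedWithEND`, `entry.type`, and `entry.message?.role` come from the PR; the `TranscriptEntry` shape and the `text` field are assumptions.

```typescript
// Assumed transcript record shape; the real one likely carries more fields.
interface TranscriptEntry {
  type: string;
  message?: { role: string; text?: string };
}

function sessionEndedWithEND(entries: TranscriptEntry[]): boolean {
  // Walk from newest to oldest without mutating the caller's array.
  for (const entry of [...entries].reverse()) {
    if (entry.type !== "message" || !entry.message) continue; // skip non-message records
    // The newest message decides. A later user message reopens the session,
    // so an older assistant END no longer suppresses nudges.
    if (entry.message.role !== "assistant") return false;
    return /END\s*$/.test(entry.message.text ?? "");
  }
  return false; // no messages at all: treat as not ended
}
```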

Comment thread src/cron/idle-nudge.ts
Comment on lines +33 to +34
path.dirname(new URL(import.meta.url).pathname),
"../../prompts/idle-nudge.md",

P2: Resolve prompt path with fileURLToPath

Building the prompt path from new URL(import.meta.url).pathname is platform-fragile: pathname keeps URL encoding and produces malformed drive-rooted paths on Windows, so prompts/idle-nudge.md can fail to load when the install path contains spaces/non-ASCII or on Windows hosts. That silently forces the hardcoded fallback and prevents configured prompt-file wording from being applied.
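
The suggested fix uses Node's `fileURLToPath`, which decodes percent-encoding and handles Windows drive letters. In this sketch `resolvePromptPath` is a hypothetical helper that takes the module URL as a parameter (the caller would pass `import.meta.url`), and the example install path is made up.

```typescript
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Hypothetical helper: resolves prompts/idle-nudge.md relative to the module file.
function resolvePromptPath(moduleUrl: string): string {
  const moduleDir = path.dirname(fileURLToPath(moduleUrl));
  return path.resolve(moduleDir, "../../prompts/idle-nudge.md");
}

// Illustration with a made-up install path that contains a space.
const exampleUrl = pathToFileURL(path.resolve("/opt/my app/src/cron/idle-nudge.js")).href;
const naiveDir = path.dirname(new URL(exampleUrl).pathname); // keeps "%20" in the path
const promptPath = resolvePromptPath(exampleUrl); // decoded, platform-native path
```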

Comment thread src/cron/service/timer.ts
Comment on lines +444 to +447
for (const storePath of storePaths) {
  try {
    await sweepIdleSessions({
      sessionStorePath: storePath,

P2: Sweep idle sessions for agents without cron jobs

The idle-nudge sweep iterates only the storePaths set assembled from cron job agent IDs (plus default), so agent session stores that currently have no cron jobs are never scanned. In multi-agent deployments this leaves idle subagent/ticket sessions in those agents permanently unnudged, which defeats the feature for exactly the sessions most likely to hang silently.
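
One way to address this is to build the sweep set from every known agent rather than only from cron job owners. Everything in this sketch is hypothetical: `collectStorePaths`, the way agent IDs are enumerated, and the `resolveStorePath` callback are assumptions about code not shown on this page.

```typescript
// Hypothetical: union of store paths for all configured agents and cron job agents.
function collectStorePaths(
  knownAgentIds: readonly string[],
  cronJobAgentIds: readonly string[],
  resolveStorePath: (agentId: string) => string,
): Set<string> {
  const paths = new Set<string>();
  for (const id of knownAgentIds) paths.add(resolveStorePath(id));
  for (const id of cronJobAgentIds) paths.add(resolveStorePath(id));
  return paths; // the Set removes duplicates when an agent appears in both lists
}
```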

@openclaw-barnacle
This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale (Marked as stale due to inactivity) label Mar 5, 2026
emcc24444 added a commit to emcc24444/openclaw that referenced this pull request Mar 11, 2026
Add escalating stall detection for sub-agent runs:
- Nudge: after configurable idle time (default 90s), inject a system
  message prompting the agent to resume or report blockers
- Kill: after continued inactivity post-nudge (default 180s), auto-kill
  the run and notify the parent session with stall context

Config (agents.defaults.subagents):
- stallNudgeAfterSeconds (default: 90, 0 = disabled)
- stallKillAfterSeconds (default: 180, 0 = disabled)
Both available as per-spawn overrides on sessions_spawn.

Implementation:
- Track tool activity (start/end events) on SubagentRunRecord via
  ensureListener() in subagent-registry
- checkStallRecovery() called every 60s from the existing sweeper
- Nudge delivery: queueEmbeddedPiMessage (preferred), fallback to
  callGateway({ method: 'agent' })
- Kill uses completeSubagentRun with SUBAGENT_ENDED_REASON_KILLED for
  proper announce flow + abortEmbeddedPiRun for cleanup
- Tool activity resets stall timer and clears nudge state (recovery)
- Stall state reset on steer-restart (replaceSubagentRunAfterSteer)
- Sweeper auto-starts when stall config is non-zero

Tests: 14 cases covering config validation, timing, tool activity
reset, nudge/kill lifecycle, disabled config, and delivery fallback.

Refs: openclaw#23867, openclaw#5551, openclaw#29199, openclaw#38303, openclaw#39127, openclaw#39141
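
The nudge-then-kill escalation in the referenced commit can be sketched as a small decision function. This is an illustration under stated assumptions, not that commit's code: `checkStall`, `recordToolActivity`, and `RunStallState` are hypothetical names, and the kill timer is assumed to start at the nudge, which the commit message does not state precisely.

```typescript
type StallAction = "none" | "nudge" | "kill";

interface StallConfig {
  stallNudgeAfterSeconds: number; // default 90, 0 = disabled
  stallKillAfterSeconds: number; // default 180, 0 = disabled
}

// Hypothetical per-run state; the commit tracks this on SubagentRunRecord.
interface RunStallState {
  lastToolActivityAtMs: number;
  nudgedAtMs: number | undefined;
}

function checkStall(state: RunStallState, config: StallConfig, nowMs: number): StallAction {
  const nudgeAfterMs = config.stallNudgeAfterSeconds * 1000;
  const killAfterMs = config.stallKillAfterSeconds * 1000;
  if (state.nudgedAtMs !== undefined) {
    // Already nudged: escalate only after continued inactivity (assumed to be measured from the nudge).
    if (killAfterMs > 0 && nowMs - state.nudgedAtMs >= killAfterMs) return "kill";
    return "none";
  }
  if (nudgeAfterMs > 0 && nowMs - state.lastToolActivityAtMs >= nudgeAfterMs) {
    state.nudgedAtMs = nowMs;
    return "nudge";
  }
  return "none";
}

// Tool activity resets the stall timer and clears nudge state (recovery).
function recordToolActivity(state: RunStallState, nowMs: number): void {
  state.lastToolActivityAtMs = nowMs;
  state.nudgedAtMs = undefined;
}
```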
@openclaw-barnacle openclaw-barnacle Bot removed the stale (Marked as stale due to inactivity) label Mar 28, 2026
@openclaw-barnacle
This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale (Marked as stale due to inactivity) label Apr 4, 2026
@openclaw-barnacle
Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.

@openclaw-barnacle openclaw-barnacle Bot closed this Apr 8, 2026
Labels

gateway (Gateway runtime), size: L, stale (Marked as stale due to inactivity)
