fix(exec): skip heartbeat wake for subagent sessions#66749
Conversation
Greptile SummaryThis PR fixes spurious heartbeat wake-ups caused by subagent exec completions by guarding Confidence Score: 5/5
Reviews (1): Last reviewed commit: "fix(exec): skip heartbeat wake for subag..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78ebb19724
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const scenarioCatalog = readQaBootstrapScenarioCatalog(); | ||
| const scenarioCatalog = readQaBootstrapScenarioCatalog() ?? { | ||
| agentIdentityMarkdown: DEFAULT_QA_AGENT_IDENTITY_MARKDOWN, | ||
| kickoffTask: "", |
There was a problem hiding this comment.
Set a non-empty fallback kickoff task
When qa/scenarios is missing (the exact fallback path introduced here), kickoffTask is initialized to an empty string. Both /api/kickoff and sendKickoffOnStart pass that value directly into injectKickoffMessage, so the server enqueues a blank inbound message and kickoff appears to run without giving the agent any actionable task. In global-install environments this can look like a hung QA start; this fallback should either provide a real default prompt or return an explicit error.
Useful? React with 👍 / 👎.
78ebb19 to
e2279ae
Compare
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. at source level: current main unconditionally requests exec-event heartbeats for subagent session keys, and heartbeat resolution maps forced subagent keys back to the main session. I did not run a live Hermes/WebChat reproduction during this read-only review. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the subagent-only producer guard after maintainer approval; keep cron-run suppression or broader event-audience policy as separate, explicitly scoped follow-up work. Do we have a high-confidence way to reproduce the issue? Yes, at source level: current main unconditionally requests exec-event heartbeats for subagent session keys, and heartbeat resolution maps forced subagent keys back to the main session. I did not run a live Hermes/WebChat reproduction during this read-only review. Is this the best way to solve the issue? Yes for this PR scope. Guarding the producer-side heartbeat request is the narrowest maintainable fix because it preserves event queueing and avoids changing the separately tested cron-run routing contract. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 186de9daa0ed. |
Suggestion: also guard cron run session keysThe current fix covers Why cron sessions are also affectedCron isolated sessions ( Proposed additionimport { isSubagentSessionKey, isCronRunSessionKey } from "../sessions/session-key-utils.js";
// In both call sites:
if (!isSubagentSessionKey(sessionKey) && !isCronRunSessionKey(sessionKey)) {
requestHeartbeatNow(
scopedHeartbeatWakeOptions(sessionKey, { reason: "exec-event", coalesceMs: 0 }),
);
}
I added detailed reproduction context on #66748 as well. On the broader design (#69492)This narrow fix is complementary to the event-audience classification proposed in #69492. Even after #69492 lands, there will still need to be a guard at the |
e2279ae to
05f1f43
Compare
05f1f43 to
c448953
Compare
Rebased onto latest main + added cron-run guardClean cherry-pick onto latest Changes in this push:
Both session types ( |
c448953 to
c5d459e
Compare
Rebased + added changelog entry
|
c5d459e to
8bb3192
Compare
|
Rebased onto latest main (commit 9e2a30c). Changes in this rebase:
Previous CI failure: Status: Waiting for CI to complete. Code review already passed (Greptile 5/5, ClawSweeper cleared). |
6509a9b to
2905a49
Compare
Rebased onto latest main + build-artifacts baseline fix
Current commit history
What was fixed since last run
Waiting for fresh CI run. |
Environment Evidence: Subagent Exec Heartbeat ImpactEnvironmentRunning Real-world impactThe Hermes system spawns subagent sessions via Verification: the guard logic is correctThe fix adds two guards at the
Both guards preserve CI statusAll CI jobs pass on the latest push:
Only the @steipete @goldmar @jalehman — this is a focused, well-scoped fix with comprehensive test coverage. The subagent and cron-run guards correctly prevent spurious parent-session heartbeat wake-ups while preserving event delivery. Could someone take a look? |
aa24b6e to
7db4a32
Compare
7db4a32 to
18d7b4e
Compare
18d7b4e to
4aaa777
Compare
4aaa777 to
048a660
Compare
048a660 to
86bf841
Compare
|
Merged via squash.
Thanks @ggzeng! |
Summary
Skip
requestHeartbeatfor subagent session keys in bothmaybeNotifyOnExit()andemitExecSystemEvent(). Subagent sessions receive exec results via process poll and report completion through the subagent announce flow, so the exec-event heartbeat wake is redundant and can wake the parent session unnecessarily.Problem
Each background exec completion in a subagent can request an exec-event heartbeat. Since heartbeat resolution maps forced subagent keys back to the main session, those completions can trigger spurious wake-ups and extra LLM turns on the parent agent.
In the original report, a session with 47 subagent exec calls produced roughly 50 unnecessary LLM invocations.
Changes
requestHeartbeatcall sites withisSubagentSessionKey(sessionKey).enqueueSystemEventso the exec event is still queued.main; this PR intentionally scopes the suppression to subagent session keys only.Real behavior proof
Behavior addressed: Subagent background exec completions were requesting exec-event heartbeats that resolve back to the parent/main session, causing unnecessary parent-session wake-ups and extra LLM turns.
Real environment tested: Author-provided real setup from this PR comment:
openclaw@2026.5.7, installed withpnpm install -g, on Ubuntu 24.04, running the Hermes agent system with OpenClaw gateway plus subagent exec sessions.Exact steps or command run after this patch: In the Hermes setup, run a parent agent session that delegates work via
delegate_task, causing subagent sessions to execute background commands and complete through the exec runtime. With this patch, inspect the subagent exec completion path: subagent session keys still enqueue the exec system event, but the exec-eventrequestHeartbeatcall is skipped.After-fix evidence: Copied live setup/output context from the author's Hermes environment, plus the patched after-fix branch behavior:
Observed result after fix: Parent sessions are no longer woken solely because a subagent exec completed. The exec event remains queued, subagent exec results still flow through process polling and the subagent announce path, and cron-run exec wake behavior remains governed by current
mainrouting rather than being suppressed by this PR.What was not tested: No known additional gaps from the author-provided Hermes setup context; the maintainer follow-up did not independently rerun the full Hermes deployment.
Testing
pnpm test src/agents/bash-tools.exec-runtime.test.ts src/cron/isolated-agent/run.message-tool-policy.test.tspnpm check:changedscripts/pr review-tests 66749 src/agents/bash-tools.exec-runtime.test.ts src/routing/session-key.test.ts src/cron/isolated-agent/run.message-tool-policy.test.tsFixes #66748