fix(subagent): include role, session key, and timing in error payloads#68726
fix(subagent): include role, session key, and timing in error payloads#68726gumadeiras merged 8 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR enriches error payloads from Confidence Score: 5/5Safe to merge; changes are purely additive with no control-flow modifications. Both findings are P2 style observations. Neither represents a defect or behavioral regression. The implementation is correct and well-guarded. No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/subagent-announce-output.ts
Line: 324-325
Comment:
**Timing also applied to "ok" outcomes (undocumented scope)**
The PR description and root-cause section say timing fields are added to "timeout and error outcomes," but the implementation spreads `timing` into all three branches, including `status: "ok"`. This is strictly additive and arguably more useful, but it's worth noting the actual scope is broader than the description to avoid surprises for callers that pattern-match on status to decide whether to inspect timing fields.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/tools/sessions-spawn-tool.ts
Line: 316-320
Comment:
**`role` added to any non-`"accepted"` status, not just errors**
`result.status !== "accepted"` also covers any other non-error terminal statuses (`"queued"`, `"pending"`, etc.) that `spawnAcpDirect` might return now or in the future. If the intent is specifically to enrich failure payloads, a tighter condition (`result.status === "error"`) would be more explicit and resilient to new status codes being introduced upstream. The same pattern appears on the subagent path at line 358.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Merge branch 'main' into fix/subagent-er..." | Re-trigger Greptile |
|
Thanks @greptile — addressed both: Re: timing on Re: |
|
Both look good. The explicit On timing for Changes LGTM. Tip: You can customize Greptile's behavior for this repo with |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92c4c5dc0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Error payloads from sessions_spawn and subagent wait outcomes now carry the context a parent needs to retry or report clearly: - sessions-spawn-tool: add role (requested agentId) to early validation errors, to the ACP register-failure payload, and to forwarded error results from both the ACP and subagent spawn paths. childSessionKey and runId are already populated by the inner spawn for the errors that know them; this just plumbs role through alongside. - subagent-announce-output: extend SubagentRunOutcome with optional startedAt/endedAt/elapsedMs and populate them in applySubagentWaitOutcome so timeout and error outcomes convey how long the child ran before failing. Scoped verification: tsgo:core, tsgo:core:test, and 43 targeted tests in src/agents (sessions-spawn-tool, subagent-registry lifecycle retry grace, subagent-announce timeout, subagent-announce, and capture-completion-reply) all green. Repo-wide pnpm check is red on latest origin/main for unrelated extensions/discord and extensions/qa-lab surfaces (missing @buape/carbon and @copilotkit/aimock members); not addressed here.
Codex review on PR 68726 noted the role context was only merged when status === "error", but both ACP and subagent spawns return status: "forbidden" for real failures the parent most needs to identify in parallel spawns: ACP disabled, runtime policy, agent allowlist denial, depth limit, max children, requireAgentId, and sandbox policy. Switch both forwarded result sites to "status !== accepted" so forbidden also carries the requested role.
…ing accepted Greptile review on PR 68726 flagged that result.status !== "accepted" would silently enroll any future non-accepted status (e.g. queued/pending) into the role-augmentation path. Switch both the ACP and subagent forward sites to an explicit enumeration of (error | forbidden) so new upstream statuses must opt in deliberately. Behavior unchanged for today's closed union.
3dcfc05 to
55c7561
Compare
|
Merged via squash.
Thanks @BKF-Gitty! |
Aligns the steer-restart killed-run test with the timing fields added to subagent run outcomes in #68726. The production code now returns startedAt/endedAt/elapsedMs alongside status and error on the error outcome, but this test's toEqual still asserted only status+error, so it has been failing on main since #68726 landed. Uses the same expect.any(Number) matcher already in use a few lines below for the ended hook payload.
…and #65986 (#69173) * test(agents): expect timing fields in killed-run outcome Aligns the steer-restart killed-run test with the timing fields added to subagent run outcomes in #68726. The production code now returns startedAt/endedAt/elapsedMs alongside status and error on the error outcome, but this test's toEqual still asserted only status+error, so it has been failing on main since #68726 landed. Uses the same expect.any(Number) matcher already in use a few lines below for the ended hook payload. * test(gateway): register ops agent in sessions.create task-start test The "sessions.create can start the first agent turn from an initial task" test triggers the auto chat.send path by passing `task:`. After #65986 added a deleted-agent guard to chat.send, an unregistered `ops` agent triggers the reject path and the auto-started run never happens, so runStarted comes back false. Register `ops` via testState.agentsConfig (matching the pattern already used by other ops-agent tests in this file) so the guard lets chat.send through and the first turn starts as expected. --------- Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Error payloads from sessions_spawn and subagent wait outcomes now carry the context a parent needs to retry or report clearly:
Scoped verification: tsgo:core, tsgo:core:test, and 43 targeted tests in src/agents (sessions-spawn-tool, subagent-registry lifecycle retry grace, subagent-announce timeout, subagent-announce, and capture-completion-reply) all green. Repo-wide pnpm check is red on latest origin/main for unrelated extensions/discord and extensions/qa-lab surfaces (missing @buape/carbon and @copilotkit/aimock members); not addressed here.
Summary
{status: "error", error: "<message>"}with no subagent identifier and no duration. The parent cannot re-target the specific child session, correlate failures across concurrent subagents, or distinguish "timed out after real work" from "failed instantly."Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
{status, error}without forwarding the spawn-time context that was already in scope (requestedAgentId). Separately, applySubagentWaitOutcome computed startedAt/endedAt on the wrapper object but never embedded them on the outcome object that downstream consumers read.Regression Test Plan (if applicable)
User-visible / Behavior Changes
Additive only. Error payloads emitted by the sessions_spawn tool may now include a
rolefield (attached toerrorandforbiddenresults when anagentIdwas requested), andSubagentRunOutcomevalues may now includestartedAt,endedAt, andelapsedMs. Timing fields are populated on all three outcome branches (ok,timeout,error) when the timestamps are known — not only failures — since duration is also useful for success-case stats. No existing field is removed or renamed; no status codes change.Diagram (if applicable)
Security Impact (required)
rolefield is the same agentId the caller passed in as input; no new data crosses a trust boundary.Repro + Verification
Environment
Steps
pnpm tsgo:coreandpnpm tsgo:core:test.pnpm test src/agents/tools/sessions-spawn-tool.test.ts src/agents/subagent-registry.lifecycle-retry-grace.e2e.test.ts src/agents/subagent-announce.timeout.test.ts src/agents/subagent-announce.test.ts src/agents/subagent-announce.capture-completion-reply.test.ts.Expected
Actual
Evidence
Relevant scoped-test output (trimmed):
Repo-wide pnpm check was red on latest origin/main before this branch existed, entirely in extensions/discord/** and extensions/qa-lab/** (missing @buape/carbon exports and @copilotkit/aimock module). Not in scope here.
Human Verification (required)
roleContextspread is a no-op; startedAt/endedAt missing → timing object is empty and not spread onto outcome;result.status !== "error"in the forwarded jsonResult path → original result is returned unmodified.Review Conversations
Compatibility / Migration
Risks and Mitigations
🤖 Generated with Claude Code