[Search Subagent] Handle context window limit exceeded error #316529
Pull request overview
Improves resilience of the Copilot search subagent loop when the rendered prompt exceeds the model context window by proactively rendering against a safer effective budget and reactively re-rendering with progressively smaller budgets on detected context-overflow failures.
Changes:
- Adjust prompt rendering budget in buildPrompt by subtracting tool-definition tokens, applying a safety margin, and rendering with an endpoint cloned to that effective budget.
- Add context-overflow detection and retry logic in fetch that re-renders with progressively smaller budgets before giving up.
- Add unit tests for the overflow detection helper and the retry behavior.
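The budget adjustment in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `BudgetInput`, `effectivePromptBudget`, and the field names are hypothetical, and the numbers in the usage note are made up.

```typescript
// Hypothetical sketch of proactive budget sizing; names are not from the PR.
interface BudgetInput {
  modelMaxPromptTokens: number; // prompt tokens the model endpoint accepts
  toolDefinitionTokens: number; // tokens taken by the tool schemas sent with the request
  safetyFactor: number;         // fraction of the remaining budget to render against
}

function effectivePromptBudget(input: BudgetInput): number {
  const { modelMaxPromptTokens, toolDefinitionTokens, safetyFactor } = input;
  if (safetyFactor <= 0 || safetyFactor > 1) {
    throw new RangeError('safetyFactor must be in (0, 1]');
  }
  // Tool definitions count against the context window but are not part of the
  // rendered prompt, so remove them before applying the safety margin.
  const afterTools = Math.max(0, modelMaxPromptTokens - toolDefinitionTokens);
  return Math.floor(afterTools * safetyFactor);
}
```

For example, with an assumed 128,000-token limit, 8,000 tokens of tool definitions, and a factor of 0.5, the prompt would be rendered against 60,000 tokens.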
| File | Description |
|---|---|
| extensions/copilot/src/extension/prompt/node/searchSubagentToolCallingLoop.ts | Adds proactive prompt-budget sizing, overflow detection, and reactive re-render/retry behavior. |
| extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts | Adds tests for isContextOverflowError and for fetch retry behavior on context overflow. |
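The signature of isContextOverflowError is not shown on this page, so the following is only a guess at what such a check might look like. It assumes a hypothetical error shape (`FailedResponseLike`) with an HTTP status, an optional error code, and a message, and matches the 400 context_length_exceeded failure described in the PR description.

```typescript
// Hypothetical error shape; the real helper's input type may differ.
interface FailedResponseLike {
  status?: number;
  code?: string;
  message?: string;
}

// Returns true only for a 400 whose code or message names a context-length overflow.
function looksLikeContextOverflow(err: FailedResponseLike): boolean {
  if (err.status !== 400) {
    return false;
  }
  if (err.code === 'context_length_exceeded') {
    return true;
  }
  // Some failures may carry the code only inside the message text (assumption).
  return /context_length_exceeded/i.test(err.message ?? '');
}
```

Keeping the check narrow (status plus an explicit code) avoids treating unrelated 400 responses as overflow and retrying them needlessly.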
Copilot's findings
Comments suppressed due to low confidence (2)
extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:86
createMockChatRequest sets location: 1. Even though this currently corresponds to ChatLocation.Panel, using the enum value directly avoids coupling the test to the numeric representation and makes intent clearer.
enableCommandDetection: false,
isParticipantDetected: false,
toolReferences: [],
toolInvocationToken: {} as ChatRequest['toolInvocationToken'],
extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:128
successResponse() builds a partial success object and then casts with as unknown as ChatResponse. This weakens the test’s type-safety and can hide breaking changes to the ChatResponse shape (e.g. required resolvedModel/usage). Prefer constructing a fully-typed ChatResponse object instead of double-casting.
requestId: 'req-ok',
serverRequestId: undefined,
} as unknown as ChatResponse;
}
- Files reviewed: 2/2 changed files
- Comments generated: 1
This feels a bit more complicated than it needs to be. Could we lean on what the main agent loop already does?
bhavyaus
left a comment
Could you also ensure this works well with autopilot mode? If we still choose to pursue this solution, this new error looks like it falls through and causes additional retries by the main agent.
3b49bfd to 01e4b4f
I think the goal for this PR is that we don't want to surface context-limit-exceeded errors from the subagent to the main agent, and in that case I think a reactive fallback would make sense. If we continue with this approach, would it be better to only retry once on 400 before returning a benign message to the main agent (or is there a cleaner solution)?
Let's go with a single retry. Concretely: render proactively at 0.9 (matching agentIntent.ts), and on a context-overflow 400 do one retry at an aggressively smaller factor (~0.5) before falling back. Skip the 0.66 rung, since if 0.9 overflowed a small step down usually won't fit either. Then the benign fallback. That keeps it close to the main path. Also, one more thing to check in autopilot mode: when the loop returns the exhausted overflow BadRequest, the subagent's own ToolCallingLoop will still auto-retry it ~3× under autopilot before the wrapper converts it. Can we skip auto-retry for context-overflow BadRequest (override shouldAutoRetry or run the subagent loop at a non-autopilot permission level)? Longer term, if 0.9 still isn't wide enough, tuning the prompt-tsx priorities to prune tool-call history harder is fine with me. Your telemetry should tell us whether the single retry actually recovers anything or if we should go that route.
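The single-retry flow proposed above (render at 0.9, one retry at about 0.5, then a benign fallback) could be sketched like this. Everything here is illustrative: `Attempt`, `fetchWithSingleOverflowRetry`, and the `renderAndFetch` callback are hypothetical stand-ins for the loop's real render and fetch steps, and the autopilot auto-retry question is not modelled.

```typescript
// Hypothetical result type for one render-and-fetch attempt.
type Attempt<T> =
  | { kind: 'ok'; value: T }
  | { kind: 'contextOverflow' }
  | { kind: 'otherError'; error: unknown };

const PROACTIVE_FACTOR = 0.9; // first render, as suggested in the review
const RETRY_FACTOR = 0.5;     // the single, aggressively smaller retry

interface RetryOutcome<T> {
  value: T;
  factorsTried: number[]; // useful for telemetry on whether the retry recovers
  usedFallback: boolean;
}

async function fetchWithSingleOverflowRetry<T>(
  renderAndFetch: (factor: number) => Promise<Attempt<T>>,
  benignFallback: T,
): Promise<RetryOutcome<T>> {
  const factorsTried: number[] = [];
  for (const factor of [PROACTIVE_FACTOR, RETRY_FACTOR]) {
    factorsTried.push(factor);
    const result = await renderAndFetch(factor);
    if (result.kind === 'ok') {
      return { value: result.value, factorsTried, usedFallback: false };
    }
    if (result.kind === 'otherError') {
      // Only context overflow is retried; anything else propagates unchanged.
      throw result.error;
    }
  }
  // Both budgets overflowed: hand the caller a benign result instead of the 400.
  return { value: benignFallback, factorsTried, usedFallback: true };
}
```

Returning `factorsTried` and `usedFallback` alongside the value is one way to feed the telemetry question raised above, namely whether the 0.5 retry ever recovers a request.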

Handle context-window overflow in the search subagent. The subagent's prompt can exceed the model's context after a few tool-call rounds, returning a 400 context_length_exceeded. When the main agent runs in autopilot, that error also drives a full-turn retry, making things worse.
Changes to SearchSubagentToolCallingLoop:
Changes to SearchSubagentTool (autopilot-safe wrapper):
Tests: