
[Search Subagent] Handle context window limit exceeded error#316529

Merged
bhavyaus merged 9 commits into
microsoft:mainfrom
guomaggie:maggie/handle-context-limit-exceeded-error
May 27, 2026

@guomaggie
Contributor

@guomaggie guomaggie commented May 15, 2026

Handle context-window overflow in the search subagent. After a few tool-call rounds, the subagent's prompt can exceed the model's context window, and the request fails with a 400 context_length_exceeded error. When the main agent runs in autopilot, that error also triggers a full-turn retry, making things worse.

Changes to SearchSubagentToolCallingLoop:

  • Proactive budget sizing in buildPrompt: subtracts tool-definition tokens and applies the current safety factor, then renders via endpoint.cloneWithTokenOverride so prompt-tsx prunes against a realistic budget. Mirrors agentIntent.ts.
  • Reactive shrink-retry in fetch: on a context-overflow BadRequest, advances _safetyFactorIndex through SAFETY_FACTORS = [0.9, 0.66, 0.4], re-renders, and retries. When the smallest factor is exhausted, the original error surfaces.
  • isContextOverflowBadRequest helper matches BadRequest reasons against known patterns (context_length_exceeded, context length, context window, maximum context, prompt is too long, request too large, request_too_large).
  • Telemetry: emits searchSubagent.contextOverflow with kind ('retried' | 'exhausted'), model, and safetyFactorIndex.
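A minimal TypeScript sketch of the reactive path described above, assuming simplified stand-in types (`FetchResult`, `TelemetryEvent`) and injected `render`/`send`/`emit` callbacks in place of the real endpoint, prompt-tsx renderer, and telemetry service. The reason patterns and the `SAFETY_FACTORS` ladder come from the description; `fetchWithShrinkRetry` and the type shapes are hypothetical.

```typescript
// Hypothetical, simplified fetch outcome (the real loop uses the extension's own types).
type FetchResult =
	| { type: 'success'; text: string }
	| { type: 'badRequest'; reason: string };

// Reason substrings listed in the PR description, matched case-insensitively.
const CONTEXT_OVERFLOW_PATTERNS = [
	'context_length_exceeded',
	'context length',
	'context window',
	'maximum context',
	'prompt is too long',
	'request too large',
	'request_too_large',
];

function isContextOverflowBadRequest(result: FetchResult): boolean {
	if (result.type !== 'badRequest') {
		return false;
	}
	const reason = result.reason.toLowerCase();
	return CONTEXT_OVERFLOW_PATTERNS.some(p => reason.includes(p));
}

const SAFETY_FACTORS = [0.9, 0.66, 0.4];

type TelemetryEvent = { kind: 'retried' | 'exhausted'; model: string; safetyFactorIndex: number };

// Renders at SAFETY_FACTORS[i] and sends; on a context-overflow BadRequest it moves
// to the next, smaller factor. After the last factor the original error is returned.
async function fetchWithShrinkRetry(
	render: (safetyFactor: number) => Promise<string>,
	send: (prompt: string) => Promise<FetchResult>,
	model: string,
	emit: (event: TelemetryEvent) => void,
): Promise<FetchResult> {
	for (let i = 0; i < SAFETY_FACTORS.length; i++) {
		const prompt = await render(SAFETY_FACTORS[i]);
		const result = await send(prompt);
		if (!isContextOverflowBadRequest(result)) {
			return result;
		}
		const exhausted = i === SAFETY_FACTORS.length - 1;
		emit({ kind: exhausted ? 'exhausted' : 'retried', model, safetyFactorIndex: i });
		if (exhausted) {
			return result;
		}
	}
	throw new Error('unreachable: SAFETY_FACTORS is non-empty');
}
```

Keeping the matcher as a pure function over the result is what lets it be unit-tested without a live endpoint.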

Changes to SearchSubagentTool (autopilot-safe wrapper):

  • If the loop exhausts its safety factors, the wrapper converts the failure to a benign CONTEXT_OVERFLOW_FALLBACK <final_answer> message instead of surfacing the BadRequest to the main agent. Prevents autopilot from retrying the whole turn.
  • A BudgetExceededError from prompt-tsx (render-time) is mapped to the same fallback. Other errors are rethrown.
  • Response dispatch extracted into pure helper mapLoopResponseToText(result) so it can be unit-tested without stubbing every service.
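A sketch of the wrapper behavior listed above, with `mapLoopResponseToText` as a pure function. `BudgetExceededError` is defined locally as a stand-in for the prompt-tsx error, and the `LoopResult` shape, the `contextOverflow` flag, `invokeSearchSubagent`, and the fallback wording are all hypothetical; the PR does not quote the real fallback string.

```typescript
// Stand-in for the render-time error prompt-tsx raises when a prompt cannot fit
// its budget (assumption: defined here rather than imported).
class BudgetExceededError extends Error {
	constructor(message: string) {
		super(message);
		this.name = 'BudgetExceededError';
		// Keep instanceof working when compiled to older targets.
		Object.setPrototypeOf(this, BudgetExceededError.prototype);
	}
}

// Hypothetical loop outcome; `contextOverflow` is assumed to be precomputed by a
// matcher such as isContextOverflowBadRequest.
type LoopResult =
	| { type: 'success'; text: string }
	| { type: 'badRequest'; reason: string; contextOverflow: boolean }
	| { type: 'failed'; reason: string };

// Hypothetical wording for the benign answer returned to the main agent.
const CONTEXT_OVERFLOW_FALLBACK =
	'<final_answer>The search exceeded the model context window before finishing; try a narrower query.</final_answer>';

// Pure mapping from loop outcome to the text handed back to the main agent.
function mapLoopResponseToText(result: LoopResult): string {
	switch (result.type) {
		case 'success':
			return result.text;
		case 'badRequest':
			return result.contextOverflow
				? CONTEXT_OVERFLOW_FALLBACK
				: `Search subagent failed: ${result.reason}`;
		case 'failed':
			return `Search subagent failed: ${result.reason}`;
	}
}

// Overflow (reported by the loop or thrown at render time) becomes the fallback;
// any other thrown error is rethrown unchanged.
async function invokeSearchSubagent(runLoop: () => Promise<LoopResult>): Promise<string> {
	try {
		return mapLoopResponseToText(await runLoop());
	} catch (err) {
		if (err instanceof BudgetExceededError) {
			return CONTEXT_OVERFLOW_FALLBACK;
		}
		throw err;
	}
}
```

The three unit-test cases listed below (success, overflow to fallback, non-overflow to error text) map directly onto the three branches of `mapLoopResponseToText`.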

Tests:

  • New searchSubagentToolCallingLoop.spec.ts: isContextOverflowBadRequest reason matching + fetch shrink-retry loop behavior.
  • New tests in searchSubagentTool.spec.ts (success / overflow → fallback / non-overflow → error text).
  • New mockChatHookService.ts shared by these and existing autopilot/hooks specs.

Copilot AI review requested due to automatic review settings May 15, 2026 00:16
Contributor

Copilot AI left a comment


Pull request overview

Improves resilience of the Copilot search subagent loop when the rendered prompt exceeds the model context window by proactively rendering against a safer effective budget and reactively re-rendering with progressively smaller budgets on detected context-overflow failures.

Changes:

  • Adjust prompt rendering budget in buildPrompt by subtracting tool-definition tokens, applying a safety margin, and rendering with an endpoint cloned to that effective budget.
  • Add context-overflow detection and retry logic in fetch that re-renders with progressively smaller budgets before giving up.
  • Add unit tests for the overflow detection helper and the retry behavior.
| File | Description |
| --- | --- |
| extensions/copilot/src/extension/prompt/node/searchSubagentToolCallingLoop.ts | Adds proactive prompt-budget sizing, overflow detection, and reactive re-render/retry behavior. |
| extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts | Adds tests for isContextOverflowError and for fetch retry behavior on context overflow. |

Copilot's findings

Comments suppressed due to low confidence (2)

extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:86

  • createMockChatRequest sets location: 1. Even though this currently corresponds to ChatLocation.Panel, using the enum value directly avoids coupling the test to the numeric representation and makes intent clearer.
		enableCommandDetection: false,
		isParticipantDetected: false,
		toolReferences: [],
		toolInvocationToken: {} as ChatRequest['toolInvocationToken'],

extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:128

  • successResponse() builds a partial success object and then casts with as unknown as ChatResponse. This weakens the test’s type-safety and can hide breaking changes to the ChatResponse shape (e.g. required resolvedModel/usage). Prefer constructing a fully-typed ChatResponse object instead of double-casting.
		requestId: 'req-ok',
		serverRequestId: undefined,
	} as unknown as ChatResponse;
}
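A sketch of the typed alternative this comment suggests. The `ChatResponse` union here is a hypothetical, trimmed-down stand-in (the real type lives in the extension and has more variants and fields, and the field types below are assumed); the point is only that annotating the return type replaces the double cast.

```typescript
// Hypothetical, trimmed-down ChatResponse union.
type ChatResponse =
	| {
		type: 'success';
		value: string;
		requestId: string;
		serverRequestId: string | undefined;
		resolvedModel: string;
		usage: { promptTokens: number; completionTokens: number };
	}
	| { type: 'badRequest'; reason: string; requestId: string; serverRequestId: string | undefined };

// Annotating the return type (instead of `as unknown as ChatResponse`) makes the
// compiler reject a fixture that omits or misspells a required field.
function successResponse(): ChatResponse {
	return {
		type: 'success',
		value: 'ok',
		requestId: 'req-ok',
		serverRequestId: undefined,
		resolvedModel: 'test-model',
		usage: { promptTokens: 10, completionTokens: 2 },
	};
}
```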

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

@guomaggie guomaggie marked this pull request as ready for review May 15, 2026 16:45
@osortega osortega assigned mjbvz and unassigned osortega May 15, 2026
@24anisha 24anisha assigned bhavyaus and 24anisha and unassigned mjbvz and bhavyaus May 18, 2026
@bhavyaus
Collaborator

This feels a bit more complicated than it needs to be. Could we lean on what the main agent loop already does?

  • How often is ContextWindowExceeded actually hit by the search subagent in practice? It would help to see numbers before adding a retry mechanism for it.
  • Could the proactive budget alone be enough? The main agent loop does the same (modelMaxPromptTokens - toolTokens) * 0.9 + cloneWithTokenOverride math in agentIntent.ts and then trusts prompt-tsx to fit; there is no reactive shrink-retry anywhere on the main path, and ContextWindowExceeded is treated as a genuine failure. If 0.9 isn't wide enough for the search subagent, dropping to 0.85/0.8 and tuning prompt-tsx priorities to prune tool-call history more aggressively seems like a simpler, more aligned solution than a second budget loop.
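The budget arithmetic in the second bullet, sketched in TypeScript. `effectivePromptBudget` and the token counts are hypothetical; in the extension the resulting number would be passed to `endpoint.cloneWithTokenOverride` so prompt-tsx prunes against it, which is not shown here.

```typescript
// Hypothetical helper: (modelMaxPromptTokens - toolTokens) * safetyFactor,
// clamped at zero and floored to whole tokens.
function effectivePromptBudget(
	modelMaxPromptTokens: number,
	toolTokens: number,
	safetyFactor: number,
): number {
	const available = Math.max(0, modelMaxPromptTokens - toolTokens);
	return Math.floor(available * safetyFactor);
}

// Illustrative numbers only: a 128,000-token prompt limit with 8,000 tokens of
// tool definitions leaves 120,000 tokens before the safety factor is applied.
const at90 = effectivePromptBudget(128_000, 8_000, 0.9); // 108000
const at80 = effectivePromptBudget(128_000, 8_000, 0.8); // 96000
console.log(at90, at80);
```

Dropping the factor from 0.9 to 0.8 in this example frees 12,000 more tokens of headroom for the model's own counting drift.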

Collaborator

@bhavyaus bhavyaus left a comment


Could you also ensure this works well with autopilot mode? If we still choose to pursue this solution, this new error looks like it falls through and causes additional retries by the main agent.

@24anisha
Contributor

[Screenshot, 2026-05-20 12:18:49 PM]

We see this error ~600-800 times per day, especially for more complicated, deep search requests. So it's not prohibitively frequent, but it's frequent enough that we'd like to resolve it.

@guomaggie guomaggie force-pushed the maggie/handle-context-limit-exceeded-error branch from 3b49bfd to 01e4b4f Compare May 20, 2026 21:35
@guomaggie
Contributor Author

I think the goal for this PR is to avoid surfacing context-limit-exceeded errors from the subagent to the main agent, and for that a reactive fallback makes sense. If we continue with this approach, would it be better to retry only once on a 400 before returning a benign message to the main agent (or is there a cleaner solution)?

@guomaggie guomaggie requested a review from bhavyaus May 20, 2026 22:53
@bhavyaus
Collaborator

Let's go with a single retry. Concretely: render proactively at 0.9 (matching agentIntent.ts), and on a context-overflow 400 do one retry at an aggressively smaller factor (~0.5) before falling back. Skip the 0.66 rung, since if 0.9 overflowed, a small step down usually won't fit either. Then return the benign fallback. That keeps it close to the main path.
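A sketch of the single-retry shape proposed here, assuming a hypothetical `attempt` callback that renders at a given safety factor, sends the request, and reports whether the result was a context-overflow 400. The `Attempt` interface and `fetchWithSingleRetry` are illustrative names, and 0.5 stands in for the "~0.5" in the comment.

```typescript
// 0.9 matches agentIntent.ts per the review; 0.5 is an assumed value for "~0.5".
const PROACTIVE_FACTOR = 0.9;
const RETRY_FACTOR = 0.5;

// Hypothetical result of rendering at a safety factor and sending the request.
interface Attempt {
	overflow: boolean; // true if the response was a context-overflow 400
	text: string;
}

// One proactive attempt, at most one aggressive shrink retry, then the benign fallback.
async function fetchWithSingleRetry(
	attempt: (safetyFactor: number) => Promise<Attempt>,
	fallback: string,
): Promise<string> {
	const first = await attempt(PROACTIVE_FACTOR);
	if (!first.overflow) {
		return first.text;
	}
	const second = await attempt(RETRY_FACTOR);
	return second.overflow ? fallback : second.text;
}
```

Compared with the three-rung ladder in the original description, the worst case drops from three requests to two before the fallback.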

Also, one more thing to check in autopilot mode: when the loop returns the exhausted overflow BadRequest, the subagent's own ToolCallingLoop will still auto-retry it ~3× under autopilot before the wrapper converts it. Can we skip auto-retry for context-overflow BadRequests (override shouldAutoRetry, or run the subagent loop at a non-autopilot permission level)?
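A sketch of the first option (overriding `shouldAutoRetry`), using a hypothetical `BaseLoop` in place of the real `ToolCallingLoop`. The retry count of 3 comes from the "~3×" above; the base-class behavior, the `run` method, and the trimmed pattern list are assumptions, not the extension's API.

```typescript
// Hypothetical, simplified loop outcome.
type LoopResult = { type: 'success' } | { type: 'badRequest'; reason: string };

// Trimmed pattern list for illustration only.
const OVERFLOW_PATTERNS = ['context_length_exceeded', 'context window', 'prompt is too long'];

function isContextOverflow(result: LoopResult): boolean {
	if (result.type !== 'badRequest') {
		return false;
	}
	const reason = result.reason.toLowerCase();
	return OVERFLOW_PATTERNS.some(p => reason.includes(p));
}

// Hypothetical stand-in for ToolCallingLoop: under autopilot it auto-retries any
// non-success result up to maxAutoRetries times.
class BaseLoop {
	protected readonly autopilot: boolean;
	private readonly maxAutoRetries: number;

	constructor(autopilot: boolean, maxAutoRetries = 3) {
		this.autopilot = autopilot;
		this.maxAutoRetries = maxAutoRetries;
	}

	protected shouldAutoRetry(result: LoopResult): boolean {
		return this.autopilot && result.type !== 'success';
	}

	// Returns the final result and the total number of requests made.
	async run(fetchOnce: () => Promise<LoopResult>): Promise<{ result: LoopResult; attempts: number }> {
		let result = await fetchOnce();
		let attempts = 1;
		while (attempts <= this.maxAutoRetries && this.shouldAutoRetry(result)) {
			result = await fetchOnce();
			attempts++;
		}
		return { result, attempts };
	}
}

// The search subagent opts out of auto-retry for context overflow, so the
// exhausted BadRequest reaches the wrapper (and its benign fallback) at once.
class SearchSubagentLoop extends BaseLoop {
	protected override shouldAutoRetry(result: LoopResult): boolean {
		if (isContextOverflow(result)) {
			return false;
		}
		return super.shouldAutoRetry(result);
	}
}
```

Other failures keep the inherited autopilot retry behavior; only the overflow case short-circuits.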

Longer term, if 0.9 still isn't wide enough, tuning the prompt-tsx priorities to prune tool-call history harder is ok to me. Your telemetry should tell us whether the single retry actually recovers anything or if we should go that route.

@bhavyaus bhavyaus enabled auto-merge (squash) May 27, 2026 21:44
@bhavyaus bhavyaus merged commit 8544893 into microsoft:main May 27, 2026
25 checks passed
@vs-code-engineering vs-code-engineering Bot added this to the 1.123.0 milestone May 27, 2026

Labels

None yet

Projects

None yet

7 participants