[Search Subagent] Handle context window limit exceeded error by guomaggie · Pull Request #316529 · microsoft/vscode

guomaggie · 2026-05-15T00:16:35Z

Handle context-window overflow in the search subagent. The subagent's prompt can exceed the model's context after a few tool-call rounds, returning a 400 context_length_exceeded. When the main agent runs in autopilot, that error also drives a full-turn retry, making things worse.

Changes to SearchSubagentToolCallingLoop:

Proactive budget sizing in buildPrompt: subtracts tool-definition tokens and applies the current safety factor, then renders via endpoint.cloneWithTokenOverride so prompt-tsx prunes against a realistic budget. Mirrors agentIntent.ts.
Reactive shrink-retry in fetch: on a context-overflow BadRequest, advances _safetyFactorIndex through SAFETY_FACTORS = [0.9, 0.66, 0.4], re-renders, and retries. When the smallest factor is exhausted, the original error surfaces.
isContextOverflowBadRequest helper matches BadRequest reasons against known patterns (context_length_exceeded, context length, context window, maximum context, prompt is too long, request too large, request_too_large).
Telemetry: emits searchSubagent.contextOverflow with kind ('retried' | 'exhausted'), model, and safetyFactorIndex.
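The budget arithmetic and the overflow check described above can be sketched roughly as follows. This is an illustration, not the PR's code: `effectivePromptBudget` and `OVERFLOW_PATTERNS` are hypothetical names, and the case-insensitive substring match and the rounding down are assumptions. Only `SAFETY_FACTORS`, the pattern strings, and the helper name `isContextOverflowBadRequest` come from the description.

```typescript
// Illustrative sketch; not the actual implementation from the PR.
const SAFETY_FACTORS: readonly number[] = [0.9, 0.66, 0.4];

// Pattern strings as listed in the PR description.
const OVERFLOW_PATTERNS: readonly string[] = [
	'context_length_exceeded',
	'context length',
	'context window',
	'maximum context',
	'prompt is too long',
	'request too large',
	'request_too_large',
];

/** True when a BadRequest reason looks like a context-window overflow (assumed case-insensitive). */
function isContextOverflowBadRequest(reason: string | undefined): boolean {
	if (!reason) {
		return false;
	}
	const normalized = reason.toLowerCase();
	return OVERFLOW_PATTERNS.some(pattern => normalized.indexOf(pattern) !== -1);
}

/** Hypothetical helper: (model max - tool-definition tokens) * safety factor, rounded down. */
function effectivePromptBudget(modelMaxPromptTokens: number, toolTokens: number, safetyFactorIndex: number): number {
	// Clamp the index so an exhausted index still maps to the smallest factor.
	const index = Math.min(Math.max(safetyFactorIndex, 0), SAFETY_FACTORS.length - 1);
	const available = Math.max(modelMaxPromptTokens - toolTokens, 0);
	return Math.floor(available * SAFETY_FACTORS[index]);
}
```

For example, with a made-up 110000-token model limit and 10000 tokens of tool definitions, the three factors give budgets of 90000, 66000, and 40000 tokens.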

Changes to SearchSubagentTool (autopilot-safe wrapper):

If the loop exhausts its safety factors, the wrapper converts the failure to a benign CONTEXT_OVERFLOW_FALLBACK <final_answer> message instead of surfacing the BadRequest to the main agent. Prevents autopilot from retrying the whole turn.
A BudgetExceededError from prompt-tsx (render-time) is mapped to the same fallback. Other errors are rethrown.
Response dispatch extracted into pure helper mapLoopResponseToText(result) so it can be unit-tested without stubbing every service.
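A minimal sketch of how the wrapper behavior above might look, under stated assumptions: the `LoopResult` shape, the `invokeSafely` function, the fallback wording, and the error-text format are all hypothetical, and `BudgetExceededError` here is a local stand-in for the prompt-tsx error rather than the real class. It is written synchronously for brevity; the real tool is presumably asynchronous.

```typescript
// Stand-in for the prompt-tsx render-time error (assumption, not the real class).
class BudgetExceededError extends Error {
	constructor(message: string) {
		super(message);
		// Keeps instanceof working when compiled to older targets.
		Object.setPrototypeOf(this, new.target.prototype);
	}
}

// Hypothetical result shape for the subagent loop.
type LoopResult =
	| { kind: 'success'; text: string }
	| { kind: 'badRequest'; reason: string; contextOverflow: boolean };

// Placeholder wording; the actual fallback message is not shown in the PR description.
const CONTEXT_OVERFLOW_FALLBACK = '<final_answer>Search stopped early: context limit reached.</final_answer>';

/** Pure dispatch: success text, benign fallback on overflow, error text otherwise. */
function mapLoopResponseToText(result: LoopResult): string {
	if (result.kind === 'success') {
		return result.text;
	}
	return result.contextOverflow
		? CONTEXT_OVERFLOW_FALLBACK
		: 'Search subagent failed: ' + result.reason;
}

/** Hypothetical wrapper: render-time budget errors map to the same fallback, others are rethrown. */
function invokeSafely(runLoop: () => LoopResult): string {
	try {
		return mapLoopResponseToText(runLoop());
	} catch (err) {
		if (err instanceof BudgetExceededError) {
			return CONTEXT_OVERFLOW_FALLBACK;
		}
		throw err;
	}
}
```

Keeping `mapLoopResponseToText` free of service dependencies is what lets it be tested with plain objects, as the description notes.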

Tests:

New searchSubagentToolCallingLoop.spec.ts: isContextOverflowBadRequest reason matching + fetch shrink-retry loop behavior.
New tests in searchSubagentTool.spec.ts (success / overflow → fallback / non-overflow → error text).
New mockChatHookService.ts shared by these and existing autopilot/hooks specs.

Copilot

Pull request overview

Improves resilience of the Copilot search subagent loop when the rendered prompt exceeds the model context window by proactively rendering against a safer effective budget and reactively re-rendering with progressively smaller budgets on detected context-overflow failures.

Changes:

Adjust prompt rendering budget in buildPrompt by subtracting tool-definition tokens, applying a safety margin, and rendering with an endpoint cloned to that effective budget.
Add context-overflow detection and retry logic in fetch that re-renders with progressively smaller budgets before giving up.
Add unit tests for the overflow detection helper and the retry behavior.

File	Description
extensions/copilot/src/extension/prompt/node/searchSubagentToolCallingLoop.ts	Adds proactive prompt-budget sizing, overflow detection, and reactive re-render/retry behavior.
extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts	Adds tests for `isContextOverflowError` and for fetch retry behavior on context overflow.

Copilot's findings

Comments suppressed due to low confidence (2)

extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:86

createMockChatRequest sets location: 1. Even though this currently corresponds to ChatLocation.Panel, using the enum value directly avoids coupling the test to the numeric representation and makes intent clearer.

		enableCommandDetection: false,
		isParticipantDetected: false,
		toolReferences: [],
		toolInvocationToken: {} as ChatRequest['toolInvocationToken'],

extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts:128

successResponse() builds a partial success object and then casts with as unknown as ChatResponse. This weakens the test’s type-safety and can hide breaking changes to the ChatResponse shape (e.g. required resolvedModel/usage). Prefer constructing a fully-typed ChatResponse object instead of double-casting.

		requestId: 'req-ok',
		serverRequestId: undefined,
	} as unknown as ChatResponse;
}

Files reviewed: 2/2 changed files
Comments generated: 1

bhavyaus · 2026-05-20T03:38:00Z

This feels a bit more complicated than it needs to be. Could we lean on what the main agent loop already does?

How often is ContextWindowExceeded actually hit by the search subagent in practice? It would help to see numbers before adding a retry mechanism for it.
Could the proactive budget alone be enough? The main agent loop does the same (modelMaxPromptTokens - toolTokens) * 0.9 + cloneWithTokenOverride math in agentIntent.ts and then trusts prompt-tsx to fit; there is no reactive shrink-retry anywhere on the main path, and ContextWindowExceeded is treated as a genuine failure. If 0.9 isn't wide enough for the search subagent, dropping to 0.85/0.8 and tuning prompt-tsx priorities to prune tool-call history more aggressively seems like a simpler, more aligned solution than a second budget loop.

bhavyaus

Could you also ensure this works well with autopilot mode? This new error looks like it falls through and causes additional retries by the main agent if we still choose to pursue this solution.

24anisha · 2026-05-20T19:23:10Z

> How often is ContextWindowExceeded actually hit by the search subagent in practice? It would help to see numbers before adding a retry mechanism for it.

We see this error crop up ~600-800 times per day, especially in cases with more complicated, deep search requests. So it is not prohibitively frequent, but frequent enough that we'd like to resolve it.

guomaggie · 2026-05-20T22:51:04Z

> This feels a bit more complicated than it needs to be. Could we lean on what the main agent loop already does?
>
> How often is ContextWindowExceeded actually hit by the search subagent in practice? It would help to see numbers before adding a retry mechanism for it.
>
> Could the proactive budget alone be enough? The main agent loop does the same (modelMaxPromptTokens - toolTokens) * 0.9 + cloneWithTokenOverride math in agentIntent.ts and then trusts prompt-tsx to fit; there is no reactive shrink-retry anywhere on the main path, and ContextWindowExceeded is treated as a genuine failure. If 0.9 isn't wide enough for the search subagent, dropping to 0.85/0.8 and tuning prompt-tsx priorities to prune tool-call history more aggressively seems like a simpler, more aligned solution than a second budget loop.

I think the goal for this PR is that we don't want to surface context-limit-exceeded errors from the subagent to the main agent, and in that case I think a reactive fallback would make sense. If we continue with this approach, would it be better to retry only once on a 400 before returning a benign message to the main agent (or is there a cleaner solution)?

bhavyaus · 2026-05-22T00:26:23Z

> > This feels a bit more complicated than it needs to be. Could we lean on what the main agent loop already does?
> >
> > How often is ContextWindowExceeded actually hit by the search subagent in practice? It would help to see numbers before adding a retry mechanism for it.
> >
> > Could the proactive budget alone be enough? The main agent loop does the same (modelMaxPromptTokens - toolTokens) * 0.9 + cloneWithTokenOverride math in agentIntent.ts and then trusts prompt-tsx to fit; there is no reactive shrink-retry anywhere on the main path, and ContextWindowExceeded is treated as a genuine failure. If 0.9 isn't wide enough for the search subagent, dropping to 0.85/0.8 and tuning prompt-tsx priorities to prune tool-call history more aggressively seems like a simpler, more aligned solution than a second budget loop.
>
> I think the goal for this PR is that we don't want to surface context-limit-exceeded errors from the subagent to the main agent, and in that case I think a reactive fallback would make sense. If we continue with this approach, would it be better to retry only once on a 400 before returning a benign message to the main agent (or is there a cleaner solution)?

Let's go with a single retry. Concretely: render proactively at 0.9 (matching agentIntent.ts), and on a context-overflow 400 do one retry at an aggressively smaller factor (~0.5) before falling back. Skip the 0.66 rung, since if 0.9 overflowed, a small step down usually won't fit either. Then return the benign fallback. That keeps it close to the main path.
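The single-retry flow proposed here could be sketched along these lines. Everything in the block is hypothetical: the `Attempt` shape, the function name, and the exact retry factor (the comment only says "~0.5"). It is written synchronously for brevity, whereas the real fetch would be asynchronous.

```typescript
// Hypothetical result of one render-and-fetch attempt.
type Attempt<T> =
	| { kind: 'ok'; value: T }
	| { kind: 'error'; reason: string; contextOverflow: boolean };

const PROACTIVE_FACTOR = 0.9; // factor quoted for agentIntent.ts
const RETRY_FACTOR = 0.5;     // "~0.5" in the comment; the exact value is an assumption

/**
 * Render at 0.9; on a context-overflow error, retry once at the smaller factor;
 * if that also overflows, return the benign fallback. Non-overflow errors are thrown.
 */
function fetchWithSingleShrinkRetry<T>(attempt: (safetyFactor: number) => Attempt<T>, fallback: T): T {
	const first = attempt(PROACTIVE_FACTOR);
	if (first.kind === 'ok') {
		return first.value;
	}
	if (!first.contextOverflow) {
		throw new Error(first.reason);
	}
	const second = attempt(RETRY_FACTOR);
	if (second.kind === 'ok') {
		return second.value;
	}
	if (!second.contextOverflow) {
		throw new Error(second.reason);
	}
	return fallback;
}
```

Compared with walking a list of factors, this bounds the extra cost of an overflow at one additional render and request.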

Also, one more thing to check in autopilot mode - when the loop returns the exhausted overflow BadRequest, the subagent's own ToolCallingLoop will still auto-retry it ~3× under autopilot before the wrapper converts it. Can we skip auto-retry for context-overflow BadRequest (override shouldAutoRetry or run the subagent loop at a non-autopilot permission level)?
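One way to picture the first option (skipping auto-retry for context-overflow failures) is the toy model below. It does not reflect the real ToolCallingLoop API: the class names, the `LoopFailure` shape, the `run` method, and the retry count of 3 are all assumptions made for illustration. Only the idea of overriding `shouldAutoRetry` comes from the comment.

```typescript
// Hypothetical failure shape.
interface LoopFailure {
	type: 'badRequest' | 'other';
	reason: string;
	contextOverflow: boolean;
}

const MAX_AUTO_RETRIES = 3; // "~3x" in the comment; the exact count is an assumption

// Toy stand-in for a loop that auto-retries failed turns under autopilot.
class SketchToolCallingLoop {
	protected shouldAutoRetry(_failure: LoopFailure): boolean {
		return true;
	}

	/** Runs `step` until it succeeds, retries are exhausted, or shouldAutoRetry declines. */
	run(step: () => LoopFailure | undefined): { attempts: number; failure: LoopFailure | undefined } {
		let attempts = 0;
		let failure: LoopFailure | undefined;
		do {
			attempts++;
			failure = step();
		} while (failure !== undefined && attempts <= MAX_AUTO_RETRIES && this.shouldAutoRetry(failure));
		return { attempts, failure };
	}
}

// Subagent variant: a context-overflow BadRequest is never auto-retried.
class SketchSearchSubagentLoop extends SketchToolCallingLoop {
	protected shouldAutoRetry(failure: LoopFailure): boolean {
		if (failure.type === 'badRequest' && failure.contextOverflow) {
			return false;
		}
		return super.shouldAutoRetry(failure);
	}
}
```

In this model the base loop makes four attempts on a persistent overflow (one initial plus three retries), while the subagent variant stops after the first and hands the failure straight to the wrapper.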

Longer term, if 0.9 still isn't wide enough, tuning the prompt-tsx priorities to prune tool-call history harder is ok to me. Your telemetry should tell us whether the single retry actually recovers anything or if we should go that route.

truncate and retry tool calls

82af8a0

Copilot AI review requested due to automatic review settings May 15, 2026 00:16

Copilot started reviewing on behalf of guomaggie May 15, 2026 00:19

vs-code-engineering Bot assigned osortega May 15, 2026

keep visibility of getEndpoint

2dc046c

Copilot AI reviewed May 15, 2026

Comment thread extensions/copilot/src/extension/prompt/test/node/searchSubagentToolCallingLoop.spec.ts

guomaggie added 2 commits May 14, 2026 17:57

address code review comments

fbd8a06

fix ci errors

a23ebf6

guomaggie marked this pull request as ready for review May 15, 2026 16:45

osortega assigned mjbvz and unassigned osortega May 15, 2026

24anisha assigned bhavyaus and 24anisha and unassigned mjbvz and bhavyaus May 18, 2026

bhavyaus requested changes May 20, 2026

address code review comments

01e4b4f

guomaggie force-pushed the maggie/handle-context-limit-exceeded-error branch from 3b49bfd to 01e4b4f Compare May 20, 2026 21:35

guomaggie added 2 commits May 20, 2026 15:31

add testing

77ab3e3

Merge branch 'main' into maggie/handle-context-limit-exceeded-error

502c6c2

guomaggie requested a review from bhavyaus May 20, 2026 22:53

Merge branch 'main' into maggie/handle-context-limit-exceeded-error

388ecf0

address code review comments

12d568a

bhavyaus approved these changes May 27, 2026

bhavyaus enabled auto-merge (squash) May 27, 2026 21:44

TylerLeonhardt approved these changes May 27, 2026

bhavyaus merged commit 8544893 into microsoft:main May 27, 2026
25 checks passed

vs-code-engineering Bot added this to the 1.123.0 milestone May 27, 2026

bhavyaus merged 9 commits into microsoft:main from guomaggie:maggie/handle-context-limit-exceeded-error

7 participants
