
fix(byok): populate Context Window usage for local models (ollama, cu… #313667

Open
Adityakk9031 wants to merge 2 commits into microsoft:main from Adityakk9031:#313458

Conversation

@Adityakk9031

Fixes: #313458

Problem
The Context Window usage widget stayed at "0 /" for all local model providers (Ollama, LM Studio, custom OpenAI-compatible proxies), even when:

The model's token limits (maxInputTokens, maxOutputTokens) were known
The backend was returning usage data in the streaming response
Root Cause
Two issues in the streaming response pipeline:

  1. isApiUsage() required total_tokens — openai.ts:70

// Before — silently rejected usage from models that omit total_tokens
export function isApiUsage(obj: unknown): obj is APIUsage {
	return typeof (obj as APIUsage).prompt_tokens === 'number' &&
		typeof (obj as APIUsage).completion_tokens === 'number' &&
		typeof (obj as APIUsage).total_tokens === 'number'; // ← blocked local models
}
Many OpenAI-compatible local models omit total_tokens since it's derivable. When isApiUsage() returned false, the usage object was silently dropped in stream.ts, leaving completion.usage = undefined (a representative chunk is shown after this list).

  2. From there the failure cascaded: fetchResult.usage was undefined → toolCallingLoop.ts skipped stream.usage() → response.setUsage() was never called → the widget always read response.usage = undefined → the ring stayed gray at 0.
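To make the first issue concrete, a final streamed chunk from a local OpenAI-compatible server typically looks like this (the shape is representative and the values are invented; this is not captured from the PR):

// Representative final SSE chunk from a local OpenAI-compatible server
// (values invented). total_tokens is absent, so the old isApiUsage()
// guard rejected the whole usage object.
const finalChunk = {
	id: 'chatcmpl-abc123',
	object: 'chat.completion.chunk',
	choices: [{ index: 0, delta: {}, finish_reason: 'stop' }],
	usage: {
		prompt_tokens: 412,
		completion_tokens: 96
		// total_tokens omitted; derivable as 412 + 96
	}
};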

Fix
openai.ts — Remove total_tokens from the isApiUsage() guard. It is always derivable and is not required by the OpenAI streaming spec.

stream.ts — Synthesize total_tokens before attaching usage to the completion, preserving the APIUsage type contract for downstream consumers (OTel telemetry, etc.).

// After — synthesize total_tokens when absent
if (usage && typeof usage.total_tokens !== 'number') {
	usage = { ...usage, total_tokens: usage.prompt_tokens + usage.completion_tokens };
}
completion.usage = usage;
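Taken together, the two changes behave like this minimal standalone sketch (APIUsage is reduced to the three fields discussed here, and normalizeUsage is an illustrative helper, not a function from the PR):

interface APIUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
}

// Relaxed guard from openai.ts: total_tokens is no longer required on
// the wire (the review below discusses the soundness tradeoff).
function isApiUsage(obj: unknown): obj is APIUsage {
	return typeof (obj as APIUsage).prompt_tokens === 'number' &&
		typeof (obj as APIUsage).completion_tokens === 'number';
}

// Illustrative helper combining the guard with the synthesis step
// from stream.ts.
function normalizeUsage(raw: unknown): APIUsage | undefined {
	if (!isApiUsage(raw)) {
		return undefined;
	}
	return typeof raw.total_tokens === 'number'
		? raw
		: { ...raw, total_tokens: raw.prompt_tokens + raw.completion_tokens };
}

// Prints { prompt_tokens: 412, completion_tokens: 96, total_tokens: 508 }
console.log(normalizeUsage({ prompt_tokens: 412, completion_tokens: 96 }));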

Copilot AI review requested due to automatic review settings May 1, 2026 06:14
Copilot AI (Contributor) left a comment


Pull request overview

Fixes missing Context Window usage reporting for local/OpenAI-compatible model providers by ensuring streamed usage objects are accepted even when total_tokens is omitted, and by synthesizing total_tokens before attaching usage to the emitted completion.

Changes:

  • Relax isApiUsage to accept usage objects that omit total_tokens.
  • In the SSE streaming pipeline, synthesize total_tokens from prompt_tokens + completion_tokens when absent before yielding the final completion.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • extensions/copilot/src/platform/networking/node/stream.ts: Synthesizes total_tokens when missing and attaches usage to the final completion for single-choice streams.
  • extensions/copilot/src/platform/networking/common/openai.ts: Updates the usage type guard to not require total_tokens for streamed usage objects from local providers.

Comment on lines 70 to 75
 export function isApiUsage(obj: unknown): obj is APIUsage {
+	// total_tokens is intentionally not required — many OpenAI-compatible local
+	// models (Ollama, LM Studio, custom proxies) omit it; it equals prompt_tokens + completion_tokens.
 	return typeof (obj as APIUsage).prompt_tokens === 'number' &&
-		typeof (obj as APIUsage).completion_tokens === 'number' &&
-		typeof (obj as APIUsage).total_tokens === 'number';
+		typeof (obj as APIUsage).completion_tokens === 'number';
 }

Copilot AI May 1, 2026


isApiUsage is now a type guard for APIUsage, but it no longer verifies the required total_tokens field (per the APIUsage interface). This makes the type guard unsound and could lead callers to assume total_tokens is present when it may be missing. Consider either (1) making total_tokens optional in a separate “streaming usage” type returned by the guard, or (2) keeping APIUsage strict and changing isApiUsage to return a narrower type (e.g., { prompt_tokens: number; completion_tokens: number; total_tokens?: number; ... }) and only upcast to APIUsage after synthesizing total_tokens.
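A minimal sketch of option (2), with hypothetical names (StreamedUsage, isStreamedUsage, toApiUsage) since the PR does not implement it:

// APIUsage is reduced here to the fields discussed in this thread;
// the real interface may carry more.
interface APIUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
}

// Usage as it arrives on the wire: total_tokens may be absent.
type StreamedUsage = Omit<APIUsage, 'total_tokens'> & { total_tokens?: number };

function isStreamedUsage(obj: unknown): obj is StreamedUsage {
	return typeof (obj as StreamedUsage).prompt_tokens === 'number' &&
		typeof (obj as StreamedUsage).completion_tokens === 'number';
}

// Upcast to the strict APIUsage only after total_tokens is guaranteed.
function toApiUsage(usage: StreamedUsage): APIUsage {
	return {
		...usage,
		total_tokens: usage.total_tokens ?? usage.prompt_tokens + usage.completion_tokens,
	};
}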

Comment on lines +298 to +301
if (usage && typeof usage.total_tokens !== 'number') {
	// Synthesize total_tokens for models that omit it (e.g. Ollama, LM Studio)
	usage = { ...usage, total_tokens: usage.prompt_tokens + usage.completion_tokens };
}

Copilot AI May 1, 2026


New behavior synthesizes total_tokens when absent, but there’s no regression test covering the “usage without total_tokens” scenario. Please add a unit test in the existing stream.sseProcessor.spec.ts suite that streams usage containing only prompt_tokens + completion_tokens (either inline with the final chunk or as a separate usage-only chunk) and asserts results[0].usage.total_tokens is populated as their sum.
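A sketch of what such a test could look like, assuming a vitest-style suite; the stand-in helper mirrors the synthesis logic rather than driving the real SSE processor, which the actual spec file would do:

import { describe, expect, it } from 'vitest';

// Stand-in for the synthesis step added in stream.ts; a real regression
// test would stream chunks through the actual processor instead.
function synthesizeTotalTokens(usage: { prompt_tokens: number; completion_tokens: number; total_tokens?: number }) {
	return typeof usage.total_tokens === 'number'
		? usage
		: { ...usage, total_tokens: usage.prompt_tokens + usage.completion_tokens };
}

describe('usage without total_tokens', () => {
	it('synthesizes total_tokens as prompt_tokens + completion_tokens', () => {
		const usage = synthesizeTotalTokens({ prompt_tokens: 412, completion_tokens: 96 });
		expect(usage.total_tokens).toBe(508);
	});

	it('leaves an explicit total_tokens untouched', () => {
		const usage = synthesizeTotalTokens({ prompt_tokens: 1, completion_tokens: 2, total_tokens: 3 });
		expect(usage.total_tokens).toBe(3);
	});
});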

@roblourens roblourens assigned lramos15 and unassigned roblourens May 3, 2026

Development

Successfully merging this pull request may close these issues.

Copilot Chat local models do not populate Context Window usage even when backend usage and model limits are available
