fix(byok): populate Context Window usage for local models (ollama, cu…) #313667
Adityakk9031 wants to merge 2 commits into microsoft:main from
Conversation
Pull request overview
Fixes missing Context Window usage reporting for local/OpenAI-compatible model providers by ensuring streamed usage objects are accepted even when total_tokens is omitted, and by synthesizing total_tokens before attaching usage to the emitted completion.
Changes:
- Relax `isApiUsage` to accept usage objects that omit `total_tokens`.
- In the SSE streaming pipeline, synthesize `total_tokens` from `prompt_tokens + completion_tokens` when absent before yielding the final completion.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| extensions/copilot/src/platform/networking/node/stream.ts | Synthesizes total_tokens when missing and attaches usage to the final completion for single-choice streams. |
| extensions/copilot/src/platform/networking/common/openai.ts | Updates usage type guard to not require total_tokens for streamed usage objects from local providers. |
```diff
 export function isApiUsage(obj: unknown): obj is APIUsage {
 	// total_tokens is intentionally not required — many OpenAI-compatible local
 	// models (Ollama, LM Studio, custom proxies) omit it; it equals prompt_tokens + completion_tokens.
 	return typeof (obj as APIUsage).prompt_tokens === 'number' &&
-		typeof (obj as APIUsage).completion_tokens === 'number' &&
-		typeof (obj as APIUsage).total_tokens === 'number';
+		typeof (obj as APIUsage).completion_tokens === 'number';
 }
```
isApiUsage is now a type guard for APIUsage, but it no longer verifies the required total_tokens field (per the APIUsage interface). This makes the type guard unsound and could lead callers to assume total_tokens is present when it may be missing. Consider either (1) making total_tokens optional in a separate “streaming usage” type returned by the guard, or (2) keeping APIUsage strict and changing isApiUsage to return a narrower type (e.g., { prompt_tokens: number; completion_tokens: number; total_tokens?: number; ... }) and only upcast to APIUsage after synthesizing total_tokens.
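The reviewer's option (2) could be sketched as follows. All names here (`APIStreamedUsage`, `toApiUsage`) are illustrative stand-ins, not the extension's actual types: the idea is that the guard narrows to a type where `total_tokens` is optional, and the strict `APIUsage` is only produced after synthesis.

```typescript
// Hypothetical "streaming usage" shape: total_tokens optional on the wire.
interface APIStreamedUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens?: number;
}

// Strict shape the rest of the codebase consumes (mirrors the PR's APIUsage).
interface APIUsage extends APIStreamedUsage {
	total_tokens: number;
}

// Guard narrows only to the streamed shape, so callers cannot
// accidentally read total_tokens before it has been synthesized.
function isStreamedUsage(obj: unknown): obj is APIStreamedUsage {
	const u = obj as APIStreamedUsage;
	return typeof u?.prompt_tokens === 'number' &&
		typeof u?.completion_tokens === 'number';
}

// Upcast to the strict APIUsage only after filling in total_tokens.
function toApiUsage(u: APIStreamedUsage): APIUsage {
	return {
		...u,
		total_tokens: u.total_tokens ?? u.prompt_tokens + u.completion_tokens,
	};
}
```

This keeps `APIUsage` sound for downstream consumers while still accepting the relaxed payloads local providers emit.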
```ts
if (usage && typeof usage.total_tokens !== 'number') {
	// Synthesize total_tokens for models that omit it (e.g. Ollama, LM Studio)
	usage = { ...usage, total_tokens: usage.prompt_tokens + usage.completion_tokens };
}
```
New behavior synthesizes total_tokens when absent, but there’s no regression test covering the “usage without total_tokens” scenario. Please add a unit test in the existing stream.sseProcessor.spec.ts suite that streams usage containing only prompt_tokens + completion_tokens (either inline with the final chunk or as a separate usage-only chunk) and asserts results[0].usage.total_tokens is populated as their sum.
Fixes: #313458
Problem
The Context Window usage widget stayed at `0 /` for all local model providers (Ollama, LM Studio, custom OpenAI-compatible proxies), even when:
- The model's token limits (`maxInputTokens`, `maxOutputTokens`) were known
- The backend was returning usage data in the streaming response
Root Cause
Two issues in the streaming response pipeline:
```ts
// Before — silently rejected usage from models that omit total_tokens
export function isApiUsage(obj: unknown): obj is APIUsage {
	return typeof (obj as APIUsage).prompt_tokens === 'number' &&
		typeof (obj as APIUsage).completion_tokens === 'number' &&
		typeof (obj as APIUsage).total_tokens === 'number'; // ← blocked local models
}
```
Many OpenAI-compatible local models omit `total_tokens` since it's derivable. When `isApiUsage()` returned false, the usage object was silently dropped in `stream.ts`, leaving `completion.usage = undefined`.
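For illustration, the final SSE data payload from a local OpenAI-compatible server often looks like the snippet below (the exact field set varies by server; this payload is illustrative, not taken from any server's documentation). Note the absent `total_tokens`, which the pre-fix guard treated as a rejection:

```typescript
// Illustrative final SSE data payload from a local OpenAI-compatible server.
const chunk = JSON.parse(`{
	"choices": [{ "index": 0, "delta": {}, "finish_reason": "stop" }],
	"usage": { "prompt_tokens": 57, "completion_tokens": 17 }
}`);

// The pre-fix guard, reproduced as a standalone function:
const oldGuard = (u: any): boolean =>
	typeof u?.prompt_tokens === 'number' &&
	typeof u?.completion_tokens === 'number' &&
	typeof u?.total_tokens === 'number';

console.log(oldGuard(chunk.usage)); // false — so the usage object was dropped
```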
Fix
`openai.ts` — Remove `total_tokens` from the `isApiUsage()` guard. It is always derivable and is not required by the OpenAI streaming spec.
`stream.ts` — Synthesize `total_tokens` before attaching usage to the completion, preserving the `APIUsage` type contract for downstream consumers (OTel telemetry, etc.).
```ts
// After — synthesize total_tokens when absent
if (usage && typeof usage.total_tokens !== 'number') {
	usage = { ...usage, total_tokens: usage.prompt_tokens + usage.completion_tokens };
}
completion.usage = usage;
```