fix: default tokenizer to O200K when server omits it (fixes #319618) #319620
Merged
vs-code-engineering[bot] merged 1 commit intoJun 2, 2026
When the CAPI model metadata API returns a model without a tokenizer field, ChatEndpoint.tokenizer would be undefined, causing acquireTokenizer() to throw 'Unknown tokenizer: undefined'. Default to TokenizerType.O200K (the most common tokenizer) when the server response omits the field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lramos15
approved these changes
Jun 2, 2026
benvillalobos
approved these changes
Jun 2, 2026
Summary
The CAPI model metadata API can return models without a tokenizer field in their capabilities. When this happens, ChatEndpoint.tokenizer is set to undefined, and TokenizerProvider.acquireTokenizer() throws Error: Unknown tokenizer: undefined. This error affects all platforms (Linux, Windows, Mac) and has occurred consistently since v1.121.0, with 44,506 hits on the latest version alone.

Fixes #319618
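The failure mode described in the summary can be sketched in isolation. This is a minimal illustration, not the extension's code: the enum values, the ModelCapabilities interface, the UnguardedEndpoint class, and the describeFailure helper are all hypothetical stand-ins, and acquireTokenizer here is a simplified free function rather than the real TokenizerProvider method.

```typescript
// Assumed enum members and values; only O200K is named in the PR description.
enum TokenizerType {
  CL100K = "cl100k_base",
  O200K = "o200k_base",
}

// Hypothetical shape of the server metadata: the tokenizer field may be absent.
interface ModelCapabilities {
  tokenizer?: TokenizerType;
}

// Simplified stand-in for the endpoint before the fix.
class UnguardedEndpoint {
  readonly tokenizer: TokenizerType;

  constructor(capabilities: ModelCapabilities) {
    // The cast satisfies the compiler but leaves undefined in place at runtime.
    this.tokenizer = capabilities.tokenizer as TokenizerType;
  }
}

// Simplified stand-in for TokenizerProvider.acquireTokenizer().
function acquireTokenizer(endpoint: { tokenizer: TokenizerType }): string {
  switch (endpoint.tokenizer) {
    case TokenizerType.CL100K:
      return "cl100k tokenizer";
    case TokenizerType.O200K:
      return "o200k tokenizer";
    default:
      // Reached when the value is not a known member, including undefined.
      throw new Error(`Unknown tokenizer: ${String(endpoint.tokenizer)}`);
  }
}

// Builds an endpoint from metadata with no tokenizer and reports the error.
function describeFailure(): string {
  try {
    return acquireTokenizer(new UnguardedEndpoint({}));
  } catch (err) {
    return (err as Error).message;
  }
}

console.log(describeFailure()); // Unknown tokenizer: undefined
```

The point of the sketch is that the type system does not catch the problem: the field is typed as a valid tokenizer, so the undefined value only surfaces when the switch falls through to its default branch.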
Recommended reviewer: @lramos15

Culprit Commit
Not a single-commit regression. The error has been present since at least v1.121.0 (first seen 2026-05-20). The root cause is that chatEndpoint.ts line 167 trusts the server API response to always include capabilities.tokenizer, but some models omit this field.

Code Flow
graph TD
    A[CAPI /models API response] -->|capabilities.tokenizer = undefined| B[ChatEndpoint constructor]
    B -->|this.tokenizer = undefined| C[ChatEndpoint instance]
    C -->|endpoint passed to PromptRenderer| D[PromptRenderer constructor]
    D -->|tokenizerProvider.acquireTokenizer endpoint| E[TokenizerProvider.acquireTokenizer]
    E -->|switch on undefined| F[throw Error Unknown tokenizer: undefined]

Affected Files
extensions/copilot/src/platform/endpoint/node/chatEndpoint.ts: producer of the undefined tokenizer value (line 167)
extensions/copilot/src/platform/tokenizer/node/tokenizer.ts: crash site (line 82)
extensions/copilot/src/extension/prompts/node/base/promptRenderer.ts: intermediate caller

Repro Steps
tokenizer field in its capabilities metadata

How the Fix Works
Chosen approach (extensions/copilot/src/platform/endpoint/node/chatEndpoint.ts): Added a nullish coalescing default (?? TokenizerType.O200K) at line 167 where modelMetadata.capabilities.tokenizer is assigned to this.tokenizer. This fixes the issue at the data producer, where the potentially undefined server value is first consumed, rather than at the crash site in acquireTokenizer(). O200K is the standard default used across all other endpoint implementations (BYOK, proxy endpoints, xtab, etc.).

Alternatives considered: Adding a fallback case in TokenizerProvider.acquireTokenizer() would fix at the crash site rather than the data producer, hiding the fact that server data was incomplete and potentially masking other issues downstream that depend on endpoint.tokenizer being a valid enum value.

Recommended Owner
@lramos15: primary author of the endpoint/model-metadata infrastructure in extensions/copilot/src/platform/endpoint/.
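For reference, the chosen approach of defaulting at the data producer might look roughly like the sketch below. It is an illustration under stated assumptions rather than the merged change: ChatEndpointSketch and ModelMetadata are hypothetical names, the enum values are assumed, and only the ?? TokenizerType.O200K expression is taken from the description above.

```typescript
// Assumed enum members and values; only O200K is named in the PR description.
enum TokenizerType {
  CL100K = "cl100k_base",
  O200K = "o200k_base",
}

// Hypothetical shape of the metadata: capabilities.tokenizer is optional.
interface ModelMetadata {
  capabilities: { tokenizer?: TokenizerType };
}

// Simplified stand-in for the endpoint after the fix.
class ChatEndpointSketch {
  readonly tokenizer: TokenizerType;

  constructor(modelMetadata: ModelMetadata) {
    // Nullish coalescing substitutes the default only for null or undefined,
    // so any tokenizer the server does send is kept unchanged.
    this.tokenizer = modelMetadata.capabilities.tokenizer ?? TokenizerType.O200K;
  }
}

// Server omitted the field: the endpoint falls back to O200K.
const withoutField = new ChatEndpointSketch({ capabilities: {} });

// Server sent a value: the endpoint keeps it.
const withField = new ChatEndpointSketch({
  capabilities: { tokenizer: TokenizerType.CL100K },
});

console.log(withoutField.tokenizer === TokenizerType.O200K); // true
console.log(withField.tokenizer === TokenizerType.CL100K); // true
```

Because the default is applied in the constructor, every later consumer of the tokenizer property sees a valid value, which is the argument made above for fixing the producer instead of adding a fallback branch at the crash site.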