fix: default tokenizer to O200K when server omits it (fixes #319618) #319620

Merged
vs-code-engineering[bot] merged 1 commit into main from fix/unknown-tokenizer-undefined-319618-e748906f1f34b706 on Jun 2, 2026

@vs-code-engineering
Contributor

Summary

The CAPI model metadata API can return models whose capabilities lack a tokenizer field. When this happens, ChatEndpoint.tokenizer is set to undefined, and TokenizerProvider.acquireTokenizer() throws Error: Unknown tokenizer: undefined. The error affects all platforms (Linux, Windows, Mac) and has occurred consistently since v1.121.0, with 44,506 hits on the latest version alone.

Fixes #319618
Recommended reviewer: @lramos15

Culprit Commit

Not a single-commit regression. The error has been present since at least v1.121.0 (first seen 2026-05-20). The root cause is that chatEndpoint.ts line 167 trusts the server API response to always include capabilities.tokenizer, but some models omit this field.

Code Flow

graph TD
    A[CAPI /models API response] -->|capabilities.tokenizer = undefined| B[ChatEndpoint constructor]
    B -->|this.tokenizer = undefined| C[ChatEndpoint instance]
    C -->|endpoint passed to PromptRenderer| D[PromptRenderer constructor]
    D -->|tokenizerProvider.acquireTokenizer endpoint| E[TokenizerProvider.acquireTokenizer]
    E -->|switch on undefined| F[throw Error Unknown tokenizer: undefined]

Affected Files

  • extensions/copilot/src/platform/endpoint/node/chatEndpoint.ts — producer of the undefined tokenizer value (line 167)
  • extensions/copilot/src/platform/tokenizer/node/tokenizer.ts — crash site (line 82)
  • extensions/copilot/src/extension/prompts/node/base/promptRenderer.ts — intermediate caller

Repro Steps

  1. Have a Copilot model available via CAPI that does not include a tokenizer field in its capabilities metadata
  2. Trigger any chat interaction that uses that model (e.g., send a message)
  3. The PromptRenderer attempts to acquire a tokenizer for the endpoint, and acquireTokenizer() throws Error: Unknown tokenizer: undefined
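
The failure described in these steps can be sketched in isolation. This is a minimal illustration, not the extension's code: the enum values, the ModelCapabilities interface, and the body of acquireTokenizer below are hypothetical stand-ins that only assume the crash site is a switch with a throwing default branch, as the code flow above suggests.

```typescript
// Hypothetical stand-ins; the real definitions live in the Copilot extension sources.
enum TokenizerType {
    CL100K = 'cl100k_base',
    O200K = 'o200k_base',
}

interface ModelCapabilities {
    // The server may omit this field entirely.
    tokenizer?: TokenizerType;
}

// Assumed shape of the crash site: a switch whose default branch throws.
function acquireTokenizer(endpoint: { tokenizer: TokenizerType }): string {
    switch (endpoint.tokenizer) {
        case TokenizerType.CL100K:
            return 'cl100k tokenizer';
        case TokenizerType.O200K:
            return 'o200k tokenizer';
        default:
            throw new Error(`Unknown tokenizer: ${endpoint.tokenizer}`);
    }
}

// Simulated server response with no tokenizer field.
const capabilities: ModelCapabilities = {};

// The cast models the endpoint trusting the response without checking it.
const endpoint = { tokenizer: capabilities.tokenizer as TokenizerType };

let message = '';
try {
    acquireTokenizer(endpoint);
} catch (err) {
    message = (err as Error).message;
}
console.log(message); // prints "Unknown tokenizer: undefined"
```

The type system does not catch this because the value comes from an untyped server payload, so the declared type and the runtime value can disagree.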

How the Fix Works

Chosen approach (extensions/copilot/src/platform/endpoint/node/chatEndpoint.ts): Added a nullish coalescing default (?? TokenizerType.O200K) at line 167 where modelMetadata.capabilities.tokenizer is assigned to this.tokenizer. This fixes the issue at the data producer — where the potentially-undefined server value is first consumed — rather than at the crash site in acquireTokenizer(). O200K is the standard default used across all other endpoint implementations (BYOK, proxy endpoints, xtab, etc.).
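
As a rough sketch of the chosen approach, the snippet below applies a nullish coalescing default where the metadata value is first consumed. ChatEndpointSketch, the ModelMetadata interface, and the enum string values are assumptions made for illustration; only the `?? TokenizerType.O200K` expression is taken from the description above.

```typescript
// Hypothetical stand-ins for the extension's real types.
enum TokenizerType {
    CL100K = 'cl100k_base',
    O200K = 'o200k_base',
}

interface ModelMetadata {
    capabilities: { tokenizer?: TokenizerType };
}

class ChatEndpointSketch {
    readonly tokenizer: TokenizerType;

    constructor(modelMetadata: ModelMetadata) {
        // Fall back to O200K only when the server value is null or undefined;
        // any tokenizer the server does send is kept unchanged.
        this.tokenizer = modelMetadata.capabilities.tokenizer ?? TokenizerType.O200K;
    }
}

const withoutTokenizer = new ChatEndpointSketch({ capabilities: {} });
const withTokenizer = new ChatEndpointSketch({
    capabilities: { tokenizer: TokenizerType.CL100K },
});

console.log(withoutTokenizer.tokenizer); // prints "o200k_base"
console.log(withTokenizer.tokenizer); // prints "cl100k_base"
```

Because the field is typed as a non-optional TokenizerType after construction, every consumer of the endpoint sees a valid enum value without needing its own check.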

Alternatives considered: Adding a fallback case in TokenizerProvider.acquireTokenizer() would fix the error at the crash site rather than at the data producer. That would hide the fact that the server data was incomplete and could mask other downstream issues that depend on endpoint.tokenizer being a valid enum value.
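
To make the trade-off concrete, here is a hedged sketch of the rejected alternative. The function acquireTokenizerWithFallback and the consumer describeEndpoint are hypothetical names invented for this example; they only illustrate that a fallback at the crash site leaves the endpoint's own field undefined for any other reader.

```typescript
// Hypothetical stand-ins for the extension's real types.
enum TokenizerType {
    CL100K = 'cl100k_base',
    O200K = 'o200k_base',
}

interface EndpointLike {
    tokenizer: TokenizerType | undefined;
}

// Rejected alternative: recover inside the consumer instead of the producer.
function acquireTokenizerWithFallback(endpoint: EndpointLike): TokenizerType {
    switch (endpoint.tokenizer) {
        case TokenizerType.CL100K:
        case TokenizerType.O200K:
            return endpoint.tokenizer;
        default:
            return TokenizerType.O200K;
    }
}

// Hypothetical second consumer that reads the field directly.
function describeEndpoint(endpoint: EndpointLike): string {
    return `tokenizer=${endpoint.tokenizer}`;
}

const endpoint: EndpointLike = { tokenizer: undefined };

const acquired = acquireTokenizerWithFallback(endpoint);
const description = describeEndpoint(endpoint);

console.log(acquired); // prints "o200k_base"
console.log(description); // prints "tokenizer=undefined"
```

The first call no longer throws, but the second consumer still observes the invalid value, which is the masking risk described above.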

Recommended Owner

@lramos15 — primary author of the endpoint/model-metadata infrastructure in extensions/copilot/src/platform/endpoint/.

Generated by errors-fix · ● 65.4M ·

When the CAPI model metadata API returns a model without a tokenizer
field, ChatEndpoint.tokenizer would be undefined, causing
acquireTokenizer() to throw 'Unknown tokenizer: undefined'.

Default to TokenizerType.O200K (the most common tokenizer) when the
server response omits the field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 16:55
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@vs-code-engineering vs-code-engineering Bot requested review from Copilot and lramos15 June 2, 2026 16:58
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@vs-code-engineering vs-code-engineering Bot marked this pull request as ready for review June 2, 2026 16:58
@vs-code-engineering vs-code-engineering Bot enabled auto-merge (squash) June 2, 2026 16:58
@vs-code-engineering vs-code-engineering Bot merged commit b789347 into main Jun 2, 2026
25 checks passed
@vs-code-engineering vs-code-engineering Bot deleted the fix/unknown-tokenizer-undefined-319618-e748906f1f34b706 branch June 2, 2026 18:55
@vs-code-engineering vs-code-engineering Bot added this to the 1.124.0 milestone Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

[Error] unhandlederror-Unknown tokenizer: undefined
