Does this issue occur when all extensions are disabled?: Yes/No
- VS Code Version: 1.92.2
- OS Version:
I have discovered an issue in vscode's LineTokens metadata generation, specifically related to the 8-bit limit for LanguageID.
According to the official blog (Syntax Highlighting Optimizations), vscode uses the lowest 8 bits of the metadata integer for LanguageID and the next three bits (bits 9-11) for TokenType.
In getDefaultMetadata, the value is constructed as follows:
// vscode/src/vs/editor/common/tokens/contiguousTokensStore.ts
public getTokens(topLevelLanguageId: string, lineIndex: number, lineText: string): LineTokens {
	let rawLineTokens: Uint32Array | ArrayBuffer | null = null;
	if (lineIndex < this._len) {
		rawLineTokens = this._lineTokens[lineIndex];
	}
	if (rawLineTokens !== null && rawLineTokens !== EMPTY_LINE_TOKENS) {
		return new LineTokens(toUint32Array(rawLineTokens), lineText, this._languageIdCodec);
	}
	const lineTokens = new Uint32Array(2);
	lineTokens[0] = lineText.length;
	lineTokens[1] = getDefaultMetadata(this._languageIdCodec.encodeLanguageId(topLevelLanguageId));
	return new LineTokens(lineTokens, lineText, this._languageIdCodec);
}
...
function getDefaultMetadata(topLevelLanguageId) {
	return ((topLevelLanguageId << MetadataConsts.LANGUAGEID_OFFSET) // <- here
		| (StandardTokenType.Other << MetadataConsts.TOKEN_TYPE_OFFSET)
		| ... ) >>> 0;
}
Reproduction
When the number of registered languages exceeds 255 (the 8-bit limit), the topLevelLanguageId values of the subsequent languages no longer fit in the LanguageID field. Because topLevelLanguageId is shifted and bitwise OR-ed with StandardTokenType without being masked, its high bits spill into the bits reserved for TokenType.
For example, a LanguageID of 325 (binary 1 0100 0101) carries its ninth bit into the TokenType bit field: the metadata becomes xxxx 001 0100 0101 when it should be xxxx 000 (1)0100 0101. The TokenType field, which should hold StandardTokenType.Other (0), is erroneously changed to StandardTokenType.Comment (1) by this carry-over.
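The arithmetic in this example can be checked with a small standalone sketch. It is not the vscode implementation: the offsets and masks below are assumptions taken from the bit layout described above (LanguageID in the lowest 8 bits, TokenType in the three bits after it), and buildMetadata, decodeLanguageId, and decodeTokenType are hypothetical helpers reduced to the two fields discussed here.

```typescript
// Assumed layout, taken from the description in this issue (not copied from the vscode source):
// bits 0-7 hold the language ID, bits 8-10 hold the token type.
const LANGUAGEID_OFFSET = 0;
const TOKEN_TYPE_OFFSET = 8;
const LANGUAGEID_MASK = 0xff;
const TOKEN_TYPE_MASK = 0b111 << TOKEN_TYPE_OFFSET;

const TOKEN_TYPE_OTHER = 0;   // stands in for StandardTokenType.Other
const TOKEN_TYPE_COMMENT = 1; // stands in for StandardTokenType.Comment

// Hypothetical stand-in for getDefaultMetadata: no mask is applied to languageId.
function buildMetadata(languageId: number, tokenType: number): number {
	return ((languageId << LANGUAGEID_OFFSET) | (tokenType << TOKEN_TYPE_OFFSET)) >>> 0;
}

// Read the language ID field back out of the packed integer.
function decodeLanguageId(packed: number): number {
	return (packed & LANGUAGEID_MASK) >>> LANGUAGEID_OFFSET;
}

// Read the token type field back out of the packed integer.
function decodeTokenType(packed: number): number {
	return (packed & TOKEN_TYPE_MASK) >>> TOKEN_TYPE_OFFSET;
}

const corrupted = buildMetadata(325, TOKEN_TYPE_OTHER);
console.log(corrupted.toString(2));        // 101000101
console.log(decodeLanguageId(corrupted));  // 69
console.log(decodeTokenType(corrupted));   // 1 (Comment), although Other (0) was requested
```

Under these assumptions, the ninth bit of 325 is read back as a token type of 1, and the language ID decodes as 69 rather than 325.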
Actual
Since the quick suggestions logic (editor.quickSuggestions) relies on TokenType being Other to trigger by default, this metadata corruption causes auto-completion to be silently disabled for all languages with an ID >= 256.
Expected
Even though it might not be a long-term goal to support more than 255 languages in a single instance, this behavior is a silent failure that leads to difficult-to-debug issues. I propose implementing a bitwise mask or modulo operation on topLevelLanguageId (e.g., topLevelLanguageId & 0xFF) before shifting to ensure that overflow does not contaminate the TokenType field.
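A minimal sketch of the proposed mask, under the same assumed bit layout as described above. The names buildMaskedMetadata, LANGUAGE_ID_FIELD_MASK, and TOKEN_TYPE_SHIFT are hypothetical and only illustrate the idea; they are not existing vscode identifiers.

```typescript
// Assumed layout: language ID in bits 0-7, token type starting at bit 8.
const LANGUAGE_ID_BITS = 8;
const LANGUAGE_ID_FIELD_MASK = (1 << LANGUAGE_ID_BITS) - 1; // 0xFF
const TOKEN_TYPE_SHIFT = 8;
const TOKEN_TYPE_FIELD_MASK = 0b111;

// Hypothetical variant of getDefaultMetadata that masks the language ID before packing it.
function buildMaskedMetadata(languageId: number, tokenType: number): number {
	const maskedLanguageId = languageId & LANGUAGE_ID_FIELD_MASK;
	return (maskedLanguageId | (tokenType << TOKEN_TYPE_SHIFT)) >>> 0;
}

const masked = buildMaskedMetadata(325, 0);
console.log((masked >>> TOKEN_TYPE_SHIFT) & TOKEN_TYPE_FIELD_MASK); // 0, the token type stays Other
console.log(masked & LANGUAGE_ID_FIELD_MASK);                       // 69, the language ID is truncated
```

The mask keeps the TokenType field intact, but an ID of 325 is then stored as 69, so it becomes indistinguishable from language 69 when decoded. It prevents the silent corruption of the neighbouring field without lifting the 255-language limit itself.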