Auto-completion disabled for languages with ID >= 256 due to LanguageID bits overlapping with TokenType bits #319118

@Smileslime47

Description

Does this issue occur when all extensions are disabled?: Yes/No

  • VS Code Version: 1.92.2
  • OS Version:

I have discovered an issue in vscode's LineTokens metadata generation, specifically related to the 8-bit limit for LanguageID.

According to the official blog (Syntax Highlighting Optimizations), vscode utilizes the first 8 bits of the metadata integer for LanguageID and bits 9-11 for TokenType.

In getDefaultMetadata, the value is constructed as follows:

// vscode/src/vs/editor/common/tokens/contiguousTokensStore.ts

public getTokens(topLevelLanguageId: string, lineIndex: number, lineText: string): LineTokens {
	let rawLineTokens: Uint32Array | ArrayBuffer | null = null;
	if (lineIndex < this._len) {
		rawLineTokens = this._lineTokens[lineIndex];
	}

	if (rawLineTokens !== null && rawLineTokens !== EMPTY_LINE_TOKENS) {
		return new LineTokens(toUint32Array(rawLineTokens), lineText, this._languageIdCodec);
	}

	const lineTokens = new Uint32Array(2);
	lineTokens[0] = lineText.length;
	lineTokens[1] = getDefaultMetadata(this._languageIdCodec.encodeLanguageId(topLevelLanguageId));
	return new LineTokens(lineTokens, lineText, this._languageIdCodec);
}

...

function getDefaultMetadata(topLevelLanguageId) {
    return ((topLevelLanguageId << MetadataConsts.LANGUAGEID_OFFSET) // <- here
        | (StandardTokenType.Other << MetadataConsts.TOKEN_TYPE_OFFSET)
        | ... ) >>> 0;
}

Reproduction

When the number of registered languages exceeds 255 (the 8-bit limit), the topLevelLanguageId values of subsequent languages no longer fit in the LanguageID field. Because topLevelLanguageId is shifted and bitwise OR-ed with StandardTokenType without being masked, its high bits spill into the bits reserved for TokenType.

For example, a LanguageID of 325 (binary 1 0100 0101) spills into the TokenType bit field: the metadata becomes xxxx 001 0100 0101 when it should be xxxx 000 (1)0100 0101. The TokenType field, which should hold StandardTokenType.Other (0), is erroneously read as StandardTokenType.Comment (1).
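
The arithmetic above can be reproduced with a small standalone sketch. The constants and helper names below (packMetadata, decodeTokenType, decodeLanguageId) are hypothetical and only follow the bit layout described in this issue; they are not copied from the vscode source.

```typescript
// Assumed layout (from the description in this issue, not from the vscode source):
// bits 1-8 hold the LanguageID, bits 9-11 hold the TokenType.
const LANGUAGEID_OFFSET = 0;
const TOKEN_TYPE_OFFSET = 8;
const LANGUAGEID_MASK = 0xff;
const TOKEN_TYPE_MASK = 0b111 << TOKEN_TYPE_OFFSET;

const TOKEN_TYPE_OTHER = 0;
const TOKEN_TYPE_COMMENT = 1;

// Packs the two fields without masking, like the expression in getDefaultMetadata.
function packMetadata(languageId: number, tokenType: number): number {
	return ((languageId << LANGUAGEID_OFFSET) | (tokenType << TOKEN_TYPE_OFFSET)) >>> 0;
}

function decodeTokenType(metadata: number): number {
	return (metadata & TOKEN_TYPE_MASK) >>> TOKEN_TYPE_OFFSET;
}

function decodeLanguageId(metadata: number): number {
	return (metadata & LANGUAGEID_MASK) >>> LANGUAGEID_OFFSET;
}

// ID 255 fits in 8 bits, so the TokenType field stays Other (0).
const fits = packMetadata(255, TOKEN_TYPE_OTHER);
console.log(decodeTokenType(fits)); // 0

// ID 325 has bit 9 set, so the TokenType field reads back as Comment (1)
// and the decoded LanguageID is truncated to 69.
const spills = packMetadata(325, TOKEN_TYPE_OTHER);
console.log(decodeTokenType(spills), decodeLanguageId(spills)); // 1 69
```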

Actual

Because the quickSuggestion logic triggers by default only when TokenType is Other, this metadata corruption silently disables auto-completion for all languages with an ID >= 256.

Expected

Even though it might not be a long-term goal to support more than 255 languages in a single instance, this behavior is a silent failure that leads to difficult-to-debug issues. I propose implementing a bitwise mask or modulo operation on topLevelLanguageId (e.g., topLevelLanguageId & 0xFF) before shifting to ensure that overflow does not contaminate the TokenType field.
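
A minimal sketch of the proposed mask, assuming the same 8-bit LanguageID layout described in this issue. The names here (packMetadataMasked, tokenTypeOf, languageIdOf) are hypothetical helpers for illustration, not existing vscode functions.

```typescript
// Assumed layout from this issue: 8 bits of LanguageID, TokenType directly above.
const LANGUAGE_ID_BITS = 8;
const LANGUAGE_ID_FIELD_MASK = (1 << LANGUAGE_ID_BITS) - 1; // 0xFF
const TOKEN_TYPE_SHIFT = LANGUAGE_ID_BITS;

// Hypothetical helper: confine the LanguageID to its own field before packing,
// so that an oversized ID cannot touch the TokenType bits.
function packMetadataMasked(languageId: number, tokenType: number): number {
	const safeLanguageId = languageId & LANGUAGE_ID_FIELD_MASK;
	return (safeLanguageId | (tokenType << TOKEN_TYPE_SHIFT)) >>> 0;
}

function tokenTypeOf(metadata: number): number {
	return (metadata >>> TOKEN_TYPE_SHIFT) & 0b111;
}

function languageIdOf(metadata: number): number {
	return metadata & LANGUAGE_ID_FIELD_MASK;
}

// With the mask, ID 325 leaves the TokenType field at Other (0).
const masked = packMetadataMasked(325, 0);
console.log(tokenTypeOf(masked)); // 0
console.log(languageIdOf(masked)); // 69
```

One consequence of masking alone is that the stored LanguageID for 325 aliases to 69, so the TokenType field is protected but the language recorded in the metadata is still not the real one.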