
Perf: Convert keyword/operator arrays to Sets in Monarch tokenizer #582

@tnaum-ms

Description


Summary

The resolveCases function in monarchRunner.ts uses Array.includes(), which is O(n), to check whether a matched token belongs to a named category (keywords, BSON constructors, shell commands, operators). Converting these arrays to Set objects would make each lookup O(1) on average.

Context

From PR #580 review item I-05.

File: src/documentdb/shell/highlighting/monarchRunner.ts (line ~195)

if (Array.isArray(array) && (array as string[]).includes(matchedText)) {

With 47 keywords, 10 BSON constructors, and 8 shell commands, a linear scan runs for every identifier token during tokenization. For typical shell input (< 200 chars), the cost is still well within the 0.5 ms target, and it is not a correctness issue.
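As a rough worst-case bound, assuming each identifier goes through a cases block that tests all three lists and misses in every one, and that identifiers are at least one character long and separated by at least one delimiter (operators are left out because no list size is given for them):

```latex
\begin{aligned}
c_{\text{token}} &= 47 + 10 + 8 = 65 \quad \text{string comparisons per identifier} \\
n_{\text{tokens}} &\le \lceil 199 / 2 \rceil = 100 \quad \text{for input under 200 characters} \\
c_{\text{total}} &\le 100 \times 65 = 6{,}500 \quad \text{string comparisons}
\end{aligned}
```

Under the same assumptions, Set lookups would reduce this to one hash lookup per list per identifier, i.e. at most 3 × 100 = 300.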

Proposed Fix

Lazily convert the readonly string[] arrays in MonarchLanguageRules to Set<string> on first use (e.g., via a WeakMap<MonarchLanguageRules, Map<string, Set<string>>> cache), then use Set.has() instead of Array.includes().
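A minimal sketch of the proposed cache, keeping the WeakMap<MonarchLanguageRules, Map<string, Set<string>>> shape from the proposal. It assumes MonarchLanguageRules can be modeled as a read-only record whose word lists are string arrays; the stand-in type, the isInList helper, and the sample word lists are hypothetical and not the existing code in monarchRunner.ts.

```typescript
// Hypothetical stand-in for MonarchLanguageRules: a rules object whose
// attributes include named word lists (e.g. "keywords") that cases guards
// such as "@keywords" refer to. The real type in the codebase may differ.
type MonarchLanguageRules = Readonly<Record<string, unknown>>;

// Per-rules-object cache of list name -> Set, built lazily. The WeakMap key
// lets a discarded rules object be garbage-collected together with its Sets.
const listSetCache = new WeakMap<MonarchLanguageRules, Map<string, Set<string>>>();

// Hypothetical drop-in for `(array as string[]).includes(matchedText)`:
// the first lookup for a list builds its Set; later lookups use Set.has().
function isInList(rules: MonarchLanguageRules, listName: string, matchedText: string): boolean {
    let sets = listSetCache.get(rules);
    if (sets === undefined) {
        sets = new Map<string, Set<string>>();
        listSetCache.set(rules, sets);
    }

    let set = sets.get(listName);
    if (set === undefined) {
        const array = rules[listName];
        if (!Array.isArray(array)) {
            // Same result as the current Array.isArray() guard: not a word list.
            return false;
        }
        set = new Set<string>(array as string[]);
        sets.set(listName, set);
    }
    return set.has(matchedText);
}

// Illustrative rules object (sample lists, not the real keyword set).
const rules: MonarchLanguageRules = {
    keywords: ['const', 'let', 'function'],
    shellCommands: ['show', 'use', 'help'],
    tokenPostfix: '.js', // a non-array attribute
};

console.log(isInList(rules, 'keywords', 'let'));     // true
console.log(isInList(rules, 'shellCommands', 'db')); // false
console.log(isInList(rules, 'tokenPostfix', '.js')); // false: not an array
```

Caching one Set per list on first use is only safe because the lists are readonly; if a list could change after its first lookup, the cached Set would go stale.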

Priority

Low: this is a minor performance optimization. Current performance is acceptable for all practical shell input sizes.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
