Skip to content

Move LSP feature logic into kson-tooling with shared parse via ToolingDocument#346

Merged
holodorum merged 16 commits intokson-org:mainfrom
holodorum:lsp-to-tooling
Mar 29, 2026
Merged

Move LSP feature logic into kson-tooling with shared parse via ToolingDocument#346
holodorum merged 16 commits intokson-org:mainfrom
holodorum:lsp-to-tooling

Conversation

@holodorum
Copy link
Copy Markdown
Collaborator

The TypeScript LSP services contained significant tree-walking and token-classification logic that duplicated
what the Kotlin AST already knows. Moving this logic into kson-tooling gives it direct AST access (eliminating
fragile heuristics like lookahead-based token classification), makes it testable in Kotlin multiplatform tests,
and reduces the TS layer to pure LSP plumbing.

The redundant parsing was the other key problem: each feature independently called parseToAst, so a single
keystroke could trigger 6 parses. ToolingDocument wraps one error-tolerant parse result shared by all builders.

holodorum and others added 16 commits March 23, 2026 20:24
The pull configuration model (a1c9d2b) introduced an inherent race:
the server receives a didChangeConfiguration notification and then
makes an async round-trip to pull the new settings.  A format request
issued immediately after a config change can arrive before that pull
completes, producing output with stale settings.

Fix by polling in the test: format and re-check until the expected
output appears, with a timeout.  Extract a `formatAndAwait` helper
to keep the test bodies clean and document the race in its JSDoc.
Move document symbol and semantic token logic into Kotlin where it has
direct AST access. DocumentSymbolBuilder recursively walks the KsonValue
tree. SemanticTokenBuilder uses AST-aware key token collection instead
of fragile lookahead heuristics. Both use domain-pure enums with no LSP
coupling.
Replace tree-walking and token classification logic with thin wrappers
that delegate to kson-tooling. TypeScript now only maps domain-pure
Kotlin enums to LSP constants. DocumentSymbolService drops from 152 to
~60 LOC; SemanticTokensService from 104 to ~50 LOC.
Tighten assertions across the integration test suite: replace >= checks
with exact values for semantic token counts, document highlight counts,
and document symbol children. Add error-handling tests that verify each
service handler returns a safe default when its underlying service throws.
These tests exercise the try/catch boundaries that protect the LSP
connection from individual service failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Token-stream based FoldingRangeService that matches open/close pairs
({}, [], $tag..$$) using a stack. Only creates fold ranges when the
open and close tokens are on different lines. Server advertises
foldingRangeProvider capability; 8 new tests cover single-line,
multi-line, nested, embed, and mixed scenarios.

Note: registering a foldingRangeProvider replaces VS Code's built-in
indentation/bracket folding for the language, so this service must
provide ranges for all structural constructs — not just embeds.
Walks the KsonValue AST to build a parent chain of ranges from
innermost node to full document. Handles objects (key → property →
container), arrays (element → array), and leaf values. Deduplicates
identical adjacent ranges. 9 new tests verify hierarchy correctness
including nested objects, arrays, and the strictly-expanding property.
Delete IndexedDocumentSymbols.ts and SymbolPositionIndex.test.ts,
replaced by Kotlin-side SiblingKeyBuilder. Clean up KsonDocument.ts
and KsonTextDocumentService.ts to remove IndexedDocumentSymbols usage.
The 5 non-diagnostic builders in kson-tooling each independently called
KsonCore.parseToAst, so a single document change could trigger up to 6
parses. ToolingDocument wraps a single error-tolerant parse result that
is shared across all builder calls.

Refactored builders to accept pre-parsed data instead of raw strings:
- SemanticTokenBuilder.build now takes tokens + ast
- FoldingRangeBuilder.build now takes tokens
- SelectionRangeBuilder.build now takes KsonValue
- SiblingKeyBuilder.build now takes KsonValue
- DocumentSymbolBuilder.build unchanged (already took KsonValue)

KsonTooling.parse(content) creates a ToolingDocument; the 5 feature
methods now accept ToolingDocument instead of String. Diagnostics keeps
its own strict parse path for accurate error messages.

Because ToolingDocument parses with ignoreErrors=true, features like
folding and semantic tokens now return partial results for broken
documents rather than empty — better editor behavior.
KsonDocument now holds a lazily-created ToolingDocument, created once
per content change via getToolingDocument(). All five non-diagnostic
TS services now accept ToolingDocument directly, so a single document
change triggers one parse instead of five.

Services updated to accept ToolingDocument:
- DocumentSymbolService.getDocumentSymbols
- SemanticTokensService.getSemanticTokens
- FoldingRangeService.getFoldingRanges
- SelectionRangeService.getSelectionRanges
- DocumentHighlightService.getDocumentHighlights

To enable SelectionRangeService to take ToolingDocument (instead of
KsonDocument for getFullDocumentRange()), KsonTooling.getEnclosingRanges
now includes the full-document range as the outermost entry, computed
from the EOF token position. This eliminates the previous contract
where the caller had to append it.

DiagnosticService unchanged — keeps its own strict parse path via
validateDocument(content, schemaContent).
With ignoreErrors=true, the parser produces an AST where the root is a
valid KsonRootImpl but child nodes may contain AstNodeError nodes. The
error-walking step is skipped, so hasErrors() returns false, and the
lazy ksonValue property proceeds to call toKsonValue() which throws
ShouldNotHappenException or UnsupportedOperationException on those
error nodes.

The fix: catch RuntimeException in ksonValue and return null, which is
already the documented contract for "errors during parse." This removes
the need for callers to defensively catch — ToolingDocument had a
20-line workaround for exactly this, now replaced by a direct delegation
to parseResult.ksonValue.
The document symbol tree was being rebuilt from the KsonValue AST on
every call to getDocumentSymbols() and getSiblingKeys(). Since
getSiblingKeys() fires on every cursor position change via
DocumentHighlightService, this meant a full recursive tree construction
on every keystroke in large documents.

ToolingDocument now lazily builds and caches the symbol tree, shared
across both getDocumentSymbols() and getSiblingKeys(). SiblingKeyBuilder
accepts the pre-built symbol list instead of rebuilding it from
KsonValue.
The four schema-aware methods (getSchemaInfoAtLocation,
getSchemaLocationAtLocation, resolveRefAtLocation,
getCompletionsAtLocation) previously accepted raw content strings and
re-parsed them on every call. This meant the document was parsed up to
3 times and the schema once per invocation.

These methods now accept ToolingDocument, which callers cache. This
eliminates redundant parsing across repeated calls (e.g. hover then
completion on the same document version).

Internal collaborators are updated to match:
- KsonValuePathBuilder accepts an optional pre-parsed KsonValue for
  navigation, still doing its own strict parse for token context only
- SchemaFilteringService accepts KsonValue? instead of String
- ResolvedSchemaContext accepts pre-parsed values instead of strings

ToolingDocument gains two new properties:
- content: exposes the raw string for KsonValuePathBuilder's strict
  token analysis and recovery path
- strictKsonValue: a lazily-computed strict parse for accurate location
  information, since gap-free parsing (used by ToolingDocument) produces
  slightly broader node spans that include trailing whitespace

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
KsonDocument now caches a ToolingDocument for its associated schema,
and KsonSchemaDocument does the same for its metaschema. The three
schema-aware services (HoverService, CompletionService,
DefinitionService) pass these cached ToolingDocuments to the Kotlin
API instead of raw content strings.

This eliminates all redundant parsing in the schema-aware path:
across repeated hover, completion, and definition requests on the same
document version, neither the document nor the schema is re-parsed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kotlin:
- Make ResolvedSchemaContext.resolveAndFilterSchemas return non-nullable:
  the null-parse check moved to callers, so the method always returns a
  value. Remove dead ?: return null checks from all four call sites.
- Fix KsonValuePathBuilder test to use gap-free ksonValue (via
  KsonTooling.parse) matching the production path, instead of strict
  parse which trivially equals the fallback.

TypeScript:
- Restructure DefinitionService to avoid non-null assertions on schema
  URI retrieval. Compute schemaUri alongside schemaToolingDoc in a
  single branch, using optional chaining instead of !.
- Add isKsonSchemaDocument check to CompletionService, matching the
  pattern in HoverService and DefinitionService. Previously completions
  would silently return null when editing schema files.
- Remove restating comment in HoverService.
In gap-free mode, KsonMarker.done() records lastTokenIndex as
getTokenIndex()-1. Because advanceLexer() skips past WHITESPACE and
COMMENT tokens, this index can point to a trailing whitespace token
rather than the meaningful closing token (e.g. "}"). Since
AstNodeImpl.location computes from sourceTokens.last(), AST node
locations would extend into the whitespace following the construct —
potentially spanning additional lines.

Fix done() to walk lastTokenIndex back past WHITESPACE tokens for
non-error nodes, so the marker always ends at the last meaningful
token.  Two important distinctions from a naive "skip all ignored
tokens" approach:

- Only skip WHITESPACE, not COMMENT — comments carry semantic content
  that nodes must retain (e.g. trailing comments on formatted output).

- Skip the walk-back entirely for ERROR nodes — error nodes must
  retain all their tokens including trailing whitespace so they can
  faithfully reconstruct the original source.  (The gap-free comment
  in KsonCore.parseToAst explicitly notes this: "we tokenize gapFree
  when we are errorTolerant so that error nodes can reconstruct their
  whitespace".)

Also removes the existing dropLastWhile workaround in
ObjectPropertyNodeError which was compensating for this same issue.

With gap-free positions now matching strict positions, eliminate two
sources of redundant parsing:

- Remove ToolingDocument.strictKsonValue (lazy strict re-parse of
  schema documents). Schema-aware methods now use ksonValue directly.

- Refactor KsonValuePathBuilder to accept pre-lexed tokens from
  ToolingDocument, filtering out WHITESPACE/COMMENT internally. This
  eliminates the strict re-parse that previously happened on every
  hover, completion, and definition request. Falls back to strict
  parse when no tokens are provided (backward compatibility).

Update SelectionRangeTest expectations to match corrected gap-free
positions. Add tests for gap-free/strict position equivalence and
for the pre-lexed tokens path in KsonValuePathBuilder.
@holodorum holodorum merged commit 4efbe3b into kson-org:main Mar 29, 2026
1 check passed
@holodorum holodorum deleted the lsp-to-tooling branch March 29, 2026 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant