Part of #1117
Part of posit-dev/positron#2321
To support static analysis in the LSP, we plan to introduce a semantic index modelled after ty's implementation (https://github.com/astral-sh/ruff/tree/main/crates/ty_python_semantic/src/semantic_index). The index contains in particular:
- A tree of scopes with a symbol table for every scope. Symbol flags indicate for each scope whether a symbol is used or bound.
- Definitions indexed by syntax node (
definitions_by_node), each carrying a DefinitionKind that classifies the syntactic construct (assignment, parameter, for-variable, etc.). This is the input to type inference: given a definition, inspect the syntax node to determine the type. As such, definitions bridge the semantic index and type checking.
- Use-def maps which track which definitions reach each use through control flow.
This data structure will be built in a similar fashion to how the current diagnostics for e.g. unused variables work (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/diagnostics.rs), with recursive descent over the syntax tree. Whereas our current diagnostics path mixes scope analysis and diagnostics emission, we plan to separate the concerns in the future by first building the semantic index, then using it to emit diagnostics.
The index will provide the foundation for new features (like rename and jump-to-def) and existing ones:
It will also be the input for our first type inference (is this binding a function, supporting diagnostics like "object of type closure is not subsettable").
Design-wise, the index is a nice middle ground regarding the thoughts in rust-lang/rust-analyzer#8713 and https://hackmd.io/@matklad/rJzQhvk2u.
-
There is no separate IR or lossy lowering as in Rust-Analyzer, instead the index is a flat overlay on the syntax tree (arenas of metadata laid out in preorder and keyed by TextRange).
-
In Rust-Analyzer, the semantic information is encoded in ID objects representing a state in the Salsa DB. This makes the types opaque and hinders debuggability. In ty, semantic information is encoded in full structs that contain an explicit reference to the Salsa DB lifetime 'db. That's only the case for the tracked structs, all internal data structures are plain data without 'db. The structs can be deparsed and logged at any point, which makes development / debugging easier.
The index can be built in multiple steps:
Part of #1117
Part of posit-dev/positron#2321
To support static analysis in the LSP, we plan to introduce a semantic index modelled after ty's implementation (https://github.com/astral-sh/ruff/tree/main/crates/ty_python_semantic/src/semantic_index). The index contains in particular:
definitions_by_node), each carrying aDefinitionKindthat classifies the syntactic construct (assignment, parameter, for-variable, etc.). This is the input to type inference: given a definition, inspect the syntax node to determine the type. As such, definitions bridge the semantic index and type checking.This data structure will be built in a similar fashion to how the current diagnostics for e.g. unused variables work (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/diagnostics.rs), with recursive descent over the syntax tree. Whereas our current diagnostics path mixes scope analysis and diagnostics emission, we plan to separate the concerns in the future by first building the semantic index, then using it to emit diagnostics.
The index will provide the foundation for new features (like rename and jump-to-def) and existing ones:
It will also be the input for our first type inference (is this binding a function, supporting diagnostics like "object of type closure is not subsettable").
Design-wise, the index is a nice middle ground regarding the thoughts in rust-lang/rust-analyzer#8713 and https://hackmd.io/@matklad/rJzQhvk2u.
There is no separate IR or lossy lowering as in Rust-Analyzer, instead the index is a flat overlay on the syntax tree (arenas of metadata laid out in preorder and keyed by
TextRange).In Rust-Analyzer, the semantic information is encoded in ID objects representing a state in the Salsa DB. This makes the types opaque and hinders debuggability. In ty, semantic information is encoded in full structs that contain an explicit reference to the Salsa DB lifetime
'db. That's only the case for the tracked structs, all internal data structures are plain data without'db. The structs can be deparsed and logged at any point, which makes development / debugging easier.The index can be built in multiple steps:
'dblifetimes for incremental computation on the index