Implement the semantic index

Part of https://github.com/posit-dev/ark/issues/1117
Part of https://github.com/posit-dev/positron/issues/2321

To support static analysis in the LSP, we plan to introduce a semantic index modelled after ty's implementation (https://github.com/astral-sh/ruff/tree/main/crates/ty_python_semantic/src/semantic_index). The index contains in particular:

- A tree of scopes with a symbol table for every scope. Symbol flags indicate for each scope whether a symbol is used or bound.
- Definitions indexed by syntax node (`definitions_by_node`), each carrying a `DefinitionKind` that classifies the syntactic construct (assignment, parameter, for-variable, etc.). This is the input to type inference: given a definition, inspect the syntax node to determine the type. As such, definitions bridge the semantic index and type checking.
- Use-def maps which track which definitions reach each use through control flow.

This data structure will be built in a similar fashion to how the current diagnostics for e.g. unused variables work (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/diagnostics.rs), with recursive descent over the syntax tree. Whereas our current diagnostics path mixes scope analysis and diagnostics emission, we plan to separate the concerns in the future by first building the semantic index, then using it to emit diagnostics.

The index will provide the foundation for new features (like rename and jump-to-def) and existing ones:

- Rename: scope-aware, within a file https://github.com/posit-dev/positron/issues/4525
- Go-to-definition: jump to binding site https://github.com/posit-dev/positron/issues/8631
- Completions: symbols in scope at cursor
- Outline / document symbols: top-level bindings
- Diagnostics: "symbol not in scope" with proper scoping

It will also be the input for our first type inference (is this binding a function, supporting diagnostics like "object of type closure is not subsettable").

Design-wise, the index is a nice middle ground regarding the thoughts in https://github.com/rust-lang/rust-analyzer/issues/8713 and https://hackmd.io/@matklad/rJzQhvk2u.

- There is no separate IR or lossy lowering as in Rust-Analyzer, instead the index is a flat overlay on the syntax tree (arenas of metadata laid out in preorder and keyed by `TextRange`).

- In Rust-Analyzer, the semantic information is encoded in ID objects representing a state in the Salsa DB. This makes the types opaque and hinders debuggability. In ty, semantic information is encoded in full structs that contain an explicit reference to the Salsa DB lifetime `'db`. That's only the case for the tracked structs, all internal data structures are plain data without `'db`. The structs can be deparsed and logged at any point, which makes development / debugging easier.

The index can be built in multiple steps:

- [x] First the scopes, symbol tables, and definitions
- [ ] Use-def maps for precise flow analysis can come later
- [ ] Tie in Salsa and the `'db` lifetimes for incremental computation on the index


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement the semantic index #1141

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Implement the semantic index #1141

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions