Skip to content

Implement the semantic index #1141

@lionel-

Description

@lionel-

Part of #1117
Part of posit-dev/positron#2321

To support static analysis in the LSP, we plan to introduce a semantic index modelled after ty's implementation (https://github.com/astral-sh/ruff/tree/main/crates/ty_python_semantic/src/semantic_index). The index contains in particular:

  • A tree of scopes with a symbol table for every scope. Symbol flags indicate for each scope whether a symbol is used or bound.
  • Definitions indexed by syntax node (definitions_by_node), each carrying a DefinitionKind that classifies the syntactic construct (assignment, parameter, for-variable, etc.). This is the input to type inference: given a definition, inspect the syntax node to determine the type. As such, definitions bridge the semantic index and type checking.
  • Use-def maps which track which definitions reach each use through control flow.

This data structure will be built in a similar fashion to how the current diagnostics for e.g. unused variables work (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/diagnostics.rs), with recursive descent over the syntax tree. Whereas our current diagnostics path mixes scope analysis and diagnostics emission, we plan to separate the concerns in the future by first building the semantic index, then using it to emit diagnostics.

The index will provide the foundation for new features (like rename and jump-to-def) and existing ones:

It will also be the input for our first type inference (is this binding a function, supporting diagnostics like "object of type closure is not subsettable").

Design-wise, the index is a nice middle ground regarding the thoughts in rust-lang/rust-analyzer#8713 and https://hackmd.io/@matklad/rJzQhvk2u.

  • There is no separate IR or lossy lowering as in Rust-Analyzer, instead the index is a flat overlay on the syntax tree (arenas of metadata laid out in preorder and keyed by TextRange).

  • In Rust-Analyzer, the semantic information is encoded in ID objects representing a state in the Salsa DB. This makes the types opaque and hinders debuggability. In ty, semantic information is encoded in full structs that contain an explicit reference to the Salsa DB lifetime 'db. That's only the case for the tracked structs, all internal data structures are plain data without 'db. The structs can be deparsed and logged at any point, which makes development / debugging easier.

The index can be built in multiple steps:

  • First the scopes, symbol tables, and definitions
  • Use-def maps for precise flow analysis can come later
  • Tie in Salsa and the 'db lifetimes for incremental computation on the index

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions