Add name_spans and body_span to Node for graph↔text navigation #30

Merged
melonamin merged 5 commits into master from feat/issue-20-node-name-spans
Apr 13, 2026

@melonamin
Member

Closes #20. First step from the #17 breakdown.

What

Extends Node with two source-location fields that future navigation features need:

  • name_spans: Span[] — every occurrence of the node's name in the statement source (declaration + references). Enables the UI to cycle through references with ◀ n/total ▶ controls.
  • body_span?: Span — for CTE nodes, the parenthesized subquery body after AS. Enables the UI to highlight the definition body separately from the name.
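
As a rough illustration of the two fields and the ◀ n/total ▶ control they enable, here is a minimal sketch. The `Span` layout (byte `start`/`end`), the trimmed-down `Node`, and the `cycle` helper are assumptions made for this example, not the definitions in `response.rs`.

```rust
/// Byte range into the statement source (field names are assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Trimmed-down stand-in for the real `Node`; only the two new fields are shown.
#[derive(Debug, Default)]
pub struct Node {
    /// Every occurrence of the node's name (declaration + references).
    pub name_spans: Vec<Span>,
    /// For CTE nodes, the parenthesized subquery body after AS.
    pub body_span: Option<Span>,
}

/// Hypothetical UI helper: move the "n/total" cursor one step, wrapping around.
/// Returns `None` when there is nothing to navigate to.
pub fn cycle(current: usize, total: usize, forward: bool) -> Option<usize> {
    if total == 0 {
        return None;
    }
    Some(if forward {
        (current + 1) % total
    } else {
        (current + total - 1) % total
    })
}
```

With two entries in `name_spans`, stepping forward from index 0 lands on 1, and stepping forward again wraps back to 0.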

Mirrored in the Rust types (crates/flowscope-core/src/types/response.rs), the TypeScript types (packages/core/src/types.ts), and the API docs and schema (docs/api-types.md, docs/api_schema.json).

Scope

Populated for table, view, and CTE nodes.

Column nodes deliberately get empty name_spans in this PR. Accurate per-occurrence column spans require the alias/scope resolver tracked in #27 — a naive text match would pick up all occurrences of id regardless of which relation's column is being referenced. Empty name_spans on columns is forward-compatible: consumers that need a source location for a column node fall back to the existing single span.
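
The fallback described above could look roughly like this on the consumer side. This is a sketch only: the `span` field is assumed to be optional here, and `navigation_targets` is a hypothetical helper, not part of this PR.

```rust
/// Byte range into the statement source (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Trimmed-down stand-in for `Node` with only the location fields.
#[derive(Debug, Default)]
pub struct Node {
    /// Pre-existing single source location (assumed to be optional).
    pub span: Option<Span>,
    /// New per-occurrence spans; empty for column nodes in this PR.
    pub name_spans: Vec<Span>,
}

/// Hypothetical helper: prefer `name_spans`, otherwise fall back to `span`.
pub fn navigation_targets(node: &Node) -> Vec<Span> {
    if !node.name_spans.is_empty() {
        node.name_spans.clone()
    } else {
        // `Option` iterates over zero or one item, so a missing span yields an empty list.
        node.span.into_iter().collect()
    }
}
```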

Column coverage will ship in a follow-up PR once #27 lands.

How

The text-search helpers in analyzer/helpers/span.rs:

  • skip SQL string literals (with '' escape handling) and line/block comments, so -- users or 'users' do not produce false positives;
  • use a word-boundary matcher so users_archive does not match users;
  • for CTE nodes, filter out occurrences that fall inside the node's own body_span (a WHERE active column reference inside WITH active AS (...) is not a reference to the CTE).
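
The three rules above can be sketched as a single byte-level scan. This is a simplified illustration, not the code in `span.rs`: the function names, the `Span` shape, the exact-case matching, and the ASCII-only identifier check are all assumptions.

```rust
/// Half-open byte range into the statement source (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// If a string literal or comment starts at `i`, return the index just past it.
fn skip_literal_or_comment(src: &[u8], i: usize) -> Option<usize> {
    match (src[i], src.get(i + 1).copied()) {
        (b'\'', _) => {
            let mut j = i + 1;
            while j < src.len() {
                if src[j] == b'\'' {
                    if src.get(j + 1) == Some(&b'\'') {
                        j += 2; // '' is an escaped quote inside the literal
                        continue;
                    }
                    return Some(j + 1);
                }
                j += 1;
            }
            Some(src.len()) // unterminated literal: skip to the end
        }
        // Line comment: stop at the newline (or end of input).
        (b'-', Some(b'-')) => Some(
            src[i..].iter().position(|&b| b == b'\n').map_or(src.len(), |p| i + p),
        ),
        // Block comment: stop just past the closing `*/` (or end of input).
        (b'/', Some(b'*')) => Some(
            src[i + 2..]
                .windows(2)
                .position(|w| w == b"*/")
                .map_or(src.len(), |p| i + 2 + p + 2),
        ),
        _ => None,
    }
}

/// Whole-word, exact-case occurrences of `name`, ignoring literals, comments,
/// and anything that lies entirely inside `body` (the CTE's own body span).
pub fn find_name_spans(sql: &str, name: &str, body: Option<Span>) -> Vec<Span> {
    let (src, needle) = (sql.as_bytes(), name.as_bytes());
    let mut spans = Vec::new();
    if needle.is_empty() {
        return spans;
    }
    let mut i = 0;
    while i < src.len() {
        if let Some(next) = skip_literal_or_comment(src, i) {
            i = next;
            continue;
        }
        if src[i..].starts_with(needle) {
            let end = i + needle.len();
            let left_ok = i == 0 || !is_ident_byte(src[i - 1]);
            let right_ok = end == src.len() || !is_ident_byte(src[end]);
            if left_ok && right_ok {
                let inside_body = body.map_or(false, |b| b.start <= i && end <= b.end);
                if !inside_body {
                    spans.push(Span { start: i, end });
                }
                i = end;
                continue;
            }
        }
        i += 1;
    }
    spans
}
```

For `WITH active AS (SELECT id FROM users WHERE active) SELECT * FROM active`, passing the parenthesized body as `body` leaves two spans for `active`: the declaration and the final reference.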

Population happens in a single pass at the end of Analyzer::analyze_statement, so every table-like node gets its spans computed exactly once against the statement's source text.

Validation

  • cargo test --workspace: 2705 passed, 0 failed
  • cargo clippy --workspace -- -D warnings: clean
  • cargo fmt --all -- --check: clean
  • yarn workspaces run typecheck + test: clean
  • just check-schema: clean (Rust ↔ TS schema compat verified)
  • 11 new unit tests in analyzer::helpers::span
  • 5 new integration tests in tests/lineage_engine.rs covering single refs, multiple refs, CTE body, comment/literal skipping, and the empty-column-spans contract
  • 42 insta snapshots accepted; diffs are purely additive (new field with correct byte offsets)

Test plan

  • Unit tests for span helpers (word boundaries, literals, comments, nested parens, AS whitespace)
  • Integration tests for table/CTE nodes
  • Golden / BigQuery / Snowflake / Postgres snapshot updates accepted
  • Schema compatibility test (packages/core/tests/schema-compat.test.ts) passes

Follow-ups

Extends Node with:
- name_spans: every occurrence of a node's name in the statement source,
  enabling the UI to cycle through references during graph<->text navigation.
- body_span: the parenthesized subquery body of a CTE, for separate
  highlighting of the definition.

Populated for table, view, and CTE nodes. Columns intentionally retain
empty name_spans in this release — accurate per-occurrence column spans
require alias/scope resolution, tracked separately as a follow-up to
the semantic resolver epic.

Text-search strategy skips SQL string literals and comments, and for CTEs
excludes matches that fall inside the CTE's own body so internal column
references don't inflate the occurrence count.

Refs #20, #17.

Refactor span.rs helpers (skip_string_or_comment, find_matching_paren,
first_skip_between, find_cte_body_span) to operate on &[u8] so that
non-ASCII content in SQL identifiers, comments, or string literals no
longer triggers panics on byte-offset string slicing. Emit feature-gated
tracing warnings when the scan range violates UTF-8 char boundaries
instead of silently returning empty results. Add regression tests
covering multi-byte characters in block/line comments and string
literals.
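
To illustrate why byte-offset slicing of a `str` is fragile here, the sketch below contrasts a checked slice with a byte-level scan. Both helper names are hypothetical; the real helpers and their tracing warnings are not reproduced.

```rust
/// Returns the text for a byte range, or `None` when the range is out of bounds
/// or does not fall on UTF-8 char boundaries (where `&sql[start..end]` would panic).
pub fn slice_checked(sql: &str, start: usize, end: usize) -> Option<&str> {
    sql.get(start..end)
}

/// Scanning over `&[u8]` never panics on multi-byte characters, because
/// indexing bytes does not require char-boundary alignment.
pub fn find_byte(sql: &str, from: usize, target: u8) -> Option<usize> {
    let bytes = sql.as_bytes();
    bytes
        .get(from..)?
        .iter()
        .position(|&b| b == target)
        .map(|p| from + p)
}
```

In `/* café */ SELECT 1` the `é` occupies bytes 6 and 7, so a range ending at byte 7 is rejected by `slice_checked`, while `find_byte` can start from that same offset without issue.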

Also populate name_spans incrementally through add_name_span on the
statement context and introduce Default for Node so that analyzer call
sites can use struct-update syntax instead of enumerating every field.

Replace the two-pass locate_relation_name_span logic with a single
find_relation_occurrence_spans scan that returns the full and tail spans
together, so quoted identifiers with embedded dots (e.g.
"my.schema"."my.table") resolve to the correct name span. Teach
find_cte_body_span to skip optional column lists and [NOT] MATERIALIZED
modifiers, and make find_identifier_span skip string literals and
comments. Add node_index to StatementContext for O(1) add_name_span
lookups, a Node::all_name_spans helper, and document the
left-to-right traversal contract on locate_statement_span.

Fix relation and CTE span detection for hash comments and Postgres dollar-quoted strings.
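
A minimal sketch of what skipping these two constructs involves, assuming MySQL-style `#` line comments and ASCII-only dollar-quote tags. The function names are hypothetical and the actual fix in `span.rs` may differ.

```rust
/// If a Postgres dollar-quoted string (`$$...$$` or `$tag$...$tag$`) starts at `i`,
/// return the index just past its closing delimiter.
pub fn skip_dollar_quoted(src: &[u8], i: usize) -> Option<usize> {
    if src.get(i) != Some(&b'$') {
        return None;
    }
    // The tag is an optional identifier between the two `$` of the opener
    // (simplified to ASCII letters, digits, and underscores).
    let tag_len = src[i + 1..]
        .iter()
        .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
        .count();
    if tag_len > 0 && src[i + 1].is_ascii_digit() {
        return None; // tags cannot start with a digit; `$1` is a parameter
    }
    let open_end = i + 1 + tag_len; // index of the `$` that closes the opener
    if src.get(open_end) != Some(&b'$') {
        return None;
    }
    let delim = &src[i..=open_end];
    let body_start = open_end + 1;
    // An unterminated string yields `None`, so the caller treats `$` as plain text.
    let close = src[body_start..]
        .windows(delim.len())
        .position(|w| w == delim)?;
    Some(body_start + close + delim.len())
}

/// If a `#` line comment starts at `i`, return the index of the newline that
/// ends it (or the end of input).
pub fn skip_hash_comment(src: &[u8], i: usize) -> Option<usize> {
    if src.get(i) != Some(&b'#') {
        return None;
    }
    Some(src[i..].iter().position(|&b| b == b'\n').map_or(src.len(), |p| i + p))
}
```

A scanner that calls these before matching names would not count `users` inside `$fn$ FROM users $fn$` or after a `#` as a relation reference.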

Also tighten StatementContext node/index invariants, switch relation occurrence tracking to per-name cursors, and update snapshots/tests for the new span metadata.

Capture formatting changes from the workspace checks and refresh generated schema and WASM binding artifacts.

melonamin merged commit 4356030 into master on Apr 13, 2026
melonamin deleted the feat/issue-20-node-name-spans branch on April 13, 2026 20:44
Development

Successfully merging this pull request may close these issues.

Model: add nameSpans and bodySpan to Node for bidirectional navigation
