Flatten AnalyzeResult to top-level nodes/edges (#19) #33
Merged
Replaces StatementLineage.nodes/edges + GlobalLineage with a single flat
graph at AnalyzeResult.{nodes,edges}, plus a StatementMeta vec for
per-statement metadata. Fold canonical_name + statement_ids onto Node;
edges carry statement_ids (cross-statement edges hold [producer, consumer]).
State at this commit:
- Rust production code compiles workspace-wide
- flowscope-core: model + analyzer/global.rs + cross_statement migrated
- flowscope-export: schema redesigned (drops global_*, adds
node_statements / edge_statements / node_name_spans junctions),
duckdb + sql backends + extract + mermaid + join_export migrated
- flowscope-cli: output/table.rs migrated
- Tests still broken (~200 errors). Mechanical patterns:
  - result.global_lineage.{nodes,edges} -> result.{nodes,edges} (sed-able)
  - stmt.{nodes,edges} -> result.{nodes_in_statement,edges_in_statement}(stmt.statement_index)
  - node.statement_refs -> node.statement_ids
  - edge.{producer,consumer}_statement -> edge.statement_ids[0/1]
  - node.canonical_name.X -> node.canonical_name.as_ref().map_or(.., |c| &c.X)
- Insta snapshots will regenerate after tests pass
- TS bindings (packages/core/src/types.ts), packages/react, vscode/, app/,
docs all still on the old shape
Helper added: AnalyzeResult::nodes_in_statement / edges_in_statement
let consumers project the flat graph back down to a per-statement view.
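
The flat shape and the projection helpers described above can be sketched roughly as follows. This is a simplified illustration, not the crate's actual definitions: the field names come from the commit message, while `CanonicalName`'s single `name` field, the `display_name` helper, and `sample_result` are assumptions made for the example.

```rust
// Simplified, hypothetical mirror of the flat result shape. Field names follow
// the commit message; the concrete types and sample data are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CanonicalName {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: String,
    pub canonical_name: Option<CanonicalName>,
    pub statement_ids: Vec<usize>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub statement_ids: Vec<usize>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StatementMeta {
    pub statement_index: usize,
}

#[derive(Debug, Clone, Default)]
pub struct AnalyzeResult {
    pub statements: Vec<StatementMeta>,
    pub nodes: Vec<Node>,
    pub edges: Vec<Edge>,
}

impl AnalyzeResult {
    /// Nodes that list `idx` in their `statement_ids`.
    pub fn nodes_in_statement(&self, idx: usize) -> impl Iterator<Item = &Node> + '_ {
        self.nodes.iter().filter(move |n| n.statement_ids.contains(&idx))
    }

    /// Edges that list `idx` in their `statement_ids`.
    pub fn edges_in_statement(&self, idx: usize) -> impl Iterator<Item = &Edge> + '_ {
        self.edges.iter().filter(move |e| e.statement_ids.contains(&idx))
    }
}

/// Migration pattern for the optional canonical name: fall back to the node ID
/// when no canonical name is attached (hypothetical helper).
pub fn display_name(node: &Node) -> &str {
    node.canonical_name.as_ref().map_or(node.id.as_str(), |c| c.name.as_str())
}

fn node(id: &str, canonical: Option<&str>, statement_ids: &[usize]) -> Node {
    Node {
        id: id.to_string(),
        canonical_name: canonical.map(|name| CanonicalName { name: name.to_string() }),
        statement_ids: statement_ids.to_vec(),
    }
}

/// Two statements sharing the `staging` node (made-up sample data).
pub fn sample_result() -> AnalyzeResult {
    AnalyzeResult {
        statements: vec![StatementMeta { statement_index: 0 }, StatementMeta { statement_index: 1 }],
        nodes: vec![
            node("users", Some("public.users"), &[0]),
            node("staging", Some("public.staging"), &[0, 1]),
            node("output", None, &[1]),
        ],
        edges: vec![
            Edge { from: "users".into(), to: "staging".into(), statement_ids: vec![0] },
            Edge { from: "staging".into(), to: "output".into(), statement_ids: vec![1] },
        ],
    }
}
```

In this sketch a node shared by two statements appears once in `nodes` and is returned by `nodes_in_statement` for both indices, which is the property that lets a consumer rebuild a per-statement view without a separate global graph.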
Migrate packages/core types to the flat lineage model: statements carry metadata only, with nodes/edges hoisted to the top level and each item tracking the statementIds it participates in. Drop GlobalLineage, GlobalNode, GlobalEdge, StatementRef, and StatementLineage; add StatementMeta plus nodesInStatement/edgesInStatement helpers.
Update the standalone vscode types mirror, hover/codelens providers, and the lineage panel to filter the flat nodes/edges by statementIds instead of reading from per-statement subgraphs or globalLineage.
Redefine StatementLineage locally in packages/react as a per-statement view hydrated from the flat result (statement metadata + the subset of result.nodes/edges that list the statement in statementIds). Add a hydrateStatements helper and use it at the GraphView / MatrixView entry points so the graph-building utilities and workers can keep operating on the legacy per-statement shape. Drop the globalLineage/globalNode indirection used for canonical-name lookup; canonicalName is now carried on the Node itself. Migrate useGraphSearch, useSearchSuggestions, TableFilterDropdown, ColumnPanel, and nodeOccurrences.findMergedNodeById to read the flat graph directly.
Update schema-parser, useIssueLocations, useDebugData, AnalysisView, and HierarchyView to read from the top-level result.nodes/edges and the per-node statementIds / canonicalName fields instead of the removed globalLineage/statementRefs indirection.
The flatten step in analyzer/global.rs was collapsing self-join instances of the same table back into a single node, contradicting the doc comment that said "Self-join instances remain distinct".

Two fixes:
1. global_node_id for Table/View now keeps the local node ID when it differs from relation_identity(canonical), i.e. when the analyzer minted an instance-specific ID via relation_instance_identity (canonical+alias+scope hash). The canonical-only ID is still used for simple references that share canonical identity across statements.
2. statement_scoped_relation_ids in flatten_lineages now also includes self-join instance table nodes, so columns owned by those instances stay statement-scoped (otherwise e1.name and e2.name in `users e1 JOIN users e2` would reconnect through a shared global column node).

Updates the four self_join_global_lineage_* tests that had previously asserted the old "merge by canonical" semantics; those tests were renamed and their assertions inverted to expect distinct per-instance nodes, which is the new (correct) behavior.

CTE self-joins remain collapsed at the analyzer level (the analyzer does not synthesize per-instance CTE column nodes the way it does for base tables). This is documented in global_lineage_merges_qualified_columns_across_self_joins_and_cte_instances.

State after this commit:
- cargo test --workspace: 280/281 lineage_engine pass; 36 insta snapshots need regeneration via `cargo insta accept` (10 in tests/golden.rs, 26 in tests/snapshots.rs). All other tests green.
- TS migration committed in agent commits (a7b9960, b819416, fab863d, 42e90ce). just typecheck + check-schema clean.
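
The first self-join fix above can be illustrated with a small sketch that contrasts the old merge-by-canonical mapping with the instance-preserving rule. The function names `relation_identity`, `relation_instance_identity`, and `global_node_id` are taken from the commit message, but their bodies and signatures here are assumptions: plain formatted strings stand in for the real hash, and `LocalTable`, `merge_by_canonical`, `flattened_ids`, `self_join`, and `simple_refs` are hypothetical names introduced for the example.

```rust
use std::collections::BTreeSet;

/// A table reference as one statement's analysis sees it (hypothetical shape).
pub struct LocalTable {
    pub local_id: String,
    pub canonical: String,
}

/// Stand-in for the canonical-only identity (the real one may hash its input).
pub fn relation_identity(canonical: &str) -> String {
    format!("table:{canonical}")
}

/// Stand-in for the instance identity built from canonical + alias + scope.
pub fn relation_instance_identity(canonical: &str, alias: &str, scope: u32) -> String {
    format!("table:{canonical}#{alias}@{scope}")
}

/// Old behavior: every reference collapses onto the canonical ID.
pub fn merge_by_canonical(node: &LocalTable) -> String {
    relation_identity(&node.canonical)
}

/// New rule: keep the local ID when it differs from the canonical identity,
/// i.e. when an instance-specific ID was minted for a self-join.
pub fn global_node_id(node: &LocalTable) -> String {
    let canonical_id = relation_identity(&node.canonical);
    if node.local_id != canonical_id {
        node.local_id.clone()
    } else {
        canonical_id
    }
}

/// Distinct node IDs that survive flattening under a given ID rule.
pub fn flattened_ids(nodes: &[LocalTable], id_of: fn(&LocalTable) -> String) -> BTreeSet<String> {
    nodes.iter().map(id_of).collect()
}

/// `users e1 JOIN users e2`: two instances of the same canonical table.
pub fn self_join() -> Vec<LocalTable> {
    vec![
        LocalTable {
            local_id: relation_instance_identity("users", "e1", 0),
            canonical: "users".to_string(),
        },
        LocalTable {
            local_id: relation_instance_identity("users", "e2", 0),
            canonical: "users".to_string(),
        },
    ]
}

/// Two plain references to `users` from different statements.
pub fn simple_refs() -> Vec<LocalTable> {
    vec![
        LocalTable { local_id: relation_identity("users"), canonical: "users".to_string() },
        LocalTable { local_id: relation_identity("users"), canonical: "users".to_string() },
    ]
}
```

Under these assumptions, `flattened_ids(&self_join(), merge_by_canonical)` yields one ID while `flattened_ids(&self_join(), global_node_id)` yields two, and plain references still merge into a single node across statements.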
- docs/api-types.md: replace StatementLineage/GlobalLineage section with flat AnalyzeResult shape, document canonicalName + statementIds on Node and Edge, document cross_statement [producer, consumer] ordering.
- docs/core-engine-spec.md: rewrite Lineage Graph Output to describe the single flat graph and the self-join instance preservation rule.
- analyzer/global.rs: clippy::manual_contains: replace iter().any() on Vec<Span> with .contains().

Also picks up Prettier formatting on the regenerated TS bindings and api_schema.json from `just fmt-ts`.
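
For readers unfamiliar with the `clippy::manual_contains` lint mentioned above, the change has roughly this shape. The `Span` struct here is a hypothetical two-field stand-in; the real type only needs to implement `PartialEq` for `.contains()` to apply.

```rust
/// Hypothetical stand-in for the analyzer's span type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Before: a manual membership test, which the lint flags.
pub fn has_span_via_any(spans: &[Span], target: Span) -> bool {
    spans.iter().any(|s| *s == target)
}

/// After: the equivalent slice method.
pub fn has_span(spans: &[Span], target: Span) -> bool {
    spans.contains(&target)
}
```

Both functions return the same result for any input; the second simply states the intent directly.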
Delete the per-statement `StatementLineage` compatibility layer and the `hydrateStatements` helper introduced alongside the flat `AnalyzeResult` migration. All consumers now read `result.nodes` / `result.edges` / `result.statements` directly, using `nodesInStatement` / `edgesInStatement` from `@pondpilot/flowscope-core` for per-statement views.
- Introduce `MergedLineage` + `mergeAnalyzeResult` in graphBuilders.ts and the graph builder worker to replace the legacy merged statement shape; the builders now accept a merged view or the full AnalyzeResult instead of a per-statement lineage.
- matrixUtils / matrix worker / matrix worker service take AnalyzeResult.
- graphBuilder worker + service take AnalyzeResult; lineageHelpers `getCreatedRelationNodeIds` takes (statementType, nodes, edges).
- Delete unused `mergeStatementNodesForNavigation`, `StatementLineageWithSource`, `mergeStatements` and the legacy `normalizeStatement` / `withSourceName` helpers in the worker.
- Update MatrixView + GraphView to pass the flat result straight to the worker services.
- Update `graphBuilders.test.ts` and `matrixView.test.ts` to build AnalyzeResult fixtures via a local `toResult` helper that preserves per-statement node instances (matching `buildMultiStatementResult` in occurrenceCycling tests).
- Split Analyzer::flatten_lineages into focused helpers (collect_statement_scoped_ids, merge_lineage_nodes, merge_lineage_edges, append_cross_statement_edges, finalize_nodes, finalize_edges).
- Expose STATEMENT_FILTERS_METADATA_KEY and add Node::filters_for_statement so per-statement filter lookups are centralized on the type.
- Harden per-statement filter serialization with .expect to surface serialization bugs instead of silently dropping filters.
- Standardize statement-scoped iteration on nodesInStatement / edgesInStatement in VSCode providers and ColumnPanel; add the helpers to the VSCode types module.
- Document SQL export identifier-quoting caveats on the export crate.
- Add regression test ensuring the SQL joins export dedups column-level edges per statement while preserving one row per statement.
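
As a rough illustration of centralizing per-statement filter lookups on the node type, the sketch below keys filters by statement index under a single metadata entry. Only the names `STATEMENT_FILTERS_METADATA_KEY` and `filters_for_statement` come from the commit message. The key's value, the method signature, and the typed map are assumptions: the real code serializes filters into node metadata, which this example replaces with a plain nested `HashMap` to stay dependency-free.

```rust
use std::collections::HashMap;

/// Metadata key for per-statement filters. The constant's name comes from the
/// commit message; its value here is an assumption.
pub const STATEMENT_FILTERS_METADATA_KEY: &str = "statement_filters";

#[derive(Debug, Clone, Default)]
pub struct Node {
    pub id: String,
    /// Hypothetical typed stand-in for serialized metadata:
    /// key -> statement index -> filter expressions.
    pub metadata: HashMap<String, HashMap<usize, Vec<String>>>,
}

impl Node {
    /// Filters recorded for one statement, or an empty slice if none exist.
    pub fn filters_for_statement(&self, statement_index: usize) -> &[String] {
        self.metadata
            .get(STATEMENT_FILTERS_METADATA_KEY)
            .and_then(|by_statement| by_statement.get(&statement_index))
            .map(|filters| filters.as_slice())
            .unwrap_or(&[])
    }
}

/// A node filtered differently in statements 0 and 2 (made-up sample data).
pub fn sample_node() -> Node {
    let mut by_statement = HashMap::new();
    by_statement.insert(0, vec!["active = true".to_string()]);
    by_statement.insert(2, vec!["created_at > '2024-01-01'".to_string()]);

    let mut metadata = HashMap::new();
    metadata.insert(STATEMENT_FILTERS_METADATA_KEY.to_string(), by_statement);

    Node { id: "users".to_string(), metadata }
}
```

Keeping the lookup behind one method means callers never touch the metadata key directly, so a change in how filters are stored only affects this one place.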
Restore per-statement aggregation and filter semantics after flattening AnalyzeResult. Export backends now write filters and aggregations per node+statement, and UI statement scoping reads the preserved aggregation metadata. Also exclude virtual output nodes from React search suggestions and add regressions for the flattened graph/export behavior.
- Drop redundant statement_ids reassignment in merge_lineage_edges; the struct literal already overrides the spread value.
- Invalidate CodeLens and Hover caches on flowscope config changes so dialect switches don't serve stale analysis.
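
The first bullet above relies on Rust's struct update syntax: fields named explicitly in a struct literal take precedence over the `..base` value, so assigning the field again afterwards is redundant. A minimal sketch, using a hypothetical three-field `Edge` and a hypothetical `rescope` function rather than the analyzer's actual code:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub statement_ids: Vec<usize>,
}

/// Rebuild an edge with new statement IDs. The explicit `statement_ids` field
/// wins over the value carried by `..edge`, so no later reassignment is needed.
pub fn rescope(edge: Edge, statement_ids: Vec<usize>) -> Edge {
    Edge { statement_ids, ..edge }
}
```

Here `rescope(edge, vec![0, 3])` returns an edge whose `from` and `to` are moved over unchanged and whose `statement_ids` is the new vector.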
Closes #19.
Summary
- Replaces `StatementLineage.nodes/edges` + the parallel `GlobalLineage` with a single flat graph at `AnalyzeResult.{nodes, edges}` plus `statements: StatementMeta[]` for per-statement metadata only.
- Folds `canonicalName` and `statementIds` onto `Node`; `statementIds` onto `Edge` (`cross_statement` edges carry `[producer, consumer]`).
- Self-join instances (`canonical+alias+scope`) are not collapsed into one canonical node, so `users a JOIN users b` renders as two nodes. Cross-statement merging by canonical identity still applies.
- Drops `GlobalLineage`, `GlobalNode`, `GlobalEdge`, `StatementRef` (Rust + TS). `StatementLineage` becomes a crate-private analyzer intermediate.
- Adds `AnalyzeResult::nodes_in_statement(idx)` / `edges_in_statement(idx)` for projecting the flat graph back into a per-statement view.
Migration scope
- flowscope-core: `analyzer/global.rs` flatten step + `cross_statement.rs` rewritten; ~280 test assertions migrated.
- flowscope-export: schema drops `global_*` tables, adds `node_statements` / `edge_statements` / `node_name_spans` junctions; SQL + DuckDB + extract + mermaid + join_export backends migrated.
- flowscope-cli: `output/table.rs` migrated.
- `docs/api_schema.json` regenerated via `just update-schema`.
- packages/react: `hydrateStatements()` rebuilds the per-statement view at the entry points to keep the graph builder diff small (follow-up: a PR to read the flat shape directly will land separately).
Test plan
- `cargo test --workspace`: 2500+ tests pass, 0 fail
- `just check`: fmt + clippy + typecheck + Rust tests + schema-compat all green
- `yarn workspaces run test --silent`: 131 react + 41 core + 3 schema-compat all pass
- `just check-schema`: Rust JSON Schema and TS shape agree