Data lineage canvas for the web IDE #1

@JonathanTurnock

Summary

Add an interactive data lineage canvas to the web IDE. When a user opens the canvas for a data type (e.g. AccountData), it shows:

  1. Usage locations — every callable/node that returns it, takes it as a param, holds it as a field, composes it (from), or feeds it as a from source, with source positions.
  2. Lineage tree — what flows into the type (transitively) and what it flows into.
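The transitive half of the lineage tree can be pictured as a reachability walk over flow edges. The sketch below is illustrative only: it assumes edges are already available as `(from, to)` pairs of type names, and the function name `upstream` and the type `core::LedgerRow` are hypothetical, not part of the codebase.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Collect every type that transitively flows into `target`.
/// `edges` holds hypothetical `(from, to)` type-name pairs.
pub fn upstream(edges: &[(&str, &str)], target: &str) -> BTreeSet<String> {
    // Index edges by destination so each step is a single map lookup.
    let mut incoming: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        incoming.entry(to).or_default().push(from);
    }

    // `seen` doubles as the cycle guard: a type is expanded at most once.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut stack = vec![target];
    while let Some(ty) = stack.pop() {
        for &src in incoming.get(ty).into_iter().flatten() {
            if seen.insert(src) {
                stack.push(src);
            }
        }
    }

    // A cycle can lead back to the target itself; it is not its own source.
    seen.remove(target);
    seen.into_iter().map(String::from).collect()
}
```

The downstream direction ("what it flows into") would be the same walk with the index keyed on `from` instead of `to`. Returning a sorted set keeps the result stable for tests.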

Motivating case

web-ide/src/lib/sample-workspace/api.pds:

public component MainframeFacade for internet_banking::ApiApplication {
  public GetAccountData(): Result<AccountData, string> {
    raw = mainframe::MainframeBanking.Fetch()   // -> AccountRecord
    account = AccountData from { raw }
    return Ok(account)
  }
}

Desired lineage edge: mainframe::AccountRecord → AccountData (via the raw binding), inside api::MainframeFacade.GetAccountData.

The core gap

Provenance edges today are only emitted between resolved node/data FQNs (crates/pseudoscript-model/src/graph.rs:627). The from source raw is a bare binding, so expr_node_fqn (graph.rs:812) returns None and the edge is dropped — the chain MainframeBanking.Fetch() → raw → AccountData is lost. The flattened Step bodies (graph.rs:208) don't preserve binding names either. Reconstructing the chain needs binding-level analysis — the substantive new work.
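A minimal sketch of what binding-level analysis could look like, under heavy assumptions: `Expr`, `FlowEdge`, and `Resolver` here are simplified stand-ins invented for illustration and do not mirror the real AST or `ExprKind` in pseudoscript-model. The idea is to keep a per-callable environment from binding name to data type, so a bare `raw` in a from source resolves to the type its initializer produced.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real expression AST (hypothetical).
#[derive(Debug, Clone)]
pub enum Expr {
    /// A bare binding such as `raw`.
    Binding(String),
    /// A call whose declared return type is already known.
    Call { return_ty: String },
    /// `Ty from { sources... }`.
    From { ty: String, sources: Vec<Expr> },
}

/// A flow edge between data types; `via` names the binding it passed through.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FlowEdge {
    pub from: String,
    pub to: String,
    pub via: Option<String>,
}

/// Per-callable state: binding name -> resolved data type, plus emitted edges.
#[derive(Debug, Default)]
pub struct Resolver {
    bindings: HashMap<String, String>,
    pub edges: Vec<FlowEdge>,
}

impl Resolver {
    /// Record `name = expr` so later uses of `name` resolve to a type.
    pub fn bind(&mut self, name: &str, expr: &Expr) {
        if let Some(ty) = self.type_of(expr) {
            self.bindings.insert(name.to_string(), ty);
        }
    }

    /// Resolve an expression to its data type, emitting edges for `from`.
    pub fn type_of(&mut self, expr: &Expr) -> Option<String> {
        match expr {
            Expr::Binding(name) => self.bindings.get(name).cloned(),
            Expr::Call { return_ty } => Some(return_ty.clone()),
            Expr::From { ty, sources } => {
                for src in sources {
                    let via = match src {
                        Expr::Binding(name) => Some(name.clone()),
                        _ => None,
                    };
                    // An unresolved source yields no edge, as it does today.
                    if let Some(from) = self.type_of(src) {
                        self.edges.push(FlowEdge { from, to: ty.clone(), via });
                    }
                }
                Some(ty.clone())
            }
        }
    }
}
```

Feeding the motivating case through this sketch (bind `raw` to a call returning `mainframe::AccountRecord`, then bind `account` to `AccountData from { raw }`) would produce a single edge from `mainframe::AccountRecord` to `AccountData` with `via` set to `raw`. Field access and parameters are left out here; they would add further `Expr` variants that consult the same environment.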

Decisions

  • Surface: web IDE.
  • Lineage depth: binding-level tree.
  • Spec rigor: implementation-first — minimal LANG.md §9 note; ADR + CONFORMANCE/generation goldens deferred.

Approach (five layers, bottom-up)

Reuse the C4Flow / SvelteFlow + dagre rendering stack and the existing references engine for usage sites. The only genuinely new analysis is binding-level flow resolution.

  1. Model — new crates/pseudoscript-model/src/lineage.rs: Lineage::for_symbol(workspace, data_fqn). Walks callable bodies tracking x = ... bindings; on ExprKind::From { ty, sources } resolves each source (bare binding → recurse; call → method return_ty; field access → binding/param type) to its originating data type, emitting FlowEdges. Usage sites reuse resolve::resolve_at (the references engine, mirrored at crates/pseudoscript-wasm/src/lib.rs:445), classified by role (ReturnType / Param / Field / FromTarget / FromSource).
  2. EmitView::Lineage { of } + Scene::Lineage(LineageScene) in crates/pseudoscript-emit/{project,scene}.rs. Reuse the C4 node/edge shape so the canvas reuses dagre (rankdir=LR). Build lineage in Graph::build, expose graph.lineage(fqn), keep emit a pure graph→scene projector.
  3. Wasm: symbol_lineage(modules_json, fqn) export mirroring symbol_scene (crates/pseudoscript-wasm/src/lib.rs:243).
  4. Web IDE: symbolLineage wrapper in web-ide/src/lib/pds.js; new DataLineageCanvas.svelte (clone of C4Flow.svelte); route in DiagramPane.svelte; "Show data lineage" action on data nodes in C4Flow.svelte; state + handler in src/routes/+page.svelte. Rebuild via npm run build:wasm.
  5. Spec — promote LANG.md §9 line 406 into a short §9.x Data lineage view subsection (via the spec-style skill). Defer ADR + conformance goldens.
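To illustrate the "pure graph→scene projector" idea from the emit layer, here is a rough sketch. The `LineageScene` fields and the `project` function are assumptions made for this example (the real scene is meant to reuse the C4 node/edge shape, which is not shown in this issue), and `ui::AccountSummary` is a made-up type name.

```rust
use std::collections::BTreeSet;

/// Hypothetical scene shape: node ids plus directed edges between them.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct LineageScene {
    pub nodes: Vec<String>,
    pub edges: Vec<(String, String)>,
}

/// Project resolved flows into a scene without touching the workspace.
/// Duplicate flows collapse to one edge, and output order is sorted so
/// tests and snapshots stay stable.
pub fn project(flows: &[(String, String)]) -> LineageScene {
    let mut nodes = BTreeSet::new();
    let mut edges = BTreeSet::new();
    for (from, to) in flows {
        nodes.insert(from.clone());
        nodes.insert(to.clone());
        edges.insert((from.clone(), to.clone()));
    }
    LineageScene {
        nodes: nodes.into_iter().collect(),
        edges: edges.into_iter().collect(),
    }
}
```

Keeping the projector free of analysis means the lineage logic is tested once in the model crate, and the canvas only has to lay out whatever nodes and edges it receives.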

Verification

  • Model/emit/wasm unit tests: assert AccountRecord → AccountData flow + transitive chains + param/field/cycle cases.
  • cargo build && cargo test.
  • E2E: cd web-ide && npm run build:wasm && npm run dev → load sample → select AccountData → Show data lineage; confirm flow + usages list (MainframeFacade.GetAccountData, ViewAccounts, Summary); clicking a usage navigates.


Labels: enhancement (New feature or request)
