fix: consolidate mermaid ref resolution (fixes #283, #263) #299
Introduces a single Scope primitive for template-ref → mermaid-ID resolution in src/pflow/core/workflow/mermaid/. The seven places that previously answered this question inconsistently now route through one function, making both fidelity bugs impossible to reintroduce:

- #283: data-flow edges collapse to coarse subgraph-box edges when child inputs use the canonical `inputs:` dict (post-task-153). _generate_data_flow_edges now iterates params["inputs"] directly.
- #263: output `source:` expressions that reference declared workflow inputs are silently dropped. _connect_sources_to_output now recognizes input-root refs and handles bare `${data}` (no field).

Deletes dead concepts that existed only as compensations for the fragmentation: _RESERVED_PARAMS, _SOURCE_NODE_FIELD_RE, _collect_param_refs (orphan), _refs_input, _resolve_ref_source.

Scope.source_refs_in uses a two-stage scan (find ${...} blocks, then extract refs inside) to avoid false positives on literal text.

Goldens regenerated for deep-research (TD + LR) with per-input edges restored. document-processor unchanged (all its bindings reference top-level inputs, handled separately by _connect_top_level_inputs's nearest-consumer heuristic).

Establishes the Scope seam that task 155 (GraphModel extraction for multi-renderer support) builds on.
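The two-stage scan described for `Scope.source_refs_in` can be sketched roughly as below. This is an illustration, not the code in `_scope.py`: the function name `source_refs_in_sketch` and both regexes are assumptions, and the sketch ignores any quoting rules the real template syntax may have.

```python
import re
from typing import Optional

# Stage 1 (assumed pattern): locate each ${...} block in the source string.
_BLOCK_RE = re.compile(r"\$\{([^{}]*)\}")
# Stage 2 (assumed pattern): inside a block, pick out root or root.field.
# The lookbehind stops a deeper segment (the "c" in a.b.c) from being
# reported as a separate root.
_REF_RE = re.compile(r"(?<![\w.-])([A-Za-z_][\w-]*)(?:\.([A-Za-z_][\w-]*))?")


def source_refs_in_sketch(source: str) -> list[tuple[str, Optional[str]]]:
    """Return (root, field) pairs found inside ${...} blocks only."""
    refs: list[tuple[str, Optional[str]]] = []
    for block in _BLOCK_RE.finditer(source):
        for ref in _REF_RE.finditer(block.group(1)):
            refs.append((ref.group(1), ref.group(2)))
    return refs


coalesce = source_refs_in_sketch("${a.x ?? b.y}")
bare = source_refs_in_sketch("${data}")
literal = source_refs_in_sketch("plain text mentioning node.field")
```

Because stage 2 only ever sees the body of a `${...}` block, dotted names in surrounding literal text never reach the ref extractor, which is the false-positive case the commit message calls out.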
… review] Addresses PR #299 review findings:

- **Silent fidelity gap in data-flow bindings (primary)**: `Scope.refs_in` missed the second operand of coalesce expressions (``${a ?? b}``), which are a general-purpose template operator (not output-only; confirmed in `TemplateResolver:608-617`). Delegated `refs_in` to `source_refs_in` so the same two-stage scan handles both contexts. Added `test_coalesce_in_data_flow_binding` as mutation-tested regression guard.
- Deleted `_DATA_FLOW_REF_PATTERN` (subsumed by `source_refs_in`) and `_PARAM_REF_RE` (last caller `_dynamic_batch_label` migrated to `Scope.refs_in`). Mermaid package now has a single regex concept for template-ref extraction.
- Documented input/node name precedence in `_connect_sources_to_output` (inputs resolved before nodes). WorkflowValidator does not enforce name-uniqueness between the two; filed GH #300 to tighten the validator.
- Split weak ``or`` assertion in `test_opaque_template_inputs_fall_through_gracefully` into a precise check for the input-node parallelogram rendering.
- Added `"item"` short-circuit note to `Scope.resolve` docstring.

All 79 mermaid tests pass; `make check` clean.
Code Review
This pull request implements the Scope consolidation for the mermaid visualizer, a prerequisite for the planned graph model extraction. It introduces a centralized Scope class in _scope.py to unify template reference resolution, replacing scattered logic in _edges.py and _io.py. These changes fix fidelity bugs where data-flow edges were omitted for nested `inputs:` dictionaries (GH #283) and workflow-level input references in output sources (GH #263). Feedback focuses on improving the robustness of the new reference extraction logic to handle coalesce expressions and correcting a regex pattern.
    def refs_in(value: str) -> list[tuple[str, Optional[str]]]:
        """Extract ``(root, field)`` pairs from template refs in a binding.

        Use for data-flow bindings where each template expression is one ref
        (e.g. ``"${producer.response}"``). Returns ``field=None`` for bare
        refs like ``"${data}"``.
        """
        return [(m.group(1), m.group(2)) for m in _DATA_FLOW_REF_PATTERN.finditer(value)]
The refs_in method currently misses references in coalesce expressions (e.g., ${a.x ?? b.y}) because _DATA_FLOW_REF_PATTERN only matches identifiers preceded by ${. To maintain consistency with source_refs_in and correctly identify all dependencies in a binding, this method should leverage the same two-stage scanning logic.
Suggested change:

    -def refs_in(value: str) -> list[tuple[str, Optional[str]]]:
    -    """Extract ``(root, field)`` pairs from template refs in a binding.
    -    Use for data-flow bindings where each template expression is one ref
    -    (e.g. ``"${producer.response}"``). Returns ``field=None`` for bare
    -    refs like ``"${data}"``.
    -    """
    -    return [(m.group(1), m.group(2)) for m in _DATA_FLOW_REF_PATTERN.finditer(value)]
    +@staticmethod
    +def refs_in(value: str) -> list[tuple[str, Optional[str]]]:
    +    """Extract ``(root, field)`` pairs from template refs in a binding.
    +    Handles multiple templates and coalesce expressions (e.g. ``${a.x ?? b.y}``).
    +    Returns ``field=None`` for bare refs like ``"${data}"``.
    +    """
    +    return Scope.source_refs_in(value)
    # Matches ``${root}`` (no field group) or ``${root.field}`` (field group captured).
    # Deeper refs like ``${a.b.c}`` capture only the first field segment (a, b); this matches
    # pre-consolidation semantics, which only used the root + first field.
    _DATA_FLOW_REF_PATTERN = re.compile(r"\$\{([a-zA-Z0-9_-]+)(?:\.([a-zA-Z0-9_-]+))?")
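The gap raised in the review comment can be reproduced directly with the pattern quoted above. The snippet below copies that regex and the original one-line extraction. The wrapper name `single_pass_refs` is made up for this illustration.

```python
import re

# Pattern copied from the diff under review.
_DATA_FLOW_REF_PATTERN = re.compile(r"\$\{([a-zA-Z0-9_-]+)(?:\.([a-zA-Z0-9_-]+))?")


def single_pass_refs(value):
    """Hypothetical wrapper around the original one-line extraction."""
    return [(m.group(1), m.group(2)) for m in _DATA_FLOW_REF_PATTERN.finditer(value)]


# Only the identifier directly after "${" is matched, so the second
# coalesce operand (b.y) never shows up in the result.
coalesce_refs = single_pass_refs("${a.x ?? b.y}")
deep_refs = single_pass_refs("${a.b.c}")
bare_refs = single_pass_refs("${data}")
```

Here `coalesce_refs` contains a single pair for `a.x`, so an edge from `b` would be silently missing from the rendered graph.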
Review: PR #299 (fixes #283 + #263)

Solid refactor. The … No blockers. A few minor issues worth addressing before merge.

Warnings (should be addressed)

1. Stale comment on …
3c0405a to 42fd114
Summary
Consolidates reference resolution in the mermaid visualizer package (`src/pflow/core/workflow/mermaid/`) behind a single `Scope` primitive. Both open fidelity bugs (#283, #263) are fixed as structural byproducts of the consolidation.

Task
Follow-up task filed: `.taskmaster/tasks/task_155/task-155.md` (GraphModel extraction, pre-step to planned web UI; builds on the `Scope` seam established here).

Changes
New primitive: `_scope.py`

- `Scope` dataclass holding a live `MermaidContext` reference (not a snapshot: sibling `outgoing_routes` grow incrementally during rendering).
- `Scope.resolve(root, field)`: three resolution cases (batch item, sibling node with optional output-field routing, declared input).
- `Scope.refs_in(value)`: extract `(root, field)` pairs from data-flow bindings.
- `Scope.source_refs_in(source)`: two-stage scan (find `${...}` blocks, then extract refs inside) to handle coalesce (`${a.x ?? b.y}`) and bare input refs (`${data}`) without false-positive matches on literal text.

Bug fixes
- `_generate_data_flow_edges` + `_generate_batch_item_data_flow` now iterate `params["inputs"]` (canonical post-task-153) instead of top-level `params.items()`. Opaque `inputs: ${item.inputs}` templates skipped gracefully.
- `_connect_sources_to_output` takes `input_ids: dict[str, str]` from the caller's scope; input-root refs (including bare `${data}`) now emit edges from the input parallelogram.

Dead code deleted
- `_RESERVED_PARAMS` (blocklist for the pre-task-153 dual-input-passing world)
- `_SOURCE_NODE_FIELD_RE` (alternate regex, superseded by `Scope.source_refs_in`)
- `_collect_param_refs` (orphan: zero callers)
- `_refs_input` (superseded by `Scope.refs_in`)
- `_resolve_ref_source` (inlined into sole call site)

Goldens regenerated
- `tests/test_core/golden_mermaid/deep-research-TD.mmd`
- `tests/test_core/golden_mermaid/deep-research-LR.mmd`

Restored per-input edges (e.g. `prepare --> analyze-sources__in_content`, `combine --> reviews__accuracy__in_summary`) that were lost during the task-153 migration. `document-processor` golden unchanged (its bindings all reference top-level inputs, handled by `_connect_top_level_inputs`'s nearest-consumer heuristic). `KNOWN REGRESSION (GH #283)` docstring at `test_mermaid_golden.py:59-66` removed.

Tests
- … `inputs:` dict shape.
- `test_opaque_template_inputs_fall_through_gracefully`: `inputs: ${item.inputs}` doesn't crash
- `test_batch_data_flow_with_inputs_dict`: #283 batch variant
- `test_output_source_from_declared_input`: #263 with field
- `test_output_source_from_bare_input_ref`: #263 bare ref (no dot)
- `test_output_source_from_input_in_subworkflow`: #263 at sub-workflow scope (`{prefix}in_{name}` convention)
- … `test_data_flow_edges_from_params` to guard the depth-0 double-emit skip (caught by mutation testing; the skip was previously load-bearing but untested).

Explanation
Both bugs share one cause: the visualizer answered the question "given a `${x.y}`, what mermaid ID does it resolve to?" in seven different places with seven different levels of completeness. Fixing each function independently would preserve that fragmentation and keep the door open for the next similar bug.

The refactor collapses all seven call sites to route through `Scope.resolve`. Any ref-consuming site now knows about nodes AND declared inputs AND batch items; no "silently drops inputs" path remains.

Depth-0 double-emit concern: `_connect_top_level_inputs` emits `input_X → consumer__in_Y` at the top level using a layout-preserving nearest-consumer heuristic (from task 146: long-range edges destroy dagre layout). `Scope.resolve` is pure (always resolves), so the two data-flow generators add a two-line caller-side skip for depth-0 top-level input refs to avoid duplicate emission.

Created docs
- `.taskmaster/tasks/task_155/task-155.md`: follow-up task for GraphModel extraction (pre-step to planned React Flow or equivalent web UI). Builds on the `Scope` primitive.

Testing
- … `tests/test_core/test_mermaid.py`, `test_mermaid_golden.py`, `tests/test_cli/test_visualize.py`).
- … `make test`).
- `make check` clean (ruff, ruff-format, mypy, deptry).
- … `${}`.

Manual visual verification in mermaid.live is still recommended for the regenerated `deep-research-TD.mmd` / `deep-research-LR.mmd`: string-level diffs miss visual breakage per `mermaid/CLAUDE.md`.
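To make the three resolution cases and the depth-0 caller-side skip from the Explanation concrete, here is a minimal sketch. Everything in it is hypothetical: `ScopeSketch`, its field names, the `__out_` route ID, and `data_flow_edges_sketch` are illustrative stand-ins, and the case order simply follows the order listed in the description rather than the real `Scope.resolve`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScopeSketch:
    """Hypothetical stand-in for Scope; all field names are assumptions."""
    node_ids: dict[str, str]
    input_ids: dict[str, str]
    # Held by reference, so routes registered later remain visible.
    outgoing_routes: dict[tuple[str, str], str] = field(default_factory=dict)
    batch_item_id: Optional[str] = None

    def resolve(self, root: str, ref_field: Optional[str] = None) -> Optional[str]:
        # Case 1: batch item short-circuit.
        if root == "item" and self.batch_item_id is not None:
            return self.batch_item_id
        # Case 2: sibling node, optionally routed through an output field.
        if root in self.node_ids:
            routed = self.outgoing_routes.get((root, ref_field)) if ref_field else None
            return routed or self.node_ids[root]
        # Case 3: declared input (None when the root is unknown).
        return self.input_ids.get(root)


def data_flow_edges_sketch(scope, consumer_id, bindings, depth):
    """bindings maps an input name to already-extracted (root, field) pairs."""
    edges = []
    for name, refs in bindings.items():
        for root, ref_field in refs:
            # Caller-side skip: at depth 0 a separate pass is assumed to
            # emit the top-level input edges, so emitting here would duplicate.
            if depth == 0 and root in scope.input_ids:
                continue
            source = scope.resolve(root, ref_field)
            if source is not None:
                edges.append((source, f"{consumer_id}__in_{name}"))
    return edges


scope = ScopeSketch(node_ids={"prepare": "prepare"}, input_ids={"topic": "input_topic"})
bindings = {"content": [("prepare", "result"), ("topic", None)]}
top_level = data_flow_edges_sketch(scope, "analyze", bindings, depth=0)
nested = data_flow_edges_sketch(scope, "analyze", bindings, depth=1)
```

Keeping `resolve` free of depth checks means every caller gets the same answer for the same ref, and the duplicate-avoidance decision stays with the two generators that need it.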