Skip to content

feat: MERGE condition columns in column lineage (Gaps 3, 9, 10)#64

Merged
mingjerli merged 9 commits into
mainfrom
worktree-merge-condition-lineage
Apr 14, 2026
Merged

feat: MERGE condition columns in column lineage (Gaps 3, 9, 10)#64
mingjerli merged 9 commits into
mainfrom
worktree-merge-condition-lineage

Conversation

@mingjerli
Copy link
Copy Markdown
Owner

Summary

  • Gap 9: Literal-bound ON predicates (t.is_active = 'Y') now extracted as match_filter_columns and produce merge_match_filter lineage edges
  • Gap 3: WHEN MATCHED condition columns (e.g., AND (t.name <> s.name)) parsed into column refs and emitted as condition-gating edges upstream of assigned target columns
  • Gap 10: New merge_column_role field on ColumnEdge distinguishes None (match), "value" (SET RHS), and "condition" (WHEN guard / ON filter)

Changes

File Change
models.py Add merge_column_role field to ColumnEdge
query_parser.py Extract literal-bound ON predicates into match_filter_columns
column_extractor.py Parse WHEN condition SQL into condition_columns; emit merge_match_filter col_info
trace_strategies.py Refactor trace_merge_columns with _resolve_to_node helper; emit condition-gating edges
lineage_builder.py Add merge_match_filter to dispatch
pipeline_lineage_builder.py Propagate merge_column_role during pipeline edge reconstruction
export.py Include merge_column_role in JSON export

+489 lines, -7 lines across 8 files (19 new tests)

Test plan

  • 56 tests pass (19 new + 37 existing, zero regressions)
  • SCD2 end-to-end integration tests verify all three gaps together (Databricks dialect)
  • Impact analysis: staging.namedim_customer.end_time via condition dependency
  • Value vs condition role tagging verified (merge_match → None, merge_update → "value", condition → "condition")
  • JSON export includes merge_column_role
  • NOT MATCHED INSERT conditional actions also supported

Closes #63

🤖 Generated with Claude Code

Adds Optional[str] field to distinguish 'value' (RHS assignment)
from 'condition' (WHEN clause gating) edges. No behavior change yet.

Refs #63
Column-literal EQ pairs in ON clause (e.g., t.is_active = 'Y') are
now captured in match_filter_columns. Column-column pairs continue
to flow through match_columns unchanged.

Refs #63
Literal-bound ON predicates now produce merge_match_filter edges
with merge_column_role='condition'. The target-side column (e.g.,
dim_customer.is_active) appears in lineage as a self-referencing
condition dependency.

Refs #63
…ap 3+10)

Reuses extract_columns_from_expr to extract column references from
WHEN MATCHED AND conditions. trace_merge_columns emits condition edges
with merge_column_role='condition'. Impact analysis on condition
columns (e.g., staging.name -> dim_customer.end_time) now works.

Also tags value-assignment edges with merge_column_role='value' and
keeps merge_match edges with merge_column_role=None.

Refs #63
Applies the same condition_columns extraction to WHEN NOT MATCHED
INSERT actions, so conditional inserts like 'AND s.op = c' produce
condition-gating lineage edges.

Refs #63
End-to-end tests verify all three gaps (3, 9, 10) working together
on the canonical SCD2 MERGE pattern from the CDC pipeline design.

Refs #63
Adds merge_column_role to the MERGE metadata block in JSONExporter
so condition vs value edges are visible in exported data.

Also fixes pipeline_lineage_builder.py to propagate merge_column_role
when reconstructing ColumnEdge objects during pipeline assembly — without
this, the field was always None even though trace_strategies set it correctly.

Refs #63
Examples 5-8 demonstrate SCD2 condition dependencies, impact analysis,
ON clause literal filters, and JSON export with merge_column_role.

Refs #63
@mingjerli mingjerli merged commit d4e854c into main Apr 14, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MERGE condition columns missing from column lineage (Gaps 3, 9, 10)

1 participant