Skip to content

feat: Foundational variable scope resolution redesign#115

Merged
genezhang merged 26 commits intomainfrom
fix/variable-scope-resolution
Feb 19, 2026
Merged

feat: Foundational variable scope resolution redesign#115
genezhang merged 26 commits intomainfrom
fix/variable-scope-resolution

Conversation

@genezhang
Copy link
Owner

Summary

Replaces the reverse_mapping hack with a proper VariableScope struct that correctly resolves variables across Cypher WITH barriers. This is a foundational fix for correct SQL generation from multi-WITH queries.

Problem

Cypher's WITH creates scope barriers — only exported variables survive. The SQL rendering pipeline had no scope context: it resolved all variables against the raw schema, then applied post-hoc reverse_mapping rewrites (~88 usages) to partially fix the results. This caused:

  • Leaked JOINs from previous MATCH scopes into post-WITH queries
  • Wrong column references (DB columns used where CTE columns needed)
  • Broken ORDER BY / GROUP BY / HAVING for CTE-scoped variables
  • Invalid a.* expansion when returning node variables after WITH

Solution

VariableScope — a single, forward-only resolution source built iteratively with each WITH iteration:

VariableScope.resolve(alias, property) → CteColumn | DbColumn | Unresolved

Threaded into every resolution site: SELECT, WHERE, ORDER BY, GROUP BY, HAVING, JOIN conditions. Node CTE variables are expanded into individual columns; scalar CTE variables resolve directly. Alias renaming through WITH (WITH u AS person) is handled via rename mapping.

Changes (22 commits)

Added

  • src/render_plan/variable_scope.rs: VariableScope, CteVariableInfo, rewrite_render_plan_with_scope()

Fixed

  • UNION CTE SELECT * → push projected SELECT into branches
  • Aggregate UNION rendering (inner projects raw cols, outer aggregates)
  • Deterministic join ordering (HashMap+Vec)
  • VLP+WITH JOIN type mismatch (toString() removed for UInt64 IDs)
  • Bare CTE node variable in SELECT (RETURN a after WITH → individual columns)
  • Alias renaming through WITH (WITH u AS person → correct property lookup)

Removed

  • intermediate_reverse_mapping block
  • Final reverse_mapping block (SELECT/WHERE/JOIN/ORDER/GROUP rewriting)
  • 6 helper functions for reverse-mapping rewrites
  • Net: -1,362 lines

Test Results

  • ✅ 1,032/1,032 unit tests
  • ✅ Integration tests at parity with main (13 pre-existing failures on both branches — same tests)
  • ✅ LDBC mini: 14/37 (38%) — up from 10/37 (27%) baseline (+4 queries)
  • ✅ Zero new regressions

Remaining Failures (all pre-existing on main)

The 13 integration test failures and 23 LDBC failures are feature gaps unrelated to scope:

  • collect/UNWIND language features (7 LDBC queries)
  • VLP+WITH integration edge cases (3 LDBC queries)
  • Type casting (toYear on Int64 epoch millis)
  • Correlated subqueries (ClickHouse limitation)

genezhang and others added 24 commits February 18, 2026 09:34
- Create VariableScope struct with resolve(), advance_with(), get_labels()
- Add scope field to ExpressionRewriteContext (None = backward compatible)
- Add scope parameter to to_render_plan_with_ctx and build_chained_with_match_cte_plan
- Add scope-aware Filter, OrderBy, GroupBy handlers in to_render_plan_with_ctx
- Make find_label_for_alias_in_plan and map_property_to_db_column pub(crate)
- All 1102 tests pass, zero behavior change (all scope = None)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Build VariableScope iteratively in build_chained_with_match_cte_plan
- scope_cte_variables accumulator tracks CTE property mappings per alias
- CTE body rendering receives scope for proper variable resolution
- Final rendering receives scope for post-WITH variable resolution
- CteColumn resolution uses cte_name as table_alias (not original alias)
- Clear stale joins after WITH replacement (alias + condition based)
- Increase thread stack to 512MB for deep VLP recursion

LDBC results: 11/37 (30%), up from 10/37 (27%) baseline
Newly passing: short-6, complex-11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enhanced clear_stale_joins_for_cte_aliases to also remove joins whose
conditions reference aliases no longer present in the plan tree (dead
references from consumed pre-WITH variables). Added collect_live_table_aliases
helper to find only physical GraphNode/GraphRel aliases.

Fixes complex-4 join leak in final SELECT (edge table joins t327, t328
no longer leak). CTE 3 now correctly references CTE 2 aliases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… BY panic

Phase 3 step 1: Remove the ~180-line intermediate_reverse_mapping block
from build_chained_with_match_cte_plan. This code built reverse mapping
from CTE columns and rewrote CTE body expressions — now handled by
scope-based resolution via VariableScope passed to to_render_plan_with_ctx.

Also remove now-dead functions:
- rewrite_render_plan_expressions (~108 lines) - only caller was removed
- rewrite_operator_application_with_cte_alias (~15 lines) - only caller removed

Fix index-out-of-bounds panic in UNION ORDER BY rendering when
order_by_columns count doesn't match plan.order_by count (pre-existing
bug triggered by non-deterministic HashMap join ordering).

Note: Final reverse_mapping block (post-loop) is still needed — scope-based
resolution doesn't yet cover OPTIONAL MATCH UNION ORDER BY rewriting or
final CTE column reference rewriting. This will be addressed when scope
coverage is extended.

LDBC: 10-11/37 (same as main baseline ±1 due to non-deterministic join
ordering, complex-11 is Phase 2 win).
Total removed: ~310 lines of code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Major cleanup: eliminate the post-loop reverse_mapping rewriting that was the core
hack for variable resolution. Scope-based rewriting now handles all cases.

Changes:
- Remove FROM-based reverse_mapping block (~800 lines) from build_chained_with_match_cte_plan
- Remove JOIN-based reverse_mapping block from build_chained_with_match_cte_plan
- Remove 5 dead helper functions only used by reverse_mapping:
  - rewrite_cte_expression_with_alias_resolution (~330 lines)
  - rewrite_cte_expression (~25 lines)
  - rewrite_cte_expression_with_context (~135 lines)
  - rewrite_operator_application (~16 lines)
  - rewrite_expression_simple (~100 lines)
  - rewrite_expression_with_cte_alias (~147 lines)
- Fix bare ViewScan cross-alias contamination bug in get_property_from_viewscan()
  (was returning CTE column names for wrong aliases through short-circuit evaluation)
- Add CTE name resolution to VariableScope.resolve() (handles JOIN conditions
  that reference CTE table names directly)
- Replace Union branch SELECT rewriting with scope-based rewrite_render_expr()
- Make rewrite_render_expr() public for use from plan_builder_utils

Results:
- Unit tests: 1031/1031 passing (0 regressions)
- LDBC mini: 11/37 (30%) — improved from 10/37 baseline
- Both short-7 AND complex-11 pass (previously mutually exclusive due to
  reverse_mapping cross-alias contamination)
- plan_builder_utils.rs: 13141 → 11569 lines (-1572)
- Net change: -1362 lines across 4 files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Check plan tree FIRST via detect_vlp_endpoint_from_plan() before using
PlanCtx VLP endpoint info. PlanCtx registers VLP endpoints globally during
plan building, so when building a WITH CTE body that precedes the VLP MATCH,
the alias incorrectly gets VLP columns (start_*, end_*) from a later scope.

Now: plan tree must confirm VLP presence before PlanCtx info is used.
This prevents cross-scope VLP column contamination.

LDBC mini: 11/37 (+1 bi-6 from baseline)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for scope-based CTE variable resolution:

1. Pass body_scope_ref to ExpressionRewriteContext when rewriting WITH
   projection expressions (CASE, arithmetic, etc.). Previously these
   expressions used ExpressionRewriteContext::new() without scope,
   causing CTE-scoped variables (e.g., post.creationDate after a prior
   WITH) to not resolve to CTE columns.

2. Use FROM alias (e.g., 'post_tag') instead of CTE name
   (e.g., 'with_post_tag_cte_1') as table qualifier in CTE column
   references. SQL requires using the FROM alias when one is defined.
   Added extract_from_alias_from_cte_name() helper to variable_scope.rs.

Unit tests: 1032/1032 passing (+1 new test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix non-deterministic join ordering caused by HashMap random seed:
- When multiple joins are ready in topological sort, pick by smallest
  alias name (deterministic tie-breaking)
- When circular dependency detected, pick join with fewest unsatisfied
  deps (tie-break by alias name) instead of breaking out of loop
- Continue sorting after cycle-break instead of appending remainder

This eliminates ±1-2 LDBC test variance between binary compilations.
Both bi-6 and short-7 now pass consistently.

LDBC mini: 11/37 (30%), up from 10/37 baseline

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… CTEs

- When compute_cte_id_column_for_alias returns None (alias from prior CTE),
  fall back to inheriting the ID column from the previous CTE's alias_to_id
- Fix expand_table_alias_to_group_by_id_only to use FROM alias (not raw alias)
  when generating GROUP BY for CTE-referenced variables

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The analyzer assigns CTE names with its counter (e.g., _cte_5) but the
renderer creates CTEs with sequential numbering (_cte_1). Add comprehensive
post-render fixup that scans all table aliases in the final RenderPlan and
remaps any stale CTE references to match actual CTE names by base name.

Also add collect_with_cte_table_aliases() to scan all expression types
(PropertyAccess, Operator, Aggregate, Scalar, Case) for CTE references.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes:
1. plan_builder_utils.rs: Extract HAVING from individual rendered plans and
   propagate to UNION wrapper (was set to None, dropping WHERE→HAVING conversion)
2. to_sql_query.rs: Add HAVING rendering after GROUP BY in UNION subquery path
   (only the standard single-query path had it)

This enables WITH ... WHERE ... patterns with aggregation to correctly
generate HAVING clauses in the CTE body SQL.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When bidirectional relationships generate UNION ALL CTEs, each branch now
gets the CTE's SELECT projection pushed down directly, instead of using
SELECT * with an outer wrapper. This fixes unresolvable table-qualified
column references (e.g., tag.id) that were invisible outside the UNION
subquery scope.

Also adds join condition rewriting for CTE-scoped aliases, converting
base-table column names (e.g., friend.id) to CTE column names
(e.g., friend.p6_friend_id) when joins reference aliases from prior
WITH barriers.

LDBC mini: 12/37 (32%), up from 10/37 (27%).
New passes: complex-4, short-7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…er branches

- Refactor build_non_aggregate_select() into build_union_inner_select() that
  extracts property-access expressions from aggregate function arguments and
  projects them in inner UNION branches (e.g., mutualFriend.id AS 'mutualFriend.id')
- Outer SELECT rewrites aggregate references with backtick-escaped column aliases
  (e.g., count(DISTINCT `mutualFriend.id`) instead of count(DISTINCT mutualFriend.id))
- Strip ALL __order_col items from both outer and inner SELECT in aggregation case
- Use plan.order_by.to_sql() for ORDER BY in aggregation case instead of
  __union.__order_col_* references
- bi-18 now passes (friend recommendation with count(DISTINCT) over UNION)
- LDBC mini: 12/37 (32%) - bi-18 gained, short-7 lost to HashMap ordering sensitivity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace HashMap-based deduplicate_joins() with insertion-order-preserving
approach using HashMap for lookup + Vec for ordered storage. This eliminates
non-deterministic join ordering that caused ±1-2 LDBC query variations
between compilations (short-7 and bi-6 were particularly sensitive).

LDBC mini: 13/37 (35%) - up from 12/37 baseline

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…xt fields, convert CteSchemaMetadata tuple to named struct

- CTERewriteContext: Remove unused reverse_mapping, with_aliases, cte_schemas fields
  and dead for_with_cte() constructor. Rename for_complex_cte() → new()
- CteSchemaMetadata: Convert from 4-element tuple type alias to named struct with
  select_items, column_names, alias_to_id, property_mapping fields
- Update all 12 destructuring sites and 4 insert sites in plan_builder_utils.rs
- Update plan_builder.rs find_id_column_for_alias_from_cte to use struct fields
- Remove unused HashSet import from expression_utils.rs
- Net: -55 lines, improved readability of CTE schema access patterns

All 1032 unit tests pass. LDBC mini: 13/37 (no regression).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract build_outer_aggregate_select() for UNION outer SELECT with aggregation
- Extract build_aliased_group_by() for GROUP BY with aliased column references
- CTE UNION with aggregation: inner branches project raw columns, outer
  SELECT applies aggregation (matching the main query path behavior)
- GROUP BY in CTE UNION context uses backtick-escaped column aliases
  instead of stale table-qualified references
- Refactor main query UNION path to use shared helpers (no behavior change)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The contains_graph_joins block in to_render_plan_with_ctx was extracting
ORDER BY items via extract_order_by (which uses apply_property_mapping_to_expr,
non-scope-aware) before the Limit/OrderBy/Skip unwrapping arms could fire.
This meant ORDER BY expressions like friend.firstName were resolved against
the plan tree instead of the scope, producing stale references like
with_cnt_friend_cte_1.firstName instead of cnt_friend.p6_friend_firstName.

Fix: After extracting ORDER BY items in both contains_graph_joins blocks,
apply scope-aware rewriting via rewrite_render_expr when scope is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When multiple Cypher aliases (e.g., person, messageCount, likeCount)
map to the same CTE, find_by_cte_name could return any entry due to
HashMap iteration order. If it found an alias with empty per-alias
property mapping (like messageCount), it would fail to resolve
properties (like person.id) that exist in another entry.

Fix: iterate ALL entries matching the CTE name and check each for
the requested property. Also remove unused find_by_cte_name method
and cte_variable_count method.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development diagnostic logs (emoji-prefixed: 🔧🔍🔀🔄🎯🐛📊) were all at
warn level, flooding server logs. Converted to debug level. Only legitimate
warnings (⚠️ fallback behavior, mismatches, circular dependencies) remain
at warn level.

Before: 319 warn! calls in render_plan/ pipeline
After:  18 warn! calls (all genuine warnings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…H JOINs

VLP CTE's start_id/end_id are UInt64 (native ID type). For single-column IDs,
wrapping the JOIN condition's CTE side in toString() caused type mismatch:
UInt64 = String comparison fails. toString() is only needed for composite IDs
(concat). Fixes short-2 and unblocks other VLP+WITH queries.

LDBC: 14/37 (38%), up from 13/37 (35%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rough WITH

- In rewrite_render_plan_with_scope(), expand bare TableAlias SELECT items
  for CTE variables into individual property column references (fixes a.*
  after CTE which generates invalid SQL like 'a.*' when 'a' is a CTE)
- Handle node CTE variables (non-empty property_mapping): expand to
  individual PropertyAccessExp columns with proper CTE column names
- Handle scalar CTE variables (empty property_mapping): rewrite to
  direct CTE column reference instead of invalid table.* expansion
- Build alias rename mapping from WithClause items to correctly resolve
  renamed aliases (e.g., 'WITH u AS person' maps person→u so properties
  can be found under the original alias in property_mapping)
- Look up labels using both renamed and original alias names

Fixes 17 pytest regressions from scope work:
- WITH node expansion (5 tests)
- HAVING (2 tests)
- WITH aggregation pattern (4 tests)
- Variable alias renaming (3 tests)
- HAVING security (2 tests)
- Shortest path undirected was pre-existing (1 test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… redesign

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a foundational redesign of variable scope resolution in the Cypher-to-SQL rendering pipeline. It replaces a post-hoc reverse_mapping hack (~88 usages across 13 resolution paths) with a proper VariableScope struct that correctly resolves variables across Cypher WITH barriers.

Changes:

  • Introduces VariableScope as a single source of truth for variable resolution, built iteratively with each WITH barrier
  • Threads scope through the entire rendering pipeline (SELECT, WHERE, ORDER BY, GROUP BY, HAVING, JOIN conditions)
  • Fixes UNION CTE rendering, aggregate UNION handling, and deterministic join ordering
  • Removes ~1,362 net lines including reverse_mapping blocks and helper functions

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/render_plan/variable_scope.rs New module implementing VariableScope, CteVariableInfo, and scope-aware expression rewriting
src/render_plan/plan_builder_utils.rs Builds and accumulates scope through WITH iterations; passes scope to recursive rendering calls
src/render_plan/plan_builder.rs Threads scope parameter through to_render_plan_with_ctx; applies scope-aware rewriting for ORDER BY, GROUP BY, HAVING, Filter
src/clickhouse_query_generator/to_sql_query.rs Fixes UNION+aggregation rendering with inner/outer SELECT separation; adds deterministic GROUP BY aliasing
src/query_planner/logical_expr/expression_rewriter.rs Integrates scope into ExpressionRewriteContext for CTE-aware property resolution
src/query_planner/analyzer/graph_join/helpers.rs Fixes deduplicate_joins to preserve insertion order for determinism
src/render_plan/plan_builder_helpers.rs Improves sort_joins_by_dependency to break ties deterministically by alias name
src/render_plan/mod.rs Converts CteSchemaMetadata from tuple to named struct for clarity
src/render_plan/expression_utils.rs Simplifies CTERewriteContext by removing fields now handled by VariableScope
src/main.rs Increases thread stack size from 128 MB to 512 MB
Multiple files Changes log level from warn to debug for non-critical informational messages
src/server/*.rs Updates to_render_plan_with_ctx call sites with new scope parameter
STATUS.md, CHANGELOG.md Documents the architecture change and test results

…nexpected CTE name format

- main.rs: Fix comment/code inconsistency — default was 512 MB but comment
  said 128 MB handles all known cases. Lower default to 128 MB; override
  via CLICKGRAPH_THREAD_STACK_MB env var for environments needing more.
- variable_scope.rs: Add debug log in extract_from_alias_from_cte_name()
  when CTE name doesn't match expected 'with_<alias>_cte[_N]' pattern,
  making unexpected naming conventions visible in logs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@genezhang genezhang merged commit 6137e1b into main Feb 19, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants