Conversation
…r_tree (#1200 slice 1/N) Parser now captures Lark's already-parsed and_op / or_op / xor_op / not_op tree structurally and surfaces it on ``WhereClause.expr_tree`` alongside the existing raw-text ``.expr``. Downstream consumers (binder, predicate pushdown) stay on the text path for now; subsequent PRs migrate them to walk the structural tree and eventually retire ``expr_split.py``. What this PR adds: - ``BooleanExpr`` dataclass in ``cypher/ast.py`` with ``op`` (``and`` / ``or`` / ``xor`` / ``not`` / ``atom``), span, left/right children, and atom_text/atom_span for leaves. - Parser transformer methods ``and_op`` / ``or_op`` / ``xor_op`` / ``not_op`` that construct ``BooleanExpr`` nodes bottom-up. - ``grouped_expr`` passthrough so ``(...)`` parens do not collapse nested boolean structure into atom leaves. - ``_wrap_as_boolean_atom`` helper that handles BooleanExpr passthrough, _ExpressionSlice spans, Lark Tree .meta spans, and primitive literal operands (bool/int/None) without meta. - ``WhereClause.expr_tree: Optional[BooleanExpr]`` additive field; ``.expr`` preserved for backward compat. - Wiring in ``generic_where_clause`` to extract BooleanExpr from items[0] when Lark produces one; ``.expr`` still populated. Tests (``test_boolean_expr.py``, 12 cases): - Structured paths (single predicate, pure-AND of comparables, structured label narrowing) correctly leave expr_tree=None. - OR / XOR / NOT shapes populate expr_tree with the expected op. - Precedence: AND binds tighter than OR; NOT binds tighter than AND/OR. - Parenthesized subexpressions preserve nested boolean structure via the grouped_expr passthrough. - Spans: atom spans quote the source text exactly; branch spans cover the full subexpression. - Backward compat: expr_tree and expr both populated when Lark routes through the generic expr path. No binder, pushdown, or expr_split.py changes this PR — those are tracked in follow-up slices of #1200. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IMPORTANT fixes: - ast_normalizer._rewrite_where_pattern_predicates_to_matches: when reconstructing a remaining WhereClause after pulling a WHERE pattern predicate into its own MatchClause, the constructor now preserves ``expr_tree=query.where.expr_tree``. Previously it was silently dropped, breaking structural capture for any query mixing a WHERE pattern predicate with a boolean-expression tail. - Documented the primitive-literal ``atom_text`` fidelity limitation on ``_wrap_as_boolean_atom`` (``str(True)`` → ``\"True\"`` not Cypher ``\"true\"``). No consumer currently reads atom_text on literal atoms; locked by a new test ``test_literal_boolean_atoms_known_ limitation_python_style_text`` so any future fix flips intentionally. SUGGESTION fixes: - DRY: ``_boolean_binary`` helper collapses ``and_op`` / ``or_op`` / ``xor_op`` bodies into single-line delegates, centralizing the arity check and error message. - ``WhereClause`` class docstring now states field coexistence contract (structured path vs raw-text path vs structured-tree path). - Tighter branch-span assertion (no ``.strip()``) so future drift is caught. - Three new tests: deeply nested OR grouping preserves structure, ``NOT NOT`` produces nested NOT nodes, mixed AND/XOR precedence via parens. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ause (wave-2 #1202) The where_pattern_and_expr_clause and expr_and_where_pattern_clause routes reconstruct expr from source text and do not capture Lark's structural boolean-expression tree. This is pre-existing (they already re-parse from text for the pattern-side anyway) but the new WhereClause docstring contract around expr_tree made it worth noting inline as a known limitation for the slice-1 scope of #1200. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lmeyerov
added a commit
that referenced
this pull request
Apr 25, 2026
…vel AND (#1200 slice 2/N) (#1207) * feat(gfql/binder): walk expr_tree, emit one BoundPredicate per top-level AND (#1200 slice 2/N) Slice 2 of issue #1200. Slice 1 (PR #1202) populated ``WhereClause.expr_tree``; this slice teaches the binder to walk it. When ``where.expr_tree`` is present, ``_where_predicates`` flattens top-level AND conjuncts and emits one ``BoundPredicate`` per conjunct instead of one giant ``BoundPredicate`` with the whole expression text. Downstream passes (predicate pushdown, filters) now see pre-split single-conjunct strings, eliminating one source of text re-parsing. When ``where.expr_tree`` is None (mixed-clause routes; pure structured ``where_predicates``; queries with no boolean operators), behavior is unchanged — fallback to the existing ``where.expr.text`` path or the structured ``where.predicates`` path. What this PR adds: - ``_flatten_top_level_ands(expr) -> List[BooleanExpr]``: descends through ``and`` nodes only; or/xor/not/atom stop the recursion and count as one conjunct. Handles left-, right-, and mixed-associative AND chains. - ``_boolean_expr_to_text(expr) -> str``: recursive surface-text reconstruction for a boolean subtree. Atoms emit atom_text; binary ops glue with ``L OP R``; NOT prefixes; branch operands are wrapped in parens for unambiguous round-tripping. Inherits the slice-1 literal-atom fidelity caveat (``str(True) == "True"``). - Updated ``_where_predicates`` branches on ``expr_tree`` availability. 20 tests in ``test_binder_expr_tree.py`` covering helper invariants, ``_where_predicates`` with hand-built ``WhereClause`` fixtures (backward-compat, OR root, AND root, no-expr), and end-to-end through ``parse_cypher`` + binder for OR / NOT / nested AND-with-OR queries. What this PR does NOT change: - ``predicate_pushdown._split_conjuncts`` (slice 3 — relax to fallback only since binder now pre-splits). - ``graphistry/compute/gfql/expr_split.py`` (retired in slice 4 once pushdown migrates). - ``WhereClause.expr`` field (kept for backward compat with mixed-clause routes that still use the text path). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test+docs: address wave-1 review findings on #1207 - Tighten test_e2e_pure_and_of_comparables_unchanged to assert expr_tree is None directly, not just BoundPredicate count. A future parser change that accidentally populates expr_tree on the pure-AND path would now fail loudly rather than silently match the count. - Document the parser-layer mutual-exclusion invariant inline in _where_predicates so a future refactor introducing overlap does not silently double-emit BoundPredicates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(changelog): add #1200/#1207 binder-walks-expr-tree slice-2 entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(gfql/conformance): update WHERE boolean-shape contract for #1200 slice 2 The conformance suite added in #1204 (commit 91c9777) was explicitly designed to track the #1194 grammar-disambiguation and #1200 IR boolean-tree migrations. Slice 2 (#1207) flips three shape contracts from one combined BoundPredicate to per-conjunct BoundPredicates as the binder now flattens top-level AND. Updates: - Shape C ``(A OR B) AND C`` → 2 BoundPredicates (OR-compound + bare C) - Shape D ``NOT A AND B`` → 2 BoundPredicates (NOT-compound + bare B) - Shape F ``n:Admin AND n.active = true`` → 2 BoundPredicates (label + property, top-level AND splits regardless of operand flavor) Assertions now check both count AND content (each conjunct identifiable by substring) so future shape-contract drift is caught precisely. Other shapes in the suite (A, B, E, G) are unaffected: shape A already emitted N BoundPredicates via where_predicates structured route; shape B raises GFQLSyntaxError; shape E (XOR) stays one conjunct because top-level isn't AND; shape G (quoted AND) already produced 2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced Apr 25, 2026
Closed
lmeyerov
added a commit
that referenced
this pull request
Apr 25, 2026
… (#1209) Replace the text-level `split_top_level_and` + regex loop in `generic_where_clause` with a structural walker over Lark's parsed `BooleanExpr` (and `_ExpressionSlice` for the single-atom path). Closes #1194 in the spirit of the issue: the grammar already declares the structured rule we want, and slice 1 (#1202) already exposes the parsed tree on `WhereClause.expr_tree` — `generic_where_clause` should trust that source of truth instead of re-splitting the WHERE body on top-level AND. Adds two helpers: - `_match_bare_label_atom(text)` — fullmatch atom text against `_BARE_LABEL_PREDICATE_RE` (preserves the #1125 false-positive guard). - `_lift_label_only_and_spine(node)` — DFS over a `BooleanExpr` AND-spine; returns lifted `(alias, labels)` tuples iff every leaf is a bare-label atom, else `None` (all-or-nothing — mixed/OR/XOR/NOT fall through). Behaviorally identical to master across the WHERE-shape matrix: single label, multi-AND chains, mixed label+property, OR/XOR/NOT, parenthesized boolean trees, and string-literal false-positive guards all produce the same `WhereClause` shape. Verified by parser, binder, slice-1 producer, slice-2 conformance + binder_expr_tree, and lowering test suites (1528 passed, no regressions). Targeted mypy clean. Adds 4 focused unit tests for the new helpers in `test_parser.py` and updates the stale comment in `test_parse_where_triple_and_label_conjunction_through_generic_where_clause` to reference the walker (was: `split_top_level_and`). `expr_split.py` and the regex are unchanged — both still have other callers (where-pattern canonicalization, lowering); their retirement is deferred to later slices of #1200. Supersedes the Earley-sub-parser approach in PR #1203, which would have introduced a second Lark parser path; the structural walker is cleaner now that slice 1 has landed on master. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First of N slices for issue #1200. Parser now captures Lark's already-parsed `and_op` / `or_op` / `xor_op` / `not_op` tree structurally and surfaces it on `WhereClause.expr_tree` alongside the existing raw-text `.expr`.
Downstream consumers (binder, predicate pushdown) stay on the text path for now; subsequent PRs migrate them to walk the structural tree and eventually retire `expr_split.py`.
What this PR adds
What this PR does NOT change
Strictly additive producer; zero downstream behavior change.
Routing note
`expr_tree` is populated only when Lark routes through the generic `expr` path. That happens for:
Pure AND conjunctions of comparable predicates (`WHERE a = 1 AND b = 2`) continue to route through the structured `where_predicates` rule and populate `.predicates` rather than `.expr_tree`. AND-joined bare labels (`n:A AND n:B`) continue to lift to structured predicates via the existing label-narrowing logic.
Tests
12 cases in `test_boolean_expr.py`:
Test plan
Follow-ups (tracked in #1200)
🤖 Generated with Claude Code