Skip to content

feat(gfql/binder): walk expr_tree, emit one BoundPredicate per top-level AND (#1200 slice 2/N)#1207

Merged
lmeyerov merged 5 commits intomasterfrom
issue-1200-slice-2-binder-walks-expr-tree
Apr 25, 2026
Merged

feat(gfql/binder): walk expr_tree, emit one BoundPredicate per top-level AND (#1200 slice 2/N)#1207
lmeyerov merged 5 commits intomasterfrom
issue-1200-slice-2-binder-walks-expr-tree

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

Summary

Slice 2 of N for issue #1200. Slice 1 (PR #1202, merged) populated `WhereClause.expr_tree` from Lark's already-parsed boolean tree. This slice teaches the binder to walk it.

When `where.expr_tree` is present, `_where_predicates` flattens top-level AND conjuncts and emits one `BoundPredicate` per conjunct instead of one giant `BoundPredicate` with the whole expression text. Downstream passes (predicate pushdown, filters) now see pre-split single-conjunct strings, eliminating one source of text re-parsing.

When `where.expr_tree` is None (mixed-clause routes, pure structured `where_predicates`, queries with no boolean operators), behavior is unchanged — fallback to the existing text path or the structured-predicates path.

What this PR adds

  • `_flatten_top_level_ands(expr) -> List[BooleanExpr]` — descends through `and` nodes only; or/xor/not/atom stop the recursion and count as one conjunct. Handles left-, right-, and mixed-associative AND chains.
  • `_boolean_expr_to_text(expr) -> str` — recursive surface-text reconstruction for a boolean subtree. Atoms emit atom_text; binary ops glue with `L OP R`; NOT prefixes; branch operands are wrapped in parens for unambiguous round-tripping. Inherits the slice-1 literal-atom fidelity caveat (`str(True) == "True"`).
  • Updated `_where_predicates` branches on `expr_tree` availability.

Test plan

20 tests in `test_binder_expr_tree.py`:

  • Helpers (no parser): `_flatten_top_level_ands` covers single atom, left-/right-associative AND chains, OR/NOT/atom stop recursion, AND-with-OR-subtree keeps OR intact. `_boolean_expr_to_text` covers atom passthrough, binary glue, NOT prefix, parens around branch operands.

  • `_where_predicates` with hand-built fixtures: AND chain → N BoundPredicates; OR root → single compound BoundPredicate; `expr_tree=None` → backward-compat one BoundPredicate from `expr.text`; no expr → empty list.

  • End-to-end through `parse_cypher` + `FrontendBinder`: OR query → 1 BoundPredicate with OR text; pure-AND comparables (structured route) → unchanged 2 BoundPredicates; NOT query → 1 NOT BoundPredicate; `((a) OR (b)) AND c` → 2 BoundPredicates (1 OR-compound, 1 bare comparison).

  • `pytest test_binder_expr_tree.py` — 20/20 pass

  • `pytest graphistry/tests/compute/gfql/` — 1493 pass (+20 new)

  • `ruff check` clean

  • No backward-compat breakage on `expr_tree=None` paths

What this PR does NOT change

  • `predicate_pushdown._split_conjuncts` — slice 3 will relax it to fallback-only since binder now pre-splits.
  • `graphistry/compute/gfql/expr_split.py` — retired in slice 4.
  • `WhereClause.expr` field — kept for backward compat with mixed-clause routes.

Follow-ups (tracked in #1200)

  1. Pushdown slice: relax `_split_conjuncts` to fallback-only.
  2. Retirement slice: delete `expr_split.py` and `WhereClause.expr`.

🤖 Generated with Claude Code

lmeyerov and others added 5 commits April 24, 2026 15:36
…vel AND (#1200 slice 2/N)

Slice 2 of issue #1200.  Slice 1 (PR #1202) populated
``WhereClause.expr_tree``; this slice teaches the binder to walk it.

When ``where.expr_tree`` is present, ``_where_predicates`` flattens
top-level AND conjuncts and emits one ``BoundPredicate`` per conjunct
instead of one giant ``BoundPredicate`` with the whole expression
text.  Downstream passes (predicate pushdown, filters) now see
pre-split single-conjunct strings, eliminating one source of text
re-parsing.

When ``where.expr_tree`` is None (mixed-clause routes; pure structured
``where_predicates``; queries with no boolean operators), behavior is
unchanged — fallback to the existing ``where.expr.text`` path or the
structured ``where.predicates`` path.

What this PR adds:

- ``_flatten_top_level_ands(expr) -> List[BooleanExpr]``: descends
  through ``and`` nodes only; or/xor/not/atom stop the recursion and
  count as one conjunct.  Handles left-, right-, and mixed-associative
  AND chains.
- ``_boolean_expr_to_text(expr) -> str``: recursive surface-text
  reconstruction for a boolean subtree.  Atoms emit atom_text; binary
  ops glue with ``L OP R``; NOT prefixes; branch operands are wrapped
  in parens for unambiguous round-tripping.  Inherits the slice-1
  literal-atom fidelity caveat (``str(True) == "True"``).
- Updated ``_where_predicates`` branches on ``expr_tree`` availability.

20 tests in ``test_binder_expr_tree.py`` covering helper invariants,
``_where_predicates`` with hand-built ``WhereClause`` fixtures
(backward-compat, OR root, AND root, no-expr), and end-to-end through
``parse_cypher`` + binder for OR / NOT / nested AND-with-OR queries.

What this PR does NOT change:

- ``predicate_pushdown._split_conjuncts`` (slice 3 — relax to fallback
  only since binder now pre-splits).
- ``graphistry/compute/gfql/expr_split.py`` (retired in slice 4 once
  pushdown migrates).
- ``WhereClause.expr`` field (kept for backward compat with mixed-clause
  routes that still use the text path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Tighten test_e2e_pure_and_of_comparables_unchanged to assert
  expr_tree is None directly, not just BoundPredicate count.  A future
  parser change that accidentally populates expr_tree on the pure-AND
  path would now fail loudly rather than silently match the count.
- Document the parser-layer mutual-exclusion invariant inline in
  _where_predicates so a future refactor introducing overlap does not
  silently double-emit BoundPredicates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…slice 2

The conformance suite added in #1204 (commit 91c9777) was explicitly
designed to track the #1194 grammar-disambiguation and #1200 IR
boolean-tree migrations.  Slice 2 (#1207) flips three shape contracts
from one combined BoundPredicate to per-conjunct BoundPredicates as
the binder now flattens top-level AND.

Updates:

- Shape C ``(A OR B) AND C`` → 2 BoundPredicates (OR-compound + bare C)
- Shape D ``NOT A AND B`` → 2 BoundPredicates (NOT-compound + bare B)
- Shape F ``n:Admin AND n.active = true`` → 2 BoundPredicates (label
  + property, top-level AND splits regardless of operand flavor)

Assertions now check both count AND content (each conjunct identifiable
by substring) so future shape-contract drift is caught precisely.

Other shapes in the suite (A, B, E, G) are unaffected: shape A already
emitted N BoundPredicates via where_predicates structured route; shape
B raises GFQLSyntaxError; shape E (XOR) stays one conjunct because
top-level isn't AND; shape G (quoted AND) already produced 2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant