Skip to content

refactor(#1213 sub-PR A): lift boolean_expr_to_text + establish expr_tree invariant#1214

Merged
lmeyerov merged 2 commits intomasterfrom
feat/issue-1213-sub-pr-a-lift-helper
Apr 25, 2026
Merged

refactor(#1213 sub-PR A): lift boolean_expr_to_text + establish expr_tree invariant#1214
lmeyerov merged 2 commits intomasterfrom
feat/issue-1213-sub-pr-a-lift-helper

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

Summary

Sub-PR A of #1213. Two tightly-related steps that unblock the rest of the migration:

  1. Lift helpers: Move _boolean_expr_to_text and _flatten_top_level_ands from frontends/cypher/binder.py to new shared module graphistry/compute/gfql/cypher/_boolean_expr_text.py. Subsequent slices (lowering, ast_normalizer migrations) can now reconstruct WHERE-body text from expr_tree without depending on binder internals.

  2. Establish parser invariant: (WhereClause.expr is None) == (WhereClause.expr_tree is None). Three sites previously violated this by populating expr without expr_tree:

    • _mixed_where_clause (parser.py:984): WHERE pattern AND expr
    • where_clause non-structured branch (parser.py:1121): atom-shaped WHERE bypassing and_op/or_op
    • generic_where_clause single-atom fallback (parser.py:1176): items[0] not a BooleanExpr

    All three now synthesize a single-atom BooleanExpr(op="atom", atom_text=expr_text, ...). Behavior-preserving: binder's primary expr_tree path emits one BoundPredicate per top-level AND conjunct; a single-atom tree yields one BoundPredicate with expression == atom_text, identical to the legacy expr.text fallback the binder takes today.

Adds test_where_clause_expr_tree_invariant.py with 11 tests covering structured, label-narrowed, atom, OR, XOR, NOT, and mixed-pattern shapes.

Test plan

  • 11/11 invariant tests pass
  • Full GFQL suite locally: 1535 passed, 0 failed (no regressions)
  • CI green

Refs

🤖 Generated with Claude Code

lmeyerov and others added 2 commits April 25, 2026 04:57
…tree invariant

Move `_boolean_expr_to_text` and `_flatten_top_level_ands` from binder.py to
new shared module `cypher/_boolean_expr_text.py` so lowering and ast_normalizer
can reconstruct WHERE-body text from `expr_tree` in subsequent slices.

Establish parser invariant `(WhereClause.expr is None) == (WhereClause.expr_tree is None)`
by synthesizing single-atom `BooleanExpr` at three sites that previously left
`expr_tree` unset while populating `expr`:

  - `_mixed_where_clause` (parser.py): WHERE pattern AND expr / expr AND pattern
  - `where_clause` non-structured branch: atom-shaped WHERE bypassing and_op/or_op
  - `generic_where_clause` single-atom fallback: items[0] is not a BooleanExpr

Behavior-preserving: binder primary path emits one `BoundPredicate` per
top-level AND conjunct; a single-atom tree yields one `BoundPredicate` with
`expression == atom_text`, identical to the legacy `expr.text` fallback.

Add `test_where_clause_expr_tree_invariant.py` covering all three previously-
broken paths plus the structured-predicates and parenthesized-OR shapes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ifted helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lmeyerov lmeyerov marked this pull request as ready for review April 25, 2026 12:06
@lmeyerov lmeyerov merged commit 1f90c57 into master Apr 25, 2026
133 checks passed
@lmeyerov lmeyerov deleted the feat/issue-1213-sub-pr-a-lift-helper branch April 25, 2026 12:06
lmeyerov added a commit that referenced this pull request Apr 25, 2026
Compose with #1214's parser invariant `(expr is None) == (expr_tree is None)`:

- `generic_where_clause` now relies on #1214's single-atom synthesis to
  guarantee `expr_tree` is always a `BooleanExpr`. The walker
  (`_lift_label_only_and_spine`) runs over that uniform tree.
- Drops the `_ExpressionSlice` branch this PR added — redundant now that
  #1214 wraps non-BooleanExpr operands as single-atom trees upstream of
  this point.
- Merged CHANGELOG: kept both entries (this PR's #1194 walker entry +
  #1214's invariant entry); updated this PR's entry to note the
  composition with #1214.

Verified: 940 targeted (parser/binder/boolean_expr/conformance/
binder_expr_tree/where_clause_expr_tree_invariant/lowering) passed
including #1214's 11 new invariant tests; 1539 full GFQL sweep passed.
mypy on parser.py clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Apr 26, 2026
…ree readers (#1213 sub-PR B) (#1216)

First read-side slice of #1213 (umbrella #1200 slice 5).  Migrate all
text-only readers of `WhereClause.expr.text` to consume
`boolean_expr_to_text(WhereClause.expr_tree)` directly, leveraging the
parser invariant `(expr is None) == (expr_tree is None)` established by
#1214.

Sites migrated:

- `frontends/cypher/binder.py::_where_predicates` — drop the dead
  `elif where.expr is not None` fallback.  Per #1214's invariant the
  branch is unreachable for parser-produced WhereClauses; it was
  defensive code for hand-built fixtures only.
- `cypher/lowering.py::_reject_unsupported_where_expr_forms` — gate +
  `expr.text.strip()` access.
- `cypher/lowering.py::_reject_unsupported_variable_length_where_pattern_predicates`
  — error-reporting `value=expr.text` access.
- `cypher/lowering.py::_check_expr` callers (three sites — `query.where`,
  `reentry_where`, and the post-binding `query.where` re-check) —
  gate + `expr.text` access.

Out of scope (left for sub-PR C):
- `ExpressionText` passthrough callers (`rewrite_expr` in lowering at
  7515/7533, `_extract_relationship_type_where` at 8131,
  `pre_join_filters.append` at 7998, `row_where` assignment at 6322-6329,
  `_rewrite_shortest_path_expr_text` at ast_normalizer.py:432-436).
- Span-access readers at lowering.py:8143-8144 (entangled with the
  passthrough at 8131; migrate together in C).

Tests: removed `test_where_predicates_with_expr_tree_none_falls_back_to_expr_text`
which exercised the now-dropped binder fallback via a hand-built
`WhereClause(expr=..., expr_tree=None)` fixture — that shape violates
the parser invariant and is unreachable from any real input.  The
sibling test `test_where_predicates_with_no_expr_at_all_returns_empty`
still covers the legitimate no-expr-tree case.

Verified:
- 1538/1538 GFQL tests pass (was 1539, minus deleted test).
- mypy clean on `cypher/lowering.py` and `frontends/cypher/binder.py`.

Coordination: this is sub-PR B per the split agreed on #1213
(reader-side: us; writer-side D+E: colleague who landed #1214).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Apr 26, 2026
…WhereClause.expr → expr_tree readers (#1213 sub-PR C) (#1218)

Second read-side slice of #1213.  Migrate all `ExpressionText`-passthrough
readers of `WhereClause.expr` to consume `WhereClause.expr_tree`-derived
values via two new helpers in `cypher/lowering.py`:

- `_where_clause_expr_text(where)` — synthesizes a fresh `ExpressionText`
  from `where.expr_tree` using `where.span` (which equals the original
  `where.expr.span` for parser-produced WhereClauses, both derived from
  the same `_span_from_meta(meta)` of `generic_where_clause`) so error
  positions match master's behavior.
- `_rewrite_where_clause_and_resync(where, rewrite, field)` — applies a
  text rewrite via the existing `(ExpressionText, str) -> ExpressionText`
  callback shape and resynchronizes `expr_tree` to a single-atom
  `BooleanExpr` carrying the rewritten text (Option B of the C decision
  posted on #1213; documented in plan.md).

Side-effect fix: closes the latent text/tree staleness bug on master
post-#1214 where `replace(where, expr=rewrite(where.expr, ...))` left
`expr_tree` pointing at the pre-rewrite text.  Now both fields stay in
sync after rewriting.

Sites migrated:

- `lowering.py:6322` — dynamic row-where extraction
  (`_extract_relationship_type_where(query.where.expr, ...)` + `row_where`
  passthrough → use `_where_clause_expr_text`).
- `lowering.py:7530-7538` — reentry-where rewrite of remaining wheres
  (use `_rewrite_where_clause_and_resync`).
- `lowering.py:7550-7553` — reentry-where rewrite of primary
  `query.reentry_where` (same helper).
- `lowering.py:7998` — `pre_join_filters.append(query.where.expr)` →
  append the synthesized ExpressionText.
- `lowering.py:8176-8191` — connected-OPTIONAL-MATCH where lowering
  (`_extract_relationship_type_where` + `_unsupported` error reporting
  with `.text` and `.span.line/.column`) → all consume the synthesized
  ExpressionText uniformly.
- `ast_normalizer.py::_rewrite_where` — shortest-path expr rewrite via
  `_rewrite_shortest_path_expr_text` (inline equivalent of the lowering
  helper since this module only has one callsite).

Out of scope (deferred to D + E):

- `ast_normalizer.py:520-526` constructor passthrough (`expr=query.where.expr`
  inside a `WhereClause(...)` re-construction) — left for sub-PR D to drop
  along with the other writer-side construction-site updates in parser.py.
  The gate at L520 is migrated (`expr is not None` → `expr_tree is not None`).

Verified:
- 1538/1538 GFQL tests pass; no regressions.
- `mypy` clean on `cypher/lowering.py` and `cypher/ast_normalizer.py`.

Coordination: this is sub-PR C per the split agreed on #1213.  Ready for
the colleague's combined sub-PRs D + E (drop `expr=` from `WhereClause(...)`
constructions; remove the field) once this lands.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Apr 26, 2026
IMPORTANT findings from multi-dimension review:

1. Dead-code in OR test (test_lowering.py:3199-3205) — the
   ``if False else`` made the first comprehension permanently dead;
   the fallback branch then asserted on either of two contradictory
   shapes.  Replaced with a single deterministic assertion: 5 rows
   expected, every kept row carries either ``p1: 1`` or ``p2: 2`` in
   its rendered form.
2. EQ duplicated the new mixin's overrides (comparison.py:245-263).
   The mixin docstring already invited the retrofit ("EQ predates this
   mixin with an inline override").  EQ now uses
   ``_StringAllowingComparisonMixin`` for its single-source contract.

SUGGESTIONS deferred (documented in plan):
- NaN-NOT semantics test name (locks pandas-not-Cypher behavior;
  acknowledged in test docstring + #1219 link).
- ``_rebuild_and_tree`` invents source-spans across non-adjacent
  residuals (cosmetic, error-reporting only).
- No deeply-nested pattern-under-multi-level-boolean test
  (``_has_pattern_descendant`` handles correctly by inspection).
- PEP-8 spacing nit after constant deletes.
- Defensive ``_split_top_level_and_pattern_leaves`` bad-flag drop on
  malformed AND nodes (#1214 invariant prevents the case).

1680 GFQL + predicates tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Apr 26, 2026
* refactor(#1213 sub-PR D+E): drop WhereClause.expr field

Final slice of #1200 / #1213. Removes the legacy raw-text WHERE field now
that all production readers (binder, lowering, ast_normalizer) consume
expr_tree-derived text via boolean_expr_to_text() per sub-PR B/C.

Sub-PR D — drop expr= from writer construction sites:
  - parser.py: _mixed_where_clause, where_clause non-structured branch,
    generic_where_clause fallback (the three sites that synthesized
    BooleanExpr atoms in #1214); also strip expr=None from the three
    structured-path constructions
  - ast_normalizer.py: drop expr= forwarder + switch gate to expr_tree

Sub-PR E — remove the field:
  - ast.py: drop WhereClause.expr; update docstring to two-shape contract
    (structured vs tree, with mixed pattern+expr as the third combination)
  - test_where_clause_expr_tree_invariant.py: rewrite from the now-vacuous
    (expr is None) == (expr_tree is None) symmetry to the surviving
    routing-shape contract (structured / tree / mixed)
  - test_parser.py, test_boolean_expr.py, test_ast_normalizer.py,
    test_where_bool_conformance.py: replace `where.expr is None/not None`
    with `where.expr_tree is None/not None`; replace `where.expr.text`
    with `boolean_expr_to_text(where.expr_tree)`; drop the slice-1
    backward-compat test that asserted both fields were populated

NOT YET REBASED ONTO POST-C MASTER. This commit produces the expected
93 test failures from lowering/ast_normalizer readers that still access
.expr; sub-PR C migrates them. Rebase onto post-C master before pushing
or running CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#1213 sub-PR D+E): pre-stage CHANGELOG entry for field removal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#1213 sub-PR D+E): drop leftover expr= from sub-PR C's replace() calls

Sub-PR C (#1218) introduced two ``replace(where, expr=rewritten, expr_tree=new_tree)``
calls in ``ast_normalizer.py::_rewrite_where`` and ``lowering.py::_rewrite_where_clause_and_resync``.
These passed both fields because the field still existed at C's merge time.

After this PR removes the field, the ``expr=`` arg becomes invalid keyword
to the dataclass constructor. Drop it from both sites; ``expr_tree`` is the
single source of truth post-D+E.

Also drops the now-unused ``expr=...`` and ``expr=None`` args from three
hand-built ``WhereClause(...)`` test fixtures in ``test_binder_expr_tree.py``
that broke for the same reason, plus the now-unused ``ExpressionText`` import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#1213 sub-PR D+E): apply review findings — docstring + test coverage

Two review waves (focus + general per the new review policy) converged on:

1. **WhereClause docstring missing the "mixed" routing shape** (IMPORTANT, both
   waves). The new docstring listed only 2 shapes; reality is 3 (and arguably
   subdivided further by which kind of structured `predicates` populated).
   Rewrote to enumerate all three observable shapes, distinguishing
   `WherePredicate` from `WherePatternPredicate` in the structured case.

2. **Invariant test missing parser construction paths** (IMPORTANT, focus
   wave). Added `MATCH (a) WHERE (a)-[]->(:Admin) RETURN a` covering
   `where_pattern_only_clause` (structured, no AND, single
   `WherePatternPredicate`).  `expr_and_where_pattern_clause` (expr-first
   mixed) doesn't currently parse; `where_clause` _ExpressionSlice branch is
   Lark-internal routing already exercised by integration tests.

3. **Stale binder comment claiming `predicates`/`expr_tree` mutually exclusive**
   (SUGGESTION, general wave).  They aren't — the mixed shape populates
   both — and the binder body correctly handles all three shapes.  Updated
   the comment to match the AST docstring's 3-shape contract.

4. **CHANGELOG polish** (SUGGESTIONS, both waves): clarified count
   ("9 query forms across the 3 shapes" rather than the Cartesian-reading
   "3 × 8"), added an API note that `WhereClause` is exported and external
   constructors must drop `expr=` kwarg.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant