fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194) #1209
Replace the text-level `split_top_level_and` + regex loop in `generic_where_clause` with a structural walker over Lark's parsed `BooleanExpr` (and `_ExpressionSlice` for the single-atom path).

Closes #1194 in the spirit of the issue: the grammar already declares the structured rule we want, and slice 1 (#1202) already exposes the parsed tree on `WhereClause.expr_tree`, so `generic_where_clause` should trust that source of truth instead of re-splitting the WHERE body on top-level AND.

Adds two helpers:
- `_match_bare_label_atom(text)`: fullmatch atom text against `_BARE_LABEL_PREDICATE_RE` (preserves the #1125 false-positive guard).
- `_lift_label_only_and_spine(node)`: DFS over a `BooleanExpr` AND-spine; returns lifted `(alias, labels)` tuples iff every leaf is a bare-label atom, else `None` (all-or-nothing: mixed/OR/XOR/NOT fall through).

Behaviorally identical to master across the WHERE-shape matrix: single label, multi-AND chains, mixed label+property, OR/XOR/NOT, parenthesized boolean trees, and string-literal false-positive guards all produce the same `WhereClause` shape. Verified by parser, binder, slice-1 producer, slice-2 conformance + binder_expr_tree, and lowering test suites (1528 passed, no regressions). Targeted mypy clean.

Adds 4 focused unit tests for the new helpers in `test_parser.py` and updates the stale comment in `test_parse_where_triple_and_label_conjunction_through_generic_where_clause` to reference the walker (was: `split_top_level_and`).

`expr_split.py` and the regex are unchanged: both still have other callers (where-pattern canonicalization, lowering), and their retirement is deferred to later slices of #1200.

Supersedes the Earley-sub-parser approach in PR #1203, which would have introduced a second Lark parser path; the structural walker is cleaner now that slice 1 has landed on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
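The two helpers described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `parser.py`: the `Node` dataclass, its `op` strings, and the `BARE_LABEL_RE` pattern are hypothetical stand-ins, since the real `BooleanExpr` shape and `_BARE_LABEL_PREDICATE_RE` are not shown here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the parser's bare-label regex: an alias followed
# by one or more :Label segments, e.g. "n:Admin" or "n:Admin:Active".
BARE_LABEL_RE = re.compile(r"\s*([A-Za-z_]\w*)((?::[A-Za-z_]\w*)+)\s*")

Lifted = Tuple[str, Tuple[str, ...]]  # (alias, labels)


@dataclass
class Node:
    """Hypothetical tree node; op is 'atom', 'and', 'or', 'xor', or 'not'."""
    op: str
    text: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def match_bare_label_atom(text: str) -> Optional[Lifted]:
    # fullmatch (not search): a label-shaped fragment inside a larger atom,
    # such as the contents of a quoted string, must not lift.
    m = BARE_LABEL_RE.fullmatch(text)
    if m is None:
        return None
    labels = tuple(m.group(2).split(":")[1:])
    return m.group(1), labels


def lift_label_only_and_spine(node: Optional[Node]) -> Optional[List[Lifted]]:
    # All-or-nothing DFS: any missing child, non-AND operator, or
    # non-label atom rejects the whole spine by returning None.
    if node is None:
        return None
    if node.op == "atom":
        hit = match_bare_label_atom(node.text)
        return None if hit is None else [hit]
    if node.op != "and":
        return None
    left = lift_label_only_and_spine(node.left)
    if left is None:
        return None
    right = lift_label_only_and_spine(node.right)
    if right is None:
        return None
    return left + right
```

Under these assumptions, a tree for `n:Admin AND n:Active` lifts to two `(alias, labels)` tuples, while replacing either leaf with `n.x = 1`, or the root with an OR node, makes the walker return `None` so the caller can fall back to the raw expression.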
Compose with #1214's parser invariant `(expr is None) == (expr_tree is None)`:
- `generic_where_clause` now relies on #1214's single-atom synthesis to guarantee `expr_tree` is always a `BooleanExpr`. The walker (`_lift_label_only_and_spine`) runs over that uniform tree.
- Drops the `_ExpressionSlice` branch this PR added; it is redundant now that #1214 wraps non-BooleanExpr operands as single-atom trees upstream of this point.
- Merged CHANGELOG: kept both entries (this PR's #1194 walker entry + #1214's invariant entry); updated this PR's entry to note the composition with #1214.

Verified: 940 targeted (parser/binder/boolean_expr/conformance/binder_expr_tree/where_clause_expr_tree_invariant/lowering) passed including #1214's 11 new invariant tests; 1539 full GFQL sweep passed. mypy on parser.py clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
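The invariant `(expr is None) == (expr_tree is None)` referenced above can be illustrated with a small check. This is only a sketch: `WhereClauseSketch` is a hypothetical stand-in with just the two fields named in the invariant, and where #1214 actually enforces the rule is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WhereClauseSketch:
    """Hypothetical stand-in carrying only the two fields in the invariant."""
    expr: Optional[str] = None
    expr_tree: Optional[object] = None

    def __post_init__(self) -> None:
        # Either both the raw expression and its parsed tree are present,
        # or neither is; a half-populated clause is rejected.
        if (self.expr is None) != (self.expr_tree is None):
            raise ValueError("expr and expr_tree must both be set or both be None")
```

With a guarantee of this shape, a consumer that sees a non-`None` `expr` can walk `expr_tree` without a separate fallback for a missing tree.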
PR Review: PR #1209, fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194)
Branch:
Blockers (must fix before merge)
None.
Important (should fix)
None.
Suggestions (nice to have)
(Other Wave-1 SUGGESTIONs: right-assoc test, direct
Human checks required (operator decision needed)
Rejected / False positives (with proof)
Methodology
Run per
Cross-references
Recommendation
Approve and merge. No defects. The single confirmed SUGGESTION is a 4-LOC test-fixture typing cleanup that does not warrant blocking the merge.
assert _match_bare_label_atom("'A:B'") is None  # quoted string fragment

def _bx_atom(text: str):  # type: ignore[no-untyped-def]
[SUGGESTION, non-blocking] These two helpers use `# type: ignore[no-untyped-def]` while sibling helpers in this file (`_parse_query`, `_match_parts`) carry explicit type annotations. ~4 LOC cleanup:
def _bx_atom(text: str) -> "BooleanExpr": ...
def _bx_branch(op: str, left: "BooleanExpr", right: Optional["BooleanExpr"] = None) -> "BooleanExpr": ...
Trivial; can land in this PR or as a follow-up. Surfaced by Wave 1 of the review skill (plans/1194-lark-where-predicates-ambiguity/review-1209/waves/wave-1/quality/report.md).
Summary
Closes #1194 in the spirit of the issue: lift bare label predicates and AND-spines of bare label predicates by walking Lark's already-parsed `BooleanExpr` in `cypher/parser.py::generic_where_clause`, replacing the previous text-level `split_top_level_and` + regex loop. The grammar already declares the structured rule we want; this PR makes the transformer trust it.

Builds on slice 1 (#1202) which exposed `WhereClause.expr_tree`. Independent of slice 2 (#1207, binder-side). Shrinks the duplicate-parsing surface flagged by #1200.

What changed
`graphistry/compute/gfql/cypher/parser.py`:
- `_match_bare_label_atom(text) -> Optional[(alias, labels)]`: fullmatches atom text against `_BARE_LABEL_PREDICATE_RE`. `fullmatch` is load-bearing as the false-positive guard from #1125 (Binder follow-up: replace regex WHERE label narrowing with AST-based analysis): atom fragments that merely look label-shaped (e.g. quoted-string contents) must not lift.
- `_lift_label_only_and_spine(node)`: DFS over a `BooleanExpr`; returns lifted `(alias, labels)` tuples iff every AND-spine leaf is a bare-label atom, else `None`. Any non-AND op (OR/XOR/NOT), missing child, or non-label atom rejects the whole spine (all-or-nothing).
- `generic_where_clause` simplified: walk `BooleanExpr` when `items[0]` is one (AND-spine path); for the single-atom path (`items[0]` is `_ExpressionSlice`) call `_match_bare_label_atom` on `items[0].text` directly. No top-level AND splitting on the WHERE body text inside this function. Mixed/OR/XOR/NOT inputs fall through to raw expr (with `expr_tree` retained for downstream slices).

`graphistry/tests/compute/gfql/cypher/test_parser.py`:
- `test_parse_where_triple_and_label_conjunction_through_generic_where_clause` to point at the walker (was: `split_top_level_and`).

`CHANGELOG.md`: entry under Internal.

What did not change
- `expr_split.py` and `split_top_level_and`: still imported by other call sites in the parser (`where_pattern_*` canonicalization at `parser.py:524`) and by `passes/predicate_pushdown.py`. Their retirement is the eventual goal of #1200 (gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text) slices 3+, not this PR.
- `generic_where_clause` is restructured.

Behavioral compatibility
End-to-end I/O contract is identical to master across the WHERE-shape matrix:
| | | predicates count | expr | expr_tree |
|---|---|---|---|---|
| `WHERE n:Admin` | `where_predicates` rule | | | |
| `WHERE n:Admin AND n:Active [AND ...]` | `generic_where_clause`, walker lifts | | | |
| `WHERE n.x = 1 AND n:Admin` | `where_predicates` rule | | | |
| `WHERE n:Admin AND n.x = 1` | `generic_where_clause`, walker rejects (mixed) | | | |
| `WHERE n:Admin OR n:Active` | `generic_where_clause`, walker rejects (OR root) | | | |
| `WHERE NOT n:Admin` | `generic_where_clause`, walker rejects (NOT root) | | | |
| `WHERE (n:Admin OR n:Active) AND n.k = 1` | `generic_where_clause`, walker rejects (mixed leaves) | | | |
| `WHERE n.x = 'A AND B'` (string with AND) | `where_predicates` rule | | | |

#1207's conformance matrix and `test_binder_expr_tree.py` continue to pass (verified).

Test plan
- `pytest graphistry/tests/compute/gfql/cypher/test_parser.py`: 111 pass (was 108 + 3 new helper tests; +1 for the existing single-label test still green)
- `pytest graphistry/tests/compute/gfql/cypher/test_binder.py`: 44 pass
- `pytest graphistry/tests/compute/gfql/cypher/test_boolean_expr.py`: slice-1 producer tests, all pass
- `pytest graphistry/tests/compute/gfql/cypher/test_where_bool_conformance.py`: slice-2 conformance, all pass
- `pytest graphistry/tests/compute/gfql/cypher/test_binder_expr_tree.py`: slice-2 binder, all pass
- `pytest graphistry/tests/compute/gfql/cypher/test_lowering.py`: all pass
- `pytest graphistry/tests/compute/gfql/` (full sweep): 1528 passed, 80 skipped, 15 xfailed, no regressions
- `mypy graphistry/compute/gfql/cypher/parser.py`: clean

DRY scoreboard delta
Before this PR (master post-#1207):
- `generic_where_clause` regex+split; `and_op` + `bare_label_predicate_expr`
- `expr_split.py::split_top_level_and`; `parser.py:524` (where-pattern canon) and `predicate_pushdown.py`
- `_BARE_LABEL_PREDICATE_RE` (parser:305); `_CYPHER_BARE_LABEL_PREDICATE_RE` (lowering:505)

After this PR:
- `generic_where_clause`
- `expr_split.py::split_top_level_and` `parser.py` retains its other call site)
- `_BARE_LABEL_PREDICATE_RE`

Closes
Closes #1194.
Related
- `WhereClause.expr_tree` exposed (merged)
- `expr_tree`, emits per-conjunct `BoundPredicate`s (merged)
- `fullmatch` invariant preserved here)

Supersedes
This PR replaces the abandoned approach in PR #1203 (separate Earley sub-parser instance), which would have introduced a second Lark parser path. With slice 1 already merged on master, the structural walker is the cleaner mechanism and aligns with the #1200 trajectory.