fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194) by lmeyerov · Pull Request #1209 · graphistry/pygraphistry

lmeyerov · 2026-04-25T10:40:49Z

Summary

Closes #1194 in the spirit of the issue: lift bare label predicates and AND-spines of bare label predicates by walking Lark's already-parsed BooleanExpr in cypher/parser.py::generic_where_clause, replacing the previous text-level split_top_level_and + regex loop. The grammar already declares the structured rule we want; this PR makes the transformer trust it.

Builds on slice 1 (#1202) which exposed WhereClause.expr_tree. Independent of slice 2 (#1207, binder-side). Shrinks the duplicate-parsing surface flagged by #1200.

What changed

graphistry/compute/gfql/cypher/parser.py:

New helpers (module level):
- _match_bare_label_atom(text) -> Optional[(alias, labels)] — fullmatches atom text against _BARE_LABEL_PREDICATE_RE. fullmatch is load-bearing as the false-positive guard from Binder follow-up: replace regex WHERE label narrowing with AST-based analysis #1125 — atom fragments that merely look label-shaped (e.g. quoted-string contents) must not lift.
- _lift_label_only_and_spine(node) — DFS over a BooleanExpr; returns lifted (alias, labels) tuples iff every AND-spine leaf is a bare-label atom, else None. Any non-AND op (OR/XOR/NOT), missing child, or non-label atom rejects the whole spine (all-or-nothing).
generic_where_clause simplified: walk BooleanExpr when items[0] is one (AND-spine path); for the single-atom path (items[0] is _ExpressionSlice) call _match_bare_label_atom on items[0].text directly. No top-level AND splitting on the WHERE body text inside this function. Mixed/OR/XOR/NOT inputs fall through to raw expr (with expr_tree retained for downstream slices).

graphistry/tests/compute/gfql/cypher/test_parser.py:

4 focused unit tests for the new helpers (positive + negative, single atom, AND-spine, OR/NOT/mixed rejection).
Updated stale comment in test_parse_where_triple_and_label_conjunction_through_generic_where_clause to point at the walker (was: split_top_level_and).

CHANGELOG.md: entry under Internal.

What did not change

expr_split.py and split_top_level_and — still imported by other call sites in the parser (where_pattern_* canonicalization at parser.py:524) and by passes/predicate_pushdown.py. Their retirement is the eventual goal of gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200 slices 3+, not this PR.
Binder, predicate pushdown, lowering — untouched.
Grammar — no change to rule alternatives, priorities, or routing. Lark still produces the same parse tree; only the transformer's interpretation of generic_where_clause is restructured.

Behavioral compatibility

End-to-end I/O contract is identical to master across the WHERE-shape matrix:

Input	Routing	`predicates` count	`expr`	`expr_tree`
`WHERE n:Admin`	`where_predicates` rule	1	None	None
`WHERE n:Admin AND n:Active [AND ...]`	`generic_where_clause`, walker lifts	n	None	None
`WHERE n.x = 1 AND n:Admin`	`where_predicates` rule	2	None	None
`WHERE n:Admin AND n.x = 1`	`generic_where_clause`, walker rejects (mixed)	0	text	BooleanExpr
`WHERE n:Admin OR n:Active`	`generic_where_clause`, walker rejects (OR root)	0	text	BooleanExpr
`WHERE NOT n:Admin`	`generic_where_clause`, walker rejects (NOT root)	0	text	BooleanExpr
`WHERE (n:Admin OR n:Active) AND n.k = 1`	`generic_where_clause`, walker rejects (mixed leaves)	0	text	BooleanExpr
`WHERE n.x = 'A AND B'` (string with AND)	`where_predicates` rule	1	None	None

#1207's conformance matrix and test_binder_expr_tree.py continue to pass — verified.

Test plan

pytest graphistry/tests/compute/gfql/cypher/test_parser.py — 111 pass (was 108 + 3 new helper tests; +1 for the existing single-label test still green)
pytest graphistry/tests/compute/gfql/cypher/test_binder.py — 44 pass
pytest graphistry/tests/compute/gfql/cypher/test_boolean_expr.py — slice-1 producer tests, all pass
pytest graphistry/tests/compute/gfql/cypher/test_where_bool_conformance.py — slice-2 conformance, all pass
pytest graphistry/tests/compute/gfql/cypher/test_binder_expr_tree.py — slice-2 binder, all pass
pytest graphistry/tests/compute/gfql/cypher/test_lowering.py — all pass
pytest graphistry/tests/compute/gfql/ (full sweep) — 1528 passed, 80 skipped, 15 xfailed, no regressions
mypy graphistry/compute/gfql/cypher/parser.py — clean

DRY scoreboard delta

Before this PR (master post-#1207):

Site	Status
`generic_where_clause` regex+split	duplicated `and_op` + `bare_label_predicate_expr`
`expr_split.py::split_top_level_and`	still used at `parser.py:524` (where-pattern canon) and `predicate_pushdown.py`
`_BARE_LABEL_PREDICATE_RE` (parser:305)	one of two copies in repo
`_CYPHER_BARE_LABEL_PREDICATE_RE` (lowering:505)	second copy

After this PR:

Site	Status
`generic_where_clause`	structural walker over Lark's parse tree (#1194 closed)
`expr_split.py::split_top_level_and`	unchanged (1 fewer caller — `parser.py` retains its other call site)
`_BARE_LABEL_PREDICATE_RE`	unchanged — still load-bearing inside the walker (atom-text fullmatch guard)
Lowering's regex	unchanged — independent path, deferred to a later slice

Closes

Closes #1194.

gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200 — umbrella IR cleanup (Lark = single source of truth for boolean structure)
feat(gfql/ir): expose Lark boolean-expression tree as WhereClause.expr_tree (#1200 slice 1/N) #1202 — slice 1: WhereClause.expr_tree exposed (merged)
feat(gfql/binder): walk expr_tree, emit one BoundPredicate per top-level AND (#1200 slice 2/N) #1207 — slice 2: binder walks expr_tree, emits per-conjunct BoundPredicates (merged)
Binder follow-up: replace regex WHERE label narrowing with AST-based analysis #1125 / fix(gfql/binder): replace regex WHERE label narrowing with AST-based analysis (#1125) #1193 — original WHERE label narrowing regex removal (the fullmatch invariant preserved here)

Supersedes

This PR replaces the abandoned approach in PR #1203 (separate Earley sub-parser instance), which would have introduced a second Lark parser path. With slice 1 already merged on master, the structural walker is the cleaner mechanism and aligns with the #1200 trajectory.

Replace the text-level `split_top_level_and` + regex loop in `generic_where_clause` with a structural walker over Lark's parsed `BooleanExpr` (and `_ExpressionSlice` for the single-atom path). Closes #1194 in the spirit of the issue: the grammar already declares the structured rule we want, and slice 1 (#1202) already exposes the parsed tree on `WhereClause.expr_tree` — `generic_where_clause` should trust that source of truth instead of re-splitting the WHERE body on top-level AND. Adds two helpers: - `_match_bare_label_atom(text)` — fullmatch atom text against `_BARE_LABEL_PREDICATE_RE` (preserves the #1125 false-positive guard). - `_lift_label_only_and_spine(node)` — DFS over a `BooleanExpr` AND-spine; returns lifted `(alias, labels)` tuples iff every leaf is a bare-label atom, else `None` (all-or-nothing — mixed/OR/XOR/NOT fall through). Behaviorally identical to master across the WHERE-shape matrix: single label, multi-AND chains, mixed label+property, OR/XOR/NOT, parenthesized boolean trees, and string-literal false-positive guards all produce the same `WhereClause` shape. Verified by parser, binder, slice-1 producer, slice-2 conformance + binder_expr_tree, and lowering test suites (1528 passed, no regressions). Targeted mypy clean. Adds 4 focused unit tests for the new helpers in `test_parser.py` and updates the stale comment in `test_parse_where_triple_and_label_conjunction_through_generic_where_clause` to reference the walker (was: `split_top_level_and`). `expr_split.py` and the regex are unchanged — both still have other callers (where-pattern canonicalization, lowering); their retirement is deferred to later slices of #1200. Supersedes the Earley-sub-parser approach in PR #1203, which would have introduced a second Lark parser path; the structural walker is cleaner now that slice 1 has landed on master. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Compose with #1214's parser invariant `(expr is None) == (expr_tree is None)`: - `generic_where_clause` now relies on #1214's single-atom synthesis to guarantee `expr_tree` is always a `BooleanExpr`. The walker (`_lift_label_only_and_spine`) runs over that uniform tree. - Drops the `_ExpressionSlice` branch this PR added — redundant now that #1214 wraps non-BooleanExpr operands as single-atom trees upstream of this point. - Merged CHANGELOG: kept both entries (this PR's #1194 walker entry + #1214's invariant entry); updated this PR's entry to note the composition with #1214. Verified: 940 targeted (parser/binder/boolean_expr/conformance/ binder_expr_tree/where_clause_expr_tree_invariant/lowering) passed including #1214's 11 new invariant tests; 1539 full GFQL sweep passed. mypy on parser.py clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lmeyerov · 2026-04-25T12:27:12Z

PR Review: PR #1209 — fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194)

Branch: m4-1194-expr-tree-walk @ 163d885f6
Base: master (post-#1207 merge at 0b1fb0cc6)
Stack: not stacked, no downstream PRs
Files changed: graphistry/compute/gfql/cypher/parser.py, graphistry/tests/compute/gfql/cypher/test_parser.py, CHANGELOG.md
LOC: +140 / −39 (3 files, 1 commit)
CI: 131/131 conclusive checks pass; mergeable; awaiting REVIEW_REQUIRED.

Blockers (must fix before merge)

None.

Important (should fix)

None.

Suggestions (nice to have)

[Quality] Test fixtures _bx_atom / _bx_branch in test_parser.py:457-465 use # type: ignore[no-untyped-def] instead of explicit return annotations. Sibling helpers (_parse_query, _match_parts) are typed properly; aligning is a 4-LOC cleanup. Not blocking. — test_parser.py:457-465.

(Other Wave-1 SUGGESTIONs — right-assoc test, direct _ExpressionSlice unit test, nested mixed AND test, CHANGELOG bullet style — all rejected or downgraded under adversarial pressure. See adversarial/wave-1.md for proofs.)

Human checks required (operator decision needed)

Approve and merge. No code-level concerns.
Confirm Closes #1194 triggers correctly on merge (squash or merge-commit; squash is the predominant pattern for slice-style PRs in this repo).
Optional post-merge cleanup: archive the abandoned m4-1194-lark-where-ambiguity branch (git push origin --delete m4-1194-lark-where-ambiguity) if no longer needed for cherry-pick reference.

Rejected / False positives (with proof)

Wave 1 finding	Disposition	Proof
Right-assoc AND test missing	DOWNGRADED	Lark grammar at `parser.py:177-183` is left-recursive; right-assoc trees only via `grouped_expr` parens which the walker handles via the same `op == "and"` recursion.
Direct unit test for `_ExpressionSlice` branch missing	DOWNGRADED	E2E coverage exists via `test_parse_where_single_label_predicate_produces_structured_ast` and `test_parse_where_label_predicate` (both verified to route through the branch via plan.md Step 12.4 debug probe).
Nested mixed-AND test missing	DOWNGRADED	DFS visits every leaf; depth-1 mixed-leaf rejection test covers the same code path.
CHANGELOG should use bullets	REJECTED	#1202 / #1207 entries in the same Internal section also use long-paragraph form; matching convention.

Methodology

Run per /home/lmeyerov/Work/graphistry/.agents/skills/review/SKILL.md.

Phase 0: PR stack check, scope identification, plan setup at plans/1194-lark-where-predicates-ambiguity/review-1209/.
Phase 1: research → research/policies.md (relevance/freshness/key-rules per discovered policy doc — cypher_frontend_ci_gates.md, conformance.md, predicates_checklist.md, ARCHITECTURE.md, AGENTS.md, mypy.ini, pyproject.toml, .github/workflows/ci.yml).
Phase 2 Wave 1: 9 dimension passes (spec, correctness, tests, security, dry, quality, architecture, operability, conventions) with per-dimension report files. Adversarial pressure test in adversarial/wave-1.md.
Phase 2 Wave 2: confirmation pass with re-pressure of the surviving SUGGESTION + new dimension probes (producer invariants, primitive-literal atom edge case, items[0] type handling, WITH-WHERE separation, backtick aliases, CHANGELOG re-read). 0 new findings.
Convergence: 2 consecutive non-significant waves → converged.

Cross-references

Issue: gfql/cypher: resolve Lark ambiguity between where_predicates and expr (grammar-level follow-up to #1125) #1194 (closes on merge)
Umbrella: gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200; corrected slice-tracker comment at gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200 (comment)
Slice 1: PR feat(gfql/ir): expose Lark boolean-expression tree as WhereClause.expr_tree (#1200 slice 1/N) #1202 (merged)
Slice 2: PR feat(gfql/binder): walk expr_tree, emit one BoundPredicate per top-level AND (#1200 slice 2/N) #1207 (merged)
Slice 5: gfql/ir: migrate WhereClause.expr readers to expr_tree, then remove the field (#1200 slice 5) #1213 (delegated)
Grammar refactor follow-up: Cypher/GFQL WHERE cannot mix generic row predicates with pattern predicates #1031 (delegated; would unblock expr_split.py deletion as a side effect)
Superseded: PR fix(cypher): grammar-resolve where_predicates vs expr ambiguity (#1194) #1203 (closed; would have introduced a second Lark parser instance — rejected for architectural reasons)

Recommendation

Approve and merge. No defects. The single confirmed SUGGESTION is a 4-LOC test-fixture typing cleanup that does not warrant blocking the merge.

lmeyerov · 2026-04-25T12:27:34Z

+    assert _match_bare_label_atom("'A:B'") is None  # quoted string fragment
+
+
+def _bx_atom(text: str):  # type: ignore[no-untyped-def]


[SUGGESTION, non-blocking] These two helpers use # type: ignore[no-untyped-def] while sibling helpers in this file (_parse_query, _match_parts) carry explicit type annotations. ~4 LOC cleanup:

def _bx_atom(text: str) -> "BooleanExpr": ... def _bx_branch(op: str, left: "BooleanExpr", right: Optional["BooleanExpr"] = None) -> "BooleanExpr": ...

Trivial; can land in this PR or as a follow-up. Surfaced by Wave 1 of the review skill (plans/1194-lark-where-predicates-ambiguity/review-1209/waves/wave-1/quality/report.md).

lmeyerov commented Apr 25, 2026

View reviewed changes

lmeyerov merged commit bb72ca0 into master Apr 25, 2026
133 checks passed

lmeyerov deleted the m4-1194-expr-tree-walk branch April 26, 2026 03:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194)#1209

fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194)#1209
lmeyerov merged 2 commits intomasterfrom
m4-1194-expr-tree-walk

lmeyerov commented Apr 25, 2026

Uh oh!

lmeyerov commented Apr 25, 2026

Uh oh!

lmeyerov Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		assert _match_bare_label_atom("'A:B'") is None # quoted string fragment


		def _bx_atom(text: str): # type: ignore[no-untyped-def]

Conversation

lmeyerov commented Apr 25, 2026

Summary

What changed

What did not change

Behavioral compatibility

Test plan

DRY scoreboard delta

Closes

Related

Supersedes

Uh oh!

lmeyerov commented Apr 25, 2026

PR Review: PR #1209 — fix(cypher): walk expr_tree to lift label-only WHERE predicates (#1194)

Blockers (must fix before merge)

Important (should fix)

Suggestions (nice to have)

Human checks required (operator decision needed)

Rejected / False positives (with proof)

Methodology

Cross-references

Recommendation

Uh oh!

lmeyerov Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant