Skip to content

fix(cypher): grammar-resolve where_predicates vs expr ambiguity (#1194)#1203

Closed
lmeyerov wants to merge 3 commits intomasterfrom
m4-1194-lark-where-ambiguity
Closed

fix(cypher): grammar-resolve where_predicates vs expr ambiguity (#1194)#1203
lmeyerov wants to merge 3 commits intomasterfrom
m4-1194-lark-where-ambiguity

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

Summary

  • resolves gfql/cypher: resolve Lark ambiguity between where_predicates and expr (grammar-level follow-up to #1125) #1194 by replacing regex/text-split lifting in generic_where_clause with grammar-driven reparsing of ambiguous WHERE clauses
  • keeps primary Cypher parser on lalr (to avoid query-shape regressions), and adds a targeted earley where_clause subparser for ambiguous label-predicate cases
  • hardens binder label narrowing to an explicit all-or-nothing policy: only narrow when the full WHERE predicate set is label-only
  • updates parser/binder tests for routing and conservative mixed-predicate behavior

Why this shape

  • a full parser switch to earley fixed ambiguity but regressed clause attachment (for example WITH ... WHERE ...)
  • targeted where_clause reparsing preserves existing global parse behavior while removing the regex fallback and still using grammar-level disambiguation

Validation

  • PYTHONPATH=. uv run --no-project --with lark --with pytest python -m pytest graphistry/tests/compute/gfql/cypher/test_parser.py -q
  • PYTHONPATH=. uv run --no-project --with lark --with pytest python -m pytest graphistry/tests/compute/gfql/cypher/test_binder.py -q
  • PYTHONPATH=. uv run --no-project --with lark --with pytest python -m pytest graphistry/tests/compute/gfql/cypher/test_lowering.py -q
  • PYTHONPATH=. uv run --no-project --with ruff ruff check graphistry/compute/gfql/cypher/parser.py graphistry/compute/gfql/frontends/cypher/binder.py graphistry/tests/compute/gfql/cypher/test_parser.py graphistry/tests/compute/gfql/cypher/test_binder.py

Copy link
Copy Markdown
Contributor Author

Review rerun (wave 3/4 equivalent) found no blocking issues, but one suggestion-level hardening item:

  • In graphistry/compute/gfql/cypher/parser.py (generic_where_clause), the new sub-parse path currently uses a broad except Exception: pass around _where_clause_parser().parse(...) + transformer.
  • Suggest narrowing this to the expected parse/transform failures (for example LarkError and known parser/validation exceptions), so unrelated runtime defects do not get silently downgraded into generic raw-expression fallback.

Current implementation is functionally sound for #1194 scope; this is mainly a robustness/observability improvement for future maintenance.

@lmeyerov
Copy link
Copy Markdown
Contributor Author

Closing in favor of #1209, which addresses #1194 via the cleaner approach now that #1200 slice 1 (#1202) has landed.

What changed since this PR was opened:

Why this PR is superseded:

Replacement (#1209):

  • Adds _match_bare_label_atom and _lift_label_only_and_spine helpers in cypher/parser.py.
  • generic_where_clause walks BooleanExpr (AND-spine) or _ExpressionSlice.text (single atom) directly.
  • Same I/O contract as this PR for every WHERE shape; verified across the parser/binder/conformance/slice-2 binder_expr_tree/lowering suites (1528 passed).
  • Shrinks the duplicate-parsing surface flagged by gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200 by retiring the regex+split path inside generic_where_clause.

The branch (m4-1194-lark-where-ambiguity) is left undeleted for reference.

@lmeyerov lmeyerov closed this Apr 25, 2026
lmeyerov added a commit that referenced this pull request Apr 25, 2026
… (#1209)

Replace the text-level `split_top_level_and` + regex loop in
`generic_where_clause` with a structural walker over Lark's parsed
`BooleanExpr` (and `_ExpressionSlice` for the single-atom path).

Closes #1194 in the spirit of the issue: the grammar already declares
the structured rule we want, and slice 1 (#1202) already exposes the
parsed tree on `WhereClause.expr_tree` — `generic_where_clause` should
trust that source of truth instead of re-splitting the WHERE body on
top-level AND.

Adds two helpers:
- `_match_bare_label_atom(text)` — fullmatch atom text against
  `_BARE_LABEL_PREDICATE_RE` (preserves the #1125 false-positive guard).
- `_lift_label_only_and_spine(node)` — DFS over a `BooleanExpr` AND-spine;
  returns lifted `(alias, labels)` tuples iff every leaf is a bare-label
  atom, else `None` (all-or-nothing — mixed/OR/XOR/NOT fall through).

Behaviorally identical to master across the WHERE-shape matrix: single
label, multi-AND chains, mixed label+property, OR/XOR/NOT, parenthesized
boolean trees, and string-literal false-positive guards all produce the
same `WhereClause` shape. Verified by parser, binder, slice-1 producer,
slice-2 conformance + binder_expr_tree, and lowering test suites
(1528 passed, no regressions). Targeted mypy clean.

Adds 4 focused unit tests for the new helpers in `test_parser.py` and
updates the stale comment in
`test_parse_where_triple_and_label_conjunction_through_generic_where_clause`
to reference the walker (was: `split_top_level_and`).

`expr_split.py` and the regex are unchanged — both still have other
callers (where-pattern canonicalization, lowering); their retirement
is deferred to later slices of #1200.

Supersedes the Earley-sub-parser approach in PR #1203, which would
have introduced a second Lark parser path; the structural walker is
cleaner now that slice 1 has landed on master.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov deleted the m4-1194-lark-where-ambiguity branch April 26, 2026 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gfql/cypher: resolve Lark ambiguity between where_predicates and expr (grammar-level follow-up to #1125)

1 participant