gfql/cypher: resolve Lark ambiguity between where_predicates and expr (grammar-level follow-up to #1125) #1194

@lmeyerov

Context

PR #1193 (issue #1125) hardened WHERE label narrowing by replacing a fragile regex in `binder.py` with AST-derived narrowing plus a text-split fallback in `parser.py::generic_where_clause`.

The text-split fallback exists because of a Lark grammar ambiguity: `WHERE n:Admin AND n:Active` can match BOTH the structured rule `where_predicates: where_predicate ("AND"i where_predicate)*` (parser.py:127) AND the generic expression path `where_clause: "WHERE"i expr -> generic_where_clause` (parser.py:126). Lark's ambiguity resolution prefers the generic path, so the AND-joined label predicates land in `WhereClause.expr` as raw text rather than as structured `WhereClause.predicates`.

The current PR works around this at the transformer level: `generic_where_clause` re-parses the raw text and lifts it back into structured predicates via `_split_top_level_and_terms` and `_BARE_LABEL_PREDICATE_RE`. This is correct and well-tested, but architecturally backward. The grammar already declares the structured path we want, so we should let it win.
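For readers who have not looked at the fallback, here is a minimal sketch of the general technique (split on top-level AND, then lift terms that are entirely a bare label predicate). It is not the code in `parser.py`: `split_top_level_and`, `lift_label_predicates`, and `BARE_LABEL_RE` are hypothetical stand-ins for `_split_top_level_and_terms` and `_BARE_LABEL_PREDICATE_RE`, and the way mixed terms are partitioned here is an assumption.

```python
import re

# Hypothetical stand-in for _BARE_LABEL_PREDICATE_RE: an alias followed by
# one or more ":Label" parts, and nothing else.
BARE_LABEL_RE = re.compile(r"([A-Za-z_]\w*)((?:\s*:\s*[A-Za-z_]\w*)+)")


def _is_word(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def _is_and_keyword(text: str, i: int) -> bool:
    """True if a standalone AND keyword (any case) starts at index i."""
    if text[i:i + 3].upper() != "AND":
        return False
    before_ok = i == 0 or not _is_word(text[i - 1])
    after_ok = i + 3 >= len(text) or not _is_word(text[i + 3])
    return before_ok and after_ok


def split_top_level_and(text: str) -> list[str]:
    """Split on AND outside quotes, backticks, and brackets."""
    terms, start, depth, quote, i = [], 0, 0, None, 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"`":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0 and _is_and_keyword(text, i):
            terms.append(text[start:i].strip())
            i += 3
            start = i
            continue
        i += 1
    terms.append(text[start:].strip())
    return terms


def lift_label_predicates(where_text: str):
    """Partition terms into lifted (alias, labels) pairs and leftover raw terms."""
    lifted, leftover = [], []
    for term in split_top_level_and(where_text):
        m = BARE_LABEL_RE.fullmatch(term)  # fullmatch: the whole term must be a label predicate
        if m is None:
            leftover.append(term)
        else:
            lifted.append((m.group(1), re.findall(r"[A-Za-z_]\w*", m.group(2))))
    return lifted, leftover
```

Under these assumptions, `lift_label_predicates("n:Admin AND n:Active")` lifts both terms, while a term such as `n.name = 'A AND B'` is kept whole because the AND sits inside a string literal. This is exactly the logic the grammar already encodes in `where_predicates`, which is why the duplication is worth removing.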

Proposed follow-up

Fix the ambiguity at the grammar layer so AND-joined bare label predicates naturally route to `where_predicates`:

  1. Investigate why `expr` currently wins the ambiguity contest. The likely cause is that `bare_label_predicate_expr` is reachable from `?predicate` inside `?and_expr`.
  2. Pick one of:
    • Remove the `bare_label_predicate_expr` alternative from the `expr` path, forcing label predicates to go through `where_predicates` exclusively.
    • Raise the priority of `where_predicates` (Lark rule priority annotation).
    • Introduce an explicit `where_clause` disambiguator rule.
  3. Once the grammar disambiguates correctly, the text-split logic in `generic_where_clause` becomes dead code for AND-joined labels. Either simplify it to the single-label case (still a valid generic fallback) or remove it entirely.
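A rough sketch of how step 1 and the priority option could be explored on a toy grammar. This is not the grammar in `parser.py`: only the rule names come from this issue, the rule bodies are simplified, and `ambiguous_rule_names` is a hypothetical helper. It assumes the third-party `lark` package and the Earley parser; the issue does not say which algorithm the real parser uses, and the effect of rule priorities depends on that choice, so the outcome should be checked against the real grammar rather than taken from this sketch.

```python
# Toy grammar for illustration only. Rule names mirror the issue; bodies are simplified.
TOY_GRAMMAR = r"""
start: where_clause

where_clause: "WHERE"i where_predicates -> structured_where_clause
            | "WHERE"i expr             -> generic_where_clause

// Priority option: a rule priority annotation on the structured path.
where_predicates.2: where_predicate ("AND"i where_predicate)*
where_predicate: CNAME ":" CNAME

expr: predicate ("AND"i predicate)*
predicate: CNAME ":" CNAME -> bare_label_predicate_expr

%import common.CNAME
%import common.WS
%ignore WS
"""


def ambiguous_rule_names(text: str) -> list[str]:
    """List the competing alternatives Lark reports when ambiguity is kept explicit."""
    # Imported lazily so this module still loads where lark is not installed.
    from lark import Lark

    # ambiguity="explicit" keeps every derivation under "_ambig" nodes
    # instead of silently picking one, which makes the contest visible.
    parser = Lark(TOY_GRAMMAR, parser="earley", ambiguity="explicit")
    tree = parser.parse(text)
    return sorted(
        str(alternative.data)
        for ambig in tree.find_data("_ambig")
        for alternative in ambig.children
    )
```

Calling `ambiguous_rule_names("WHERE n:Admin AND n:Active")` is one way to confirm which alternatives actually compete before choosing among the three options. Switching to the default ambiguity mode afterwards shows which alternative the priority annotation selects.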

Tests to keep green

  • All 10 label-narrowing tests in `test_binder.py` (single, double, triple AND, multi-alias, multi-label, lowercase-and, mixed label+property, XOR/OR/NOT conservative, string-literal false-positive guards).
  • All 4 parser-level AST tests in `test_parser.py`.

Constraints

  • Must not regress existing `where_predicates` behavior (structured comparison predicates, IS NULL, CONTAINS, etc.).
  • Must keep string-literal false-positive protection (the `fullmatch` invariant in the current fallback).
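To make the second constraint concrete, the sketch below contrasts a whole-term `fullmatch` check with a naive substring search. The pattern and function names are hypothetical and simplified, not the regex used in `parser.py`.

```python
import re

# Simplified, hypothetical label pattern: alias ":" Label
LABEL_RE = re.compile(r"[A-Za-z_]\w*\s*:\s*[A-Za-z_]\w*")


def is_bare_label_term(term: str) -> bool:
    """Safe check: the entire term must be a label predicate."""
    return LABEL_RE.fullmatch(term.strip()) is not None


def naive_contains_label(term: str) -> bool:
    """Unsafe check: fires on label-looking text anywhere, including inside strings."""
    return LABEL_RE.search(term) is not None


literal_term = "n.name = 'role:Admin'"
print(is_bare_label_term("n:Admin"))       # True
print(is_bare_label_term(literal_term))    # False: the term is a comparison, not a label
print(naive_contains_label(literal_term))  # True: false positive from the string literal
```

Any grammar-level replacement needs to preserve the first behavior: label-looking text inside a string literal must never narrow labels.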

Priority

Low-to-medium. The current text-level workaround is correct and tested; this is an architectural cleanup that removes a duplicate implementation of "split on top-level AND" and lets the grammar be the source of truth.
