Context
PR #1217 (closes #1031 slice 1) swaps the Cypher parser from Lark LALR(1) to Earley. This solves `#1031`'s mixed-pattern+row WHERE problem cleanly, but as an architectural side effect lifts an implicit syntax gate that was previously hiding a missing semantic gate.
Pre-#1217 (LALR):
- Queries like `WHERE a.x = 1 OR b.y = 2` failed at parse time because LALR(1) couldn't unify FIRST sets across the OR boundary.
- The implicit "if I can't parse it, you can't write it" was the de facto semantic gate.
- No explicit Cypher-side validator existed for these shapes — none was needed, because the syntax gate caught them.
Post-#1217 (Earley):
- Earley parses OR/NOT/XOR among row predicates without complaint.
- Binder routes the result through the existing raw-expr path: `BoundPredicate(expression="a.x = 1 OR b.y = 2")`.
- Lowering's runtime evaluator (`graphistry.compute.gfql.expr_parser.parse_expr` at `lowering.py:7031`) handles AND/OR/NOT/XOR with correct SQL three-valued logic — so simple cases produce correct results.
- No static gate exists for compositions we haven't validated.
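As a reference point for what "correct SQL three-valued logic" means here, the following is a minimal standalone sketch, not the actual `expr_parser` implementation. It models NULL as Python `None`; all function names are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical model: None stands in for NULL (unknown).
Tri = Optional[bool]


def and3(a: Tri, b: Tri) -> Tri:
    # FALSE dominates AND; otherwise any NULL makes the result NULL.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    # TRUE dominates OR; otherwise any NULL makes the result NULL.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Tri) -> Tri:
    # NOT NULL stays NULL.
    return None if a is None else (not a)


def xor3(a: Tri, b: Tri) -> Tri:
    # XOR has no dominating value: any NULL operand yields NULL.
    if a is None or b is None:
        return None
    return a != b


def eq3(x: object, y: object) -> Tri:
    # Comparing against NULL yields NULL rather than FALSE.
    if x is None or y is None:
        return None
    return x == y


def where_keeps(result: Tri) -> bool:
    # WHERE keeps a row only when the predicate is TRUE (not NULL, not FALSE).
    return result is True
```

Under this model, `a.x = 1 OR b.y = 2` keeps a row where `a.x` is NULL as long as `b.y = 2` is TRUE, while `NOT a.x = 1` drops it.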
Asymmetry made visible
PR #1217 explicitly gates pattern-side compositions (NOT-pattern, OR-pattern, multi-positive-pattern) with E108 at the lift step in `parser.py::generic_where_clause` → `_build_where_with_pattern_lift`. Row-boolean compositions silently pass through:
| Newly-accepted shape | Static gate? | Outcome |
|---|---|---|
| `WHERE n.p1 = X OR n.p2 = Y` | ❌ none | silent accept → expr_parser → runtime |
| `WHERE NOT n.p1 = X` | ❌ none | silent accept |
| `WHERE n.p1 = X XOR n.p2 = Y` | ❌ none | silent accept |
| `WHERE NOT (n)-[:R]->()` | ✅ E108 at lift | explicit "unsupported" |
| `WHERE (n)-[:R]->() OR n.x = 1` | ✅ E108 at lift | explicit "unsupported" |
| `WHERE (n)-[:R*]->() AND (m)-[:R*]->()` | ✅ E108 at lift | explicit "unsupported" |
Empirical evidence
PR #1217 added 5 native execution tests covering simple disjunction shapes (TCK match-where1-10 mirror, same-alias OR, NOT with NaN, OR-then-AND, multi-row union). All pass — the runtime evaluator does the right thing for property-comparison disjunctions under SQL three-valued logic.
What's not validated
Compositional edge cases that may interact with parts of the planner/runtime that were designed assuming AND-only:
- Predicate pushdown — `pushdown_safety.py:58` explicitly says "Compound OR is not analyzed". AND has explicit handling (line 73); OR doesn't. Likely safe-by-default (predicates with OR don't push) but unverified.
- OPTIONAL MATCH null-extension — `is_null_rejecting()` is AND-aware; behavior on `WHERE x.field = 'val' OR ` where `x` is null-extended is unverified.
- Type coercion in OR operands — `WHERE n.p = 12 OR n.p = 'twelve'` mixed-type branches.
- OR with NULL literals — `WHERE n.p = 12 OR n.p IS NULL` three-valued logic edge cases.
- Nested compositions — `(a OR b) AND (c OR d)`, `NOT (a OR b)`, etc.
- Pushdown into PatternMatch — predicates with OR being pushed should respect the OR-aware safety analysis.
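To make the OPTIONAL MATCH concern concrete, here is a minimal sketch of how a null-rejection check differs between AND and OR. It is not the real `is_null_rejecting()`; the AST classes and `rejects_null_sketch` are hypothetical and exist only for this illustration. The rule it encodes: an AND rejects a null-extended alias if either side does, whereas an OR rejects it only if both sides do.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cmp:
    """Property comparison on one alias, e.g. x.field = 'val' (hypothetical node)."""
    alias: str


@dataclass(frozen=True)
class IsNull:
    """x.field IS NULL: evaluates to TRUE when x is null-extended."""
    alias: str


@dataclass(frozen=True)
class And:
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class Or:
    left: "Pred"
    right: "Pred"


Pred = Union[Cmp, IsNull, And, Or]


def rejects_null_sketch(pred: Pred, alias: str) -> bool:
    """Return True only if pred can never be TRUE when `alias` is null-extended."""
    if isinstance(pred, Cmp):
        # A comparison on a NULL alias yields NULL, so the row is dropped.
        return pred.alias == alias
    if isinstance(pred, And):
        # One rejecting conjunct is enough to reject the whole AND.
        return rejects_null_sketch(pred.left, alias) or rejects_null_sketch(pred.right, alias)
    if isinstance(pred, Or):
        # Every branch must reject, otherwise another branch can still be TRUE.
        return rejects_null_sketch(pred.left, alias) and rejects_null_sketch(pred.right, alias)
    # IS NULL and anything unrecognized: conservatively "does not reject".
    return False
```

An AND-only analysis that treats `x.field = 'val' OR y.k = 1` like a conjunction would wrongly conclude that null-extended `x` rows can be dropped, which is the kind of interaction that remains unverified.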
Decision space (intentionally open)
Several reasonable solutions; pick based on appetite + signal:
Option A: Static validator that rejects un-validated shapes
Option B: Accept provably safe simple cases; reject the rest
- Define a "safe subset" of OR/NOT/XOR shapes (e.g., property comparisons with non-null-extended aliases, no nested compositions). Accept those; reject the rest.
- Pro: surfaces the gradient (some OR is supported, some isn't) more honestly.
- Con: "safe subset" is hard to specify formally; risks fragmentation.
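One possible reading of the "safe subset" can be sketched as a small tree walk. This is an assumption-laden illustration, not a proposal for the actual gate: the node classes, `UnsupportedWhereShape`, and `check_safe_subset` are all hypothetical, and the rule chosen here (operands of OR/NOT/XOR must be plain property comparisons on aliases that are not null-extended) is only one candidate definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class PropCmp:
    """Leaf such as n.p1 = X (hypothetical node)."""
    alias: str
    prop: str


@dataclass(frozen=True)
class BoolOp:
    """Boolean node; op is one of "AND", "OR", "XOR", "NOT"."""
    op: str
    args: Tuple["Node", ...]


Node = Union[PropCmp, BoolOp]

NON_AND_OPS = frozenset({"OR", "XOR", "NOT"})


class UnsupportedWhereShape(ValueError):
    """Hypothetical error type; the real gate might reuse an existing error code."""


def check_safe_subset(node: Node, null_extended: FrozenSet[str] = frozenset()) -> None:
    """Raise UnsupportedWhereShape unless `node` is inside the sketched safe subset."""
    if isinstance(node, PropCmp):
        return
    if not isinstance(node, BoolOp):
        raise UnsupportedWhereShape(f"unknown node: {node!r}")
    if node.op in NON_AND_OPS:
        for arg in node.args:
            # Assumed rule 1: no boolean operator nested under OR/NOT/XOR.
            if not isinstance(arg, PropCmp):
                raise UnsupportedWhereShape(f"nested composition under {node.op}")
            # Assumed rule 2: no null-extended alias under OR/NOT/XOR.
            if arg.alias in null_extended:
                raise UnsupportedWhereShape(
                    f"{node.op} over null-extended alias {arg.alias!r}"
                )
        return
    # AND is allowed to contain anything that is itself in the subset.
    for arg in node.args:
        check_safe_subset(arg, null_extended)
```

With this rule, `n.p1 = X OR n.p2 = Y` passes, while `NOT (a OR b)` and any OR touching an OPTIONAL MATCH alias are rejected. The difficulty noted above shows up immediately: whether `(a OR b) AND (c OR d)` belongs in the subset is a judgment call that this sketch happens to answer "yes".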
Option C: Implement full disjunction support across the pipeline
- Audit pushdown, OPTIONAL MATCH null-extension, type coercion paths for OR-correctness. Add tests for each. Then accept all of OR/NOT/XOR with full validated support.
- Pro: "correctness by construction" across the planner.
- Con: largest scope; may need M-series collaboration on pushdown's OR-awareness.
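For the pushdown part of such an audit, the "safe-by-default" behavior to confirm might look like the sketch below: split the WHERE into top-level AND conjuncts and push only those containing no OR/NOT/XOR. This does not reflect the actual contents of `pushdown_safety.py`; the tuple encoding and function names are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is predicate text; a composite is (op, operand, ...).
Expr = Union[str, Tuple]

NON_AND_OPS = frozenset({"OR", "XOR", "NOT"})


def split_conjuncts(expr: Expr) -> List[Expr]:
    """Flatten top-level ANDs into a list of conjuncts."""
    if isinstance(expr, tuple) and expr[0] == "AND":
        out: List[Expr] = []
        for operand in expr[1:]:
            out.extend(split_conjuncts(operand))
        return out
    return [expr]


def contains_non_and(expr: Expr) -> bool:
    """True if OR, XOR, or NOT appears anywhere inside expr."""
    if isinstance(expr, tuple):
        return expr[0] in NON_AND_OPS or any(contains_non_and(e) for e in expr[1:])
    return False


def partition_for_pushdown(expr: Expr) -> Tuple[List[Expr], List[Expr]]:
    """Return (pushable, residual); conjuncts with OR/XOR/NOT stay residual."""
    pushable: List[Expr] = []
    residual: List[Expr] = []
    for conjunct in split_conjuncts(expr):
        (residual if contains_non_and(conjunct) else pushable).append(conjunct)
    return pushable, residual
```

For `a.x = 1 AND (b.y = 2 OR c.z = 3)`, only `a.x = 1` would be pushed and the disjunction would be evaluated after the match. Tests written against this shape would verify that no OR-bearing conjunct reaches the pushdown path.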
Option D: Defer + document
Why this isn't M-series's responsibility
The IR verifier (M2-PR3) checks plan-shape invariants (op_id uniqueness, schema consistency, dangling refs, optional-arm nullability) — not "is this Cypher WHERE a supported subset". That's a Cypher front-end concern. The pattern-shape gates in #1217 live in `parser.py::_build_where_with_pattern_lift`, which is the right home for the row-boolean gates too.
Related
Priority
p3 — same as #1031 (architectural cleanup, no user-visible bug for simple cases; correctness risk for unvalidated compositions).