Cypher WHERE: static validation gap on row-boolean shapes (OR / NOT / XOR among row predicates), emergent from #1217 Earley swap #1219

## Context

PR #1217 (closes #1031 slice 1) swaps the Cypher parser from Lark LALR(1) to Earley. This solves `#1031`'s mixed-pattern+row WHERE problem cleanly, but **as an architectural side effect** lifts an implicit syntax gate that was previously hiding a missing semantic gate.

**Pre-#1217** (LALR):
- Queries like `WHERE a.x = 1 OR b.y = 2` failed at parse time because LALR(1) couldn't unify FIRST sets across the OR boundary.
- The implicit "if I can't parse it, you can't write it" was the de facto semantic gate.
- No explicit Cypher-side validator existed for these shapes — none was needed, because the syntax gate caught them.

**Post-#1217** (Earley):
- Earley parses OR/NOT/XOR among row predicates without complaint.
- Binder routes the result through the existing raw-expr path: `BoundPredicate(expression="a.x = 1 OR b.y = 2")`.
- Lowering's runtime evaluator (`graphistry.compute.gfql.expr_parser.parse_expr` at `lowering.py:7031`) handles AND/OR/NOT/XOR with correct SQL three-valued logic, so simple cases produce correct results.
- **No static gate exists** for compositions we haven't validated.
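For reference, the three-valued semantics that the runtime evaluator is said to follow can be written down in a few lines. This is a minimal sketch, not the code in `expr_parser`: the helper names (`tri_and`, `tri_or`, `tri_not`, `tri_xor`, `where_keeps_row`) are hypothetical, and `None` stands in for NULL.

```python
from typing import Optional

# True, False, or None (standing in for SQL/Cypher NULL, i.e. "unknown").
Tri = Optional[bool]


def tri_not(a: Tri) -> Tri:
    # NOT NULL is NULL; otherwise ordinary negation.
    return None if a is None else (not a)


def tri_and(a: Tri, b: Tri) -> Tri:
    # FALSE dominates: FALSE AND NULL is FALSE.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    # TRUE dominates: TRUE OR NULL is TRUE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def tri_xor(a: Tri, b: Tri) -> Tri:
    # XOR has no dominating value: any NULL operand yields NULL.
    if a is None or b is None:
        return None
    return a != b


def where_keeps_row(result: Tri) -> bool:
    # WHERE keeps a row only when the predicate is exactly TRUE.
    return result is True
```

The property that matters for the gap described here is that OR can turn a NULL operand into TRUE, while AND never can. Any planner logic that assumes "a NULL operand means the row is dropped" holds for AND-only predicates and fails once OR is accepted.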

## Asymmetry made visible

PR #1217 explicitly gates pattern-side compositions (NOT-pattern, OR-pattern, multi-positive-pattern) with E108 at the lift step in `parser.py::generic_where_clause` → `_build_where_with_pattern_lift`. Row-boolean compositions silently pass through:

| Newly-accepted shape | Static gate? | Outcome |
|---|---|---|
| `WHERE n.p1 = X OR n.p2 = Y` | ❌ none | silent accept → expr_parser → runtime |
| `WHERE NOT n.p1 = X` | ❌ none | silent accept |
| `WHERE n.p1 = X XOR n.p2 = Y` | ❌ none | silent accept |
| `WHERE NOT (n)-[:R]->()` | ✅ E108 at lift | explicit "unsupported" |
| `WHERE (n)-[:R]->() OR n.x = 1` | ✅ E108 at lift | explicit "unsupported" |
| `WHERE (n)-[:R*]->() AND (m)-[:R*]->()` | ✅ E108 at lift | explicit "unsupported" |

## Empirical evidence

PR #1217 added 5 native execution tests covering simple disjunction shapes (TCK match-where1-10 mirror, same-alias OR, NOT with NaN, OR-then-AND, multi-row union).  All pass — the runtime evaluator does the right thing for property-comparison disjunctions under SQL three-valued logic.

## What's not validated

Compositional edge cases that may interact with parts of the planner/runtime that were designed assuming AND-only:

1. **Predicate pushdown**: `pushdown_safety.py:58` explicitly says "Compound OR is not analyzed". AND has explicit handling (line 73); OR doesn't. Likely safe by default (predicates with OR don't push) but unverified.
2. **OPTIONAL MATCH null-extension**: `is_null_rejecting()` is AND-aware; behavior on `WHERE x.field = 'val' OR <something>` where `x` is null-extended is unverified.
3. **Type coercion in OR operands**: mixed-type branches such as `WHERE n.p = 12 OR n.p = 'twelve'`.
4. **OR with NULL literals**: three-valued logic edge cases such as `WHERE n.p = 12 OR n.p IS NULL`.
5. **Nested compositions**: `(a OR b) AND (c OR d)`, `NOT (a OR b)`, etc.
6. **Pushdown into PatternMatch**: OR-containing predicates that get pushed should respect an OR-aware safety analysis.
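Item 2 comes down to how null rejection composes under AND versus OR. The sketch below is illustrative only: the node classes (`Cmp`, `IsNull`, `And`, `Or`) and `rejects_null` are hypothetical names, not the project's IR or its `is_null_rejecting()`, and NOT/XOR are left out for brevity. It answers one question conservatively: can the predicate ever be TRUE when every property of a given alias is NULL?

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cmp:
    """Leaf comparison such as x.field = 'val'; NULL when its alias is NULL."""
    alias: str


@dataclass(frozen=True)
class IsNull:
    """Leaf such as x.field IS NULL; TRUE when its alias is null-extended."""
    alias: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, IsNull, And, Or]


def rejects_null(expr: Expr, alias: str) -> bool:
    """True only if expr provably cannot be TRUE when alias is null-extended."""
    if isinstance(expr, Cmp):
        # A comparison on the null-extended alias evaluates to NULL, never TRUE.
        return expr.alias == alias
    if isinstance(expr, IsNull):
        # IS NULL can be TRUE on a null-extended row, so it never rejects.
        return False
    if isinstance(expr, And):
        # One rejecting conjunct is enough to keep the whole AND from being TRUE.
        return rejects_null(expr.left, alias) or rejects_null(expr.right, alias)
    if isinstance(expr, Or):
        # Every branch must reject; one escape branch lets the row through.
        return rejects_null(expr.left, alias) and rejects_null(expr.right, alias)
    raise TypeError(f"unsupported node: {expr!r}")
```

Under these assumptions, `x.field = 'val' AND n.p = 1` rejects null-extended `x`, while `x.field = 'val' OR n.p = 1` does not. An analysis that handles only AND and treats every other node like a conjunct would misclassify the second case.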

## Decision space (intentionally open)

Several reasonable solutions; pick based on appetite + signal:

### Option A: Static validator that rejects un-validated shapes
- Walk `expr_tree` at the AST normalizer or binder. Reject with E108 any row-boolean shape that touches null-extended aliases, interacts with predicate pushdown, or contains mixed-type OR branches.
- Pro: symmetric with the pattern-side gates in #1217. Defense-in-depth.
- Con: may be over-restrictive; needs careful definition of "supported subset".
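A rough sketch of what such a walker could look like, covering only the null-extended-alias rule. Everything here is an assumption for illustration: the nested-tuple node shape (`("or", l, r)`, `("not", c)`, `("cmp", alias, prop, value)`), the `UnsupportedWhereShape` exception, and the function names are hypothetical and do not reflect the real `expr_tree` or the existing E108 plumbing.

```python
from typing import FrozenSet, Set


class UnsupportedWhereShape(Exception):
    """Hypothetical error type carrying a code such as 'E108'."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def aliases_in(node: tuple) -> Set[str]:
    # Leaves are ("cmp", alias, prop, value); every other node is (op, *children).
    if node[0] == "cmp":
        return {node[1]}
    found: Set[str] = set()
    for child in node[1:]:
        found |= aliases_in(child)
    return found


def validate_row_boolean(node: tuple, null_extended: FrozenSet[str]) -> None:
    """Raise if an OR/NOT/XOR subtree references a null-extended alias."""
    op = node[0]
    if op == "cmp":
        return
    if op in ("or", "not", "xor"):
        touched = aliases_in(node) & null_extended
        if touched:
            raise UnsupportedWhereShape(
                "E108",
                f"{op.upper()} over null-extended alias(es) "
                f"{sorted(touched)} is not validated",
            )
    # AND nodes and accepted OR/NOT/XOR nodes: keep checking the children.
    for child in node[1:]:
        validate_row_boolean(child, null_extended)
```

With `null_extended = frozenset({"x"})`, a plain `n.p1 = 1 OR n.p2 = 2` passes, while `n.p = 1 AND (x.field = 'val' OR n.q = 2)` raises. The pushdown and mixed-type rules would need their own checks and are not shown.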

### Option B: Accept simpler cases provably; reject the rest
- Define a "safe subset" of OR/NOT/XOR shapes (e.g., property comparisons with non-null-extended aliases, no nested compositions). Accept those; reject the rest.
- Pro: surfaces the gradient (some OR is supported, some isn't) more honestly.
- Con: "safe subset" is hard to specify formally; risks fragmentation.

### Option C: Implement full disjunction support across the pipeline
- Audit pushdown, OPTIONAL MATCH null-extension, type coercion paths for OR-correctness.  Add tests for each.  Then accept all of OR/NOT/XOR with full validated support.
- Pro: "correctness by construction" across the planner.
- Con: largest scope; may need M-series collaboration on pushdown's OR-awareness.
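One thing a pushdown audit would need to confirm is that OR subtrees are treated as indivisible units. The sketch below shows the usual conjunct-splitting idea under stated assumptions: the nested-tuple node shape and the helper names (`aliases_in`, `split_conjuncts`, `pushable_to`) are hypothetical, and this is not the logic in `pushdown_safety.py`.

```python
from typing import List, Set


def aliases_in(node: tuple) -> Set[str]:
    # Leaves are ("cmp", alias, prop, value); every other node is (op, *children).
    if node[0] == "cmp":
        return {node[1]}
    found: Set[str] = set()
    for child in node[1:]:
        found |= aliases_in(child)
    return found


def split_conjuncts(node: tuple) -> List[tuple]:
    """Flatten top-level ANDs; OR/NOT/XOR subtrees and leaves stay whole."""
    if node[0] == "and":
        return split_conjuncts(node[1]) + split_conjuncts(node[2])
    return [node]


def pushable_to(node: tuple, alias: str) -> List[tuple]:
    """Conjuncts that mention only `alias` and could be filtered at its scan."""
    return [c for c in split_conjuncts(node) if aliases_in(c) == {alias}]
```

For `(a.x = 1 OR b.y = 2) AND a.z = 3`, only `a.z = 3` is a candidate for alias `a`; the OR spans two aliases and has to stay above the join. Pushing one branch of an OR on its own would drop rows that the other branch would have kept. Whether a single-alias conjunct is actually safe to push also depends on null-extension, which this sketch does not model.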

### Option D: Defer + document
- Ship #1217 as-is (this is already the case). Document the asymmetry in CHANGELOG. Wait for bug reports or a TCK-conformance sweep to surface specific failures, then apply targeted fixes.
- Pro: lowest scope; ships value sooner.
- Con: silent-wrong-results risk if compositions interact badly with planner assumptions.

## Why this isn't M-series's responsibility

The IR verifier (M2-PR3) checks **plan-shape invariants** (op_id uniqueness, schema consistency, dangling refs, optional-arm nullability), not "is this Cypher WHERE a supported subset". That's a Cypher front-end concern. The pattern-shape gates in #1217 live in `parser.py::_build_where_with_pattern_lift`, which is the right home for the row-boolean gates too.

## Related

- #1031: parent issue; this is a side effect of the slice-1 grammar refactor.
- PR #1217: Earley swap that exposes this gap.
- `graphistry.compute.gfql.expr_parser.parse_expr`: runtime evaluator (handles AND/OR/NOT/XOR with SQL three-valued logic).
- `graphistry/compute/gfql/ir/pushdown_safety.py:58`: "Compound OR is not analyzed" comment marks the boundary.

## Priority

**p3** — same as #1031 (architectural cleanup, no user-visible bug for simple cases; correctness risk for unvalidated compositions).
