gfql/ir: expose Lark and_op/or_op tree in WhereClause so passes walk structure instead of re-parsing text #1200

## Context

`WhereClause` currently has two shapes for a WHERE body:

```python
class WhereClause:
    predicates: Tuple[Union[WherePredicate, WherePatternPredicate], ...] = ()
    expr: Optional[ExpressionText] = None
```

Structured predicates land in `.predicates`.  Anything Lark's `where_predicates` rule cannot decompose (nested OR/XOR/NOT, mixed boolean forms, parenthesized subexpressions) falls into `.expr` as a raw text blob:

```python
@dataclass
class ExpressionText:
    text: str
    span: Optional[SourceSpan] = None
```

Lark's grammar already parses the expression structurally via dedicated rules (`parser.py:177-183`):

```
?or_expr: xor_expr
        | or_expr "OR"i xor_expr            -> or_op
?xor_expr: and_expr
         | xor_expr "XOR"i and_expr         -> xor_op
?and_expr: not_expr
         | and_expr "AND"i not_expr         -> and_op
```

…but the transformer discards the `and_op` / `or_op` / `xor_op` / `not_op` tree and captures only `self._slice(span)`.  Downstream code (binder serialization → `BoundPredicate.expression` string → `predicate_pushdown._split_conjuncts`) then re-parses the text via regex-based top-level-AND splitting (`graphistry/compute/gfql/expr_split.py`, introduced by PR #1198 / issue #1195).

## Why this is a smell

Two separate systems re-implement what Lark already computed:

1. `parser.py::generic_where_clause` re-parses label-only AND conjunctions.  **Tracked by #1194** (grammar-level disambiguation between `where_predicates` and `expr`).
2. `predicate_pushdown._split_conjuncts` re-parses top-level AND conjuncts in arbitrary boolean expressions.  **This issue.**

Each re-parse is a potential source of divergence from what the parser actually accepted (e.g., the `\\b` vs `\b` regex bug in `_refs_for_segment`, fixed in #1198).
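To make that class of divergence concrete, here is a minimal sketch of how a doubled backslash in a raw-string regex turns a word-boundary assertion into a literal-backslash match. It is not the code from `_refs_for_segment`; the helper `refs_alias` and both pattern templates are hypothetical and exist only for illustration.

```python
import re


def refs_alias(segment: str, alias: str, pattern_template: str) -> bool:
    """Return True if `segment` mentions `alias` under the given regex template.

    Hypothetical helper for illustration; not part of the GFQL codebase.
    """
    pattern = pattern_template.format(alias=re.escape(alias))
    return re.search(pattern, segment) is not None


# In a raw string, r"\\b" is the regex `\\b`: a literal backslash followed by "b".
BUGGY = r"\\b{alias}\\b"
# r"\b" is the regex word-boundary assertion, which is what such a check wants.
FIXED = r"\b{alias}\b"

segment = "n.age > 3 AND m.name = 'x'"

print(refs_alias(segment, "n", BUGGY))           # False: alias reference is silently missed
print(refs_alias(segment, "n", FIXED))           # True
print(refs_alias("person.age > 3", "n", FIXED))  # False: "n" inside "person" is not a reference
```

Nothing here raises an error, so the buggy pattern simply reports fewer references than the parser saw. That silent disagreement is the kind of failure a structural walk avoids.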

## Proposed follow-up

Expose the parsed boolean-expression tree structurally in IR:

```python
@dataclass
class BooleanExpr:
    op: Literal["and", "or", "xor", "not", "atom"]
    left: Optional["BooleanExpr"] = None
    right: Optional["BooleanExpr"] = None
    atom_text: Optional[str] = None          # leaf: raw predicate text
    atom_span: Optional[SourceSpan] = None

@dataclass
class WhereClause:
    predicates: Tuple[Union[WherePredicate, WherePatternPredicate], ...] = ()
    expr_tree: Optional[BooleanExpr] = None  # replaces ExpressionText
    span: Optional[SourceSpan] = None
```

1. Parser transformer: implement `and_op` / `or_op` / `xor_op` / `not_op` methods that build `BooleanExpr` nodes; the leaf case captures atom text + span.
2. Binder: continue serializing leaves to `BoundPredicate` entries, but **emit one BoundPredicate per top-level AND conjunct** by walking `BooleanExpr` structure instead of producing one `expression=where.expr.text` blob.  The existing `_split_conjuncts` in predicate pushdown becomes redundant for most real inputs.
3. `predicate_pushdown.py`: once BoundPredicates already arrive as single conjuncts, `_split_conjuncts` is needed only for the residual case of nested subexpressions the binder chose not to flatten.  Optionally, drop it entirely once #1194 also lands.
4. Delete `graphistry/compute/gfql/expr_split.py` once both call sites are gone; Lark becomes the single source of truth for AND-splitting.
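A rough sketch of the walk described in step 2, assuming a trimmed-down `BooleanExpr` (no spans) and hand-built trees in place of real transformer output. The helper names `atom`, `binary`, `negate`, `top_level_conjuncts`, and `render` are hypothetical. A real binder would more likely slice the original source via `atom_span` than re-render text as done here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BooleanExpr:
    op: str  # one of "and", "or", "xor", "not", "atom"
    left: Optional["BooleanExpr"] = None
    right: Optional["BooleanExpr"] = None
    atom_text: Optional[str] = None


def atom(text: str) -> BooleanExpr:
    return BooleanExpr(op="atom", atom_text=text)


def binary(op: str, left: BooleanExpr, right: BooleanExpr) -> BooleanExpr:
    return BooleanExpr(op=op, left=left, right=right)


def negate(child: BooleanExpr) -> BooleanExpr:
    return BooleanExpr(op="not", left=child)


def top_level_conjuncts(node: BooleanExpr) -> List[BooleanExpr]:
    """Flatten nested AND nodes; any other node is one opaque conjunct."""
    if node.op == "and":
        assert node.left is not None and node.right is not None
        return top_level_conjuncts(node.left) + top_level_conjuncts(node.right)
    return [node]


def _wrap(node: BooleanExpr) -> str:
    # Parenthesize compound children so the rendered text stays unambiguous.
    text = render(node)
    return text if node.op == "atom" else f"({text})"


def render(node: BooleanExpr) -> str:
    """Serialize a subtree back to a string for a BoundPredicate-style field."""
    if node.op == "atom":
        return node.atom_text or ""
    assert node.left is not None
    if node.op == "not":
        return f"NOT {_wrap(node.left)}"
    assert node.right is not None
    return f"{_wrap(node.left)} {node.op.upper()} {_wrap(node.right)}"


# a.x = 1 AND (b.y = 2 OR c.z = 3) AND NOT d.w = 4, as a left-associative and_op tree
tree = binary(
    "and",
    binary("and", atom("a.x = 1"), binary("or", atom("b.y = 2"), atom("c.z = 3"))),
    negate(atom("d.w = 4")),
)
conjuncts = [render(c) for c in top_level_conjuncts(tree)]
print(conjuncts)  # ['a.x = 1', 'b.y = 2 OR c.z = 3', 'NOT d.w = 4']

# An AND inside a string literal stays inside its atom; no quote tracking is needed.
quoted = binary("and", atom("n.note = 'x AND y'"), atom("m.k = 1"))
print(len(top_level_conjuncts(quoted)))  # 2
```

Each rendered conjunct would become one `BoundPredicate.expression` string, which keeps that field a string as required under Constraints. The OR subtree is emitted as a single conjunct, so nothing below a non-AND node is ever split.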

## Tests to keep green

- All 20 unit tests in `test_expr_split.py` (either migrate them to cover the new tree walk, or delete them together with the module).
- All predicate-pushdown tests in `test_predicate_pushdown_pass.py`, especially:
  - `test_predicate_pushdown_splits_multi_alias_conjunct_and_narrows_references`: multi-alias OPTIONAL MATCH regression lock for the `\b` fix (#1198).
  - Partial-push / null-rejection cases for optional arms.
- All parser tests in `test_parser.py`, including label-narrowing integration.

## Constraints

- `BoundPredicate.expression` is still a string (other code depends on this shape).  The refactor produces *more* BoundPredicates, not a structured replacement.
- Backward-compatible migration: add `expr_tree` alongside `expr: Optional[ExpressionText]`, migrate callers one by one, remove `expr` in a follow-up.
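
One possible shape for the transition period, sketched under the assumption that both fields coexist on `WhereClause` for a while. The adapter `effective_tree` is a hypothetical name, and the dataclasses are trimmed to the fields relevant here (no `predicates`, no spans). Callers migrated to the tree API go through the adapter, and a legacy text blob is presented as a single opaque atom until the parser populates `expr_tree`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpressionText:
    text: str


@dataclass
class BooleanExpr:
    op: str  # one of "and", "or", "xor", "not", "atom"
    left: Optional["BooleanExpr"] = None
    right: Optional["BooleanExpr"] = None
    atom_text: Optional[str] = None


@dataclass
class WhereClause:
    expr: Optional[ExpressionText] = None    # legacy field, kept during migration
    expr_tree: Optional[BooleanExpr] = None  # new field, filled in by the parser when available


def effective_tree(where: WhereClause) -> Optional[BooleanExpr]:
    """Prefer the structured tree; fall back to the legacy text as one atom."""
    if where.expr_tree is not None:
        return where.expr_tree
    if where.expr is not None:
        return BooleanExpr(op="atom", atom_text=where.expr.text)
    return None


legacy = WhereClause(expr=ExpressionText("a.x = 1 OR b.y = 2"))
migrated = WhereClause(
    expr=ExpressionText("a.x = 1 AND b.y = 2"),
    expr_tree=BooleanExpr(
        op="and",
        left=BooleanExpr(op="atom", atom_text="a.x = 1"),
        right=BooleanExpr(op="atom", atom_text="b.y = 2"),
    ),
)

print(effective_tree(legacy).op)         # atom
print(effective_tree(migrated).op)       # and
print(effective_tree(WhereClause()))     # None
```

With an adapter like this, removing `expr` in the follow-up reduces to deleting the fallback branch once no producer sets it.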

## Related

- #1194: grammar-level Lark ambiguity (parser re-parse for `generic_where_clause`).
- #1195 / PR #1198: `predicate_pushdown` regex bug + AND-splitter consolidation (introduced the `expr_split.py` helper this issue eliminates).
- #1125 / PR #1193: original WHERE label narrowing regex removal.

## Priority

**p3** — architectural cleanup, no user-visible bug.  The current text-split is correct and tested; this change removes duplicated logic and lets the parser be authoritative.
