You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Structured predicates land in `.predicates`. Anything Lark's `where_predicates` rule cannot decompose (nested OR/XOR/NOT, mixed boolean forms, parenthesized subexpressions) falls into `.expr` as a raw text blob:
…but the transformer throws the `and_op` / `or_op` / `xor_op` / `not_op` tree away and captures only `self._slice(span)`. Downstream code (binder serialization → `BoundPredicate.expression` string → `predicate_pushdown._split_conjuncts`) re-parses the text via regex-based top-level-AND splitting (`graphistry/compute/gfql/expr_split.py`, introduced by PR #1198 / issue #1195).
Why this is a smell
Two separate systems re-implement what Lark already computed:
Parser transformer: implement `and_op` / `or_op` / `xor_op` / `not_op` methods that build `BooleanExpr` nodes; the leaf case captures atom text + span.
Binder: continue serializing leaves to `BoundPredicate` entries, but emit one BoundPredicate per top-level AND conjunct by walking `BooleanExpr` structure instead of producing one `expression=where.expr.text` blob. The existing `_split_conjuncts` in predicate pushdown becomes redundant for most real inputs.
Partial-push / null-rejection cases for optional arms.
All parser tests in `test_parser.py` including label-narrowing integration.
Constraints
`BoundPredicate.expression` is still a string (other code depends on this shape). The refactor produces more BoundPredicates, not a structured replacement.
Backward-compatible migration: add `expr_tree` alongside `expr: Optional[ExpressionText]`, migrate callers one by one, remove `expr` in a follow-up.
p3 — architectural cleanup, no user-visible bug. The current text-split is correct and tested; this change removes duplicated logic and lets the parser be authoritative.
Context
`WhereClause` currently has two shapes for a WHERE body:
```python
class WhereClause:
predicates: Tuple[Union[WherePredicate, WherePatternPredicate], ...] = ()
expr: Optional[ExpressionText] = None
```
Structured predicates land in `.predicates`. Anything Lark's `where_predicates` rule cannot decompose (nested OR/XOR/NOT, mixed boolean forms, parenthesized subexpressions) falls into `.expr` as a raw text blob:
```python
@DataClass
class ExpressionText:
text: str
span: Optional[SourceSpan] = None
```
Lark's grammar already parses the expression structurally via dedicated rules (`parser.py:177-183`):
```
?or_expr: xor_expr
| or_expr "OR"i xor_expr -> or_op
?xor_expr: and_expr
| xor_expr "XOR"i and_expr -> xor_op
?and_expr: not_expr
| and_expr "AND"i not_expr -> and_op
```
…but the transformer throws the `and_op` / `or_op` / `xor_op` / `not_op` tree away and captures only `self._slice(span)`. Downstream code (binder serialization → `BoundPredicate.expression` string → `predicate_pushdown._split_conjuncts`) re-parses the text via regex-based top-level-AND splitting (`graphistry/compute/gfql/expr_split.py`, introduced by PR #1198 / issue #1195).
Why this is a smell
Two separate systems re-implement what Lark already computed:
Each re-parse is a potential source of divergence (e.g., the `\\b` vs `\b` bug fixed in #1198 `_refs_for_segment`).
Proposed follow-up
Expose the parsed boolean-expression tree structurally in IR:
```python
@DataClass
class BooleanExpr:
op: Literal["and", "or", "xor", "not", "atom"]
left: Optional["BooleanExpr"] = None
right: Optional["BooleanExpr"] = None
atom_text: Optional[str] = None # leaf: raw predicate text
atom_span: Optional[SourceSpan] = None
@DataClass
class WhereClause:
predicates: Tuple[Union[WherePredicate, WherePatternPredicate], ...] = ()
expr_tree: Optional[BooleanExpr] = None # replaces ExpressionText
span: Optional[SourceSpan] = None
```
Tests to keep green
Constraints
Related
Priority
p3 — architectural cleanup, no user-visible bug. The current text-split is correct and tested; this change removes duplicated logic and lets the parser be authoritative.