Problem
The current local Cypher/GFQL execution model does not have a first-class row-carrier IR.
As a result, row-seeded semantics are handled in feature-specific ways instead of through one reusable contract. The bounded reentry work in PR #975 is the clearest example, but the same architectural pressure will recur for:
- multi-stage `MATCH ... WITH ... MATCH ...`
- multi-alias `WITH` / `RETURN` / `ORDER BY`
- `OPTIONAL MATCH` null extension
- grouped row-preserving aggregation
- future vectorized GPU/backend audits
Today those semantics are spread across compiler rewrites, projection metadata, hidden columns, and runtime stitching. That is workable for a bounded slice, but it is not a good long-term model for a graph query compiler/runtime.
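To make the target semantics concrete, here is a minimal illustration (not code from the current implementation) of the kind of row-seeded, columnar behavior involved: an `OPTIONAL MATCH`-style null extension expressed as a single pandas merge over an explicit seed-row table. All names (`seeds`, `row_id`, etc.) are illustrative.

```python
import pandas as pd

# Seed rows: one row per bound alias `n`, each with a stable row id.
seeds = pd.DataFrame({"row_id": [0, 1, 2], "n": ["a", "b", "c"]})

# Candidate matches for a second pattern stage binding `m` from `n`.
matches = pd.DataFrame({"n": ["a", "a", "c"], "m": ["x", "y", "z"]})

# OPTIONAL MATCH-style null extension: every seed row survives, matched
# seeds may fan out, and unmatched seeds carry a null `m`. This is one
# columnar merge, not a Python row loop.
extended = seeds.merge(matches, on="n", how="left")
```

The point of a row-carrier IR is that the seed table, the row ids, and the null-extension rule above become an explicit contract rather than feature-local conventions.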
Why This Matters
- makes row semantics harder to extend cleanly
- encourages feature-by-feature protocol glue instead of a reusable IR
- complicates vectorization reasoning and GPU/backend parity review
- increases the chance that future Cypher support growth expands `lowering.py` and `gfql_unified.py` in ad hoc ways
Proposed Direction
Introduce a first-class row-carrier / seeded-row IR for local Cypher/GFQL execution.
Core ideas:
- Represent carried row state explicitly:
  - row ids / seed ids
  - bound aliases
  - carried scalar columns
  - ordering contract
  - null-extension contract where applicable
- Lower row-seeded features into that IR instead of feature-specific side channels.
- Keep the implementation columnar/vectorized:
  - pandas/cudf-friendly
  - no generic Python row-loop fallback
- Let specific features become clients of the same row model:
  - bounded reentry
  - later multi-alias `WITH`
  - later `OPTIONAL MATCH`
  - later multiplicity-preserving grouped aggregation
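One possible shape for such an IR node, sketched with hypothetical names (nothing here reflects the actual `lowering.py` / `gfql_unified.py` code): carried row state is an explicit dataclass, and each pattern stage lowers to a columnar join against it.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class RowCarrier:
    """Hypothetical row-carrier IR node; all field names are illustrative."""
    table: pd.DataFrame                 # one row per carried binding tuple
    row_id_col: str = "_row_id"         # stable seed/row identity
    aliases: dict = field(default_factory=dict)  # alias -> column name
    carried_cols: tuple = ()            # scalar columns carried through stages
    order_cols: tuple = ()              # ordering contract, if any
    nullable_aliases: frozenset = frozenset()    # aliases under OPTIONAL MATCH

    def extend(self, matches: pd.DataFrame, on: str, alias: str,
               optional: bool = False) -> "RowCarrier":
        # Lower one pattern stage as a single columnar merge against the
        # carrier: inner join for MATCH, left join (null extension) for
        # OPTIONAL MATCH. No Python row loop.
        how = "left" if optional else "inner"
        table = self.table.merge(matches, on=on, how=how)
        return RowCarrier(
            table=table,
            row_id_col=self.row_id_col,
            aliases={**self.aliases, alias: alias},
            carried_cols=self.carried_cols,
            order_cols=self.order_cols,
            nullable_aliases=self.nullable_aliases
            | ({alias} if optional else set()),
        )
```

Under this shape, bounded reentry, multi-alias `WITH`, and `OPTIONAL MATCH` would all be clients calling into the same carrier rather than each maintaining its own hidden columns; a cudf-backed variant would swap the table type without changing the contract.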
Relationship To Other Issues
Non-Goals
Success Criteria
- bounded reentry can be expressed as a normal client of the row IR
- future row-seeded Cypher features stop requiring bespoke hidden-column / metadata handshakes
- vectorization/backend expectations are clearer to audit
- `lowering.py` and runtime orchestration can shrink over time instead of accumulating one-off row mechanics
Context
PR #975 is landing the bounded-reentry feature/hardening slice.
Issue #987 tracks the narrower follow-on cleanup for that implementation.
This issue tracks the broader architectural direction beyond that one feature.