Cypher/GFQL: introduce a first-class row-carrier IR for multi-stage vectorized row semantics #989

@lmeyerov

Problem

The current local Cypher/GFQL execution model does not have a first-class row-carrier IR.

As a result, row-seeded semantics are handled in feature-specific ways instead of through one reusable contract. The bounded-reentry work in PR #975 is the clearest example, but the same architectural pressure will recur for:

  • multi-stage MATCH ... WITH ... MATCH ...
  • multi-alias WITH / RETURN / ORDER BY
  • OPTIONAL MATCH null extension
  • grouped row-preserving aggregation
  • future vectorized GPU/backend audits

Today those semantics are spread across compiler rewrites, projection metadata, hidden columns, and runtime stitching. That is workable for a bounded slice, but it is not a good long-term model for a graph query compiler/runtime.

Why This Matters

  • makes row semantics harder to extend cleanly
  • encourages feature-by-feature protocol glue instead of a reusable IR
  • complicates vectorization reasoning and GPU/backend parity review
  • increases the chance that growing Cypher support expands lowering.py and gfql_unified.py in ad hoc ways

Proposed Direction

Introduce a first-class row-carrier / seeded-row IR for local Cypher/GFQL execution.

Core ideas:

  1. Represent carried row state explicitly.

    • row ids / seed ids
    • bound aliases
    • carried scalar columns
    • ordering contract
    • null-extension contract where applicable
  2. Lower row-seeded features into that IR instead of routing them through feature-specific side channels.

  3. Keep the implementation columnar/vectorized.

    • pandas/cudf-friendly
    • no generic Python row-loop fallback
  4. Let specific features become clients of the same row model.

    • bounded reentry
    • later multi-alias WITH
    • later OPTIONAL MATCH
    • later multiplicity-preserving grouped aggregation
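To make the proposed carrier concrete, here is a minimal sketch in Python/pandas, assuming a node table with an `id` column and an edge table with `src`/`dst` columns. `RowCarrier`, `seed`, `expand`, `project`, and the reserved `__row__` seed column are hypothetical names for illustration only, not existing GFQL APIs. It keeps the carried state from point 1 (seed ids, bound aliases, an ordering contract, nullable aliases) explicit, runs a MATCH hop as one vectorized join, and models OPTIONAL MATCH null extension as the same join with `how="left"`, so there is no Python row loop.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

import pandas as pd

SEED_COL = "__row__"  # hypothetical reserved column holding seed/row ids


@dataclass(frozen=True)
class RowCarrier:
    """Hypothetical seeded-row IR value: one DataFrame row per carried Cypher row."""
    table: pd.DataFrame                      # seed ids, alias columns, carried scalars
    aliases: Tuple[str, ...] = ()            # bound aliases; each is a column of `table`
    order_by: Tuple[str, ...] = (SEED_COL,)  # ordering contract: rows sorted by these columns
    nullable: FrozenSet[str] = frozenset()   # aliases that OPTIONAL MATCH may null-extend


def seed(nodes: pd.DataFrame, alias: str, id_col: str = "id") -> RowCarrier:
    """MATCH (alias): one carried row per node, seed ids numbered in input order."""
    table = pd.DataFrame({SEED_COL: range(len(nodes)), alias: nodes[id_col].to_numpy()})
    return RowCarrier(table=table, aliases=(alias,))


def expand(rc: RowCarrier, edges: pd.DataFrame, src: str, dst: str,
           optional: bool = False) -> RowCarrier:
    """MATCH (src)-->(dst) over all carried rows as a single join.

    optional=True models OPTIONAL MATCH: a row with no matching edge is kept
    with `dst` null (NaN) instead of being dropped.
    """
    hop = edges[["src", "dst"]].rename(columns={"src": src, "dst": dst})
    joined = rc.table.merge(hop, on=src, how="left" if optional else "inner")
    # Re-establish the ordering contract; the stable sort keeps the merge's
    # within-row order for rows that share a seed id.
    joined = joined.sort_values(list(rc.order_by), kind="stable").reset_index(drop=True)
    nullable = (rc.nullable | {dst}) if optional else rc.nullable
    return replace(rc, table=joined, aliases=rc.aliases + (dst,), nullable=nullable)


def project(rc: RowCarrier, *keep: str) -> RowCarrier:
    """WITH keep...: narrow the bound aliases but keep seed ids and row multiplicity."""
    table = rc.table[[SEED_COL, *keep]]
    return replace(rc, table=table,
                   aliases=tuple(a for a in rc.aliases if a in keep),
                   nullable=rc.nullable & set(keep))


if __name__ == "__main__":
    nodes = pd.DataFrame({"id": [1, 2, 3]})
    edges = pd.DataFrame({"src": [1, 1, 2], "dst": [2, 3, 3]})
    rows = expand(seed(nodes, "a"), edges, "a", "b", optional=True)
    print(project(rows, "b").table)
```

In this toy graph node 3 has no outgoing edge, so the optional expansion keeps its row with `b` null while the plain expansion drops it, and `project` narrows the aliases without changing row multiplicity. Whether these exact calls run unchanged on cudf is not checked here.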

Relationship To Other Issues

Non-Goals

Success Criteria

  • bounded reentry can be expressed as a normal client of the row IR
  • future row-seeded Cypher features stop requiring bespoke hidden-column / metadata handshakes
  • vectorization/backend expectations are clearer to audit
  • lowering.py and runtime orchestration can shrink over time instead of accumulating one-off row mechanics

Context

PR #975 is landing the bounded-reentry feature/hardening slice.
Issue #987 tracks the narrower follow-on cleanup for that implementation.
This issue tracks the broader architectural direction beyond that one feature.
