
Cypher/GFQL: replace bounded reentry hidden-column handshake with an explicit ReentryPlan #987

@lmeyerov


Problem

Bounded MATCH ... WITH ... MATCH ... reentry currently works, but the internal design is hard to reason about.

Today the same concept is spread across multiple mechanisms:

  • start_nodes_query in graphistry/compute/gfql/cypher/lowering.py
  • hidden __cypher_reentry_* columns and expression rewrites in graphistry/compute/gfql/cypher/lowering.py
  • _cypher_entity_projection_meta side-channel metadata
  • _compiled_query_reentry_state() stitching logic in graphistry/compute/gfql_unified.py
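To make "implicit contract" concrete, here is a deliberately simplified toy model of a hidden-column handshake. It is not the code in lowering.py or gfql_unified.py. The helper names hide_scalars and unhide_scalars, the "id" key, and the row-as-dict representation are hypothetical, and only the __cypher_reentry_ prefix comes from this issue. The point is that producer and consumer agree only through a column-name convention.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical: the only link between producer and consumer is this string.
HIDDEN_PREFIX = "__cypher_reentry_"


def hide_scalars(node_rows: List[Row], scalars_by_id: Dict[Any, Row]) -> List[Row]:
    """Producer side: attach carried scalars to node rows as synthetic columns."""
    out = []
    for row in node_rows:
        extra = scalars_by_id.get(row["id"], {})
        out.append({**row, **{HIDDEN_PREFIX + k: v for k, v in extra.items()}})
    return out


def unhide_scalars(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Consumer side: split rows back into visible columns and recovered scalars."""
    visible, carried = [], []
    for row in rows:
        visible.append({k: v for k, v in row.items() if not k.startswith(HIDDEN_PREFIX)})
        carried.append({k[len(HIDDEN_PREFIX):]: v for k, v in row.items()
                        if k.startswith(HIDDEN_PREFIX)})
    return visible, carried
```

In this toy, nothing but the shared prefix string ties the two functions together, and keying scalars by node id allows only one value per node. These are the kinds of invariants the issue proposes to state explicitly.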

That makes the compiler/runtime contract implicit instead of explicit. A senior compiler, graph-language, or GPU engineer joining the project would have to reconstruct the model from four places at once.

Why This Matters

  • harder to audit vectorization and backend purity
  • harder to extend to the next row-seeded features
  • hidden invariants across compiler + runtime increase maintenance risk
  • lowering.py and gfql_unified.py are longer and conceptually denser than they need to be

Proposed Refactor

Treat bounded reentry as a first-class plan/runtime concept rather than a protocol assembled from side channels.

Recommended steps:

  1. Introduce an explicit ReentryPlan (or SeededMatchPlan) dataclass.

    • carried alias
    • id column
    • carried scalar outputs
    • ordering contract
    • trailing match alias contract
  2. Replace the current hidden-property rewrite protocol.

    • stop encoding carried scalars as synthetic __cypher_reentry_* property accesses
    • instead carry an explicit scalar mapping in the plan contract
  3. Move runtime stitching into a dedicated reentry module.

    • keep gfql_unified.py as dispatch/orchestration
    • move reentry-specific assembly/validation into a small, dedicated runtime helper module
  4. Make row-order and seed-row semantics explicit.

    • preserve row order as an explicit part of the contract rather than inferring it from merge behavior
  5. Split lowering.py by concern where useful.

    • general lowering
    • result projection planning
    • bounded reentry planning
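Steps 1 to 4 can be sketched together in plain Python. This is a hypothetical, backend-agnostic sketch, not an existing pygraphistry API: the ReentryPlan field names, check_columns, build_seed_rows, stitch, and the seed_row/node_id keys are all illustrative, and rows are modeled as lists of dicts instead of pandas/cuDF frames. It shows the carried scalars and the seed ordinal living in one contract object, scalars attached to seed rows rather than node properties, and output order imposed by an explicit sort rather than inherited from the join.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]
RESERVED = ("seed_row", "node_id")  # hypothetical keys used by the seed rows below


@dataclass(frozen=True)
class ReentryPlan:
    """Hypothetical explicit contract for bounded MATCH ... WITH ... MATCH reentry (step 1)."""

    carried_alias: str                    # node alias carried across WITH, e.g. "a"
    id_column: str                        # prefix-row column holding that node's id
    carried_scalars: Dict[str, str] = field(default_factory=dict)  # output name -> prefix column
    preserve_seed_order: bool = True      # ordering contract (step 4)
    trailing_alias: Optional[str] = None  # alias the trailing MATCH starts from

    def __post_init__(self) -> None:
        # Assumed invariant: the trailing MATCH re-enters from the carried alias.
        if self.trailing_alias not in (None, self.carried_alias):
            raise ValueError(f"trailing MATCH starts at {self.trailing_alias!r}, "
                             f"expected {self.carried_alias!r}")

    def check_columns(self, prefix_columns: Iterable[str],
                      trailing_columns: Iterable[str]) -> List[str]:
        """Step 3: reentry-specific validation, kept out of the dispatcher."""
        prefix, trailing = set(prefix_columns), set(trailing_columns)
        problems = []
        if self.id_column not in prefix:
            problems.append(f"id column {self.id_column!r} missing from prefix rows")
        for out_name, src in self.carried_scalars.items():
            if src not in prefix:
                problems.append(f"scalar {out_name!r} reads missing column {src!r}")
            if out_name in trailing or out_name in RESERVED:
                problems.append(f"scalar {out_name!r} collides with another column")
        return problems


def build_seed_rows(plan: ReentryPlan, prefix_rows: Sequence[Row]) -> List[Row]:
    """Step 2: carry scalars through the explicit mapping, attached to rows, not nodes."""
    seeds = []
    for ordinal, row in enumerate(prefix_rows):
        seed: Row = {"seed_row": ordinal, "node_id": row[plan.id_column]}
        for out_name, src in plan.carried_scalars.items():
            seed[out_name] = row[src]
        seeds.append(seed)
    return seeds


def stitch(plan: ReentryPlan, seeds: Sequence[Row], matches: Sequence[Row]) -> List[Row]:
    """Step 4: inner-join seeds to trailing-MATCH rows on node_id; order comes from the plan."""
    seeds_by_node: Dict[Any, List[Row]] = {}
    for seed in seeds:
        seeds_by_node.setdefault(seed["node_id"], []).append(seed)
    # Probe in match order: this incidental order is exactly what must not leak out.
    joined = []
    for match_pos, m in enumerate(matches):
        for seed in seeds_by_node.get(m["node_id"], []):
            joined.append(((seed["seed_row"], match_pos), {**seed, **m}))
    if plan.preserve_seed_order:
        joined.sort(key=lambda pair: pair[0])  # sort on (seed row, match position) only
    return [row for _, row in joined]
```

The explicit sort is the design choice that matters on GPU: cuDF's documentation notes that, by default, its merge does not guarantee pandas' row ordering, so a real implementation would plausibly carry the seed ordinal as a column and sort on it after the merge. Unmatched seeds drop out here (inner-join semantics); optional null-extension is left to the follow-on work mentioned below.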

Non-Goals

Success Criteria

  • existing bounded-reentry semantics are preserved
  • current pandas + cudf bounded-reentry tests stay green
  • the reentry contract becomes readable from one place
  • a reduction of a few hundred lines across lowering.py and gfql_unified.py is plausible once the duplicate protocol layers are collapsed
  • follow-on work for multi-alias row carriers / optional null-extension becomes easier to reason about

Context

Current bounded-reentry hardening/validation work is in PR #975.
This issue is the follow-on cleanup/refactor lane, not a request to reopen that PR scope.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
