Cypher aggregates (collect, count(*), sum) over relationship-pattern rows rejected: runtime collapses rows before aggregation #1343

@lmeyerov

Summary

Cypher aggregates (collect, count(*), sum, etc.) applied over rows produced by a MATCH containing a relationship pattern are rejected: the GFQL runtime collapses relationship rows before the aggregate fires, so multiplicity-sensitive aggregation cannot run. This blocks the LDBC SNB IC1 collect(CASE ...) shapes for unis / companies and shapes in several other IC* lanes.

Repros (minimal, both fail on master 41d865ac9)

Shape A — collect(alias.prop) over OPTIONAL MATCH:

MATCH (p:Person {id: $pid})
OPTIONAL MATCH (p)-[s:STUDY_AT]->(uni:University)
WITH p, collect(uni.name) AS unis
RETURN p.id AS personId, unis

Shape B — collect(CASE ...) over OPTIONAL MATCH (IC1 official):

MATCH (p:Person {id: $pid})
OPTIONAL MATCH (p)-[s:STUDY_AT]->(uni:University)
WITH p, collect(CASE uni.name WHEN null THEN null ELSE [uni.name, s.classYear] END) AS unis
RETURN p.id AS personId, unis

Expected vs observed

  • Expected: unis is the per-person collection of uni.name (Shape A) or [uni.name, s.classYear] (Shape B), with one entry per matched STUDY_AT relationship.
  • Observed: GFQLValidationError: [unsupported-cypher-query] This Cypher aggregate would need repeated MATCH rows from a relationship pattern, but the current runtime collapses those rows before aggregation. Queries like MATCH (a)-[r]->(b) RETURN a, count(*) are not supported yet.
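
For reference, the expected results for Shape A and Shape B can be modeled in plain Python, independent of GFQL. This is only a sketch: it assumes standard Cypher behavior (OPTIONAL MATCH yields one null-padded row for a person with no match, and collect() skips null values), and the sample data and helper names are hypothetical. The unmatched row is left out for Shape B because how `CASE uni.name WHEN null` treats null is not modeled here.

```python
# Hypothetical sample data: one (person_id, uni_name, class_year) per STUDY_AT edge.
STUDY_AT = [
    (1, "Uni A", 2010),
    (1, "Uni B", 2014),
]
PERSONS = [1, 2]  # person 2 has no STUDY_AT edge


def optional_match_rows(persons, edges):
    """One bindings row per matched edge, or one null-padded row if none match."""
    rows = []
    for pid in persons:
        matched = [(pid, uni, year) for (src, uni, year) in edges if src == pid]
        rows.extend(matched or [(pid, None, None)])
    return rows


def collect_by_person(rows, project):
    """Group rows by person and collect project(uni, year), skipping nulls."""
    out = {}
    for pid, uni, year in rows:
        bucket = out.setdefault(pid, [])
        value = project(uni, year)
        if value is not None:
            bucket.append(value)
    return out


rows = optional_match_rows(PERSONS, STUDY_AT)

# Shape A: collect(uni.name); the null row for person 2 yields an empty list.
shape_a = collect_by_person(rows, lambda uni, year: uni)

# Shape B: collect(CASE ... ELSE [uni.name, s.classYear] END), matched rows only.
matched_rows = [row for row in rows if row[1] is not None]
shape_b = collect_by_person(matched_rows, lambda uni, year: [uni, year])
```

Cypher does not guarantee the order of collected elements, so a regression test built on this model would likely compare sorted lists rather than exact order.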

Source

  • Guard: graphistry/compute/gfql/cypher/lowering.py:2417
  • Helper: _reject_unsound_relationship_multiplicity_aggregates (same file, ~:2425)
  • Predicate: _is_multiplicity_sensitive_aggregate

Benchmark evidence

  • IC1 (results/runs/dgx-spark-ic1-conformance-2026-04-05/): official query uses collect(CASE company.name ... END) AS companies and the unis variant.
  • IC4 (results/runs/dgx-spark-ic4-conformance-2026-04-05/): partial; sum/count over multi-stage WITH is now mostly cleared by #1057 ("fix(gfql): preserve alias property columns through non-final WITH aggregate (#1054)"), but collect() over relationship rows remains blocked here.
  • Other IC* lanes (IC5, IC6, IC7, IC8, IC11, IC12) use various collect() / count() aggregates over relationship rows; the current adapter workarounds compute those locally because of this guard.

Acceptance criteria

  • Both Shape A and Shape B succeed and produce the expected per-person collection.
  • Additional accepted shapes (verified by tests):
    • MATCH (a)-[r:R]->(b) RETURN a, count(*)
    • MATCH (a)-[r:R]->(b) WITH a, count(r) AS deg RETURN a.id, deg
    • MATCH (a)-[r:R]->(b) WITH a, sum(r.weight) AS total RETURN a.id, total
  • The relationship-multiplicity guard at lowering.py:2417 is replaced or scoped down so that aggregates which deliberately need relationship-row multiplicity are admitted.
  • Regression tests added under graphistry/tests/compute/gfql/cypher/: at minimum collect(alias.prop) over OPTIONAL MATCH (1 edge), collect(CASE ...) over OPTIONAL MATCH (1 edge), count(*) over relationship pattern, sum(edge.prop) over relationship pattern.
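
The count and sum shapes listed above can be checked against a small pure-Python reference when writing the regression tests. The sketch below is not the GFQL implementation: it models the expected results under the assumption that each matched relationship contributes exactly one row, using hypothetical edge data and function names. The last helper shows one plausible reading of the current collapse (one row per distinct `a`), which is an assumption about the runtime rather than something taken from lowering.py.

```python
from collections import Counter, defaultdict

# Hypothetical edges of type R: (source_id, target_id, weight).
EDGES = [
    ("a1", "b1", 2.0),
    ("a1", "b2", 3.0),
    ("a1", "b1", 5.0),  # parallel edge: still its own row
    ("a2", "b1", 1.0),
]


def count_rows_by_source(edges):
    """MATCH (a)-[r:R]->(b) RETURN a, count(*): one row per matched edge.

    count(r) gives the same result here, since r is never null in a
    non-optional MATCH.
    """
    return dict(Counter(src for src, _dst, _weight in edges))


def sum_weight_by_source(edges):
    """MATCH (a)-[r:R]->(b) WITH a, sum(r.weight) AS total RETURN a.id, total."""
    totals = defaultdict(float)
    for src, _dst, weight in edges:
        totals[src] += weight
    return dict(totals)


def count_after_collapse(edges):
    """Assumed collapse: only one row per distinct source survives."""
    return dict(Counter({src for src, _dst, _weight in edges}))


expected_counts = count_rows_by_source(EDGES)
expected_totals = sum_weight_by_source(EDGES)
collapsed_counts = count_after_collapse(EDGES)
```

With this data the multiplicity-preserving count for `a1` is 3, while the collapsed count is 1, which is the kind of discrepancy the tests should catch.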

Subsystem / scope

  • graphistry/compute/gfql/cypher/lowering.py (aggregate dispatch + _reject_unsound_relationship_multiplicity_aggregates)
  • graphistry/compute/gfql/row/pipeline.py (bindings-row aggregate execution path)
  • Out of scope: aggregate-after-aggregate stacking; this issue scopes the single-stage relationship-multiplicity gap.

Related (not a duplicate)
