Skip to content

[isthmus][SubstraitToCalcite] Aggregation nullability differs from Calcite #336

@anlinc

Description

@anlinc

Description

For the following example query:

SELECT sum(count_of_a)
FROM (
   SELECT count(a) as count_of_a
   FROM example
)

substrait-java

  • When building the Calcite relation, substrait-java says that the output expression, sum(count_of_a), is nullable.
  • This is because, naively, sum always has a nullable return type in the Substrait extensions spec.

Example (failing) test:

@Test
public void nested() {
    Plan.Root root =
            b.root(
                    b.aggregate(
                            input -> b.grouping(),
                            input -> List.of(b.sum(input, 0)),
                            b.aggregate(
                                    input -> b.grouping(),
                                      input -> List.of(b.count(input, 0)),
                                    commonTable
                            )));

    var relNode = converter.convert(root.getInput());
    assertRowMatch(relNode.getRowType(), R.I64);
}

Calcite

Calcite, on the other hand, is slightly more precise. Given that the output of count(a) (count_of_a) is non-nullable, sum(count_of_a) is also non-nullable.

https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/type/ReturnTypes.java#L1433

The difference in nullability when Calcite is generated from Substrait can pose problems for certain Calcite rules, such as AggregateMergeRule. When AggregateMergeRule merges sum(count_of_a) into simply count(a), the output type will become non-nullable -- which violates the initial type bindings.

TBD: This might require a change to Substrait itself where sum uses MIRROR nullability?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions