count(col) returns incorrect values if col is NULL #14198

avikivity · 2023-06-09T18:50:26Z

count(col) (unlike count(*) or count(1)) is supposed not to count rows where col is NULL. However, it does, if the column has a collection type:

cqlsh> create table ks1.tab (id int PRIMARY KEY , i int, l frozen<list<int>>);
cqlsh> insert into ks1.tab (id, i) values (1, 1);
cqlsh> insert into ks1.tab (id, l) values (2, [2]);
cqlsh> select * from ks1.tab;

 id | i    | l
----+------+------
  1 |    1 | null
  2 | null |  [2]

(2 rows)

cqlsh> select count(i) from ks1.tab;

 system.count(i)
-----------------
               1

(1 rows)

cqlsh> select count(l) from ks1.tab;

 count
-------
     2

(1 rows)

Checked with 5.2.1; 5.0.5 is known to be good.

avikivity · 2023-06-09T18:54:48Z

Likely due to

    } else if (name.has_keyspace()
                ? name == COUNT_NAME
                : name.name == COUNT_NAME.name) {
        auto arg_types = get_arguments(COUNT_NAME.name);
        if (arg_types.size() != 1) {
            throw std::runtime_error("count() function requires only 1 argument");
        }

        auto& arg = arg_types[0];
        return aggregate_fcts::make_
            return aggregate_fcts::make_count_rows_function();
        }

which converts COUNT_NAME to COUNT_ROWS_NAME.

avikivity · 2023-06-09T20:51:17Z

In 5.0 it works by accident. count(frozen<list<int>>) happens to match the signature of count(blob) because of the weakly_assignable thing.

count(col), unlike count(*), does not count rows for which col is NULL. However, if col's data type is not a scalar (e.g. a collection, tuple, or user-defined type) it behaves like count(*), counting NULLs too.

The cause is that get_dynamic_aggregate() converts count() to the count(*) version. It works for scalars because get_dynamic_aggregate() intentionally fails to match scalar arguments, and functions::get() then matches the arguments against the pre-declared count functions.

As we can only pre-declare count(scalar) (there's an infinite number of non-scalar types), we change the approach to be the same as min/max: we make count() a generic function. In fact count(col) is much better as a generic function, as it only examines its input to see if it is NULL.

A unit test is added. It passes with Cassandra as well.

Fixes scylladb#14198.
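To illustrate why a generic count(col) does not depend on the column type, here is a minimal standalone sketch. It is not Scylla's code: `cell`, `count_rows`, `count_non_null`, `run`, and `column_l` are hypothetical names invented for this example, and the byte used for the list value is a placeholder rather than the real serialization. The only assumption is that a cell is either NULL or an opaque serialized value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a serialized CQL cell: std::nullopt models NULL,
// the byte vector models the serialized value of any type (scalar or not).
using cell = std::optional<std::vector<std::uint8_t>>;

// Toy aggregate that counts every row, like count(*).
struct count_rows {
    std::int64_t n = 0;
    void add(const cell&) { ++n; }
};

// Toy generic aggregate that counts only non-NULL inputs, like count(col).
// It never deserializes the value, so the column type does not matter.
struct count_non_null {
    std::int64_t n = 0;
    void add(const cell& c) {
        if (c.has_value()) {
            ++n;
        }
    }
};

// Feed every cell of a column to the chosen aggregate and return its result.
template <typename Aggregate>
std::int64_t run(const std::vector<cell>& column) {
    Aggregate agg;
    for (const auto& c : column) {
        agg.add(c);
    }
    return agg.n;
}

// Column l from the report: NULL for id=1, [2] for id=2.
// The byte 0x02 is an arbitrary placeholder, not the real list encoding.
inline std::vector<cell> column_l() {
    return {std::nullopt, std::vector<std::uint8_t>{0x02}};
}
```

With the two rows from the report, `run<count_rows>(column_l())` gives 2 (the buggy result of count(l)), while `run<count_non_null>(column_l())` gives 1 (the expected result).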

avikivity · 2023-11-08T19:39:23Z

5.1 is also broken.

This is difficult to backport due to all the refactoring, but perhaps not difficult to rework.

avikivity changed the title ~~count(col~~ count(col) returns incorrect values if col is NULL Jun 9, 2023

This was referenced Jun 9, 2023

cql3: functions: fix count(col) for non-scalar types #14199

Closed

Revert "configure: Switch debug build from -O0 to -Og" #14197

Closed

scylladb-promoter closed this as completed in 78f4ee3 Jun 13, 2023

scylladb-promoter added the Backport candidate label Jun 13, 2023

DoronArazii added this to the 5.4 milestone Jun 20, 2023

denesb removed the Backport candidate label Dec 15, 2023
