New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
count(col) returns incorrect values if col is NULL #14198
Milestone
Comments
Likely due to } else if (name.has_keyspace()
? name == COUNT_NAME
: name.name == COUNT_NAME.name) {
auto arg_types = get_arguments(COUNT_NAME.name);
if (arg_types.size() != 1) {
throw std::runtime_error("count() function requires only 1 argument");
}
auto& arg = arg_types[0];
return aggregate_fcts::make_
return aggregate_fcts::make_count_rows_function();
} which converts COUNT_NAME to COUNT_ROWS_NAME. |
avikivity
changed the title
count(col
count(col) returns incorrect values if col is NULL
Jun 9, 2023
In 5.0 it works by accident. count(frozen<list<int>>) happens to match the signature of count(blob) because of the weakly_assignable thing. |
avikivity
added a commit
to avikivity/scylladb
that referenced
this issue
Jun 9, 2023
count(col), unlike count(*), does not count rows for which col is NULL. However, if col's data type is not a scalar (e.g. a collection, tuple, or user-defined type) it behaves like count(*), counting NULLs too. The cause is that get_dynamic_aggregate() converts count() to the count(*) version. It works for scalars because get_dynamic_aggregate() intentionally fails to match scalar arguments, and functions::get() then matches the arguments against the pre-declared count functions. As we can only pre-declare count(scalar) (there's an infinite number of non-scalar types), we change the approach to be the same as min/max: we make count() a generic function. In fact count(col) is much better as a generic function, as it only examines its input to see if it is NULL. A unit test is added. Fixes scylladb#14198.
This was referenced Jun 9, 2023
avikivity
added a commit
to avikivity/scylladb
that referenced
this issue
Jun 10, 2023
count(col), unlike count(*), does not count rows for which col is NULL. However, if col's data type is not a scalar (e.g. a collection, tuple, or user-defined type) it behaves like count(*), counting NULLs too. The cause is that get_dynamic_aggregate() converts count() to the count(*) version. It works for scalars because get_dynamic_aggregate() intentionally fails to match scalar arguments, and functions::get() then matches the arguments against the pre-declared count functions. As we can only pre-declare count(scalar) (there's an infinite number of non-scalar types), we change the approach to be the same as min/max: we make count() a generic function. In fact count(col) is much better as a generic function, as it only examines its input to see if it is NULL. A unit test is added. Fixes scylladb#14198.
avikivity
added a commit
to avikivity/scylladb
that referenced
this issue
Jun 11, 2023
count(col), unlike count(*), does not count rows for which col is NULL. However, if col's data type is not a scalar (e.g. a collection, tuple, or user-defined type) it behaves like count(*), counting NULLs too. The cause is that get_dynamic_aggregate() converts count() to the count(*) version. It works for scalars because get_dynamic_aggregate() intentionally fails to match scalar arguments, and functions::get() then matches the arguments against the pre-declared count functions. As we can only pre-declare count(scalar) (there's an infinite number of non-scalar types), we change the approach to be the same as min/max: we make count() a generic function. In fact count(col) is much better as a generic function, as it only examines its input to see if it is NULL. A unit test is added. It passes with Cassandra as well. Fixes scylladb#14198.
avikivity
added a commit
to avikivity/scylladb
that referenced
this issue
Jun 12, 2023
count(col), unlike count(*), does not count rows for which col is NULL. However, if col's data type is not a scalar (e.g. a collection, tuple, or user-defined type) it behaves like count(*), counting NULLs too. The cause is that get_dynamic_aggregate() converts count() to the count(*) version. It works for scalars because get_dynamic_aggregate() intentionally fails to match scalar arguments, and functions::get() then matches the arguments against the pre-declared count functions. As we can only pre-declare count(scalar) (there's an infinite number of non-scalar types), we change the approach to be the same as min/max: we make count() a generic function. In fact count(col) is much better as a generic function, as it only examines its input to see if it is NULL. A unit test is added. It passes with Cassandra as well. Fixes scylladb#14198.
5.1 is also broken. This is difficult to backport due to all the refactoring, but perhaps not difficult to rework. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
count(col) (unlike count(*) or count(1)) is supposed not to count rows where col is NULL. However, it does, if the column has a collection type:
Checked with 5.2.1. 5.0.5 known to be good.
The text was updated successfully, but these errors were encountered: