Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combination of COUNT with GROUP BY is different from Cassandra in case of no matches #12477

Closed
nyh opened this issue Jan 9, 2023 · 7 comments

Comments

@nyh
Copy link
Contributor

nyh commented Jan 9, 2023

This issue reports a difference between the results of Cassandra and Scylla in the case of a quite-esoteric request. This difference is probably mostly insignificant, and perhaps should never be fixed - but I'd still like to write it down in case someone in the future rediscovers the same difference, or perhaps someone today wants to think what code caused this difference in the first place, and could this maybe result in bigger, less esoteric, incompatibility?

The difference is in queries like:

select p, c, count(c) from {table} where p=XXXX group by p, c

Where the p=XXXX matches nothing (c in () also causes tghe same problem). Cassandra returns no results at all for this query, which makes sense.

Scylla, on the other hand, returns one result row: p=null, c=null, count(c)=0. This is not only different from Cassandra, it's also strange, because we don't really expect to ever see p=null or c=null returned from a select grouped by p,c - which is supposed to list valid p,c combinations.

Updated list of cql-pytest tests reproducing this issue:

  • test_aggregate.py::test_count_and_group_by_row_none
  • test_aggregate.py::test_count_and_group_by_partition_none
  • cassandra_tests/validation/operations/compact_storage_test.py::testGroupByWithoutPaging.
  • cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithoutPaging
nyh added a commit to nyh/scylla that referenced this issue Jan 9, 2023
This patch adds a few simple functional test for the COUNT aggregation
feature, and in particular how count(*) differs from count(v), and how
COUNT interacts with the GROUP BY operation.

3 of the 8 new tests are marked xfail, and reproduce two newly
discovered issues:

Refs scylladb#12475: Global COUNT with empty IN results in an internal error
             instead of the expected empty count.

Refs scylladb#12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Jan 16, 2023
This patch adds a few simple functional test for the COUNT aggregation
feature, and in particular how count(*) differs from count(v), and how
COUNT interacts with the GROUP BY operation.

3 of the 8 new tests are marked xfail, and reproduce two newly
discovered issues:

Refs scylladb#12475: Global COUNT with empty IN results in an internal error
             instead of the expected empty count.

Refs scylladb#12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@DoronArazii DoronArazii added this to the 5.x milestone Jan 24, 2023
nyh added a commit to nyh/scylla that referenced this issue Feb 2, 2023
This patch fixes scylladb#12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs scylladb#12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Feb 6, 2023
This patch fixes scylladb#12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs scylladb#12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Feb 7, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715
denesb pushed a commit that referenced this issue Feb 7, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715
nyh added a commit to nyh/scylla that referenced this issue Feb 12, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs scylladb#3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs scylladb#4244: Add support for mixing token, multi- and single-column
            restrictions
Refs scylladb#5361: LIMIT doesn't work when using GROUP BY
Refs scylladb#5362: LIMIT is not doing it right when using GROUP BY
Refs scylladb#5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs scylladb#7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs scylladb#8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs scylladb#12471: Range deletions on COMPACT STORAGE is not supported
Refs scylladb#12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs scylladb#12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs scylladb#12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs scylladb#12526: Support filtering on COMPACT tables
Refs scylladb#12749: Unsupported empty clustering key in COMPACT table
Refs scylladb#12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Feb 20, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816
tgrabiec pushed a commit that referenced this issue Feb 20, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816
@nyh
Copy link
Contributor Author

nyh commented Mar 7, 2023

This issue is apparently less "esoteric" and "insignificant" than I figured, because it was re-discovered and reported by a user in duplicate issue #13027.

nyh added a commit to nyh/scylla that referenced this issue Mar 9, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.

This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.

The tests reproduce six already-known Scylla issues and one new issue:

Already known issues:

Refs scylladb#2060: Allow mixing token and partition key restrictions
Refs scylladb#5361: LIMIT doesn't work when using GROUP BY
Refs scylladb#5362: LIMIT is not doing it right when using GROUP BY
Refs scylladb#5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs scylladb#12477: Combination of COUNT with GROUP BY is different from Cassandra
             in case of no matches
Refs scylladb#12479: SELECT DISTINCT should refuse GROUP BY with clustering column

A new issue discovered by these tests:

Refs scylladb#13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Mar 15, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.

This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.

The tests reproduce six already-known Scylla issues and one new issue:

Already known issues:

Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
             in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column

A new issue discovered by these tests:

Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13126
avikivity pushed a commit that referenced this issue Apr 16, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
avikivity pushed a commit that referenced this issue Apr 16, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
denesb pushed a commit that referenced this issue Apr 17, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
denesb pushed a commit that referenced this issue Apr 17, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
denesb pushed a commit that referenced this issue Apr 17, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
avikivity pushed a commit that referenced this issue May 4, 2023
This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v))
of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()")
resulted in an internal error instead of the "zero" result that each
aggregator expects (e.g., 0 for COUNT, null for MIN).

The problem is that normally our aggregator forwarder picks the nodes
which hold the relevant partition(s), forwards the request to each of
them, and then combines these results. When there are no partitions,
the query is sent to no node, and we end up with an empty result set
instead of the "zero" results. So in this patch we recognize this
case and build those "zero" results (as mentioned above, these aren't
always 0 and depend on the aggregation function!).

The patch also adds two tests reproducing this issue in a fairly general
way (e.g., several aggregators, different aggregation functions) and
confirming the patch fixes the bug.

The test also includes two additional tests for COUNT aggregation, which
uncovered an incompatibility with Cassandra which is still not fixed -
so these tests are marked "xfail":

Refs #12477: Combining COUNT with GROUP by results with empty results
             in Cassandra, and one result with empty count in Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12715

(cherry picked from commit 3ba011c)
@ptrsmrn
Copy link
Contributor

ptrsmrn commented May 29, 2023

@avikivity has made a changes in aggregate functions recently in #13105, perhaps it's resolved this issue?
EDIT: Seems it doesn't, because the tests that are reproducing the issue still fail with the latest OSS master.

avikivity added a commit to avikivity/scylladb that referenced this issue Jun 26, 2023
A GROUP BY combined with aggregation should produce a single
row per group, except for empty groups. This is in contrast
to an aggregation without GROUP BY, which produces a single
row no matter what.

The existing code only considered the case of no grouping
and forced a row into the result, but this caused an unwanted
row if grouping was used.

Fix by refining the check to also consider GROUP BY.

XFAIL tests are relaxed.

Fixes scylladb#12477.

Note, forward_service requires that aggregation produce
exactly one row, but since it can't work with grouping,
it isn't affected.
avikivity added a commit to avikivity/scylladb that referenced this issue Jun 28, 2023
A GROUP BY combined with aggregation should produce a single
row per group, except for empty groups. This is in contrast
to an aggregation without GROUP BY, which produces a single
row no matter what.

The existing code only considered the case of no grouping
and forced a row into the result, but this caused an unwanted
row if grouping was used.

Fix by refining the check to also consider GROUP BY.

XFAIL tests are relaxed.

Fixes scylladb#12477.

Note, forward_service requires that aggregation produce
exactly one row, but since it can't work with grouping,
it isn't affected.
@DoronArazii DoronArazii modified the milestones: 5.x, 5.4 Jul 4, 2023
@avikivity
Copy link
Member

Should we backport it? Probably not. Looks like an edge case, and better not to risk stable versions.

@nyh
Copy link
Contributor Author

nyh commented Jul 10, 2023

I agree it doesn't look like a very important case (when I first reported it I went as far as saying it "perhaps should never be fixed"), but we did get an actual user running into it so apparently it's less esoteric than we thought.

I would say that if it's a clean backport we can do it, but if the fix involved changes to areas which changed recently like expressions, then it's not worth the effort.

@shantanugithub
Copy link

@nyh @avikivity This issue becomes a major concern when we use spring data cassandra which directly maps the attributes to the entity fields defined. Values like null for primitive and NaN will immediately fail and we would need to introduce another layer between for validating this. Can we please consider backporting the fix to 5.2 series ?

@nyh
Copy link
Contributor Author

nyh commented Jul 19, 2023

@avikivity any objections that I backport this? We already had two separate user reports on this issue so it's apparently less obscure than we thought.

I see this is a one-line patch so I hope that even if it's not a clean backport, it won't be a difficult one. I also hope that by the nature of being a very simple one-line patch, there is low risk to the stability of older version (the change only kicks in if aggregate and GROUP BY is used).

nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816

(cherry picked from commit 328cdb2)
nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816

(cherry picked from commit 328cdb2)
(cherry picked from commit e11561e)
Modified for 5.1 to comment out error-path tests for "unset" values what
are silently ignored (instead of being detected) in this version.
@denesb
Copy link
Contributor

denesb commented Dec 18, 2023

This is a minor edge case but apparently user hit it. It is a simple fix which had time to cook on master for months.
@nyh can you backport this? It doesn't apply cleanly to 5.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment