Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIMIT is not doing it right when using GROUP BY #5362

Open
fgelcer opened this issue Nov 27, 2019 · 5 comments
Open

LIMIT is not doing it right when using GROUP BY #5362

fgelcer opened this issue Nov 27, 2019 · 5 comments

Comments

@fgelcer
Copy link

fgelcer commented Nov 27, 2019

as mentioned on #5361 , this is an effort for testing #2206 , and LIMIT is not behaving as expected (and not accordingly with C*):

cqlsh:test_paging_with_group_by> PAGING 1000
Page size: 1000
cqlsh:test_paging_with_group_by> SELECT a, b, c, d FROM test GROUP BY a, b LIMIT 4;

 a | b | c | d
---+---+---+---
 1 | 2 | 1 | 3
 1 | 4 | 2 | 6

(2 rows)
cqlsh:test_paging_with_group_by> SELECT a, b, c, d FROM test GROUP BY a, b LIMIT 5;

 a | b | c | d
---+---+---+---
 1 | 2 | 1 | 3
 1 | 4 | 2 | 6
 2 | 2 | 3 | 3

(3 rows)
cqlsh:test_paging_with_group_by> SELECT a, b, c, d FROM test GROUP BY a, b LIMIT 3;

 a | b | c | d
---+---+---+---
 1 | 2 | 1 | 3
 1 | 4 | 2 | 6

(2 rows)
cqlsh:test_paging_with_group_by> SELECT a, b, c, d FROM test GROUP BY a, b LIMIT 8;

 a | b | c | d
---+---+---+----
 1 | 2 | 1 |  3
 1 | 4 | 2 |  6
 2 | 2 | 3 |  3
 2 | 4 | 3 |  6
 4 | 8 | 2 | 12

(5 rows)
@fgelcer
Copy link
Author

fgelcer commented Nov 27, 2019

could be resolved by same PR as #5361 , but reporting apart as it doesn't behave the same.

@slivne slivne added this to the 3.3 milestone Nov 28, 2019
@slivne slivne modified the milestones: 3.3, 3.4 Feb 20, 2020
@slivne slivne modified the milestones: 4.0, 4.1 Apr 15, 2020
@slivne slivne modified the milestones: 4.1, 4.2 Jun 1, 2020
@slivne slivne modified the milestones: 4.2, 4.3 Nov 26, 2020
@slivne slivne modified the milestones: 4.3, 4.5 Jan 24, 2021
@slivne slivne modified the milestones: 4.5, 4.6 Mar 29, 2021
@dekimir
Copy link
Contributor

dekimir commented Sep 13, 2021

See diagnosis here: #5361 (comment)

@dekimir dekimir removed their assignment Sep 13, 2021
@slivne slivne modified the milestones: 4.6, 4.7 Nov 10, 2021
@DoronArazii DoronArazii modified the milestones: 5.0, 5.x Oct 12, 2022
@nyh
Copy link
Contributor

nyh commented Jan 9, 2023

This issue is also reproduced by the Cassandra unit test which I'm translating now, cassandra_tests/validation/operations/compact_storage_test.py::testGroupByWithoutPaging.

nyh added a commit to nyh/scylla that referenced this issue Feb 12, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs scylladb#3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs scylladb#4244: Add support for mixing token, multi- and single-column
            restrictions
Refs scylladb#5361: LIMIT doesn't work when using GROUP BY
Refs scylladb#5362: LIMIT is not doing it right when using GROUP BY
Refs scylladb#5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs scylladb#7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs scylladb#8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs scylladb#12471: Range deletions on COMPACT STORAGE is not supported
Refs scylladb#12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs scylladb#12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs scylladb#12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs scylladb#12526: Support filtering on COMPACT tables
Refs scylladb#12749: Unsupported empty clustering key in COMPACT table
Refs scylladb#12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Feb 20, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816
tgrabiec pushed a commit that referenced this issue Feb 20, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816
@avelanarius
Copy link
Member

@nyh
Copy link
Contributor

nyh commented Mar 8, 2023

Updated list of tests reproducing this issue:

cql-pytest:
Tests specificially for this issue:

  • test_group_by.py::test_group_by_clustering_prefix_with_limit

Tests that currently fail on many different issues (so fixing this issue won't make them pass):

  • cassandra_tests/validation/operations/compact_storage_test.py::testGroupByWithoutPaging
  • cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithoutPaging
  • cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithStaticColumnsWithoutPaging
  • cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithPaging
  • cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithStaticColumnsWithPaging

Java Driver integration test:
https://github.com/scylladb/java-driver/blob/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/test/java/com/datastax/driver/core/querybuilder/QueryBuilderExecutionTest.java#L636

Additionally, dtest's paging_test.py appears to contain similar tests to those that Cassandra later translated to their unit tests and I later translated into cql-pytest tests (see above), but as noted above by @fgelcer these dtests failed and were modified to pass (!?) on Scylla, and now have several FIXMEs referencing issue 5362. When we fix this bug we'll probably need to fix those tests back (or just remove them, if they are duplicates of the unit tests?).

nyh added a commit to nyh/scylla that referenced this issue Mar 9, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.

This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.

The tests reproduce six already-known Scylla issues and one new issue:

Already known issues:

Refs scylladb#2060: Allow mixing token and partition key restrictions
Refs scylladb#5361: LIMIT doesn't work when using GROUP BY
Refs scylladb#5362: LIMIT is not doing it right when using GROUP BY
Refs scylladb#5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs scylladb#12477: Combination of COUNT with GROUP BY is different from Cassandra
             in case of no matches
Refs scylladb#12479: SELECT DISTINCT should refuse GROUP BY with clustering column

A new issue discovered by these tests:

Refs scylladb#13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Mar 9, 2023
The translated Cassandra unit tests in cassandra_tests/validation/operations/
reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION
LIMIT - issue scylladb#5361, scylladb#5362 and scylladb#5363. Unfortunately, those test functions
are very long, and each test fails on all of these issues and a few more,
making it difficult to use these tests to verify when those tests have
been fixed. In other words, ideally a patch for issue 5361 should un-xfail
some reproducing test for this issue - but all the existing tests will
continue to fail after fixing 5361, because of other remaining bugs.

So in this patch, I created a new test file test_group_by.py with my own
tests for the GROUP BY feature. I tried to explore the different
capabilities of the GROUP BY feature, its different success and error
paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT.
As usual, I created many small test functions and not one huge test
function, and as a result we now have 5 xfailing tests which each
reproduces one bug and when the bug is fixed, it will start to pass.

All tests added here pass on Cassandra.

Refs scylladb#5361
Refs scylladb#5362
Refs scylladb#5363

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Mar 15, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.

This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.

The tests reproduce six already-known Scylla issues and one new issue:

Already known issues:

Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
             in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column

A new issue discovered by these tests:

Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13126
denesb pushed a commit that referenced this issue Mar 16, 2023
The translated Cassandra unit tests in cassandra_tests/validation/operations/
reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION
LIMIT - issue #5361, #5362 and #5363. Unfortunately, those test functions
are very long, and each test fails on all of these issues and a few more,
making it difficult to use these tests to verify when those tests have
been fixed. In other words, ideally a patch for issue 5361 should un-xfail
some reproducing test for this issue - but all the existing tests will
continue to fail after fixing 5361, because of other remaining bugs.

So in this patch, I created a new test file test_group_by.py with my own
tests for the GROUP BY feature. I tried to explore the different
capabilities of the GROUP BY feature, its different success and error
paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT.
As usual, I created many small test functions and not one huge test
function, and as a result we now have 5 xfailing tests which each
reproduces one bug and when the bug is fixed, it will start to pass.

All tests added here pass on Cassandra.

Refs #5361
Refs #5362
Refs #5363

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13136
nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816

(cherry picked from commit 328cdb2)
nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CompactStorageTest.java into our cql-pytest
framework.

This very large test file includes 86 tests for various types of
operations and corner cases of WITH COMPACT STORAGE tables.

All 86 tests pass on Cassandra (except one using a deprecated feature
that needs to be specially enabled). 30 of the tests fail on Scylla
reproducing 7 already-known Scylla issues and 7 previously-unknown issues:

Already known issues:

Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE"
Refs #4244: Add support for mixing token, multi- and single-column
            restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Refs #8627: Cleanly reject updates with indexed values where value > 64k

New issues:

Refs #12471: Range deletions on COMPACT STORAGE is not supported
Refs #12474: DELETE prints misleading error message suggesting
             ALLOW FILTERING would work
Refs #12477: Combination of COUNT with GROUP BY is different from
             Cassandra in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
Refs #12526: Support filtering on COMPACT tables
Refs #12749: Unsupported empty clustering key in COMPACT table
Refs #12815: Hidden column "value" in compact table isn't completely hidden

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12816

(cherry picked from commit 328cdb2)
(cherry picked from commit e11561e)
Modified for 5.1 to comment out error-path tests for "unset" values what
are silently ignored (instead of being detected) in this version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants