Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELECT on indexed table returns wrong values #8991

Closed
cvybhu opened this issue Jul 8, 2021 · 23 comments · Fixed by #8994
Closed

SELECT on indexed table returns wrong values #8991

cvybhu opened this issue Jul 8, 2021 · 23 comments · Fixed by #8994
Assignees
Milestone

Comments

@cvybhu
Copy link
Contributor

cvybhu commented Jul 8, 2021

Steps to reproduce:

create keyspace if not exists ks with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
create table if not exists ks.tab (p int, c1 int, c2 int, primary key (p, c1, c2));
create index if not exists on ks.tab(c2);

insert into ks.tab (p, c1, c2) values(0, 0, 0);
insert into ks.tab (p, c1, c2) values(0, 0, 1);
insert into ks.tab (p, c1, c2) values(0, 0, 2);
insert into ks.tab (p, c1, c2) values(0, 1, 0);
insert into ks.tab (p, c1, c2) values(0, 1, 1);
insert into ks.tab (p, c1, c2) values(0, 1, 2);
insert into ks.tab (p, c1, c2) values(0, 2, 0);
insert into ks.tab (p, c1, c2) values(0, 2, 1);
insert into ks.tab (p, c1, c2) values(0, 2, 2);

select c1, c2 from ks.tab where c1 = 1 and c2 = 1;

The select returns a wrong result:

 c1 | c2
----+----
  0 |  1
  1 |  1
  2 |  1

Without an index everything works fine

To run save in bug.cql and run cqlsh -f bug.cql

Also here is a small test in python:

def test_clustering_index(cql, test_keyspace):
    schema = 'p int, c1 int, c2 int, primary key (p, c1, c2)'
    extra = ''
    with new_test_table(cql, test_keyspace, schema, extra) as table:
        cql.execute(f"CREATE INDEX ON {table}(c2)")
        for c1 in [0, 1, 2]:
            for c2 in [0, 1, 2]:
                cql.execute(f"INSERT INTO {table} (p, c1, c2) VALUES (0, {c1}, {c2})")
            
        stmt = SimpleStatement(f"SELECT c1, c2 FROM {table} WHERE c1 = 1 and c2 = 1")
        rows = cql.execute(stmt)
        assert len([r for r in rows]) == 1

Installation details
Scylla version (or git commit hash): Docker image (version 4.4.3-0.20210609.bfafb8456) and master (1924e8d)
Cluster size: Single node
OS (RHEL/CentOS/Ubuntu/AWS AMI): Fedora 33

@cvybhu cvybhu self-assigned this Jul 8, 2021
@cvybhu cvybhu added the bug label Jul 8, 2021
@tzach
Copy link
Contributor

tzach commented Jul 8, 2021

I hit another strange issue when trying to repreduce:

create keyspace if not exists ks with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
create table if not exists ks.tab (p int, c1 int, c2 int, primary key (p, c1, c2));
insert into ks.tab (p, c1, c2) values(0, 0, 0);
insert into ks.tab (p, c1, c2) values(0, 0, 1);
insert into ks.tab (p, c1, c2) values(0, 0, 2);
insert into ks.tab (p, c1, c2) values(0, 1, 0);
insert into ks.tab (p, c1, c2) values(0, 1, 1);
insert into ks.tab (p, c1, c2) values(0, 1, 2);
insert into ks.tab (p, c1, c2) values(0, 2, 0);
insert into ks.tab (p, c1, c2) values(0, 2, 1);
insert into ks.tab (p, c1, c2) values(0, 2, 2);
select c1, c2 from ks.tab where c1 = 1 and c2 = 1;

 c1 | c2
----+----
  1 |  1

Last select should failed with out ALLOW FILTERING!
Note there is no index in my case

@tzach tzach added the area/cql label Jul 8, 2021
@cvybhu
Copy link
Contributor Author

cvybhu commented Jul 8, 2021

Oh maybe the original query should also require ALLOW FILTERING

I ran this example using cassandra and it failed and said that this query requires ALLOW FILTERING

@tzach
Copy link
Contributor

tzach commented Jul 8, 2021

Look like a regression

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

It seems the second issue discovered by @tzach - about the not-required ALLOW FILTERING - is a separate bug. We have a bunch of tests for when ALLOW FILTERING is needed in test/cql-pytest/test_allow_filtering.py, and should add in that file a test for this newly discovered scenario. CC @dekimir.

The original issue reported by @cvybhu seems more serious :-( CC @psarna

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

It seems the second issue discovered by @tzach - about the not-required ALLOW FILTERING - is a separate bug. We have a bunch of tests for when ALLOW FILTERING is needed in test/cql-pytest/test_allow_filtering.py, and should add in that file a test for this newly discovered scenario. CC @dekimir.

Correction: the ALLOW FILTERING thing is a known bug. #7608. There's a test for it: test_allow_filtering_clustering_key.

@psarna
Copy link
Contributor

psarna commented Jul 8, 2021

@cvybhu could you also try an older release, like 4.3? Then we'll know if it's a long preexisting bug or perhaps something that came up during the restrictions rewrite

@cvybhu
Copy link
Contributor Author

cvybhu commented Jul 8, 2021

I ran this example on Scylla version 4.3.5-0.20210614.690a96ff5 and it failed and said that it requires ALLOW FILTERING
I'm gonna try bisecting

@psarna
Copy link
Contributor

psarna commented Jul 8, 2021

If it used to need allow filtering, then I think it's indeed caused (maybe indirectly) by the restrictions rewrite (/cc @dekimir). In this case bisecting may not be that helpful, since the rewrite was quite big. Maybe it makes more sense to add debug prints and try to detect at which step we fetched wrong results and presented them to the client.

@psarna
Copy link
Contributor

psarna commented Jul 8, 2021

Maybe the previous code correctly detected that this query needs filtering and applied it, while the new one erroneously thinks that filtering is not necessary, so it fetches all rows that satisfy c2 == 1 from the index and does not filter for c1 == 1 afterwards?

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

@psarna I agree - I think this request indeed needs filtering and shouldn't have worked at all without ALLOW FILTERING, so it is very similar to #7608 just involving an additional index, and the result is more serious (wrong results, not just performance). The problem is that in the index, the first clustering key is p (and also token, derived from p), and the query does not give p and only gives later clustering key columns - so it's a typical example of requiring filtering.

Can you please run your test on Cassandra (there is a convenient cql-pytest/run-cassandra to run Cassandra for you if you don't want to run it yourself) and see if Cassandra wants ALLOW FILTERING? If it does, then it's a bug that we do not. Of course it's an even bigger bug that we don't require ALLOW FILTERING and also produce incorrect results :-(

@cvybhu
Copy link
Contributor Author

cvybhu commented Jul 8, 2021

@nyh
I ran this example using cassandra and it failed and said that this query requires ALLOW FILTERING

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

@nyh
I ran this example using cassandra and it failed and said that this query requires ALLOW FILTERING

So Piotr's description appears correct. I suggest that you expand your test to try the same thing with and without filtering, and write it so it will pass on Cassandra (run it with test/cql-pytest/run-cassandra testfile.py::testname). It probably means that without ALLOW FILTERING the query should fail (and not succeed and produce wrong results...) and with ALLOW FILTERING it should pass and produce correct results.

@dekimir
Copy link
Contributor

dekimir commented Jul 8, 2021

This is a duplicate of #7708. In particular: #7708 (comment)

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

This is a duplicate of #7708. In particular: #7708 (comment)

Wow, that bug is almost a year old, and we didn't have an xfailing test for it until today. The text on that issue gave me the impression that this was just an optimization thing (the word "regression" was given in double quotes), while it's actually a request giving wrong results.
@cvybhu are you planning to submit a cql-pytest patch with your example (hopefully with both ALLOW FILTERING and without), or shall I do it?

@cvybhu
Copy link
Contributor Author

cvybhu commented Jul 8, 2021

@nyh
I will submit them

@dekimir
Copy link
Contributor

dekimir commented Jul 8, 2021

The text on that issue gave me the impression that this was just an optimization thing (the word "regression" was given in double quotes)

That's because you keep mixing up #7708 with #7608. I'm going blue in the face repeating that they're different issues not to be comingled. "Needs filtering" != "needs ALLOW FILTERING".

(Ironically, even the bug numbers differ by just one digit. I'll never win this...)

cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 8, 2021
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1

This currently fails, issue scylladb#8991

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 8, 2021
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1

This should require filtering, but doesn't
Issue scylladb#8991

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 8, 2021
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1

This should require filtering, but doesn't
Issue scylladb#8991

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

The text on that issue gave me the impression that this was just an optimization thing (the word "regression" was given in double quotes)

That's because you keep mixing up #7708 with #7608. I'm going blue in the face repeating that they're different issues not to be comingled. "Needs filtering" != "needs ALLOW FILTERING".

(Ironically, even the bug numbers differ by just one digit. I'll never win this...)

If you think that I am mixing up 7708 and 7608, it doesn't help that you wrote in 7708 that it is "a duplicate of 7608". I don't think ALLOW FILTERING and need_filtering are the same - I think both have separate bugs, but specific examples can hit bugs in both :-(

In any case, none of these arguments matter. The present issue (8991) outlines a case where we both allow ALLOW FILTERING where we shouldn't, and produce wrong results. I don't know which of the two issues you want to blame for this - or both - but it's a real bug, more serious than we previously thought - in neither 7708 or 7608 did we ever have an example with blatently wrong results. Now we do.

@dekimir
Copy link
Contributor

dekimir commented Jul 8, 2021

If you think that I am mixing up 7708 and 7608, it doesn't help that you wrote in 7708 that it is "a duplicate of 7608".

I didn't. I wrote that the word "regression" you mention here is 7608 and shouldn't be discussed in 7708. I also don't like that it's being discussed here. I wish I could make people discuss it in 7608, which was filed specifically for it.

it's a real bug, more serious than we previously thought - in neither 7708 or 7608 did we ever have an example with blatently wrong results. Now we do.

I disagree. #7708 says in its very first line "returns invalid result".

@dekimir
Copy link
Contributor

dekimir commented Jul 8, 2021

I disagree. #7708 says in its very first line "returns invalid result".

Correction: it says that need_filtering() returns invalid result, without clarifying that Scylla returns the wrong query result as a consequence. But we knew that at the time -- that's why the PR that exposed it was reverted.

@nyh
Copy link
Contributor

nyh commented Jul 8, 2021

it's a real bug, more serious than we previously thought - in neither 7708 or 7608 did we ever have an example with blatently wrong results. Now we do.

I disagree. #7708 says in its very first line "returns invalid result".

Please reread #7708. It says that some internal function need_filtering() returns an invalid result - not that this results in a user-visible incorrect response. Now we have such an example where wrong results are produced.

@dekimir
Copy link
Contributor

dekimir commented Jul 8, 2021

it's a real bug, more serious than we previously thought - in neither 7708 or 7608 did we ever have an example with blatently wrong results. Now we do.

I disagree. #7708 says in its very first line "returns invalid result".

Please reread #7708. It says that some internal function need_filtering() returns an invalid result - not that this results in a user-visible incorrect response. Now we have such an example where wrong results are produced.

Yes, sorry, I misread it. See my last comment. We've had the example before, that's why #7708 was filed. It just didn't explicitly say which wrong results Scylla returned to that query.

cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but needs_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but needs_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but needs_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 9, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 12, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 13, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 19, 2021
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This currently fails.
Refs scylladb#8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 19, 2021
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This should require filtering, but doesn't.
Refs scylladb#8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 19, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu added a commit to cvybhu/scylladb that referenced this issue Jul 19, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes scylladb#8991.
Fixes scylladb#7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
nyh added a commit that referenced this issue Jul 19, 2021
…ering key' from Jan Ciołek

Add examples from issue #8991 to tests
Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3`

First test tests that selecting values from indexed table using only clustering key returns correct values.
The second test tests that performing this operation requires filtering.

The filtering test looks similar to [the one for #7608](https://github.com/scylladb/scylla/blob/1924e8d2b63e6611b78ac6252b5ddbc4884f9d22/test/cql-pytest/test_allow_filtering.py#L124) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths.

Contains a quick fix for the `needs_filtering()` function to make these tests pass.
It returns `true` for this case and the one described in #7708.

This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results.

Fixes #8991.
Fixes #7708.

Closes #8994

* github.com:scylladb/scylla:
  cql3: Fix need_filtering on indexed table
  cql-pytest: Test selecting using only clustering key requires filtering
  cql-pytest: Test selecting from indexed table using clustering key
avikivity pushed a commit that referenced this issue Oct 18, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 5414924)
avikivity pushed a commit that referenced this issue Oct 18, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 5414924)
avikivity pushed a commit that referenced this issue Oct 27, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 5414924)
avikivity pushed a commit that referenced this issue Oct 28, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 5414924)
@tzach
Copy link
Contributor

tzach commented Oct 28, 2021

@nyh the issue I reported in the comment above was not fix in 4.5.1
Open a new issue for it #9536

avikivity pushed a commit that referenced this issue Oct 28, 2021
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
(cherry picked from commit 5414924)
@nyh
Copy link
Contributor

nyh commented Dec 29, 2021

This fix is in branch-4.6, and was backported to 4.5 and 4.4 by @avikivity already months ago, so will remove the backport candidate tag.

@tzach this fix did get included in 4.5.1. So maybe the issue you found is a different one? I'll take a look.

@DoronArazii DoronArazii added this to the 4.6 milestone May 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants