Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If ALLOW FILTERING excludes a long string of rows, the scan can stop prematurely #8203

Closed
nyh opened this issue Mar 2, 2021 · 7 comments
Closed
Assignees

Comments

@nyh
Copy link
Contributor

nyh commented Mar 2, 2021

This issue replaces #7966, because we have a better understanding now of what caused the problem we saw there.

Consider a SELECT request with paging, which scans an entire table or just one partition, and also a filter (with ALLOW FILTERING) that excludes all the rows except the last one. In this case, Scylla needs to return a bunch of empty pages (each page probably corresponds to 1 MB of pre-filtering content), finally followed by a page with one match.

However, in the cql-pytests:

  • test_filtering.py::test_filtering_contiguous_nonmatching_partition_range
  • test_filtering.py::test_filtering_contiguous_nonmatching_single_partition

We see that if we ask the Python driver to do the paging and perform the entire scan, the result is wrong: empty, instead of the expected one row.

@denesb investigated (in #7966) and discovered that the Python driver receives an empty page with has_more_pages=true (as expected), then asks for another page and gets another empty page with has_more_pages=true (again, as expected), but now - the driver does not ask for any more pages. This is wrong. It looks like it may be a bug in the Python driver but we're not yet sure (we didn't yet try a similar test using a different driver).

psarna pushed a commit that referenced this issue Mar 3, 2021
Previously, we had two tests demonstrating issue #7966. But since then,
our understanding of this issue has improved which resulted in issue #8203,
so this patch improves those tests and makes them reproduce the new issue.

Importantly, we now know that this problem is not specific to a full-table
scan, and also happens in a single-partition scan, so we fix the test to
demonstrate this (instead of the old test, which missed the problem so
the test passed).

Both tests pass on Cassandra, and fail on Scylla.

Refs #8203.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210302224020.1498868-1-nyh@scylladb.com>
@nyh nyh changed the title If ALLOW FILTERING excludes a long string or rows, the scan can stop prematurely If ALLOW FILTERING excludes a long string of rows, the scan can stop prematurely Mar 7, 2021
@nyh
Copy link
Contributor Author

nyh commented Oct 10, 2021

The two cql-pytest tests we have for this issue,

  • test_filtering.py::test_filtering_contiguous_nonmatching_partition_range
  • test_filtering.py::test_filtering_contiguous_nonmatching_single_partition

do not fail any more (XPASS).
We should check if the problem is really fixed, and if what fixed it was a newer Python driver version, or some fix in Scylla.

@psarna
Copy link
Contributor

psarna commented Oct 11, 2021

@nyh our dear friend Ultrabug fixed the problem recently: datastax/python-driver@1d9077d , but it looks like it's not part of any official release yet...

@nyh
Copy link
Contributor Author

nyh commented Oct 11, 2021

@nyh our dear friend Ultrabug fixed the problem recently: datastax/python-driver@1d9077d , but it looks like it's not part of any official release yet...

Wow. I don't know how this happened, but I have in /home/nyh/.local/lib/python3.9/site-packages/cassandra/cluster.py a file which contains his fix! The file is dated September 19, a couple of days after he committed his patch. I have no idea where this file came from - I certainly didn't do it myself, and it's rather annoying - I'm guessing some sort of Scylla build tool or something brought it for me, but how or why and from where? Is Scylla's branch of the Python driver periodically merged with the master branch of Datastax, not any release branch?

In any case, it's good to know that this was in fact a Python driver bug, not a Scylla bug. I should send a patch to un-XFAIL these two tests and close this issue - but doing this properly will require some thought on how to have the test PASS on a recent driver versions and SKIP (not fail) on old versions. Or, at the minimum, we need our Jenkins runners to have the recent driver (perhaps they already do?) so at least they will not fail these tests.

Thanks @ultrabug for fixing this!

@nyh nyh self-assigned this Oct 11, 2021
@nyh
Copy link
Contributor Author

nyh commented Oct 11, 2021

Even more frustratingly, the official cassandra-driver version I have in /usr/local installed by pip is 3.25.0 (the file is dated March 18th), which does not have this fix, while the one I have in ~/.local is newer (September 19th), is indeed the Scylla fork of the driver, but lists its version as the older 3.24.5. So my checks in the test will need to be very ugly :-( But I'll do it.

@nyh
Copy link
Contributor Author

nyh commented Oct 11, 2021

Indeed, we have this patch in the Scylla driver: scylladb/python-driver@6ed53d9

@ultrabug
Copy link
Contributor

@nyh , the fix has also been contributed and merged to cassandra-driver but it's not part of a release yet AFAIK

The fix should be present in the latest release of scylla-driver tho, tag 3.24.5-scylla since I allowed for the CI to release it here. PyPi agrees with this version being the latest.

@nyh
Copy link
Contributor Author

nyh commented Oct 12, 2021

Thanks. I sent yesterday a patch titled "[PATCH] cql-pytest: XFAILing test was fixed by a Python driver fix" to fix the tests that used to fail because of the driver bug to pass on new-enough drivers and be skipped on older driver versions - checking for either Scylla or Datastax variants of the driver.

nyh added a commit to nyh/scylla that referenced this issue Oct 12, 2021
Issue scylladb#8203 describes a bug in a long scan which returns a lot of empty
pages (e.g., because most of the results are filtered out). We have two
cql-pytest test cases that reproduced this bug - one for a whole-table
scan and one for a single-partition scan.

It turned out that the bug was not in the Scylla server, but actually in
the Python driver which incorrectly stopped the iteration after an empty
page even though this page did contain the "more pages" flag.

This driver bug was already fixed in the Datastax driver (see
scylladb/python-driver@6ed53d9,
and in the Scylla fork of the driver:
datastax/python-driver@1d9077d

So in this patch we drop the XFAIL, and if the driver is not new enough
to contain this fix - the test is skipped.

Since our Jenkins machines have the latest Scylla fork of the driver and
it already contains this fix, these tests will not be skipped - and will
run and should pass. Developers who run these tests on their development
machine will see these tests either passing or skipped - depending on
which version of the driver they have installed.

Closes scylladb#8203

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants