New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow filtering doesn't page correctly #4156
Comments
What |
Please note:
|
Indeed, we could enhance our |
I think the bigger question is how to better handle the case of empty pages in ALLOW FILTERING and we did have some other alternatives. such as fetch additional results till we are closer to the timeout time of the query. |
That's not a complete solution, since we can still send empty pages to the client. I still think there's value in modifying cqlsh. Indeed we can try and fetch more pages if we have budget for it. But would we apply this solution if we got a single row back? I don't think we need to avoid these intermediate user requests for this case, since allow filtering is not supposed to be an efficient operation - there's even a warning about that. |
On Mon, Jan 28, 2019 at 2:58 PM Duarte Nunes ***@***.***> wrote:
That's not a complete solution, since we can still send empty pages to the
client. I still think there's value in modifying cqlsh.
Indeed we can try and fetch more pages if we have budget for it. But would
we apply this solution if we got a single row back?
No - I guess.
I don't think we need to avoid these intermediate user requests for this
case, since allow filtering is not supposed to be an efficient operation -
there's even a warning about that.
Its more efficient then having the client fetch all the data and doing
filtering on his own.
fair enough - the client can do it to some extent on his end by increasing
the page size to a higher value.
… —
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4156 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADThCLiw9Tv3ZVWpb-6hv0CuqV6vfrQ4ks5vHvP1gaJpZM4aVzrD>
.
|
I don't think we should play games with trying to fetch more rows. There will always be a case where it doesn't work (consider a query that scans the entire table and returns no rows). The inefficiency here is in the query, adding an extra hop to the client makes it less efficient, but not by much. Filtering should be done when the query returns >1% of the data, below that it is inefficient. |
You have many cases where the "allow filtering" is utilized for experimentation and examination of the data directly in cqlsh- not for efficiency reasons- this is required since the key is "synthetic" or "bucketed" (i.e by hours) because of partitioning constraints - in this case the feature is meant as a convenience and should be easy to use, not there for peformance optimizations reasons |
Agreed, but it's still a tooling issue, not a DB issue. |
I think this is a cqlsh bug. cqlsh documentation states that "PAGING ON displays query results in 100-line chunks followed by the more prompt". But this is not really what it does. It asks Cassandra or Scylla for up to 100 rows, and outputs the page it got. But that page may be smaller than 100 rows, it may even be (as we noticed) empty. cqlsh needs to fetch the next page in this case, again and again, until it has 100 rows to output, if it really wants to obey its documentation. Alternaively, cqlsh's documentation can be modified. This is similar to piping to "more" in Linux: "something | more" is supposed to show a full screen (e.g., 25 lines) of output at a time. If "something" outputs just 2 lines (or zero lines) and pauses, more waits longer until 25 lines are available and only then says "--More--". It doesn't just show a "--More--" after two lines of output because that's what it got quickly. But I do agree with @slivne that with changes to Scylla (to try to produce more results for a longer time instead of quickly returning a page of zero results) would have made this issue less important. But I wonder if it doesn't make even more sense to just fix cqlsh, and optimize Scylla for this case (to return fewer silly empty pages) later. |
It's not possible, in the general case, to fix this in the database. Suppose you filter all the data away; scylla will have to scan the entire data set before returning the final page (which would also be empty). |
It's more efficient to scan the table on the scylla side without involving the client, so that network latency to the client doesn't limit throughput. The client shouldn't set a timeout for queries whose completion is not bounded in time and he's willing to wait for as long as it takes. Filtering queries and For canceling queries on the server side for which no one is waiting we should have a different mechanism - connection drop should cancel them. The driver should also expose a way to cancel. |
I don't understand why we should be interested in optimizing I also don't think it's wise for clients to perform operations without setting a timeout, or they can easily block the application for an arbitrary amount of time. |
On Tue, Jan 29, 2019 at 12:08 PM Duarte Nunes ***@***.***> wrote:
I don't understand why we should be interested in optimizing allow
filtering queries, or queries that are performed without paging.
We can certainly wait until a user with a weird use case comes along and
complains.
I think aggregation is a different topic, and something we should support
efficiently.
Both aggregate queries and filtered queries have the same issue with
timeouts, so they could share the solution.
I also don't think it's wise for clients to perform operations without
setting a timeout, or they can easily block the application for an
arbitrary amount of time.
If there is an upper bound which stems from the business logic, then it
should be imposed by the application, but I think that's for the
application to decide, not the driver. When I run a filtering query via
cqlsh, I don't want it to timeout, unless I'm no longer interested in the
results. ^C should cancel the query.
Detecting failures or deadlocks within the system is a different matter.
… |
:D But on the serious side, I don't think users expect
A filtering query is more or less sequential, and the result set is bounded by the page size. An aggregation is fully concurrent, so there are other vector for optimization (i.e., map-reduce). A potential solution that we discussed would be to send keep-alives as empty pages, but then we'd see the same behavior.
Agreed, my point was simply that we can't recommend users to not use timeouts, because I don't think that's the general case. If they are able, and aware of the risks, then querying without a timeout is legitimate. |
We tested this on Cassandra and it does not return an empty page - so although the protocol supports it - its clear we do not behave in the same manner. One solution that tzach suggested is to make the drivers "smarter" and fetch the next page if the returned page was empty. |
On Sun, Feb 3, 2019 at 12:57 Shlomi Livne ***@***.***> wrote:
One solution that tzach suggested is to make the drivers "smarter" and
fetch the next page if the returned page was empty.
The drivers naturally do this, it's just cqlsh that doesn't. See my first
comment here - I already opened an issue in scylla-tools-java.
… |
scylla --version
3.0.1-0.20190120.3c4f8cf6e
testing the filtering feature in the
cqlsh
- it returns one result but doesn't "page" (or continue , what ever the term is) to the subsequent results - it just shows the"--MORE---"
foot noteThe text was updated successfully, but these errors were encountered: