Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.x] Schema queries with configurable server-side timeouts #312

Merged

Conversation

Bouncheck
Copy link
Collaborator

@Bouncheck Bouncheck commented Jul 1, 2024

Adds USING TIMEOUT clause to schema queries when applicable.
Uses the same timeout as driver-side timeouts defined by METADATA_SCHEMA_REQUEST_TIMEOUT ("advanced.metadata.schema.request-timeout").
Fixes #298

@mykaul
Copy link

mykaul commented Jul 1, 2024

I don't fully understand the whole Scylla version issue here. 5.0/2022.1 are ancient version. If someone is using such an old version, what happens when we try to fetch the schema with timeout?

@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Jul 1, 2024

I don't fully understand the whole Scylla version issue here. 5.0/2022.1 are ancient version. If someone is using such an old version, what happens when we try to fetch the schema with timeout?

I couldn't figure out a clean way to get Scylla version without querying system.versions which was introduced in those versions. In this change if we cannot determine Scylla version we fall back to relying on reported Cassandra version. If the Scylla version is too old (or undetermined) for using timeout we just don't use it.

I probably should've marked this as draft for now since I'm still cleaning this up. I may switch to either checking for shard awareness (as a proxy check for "is it scylla cluster") or running trial query with using timeout instead of checking exact Scylla version

@Bouncheck Bouncheck marked this pull request as draft July 1, 2024 09:20
@mykaul
Copy link

mykaul commented Jul 1, 2024

I don't fully understand the whole Scylla version issue here. 5.0/2022.1 are ancient version. If someone is using such an old version, what happens when we try to fetch the schema with timeout?

I couldn't figure out a clean way to get Scylla version without querying system.versions which was introduced in those versions. In this change if we cannot determine Scylla version we fall back to relying on reported Cassandra version. If the Scylla version is too old (or undetermined) for using timeout we just don't use it.

What do you mean too old? What happen when you do try to use timeout, and the scylla version is too old? If the timeout is ignored - fine, go with it. Doesn't matter. If it crashes, I'm more worried. If it errs the CQL query - then I think you should fallback to not using timeout.

I probably should've marked this as draft for now since I'm still cleaning this up. I may switch to either checking for shard awareness (as a proxy check for "is it scylla cluster") or running trial query with using timeout instead of checking exact Scylla version

Why? What would happen when you use it against a Cassandra cluster? Same as the above, IMHO (just as we try initially with high DSE versions that are not supported against Scylla!)

@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Jul 1, 2024

Cassandra does not have USING TIMEOUT. Select queries do not allow USING altogether. Both will result in SyntaxException. I have to check how older versions of Scylla react, I've just picked the oldest supporting according to commits (4.4 was first with using timeout cql extension, 5.0 was for system.versions table containing scylla version).

@mykaul
Copy link

mykaul commented Jul 1, 2024

Cassandra does not have USING TIMEOUT. Select queries do not allow USING altogether. Both will result in SyntaxException. I have to check how older versions of Scylla react, I've just picked the oldest supporting according to commits (4.4 was first with using timeout cql extension, 5.0 was for system.versions table containing scylla version).

I'd catch the exception and re-run the query without 'USING TIMEOUT' . Probably a more optimistic approach (happy path) then the whole additional query we have. Of course, we may use the Scylla version or whatnot elsewhere, later, etc, so it might have value by itself.

@Bouncheck Bouncheck force-pushed the scylla-4.x-schema-queries-timeout branch from 4221556 to 492557b Compare July 1, 2024 14:19
@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Jul 1, 2024

I've switched to running small trial select with USING TIMEOUT clause. If it completes without exceptions then the following schema queries will use this clause with the same configurable timeout as it is for driver side timeout. I've also removed unnecessary new additions.

Commit with querying for Scylla version remains as it was. While it is no longer necessary for this change I also believe it may be useful later and its impact should be minimal.

Waiting to see CI results.

@Bouncheck Bouncheck force-pushed the scylla-4.x-schema-queries-timeout branch from 492557b to c984c31 Compare July 1, 2024 15:07
@Bouncheck Bouncheck force-pushed the scylla-4.x-schema-queries-timeout branch from c984c31 to 3d13226 Compare July 1, 2024 15:57
@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Jul 1, 2024

I've removed commit with getScyllaVersion() change for now. There are more test adjustments needed for it, besides missed license headers, etc. and it's blocking the main purpose of this PR. I'll save that one for another time.

@Bouncheck Bouncheck changed the title Scylla 4.x schema queries timeout [4.x] Schema queries with configurable server-side timeouts Jul 1, 2024
@Bouncheck
Copy link
Collaborator Author

I don't understand what's happening with gh actions. Unit tests fail due to cancellation but I have no idea what is triggering it.

@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Jul 1, 2024

Oh, it's just timeout. I guess it became too small at some point.
edit: or not, since everything seems to hang on a test related to the change

@Bouncheck Bouncheck marked this pull request as ready for review July 1, 2024 16:19
@Bouncheck Bouncheck marked this pull request as draft July 1, 2024 18:40
@roydahan roydahan requested a review from wprzytula July 2, 2024 08:57
@Bouncheck Bouncheck force-pushed the scylla-4.x-schema-queries-timeout branch from 3d13226 to be26134 Compare July 2, 2024 09:04
@Bouncheck
Copy link
Collaborator Author

Unit tests should pass now

@Bouncheck Bouncheck force-pushed the scylla-4.x-schema-queries-timeout branch 5 times, most recently from 1316cce to f5234ef Compare July 2, 2024 14:23
@Bouncheck
Copy link
Collaborator Author

I couldn't understand why my test query was causing integration tests to take too long, so instead I look at sharding info of the control connection channel. If it's non-null I am assuming I'm dealing with Scylla and will use the timeouts. I don't check the Scylla version.
I see that some integration tests failed already but this time it's the flakiness of other stuff.

@Bouncheck Bouncheck marked this pull request as ready for review July 2, 2024 14:49
@mykaul
Copy link

mykaul commented Jul 2, 2024

This looks much better - please resolve all comments.

Copy link

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message says USING TIMESTAMP instead of USING TIMEOUT
Apart from that looks good to me

The driver will now use existing configurable option
`METADATA_SCHEMA_REQUEST_TIMEOUT` in a `USING TIMEOUT` clause
with each schema query (if applicable).
This will make it a server side timeout in addition to driver side.
The driver will apply the `USING` clause if the sharding info of control
connection channel is not null. This is used as a proxy check to differentiate
between Scylla and Cassandra.
The behaviour should remain unchanged for Cassandra clusters.
USING TIMEOUT is a ScyllaDB CQL extension and it is available in
versions 4.4/2022.1.0 or newer.
@Bouncheck
Copy link
Collaborator Author

I've fixed the typo @Lorak-mmk

Copy link

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One more thing: shouldn't we do the same with topology queries?

@mykaul
Copy link

mykaul commented Jul 4, 2024

LGTM. One more thing: shouldn't we do the same with topology queries?

Is topology response large? I doubt it, but we can certainly test it with a (mockup) large setup.

@Bouncheck Bouncheck merged commit 5a6a195 into scylladb:scylla-4.x Jul 4, 2024
10 of 11 checks passed
@Lorak-mmk
Copy link

LGTM. One more thing: shouldn't we do the same with topology queries?

Is topology response large? I doubt it, but we can certainly test it with a (mockup) large setup.

There are as many rows as nodes in the cluster. What is the biggest cluster that we know of?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants