Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce support for tablets #137

Merged
merged 4 commits into from
Jan 16, 2024

Conversation

sylwiaszunejko
Copy link
Collaborator

@sylwiaszunejko sylwiaszunejko commented Jul 20, 2023

This PR introduces changes to the driver that are necessary for shard-awareness and token-awareness to work effectively with the tablets.

Now if driver send the request to the wrong node/shard it will get the correct tablet information from Scylla in custom_payload. It uses this information to obtain target replicas and shard numbers for tables managed by tablet replication.

I also added parsing TABLETS_ROUTING_V1 extension. In order for Scylla to send the tablet info, the driver must tell the database during connection handshake that it is able to interpret it. It was implemented similarly to the already-present LWT optimization extension negotiation.

The metadata for tablets are provided to the user.

To test this change I created both integration and unit tests.

Fixes: #155

@avikivity
Copy link
Member

Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot.

@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 4 times, most recently from d8bf274 to b217867 Compare July 24, 2023 12:44
@sylwiaszunejko
Copy link
Collaborator Author

Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot.

Added an experimental setting and corrected .github/workflows/main.yml to make Checks passed.

@avikivity
Copy link
Member

Unfortunately we'll need to add such a flag to scylla-bench and plumb it down, but that's the price of compatibility.

@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 6 times, most recently from 4fa94d0 to 9bb1f52 Compare July 25, 2023 07:58
policies.go Outdated Show resolved Hide resolved
cluster.go Outdated Show resolved Hide resolved
host_source.go Outdated Show resolved Hide resolved
host_source.go Outdated
func findTabletForToken(tablets []*TabletInfo, token token, l int) *TabletInfo {
r := l + tablets[l].TabletCount() - 1
for l < r {
m := (l + r) / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this l+r overflow is l, r are big?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could happen, I changed it a bit to prevent that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't m := l + (r-l)/2 still incorrect? For:

l, r - signed 64-bit ints
r = 2^63 - 5
l = -(2^63-5)
r - l = 2^63 - 5 - (-(2^63-5))
r - l = 2^63 - 5 + 2^63 - 5
r - l = 2^64 - 10 > range of 64-bit signed int

(some other context: https://stackoverflow.com/questions/19106350/explanation-of-the-safe-average-of-two-numbers)

It's interesting that the tests didn't catch the problem, so either they are not sufficient enough to catch the bug or this code is actually correct for some reason.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it is incorrect it only solves the problem for two positive or two negative variables, I think that the tests are not sufficient enough to catch this bug

host_source.go Outdated Show resolved Hide resolved
host_source.go Outdated Show resolved Hide resolved
tablet_test.go Outdated Show resolved Hide resolved
@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 2 times, most recently from 34b5c75 to 89c9190 Compare July 27, 2023 10:46
policies.go Outdated Show resolved Hide resolved
scylla.go Outdated Show resolved Hide resolved
}
}

assertEqual(t, "queriedHosts", 1, queriedHosts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a (temporary?) print here of all hostAddresses (before strings.Contains filter) so you can debug the failing CI.

@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 2 times, most recently from a575552 to ceb8294 Compare July 28, 2023 10:49
@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 3 times, most recently from cd4ff97 to 898567d Compare July 31, 2023 08:39
@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 2 times, most recently from 49062b9 to d2f016f Compare August 16, 2023 08:32
@sylwiaszunejko sylwiaszunejko force-pushed the poll_system_tablets branch 2 times, most recently from 641af7f to 8339f5a Compare December 3, 2023 19:17
@bhalevy
Copy link
Member

bhalevy commented Dec 22, 2023

I opened #155 as an umbrella issue

@bhalevy
Copy link
Member

bhalevy commented Dec 28, 2023

@sylwiaszunejko when do you plan to get back to this PR?

@sylwiaszunejko
Copy link
Collaborator Author

@sylwiaszunejko when do you plan to get back to this PR?

@bhalevy It is ready for review, but @avelanarius is on PTO, and it has to wait until he is back.

@sylwiaszunejko
Copy link
Collaborator Author

sylwiaszunejko commented Jan 4, 2024

@avelanarius I adapted the code to recent changes in the core. Now all tablet info are sent under one key in custom_payload and driver has to inform Scylla that it is able to interpret tablet info during connection handshake.

@avikivity
Copy link
Member

Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot.

Added an experimental setting and corrected .github/workflows/main.yml to make Checks passed.

With the negotiation thing, this is no longer needed.

Also, the cover letter is outdated.

@sylwiaszunejko
Copy link
Collaborator Author

Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot.

Added an experimental setting and corrected .github/workflows/main.yml to make Checks passed.

With the negotiation thing, this is no longer needed.

Also, the cover letter is outdated.

Do I understand correctly that I should delete experimental setting and replace it with checking if ScyllaDB supports TABLETS_ROUTING_V1?

@avikivity
Copy link
Member

Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot.

Added an experimental setting and corrected .github/workflows/main.yml to make Checks passed.

With the negotiation thing, this is no longer needed.
Also, the cover letter is outdated.

Do I understand correctly that I should delete experimental setting and replace it with checking if ScyllaDB supports TABLETS_ROUTING_V1?

Yes; you must check for it anyway, otherwise the server won't send routing info.

In order for Scylla to send the tablet info, the driver must tell
the database during connection handshake that it is able to
interpret it. It was implemented similarly to the
already-present LWT optimisation extension negotiation.
Add mechanism to parse system.tablets periodically. In TokenAwareHostPolicy
check if keyspace uses tablets if so try to use them to find replicas.

Make shard awareness work when using tablets.
tablet_integration_test.go: Integration tests checking if
TokenAwareHostPolicy and shard awareness works correctly when
using tablets, and if adding new table changes tablets table.

tablet_test.go: Unit tests checking if searching in tablets list
works correctly.
@sylwiaszunejko
Copy link
Collaborator Author

experimental setting removed

@avelanarius avelanarius merged commit f614a51 into scylladb:master Jan 16, 2024
1 check passed
idx := p.shardOf(mmt)
idx := -1

p.conns[0].mu.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sylwiaszunejko This code seems to assume that p.conns[0] is never nil. I don't think this is always true. AFAIK p.conns is indexed by the shard of the connection. It may temporarily happen that there is no connection to shard 0 for some time (e.g. the first connection to the node landed on a different shard than 0).

I think this might be the cause of scylladb/scylla-bench#134. You can try reproducing it by writing with a high concurrency and shutting down a node or starting one. The node should have more than one shard set up (the more, the easier to reproduce).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right about the root cause of this issue, in this part of the code I needed to take any connection just to check if tabletsRoutingV1 is true, but taking p.conns[0] was definitely not the right approach to do that. I will try to confirm that it is true and fix it. Thank you for your feedback

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piodul I created PR to fix this comment: #158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Driver extensions for tablet awareness
6 participants