Wrong schema version in system.peers when using raft #15078
Comments
Strange. The gossiper has different schema versions as well.
@kbr-scylla how do we test that this stays in sync in the future? @gleb-cloudius can we make sure we don't update system.local through the observer on the raft command apply path?
We do not update system.local through the observer. We update gossiper state through it.
But how does the gossiper manage to get the wrong state in the first place? The node is just starting, after all, so it has to be taking it from the system table.
The node was starting. It was receiving a schema update through raft but not updating its local version in the gossiper.
I don't know. We could compare versions from system.peers in our system tests, similarly to how we check group 0 and token ring consistency. But with the bug present, the tests would be flaky: sometimes they would pass because the problem didn't always reproduce (?). But maybe a flaky test is better than no test...
This bug was 100% reproducible. But since the peers table is updated as a node learns about other nodes' versions over the network, a test may see different versions for some time.
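A rough sketch of what such a test check could look like, in Python. It assumes the test has already read schema_version from system.local on every node and the (peer, schema_version) pairs from system.peers on every node; the function names and the input shapes are hypothetical, not part of any existing test framework. The polling helper is there because, as noted above, system.peers may lag for a while before it converges.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical input shapes (illustrative only, not from the thread):
#   local_versions[node]       -> schema_version the node reports in system.local
#   peer_views[observer][peer] -> schema_version the observer stores for that peer
#                                 in system.peers
LocalVersions = Dict[str, str]
PeerViews = Dict[str, Dict[str, str]]
Mismatch = Tuple[str, str, str, str]


def find_schema_mismatches(local_versions: LocalVersions,
                           peer_views: PeerViews) -> List[Mismatch]:
    """Return (observer, peer, seen_version, actual_version) for each disagreement."""
    mismatches = []
    for observer, peers in sorted(peer_views.items()):
        for peer, seen in sorted(peers.items()):
            actual = local_versions.get(peer)
            # Skip peers whose system.local we could not read.
            if actual is not None and seen != actual:
                mismatches.append((observer, peer, seen, actual))
    return mismatches


def wait_for_schema_agreement(fetch: Callable[[], Tuple[LocalVersions, PeerViews]],
                              timeout: float = 30.0,
                              interval: float = 0.5) -> List[Mismatch]:
    """Poll fetch() until no mismatches remain or the timeout expires.

    Returns the last list of mismatches (empty on success).
    """
    deadline = time.monotonic() + timeout
    while True:
        local_versions, peer_views = fetch()
        mismatches = find_schema_mismatches(local_versions, peer_views)
        if not mismatches or time.monotonic() >= deadline:
            return mismatches
        time.sleep(interval)
```

A test would pass a fetch callable that queries each node and fail if the returned list is still non-empty after the timeout. With a bug like this one, the mismatch never goes away on its own, so the timeout separates "still converging" from "stuck".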
…p0 and starting gossiper The schema version is updated by group0, so if group0 starts before the schema version observer is registered, some updates may be missed. Since the observer is used to update the node's gossiper state, the gossiper may contain the wrong schema version. Fix by registering the observer before starting group0, and even before starting the gossiper, to avoid a theoretical case where something pulls the schema after gossiping starts and before the observer is registered. Fixes: scylladb#15078 Message-Id: <ZOYZWhEh6Zyb+FaN@scylladb.com>
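The ordering problem described in the commit message can be illustrated with a toy model in Python. This is only a sketch of the general pattern (an observer registered after an update misses that update); the class and function names are hypothetical and do not correspond to Scylla's actual C++ code.

```python
from typing import Callable, Dict, List, Tuple


class ObservableValue:
    """Toy observable: only observers registered before an update are notified."""

    def __init__(self, value: str) -> None:
        self._value = value
        self._observers: List[Callable[[str], None]] = []

    def observe(self, callback: Callable[[str], None]) -> None:
        self._observers.append(callback)

    def set(self, value: str) -> None:
        self._value = value
        for callback in self._observers:
            callback(value)

    def get(self) -> str:
        return self._value


def start_node(register_observer_first: bool) -> Tuple[str, str]:
    """Return (actual schema version, version published to gossip) after startup."""
    schema_version = ObservableValue("v0")
    # Gossip state is seeded from the value known at startup.
    gossip_state: Dict[str, str] = {"schema_version": schema_version.get()}

    def publish(version: str) -> None:
        gossip_state["schema_version"] = version

    if register_observer_first:
        schema_version.observe(publish)

    # Stands in for group 0 applying a schema change during startup.
    schema_version.set("v1")

    if not register_observer_first:
        # Too late: the update above was never forwarded to gossip.
        schema_version.observe(publish)

    return schema_version.get(), gossip_state["schema_version"]


print(start_node(register_observer_first=False))  # ('v1', 'v0'): gossip is stale
print(start_node(register_observer_first=True))   # ('v1', 'v1'): gossip is in sync
```

In this model the stale value is corrected by the next update after registration, which is consistent with the report that a later schema change fixes system.peers.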
@gleb-cloudius please evaluate for backport
All versions with non-experimental schema over raft should get it.
…p0 and starting gossiper The schema version is updated by group0, so if group0 starts before the schema version observer is registered, some updates may be missed. Since the observer is used to update the node's gossiper state, the gossiper may contain the wrong schema version. Fix by registering the observer before starting group0, and even before starting the gossiper, to avoid a theoretical case where something pulls the schema after gossiping starts and before the observer is registered. Fixes: #15078 Message-Id: <ZOYZWhEh6Zyb+FaN@scylladb.com> (cherry picked from commit d1654cc)
Backported to 5.2
I started a three-node cluster with --experimental-features consistent-topology-changes --experimental-features tablets. I used the master version of Scylla (34c3688). Without any previous steps, I connected to the empty cluster using cqlsh 127.0.0.2 and executed the following commands:
The output looks like this:
Likewise, I connected to 127.0.0.3 and 127.0.0.4, ran the same commands, and got the following outputs:
The schema versions in system.peers do not align with the schema versions in system.local. This causes problems, for example when running any command using gocql.
I noticed that performing an operation that changes the schema (creating a keyspace or table) fixes system.peers.
I tried to reproduce this on a cluster whose nodes were started without --experimental-features consistent-topology-changes, and the bug does not occur.
I also tested it with scylladb/scylla-nightly: the last properly working version is 5.4.0-dev-0.20230802.0239ba45272f-x86_64 (5.4.0-dev-0.20230803.39ca07c49b25-x86_64 has this bug).
I performed a kind of bisect to find the commit that introduced the problem: it is 7c30954, and the last commit without the bug is 3c1ca12.
I attach logs from all three nodes.
node3_logs.txt
node2_logs.txt
node1_logs.txt