migration_manager: schema version correctness depends on order of feature enabling #16004
Comments
Interesting to note that the two nodes have the same schema versions, yet they have different versions for
Another variant of this can be seen at: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4504/testReport/junit/cleanup_test/TestCleanup/Tests___Sanity_Tests___test_cluster_cleanup/.
Another can be seen here: https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/5740/artifact/logs-full.release.003/1699011052696_cql_tracing_test.py%3A%3ATestCqlTracing%3A%3Atest_tracing_shutdown/ Here the mismatch happens on a system table (system_traces.events).
@kbr-scylla is this related to the recent consistent schema version changes? Maybe there is another corner case where it fails and actually generates different versions on different nodes.
@denesb please upload test and relevant node logs so they don't get lost after jenkins gets garbage-collected
these changes got reverted
Are these CI runs on the PR where you add the new cluster feature? Schema versions can mismatch because of different subsets of enabled features on the two nodes, so it might be related
Yes, all these are from #15890, which for some reason reproduces this issue very consistently. It must be related to adding the new feature, but I don't yet see how.
What I don't understand is: from the POV of a new cluster, all features are new. So how does adding a new feature matter here? These are not upgrade tests.
What if... the string that lists all of the features became too long, and gets truncated somewhere? 😱
I even considered that the problem is that I added the new feature member into the middle of the feature member list. But I see no sign of feature member declaration order mattering at all in the code. Maybe I missed something? Another thing I considered is that maybe with the new feature we go over some threshold on the number of features. These are silly theories but I'm getting desperate.
Logs for the simplest example -- the test mentioned in the opening message:
Or what if there are so many features now that enabling them takes too long, and now we're hitting some race that we weren't hitting before?
cc @tgrabiec maybe you have some idea
Just to be clear: all the test runs mentioned here so far contain the whole fix from #15890, not just the line that adds the feature? Do those runs have topology-on-raft enabled or not?
The dtest ones don't
Ahh, good point. @denesb could it be that e.g. we read empty pages when reading schema tables, and there's some kind of bug there that makes us skip parts of mutations, and that leads to schema mismatch? How can you be sure that your change is not the cause here?
Note that all cluster features start as disabled for a joining node. The node verifies that it supports all of the features enabled by the cluster and only then locally marks them as enabled.
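The handshake described here can be sketched in a few lines. This is a minimal illustrative model; the `negotiate` function and the feature names are made up for the sketch, not Scylla's actual API:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch of the join-time handshake: a joining node starts
// with every cluster feature disabled, verifies it supports everything
// the cluster has already enabled, and only then marks that subset as
// enabled locally. All names here are illustrative.
std::set<std::string> negotiate(const std::set<std::string>& locally_supported,
                                const std::set<std::string>& cluster_enabled) {
    for (const auto& f : cluster_enabled) {
        // A node that lacks a cluster-enabled feature cannot join at all.
        assert(locally_supported.count(f) > 0);
    }
    // Only the features the cluster has enabled become enabled locally;
    // everything else the node supports stays disabled for now.
    return cluster_enabled;
}
```

The point relevant to this issue is the second half: features the node supports but the cluster has not yet enabled stay disabled, and get turned on later, one by one, as the cluster enables them.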
@denesb if we get this failure from ONLY adding the feature flag, then it's plausible there's some underlying problem with features, and it would make sense to spend time digging into the features implementation, or schema, or whatever. BUT if, in order to trigger this, you actually have to use the feature to protect your new code -- then I think we should first place most suspicions on your new code.
Ah, perhaps this is racing with queries being executed, and some of them set this "allow empty page" flag, some of them don't? Or even, there could be a single query (a scan perhaps?) that is being executed, and part of it is executed without the flag, but the next part with? @denesb (edit: and the query returns multiple pages)
@kbr-scylla the tests failed even when I commented out all the code, except for the feature addition. Also, the feature guards a bit used in read-repair, and all schema table reads are local, no?
Ok, here: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4503/ we have a run with commit 0375aa0, that has only the feature added, without the
I took a short look. I can confirm that this issue reproduces relatively often when I run it locally (~50% of the time). I noticed that, curiously, changing the name of the feature makes the problem go away. I suspect this has something to do with the order in which features are enabled. When a node wants to enable features on join, after checking them with the cluster, it goes through the registered features (kept in
For some features (including
As a side note, maybe we could randomize the order of enabling features, at least in debug mode? I don't think the code should rely on any particular order.
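The randomization idea could look something like the following sketch. It assumes registered features live in an ordered container; `enable_order` is a hypothetical name, not an existing function:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>
#include <string>
#include <vector>

// Sketch of the suggestion above: in debug builds, shuffle the order in
// which registered features are enabled, so that any hidden dependency
// on enabling order surfaces as a test failure instead of surviving by
// accident. Returns the order that would be used.
std::vector<std::string> enable_order(std::vector<std::string> registered) {
#ifndef NDEBUG
    // Debug builds only: randomize to flush out order dependencies.
    std::mt19937 rng{std::random_device{}()};
    std::shuffle(registered.begin(), registered.end(), rng);
#endif
    return registered;  // features would be enabled one by one in this order
}
```

Since correctness must not depend on the order, the observable end state (which features end up enabled) should be identical for every permutation.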
I managed to reproduce locally too. Comparing the logs for a failed and a successful run: in both cases, the table on the schema node is created while features are still being negotiated. In both cases, node2 ends up with a different schema version initially. As the features are enabled,
Note that
I'm testing a fix...
…to_expiry Currently, when said feature is enabled, we recalculate the schema digest. But this feature also influences how table versions are calculated, so it has to trigger a recalculation of all table versions, so that we can guarantee correct schema versions. Before, this used to happen by happy accident. Another feature -- table_digest_insensitive_to_expiry -- used to take care of this, by triggering a table version recalculation. However this feature only takes effect if digest_insensitive_to_expiry is also enabled. This used to be the case incidentally: by the time the reload triggered by table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was already enabled. But this was not guaranteed whatsoever and, as we've recently seen, any change to the feature list which changes the order in which features are enabled can cause this intricate balance to break. This patch makes digest_insensitive_to_expiry also kick off a schema reload, to eliminate our dependence on the (unguaranteed) feature order, and to guarantee that table schemas have a correct version after all features are enabled. Fixes: scylladb#16004
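The ordering dependency this commit message describes can be modeled in a few lines. This is a toy model: only the two feature names come from the issue, everything else (the `node` struct, the version strings) is made up for illustration:

```cpp
#include <cassert>
#include <string>

// Toy model of the bug: reload_schema() recomputes table versions using
// whatever digest algorithm is in effect at that moment, but the new
// algorithm only takes effect once DIGEST_INSENSITIVE_TO_EXPIRY is
// enabled, while the reload is only triggered by
// TABLE_DIGEST_INSENSITIVE_TO_EXPIRY.
struct node {
    bool patched;                     // true = reload on both features (the fix)
    bool digest_insensitive = false;  // DIGEST_INSENSITIVE_TO_EXPIRY enabled?
    std::string table_version = "stale";

    void reload_schema() {
        // Recalculate versions with whatever algorithm is in effect NOW.
        table_version = digest_insensitive ? "correct" : "stale";
    }

    void enable(const std::string& feature) {
        if (feature == "DIGEST_INSENSITIVE_TO_EXPIRY") {
            digest_insensitive = true;
            if (patched) {
                reload_schema();      // the fix: this feature reloads too
            }
        } else if (feature == "TABLE_DIGEST_INSENSITIVE_TO_EXPIRY") {
            reload_schema();
        }
    }
};
```

Without the patch, enabling TABLE_DIGEST_INSENSITIVE_TO_EXPIRY before DIGEST_INSENSITIVE_TO_EXPIRY leaves the table version stale; with it, the final version no longer depends on which feature happens to be enabled first.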
Fix here: #16013
With the above fix
Good idea, I created an issue for that: #16014 |
…to_expiry Currently, when said feature is enabled, we recalculate the schema digest. But this feature also influences how table versions are calculated, so it has to trigger a recalculation of all table versions, so that we can guarantee correct versions. Before, this used to happen by happy accident. Another feature -- table_digest_insensitive_to_expiry -- used to take care of this, by triggering a table version recalculation. However this feature only takes effect if digest_insensitive_to_expiry is also enabled. This used to be the case incidentally: by the time the reload triggered by table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was already enabled. But this was not guaranteed whatsoever and, as we've recently seen, any change to the feature list which changes the order in which features are enabled can cause this intricate balance to break. This patch makes digest_insensitive_to_expiry also kick off a schema reload, to eliminate our dependence on the (unguaranteed) feature order, and to guarantee that table schemas have a correct version after all features are enabled. In fact, all schema feature notification handlers now kick off a full schema reload, to ensure bugs like this don't creep in in the future. Fixes: scylladb#16004
Can this be backported promptly, to unblock 5.4?
…to_expiry Fixes: #16004 Closes #16013 (cherry picked from commit 2238144)
Backport queued to 5.4.
Backported to 5.2.
…to_expiry Fixes: #16004 Closes #16013 (cherry picked from commit 2238144) (cherry picked from commit e31f222)
Backported to 5.1.
Summary

`migration_manager::init_messaging_service()` installs listeners on interesting schema-related features. There are two kinds of action taken on enabling of different schema features: `TABLE_DIGEST_INSENSITIVE_TO_EXPIRY` triggers a `reload_schema()` call, the rest trigger `update_schema()`. However `TABLE_DIGEST_INSENSITIVE_TO_EXPIRY` doesn't take effect on its own: it needs `DIGEST_INSENSITIVE_TO_EXPIRY` to also be enabled for it to take effect. This pretty much always happens with the current order of feature processing. BUT: `reload_schema()` runs in the background and doesn't wait for the enabling of `DIGEST_INSENSITIVE_TO_EXPIRY`.

The end result: any change in the order of features, or in timing, will result in `reload_schema()` not taking effect and tables having wrong schema versions in memory. This can be fixed with a reboot.

Details (also see discussion)
Seen in https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/4504/testReport/junit/bootstrap_test/TestBootstrap/Tests___Sanity_Tests___test_off_strategy_during_bootstrap
The test fails with:
Checking the logs of the nodes, `fe31106e-176c-3aa2-9212-0878916a48f0` is the schema version that the schema gets on node1, where the table is originally created.

Later, a new node, node2, is added. However, on node2, the schema gets a different schema version:
After this, node2 proceeds to successfully bootstrap. However, when `ks.cf` is later queried, the query fails because on node2 said table has a different version.