migration_manager: schema version correctness depends on order of feature enabling #16004

Closed
denesb opened this issue Nov 8, 2023 · 37 comments

Labels: area/schema changes, status/release blocker (Preventing a release from being promoted), symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework)

denesb commented Nov 8, 2023

Summary

migration_manager::init_messaging_service() installs listeners on interesting schema-related features. Two kinds of action are taken when the different schema features are enabled:

  • update_schema() - recalculate schema digest
  • reload_schema() - reload schema from disk, recalculate version of individual tables

TABLE_DIGEST_INSENSITIVE_TO_EXPIRY triggers a reload_schema() call; the rest trigger update_schema(). However, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY doesn't take effect on its own: it requires DIGEST_INSENSITIVE_TO_EXPIRY to also be enabled. With the current order of feature processing this pretty much always happens. BUT:

  • The feature container is an std::unordered_map, so adding new features or renaming existing ones can change the iteration order in unexpected ways.
  • reload_schema() runs in the background and doesn't wait for DIGEST_INSENSITIVE_TO_EXPIRY to be enabled.

The end result: any change in feature order or timing can result in reload_schema() not taking effect, leaving tables with wrong schema versions in memory. This can be fixed with a reboot.
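
The iteration-order hazard is easy to demonstrate outside of Scylla. The following self-contained C++ snippet (illustrative only, not Scylla code) shows how inserting one more key into an std::unordered_map can reshuffle the iteration order of the pre-existing keys, which is exactly how adding one new feature can reorder the enable callbacks of all the others:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> features{
        {"DIGEST_INSENSITIVE_TO_EXPIRY", 1},
        {"TABLE_DIGEST_INSENSITIVE_TO_EXPIRY", 2},
        {"UUID_SSTABLE_IDENTIFIERS", 3},
        {"LARGE_COLLECTION_DETECTION", 4},
    };
    for (const auto& [name, id] : features) {
        std::cout << name << '\n';
    }
    std::cout << "-- after adding one more feature --\n";
    // Inserting a new element may trigger a rehash, after which the
    // relative order of the pre-existing elements can change too.
    features.emplace("EMPTY_REPLICA_MUTATION_PAGES", 5);
    for (const auto& [name, id] : features) {
        std::cout << name << '\n';
    }
}

Whether the order actually changes depends on the hash values and the implementation's bucket layout, but nothing guarantees stability, which is the point.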

Details (also see discussion)

Seen in https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/4504/testReport/junit/bootstrap_test/TestBootstrap/Tests___Sanity_Tests___test_off_strategy_during_bootstrap

The test fails with:

cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.0.56.1:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Schema version fe31106e-176c-3aa2-9212-0878916a48f0 not found">, <Host: 127.0.56.2:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Schema version b7028222-6072-370f-a4eb-7d6637c0bdaf not found">})

Checking the logs of the nodes, fe31106e-176c-3aa2-9212-0878916a48f0 is the schema version the table gets on node1, where it is originally created.
Later, a new node, node2, is added. However, on node2, the table gets a different schema version:

INFO  2023-11-03 11:32:50,948 [shard 0:stmt] schema_tables - Creating ks.cf id=b5f542c0-7a3c-11ee-8f31-36ede729ddfa version=b7028222-6072-370f-a4eb-7d6637c0bdaf
INFO  2023-11-03 11:32:50,948 [shard 0:stmt] schema_tables - Creating system_distributed.service_levels id=b8c556bd-212d-37ad-9484-690c73a5994b version=ae051e00-84fa-393e-994d-8c75166a899c
INFO  2023-11-03 11:32:50,949 [shard 0:stmt] schema_tables - Creating system_traces.node_slow_log id=bfcc4e62-5b63-3aa1-a1c3-6f5e47f3325c version=3a93a9bd-67aa-3f1a-a679-aa4723b3df0b
INFO  2023-11-03 11:32:50,949 [shard 0:stmt] schema_tables - Creating system_traces.sessions id=c5e99f16-8677-3914-b17e-960613512345 version=96836fad-fdb0-3a72-bd10-4f972b32636a
INFO  2023-11-03 11:32:50,949 [shard 0:stmt] schema_tables - Creating system_traces.node_slow_log_time_idx id=f9706768-aa1e-3d87-9e5c-51a3927c2870 version=8622004e-038b-3d45-8c88-437e6331fe5b
INFO  2023-11-03 11:32:50,950 [shard 0:stmt] schema_tables - Creating system_distributed.cdc_generation_timestamps id=fdf455c4-cfec-3e00-9719-d7a45436c89d version=ed3059fb-5129-30be-ae3a-329e3cb6fc0d
INFO  2023-11-03 11:32:50,952 [shard 0:comp] compaction - [Compact system_schema.columns b8a52080-7a3c-11ee-b2b7-34e75fc33ec1] Compacting [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-3pqqjy_4/test/node2/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-4-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-3pqqjy_4/test/node2/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-2-big-Data.db:level=0:origin=memtable]
INFO  2023-11-03 11:32:50,961 [shard 0:stmt] schema_tables - Schema version changed to 5767fb1b-bfd8-3ced-8173-1b93726fce0d
INFO  2023-11-03 11:32:50,962 [shard 0:main] raft_group_registry - marking Raft server e078fe75-6334-4d70-9823-ea48bc060caf as alive for raft groups
INFO  2023-11-03 11:32:50,963 [shard 0:strm] raft_group0 - setup_group0: waiting for peers to synchronize state...
INFO  2023-11-03 11:32:50,963 [shard 0:strm] raft_group0_upgrade - wait_for_peers_to_enter_synchronize_state: 127.0.56.1 confirmed that they finished upgrade.
INFO  2023-11-03 11:32:50,963 [shard 0:strm] raft_group0 - setup_group0: the cluster is ready to use Raft. Finishing.
INFO  2023-11-03 11:32:50,964 [shard 0:comp] compaction - [Compact system_schema.columns b8a52080-7a3c-11ee-b2b7-34e75fc33ec1] Compacted 2 sstables to [/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.dtest/dtest-3pqqjy_4/test/node2/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-6-big-Data.db:level=0]. 22kB to 19kB (~85% of original) in 9ms = 2MB/s. ~256 total partitions merged to 7.
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] migration_manager - Reloading schema
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature UUID_SSTABLE_IDENTIFIERS is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature SECONDARY_INDEXES_ON_STATIC_COLUMNS is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature LARGE_COLLECTION_DETECTION is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature COLLECTION_INDEXING is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature PER_TABLE_PARTITIONERS is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature LWT is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature NONFROZEN_UDTS is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature RANGE_SCAN_DATA_VARIANT is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature VIEW_VIRTUAL_COLUMNS is enabled
INFO  2023-11-03 11:32:50,966 [shard 0:strm] features - Feature MD_SSTABLE_FORMAT is enabled
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_distributed.cdc_streams_descriptions_v2 id=0bf73fd7-65b2-36b0-85e5-658131d5df36 version=5dbcd7ee-ee18-3a51-bf55-1c4ce40b7d79
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_traces.sessions_time_idx id=0ebf001c-c1d1-3693-9a63-c3d96ac53318 version=1c4a42d0-5d34-3d49-bcac-b7d757259aa3
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_auth.role_members id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d version=c358d59a-560e-32ce-8c56-d5d64b13ae5d
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_distributed_everywhere.cdc_generation_descriptions_v2 id=234d2227-dd63-3d37-ac5f-c013e2ea9e6e version=4da047b9-32d0-3032-85e6-c419adb5fca7
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_distributed.view_build_status id=5582b59f-8e4e-35e1-b913-3acada51eb04 version=f21c5312-4f07-3670-b1d8-1561d47d48c0
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_auth.roles id=5bc52802-de25-35ed-aeab-188eecebb090 version=f71e5f3a-5fca-321b-a9d1-e07af824a636
INFO  2023-11-03 11:32:50,971 [shard 0:strm] schema_tables - Reloading system_auth.role_attributes id=6b8c7359-a843-33f2-a1d8-5dc6a187436f version=4625931c-1e4a-37ca-b954-ae8d45da8a2a
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_traces.events id=8826e8e9-e16a-3728-8753-3bc1fc713c25 version=36007558-238e-30b0-8800-019d3bd6eec7
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading ks.cf id=b5f542c0-7a3c-11ee-8f31-36ede729ddfa version=b7028222-6072-370f-a4eb-7d6637c0bdaf
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_distributed.service_levels id=b8c556bd-212d-37ad-9484-690c73a5994b version=ae051e00-84fa-393e-994d-8c75166a899c
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_traces.node_slow_log id=bfcc4e62-5b63-3aa1-a1c3-6f5e47f3325c version=3a93a9bd-67aa-3f1a-a679-aa4723b3df0b
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_traces.sessions id=c5e99f16-8677-3914-b17e-960613512345 version=96836fad-fdb0-3a72-bd10-4f972b32636a
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_traces.node_slow_log_time_idx id=f9706768-aa1e-3d87-9e5c-51a3927c2870 version=8622004e-038b-3d45-8c88-437e6331fe5b
INFO  2023-11-03 11:32:50,972 [shard 0:strm] schema_tables - Reloading system_distributed.cdc_generation_timestamps id=fdf455c4-cfec-3e00-9719-d7a45436c89d version=ed3059fb-5129-30be-ae3a-329e3cb6fc0d
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_distributed.cdc_streams_descriptions_v2 changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_traces.sessions_time_idx changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_auth.role_members changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_distributed_everywhere.cdc_generation_descriptions_v2 changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_distributed.view_build_status changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_auth.roles changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_auth.role_attributes changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_traces.events changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for ks.cf changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_distributed.service_levels changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_traces.node_slow_log changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_traces.sessions changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_traces.node_slow_log_time_idx changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,972 [shard 0:strm] query_processor - Column definitions for system_distributed.cdc_generation_timestamps changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_distributed.cdc_streams_descriptions_v2 changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_traces.sessions_time_idx changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_auth.role_members changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_distributed_everywhere.cdc_generation_descriptions_v2 changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_distributed.view_build_status changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_auth.roles changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_auth.role_attributes changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_traces.events changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for ks.cf changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_distributed.service_levels changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_traces.node_slow_log changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_traces.sessions changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_traces.node_slow_log_time_idx changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,973 [shard 1:strm] query_processor - Column definitions for system_distributed.cdc_generation_timestamps changed, invalidating related prepared statements
INFO  2023-11-03 11:32:50,976 [shard 0:strm] schema_tables - Schema version changed to 599ea985-3978-3457-86d0-b78e4ea3208e
INFO  2023-11-03 11:32:50,979 [shard 0:strm] schema_tables - Schema version changed to 599ea985-3978-3457-86d0-b78e4ea3208e
INFO  2023-11-03 11:32:50,982 [shard 0:strm] schema_tables - Schema version changed to 599ea985-3978-3457-86d0-b78e4ea3208e

After this, node2 proceeds to bootstrap successfully; however, when ks.cf is later queried, the query fails because node2 has a different version for that table.

denesb added the area/schema changes and symptom/ci stability labels on Nov 8, 2023
denesb commented Nov 8, 2023

It is interesting to note that the two nodes have the same cluster-wide schema version, yet they have different versions for ks.cf.

denesb commented Nov 8, 2023

@kbr-scylla is this related to the recent consistent schema version changes? Maybe there is another corner-case where it fails and actually generates different versions on different nodes.

kbr-scylla commented:

@denesb please upload test and relevant node logs so they don't get lost after jenkins gets garbage-collected

> @kbr-scylla is this related to the recent consistent schema version changes?

these changes got reverted

kbr-scylla commented:

Are these CI runs on the PR where you add the new cluster feature?

Schema versions can mismatch because of a different subset of enabled features on the two nodes, so it might be related

denesb commented Nov 8, 2023

> Are these CI runs on the PR where you add the new cluster feature?
>
> Schema versions can mismatch because of a different subset of enabled features on the two nodes, so it might be related

Yes, all of these are from #15890, which for some reason reproduces this issue very consistently. It must be related to adding the new feature, but I don't yet see how.

denesb commented Nov 8, 2023

What I don't understand is: from the POV of a new cluster, all features are new. So how does adding a new feature matter here? These are not upgrade tests.

kbr-scylla commented:

What if... the string that lists all of the features became too long, and gets truncated somewhere? 😱

denesb commented Nov 8, 2023

I even considered that the problem is that I added the new feature member into the middle of the feature member list. But I see no sign in the code of the feature members' declaration order mattering at all. Maybe I missed something?

Another thing I considered is that maybe with the new feature we go over some threshold in the number of features.

These are silly theories but I'm getting desperate.

kbr-scylla commented:

@xemul @piodul does this ring any bell -- how can adding a new cluster feature (and doing nothing other than that) completely break everything?

denesb commented Nov 8, 2023

> @denesb please upload test and relevant node logs so they don't get lost after jenkins gets garbage-collected

Logs for the simplest example -- the test mentioned in the opening message:
dtest-gw1.log
node1.log
node2.log

kbr-scylla commented:

Or what if there are so many features now that enabling them takes too long, and we're now hitting some race that we weren't hitting before?

kbr-scylla commented:

cc @tgrabiec maybe you have some idea

> how can adding a new cluster feature (and doing nothing other than that) completely break everything?

piodul commented Nov 8, 2023

Just to be clear: all the test runs mentioned here so far contain the whole fix from #15890, not just the line that adds the feature?

Do those runs have topology-on-raft enabled or not?

kbr-scylla commented:

> Do those runs have topology-on-raft enabled or not?

The dtest ones don't

kbr-scylla commented:

> all the test runs mentioned here so far contain the whole fix from #15890, not just the line that adds the feature?

Ahh, good point

@denesb could it be that e.g. we read empty pages when reading schema tables, and there's some kind of bug there that makes us skip parts of mutations, and that leads to schema mismatch?

How can you be sure that your change is not the cause here?

piodul commented Nov 8, 2023

> What I don't understand is: from the POV of a new cluster, all features are new. So how does adding a new feature matter here? These are not upgrade tests.

Note that all cluster features start as disabled for a joining node. The node verifies that it supports all of the features enabled by the cluster and only then locally marks them as enabled.
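
In other words, the joining node's logic has roughly the following shape (a hypothetical C++ sketch with invented names, not the actual Scylla code; it only illustrates the mechanism described above):

#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct feature {
    bool enabled = false;
    std::function<void()> on_enable; // e.g. update_schema() or reload_schema()
};

// Called once the joining node has verified it supports every feature
// the cluster has enabled.
void enable_cluster_features(
        std::unordered_map<std::string, feature>& local_features,
        const std::unordered_set<std::string>& cluster_enabled) {
    // Iteration order of the unordered_map is unspecified -- this is the
    // ordering hazard discussed in this issue.
    for (auto& [name, f] : local_features) {
        if (cluster_enabled.contains(name) && !f.enabled) {
            f.enabled = true;
            if (f.on_enable) {
                f.on_enable(); // the listener may kick off background work
            }
        }
    }
}

int main() {
    std::unordered_map<std::string, feature> local;
    local["DIGEST_INSENSITIVE_TO_EXPIRY"] = {};
    local["TABLE_DIGEST_INSENSITIVE_TO_EXPIRY"] = {};
    enable_cluster_features(local, {"DIGEST_INSENSITIVE_TO_EXPIRY",
                                    "TABLE_DIGEST_INSENSITIVE_TO_EXPIRY"});
}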

kbr-scylla commented:

@denesb if we get this failure from ONLY adding the feature flag, then it's plausible there's some underlying problem with features, and it would make sense to spend time digging into features implementation, or schema, or whatever.

BUT if, in order to trigger this, you actually have to use the feature to protect your new code -- then I think we should first place most suspicions on your new code.

kbr-scylla commented Nov 8, 2023

> Note that all cluster features start as disabled for a joining node. The node verifies that it supports all of the features enabled by the cluster and only then locally marks them as enabled.

Ah, perhaps this is racing with queries being executed, and some of them set this "allow empty page" flag, some of them don't? Or even, there could be a single query (a scan perhaps?) that is being executed, and part of it is executed without the flag, but the next part with? @denesb

edit: and the query returns multiple pages

denesb commented Nov 8, 2023

@kbr-scylla the tests failed even when I commented out all the code, except for the feature addition. Also, the feature guards a bit used in read-repair, and all schema table reads are local, no?

kbr-scylla commented:

> @kbr-scylla the tests failed even when I commented out all the code, except for the feature addition.

Ok, here: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4503/

we have a run with commit 0375aa0 that has only the feature added, without the if, and it also contains Tests / Sanity Tests / bootstrap_test.TestBootstrap.test_off_strategy_during_bootstrap, which appears to have failed for the same reason

kostja self-assigned this on Nov 8, 2023
piodul commented Nov 8, 2023

I took a short look. I can confirm that this issue reproduces relatively often when I run it locally (~50% of the time). I noticed that, curiously, changing the name of the feature makes the problem go away.

I suspect this has something to do with the order in which features are enabled. When a node wants to enable features on join, after checking them with the cluster, it goes through the registered features (kept in an unordered_map) and enables the ones that the cluster has already enabled. It looks like adding a new feature changes the iteration order of that unordered_map (changing the new feature's name does that, too). Here is a diff of preprocessed node logs from the dtest; feats1 is without the new feature and feats2 is with it:

(dtest-3.11.5) [piodul@scyllapc scylla-dtest]$ cat testlogs/last/node2.log | cut -d' ' -f9- | grep 'Feature.*is enabled'
...
(dtest-3.11.5) [piodul@scyllapc scylla-dtest]$ diff feats1 feats2  -u -U 1000
--- feats1      2023-11-08 23:01:29.046491639 +0100
+++ feats2      2023-11-08 22:53:52.374563787 +0100
@@ -1,29 +1,30 @@
 Feature TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is enabled
 Feature UUID_SSTABLE_IDENTIFIERS is enabled
 Feature SECONDARY_INDEXES_ON_STATIC_COLUMNS is enabled
 Feature LARGE_COLLECTION_DETECTION is enabled
+Feature COLLECTION_INDEXING is enabled
 Feature PER_TABLE_PARTITIONERS is enabled
 Feature LWT is enabled
 Feature NONFROZEN_UDTS is enabled
 Feature RANGE_SCAN_DATA_VARIANT is enabled
-Feature DIGEST_INSENSITIVE_TO_EXPIRY is enabled
 Feature CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX is enabled
 Feature VIEW_VIRTUAL_COLUMNS is enabled
 Feature MD_SSTABLE_FORMAT is enabled
 Feature CDC_GENERATIONS_V2 is enabled
 Feature HINTED_HANDOFF_SEPARATE_CONNECTION is enabled
 Feature CDC is enabled
 Feature SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT is enabled
 Feature ME_SSTABLE_FORMAT is enabled
 Feature DIGEST_FOR_NULL_VALUES is enabled
 Feature PER_TABLE_CACHING is enabled
 Feature ALTERNATOR_TTL is enabled
 Feature PARALLELIZED_AGGREGATION is enabled
 Feature UDA is enabled
-Feature COLLECTION_INDEXING is enabled
 Feature EMPTY_REPLICA_PAGES is enabled
+Feature DIGEST_INSENSITIVE_TO_EXPIRY is enabled
+Feature EMPTY_REPLICA_MUTATION_PAGES is enabled
 Feature TOMBSTONE_GC_OPTIONS is enabled
 Feature AGGREGATE_STORAGE_OPTIONS is enabled
 Feature TYPED_ERRORS_IN_READ_RPC is enabled
 Feature SCHEMA_COMMITLOG is enabled
 Feature UDA_NATIVE_PARALLELIZED_AGGREGATION is enabled

For some features (including DIGEST_INSENSITIVE_TO_EXPIRY, for example), there are callbacks registered that run at the moment the feature is enabled. Perhaps some of them depended on the order, and changing that order makes them fail.

As a side note, maybe we could randomize the order of enabling features, at least in debug mode? I don't think the code should rely on any particular order.
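
A minimal sketch of that suggestion (hypothetical, invented names -- not actual Scylla code): shuffle the enable order in debug builds so tests flush out hidden order dependencies.

#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// Stub standing in for whatever fires a feature's enable callbacks.
void enable_feature(const std::string& name) {
    std::cout << "Feature " << name << " is enabled\n";
}

void enable_features_in_randomized_order(std::vector<std::string> names) {
#ifndef NDEBUG
    // Debug builds only: shuffle so nothing can silently depend on a
    // particular enable order.
    std::mt19937 rng{std::random_device{}()};
    std::shuffle(names.begin(), names.end(), rng);
#endif
    for (const auto& name : names) {
        enable_feature(name);
    }
}

int main() {
    enable_features_in_randomized_order({
        "DIGEST_INSENSITIVE_TO_EXPIRY",
        "TABLE_DIGEST_INSENSITIVE_TO_EXPIRY",
        "UUID_SSTABLE_IDENTIFIERS",
    });
}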

denesb commented Nov 9, 2023

I managed to reproduce locally too. Comparing the logs for a failed and successful run:

In both cases, the table is created on the new node while features are still being negotiated, and in both cases node2 initially ends up with a different schema version.

As the features are enabled, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY kicks off a reload of the schema from disk. In the successful run, this ends up changing the version of the schema to the same one that node1 calculated before. In the failed run, the reload doesn't change the schema version. The only difference I could find is that in the successful run, all schema features were already enabled by the time the (background) reload actually got to ks.cf. In the failed run, some of the features (e.g. DIGEST_FOR_NULL_VALUES and DIGEST_INSENSITIVE_TO_EXPIRY) were enabled after the schema was reloaded.

Note that DIGEST_INSENSITIVE_TO_EXPIRY also kicks off an update_schema() call, but that only recalculates the cluster-wide digest and apparently doesn't end up recalculating the individual table versions.

denesb commented Nov 9, 2023

I'm testing a fix...

denesb self-assigned this on Nov 9, 2023
denesb added a commit to denesb/scylla that referenced this issue Nov 9, 2023
…to_expiry

Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct schema versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However, this feature only
takes effect if digest_insensitive_to_expiry is also enabled. This used
to be the case incidentally: by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and, as we've
recently seen, any change to the feature list which changes the order
in which features are enabled can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on the (unguaranteed) feature order
and to guarantee that table schemas have a correct version after all
features are enabled.

Fixes: scylladb#16004
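
The shape of the fix, as a hedged sketch (invented names, not the actual migration_manager code): both digest-related features now trigger the full reload path, so the outcome no longer depends on which callback fires first.

#include <functional>
#include <vector>

// Stand-ins for the two actions from the issue summary.
void update_schema() { /* recalculate the cluster-wide schema digest only */ }
void reload_schema() { /* reload schema from disk, recalculate per-table versions */ }

struct feature_t {
    std::vector<std::function<void()>> listeners;
    void when_enabled(std::function<void()> f) { listeners.push_back(std::move(f)); }
};

void install_schema_feature_listeners(feature_t& digest_insensitive_to_expiry,
                                      feature_t& table_digest_insensitive_to_expiry) {
    // This feature already triggered a reload before the fix.
    table_digest_insensitive_to_expiry.when_enabled(reload_schema);
    // Before the fix this registered update_schema(), so per-table versions
    // were only correct if digest_insensitive_to_expiry happened to be
    // enabled before the other feature's background reload ran.
    digest_insensitive_to_expiry.when_enabled(reload_schema);
}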
denesb commented Nov 9, 2023

Fix here: #16013

denesb commented Nov 9, 2023

With the above fix, running

$ pytest --cassandra-dir=~/ScyllaDB/scylla/worktree1/build/dev --keep-test-dir bootstrap_test.py::TestBootstrap::test_off_strategy_during_bootstrap --count=10

on top of #15890, all 10 runs passed for me. Before, this test would fail roughly 50% of the time.

denesb added two more commits to denesb/scylla that referenced this issue Nov 9, 2023 (…to_expiry), with the same commit message as above.
mykaul added the backport/5.4 (Issues that should be backported to the 5.4 branch once they are fixed) and status/release blocker (Preventing a release from being promoted) labels on Nov 9, 2023
kbr-scylla commented:

> As a side note, maybe we could randomize the order of enabling features, at least in debug mode? I don't think the code should rely on any particular order.

Good idea, I created an issue for that: #16014

denesb added another commit to denesb/scylla that referenced this issue Nov 9, 2023 (…to_expiry), with the same commit message as above.
denesb added a commit to denesb/scylla that referenced this issue Nov 13, 2023
…to_expiry

Same commit message as above, with one addition at the end: "In fact, all schema feature notification handlers now kick off a full schema reload, to ensure bugs like this don't creep in in the future."

Fixes: scylladb#16004
denesb added another commit to denesb/scylla that referenced this issue Nov 13, 2023 (…to_expiry), with the same commit message as above.
mykaul commented Nov 16, 2023

Can this be backported promptly, to unblock 5.4?

denesb added a commit that referenced this issue Nov 16, 2023
…to_expiry

Same commit message as above.

Fixes: #16004

Closes #16013

(cherry picked from commit 2238144)
denesb commented Nov 16, 2023

Backport queued to 5.4.

denesb removed the Backport candidate and backport/5.4 labels on Nov 16, 2023
tgrabiec pushed a commit that referenced this issue Nov 21, 2023
…to_expiry

Same commit message as above.

Fixes: #16004

Closes #16013

(cherry picked from commit 2238144)
tgrabiec commented:

Backported to 5.2

tgrabiec pushed a commit that referenced this issue Nov 21, 2023
…to_expiry

Same commit message as above.

Fixes: #16004

Closes #16013

(cherry picked from commit 2238144)
(cherry picked from commit e31f222)
tgrabiec commented:

Backported to 5.1
