tablets: alter keyspace #16723
🔴 CI State: FAILURE❌ - Build Build Details:
bf70578 to 7ebb49a
37055ac to abc7a01
ecf18cd to a7b8d74
a7b8d74 to ea6f965
Version 3:
8f00651 to f2220e7
Version 4:
CC @tgrabiec
c23761f to 48c4932
Version 5:
95b77d4 to 031bd31
Version 6:
CC @tgrabiec
Version 7:
I repeatedly (roughly every 45-60 seconds) get the following, which should presumably happen only once:
4cef6fb to cf47b4b
My guess is that waiting for topology request completion doesn't work because this function doesn't handle global requests.
…ifferent raft commands
Since ALTER KS requires creating a topology_change raft command, some functions need to be extended to handle it. Raft commands are distinguished by type, so some functions are parameterized by type, i.e. turned into templates. These templates are explicitly instantiated, so that only one instance of each template exists across the whole code base and it does not have to be compiled in each translation unit.
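The pattern of parameterizing a function by command type while compiling it only once can be sketched as follows. This is a minimal illustration, not the code from this PR: the `announce` signature and the two command structs are hypothetical stand-ins, and the header/source split is only indicated by comments so that the example fits in one file.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the raft command types (not Scylla's real types).
struct schema_change { std::string description; };
struct topology_change { std::string description; };

// --- would live in a header ---
// Declaration only: translation units that include the header see no body.
template <typename Command>
std::string announce(const Command& cmd);

// Explicit instantiation declarations: "do not instantiate this here".
extern template std::string announce<schema_change>(const schema_change&);
extern template std::string announce<topology_change>(const topology_change&);

// --- would live in exactly one .cc file ---
template <typename Command>
std::string announce(const Command& cmd) {
    return "announced: " + cmd.description;
}

// Explicit instantiation definitions: the only place the template is compiled.
template std::string announce<schema_change>(const schema_change&);
template std::string announce<topology_change>(const topology_change&);
```

Callers use `announce(topology_change{...})` as usual; the linker resolves the call to the single instantiation emitted by the one source file.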
This commit adds support for executing ALTER KS for keyspaces with tablets and utilizes all the previous commits. ALTER KS is handled in alter_keyspace_statement, where a global topology request is generated and its data is attached to the system.topology table. Once the topology state machine is ready, it starts handling this global topology request, which produces the mutations required to change the schema of the keyspace, delete the global request from system.topology, update tablets, and update a table that tracks the lifetime of the whole request. Tracking the lifetime is necessary so that control is not returned to the user too early: the query processor returns the response only once the mutations have been sent.
This patch removes the support for the "wildcard" replication_factor option for ALTER KEYSPACE when the keyspace supports tablets. It will still be supported for CREATE KEYSPACE so that a user doesn't have to know all datacenter names when creating the keyspace, but ALTER KEYSPACE will require that and the user will have to specify the exact change in replication factors they wish to make by explicitly specifying the datacenter names. Expanding the replication_factor option in the ALTER case is unintuitive and it's a trap many users fell into. See scylladb#8881, scylladb#15391, scylladb#16115
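The rule described above can be sketched as a small predicate. This is an illustrative assumption about the shape of the check, not the validation code from the patch: `rf_options` and `alter_options_accepted` are hypothetical names, and replication factors are modelled as plain integers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of replication options: option name -> replication factor.
using rf_options = std::map<std::string, int>;

// Returns false when ALTER KEYSPACE on a tablets keyspace uses the wildcard
// 'replication_factor' option instead of explicit datacenter names.
// Keyspaces without tablets are not restricted by this sketch.
bool alter_options_accepted(const rf_options& opts, bool keyspace_uses_tablets) {
    if (keyspace_uses_tablets && opts.count("replication_factor") > 0) {
        return false;
    }
    return true;
}
```

With this sketch, `{"replication_factor": 3}` is rejected for a tablets keyspace, while `{"dc1": 3, "dc2": 2}` is accepted.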
…than 1
We want to ensure that when the replication factor of a keyspace changes, it changes by at most 1 per DC if it uses tablets. The rationale for that is to make sure that the old and new quorums overlap by at least one node. After these changes, attempts to change the RF of a keyspace in any DC by more than 1 will fail.
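A minimal sketch of the per-DC constraint, under stated assumptions: `rf_map` and `rf_change_allowed` are hypothetical names, and a datacenter missing from one of the maps is treated as having RF 0, which the commit message does not specify.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical model: datacenter name -> replication factor.
using rf_map = std::map<std::string, int>;

// Returns true if the RF of every datacenter changes by at most 1.
// Assumption: a datacenter absent from a map has RF 0 there.
bool rf_change_allowed(const rf_map& old_rf, const rf_map& new_rf) {
    auto rf_of = [](const rf_map& m, const std::string& dc) {
        auto it = m.find(dc);
        return it == m.end() ? 0 : it->second;
    };
    // Check datacenters present before the change.
    for (const auto& [dc, rf] : old_rf) {
        if (std::abs(rf - rf_of(new_rf, dc)) > 1) {
            return false;
        }
    }
    // Check datacenters that appear only after the change.
    for (const auto& [dc, rf] : new_rf) {
        if (std::abs(rf - rf_of(old_rf, dc)) > 1) {
            return false;
        }
    }
    return true;
}
```

Under this sketch, going from RF 2 to 3 in one DC is allowed, while going from 1 to 3 is rejected and would have to be done as two separate ALTER statements.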
This commit adds a test verifying that we can only change the RF of a keyspace for any DC by at most 1 when using tablets.
Fixes scylladb#18029
Up until now we waited until the mutations were in place and then returned directly to the caller of the ALTER statement, but that doesn't imply that tablets were deleted/created, so we must wait until the whole processing is done and return only then.
Now the scaling works and the test must check that it does.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the test changes RF from 2 to 3, the extra node executes the "rebuild" transition, which means that it streams tablet replicas from two other peers. When doing so, the node receives two sets of sstables with mutations from the given tablet. The part of the test that checks whether the extra node received the mutations notices two mutation fragments on the new replica and erroneously fails because it sees that RF=3 is not equal to the number of mutations found, which is 4.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The check is performed by selecting from mutation_fragments(table), but it's known that this query crashes Scylla when there's no tablet replica on that node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
5bb4ffe to 66f6001
Yes. There are many ways to keep things working, I guess, but the way I imagined it working is that #18772 should be merged to 6.0 only, and this PR should be merged to The easiest would be to drop #18772, merge this PR to master and then backport to 6.0, but this is against plans of @mykaul.
🟢 CI State: SUCCESS✅ - Build Build Details:
For the record -- this is to be backported to 6.0 once it hits master.
We've discussed it on the daily call. We'll take the change RF to 6.0. Reject ALTER is not needed at this point.
This change supports changing replication factor in tablets-enabled keyspaces. This covers both increasing and decreasing the number of tablets replicas through first building topology mutations (`alter_keyspace_statement.cc`) and then tablets/topology/schema mutations (`topology_coordinator.cc`). For the limitations of the current solution, please see the docs changes attached to this PR.

refs: #16723

* br-backport-alter-ks-tablets:
  test: Do not check tablets mutations on nodes that don't have them
  test: Fix the way tablets RF-change test parses mutation_fragments
  test/tablets: Unmark RF-changing test with xfail
  docs: document ALTER KEYSPACE with tablets
  Return response only when tablets are reallocated
  cql-pytest: Verify RF is changes by at most 1 when tablets on
  cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
  Reject ALTER with 'replication_factor' tag
  Implement ALTER tablets KEYSPACE statement support
  Parameterize migration_manager::announce by type to allow executing different raft commands
  Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
  Extend system.topology with 3 new columns to store data required to process alter ks global topo req
  Allow query_processor to check if global topo queue is empty
  Introduce new global topo `keyspace_rf_change` req
  New raft cmd for both schema & topo changes
  Add storage service to query processor
  tablets: tests for adding/removing replicas
  tablet_allocator: make load_balancer_stats_manager configurable by name
Backported into 6.0
This change supports changing replication factor in tablets-enabled keyspaces. This covers both increasing and decreasing the number of tablets replicas through first building topology mutations (`alter_keyspace_statement.cc`) and then tablets/topology/schema mutations (`topology_coordinator.cc`). For the limitations of the current solution, please see the docs changes attached to this PR.
Fixes: #16129