[tablets] Support RF changes using ALTER KEYSPACE #16129
Currently, RF is changed by altering keyspace options. It can be safely changed only by 1, in which case the old and new quorums are guaranteed to overlap. Afterwards, the admin should run repair to reduce the risk of data loss, since changing RF doesn't replicate old writes automatically.

The plan is to start with something which resembles the current way things work and then improve it to be safe against any RF changes, and to replicate automatically.

Unlike with vnodes, tablet replicas are explicitly stated for each tablet (token range). So to make the current procedure work with tablets, we should extend ALTER KEYSPACE execution to walk over tablet metadata and change the replicas accordingly, by allocating new replicas or dropping them.

This cannot be done on the spot as a group0 transaction if some affected tablet is currently migrating. To solve that, we make the tablet update a topology transaction executed by the topology change coordinator, which is mutually exclusive with tablet migration globally. We introduce a new kind of

The CQL statement should fail if there is already an ongoing request. The user will have to retry when the previous one is finished.

The CQL request handler should wait for the request to complete before returning; this can be done using the virtual task API. If the CQL shell loses track of it, it will be available via the task manager API. So this request should integrate with the virtual task API (#16374).

Tablet replica selection should reuse

It can happen that RF cannot be achieved. In this case the operation should fail. We should also fail if RF is changed by more than 1, since the procedure is not safe in this case.

When adding a new DC, it can happen that keyspace metadata already has RF for the DC, but nodes are not bootstrapped yet in that DC, so tablet allocation will fail. We should require users to first add nodes and then set the replication factor for the DC, and that's what our docs recommend: https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.html

When determining the list of DCs, one should look at tm->get_topology()->get_datacenters() and not keyspace options, since some keyspace options are not DCs (e.g. 'replication_factor').

See docs/dev/topology-over-raft.md
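The validation rules in the description can be sketched as a small Python model. This is not ScyllaDB code: `per_dc_rf`, `validate_rf_change` and the `nodes_per_dc` map are hypothetical names, and the "RF cannot be achieved" check assumes the only obstacle is having fewer nodes in a DC than the requested RF. The DC list comes from the topology (here, the keys of `nodes_per_dc`), so option keys such as `'class'` or `'replication_factor'` are never treated as DCs.

```python
def per_dc_rf(replication_options, datacenters):
    """Return {dc: rf} for every DC known to the topology.

    Keys in replication_options that are not DCs (e.g. 'class',
    'replication_factor') are ignored because only DC names are looked up.
    """
    return {dc: int(replication_options.get(dc, 0)) for dc in sorted(datacenters)}


def validate_rf_change(old_options, new_options, nodes_per_dc):
    """Raise ValueError if the RF change breaks one of the rules; else return the new per-DC RF.

    nodes_per_dc is a hypothetical {dc: node_count} view of the topology.
    """
    dcs = list(nodes_per_dc)
    old_rf = per_dc_rf(old_options, dcs)
    new_rf = per_dc_rf(new_options, dcs)
    for dc in sorted(dcs):
        # Rule from the description: RF may change by at most 1.
        if abs(new_rf[dc] - old_rf[dc]) > 1:
            raise ValueError(f"RF in {dc} changes by more than 1")
        # Assumption: RF is unachievable when the DC has fewer nodes than RF.
        if new_rf[dc] > nodes_per_dc[dc]:
            raise ValueError(f"RF {new_rf[dc]} cannot be achieved in {dc}")
    return new_rf
```

For example, `validate_rf_change({'dc1': '2'}, {'dc1': '3'}, {'dc1': 3})` returns `{'dc1': 3}`, while asking for RF 4 in the same DC raises. An option naming a DC that has no nodes yet is simply not seen by this model, which matches the advice to add nodes first and set the RF afterwards.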
For tablets, the replication factor is stored in two places:
If we make storage_proxy look at the tablets table (via effective_replication_map), then the replication clause becomes a goal for the load balancer. It sees the discrepancy between the replication clause and the tablets table, and starts working to reconcile the discrepancy. Once it's done, the ALTER KEYSPACE statement completes.
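A toy Python model of that reconciliation idea, assuming a single DC and a plain dict standing in for the tablets table. `reconcile_step` and `reconcile` are hypothetical names, and the real load balancer would also weigh load, racks and in-flight migrations, none of which is modelled here.

```python
def reconcile_step(tablet_replicas, goal_rf, candidates):
    """Return the replica list after moving one step towards goal_rf."""
    replicas = list(tablet_replicas)
    if len(replicas) < goal_rf:
        # Pick the first candidate node that does not hold a replica yet.
        free = [node for node in candidates if node not in replicas]
        if not free:
            raise RuntimeError("no node available for a new replica")
        replicas.append(free[0])
    elif len(replicas) > goal_rf:
        # Drop the most recently listed replica.
        replicas.pop()
    return replicas


def reconcile(tablets, goal_rf, candidates):
    """Apply steps until every tablet has goal_rf replicas; return the step count.

    tablets maps a tablet id to its list of replica nodes and is updated in place.
    """
    steps = 0
    while True:
        pending = [t for t, replicas in tablets.items() if len(replicas) != goal_rf]
        if not pending:
            return steps  # goal reached: this is where the statement would complete
        for t in pending:
            tablets[t] = reconcile_step(tablets[t], goal_rf, candidates)
            steps += 1
```

Each step changes a tablet by exactly one replica, so the replication clause acts only as the target and the tablets table remains the source of truth for where data currently lives.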
That's more complicated to implement, so I'd suggest we defer it.
Sure, we don't have to do everything in one day.
IMO it's okay to allow any RF changes during the first phase. It's consistent with what we do with vnodes. The user is responsible for running repair if they want reads not to lose data, or they can alter the replication factor by 1 each time.
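Why a step of 1 keeps quorums overlapping can be checked with a little arithmetic. The sketch below is a simplified model, not ScyllaDB logic: it assumes a quorum is `rf // 2 + 1` and that the smaller replica set is a subset of the larger one, so an old quorum and a new quorum must share a replica exactly when their sizes sum to more than the larger RF. The function names are hypothetical.

```python
def quorum(rf):
    """Size of a majority quorum for the given replication factor."""
    return rf // 2 + 1


def quorums_overlap(old_rf, new_rf):
    """True if any old quorum and any new quorum must share at least one replica.

    Assumes the smaller replica set is contained in the larger one, so both
    quorums are drawn from max(old_rf, new_rf) nodes (pigeonhole argument).
    """
    return quorum(old_rf) + quorum(new_rf) > max(old_rf, new_rf)


if __name__ == "__main__":
    for old_rf, new_rf in [(3, 4), (3, 2), (1, 3), (3, 5)]:
        print(old_rf, "->", new_rf, "overlap:", quorums_overlap(old_rf, new_rf))
```

For a change from n to n+1 the two quorum sizes always sum to n+2, which exceeds n+1, so the overlap holds for every n. Changes such as 1 to 3 or 3 to 5 fail the check, which is the case where a quorum read may miss an earlier quorum write until repair has run. The inequality also holds for some larger steps in this model (for example 2 to 4); the sketch covers only quorum intersection and makes no claim about other hazards.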
I realized there is one problem with simply changing the replica set. In order for the new tablet replica to accept requests, it must know the new tablet metadata (it creates a compaction group for the tablet). There is a time window where some (storage_proxy) coordinator can already work with the new replica set, but the new replica may still be at the old metadata version. To prevent unnecessary request failures, we should go through a simplified tablet migration track in the tablet's state machine, which has two stages and doesn't do streaming. So the request handler for an RF change would initiate migrations and switch the topology transition state to the tablet migration track. Later, we will do repair there, to automatically repair new replicas. But to do that for arbitrary RF changes we need infrastructure in storage_proxy to work with more than 1 pending replica.
@tgrabiec@scylladb.com can the tablet scheduler make sure to rebuild just one replica at a time until we support multiple pending replicas in the storage proxy?
It can, but we don't plan to do automatic rebuild on RF changes now.
Refs #17846
We have to eliminate the query timeout when ALTERing a KS. It probably doesn't matter whether it's a tablet-enabled KS or not; both can have the timeout disabled, which simplifies the implementation. Where exactly to do it (cqlsh, python driver?) has yet to be decided.
I believe that the client timeout can be set per query by the application, so this can be done in cqlsh.
#16723 is moved to 6.1. I believe so should this one.
Makes sense
The full support for ALTERing a tablets-enabled KEYSPACE is not yet implemented, and we don't want to only change the schema without changing any tablets, so the statement has to be explicitly rejected for cases that won't work, so every time any replication option is provided. Fixes: scylladb#18795 References: scylladb#16129
Support updating the RF strategy with ALTER KEYSPACE under Tablets.
Blocked by #16101