redpanda: cluster will not form without a node with an empty seed server list #333
Probably related/prerequisite: #245
I remember @mmaslankaprv mentioning we had a restriction to form the Raft groups from one node, i.e. to bootstrap from one node. We needed to differentiate node joining vs. node bootstrap. I think we have more metadata tracking now, so we can make that distinction: basically, bootstrap if the node is not in the set.
Currently we operate with the following assumptions:
How does one restart the cluster root? Should it have seeds? What if it lost its data dir? Would it make sense to have a two-phase initialisation, where a tool, perhaps rpk, triggers cluster formation?
Restarting a node is not a problem. The problem is when it loses the data directory; then we have to change the configuration to point it at different nodes to join the cluster. I am wondering how this is solved in CockroachDB. They take a similar approach with seed servers.
I think CockroachDB does two-phase. The concern is that during a network partition, bootstrapping must happen within the majority partition. An unusual situation for sure, but it comes up if the cluster root loses its data and is restarted. The context here is Kubernetes. It's not so easy with a StatefulSet to have a node behave differently depending on whether it is the cluster root and whether it has lost its data. It could be punted to an operator, but it might make sense to have Redpanda perform the magic.
Yes, CockroachDB uses a two-phase init. Each server is brought up with the same "join" list. Once those nodes are up, if a cluster hasn't been formed in the past, they go into standby until a node gets an "init" command. https://www.cockroachlabs.com/docs/v20.2/cockroach-init.html
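The two-phase flow described above can be sketched as a toy model (all names here are hypothetical and illustrative, not actual Redpanda or CockroachDB code): every node starts with the same join list and sits in standby until an explicit init command designates the bootstrapper.

```python
# Toy model of two-phase cluster init (hypothetical names, sketch only).
class Node:
    def __init__(self, name, join_list):
        self.name = name
        self.join_list = join_list  # identical on every node
        self.state = "standby"      # no node auto-bootstraps

    def init(self):
        # Explicit operator action, analogous to `cockroach init`,
        # turns exactly one standby node into the bootstrapper.
        self.state = "bootstrapping"

names = ["n0", "n1", "n2"]
nodes = [Node(n, names) for n in names]

# Phase 1: all nodes are up, but no cluster has formed.
assert all(n.state == "standby" for n in nodes)

# Phase 2: the operator picks exactly one node to init.
nodes[0].init()
assert [n.state for n in nodes] == ["bootstrapping", "standby", "standby"]
```

The key property is that the identical configuration on every node never causes a split-brain bootstrap; cluster formation requires a deliberate out-of-band action.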
I think we can make the operation a two-step one without implementing centralized configuration. We can introduce centralized configuration as a follow-up.
I think the two-phase init makes sense -- that should probably be hidden behind an rpk setup command that runs it for the user after the daemons start. We should also retain the current behaviour that writing a config with `seed_servers=[]` causes a node to auto-init, so that single-node cluster init is still a trivial case of just running a binary.
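The retained behaviour described here amounts to a one-line decision rule; a minimal sketch (the function name is hypothetical, not the actual Redpanda code):

```python
def should_auto_init(seed_servers):
    # Sketch of the proposed rule: a node with an explicitly empty
    # seed_servers list bootstraps a new cluster; any node with seeds
    # configured joins an existing cluster instead.
    return len(seed_servers) == 0

# Single-node trivial case: empty list, so just running the binary
# auto-inits a fresh cluster.
assert should_auto_init([]) is True

# A joining node with seeds configured must never create a new cluster.
assert should_auto_init(["node-0:33145"]) is False
```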
Related to #2793 -- once both are done, a cluster could realistically use the same redpanda.yml on all nodes. |
Leaving this here as it may affect the solution to this issue. It turns out this is the case, except when the cluster has TLS and mutual authentication on the Kafka API endpoint.
So the seed server list needs to be completely empty for the initial cluster to be created.
…nitial raft group

This tries to solve the problem with empty seed_servers on node 0. With this change, all fresh clusters will be initially set to 1 replica (via `status.currentReplicas`) until a cluster is created and the operator can verify it via the admin API. Then the cluster is scaled to the number of instances desired by the user. After the cluster is initialized, and for the entire lifetime of the cluster, the `seed_servers` property will be populated with the full list of available servers, in every node of the cluster.

This overcomes redpanda-data#333. Previously, node 0 was always forced to have an empty seed_servers property, but this caused problems when it lost the data dir, as it tried to create a brand-new cluster. With this change, even if node 0 loses the data dir, the seed_servers property will always point to other nodes, so it will try to join the existing cluster.
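The operator change described in the commit message above reduces to a small reconcile rule; a sketch with a hypothetical function name (the real operator tracks this via `status.currentReplicas` and verifies initialization through the admin API):

```python
def current_replicas(desired, cluster_initialized):
    # Fresh cluster: pin to a single replica so exactly one node
    # bootstraps. Once the operator has verified via the admin API
    # that the cluster exists, scale out to the user's desired count.
    return desired if cluster_initialized else 1

# Before the cluster exists, only one replica runs regardless of spec.
assert current_replicas(3, cluster_initialized=False) == 1

# After initialization is verified, scale to the requested size.
assert current_replicas(3, cluster_initialized=True) == 3
```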
@jcsp am I reading the thread above correctly that we need the code to exist in redpanda first, and once redpanda handles an empty seed server list, we can change rpk to emit no seed servers? I'll move this to our own "awaiting other team" queue. cc @piyushredpanda
There will be a bit more to it than that. We haven't nailed this down yet, but probably:
Auto-selection of node_id is a separate but complementary thing: it enables orchestrators to avoid picking node IDs for redpanda nodes; just leave it out of the config file and redpanda will make one up.
Initializing a single-node cluster should not set seeds, in order to trigger auto-init per redpanda-data/redpanda#333 (comment). Also remove the apparently invalid `empty_seed_starts_cluster` flag, per:

```
INFO 2022-11-09 02:19:46,745 [shard 0] redpanda::main - application.cc:255 - Failure during startup: std::invalid_argument (Unknown property empty_seed_starts_cluster)
```
When setting up a cluster, you want to make sure all nodes have the same seed servers. This includes the initial node, since if it were to come back with an empty data directory, you would want it to be able to join the cluster automatically without user intervention. This does not work today. If you set up a three-node cluster with each node having all three nodes in its seed list, it will never form a cluster. You see the following from the node:
It seems no node knows who should be the bootstrap server, and thus the cluster never forms.
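The failure described above follows directly from the old rule (bootstrap only on an empty seed list): when every node carries the full seed list, no node ever elects to bootstrap. A toy illustration:

```python
# Every node is configured with the same, non-empty seed list.
seeds = {n: ["n0", "n1", "n2"] for n in ["n0", "n1", "n2"]}

# Old rule (sketch): only a node with an empty seed list bootstraps
# a brand-new cluster; all others wait to join an existing one.
would_bootstrap = [n for n, s in seeds.items() if not s]

# No node qualifies, so every node waits forever and the cluster
# never forms.
assert would_bootstrap == []
```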