[docs] In HA installs, wait for the first node to be ready before joining others #895
Comments
When updating the docs with this, please also add information on how to test, both locally and remotely, whether the first node is ready and whether etcd is not in a state where another secondary master is in the process of joining.
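A minimal sketch of the kind of remote check the comment above asks for, written in Python as an illustration only (nothing here comes from the RKE2 docs or codebase). It polls the first server until it accepts TCP connections on a given port. The intended use is port 9345, the supervisor port that joining servers put in their `server:` URL. An open port only shows that rke2-server is listening. It does not confirm that etcd has no member join in progress, so treat it as a coarse readiness signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once the port accepts a connection, False on timeout.
    An open port only shows that something is listening; it does not prove
    that etcd has finished adding the previous member.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on host:port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("10.0.0.1", 9345)`, where the address is hypothetical, returns `True` once the first server is listening on its supervisor port.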
Maybe something interesting to add to the docs: while experiencing issues with simultaneous rke2/etcd server starts, I was able to validate that, for example, a 1s wait between rke2/etcd starts was OK.
Is this only when joining the second master, or do we have to wait between all the masters joining?
Only one node can join the etcd cluster at a time, so all servers (we don't have masters) need to wait for previous nodes to complete their join before joining.
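The one-join-at-a-time rule above can be sketched in Python as follows. The function name and the `start`/`is_ready` hooks are hypothetical; in practice they might wrap an SSH command and a readiness probe such as a port check. The first entry plays the role of the initial server, and every later server is started only after the previous one reports ready, so at most one join is in flight. A fixed delay such as the 1s wait reported above may work, but polling for readiness avoids guessing how long a join takes.

```python
import time
from typing import Callable, Iterable, List


def join_servers_sequentially(
    servers: Iterable[str],
    start: Callable[[str], None],
    is_ready: Callable[[str], bool],
    poll_interval: float = 1.0,
    max_polls: int = 300,
) -> List[str]:
    """Start servers one at a time, waiting for each to report ready.

    `start` and `is_ready` are caller-supplied (hypothetical) hooks.
    Raises TimeoutError and stops if a server never becomes ready, so the
    remaining servers are not started on top of a stuck join.
    Returns the servers in the order they became ready.
    """
    joined: List[str] = []
    for server in servers:
        start(server)
        for _ in range(max_polls):
            if is_ready(server):
                break
            time.sleep(poll_interval)
        else:
            # The inner loop finished without `break`: the server never became ready.
            raise TimeoutError(f"{server} did not become ready; not starting the rest")
        joined.append(server)
    return joined
```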
Yes, and it would be great if this were included in the documentation.
@brandond why did you unassign @rancher-max?
When running through milestone items after @cjellick dropped and asked the rest of the team to finish moving things out of the v1.21.2+rke2r1 milestone, this issue came up and @rancher-max indicated that it was unclear why he was assigned an issue that's not ready for testing. If he's expected to write the documentation and put in the PR, then someone needs to remind him.
Oh, okay, gotcha. @rancher-max and I synced on this and he agreed he could do the docs work here previously. I'll reassign to him.
Can this limitation be wrapped by …

This means bootstrapping RKE2 clusters using tools like Terraform gets a little more painful. Typically you'd create a group of master nodes using something like:

```hcl
resource "virtual_machine" "server_nodes" {
  for_each = {
    master1 = "1.1.1.1"
    master2 = "1.1.1.2"
    master3 = "1.1.1.3"
  }
  # ...
}
```

The only way to truly force the first server node to be online before creating the others is to introduce dependencies between server nodes. This creates a lot of problems: if you need to recreate the first server node, all of its dependents need to be destroyed, which destroys the whole server plane. The other option, which is more of a hack, is when your …
You do need to have dependencies between nodes, though: exactly one of your nodes must be started with … That said, we will probably eventually add some retry behavior so that joins work better; this is tracked under #897.
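On releases without such retry behavior, a provisioning script could wrap its own join step in a retry loop. The following Python sketch of retry with exponential backoff is generic and hypothetical; it is not the design tracked in #897. `attempt` stands for whatever starts or verifies the join and is assumed to raise an exception on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    attempt: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `attempt` until it succeeds, doubling the wait after each failure.

    Makes at most `retries + 1` calls and re-raises the last exception
    once they are used up. `sleep` is injectable so the loop can be tested.
    """
    delay = base_delay
    for n in range(retries + 1):
        try:
            return attempt()
        except Exception:
            if n == retries:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

For example, `retry_with_backoff(lambda: join_once("server2"))`, where `join_once` is a hypothetical helper, waits 1s, 2s, 4s and so on between failed attempts, doubling up to `max_delay`.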
This doesn't really apply anymore, as in all of the latest releases it is actually possible to run all the rke2-server processes at the same time, so I'm going to close this docs issue. See #349 for details on the fix.
Need to update documentation to call out the importance of waiting for the initial node to be running before joining other server nodes, due to the limitations in etcd learners.