[BUG] Unable to restore rke2/k3s provisioned clusters from etcd snapshot if cluster is completely down #41080
Comments
It seems that the …

---

Encountered rancher/rke2#4052 (comment) while attempting to fix the …

---

https://github.com/rancher/rancher/pull/41459/files#diff-347b85f4b27f0bc66ce4f2e3c4ca653d3f1c1ffb9e2b4f7c35814cf7881fd8a5R526 adds e2e tests that restore etcd on both machine-provisioned and custom clusters when etcd is completely down.
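For context on what such a test exercises, here is a minimal sketch (not the PR's actual test code) of requesting a restore the way provisioning v2 exposes it: by setting the `etcdSnapshotRestore` stanza on the `provisioning.cattle.io/v1` Cluster object in the management cluster. The cluster name `my-cluster`, snapshot name `my-snapshot`, and the `fleet-default` namespace are placeholders.

```go
package main

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig for the Rancher management cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	clusterGVR := schema.GroupVersionResource{
		Group:    "provisioning.cattle.io",
		Version:  "v1",
		Resource: "clusters",
	}

	// Merge-patch the provisioning cluster to request a restore from a
	// named snapshot; cluster and snapshot names are placeholders.
	patch, err := json.Marshal(map[string]any{
		"spec": map[string]any{
			"rkeConfig": map[string]any{
				"etcdSnapshotRestore": map[string]any{
					"name":       "my-snapshot",
					"generation": 1, // bump to trigger a new restore attempt
				},
			},
		},
	})
	if err != nil {
		panic(err)
	}
	if _, err := dyn.Resource(clusterGVR).Namespace("fleet-default").Patch(
		context.TODO(), "my-cluster", types.MergePatchType, patch,
		metav1.PatchOptions{}); err != nil {
		panic(err)
	}
}
```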
---

Ticket #41080 - Test Results - ❌ blocked

Reproduced with HA Helm Rancher on …
Verified with HA Helm Rancher on …

---
@Oats87, is it correct to assume that for custom RKE2 clusters the restore procedure is the same as the one outlined for RKE1 here?

---
Ticket #41080 - Test Results - ✅

Verified with HA Helm Rancher on …
Note:
Scenario 1 - (Fresh install)
Scenario 2 - (Upgrade)
---
The procedure is similar between machine-provisioned and custom clusters. If you have a complete cluster failure, you must remove all etcd nodes/machines from your cluster before you can add a "new" etcd node to restore onto.

NOTE: If you are using local snapshots, it is VERY important that you back up the corresponding snapshot you want to restore from the …

ANOTHER NOTE: This procedure is only usable with Rancher >= v2.7.6; if you follow this procedure with an older version of Rancher, it will not work as expected.
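To illustrate the "remove all etcd nodes/machines" step, here is a minimal sketch using the Kubernetes dynamic client against the management cluster. The `fleet-default` namespace and the `rke.cattle.io/etcd-role` label selector are assumptions for illustration; in practice you can also delete the machines from the Rancher UI or with kubectl.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	machineGVR := schema.GroupVersionResource{
		Group:    "cluster.x-k8s.io",
		Version:  "v1beta1",
		Resource: "machines",
	}

	// List Machines in the namespace that holds the downstream cluster's
	// provisioning objects ("fleet-default" in a default Rancher setup).
	machines, err := dyn.Resource(machineGVR).Namespace("fleet-default").List(
		context.TODO(),
		metav1.ListOptions{
			// Assumed label selector: restrict to one cluster's etcd machines.
			LabelSelector: "cluster.x-k8s.io/cluster-name=my-cluster,rke.cattle.io/etcd-role=true",
		})
	if err != nil {
		panic(err)
	}

	// Delete every dead etcd machine; a "new" etcd node can then be
	// registered and the snapshot restored onto it.
	for _, m := range machines.Items {
		fmt.Println("deleting etcd machine:", m.GetName())
		if err := dyn.Resource(machineGVR).Namespace("fleet-default").Delete(
			context.TODO(), m.GetName(), metav1.DeleteOptions{}); err != nil {
			panic(err)
		}
	}
}
```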
---
Ticket #41080 - Additional Scenarios - Test Results - ✅

Verified with HA Helm Rancher on …
Scenario 1 - (single-node: …)
Scenario 2 - (2 nodes, split-roles: 1 …)
---
@Oats87 Is there a plan for this procedure to also work on Rancher Prime 2.7.3?

---
@kaioneuhauss, code changes in the linked PR(s) are necessary for this procedure, so it will only work with the upcoming Q2 feature release (v2.7.5, including the Prime version).

---
Rancher Server Setup
Information about the Cluster
Describe the bug
A user is unable to restore an etcd snapshot on a custom K3s/RKE2 provisioned downstream cluster when the controlplane/etcd are completely unavailable. The system-agent-install script gets stuck waiting for a machine plan secret to be assigned to the RKE bootstrap.
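For illustration, here is a rough Go sketch of the kind of wait that hangs: polling the management cluster for a machine-plan secret. The `rke.cattle.io/machine-plan` secret type and the `fleet-default` namespace follow Rancher's provisioning-v2 conventions, but the loop is a simplification of what the system-agent-install script actually does (the real script is shell and talks to Rancher's API, not directly to Kubernetes).

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Poll for a machine-plan secret. In the bug described above, no plan
	// secret is ever assigned to the RKE bootstrap, so a loop like this
	// never terminates.
	for {
		secrets, err := clientset.CoreV1().Secrets("fleet-default").List(
			context.TODO(),
			metav1.ListOptions{FieldSelector: "type=rke.cattle.io/machine-plan"},
		)
		if err != nil {
			panic(err)
		}
		if len(secrets.Items) > 0 {
			fmt.Println("plan secret assigned:", secrets.Items[0].Name)
			return
		}
		fmt.Println("waiting for machine plan secret...")
		time.Sleep(5 * time.Second)
	}
}
```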
To Reproduce
Result
Note that the install script will get stuck waiting for a plan secret.
Expected Result
You should be able to register new nodes.
Screenshots
Additional context
SURE-6119