[BUG] Etcd restore does not work on an RKE2 cluster #42895
A similar issue was observed earlier - #40005
Need more logs. Do you have logs from the
I did not have the logs from
@vivek-shilimkar Since 2.8 Alpha-1 is available, can we retest the backup-restore scenario?
Testing on the latest version: also on 2.8 Alpha-2 I wasn't able to reproduce it. Additional note: on alpha I noticed that the fleet-agent of the downstream cluster was in CrashLoopBackOff due to hitting a nil map; that wasn't happening on head. Reason: the error doesn't happen on a single-node cluster.
Using a multi-node cluster I was able to get the error. At the beginning 2 etcd + 1 CP nodes were restored; one etcd node was stuck failing to connect to the server. rke2 journalctl:
After ~45 min the etcd node was able to reconcile itself. Looking into the nodes I noticed that /dev/root was almost full on all nodes instantiated with 16 GB of disk, which may be causing the problem. I'll retry with larger storage. Observation: I initially believed the 16 GB disk could have been the cause of the rke2 binary not being created, but even after updating to larger nodes the problem persists.
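In case it helps others reproduce this, a minimal sketch of the checks behind the disk-pressure theory above, run on each node (assumes a default systemd-based RKE2 install):

```sh
# Check whether the root filesystem is close to full (the /dev/root symptom above).
df -h /

# Follow the rke2-server unit logs while the restore is reconciling.
journalctl -u rke2-server -f
```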
The upgrade behavior is also strange. I created a new cluster using the 3-3-2 layout, all nodes with more storage. After doing the backup and the upgrade I noticed that all etcd nodes were missing the kubeconfig file /etc/rancher/rke2/rke2.yaml. Observation: after talking with Jake, this behavior is expected. After trying to restore the etcd snapshot, those nodes didn't start rke2-server. The error is not consistently reproducible: I tried a 1-1-1 cluster and there, during the upgrade, the CP node got stuck; rke2-server wasn't starting because the CA certificate wasn't authorized for the IP it was trying to use.
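For anyone checking the same thing, this is roughly how to verify on a server node whether that kubeconfig exists and whether the local API server answers (standard RKE2 path; kubectl location varies, e.g. /var/lib/rancher/rke2/bin/kubectl on RKE2 nodes):

```sh
# The kubeconfig RKE2 writes on server nodes; it was missing on the etcd nodes above.
ls -l /etc/rancher/rke2/rke2.yaml

# If present, point kubectl at it and check node status directly from the node.
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
```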
Did some extra tries: when doing the restore before the Kubernetes update the same problem happens, but the nodes that fail are the worker ones.
Some extra information: this problem is related to restoring etcd on K8s 1.26.9 and 1.26.8 on a multi-node cluster. Doing just an etcd backup and restore (etcd only) is enough to reproduce it. The problem doesn't happen while restoring K8s 1.27. Next tests that I'll do today:
@felipe-colussi I tested this on v2.7.8 and a
@felipe-colussi I tested the following scenario on
Merged #43158. The PR fixes the problem where restores got stuck forever with "Waiting for probe: calico".

Even after this PR we still have the following known problems (while using RKE2 on 1.26.8, 1.26.9, 1.25.13)¹:

1. While doing etcd restores (etcd only or all 3), an etcd node gets stuck with:
2. While upgrading to 1.27.6 there is a chance that a worker node gets stuck with:
3. While restoring an etcd snapshot (etcd only or all 3) to 1.26.8 or 1.26.9² there is a chance of it getting stuck forever with:

¹ Probably also happens on 1.25.14, but that wasn't intensively tested.

To inspect a stuck node directly, see the sketch below.
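Not from the thread itself, but a hedged sketch of how one might look at what the static pods on a stuck RKE2 node are doing (paths are the defaults of an RKE2 install and may differ on your setup):

```sh
# RKE2 ships its own crictl; point it at RKE2's containerd via the bundled config.
/var/lib/rancher/rke2/bin/crictl \
  --config /var/lib/rancher/rke2/agent/etc/crictl.yaml ps -a

# Static pod manifests (etcd, kube-apiserver, ...) that the kubelet is acting on.
ls /var/lib/rancher/rke2/agent/pod-manifests
```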
I ran etcd snap/restore checks yesterday on a few Rancher server versions, in an effort to help determine the scope of the recent rke2 snap/restore failures, and the results are below: On
On
On
On Rancher
Note: the Calico issue on worker nodes is no longer encountered with Felipe's fix. In an effort to determine the frequency of
As a quick workaround for this issue, we found that restarting rke2-server after the restore resolves the problem by restarting the kubelet and containerd. It seems that the kubelet gets stuck restarting the kube-controller-manager pod after it exits, while mistakenly reporting that it is in a ready state, so as a workaround the plan can trigger a restart of rke2-server.service.
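For anyone hitting this before a fix ships, the manual version of that workaround is just a unit restart on the affected server node; a minimal sketch (systemd-based install assumed):

```sh
# Restarting rke2-server also restarts the embedded kubelet and containerd,
# which unsticks the kube-controller-manager static pod described above.
sudo systemctl restart rke2-server.service

# Confirm the unit and the static pods came back.
systemctl status rke2-server.service
journalctl -u rke2-server --since "5 minutes ago" | tail -n 20
```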
There is a report of the same behavior of static pods not starting, also on a rancher managed cluster, at rancher/rke2#4864. In this case I believe the issue was triggered by an upgrade to a newer patch release of RKE2, not a cluster restore. |
Validated the issue with Rancher 2.8.0-rc1
Issue Replication
Issue validation
Testing
Rancher Server Setup
d101c27
Information about the Cluster
1.27.5+rke2r1 to v1.26.8+rke2r1
RKE2

User Information
Describe the bug
[BUG] Etcd restore does not work on an RKE2 cluster
To Reproduce
1. Deploy a downstream RKE2 node driver cluster on a 1.26 RKE2 version
2. Take an etcd snapshot
3. Upgrade to a 1.27 RKE2 version
4. Restore to the snapshot taken previously using the "All options - config, k8s and etcd" option

Cluster is stuck in Updating state with the error:

[INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd
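A rough sketch of how to watch those node conditions while the cluster is stuck (the node name is the one from the error above; assumes a working kubeconfig for the downstream cluster):

```sh
# Node conditions reported as Unknown match the provisioning error above.
kubectl get nodes

# Drill into the stuck etcd node's condition block.
kubectl describe node rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm
```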
rancher prov logs:
Note: upgrading from 1.26.8+rke2r1 to 1.27.5+rke2r1
works, but the restore to the snapshot taken on 1.26 fails.