From c5ec30ae9eb9e23e8b39712598871c9d2af3a8cb Mon Sep 17 00:00:00 2001 From: Laura Hinson Date: Fri, 21 Mar 2025 13:56:06 -0400 Subject: [PATCH] [OCPBUGS-49679]: Improvements to restoring the cluster state procedure --- modules/dr-restoring-cluster-state.adoc | 194 ++++++++---------------- 1 file changed, 64 insertions(+), 130 deletions(-) diff --git a/modules/dr-restoring-cluster-state.adoc b/modules/dr-restoring-cluster-state.adoc index 2b2f548dda1c..cfa7757837e6 100644 --- a/modules/dr-restoring-cluster-state.adoc +++ b/modules/dr-restoring-cluster-state.adoc @@ -35,16 +35,16 @@ When you restore your cluster, you must use an `etcd` backup that was taken from [IMPORTANT] ==== -For non-recovery control plane nodes, it is not required to establish SSH connectivity or to stop the static pods. You can delete and recreate other non-recovery, control plane machines, one by one. +For non-recovery control plane nodes, it is not required to establish SSH connectivity or to stop the static pods. You can delete and re-create other non-recovery, control plane machines, one by one. ==== .Procedure . Select a control plane host to use as the recovery host. This is the host that you will run the restore operation on. -. Establish SSH connectivity to each of the control plane nodes, including the recovery host. +. Establish SSH connectivity to each of the control plane nodes, including the recovery host. Use a separate terminal to establish SSH connectivity for each control plane node. + -`kube-apiserver` becomes inaccessible after the restore process starts, so you cannot access the control plane nodes. For this reason, it is recommended to establish SSH connectivity to each control plane host in a separate terminal. +The Kubernetes API server becomes inaccessible after the restore process starts, so you cannot access the control plane nodes by using the `oc debug` method. For this reason, establish SSH connectivity to each control plane host in a separate terminal. + [IMPORTANT] ==== @@ -64,14 +64,14 @@ You do not need to stop the static pods on the recovery host. .. Access a control plane host that is not the recovery host. -.. Move the existing etcd pod file out of the kubelet manifest directory by running: +.. Move the existing etcd pod file out of the kubelet manifest directory by running the following command: + [source,terminal] ---- $ sudo mv -v /etc/kubernetes/manifests/etcd-pod.yaml /tmp ---- -.. Verify that the `etcd` pods are stopped by using: +.. Verify that the `etcd` pods are stopped by running the following command: + [source,terminal] ---- @@ -80,14 +80,14 @@ $ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard" + If the output of this command is not empty, wait a few minutes and check again. -.. Move the existing `kube-apiserver` file out of the kubelet manifest directory by running: +.. Move the existing `kube-apiserver` file out of the kubelet manifest directory by running the following command: + [source,terminal] ---- $ sudo mv -v /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp ---- -.. Verify that the `kube-apiserver` containers are stopped by running: +.. Verify that the `kube-apiserver` containers are stopped by running the following command: + [source,terminal] ---- @@ -96,14 +96,14 @@ $ sudo crictl ps | grep kube-apiserver | egrep -v "operator|guard" + If the output of this command is not empty, wait a few minutes and check again. -.. Move the existing `kube-controller-manager` file out of the kubelet manifest directory by using: +.. 
Move the existing `kube-controller-manager` file out of the kubelet manifest directory by running the following command: + [source,terminal] ---- $ sudo mv -v /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /tmp ---- -.. Verify that the `kube-controller-manager` containers are stopped by running: +.. Verify that the `kube-controller-manager` containers are stopped by running the following command: + [source,terminal] ---- @@ -111,14 +111,14 @@ $ sudo crictl ps | grep kube-controller-manager | egrep -v "operator|guard" ---- If the output of this command is not empty, wait a few minutes and check again. -.. Move the existing `kube-scheduler` file out of the kubelet manifest directory by using: +.. Move the existing `kube-scheduler` file out of the kubelet manifest directory by running the following command: + [source,terminal] ---- $ sudo mv -v /etc/kubernetes/manifests/kube-scheduler-pod.yaml /tmp ---- -.. Verify that the `kube-scheduler` containers are stopped by using: +.. Verify that the `kube-scheduler` containers are stopped by running the following command: + [source,terminal] ---- @@ -133,14 +133,19 @@ If the output of this command is not empty, wait a few minutes and check again. $ sudo mv -v /var/lib/etcd/ /tmp ---- -.. If the `/etc/kubernetes/manifests/keepalived.yaml` file exists and the node is deleted, follow these steps: +.. If the `/etc/kubernetes/manifests/keepalived.yaml` file exists, complete the following steps. These steps are necessary to ensure that the API IP address is listening on the recovery host. -... Move the `/etc/kubernetes/manifests/keepalived.yaml` file out of the kubelet manifest directory: +... Move the `/etc/kubernetes/manifests/keepalived.yaml` file out of the kubelet manifest directory by running the following command: + [source,terminal] ---- -$ sudo mv -v /etc/kubernetes/manifests/keepalived.yaml /tmp +$ sudo mv -v /etc/kubernetes/manifests/keepalived.yaml /home/core/ ---- ++ +[NOTE] +==== +This file must be restored to its original location after the procedure is completed. +==== ... Verify that any containers managed by the `keepalived` daemon are stopped: + @@ -169,7 +174,7 @@ $ sudo ip address del dev . Access the recovery control plane host. -. If the `keepalived` daemon is in use, verify that the recovery control plane node owns the VIP: +. If the `keepalived` daemon is in use, verify that the recovery control plane node owns the VIP. Otherwise, repeat step 4.xi. + [source,terminal] ---- @@ -221,7 +226,6 @@ starting kube-scheduler-pod.yaml static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml ---- + - The cluster-restore.sh script must show that `etcd`, `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods are stopped and then started at the end of the restore process. + [NOTE] @@ -229,7 +233,20 @@ The cluster-restore.sh script must show that `etcd`, `kube-apiserver`, `kube-con The restore process can cause nodes to enter the `NotReady` state if the node certificates were updated after the last `etcd` backup. ==== -. Check the nodes to ensure they are in the `Ready` state. +. Check the nodes to ensure they are in the `Ready` state. To check the nodes, you can use either the bastion host or the recovery host. 
++ +** If you use the recovery host, run the following commands: ++ +[source,terminal] +---- +$ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig +---- ++ +[source,terminal] +---- +$ oc get nodes -w +---- +** If you use the bastion host, complete the following steps: .. Run the following command: + [source,terminal] @@ -252,7 +269,7 @@ host-172-25-75-98 Ready infra,worker 3d20h v1.30.3 ---- + It can take several minutes for all nodes to report their state. - ++ .. If any nodes are in the `NotReady` state, log in to the nodes and remove all of the PEM files from the `/var/lib/kubelet/pki` directory on each node. You can SSH into the nodes or use the terminal window in the web console. + [source,terminal] @@ -272,7 +289,7 @@ kubelet-client-current.pem kubelet-server-current.pem . Restart the kubelet service on all control plane hosts. -.. From the recovery host, run: +.. From the recovery host, run the following command: + [source,terminal] ---- @@ -288,7 +305,7 @@ $ sudo systemctl restart kubelet.service Clusters with no worker nodes, such as single-node clusters or clusters consisting of three schedulable control plane nodes, will not have any pending CSRs to approve. You can skip all the commands listed in this step. ==== -.. Get the list of current CSRs by running: +.. Get the list of current CSRs by running the following command: + [source,terminal] ---- @@ -307,7 +324,7 @@ csr-zhhhp 3m8s kubernetes.io/kube-apiserver-client-kubelet system:servicea <1> A pending kubelet serving CSR, requested by the node for the kubelet serving endpoint. <2> A pending kubelet client CSR, requested with the `node-bootstrapper` node bootstrap credentials. -.. Review the details of a CSR to verify that it is valid by running: +.. Review the details of a CSR to verify that it is valid by running the following command: + [source,terminal] ---- @@ -315,14 +332,14 @@ $ oc describe csr <1> ---- <1> `` is the name of a CSR from the list of current CSRs. -.. Approve each valid `node-bootstrapper` CSR by running: +.. Approve each valid `node-bootstrapper` CSR by running the following command: + [source,terminal] ---- $ oc adm certificate approve ---- -.. For user-provisioned installations, approve each valid kubelet service CSR by running: +.. For user-provisioned installations, approve each valid kubelet service CSR by running the following command: + [source,terminal] ---- @@ -331,7 +348,7 @@ $ oc adm certificate approve . Verify that the single member control plane has started successfully. -.. From the recovery host, verify that the `etcd` container is running by using: +.. From the recovery host, verify that the `etcd` container is running by entering the following command: + [source,terminal] ---- @@ -344,7 +361,7 @@ $ sudo crictl ps | grep etcd | egrep -v "operator|etcd-guard" 3ad41b7908e32 36f86e2eeaaffe662df0d21041eb22b8198e0e58abeeae8c743c3e6e977e8009 About a minute ago Running etcd 0 7c05f8af362f0 ---- -.. From the recovery host, verify that the `etcd` pod is running by using: +.. From the recovery host, verify that the `etcd` pod is running by entering the following command: + [source,terminal] ---- @@ -361,13 +378,13 @@ etcd-ip-10-0-143-125.ec2.internal 1/1 Running 1 If the status is `Pending`, or the output lists more than one running `etcd` pod, wait a few minutes and check again. . If you are using the `OVNKubernetes` network plugin, you must restart `ovnkube-controlplane` pods. -.. 
Delete all of the `ovnkube-controlplane` pods by running: +.. Delete all of the `ovnkube-controlplane` pods by running the following command: + [source,terminal] ---- $ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane ---- -.. Verify that all of the `ovnkube-controlplane` pods were redeployed by using: +.. Verify that all of the `ovnkube-controlplane` pods were redeployed by running the following command: + [source,terminal] ---- @@ -391,7 +408,7 @@ Validating and mutating admission webhooks can reject pods. If you add any addit Alternatively, you can temporarily set the `failurePolicy` to `Ignore` while restoring the cluster state. After the cluster state is restored successfully, you can set the `failurePolicy` to `Fail`. ==== -.. Remove the northbound database (nbdb) and southbound database (sbdb). Access the recovery host and the remaining control plane nodes by using Secure Shell (SSH) and run: +.. Remove the northbound database (nbdb) and southbound database (sbdb). Access the recovery host and the remaining control plane nodes by using Secure Shell (SSH) and run the following command: + [source,terminal] ---- @@ -496,82 +513,7 @@ $ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=sp It might take several minutes for the pods to restart. ==== -. Delete and re-create other non-recovery, control plane machines, one by one. After the machines are re-created, a new revision is forced and `etcd` automatically scales up. -+ -** If you use a user-provisioned bare metal installation, you can re-create a control plane machine by using the same method that you used to originally create it. For more information, see "Installing a user-provisioned cluster on bare metal". -+ -[WARNING] -==== -Do not delete and re-create the machine for the recovery host. -==== -+ -** If you are running installer-provisioned infrastructure, or you used the Machine API to create your machines, follow these steps: -+ -[WARNING] -==== -Do not delete and re-create the machine for the recovery host. - -For bare metal installations on installer-provisioned infrastructure, control plane machines are not re-created. For more information, see "Replacing a bare-metal control plane node". -==== -.. Obtain the machine for one of the lost control plane hosts. 
-+ -In a terminal that has access to the cluster as a cluster-admin user, run the following command: -+ -[source,terminal] ----- -$ oc get machines -n openshift-machine-api -o wide ----- -+ -Example output: -+ -[source,terminal] ----- -NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE -clustername-8qw5l-master-0 Running m4.xlarge us-east-1 us-east-1a 3h37m ip-10-0-131-183.ec2.internal aws:///us-east-1a/i-0ec2782f8287dfb7e stopped <1> -clustername-8qw5l-master-1 Running m4.xlarge us-east-1 us-east-1b 3h37m ip-10-0-143-125.ec2.internal aws:///us-east-1b/i-096c349b700a19631 running -clustername-8qw5l-master-2 Running m4.xlarge us-east-1 us-east-1c 3h37m ip-10-0-154-194.ec2.internal aws:///us-east-1c/i-02626f1dba9ed5bba running -clustername-8qw5l-worker-us-east-1a-wbtgd Running m4.large us-east-1 us-east-1a 3h28m ip-10-0-129-226.ec2.internal aws:///us-east-1a/i-010ef6279b4662ced running -clustername-8qw5l-worker-us-east-1b-lrdxb Running m4.large us-east-1 us-east-1b 3h28m ip-10-0-144-248.ec2.internal aws:///us-east-1b/i-0cb45ac45a166173b running -clustername-8qw5l-worker-us-east-1c-pkg26 Running m4.large us-east-1 us-east-1c 3h28m ip-10-0-170-181.ec2.internal aws:///us-east-1c/i-06861c00007751b0a running ----- -<1> This is the control plane machine for the lost control plane host, `ip-10-0-131-183.ec2.internal`. - -.. Delete the machine of the lost control plane host by running: -+ -[source,terminal] ----- -$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 <1> ----- -<1> Specify the name of the control plane machine for the lost control plane host. -+ -A new machine is automatically provisioned after deleting the machine of the lost control plane host. - -.. Verify that a new machine has been created by running: -+ -[source,terminal] ----- -$ oc get machines -n openshift-machine-api -o wide ----- -+ -Example output: -+ -[source,terminal] ----- -NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE -clustername-8qw5l-master-1 Running m4.xlarge us-east-1 us-east-1b 3h37m ip-10-0-143-125.ec2.internal aws:///us-east-1b/i-096c349b700a19631 running -clustername-8qw5l-master-2 Running m4.xlarge us-east-1 us-east-1c 3h37m ip-10-0-154-194.ec2.internal aws:///us-east-1c/i-02626f1dba9ed5bba running -clustername-8qw5l-master-3 Provisioning m4.xlarge us-east-1 us-east-1a 85s ip-10-0-173-171.ec2.internal aws:///us-east-1a/i-015b0888fe17bc2c8 running <1> -clustername-8qw5l-worker-us-east-1a-wbtgd Running m4.large us-east-1 us-east-1a 3h28m ip-10-0-129-226.ec2.internal aws:///us-east-1a/i-010ef6279b4662ced running -clustername-8qw5l-worker-us-east-1b-lrdxb Running m4.large us-east-1 us-east-1b 3h28m ip-10-0-144-248.ec2.internal aws:///us-east-1b/i-0cb45ac45a166173b running -clustername-8qw5l-worker-us-east-1c-pkg26 Running m4.large us-east-1 us-east-1c 3h28m ip-10-0-170-181.ec2.internal aws:///us-east-1c/i-06861c00007751b0a running ----- -<1> The new machine, `clustername-8qw5l-master-3` is being created and is ready after the phase changes from `Provisioning` to `Running`. -+ -It might take a few minutes for the new machine to be created. The `etcd` cluster Operator will automatically sync when the machine or node returns to a healthy state. - -.. Repeat these steps for each lost control plane host that is not the recovery host. - -. Turn off the quorum guard by entering: +. 
Turn off the quorum guard by running the following command:
+
[source,terminal]
----
@@ -580,7 +522,7 @@ $ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides":
+
This command ensures that you can successfully re-create secrets and roll out the static pods.

-. In a separate terminal window within the recovery host, export the recovery `kubeconfig` file by running:
+. If the `KUBECONFIG` environment variable is not already set, export the recovery `kubeconfig` file in a separate terminal window on the recovery host by running the following command:
+
[source,terminal]
----
@@ -589,7 +531,7 @@ $ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/se

. Force `etcd` redeployment.
+
-In the same terminal window where you exported the recovery `kubeconfig` file, run:
+In the same terminal window where you exported the recovery `kubeconfig` file, run the following command:
+
[source,terminal]
----
@@ -601,14 +543,14 @@ The `etcd` redeployment starts.
+
When the `etcd` cluster Operator performs a redeployment, the existing nodes are started with new pods similar to the initial bootstrap scale up.

-. Turn the quorum guard back on by entering:
+. Turn the quorum guard back on by running the following command:
+
[source,terminal]
----
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
----

-. You can verify that the `unsupportedConfigOverrides` section is removed from the object by running:
+. Verify that the `unsupportedConfigOverrides` section is removed from the object by running the following command:
+
[source,terminal]
----
@@ -617,7 +559,7 @@ $ oc get etcd/cluster -oyaml

. Verify all nodes are updated to the latest revision.
+
-In a terminal that has access to the cluster as a `cluster-admin` user, run:
+In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
+
[source,terminal]
----
@@ -637,7 +579,7 @@ If the output includes multiple revision numbers, such as `2 nodes are at revisi

. After `etcd` is redeployed, force new rollouts for the control plane. `kube-apiserver` will reinstall itself on the other nodes because the kubelet is connected to API servers using an internal load balancer.
+
-In a terminal that has access to the cluster as a `cluster-admin` user, run:
+In a terminal that has access to the cluster as a `cluster-admin` user, complete the following steps:

.. Force a new rollout for `kube-apiserver`:
+
[source,terminal]
----
@@ -671,7 +613,7 @@ If the output includes multiple revision numbers, such as `2 nodes are at revisi
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
----
+
-Verify all nodes are updated to the latest revision by running:
+Verify all nodes are updated to the latest revision by running the following command:
+
[source,terminal]
----
@@ -689,14 +631,14 @@ AllNodesAtLatestRevision
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

-.. Force a new rollout for the `kube-scheduler` by running:
+.. Force a new rollout for the `kube-scheduler` by running the following command:
+
[source,terminal]
----
$ oc patch kubescheduler cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
----
+
-Verify all nodes are updated to the latest revision by using:
+Verify all nodes are updated to the latest revision by running the following command:
+
[source,terminal]
----
@@ -714,34 +656,24 @@ AllNodesAtLatestRevision
+
If the output includes multiple revision numbers, such as `2 nodes are at revision 6; 1 nodes are at revision 7`, this means that the update is still in progress. Wait a few minutes and try again.

-. Monitor the platform Operators by running:
+. If the `keepalived` daemon is in use, restore the `keepalived.yaml` file on the control plane nodes other than the recovery host by running the following command on each of those nodes. Otherwise, the network Operator does not advance beyond the `Progressing` state.
+
[source,terminal]
----
-$ oc adm wait-for-stable-cluster
+$ sudo cp -v /home/core/keepalived.yaml /etc/kubernetes/manifests/
----
-+
-This process can take up to 15 minutes.
-. Verify that all control plane hosts have started and joined the cluster.
-+
-In a terminal that has access to the cluster as a `cluster-admin` user, run the following command:
+. Monitor the platform Operators by running the following command:
+
[source,terminal]
----
-$ oc -n openshift-etcd get pods -l k8s-app=etcd
+$ oc adm wait-for-stable-cluster
----
+
-.Example output
-[source,terminal]
-----
-etcd-ip-10-0-143-125.ec2.internal 2/2 Running 0 9h
-etcd-ip-10-0-154-194.ec2.internal 2/2 Running 0 9h
-etcd-ip-10-0-173-171.ec2.internal 2/2 Running 0 9h
-----
-
-To ensure that all workloads return to normal operation following a recovery procedure, restart all control plane nodes.
+This process can take up to 15 minutes.

+. To ensure that all workloads return to normal operation following a recovery procedure, restart all the control plane nodes.
++
[NOTE]
====
On completion of the previous procedural steps, you might need to wait a few minutes for all services to return to their restored state. For example, authentication by using `oc login` might not immediately work until the OAuth server pods are restarted.
@@ -760,3 +692,5 @@ Issue the following command to display your authenticated user name:
$ oc whoami
----
====
+
+. Restart all the worker nodes in a rolling fashion, one node at a time (see the example sketch that follows).
\ No newline at end of file
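
The step for nodes in the `NotReady` state says to remove all of the PEM files from the `/var/lib/kubelet/pki` directory, but the patch does not show the removal itself. A minimal sketch, run on each affected node (for example over SSH as the `core` user); it is not part of the module:

[source,terminal]
----
$ sudo rm -f /var/lib/kubelet/pki/*.pem  # removes the kubelet client and serving certificates
----

The kubelet requests new certificates after it is restarted in the next step of the procedure, and the resulting CSRs are approved later in the procedure.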
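
The CSR steps approve one request at a time with `oc adm certificate approve`. After a restore there can be many pending requests; once you have reviewed them with `oc describe csr`, a common shortcut, not part of this patch, is to approve everything that is currently listed:

[source,terminal]
----
$ oc get csr -o name | xargs oc adm certificate approve
----

Use this only after reviewing the pending requests, because it approves every CSR in the list.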
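
The final two steps call for restarting the control plane nodes and then the worker nodes in a rolling fashion, but no commands are shown. A minimal per-node sketch, assuming a hypothetical `<node_name>` placeholder and SSH access as the `core` user:

[source,terminal]
----
$ oc adm cordon <node_name>                                             # mark the node unschedulable
$ oc adm drain <node_name> --ignore-daemonsets --delete-emptydir-data   # evict workloads from the node
$ ssh core@<node_name> 'sudo systemctl reboot'                          # reboot the node
$ oc adm uncordon <node_name>                                           # allow scheduling again
----

Confirm with `oc get nodes` that the rebooted node returns to the `Ready` state before you uncordon it and move on to the next node.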