OCPBUGS-19403: updates etcd procedure with OVN-K i/c

modules/dr-restoring-cluster-state.adoc

etcd-ip-10-0-143-125.ec2.internal                1/1     Running     1
+
If the status is `Pending`, or the output lists more than one running etcd pod, wait a few minutes and check again.
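+
For example, a quick way to recheck the pod status; a sketch assuming the etcd pods carry the `app=etcd` label in the `openshift-etcd` namespace:
+
[source,terminal]
----
$ oc -n openshift-etcd get pods -l app=etcd
----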

. If you are using the `OVNKubernetes` network plugin, you must restart the `ovnkube-control-plane` pods.
.. Delete all of the `ovnkube-control-plane` pods by running the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane
----

.. Verify that all of the `ovnkube-control-plane` pods were redeployed by running the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
----
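+
Optionally, you can block until the redeployed pods report ready; a sketch using `oc wait` with the same `app=ovnkube-control-plane` label (adjust the timeout to your environment):
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-control-plane --for=condition=Ready --timeout=10m
----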

. If you are using the OVN-Kubernetes network plugin, restart the Open Virtual Network (OVN) Kubernetes pods on all the nodes one by one. Use the following steps to restart OVN-Kubernetes pods on each node:
+
[IMPORTANT]
====
.Restart OVN-Kubernetes pods in the following order:
. The recovery control plane host
. The other control plane hosts (if available)
. The other nodes
====
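+
To determine the restart order, you can first list the control plane hosts; a sketch assuming the standard `node-role.kubernetes.io/control-plane` node label:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/control-plane
----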

+
[NOTE]
====
Alternatively, you can temporarily set the `failurePolicy` to `Ignore` while restoring the cluster state.
====

.. Delete the northbound database (nbdb) and southbound database (sbdb) files. Access the node by using Secure Shell (SSH) and run the following command:
+
[source,terminal]
----
$ sudo rm -f /var/lib/ovn-ic/etc/*.db
----
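+
Optionally, confirm that the database files were removed; a sketch assuming the same `/var/lib/ovn-ic/etc` path:
+
[source,terminal]
----
$ sudo ls /var/lib/ovn-ic/etc/
----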

.. Restart the Open vSwitch services. Access the node by using Secure Shell (SSH) and run the following command:
+
[source,terminal]
----
$ sudo systemctl restart ovs-vswitchd ovsdb-server
----
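+
You can verify that both services restarted cleanly; a sketch using `systemctl is-active`:
+
[source,terminal]
----
$ sudo systemctl is-active ovs-vswitchd ovsdb-server
----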

.. Delete the `ovnkube-node` pod on the node by running the following command, replacing `<node>` with the name of the node that you are restarting:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
----
.. Verify that the `ovnkube-node` pod is running again with the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
----
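+
Alternatively, you can block until the node's pod reports ready; a sketch using `oc wait` with the same label and field selector, replacing `<node>` as before:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-node --field-selector=spec.nodeName==<node> --for=condition=Ready --timeout=10m
----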
+
[NOTE]
====
It might take several minutes for the pods to restart.
====

. Delete and re-create the other non-recovery control plane machines, one by one. After the machines are re-created, a new revision is forced and etcd automatically scales up.
+
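Before you delete a machine, you can list the control plane machines to identify them; a sketch assuming the standard `machine.openshift.io/cluster-api-machine-role=master` label in the `openshift-machine-api` namespace:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
----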
