OCPBUGS-19403: updates etcd procedure with OVN-K i/c

modules/dr-restoring-cluster-state.adoc

etcd-ip-10-0-143-125.ec2.internal                1/1     Running     1
+
If the status is `Pending`, or the output lists more than one running etcd pod, wait a few minutes and check again.
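+
For example, a quick way to recheck the pod status; a sketch assuming the etcd pods carry the `app=etcd` label in the `openshift-etcd` namespace:
+
[source,terminal]
----
$ oc -n openshift-etcd get pods -l app=etcd
----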

. If you are using the `OVNKubernetes` network plugin, you must restart the `ovnkube-control-plane` pods.
.. Delete all of the `ovnkube-control-plane` pods by running the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane
----

.. Verify that all of the `ovnkube-control-plane` pods were redeployed by running the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-control-plane
----
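+
Optionally, you can block until the redeployed pods report ready; a sketch using `oc wait` with the same `app=ovnkube-control-plane` label (adjust the timeout to your environment):
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-control-plane --for=condition=Ready --timeout=10m
----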

. If you are using the OVN-Kubernetes network plugin, restart the Open Virtual Network (OVN) Kubernetes pods on all the nodes one by one. Use the following steps to restart OVN-Kubernetes pods on each node:
+
[IMPORTANT]
====
.Restart OVN-Kubernetes pods in the following order:
. The recovery control plane host
. The other control plane hosts (if available)
. The other nodes
====
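+
To determine the restart order, you can first list the control plane hosts; a sketch assuming the standard `node-role.kubernetes.io/control-plane` node label:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/control-plane
----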

+
[NOTE]
====
Alternatively, you can temporarily set the `failurePolicy` to `Ignore` while restoring the cluster state.
====

.. Delete the northbound database (nbdb) and southbound database (sbdb) files. Access the node by using Secure Shell (SSH) and run the following command:
+
[source,terminal]
----
$ sudo rm -f /var/lib/ovn-ic/etc/*.db
----
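+
Optionally, confirm that the database files were removed; a sketch assuming the same `/var/lib/ovn-ic/etc` path:
+
[source,terminal]
----
$ sudo ls /var/lib/ovn-ic/etc/
----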

.. Restart the Open vSwitch services. Access the node by using Secure Shell (SSH) and run the following command:
+
[source,terminal]
----
$ sudo systemctl restart ovs-vswitchd ovsdb-server
----
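+
You can verify that both services restarted cleanly; a sketch using `systemctl is-active`:
+
[source,terminal]
----
$ sudo systemctl is-active ovs-vswitchd ovsdb-server
----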

.. Delete the `ovnkube-node` pod on the node by running the following command, replacing `<node>` with the name of the node that you are restarting:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
----
.. Verify that the `ovnkube-node` pod is running again with the following command:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName==<node>
----
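+
Alternatively, you can block until the node's pod reports ready; a sketch using `oc wait` with the same label and field selector, replacing `<node>` as before:
+
[source,terminal]
----
$ oc -n openshift-ovn-kubernetes wait pod -l app=ovnkube-node --field-selector=spec.nodeName==<node> --for=condition=Ready --timeout=10m
----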
+
[NOTE]
====
It might take several minutes for the pods to restart.
====

. Delete and re-create the other non-recovery control plane machines, one by one. After the machines are re-created, a new revision is forced and etcd automatically scales up.
+
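Before you delete a machine, you can list the control plane machines to identify them; a sketch assuming the standard `machine.openshift.io/cluster-api-machine-role=master` label in the `openshift-machine-api` namespace:
+
[source,terminal]
----
$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
----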
