manifests/0000_31_cluster-baremetal-operator_06_deployment: Enable leader election

The option has been available for years:

  $ git blame main.go | grep enable-leader-election
  dcbe86f (Sandhya Dasu 2020-08-18 21:09:29 -0400 72)	flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,

and without it overlapping operator pods can crash-loop [1]:

  : [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers	0s
  {  event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 26 times
  event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 51 times}

while fighting each other over the same ClusterOperator status:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1737335551998038016/artifacts/e2e-aws-ovn-upgrade/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
  $ zgrep -h '"resource":"clusteroperators","name":"baremetal"' kube-apiserver/*audit*log.gz | jq -r 'select(.verb == "create" or .verb == "update") | .stageTimestamp + " " + .verb + " " + (.responseStatus.code | tostring) + " " + (.objectRef.subresource) + " " + .user.username + " " + .user.extra["authentication.kubernetes.io/pod-name"][0]' | grep 'T06:08:.*cluster-baremetal-operator' | sort
  2023-12-20T06:08:21.757799Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-574577fbcb-z8nd4
  2023-12-20T06:08:21.778638Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g
  2023-12-20T06:08:21.780378Z update 409 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-574577fbcb-z8nd4
  2023-12-20T06:08:21.790000Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g
  2023-12-20T06:08:21.802780Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g

Using a leader lock will avoid this contention, and the system should be able to coast through the brief moments after an outgoing leader leaves until a replacement leader picks things back up.

I'm also setting a Recreate strategy [2], because the default rollout:

1. Incoming pod is surged by the default Deployment strategy.
2. Incoming pod attempts to acquire the Lease, but the outgoing pod is holding it.
3. Outgoing pod releases the lease and exits.
4. Incoming pod tries again, and this time acquires the lease.

can be slow in the 3-to-4 phase, while:

1. Outgoing pod releases the lease and exits.
2. Incoming pod is created, scheduled, and acquires the lease.

tends to be faster. And again, the component should be able to coast through small durations without a functioning leader.
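In manifest terms, the change amounts to roughly the following (a minimal sketch, not the actual manifest contents; the container name and surrounding fields are illustrative):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: cluster-baremetal-operator
    namespace: openshift-machine-api
  spec:
    strategy:
      type: Recreate  # instead of the default RollingUpdate surge
    template:
      spec:
        containers:
        - name: cluster-baremetal-operator  # illustrative container name
          args:
          - --enable-leader-election  # flips the flag's false default from main.go

With Recreate, the Deployment scales the old ReplicaSet to zero before creating the new pod, so the second, faster sequence above is the one that actually happens.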
See openshift/machine-config-operator@7530ded86 (install: Recreate and delayed default ServiceAccount deletion, 2023-08-29, openshift/machine-config-operator#3895) for another example of how Recreate can help that way.

[1]: https://issues.redhat.com/browse/OCPBUGS-25766
[2]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
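For reference, the leader lock itself is a coordination.k8s.io Lease object; with leader election on, the holder would look roughly like this (a sketch with illustrative values; the lease name and durations depend on how main.go configures the controller-runtime manager):

  apiVersion: coordination.k8s.io/v1
  kind: Lease
  metadata:
    name: cluster-baremetal-operator  # illustrative leader-election ID
    namespace: openshift-machine-api
  spec:
    holderIdentity: cluster-baremetal-operator-7fbb57959b-s9v9g_b2c3d4  # current leader pod
    leaseDurationSeconds: 15
    renewTime: "2023-12-20T06:08:21.000000Z"

Recreate keeps the handoff short because the outgoing holder exits, and stops renewing, before the incoming pod starts campaigning for the Lease.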