[release-4.7] Bug 1930876: etcdInsufficientMembers is wrong when etcd is in a pod #1066

The upstream etcd alert is incorrect because it only excludes the instance label, but OpenShift runs etcd in a pod, so the pod label must be excluded as well. This change excludes the upstream alert, improves the resiliency of the alert expression, targets the alert at the expected job for the cluster etcd (job="etcd"), updates the description and health text to explain more clearly what insufficient members means, the consequences, and some actions to address the impact, and separates the alert into its own rule group in preparation for eventually moving the alert into the cluster-etcd-operator repo. The alert now requires etcd_server_has_leader == 1, so that an instance from a previous quorum that reappears is not counted toward the majority calculation. This also flags cases where quorum cannot be established because of communication failures between nodes (but not between monitoring and etcd).
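The counting logic described above can be sketched outside of PromQL. This is only an illustration, not the alert expression from this PR: the `Member` type, the `insufficient_members` function, and the `(total + 1) / 2` majority threshold are assumptions made for the example. It shows why a member that is reachable by monitoring but reports no leader should not count toward the majority.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical view of one etcd member as seen by monitoring."""
    up: bool          # scrape target reachable (models up == 1)
    has_leader: bool  # models etcd_server_has_leader == 1


def insufficient_members(members):
    """Return True when fewer than a majority of members are healthy.

    A member counts as healthy only if it is up AND reports a leader,
    so a stale instance from a previous quorum is not counted.
    The (total + 1) / 2 threshold is an assumed majority formula.
    """
    total = len(members)
    healthy = sum(1 for m in members if m.up and m.has_leader)
    return healthy < (total + 1) / 2


# Three members, all healthy: quorum holds.
all_healthy = [Member(True, True)] * 3

# One member down: two of three still form a majority.
one_down = [Member(True, True), Member(True, True), Member(False, False)]

# All three reachable by monitoring, but two cannot reach a leader
# (for example, a network partition between nodes): quorum is lost
# even though every scrape target is up.
partitioned = [Member(True, True), Member(True, False), Member(True, False)]

print(insufficient_members(all_healthy))
print(insufficient_members(one_down))
print(insufficient_members(partitioned))
```

The `partitioned` case is the one a check based on reachability alone would miss, since all three targets are still up from the point of view of monitoring.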


Commits on Feb 19, 2021