[release-4.7] Bug 1930876: etcdInsufficientMembers is wrong when etcd is in a pod #1066

Commits on Feb 19, 2021

  1. jsonnet/rules: etcdInsufficientMembers is wrong when etcd is in a pod

    The upstream etcd alert is incorrect because it only excludes instance,
    but OpenShift runs etcd in a pod and therefore the pod label must be
    excluded.
    
    Exclude the upstream alert, improve the resiliency of the alert
    expression, and target the alert at the expected job for the cluster
    etcd (job="etcd"). Update the description and health text to explain
    more clearly what insufficient members means, what the consequences
    are, and some actions to take in response. Separate the alert into
    its own rule group to prepare for moving it into the
    cluster-etcd-operator repo in the future. The alert now includes
    etcd_server_has_leader == 1 so that an instance from a previous
    quorum that reappears is not counted toward the majority. This also
    flags when quorum cannot be established because of communication
    failures between nodes (but not between monitoring and etcd).
    smarterclayton authored and openshift-cherrypick-robot committed Feb 19, 2021
    Commit 238a598
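
The grouping and majority logic described in the commit message can be illustrated with a small Python model. This is a sketch under stated assumptions, not the alert expression from the commit: the function name `insufficient_members`, the sample layout, and the label values are all hypothetical. It assumes the alert compares the number of healthy members in each label group against `(count + 1) / 2`, where `drop_labels` plays the role of the labels excluded from grouping.

```python
from collections import defaultdict


def insufficient_members(samples, drop_labels):
    """Return the label groups in which the modelled alert condition holds.

    samples: list of dicts with 'labels' (dict), 'up' (0 or 1) and
             'has_leader' (0 or 1). Hypothetical layout for illustration.
    drop_labels: labels removed before grouping, so members that differ
                 only in these labels land in the same group.
    """
    healthy = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        key = tuple(sorted(
            (k, v) for k, v in s["labels"].items() if k not in drop_labels
        ))
        total[key] += 1
        # A member counts toward the majority only if it is scraped
        # successfully AND reports that it has a leader.
        if s["up"] == 1 and s["has_leader"] == 1:
            healthy[key] += 1
    # Condition holds when healthy members fall below (count + 1) / 2.
    return [dict(key) for key, n in total.items() if healthy[key] < (n + 1) / 2]


def member(i, up=1, has_leader=1):
    """Build one hypothetical etcd member sample running in a pod."""
    labels = {"job": "etcd", "instance": f"10.0.0.{i}:9979", "pod": f"etcd-{i}"}
    return {"labels": labels, "up": up, "has_leader": has_leader}


# Three members, one of them down: quorum (2 of 3) still holds.
one_down = [member(0), member(1), member(2, up=0)]
only_instance = insufficient_members(one_down, {"instance"})
instance_and_pod = insufficient_members(one_down, {"instance", "pod"})

# Three members, all scraped fine, but two report no leader.
no_leader = [member(0), member(1, has_leader=0), member(2, has_leader=0)]
partitioned = insufficient_members(no_leader, {"instance", "pod"})
```

In this model, dropping only `instance` leaves every pod in its own group of size one, so a single down member is reported even though two of three members are still healthy. Dropping `pod` as well puts all three members in one group, and the condition holds only when fewer than two of them are healthy. The `no_leader` case shows why including the leader check matters: every member is reachable by monitoring, yet only one counts toward the majority.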