Bug 1730413: etcd should still alert when a member disappears from the endpoints #420
There is no alert that reports when an etcd quorum member is degraded. Add an alert that reports if any etcd member is down or potentially unreachable. A down member can go unreported when the etcd metrics are collected under a Kubernetes service and the etcd instance is removed from the endpoints (due to implementation details of how etcd is run), so extend the alert to also capture the failure rate of requests to that cluster.
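To make the two conditions in the description concrete, here is a minimal Python sketch of the decision the alert is meant to encode. It is not the rule added by this PR: the `MemberSample` type, the `expected_members` count, and the `max_failure_ratio` threshold are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemberSample:
    """One scrape result for an etcd member (hypothetical shape)."""
    name: str
    up: bool  # True if the metrics scrape for this member succeeded


def members_down(samples: List[MemberSample], expected_members: int) -> int:
    """Count members that are down or absent from the scraped endpoints.

    A member removed from the service endpoints produces no sample at all,
    so it is counted as the gap between the expected and the healthy count.
    """
    reporting_up = sum(1 for s in samples if s.up)
    return max(expected_members - reporting_up, 0)


def should_alert(
    samples: List[MemberSample],
    expected_members: int,
    failed_requests: float,
    total_requests: float,
    max_failure_ratio: float = 0.01,  # assumed threshold, not from the PR
) -> bool:
    """Fire if any member is down or missing, or if requests fail too often."""
    if members_down(samples, expected_members) > 0:
        return True
    if total_requests <= 0:
        return False  # no traffic observed, so no failure rate to judge
    return failed_requests / total_requests > max_failure_ratio
```

The second condition is what covers the case described above: if a member silently drops out of the endpoints and the expected count is not known, a rising request failure rate can still surface the problem.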
|
@smarterclayton: This pull request references a valid Bugzilla bug. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
|
/retest |
|
/refresh |
|
/skip |
|
/retest |
|
The shellcheck test is failing because it's missing the |
|
/test e2e-aws-operator |
|
/refresh |
|
@smarterclayton: The following test failed, say
|
/override ci/prow/shellcheck |
|
@smarterclayton: Overrode contexts on behalf of smarterclayton: ci/prow/shellcheck In response to this:
|
/retest |
|
@lilic PTAL (green) |
|
/lgtm |
|
Was not in the OWNERS file in 4.1 branch, so can't approve this. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: LiliC, s-urbaniak, smarterclayton |
|
ping @smarterclayton @eparis this needs an explicit cherry-pick-approval I think 🤔 |
|
Yeah, this week's patch manager will do it.
…On Mon, Aug 5, 2019 at 2:33 AM Sergiusz Urbaniak ***@***.***> wrote:
ping @smarterclayton <https://github.com/smarterclayton> @eparis
<https://github.com/eparis> this needs an explicit cherry-pick-approval I
think 🤔
|
|
@smarterclayton: All pull requests linked via external trackers have merged. The Bugzilla bug has been moved to the MODIFIED state. In response to this: