only alert for ES being red if the csv succeeded #585
|
I think that's an uncommon way of handling this: adding the values of status together. |
|
/retest |
During the rollout of a new logging-operator version, ES can take some time before it goes green again. This patch only causes the alert for ES being red to fire if the elasticsearch-operator csv succeeded. If the csv is actively rolling out, ES being red should be expected and we have other monitoring for when a csv is abnormal.
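A minimal sketch of what such a gated rule could look like, assuming the cluster health is exposed as `es_cluster_status` (with 2 meaning red) and that OLM exports a `csv_succeeded` gauge labelled with the CSV name. The metric names, the label matcher, the group name, and the `for` duration are assumptions for illustration, not the expression from this patch:

```yaml
groups:
  - name: elasticsearch-health-sketch   # hypothetical group name
    rules:
      - alert: ElasticsearchClusterNotHealthy
        # Left side: clusters currently reporting red (assumed: 2 == red).
        # Right side: a single label-less sample that exists only while every
        # matching CSV reports csv_succeeded == 1. `and on ()` keeps the
        # left-hand series only if that sample exists.
        expr: |
          sum by (cluster) (es_cluster_status == 2)
          and on ()
          (min(csv_succeeded{name=~"elasticsearch-operator.*"}) == 1)
        for: 5m          # illustrative value only
        labels:
          severity: critical
```

Combining the two conditions with `and` keeps each one readable on its own. One consequence of this form: if no `csv_succeeded` series matches at all, the right-hand side is empty and the alert never fires.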
79f0bf6 to 9f58c25
|
/retest |
|
/retest |
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: RiRa12621, yithian
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/assign @ewolinetz |
Co-authored-by: Rick Rackow <rrackow@redhat.com>
|
New changes are detected. LGTM label has been removed. |
I'm not sure I'm following here. why would it be expected that elasticsearch is red when the operator is being updated? |
|
I'm basing that on a couple assumptions:
|
this isn't a correct assumption. the operator will only restart the nodes if there is a cert change (CLO is responsible for this) or if there is a change in the elasticsearch CR that needs to be reflected in the deployments themselves.
this can happen even if there isn't a csv update... would the proper fix for this be to adjust the |
|
@yithian: The following tests failed, say
|
We explicitly don't want to extend the 'for'. |
blockloop
left a comment
Wouldn't it be better to handle this by grouping and/or inhibition? I don't know that we should be combining this kind of behavior into one alert when the API already accounts for this behavior with another entity. Perhaps we should have another alert for CSV failure or an inhibitor to this rule.
|
An inhibition does require another alert to be present already. |
|
Right, I think I failed to make the point that we probably want to inhibit all ES alerts while the CSV is invalid. In which case I'm suggesting we add an alert for |
Can you expand on this? The cluster being This can be true and still not be an issue if there is a very large number of primary shards on a node that was just restarted, or if the disk backing this node is slow. My underlying point is, just because it's |
|
It can be true and not be an issue but more often than not it is an issue. |
|
Ultimately, what we want is not to get alerts for the elasticsearch cluster being unhealthy if its csv is in an unready state. I'm pretty agnostic about how we get to that point. If we decide to set up an alert for the |
|
If elasticsearch-operator doesn't have a way to manage inhibition rules, should we go ahead with this change to the alerting rule? Or do we need to implement a new csv_succeeded rule and time its release with an update to the alertmanager config that would cause it to inhibit ElasticsearchClusterNotHealthy? |
|
Inhibition rules are part of the alertmanager configuration. |
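For reference, a rough sketch of such an inhibition in the Alertmanager configuration. It assumes a separate alert, here given the hypothetical name `ElasticsearchOperatorCSVNotSucceeded`, that fires while the CSV is not in the succeeded state; only `ElasticsearchClusterNotHealthy` is a name taken from this thread.

```yaml
# alertmanager.yml (fragment)
inhibit_rules:
  # While the source alert is firing, notifications for the target are muted.
  - source_matchers:
      - alertname = "ElasticsearchOperatorCSVNotSucceeded"   # hypothetical alert
    target_matchers:
      - alertname = "ElasticsearchClusterNotHealthy"
    # No `equal` list: the two alerts are not assumed to share any labels.
```

Older Alertmanager releases use `source_match` / `target_match` maps instead of the `*_matchers` lists. Either way, the source alert has to ship as a Prometheus rule before the inhibition has any effect, which is the timing concern raised above.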
Description
During the rollout of a new logging-operator version, ES can take some
time before it goes green again. This patch only causes the alert for ES
being red to fire if the elasticsearch-operator csv succeeded.
If the csv is actively rolling out, ES being red should be expected and
we have other monitoring for when a csv is abnormal.
/cc @alanconway
/assign @ewolinetz