LOG-5055: Add deprecation alerts #2352
@Clee2691: This pull request references LOG-5055 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.8.0" version, but no target version was set.
/retest
/hold
bundle/manifests/collector_monitoring.coreos.com_v1_prometheusrule.yaml
FYI: I only ran the queries on a test cluster using Elasticsearch. I did not actually apply the alerts, so I did not see how the text looks in the UI.
message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
This query seems to work, but is a bit "unspecific" in my opinion. We're not really interested in the CPU usage, only in the existence of the elasticsearch pods. We're also not interested in elasticsearch containers that belong to other applications (possibly the customer's).
I would suggest adding matchers to "drill down" to the pod that we want to monitor, and using a metric that only tells us about "existence", for example one of the metrics provided by kube-state-metrics:
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_component="elasticsearch"}) > 0
Alternatively, count the containers called "elasticsearch", similar to the previous query:
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+ sum(kube_pod_container_info{namespace="openshift-logging", container="elasticsearch"}) > 0
This suggestion also avoids the "time-based aggregation" and the "range vector", which might cause the alert to flap if the CPU usage is variable (probably not an issue for ES, but it might be for Kibana). It should also mean that you can reduce the "for" time to five minutes or even less: with no time component involved, we do not have to wait for the result to "settle".
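To make the first suggestion concrete, here is a rough sketch of how the Elasticsearch rule might look with the existence-based query. The `for: 5m` value and the `severity: info` label are assumptions for illustration, not values taken from this PR. One caveat worth verifying: newer kube-state-metrics versions only expose `label_*` labels on kube_pod_labels for labels allowed through `--metric-labels-allowlist`, so whether `label_component` is available depends on how kube-state-metrics is configured on the cluster.

```yaml
# Sketch only: `for` and `severity` are assumed values, not taken from this PR.
- alert: ElasticsearchDeprecation
  annotations:
    summary: "Detected Elasticsearch as the in-cluster storage which is deprecated and will be removed in a future release."
  expr: |
    # Counts matching pods; no range vector, so nothing needs to "settle".
    sum(kube_pod_labels{namespace="openshift-logging", label_component="elasticsearch"}) > 0
  for: 5m            # assumption: short duration, since the query has no time window
  labels:
    severity: info   # assumption
```

If the label is not exposed, the kube_pod_container_info variant avoids the dependency on pod labels, because `namespace` and `container` are regular labels of that metric.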
Incorporated the suggested metrics for the alerts. Thanks!
message: "The Kibana web console is now deprecated and is planned to be removed in a future logging release."
summary: "Detected Kibana as the visualization which is being deprecated in a future release"
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0
See the elasticsearch comment for the longer explanation:
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_component="kibana"}) > 0
message: "Fluentd is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to Fluentd, you can use Vector instead."
summary: "Detected Fluentd as the collector which is being deprecated in a future release."
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0
See the elasticsearch comment for the longer explanation:
- sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0
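One way to check an expression like this without applying the alert to a cluster is a promtool unit test. The following is only a sketch: the file name, the pod names, and the `customer-app` namespace are hypothetical, and the input series are made up to show that a Fluentd pod outside openshift-logging is not counted.

```yaml
# Hypothetical file name: deprecation_expr_test.yaml
# Run with: promtool test rules deprecation_expr_test.yaml
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical collector pod in the logging namespace: should be counted.
      - series: 'kube_pod_labels{namespace="openshift-logging", pod="collector-aaaaa", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}'
        values: '1+0x10'
      # Hypothetical Fluentd pod owned by another application: should be ignored.
      - series: 'kube_pod_labels{namespace="customer-app", pod="fluentd-bbbbb", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}'
        values: '1+0x10'
    promql_expr_test:
      - expr: 'sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0'
        eval_time: 5m
        exp_samples:
          # sum() drops all labels, so the single result has an empty label set.
          - labels: '{}'
            value: 1
```

Testing the bare expression keeps the sketch independent of the alert's labels and annotations, which an `alert_rule_test` would have to match exactly.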
- alert: ElasticsearchDeprecation
  annotations:
    message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
    summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
Aren't all these components already deprecated? I think we're better suited with "going to be removed in a future release".
edit: it seems Jeff had a similar thought.
- alert: ElasticsearchDeprecation
  annotations:
    message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
    summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
The "on" tripped me up on first read; I think it needs a "-":
- summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+ summary: "Detected Elasticsearch as the on-cluster storage which is deprecated and will be removed in a future release."
Also, isn't it "in-cluster"?
- summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+ summary: "Detected Elasticsearch as the in-cluster storage which is deprecated and will be removed in a future release."
/retest
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Clee2691, jcantrill
/hold pending PM review
/lgtm
@Clee2691: all tests passed!
Description
This PR introduces three new Prometheus alerts to notify about the deprecation of fluentd, elasticsearch, and kibana.
/cc @cahartma @vparfonov @xperimental
/assign @jcantrill
Links