LOG-5055: Add deprecation alerts #2352

Merged

openshift-merge-bot merged 1 commit into openshift:master from Clee2691:LOG-5055

Feb 15, 2024

Contributor

Clee2691 commented Feb 12, 2024 •

edited

Description

This PR introduces three new Prometheus alerts to notify about the deprecation of fluentd, elasticsearch, and kibana.

/cc @cahartma @vparfonov @xperimental
/assign @jcantrill

Links

JIRA: https://issues.redhat.com/browse/LOG-5055

openshift-ci-robot added the jira/valid-reference label

openshift-ci-robot commented Feb 12, 2024 •

edited by openshift-ci bot

@Clee2691: This pull request references LOG-5055 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.8.0" version, but no target version was set.

In response to this:

Description

This PR introduces three new Prometheus alerts to notify about the deprecation of fluentd, elasticsearch, and kibana.

/cc @cahartma @vparfonov @xperimental
/assign @jcantrill

Links

JIRA: https://issues.redhat.com/browse/LOG-5055

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot assigned jcantrill

openshift-ci bot requested review from cahartma, vparfonov and xperimental

February 12, 2024 21:27

Contributor Author

Clee2691 commented Feb 13, 2024

/retest

jcantrill added the release/5.9 label

jcantrill reviewed

Contributor

jcantrill left a comment

/hold

bundle/manifests/collector_monitoring.coreos.com_v1_prometheusrule.yaml (six review threads, all outdated and resolved)

openshift-ci bot added the do-not-merge/hold label

xperimental reviewed

Contributor

xperimental left a comment

FYI: I only ran the queries on a test cluster using Elasticsearch. I did not actually apply the alerts, so I have not seen how the text looks in the UI.

config/prometheus/collector_alerts.yaml Outdated

+                      message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
+                      summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
+                    expr: |
+                      sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0

Contributor

xperimental Feb 13, 2024

This query seems to work, but is a bit "unspecific" in my opinion. We're not really interested in the CPU usage, only the existence of the elasticsearch pods. Also we're not interested in elasticsearch containers that belong to other applications (possibly of the customer).

I would suggest adding matchers to "drill down" into the pod that we want to monitor and also use a metric that just gives us "existence", for example one of the metrics provided by kube-state-metrics:

Suggested change

-                      sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+                      sum(kube_pod_labels{namespace="openshift-logging", label_component="elasticsearch"}) > 0

Alternatively, count the containers called "elasticsearch", similar to the previous query:

Suggested change

-                      sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+                      sum(kube_pod_container_info{namespace="openshift-logging", container="elasticsearch"}) > 0

This suggestion also avoids the time-based aggregation and the range vector, which might cause the alert to flap if the CPU usage is variable (probably not an issue for ES, but it might be for Kibana). It should also mean that you can reduce the "for" duration to five minutes or even less: with no time component involved, we do not have to wait for the result to settle.
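
As a rough illustration of the suggestion above, the existence-based query could sit in a rule like the following sketch. The alert name and summary text are taken from the diff under review. The group name, the "for: 5m" value (from the "five minutes" mentioned above), and the severity label are assumptions made for illustration, not the rule that was merged.

```yaml
# Sketch only: a rule-group fragment, not the rule merged in this PR.
groups:
  - name: logging_deprecation.alerts   # hypothetical group name
    rules:
      - alert: ElasticsearchDeprecation
        annotations:
          # Wording as it appears in the hunk under review.
          summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
        # Instant-vector existence check: no range vector and no
        # time-based aggregation, so CPU usage plays no part.
        expr: |
          sum(kube_pod_container_info{namespace="openshift-logging", container="elasticsearch"}) > 0
        for: 5m            # assumption: the shorter duration suggested above
        labels:
          severity: info   # assumption: severity is not shown in this thread
```

Because sum() over an empty instant vector returns no series, a rule of this shape stays inactive on clusters that have no matching Elasticsearch container.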

Contributor Author

Clee2691 Feb 13, 2024

Incorporated the suggested metrics for the alerts. Thanks!

config/prometheus/collector_alerts.yaml Outdated

+                      message: "The Kibana web console is now deprecated and is planned to be removed in a future logging release."
+                      summary: "Detected Kibana as the visualization which is being deprecated in a future release"
+                    expr: |
+                      sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0

Contributor

xperimental Feb 13, 2024

See the Elasticsearch comment above for the longer explanation:

Suggested change

-                      sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0
+                      sum(kube_pod_labels{namespace="openshift-logging", label_component="kibana"}) > 0

config/prometheus/collector_alerts.yaml Outdated

+                      message: "Fluentd is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to Fluentd, you can use Vector instead."
+                      summary: "Detected Fluentd as the collector which is being deprecated in a future release."
+                    expr: |
+                      sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0

Contributor

xperimental Feb 13, 2024

See the Elasticsearch comment above for the longer explanation:

Suggested change

-                      sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0
+                      sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0
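
For comparison, here is a sketch of how the collector check might look as a single rule entry. The expression is the one suggested above and the summary is the wording from the hunk. The alert name FluentdDeprecation, the "for" value, and the severity label are hypothetical. It also assumes that kube-state-metrics on the cluster exposes the label_app_kubernetes_io_* labels on kube_pod_labels, which depends on how it is configured.

```yaml
# Sketch only: a hypothetical rule entry built around the suggested expression.
- alert: FluentdDeprecation          # hypothetical name, not confirmed by this thread
  annotations:
    summary: "Detected Fluentd as the collector which is being deprecated in a future release."
  # Both label matchers must hold, so only pods in openshift-logging that are
  # labeled as the collector component and named fluentd are counted.
  expr: |
    sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0
  for: 5m            # assumption
  labels:
    severity: info   # assumption
```

Matching on pod labels rather than on an image regex such as '.+fluentd.+' keeps the check scoped to the logging namespace, which is the concern raised about containers belonging to other applications.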

config/prometheus/collector_alerts.yaml Outdated

+                  - alert: ElasticsearchDeprecation
+                    annotations:
+                      message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
+                      summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."

Contributor

xperimental Feb 13, 2024

Aren't all these components already deprecated? I think we'd be better served by "going to be removed in a future release".

edit: it seems Jeff had a similar thought.

Clee2691 force-pushed the LOG-5055 branch from 3deb57f to a735589 Compare

February 13, 2024 17:26

xperimental reviewed

config/prometheus/collector_alerts.yaml Outdated

+                  - alert: ElasticsearchDeprecation
+                    annotations:
+                      message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
+                      summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."

Contributor

xperimental Feb 13, 2024

The "on" tripped me up on first read; I think it needs a hyphen:

Suggested change

-                      summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+                      summary: "Detected Elasticsearch as the on-cluster storage which is deprecated and will be removed in a future release."

Also, isn't it "in-cluster"?

Suggested change

-                      summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+                      summary: "Detected Elasticsearch as the in-cluster storage which is deprecated and will be removed in a future release."


Commit aac47fb: LOG-5055: Add deprecation alerts

Clee2691 force-pushed the LOG-5055 branch from a735589 to aac47fb Compare

February 13, 2024 18:14

Contributor Author

Clee2691 commented Feb 13, 2024

/retest

Contributor

jcantrill commented Feb 13, 2024

/approve
/hold cancel

openshift-ci bot removed the do-not-merge/hold label

Contributor

openshift-ci bot commented Feb 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Clee2691, jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jcantrill]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label

Contributor

jcantrill commented Feb 14, 2024

/hold

pending PM review

openshift-ci bot added the do-not-merge/hold label

Contributor

jcantrill commented Feb 15, 2024

/lgtm
/hold cancel

openshift-ci bot added lgtm and removed do-not-merge/hold labels

Contributor

openshift-ci bot commented Feb 15, 2024

@Clee2691: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-merge-bot bot merged commit 9de498a into openshift:master

10 checks passed
