LOG-5055: Add deprecation alerts #2352

Merged
merged 1 commit into from Feb 15, 2024

Conversation

Clee2691
Contributor

@Clee2691 Clee2691 commented Feb 12, 2024

Description

This PR introduces three new Prometheus alerts to notify about the deprecation of fluentd, elasticsearch, and kibana.

/cc @cahartma @vparfonov @xperimental
/assign @jcantrill

Links

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 12, 2024
@openshift-ci-robot

openshift-ci-robot commented Feb 12, 2024

@Clee2691: This pull request references LOG-5055 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.8.0" version, but no target version was set.

In response to this:

Description

This PR introduces three new Prometheus alerts to notify about the deprecation of fluentd, elasticsearch, and kibana.

/cc @cahartma @vparfonov @xperimental
/assign @jcantrill

Links

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Clee2691
Contributor Author

/retest

Contributor

@jcantrill jcantrill left a comment

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2024
Contributor

@xperimental xperimental left a comment

fyi: I just ran the queries on a test-cluster using Elasticsearch, did not actually apply the alerts, so I did not see how the text looks in the UI.

message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
Contributor

This query seems to work, but is a bit "unspecific" in my opinion. We're not really interested in the CPU usage, only the existence of the elasticsearch pods. Also we're not interested in elasticsearch containers that belong to other applications (possibly of the customer).

I would suggest adding matchers to "drill down" into the pod that we want to monitor and also use a metric that just gives us "existence", for example one of the metrics provided by kube-state-metrics:

Suggested change
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_component="elasticsearch"}) > 0

Alternatively counting containers called "elasticsearch" similar to the previous query:

Suggested change
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'elasticsearch'}[5m])) > 0
+ sum(kube_pod_container_info{namespace="openshift-logging", container="elasticsearch"}) > 0

This suggestion also avoids the "time-based aggregation" and "range vector", which might cause the alert to flap if the CPU usage is variable (probably not an issue for ES, but it might be for Kibana). It should also mean that you can reduce the alert's `for` duration to five minutes or even less: with no time component involved, there is no need to wait for the result to "settle".
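
For illustration, the existence-based suggestion could be assembled into a complete alerting rule roughly as follows. This is only a sketch, not the rule that was merged: the group name, the `for: 5m` value, and the `severity` label are assumptions, and it presumes that kube-state-metrics on the cluster exposes the `component` pod label as `label_component`. The alert name, expression, and annotation text are taken from this review thread.

```yaml
groups:
  - name: logging-deprecations          # hypothetical group name
    rules:
      - alert: ElasticsearchDeprecation
        # Existence check: kube_pod_labels has one series with value 1 per
        # matching pod, so the sum is the number of Elasticsearch pods.
        # No range vector is involved, so the result does not depend on CPU usage.
        expr: |
          sum(kube_pod_labels{namespace="openshift-logging", label_component="elasticsearch"}) > 0
        for: 5m                         # assumed value, per the "five minutes or even less" remark
        labels:
          severity: info                # assumed; the thread does not state a severity
        annotations:
          message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
          summary: "Detected Elasticsearch as the in-cluster storage which is deprecated and will be removed in a future release."
```

Because the `namespace` matcher restricts the query to `openshift-logging`, Elasticsearch pods that belong to other applications do not contribute to the sum.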

Contributor Author

Incorporated the suggested metrics for the alerts. Thanks!

message: "The Kibana web console is now deprecated and is planned to be removed in a future logging release."
summary: "Detected Kibana as the visualization which is being deprecated in a future release"
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0
Contributor

See elasticsearch for the longer explanation:

Suggested change
- sum(min_over_time(container_cpu_usage_seconds_total{container=~'kibana'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_component="kibana"}) > 0

message: "Fluentd is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to Fluentd, you can use Vector instead."
summary: "Detected Fluentd as the collector which is being deprecated in a future release."
expr: |
  sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0
Contributor

See elasticsearch for the longer explanation:

Suggested change
- sum(min_over_time(container_cpu_usage_seconds_total{image=~'.+fluentd.+'}[5m])) > 0
+ sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0
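
One way to check a suggested expression like this without a live cluster is a `promtool` unit test that feeds synthetic series into the query. The following is a rough sketch under stated assumptions: the file name, pod names, and the `customer-app` namespace are hypothetical, and the label set on `kube_pod_labels` is assumed to match what kube-state-metrics would expose for the collector pods.

```yaml
# Hypothetical test file, e.g. fluentd_expr_test.yaml (name is an assumption).
# Intended to be run with: promtool test rules fluentd_expr_test.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # A Fluentd collector pod in the logging namespace (pod name is made up).
      - series: 'kube_pod_labels{namespace="openshift-logging", pod="collector-abc12", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}'
        values: '1+0x10'
      # A Fluentd pod owned by some other application; it should be ignored.
      - series: 'kube_pod_labels{namespace="customer-app", pod="fluentd-0", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}'
        values: '1+0x10'
    promql_expr_test:
      - expr: 'sum(kube_pod_labels{namespace="openshift-logging", label_app_kubernetes_io_component="collector", label_app_kubernetes_io_name="fluentd"}) > 0'
        eval_time: 5m
        exp_samples:
          # sum() without a "by" clause drops all labels; only the pod in
          # openshift-logging is counted, so the expected value is 1.
          - labels: '{}'
            value: 1
```

The second input series is there to exercise the namespace matcher: if the expression also counted Fluentd pods outside `openshift-logging`, the expected value would be 2 instead of 1.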

- alert: ElasticsearchDeprecation
  annotations:
    message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
    summary: "Detected Elasticsearch as the on cluster storage which is being deprecated in a future release."
Contributor

Aren't all these components already deprecated? I think we're better suited with "going to be removed in a future release".

edit: it seems Jeff had a similar thought.

- alert: ElasticsearchDeprecation
  annotations:
    message: "The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator."
    summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
Contributor

The "on" tripped me up on first read; I think it needs a hyphen:

Suggested change
- summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+ summary: "Detected Elasticsearch as the on-cluster storage which is deprecated and will be removed in a future release."

Also, isn't it "in-cluster"?

Suggested change
- summary: "Detected Elasticsearch as the on cluster storage which is deprecated and will be removed in a future release."
+ summary: "Detected Elasticsearch as the in-cluster storage which is deprecated and will be removed in a future release."

@Clee2691
Contributor Author

/retest

@jcantrill
Contributor

/approve
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2024
Contributor

openshift-ci bot commented Feb 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Clee2691, jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2024
@jcantrill
Contributor

/hold

pending PM review

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2024
@jcantrill
Contributor

/lgtm
/hold cancel

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 15, 2024
Contributor

openshift-ci bot commented Feb 15, 2024

@Clee2691: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 9de498a into openshift:master Feb 15, 2024
10 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. release/5.9
4 participants