
[prometheus-kube-stack] "Error on ingesting out-of-order result from rule evaluation" #1177

Closed
antoineozenne opened this issue Jul 19, 2021 · 20 comments · Fixed by #2076
Labels
bug Something isn't working

Comments

@antoineozenne

Describe the bug
There are some warning-level errors in the logs:

level=warn ts=2021-07-19T12:40:08.145Z caller=manager.go:651 component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=8
level=warn ts=2021-07-19T12:40:48.951Z caller=manager.go:651 component="rule manager" group=kube-apiserver-burnrate.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=1

This seems to trigger the PrometheusMissingRuleEvaluations alert.

Version of Helm and Kubernetes:

Helm Version:

$ helm version
version.BuildInfo{Version:"v3.5.3", GitCommit:"041ce5a2c17a58be0fcd5f5e16fb3e7e95fea622", GitTreeState:"dirty", GoVersion:"go1.15.8"}

Kubernetes Version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.10", GitCommit:"98d5dc5d36d34a7ee13368a7893dcb400ec4e566", GitTreeState:"clean", BuildDate:"2021-04-15T03:28:42Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.10", GitCommit:"98d5dc5d36d34a7ee13368a7893dcb400ec4e566", GitTreeState:"clean", BuildDate:"2021-04-15T03:20:25Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}

Which chart: kube-prometheus-stack

Which version of the chart: 16.14.1

What happened:

The record cluster_quantile:apiserver_request_duration_seconds:histogram_quantile, defined in both kube-apiserver.rules.yaml and kube-apiserver-histogram.rules.yaml, contains a lot of NaN values (because of some 0 values in the instant vector passed to histogram_quantile). This triggers the PrometheusMissingRuleEvaluations alert.
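
For reference, the expression behind this record is essentially of this shape (a simplified sketch, not the exact rule shipped by the chart; the label selectors here are assumed):

histogram_quantile(
  0.99,
  sum by (le) (rate(apiserver_request_duration_seconds_bucket{verb=~"LIST|GET"}[5m]))
)

When a verb/scope combination receives no requests over the range, every bucket rate is 0 and histogram_quantile returns NaN, which is then stored as the value of the recorded sample.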

What you expected to happen:

The rule should be defined in a way that does not trigger the alert.

Anything else:

I noticed that commit f501c4ed62c9e77cf96b46e83202f6ea17a13b97 redefines this record in a second rule group (kube-apiserver-histogram.rules, in addition to kube-apiserver.rules).
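
One way to check whether the record is indeed defined by more than one rule group (assuming the chart deploys its rules as PrometheusRule resources, which is the default) would be something like:

kubectl get prometheusrules --all-namespaces -o yaml \
  | grep -n 'record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile'

If two groups record series with the same name and identical labels, they both write into the same series, which can lead to the out-of-order samples shown in the logs above.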

antoineozenne added the bug label on Jul 19, 2021
@stale

stale bot commented Aug 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Aug 18, 2021
@antoineozenne
Author

The issue still exists in chart version 18.0.0.

@stale

stale bot commented Sep 23, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Sep 23, 2021
@stale

stale bot commented Oct 23, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Oct 24, 2021
@antoineozenne
Author

I can provide more information if needed.

@stale

stale bot commented Dec 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label on Dec 4, 2021
@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Dec 6, 2021
@stale

stale bot commented Jan 5, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label on Jan 5, 2022
@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Jan 5, 2022
@bryanasdev000

bryanasdev000 commented Jan 10, 2022

Can confirm, same here.

Also related to #1283, with a possible workaround.

EDIT: In my case, this is also related to kubernetes-monitoring/kubernetes-mixin#392 and https://docs.microfocus.com/itom/HCMX:2021.05/PrometheusManyToManyMatching.

@antoineozenne
Author

I think setting .Values.defaultRules.rules.kubeApiserver to false isn't really a workaround as it disables the monitoring. :)

@bryanasdev000

bryanasdev000 commented Jan 11, 2022

I think setting .Values.defaultRules.rules.kubeApiserver to false isn't really a workaround as it disables the monitoring. :)

I forgot the quotes around "workaround" :P

In my specific case, resolving the many-to-many matching also resolved the out-of-order results after a pod restart.

In my case it all comes back to an old/dirty installation of Prometheus. In your context, does a fresh cluster have the same problem?

I am running K8S 1.19-1.21 with kube-prometheus-stack-23.3.2.

@antoineozenne
Author

Yes, it is also the case for a new installation.

@stale

stale bot commented Feb 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@antoineozenne
Author

/no-stale

stale bot removed the lifecycle/stale label on Feb 14, 2022
@bryanasdev000

bryanasdev000 commented Mar 2, 2022

/no-stale

@antoineozenne take a look at: #1799

Basically, setting .Values.defaultRules.rules.kubeApiserver to false fixes it, as it only disables the duplicated group. You keep the monitoring of the API server and its rules.
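
In values terms, that setting looks roughly like this (a sketch; the key path is the one referenced above and in #1799):

# values.yaml for kube-prometheus-stack
defaultRules:
  rules:
    kubeApiserver: false  # per this thread and #1799: disables the duplicated group, other API server rules remain

Apply it with a normal helm upgrade of the release.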

@antoineozenne
Author

Thank you @bryanasdev000, I will use that.
