Duplicate tolerations causing issue with prometheus >= 2.52.0 #2390

Closed
rgarcia89 opened this issue May 14, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

@rgarcia89

What happened:
Starting with version 2.52.0, Prometheus detects duplicate series during scraping. As a result, it logs errors when it scrapes kube-state-metrics and a pod's tolerations array contains duplicate entries, because each duplicate entry produces an identical kube_pod_tolerations series.

Prometheus debug logs:

ts=2024-05-13T19:21:09.190Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"calico-system\",pod=\"calico-kube-controllers-75c647b46c-pg9cr\",uid=\"bf944c52-17bd-438b-bbf1-d97f8671bd6b\",key=\"CriticalAddonsOnly\",operator=\"Exists\"}"
ts=2024-05-13T19:21:09.207Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
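
To confirm that the duplicates come from the kube-state-metrics output itself, the scraped text can be checked for series that appear more than once. The following is only a rough sketch: `duplicate_series` is a hypothetical helper, not part of Prometheus or kube-state-metrics, and it assumes each sample line has the form `name{labels} value` with no trailing timestamp and a stable label order.

```python
from collections import Counter
from typing import List


def duplicate_series(exposition_text: str) -> List[str]:
    """Return every series (metric name plus label set) that occurs more than once.

    Assumption: each sample line is "<series> <value>" without a timestamp,
    so everything before the last space identifies the series.
    """
    counts: Counter = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, _value = line.rpartition(" ")
        if series:
            counts[series] += 1
    return [series for series, n in counts.items() if n > 1]


sample = """\
# TYPE kube_pod_tolerations gauge
kube_pod_tolerations{namespace="calico-system",pod="p",key="CriticalAddonsOnly",operator="Exists"} 1
kube_pod_tolerations{namespace="calico-system",pod="p",key="CriticalAddonsOnly",operator="Exists"} 1
kube_pod_tolerations{namespace="calico-system",pod="p",key="other",operator="Exists"} 1
"""
print(duplicate_series(sample))
```

For the sample above, only the `CriticalAddonsOnly` series is reported, since it is the only one emitted twice.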

kube-state-metrics may need to either deduplicate the toleration entries or add an index label that distinguishes duplicate entries.
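
The two options can be sketched roughly as follows. This is not kube-state-metrics code (which is written in Go); the function names and the `index` label are hypothetical, and the sketch assumes that two tolerations count as duplicates when all of `key`, `operator`, `value`, `effect`, and `tolerationSeconds` match.

```python
from typing import Dict, List, Tuple

# Fields compared to decide whether two tolerations are identical (assumption).
TOLERATION_FIELDS = ("key", "operator", "value", "effect", "tolerationSeconds")


def toleration_identity(toleration: Dict[str, object]) -> Tuple[str, ...]:
    """Build a hashable identity from the toleration fields; missing fields become ""."""
    return tuple(str(toleration.get(field, "")) for field in TOLERATION_FIELDS)


def dedupe_tolerations(tolerations: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Option 1: keep only the first occurrence of each identical toleration."""
    seen = set()
    unique = []
    for toleration in tolerations:
        identity = toleration_identity(toleration)
        if identity in seen:
            continue
        seen.add(identity)
        unique.append(toleration)
    return unique


def index_tolerations(tolerations: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Option 2: keep every entry but add a positional "index" label (hypothetical name)."""
    return [dict(toleration, index=str(i)) for i, toleration in enumerate(tolerations)]


tolerations = [
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
]
print(dedupe_tolerations(tolerations))
print(index_tolerations(tolerations))
```

Deduplication keeps the existing label set unchanged, while an index label would alter the series identity for every pod, including pods that have no duplicate tolerations.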

How to reproduce it (as minimally and precisely as possible):

Create the following Deployment and inspect the kube_pod_tolerations metrics that kube-state-metrics produces for its pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: something
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test-container
        image: nginx
      tolerations:
       - key: CriticalAddonsOnly
         operator: Exists
       - key: CriticalAddonsOnly
         operator: Exists

Anything else we need to know?:
Related issue I opened against the Prometheus project: prometheus/prometheus#14089

Environment:

  • kube-state-metrics version: 2.12.0
  • Kubernetes version (use kubectl version): 1.27.9
  • Cloud provider or hardware configuration: AKS
  • Other info:
@dgrisonnet
Member

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
@dgrisonnet
Member

Quoting yourself from the issue you opened against Kubernetes:

A validation check within the Kubernetes API server to reject manifests with duplicate tolerations, ensuring adherence to Kubernetes best practices and avoiding potential issues related to duplicate toleration definitions would be great.

This is also what I would expect to be in the kube-apiserver. I don't think we should handle this scenario at kube-state-metrics' level since the object data is erroneous.

I am closing this issue in favor of the Kubernetes one. Feel free to reopen if the Kubernetes maintainers think we should handle this scenario here.
