Duplicate tolerations causing issue with prometheus >= 2.52.0 #2390

rgarcia89 · 2024-05-14T08:50:11Z

What happened:
Starting with version 2.52.0, Prometheus detects duplicate series during scraping. As a result, it logs errors when it scrapes kube-state-metrics and a pod's tolerations array contains duplicate entries, for example in pods created by a deployment.

Prometheus debug logs:

ts=2024-05-13T19:21:09.190Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"calico-system\",pod=\"calico-kube-controllers-75c647b46c-pg9cr\",uid=\"bf944c52-17bd-438b-bbf1-d97f8671bd6b\",key=\"CriticalAddonsOnly\",operator=\"Exists\"}"
ts=2024-05-13T19:21:09.207Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-state-metrics/0 target=https://10.244.5.6:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

kube-state-metrics may need to deduplicate the toleration entries or add an index label to entries that have duplicates.
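The two options mentioned above can be sketched as follows. This is only an illustration in Python, not kube-state-metrics code: the function names are hypothetical, and the sketch assumes a toleration is identified by the Kubernetes fields `key`, `operator`, `value`, `effect`, and `tolerationSeconds`.

```python
# Fields assumed to identify a toleration (Kubernetes Toleration fields).
FIELDS = ("key", "operator", "value", "effect", "tolerationSeconds")


def toleration_id(toleration):
    """Return a hashable identity for a toleration dict."""
    return tuple(toleration.get(field) for field in FIELDS)


def dedupe_tolerations(tolerations):
    """Option 1: drop repeated tolerations, keeping the first occurrence."""
    seen = set()
    unique = []
    for toleration in tolerations:
        ident = toleration_id(toleration)
        if ident not in seen:
            seen.add(ident)
            unique.append(toleration)
    return unique


def index_tolerations(tolerations):
    """Option 2: keep every entry and attach its position as an extra
    'index' field, so identical entries still yield distinct label sets."""
    return [dict(toleration, index=str(i)) for i, toleration in enumerate(tolerations)]
```

Deduplication keeps the existing label set unchanged, while the index approach would add a label to the series, which changes the metric's shape for existing users.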

How to reproduce it (as minimally and precisely as possible):

Create the following deployment and look at the metrics produced by kube-state-metrics:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: something
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test-container
        image: nginx
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
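
To confirm the duplicate in the scraped output, a small script along these lines could be run over the text returned by the kube-state-metrics `/metrics` endpoint. It is a rough sketch with hypothetical helper names: it treats everything up to the closing `}` as the series identity and does not normalize label order, so it only catches series that are rendered identically.

```python
from collections import Counter


def series_of(line):
    """Return the series part (metric name plus label set) of a sample line."""
    if "{" in line:
        # Cut after the last '}' so the value and optional timestamp are dropped.
        return line[: line.rindex("}") + 1]
    # No labels: the metric name is the first whitespace-separated token.
    return line.split()[0]


def duplicate_series(metrics_text):
    """Map each series that appears more than once to its occurrence count."""
    counts = Counter()
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines (HELP/TYPE metadata).
        if not line or line.startswith("#"):
            continue
        counts[series_of(line)] += 1
    return {series: n for series, n in counts.items() if n > 1}
```

With the deployment above, the `kube_pod_tolerations` series for the pod would be expected to appear in the result with a count of 2.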

Anything else we need to know?:
I also opened an issue on the Prometheus project: prometheus/prometheus#14089

Environment:

kube-state-metrics version: 2.12.0
Kubernetes version (use kubectl version): 1.27.9
Cloud provider or hardware configuration: AKS
Other info:


dgrisonnet · 2024-05-16T16:47:23Z

/assign
/triage accepted

dgrisonnet · 2024-05-16T17:08:45Z

Quoting yourself from the issue you opened against Kubernetes:

A validation check within the Kubernetes API server to reject manifests with duplicate tolerations, ensuring adherence to Kubernetes best practices and avoiding potential issues related to duplicate toleration definitions would be great.

This is also what I would expect to be in the kube-apiserver. I don't think we should handle this scenario at kube-state-metrics' level since the object data is erroneous.

I am closing this issue in favor of the Kubernetes one. Feel free to reopen if the Kubernetes maintainers think we should handle this scenario here.

rgarcia89 added the kind/bug Categorizes issue or PR as related to a bug. label May 14, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 14, 2024

bootc mentioned this issue May 14, 2024

Prometheus v2.52.0 raises "Error on ingesting samples with different value but same timestamp" for kube-state-metrics prometheus/prometheus#14089

Closed

rgarcia89 mentioned this issue May 15, 2024

Duplicate Tolerations kubernetes/kubernetes#124881

Open

k8s-ci-robot assigned dgrisonnet May 16, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024

dgrisonnet closed this as completed May 16, 2024

albundy83 mentioned this issue May 22, 2024

[Bug]: On PersistentVolumeClaim accessModes duplicate cloudnative-pg/cloudnative-pg#4621

Closed


tl-eirik-albrigtsen mentioned this issue May 30, 2024

Duplicate sample for HPA metrics using multiple external metrics with same metric name #2405

Open
