Adjust bundled Prometheus to scrape for only essential metrics #1805

thomasvn · 2022-11-15T01:38:18Z

What does this PR change?

The scrape_config titled kubernetes-service-endpoints currently attempts to scrape every service endpoint in the cluster that carries the prometheus.io/scrape: true annotation.

Kubecost only needs metrics from kubecost-kube-state-metrics, kubecost-prometheus-node-exporter, and kubecost-network-costs, so Prometheus is scraping more than it needs to. This results in duplicate metrics, and in errors when Prometheus attempts to scrape a service endpoint it does not have permission to scrape.

This PR adds a filter to the kubernetes-service-endpoints scrape_config so that it scrapes only the endpoints that serve metrics required by Kubecost.
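For readers who have not opened the diff, a filter of this kind might look roughly like the sketch below. It is an illustration, not the PR's actual change: it assumes a Helm release named kubecost (so the endpoint names match those listed above) and assumes the filter is a keep rule on the __meta_kubernetes_endpoints_name label. The chart may template the names or match on a different label.

```yaml
# Hypothetical sketch, not the actual diff from this PR.
# Assumes the Helm release is named "kubecost".
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Existing behavior: keep only services annotated prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Assumed new filter: keep only the endpoints Kubecost needs.
      # Prometheus anchors relabel regexes, so each name must match in full.
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: keep
        regex: (kubecost-kube-state-metrics|kubecost-prometheus-node-exporter|kubecost-network-costs)
```

With a rule like this, any annotated endpoint whose name is not in the list (kube-dns, for example) is dropped before it becomes a scrape target.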

Does this PR rely on any other PRs?

No

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Users who query metrics from Kubecost’s bundled Prometheus that are not listed at https://github.com/kubecost/docs/blob/main/user-metrics.md will be affected, because the bundled Prometheus no longer scrapes other service endpoints.

Links to Issues or ZD tickets this PR addresses or fixes

More implicit scraping of required Prometheus metrics #1742

How was this PR tested?

Using the following values.yaml, I verified on the Prometheus server that all metrics endpoints required by Kubecost were still being scraped, while all excess metrics endpoints (e.g. kube-dns) were no longer scraped.

kubectl port-forward svc/kubecost-prometheus-server 8080:80

# values.yaml
kube-state-metrics:
  enabled: true
nodeExporter:
  enabled: true
networkCosts:
  enabled: true

Have you made an update to documentation?

No

jessegoodier · 2022-11-15T21:23:21Z

Love this. testing now.

AjayTripathy · 2022-11-17T16:40:53Z

This is all we need for now, but as a note we need to add the DCGM exporter when we reintroduce GPU usage metrics: https://docs.google.com/document/d/1fsbV55wTbpfWy4m9_FVETqHgtTMahfUa2w5MzV9gngc/edit

AjayTripathy · 2022-11-18T05:07:24Z

Approved, @jessegoodier @thomasvn you all feel good to merge this?

thomasvn · 2022-11-19T02:13:19Z

Good to merge. After further review, here are some future todo items:

Test the NVIDIA GPU metric DCGM_FI_DEV_GPU_UTIL once the feature is reintroduced (see "Reintroduce gpuRequestAverage and gpuUsageAverage to the Allocation API Schema", #1787).
Consider removing kubernetes-service-endpoints-slow scrape_config because this only applies to the annotation prometheus.io/scrape-slow, which is not used by Kubecost targets. Slower scrape intervals can be configured in Kubecost’s .Values.prometheus.server.global.scrape_interval.
Consider removing the kubernetes-nodes scrape_config. I believe its metrics are not currently used by Kubecost (I may be wrong).
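As a small illustration of the second item, the scrape interval referenced by .Values.prometheus.server.global.scrape_interval would be set in values.yaml roughly as follows. The key path comes from the comment above. The 1m value is only a placeholder, not a recommendation or the chart default.

```yaml
# values.yaml (sketch; the interval value is an arbitrary example)
prometheus:
  server:
    global:
      scrape_interval: 1m
```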

Adam-Stack-PM · 2022-11-21T01:45:48Z

So cool @thomasvn. Thanks for pushing this forward.

jessegoodier · 2022-11-21T13:43:04Z

> Approved, @jessegoodier @thomasvn you all feel good to merge this?

yes, working in all of my tests. much cleaner. nice work @thomasvn

Update prometheus scrapeconfig 'kubernetes-service-endpoints' to whitelist which endpoints are being scraped (eb5010d)

AjayTripathy mentioned this pull request Nov 17, 2022

Reintroduce gpuRequestAverage and gpuUsageAverage to the Allocation API Schema #1787

Closed

AjayTripathy approved these changes Nov 17, 2022

thomasvn marked this pull request as ready for review November 19, 2022 02:13

thomasvn merged commit f32b715 into develop Nov 21, 2022

aaroniscode mentioned this pull request Feb 10, 2023

Kubecost no longer supports BYO prometheus-node-exporter or kube-state-metrics #1975

Closed

keithhand mentioned this pull request Feb 17, 2023

fix bring your own prometheus-node-exporter and kube-state-metrics #1982

Merged

jessegoodier mentioned this pull request Mar 2, 2023

minor fixes for hosted agent #2019

Merged

thomasvn deleted the thomasn/prom-scrape-whitelist branch September 9, 2023 01:19
