Scraping Metrics from a CronJob #1338
Comments
This is a problem that unfortunately affects metrics collection for any short-lived workload (I was working on the same issue for serverless recently). So it's not just a descheduler issue, and I don't think a deadline setting like you proposed is technically the right solution. I don't know if the Prometheus community has come to a broader solution for this type of problem.

Ultimately, short-lived workloads benefit from exporting their metrics to a listening server, rather than the Prometheus standard of waiting to be scraped by a server. This is how OpenTelemetry metrics work: when a workload shuts down, all metrics in memory are flushed to the collection endpoint. So I think to really address this, we should consider updating our metrics implementation to use OpenTelemetry. We already use OTel for traces, so there is some benefit to using both. The good news is we could do this without breaking existing Prometheus users.
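A minimal sketch of what that push-style flow could look like with the OpenTelemetry Go SDK, assuming an OTLP/gRPC endpoint (for example a Collector) is reachable; the meter and metric names here are illustrative, not the descheduler's actual implementation:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Exporter that pushes metrics over OTLP/gRPC; the endpoint is taken
	// from OTEL_EXPORTER_OTLP_ENDPOINT by default.
	exp, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// A periodic reader pushes on an interval; Shutdown flushes whatever is
	// still in memory, which is what makes this safe for short-lived Jobs.
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp)),
	)
	defer func() {
		if err := provider.Shutdown(ctx); err != nil {
			log.Printf("metrics flush failed: %v", err)
		}
	}()

	meter := provider.Meter("descheduler") // illustrative instrumentation scope
	evictions, err := meter.Int64Counter("evictions_total")
	if err != nil {
		log.Fatal(err)
	}

	// ... run the descheduling cycle, recording metrics as it goes ...
	evictions.Add(ctx, 1)
}
```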
@yutkin Unfortunately this still doesn't fix your problem, because you're using a Prometheus server to scrape the endpoint. But if we implement OTel metrics, you could run an OpenTelemetry Collector with an otlp receiver and a Prometheus exporter, then point your Prometheus agent at that endpoint.
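For reference, a minimal Collector configuration of the kind described above might look roughly like this; the port is a placeholder, not a recommended value:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes the Collector here

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```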
Here is another option: Prometheus has a push gateway for handling this, https://github.com/prometheus/pushgateway. I'm not super familiar with the Pushgateway, but I believe the descheduler code would need to be updated to have an option to push metrics when running as a Job or CronJob.
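As a rough illustration of that option (not the descheduler's actual code), pushing metrics once before the Job exits with the Prometheus Go client's push package could look like this; the Pushgateway URL and metric name are placeholders:

```go
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// Illustrative metric; a real integration would reuse the descheduler's
	// existing collectors instead.
	evictions := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "descheduler_pods_evicted",
	})
	evictions.Add(3)

	// Push everything once, right before the Job exits, instead of waiting
	// to be scraped. "http://pushgateway:9091" is a placeholder address.
	if err := push.New("http://pushgateway:9091", "descheduler").
		Collector(evictions).
		Push(); err != nil {
		log.Printf("could not push metrics to Pushgateway: %v", err)
	}
}
```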
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Is your feature request related to a problem? Please describe.
We were running Descheduler as a Deployment but decided to switch to a CronJob because we want to run it only during nighttime. However, the CronJob finishes within a few seconds, which is not enough for the pod to be scraped by Prometheus.

Describe the solution you'd like
I don't have a solution, but maybe introduce a CLI flag configuring how long to keep the Descheduler up and running. It would allow keeping the pod alive, for example for 15 seconds, which is enough for it to be scraped by Prometheus. But I am open to other suggestions.
Describe alternatives you've considered
Run Descheduler as a Deployment; however, that only allows specifying a period between runs, not the exact time at which to run.
What version of the descheduler are you using?
descheduler version: 0.29