Metrics for job-controller #98434
Comments
cc @yangkev
/triage accepted
I've updated the job description to include the metric names and labels I propose. I'm not convinced we need the ones preceded with TBD. I'm wondering whether a KEP is overkill.
I feel the metrics are straightforward enough. Perhaps we add each metric in a separate PR, in which we discuss its exact semantics.
/assign
I'm documenting this in the KEP for beta graduation for indexed job kubernetes/enhancements#2616
What would you like to be added:
Metrics for the job controller operations, including:
- `job_sync_duration_seconds`
- `job_sync_total` (with a `result` label for `success`/`error`)
- `job_finished_total` (with a `result` label for `success`/`error`)

All metrics should include labels that make it possible to narrow down delays or errors to specific features or code paths:
- `completion_mode`: `NonIndexed`, `Indexed`
- `pattern`: `single`, `workQueue`, `fixedCompletion`
Why is this needed:
This is essential to diagnose common usage patterns or problems in the controller.
Note:
A KEP will be written for this feature, but we want to gather feedback on other useful metrics or labels. The lists are currently open to additions or removals.
/sig apps
/area workload-api/job