Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed graduation of scheduler metrics from alpha to stable #105861

Closed
vantuvt opened this issue Oct 24, 2021 · 15 comments · Fixed by #105941
Closed

Proposed graduation of scheduler metrics from alpha to stable #105861

vantuvt opened this issue Oct 24, 2021 · 15 comments · Fixed by #105941
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@vantuvt
Copy link
Contributor

vantuvt commented Oct 24, 2021

What would you like to be added?

I would like to propose the graduation of the following scheduler metrics from alpha to stable:

Why is this needed?

These metrics have been stable for at least 15 months.

@vantuvt vantuvt added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 24, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 24, 2021
@vantuvt
Copy link
Contributor Author

vantuvt commented Oct 24, 2021

/sig scheduling
/sig instrumentation

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2021
@vantuvt
Copy link
Contributor Author

vantuvt commented Oct 24, 2021

/cc @alculquicondor @ahg-g @logicalhan

@alculquicondor
Copy link
Member

IIRC, the last addition to these metrics was the profile name label. I don't foresee need for new labels.

I don't see many precedents for graduation in core components. @ehashman are there any defined criteria for graduation?

@vantuvt
Copy link
Contributor Author

vantuvt commented Oct 25, 2021

As requested, the following is how long it has been since a change has been made to each of the requested metrics:

  • pending_pods - 2 years; last change was to migrate to the metrics stability framework
  • preemption_attempts_total - 17 months; last change was a renaming of the metric
  • preemption_victims - 15 months; last change was a renaming of the metric
  • e2e_scheduling_duration_seconds - 17 months; last change was to add a label
  • schedule_attempts_total - 17 months; last change was to add a label

@logicalhan
Copy link
Member

IIRC, the last addition to these metrics was the profile name label. I don't foresee need for new labels.

I don't see many precedents for graduation in core components. @ehashman are there any defined criteria for graduation?

It's up to the component owners to commit to graduating these metrics, since the component owners own the metrics and therefore need to agree to support them in accordance to stability requirements. If you guys are okay with promoting these metrics, I don't see a reason, from an instrumentation side, why we would object to promotion.

@alculquicondor
Copy link
Member

+1 to graduation then. These metrics are very mature.

Although, I'm not sure if all metrics already adhere to naming conventions. pending_pods and preemption_victims don't have a unit suffix, but they are unit-less. Is that fine @logicalhan?

@logicalhan
Copy link
Member

+1 to graduation then. These metrics are very mature.

Although, I'm not sure if all metrics already adhere to naming conventions. pending_pods and preemption_victims don't have a unit suffix, but they are unit-less. Is that fine @logicalhan?

pending_pods is a gauge so it's fine, preemption_victims is a histogram so it's also fine.

@kerthcet
Copy link
Member

I'm recently working with metrics, I'm glad to provide some help if no one has interest about it.

@alculquicondor
Copy link
Member

There is already an open PR for this.

@kerthcet
Copy link
Member

Oh sorry, didn't notice about that. Forget it.

@dgrisonnet
Copy link
Member

These graduations sound good to me, but I would be inclined to rename e2e_scheduling_duration_seconds. I feel that the e2e part of the metric is unnecessary. Having scheduling_duration_seconds instead and keeping scheduling_algorithm_duration_seconds for the algorithm latency is already descriptive enough to know the difference between both. If that's not the case, then the Help text of the metric can be used to guide the users.

@alculquicondor
Copy link
Member

I agree that e2e is rather confusing, so +1 to removing the prefix. The metric doesn't include retries, for example.

We have a separate metric that includes retries:

Name: "pod_scheduling_duration_seconds",

We need to rename that one eventually to e2e, but it can only be done after the existing one has been deprecated and deleted for a few releases.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 28, 2021
@rezakrimi
Copy link
Contributor

/assign

@erain
Copy link
Contributor

erain commented Oct 29, 2021

/cc

@rezakrimi
Copy link
Contributor

The PR for this is ready to be reviewed: #105941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants