Allow priority class values to be updated. #99205

ravisantoshgudimetla · 2021-02-18T15:50:57Z

What would you like to be added:

I wish to revisit the topic of priority classes being immutable. We made priority classes immutable so that once a priority class is created, we will not be able to update the name or value because of the following reasons:

-  May remove config portability: pod specs written for one cluster are no longer guaranteed to work on a different cluster if the same priority classes do not exist in the second cluster.
-  If quota is specified for existing priority classes (at the time of this writing, we don’t have this feature in Kubernetes), adding or deleting priority classes will require reconfiguration of quota allocations.
-  An existing pods may have an integer value of priority that does not reflect the current value of its PriorityClass.

While I agree with the first 2 points on changing priority class names, I feel that 3rd point of changing priority class values can be re-visited, considering we noticed a lot of people are anyways deleting and re-creating priority classes just to update the value and the updated priority class value is not reflected for existing pods.

Why is this needed:

While the deletion and re-creation seems like lesser of both the evils, our experience says that the deletion and re-creation of priority classes is more dangerous and error prone, considering people might delete another priority class, instead of doing just an update where an update can be rejected if there is a name mismatch or value update.

Ref: https://stupefied-goodall-e282f7.netlify.app/contributors/design-proposals/scheduling/pod-priority-api/#modifying-priority-classes

/sig scheduling

The text was updated successfully, but these errors were encountered:

k8s-ci-robot · 2021-02-18T15:51:03Z

@ravisantoshgudimetla: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ravisantoshgudimetla · 2021-02-18T16:00:26Z

cc @wking @damemi @soltysh @ingvagabund

wking · 2021-02-18T17:24:27Z

May remove config portability: pod specs written for one cluster are no longer guaranteed to work on a different cluster if the same priority classes do not exist in the second cluster.

I don't see how in-cluster guards around PriorityClass manipulation can have any bearing on whether Pod specs are portable between clusters. An admin could set up an admission webhook or RBAC that blocks the removal of a set of well-known PriorityClass names to ensure that Pods consuming those names are portable, but that's not something you can build into a cluster by default outside of a few, baked-in systemPriorityClasses entries.

If quota is specified for existing priority classes (at the time of this writing, we don’t have this feature in Kubernetes), adding or deleting priority classes will require reconfiguration of quota allocations.

This sounds like an argument for allowing value updates instead of requiring delete/recreate. Am I missing something?

And my main concern over delete/recreate isn't "might accidentally delete or recreate the wrong PriorityClass" as much as the following flow:

Admin deletes PriorityClass A.
Pod consuming PriorityClass A is created.
Pod blows up, because there is no PriorityClass A.
Admin recreates PriorityClass A.
Pod schedules successfully on retry, now that PriorityClass A exists again.

By comparison, an in-place value update removes the racy gap between 1 and 4, making the change atomic to the updating admin, and as atomic as the server-side wants to make it on the control-plane side.

damemi · 2021-02-18T19:50:52Z

I agree with Ravi on the 3rd point, allowing updates directly is safer and in my opinion the more expected outcome. In other words, right now if I deleted and recreated a priority class the average user would probably just expect pods using that priority class to use the new value, which is not the case.

wking · 2021-02-19T03:57:29Z

...the average user would probably just expect pods using that priority class to use the new value...

I think it's important to manage this expectation via godocs on value and other places we doc priority classes. Because allowing UPDATE on PriorityClass's value seems orthogonal to addressing priority value updates on existing pods.

ingvagabund · 2021-02-19T13:59:25Z

What will ensure all pods with old value of priority class will get eventually recreated? Keeping some portion of pods with old priority and the rest with a new one might e.g. make descheduler stop/start evicting pods which were/were not supposed to be evicted.

alculquicondor · 2021-02-19T14:38:42Z

What needs to be properly documented is that changing the priority value of a priority class doesn't change the priority of running or pending pods. This has implications for unscheduled pods, but more importantly for eviction by kubelet.
Other than that, I think the request is reasonable. Specially given than a similar effect can be achieved by recreation.

I don't think this requires a KEP, but definitely an API review and a notice to sig node.

alculquicondor · 2021-02-19T14:39:07Z

cc @Huang-Wei

Huang-Wei · 2021-02-19T19:45:08Z

Given that we don't change the semantics that how a recreated/updated PriorityClass impacts existing/upcoming pods, this proposal looks good to me.

Just one concern: for some integrators (not default scheduler), they may rely on PriorityClass deletion to trigger some customized behavior (like recreating/updating pending pods). This proposal may get them impacted.

ravisantoshgudimetla · 2021-02-19T23:08:51Z

Just one concern: for some integrators (not default scheduler), they may rely on PriorityClass deletion to trigger some customized behavior (like recreating/updating pending pods). This proposal may get them impacted.

I agree and this is concern I share too. If all of us agree on the above approach, I can send a notice to kubernetes-dev/ & sig-scheduling mailing list and get feedback. We can also go with the approach of having a feature flag and disable this feature before making it a default choice.

alculquicondor · 2021-02-22T15:25:15Z

Sounds like starting a KEP is the best next step

ravisantoshgudimetla · 2021-02-24T01:16:18Z

I am fine opening a KEP. I hope it is general consensus.

alculquicondor · 2021-02-24T14:18:59Z

cc @ahg-g

ravisantoshgudimetla · 2021-02-25T18:36:33Z

@liggitt - A question for you, would it be ok to allow updates to objects that were immutable in previous releases especially if we were achieving the same result by deleting and re-creating the object

liggitt · 2021-02-25T22:33:28Z

Allowing a previously immutable value to be mutable can be reasonable as long as the things consuming that value are well understood and react to changes correctly. IIUC, about the only in-tree actor that makes use of this value is the admission plugin, which pulls from an informer, so would use the value for future pods, right?

ravisantoshgudimetla · 2021-02-26T00:28:35Z

IIUC, about the only in-tree actor that makes use of this value is the admission plugin, which pulls from an informer, so would use the value for future pods, right?

That is correct, the priority admission plugin intercepts the pod creation request and translates priority class to priority value. For the pods that are already running, the priority value would still point to old one, say if priority value has been modified or the priority class has been deleted and re-created.

alculquicondor · 2021-02-26T14:15:27Z

It sounds like you could proceed. I would still advice a KEP, because of the production readiness questionnaire and to have the change properly documented. Note that this means that you would have to do this next release, as enhancements freeze already happened.

Unless people think a KEP is an overkill for this.

ravisantoshgudimetla · 2021-03-01T19:29:57Z

I think KEP may be an overkill considering we were doing the same with deletion and re-creation pattern but I think it atleast warrants an e-mail to devel community and docs update.

soltysh · 2021-03-02T12:47:19Z

This is changing a stable API, so PRR and a proper documentation is in place, imo.

fejta-bot · 2021-05-31T13:42:18Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

alculquicondor · 2021-05-31T15:33:06Z

@ravisantoshgudimetla do you plan to work on a KEP for this for 1.23?

fejta-bot · 2021-06-30T16:16:20Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

denkensk · 2022-02-16T03:08:24Z

/reopen
/remove-lifecycle rotten

k8s-ci-robot · 2022-02-16T03:08:42Z

@denkensk: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-triage-robot · 2022-05-17T03:39:35Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-06-16T04:27:41Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2022-07-16T04:58:08Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-07-16T04:58:25Z

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen

Mark this issue or PR as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ghost · 2022-09-19T07:52:38Z

Hi, for now, is it still not able to use patch_priority_class() to update the value of priority class and apply it automatically to running pods?

alculquicondor · 2022-09-19T13:21:54Z

Correct. There are not enough contributors to work on this.

denkensk · 2022-09-19T13:50:12Z

/reopen
/remove-lifecycle rotten
/assign
I think I can try to move it forward. 😄

k8s-ci-robot · 2022-09-19T13:50:16Z

@denkensk: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten
/assign
I think I can try to move it forward. 😄

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-triage-robot · 2022-12-18T14:39:05Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

alculquicondor · 2022-12-19T15:44:37Z

/help

k8s-ci-robot · 2022-12-19T15:44:38Z

@alculquicondor:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alculquicondor · 2022-12-19T15:44:44Z

/lifecycle stale

wackxu · 2023-07-20T01:46:00Z

I want to help fix it
/assign

ravisantoshgudimetla added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2021

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 18, 2021

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 18, 2021

wking mentioned this issue Feb 18, 2021

Expose HighestUserDefinablePriority via scheduler config #99203

Closed

ravisantoshgudimetla mentioned this issue Feb 24, 2021

Allow priority classes to be updated openshift/kubernetes#575

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 30, 2021

k8s-ci-robot reopened this Feb 16, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 16, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2022

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 16, 2022

k8s-ci-robot closed this as completed Jul 16, 2022

k8s-ci-robot assigned denkensk Sep 19, 2022

k8s-ci-robot reopened this Sep 19, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 19, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2022

k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 19, 2022

k8s-ci-robot assigned wackxu Jul 20, 2023

This was referenced Jul 20, 2023

Allow mutating priority classes kubernetes/enhancements#4129

Closed

Allow priority classes value to be updated #119458

Closed

Allow priority class values to be updated kubernetes/enhancements#4133

Closed

Allow priority class values to be updated. #99205

Allow priority class values to be updated. #99205

Comments

ravisantoshgudimetla commented Feb 18, 2021

What would you like to be added:

Why is this needed:

k8s-ci-robot commented Feb 18, 2021

ravisantoshgudimetla commented Feb 18, 2021 • edited

wking commented Feb 18, 2021

damemi commented Feb 18, 2021

wking commented Feb 19, 2021

ingvagabund commented Feb 19, 2021

alculquicondor commented Feb 19, 2021

alculquicondor commented Feb 19, 2021

Huang-Wei commented Feb 19, 2021

ravisantoshgudimetla commented Feb 19, 2021

alculquicondor commented Feb 22, 2021

ravisantoshgudimetla commented Feb 24, 2021

alculquicondor commented Feb 24, 2021

ravisantoshgudimetla commented Feb 25, 2021

liggitt commented Feb 25, 2021

ravisantoshgudimetla commented Feb 26, 2021

alculquicondor commented Feb 26, 2021 • edited

ravisantoshgudimetla commented Mar 1, 2021

soltysh commented Mar 2, 2021

fejta-bot commented May 31, 2021

alculquicondor commented May 31, 2021

fejta-bot commented Jun 30, 2021

denkensk commented Feb 16, 2022

k8s-ci-robot commented Feb 16, 2022

k8s-triage-robot commented May 17, 2022

k8s-triage-robot commented Jun 16, 2022

k8s-triage-robot commented Jul 16, 2022

k8s-ci-robot commented Jul 16, 2022

ghost commented Sep 19, 2022

alculquicondor commented Sep 19, 2022

denkensk commented Sep 19, 2022

k8s-ci-robot commented Sep 19, 2022

k8s-triage-robot commented Dec 18, 2022

alculquicondor commented Dec 19, 2022

k8s-ci-robot commented Dec 19, 2022

Guidelines

alculquicondor commented Dec 19, 2022

wackxu commented Jul 20, 2023

ravisantoshgudimetla commented Feb 18, 2021 •

edited

alculquicondor commented Feb 26, 2021 •

edited