New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical pod priorityClass addition #58835

Merged
merged 2 commits into from Feb 23, 2018

Conversation

@ravisantoshgudimetla
Contributor

ravisantoshgudimetla commented Jan 25, 2018

What this PR does / why we need it:
@bsalamat - Apologies for the delay. This PR is to ensure that all pods with priorityClassName system-node-critical and system-cluster-critical will be critical pods while preserving backwards compatibility.

Special notes for your reviewer:

  • Moved some constants and other data structures to scheduler/api/types.go where other constants are present.
  • An automatic assignment of critical priorities to pods based on critical pod annotation for backwards compatibility including some unit tests.
    xref: #57471

Release note:

Critical pods to use priorityClasses.
@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Jan 25, 2018

/cc @bsalamat

@k8s-ci-robot k8s-ci-robot requested a review from bsalamat Jan 25, 2018

@ravisantoshgudimetla ravisantoshgudimetla changed the title from Critical pod priorityClass addition to [WIP]Critical pod priorityClass addition Jan 25, 2018

@bsalamat

Thanks, @ravisantoshgudimetla! Looks very good. Just a couple of minor comments.

if ns != kubeapi.NamespaceSystem {
return false
}
if _, ok := schedulerapi.SystemPriorityClasses[priorityClassName]; ok {

This comment has been minimized.

@bsalamat

bsalamat Jan 25, 2018

Contributor

In the design of Priority, we decided that internal components of K8s shouldn't care about the PriorityClassName and should always look at the value of priority. So, please change the function to receive the integer value of priority (PodSpec.Priority) and change this condition to:

if podPriority >= schedulerapi.SystemCriticalPriority {

This comment has been minimized.

@ravisantoshgudimetla
@@ -91,9 +92,9 @@ func TestPriorityClassAdmission(t *testing.T) {
Kind: "PriorityClass",
},
ObjectMeta: metav1.ObjectMeta{
Name: "system-cluster-critical",
Name: "schedulerapi.SystemClusterCritical",

This comment has been minimized.

@bsalamat

bsalamat Jan 25, 2018

Contributor

Was this an unintended "replace all"?

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Jan 26, 2018

Contributor

Yes it is. Good catch. Thanks.

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Jan 26, 2018

RE e2e tests, since priority and preemption is an alpha feature, if you intend to add e2e test to CI, you will have to do something like this PR. And, here is how you define the feature in the tests.

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Jan 26, 2018

RE e2e tests, since priority and preemption is an alpha feature, if you intend to add e2e test to CI, you will have to do something like this PR. And, here is how you define the feature in the tests.

Thanks. My question is more on the lines of how to in enable CI. I already have e2e test case added to preemption.go which you can find in the commit now. PTAL

return IsCritical(pod.Namespace, pod.Annotations)
if IsCritical(pod.Namespace, pod.Annotations) {
return true
} else if pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority) {

This comment has been minimized.

@bsalamat

bsalamat Jan 27, 2018

Contributor

nit: You don't need else here. Also you can write the whole condition as:

return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && v1helper.IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority))

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Jan 27, 2018

Contributor

Done. The comment about return statement seems fine but I think having a one liner may cause readability problems.

This comment has been minimized.

@bsalamat

bsalamat Jan 27, 2018

Contributor

I don't want to be too pedantic, but isn't it effectively the same one line now? ;-)

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Jan 27, 2018

Contributor

Yes it is :). I initially wanted to have one more nested if instead of && but thought it will be difficult to read as well 😃

This comment has been minimized.

@bsalamat

bsalamat Jan 27, 2018

Contributor

I see. 😃 I would still prefer the return to directly return the condition without the if statement, but it is up to you.

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Jan 27, 2018

Contributor

Oops. Sorry, I thought I pushed the one liner that you wanted earlier. Just updated it now.

// This test verifies that when a critical pod is created and no node with
// enough resources is found, scheduler preempts a lower priority pod to schedule
// this critical pod.
It("validates lower pod preemption by critical pod", func() {

This comment has been minimized.

@bsalamat

bsalamat Jan 27, 2018

Contributor

s/lower pod/lower priority pod/

This comment has been minimized.

@ravisantoshgudimetla
@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Jan 27, 2018

@bsalamat Thanks for the review. I addressed your comments. PTAL.

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Jan 27, 2018

/lgtm

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Jan 27, 2018

@ravisantoshgudimetla ravisantoshgudimetla changed the title from [WIP]Critical pod priorityClass addition to Critical pod priorityClass addition Jan 28, 2018

@@ -143,11 +144,12 @@ func (sp SyncPodType) String() string {
// key. Both the rescheduler and the kubelet use this key to make admission
// and scheduling decisions.
func IsCriticalPod(pod *v1.Pod) bool {

This comment has been minimized.

@yujuhong

yujuhong Jan 29, 2018

Contributor

nit: update the comments

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Jan 29, 2018

Contributor

Ack. Done.

@bsalamat

This comment has been minimized.

Contributor

bsalamat commented Jan 29, 2018

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 12, 2018

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Feb 13, 2018

/retest

"k8s.io/kubernetes/pkg/apis/core/helper"
schedulerapi "k8s.io/kubernetes/pkg/scheduler/api"

This comment has been minimized.

@deads2k

deads2k Feb 16, 2018

Contributor

The direction of this dependency looks backwards. Seems odd for apis/core/v1 to depend on the scheduler.

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Feb 20, 2018

Contributor

Thanks for clarification @deads2k. I have moved the fn's around to remove this dependency.

@@ -436,3 +438,15 @@ func StorageNodeAffinityToAlphaAnnotation(annotations map[string]string, affinit
annotations[v1.AlphaStorageNodeAffinityAnnotation] = string(json)
return nil
}
// IsCriticalPodBasedOnPriority checks if the given pod is a critical pod based on priority resolved from pod Spec.
func IsCriticalPodBasedOnPriority(ns string, priority int32) bool {

This comment has been minimized.

@deads2k

deads2k Feb 16, 2018

Contributor

This is only called from one spot. Move it to the point of use.

This comment has been minimized.

@sttts

sttts Feb 19, 2018

Contributor

Agree. SystemCriticalPriority belongs here or the helper into the scheduler.

@@ -7,6 +7,7 @@
- k8s.io/kubernetes/pkg/fieldpath
- k8s.io/kubernetes/pkg/util
- k8s.io/api/core/v1
- k8s.io/kubernetes/pkg/scheduler/api

This comment has been minimized.

@deads2k

deads2k Feb 16, 2018

Contributor

This dependency looks bad.

@sttts Caught one!

@k8s-merge-robot k8s-merge-robot removed the lgtm label Feb 20, 2018

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Feb 20, 2018

@bsalamat I have updated the PR as per @deads2k's comments. Now the IsCriticalPodBasedOnPriority helper fn is in pod_update.go instead of v1helper. PTAL.

@sjenning

This comment has been minimized.

Contributor

sjenning commented Feb 20, 2018

reapplying
/lgtm
/assign derekwaynecarr

@@ -166,6 +155,11 @@ func (p *PriorityPlugin) admitPod(a admission.Attributes) error {
}
if utilfeature.DefaultFeatureGate.Enabled(features.PodPriority) {
var priority int32
// TODO: @ravig - This is for backwards compatibility to ensure that critical pods with annotations just work fine.
// Remove when no longer needed.
if len(pod.Spec.PriorityClassName) == 0 && kubelettypes.IsCritical(pod.Namespace, pod.Annotations) {

This comment has been minimized.

@deads2k

deads2k Feb 21, 2018

Contributor

You can't trust the pod.Namespace here. The user could have lied in the object he is creating. You need to check the attributes namespace.

This comment has been minimized.

@deads2k

deads2k Feb 21, 2018

Contributor

You can't trust the pod.Namespace here. The user could have lied in the object he is creating. You need to check the attributes namespace.

Since this is commonly confused: we cannot ensure the well-formedness of the object's namespace before mutation admission because a mutating admission plugin can change it. We can only confirm and/or force the namespace after mutation is completed. Another admission plugin could later "make valid" this field (though they shouldn't) and you'd end up having set this for that user.

@kubernetes/sig-api-machinery-bugs an edge we care about? I suppose it is rather unlikely....

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Feb 21, 2018

Contributor

Thanks for the awesome explanation @deads2k. I have changed the code to get namepsace using a.GetNamespace(). As a follow up, since the mutation is not specific to pod's namespace could other subresources of pod like Annotations, Spec, tolerations etc need to be addressed the same way using a.GetSubResource()

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Feb 21, 2018

/retest

flake #59426

@derekwaynecarr

I would like the annotation to not have meaning in clusters that did not previously enable the critical pod annotation feature.

SystemCriticalPriority = 2 * HighestUserDefinablePriority
// NOTE: In order to avoid conflict of names with user-defined priority classes, all the names must
// start with scheduling.SystemPriorityClassPrefix which is by default "system-".
SystemClusterCritical = "system-cluster-critical"

This comment has been minimized.

@derekwaynecarr

derekwaynecarr Feb 21, 2018

Member

Are these priorities defined in the API storage? If not, why not? IMHO this should work just like namespaces, where the default namespace and kube-system namespace are always in storage, and therefore there is no special magic namespaces. I am worried that this is blessing magic priorities.

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Feb 22, 2018

Contributor

I have created an issue to track this - #60178

func IsCriticalPod(pod *v1.Pod) bool {
return IsCritical(pod.Namespace, pod.Annotations)
return IsCritical(pod.Namespace, pod.Annotations) || (pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(pod.Namespace, *pod.Spec.Priority))

This comment has been minimized.

@derekwaynecarr

derekwaynecarr Feb 21, 2018

Member

I think the IsCriticalPod check should still gate respecting the annotation by enabling the ExperimentalCriticalPodAnnotation feature.

This check right now would make any pod or workload API pod template that had this annotation in a cluster that did not enable this feature prior suddenly elevate the priority of the workload. I think this is a mistake and would represent a surprise during upgrades (since prior clusters that did not enable this feature never respected the annotation).

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Feb 22, 2018

Contributor

IsCriticalPod fn predates my changes, I did a quick search in the codebase where this fn is being called from and in every place there is featuregate enabled for ExperimentalCriticalPodAnnotation except for https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L233. I need to look further why the featuregating is not needed there. While the approach may not be elegant, it still does the job of gate respecting.

But the point you raised is valid from priority admission controller standpoint, where we are calling IsCritical fn. I have updated the PR to include a check for ExperimentalCriticalPodAnnotation, PTAL.

This comment has been minimized.

@dashpole

dashpole Feb 22, 2018

Contributor

The feature gate is also checked there, just at the entry point function for the package: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/preemption/preemption.go#L67. The call will always exit if the feature gate is disabled.

This comment has been minimized.

@derekwaynecarr

derekwaynecarr Feb 23, 2018

Member

thank you for the clarification. its a public method so i had thought centralizing the feature gate would have been preferred here as well.

@ravisantoshgudimetla

This comment has been minimized.

Contributor

ravisantoshgudimetla commented Feb 22, 2018

/retest

@derekwaynecarr

This comment has been minimized.

Member

derekwaynecarr commented Feb 23, 2018

/approve

@sjenning

This comment has been minimized.

Contributor

sjenning commented Feb 23, 2018

once more!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 23, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, derekwaynecarr, ravisantoshgudimetla, sjenning, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 23, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 23, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 755ab97 into kubernetes:master Feb 23, 2018

12 of 13 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-gce
Details
cla/linuxfoundation ravisantoshgudimetla authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@ravisantoshgudimetla ravisantoshgudimetla deleted the ravisantoshgudimetla:critical-pod-with-priority branch Feb 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment