Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep backward compatibility for 'node.Spec.Unschedulable'. #68984

Merged
merged 1 commit into from
Sep 27, 2018

Conversation

k82cn
Copy link
Member

@k82cn k82cn commented Sep 23, 2018

Signed-off-by: Da K. Ma klaus1982.cn@gmail.com

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #68899

Release note:

Fixes issue [#68899](https://github.com/kubernetes/kubernetes/issues/68899) where pods might schedule on an unschedulable node.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2018
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2018
@k82cn
Copy link
Member Author

k82cn commented Sep 23, 2018

/kind bug
/priority critical-urgent
/sig scheduling
/milestone 1.12

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 23, 2018
@k8s-ci-robot
Copy link
Contributor

@k82cn: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.10, v1.11, v1.12, v1.13, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9]

Use /milestone clear to clear the milestone.

In response to this:

/kind bug
/priority critical-urgent
/sig scheduling
/milestone 1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 23, 2018
@k82cn
Copy link
Member Author

k82cn commented Sep 23, 2018

/milestone v1.12

@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Sep 23, 2018
@@ -1470,7 +1470,15 @@ func CheckNodeUnschedulablePredicate(pod *v1.Pod, meta algorithm.PredicateMetada
return false, []algorithm.PredicateFailureReason{ErrNodeUnknownCondition}, nil
}

if nodeInfo.Node().Spec.Unschedulable {
// If pod tolerate unschedulable taint, it's also tolerate node.Spec.Unschedulable.
unschedulable := !v1helper.TolerationsTolerateTaint(pod.Spec.Tolerations, &v1.Taint{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the var name makes this hard to follow, as does combining info about the node and info about the pod into a single variable below. would something like this be clearer?

podToleratesUnschedulable := v1helper.TolerationsTolerateTaint(...)`
if nodeInfo.Node().Spec.Unschedulable && !podToleratesUnschedulable {
  return false, []algorithm.PredicateFailureReason{ErrNodeUnschedulable}, nil
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Key: algorithm.TaintNodeUnschedulable,
Effect: v1.TaintEffectNoSchedule,
})
// TODO (k82cn): deprecates `node.Spec.Unschedulable` in 1.13.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unclear what is being deprecated here, or what action should be taken in 1.13

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update related doc to depreciate it.

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 23, 2018
Copy link
Member

@tpepper tpepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given all of the conversation in #68899 and the nature of the problem here, I think this needs a better commit message still and might even deserve a release note under "Known Issues" or SIGs Scheduling/Node to at least remind operators what's changed here and the taints and tolerations they need to be considering across updates in this version range.

@k82cn
Copy link
Member Author

k82cn commented Sep 23, 2018

I think this needs a better commit message still and might even deserve a release note

There's info in the description; no behaviour changed, so no release note here.

@tpepper
Copy link
Member

tpepper commented Sep 23, 2018

So you're saying there's definitely absolutely no way this issue is user visible with respect to cluster operator's existing configs or in their cluster users' pod specs? If so, that feels like an argument this is not release blocking and is just an artifact of a stale test case. Yet you're only touching pkg/scheduler in this PR and not the test case. This doesn't match up.

Plus we do have a bunch of "fixes issue where..." type release notes. It seems to me this would be totally appropriate: "Fixes issue #68899 where pod might schedule on an incorrectly tainted node. Future note: node.Spec.Unschedulable will be deprecated in 1.13 as per issue #XXXXXX."

Ie: you should open an issue now for milestone 1.13 for this deprecation.

@k82cn
Copy link
Member Author

k82cn commented Sep 23, 2018

Plus we do have a bunch of "fixes issue where..." type release notes.

Such kind of release node is ok to me.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 23, 2018
@krmayankk
Copy link

will kubectl cordon be implemented using unschedulable taint instead of node.Spec.Unschedulable ?

@tpepper
Copy link
Member

tpepper commented Sep 24, 2018

Plus we do have a bunch of "fixes issue where..." type release notes.

Such kind of release node is ok to me.

OK then we need confirmation this follows deprecation policy which I'm not sure is actually true, the relnote wording specified below, a decision on whether to put this is the Node or Scheduling notes section, and the deprecation issue opened against 1.13.

Possible relnote:

Fixes issue [#68899](https://github.com/kubernetes/kubernetes/issues/68899) where pods might schedule on an incorrectly tainted node. Deprecation notice: The alpha feature "Critical Pods" has been replaced and `node.Spec.Unschedulable` will be removed in 1.13 as per issue #XXXXXX.

@liggitt
Copy link
Member

liggitt commented Sep 24, 2018

The api field will definitely not be removed from the v1 api. It's not entirely clear to me what behavior can be deprecated in v1. If a client stops setting it and just tries to set the taint, the controller will remove the taint automatically, right?

If kubectl wanted to begin simultaneously setting the unscheduleable field and taint in cordon (and removing them both in uncordon), that could maybe be ok, but would need to think about that a little more.

@tpepper
Copy link
Member

tpepper commented Sep 24, 2018

I really want to see this PR finished off today so we get final confirmation on long running tests over the coming days head of 1.12. This is an important compatibility issue to resolve thoroughly.

@tpepper
Copy link
Member

tpepper commented Sep 24, 2018

...and this is the only PR I'm holding the 1.12 release on at this point.

@neolit123
Copy link
Member

looks like anyone can LGTM this at this point.
(it is approved, has labels and milestone)

are further changes required here?

@neolit123
Copy link
Member

/hold
just in case.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2018
@tpepper
Copy link
Member

tpepper commented Sep 24, 2018

Chatting with @bsalamat it seems I've misread the intent here. Clarifying based on that:

  • release note suggestion: Fixes issue [#68899](https://github.com/kubernetes/kubernetes/issues/68899) where pods might schedule on an incorrectly tainted node.
  • Bobby will open an issue to track a future deprecation.
  • Deprecation announcement would be targeted for 1.13, with actual removal then being at least a year out given this is v1 stable content.

@bsalamat
Copy link
Member

Thanks, @tpepper. Looks good to me, however, we can deprecate Node.Spec.Unschedulable only when our replacement (TaintNodeByCondition) is GA'ed. If that happens in 1.13 (which I doubt), we can deprecate Node.Spec.Unschedulable in 1.13. If not, we must wait for TaintNodeByCondition to be promoted to GA and then deprecate Node.Spec.Unschedulable.

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2018
@k82cn
Copy link
Member Author

k82cn commented Sep 25, 2018

will kubectl cordon be implemented using unschedulable taint instead of node.Spec.Unschedulable ?

Eventually, Yes; but we need to follow deprecation policy to do that.

If kubectl wanted to begin simultaneously setting the unscheduleable field and taint in cordon (and removing them both in uncordon),
that could maybe be ok, but would need to think about that a little more.

Agree, we need to think about it more; we can follow up this in #69010 .

for _, test := range testCases {
nodeInfo := schedulercache.NewNodeInfo()
nodeInfo.SetNode(test.node)
fit, _, err := CheckNodeUnschedulablePredicate(test.pod, nil, nodeInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the CheckNodeUnschedulablePredicate() has no callers at all, can you confirm about that?
if so, do we still need to update this func?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it by RegisterMandatoryFitPredicate in this PR.

@yujuhong
Copy link
Contributor

It's a bit unclear to me what "deprecation of the field" means here. We cannot remove the field from the api, so will it become a no-op a year from now? I don't think kubernete's deprecation policy explicitly states what to do in this case.

@k82cn
Copy link
Member Author

k82cn commented Sep 26, 2018

so will it become a no-op a year from now? I don't think kubernete's deprecation policy explicitly states what to do in this case.

According to deprecation policy, "API elements may only be removed by incrementing the version of the API group.". For now, we only need to highlight spec.Unschedulable will be removed in future; but no behaviour changes right now, or someone has better idea for that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failing Test : persistent-volume-upgrade [sig-storage]
9 participants