
support NoSchedule taints correctly in DaemonSet controller #48189

Merged
3 commits merged into kubernetes:master on Jul 3, 2017

Conversation

mikedanese (Member) commented Jun 28, 2017

Fixes #48190

Support NoSchedule taints correctly in DaemonSet controller.

cc @kubernetes/sig-apps-pr-reviews

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 28, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 28, 2017
@kubernetes kubernetes deleted a comment from k8s-github-robot Jun 28, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mikedanese
We suggest the following additional approver: timothysc

Assign the PR to them by writing /assign @timothysc in a comment when ready.

Associated issue: 48190

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@luxas luxas added this to the v1.7 milestone Jun 28, 2017
luxas (Member) commented Jun 28, 2017

cc @kubernetes/sig-apps-pr-reviews @kubernetes/sig-scheduling-pr-reviews @kubernetes/kubernetes-release-managers

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Jun 28, 2017
gmarek (Contributor) commented Jun 28, 2017

This is a regression in 1.6. We need to cherry-pick this there @enisoc @ethernetdan.
I believe this means that we also need it in 1.7 @dchen1107. Sadly, this surfaced only recently, when users started to complain about fluentd being evicted on tainted Nodes. The bad thing is that it's a non-trivial change, so it has non-zero risk...

// only emit this event if insufficient resource is the only thing
// preventing the daemon pod from scheduling
if shouldSchedule && insufficientResourceErr != nil {
dsc.eventRecorder.Eventf(ds, v1.EventTypeWarning, FailedPlacementReason, "failed to place pod on %q: %s", node.ObjectMeta.Name, insufficientResourceErr.Error())
Contributor:

How often do we retry in case of this Error?

Contributor:

It would be much clearer to determine that if the following return statement listed the actual variables it returns. It seems that err is nil at this point, so unless the controller needs to retry for a different error further down the code path, this won't be retried.

mikedanese (Member Author), Jun 28, 2017:

I think it's ok to not requeue on insufficientResourceErr as long as we are requeueing when something changes that would cause resources to be freed up. Regardless of whether something is doing that right now, this is not a change in behavior and not the bug I am trying to fix.

}
if !fit {
predicateFails = append(predicateFails, reasons...)
}

fit, _, err = predicates.PodToleratesNodeNoExecuteTaints(pod, nil, nodeInfo)
Contributor:

A question here: can you please add a comment explaining why the taints are checked twice, at both line 1190 and line 1198?

Contributor:

I see, this is mainly to check whether we want to evict the pod or not, but it seems a bit strange: PodToleratesNodeTaints above already checks both taints, and here we check the NoExecute taint separately.

I think that @janetkuo's proposal is better and would involve less code change.

@@ -1190,18 +1183,24 @@ func NewPod(ds *extensions.DaemonSet, nodeName string) *v1.Pod {

// Predicates checks if a DaemonSet's pod can be scheduled on a node using GeneralPredicates
// and PodToleratesNodeTaints predicate
func Predicates(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
func Predicates(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, bool, error) {
var predicateFails []algorithm.PredicateFailureReason
critical := utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(pod)

fit, reasons, err := predicates.PodToleratesNodeTaints(pod, nil, nodeInfo)
Member:

IMO, we should consider the Pod's status in PodToleratesNodeTaints: if the pod is already running, it should tolerate the node's NoSchedule effect (even if spec.Tolerations does not list it). Or we could create another func for that. And I think we need to check other code paths for this case.

@gmarek, WDYT?
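
For concreteness, here is a minimal, self-contained sketch of the status-aware check being floated here; the function names, the simplified toleration matching, and the import path are assumptions on my part, not code from this PR or the linked gist.

```go
package taintsketch

import v1 "k8s.io/api/core/v1" // 1.7-era trees used k8s.io/kubernetes/pkg/api/v1

// tolerates is a deliberately simplified toleration check (exact key/effect
// match only); the real scheduler helpers also handle operators, values,
// and wildcard keys.
func tolerates(tolerations []v1.Toleration, taint v1.Taint) bool {
	for _, t := range tolerations {
		if t.Key == taint.Key && (t.Effect == taint.Effect || t.Effect == "") {
			return true
		}
	}
	return false
}

// podToleratesNodeTaintsConsideringStatus is a hypothetical variant of
// PodToleratesNodeTaints: a pod that is already running is allowed to stay
// despite untolerated NoSchedule taints, while NoExecute still fails.
func podToleratesNodeTaintsConsideringStatus(pod *v1.Pod, node *v1.Node) bool {
	running := pod.Status.Phase == v1.PodRunning
	for _, taint := range node.Spec.Taints {
		if tolerates(pod.Spec.Tolerations, taint) {
			continue
		}
		switch taint.Effect {
		case v1.TaintEffectPreferNoSchedule:
			continue // soft preference, never a hard failure
		case v1.TaintEffectNoSchedule:
			if running {
				continue // already running; NoSchedule does not evict
			}
			return false
		default: // NoExecute (or future hard effects)
			return false
		}
	}
	return true
}
```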

Contributor:

Please bear with me for a few more days. I promise that I'll be more responsive after 1.7 is released...

Member:

sure, np :).

Member Author:

@k82cn I think it is a reasonable change for PodToleratesNodeTaints to take into account whether .spec.nodeName is set on the pod (which is what I consider to be the definition of scheduled, although others may disagree), but that change wouldn't help us here. The pod here always has a nodeName set when we call Predicates and never has a status. This could be made to work, but it would be a larger change than what I would want to cherry-pick.

Contributor:

TL;DR: it'd be too big a change to cherry-pick for 1.7.

OK, we spoke about this and it all depends on the definition of what 'scheduled' means. Here the Pod being checked is completely artificial, created specifically for this simulation, with NodeName already set. One can argue that the semantics of 'NoSchedule' should prevent NodeName from being set, i.e. the taint should be ignored if NodeName is set. But I'd propose another meaning of the word 'scheduled', i.e. 'the Node agent already knows about it'. Sadly, this would mean that the only thing one can be certain of is that a Pod without NodeName set is not scheduled; otherwise it can't be determined from the Pod object alone. Which means that PodToleratesNodeTaints needs to just return which Taints are possibly violated and depend on the caller to figure out what to do with it. Exactly as we're doing in this PR. Post 1.7 we can discuss this.
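
To make the caller-decides idea concrete, here is a small sketch of how a caller could map untolerated taints onto the DaemonSet controller's three decisions; the function name and shape are illustrative only, not the merged code.

```go
package taintsketch

import v1 "k8s.io/api/core/v1"

// flagsForUntoleratedTaints maps the taints a daemon pod does NOT tolerate
// onto the controller's three decisions. Hypothetical helper for illustration.
func flagsForUntoleratedTaints(untolerated []v1.Taint) (wantToRun, shouldSchedule, shouldContinueRunning bool) {
	wantToRun, shouldSchedule, shouldContinueRunning = true, true, true
	for _, taint := range untolerated {
		switch taint.Effect {
		case v1.TaintEffectNoExecute:
			// An untolerated NoExecute taint means the daemon pod must not
			// be placed on, or keep running on, this node.
			return false, false, false
		case v1.TaintEffectNoSchedule:
			// An untolerated NoSchedule taint blocks new placement, but an
			// already-running daemon pod may keep running.
			wantToRun, shouldSchedule = false, false
		}
		// PreferNoSchedule is a soft preference and changes nothing here.
	}
	return wantToRun, shouldSchedule, shouldContinueRunning
}
```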

Member:

@mikedanese, I mean pod.Status.Phase being v1.PodRunning. As @gmarek suggested, let's discuss this after 1.7.

Member:

Here's the draft diff (not tested yet) that I have in mind: https://gist.github.com/k82cn/3dbd398ef2b138fee6a4c19ec3d2ddcf

Contributor:

It won't be set to anything, because this Pod is an artificial one, created for the simulation.

Member:

Wow, yes. But we already get the pods of that node in nodeShouldRunDaemonPod, so we can keep their status.phase; I'll update the gist a bit.

Contributor:

But this would mean that everyone doing such a simulation needs to remember/know about it. Which is why I think it's not worth it: I believe it'd introduce more bugs than it would solve.

@mikedanese mikedanese changed the title support NoExecute and NoSchedule taints correctly in DaemonSet controller support NoSchedule taints correctly in DaemonSet controller Jun 28, 2017
janetkuo (Member) left a comment:

Can we do this without reverting the change (and keep all the tests)?

  1. Leave Predicates as is.
  2. Before calling hasIntentionalPredicatesReasons, check PodToleratesNodeNoExecuteTaints: if it doesn't fit, return false, false, false immediately.
  3. Move the predicates.ErrTaintsTolerationsNotMatch case out of hasIntentionalPredicatesReasons and down into the for range reasons loop, and have this case only set wantToRun and shouldSchedule to false (since we're sure it's not NoExecute). A rough sketch of this ordering follows below.
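
A rough, heavily abbreviated sketch of where those three steps would land inside nodeShouldRunDaemonPod, using the pre-change three-value Predicates signature; the variable names and surrounding control flow are assumptions, not the code that was merged.

```go
// Sketch only: janetkuo's suggested ordering inside nodeShouldRunDaemonPod.

// Step 2: bail out entirely on an untolerated NoExecute taint.
fit, _, err := predicates.PodToleratesNodeNoExecuteTaints(newPod, nil, nodeInfo)
if err != nil {
	return false, false, false, err
}
if !fit {
	return false, false, false, nil // don't run, don't schedule, don't keep running
}

// Step 1: Predicates itself stays as is and reports the remaining reasons.
_, reasons, err := Predicates(newPod, nodeInfo)
if err != nil {
	return false, false, false, err
}

// Step 3: a NoSchedule-style taint mismatch only blocks (re)placement;
// an existing daemon pod is allowed to keep running.
wantToRun, shouldSchedule, shouldContinueRunning := true, true, true
for _, r := range reasons {
	switch r {
	case predicates.ErrTaintsTolerationsNotMatch:
		wantToRun, shouldSchedule = false, false
	// ... other predicate failure reasons handled as before ...
	}
}
return wantToRun, shouldSchedule, shouldContinueRunning, nil
```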

…e intentional reasons."

This partially reverts commit 2b311fe.
We drop the changes to the DaemonSet controller but leave the test. By
reverting the changes, we make it easier to return different values of
shouldContinueRunning for intentional predicate failures, rather than
lumping all intentional predicate failures together. The test should
continue to pass after the fix.
mikedanese (Member Author) commented Jun 29, 2017

I reverted the changes to Predicates, but I don't like breaking up the switch. The function has a very clear contract (documented in a comment at the beginning, which I expanded on). Since hasIntentionalPredicatesReasons is not used anywhere else, what is the point of it? By extracting it, we are hiding the contract and reducing readability, which will lead to more bugs.

k82cn (Member) commented Jun 29, 2017

/test pull-kubernetes-kubemark-e2e-gce

gyliu513 (Contributor):

Since hasIntentionalPredicatesReasons is not used anywhere else, what is the point of it? By extracting it, we are hiding the contract and reducing readability, which will lead to more bugs.

I recall that the major concern was that the function nodeShouldRunDaemonPod is a large block; we want to make this function smaller.

mikedanese (Member Author):

I don't think breaking up the switch is the best way of doing that. It makes the logic very fragmented, IMO. We can't treat all intentional predicates the same, so hasIntentionalPredicatesReasons doesn't make sense. Moving the simulation out of the function would be better. It's a larger portion of the function and it's more self-contained.

gyliu513 (Contributor):

Moving the simulation out of the function would be better.

@mikedanese what do you mean by simulation? Can you elaborate? Perhaps I can follow up with a PR for it.

mikedanese (Member Author):

I'm referring to the section where we simulate what would happen if we were to add a new daemon pod to the node:

https://github.com/kubernetes/kubernetes/blob/v1.7.0/pkg/controller/daemon/daemoncontroller.go#L1089-L1136

That looks like it would cleanly extract to a function with signature:

func (dsc *DaemonSetsController) simulate(pod, node) (reasons, error)
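
One possible shape for that extraction, with the types filled in; the podsOnNode helper, the exact parameter list, and the scheduler-cache calls are assumptions here, and the follow-up PR may have done this differently.

```go
// Sketch of the proposed extraction; podsOnNode is a hypothetical helper
// that lists the pods currently assigned to the node.
func (dsc *DaemonSetsController) simulate(newPod *v1.Pod, node *v1.Node) ([]algorithm.PredicateFailureReason, error) {
	pods, err := dsc.podsOnNode(node) // hypothetical helper
	if err != nil {
		return nil, err
	}

	// Build the node's current state and check whether the candidate daemon
	// pod would pass the scheduling predicates on top of it.
	nodeInfo := schedulercache.NewNodeInfo(pods...)
	if err := nodeInfo.SetNode(node); err != nil {
		return nil, err
	}

	_, reasons, err := Predicates(newPod, nodeInfo)
	return reasons, err
}
```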

janetkuo (Member) left a comment:

I originally suggested doing this without reverting 2b311fe so that this change would be small and the 1.7 cherry-pick would be simpler. We can do code cleanup as a follow-up in master. However, reverting that commit makes cherry-picking to 1.6 easier, so I'm okay with it now.

@janetkuo janetkuo dismissed their stale review June 30, 2017 18:24

ok to revert

gyliu513 (Contributor) commented Jul 1, 2017

@mikedanese got it, will try to create a PR for this later.

gyliu513 (Contributor) commented Jul 1, 2017

@mikedanese a PR #48189 for the comments above.

@mikedanese mikedanese added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 3, 2017
mikedanese (Member Author):

/retest

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit eb2a560 into kubernetes:master Jul 3, 2017
gyliu513 added a commit to gyliu513/kubernetes that referenced this pull request Jul 5, 2017
k8s-github-robot pushed a commit that referenced this pull request Jul 5, 2017
Automatic merge from submit-queue

Factored out simulate from nodeShouldRunDaemonPod.

Addressed comments from #48189 (comment)



**Release note**:

```release-note
none
```

/sig apps
@caesarxuchao caesarxuchao added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 6, 2017
k8s-github-robot pushed a commit that referenced this pull request Jul 6, 2017
…8189-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #48189 upstream release 1.7

Cherry pick of #48189 on release-1.7.

Needed for #48190

```release-note
Support NoSchedule taints correctly in DaemonSet controller.
```
@k8s-cherrypick-bot:

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
