Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeController should add NoSchedule taints and we should get rid of getNodeConditionPredicate() #42001

Closed
davidopp opened this Issue Feb 23, 2017 · 31 comments

Comments

Projects
None yet
@davidopp
Copy link
Member

davidopp commented Feb 23, 2017

getNodeConditionPredicate() in plugin/pkg/scheduler/factory/factory.go makes our code hard to understand because it hides the node-condition-based filtering in the node lister, which is totally non-obvious.

We should get rid of this function and have NodeController add NoSchedule taints for these situations instead. (Alas, we might not actually be able to get rid of getNodeConditionPredicate() completely until we get rid of the Unschedulable field of PodSpec, but at least we can get rid of all the other code in this function.) If there are pods that should still be able to schedule in any of these situations (e.g. DaemonSet pods?) we should add tolerations in admission control for them (e.g. see pkg/controller/daemon/daemoncontroller.go).

cc/ @kubernetes/sig-scheduling-misc @kubernetes/sig-cluster-lifecycle-misc
cc/ @gmarek @kevin-wangzefeng

Sub-tasks according to the design doc:

  • Add node taints label and feature flag (#49547)
  • In the node controller, taint Nodes according to the Node Condition (#49257)
  • In DaemonSet, update DaemonSetController to Tolerant new Taints (#50186)
  • In admissionController, add MemoryPressure/DiskPressure toleration for no BestEffort pod (#50180)
  • In scheduler, disable the current behaviour of filtering out Nodes. Instead, pods will not be scheduled to tainted nodes if a toleration does not exist in its PodSpec (#50185)
  • Add e2e test for this feature (#52723)
  • Update doc for this feature (kubernetes/website#5352)
@resouer

This comment has been minimized.

Copy link
Member

resouer commented Feb 25, 2017

until we get rid of the Unschedulable field of PodSpec

I think you are referring Unschedulable field of NodeSpec?

@resouer resouer self-assigned this Feb 25, 2017

@davidopp

This comment has been minimized.

Copy link
Member Author

davidopp commented Feb 25, 2017

I think you are referring Unschedulable field of NodeSpec?

Yes sorry, that was a typo.

@gmarek

This comment has been minimized.

Copy link
Member

gmarek commented Feb 27, 2017

@resouer - please CC me to all PRs

@bgrant0607

This comment has been minimized.

Copy link
Member

bgrant0607 commented Mar 31, 2017

See also #29178

@davidopp

This comment has been minimized.

Copy link
Member Author

davidopp commented Apr 9, 2017

Note that memory and disk pressure are checked in separate predicates, not in the NodeConditionPredicate. These should be represented as NoSchedule taints too.

func CheckNodeMemoryPressurePredicate(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {

@davidopp

This comment has been minimized.

Copy link
Member Author

davidopp commented Jun 22, 2017

Setting NoSchedule taints corresponding to the node conditions that today cause pods not to schedule onto a node, should also allow us to use tolerations rather than special-case code in the DaemonSet controller (and thereby make it easier to have DaemonSet pods be scheduled by the default scheduler.)

@davidopp

This comment has been minimized.

Copy link
Member Author

davidopp commented Jun 26, 2017

ref/ #42002

@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Jul 18, 2017

Sub-tasks according to the design doc, I'll create PR for them accordingly:

The task list was moved to the issue description here.

@wanghaoran1988

This comment has been minimized.

Copy link
Member

wanghaoran1988 commented Jul 28, 2017

/cc @mdshuai

@resouer resouer removed their assignment Aug 1, 2017

k8s-github-robot pushed a commit that referenced this issue Aug 2, 2017

Kubernetes Submit Queue
Merge pull request #49870 from k82cn/nc_rename_zone_tainer
Automatic merge from submit-queue (batch tested with PRs 49870, 49416, 49872, 49892, 49908)

Renamed zoneNotReadyOrUnreachableTainer to zoneNoExecuteTainer.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: partially fixes #42001 

**Release note**:

```release-note
None
```
@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Aug 2, 2017

/reopen

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Aug 2, 2017

@k82cn: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Aug 2, 2017

/assign

@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Aug 4, 2017

@yastij , sure, thanks very much :).

@yastij

This comment has been minimized.

Copy link
Member

yastij commented Aug 4, 2017

@k82cn @gmarek - just saw that you moved node conditions into predicate func, once adding node taiting and the according tolerations to dsc, do we'll still need to have node conditions checking ? (since the we're tainting by node condition).

Also, correct me if I'm wrong, but from what I understood: dsc tolerates new taints with effect noExec since we don't wanna evict any already-running pods ?

@gmarek

This comment has been minimized.

Copy link
Member

gmarek commented Aug 4, 2017

We won't, but it'll take time for us to get there. We'll roll out Tainting as alpha (obviously), later move it to beta and GA in 1.10 at the earliest. We can't remove Condition checking logic before that.

dsc is a daemon set right? New taints are NoSchedule only, so running Pods won't be affected (at least on the control plane level). Kubelet can still decide to evict them based on problem it observes.

@yastij

This comment has been minimized.

Copy link
Member

yastij commented Aug 4, 2017

@gmarek - seems logic for the timeline.

From @k82cn's sub-tasks

In DaemonSet, update DaemonSetController to Tolerant new Taints

Since new Taints are NoSchedule Only, Daemon set shouldn't have toleration for these ?

It should instead respect the new taints when applied and not schedule on these nodes, or I'm I missing something ? @lukaszo

@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Aug 4, 2017

The task for DS in my mind:

  1. For all DS pods, it should tolerant MemoryPressure & DiskPressure
  2. For critical DS pods, it should tolerant OutOfDisk

And if the taint by condition feature is enabled, we should check OutOfDisk by taints instead of predicates; the refactor is trying to make this simpler :).

@yastij

This comment has been minimized.

Copy link
Member

yastij commented Aug 4, 2017

@k82cn - You're right, I'll send a PR after yours get merged :)

@jamiehannaford

This comment has been minimized.

Copy link
Member

jamiehannaford commented Aug 4, 2017

@k82cn Do we have a to do list so folks can help out with kubernetes/community#819? I'd like to help when I get some time.

@gmarek I have a question about timeline. If alpha makes it into v1.8, will I be able to schedule to NotReady nodes (no CNI) if I reference the alpha tolerations? Or do you mean it won't be functional at all until v1.10?

@gmarek

This comment has been minimized.

Copy link
Member

gmarek commented Aug 4, 2017

It means you'll need to flip feature gate flag for it to work.

k8s-github-robot pushed a commit that referenced this issue Aug 4, 2017

Kubernetes Submit Queue
Merge pull request #49547 from k82cn/k8s_42001_0
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)

Task 0: Added node taints labels and feature flags

**What this PR does / why we need it**:
Added node taint const for node condition.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001

**Release note**:
```release-note
None
```
@k82cn

This comment has been minimized.

Copy link
Member

k82cn commented Aug 4, 2017

@jamiehannaford , here is the task list: #42001 (comment)

I'm working on related PRs for them; please ping me when you got time :).

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Aug 9, 2017

@k82cn - i anticipate we will also add a CPUPressure condition which will also mean that no BestEffort pod can be scheduled to that node if the kubelet is running with a static cpu assignment policy.

/cc @sjenning

@sjenning sjenning referenced this issue Aug 9, 2017

Closed

CPU manager phase 1 #49186

4 of 6 tasks complete

k8s-github-robot pushed a commit that referenced this issue Aug 11, 2017

Kubernetes Submit Queue
Merge pull request #50186 from k82cn/k8s_42001-4
Automatic merge from submit-queue

Task 2: Added toleration to DaemonSet pods for node condition taints

**What this PR does / why we need it**:
If TaintByCondition was enabled, added toleration to DaemonSet pods for node condition taints.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
None
```

k8s-github-robot pushed a commit that referenced this issue Aug 14, 2017

Kubernetes Submit Queue
Merge pull request #50180 from k82cn/k8s_42001-2
Automatic merge from submit-queue

Task 3: Add MemoryPressure toleration for no BestEffort pod.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
After 1.8, admission controller will add 'MemoryPressure' toleration to Guaranteed and Burstable pods.
```

hh pushed a commit to ii/kubernetes that referenced this issue Aug 30, 2017

Kubernetes Submit Queue
Merge pull request kubernetes#50185 from k82cn/k8s_42001-3
Automatic merge from submit-queue (batch tested with PRs 51228, 50185, 50940, 51544, 51543)

Task 4: Ignored node condition predicates if TaintsByCondition enabled.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of kubernetes#42001 

**Release note**:
```release-note
None
```

k8s-github-robot pushed a commit that referenced this issue Sep 1, 2017

Kubernetes Submit Queue
Merge pull request #49257 from k82cn/k8s_42001
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)

Task 1: Tainted node by condition.

**What this PR does / why we need it**:
Tainted node by condition for MemoryPressure, OutOfDisk and so on.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
Tainted nodes by conditions as following:
  * 'node.kubernetes.io/network-unavailable=:NoSchedule' if NetworkUnavailable is true
  * 'node.kubernetes.io/disk-pressure=:NoSchedule' if DiskPressure is true
  * 'node.kubernetes.io/memory-pressure=:NoSchedule' if MemoryPressure is true
  * 'node.kubernetes.io/out-of-disk=:NoSchedule' if OutOfDisk is true
```

k8s-github-robot pushed a commit that referenced this issue Oct 3, 2017

Kubernetes Submit Queue
Merge pull request #52723 from k82cn/k8s_42001_5
Automatic merge from submit-queue (batch tested with PRs 52723, 53271). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Apply algorithm in scheduler by feature gates.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001

**Release note**:
```release-note
Apply algorithm in scheduler by feature gates.
```

k8s-github-robot pushed a commit that referenced this issue Oct 6, 2017

Kubernetes Submit Queue
Merge pull request #53184 from k82cn/k8s_42001_6
Automatic merge from submit-queue (batch tested with PRs 53278, 53184). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Added integration test for TaintNodeByCondition.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:

```release-note
Added integration test for TaintNodeByCondition.
```
@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Jan 6, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@gmarek

This comment has been minimized.

Copy link
Member

gmarek commented Jan 9, 2018

I think it's done (as alpha). @davidopp - do we want to keep this issue open?

@bsalamat

This comment has been minimized.

Copy link
Contributor

bsalamat commented Jan 12, 2018

We can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.