
Requirements for Affinity to graduate to Beta and then v1 #25319

Open
5 of 9 tasks
davidopp opened this issue May 8, 2016 · 45 comments
Labels
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/scalability: Categorizes an issue or PR as relevant to SIG Scalability.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@davidopp
Member

davidopp commented May 8, 2016

We'd like to graduate node and pod affinity to Beta in 1.4.

Beta

v1

  • Move Affinity to be a field in PodSpec (see the sketch below)
  • Implement RequiredDuringExecution
  • Incorporate feedback from users of Beta

Better solution for DoS issues? (or just rely on priority/preemption when we have it?) (see #18265 (comment))

cc/ @kevin-wangzefeng @bgrant0607 @wojtek-t

ref/ #18261 #18265 #24853 #22985 #19758
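
For context, a minimal sketch (simplified, not the authoritative API definition) of what the first v1 item, moving Affinity from the alpha annotation to a real PodSpec field, could look like in the Go API types:

```go
package v1sketch

// Simplified placeholders; in the real API these carry the required/preferred
// scheduling terms described in the design doc.
type NodeAffinity struct{}
type PodAffinity struct{}
type PodAntiAffinity struct{}

// Affinity groups the three affinity kinds under a single PodSpec field,
// replacing the scheduler.alpha.kubernetes.io/affinity annotation.
type Affinity struct {
	NodeAffinity    *NodeAffinity    `json:"nodeAffinity,omitempty"`
	PodAffinity     *PodAffinity     `json:"podAffinity,omitempty"`
	PodAntiAffinity *PodAntiAffinity `json:"podAntiAffinity,omitempty"`
}

// PodSpec fragment showing where the new field would live.
type PodSpec struct {
	// ... other PodSpec fields elided ...
	Affinity *Affinity `json:"affinity,omitempty"`
}
```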

@davidopp davidopp added this to the next-candidate milestone May 8, 2016
@adohe-zz

adohe-zz commented May 8, 2016

/cc

@timothysc
Member

timothysc commented Sep 12, 2016

/cc @jayunit100 @ConnorDoyle

@timothysc
Member

@rrati is starting on the list above for 1.5

@timothysc
Member

timothysc commented Oct 3, 2016

@rrati things to keep in mind re(alpha-beta) : #30819 (comment)

@rrati

rrati commented Oct 4, 2016

@davidopp @wojtek-t Do you know the state of the items listed above? Some of the issues appear to have been either resolved or have had work done on them.

"Before allowing users to use it, be sure we won't need to roll back to binary version that doesn't support it* - Does this have a defined work item? Is it just a confidence level?

Better solution for "first pod problem" - Has anyone started discussing an alternate plan?

@davidopp
Member Author

davidopp commented Oct 5, 2016

@rrati :

  • Change annotation key from scheduler.alpha.kubernetes.io/affinity to scheduler.beta.kubernetes.io/affinity: There's ongoing discussion about whether to use fields or annotations in beta, in general (if you want me to look up the issue number and kubernetes-dev thread I can, but you probably know what I'm talking about)
  • Better solution for "first pod problem": I don't think we really need to do this for Beta, because it is not an API change. If you're wondering what it's referring to, see the section that starts "This last example illustrates" in the design doc. Today we have implemented the "short term" approach listed there.
  • Last bullet point here: This is a test improvement. Also not strictly needed for Beta. (Though I suspect we will never do it if we don't do it now.)
  • Before allowing users to use it, be sure we won't need to roll back to a binary version that doesn't support it: The ongoing discussion related to the first bullet point has also touched on this. Basically it means making sure we do the right thing if someone sets alpha pod affinity in 1.4 and upgrades to 1.5, sets beta pod affinity in 1.5 and downgrades to 1.4, sets pod affinity in 1.4, upgrades to 1.5, and then downgrades to 1.4, etc. We have some flexibility in defining "right thing" in the alpha case since we make no guarantees about alpha features, though in the aforementioned discussion Tim was saying we should support the alpha version for a couple of versions before we get rid of it, which would mean we need to support both alpha and beta in 1.5.
  • Re-enable/fix all tests (see e.g. Disable PodAffinity SchedulerPredicates test #26697, PodAffinity SchedulerPredicates are not well isolated #26698): Looks like there's nothing to do here, as I don't see any e2e's still marked Feature:PodAffinity (but can't hurt for you to double-check).
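
To make the annotation-key and upgrade/downgrade points above concrete, here is a hypothetical sketch (not the actual scheduler code; the helper name and the beta key are illustrative) of a reader that honors both annotation keys during the transition:

```go
package sketch

import "encoding/json"

// Affinity is a stand-in for the real API type; fields elided.
type Affinity struct{}

const (
	alphaAffinityAnnotation = "scheduler.alpha.kubernetes.io/affinity"
	betaAffinityAnnotation  = "scheduler.beta.kubernetes.io/affinity" // proposed key
)

// affinityFromAnnotations prefers the beta key but falls back to the alpha
// key, so pods created on an older binary keep scheduling after an upgrade.
func affinityFromAnnotations(annotations map[string]string) (*Affinity, error) {
	for _, key := range []string{betaAffinityAnnotation, alphaAffinityAnnotation} {
		raw, ok := annotations[key]
		if !ok {
			continue
		}
		var a Affinity
		if err := json.Unmarshal([]byte(raw), &a); err != nil {
			return nil, err
		}
		return &a, nil
	}
	return nil, nil // no affinity specified
}
```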

@ConnorDoyle
Contributor

Made a separate issue for the test improvement: #34253

@timothysc timothysc modified the milestones: next-candidate, v1.5 Oct 13, 2016
@timothysc
Member

Version moved per sig+community discussion re: alpha-beta transitioning.

@rrati

rrati commented Oct 26, 2016

I would propose converting from annotations to API fields before moving from alpha->beta, following the process described in #35518: convert from annotations to API fields, strip the annotations implementation out completely, port tests, etc. Once functionality is confirmed to be at the same level as the annotations, promote from alpha->beta.

@davidopp
Member Author

We should add node and pod affinity to GeneralPredicates when we move them to Beta (this will make the Kubelet check them at admission).
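
As an illustration of what that could mean mechanically (hypothetical types and names, not the real predicate code), a sketch of affinity joining a shared predicate list that both the scheduler and the Kubelet admission path evaluate:

```go
package sketch

// predicateFn is a toy stand-in for a scheduler predicate function.
type predicateFn func(pod, node string) (fit bool, reason string)

func podFitsResources(pod, node string) (bool, string)     { /* elided */ return true, "" }
func podMatchNodeSelector(pod, node string) (bool, string) { /* elided */ return true, "" }
func podAffinityFits(pod, node string) (bool, string)      { /* elided */ return true, "" }

// generalPredicates is run by the scheduler and re-run by the Kubelet at
// admission; appending the affinity check here gives both paths the check.
var generalPredicates = []predicateFn{
	podFitsResources,
	podMatchNodeSelector,
	podAffinityFits, // added when affinity graduates to Beta
}
```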

@davidopp
Member Author

Also, @wojtek-t has suggested we should consider restricting hard anti-affinity to node-level only. The reasons are

  • scalability/performance
  • for Kubelet admission to check any topology other than node, it would need full cluster state
  • it's not clear there are any use cases for hard anti-affinity at topology other than node-level (soft anti-affinity is different; e.g. you might want to spread across zones)
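
A hypothetical sketch of the restriction being discussed (illustrative names, not the actual validation code): reject required pod anti-affinity whose topologyKey is anything other than the node-level key.

```go
package sketch

import "fmt"

const hostnameTopologyKey = "kubernetes.io/hostname"

// PodAffinityTerm is a simplified stand-in; label selector and namespaces elided.
type PodAffinityTerm struct {
	TopologyKey string
}

// validateHardPodAntiAffinity rejects RequiredDuringScheduling anti-affinity
// terms whose topology is broader than a single node.
func validateHardPodAntiAffinity(requiredTerms []PodAffinityTerm) error {
	for _, term := range requiredTerms {
		if term.TopologyKey != hostnameTopologyKey {
			return fmt.Errorf("hard pod anti-affinity only supports topologyKey %q, got %q",
				hostnameTopologyKey, term.TopologyKey)
		}
	}
	return nil
}
```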

@wojtek-t
Member

we should consider restricting hard anti-affinity to node-level only.

We should do it while we still can (which means we need to make this change before going to Beta).

@sdminonne
Contributor

/cc

k8s-github-robot pushed a commit that referenced this issue Dec 16, 2016
Automatic merge from submit-queue (batch tested with PRs 38730, 37299)

[scheduling] Moved node affinity from annotations to api fields. #35518

Converted node affinity from annotations to api fields

Fixes: #35518 
Related: #25319
Related: #34508

**Release note**:
```release-note
Node affinity has moved from annotations to api fields in the pod spec.  Node affinity that is defined in the annotations will be ignored.
```
@davidopp
Member Author

BTW one implication of using LabelSelector on namespace labels is that it will be possible to specify "all namespaces" (which we were special-casing when we were doing it as a list of namespaces), because an empty label selector matches all objects.

In light of this, I don't think it's worth adding "all namespaces" capability to the list-of-namespaces (#43525) as we are going to deprecate list-of-namespaces.
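
For illustration, the "empty selector matches everything" behavior referenced above can be seen with the apimachinery helpers (a small standalone sketch; the namespace labels are made up):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// An empty (but non-nil) LabelSelector places no requirements,
	// so it matches every label set, i.e. "all namespaces".
	sel, err := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{})
	if err != nil {
		panic(err)
	}
	nsLabels := labels.Set{"team": "payments", "env": "prod"}
	fmt.Println(sel.Matches(nsLabels)) // true
}
```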

@wanghaoran1988
Contributor

@davidopp Do we have a final decision here? I saw some differing voices in the sig-network thread about the multi-tenant problem.

@davidopp
Member Author

I attended the sig-network meeting on April 6 and they decided to go ahead with the label-selector-on-namespace approach.

@gmarek
Contributor

gmarek commented May 11, 2017

Before moving PodAntiAffinity to GA (aka v1) we NEED to fix its performance. It's a non-trivial amount of work, which will probably result in rewriting most of the logic, so when we do this it should probably bake for one release as a beta feature. @kubernetes/sig-scalability-misc @kubernetes/sig-scheduling-misc

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label May 11, 2017
@timothysc
Member

There are no plans to move to GA this cycle that I'm aware of.

@gyliu513
Contributor

@gmarek are there any issues tracking the performance problems of PodAntiAffinity? I would like to do some investigation if needed.

@gmarek
Contributor

gmarek commented May 12, 2017

Yup, I know. I just wanted to leave the note for whoever ends up doing it, and for ourselves, so we don't forget.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2017
@kevin-wangzefeng
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2018
dims pushed a commit to dims/kubernetes that referenced this issue Apr 13, 2018
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Improve performance of affinity/anti-affinity predicate by 20x in large clusters

**What this PR does / why we need it**:
Improves performance of the affinity/anti-affinity predicate by over 20x in large clusters. The improvement is smaller in small clusters, but still very significant, at about 4x. Also, before this PR, performance of the predicate degraded quadratically with increasing numbers of nodes and pods. As the results show, the slowdown is now linear in larger clusters.

The affinity/anti-affinity predicate was checking all pods in the cluster, for each node in the cluster, to determine the feasibility of the affinity/anti-affinity terms of the pod being scheduled. This optimization first finds all pods in the cluster that match the affinity/anti-affinity terms of the pod being scheduled, once, and stores that metadata. It then checks only the topology of the matching pods for each node in the cluster. This results in a major reduction of the search space per node and improves performance significantly.

Below results are obtained by running scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12       37031638 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12       10373222 ns/op

before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12     134205302 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12      12000580 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12   498439953 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12    24692552 ns/op

AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12       60003672 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12       13346400 ns/op

before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12   600085491 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12    27783333 ns/op
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

ref/ kubernetes#56032 kubernetes#47318 kubernetes#25319

**Release note**:

```release-note
improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```

/sig scheduling
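
For reference, a minimal sketch of the precompute-then-check pattern the PR above describes (hypothetical types and names, not the actual scheduler code): match once against all pods, then per node compare only the topology values of the matched set.

```go
package sketch

type pod struct {
	Name   string
	Node   string
	Labels map[string]string
}

type node struct {
	Name   string
	Labels map[string]string // includes the topology label, e.g. zone
}

// matchesAntiAffinityTerm reports whether an existing pod's labels match the
// incoming pod's anti-affinity label selector (selector logic elided).
func matchesAntiAffinityTerm(existing pod) bool { return false }

// matchingPods is computed once per scheduling cycle: which existing pods
// match the incoming pod's affinity/anti-affinity terms.
func matchingPods(allPods []pod) []pod {
	var matched []pod
	for _, p := range allPods {
		if matchesAntiAffinityTerm(p) {
			matched = append(matched, p)
		}
	}
	return matched
}

// nodeFits then only needs the (usually much smaller) matched set: the node is
// infeasible if any matched pod lands in the same topology domain.
func nodeFits(n node, topologyKey string, matched []pod, nodeOf func(string) node) bool {
	for _, p := range matched {
		if nodeOf(p.Node).Labels[topologyKey] == n.Labels[topologyKey] {
			return false
		}
	}
	return true
}
```
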
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2018
@bsalamat
Member

/remove-lifecycle stale

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 14, 2018
@geekidentity

When will affinity and anti-affinity be released as a stable version?

@bsalamat
Member

bsalamat commented Nov 7, 2018

They are currently in Beta. We haven't planned on promoting them to GA because they have had many performance/scalability issues. We have addressed some of those issues, but there are still some left to address. No concrete plan yet, but I feel we are getting closer. Maybe we will go to GA in 1.14 or 1.15.

@BarthV

BarthV commented Feb 20, 2019

BTW one implication of using LabelSelector on namespace labels is that it will be possible to specify "all namespaces" (which we were special-casing when we were doing it as a list of namespaces), because empty label selector matches all objects.

In light of this, I don't think it's worth adding "all namespaces" capability to the list-of-namespaces (#43525) as we are going to deprecate list-of-namespaces.

Selecting labels over all namespaces would be a really great feature!
As an example, we wanted to avoid certain statefulsets running side by side on the same node, but since they are running in their own namespaces we're not able to use the podAntiAffinity mechanisms... So we had to work around this and use a state/capacity mechanism, which is much less flexible than selectors.

So please keep in mind that this might be a good way to reintroduce a cross-namespace selector feature in affinity rules :-), definitely not useful for everyone but a really great tool in some cases!

@bsalamat
Member

bsalamat commented Mar 1, 2019

@BarthV we heard you all and we will try to add the feature, but we need to be careful here as this feature can increase the possibility of DoS attacks. Please see my comment for more info: #68827 (comment).

@pacoxu
Member

pacoxu commented Jan 25, 2021

#97410: do we need to clean up those alpha annotations?
