Requirements for Affinity to graduate to Beta and then v1 #25319
/cc
@rrati is starting on the list above for 1.5.
@rrati, things to keep in mind regarding the alpha-to-beta transition: #30819 (comment)
@davidopp @wojtek-t Do you know the state of the items listed above? Some of them appear to have been resolved or to have had work done on them.

- "Before allowing users to use it, be sure we won't need to roll back to a binary version that doesn't support it": does this have a defined work item, or is it just a confidence level?
- "Better solution for 'first pod problem'": has anyone started discussing an alternative plan?
@rrati:
Made a separate issue for the test improvement: #34253
Version moved per SIG and community discussion regarding the alpha-to-beta transition.
I would propose converting from annotations to API fields before moving from alpha to beta, following the process described in #35518: convert from annotations to API fields, strip out the annotations implementation completely, port the tests, and so on. Once functionality is confirmed to be on par with the annotations, promote from alpha to beta.
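To make the "convert from annotations to API fields" step concrete, here is a minimal Go sketch of a one-time migration that decodes the JSON stored in the `scheduler.alpha.kubernetes.io/affinity` annotation into a typed field and then deletes the annotation. The `Pod` and `Affinity` types and the `migrateAffinity` helper are hypothetical stand-ins, not the real `k8s.io/api` types; the inner affinity objects are kept as opaque `json.RawMessage` so the sketch does not assume their exact schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const alphaAffinityAnnotation = "scheduler.alpha.kubernetes.io/affinity"

// Affinity is a hypothetical stand-in for the typed API field. Inner objects
// are kept as raw JSON so this sketch does not depend on their schema.
type Affinity struct {
	NodeAffinity    json.RawMessage `json:"nodeAffinity,omitempty"`
	PodAffinity     json.RawMessage `json:"podAffinity,omitempty"`
	PodAntiAffinity json.RawMessage `json:"podAntiAffinity,omitempty"`
}

// Pod is a hypothetical, trimmed-down pod: annotations plus the new field.
type Pod struct {
	Annotations map[string]string
	Affinity    *Affinity // stands in for the new spec-level affinity field
}

// migrateAffinity moves affinity JSON from the alpha annotation into the typed
// field and removes the annotation, leaving a single source of truth.
func migrateAffinity(p *Pod) error {
	raw, ok := p.Annotations[alphaAffinityAnnotation]
	if !ok {
		return nil // nothing to migrate
	}
	if p.Affinity != nil {
		return fmt.Errorf("pod sets both %s and the affinity field", alphaAffinityAnnotation)
	}
	var a Affinity
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return fmt.Errorf("decoding %s: %w", alphaAffinityAnnotation, err)
	}
	p.Affinity = &a
	delete(p.Annotations, alphaAffinityAnnotation)
	return nil
}

func main() {
	p := &Pod{Annotations: map[string]string{
		alphaAffinityAnnotation: `{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[]}}}`,
	}}
	if err := migrateAffinity(p); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(p.Affinity.NodeAffinity), len(p.Annotations))
}
```

Rejecting pods that set both sources is a design choice in this sketch, made so that a silent precedence rule cannot hide a conflict.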
We should add node and pod affinity to GeneralPredicates when we move them to Beta (this will make the Kubelet check them in its admission check).
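Whether it runs in the scheduler or again in the Kubelet's admission check, the required node affinity check comes down to matching node labels. The Go sketch below uses simplified, hypothetical `Requirement`/`Term` types (not the real `k8s.io/api` structs) and covers only the `In`, `NotIn`, `Exists`, and `DoesNotExist` operators. It illustrates the OR-of-terms / AND-of-expressions semantics, not the actual GeneralPredicates code.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for NodeSelectorRequirement and
// NodeSelectorTerm; they are NOT the real k8s.io/api/core/v1 types.
type Requirement struct {
	Key      string
	Operator string // "In", "NotIn", "Exists", "DoesNotExist" (Gt/Lt omitted)
	Values   []string
}

type Term struct {
	MatchExpressions []Requirement
}

func contains(vals []string, v string) bool {
	for _, x := range vals {
		if x == v {
			return true
		}
	}
	return false
}

// matchesRequirement evaluates a single expression against a node's labels.
func matchesRequirement(labels map[string]string, r Requirement) bool {
	v, present := labels[r.Key]
	switch r.Operator {
	case "In":
		return present && contains(r.Values, v)
	case "NotIn":
		// NotIn also matches when the key is absent.
		return !present || !contains(r.Values, v)
	case "Exists":
		return present
	case "DoesNotExist":
		return !present
	default:
		return false // unknown operators never match in this sketch
	}
}

// nodeMatchesRequired reports whether a node's labels satisfy the pod's
// required node affinity: terms are ORed, expressions in one term are ANDed.
// Callers should skip this check when the pod has no required node affinity.
func nodeMatchesRequired(labels map[string]string, terms []Term) bool {
	for _, t := range terms {
		if len(t.MatchExpressions) == 0 {
			continue // an empty term matches nothing
		}
		all := true
		for _, r := range t.MatchExpressions {
			if !matchesRequirement(labels, r) {
				all = false
				break
			}
		}
		if all {
			return true
		}
	}
	return false
}

func main() {
	node := map[string]string{"zone": "us-east-1a", "disktype": "ssd"}
	terms := []Term{{MatchExpressions: []Requirement{
		{Key: "zone", Operator: "In", Values: []string{"us-east-1a", "us-east-1b"}},
		{Key: "gpu", Operator: "DoesNotExist"},
	}}}
	fmt.Println(nodeMatchesRequired(node, terms)) // true
}
```

Because the check depends only on the pod spec and the node's own labels, a node-side admission check can re-run it without any cluster-wide state.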
Also, @wojtek-t has suggested we should consider restricting hard anti-affinity to node-level only. The reasons are
We should consider restricting hard anti-affinity to node-level only. We should do it while we still can (which means we need to do this before going to beta).
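One way to picture a "node-level only" restriction on hard anti-affinity is as a validation rule: every required anti-affinity term must use the hostname topology key, while preferred (soft) terms stay unrestricted. The Go sketch below is hypothetical; `PodAffinityTerm`, `PodAntiAffinity`, and `validateHardAntiAffinity` are simplified illustrations, not the real API types or validation code. `kubernetes.io/hostname` and `topology.kubernetes.io/zone` are the standard well-known node labels.

```go
package main

import "fmt"

// hostnameTopologyKey is the well-known per-node hostname label.
const hostnameTopologyKey = "kubernetes.io/hostname"

// PodAffinityTerm is a hypothetical, trimmed-down term carrying only the
// field this check needs.
type PodAffinityTerm struct {
	TopologyKey string
}

// PodAntiAffinity separates hard (required) from soft (preferred) terms.
type PodAntiAffinity struct {
	Required  []PodAffinityTerm // hard terms: restricted to node level
	Preferred []PodAffinityTerm // soft terms: left unrestricted
}

// validateHardAntiAffinity rejects any hard anti-affinity term whose topology
// domain is wider than a single node.
func validateHardAntiAffinity(a *PodAntiAffinity) error {
	if a == nil {
		return nil
	}
	for i, t := range a.Required {
		if t.TopologyKey != hostnameTopologyKey {
			return fmt.Errorf("required anti-affinity term %d: topologyKey %q not allowed, only %q",
				i, t.TopologyKey, hostnameTopologyKey)
		}
	}
	return nil
}

func main() {
	nodeLevel := &PodAntiAffinity{Required: []PodAffinityTerm{{TopologyKey: hostnameTopologyKey}}}
	zonal := &PodAntiAffinity{Required: []PodAffinityTerm{{TopologyKey: "topology.kubernetes.io/zone"}}}
	fmt.Println(validateHardAntiAffinity(nodeLevel))
	fmt.Println(validateHardAntiAffinity(zonal))
}
```

Loosening such a rule later is backward compatible, while tightening it would break existing pods. This is why the restriction has to land before beta if it is going to land at all.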
/cc
Automatic merge from submit-queue (batch tested with PRs 38730, 37299)

[scheduling] Moved node affinity from annotations to api fields. #35518

Converted node affinity from annotations to api fields

Fixes: #35518
Related: #25319
Related: #34508

**Release note**:
```release-note
Node affinity has moved from annotations to api fields in the pod spec. Node affinity that is defined in the annotations will be ignored.
```
BTW, one implication of using LabelSelector on namespace labels is that it will be possible to specify "all namespaces" (which we were special-casing when we were doing it as a list of namespaces), because an empty label selector matches all objects. In light of this, I don't think it's worth adding an "all namespaces" capability to the list-of-namespaces (#43525), as we are going to deprecate list-of-namespaces.
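The "empty selector matches everything" property falls out of how label selectors are evaluated: each requirement can only reject an object, so a selector with no requirements rejects nothing. The Go sketch below models only `matchLabels` with a hypothetical `selector` type (not `metav1.LabelSelector`) and a hypothetical `selectNamespaces` helper, to show why no special "all namespaces" case is needed.

```go
package main

import (
	"fmt"
	"sort"
)

// selector is a hypothetical matchLabels-only stand-in for a label selector.
type selector struct {
	MatchLabels map[string]string
}

// matches returns true when every key/value in MatchLabels is present on the
// object. With no requirements the loop never rejects, so it matches everything.
func (s selector) matches(labels map[string]string) bool {
	for k, v := range s.MatchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectNamespaces returns the sorted names of namespaces whose labels match s.
func selectNamespaces(s selector, namespaces map[string]map[string]string) []string {
	var out []string
	for name, labels := range namespaces {
		if s.matches(labels) {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	ns := map[string]map[string]string{
		"default":    {},
		"team-a":     {"team": "a"},
		"team-a-dev": {"team": "a", "env": "dev"},
	}
	fmt.Println(selectNamespaces(selector{MatchLabels: map[string]string{"team": "a"}}, ns)) // [team-a team-a-dev]
	fmt.Println(selectNamespaces(selector{}, ns))                                            // [default team-a team-a-dev]
}
```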
@davidopp Do we have a final decision here? I saw some dissenting voices in the sig-network thread about the multi-tenancy problem.
I attended the sig-network meeting on April 6 and they decided to go ahead with the label-selector-on-namespace approach.
Before moving PodAntiAffinity to GA (aka v1), we NEED to fix its performance. It's a non-trivial amount of work that will probably involve rewriting most of the logic, so once we do it, it should probably bake for one release as a beta feature. @kubernetes/sig-scalability-misc @kubernetes/sig-scheduling-misc
There are no plans to move to GA this cycle that I'm aware of.
@gmarek, are there any issues tracking the performance problems of PodAntiAffinity? I would like to do some investigation if needed.
Yup, I know. I just wanted to leave a note for whoever ends up doing it, and for us, so that we don't forget.
/remove-lifecycle stale |
Automatic merge from submit-queue (batch tested with PRs 62467, 62482, 62211). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Improve performance of affinity/anti-affinity predicate by 20x in large clusters

**What this PR does / why we need it**:
Improves the performance of the affinity/anti-affinity predicate by over 20x in large clusters. The improvement is smaller in small clusters, but it is still significant, at about 4x. Also, before this PR, the predicate's performance degraded quadratically as the number of nodes and pods grew. As the results show, the slowdown is now linear in larger clusters.

The affinity/anti-affinity predicate was checking all pods in the cluster for each node in the cluster to determine the feasibility of the affinity/anti-affinity terms of the pod being scheduled. This optimization first finds, once, all the pods in the cluster that match the affinity/anti-affinity terms of the pod being scheduled and stores that metadata. It then checks only the topology of the matching pods for each node. This greatly reduces the search space per node and improves performance significantly.

The results below were obtained by running the scheduler benchmarks:
```
make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkSchedulingAntiAffinity"
```
```
AntiAffinity Topology: Hostname
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12      37031638 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12      10373222 ns/op
before: BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12    134205302 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12     12000580 ns/op
before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12  498439953 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12   24692552 ns/op

AntiAffinity Topology: Region
before: BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12      60003672 ns/op
after:  BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12      13346400 ns/op
before: BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12  600085491 ns/op
after:  BenchmarkSchedulingAntiAffinity/1000Nodes/10000Pods-12   27783333 ns/op
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
ref/ kubernetes#56032 kubernetes#47318 kubernetes#25319

**Release note**:
```release-note
Improve performance of affinity/anti-affinity predicate of default scheduler significantly.
```

/sig scheduling
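The optimization described in that PR can be illustrated with a small Go sketch comparing the two strategies for a single anti-affinity term: re-scanning every pod for every node versus evaluating the selector once, recording the topology values occupied by matching pods, and then doing one lookup per node. Everything here (`Pod`, `matchesTerm`, `naiveFeasible`, `precomputedFeasible`) is a hypothetical simplification with a `matchLabels`-only selector. It is not the scheduler's actual predicate metadata code, and it assumes every node has a value for the topology key.

```go
package main

import "fmt"

// Pod is a hypothetical minimal model: labels plus the node it runs on.
type Pod struct {
	Name   string
	Labels map[string]string
	Node   string
}

// matchesTerm reports whether an existing pod matches the incoming pod's
// anti-affinity selector (matchLabels only, for brevity).
func matchesTerm(sel map[string]string, p Pod) bool {
	for k, v := range sel {
		if got, ok := p.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// naiveFeasible re-scans all pods for every node: O(nodes * pods) selector checks.
func naiveFeasible(nodes []string, topo map[string]string, pods []Pod, sel map[string]string) map[string]bool {
	res := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		ok := true
		for _, p := range pods {
			if matchesTerm(sel, p) && topo[p.Node] == topo[n] {
				ok = false
				break
			}
		}
		res[n] = ok
	}
	return res
}

// precomputedFeasible evaluates the selector once per pod, records the
// topology values occupied by matching pods, then does one lookup per node:
// O(pods + nodes).
func precomputedFeasible(nodes []string, topo map[string]string, pods []Pod, sel map[string]string) map[string]bool {
	occupied := make(map[string]bool)
	for _, p := range pods {
		if matchesTerm(sel, p) {
			occupied[topo[p.Node]] = true
		}
	}
	res := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		res[n] = !occupied[topo[n]]
	}
	return res
}

func main() {
	nodes := []string{"n1", "n2", "n3"}
	topo := map[string]string{"n1": "us-east", "n2": "us-east", "n3": "eu-west"} // topology key: region
	pods := []Pod{{Name: "web-0", Labels: map[string]string{"app": "web"}, Node: "n1"}}
	sel := map[string]string{"app": "web"}
	// Both print map[n1:false n2:false n3:true]
	fmt.Println(naiveFeasible(nodes, topo, pods, sel))
	fmt.Println(precomputedFeasible(nodes, topo, pods, sel))
}
```

The results are identical; only the amount of work per node changes. This matches the PR's observation that cost moved from quadratic toward linear growth.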
/remove-lifecycle stale
/lifecycle frozen
When will affinity and anti-affinity be released as a stable version?
They are currently in Beta. We haven't planned on promoting them to GA because they had many performance/scalability issues. We have addressed some of those issues, but there are still some left to be addressed. No concrete plan yet, but I feel we are getting closer. Maybe we go to GA in 1.14 or 1.15.
Selecting labels across all namespaces would be a really great feature! So please keep in mind that it might be a good way to reintroduce the cross-namespace selector feature in affinity rules :-). Definitely not useful for everyone, but a really great tool in some cases!
@BarthV we heard you all and we will try to add the feature, but we need to be careful here, as this feature can increase the possibility of DoS attacks. Please see my comment for more info: #68827 (comment).
#97410: do we need to clean up those alpha annotations?
We'd like to graduate node and pod affinity to Beta in 1.4.
Beta
- `scheduler.alpha.kubernetes.io/affinity` to `scheduler.beta.kubernetes.io/affinity`
v1
- Better solution for DoS issues? (Or just rely on priority/preemption once we have it?) (see #18265 (comment))
cc/ @kevin-wangzefeng @bgrant0607 @wojtek-t
ref/ #18261 #18265 #24853 #22985 #19758