Refactor and optimize preferred (anti) pod affinity #85959

Merged
merged 1 commit into from Dec 7, 2019

Conversation

@ahg-g ahg-g commented Dec 5, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
Optimize preferred pod (anti-)affinity by removing the redundant creation of Selectors during metadata calculation and by eliminating unnecessary nodeinfo map lookups. The throughput gain is roughly 25% on 5k-node clusters (about 20% less time per op in the benchmark below).
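
The selector part of the optimization can be sketched roughly as follows. This is a simplified illustration and not the code from this PR: the `term`, `processedTerm`, and `pod` types are hypothetical stand-ins that use plain maps instead of the real `v1` and `labels.Selector` types. The point is only that the per-term work happens once, up front, rather than once per existing pod.

```go
package main

import "fmt"

// term is a hypothetical stand-in for a weighted pod affinity term as it
// arrives in the pod spec (raw, unprocessed).
type term struct {
	matchLabels map[string]string
	namespaces  []string
	topologyKey string
	weight      int32
}

// processedTerm caches everything derived from a term, so matching a pod
// against it needs no further parsing or allocation.
type processedTerm struct {
	selector    map[string]string
	namespaces  map[string]struct{}
	topologyKey string
	weight      int32
}

// pod is a hypothetical stand-in for an existing pod in the cluster.
type pod struct {
	namespace string
	labels    map[string]string
}

// newProcessedTerm does the per-term work exactly once.
func newProcessedTerm(t term) processedTerm {
	ns := make(map[string]struct{}, len(t.namespaces))
	for _, n := range t.namespaces {
		ns[n] = struct{}{}
	}
	return processedTerm{
		selector:    t.matchLabels,
		namespaces:  ns,
		topologyKey: t.topologyKey,
		weight:      t.weight,
	}
}

// matches reports whether p is in one of the term's namespaces and carries
// every label the selector requires.
func (pt processedTerm) matches(p pod) bool {
	if _, ok := pt.namespaces[p.namespace]; !ok {
		return false
	}
	for k, want := range pt.selector {
		if got, ok := p.labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Built once per scheduling cycle, then reused for every existing pod.
	pt := newProcessedTerm(term{
		matchLabels: map[string]string{"app": "web"},
		namespaces:  []string{"default"},
		topologyKey: "zone",
		weight:      5,
	})
	pods := []pod{
		{namespace: "default", labels: map[string]string{"app": "web"}},
		{namespace: "default", labels: map[string]string{"app": "db"}},
		{namespace: "other", labels: map[string]string{"app": "web"}},
	}
	matched := 0
	for _, p := range pods {
		if pt.matches(p) {
			matched++
		}
	}
	fmt.Println("matching pods:", matched)
}
```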

before

ahg@ahg1:~/go/src/k8s.io/kubernetes$ tail /tmp/affinity-master 
I1205 13:56:33.337355  222356 etcd.go:81] etcd already running at http://127.0.0.1:2379
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkSchedulingPreferredPodAffinity/500Nodes/500Pods-12                 1000           2099450 ns/op
BenchmarkSchedulingPreferredPodAffinity/5000Nodes/15000Pods-12              1000          13419954 ns/op
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf       204.199s
+++ [1205 13:59:57] Cleaning up etcd
+++ [1205 13:59:57] Integration test cleanup complete
ahg@ahg1:~/go/src/k8s.io/kubernetes$ 

after

ahg@ahg1:~/go/src/k8s.io/kubernetes$ tail /tmp/affinity-opt
I1205 14:04:00.566016  225809 etcd.go:81] etcd already running at http://127.0.0.1:2379
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkSchedulingPreferredPodAffinity/500Nodes/500Pods-12                 1000           1909800 ns/op
BenchmarkSchedulingPreferredPodAffinity/5000Nodes/15000Pods-12              1000          10676680 ns/op
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf       182.283s
+++ [1205 14:07:02] Cleaning up etcd
+++ [1205 14:07:02] Integration test cleanup complete
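
As a quick check on the numbers above, the ns/op values from the two runs can be turned into a throughput ratio and a per-op time reduction. The helper names `speedup` and `timeSaved` are made up for this sketch; the inputs are copied from the benchmark output. For the 5000-node case this works out to roughly 1.26x throughput, or about 20% less time per op.

```go
package main

import "fmt"

// speedup returns how many times more ops per second the "after" run
// achieves compared with the "before" run.
func speedup(beforeNs, afterNs float64) float64 {
	return beforeNs / afterNs
}

// timeSaved returns the fraction of per-op time removed by the change.
func timeSaved(beforeNs, afterNs float64) float64 {
	return 1 - afterNs/beforeNs
}

func main() {
	runs := []struct {
		name          string
		before, after float64 // ns/op, taken from the benchmark output
	}{
		{"500Nodes/500Pods", 2099450, 1909800},
		{"5000Nodes/15000Pods", 13419954, 10676680},
	}
	for _, r := range runs {
		fmt.Printf("%-22s throughput x%.3f, time per op -%.1f%%\n",
			r.name, speedup(r.before, r.after), 100*timeSaved(r.before, r.after))
	}
}
```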

Does this PR introduce a user-facing change?:

NONE

/cc @Huang-Wei

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 5, 2019
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 5, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 5, 2019
@@ -143,6 +166,75 @@ func CalculateInterPodAffinityPriorityReduce(pod *v1.Pod, meta interface{}, shar
return nil
}

func (p *podAffinityPriorityMap) processExistingPod(existingPod *v1.Pod, existingPodNodeInfo *schedulernodeinfo.NodeInfo) error {
Member Author

This just moves processPod into a separate function instead of embedding it inline. That makes it show up more clearly in profiles.
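
To illustrate the profiling point, here is a small sketch that prints the symbol names the Go runtime records for a closure and for a named function. It assumes the embedded processPod was an anonymous function; `embedded` and `funcName` are hypothetical helpers, and the `processExistingPod` here is a toy with a different signature from the one in the diff. Closures get compiler-generated names derived from the enclosing function, while a named function appears under its own name.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// funcName returns the symbol name the Go runtime records for fn.
// Profilers report samples under these names.
func funcName(fn interface{}) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

// processExistingPod is a named top-level function (toy version).
func processExistingPod(id int) int {
	return id * 2
}

// embedded returns an anonymous function, similar to defining the
// per-pod logic inline inside a larger function.
func embedded() func(int) int {
	return func(id int) int {
		return id * 2
	}
}

func main() {
	// The closure's name is generated by the compiler from its enclosing
	// function; the named function keeps the name it was declared with.
	fmt.Println("closure:", funcName(embedded()))
	fmt.Println("named:  ", funcName(processExistingPod))
}
```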

@ahg-g
Member Author

ahg-g commented Dec 5, 2019

/cc @alculquicondor

}

func (p *podAffinityPriorityMap) processTerm(term *v1.PodAffinityTerm, podDefiningAffinityTerm, podToCheck *v1.Pod, fixedNode *v1.Node, weight int64) error {
namespaces := priorityutil.GetNamespacesFromPodAffinityTerm(podDefiningAffinityTerm, term)
func getProcessedWeightedAffinityTerm(pod *v1.Pod, term *v1.PodAffinityTerm, weight int32) (*processedWeightedAffinityTerm, error) {
Member

Consider naming this newProcessedWeightedAffinityTerm. You could also accept a v1.WeightedPodAffinityTerm as the parameter.

nodes: nodes,
topologyScore: make(topologyPairToScore),
}
type processedWeightedAffinityTerm struct {
Member

I would just name it weightedAffinityTerm, to make it clear that it's our internal representation of the object v1.WeightedPodAffinityTerm.

Member Author

done.

pod *v1.Pod
affinityTerms []*processedWeightedAffinityTerm
antiAffinityTerms []*processedWeightedAffinityTerm
sharedLister schedulerlisters.SharedLister
Member

where is this being used?

Member Author

removed.

if err := p.processTerm(&term.PodAffinityTerm, podDefiningAffinityTerm, podToCheck, fixedNode, int64(term.Weight*int32(multiplier))); err != nil {
func (p *podAffinityPriorityMap) processTerms(terms []*processedWeightedAffinityTerm, podDefiningAffinityTerm, podToCheck *v1.Pod, fixedNode *v1.Node, multiplier int) error {
for _, term := range terms {
if err := p.processTerm(term, podDefiningAffinityTerm, podToCheck, fixedNode, int64(term.weight*int32(multiplier))); err != nil {
Member

passing the multiplier seems more readable to me.

Can we rename to updateScoreWithTerms or something like that?

Member Author

> passing the multiplier seems more readable to me.

done.

> Can we rename to updateScoreWithTerms or something like that?

Then we would need to come up with a name for "processTerms" as well; let's keep it like this.
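
The multiplier discussed in this thread can be sketched as follows. This is a simplified illustration, not the PR's code: `weightedTerm` and the label maps are hypothetical stand-ins, and `topologyPairToScore` is assumed here to map topology key to topology value to score. Affinity terms are applied with a multiplier of 1 and anti-affinity terms with -1, so one code path handles both.

```go
package main

import "fmt"

// topologyPairToScore maps topologyKey -> topologyValue -> accumulated score.
// The name appears in the diff; this definition is an assumption.
type topologyPairToScore map[string]map[string]int64

// weightedTerm is a hypothetical, already-processed affinity term.
type weightedTerm struct {
	topologyKey string
	weight      int32
	matches     func(podLabels map[string]string) bool
}

// processTerm adds weight*multiplier to the score of the node's topology
// pair if the pod matches the term and the node has the topology label.
func processTerm(score topologyPairToScore, t weightedTerm, podLabels, nodeLabels map[string]string, multiplier int) {
	tpValue, ok := nodeLabels[t.topologyKey]
	if !ok || !t.matches(podLabels) {
		return
	}
	if score[t.topologyKey] == nil {
		score[t.topologyKey] = make(map[string]int64)
	}
	score[t.topologyKey][tpValue] += int64(t.weight * int32(multiplier))
}

// processTerms applies every term with the same multiplier:
// 1 for affinity, -1 for anti-affinity.
func processTerms(score topologyPairToScore, terms []weightedTerm, podLabels, nodeLabels map[string]string, multiplier int) {
	for _, t := range terms {
		processTerm(score, t, podLabels, nodeLabels, multiplier)
	}
}

func main() {
	score := topologyPairToScore{}
	node := map[string]string{"zone": "a"}
	pod := map[string]string{"app": "web"}
	isWeb := func(l map[string]string) bool { return l["app"] == "web" }

	affinity := []weightedTerm{{topologyKey: "zone", weight: 10, matches: isWeb}}
	antiAffinity := []weightedTerm{{topologyKey: "zone", weight: 3, matches: isWeb}}

	processTerms(score, affinity, pod, node, 1)
	processTerms(score, antiAffinity, pod, node, -1)
	fmt.Println(score["zone"]["a"]) // prints 7
}
```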

return processedTerms, nil
}

func (p *podAffinityPriorityMap) processTerm(term *processedWeightedAffinityTerm, podDefiningAffinityTerm, podToCheck *v1.Pod, fixedNode *v1.Node, weight int64) error {
Member

podDefiningAffinityTerm is unused :)

Member Author

cool, that is because of the refactoring.

return nil
pm := podAffinityPriorityMap{
topologyScore: make(topologyPairToScore),
pod: pod,
Member

is it worth storing this guy? I feel like it should just be passed as parameter.

Member Author

We are computing this map from the perspective of this pod, so I think it is reasonable to store it. In fact, passing it as a parameter would imply that the methods of a given podAffinityPriorityMap instance may accept different pods, which is not the case.

Member Author

ok, so now we only have a single function relying on it, I removed it :)

@ahg-g ahg-g force-pushed the ahg-affinity-opt branch 2 times, most recently from 9685623 to 36a0743 Compare December 6, 2019 16:10
@alculquicondor
Member

/lgtm

/hold

for rebase

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 6, 2019
hardPodAffinityWeight int32
sync.Mutex
}

type processedWeightedAffinityTerm struct {
type weightedAffinityTerm struct {
Member

Worth adding a comment indicating that this is the internal representation of v1.WeightedPodAffinityTerm.

Member Author

done.

@@ -292,14 +288,15 @@ func buildTopologyPairToScore(
processNode := func(i int) {
nodeInfo := allNodes[i]
if nodeInfo.Node() != nil {
// Unless the pod itself has affinity terms, We only need to process nodes hosting pods with affinity.
Member

Here you are filtering pods, not nodes.

Unless the pod itself has affinity terms, we only need to process pods with affinity in the node

Member Author

yup, fixed.
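
The filtering rule in the corrected comment can be sketched like this. The `nodeInfo` and `pod` types are hypothetical stand-ins for the scheduler's own types, and `podsToProcess` is an illustrative helper. If the incoming pod has (anti-)affinity terms, any existing pod might match them, so all pods on the node are processed; otherwise only existing pods that declare affinity themselves can contribute to the score.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for an existing pod on a node.
type pod struct {
	name        string
	hasAffinity bool
}

// nodeInfo is a hypothetical stand-in for the scheduler's per-node cache.
// It keeps a pre-filtered list so the common case avoids scanning every pod.
type nodeInfo struct {
	pods             []pod
	podsWithAffinity []pod
}

// newNodeInfo builds both lists once, when pods are added to the node.
func newNodeInfo(pods ...pod) nodeInfo {
	n := nodeInfo{pods: pods}
	for _, p := range pods {
		if p.hasAffinity {
			n.podsWithAffinity = append(n.podsWithAffinity, p)
		}
	}
	return n
}

// podsToProcess picks which existing pods need to be examined when scoring
// a node for an incoming pod.
func podsToProcess(n nodeInfo, incomingHasAffinity bool) []pod {
	if incomingHasAffinity {
		// The incoming pod's own terms may match any existing pod.
		return n.pods
	}
	// Only existing pods with affinity terms can refer to the incoming pod.
	return n.podsWithAffinity
}

func main() {
	n := newNodeInfo(
		pod{name: "a", hasAffinity: true},
		pod{name: "b", hasAffinity: false},
		pod{name: "c", hasAffinity: false},
	)
	fmt.Println("incoming pod without affinity:", len(podsToProcess(n, false)), "pods to process")
	fmt.Println("incoming pod with affinity:   ", len(podsToProcess(n, true)), "pods to process")
}
```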

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2019
Member Author

@ahg-g ahg-g left a comment

> /lgtm
>
> /hold
>
> for rebase

squashed.

@alculquicondor
Member

/hold cancel

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 6, 2019
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot
Contributor

k8s-ci-robot commented Dec 7, 2019

@ahg-g: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-conformance-kind-ipv6 | 7f2307b4e997650568fd49d3841c2ad59640bec9 | link | /test pull-kubernetes-conformance-kind-ipv6 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ahg-g
Member Author

ahg-g commented Dec 7, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 77a95dc into kubernetes:master Dec 7, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 7, 2019
@ahg-g ahg-g deleted the ahg-affinity-opt branch January 10, 2020 15:38
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.