
Optimize required pod affinity #86046

Merged

merged 1 commit into kubernetes:master from ahg-g:ahg1-affinity on Dec 10, 2019

Conversation

@ahg-g ahg-g (Member) commented Dec 9, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
This is the second PR in optimizing required pod affinity. This PR converts the data structure used to calculate pod affinity from a map of topology-to-list-of-pods to a map of topology-to-pod-counts. This significantly reduces the overhead of creating the map without compromising the predicate performance.

Basically, for each topology, instead of tracking the exact set of existing pods, we just track their count.
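
For illustration, a minimal sketch of this change is below (the update helper and the struct literal for topologyPair are illustrative, not taken from the PR; the real type names appear in the review excerpts further down):

type topologyPair struct{ key, value string }

// Before: every matching existing pod was tracked per topology pair, e.g.
//   topologyPairToPods map[topologyPair]podSet
// After: only the number of matching pods is tracked per topology pair.
type topologyToMatchedTermCount map[topologyPair]int64

// update adjusts the count for a pair; value may be negative when pods are
// removed, and the entry is dropped once the count reaches zero.
func (m topologyToMatchedTermCount) update(pair topologyPair, value int64) {
  m[pair] += value
  if m[pair] == 0 {
    delete(m, pair)
  }
}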

This PR builds on #86030. It offers up to 2.3x improvement over #86030. The two PRs combined offer up to 3.7x improvement.

Before

BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12           1000  19391988  ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/5000Pods-12               1000  29507122  ns/op

After #86030

BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12           1000  12974929  ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/5000Pods-12               1000  18056693  ns/op

After this PR

BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12           1000  8942505   ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/5000Pods-12               1000  7800239   ns/op
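
(For reference, the headline ratios follow from the PodAffinity benchmark: 18056693 / 7800239 ≈ 2.3x over #86030, and 29507122 / 7800239 gives the combined improvement over the original baseline; the anti-affinity benchmark improves by roughly 1.5x and 2.2x, respectively.)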

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 9, 2019
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2019
@ahg-g ahg-g (Member Author) commented Dec 9, 2019

/cc @Huang-Wei @alculquicondor

@alculquicondor alculquicondor (Member) commented Dec 9, 2019

LOL, I just commented that we should do this in the previous PR. Is it worth having 2 PRs?

@ahg-g ahg-g (Member Author) commented Dec 9, 2019

> LOL, I just commented that we should do this in the previous PR. Is it worth having 2 PRs?

It is up to you (the reviewers). Do you think this one alone is manageable?

@ahg-g ahg-g (Member Author) commented Dec 9, 2019

> LOL, I just commented that we should do this in the previous PR. Is it worth having 2 PRs?
>
> It is up to you (the reviewers). Do you think this one alone is manageable?

@Huang-Wei do you want to just look at this PR and close the other one?

@Huang-Wei Huang-Wei (Member) left a comment

Thanks @ahg-g ! LGTM generally, just some nits.

pkg/scheduler/algorithm/predicates/metadata.go (outdated review thread, resolved)
topologyPairToPods map[topologyPair]podSet
podToTopologyPairs map[string]topologyPairSet
}
type topologyToMatchedTermCount map[topologyPair]int64
Member

Can we use map[topologyPair]*int64 so that in the initialization of the metadata, we can concurrently manipulate the value without locking (appendResult()).

(can be a followup PR maybe)

Member

Nvm; if possible, I believe it can be a followup, along with this comment:

// TODO(Huang-Wei): It might be possible to use "make(map[topologyPair]*int32)".
// In that case, need to consider how to init each tpPairToCount[pair] in an atomic fashion.

@ahg-g ahg-g (Member Author) Dec 9, 2019

Yeah, I thought about that. I did a mutex contention profile, and this can potentially bring ~10% improvement. We can do that in a followup PR.

Member

+1 for isolating in a separate PR. There might be other forms of locking that we could consider too. Or using a channel.

Member Author

To initialize, you can do something like this:

if topologyToMatchedTermCount[pair] == nil {
  mutex.Lock()
  // We have to check again: by the time we get the lock, another thread might have already initialized the entry.
  if topologyToMatchedTermCount[pair] == nil {
    topologyToMatchedTermCount[pair] = new(int64)
  }
  mutex.Unlock()
}

Member

I'm not so sure that's thread safe. But let's leave the discussion for another PR :)

Member Author

I guess we will need to use an RLock:

mutex.RLock()
ptr := topologyToMatchedTermCount[pair]
mutex.RUnlock()
if ptr == nil {
  mutex.Lock()
  // We have to check again: by the time we get the lock, another thread might have already initialized the entry.
  if topologyToMatchedTermCount[pair] == nil {
    topologyToMatchedTermCount[pair] = new(int64)
  }
  ptr = topologyToMatchedTermCount[pair]
  mutex.Unlock()
}

atomic.AddInt64(ptr, value)
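
For reference, a self-contained sketch of what this could look like with map[topologyPair]*int64 is below (type and function names are illustrative; the merged PR kept the plain int64 map guarded by a global mutex and deferred this optimization to a followup):

package example

import (
  "sync"
  "sync/atomic"
)

type topologyPair struct{ key, value string }

// matchedTermCounts holds per-topology-pair counters. The RWMutex guards only
// access to the map itself; the counters are updated with atomic adds, so
// concurrent workers contend on the write lock only the first time a pair is
// seen.
type matchedTermCounts struct {
  mu     sync.RWMutex
  counts map[topologyPair]*int64
}

func newMatchedTermCounts() *matchedTermCounts {
  return &matchedTermCounts{counts: make(map[topologyPair]*int64)}
}

// add increments (or, with a negative value, decrements) the counter for pair.
func (c *matchedTermCounts) add(pair topologyPair, value int64) {
  c.mu.RLock()
  ptr := c.counts[pair]
  c.mu.RUnlock()
  if ptr == nil {
    c.mu.Lock()
    // Re-check under the write lock: another goroutine may have initialized
    // the entry between our RUnlock and Lock.
    if c.counts[pair] == nil {
      c.counts[pair] = new(int64)
    }
    ptr = c.counts[pair]
    c.mu.Unlock()
  }
  atomic.AddInt64(ptr, value)
}

With this layout, the parallel per-node workers only take the write lock the first time a topology pair is seen; all subsequent updates are lock-free atomic adds.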

pkg/scheduler/algorithm/predicates/metadata.go (two outdated review threads, resolved)
pkg/scheduler/algorithm/predicates/predicates.go (two outdated review threads, resolved)
@alculquicondor alculquicondor (Member) left a comment

Is there another PR coming to cache the affinity terms of the incoming pod?

m[pair] += value
// value could be a negative value, hence we delete the entry if
// the entry is down to zero.
if m[pair] == 0 {
  delete(m, pair)
}
Member

Is it actually worth deleting? We might end up re-adding it with the preemption algorithm.

Member

Same here: checking if m[pair] == 0 works both for a non-existing entry and for an existing entry whose value is 0.

  • If we want to manually delete the entry, we'd better keep the logic consistent and use if _, ok := m[pair]; ok when checking for existence.
  • If we don't delete the entry, it probably increases the memory footprint a bit, but we save time on checking and deletion; in that case, checking if m[pair] == 0 should be used to check existence.

Member Author

Since the previous logic relied on the existence of the entry rather than on whether its value is zero, I opted to do this to avoid potential bugs. It also made it easy to pass the tests that do a DeepEqual comparison. We can clean this up in a followup PR that converts the type to *int64 so that we can do atomic adds instead of using a global mutex.

pkg/scheduler/algorithm/predicates/metadata.go (six outdated review threads, resolved)
@Huang-Wei (Member)

> @Huang-Wei do you want to just look at this PR and close the other one?

Yes, the volume of code changes is manageable in one PR, but feel free to keep 2 commits in this PR.

@ahg-g ahg-g (Member Author) commented Dec 9, 2019

> Is there another PR coming to cache the affinity terms of the incoming pod?

I guess you are referring to the preemption logic? Yes, we can cache that in the metadata, but I thought we'd just wait for PodInfo.

@alculquicondor alculquicondor (Member) left a comment

/lgtm

/hold

for squash

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 9, 2019
@ahg-g ahg-g (Member Author) commented Dec 9, 2019

> /lgtm
>
> /hold
>
> for squash

Thanks, @Huang-Wei please let me know if I can squash.

@Huang-Wei (Member)

@ahg-g LGTM. Please go squashing.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2019
@alculquicondor (Member)

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 9, 2019
@k8s-ci-robot k8s-ci-robot merged commit 9bf52c2 into kubernetes:master Dec 10, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 10, 2019
@ahg-g ahg-g deleted the ahg1-affinity branch January 10, 2020 15:38
@ahg-g ahg-g changed the title from "Optimize required pod affinity (2)" to "Optimize required pod affinity" on Mar 3, 2020