Optimize required pod affinity #86046
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahg-g. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
LOL, I just commented that we should do this in the previous PR. Is it worth having 2 PRs?
It is up to you (the reviewers). Do you think this one alone is manageable?
@Huang-Wei do you want to just look at this PR and close the other one?
Thanks @ahg-g ! LGTM generally, just some nits.
	topologyPairToPods map[topologyPair]podSet
	podToTopologyPairs map[string]topologyPairSet
}
type topologyToMatchedTermCount map[topologyPair]int64
Can we use map[topologyPair]*int64, so that in the initialization of the metadata we can concurrently manipulate the value without locking (appendResult())?
(can be a followup PR maybe)
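For illustration, a minimal sketch of the *int64 idea (the topologyPair type and countInParallel function here are hypothetical, not the scheduler's actual code): once every entry's pointer has been created, goroutines can bump the counters with sync/atomic without holding a lock on the map, because reading a Go map concurrently is safe as long as nothing mutates it at the same time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// topologyPair is an illustrative stand-in for the scheduler's type.
type topologyPair struct{ key, value string }

// countInParallel pre-creates the *int64 entry, then lets n goroutines
// increment it concurrently. No lock is taken in the hot path: the map
// itself is never written after initialization, only the pointed-to int64.
func countInParallel(n int) int64 {
	pair := topologyPair{"zone", "zone-a"}
	counts := map[topologyPair]*int64{pair: new(int64)}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(counts[pair], 1)
		}()
	}
	wg.Wait()
	return *counts[pair]
}

func main() {
	fmt.Println(countInParallel(100)) // 100
}
```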
Nvm, if possible, I believe it can be a followup along with this comment:
kubernetes/pkg/scheduler/algorithm/predicates/metadata.go
Lines 443 to 444 in 55f8131
// TODO(Huang-Wei): It might be possible to use "make(map[topologyPair]*int32)".
// In that case, need to consider how to init each tpPairToCount[pair] in an atomic fashion.
Yeah, I thought about that, I did a mutex contention profile, and this can potentially bring ~10% improvement. We can do that in a followup PR.
+1 for isolating in a separate PR. There might be other forms of locking that we could consider too. Or using a channel.
to initialize, you can do something like this:
if topologyToMatchedTermCount[pair] == nil {
	mutex.Lock()
	// we have to check again since by the time we get the lock, another thread might have already initialized the entry.
	if topologyToMatchedTermCount[pair] == nil {
		topologyToMatchedTermCount[pair] = new(int64)
	}
	mutex.Unlock()
}
I'm not so sure that's thread safe. But let's leave the discussion for another PR :)
I guess we will need to use an RLock:
mutex.RLock()
ptr := topologyToMatchedTermCount[pair]
mutex.RUnlock()
if ptr == nil {
	mutex.Lock()
	// we have to check again since by the time we get the lock, another thread might have already initialized the entry.
	if topologyToMatchedTermCount[pair] == nil {
		topologyToMatchedTermCount[pair] = new(int64)
	}
	ptr = topologyToMatchedTermCount[pair]
	mutex.Unlock()
}
atomic.AddInt64(ptr, value)
Is there another PR coming to cache the affinity terms of the incoming pod?
m[pair] += value
// value could be a negative value, hence we delete the entry if
// the entry is down to zero.
if m[pair] == 0 {
is it actually worth deleting? we might end up re-adding it with the preemption algorithm.
Same here: checking if m[pair] == 0 works for both a non-existing entry and an existing entry with value 0.
- if we want to manually delete the entry, we'd better keep the logic consistent and use if _, ok := m[pair]; ok to check its existence
- if we don't delete the entry, it probably increases the memory footprint a bit, but we save time on checking and deletion; in this case, checking if m[pair] == 0 should be used to check its existence
Since the previous logic relied on the existence of the entry rather than on whether it is zero, I opted to do this to avoid potential bugs. It also made it easy to pass the tests that do a DeepEqual comparison. We can clean this up in a followup PR that converts the type to *int64 so that we can do atomic adds instead of using a global mutex.
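For reference, the delete-at-zero behavior being discussed can be sketched like this (a minimal reconstruction with an illustrative topologyPair type, not the actual scheduler code):

```go
package main

import "fmt"

type topologyPair struct{ key, value string }

type topologyToMatchedTermCount map[topologyPair]int64

// update adds value (possibly negative) to the pair's count and removes
// the entry once it drops to zero, so existence checks behave the same
// way they did when the map held explicit pod sets.
func (m topologyToMatchedTermCount) update(pair topologyPair, value int64) {
	m[pair] += value
	if m[pair] == 0 {
		delete(m, pair)
	}
}

func main() {
	m := make(topologyToMatchedTermCount)
	pair := topologyPair{"zone", "zone-a"}
	m.update(pair, 2)
	m.update(pair, -2)
	_, ok := m[pair]
	fmt.Println(ok) // false: the zeroed entry was deleted
}
```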
Yes, the volume of code changes is manageable in one PR, but feel free to keep 2 commits in this PR.
I guess you are referring to the preemption logic? Yes, we can cache that in the metadata, but I thought we would just wait for PodInfo.
/lgtm
/hold
for squash
Thanks, @Huang-Wei please let me know if I can squash.
@ahg-g LGTM. Please go ahead and squash.
/lgtm
What type of PR is this?
/kind feature
What this PR does / why we need it:
This is the second PR in optimizing required pod affinity. This PR converts the data structure used to calculate pod affinity from a map of topology-to-list-of-pods to a map of topology-to-pod-counts. This significantly reduces the overhead of creating the map without compromising the predicate performance.
Basically, for each topology, instead of tracking the exact set of existing pods, we just track their count.
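In sketch form (type names only approximate the scheduler's internals, and countsFromSets is a hypothetical helper), the change replaces a set-valued map with a count-valued one:

```go
package main

import "fmt"

type topologyPair struct{ key, value string }

// Before: the metadata tracked the exact set of matching pods per pair.
type podSet map[string]struct{}
type topologyPairToPods map[topologyPair]podSet

// After: the predicate only needs how many pods match, so a count suffices.
type topologyToMatchedTermCount map[topologyPair]int64

// countsFromSets shows that the new representation carries exactly the
// information the predicate uses from the old one: the per-pair count.
func countsFromSets(old topologyPairToPods) topologyToMatchedTermCount {
	counts := make(topologyToMatchedTermCount, len(old))
	for pair, pods := range old {
		counts[pair] = int64(len(pods))
	}
	return counts
}

func main() {
	pair := topologyPair{"zone", "zone-a"}
	old := topologyPairToPods{pair: {"pod-a": {}, "pod-b": {}}}
	fmt.Println(countsFromSets(old)[pair]) // 2
}
```

Dropping the pod sets avoids allocating and populating a set per topology pair during metadata construction, which is where the speedup comes from.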
This PR builds on #86030. It offers up to 2.3x improvement over #86030. The two PRs combined offer up to 3.7x improvement.
Before
After #86030
After this PR
Does this PR introduce a user-facing change?: