Optimize preferred spreading for hostname topology #89487
Conversation
/priority important-longterm
cc @ahg-g
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks @alculquicondor ! Just some nits, otherwise LGTM.
@@ -82,3 +82,17 @@ func filterTopologySpreadConstraints(constraints []v1.TopologySpreadConstraint,
 	}
 	return result, nil
 }
+
+func countPodsMatchSelector(pods []*v1.Pod, selector labels.Selector, ns string) int {
nit: change int to int64 so we don't need to add explicit int64(num) conversions later.
I plan to reuse this in Filter, which uses int32 for whatever reason. int seems more portable.
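To illustrate the trade-off being discussed, here is a minimal, dependency-free sketch of a counting helper in the spirit of countPodsMatchSelector. The pod struct and the predicate stand in for v1.Pod and labels.Selector (which live in k8s.io/apimachinery), so this is an analogue rather than the actual scheduler code:

```go
package main

import "fmt"

// pod is a hypothetical stand-in for v1.Pod, keeping only the fields
// this sketch needs.
type pod struct {
	namespace string
	labels    map[string]string
}

// countMatching counts pods in namespace ns whose labels satisfy match.
// It returns int, as in the PR: callers that need int32 or int64 can
// convert at the call site, which is the trade-off raised in the review.
func countMatching(pods []pod, match func(map[string]string) bool, ns string) int {
	count := 0
	for _, p := range pods {
		if p.namespace == ns && match(p.labels) {
			count++
		}
	}
	return count
}

func main() {
	pods := []pod{
		{namespace: "default", labels: map[string]string{"app": "web"}},
		{namespace: "default", labels: map[string]string{"app": "db"}},
		{namespace: "kube-system", labels: map[string]string{"app": "web"}},
	}
	selector := func(l map[string]string) bool { return l["app"] == "web" }
	fmt.Println(countMatching(pods, selector, "default")) // only the first pod matches both
}
```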
@@ -104,7 +107,7 @@ func (pl *PodTopologySpread) PreScore(
 	}
 
 	state := &preScoreState{
-		NodeNameSet: sets.String{},
+		NodeNameSet: make(sets.String, len(filteredNodes)),
Just out of curiosity: how much of a performance boost does this give?
The benchmark for 100 nodes remains at 0.3ms and goes up to 2.1ms for 1000 nodes. The profile shows 10% CPU usage from sets.String.Insert, dominated by growing the hash table, which is not negligible.
I think it's fair to assume that if a user sets spreading, they expect all nodes to have the topology keys.
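The preallocation in the diff above can be sketched without the k8s dependency: sets.String is just a map keyed by string, so a plain map[string]struct{} stands in here. The point is the size hint passed to make, which lets the runtime size the hash table once instead of growing it repeatedly during inserts:

```go
package main

import "fmt"

// buildNodeNameSet mirrors the PreScore change: preallocate the set
// with a capacity hint so inserting one entry per filtered node does
// not trigger repeated hash-table growth (the ~10% CPU the profile
// attributed to sets.String.Insert). The map type is a stand-in for
// k8s sets.String.
func buildNodeNameSet(nodeNames []string) map[string]struct{} {
	set := make(map[string]struct{}, len(nodeNames)) // capacity hint
	for _, name := range nodeNames {
		set[name] = struct{}{}
	}
	return set
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	fmt.Println(len(buildNodeNameSet(nodes))) // one entry per unique node name
}
```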
gotList := make(framework.NodeScoreList, len(filteredNodes))
scoreNode := func(i int) {
	n := filteredNodes[i]
	score, _ := plugin.Score(context.Background(), state, pod, n.Name)
Instantiating context.Background() every time looks odd; can't we reuse the outer ctx? Check out wait.UntilWithContext:
func UntilWithContext(ctx context.Context, f func(context.Context), period time.Duration) {
we can... I just missed it 😅
This is closer to what happens in the core scheduler Signed-off-by: Aldo Culquicondor <acondor@google.com>
Force-pushed 2c156b9 to d2b1903
/retest
/sig scheduling
/lgtm
/remove-sig scheduling
/sig scheduling
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
Optimize spreading when using kubernetes.io/hostname as a topology key. We avoid calculating per-node counters during PreScore and do it in Score instead. This is faster because there are fewer map insertions and accesses. It also brings the topology-spreading-based implementation of default spreading closer to the existing default spreading.
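The core idea can be sketched with simplified, hypothetical types: when the topology key is kubernetes.io/hostname, every node is its own topology domain, so Score can count the matching pods directly from that node's pod list instead of PreScore populating a per-node map entry up front:

```go
package main

import "fmt"

// nodeInfo is a stand-in for the scheduler's per-node snapshot,
// keeping only each pod's labels.
type nodeInfo struct {
	name string
	pods []map[string]string
}

// hostnameScoreCount counts matching pods on a single node. For the
// hostname topology key this is the whole per-domain count, so no
// PreScore map is needed; this is a sketch of the idea, not the PR's
// actual code.
func hostnameScoreCount(n nodeInfo, match func(map[string]string) bool) int {
	count := 0
	for _, labels := range n.pods {
		if match(labels) {
			count++
		}
	}
	return count
}

func main() {
	node := nodeInfo{
		name: "node-a",
		pods: []map[string]string{{"app": "web"}, {"app": "db"}},
	}
	selector := func(l map[string]string) bool { return l["app"] == "web" }
	fmt.Println(hostnameScoreCount(node, selector))
}
```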
Which issue(s) this PR fixes:
Part of #84936
Special notes for your reviewer:
The first commit contains the optimization.
The second commit updates the benchmark to reflect scheduler code, which parallelizes score calculation.
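The parallel scoring that the benchmark now mirrors can be sketched with a plain worker pool; the scheduler itself uses its workqueue.ParallelizeUntil helper, but this dependency-free version shows the same shape. Node names and the scoring function are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelScore scores nodes concurrently with a fixed number of
// workers. Each worker writes to a distinct index of results, so no
// extra locking is needed.
func parallelScore(nodeNames []string, score func(string) int64, workers int) []int64 {
	results := make([]int64, len(nodeNames))
	indices := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indices {
				results[i] = score(nodeNames[i])
			}
		}()
	}
	for i := range nodeNames {
		indices <- i
	}
	close(indices)
	wg.Wait()
	return results
}

func main() {
	nodes := []string{"node-a", "node-bb", "node-ccc"}
	scores := parallelScore(nodes, func(n string) int64 { return int64(len(n)) }, 2)
	fmt.Println(scores) // [6 7 8]
}
```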
Does this PR introduce a user-facing change?: