-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement MinDomains on Pod Topology Spread #108362
Implement MinDomains on Pod Topology Spread #108362
Conversation
/cc @alculquicondor |
e6f3111
to
904c94f
Compare
/retest |
@@ -57,11 +59,15 @@ func (s *preFilterState) Clone() framework.StateData { | |||
// Constraints are shared because they don't change. | |||
Constraints: s.Constraints, | |||
TpKeyToCriticalPaths: make(map[string]*criticalPaths, len(s.TpKeyToCriticalPaths)), | |||
TpKeyToDomainsNum: make(map[string]int32, len(s.TpKeyToDomainsNum)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we don't use int
as TpPairToMatchNum
below?
minMatchNum := paths[0].MatchNum | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be readable to arrange this to a function like MinMatchNum()
? It can be a method of preFilterState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. You can move this logic to MinMatchNum(), and return (int, error)
so that the detailed error message can be composed properly.
pkg/apis/core/types.go
Outdated
// When value is different from 1, WhenUnsatisfiable must be DoNotSchedule. | ||
// | ||
// For example, in a 3-zone cluster, MaxSkew is set to 2, MinDomains is set to 5 and pods with the same | ||
// labelSelector spread as 2/2/2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering what will happen if we set MaxSkew
here to 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is described in the above part. (L5666)
// As a result, when the number of eligible domains is less than
minDomains
,
// scheduler won't schedule more than maxSkew Pods to those domains.
Even if we set MaxSkew=10
(and the other parameters and situation are the same as this example), global minimum is treated as 0. (because the number of domains is less than 5(MinDomains))
But, before spreading pods as 10/10/10, scheduling will not fail as the example.
In Filter, we judged the Nodes with the following criteria,
('existing matching num' + 'if self-match (1 or 0)' - 'global min matching num') <= 'maxSkew'
and, before spreading pods as 10/10/10, x + 1 - 0 <= 10
will be true
in at least one Node .
Do you think that the current documentation is not enough? If this documentation is difficult to understand for someone who understands scheduler (includes you), then many users also may not be able to understand this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, even we have less eligible domains than MinDomains
, we can still schedule pods until we reach the Skew limit, right? I think this was the point I'm confusing. I thought MinDomains
was a hard constraint. I wonder whether this will also be mis-understood by end-users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your understanding is right.
Small change, but I added the statement that skew is calculated after MinDomains are taken into account in 03a6294.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kerthcet this was discussed in the original KEP, and had a consensus that the global minimum is read as 0 (like a virtual topology not existing yet) if the total number of topologies is less than minDomains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
Will review after @kerthcet gives an LGTM |
pkg/api/pod/util_test.go
Outdated
}, | ||
} | ||
for _, tc := range testCases { | ||
tc := tc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant
pkg/apis/core/types.go
Outdated
// The number of domains is less than 5(MinDomains), so "global minimum" is treated as 0. | ||
// In this situation, new pod with the same labelSelector cannot be scheduled, | ||
// because computed skew will be 3(3 - 0) if new Pod is scheduled to any of the three zones, | ||
// it will violate MinDomains. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/it will violate MinDomains/it will violate MaxSkew/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@@ -6518,6 +6518,9 @@ func validateTopologySpreadConstraints(constraints []core.TopologySpreadConstrai | |||
if err := ValidateMaxSkew(subFldPath.Child("maxSkew"), constraint.MaxSkew); err != nil { | |||
allErrs = append(allErrs, err) | |||
} | |||
if err := validateMinDomains(subFldPath.Child("minDomains"), constraint.MinDomains); err != nil { | |||
allErrs = append(allErrs, err) | |||
} | |||
if err := ValidateTopologyKey(subFldPath.Child("topologyKey"), constraint.TopologyKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add validations for MinDomains!=1
and WhenUnsatisfiable=ScheduleAnyway
combination to fail fast at api level? Tests also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -633,13 +634,13 @@ func TestDryRunPreemption(t *testing.T) { | |||
{ | |||
name: "preemption to resolve pod topology spread filter failure", | |||
registerPlugins: []st.RegisterPluginFunc{ | |||
st.RegisterPluginAsExtensions(podtopologyspread.Name, podtopologyspread.New, "PreFilter", "Filter"), | |||
st.RegisterPluginAsExtensions(podtopologyspread.Name, schedruntime.FactoryAdapter(feature.Features{}, podtopologyspread.New), "PreFilter", "Filter"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's name var podTopologySpreadFunc = schedruntime.FactoryAdapter(feature.Features{}, podtopologyspread.New)
at the file header and reuse it.
if !ok { | ||
// error which should not happen. | ||
// If this error occurs, MinDomains will be ignored in judge. | ||
klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/from topology key/by topology key/
@@ -1295,142 +1405,65 @@ func TestSingleConstraint(t *testing.T) { | |||
}, | |||
}, | |||
{ | |||
name: "pods spread across nodes as 2/1/0/3, maxSkew is 2, node-b and node-x fit", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove these tests? Seems like these tests can work well if we set proper MinDomains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deletion seems to be a mistake. Will revert this change. Thanks
@@ -732,7 +742,7 @@ func TestPodTopologySpreadScore(t *testing.T) { | |||
allNodes := append([]*v1.Node{}, tt.nodes...) | |||
allNodes = append(allNodes, tt.failedNodes...) | |||
state := framework.NewCycleState() | |||
pl := plugintesting.SetupPluginWithInformers(ctx, t, New, &config.PodTopologySpreadArgs{DefaultingType: config.SystemDefaulting}, cache.NewSnapshot(tt.existingPods, allNodes), tt.objs) | |||
pl := plugintesting.SetupPluginWithInformers(ctx, t, frameworkruntime.FactoryAdapter(feature.Features{}, New), &config.PodTopologySpreadArgs{DefaultingType: config.SystemDefaulting}, cache.NewSnapshot(tt.existingPods, allNodes), tt.objs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare var podTopologySpreadFunc frameworkruntime.FactoryAdapter(feature.Features{}, New)
for reuse.
if !ok { | ||
// error which should not happen. | ||
// If this error occurs, MinDomains will be ignored in judge. | ||
klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
domainsNum
here will always be 0, the same as minMatchNum
, this is not right. Change to something like:
if domainsNum, ok := s.TpKeyToDomainsNum[c.TopologyKey]; ok {
if domainsNum < c.MinDomains {
minMatchNum = 0
}
} else {
// When the number of eligible domains with matching topology keys is less than `minDomains`,
// it treats "global minimum" as 0.
klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice catch 🙏
We prefer integration tests were possible, because they run faster |
904c94f
to
bd3c551
Compare
Thanks. squashed the commits into 2 commits. |
/lgtm |
a1e98ee
to
6141aa5
Compare
@alculquicondor rebased this pr. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@alculquicondor Could you |
/hold cancel |
/retest 🤔 I'm not seeing other failures for If you see it again, please open an issue. |
/retest |
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
/retest |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implement MinDomains and tests
This PR contains the changes on #107674, so should be merged after #107674. The changes in this PR is only f3bd5e5
/hold
Which issue(s) this PR fixes:
Part of #105291
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: