Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MinDomains on Pod Topology Spread #108362

Merged
merged 1 commit into from
Mar 15, 2022

Conversation

sanposhiho
Copy link
Member

@sanposhiho sanposhiho commented Feb 26, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implement MinDomains and tests

This PR contains the changes on #107674, so should be merged after #107674. The changes in this PR is only f3bd5e5
/hold

Which issue(s) this PR fixes:

Part of #105291

Special notes for your reviewer:

  • Should I add e2e tests in this PR?

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3022

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 26, 2022
@sanposhiho
Copy link
Member Author

/cc @alculquicondor

@sanposhiho
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2022
@@ -57,11 +59,15 @@ func (s *preFilterState) Clone() framework.StateData {
// Constraints are shared because they don't change.
Constraints: s.Constraints,
TpKeyToCriticalPaths: make(map[string]*criticalPaths, len(s.TpKeyToCriticalPaths)),
TpKeyToDomainsNum: make(map[string]int32, len(s.TpKeyToDomainsNum)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we don't use int as TpPairToMatchNum below?

minMatchNum := paths[0].MatchNum

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be readable to arrange this to a function like MinMatchNum()? It can be a method of preFilterState.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. You can move this logic to MinMatchNum(), and return (int, error) so that the detailed error message can be composed properly.

// When value is different from 1, WhenUnsatisfiable must be DoNotSchedule.
//
// For example, in a 3-zone cluster, MaxSkew is set to 2, MinDomains is set to 5 and pods with the same
// labelSelector spread as 2/2/2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what will happen if we set MaxSkew here to 10.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is described in the above part. (L5666)

// As a result, when the number of eligible domains is less than minDomains,
// scheduler won't schedule more than maxSkew Pods to those domains.


Even if we set MaxSkew=10 (and the other parameters and situation are the same as this example), global minimum is treated as 0. (because the number of domains is less than 5(MinDomains))
But, before spreading pods as 10/10/10, scheduling will not fail as the example.
In Filter, we judged the Nodes with the following criteria,

('existing matching num' + 'if self-match (1 or 0)' - 'global min matching num') <= 'maxSkew'

and, before spreading pods as 10/10/10, x + 1 - 0 <= 10 will be true in at least one Node .

Do you think that the current documentation is not enough? If this documentation is difficult to understand for someone who understands scheduler (includes you), then many users also may not be able to understand this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, even we have less eligible domains than MinDomains, we can still schedule pods until we reach the Skew limit, right? I think this was the point I'm confusing. I thought MinDomains was a hard constraint. I wonder whether this will also be mis-understood by end-users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your understanding is right.
Small change, but I added the statement that skew is calculated after MinDomains are taken into account in 03a6294.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kerthcet this was discussed in the original KEP, and had a consensus that the global minimum is read as 0 (like a virtual topology not existing yet) if the total number of topologies is less than minDomains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

@alculquicondor
Copy link
Member

Will review after @kerthcet gives an LGTM

},
}
for _, tc := range testCases {
tc := tc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant

// The number of domains is less than 5(MinDomains), so "global minimum" is treated as 0.
// In this situation, new pod with the same labelSelector cannot be scheduled,
// because computed skew will be 3(3 - 0) if new Pod is scheduled to any of the three zones,
// it will violate MinDomains.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/it will violate MinDomains/it will violate MaxSkew/

Copy link
Member Author

@sanposhiho sanposhiho Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -6518,6 +6518,9 @@ func validateTopologySpreadConstraints(constraints []core.TopologySpreadConstrai
if err := ValidateMaxSkew(subFldPath.Child("maxSkew"), constraint.MaxSkew); err != nil {
allErrs = append(allErrs, err)
}
if err := validateMinDomains(subFldPath.Child("minDomains"), constraint.MinDomains); err != nil {
allErrs = append(allErrs, err)
}
if err := ValidateTopologyKey(subFldPath.Child("topologyKey"), constraint.TopologyKey); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validations for MinDomains!=1 and WhenUnsatisfiable=ScheduleAnyway combination to fail fast at api level? Tests also.

Copy link
Member Author

@sanposhiho sanposhiho Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been implemented in #107674. (Sorry I hadn't rebased this PR branch on latest #107674 branch)

@@ -633,13 +634,13 @@ func TestDryRunPreemption(t *testing.T) {
{
name: "preemption to resolve pod topology spread filter failure",
registerPlugins: []st.RegisterPluginFunc{
st.RegisterPluginAsExtensions(podtopologyspread.Name, podtopologyspread.New, "PreFilter", "Filter"),
st.RegisterPluginAsExtensions(podtopologyspread.Name, schedruntime.FactoryAdapter(feature.Features{}, podtopologyspread.New), "PreFilter", "Filter"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name var podTopologySpreadFunc = schedruntime.FactoryAdapter(feature.Features{}, podtopologyspread.New) at the file header and reuse it.

if !ok {
// error which should not happen.
// If this error occurs, MinDomains will be ignored in judge.
klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/from topology key/by topology key/

@@ -1295,142 +1405,65 @@ func TestSingleConstraint(t *testing.T) {
},
},
{
name: "pods spread across nodes as 2/1/0/3, maxSkew is 2, node-b and node-x fit",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove these tests? Seems like these tests can work well if we set proper MinDomains.

Copy link
Member Author

@sanposhiho sanposhiho Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion seems to be a mistake. Will revert this change. Thanks

@@ -732,7 +742,7 @@ func TestPodTopologySpreadScore(t *testing.T) {
allNodes := append([]*v1.Node{}, tt.nodes...)
allNodes = append(allNodes, tt.failedNodes...)
state := framework.NewCycleState()
pl := plugintesting.SetupPluginWithInformers(ctx, t, New, &config.PodTopologySpreadArgs{DefaultingType: config.SystemDefaulting}, cache.NewSnapshot(tt.existingPods, allNodes), tt.objs)
pl := plugintesting.SetupPluginWithInformers(ctx, t, frameworkruntime.FactoryAdapter(feature.Features{}, New), &config.PodTopologySpreadArgs{DefaultingType: config.SystemDefaulting}, cache.NewSnapshot(tt.existingPods, allNodes), tt.objs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare var podTopologySpreadFunc frameworkruntime.FactoryAdapter(feature.Features{}, New) for reuse.

if !ok {
// error which should not happen.
// If this error occurs, MinDomains will be ignored in judge.
klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum)
Copy link
Member

@kerthcet kerthcet Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domainsNum here will always be 0, the same as minMatchNum, this is not right. Change to something like:

		if domainsNum, ok := s.TpKeyToDomainsNum[c.TopologyKey]; ok {
			if domainsNum < c.MinDomains {
				minMatchNum = 0
			}
		} else {
			// When the number of eligible domains with matching topology keys is less than `minDomains`,
			// it treats "global minimum" as 0.
			klog.ErrorS(nil, "Internal error occurred while retrieving the number of eligible domains from topology key. MinDomains will be ignored.", "topologyKey", tpKey, "domainsNums", s.TpKeyToDomainsNum)
		}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice catch 🙏

@alculquicondor
Copy link
Member

alculquicondor commented Mar 1, 2022

Should I add e2e tests in this PR?

We prefer integration tests were possible, because they run faster

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2022
@sanposhiho
Copy link
Member Author

sanposhiho commented Mar 11, 2022

Thanks. squashed the commits into 2 commits.
And, change the release notes as suggestion.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 11, 2022
@alculquicondor
Copy link
Member

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2022
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 14, 2022
@sanposhiho
Copy link
Member Author

sanposhiho commented Mar 14, 2022

@alculquicondor rebased this pr.

@alculquicondor
Copy link
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sanposhiho
Copy link
Member Author

/retest

@sanposhiho
Copy link
Member Author

@alculquicondor Could you /unhold if looks good?

@alculquicondor
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 15, 2022
@alculquicondor
Copy link
Member

/retest

🤔 I'm not seeing other failures for TestVolumeBinding/bound_and_unbound_unsatisfied in https://storage.googleapis.com/k8s-triage/index.html?date=2022-03-06&pr=1&text=TestVolumeBinding&job=pull-kubernetes-unit

If you see it again, please open an issue.

@alculquicondor
Copy link
Member

/retest

@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@sanposhiho
Copy link
Member Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants