
"scheduler: non-compatible change in default topology spread constraints" #102136

Closed
gaorong opened this issue May 19, 2021 · 24 comments · Fixed by #105046
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@gaorong
Contributor

gaorong commented May 19, 2021

What happened:

Since v1.19, the scheduler uses the PodTopologySpread plugin for default spreading with the following constraints, and the original SelectorSpread plugin is disabled by default.

- whenUnsatisfiable: ScheduleAnyway
  topologyKey: kubernetes.io/hostname
  maxSkew: 3
- whenUnsatisfiable: ScheduleAnyway
  topologyKey: topology.kubernetes.io/zone
  maxSkew: 5
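
For reference, in the scheduler source these system defaults are declared as a plain slice of v1.TopologySpreadConstraint. The following is a sketch, not a verbatim copy of the tree:

import v1 "k8s.io/api/core/v1"

// Sketch of the system default constraints applied when DefaultingType is System.
var systemDefaultConstraints = []v1.TopologySpreadConstraint{
	{
		TopologyKey:       v1.LabelHostname, // kubernetes.io/hostname
		WhenUnsatisfiable: v1.ScheduleAnyway,
		MaxSkew:           3,
	},
	{
		TopologyKey:       v1.LabelTopologyZone, // topology.kubernetes.io/zone
		WhenUnsatisfiable: v1.ScheduleAnyway,
		MaxSkew:           5,
	},
}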

These default constraints assume nodes have a topology.kubernetes.io/zone label. The algorithm ignores nodes that don't have all the required topology keys present when scoring:

if !nodeLabelsMatchSpreadConstraints(node.Labels, s.Constraints) {
	// Nodes which don't have all required topologyKeys present are ignored
	// when scoring later.
	s.IgnoredNodes.Insert(node.Name)
	continue
}
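
For context, the nodeLabelsMatchSpreadConstraints helper just requires every constraint's topologyKey to be present as a node label; roughly (paraphrased, not copied verbatim):

// constraints is the plugin's internal constraint slice (topologyKey, maxSkew, ...).
func nodeLabelsMatchSpreadConstraints(nodeLabels map[string]string, constraints []topologySpreadConstraint) bool {
	for _, c := range constraints {
		if _, ok := nodeLabels[c.TopologyKey]; !ok {
			// A single missing topology key is enough to exclude the node.
			return false
		}
	}
	return true
}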

However, in many on-premise clusters these labels are not added by default and are not guaranteed to exist, so PodTopologySpread does not take effect at all.

Worse, if we enable the NodeResourcesMostAllocated plugin and PodTopologySpread does not take effect as described above, NodeResourcesMostAllocated dominates scheduling and is likely to bind all pods belonging to a ReplicaSet to the same node, which leads to low availability and is not acceptable to our users.

In the SelectorSpread plugin, we have already taken care of this case:

// If there is zone information present, incorporate it
if haveZones {

After switching to PodTopologySpread, the zone is no longer handled this way, so the behavior is not compatible with the original.
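
To illustrate the old behavior, here is a simplified sketch of how SelectorSpread blended the two signals, folding zone spreading into the final score only when zone labels were present (zoneWeighting is roughly 2/3 in the original; the names below are illustrative):

const zoneWeighting = 2.0 / 3.0

// finalNodeScore blends per-node and per-zone spreading scores.
// Without zone information it falls back to pure per-node spreading,
// so clusters without zone labels still get meaningful spreading.
func finalNodeScore(nodeScore, zoneScore float64, haveZones bool) float64 {
	if !haveZones {
		return nodeScore
	}
	// If there is zone information present, incorporate it.
	return (1-zoneWeighting)*nodeScore + zoneWeighting*zoneScore
}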

What you expected to happen:

The default PodTopologySpread plugin should handle the case in which nodes do not have zone labels and still spread pods properly.

How to reproduce it (as minimally and precisely as possible):

Create a Deployment in a cluster whose nodes do not have zone labels and observe the default spreading.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    master branch

/assign
/sig scheduling

@gaorong gaorong added the kind/bug Categorizes issue or PR as related to a bug. label May 19, 2021
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 19, 2021
@gaorong
Contributor Author

gaorong commented May 19, 2021

A potential fix would look like this: in the default-constraints case, we do not ignore nodes in the PreScore phase, so nodes with no topology labels are grouped into an empty-value topology and pods are counted per topology as usual. In the later Score phase, those empty-value topologies do not contribute to the final score because of the label check below, so the result favours nodes that have all required topology keys present and penalizes those that don't.

for i, c := range s.Constraints {
	if tpVal, ok := node.Labels[c.TopologyKey]; ok {
		var cnt int64
		if c.TopologyKey == v1.LabelHostname {
			cnt = int64(countPodsMatchSelector(nodeInfo.Pods, c.Selector, pod.Namespace))
		} else {
			pair := topologyPair{key: c.TopologyKey, value: tpVal}
			cnt = *s.TopologyPairToPodCounts[pair]
		}
		score += scoreForCount(cnt, c.MaxSkew, s.TopologyNormalizingWeight[i])
	}
} 
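
As a rough illustration of this proposal (not the change that was eventually merged), the PreScore check could be relaxed only when the constraints came from system defaulting; pl.systemDefaulted below is a hypothetical flag:

if !nodeLabelsMatchSpreadConstraints(node.Labels, s.Constraints) {
	if !pl.systemDefaulted { // hypothetical: true only for the built-in system defaults
		// Administrator-provided constraints keep the old behavior.
		s.IgnoredNodes.Insert(node.Name)
		continue
	}
	// Otherwise fall through: missing topology keys resolve to an
	// empty-value topology, so per-topology pod counts still accumulate,
	// and the Score-phase label check above skips the missing key.
}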

Does this fix make sense? Feel free to leave comments.

@ahg-g
Member

ahg-g commented May 19, 2021

There are two options:

  1. Add topology.kubernetes.io/zone label to the nodes. This is reasonable since this label (along with topology.kubernetes.io/region) is now standard: https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/1659-standard-topology-labels

  2. Update the default pod spreading constraints in the scheduler ComponentConfig to only use the hostname constraint.

@alculquicondor we faced/discussed this issue before, what was the conclusion?

@alculquicondor
Member

My suggestion was to special-case the system defaulting to still provide some spreading if the zone label doesn't exist: #98480 (comment)

Feel free to implement that suggestion. This should be eligible for backporting.

@alculquicondor
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 19, 2021
@alculquicondor
Member

cc @Huang-Wei who was in the previous thread.

@alculquicondor
Member

Btw, in the meantime, you can disable the feature gate DefaultPodTopologySpread

@Huang-Wei
Member

@alculquicondor's suggestion looks good, but where can we achieve that? Not during init time, as defaultConstraints is instantiated without knowing the underlying node labels:

if args.DefaultingType == config.SystemDefaulting {
	pl.defaultConstraints = systemDefaultConstraints
}

If at runtime, that implies we have to do it in every scoring cycle and iterate over the filtered nodes to know whether the zone label is present.

@Huang-Wei
Member

Does this fix make sense? Feel free to leave comments.

This may work except for the unnecessary calculation on the "artificial" empty-value topology.

@alculquicondor
Member

Yes, it should be at runtime. We already iterate over all the nodes in PreScore, so I don't see why it wouldn't work.

Does this fix make sense?

Something like that, but it should be done just for the System defaulting mode. If the cluster administrator sets another default, we assume they know what they are doing.
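
One way to express that runtime decision, sketched under the assumptions discussed here (per-pod constraints keep the strict check, and a hypothetical systemDefaulted field records that the defaults came from SystemDefaulting):

// Decide once per scheduling cycle whether a missing topology label
// should exclude a node from the spreading calculation.
requireAllTopologies := len(pod.Spec.TopologySpreadConstraints) > 0 || !pl.systemDefaulted
for _, n := range filteredNodes {
	node := n.Node()
	if requireAllTopologies && !nodeLabelsMatchSpreadConstraints(node.Labels, s.Constraints) {
		s.IgnoredNodes.Insert(node.Name)
		continue
	}
	// ...count pods per topology pair as before...
}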

@ahg-g
Member

ahg-g commented May 19, 2021

This isn't pretty, but I guess we have to do it because it was a breaking change.

@ahg-g
Member

ahg-g commented May 25, 2021

/retitle "scheduler: non-compatible change in default topology spread constraints"

@k8s-ci-robot k8s-ci-robot changed the title scheduler: non-compatilbel change in default topology spread constraints "scheduler: non-compatible change in default topology spread constraints" May 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2021
@alculquicondor
Member

We probably need to solve this before graduating to GA: kubernetes/enhancements#1258

@gaorong I noticed you still have a PR open. Are you still working on it?

@alculquicondor
Member

/unassign @gaorong
as they seem inactive

I can take it over
/assign

@alculquicondor
Member

I'm not going to backport this, but I will support a cherry pick if anybody needs it. 1.20 would be the oldest that can be fixed.

@alculquicondor
Member

cc @damemi

@ahg-g
Member

ahg-g commented Nov 19, 2021

/open

for backporting

@alculquicondor
Member

/reopen

@k8s-ci-robot
Contributor

@alculquicondor: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Nov 22, 2021
@ahg-g
Member

ahg-g commented Nov 22, 2021

we need to backport this

@alculquicondor
Member

Opened: #106604, #106605, #106607

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 22, 2021
@ahg-g
Member

ahg-g commented Dec 23, 2021

/close

@k8s-ci-robot
Contributor

@ahg-g: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
