Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Even Pods Spread - 1. API changes #77327

Merged
merged 3 commits into from Jul 20, 2019

Conversation

@Huang-Wei
Copy link
Member

commented May 2, 2019

What type of PR is this?

/kind api-change
/sig scheduling
/hold

What this PR does / why we need it:

This is the 1st PR of the "Even Pods Spread" KEP implementation.

  • Manual API changes
  • Defaulting, validation and spec removal if feature gate is disabled
  • Unit tests
  • Auto-generated files

Special notes for your reviewer::

Which issue(s) this PR fixes:

Part of #77284.

Does this PR introduce a user-facing change?:

`[]TopologySpreadConstraint` is introduced into PodSpec to support the "Even Pods Spread" alpha feature.
@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented May 2, 2019

/priority important-soon

/cc @bsalamat @lavalamp @liggitt
for review.

@k8s-ci-robot k8s-ci-robot requested review from bsalamat, lavalamp and liggitt May 2, 2019

@bsalamat bsalamat self-assigned this May 3, 2019

@bsalamat bsalamat removed the request for review from davidopp May 3, 2019

@bsalamat
Copy link
Member

left a comment

Thanks, @Huang-Wei! Most of my comments are minor.

@@ -4755,3 +4761,42 @@ const (
// DefaultHardPodAffinityWeight defines the weight of the implicit PreferredDuringScheduling affinity rule.
DefaultHardPodAffinitySymmetricWeight int32 = 1
)

type UnsatisfiableConstraintResponse string

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 3, 2019

Member

maybe rename to UnsatisfiedContraintAction?

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 3, 2019

Member

An integer would be less expensive to compare. Is there any recommendation on the type?

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 4, 2019

Author Member

I'm ok with UnsatisfiedContraintAction . Would like to hear from API experts on naming and the type.

/cc @lavalamp @liggitt

(BTW: ReadWriteOnce/ReadOnlyMany/ReadWriteMany are of type string)

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 7, 2019

Contributor

Yes Action sounds much better than Response

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 7, 2019

Author Member

Let's summarize all suggestions and make a change later.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jun 13, 2019

Author Member

It's been renamed to "Action".

Show resolved Hide resolved pkg/apis/core/types.go Outdated
Show resolved Hide resolved pkg/apis/core/types.go Outdated
Show resolved Hide resolved pkg/apis/core/types.go Outdated
Show resolved Hide resolved pkg/apis/core/types.go Outdated
// alpha: v1.15
//
// Schedule pods evenly across available topology domains.
EvenPodsSpread utilfeature.Feature = "EvenPodsSpread"

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 3, 2019

Member

Is EvenPodSpreading a better name for the feature?

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 4, 2019

Author Member

I recalled @smarterclayton mentioned that ing is not a good name. But I'm open to any reasonable name of the feature gate.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jun 13, 2019

Author Member

Any suggestion on the naming?

cc/ @bsalamat @lavalamp @liggitt @smarterclayton

Show resolved Hide resolved pkg/apis/core/validation/validation.go Outdated
Show resolved Hide resolved pkg/apis/core/validation/validation.go Outdated
Show resolved Hide resolved pkg/api/pod/util_test.go Outdated
Show resolved Hide resolved pkg/api/pod/util_test.go Outdated

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:eps-api branch from 6010630 to 0b24860 May 4, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented May 4, 2019

Thanks @bsalamat ! Most of the comments have been addressed. PTAL.

@lavalamp

This comment has been minimized.

Copy link
Member

commented May 6, 2019

/assign

I'll review the API (might not get to it today though)

@krmayankk
Copy link
Contributor

left a comment

added some comments, excited to see this coming

Show resolved Hide resolved pkg/api/pod/util_test.go Outdated
Show resolved Hide resolved pkg/api/pod/util_test.go Outdated
@@ -4755,3 +4761,42 @@ const (
// DefaultHardPodAffinityWeight defines the weight of the implicit PreferredDuringScheduling affinity rule.
DefaultHardPodAffinitySymmetricWeight int32 = 1
)

type UnsatisfiableConstraintResponse string

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 7, 2019

Contributor

Yes Action sounds much better than Response

Show resolved Hide resolved pkg/apis/core/types.go Outdated
// Pods that match this label selector are counted to determine the number of pods
// in their corresponding topology domain.
// +optional
LabelSelector *metav1.LabelSelector

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 7, 2019

Contributor

i am trying to remember what was the actual use case for the label selector ?

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 31, 2019

Contributor

Another way of asking this is why would this label selector be different then the deployments own label selector ? I see two scenarios:-
1: i want to spread all pods of a workload controller (deployment, RS, statefulset). In this case we dont need a selector
2: i want to spread all pods matching a label. This is more common in CRD controllers which deploy a pod and ditch the regular core controllers.
I feel like we need a model where specifying this labelSelector for the first case should not be mandatory and should be derivable from the uber object . @Huang-Wei @bsalamat wdyt ?

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 31, 2019

Author Member

A reasonable model is to not impose limitations in the beginning unless we have a strong reason (e.g. perf) to do that.

This comment has been minimized.

Copy link
@krmayankk

krmayankk Jul 2, 2019

Contributor

@Huang-Wei we discussed this in one of our sig-scheduling meetings to deduce the labelselector from its owning Deployment if one is omitted from the topologySpreadConstraint . This would make it painless for majority of the users who just want to spread the pods of their Deployment /Statefulset and wont have to specify the LabelSelector. I can create a issue on this if you want or if you want to address as part of the other implementation PR's you have.

// schedule it onto zone1(zone2) will make the ActualSkew(2) violates MaxSkew(1)
// - if MaxSkew is 2, incoming pod can be scheduled to any zone.
// It's a required value. Default value is 1 and 0 is not allowed.
MaxSkew int32

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 7, 2019

Contributor

should we make it unsigned ?

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 7, 2019

Author Member

I also thought of this uint32. But it seems we're not enforcing a rule - DeploymentSpec.Replicas is using int32.

This comment has been minimized.

Copy link
@krmayankk
allConstraintKinds := sets.String{}
for i, constraint := range constraints {
subFldPath := fldPath.Index(i)
if val := constraint.MaxSkew; val <= 0 {

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 7, 2019

Contributor

why do we need to check this , lets make this unsigned

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 7, 2019

Author Member

with uint32, it also needs to be validated to be non-zero.

This comment has been minimized.

Copy link
@lavalamp

lavalamp May 28, 2019

Member

fyi @krmayankk: Please do not use uint types for restricting the domain to positive integers--it encourages math errors. The code is correct.

This comment has been minimized.

Copy link
@krmayankk

krmayankk May 31, 2019

Contributor

@lavalamp could you elaborate ? underflow/overflow can occur in both . Keeping it unsigned means i dont need to validate if its negative

@Huang-Wei why should it be validated to be non zero ? It can be zero, which means perfect spreading with no skew

This comment has been minimized.

Copy link
@lavalamp

lavalamp May 31, 2019

Member

at zero, it will be impossible to legally add another pod. So zero can't be allowed.

underflow can occur always, but underflow at 0 is vastly more likely to be hit in normal code. e.g. if x - y > z doesn't do what people expect any more.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei May 31, 2019

Author Member

We can't use zero to describe a "perfect" balanced case - think about how you can spread 2 pods into 2 zones if it's set to zero.

It's similar to the definition of "perfect balanced tree" - where we usually define it as "height of the left and right subtrees from every node differ by 1(maxSkew) or less"

This comment has been minimized.

Copy link
@krmayankk

krmayankk Jul 2, 2019

Contributor

thanks @lavalamp @Huang-Wei for the explanation and it makes sense. Currently the scheduler decides on a per pod basis .For a hypothetical use case, where an application cannot tolerate unbalance by even one pod, what would be ideal is to allow the scheduler to schedule in batches. So in your example of two pods and two zones, the next scheduling can happen for 2 or 4 more pods otherwise it fails. May be we support this when we have batch scheduling (i think there was some work going on).

We can even argue, whether we should even design such applications which cannot tolerate an imbalance of 1 pod .

Show resolved Hide resolved pkg/apis/core/validation/validation_test.go

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:eps-api branch from 0b24860 to 8a241f8 May 7, 2019

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:eps-api branch from 03f62f3 to 1869e09 Jul 18, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 18, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 18, 2019

The recent force-push didn't change the code but just rebasing.

@lavalamp the resting PRs have got reviewed by @ahg-g and @bsalamat, we just have a few nits to figure out. It'd good if you can /approve this one, so that we can merge the resting ones one by one.

Thanks for the tremendous reviewing efforts all guys put on this PR. Really appreciate that!

}

// ValidateScheduleAction tests that the argument is a valid UnsatisfiableConstraintAction.
func ValidateScheduleAction(fldPath *field.Path, action core.UnsatisfiableConstraintAction) *field.Error {

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 18, 2019

Member

Why not ValidateWhenUnsatisfiable, to keep it consistent?

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jul 19, 2019

Author Member

Done.


type spreadConstraintPair struct {
topologyKey string
scheduleAction core.UnsatisfiableConstraintAction

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 18, 2019

Member

WhenUnsatisfiable?

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jul 18, 2019

Member

+1. It'd make it more consistent with the API, etc.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jul 19, 2019

Author Member

Done.

}

// ValidateSpreadConstraintPair tests that if `pair` exists in `existingConstraintPairs`.
func ValidateSpreadConstraintPair(fldPath *field.Path, pair spreadConstraintPair, existingConstraintPairs []spreadConstraintPair) *field.Error {

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 18, 2019

Member

Both the comment and the method name should describe what's being validated. Update once the discussion below resolves. Something like: ValidateSpreadConstraintDoesNotRepeat. And the comment would explain explicitly which keys are used to determine a repetition. And you should be able to iterate through the original constraints slice instead of creating a new array of pairs (up to the current index).

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jul 19, 2019

Author Member

And you should be able to iterate through the original constraints slice instead of creating a new array of pairs (up to the current index).

Not really. We shouldn't re-iterate the visited constraints.

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 19, 2019

Member

Why not? It would be just to check for repetitions. The complexity is the same. Unless you plan to replace []spreadConstraintPair by a map in the future. But we don't expect to have a big amount of constraints.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jul 19, 2019

Author Member

Complexity is basically the same

Not for the worst/average case.

If we check repeatedly, for the ith constraint, each time it's compared with all constraints. So in total it's n^2 times, and notice that we need additional logic to exclude the case it compares with itself.
If we only check visited constraints, in total it's 1+2+...+n = n^2 / 2. And we don't need to consider the case it compares with itself.

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 19, 2019

Member

You just need to pass a subslice: constraints[:i]

{
name: "missing scheduling mode",
constraints: []core.TopologySpreadConstraint{
{MaxSkew: 1, TopologyKey: "k8s.io/zone"},

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 18, 2019

Member

nit: you could use the well known label apis.labelZoneFailureDomainGA or simply define a constant for the test. This is not necessary, but it could help prevent typos.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Jul 19, 2019

Author Member

That's a private variable, so not worth making it public and make a reference here.

This comment has been minimized.

Copy link
@alculquicondor

alculquicondor Jul 19, 2019

Member

Oh, I missed that. There is still v1.LabelZoneFailureDomain or simply a constant at the top of this file. But again, just a nit.

@bsalamat
Copy link
Member

left a comment

Only a minor comment. Otherwise, LGTM.

/retest


type spreadConstraintPair struct {
topologyKey string
scheduleAction core.UnsatisfiableConstraintAction

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jul 18, 2019

Member

+1. It'd make it more consistent with the API, etc.

Huang-Wei added some commits Apr 26, 2019

EvenPodsSpread: regenerated API compatibility data
- UPDATE_COMPATIBILITY_FIXTURE_DATA=true go test ./vendor/k8s.io/api -run //HEAD &>/dev/null

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:eps-api branch from 1869e09 to f9d49a6 Jul 19, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 19, 2019

@bsalamat @alculquicondor The naming comment addressed. PTAL.

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 19, 2019

/retest

@alculquicondor

This comment has been minimized.

Copy link
Member

commented Jul 19, 2019

Hopefully others don't disagree, but it's useful if you add more commits instead of amending. We can squash once the review is finalized.

@bsalamat
Copy link
Member

left a comment

/lgtm
/approve

Thanks a lot, @Huang-Wei for your exemplary dedication and great work!

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 19, 2019

@lavalamp

This comment has been minimized.

Copy link
Member

commented Jul 19, 2019

/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jul 19, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 19, 2019

/hold cancel

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 19, 2019

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jul 19, 2019

@Huang-Wei: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-storage-slow b7c145f link /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-e2e-gce-csi-serial b7c145f link /test pull-kubernetes-e2e-gce-csi-serial
pull-kubernetes-conformance-image-test b7c145f link /test pull-kubernetes-conformance-image-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Jul 20, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 323356f into kubernetes:master Jul 20, 2019

23 checks passed

cla/linuxfoundation Huang-Wei authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@Huang-Wei Huang-Wei deleted the Huang-Wei:eps-api branch Jul 24, 2019

@krmayankk

This comment has been minimized.

Copy link
Contributor

commented Jul 24, 2019

this is awesome thanks @Huang-Wei @bsalamat and @lavalamp for getting this merged

@alculquicondor alculquicondor referenced this pull request Jul 24, 2019

Closed

REQUEST: New membership for alculquicondor #1042

6 of 6 tasks complete

@liggitt liggitt moved this from In progress to API review completed, 1.16 in API Reviews Jul 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.