Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP of pod affinity/anti-affinity supports Gt and Lt operators #890

Merged
merged 1 commit into from Apr 30, 2019

Conversation

wgliang
Copy link
Contributor

@wgliang wgliang commented Mar 12, 2019

As a follow-up of discussion kubernetes/kubernetes#72556 (comment)

/sig scheduling
/kind design
/priority important-longterm
cc/ @bsalamat @k82cn @ravisantoshgudimetla @misterikkit @resouer @Huang-Wei @liggitt

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. kind/design Categorizes issue or PR as related to design. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Mar 12, 2019
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 12, 2019
Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments below.

Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments below.


- if we use the original `Exists` operator to implement, then we will write affinity
like this:
```go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to "yaml" to make it indent well

```
- if we use the original `In` operator to implement, then we will write affinity
like this:
```go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to "yaml" to make it indent well

Both are not ideal solutions. A promising solution is to provide users with `Gt` and `Lt`
operators, giving users the ability to specify the scope of the tag. Users can achieve
this in this way:
```go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to "yaml" to make it indent well

- labelSelector:
matchExpressions:
- key: foo
operator: Exists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exists behaves like "whether a key exists". It has nothing to do with values.

So I think this sample can be deleted, just keep the latter "In" sample.

Copy link
Contributor Author

@wgliang wgliang Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@wgliang
Copy link
Contributor Author

wgliang commented Mar 13, 2019

@Huang-Wei Thank you for your review, I have updated based on your comment.

### Non-Goals

- Just add `Gt` and `Lt` operator functions without changing their original definition.
- Not changing the behavior of other label selectors, such as `Replicases`, `Daemonsets`, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: Replicases

API of `PodSelector` is defined as below:

```go
type PodSelector struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leverage most of node affinity structures here instead of using new ones? I know that the existing ones are hardcoded with prefix Node but I believe they're generic enough for example NodeSelector, NodeSelectorRequirements, NodeSelectorOperators have the same fields and types that we could use for pod affinity too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @Huang-Wei also proposed before. The PodSelector was re-proposed to avoid confusion with NodeSelector. Because from the existing code, many checks and usages of LabelSelector and NodeSelector are inconsistent. I am worried about destroying its original usage habits.

But I think this is a reasonable choice, we can reuse the logic of NodeSelector from a new abstraction.

/cc @bsalamat needs some of your opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the APIs are the same and act in the same way, but given that those have a "Node" prefix, we cannot use them for Pod selection. Otherwise, our API would be very confusing. I like the already proposed solution better that we can create a common LabelSelectorRequirement that is used in both Node and Pod selectors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a backwards compatible change to change the name of the go structs, as those don't actually appear in user config (except for the top level objects like Pod). The field names cannot be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the name of go structs, the operators who is using it have to update the code accordingly, right?


Along with this feature, the biggest risk should be performance. This may also be the reason
why the `Gt` and `Lt` operators are not supported when Pod affinity/anti-affinity is
first proposed. In order to eliminate the user's panic, carefully measure the impact of these operators and understand their impact is mandatory. So we will perform performance/benchmark tests on this change and backward compatibility test.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/In order to eliminate the user's panic, carefully measure the impact of these operators and understand their impact is mandatory/In order to understand the impact of these changes, we need to understand the their performance implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks.


### Non-Goals

- Just add `Gt` and `Lt` operator functions without changing their original definition.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed as a non-goal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IMO, the definition of the operator is the same for Pod or Node, or any other entity object.
Do you think there is anything added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also confused about why this needs to be stated. Who would expect us to change the definition of less than / greater than?


Along with this feature, the biggest risk should be performance. This may also be the reason
why the `Gt` and `Lt` operators are not supported when Pod affinity/anti-affinity is
first proposed. In order to understand the impact of these changes, we need to understand the their performance implications.So we will perform performance/benchmark tests on this change and backward compatibility test.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/.So we will/. We will/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

why the `Gt` and `Lt` operators are not supported when Pod affinity/anti-affinity is
first proposed. In order to understand the impact of these changes, we need to understand the their performance implications.So we will perform performance/benchmark tests on this change and backward compatibility test.

As part of the mitigations, the good news is that the performance of the entire scheduler has been greatly improved (https://github.com/kubernetes/kubernetes/pull/74041#issuecomment-466191359), and the processing of Pod affinity/anti-affinity has been surprisingly Optimized(https://github.com/kubernetes/kubernetes/pull/67788).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "As part the mitigation..." you may want to say "Compared to the time that inter-pod affinity was introduced, performance of the scheduler has been greatly improved..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks.

of `metav1.LabelSelector`:

```go
type PodAffinityTerm struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an API review by our API experts.
I would personally prefer a generic approach that would work for other selectors as well, as opposed to a selector that works differently for a specific feature.

@kubernetes/api-reviewers @liggitt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, will update the design document.


### Test Plan

_To be filled until targeted at a release._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to cover both correctness and performance tests in the plan.

@wgliang wgliang force-pushed the master branch 2 times, most recently from 0a62fce to defb6cb Compare March 21, 2019 08:15
@wgliang
Copy link
Contributor Author

wgliang commented Mar 21, 2019

@ravisantoshgudimetla @bsalamat Thanks for your review, I have updated the content mentioned in the comments.
What needs special attention is to change the design and implementation of the API.

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @wgliang! Looks good to me. Given that this KEP proposes changes to existing API, we need an API review as well.

API of `PodSelector` is defined as below:

```go
type PodSelector struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the APIs are the same and act in the same way, but given that those have a "Node" prefix, we cannot use them for Pod selection. Otherwise, our API would be very confusing. I like the already proposed solution better that we can create a common LabelSelectorRequirement that is used in both Node and Pod selectors.

@bsalamat
Copy link
Member

bsalamat commented Apr 4, 2019

@kubernetes/api-reviewers

@wgliang
Copy link
Contributor Author

wgliang commented Apr 17, 2019

/ping @kubernetes/api-reviewers

@lavalamp lavalamp added this to 1.15, Unassigned in API Reviews Apr 17, 2019
@lavalamp
Copy link
Member

Added to the api review board https://github.com/orgs/kubernetes/projects/13

@lavalamp lavalamp moved this from 1.15, Unassigned to Assigned in API Reviews Apr 17, 2019
@lavalamp
Copy link
Member

/assign

LabelSelectorOpGt LabelSelectorOperator = "Gt"
LabelSelectorOpLt LabelSelectorOperator = "Lt"
)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thought: whatever design you come up with has to maintain backwards compatibility. Existing pods must continue to work the same without any changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this has also clarified in the past that the definition of Pod will not be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand that response, this proposal definitely looks like it is modifying types which are part of a Pod?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backward compatibility is a requirement for this change. The intention of this KEP is that existing pod specs should work without needing any changes.

@lavalamp lavalamp moved this from Assigned to Changes requested in API Reviews Apr 17, 2019
@lavalamp
Copy link
Member

P.S. for better service from api reviewers next time, please see: https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md#mechanics

@wgliang wgliang force-pushed the master branch 3 times, most recently from 579bf71 to 9e07fc3 Compare April 19, 2019 12:15
- `NodeSelector` related tests
- `PodSelector` related tests
- `Gt` and `Lt` functional tests
- `LabelSelectorRequirement` related tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need tests ensuring that the new Gt/Lt operators are not permitted in "normal" selector fields (e.g. pod.spec.nodeSelector, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is for sure, pod.spec.nodeSelector(https://github.com/kubernetes/kubernetes/blob/22c67bb113dee2ce0858075cd5af5d110c255985/pkg/apis/core/types.go#L2599) is a standalone map structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility - pods made with the new types should still be updatable if apiserver version is rolled back.
Forwards compatibility - all pods created today are wire-compatible (both proto and json) with the new api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your comment. But what needs to be clarified is that because we have added Gt and Lt operators, this does not exist in the previous API.

@wgliang
Copy link
Contributor Author

wgliang commented Apr 22, 2019

@liggitt @lavalamp @bsalamat @Huang-Wei
Thanks everyone for your comments, I have updated and replied to the comments. Any other questions, please comment here. Thanks.

As far as I know, the enhancement freeze is on April 30th. I hope to catch this train. :)

}
```

`LabelSelectorRequirement` is an abstract selector implementation of `NodeSelector` and `PodSelector`. And the `LabelSelectorRequirement` is defined as below:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about this name, I think it needs to be more specific, since it is only appropriate in these two places, and not for use in general label selectors everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, I will change to NodeAndPodSelectorRequirement.

}
```

We will use `LabelSelectorRequirement` to replace the original `NodeSelectorRequirement` as the base selector of `NodeSelector`. So it looks like this:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this change? Is anything wrong with the existing NodeSelectorRequirement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Because we want to use a new selector(without a Node prefix) as the base selector for NodeSelector and PodSelector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can do that without modifying the existing NodeSelector?

- `NodeSelector` related tests
- `PodSelector` related tests
- `Gt` and `Lt` functional tests
- `LabelSelectorRequirement` related tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility - pods made with the new types should still be updatable if apiserver version is rolled back.
Forwards compatibility - all pods created today are wire-compatible (both proto and json) with the new api.

LabelSelectorOpNotIn LabelSelectorOperator = "NotIn"
LabelSelectorOpExists LabelSelectorOperator = "Exists"
LabelSelectorOpDoesNotExist LabelSelectorOperator = "DoesNotExist"
LabelSelectorOpGt LabelSelectorOperator = "Gt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name needs to indicate that the string-typed value will first be interpreted as a number; one could read this as meaning lexically greater--actually that's the natural meaning since label values are strings. Also, typically we request that things be spelled out in the API, although we should probably be consistent. :/ If this were starting from scratch I'd suggest the name "NumericallyGreater". You can still change the name of the constant to e.g. "LabelSelectorNumericallyGreater".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE, Thank you very much for your help and corrections.

@wgliang wgliang force-pushed the master branch 2 times, most recently from 3dcf76b to 6e4cf7a Compare April 23, 2019 00:01
We will use `NodeAndPodSelectorRequirement` to replace the original `NodeSelectorRequirement` as the base selector of `NodeSelector`. So it looks like this:
```go
type NodeSelectorTerm struct {
MatchExpressions []NodeAndPodSelectorRequirement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name it by what it does, not where it's used. SchedulerSupportedSelectorRequirement? NumericAwareSelectorRequirement? I have to run, maybe @liggitt can think of a better name.

NodeAndPod won't work for this, the name can't be about the things being selected, it needs to be about the selection process. Otherwise people will probably use the type wrongly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, please forgive my doubts. I don't particularly understand why mentioned numeric. Gt or Lt is not only able to compare numbers but also to compare letters, in other words it compares strings (including numbers).

I think SchedulerSupportedSelectorRequirement may be a possible choice, but inspired by you, I think SchedulerSelectorRequirement is more concise and appropriate. What do you think of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gt or Lt is not only able to compare numbers but also to compare letters, in other words it compares strings (including numbers).

that is inconsistent with how these are used today:

https://github.com/kubernetes/kubernetes/blob/8ec6167f61463cbc6874514b02d0137d51b23542/staging/src/k8s.io/apimachinery/pkg/labels/selector.go#L151-L159

Are there compelling use cases for supporting lexical comparisons? It certainly makes the behavior less intuitive... e.g. what would the sort order of the following be?

1
1a
2
2a
10
20

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversation is exactly why it needs to be prefixed with either "Numeric" or "Lexical" (or if it's some hybrid, something explaining that). We shouldn't make users work hard to figure this out.

Lexical comparison implies that "10" < "2", I do not think users will want that behavior.

Numeric comparison implies that error handling is needed if either of the label values are not numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to numeric comparison + error checking. I don't think there are many users who need lexical comparison in the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I mean, in fact, we use NumericXXX is not suitable, because there are still operators such as "In", "Equals" and so on.

Therefore, I agree that Gt and Lt only work on numbers. Summarize the above discussion:

  1. Gt and Lt only work on numbers
  2. NodeAndPodSelectorRequirement changed to SchedulerSelectorRequirement

What do you think of this?
@lavalamp @liggitt @bsalamat

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In, Equals, etc all would behave the same whether the type is string or numeric. It's only ordered comparisons where the distinction becomes important.

"SchedulerSelectorRequirement" is not my favorite name, it's named for the component you expect to consume the thing. It is better than "NodeAndPod", but still: other components might eventually interpret this, the name isn't future-proof. It's also not at all obvious that "scheduler" here means "supports ordered comparisons on numeric values". That's really what you want the name to imply to people.

Copy link
Contributor Author

@wgliang wgliang Apr 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

Naming is really the hardest thing, I can't think of a better name than SchedulerSupportedNumericSelectorRequirement.
Do you have any other advice? @lavalamp @bsalamat

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas:

  • NumericAwareSelectorRequirement
  • NumericallyOrderedSelectorRequirement
  • AffinitySelectorRequirement (only if this is going to be used for all and only affinity selection features)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to use NumericAwareSelectorRequirement, I have updated. Thanks a lot for comments and your review.

}
```

`PodSelector` is our new selector. A pod selector represents the union of the results of one or more label queries over a set of pods. The of `PodSelector` is defined as below:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new, and you're not changing any existing selectors to use this type?

If so, I suggest calling it "NumericAwareSelector" or something else. That will make it more clear when it is appropriate to reuse it. It will also let you correctly make the documentation explain the selector--that it is applied to pods is really a consequence of where it is used, not something intrinsic to the type.

@wgliang
Copy link
Contributor Author

wgliang commented Apr 30, 2019

@lavalamp @liggitt @bsalamat I don't know if this KEP can be merged before the end period. If it can, then it would be better.

@lavalamp
Copy link
Member

LGTM from api review perspective now

@k82cn
Copy link
Member

k82cn commented Apr 30, 2019

/lgtm
/approve

Mark the status according to the above discussion.

But I have a question that: if we change the name of go struct, the operators/customized-controllers has to change the code accordingly; do we need to keep such kind of backward compatibility? Golang support constructing nested struct without name, but it dependent on the coding style :).

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 30, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, wgliang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2019
@k8s-ci-robot k8s-ci-robot merged commit a65f0d1 into kubernetes:master Apr 30, 2019
@liggitt liggitt moved this from Changes requested to Completed, 1.15 in API Reviews Apr 30, 2019
@bsalamat
Copy link
Member

Given that the goal of this KEP is to add new operators to an existing feature, we must ensure that our implementation is fully backward compatible. So, we should avoid changing names or refactoring existing API structs if backward compatibility is broken.

@lavalamp
Copy link
Member

lavalamp commented May 1, 2019

But I have a question that: if we change the name of go struct, the operators/customized-controllers has to change the code accordingly; do we need to keep such kind of backward compatibility? Golang support constructing nested struct without name, but it dependent on the coding style :).

The wire format is the thing we make guarantees about. People are used to us breaking client-go compatibility, unfortunately. As I tried to make clear in earlier comments, I don't see a compelling reason to reuse and therefore rename the existing types. I think it's fine to have duplicate types, and somewhat prefer that. But this is only advice, if you all in the scheduling SIG think it's worth breaking client-go users for prettier types, I'm not going to block it, and I do at least agree that the new name is an improvement.

I hope a next step on the journey to fixing our deps that @liggitt is taking us on includes beginning to be a good golang citizen and keeping compatibility guarantees on the libraries we offer, at which point a change like this would not be permitted for that reason (or would have to be maintained differently in older client-go branches).

@k82cn
Copy link
Member

k82cn commented May 1, 2019

if you all in the scheduling SIG think it's worth breaking client-go users for prettier types, I'm not going to block it

aha, "cyclic dependency" :)

we must ensure that our implementation is fully backward compatible

Agree with Bobby, we should keep backward compatible for this implementation. @wgliang , would you help to open a PR for that.

@wgliang
Copy link
Contributor Author

wgliang commented May 2, 2019

@wgliang , would you help to open a PR for that.

Yeah, of course. And I will seriously consider the above comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: API review completed, 1.15
Development

Successfully merging this pull request may close these issues.

None yet

8 participants