Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-1040: Update APF for borrowing by exempt priority levels #3906

Merged
merged 4 commits into from
Apr 12, 2023

Conversation

MikeSpreitzer
Copy link
Member

  • One-line PR description: Give exempt priority levels nominal concurrency and let them borrow from the others
  • Issue link:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 10, 2023
@MikeSpreitzer
Copy link
Member Author

/unassign @lavalamp
/unassign @deads2k
/unassign @tkashem
/unassign @wojtek-t

@MikeSpreitzer
Copy link
Member Author

/cc @lavalamp
/cc @deads2k
/cc @tkashem
/cc @wojtek-t

@lavalamp
Copy link
Member

/lgtm
/approve

For a day or two to give others time for feedback:
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, MikeSpreitzer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2023
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the overall proposal, but I'm not convinced we actually want to expose configurability of it from get-go and not instead start-simple and introduce it if it appears to be needed in the future.

At the same time, the relevant new datatype was added. It is shown below.

```go
type ExemptPriorityLevelConfiguration struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought that we won't need any additional configuration for borrowing from exempt PL.

I understand the idea behind it (and reserving some capacity for exempt requests for all the time), but I'm wondering if it's not introducing additional cognitive complexity for users than it's worth it.

Given that you're not blocking dispatching of exempt requests anyway, there is still an ability of exceed capacity if number of exempt requests is growing over time.

So my personal preference would be to start simple and not introduce ExemptPriorityLevelConfiguration at all for now, and just implicitly assume that:

  • NominalConcurrencyShares for it is 0 [we start with 0, may grow over time due to borrowing]
  • LendablePercent is 0

If it appears that we need more configuration, this can be added in the future, but I would actually prefer to start simple and wait for the request to have it configurable that comes from the real usecase.

WDYT?

@MikeSpreitzer
@lavalamp @deads2k - for thoughts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) I actually like giving the exempt requests some nominal concurrency. That means they will not always borrow, so in some sense they are less intrusive on the rest of the system.

(2) I think it is good to make the configuration explicit. Maybe if it is always zeroes then you could argue nothing needs to be explicit. But as I mentioned above, I like non-zero values.

(3) Anyone wanting to understand what APF is going is going to have to see the complexity of what the exempt requests are doing. Anyone who does not care can ignore it, even if the configuration for exempt requests is reified in API objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem a cluster operator wants solved is that an exempt priority level should not overwhelm the server, for that it does not seem very intuitive for an admin to tweak the the fields of ExemptPriorityLevelConfiguration for an exempt PL.

can we try tackling this problem without adding any API fields? (I think @wojtek-t originally suggested the same). In the future, if need be, we can add new fields for exempt PL.
For now, we can assign a default CL to each exempt PL and have it borrow from other non-exempt PLs without any limit.

Also, are we going to reject a request from an exempt PL if it can't borrow or the SCL has been reached? (this would not be in keeping with the current behavior)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does not introduce limiting to the exempt priority levels, and the revised text attempts to make that clear.

If I understand correctly, based on our latest conversation (#3906 (comment)), what concerned @wojtek-t the most was changing the concurrency limits for the non-exempt priority levels. Taking some of the server's concurrency from them by default would do that. Alternatively, dispatching could go over the server's concurrency limit by a certain amount by default --- but that would be confusing.

Another concern, not quite as strong, is the intrusiveness of the borrowing by the exempt priority level. If it has a nominal concurrency of zero, then every adjustment period it will borrow a unique value and the behavior of all the other priority levels will be tweaked. By allowing admins to configure a non-zero nominal concurrency for the exempt priority level, this makes it possible to not have the exempt priority level tweak all the non-exempt ones every time. By allow a certain amount to be borrowable, that allows the admins a safety margin; it means that nominal concurrency limit for exempt does not have to be the one right number that is great for every adjustment period.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 13, 2023
@MikeSpreitzer
Copy link
Member Author

Last week we discussed this. The real concern was that giving non-zero nominal concurrency shares to exempt priority levels by default will change the effect of the existing configurations (diluting the existing shares). So the agreement was to give the exempt priority levels zero (or one?) nominal concurrency shares by default and allow admins to change that if they want to.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 20, 2023
@MikeSpreitzer
Copy link
Member Author

The latest push updates as described above.

@@ -755,7 +763,7 @@ type ExemptPriorityLevelConfiguration struct {
//
// Bigger numbers mean a larger nominal concurrency limit,
// at the expense of every other Limited priority level.
// This field has a default value of 30.
// This field has a default value of 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can change the default without revving the api, it will make config behave differently. (Imagine provisioning a new cluster with yaml manifests that rely on the default value)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, this is not shared with the other options and is a new field, right? This is fine then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeSpreitzer - why do you suggest default of 1 instead of 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a change. See, this is in a new type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t : brain fart. Yes, I think we can make the default zero.

See also the remark, currently at new lines 790--794, about hoisting the common fields into the containing struct. That will eliminate the need for ExemptPriorityLevelConfiguration and make the default value for the hoisted NominalConcurrencyShares field depend on the Type PriorityLevelEnablement field (0 in one case, 30 in the other).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can make the default zero.

So I would vote for making it 0.

See also the remark, currently at new lines 790--794, a

I'm aware of it, but we will not do that for existing API types - we can do that as part of promotion to v1.
And it's kind-of refactoring, we should preserve the default once we do that.
That's why I would like to use 0 here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think the choice of default here matters to what happens in the refactoring; the refactoring can preserve whatever default we want. My point above was that doing so involves making the refactored defaulting logic have to look a the Type PriorityLevelEnablement field. The only thing that would obviate that dependency would to make the default here match the default for non-exempt (30), which I think we are agreed we do not want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest revision sets this default to zero.

@lavalamp
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2023
```go
// ExemptPriorityLevelConfiguration describes the configurable aspects
// of the handling of exempt requests.
// In the mandatory exemp configuration object the values in the fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo - exempt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

At the same time, the relevant new datatype was added. It is shown below.

```go
type ExemptPriorityLevelConfiguration struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem a cluster operator wants solved is that an exempt priority level should not overwhelm the server, for that it does not seem very intuitive for an admin to tweak the the fields of ExemptPriorityLevelConfiguration for an exempt PL.

can we try tackling this problem without adding any API fields? (I think @wojtek-t originally suggested the same). In the future, if need be, we can add new fields for exempt PL.
For now, we can assign a default CL to each exempt PL and have it borrow from other non-exempt PLs without any limit.

Also, are we going to reject a request from an exempt PL if it can't borrow or the SCL has been reached? (this would not be in keeping with the current behavior)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 27, 2023
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, other than that LGTM.


| Name | Nominal Shares | Lendable |
| ---- | -------------- | -------- |
| exempt | 30 | 50% |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated, given we actually changed this :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks. Fixed in latest revision.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 28, 2023
Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 11, 2023
@MikeSpreitzer
Copy link
Member Author

The force-push to ceb7f6f was to prompt the k8s-ci-robot to reconsider my CNCF CLA agreement.

@wojtek-t
Copy link
Member

/lgtm

Let's have that merged, given @lavalamp approval above.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit 1f932b6 into kubernetes:master Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants