-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-1040: Update APF for borrowing by exempt priority levels #3906
KEP-1040: Update APF for borrowing by exempt priority levels #3906
Conversation
MikeSpreitzer
commented
Mar 10, 2023
- One-line PR description: Give exempt priority levels nominal concurrency and let them borrow from the others
- Issue link:
- Other comments: This is a design for how to respond to Exempt requests should still count into flowcontrol so other requests get pushed out kubernetes#94907 .
/lgtm For a day or two to give others time for feedback: |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lavalamp, MikeSpreitzer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the overall proposal, but I'm not convinced we actually want to expose configurability of it from get-go and not instead start-simple and introduce it if it appears to be needed in the future.
At the same time, the relevant new datatype was added. It is shown below. | ||
|
||
```go | ||
type ExemptPriorityLevelConfiguration struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially thought that we won't need any additional configuration for borrowing from exempt PL.
I understand the idea behind it (and reserving some capacity for exempt requests for all the time), but I'm wondering if it's not introducing additional cognitive complexity for users than it's worth it.
Given that you're not blocking dispatching of exempt requests anyway, there is still an ability of exceed capacity if number of exempt requests is growing over time.
So my personal preference would be to start simple and not introduce ExemptPriorityLevelConfiguration
at all for now, and just implicitly assume that:
NominalConcurrencyShares
for it is 0 [we start with 0, may grow over time due to borrowing]LendablePercent
is 0
If it appears that we need more configuration, this can be added in the future, but I would actually prefer to start simple and wait for the request to have it configurable that comes from the real usecase.
WDYT?
@MikeSpreitzer
@lavalamp @deads2k - for thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) I actually like giving the exempt requests some nominal concurrency. That means they will not always borrow, so in some sense they are less intrusive on the rest of the system.
(2) I think it is good to make the configuration explicit. Maybe if it is always zeroes then you could argue nothing needs to be explicit. But as I mentioned above, I like non-zero values.
(3) Anyone wanting to understand what APF is going is going to have to see the complexity of what the exempt requests are doing. Anyone who does not care can ignore it, even if the configuration for exempt requests is reified in API objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem a cluster operator wants solved is that an exempt priority level should not overwhelm the server, for that it does not seem very intuitive for an admin to tweak the the fields of ExemptPriorityLevelConfiguration
for an exempt PL.
can we try tackling this problem without adding any API fields? (I think @wojtek-t originally suggested the same). In the future, if need be, we can add new fields for exempt PL.
For now, we can assign a default CL to each exempt PL and have it borrow from other non-exempt PLs without any limit.
Also, are we going to reject a request from an exempt PL if it can't borrow or the SCL has been reached? (this would not be in keeping with the current behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not introduce limiting to the exempt priority levels, and the revised text attempts to make that clear.
If I understand correctly, based on our latest conversation (#3906 (comment)), what concerned @wojtek-t the most was changing the concurrency limits for the non-exempt priority levels. Taking some of the server's concurrency from them by default would do that. Alternatively, dispatching could go over the server's concurrency limit by a certain amount by default --- but that would be confusing.
Another concern, not quite as strong, is the intrusiveness of the borrowing by the exempt priority level. If it has a nominal concurrency of zero, then every adjustment period it will borrow a unique value and the behavior of all the other priority levels will be tweaked. By allowing admins to configure a non-zero nominal concurrency for the exempt priority level, this makes it possible to not have the exempt priority level tweak all the non-exempt ones every time. By allow a certain amount to be borrowable, that allows the admins a safety margin; it means that nominal concurrency limit for exempt does not have to be the one right number that is great for every adjustment period.
Last week we discussed this. The real concern was that giving non-zero nominal concurrency shares to exempt priority levels by default will change the effect of the existing configurations (diluting the existing shares). So the agreement was to give the exempt priority levels zero (or one?) nominal concurrency shares by default and allow admins to change that if they want to. |
The latest push updates as described above. |
@@ -755,7 +763,7 @@ type ExemptPriorityLevelConfiguration struct { | |||
// | |||
// Bigger numbers mean a larger nominal concurrency limit, | |||
// at the expense of every other Limited priority level. | |||
// This field has a default value of 30. | |||
// This field has a default value of 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can change the default without revving the api, it will make config behave differently. (Imagine provisioning a new cluster with yaml manifests that rely on the default value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, this is not shared with the other options and is a new field, right? This is fine then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikeSpreitzer - why do you suggest default of 1 instead of 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a change. See, this is in a new type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtek-t : brain fart. Yes, I think we can make the default zero.
See also the remark, currently at new lines 790--794, about hoisting the common fields into the containing struct. That will eliminate the need for ExemptPriorityLevelConfiguration
and make the default value for the hoisted NominalConcurrencyShares
field depend on the Type PriorityLevelEnablement
field (0 in one case, 30 in the other).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can make the default zero.
So I would vote for making it 0.
See also the remark, currently at new lines 790--794, a
I'm aware of it, but we will not do that for existing API types - we can do that as part of promotion to v1.
And it's kind-of refactoring, we should preserve the default once we do that.
That's why I would like to use 0 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think the choice of default here matters to what happens in the refactoring; the refactoring can preserve whatever default we want. My point above was that doing so involves making the refactored defaulting logic have to look a the Type PriorityLevelEnablement
field. The only thing that would obviate that dependency would to make the default here match the default for non-exempt (30), which I think we are agreed we do not want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest revision sets this default to zero.
/lgtm |
```go | ||
// ExemptPriorityLevelConfiguration describes the configurable aspects | ||
// of the handling of exempt requests. | ||
// In the mandatory exemp configuration object the values in the fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo - exempt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
At the same time, the relevant new datatype was added. It is shown below. | ||
|
||
```go | ||
type ExemptPriorityLevelConfiguration struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem a cluster operator wants solved is that an exempt priority level should not overwhelm the server, for that it does not seem very intuitive for an admin to tweak the the fields of ExemptPriorityLevelConfiguration
for an exempt PL.
can we try tackling this problem without adding any API fields? (I think @wojtek-t originally suggested the same). In the future, if need be, we can add new fields for exempt PL.
For now, we can assign a default CL to each exempt PL and have it borrow from other non-exempt PLs without any limit.
Also, are we going to reject a request from an exempt PL if it can't borrow or the SCL has been reached? (this would not be in keeping with the current behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment, other than that LGTM.
|
||
| Name | Nominal Shares | Lendable | | ||
| ---- | -------------- | -------- | | ||
| exempt | 30 | 50% | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated, given we actually changed this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks. Fixed in latest revision.
Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
11cf0ec
to
ceb7f6f
Compare
The force-push to ceb7f6f was to prompt the k8s-ci-robot to reconsider my CNCF CLA agreement. |
/lgtm Let's have that merged, given @lavalamp approval above. /hold cancel |