-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates default behavior for the request-management KEP to sync up with implementation #1214
Updates default behavior for the request-management KEP to sync up with implementation #1214
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: yue9944882 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
||
1. every requests from node group, i.e kubelet, kube-proxy. | ||
2. controller/scheduler leases | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original, this priority level was also intended for leader elections by system controllers. Was the removal of that intentional? If so, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the leader-election is all moving leases api, kubernetes/kubernetes#80289 it doesn't conflict with that. i can put my bandwidth to help moving this into 1.16 milestone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a guarantee that this change will land in release 1.16. Even if it does land, these will still be for leader elections and leader elections is an older and more widely known concept. So it seems to me that the safe wording here would be to refer to leader elections, that is true and will remain true.
objects and (b) all the exhibited default configurations above. The | ||
system uses the preserving configuration as its initial state on | ||
launching when not all the default configuration present in the system | ||
and won't reload until ensuring/observing the preserving objects exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reload" refers to an implementation detail, I think instead we should speak here in terms of the delivered behavior. See my example in Slack of language that is in terms of the delivered behavior rather than the implementation.
The default configurations will always be created everytime an | ||
kube-apiserver restarts. | ||
|
||
Additionally, if an administrator disagrees with the default settings and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is grammatically incomplete. It says "if X" but there is no "then" clause.
I suggest rewriting to be something like "Administrators who wish to operate with a different configuration can create, update, and delete additional configuration objects and edit the predefined ones; the predefined ones can not be deleted but they can be modified to minimize their effects. In particular, a FlowSchema can be modified to match no requests and a PriorityLevelConfiguration can be given AssuredConcurrencyShares=1 ". You might try the exercise of writing a FlowSchema that is guaranteed to match no requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated and added example for an unreachable flow-schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grammatically incomplete sentence is still there. It and the one after it look redundant with the rest of the paragraph. I am not sure whether this is a matter of overlooked deletion or whether you think these first two sentences say something that the rest of the paragraph does not. If it is the latter, then my guess is that what these first two sentences add is a high-level overview. If that is the idea, here is a suggested replacement for the whole paragraph that attempts to do the whole job without incomplete sentences.
Administrators can choose to operate with a partially or completely different set of persistent configuration API objects than the default one; note that this does not affect the behavior of the filter before it has had a chance to read the persisted configuration API objects. Additional objects can be created, updated, and deleted. The predefined objects can be updated. The predefined objects can not be deleted; administrators can get almost the same effects by updating the predefined objects to minimize their effects. A PriorityLevelConfiguration can be set to have just 1 assured concurrency share; care should be taken to do this only after its queues have drained, otherwise this setting may greatly slow down the rate at which the queues drain. A FlowSchema can be updated to match no requests; following is an example.
Note also that kubernetes/kubernetes#81003 does not have the problem of cutting down a priority level's concurrency limit prematurely. A priority level can be deleted but each corresponding priorityLevelState retains its assured concurrency shares until it is unused and removed. I just reviewed the KEP and noticed that it does not explicitly address the concurrency limit imposed for a lingering priority level but suggests something a little different from what that PR does. I think the KEP suggests that a lingering priorityLevelState retains the concurrency limit computed when it was last desired. I chose to do what PR 81003 does instead because that makes less of a bump in total allowed concurrency.
3a480ad
to
9483fba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this PR changes more than its title identifies. In addition to modifying the default behavior, this PR also changes the way a FlowSchema describes which requests it matches --- to match the current implementation, which is not bad --- and also changes the design of the set of FlowSchemas, changing some classification and prioritization decisions. To be sure, some of that is necessary because the current implementation is not as expressive as the original design. This PR makes more changes that are necessitated by the implementation.
I have a few suggestions on terminology. The word "preserving" is used to describe what would be better described as "initial" objects; these are the objects that the filter uses before it has had a chance to read the API objects that actually exist. The word "default" is used to describe objects that are created by default and may not be deleted; for those, "preserved" would be better but "predefined" is the more usual term for such things.
Also, it might be better if the associated FlowSchema and PriorityLevelConfiguration objects had the same name. It is perhaps a little more complex than necessary to have the "system-top" FlowSchema map to the "exempt" PriorityLevelConfiguration; why not give them both the name "exempt"? Similarly, it is perhaps a little more complex than necessary to have the "catch-all" FlowSchema map to the "default" PriorityLevelConfiguration; why not call them both "default", or call them both "catch-all"?
This revsion does not insist that the PriorityLevelConfiguration named "exempt" has Spec.Exempt==true
but I suspect that is intended.
This revision does not forbid creating a configuration in which some requests match no FlowSchema object, and does not say what happens to such requests. There are a few ways we could deal with this. One would be to forbid updating the matching rules and the priority level reference of the catch-all FlowSchema. Another would be to define that if a request matches no FlowSchema then it is given a flow identifier of {"backstop", reqInfo.Namespace}
and mapped to the catch-all priority lefvel. Yet another would be to say that admins can indeed create a configuration that does not match all requests and that unmatched requests are rejected.
|
||
1. every requests from node group, i.e kubelet, kube-proxy. | ||
2. controller/scheduler leases | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a guarantee that this change will land in release 1.16. Even if it does land, these will still be for leader elections and leader elections is an older and more widely known concept. So it seems to me that the safe wording here would be to refer to leader elections, that is true and will remain true.
The default configurations will always be created everytime an | ||
kube-apiserver restarts. | ||
|
||
Additionally, if an administrator disagrees with the default settings and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grammatically incomplete sentence is still there. It and the one after it look redundant with the rest of the paragraph. I am not sure whether this is a matter of overlooked deletion or whether you think these first two sentences say something that the rest of the paragraph does not. If it is the latter, then my guess is that what these first two sentences add is a high-level overview. If that is the idea, here is a suggested replacement for the whole paragraph that attempts to do the whole job without incomplete sentences.
Administrators can choose to operate with a partially or completely different set of persistent configuration API objects than the default one; note that this does not affect the behavior of the filter before it has had a chance to read the persisted configuration API objects. Additional objects can be created, updated, and deleted. The predefined objects can be updated. The predefined objects can not be deleted; administrators can get almost the same effects by updating the predefined objects to minimize their effects. A PriorityLevelConfiguration can be set to have just 1 assured concurrency share; care should be taken to do this only after its queues have drained, otherwise this setting may greatly slow down the rate at which the queues drain. A FlowSchema can be updated to match no requests; following is an example.
Note also that kubernetes/kubernetes#81003 does not have the problem of cutting down a priority level's concurrency limit prematurely. A priority level can be deleted but each corresponding priorityLevelState retains its assured concurrency shares until it is unused and removed. I just reviewed the KEP and noticed that it does not explicitly address the concurrency limit imposed for a lingering priority level but suggests something a little different from what that PR does. I think the KEP suggests that a lingering priorityLevelState retains the concurrency limit computed when it was last desired. I chose to do what PR 81003 does instead because that makes less of a bump in total allowed concurrency.
ones; the predefined ones can not be deleted but they can be modified to | ||
minimize their effects. In particular, to mute a FlowSchema, you can modify | ||
to match no requests. To mute a PriorityLevelConfiguration modify | ||
AssuredConcurrencyShares=1. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"mute" is not quite accurate. Giving a priority level AssuredConcurrencyShares=1 makes it have a small effect but it is a non-zero effect, so I do not think that qualifies as "mute". That is why I wrote "minimize their effects" instead.
https://github.com/kubernetes/kubernetes/blob/3edefea7e20fbbc92af4613af62b0c7dbc21d717/staging/src/k8s.io/api/flowcontrol/v1alpha1/types.go#L205 prohibits setting AssuredConcurrencyShares=0 for a non-exempt priority level. If we change that to allowing AssuredConcurrencyShares=0 then there would be a way to make a non-empty priority level have zero effect.
Also, even if we could make these objects truly have no effects then the word "mute" would still be a bit awkward; this is an unusual usage of that word. I think more familiar wording would be something like "make these objects inconsequential" or "make these objects have no effect".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a reworded replacement for the paragraph that starts with the incomplete sentence, not referring to "the filter".
Administrators can choose to operate with a partially or completely different set of persistent configuration API objects than the default one; note that this does not affect the behavior of the apiserver before it has had a chance to read the persisted configuration API objects. Additional objects can be created, updated, and deleted. The predefined objects can be updated. The predefined objects can not be deleted; administrators can get almost the same effects by updating the predefined objects to minimize their effects. A PriorityLevelConfiguration can be set to have just 1 assured concurrency share; care should be taken to do this only after its queues have drained, otherwise this setting may greatly slow down the rate at which the queues drain. A FlowSchema can be updated to match no requests; following is an example.
Additionally, if an administrator disagrees with the default settings and | ||
feel like replacing with a complete new set. To avoid re-creation of the | ||
system defaults above, the administrator is expected to update the objects | ||
to mute those configurations doesn't meet expectation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a matter of "meet expectation"; it is a matter of whether the administrators want these objects or not. I would say something like "the administrators have to update the undesired objects to have no effects".
@@ -745,11 +756,12 @@ spec: | |||
queueLengthLimit: 1000 | |||
``` | |||
|
|||
For user requests from kubectl. | |||
For normal workloads: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent here was to distinguish user requests issued using kubectl from all other normal workload requests. This revision replaces (A) "user requests from kubectl" vs "requests from controllers processing workload" with (B) "normal workloads" vs "requests from controllers processing workload". I think that "requests from controllers processing workload" is part of "normal workloads", so that makes the wording for (B) problematic. Also, this revision replaces two names that include "workload" with one name that is simply "workload" and another that does not include "workload" but is associated with a meaning that involves "workload". If we are going to insist on one name that does not include the word "workload" but has a meaning that does involve workload then I suggest the other name be one that identifies its distinction (e.g., "workload-controllers"). This revision also moves much of the workload to "system-low"; in the original, "system-low" was deliberately very restricted, to just the garbage collector, and the other controllers were assigned to "workload-low".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i renamed all default objects. now they're:
- FS
exempt
=> PLexempt
- FS
nodes
,leader-election
=> PLsystem-high
- FS
kube-controller-manager
,kube-scheduler
=> PLsystem-low
- FS
serviceaccounts
=> PLworkload
- FS
default
=> PLdefault
is it cleaner ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another review to respond.
Aligning the predefined names is good.
I see a new idea here, using only one rule per FlowSchema so we can use the flow schema's name to document the intent of the rule. Perhaps a better way would be to add a comment or description field to a rule?
``` | ||
|
||
```yaml | ||
kind: FlowSchema | ||
meta: | ||
name: workload-high | ||
name: workload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original design was that the workload consists of two parts: kubectl requests from users and requests from controller. The original design prioritized kubectl requests higher than controller requests. The revised design prioritizes the controller requests higher (in "system-low"); I think this is not an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's better we keep it simple in the alpha stage. users can customize their own settings theirself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a matter of simplicity. We already made an estimate of what we think will be a good configuration. I do not think it is an improvement to switch to a less good configuration just because it can be overridden.
The KEP should clearly identify that there are two "default" behavior problems, or perhaps better described as initial behavior or initialization problems. One is initializing a cluster. The other is how the filter behaves before the persisted configuration objects are read. There may also be a third "default behavior" problem, depending on whether the rest of the design allows it to be posed: after cluster initialization, after filter startup transients, after administrators make whatever changes they want to the persisted configuration objects, what happens to a a request that does not match any persisted FlowSchema? |
5ad038b
to
f02ea9c
Compare
subjects: | ||
- kind: Group | ||
name: "system:masters" | ||
- rule: # Delegated auth[n/z] requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the matching condition here could be tightened up. Have a look at the example data at kubernetes/kubernetes#81224 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, ['authentication.k8s.io', 'authorization.k8s.io']
is sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that no other requests involve those API groups? I doubt that's true. I am suggesting that we be more discriminating here, giving this exceptionally high priority only to requests that really need it, as best we can distinguish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the group is dedicated for delegated authn/authz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. OK then. We could still tighten up a bit, by requiring that the user be a service account --- but that still leaves a lot of access to this high priority.
distinguisherMethod: | ||
type: ByUser | ||
rules: | ||
- rule: # All resource requests from node group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at what the KEP says right now, it calls for "heartbeats by nodes" and "kubelet and kube-proxy ops on system objects". That qualifier "on system objects" was deliberate. The idea was to give higher priority to maintenance of system things (e.g., pods, services) than for regular things. But since our current models can not discriminate on object namespace, that qualifier can not be expressed. That is one of the reasons for kubernetes/kubernetes#81637 .
- rule: # Leader-election requests from kube-controller-manager/kube-scheduler | ||
verbs: ['*'] | ||
apiGroups: ['coordination.k8s.io'] | ||
resources: ['leases'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leader elections are not done this way yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was not quite right. Leader elections are not all necessarily done that way now. https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/resourcelock/interface.go::New still supports configmaps and endpoints. Since this flow schema is intended to match leader elections of system controllers, the question is whether they all use the new way (leases).
``` | ||
|
||
```yaml | ||
kind: FlowSchema | ||
meta: | ||
name: workload-high | ||
name: kube-manager-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name covers only part of what is matched. Did you mean to leave the scheduler in here? It is also has a flow schema dedicated to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch removed
resources: ['*'] | ||
subjects: | ||
- kind: User | ||
name: "system:kube-controller-manager" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kube-controller-manager has two modes. In one mode, all the controllers appear as the same user. In the other mode, each controller appears as a distinct user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added service-account rules for matching controller-manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, that was an improvement.
distinguisherMethod: | ||
type: ByUser | ||
rules: | ||
- rule: # All resource requests from kube-controller-manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the KEP currently, almost all of these are given much lower priority. They are part of "workload-low", while kubectl requests from users are prioritized more highly at "workload-high". I think moving controller requests above user kubectl requests is not an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's difficult to wirte a rule for "kubectl requests".. kubectl is just a tooling, whether use kubectl or not isn't related with the user identity..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and i renamed system-low
to workload-high
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that kubectl is not directly identified in the request attributes but the complement is nearly true. After logically higher precedence levels have identified other stuff, what is left is workload. That mainly consists of requests from controllers --- running in-cluster using a service account --- and requests from users using kubectl. So the latter can be identified by subtracting out the former. That is the approach you have been seeing from me since I had to start coping with the lack of negation in the matching language.
f02ea9c
to
b1a2341
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend splitting this PR into two. The first PR focuses on the changes regarding matching/defaulting/initializing/preserving, and the second PR focuses on updating the exhibited configuration. The first PR will have to update the exhibited configuration to rename "system-top" to "exempt" and rename "workload-low" to "catch-all" but will not have to make any other changes to the exhibited config, not even to switch the matching language. I recommend keeping these two issues separate.
For the PR that updates the matching/defaulting/preserving, there are unanswered questions here. I think answering those is more urgent than tweaking the non-exempt non-catchall config objects.
|
||
1. requests from system-privileged group, including kubectl requests from admin and | ||
apiserver loopback requests. | ||
2. delegated authentication/authorization API groups, including delegated delegated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here: "delegated" is repeated.
resources: ['*'] | ||
subjects: | ||
- kind: User | ||
name: "null" # or assign it any unreachable username |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any invalid usernames?
Looking at the validation now, I see gaps in validation of AIPGroup, Resources, and NonResourceURLs.
I think we have agreed to split this into two PRs; see #1221 for my attempt to draft one of those two parts. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig api-machinery