Admission chain for underlying resource is not called for requests to /scale subresource #84530

Open
jennybuckley opened this issue Oct 29, 2019 · 36 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. wg/api-expression Categorizes an issue or PR as relevant to WG API Expression.

Comments

@jennybuckley

What happened:
When a user makes a modifying request to an object through the /scale subresource, the /scale admission chain is called, but mutating and validating admission for the underlying resource are not. This introduces some unexpected behavior and also makes certain valid policies impossible to enforce consistently.

For example, a user cannot reliably register a webhook on Deployments that caps a deployment's total resource footprint by validating that replicas * memory stays below a limit, because another user could always increase the replicas through /deployments/scale and bypass the validating webhooks registered for /deployments.
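
To make the example concrete, here is a minimal sketch of such a validating webhook backend, written against the admission/v1 and apps/v1 Go types. All names, the 64Gi budget, and the /validate path are illustrative rather than anything prescribed by this issue. The key point is that this handler only sees writes admitted as deployments; a write admitted via the scale subresource never reaches it.

```go
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// memoryBudget is an illustrative cluster-wide cap on replicas * memory requests.
var memoryBudget = resource.MustParse("64Gi")

// validateDeployment rejects Deployments whose total memory footprint
// (replicas * sum of container memory requests) exceeds memoryBudget.
func validateDeployment(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	var dep appsv1.Deployment
	if err := json.Unmarshal(review.Request.Object.Raw, &dep); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	replicas := int64(1) // apps/v1 defaults spec.replicas to 1 when unset
	if dep.Spec.Replicas != nil {
		replicas = int64(*dep.Spec.Replicas)
	}
	total := resource.NewQuantity(0, resource.BinarySI)
	for _, c := range dep.Spec.Template.Spec.Containers {
		if mem, ok := c.Resources.Requests[corev1.ResourceMemory]; ok {
			m := mem.DeepCopy()
			m.Set(m.Value() * replicas) // per-container request scaled by replica count
			total.Add(m)
		}
	}

	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if total.Cmp(memoryBudget) > 0 {
		resp.Allowed = false
		resp.Result = &metav1.Status{Message: "replicas * memory requests exceed the allowed budget"}
	}
	review.Response = resp
	_ = json.NewEncoder(w).Encode(&review)
}

func main() {
	http.HandleFunc("/validate", validateDeployment)
	// Admission webhooks must be served over TLS; the cert paths are placeholders.
	_ = http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil)
}
```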

What you expected to happen:
The mutating and validating admission chains for the underlying resource would also be called when making a request to the /scale subresource.

How to reproduce it (as minimally and precisely as possible):
  • Register a webhook for /deployments which prevents changing replicas (a sketch of such a registration follows below)
  • Make a request to /deployments/scale and observe that the replica change is admitted without consulting that webhook
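
For the first step, the registration might look like the sketch below, using the admissionregistration/v1 Go types (the object names and service reference are placeholders). Note that the rule matches the deployments resource only; a write to /deployments/scale is matched against deployments/scale instead, so this webhook is skipped. Even adding "deployments/scale" to Resources only gets the webhook an autoscaling/v1 Scale object, not the full Deployment.

```go
package main

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// replicaGuardConfig declares a validating webhook for UPDATEs to apps/v1
// Deployments. It is not consulted for writes made via the /scale subresource,
// because those are matched against the "deployments/scale" resource.
func replicaGuardConfig() *admissionregistrationv1.ValidatingWebhookConfiguration {
	sideEffects := admissionregistrationv1.SideEffectClassNone
	path := "/validate"
	return &admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "replica-guard"},
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name: "replicas.example.com",
			Rules: []admissionregistrationv1.RuleWithOperations{{
				Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
				Rule: admissionregistrationv1.Rule{
					APIGroups:   []string{"apps"},
					APIVersions: []string{"v1"},
					Resources:   []string{"deployments"}, // not matched by /scale writes
				},
			}},
			ClientConfig: admissionregistrationv1.WebhookClientConfig{
				Service: &admissionregistrationv1.ServiceReference{
					Namespace: "default", Name: "replica-guard", Path: &path,
				},
			},
			SideEffects:             &sideEffects,
			AdmissionReviewVersions: []string{"v1"},
		}},
	}
}

func main() {
	_ = replicaGuardConfig() // create this object with client-go or kubectl in practice
}
```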

/cc @apelisse

@jennybuckley jennybuckley added the kind/bug Categorizes issue or PR as related to a bug. label Oct 29, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 29, 2019
@jennybuckley
Author

/sig api-machinery
/priority help-wanted
/cc @lavalamp @mariantalla

This is related to (one of the causes of) #82046 "Server-side Apply: Ownership not tracked for scale subresource"

@k8s-ci-robot
Contributor

@jennybuckley: The label(s) priority/help-wanted cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig api-machinery
/priority help-wanted
/cc @lavalamp @mariantalla

This is related to (one of the causes of) #82046 "Server-side Apply: Ownership not tracked for scale subresource"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 29, 2019
@jennybuckley
Author

/priority backlog
/help

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 29, 2019
@liggitt
Member

liggitt commented Oct 29, 2019

This probably isn't well-defined enough to be marked as help wanted (especially the Clear Task and Low Barrier to Entry requirements). We need to think through the implications for existing webhooks before making changes here.

@liggitt liggitt closed this as completed Oct 29, 2019
@liggitt liggitt reopened this Oct 29, 2019
@liggitt
Member

liggitt commented Oct 29, 2019

/remove-help

@k8s-ci-robot k8s-ci-robot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 29, 2019
@jennybuckley
Author

jennybuckley commented Oct 29, 2019

@liggitt I should have read https://github.com/kubernetes/community/blob/master/contributors/guide/help-wanted.md first; I assumed Good First Issue was the only label that carried those requirements.

As for the implications on existing webhooks, I think the only safe thing to do is to call both the /x/scale and /x admission when a user makes a request to /x/scale. Then the decision would be which of these orders to call them in:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) ---> Validating(/x)
Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

@lavalamp
Member

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

@apelisse
Member

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

Also, changes made by admission are normally visible because the full object is returned to the client; in this case, only the scale object is returned to the client.

@lavalamp
Member

Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

Validating & Mutating webhooks can't be interleaved, I think, so this isn't an option.

@liggitt
Member

liggitt commented Oct 30, 2019

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

That seems really problematic… the whole point of the scale subresource is that only the replicas field will change

@liggitt
Member

liggitt commented Oct 30, 2019

I think the only safe thing to do is to call both the /x/scale and /x admission when a user makes a request to /x/scale.

A few questions to explore the implications of that:

  • What user would be passed in the admission attributes to the /x admission call (the original user or the apiserver loopback user)?
  • Would you expect the same double-call approach for /status?
  • (already raised above) what would happen if a mutating admission plugin for /x modified a field other than replicas?

@deads2k
Contributor

deads2k commented Oct 30, 2019

I can see value in adjusting this behavior, but it is a significant behavioral change for admission plugin authors. They should be relying on diffs of the old and new objects to decide whether an action is allowed, but I think we can be fairly certain they aren't all doing that correctly. How would you phase this in so it is safe for existing consumers? Just an opt-in, or would we be one of the first v2s?
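
For illustration, here is a minimal sketch of that diff-based pattern (the helper name is made up; req is the AdmissionRequest for an UPDATE of an apps/v1 Deployment):

```go
package webhook

import (
	"encoding/json"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
)

// replicasChanged decides based on the diff between the old and new objects,
// rather than on which endpoint the write arrived through.
func replicasChanged(req *admissionv1.AdmissionRequest) (bool, error) {
	var oldDep, newDep appsv1.Deployment
	if err := json.Unmarshal(req.OldObject.Raw, &oldDep); err != nil {
		return false, err
	}
	if err := json.Unmarshal(req.Object.Raw, &newDep); err != nil {
		return false, err
	}
	oldReplicas, newReplicas := int32(1), int32(1) // apps/v1 defaults replicas to 1 when unset
	if oldDep.Spec.Replicas != nil {
		oldReplicas = *oldDep.Spec.Replicas
	}
	if newDep.Spec.Replicas != nil {
		newReplicas = *newDep.Spec.Replicas
	}
	return oldReplicas != newReplicas, nil
}
```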

As Jordan pointed out, the same logical problem exists for some (but not all) other subresources, so that would need to be crisped up.

@julianvmodesto
Contributor

julianvmodesto commented Dec 2, 2019

Results of the discussion from the sig-apimachinery meeting on Nov 6:

Then the decision would be which of these orders to call them in:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) ---> Validating(/x)
Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

Run the mutating admission controllers first for the subresource, then for the parent resource. Optionally, invoke the mutating admission controllers twice in that same order. Finally, run all validating admission controllers in parallel:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) + Validating(/x)

What user would be passed in the admission attributes to the /x admission call (the original user or the apiserver loopback user)?

The webhook configuration for the parent resource must opt in to observing changes made to the parent via specific subresources, e.g. includeSubresources: ["*"]. Writes to those subresources that change the parent resource will then also invoke the parent resource's webhooks.

Thus, the webhook on the parent resource will receive the actual user for the subresource event.
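
To make the shape of that opt-in concrete, a purely hypothetical sketch follows; no includeSubresources field exists in k8s.io/api/admissionregistration/v1 today, and the type below only illustrates the proposal:

```go
package proposal

import admissionregistrationv1 "k8s.io/api/admissionregistration/v1"

// scaleAwareWebhook is NOT a real Kubernetes type; it only sketches the field
// proposed in this thread on top of the existing v1 webhook type.
type scaleAwareWebhook struct {
	admissionregistrationv1.ValidatingWebhook `json:",inline"`

	// IncludeSubresources opts this webhook in to parent-resource admission for
	// writes made via the listed subresources ("*" for all), e.g. ["scale"].
	IncludeSubresources []string `json:"includeSubresources,omitempty"`
}
```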

(already raised above) what would happen if a mutating admission plugin for /x modified a field other than replicas?

This is unresolved: can mutating webhooks change fields that cannot be changed via a normal API call to that subresource?

Options:

  • No, disallow by discarding changes to other fields. This is the current state today.
  • No, disallow by entirely failing the webhook that attempts to change other fields.
  • Yes, allow webhooks to change fields outside the normal resource's restrictions. This changes the current expectation that a webhook can only do things to an API object that a normal API request could do.

@tedyu
Contributor

tedyu commented Dec 12, 2019

No, disallow by entirely failing the webhook that attempts to change other fields.

I would vote for the above option.
It is not desirable to only fulfill part of the request.

@julianvmodesto
Contributor

julianvmodesto commented Jan 23, 2020

To re-summarize the current solution for this issue:

  • add the field includeSubresources to MutatingWebhookConfiguration and ValidatingWebhookConfiguration to call the webhook on both the resource and its subresources in the following order: Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) + Validating(/x)

Decide:

  • Do we fail a mutating webhook on a subresource if it attempts to implicitly change fields on the parent?

@lavalamp
Member

Do we fail a mutating webhook on a subresource if it attempts to implicitly change fields on the parent?

Option 1: silently drop such changes; if that would leave the object in an inconsistent or bad state, a validating webhook can fail the entire operation.
Option 2: fail any mutating webhook attempting to modify fields not included in the subresource.
Option 3: permit such changes when done by webhooks, even though the subresource caller can't make them.

Options 1 and 2 both have the con that user changes that used to work will stop working when the webhook is in the path, and the end user (the one using the subresource) has no recourse. Option 1 would permit more such changes to go through, at the cost of producing (from the webhook's perspective) objects with partially applied changes.

Option 3 has the con that changes that used to be confined to known fields could now have effects on more and different fields.

Personally, I think that webhooks are highly privileged, so we can trust them to make changes that we would not trust the subresource caller to make, which makes option 3 best. We can formalize this even further by, e.g., explicitly giving webhooks identities and doing an RBAC check for update on the parent resource. If the webhook has this permission, use option 3. If the webhook lacks it, then I guess option 2 is clearer than option 1.
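
A rough sketch of that RBAC check, assuming webhooks were given their own identities (the function and parameter names are made up; the SubjectAccessReview API and client-go calls are the existing ones):

```go
package rbaccheck

import (
	"context"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// webhookMayUpdateParent asks the authorizer whether the webhook's identity is
// allowed to update the parent Deployment; per the suggestion above, option 3
// would apply only when this returns true.
func webhookMayUpdateParent(ctx context.Context, c kubernetes.Interface, webhookUser, namespace, name string) (bool, error) {
	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			User: webhookUser,
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Group:     "apps",
				Resource:  "deployments",
				Verb:      "update",
				Namespace: namespace,
				Name:      name,
			},
		},
	}
	res, err := c.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return res.Status.Allowed, nil
}
```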

@deads2k and @liggitt probably have thoughts.

@liggitt
Member

liggitt commented Jan 23, 2020

Option 1:

  • does the least violence to the REST handling stack
  • is the most consistent with current behavior and expectations
  • is nicest for the end user

Option 2:

  • requires plumbing knowledge about which fields are not writeable via which endpoints (currently held in each REST strategy) to the admission webhook (not just for subresources... a similar issue exists if an admission webhook tries to set status fields when admitting a write via the main resource)
  • is the clearest for the webhook author

Option 3:

  • requires distinguishing between admission-set fields and user-set fields in REST strategy and validation code
  • and/or requires reordering mutating admission relative to those operations
  • assuming we don't trust admission webhooks enough to skip validation altogether, would require doing additional validation - in some of our status update paths, we only validate metadata and status, since the spec is a no-op for status subresource API calls (currently guaranteed by the strategy copying the spec from the existing object to the new object)

I would find it very confusing for status fields to change during spec updates (or vice versa) in ways that could not be accomplished simply via the API.

@lavalamp
Member

Let me modify option 3: the status/spec distinction is solid enough that it should probably always be handled via option 1 or 2; no existing webhook should be attempting to set both.

I had in mind things like /scale, not status, for option 3.

@lavalamp
Member

Since this is still in the beta period and we have plenty to do, I will vote for option 1 for the sake of expediency. Options 2 and 3 will be more work, so we should wait until we have evidence that option 1 isn't sufficient.

@julianvmodesto
Contributor

OK, so as a next step, now that we are closer to a decision, we might want an updated or new KEP to document this, since among the changes there's a new includeSubresources field in the webhook configuration.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2020
@nikhita
Member

nikhita commented Jul 16, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2020
@julianvmodesto
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2020
@mt-inside

Hi, I'm also looking into this area. It's not quite the same use case, but I want to mutate things about a Pod depending on the scheduler's placement of it (Binding create).

A change to the nodeName field on the Pod doesn't cause the Pod to go through admission (binding is a weird one: it's called a subresource but isn't a sub-tree of the Pod object). All I see is a create of a Binding object, but that doesn't give me the rest of the Pod to inspect or mutate (I could go mutate it as a side effect, but that puts me in a race with the kubelet, and I don't want the Pod running until I've made my changes).

If anyone has any opinions on this, I'd be interested. I'd also like to continue my own research, but I can't find the "apply" KEP being talked about; it's not server-side apply, I guess. The mooted includeSubresources flag sounds like exactly what I want! Did it get dropped? Google isn't finding anything, and it's not in the v1 fields.

@apelisse
Member

apelisse commented Jan 6, 2021

The feature started before the KEP process, so our KEP is mostly work-in-progress. I'm trying to get people to update the KEP as we implement specific features for it, but it's still far from complete.

This feature/bug is mostly unrelated to server-side apply, and the api-expression working group is focused on graduating server-side apply to GA at this time.

If you want to get involved in solving this problem, feel free to let us know so that we can discuss the next steps!

@mt-inside

Yeah I'll take a look, shall we have a chat? 🙂

@apelisse
Member

apelisse commented Jan 6, 2021

I'd love it! Happy to talk on Slack (apelisse@) whenever you want :-) Also feel free to join #wg-api-expression

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 6, 2021
@julianvmodesto
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 7, 2021
@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 4, 2021
@lavalamp
Member

lavalamp commented Sep 7, 2021

/lifecycle frozen

Bot, these issues don't magically fix themselves and closing them doesn't make them go away...

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Sep 7, 2021