Admission chain for underlying resource is not called for requests to /scale subresource #84530

Open
jennybuckley opened this issue Oct 29, 2019 · 36 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. wg/api-expression Categorizes an issue or PR as relevant to WG API Expression.

Comments

@jennybuckley

What happened:
When a user makes a modifying request to an object through the /scale subresource, the /scale admission chain is called, but mutating and validating admission for the underlying resource are not. This introduces some unexpected behavior and also makes certain valid policies impossible to enforce consistently.

For example, a user cannot reliably register a webhook on Deployments that caps a deployment's total resource footprint by validating that replicas * memory stays below a limit, because another user could always increase the replicas through /deployments/scale and bypass the validating webhooks registered for /deployments.
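
To make the example concrete, here is a minimal sketch of such a validating webhook backend, written against the admission/v1 and apps/v1 Go types. All names, the 64Gi budget, and the /validate path are illustrative rather than anything prescribed by this issue. The key point is that this handler only sees writes admitted as deployments; a write admitted via the scale subresource never reaches it.

```go
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// memoryBudget is an illustrative cluster-wide cap on replicas * memory requests.
var memoryBudget = resource.MustParse("64Gi")

// validateDeployment rejects Deployments whose total memory footprint
// (replicas * sum of container memory requests) exceeds memoryBudget.
func validateDeployment(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	var dep appsv1.Deployment
	if err := json.Unmarshal(review.Request.Object.Raw, &dep); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	replicas := int64(1) // apps/v1 defaults spec.replicas to 1 when unset
	if dep.Spec.Replicas != nil {
		replicas = int64(*dep.Spec.Replicas)
	}
	total := resource.NewQuantity(0, resource.BinarySI)
	for _, c := range dep.Spec.Template.Spec.Containers {
		if mem, ok := c.Resources.Requests[corev1.ResourceMemory]; ok {
			m := mem.DeepCopy()
			m.Set(m.Value() * replicas) // per-container request scaled by replica count
			total.Add(m)
		}
	}

	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if total.Cmp(memoryBudget) > 0 {
		resp.Allowed = false
		resp.Result = &metav1.Status{Message: "replicas * memory requests exceed the allowed budget"}
	}
	review.Response = resp
	_ = json.NewEncoder(w).Encode(&review)
}

func main() {
	http.HandleFunc("/validate", validateDeployment)
	// Admission webhooks must be served over TLS; the cert paths are placeholders.
	_ = http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil)
}
```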

What you expected to happen:
The mutating and validating admission chains for the underlying resource would also be called when making a request to the /scale subresource.

How to reproduce it (as minimally and precisely as possible):
  • Register a webhook for /deployments which prevents changing replicas (a sketch of such a registration follows below)
  • Make a request to /deployments/scale and observe that the replica change is admitted without consulting that webhook
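
For the first step, the registration might look like the sketch below, using the admissionregistration/v1 Go types (the object names and service reference are placeholders). Note that the rule matches the deployments resource only; a write to /deployments/scale is matched against deployments/scale instead, so this webhook is skipped. Even adding "deployments/scale" to Resources only gets the webhook an autoscaling/v1 Scale object, not the full Deployment.

```go
package main

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// replicaGuardConfig declares a validating webhook for UPDATEs to apps/v1
// Deployments. It is not consulted for writes made via the /scale subresource,
// because those are matched against the "deployments/scale" resource.
func replicaGuardConfig() *admissionregistrationv1.ValidatingWebhookConfiguration {
	sideEffects := admissionregistrationv1.SideEffectClassNone
	path := "/validate"
	return &admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "replica-guard"},
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name: "replicas.example.com",
			Rules: []admissionregistrationv1.RuleWithOperations{{
				Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
				Rule: admissionregistrationv1.Rule{
					APIGroups:   []string{"apps"},
					APIVersions: []string{"v1"},
					Resources:   []string{"deployments"}, // not matched by /scale writes
				},
			}},
			ClientConfig: admissionregistrationv1.WebhookClientConfig{
				Service: &admissionregistrationv1.ServiceReference{
					Namespace: "default", Name: "replica-guard", Path: &path,
				},
			},
			SideEffects:             &sideEffects,
			AdmissionReviewVersions: []string{"v1"},
		}},
	}
}

func main() {
	_ = replicaGuardConfig() // create this object with client-go or kubectl in practice
}
```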

/cc @apelisse

@jennybuckley jennybuckley added the kind/bug Categorizes issue or PR as related to a bug. label Oct 29, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 29, 2019
@jennybuckley
Author

/sig api-machinery
/priority help-wanted
/cc @lavalamp @mariantalla

This is related to (one of the causes of) #82046 "Server-side Apply: Ownership not tracked for scale subresource"

@k8s-ci-robot
Contributor

@jennybuckley: The label(s) priority/help-wanted cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig api-machinery
/priority help-wanted
/cc @lavalamp @mariantalla

This is related to (one of the causes of) #82046 "Server-side Apply: Ownership not tracked for scale subresource"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 29, 2019
@jennybuckley
Author

/priority backlog
/help

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 29, 2019
@liggitt
Member

liggitt commented Oct 29, 2019

This probably isn't well-defined enough to be marked as help wanted (especially the Clear Task and Low Barrier to Entry requirements). We need to think through the implications for existing webhooks before making changes here.

@liggitt liggitt closed this as completed Oct 29, 2019
@liggitt liggitt reopened this Oct 29, 2019
@liggitt
Member

liggitt commented Oct 29, 2019

/remove-help

@k8s-ci-robot k8s-ci-robot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 29, 2019
@jennybuckley
Author

jennybuckley commented Oct 29, 2019

@liggitt I should have read https://github.com/kubernetes/community/blob/master/contributors/guide/help-wanted.md first; I assumed Good First Issue was the only label that carried those requirements.

As for the implications on existing webhooks, I think the only safe thing to do is to call both the /x/scale and /x admission when a user makes a request to /x/scale. Then the decision would be which of these orders to call them in:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) ---> Validating(/x)
Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

@lavalamp
Member

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

@apelisse
Member

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

Also, changes made by admission are normally visible because the full object is returned to the client; in this case, only the scale object is returned to the client.

@lavalamp
Member

Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

Validating & Mutating webhooks can't be interleaved, I think, so this isn't an option.

@liggitt
Member

liggitt commented Oct 30, 2019

Implicitly this would mean that a mutating webhook might change some part of the object other than the .spec.replicas field, even though the user called the /scale subresource, which is surprising but I'm not sure how it could be any other way.

That seems really problematic… the whole point of the scale subresource is that only the replicas field will change

@liggitt
Member

liggitt commented Oct 30, 2019

I think the only safe thing to do is to call both the /x/scale and /x admission when a user makes a request to /x/scale.

A few questions to explore the implications of that:

  • What user would be passed in the admission attributes to the /x admission call (the original user or the apiserver loopback user)?
  • Would you expect the same double-call approach for /status?
  • (already raised above) what would happen if a mutating admission plugin for /x modified a field other than replicas?

@deads2k
Contributor

deads2k commented Oct 30, 2019

I can see value in adjusting this behavior, but it is a significant behavioral change for admission plugin authors. They should be relying on diffs of the old and new objects to decide whether an action is allowed, but I think we can be fairly certain they aren't all doing that correctly. How would you phase this in so it is safe for existing consumers? Just an opt-in, or would we be one of the first v2s?
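
For illustration, here is a minimal sketch of that diff-based pattern (the helper name is made up; req is the AdmissionRequest for an UPDATE of an apps/v1 Deployment):

```go
package webhook

import (
	"encoding/json"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
)

// replicasChanged decides based on the diff between the old and new objects,
// rather than on which endpoint the write arrived through.
func replicasChanged(req *admissionv1.AdmissionRequest) (bool, error) {
	var oldDep, newDep appsv1.Deployment
	if err := json.Unmarshal(req.OldObject.Raw, &oldDep); err != nil {
		return false, err
	}
	if err := json.Unmarshal(req.Object.Raw, &newDep); err != nil {
		return false, err
	}
	oldReplicas, newReplicas := int32(1), int32(1) // apps/v1 defaults replicas to 1 when unset
	if oldDep.Spec.Replicas != nil {
		oldReplicas = *oldDep.Spec.Replicas
	}
	if newDep.Spec.Replicas != nil {
		newReplicas = *newDep.Spec.Replicas
	}
	return oldReplicas != newReplicas, nil
}
```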

As Jordan pointed out, the same logical problem exists for some (but not all) other subresources, so that would need to be crisped up.

@julianvmodesto
Contributor

julianvmodesto commented Dec 2, 2019

Results of the discussion from the sig-apimachinery meeting on Nov 6:

Then the decision would be which of these orders to call them in:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) ---> Validating(/x)
Mutating(/x/scale) ---> Validating(/x/scale) -----> Mutating(/x) -------> Validating(/x)

Run the mutating admission controllers first for the subresource, then for the parent resource. Optionally, invoke the mutating admission controllers twice in that same order. Finally, run all validating admission controllers in parallel:
Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) + Validating(/x)

What user would be passed in the admission attributes to the /x admission call (the original user or the apiserver loopback user)?

The webhook configuration for the parent resource must opt in to observing changes made to the parent via specific subresources, e.g. includeSubresources: ["*"]. Writes to those subresources that change the parent resource will then also invoke the parent resource's webhooks.

Thus, the webhook on the parent resource will receive the actual user for the subresource event.
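
To make the shape of that opt-in concrete, a purely hypothetical sketch follows; no includeSubresources field exists in k8s.io/api/admissionregistration/v1 today, and the type below only illustrates the proposal:

```go
package proposal

import admissionregistrationv1 "k8s.io/api/admissionregistration/v1"

// scaleAwareWebhook is NOT a real Kubernetes type; it only sketches the field
// proposed in this thread on top of the existing v1 webhook type.
type scaleAwareWebhook struct {
	admissionregistrationv1.ValidatingWebhook `json:",inline"`

	// IncludeSubresources opts this webhook in to parent-resource admission for
	// writes made via the listed subresources ("*" for all), e.g. ["scale"].
	IncludeSubresources []string `json:"includeSubresources,omitempty"`
}
```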

(already raised above) what would happen if a mutating admission plugin for /x modified a field other than replicas?

This is unresolved: can mutating webhooks change fields that cannot be changed via a normal API call to that subresource?

Options:

  • No, disallow by discarding changes to other fields. This is the current state today.
  • No, disallow by entirely failing the webhook that attempts to change other fields.
  • Yes, allow webhooks to change fields outside the normal resource's restrictions. This changes the current expectation that a webhook can only do things to an API object that a normal API request could do.

@tedyu
Contributor

tedyu commented Dec 12, 2019

No, disallow by entirely failing the webhook that attempts to change other fields.

I would vote for the above option.
It is not desirable to only fulfill part of the request.

@julianvmodesto
Contributor

julianvmodesto commented Jan 23, 2020

To re-summarize the current solution for this issue:

  • add the field includeSubresources to MutatingWebhookConfiguration and ValidatingWebhookConfiguration to call the webhook on both the resource and its subresources in the following order: Mutating(/x/scale) -------> Mutating(/x) -----> Validating(/x/scale) + Validating(/x)

Decide:

  • Do we fail a mutating webhook on a subresource if it attempts to implicitly change fields on the parent?

@lavalamp
Member

Do we fail a mutating webhook on a subresource if it attempts to implicitly change fields on the parent?

Option 1: silently drop such changes; if that would leave the object in an inconsistent or bad state, a validating webhook can fail the entire operation.
Option 2: fail any mutating webhook attempting to modify fields not included in the subresource.
Option 3: permit such changes when done by webhooks, even though the subresource caller can't make them.

Options 1 and 2 both have the con that user changes that used to work will stop working when the webhook is in the path, and the end user (the one using the subresource) has no recourse. Option 1 would permit more such changes to go through, at the cost of producing (from the webhook's perspective) objects with partially applied changes.

Option 3 has the con that changes that used to be confined to known fields could now have effects on more and different fields.

Personally, I think that webhooks are highly privileged, so we can trust them to make changes that we would not trust the subresource caller to make, which makes option 3 best. We can formalize this even further by, e.g., explicitly giving webhooks identities and doing an RBAC check for update on the parent resource. If the webhook has this permission, use option 3. If the webhook lacks it, then I guess option 2 is clearer than option 1.
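
A rough sketch of that RBAC check, assuming webhooks were given their own identities (the function and parameter names are made up; the SubjectAccessReview API and client-go calls are the existing ones):

```go
package rbaccheck

import (
	"context"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// webhookMayUpdateParent asks the authorizer whether the webhook's identity is
// allowed to update the parent Deployment; per the suggestion above, option 3
// would apply only when this returns true.
func webhookMayUpdateParent(ctx context.Context, c kubernetes.Interface, webhookUser, namespace, name string) (bool, error) {
	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			User: webhookUser,
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Group:     "apps",
				Resource:  "deployments",
				Verb:      "update",
				Namespace: namespace,
				Name:      name,
			},
		},
	}
	res, err := c.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return res.Status.Allowed, nil
}
```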

@deads2k and @liggitt probably have thoughts.

@liggitt
Member

liggitt commented Jan 23, 2020

Option 1:

  • does the least violence to the REST handling stack
  • is the most consistent with current behavior and expectations
  • is nicest for the end user

Option 2:

  • requires plumbing knowledge about which fields are not writeable via which endpoints (currently held in each REST strategy) to the admission webhook (not just for subresources... a similar issue exists if an admission webhook tries to set status fields when admitting a write via the main resource)
  • is the clearest for the webhook author

Option 3:

  • requires distinguishing between admission-set fields and user-set fields in REST strategy and validation code
  • and/or requires reordering mutating admission relative to those operations
  • assuming we don't trust admission webhooks enough to skip validation altogether, would require doing additional validation - in some of our status update paths, we only validate metadata and status, since the spec is a no-op for status subresource API calls (currently guaranteed by the strategy copying the spec from the existing object to the new object)

I would find it very confusing for status fields to change during spec updates (or vice versa) in ways that could not be accomplished simply via the API.

@lavalamp
Member

Let me modify option 3: the status/spec distinction is solid enough that it should probably always be handled via option 1 or 2; no existing webhook should be attempting to set both.

I had in mind things like /scale, not status, for option 3.

@lavalamp
Member

Since this is still in the beta period and we have plenty to do, I will vote for option 1 for the sake of expediency. Options 2 and 3 will be more work, so we should wait until we have evidence that option 1 isn't sufficient.

@julianvmodesto
Contributor

OK, so as a next step, now that we are closer to a decision, we might want an updated or new KEP to document this, since among the changes there's a new includeSubresources field in the webhook configuration.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2020
@nikhita
Member

nikhita commented Jul 16, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2020
@julianvmodesto
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2020
@mt-inside

Hi, I'm also looking into this area. It's not quite the same use case, but I want to mutate things about a Pod depending on the scheduler's placement of it (Binding create).

A change to the nodeName field on the Pod doesn't cause the Pod to go through admission (binding is a weird one: it's called a subresource but isn't a sub-tree of the Pod object). All I see is a create of a Binding object, but that doesn't give me the rest of the Pod to inspect or mutate (I could go mutate it as a side effect, but that puts me in a race with the kubelet, and I don't want the Pod running until I've made my changes).

If anyone has any opinions on this, I'd be interested. I'd also like to continue my own research, but I can't find the "apply" KEP being talked about; it's not server-side apply, I guess. The mooted includeSubresources flag sounds like exactly what I want! Did it get dropped? Google isn't finding anything, and it's not in the v1 fields.

@apelisse
Member

apelisse commented Jan 6, 2021

The feature started before the KEP process, so our KEP is mostly work-in-progress. I'm trying to get people to update the KEP as we implement specific features for it, but it's still far from complete.

This feature/bug is mostly unrelated to server-side apply, and the api-expression working group is focused on graduating server-side apply to GA at this time.

If you want to get involved in solving this problem, feel free to let us know so that we can discuss the next steps!

@mt-inside

Yeah I'll take a look, shall we have a chat? 🙂

@apelisse
Member

apelisse commented Jan 6, 2021

I'd love it! Happy to talk on Slack (apelisse@) whenever you want :-) Also feel free to join #wg-api-expression

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 6, 2021
@julianvmodesto
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 7, 2021
@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 4, 2021
@lavalamp
Member

lavalamp commented Sep 7, 2021

/lifecycle frozen

Bot, these issues don't magically fix themselves and closing them doesn't make them go away...

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Sep 7, 2021