Conversation

@vivekbagade
Contributor

Also add test plan and answer scalability questionnaire

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 5, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vivekbagade
To complete the pull request process, please assign lavalamp after the PR has been reviewed.
You can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Oct 5, 2020
@vivekbagade
Contributor Author

/assign @liggitt

This section details the behavior of the mechanism when encountering non-optimal
situations.
In cases where:
Contributor

and if any of these are true when reading the file on a process start, does the process fail to start? I would expect it to.

Member

that matches my expectation, but should be called out explicitly

@deads2k
Contributor

deads2k commented Oct 5, 2020

This answers the outstanding questions that I know of. I'm ok with this moving to implementable if we can find a solid reviewer who can sign up for the work (I'm not available).

@lavalamp any comments you'd like addressed before implementable?


through the API.

### New AdmissionConfig schema

Member

I don't see this called out, but the treatment of non-absolute file paths for webhooksFile should be described. Absolute paths are required for the kubeconfigFile field in the same config, which is the simplest approach. If relative paths are allowed, I would expect them to be treated relative to the location of the admission config file, which requires replumbing some bits of admission config loading.
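For illustration only, the relative-path treatment described above might look like the following sketch; `resolveWebhooksFile` and its parameter names are invented here, not part of the real admission config loader:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveWebhooksFile is a hypothetical helper: absolute paths are used
// as-is, while relative paths are resolved against the directory
// containing the admission config file, as suggested in the comment.
func resolveWebhooksFile(admissionConfigPath, webhooksFile string) string {
	if filepath.IsAbs(webhooksFile) {
		return webhooksFile
	}
	return filepath.Join(filepath.Dir(admissionConfigPath), webhooksFile)
}

func main() {
	// Relative path: resolved next to the admission config file.
	fmt.Println(resolveWebhooksFile("/etc/kubernetes/admission-config.yaml", "webhooks.yaml"))
	// → /etc/kubernetes/webhooks.yaml

	// Absolute path: passed through unchanged.
	fmt.Println(resolveWebhooksFile("/etc/kubernetes/admission-config.yaml", "/var/lib/webhooks.yaml"))
	// → /var/lib/webhooks.yaml
}
```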


a. Webhook metadata: This metric would contain metadata for each webhook invoked,
both API based and manifest based. The metadata includes the webhook type
(mutating/validating) and whether it is manifest based.
Member

detail the name and labels of the metric?

Member

also, add the metric to the metrics section of the kep.yaml file (same for all the metrics)
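Purely as an illustration of what is being asked for (the metric name and labels below are invented here, not taken from the KEP), the invocation metric might be exposed as:

```
# HELP apiserver_admission_webhook_invocations_total Number of admission webhook invocations.
# TYPE apiserver_admission_webhook_invocations_total counter
apiserver_admission_webhook_invocations_total{type="mutating",manifest_based="true"} 7
apiserver_admission_webhook_invocations_total{type="validating",manifest_based="false"} 42
```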


b. Manifest errors: Since the manifest file can be reconfigured after API server
startup, we would add a metric informing users whether there were errors
loading/parsing the current manifest file.
Member

detail the name and labels of the metric?
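Again for illustration only (invented name and labels), the manifest-error metric could be a gauge that is nonzero while the current file fails to load:

```
# HELP apiserver_admission_manifest_load_error Whether the last reload of the webhook manifest file failed.
# TYPE apiserver_admission_manifest_load_error gauge
apiserver_admission_manifest_load_error{reason="parse_error"} 1
```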


Audit annotations for mutating webhooks would also carry an additional field
`manifestBased` indicating the invocation of a manifest based webhook.
Member

give an example of the audit annotation including the manifestBased field
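One possible shape for such an annotation, assuming the existing mutating webhook audit annotation format with the proposed field added (the configuration and webhook names here are invented):

```json
{
  "mutation.webhook.admission.k8s.io/round_0_index_0": "{\"configuration\":\"example-config\",\"webhook\":\"example.example.com\",\"mutated\":true,\"manifestBased\":true}"
}
```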

@liggitt
Member

liggitt commented Oct 5, 2020

Asked for a few clarifications, then lgtm.

I can review the implementation.

@lavalamp
Contributor

lavalamp commented Oct 5, 2020

This design has the following problems:

  • Doesn't work for other types (e.g. CRDs).
  • No way for status to be reported on the "virtual" resources.
  • There's no longer a single source of truth.
  • It's very hard for a user to figure out what the system is doing or how to change that.

So, I'm a big -1 on this design.

I'd accept a design that either

  1. works for all resources, by adding a controller that reads a directory for manifests and puts them in etcd, and also configures an authz plugin to block non-system changes to those resources, OR
  2. provides just an authz mechanism that prevents non-system changes to certain resources, plus a blessed super-super-user account that can change those resources (and then it's up to the holder of that account to fix them).

Honestly I think the authz addition is too heavy-handed and in the common case will prevent people from fixing clusters as frequently as it prevents them from breaking clusters; but I recognize that it is greatly desired.

@liggitt
Member

liggitt commented Oct 5, 2020

This design has the following problems:

  • Doesn't work for other types (e.g. CRDs).
  • No way for status to be reported on the "virtual" resources.
  • There's no longer a single source of truth.
  • It's very hard for a user to figure out what the system is doing or how to change that.

I'm ok with taking a step back to see if we can come up with an approach that is more coherent, and I agree there are multiple resources that have similar use cases, but some of those characteristics are features in some scenarios :-)

  • local-only policy config allows controlled rollout in a multi-server environment without worrying about another server stomping config ("my apiserver will load policy file X into etcd, then enforce whatever is present in etcd" is a way weaker guarantee than "my apiserver will enforce policy file X")
  • stratifying policy specified by the cluster operator and policy specified/inspectable/changeable by the cluster user is useful in managed environments

@vivekbagade
Contributor Author

vivekbagade commented Oct 6, 2020

  • No way for status to be reported on the "virtual" resources.

Webhook config does not have a status field, IIUC.

  • There's no longer a single source of truth.

Yes, I partially agree with this. But the source of truth moves from the API to the API server metrics, which are not as useful.

  • It's very hard for a user to figure out what the system is doing or how to change that.

I don't completely agree. This mechanism is meant to give operators a better option than compiled-in admission control, and it provides better visibility than compiled-in admission control does: there are audit annotations (for mutating webhooks) and metrics that help with this. As for what "users" can do to alter the behavior, that is a feature of the mechanism (extremely desirable in managed platforms); users would just have to reach out to their operator.

So, I'm a big -1 on this design.

I'd accept a design that either

  1. works for all resources, by adding a controller that reads a directory for manifests and puts them in etcd, and also configures an authz plugin to block non-system changes to those resources, OR
  2. provides just an authz mechanism that prevents non-system changes to certain resources, plus a blessed super-super-user account that can change those resources (and then it's up to the holder of that account to fix them).

Honestly I think the authz addition is too heavy-handed and in the common case will prevent people from fixing clusters as frequently as it prevents them from breaking clusters; but I recognize that it is greatly desired.

I also want to +1 Jordan's comment on the usefulness of stratifying policy specified by the operator vs policy specified by the cluster user.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 3, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
