KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

KnVerey · 2022-11-15T23:13:46Z

One-line PR description: Initial PR for ApplySet : kubectl apply --prune redesign and graduation strategy #3659

Issue link: ApplySet : kubectl apply --prune redesign and graduation strategy #3659

k8s-ci-robot · 2022-11-15T23:13:47Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Co-authored-by: Katrina Verey <katrina.verey@shopify.com>

justinsb · 2023-01-10T13:56:15Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+> has owner references, what should we do? This would be
+> somewhat unexpected; it implies that kubectl apply doesn't
+> really own the object, apparently.
+> We could consider this an error, or perhaps warn


If the owner reference is to the "parent" applyset object, then I suggest we should ignore that particular case. It feels very reasonable for a third party tool to create objects with ownerReferences, but still mark them with our labels/annotations for compatibility.

If the owner reference is to something else, I propose that in alpha we should make that an error, to force the discussion. I can see us eventually making it a warning (or ignored if we see that the parent is also in the applyset), but I think that we should use the alpha to try to discover use-cases here.

KnVerey · 2023-01-10T23:36:50Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+
+>  <<[UNRESOLVED @justinsb @KnVerey]>>
+>
+> Should we identify a few patterns to avoid collisions?  Either uid:12345 or <toolname>:...


I like the idea of allowing it to be arbitrary but suggesting <toolname:...>, e.g. kubectl:<apply-set-name>. I don't think we should recommend encoding uids, which aren't meaningful to humans. Plus imagine the scenario where the tracking object is deleted then re-created with the same name. I think the human's expectation would be that any previous children would continue to point to it.

EDIT: If the convention includes a separate annotation for the tool name though, perhaps recommending tool name in the set name is less useful.

keps/sig-cli/3659-kubectl-apply-prune/README.md

KnVerey · 2023-01-11T00:02:05Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+>  <<[UNRESOLVED @justinsb @KnVerey]>>
+>
+>  How should we recognize that we’re using the “wrong’ tool?
+> Should we have something like `applyset.k8s.io/tooling: helm/v2.0.6` on the applyset?


I think we should, and make checking that annotation part of the standard.

eddiezane · 2023-01-11T17:12:42Z

/cc

keps/sig-cli/3659-kubectl-apply-prune/README.md

soltysh · 2023-01-12T12:50:09Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+
+#### Helm
+
+**Pattern**: list of GVKNN (from secret) + labels


Same as before: GroupVersionKindNameNamespace (GVKNN) - I wasn't aware that such acronym existed and had to look it up 😅

keps/sig-cli/3659-kubectl-apply-prune/README.md

soltysh · 2023-01-12T13:10:38Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+>  How should we recognize that we’re using the “wrong’ tool?
+> Should we have something like `applyset.k8s.io/tooling: helm/v2.0.6` on the applyset?
+>
+> <<[/UNRESOLVED]>>


Since you're introducing a meta object holding information about applyset contents, have you considered versioning the metadata?

keps/sig-cli/3659-kubectl-apply-prune/README.md

Follow-on to initial feedback, address some unresolved blocks

Likely pushes us to GKNN-derived IDs.

We just choose the constrained applyset id to prevent "applyset ID impersonation".

Prune: document confused deputy attack and mitigations

jeremyrickard

Hello, I am a PRR shadow and I'm giving this a first pass for PRR 👋

Left a small comment, otherwise lgtm as a client side feature.

jeremyrickard · 2023-02-07T03:37:54Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+-->
+
+We will maintain tests for "prune v2" and "prune v1", until
+such time as prune v1 (technically still in alpha) is removed.


Do you plan to add unit tests that will exercise the environment variable flag?

Good call - yes - we can add to the doc. We'll add unit tests (or cases) that cover v2 apply/prune, and leave the existing v1 apply/prune unit tests basically as-is until that functionality is removed. I don't think there should be any possible "crosstalk" between the two, but if there is we'll make sure there's test coverage to make sure we haven't regressed.

johnbelamaric

From a PRR perspective I am content.

I do think we should consider making this NOT a client-side only feature, though. I think a lot of the complexity comes in because we're trying to avoid creating a new server side resource.

johnbelamaric · 2023-02-07T21:22:20Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+- An object can be part of at most one ApplySet.  This is a limitation, but seems to be a good one in that objects that are part of multiple ApplySets are complicated both conceptually for users and in terms of implementation behaviour.
+- An ApplySet object can be part of another ApplySet (sub-ApplySets).
+
+### ApplySet Naming


s/Naming/Identification/

johnbelamaric · 2023-02-07T21:22:37Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+
+### ApplySet Naming
+
+Each ApplySet MUST have an ID that can be used to uniquely identify the parent and member objects via the label selector conventions outlined in the following sections. As such, the name:


s/the name/the ID/

if I am understanding correctly?

johnbelamaric · 2023-02-07T21:25:50Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+
+Each ApplySet MUST have an ID that can be used to uniquely identify the parent and member objects via the label selector conventions outlined in the following sections. As such, the name:
+* is subject to the normal limits of label values
+* MUST be the base64 encoding of the hash of the GKNN, in the form `base64(sha256(<name>.<namespace>.<kind>.<group>))`, using the URL safe encoding of RFC4648.


<name> here is the metadata.name of the parent object?

If not this is really unclear. If so, you need to update the paragraph below saying that ID != name, because of course it's not equal, that would be an infinite recursion.

Or I am really confused. Which could be the case.

Yes, you're right! The paragraph below just wasn't updated properly when we added this restriction on the ID in a recent revision. I'll fix it now.

johnbelamaric · 2023-02-07T21:36:19Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+When a tool that wants to list the members of an ApplySet and cannot determine the list of GKs (i.e. because neither hint annotation is used), it may support “discovery”, likely warning that a full cluster crawl is being attempted. Insufficient permissions errors are likely with such functionality, and when they are encountered, the tool should warn that the membership list may be incomplete.
+
+```yaml
+applyset.k8s.io/contains-group-kinds: <kind>.<group>[,]` # OPTIONAL


There are a lot of annotations with varying complex validation logic. Maybe that makes sense for alpha.

But it would seem some of these issues raised can be addressed if we use a new metadata field or field + core type rather than a raft of annotations? This would avoid the adoption limitations for putting all this in a CRD. It would be empty for non-parent objects, so I understand that seems weird to put it everywhere.

Another option would be to reverse that, and have a core applyset type, and allow it to reference the tool-specific porcelain resource as an object ref. That also solves the discovery of applysets issue. Has that been considered?

johnbelamaric · 2023-02-07T21:47:42Z

keps/sig-cli/3659-kubectl-apply-prune/README.md

+In the future, we could downgrade this stance to recommending a warning,
+or to considering owner references orthogonal and ignoring them entirely.
+
+### Versioning


Why are we limiting the discussion to CRDs? Why not built-in types? This is a plumbing level feature. If we have a built-in applyset + object ref to tooling porcelain object, we would avoid the adoption issue related to needing to install the CRD, and have better discovery, and have better options for validation and consistency management of the objects.

seans3 · 2023-02-08T00:11:44Z

I believe this proposal addresses the most important problems with the current prune, and I don't think the two reservations I've enumerated are blockers.

/lgtm

soltysh · 2023-02-08T10:13:54Z

I do think we should consider making this NOT a client-side only feature, though. I think a lot of the complexity comes in because we're trying to avoid creating a new server side resource.

@johnbelamaric I haven't told that Katrina and Justin, yet, but I'm slowly preparing them to gain confidence to create that new server-side resource, but shhhh, don't tell them 😉

soltysh · 2023-02-08T20:40:46Z

/label tide/merge-method-squash

soltysh

A few minor nits
/lgtm
/approve
from sig-cli pov

keps/sig-cli/3659-kubectl-apply-prune/README.md

johnbelamaric · 2023-02-08T21:21:55Z

/approve

From a PRR perspective. From my "project" perspective, I question the resources spent on a a purely client side approach. But that's up to the SIG and those working on the feature. I believe that this approach won't necessarily get more feedback, and if ecosystem tools have to adapt to this approach and then to a later server side approach, it's a lot of churn and wasted effort. My 2 cents - do with it what you will :)

soltysh

/lgtm

k8s-ci-robot · 2023-02-09T09:24:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, KnVerey, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [johnbelamaric]
~~keps/sig-cli/OWNERS~~ [KnVerey,johnbelamaric,soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

justinsb · 2023-04-07T13:21:01Z

So I encountered a slight challenge while using this. If we have a CRD and an instance of that CRD (a CR) in the same manifest, we know the GVKs for all the objects, but we can't (easily) know the GVR for the CR until we apply the CRD. This means we can't update the applyset/contains-group-resources annotation before applying, which we need to do so that we know this annotation always "covers" all the resources in our applyset (i.e. so that we don't leak objects if we crash mid-apply).

Arguably this is a regression vs the existing kubectl behaviour. kubectl apply doesn't handle this particularly well, but what I believe would happen is that apply will apply the objects it can when first invoked, this would include the CRD. The user would then get an error and try it again, and apply would succeed that time (assuming the CRD was fully registered "in time"). Sometimes we would get lucky and if the CRD comes first in the manifest it would be fully registered by the time we got to the CR.

The regression is that now we can't even start the apply if we're using prune.

A few options:

We could change to GVKs. This is actually what the KEP says as-merged, we changed to group-version-resources based on a suggestion, but I think GVK is simpler and more user-friendly. (Migration would be a concern, though we are in alpha/feature-flagged)
We could just apply the objects that we can map to GVRs, and return an error. This is arguably much more complex but similar behaviour to kubectl today.
We could introduce some form of "smart reordering" into kubectl when used with prune, fixing the behaviour here. We would try to apply objects in order, but if we were unable to apply an object because we couldn't map it to a GVR we would put it into a "backlog" and try those objects again after the objects we were able to apply. This is much more user-friendly behaviour. We haven't been willing to change the behaviour of apply for compatibility reasons previously, but we could choose to argue here that prunev2 can include new semantics, and this feels like a very benign and user-friendly change. For efficiency, we might still want to change to GVKs, so that we don't have to update the parent object on each "apply wave".

In terms of compatibility if we did switch to GVKs, there are a few choices:

a) Create a new annotation and simply break everyone using the feature today - it is alpha & feature-flagged. We could implement the fallback to full discovery.
b) Implement our fsck command. The fsck command could be pretty simple, in that it could start by simply rewriting this annotation, but we do want an fsck command that can e.g. check for missing annotations generally. We could tell people to run fsck if we see the "old" annotation.
c) Seamlessly migrate users using the "old" annotation to the "new" annotation as part of kubectl apply. The challenge here is that it's not clear how we would ever drop this migration logic. Maybe we could come up with some criteria like "we will drop when we go GA", but there's always the risk that we break someone.

Personally I think option b is the best, it sets us up to put more functionality into fsck, and it feels more sustainable for an alpha / feature-flagged feature.

For reference, the current annotation is applyset.kubernetes.io/contains-group-resources, I would suggest we would switch to applyset.kubernetes.io/contains-group-kinds

https://github.com/kubernetes/kubernetes/blob/ad18954259eae3db51bac2274ed4ca7304b923c4/staging/src/k8s.io/kubectl/pkg/cmd/apply/applyset.go#L63

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 15, 2022

k8s-ci-robot requested a review from seans3 November 15, 2022 23:13

k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Nov 15, 2022

k8s-ci-robot requested a review from soltysh November 15, 2022 23:13

k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 15, 2022

KnVerey mentioned this pull request Nov 18, 2022

ApplySet : kubectl apply --prune redesign and graduation strategy #3659

Open

4 tasks

Initial KEP for improving pruning in kubectl apply

ecb3a7c

KnVerey force-pushed the kubectl_apply_prune branch from b63ff0c to ecb3a7c Compare November 18, 2022 00:32

Add design details

75b8715

Co-authored-by: Katrina Verey <katrina.verey@shopify.com>

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 3, 2022

KnVerey changed the title ~~3659: KEP for improving pruning in kubectl apply~~ KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy Dec 14, 2022

KnVerey mentioned this pull request Dec 15, 2022

Disallow prune for server-side applied objects kubernetes/kubernetes#113348

Merged

Add another open question

2fbdc7c

justinsb reviewed Jan 10, 2023

View reviewed changes

KnVerey commented Jan 10, 2023

View reviewed changes

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

KnVerey commented Jan 10, 2023

View reviewed changes

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

KnVerey commented Jan 11, 2023

View reviewed changes

Links, clarifications, ownerRef and GKNN explanations

dee73a2

k8s-ci-robot requested a review from eddiezane January 11, 2023 17:12

soltysh reviewed Jan 12, 2023

View reviewed changes

Follow-on to initial feedback, address some unresolved blocks

d1a6bae

KnVerey mentioned this pull request Jan 18, 2023

Follow-on to initial feedback, address some unresolved blocks KnVerey/enhancements#1

Merged

Merge pull request #1 from justinsb/kubectl_apply_prune

5371185

Follow-on to initial feedback, address some unresolved blocks

justinsb and others added 4 commits January 30, 2023 14:53

Prune: document confused deputy attack and mitigations

4a3db72

Likely pushes us to GKNN-derived IDs.

Constrain applyset id

c718a94

We just choose the constrained applyset id to prevent "applyset ID impersonation".

Merge pull request #6 from justinsb/add_confused_deputy_attack

724b753

Prune: document confused deputy attack and mitigations

Update KEP and PRR metadata

a741be3

k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2023

jeremyrickard reviewed Feb 7, 2023

View reviewed changes

Enhance testing description

a8c185f

johnbelamaric reviewed Feb 7, 2023

View reviewed changes

k8s-ci-robot assigned seans3 Feb 8, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023

ID vs name fixes

9db8943

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 8, 2023

soltysh approved these changes Feb 8, 2023

View reviewed changes

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

k8s-ci-robot assigned soltysh Feb 8, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023

Fixes from soltysh's review

b347756

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2023

soltysh approved these changes Feb 9, 2023

View reviewed changes

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023

k8s-ci-robot merged commit 01da7ba into kubernetes:master Feb 9, 2023

k8s-ci-robot added this to the v1.27 milestone Feb 9, 2023

KnVerey mentioned this pull request Apr 10, 2023

KEP-3659: Update proposal following initial implementation #3919

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

KnVerey commented Nov 15, 2022

k8s-ci-robot commented Nov 15, 2022

justinsb Jan 10, 2023

KnVerey Jan 10, 2023 •

edited

Loading

KnVerey Jan 11, 2023

eddiezane commented Jan 11, 2023

soltysh Jan 12, 2023

soltysh Jan 12, 2023

jeremyrickard left a comment

jeremyrickard Feb 7, 2023

justinsb Feb 7, 2023

johnbelamaric left a comment

johnbelamaric Feb 7, 2023

johnbelamaric Feb 7, 2023

johnbelamaric Feb 7, 2023

KnVerey Feb 8, 2023

johnbelamaric Feb 7, 2023

johnbelamaric Feb 7, 2023

seans3 commented Feb 8, 2023

soltysh commented Feb 8, 2023

soltysh commented Feb 8, 2023

soltysh left a comment

johnbelamaric commented Feb 8, 2023

soltysh left a comment

k8s-ci-robot commented Feb 9, 2023

justinsb commented Apr 7, 2023


		### ApplySet Naming

		Each ApplySet MUST have an ID that can be used to uniquely identify the parent and member objects via the label selector conventions outlined in the following sections. As such, the name:

KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

Conversation

KnVerey commented Nov 15, 2022

k8s-ci-robot commented Nov 15, 2022

Choose a reason for hiding this comment

KnVerey Jan 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eddiezane commented Jan 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeremyrickard left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnbelamaric left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seans3 commented Feb 8, 2023

soltysh commented Feb 8, 2023

soltysh commented Feb 8, 2023

soltysh left a comment

Choose a reason for hiding this comment

johnbelamaric commented Feb 8, 2023

soltysh left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Feb 9, 2023

justinsb commented Apr 7, 2023

KnVerey Jan 10, 2023 •

edited

Loading