Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy #3661

Merged
merged 21 commits into from
Feb 9, 2023

Conversation

KnVerey
Copy link
Contributor

@KnVerey KnVerey commented Nov 15, 2022

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 15, 2022
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Nov 15, 2022
@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 15, 2022
Co-authored-by: Katrina Verey <katrina.verey@shopify.com>
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 3, 2022
@KnVerey KnVerey changed the title 3659: KEP for improving pruning in kubectl apply KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy Dec 14, 2022
> has owner references, what should we do? This would be
> somewhat unexpected; it implies that kubectl apply doesn't
> really own the object, apparently.
> We could consider this an error, or perhaps warn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the owner reference is to the "parent" applyset object, then I suggest we should ignore that particular case. It feels very reasonable for a third party tool to create objects with ownerReferences, but still mark them with our labels/annotations for compatibility.

If the owner reference is to something else, I propose that in alpha we should make that an error, to force the discussion. I can see us eventually making it a warning (or ignored if we see that the parent is also in the applyset), but I think that we should use the alpha to try to discover use-cases here.


> <<[UNRESOLVED @justinsb @KnVerey]>>
>
> Should we identify a few patterns to avoid collisions? Either uid:12345 or <toolname>:...
Copy link
Contributor Author

@KnVerey KnVerey Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of allowing it to be arbitrary but suggesting <toolname:...>, e.g. kubectl:<apply-set-name>. I don't think we should recommend encoding uids, which aren't meaningful to humans. Plus imagine the scenario where the tracking object is deleted then re-created with the same name. I think the human's expectation would be that any previous children would continue to point to it.

EDIT: If the convention includes a separate annotation for the tool name though, perhaps recommending tool name in the set name is less useful.

> <<[UNRESOLVED @justinsb @KnVerey]>>
>
> How should we recognize that we’re using the “wrong’ tool?
> Should we have something like `applyset.k8s.io/tooling: helm/v2.0.6` on the applyset?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should, and make checking that annotation part of the standard.

@eddiezane
Copy link
Member

/cc

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved

#### Helm

**Pattern**: list of GVKNN (from secret) + labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before: GroupVersionKindNameNamespace (GVKNN) - I wasn't aware that such acronym existed and had to look it up 😅

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved
keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved
> How should we recognize that we’re using the “wrong’ tool?
> Should we have something like `applyset.k8s.io/tooling: helm/v2.0.6` on the applyset?
>
> <<[/UNRESOLVED]>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're introducing a meta object holding information about applyset contents, have you considered versioning the metadata?

Follow-on to initial feedback, address some unresolved blocks
justinsb and others added 4 commits January 30, 2023 14:53
Likely pushes us to GKNN-derived IDs.
We just choose the constrained applyset id to prevent "applyset ID
impersonation".
Prune: document confused deputy attack and mitigations
@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2023
Copy link
Contributor

@jeremyrickard jeremyrickard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I am a PRR shadow and I'm giving this a first pass for PRR 👋

Left a small comment, otherwise lgtm as a client side feature.

-->

We will maintain tests for "prune v2" and "prune v1", until
such time as prune v1 (technically still in alpha) is removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to add unit tests that will exercise the environment variable flag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - yes - we can add to the doc. We'll add unit tests (or cases) that cover v2 apply/prune, and leave the existing v1 apply/prune unit tests basically as-is until that functionality is removed. I don't think there should be any possible "crosstalk" between the two, but if there is we'll make sure there's test coverage to make sure we haven't regressed.

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a PRR perspective I am content.

I do think we should consider making this NOT a client-side only feature, though. I think a lot of the complexity comes in because we're trying to avoid creating a new server side resource.

- An object can be part of at most one ApplySet. This is a limitation, but seems to be a good one in that objects that are part of multiple ApplySets are complicated both conceptually for users and in terms of implementation behaviour.
- An ApplySet object can be part of another ApplySet (sub-ApplySets).

### ApplySet Naming
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Naming/Identification/


### ApplySet Naming

Each ApplySet MUST have an ID that can be used to uniquely identify the parent and member objects via the label selector conventions outlined in the following sections. As such, the name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the name/the ID/

if I am understanding correctly?


Each ApplySet MUST have an ID that can be used to uniquely identify the parent and member objects via the label selector conventions outlined in the following sections. As such, the name:
* is subject to the normal limits of label values
* MUST be the base64 encoding of the hash of the GKNN, in the form `base64(sha256(<name>.<namespace>.<kind>.<group>))`, using the URL safe encoding of RFC4648.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<name> here is the metadata.name of the parent object?

If not this is really unclear. If so, you need to update the paragraph below saying that ID != name, because of course it's not equal, that would be an infinite recursion.

Or I am really confused. Which could be the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right! The paragraph below just wasn't updated properly when we added this restriction on the ID in a recent revision. I'll fix it now.

When a tool that wants to list the members of an ApplySet and cannot determine the list of GKs (i.e. because neither hint annotation is used), it may support “discovery”, likely warning that a full cluster crawl is being attempted. Insufficient permissions errors are likely with such functionality, and when they are encountered, the tool should warn that the membership list may be incomplete.

```yaml
applyset.k8s.io/contains-group-kinds: <kind>.<group>[,]` # OPTIONAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of annotations with varying complex validation logic. Maybe that makes sense for alpha.

But it would seem some of these issues raised can be addressed if we use a new metadata field or field + core type rather than a raft of annotations? This would avoid the adoption limitations for putting all this in a CRD. It would be empty for non-parent objects, so I understand that seems weird to put it everywhere.

Another option would be to reverse that, and have a core applyset type, and allow it to reference the tool-specific porcelain resource as an object ref. That also solves the discovery of applysets issue. Has that been considered?

In the future, we could downgrade this stance to recommending a warning,
or to considering owner references orthogonal and ignoring them entirely.

### Versioning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we limiting the discussion to CRDs? Why not built-in types? This is a plumbing level feature. If we have a built-in applyset + object ref to tooling porcelain object, we would avoid the adoption issue related to needing to install the CRD, and have better discovery, and have better options for validation and consistency management of the objects.

@seans3
Copy link
Contributor

seans3 commented Feb 8, 2023

I believe this proposal addresses the most important problems with the current prune, and I don't think the two reservations I've enumerated are blockers.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@soltysh
Copy link
Contributor

soltysh commented Feb 8, 2023

I do think we should consider making this NOT a client-side only feature, though. I think a lot of the complexity comes in because we're trying to avoid creating a new server side resource.

@johnbelamaric I haven't told that Katrina and Justin, yet, but I'm slowly preparing them to gain confidence to create that new server-side resource, but shhhh, don't tell them 😉

@soltysh
Copy link
Contributor

soltysh commented Feb 8, 2023

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 8, 2023
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nits
/lgtm
/approve
from sig-cli pov

keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved
keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved
keps/sig-cli/3659-kubectl-apply-prune/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@johnbelamaric
Copy link
Member

/approve

From a PRR perspective. From my "project" perspective, I question the resources spent on a a purely client side approach. But that's up to the SIG and those working on the feature. I believe that this approach won't necessarily get more feedback, and if ecosystem tools have to adapt to this approach and then to a later server side approach, it's a lot of churn and wasted effort. My 2 cents - do with it what you will :)

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2023
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, KnVerey, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 01da7ba into kubernetes:master Feb 9, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 9, 2023
@justinsb
Copy link
Member

justinsb commented Apr 7, 2023

So I encountered a slight challenge while using this. If we have a CRD and an instance of that CRD (a CR) in the same manifest, we know the GVKs for all the objects, but we can't (easily) know the GVR for the CR until we apply the CRD. This means we can't update the applyset/contains-group-resources annotation before applying, which we need to do so that we know this annotation always "covers" all the resources in our applyset (i.e. so that we don't leak objects if we crash mid-apply).

Arguably this is a regression vs the existing kubectl behaviour. kubectl apply doesn't handle this particularly well, but what I believe would happen is that apply will apply the objects it can when first invoked, this would include the CRD. The user would then get an error and try it again, and apply would succeed that time (assuming the CRD was fully registered "in time"). Sometimes we would get lucky and if the CRD comes first in the manifest it would be fully registered by the time we got to the CR.

The regression is that now we can't even start the apply if we're using prune.

A few options:

  1. We could change to GVKs. This is actually what the KEP says as-merged, we changed to group-version-resources based on a suggestion, but I think GVK is simpler and more user-friendly. (Migration would be a concern, though we are in alpha/feature-flagged)

  2. We could just apply the objects that we can map to GVRs, and return an error. This is arguably much more complex but similar behaviour to kubectl today.

  3. We could introduce some form of "smart reordering" into kubectl when used with prune, fixing the behaviour here. We would try to apply objects in order, but if we were unable to apply an object because we couldn't map it to a GVR we would put it into a "backlog" and try those objects again after the objects we were able to apply. This is much more user-friendly behaviour. We haven't been willing to change the behaviour of apply for compatibility reasons previously, but we could choose to argue here that prunev2 can include new semantics, and this feels like a very benign and user-friendly change. For efficiency, we might still want to change to GVKs, so that we don't have to update the parent object on each "apply wave".

In terms of compatibility if we did switch to GVKs, there are a few choices:

a) Create a new annotation and simply break everyone using the feature today - it is alpha & feature-flagged. We could implement the fallback to full discovery.
b) Implement our fsck command. The fsck command could be pretty simple, in that it could start by simply rewriting this annotation, but we do want an fsck command that can e.g. check for missing annotations generally. We could tell people to run fsck if we see the "old" annotation.
c) Seamlessly migrate users using the "old" annotation to the "new" annotation as part of kubectl apply. The challenge here is that it's not clear how we would ever drop this migration logic. Maybe we could come up with some criteria like "we will drop when we go GA", but there's always the risk that we break someone.

Personally I think option b is the best, it sets us up to put more functionality into fsck, and it feels more sustainable for an alpha / feature-flagged feature.

For reference, the current annotation is applyset.kubernetes.io/contains-group-resources, I would suggest we would switch to applyset.kubernetes.io/contains-group-kinds

https://github.com/kubernetes/kubernetes/blob/ad18954259eae3db51bac2274ed4ca7304b923c4/staging/src/k8s.io/kubectl/pkg/cmd/apply/applyset.go#L63

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants