
Server-side apply bumps resourceVersion when re-applying the same object #124605

Open
sbueringer opened this issue Apr 29, 2024 · 14 comments · May be fixed by #125263
Labels
kind/bug · sig/api-machinery · triage/accepted

Comments

@sbueringer
Member

sbueringer commented Apr 29, 2024

What happened?

When continuously re-applying the same object via a controller or kubectl with SSA, the resourceVersion increases every time.

What did you expect to happen?

I expected the resourceVersion to stay the same if the client always sends the same object via SSA, especially if the resulting object doesn't change (apart from resourceVersion & managedFields, but these are managed by the apiserver).

How can we reproduce it (as minimally and precisely as possible)?

Apply this CRD:

kubectl apply -f ./crd_machines.yaml
crd_machines.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: machines.cluster.x-k8s.io
spec:
  group: cluster.x-k8s.io
  names:
    kind: Machine
    listKind: MachineList
    plural: machines
    singular: machine
  scope: Namespaced
  versions:
  - name: v1beta1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              infrastructureRef:
                properties:
                  name:
                    type: string
                  namespace:
                    type: string
                    default: default-namespace
                type: object
                x-kubernetes-map-type: atomic
            type: object
        type: object
    served: true
    storage: true

Apply this CR:

kubectl apply -f ./cr_machine.yaml --server-side=true --field-manager=test-manager
cr_machine.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: machine-1
  namespace: default
spec:
  infrastructureRef:
    name: infrastructure-machine-1

Get the state of the CR after the first apply:

kubectl get machine -o yaml --show-managed-fields > /tmp/machine-1.yaml

Apply the CR again:

kubectl apply -f ./cr_machine.yaml --server-side=true --field-manager=test-manager

Get the state of the CR after the second apply:

kubectl get machine -o yaml --show-managed-fields > /tmp/machine-2.yaml

Diff the two:

diff /tmp/machine-1.yaml /tmp/machine-2.yaml
16c16
<       time: "2024-04-29T09:39:07Z"
---
>       time: "2024-04-29T09:39:40Z"
19c19
<     resourceVersion: "1167"
---
>     resourceVersion: "1209"

=> The resourceVersion and the "time" field of the "test-manager" managedFields entry increase on every apply.

Anything else we need to know?

The use case we have is a lot more complex than this simple scenario. The tl;dr is that we want to implement caching for our SSA calls, so that we don't have SSA calls on every reconcile of our controller if nothing actually changes. This is relatively hard to do if the resourceVersion increases after every apply.
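
To make "caching for our SSA calls" concrete, here is a minimal sketch of the idea, assuming a controller-runtime client. The ssaCache type and the applyIfNeeded helper are purely illustrative names, not our actual implementation:

package ssacache

import (
	"context"
	"crypto/sha256"
	"encoding/json"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ssaCache remembers, per object, the hash of the desired state we last
// applied and the resourceVersion that apply returned.
type ssaCache map[string]cacheEntry

type cacheEntry struct {
	desiredHash     string
	resourceVersion string
}

// applyIfNeeded skips the SSA call when the desired state is unchanged and
// the live object still carries the resourceVersion from our last apply.
func applyIfNeeded(ctx context.Context, c client.Client, cache ssaCache, desired *unstructured.Unstructured) error {
	raw, err := json.Marshal(desired.Object)
	if err != nil {
		return err
	}
	hash := fmt.Sprintf("%x", sha256.Sum256(raw))
	key := desired.GroupVersionKind().String() + "/" + desired.GetNamespace() + "/" + desired.GetName()

	live := desired.DeepCopy()
	if err := c.Get(ctx, client.ObjectKeyFromObject(desired), live); err == nil {
		entry, ok := cache[key]
		if ok && entry.desiredHash == hash && entry.resourceVersion == live.GetResourceVersion() {
			// Same desired state, and nothing touched the object since our last apply.
			return nil
		}
	}

	obj := desired.DeepCopy()
	if err := c.Patch(ctx, obj, client.Apply, client.FieldOwner("test-manager"), client.ForceOwnership); err != nil {
		return err
	}
	cache[key] = cacheEntry{desiredHash: hash, resourceVersion: obj.GetResourceVersion()}
	return nil
}

Any variant of this only stays quiet if a no-op apply leaves the resourceVersion alone, which is exactly the behavior this issue is about.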

I took a closer look at the API server and roughly the following happens:

  1. The "incoming" SSA patch is applied over the live object from etcd.
    1. This leads to a change, because infrastructureRef has x-kubernetes-map-type: atomic, which means the namespace field is dropped.
    2. Because of that change, the "time" field in the managedFields entry gets set to "time.Now()" here.
    3. Later, during admission, the namespace field gets defaulted again (I also tried this with a mutating webhook, same result).

At a high level:

  • client always applies the same object
  • defaulting always results in the same object
  • resourceVersion and managedFields[].time continuously go up

I think there are many similar cases that would lead to this behavior. Basically whenever the admission chain mutates a field previously set via SSA.

Notes:

  • This only happens if there is >1s between each apply; otherwise the managedFields[].time is identical and the resourceVersion stays the same, because we then don't write to etcd (see the small illustration below).
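
For reference on the 1s granularity: managedFields[].time is a metav1.Time, which serializes as RFC3339 with second precision, so two applies within the same second produce byte-identical managedFields entries. A small illustration (standalone, nothing cluster-specific):

package main

import (
	"encoding/json"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Two timestamps 800ms apart, but within the same second.
	a := metav1.NewTime(time.Date(2024, 4, 29, 9, 39, 7, 100_000_000, time.UTC))
	b := metav1.NewTime(time.Date(2024, 4, 29, 9, 39, 7, 900_000_000, time.UTC))

	ja, _ := json.Marshal(a)
	jb, _ := json.Marshal(b)

	// Both print "2024-04-29T09:39:07Z": sub-second differences disappear,
	// so applies less than a second apart keep the stored object byte-identical.
	fmt.Println(string(ja), string(jb))
}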

Kubernetes version

$  kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

I also tried this with client-go v0.30 and kube-apiserver v1.30. Same result.

OS version

Running this natively on Apple Silicon (M2), but I can't imagine this makes a difference. I also got the same results in Docker Desktop on Mac.

@sbueringer added the kind/bug label on Apr 29, 2024
@k8s-ci-robot added the needs-sig and needs-triage labels on Apr 29, 2024
@sbueringer
Member Author

(cc @fabriziopandini @chrischdi @alvaroaleman @vincepri, might be interesting for you)

@sbueringer
Member Author

/sig api-machinery

@k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label on Apr 29, 2024
@cici37
Contributor

cici37 commented May 30, 2024

It looks less like a bug and more like intended behavior to me.
/cc @jpbetz
/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on May 30, 2024
@jpbetz
Contributor

jpbetz commented May 30, 2024

Bypassing the writes triggered by noop applies would break storage version migration (#123344), so we're committed to supporting this behavior as a feature.

@howardjohn
Contributor

@cici37 @jpbetz I am confused. Are you saying it is intended behavior for a NOP SSA apply to increment resourceVersion?

This seems to not actually happen in all cases, only for certain objects/edge cases (see #121404).

For example, try this:

for i in {0..1000}; do
cat <<EOF | kubectl apply -f - --server-side
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"
EOF
sleep .5
done

No resourceVersion is incremented in the loop above.
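
The same experiment expressed with client-go's typed Apply, for reference (a sketch; it assumes a cluster reachable via $KUBECONFIG and uses the default namespace):

package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corev1ac "k8s.io/client-go/applyconfigurations/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	desired := corev1ac.ConfigMap("game-demo", "default").
		WithData(map[string]string{
			"player_initial_lives":    "3",
			"ui_properties_file_name": "user-interface.properties",
		})

	// Apply the exact same configuration twice and compare resourceVersions.
	opts := metav1.ApplyOptions{FieldManager: "test-manager", Force: true}
	first, err := cs.CoreV1().ConfigMaps("default").Apply(context.TODO(), desired, opts)
	if err != nil {
		panic(err)
	}
	second, err := cs.CoreV1().ConfigMaps("default").Apply(context.TODO(), desired, opts)
	if err != nil {
		panic(err)
	}
	// Expected to print true, matching the kubectl loop above.
	fmt.Println(first.ResourceVersion == second.ResourceVersion)
}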

Surprisingly, I couldn't find any clear statement that SSA doesn't increment the resourceVersion, but https://kubernetes.io/blog/2022/10/20/advanced-server-side-apply/ does say "a no-op apply is not very much more work for the API server than an extra GET", which seems impossible if the resourceVersion is incremented.

@howardjohn
Contributor

https://github.com/kubernetes/kubernetes/pull/123344/files#r1506112409 is probably only updating it because either the fieldManager changes or in the patch body the apiVersion is different (or both, perhaps)?

@nilekhc
Contributor

nilekhc commented May 30, 2024

> https://github.com/kubernetes/kubernetes/pull/123344/files#r1506112409 is probably only updating it because either the fieldManager changes or in the patch body the apiVersion is different (or both, perhaps)?

We need this in storage version migration to make sure we correctly handle CRD schema changes. See Joe's comments here: https://github.com/kubernetes/kubernetes/pull/123344/files#r1508023986

If it helps to understand this better, we have tests to validate this. https://github.com/kubernetes/kubernetes/blob/master/test/integration/storageversionmigrator/storageversionmigrator_test.go

@jpbetz
Contributor

jpbetz commented May 31, 2024

> I am confused. Are you saying this is intended behavior for a NOP SSA apply to increment resourceVersion?

I may have misspoken. When the storage version changes, a NOP SSA apply will trigger a resourceVersion bump. If the storage version is unchanged and an update would be a NOP for etcd, I believe we short-circuit the write (here, I think?). This might explain the ConfigMap case.

@jpbetz
Contributor

jpbetz commented May 31, 2024

For the reported issue:

We've seen cases like this before, where the time gets bumped and a NOP becomes a write. This could be improved. Note that this isn't technically a bug, because the API server is allowed to bump the resourceVersion for any write (the discussion about storage version changes causing this is a good example of where this can happen in a way that the caller wouldn't expect). But I'd like to see the etcd write optimized away for cases like this.
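
As a rough sketch of the optimization I mean (conceptual only, not the apiserver's actual code path): treat the update as a no-op when old and new differ only in managedFields timestamps.

package noopcheck

import (
	"reflect"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// equalIgnoringManagedFieldsTime reports whether two objects are identical
// once the managedFields timestamps are dropped. If they are, the write could
// keep the old timestamps and then be short-circuited before hitting etcd.
func equalIgnoringManagedFieldsTime(oldObj, newObj *unstructured.Unstructured) bool {
	return reflect.DeepEqual(stripTimes(oldObj), stripTimes(newObj))
}

// stripTimes returns a copy of obj with every managedFields[].time cleared.
func stripTimes(obj *unstructured.Unstructured) *unstructured.Unstructured {
	out := obj.DeepCopy()
	mf := out.GetManagedFields()
	for i := range mf {
		mf[i].Time = nil
	}
	out.SetManagedFields(mf)
	return out
}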

@howardjohn
Contributor

Sounds similar to #121404, perhaps, which had some investigation.

FWIW, as a controller author my expectation is that repeatedly applying does not repeatedly trigger resourceVersion increments. It's fine if it happens rarely (storage version change, etc.), but if it happens consistently it means our own writes re-trigger reconciliation and we have an infinite loop of writes.

AFAIK all the marketing around writing SSA controllers has been that you just apply what you want and don't try to do a get+diff+apply. Even if we did want to do that, it doesn't really work well currently (#115563). If we aren't able to rely on NOP writes (usually) being NOPs, I don't see how we are supposed to use SSA to write a controller at all?
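
To make the loop concrete, a minimal sketch of the watch side (generic informer wiring over unstructured objects, not any specific controller): every update where the resourceVersion moved gets enqueued, so an apply that bumps the resourceVersion wakes the controller right back up.

package loopdemo

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

// addHandlers enqueues an object key on every update where the
// resourceVersion changed, which is how a controller's own
// resourceVersion-bumping applies re-trigger its reconcile loop.
func addHandlers(informer cache.SharedIndexInformer, queue chan<- string) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			o := oldObj.(*unstructured.Unstructured)
			n := newObj.(*unstructured.Unstructured)
			if o.GetResourceVersion() != n.GetResourceVersion() {
				queue <- n.GetNamespace() + "/" + n.GetName()
			}
		},
	})
}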

@jpbetz
Contributor

jpbetz commented May 31, 2024

> but if it happens consistently that means our own writes re-trigger reconciliation and we have an infinite loop of writes.

This is a really good point. I agree.

@jpbetz
Contributor

jpbetz commented May 31, 2024

#106388 looks like it was intended to fix this sort of problem. I'm looking into why it doesn't work for this case.

@jpbetz
Contributor

jpbetz commented Jun 1, 2024

#106388 doesn't work for CRs, since the managed fields are inside an unstructured object and are not typed as ManagedFieldsEntry (for the slow-path equality check in IgnoreManagedFieldsTimestampsTransformer).
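
To illustrate the typing mismatch (a sketch, not the transformer's actual code): in an unstructured CR the managedFields sit in the raw map as []interface{}, so a check that expects []metav1.ManagedFieldsEntry directly sees nothing, while the metav1.Object accessor converts them into the typed form.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	u := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cluster.x-k8s.io/v1beta1",
		"kind":       "Machine",
		"metadata": map[string]interface{}{
			"name": "machine-1",
			"managedFields": []interface{}{
				map[string]interface{}{"manager": "test-manager", "operation": "Apply"},
			},
		},
	}}

	// A direct type assertion against the typed slice fails: the raw map
	// only holds []interface{}, not []metav1.ManagedFieldsEntry.
	raw := u.Object["metadata"].(map[string]interface{})["managedFields"]
	_, ok := raw.([]metav1.ManagedFieldsEntry)
	fmt.Println(ok) // false

	// The accessor converts the entries into the typed form.
	typed := u.GetManagedFields()
	fmt.Println(len(typed), typed[0].Manager) // 1 test-manager
}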

@jpbetz
Contributor

jpbetz commented Jun 1, 2024

#125263 is a draft proposal for a fix. Feedback welcome on how to make this fix cleaner. I'll test carefully before moving it out of draft.
