
util/patch, partial status loss after helper.Patch() #7594

Closed
i-prudnikov opened this issue Nov 22, 2022 · 23 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@i-prudnikov commented Nov 22, 2022

What steps did you take and what happened:
I've been building an infrastructure provider (following the Cluster API book). In the infra-cluster CRD reconcile loop, I'm using the patch helper (sigs.k8s.io/cluster-api/util/patch) to persist changes to the CRD after reconciliation.

The reconciliation procedure follows these generic steps (a minimal sketch follows the list):

  1. After fetching the infra-cluster CRD and making preliminary checks (annotations, owner, etc.), a patch helper is created with the current version of the infra-cluster CRD (a new patch helper is created for every reconciliation loop).
  2. During reconciliation, changes are made to the spec (e.g. ControlPlaneEndpoint is set) and to the status of the CRD.
  3. At the very end of reconciliation (in a deferred call), the patch helper's Patch() method is called with the modified CRD object.
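A minimal sketch of this pattern (Go; the infrav1.InfraCluster type and import path are hypothetical, error handling is abbreviated):

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"sigs.k8s.io/cluster-api/util/patch"

	infrav1 "example.com/infra-provider/api/v1alpha1" // hypothetical API package
)

type InfraClusterReconciler struct {
	client.Client
}

func (r *InfraClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error) {
	infraCluster := &infrav1.InfraCluster{}
	if err := r.Get(ctx, req.NamespacedName, infraCluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Step 1: a fresh patch helper per reconcile, built from the object as fetched.
	helper, err := patch.NewHelper(infraCluster, r.Client)
	if err != nil {
		return ctrl.Result{}, err
	}

	// Step 3: persist all changes at the very end of reconciliation.
	defer func() {
		if err := helper.Patch(ctx, infraCluster, patch.WithStatusObservedGeneration{}); err != nil && reterr == nil {
			reterr = err
		}
	}()

	// Step 2: mutate spec and status during reconciliation.
	infraCluster.Status.Ready = true
	return ctrl.Result{}, nil
}
```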

In the next reconciliation call (triggered as a consequence of the previous patch), after fetching the infra-cluster CRD resource, the status is sometimes (sporadically) partially lost: conditions are preserved, but the rest is missing (i.e. CRD-specific fields and the observedGeneration field).

Here is evidence from the controller logs clearly showing that after patching we have partial data loss in the status (to spot the error, extensive logging was added to the controller):

Some abbreviations used in the log output below:
RV, r_version - Object.ResourceVersion
Gen - Object.ObjectMeta.Generation
OGen - Object.Status.ObservedGeneration
Status - Object.Status

// New CRD object pushed into cluster
2022-11-22T21:18:30.253676+02:00        INFO    infracluster-resource Default {"name": "infra-cluster", "r_version": ""}
2022-11-22T21:18:30.264622+02:00        INFO    infracluster-resource Validate create {"name": "infra-cluster"}

// 1st reconcile call: we didn't pass some preliminary checks, so we just exit the reconcile loop without patching...
2022-11-22T21:18:30.277206+02:00        INFO    Reconcile(): RV: 1847392, Gen: 1, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:false, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions(nil), SDNS:[]v1alpha1.SDNStatus(nil), LoadBalancers:[]v1alpha1.LoadBalancerStatus(nil), ObservedGeneration:0}      {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "f47e7d91-bf68-4a7f-b3fc-ccb4b3e04bd5"}
2022-11-22T21:18:30.277283+02:00        INFO    Waiting for Cluster Controller to set OwnerRef on infracluster        {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "f47e7d91-bf68-4a7f-b3fc-ccb4b3e04bd5"}

2022-11-22T21:19:00.190754+02:00        INFO    infracluster-resource Default {"name": "infra-cluster", "r_version": "1847392"}
2022-11-22T21:19:00.203014+02:00        INFO    infracluster-resource Validate update {"name": "infra-cluster", "r_version": "1847392"}

// 2nd reconcile call, preliminary checks passed, setting CRD finalizers only
2022-11-22T21:19:00.216385+02:00        INFO    Reconcile(): RV: 1847535, Gen: 1, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:false, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions(nil), SDNS:[]v1alpha1.SDNStatus(nil), LoadBalancers:[]v1alpha1.LoadBalancerStatus(nil), ObservedGeneration:0}      {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "85c11c5e-6e71-46a1-88a5-1ef1608ac8eb"}
2022-11-22T21:19:00.218031+02:00        INFO    Setting infracluster & secrets finalizers...  {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "85c11c5e-6e71-46a1-88a5-1ef1608ac8eb", "cluster": "infra-cluster"}
2022-11-22T21:19:00.252022+02:00        INFO    Patching: RV: 1847535, Gen: 1, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:false, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions(nil), SDNS:[]v1alpha1.SDNStatus(nil), LoadBalancers:[]v1alpha1.LoadBalancerStatus(nil), ObservedGeneration:0}, opts: []patch.Option(nil)      {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "85c11c5e-6e71-46a1-88a5-1ef1608ac8eb"}

// Webhooks output
2022-11-22T21:19:00.271825+02:00        INFO    infracluster-resource Default {"name": "infra-cluster", "r_version": "1847535"}
2022-11-22T21:19:00.290159+02:00        INFO    infracluster-resource Validate update {"name": "infra-cluster", "r_version": "1847535"}

// 3rd reconcile call, the infrastructure for infra-cluster is deployed
2022-11-22T21:19:00.312475+02:00        INFO    Reconcile(): RV: 1847540, Gen: 1, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:false, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions(nil), SDNS:[]v1alpha1.SDNStatus(nil), LoadBalancers:[]v1alpha1.LoadBalancerStatus(nil), ObservedGeneration:0}      {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "673f93e9-a21a-47a8-bfec-8d624cb16174"}
// ... some reconciliation work is done

// Here we clearly see that the object passed to Patch() has /status populated
2022-11-22T21:19:03.37858+02:00         INFO    Patching: RV: 1847540, Gen: 1, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:true, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions{v1beta1.Condition{Type:"CPSDNAvailable", Status:"True", Severity:"", LastTransitionTime:time.Date(2022, time.November, 22, 19, 19, 2, 0, time.UTC), Reason:"", Message:""}, v1beta1.Condition{Type:"DPSDNAvailable", Status:"True", Severity:"", LastTransitionTime:time.Date(2022, time.November, 22, 19, 19, 2, 0, time.UTC), Reason:"", Message:""}}, SDNS:[]v1alpha1.SDNStatus{v1alpha1.SDNStatus{CloudAPIObject:v1alpha1.CloudAPIObject{UUID:"03a277cd-b614-48de-a880-d549c3c0eb65", Adopted:false}, Type:"control-plane", CIDR:"10.200.8.0/24"}, v1alpha1.SDNStatus{CloudAPIObject:v1alpha1.CloudAPIObject{UUID:"03d0bf70-366d-4ff3-b38e-ceecb0da7074", Adopted:true}, Type:"data-plane", CIDR:"10.200.1.0/24"}}, LoadBalancers:[]v1alpha1.LoadBalancerStatus{v1alpha1.LoadBalancerStatus{CloudAPIObject:v1alpha1.CloudAPIObject{UUID:"0ae2aec5-0def-491e-a4e4-e1a2ad60c65a", Adopted:false}, Type:"primary", State:"pending", Networks:[]v1alpha1.LoadBalancerNetworkStatus{v1alpha1.LoadBalancerNetworkStatus{Type:"public", Name:"public-network", DNSName:"lb-0ae2aec50def491ea4e4e1a2ad60c65a-1.cloudlb.com", IPAddress:""}, v1alpha1.LoadBalancerNetworkStatus{Type:"private", Name:"control-plane", DNSName:"lb-0ae2aec50def491ea4e4e1a2ad60c65a-2.cloudlb.com", IPAddress:""}}, Backends:[]v1alpha1.LoadBalancerBackendStatus{v1alpha1.LoadBalancerBackendStatus{Name:"konnectivity-server", Members:[]v1alpha1.LoadBalancerBackendMember{}}, v1alpha1.LoadBalancerBackendStatus{Name:"control-plane", Members:[]v1alpha1.LoadBalancerBackendMember{}}}}}, ObservedGeneration:0}, opts: []patch.Option{patch.WithStatusObservedGeneration{}}     {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "673f93e9-a21a-47a8-bfec-8d624cb16174"}

2022-11-22T21:19:03.424943+02:00        INFO    infracluster-resource Default {"name": "infra-cluster", "r_version": "1847554"}
2022-11-22T21:19:03.437496+02:00        INFO    infracluster-resource Validate update {"name": "infra-cluster", "r_version": "1847554"}

// 4th reconcile call: the generation of the object increased from 1 -> 2, as we saved the cluster endpoint into the spec,
// but the status has only conditions; the rest is lost (observed generation and CRD-specific fields)
2022-11-22T21:19:03.479981+02:00        INFO    Reconcile(): RV: 1847555, Gen: 2, OGen: 0, Status: v1alpha1.infraclusterStatus{Ready:false, FailureReason:"", FailureMessage:"", Conditions:v1beta1.Conditions{v1beta1.Condition{Type:"CPSDNAvailable", Status:"True", Severity:"", LastTransitionTime:time.Date(2022, time.November, 22, 21, 19, 2, 0, time.Local), Reason:"", Message:""}, v1beta1.Condition{Type:"DPSDNAvailable", Status:"True", Severity:"", LastTransitionTime:time.Date(2022, time.November, 22, 21, 19, 2, 0, time.Local), Reason:"", Message:""}}, SDNS:[]v1alpha1.SDNStatus(nil), LoadBalancers:[]v1alpha1.LoadBalancerStatus(nil), ObservedGeneration:0}   {"controller": "infracluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "infracluster", "infracluster": {"name":"infra-cluster","namespace":"test"}, "namespace": "test", "name": "infra-cluster", "reconcileID": "d378f0b6-d568-4a5f-9f43-2a6a096c22b0"}
// ....

P.S. The controller can of course proceed normally when /status sub-resource data is missing, but that is a different case.

What did you expect to happen:
After issuing Patch(), changes to conditions, status, spec and object meta are persisted.

Anything else you would like to add:
There is a single controller instance running against the cluster. The issue is sporadic and reproduces in roughly 30% of attempts. The infra-cluster CRD is created from scratch in the tests.

Patch() is called by default with the patch.WithStatusObservedGeneration{} option, which takes care of setting object.Status.ObservedGeneration to the value of object.Generation. The only exception is when patching the object after setting finalizers (at the very beginning of the CRD lifecycle); the option is omitted there to avoid marking the current generation of the CRD object as "seen" by the controller (the controller returns early after setting finalizers, without attempting to execute any reconciliation logic). A sketch of this conditional usage follows.
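Continuing the reconcile sketch above (the onlyFinalizersSet flag is hypothetical):

```go
// By default, mark the current generation as observed when patching.
opts := []patch.Option{patch.WithStatusObservedGeneration{}}
// Exception: when we return early after only setting finalizers, omit the
// option so this generation is not marked as "seen" by the controller.
if onlyFinalizersSet { // hypothetical flag set earlier in the reconcile
	opts = nil
}
if err := helper.Patch(ctx, infraCluster, opts...); err != nil {
	return ctrl.Result{}, err
}
```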
Environment:

  • Cluster-api version: 1.2.5 (project dependencies)
  • minikube/kind version: kind 0.12.0 (with CAPI controller 1.2.4)
  • Kubernetes version: (use kubectl version): 1.23.4
  • OS (e.g. from /etc/os-release): MacOS 11.6.5

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2022
@sbueringer (Member) commented Nov 23, 2022

Would be interesting to see the patches actually sent to the APIserver by the patch helper.

Is there any way for us to reproduce this issue? (e.g. your controller code with an integration test using envtest so we can step through the code to figure out what is going on)

@i-prudnikov (Author)

Sure, I will try to narrow the issue down with a unit test showing this. So far, after digging into util/patch, I think the way the patch is built explains how this situation can happen, as we actually patch the whole resource in 3 sequential steps:

  • patch conditions
  • patch spec and meta
  • patch status...

from here: (https://github.com/kubernetes-sigs/cluster-api/blob/main/util/patch/patch.go#L133)

@i-prudnikov (Author)

@sbueringer, I've just built tests with an envtest environment that are able to reproduce the issue (it is still sporadic, so the test is flaky: most of the time it fails because of the possible problem in the patch helper).

I've switched to Cluster API 1.2.6 and am using k8s 1.25 in envtest. Let me know how you would like me to share the code that reproduces it.

@sbueringer (Member) commented Dec 1, 2022

Nice!

Ideally some repo on GitHub that I can just check out and run.
(Disclaimer: I probably won't get to it for the next 2-3 weeks, as I'm on PTO starting next week.)

@fabriziopandini (Member)

@i-prudnikov it would be great if you could share your code, so we can make progress in triaging this issue
/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Dec 28, 2022
@i-prudnikov (Author)

@sbueringer, @fabriziopandini, sorry for the delay. Here you can find a repo. I've just added you both with read permissions, since the repo is private.

There is a special make target, make test-status-lost, that starts the specific test against envtest in an endless loop until it fails. The failure demonstrates the problem (it should fail on one of these lines: controllers/upcloudcluster_test.go 219 or 220). The code shows the situation where the status after a patch holds condition info but misses some other items.

Please let me know if you managed to get access and are able to run the target.

@sbueringer (Member)

Thx, trying to take a look soon

@fabriziopandini (Member)

FYI we have a theory about what could be happening (a race condition between reconcile and cache updates), but we have been kind of busy between the release and investigating some test failures.

@i-prudnikov (Author)

Thanks for the update. I can add that when the frequency of updates is lower (for example, with a 1-second sleep before every patch), the issue does not occur.

@sbueringer (Member) commented Jan 13, 2023

@i-prudnikov

Just to confirm my understanding of what the test is doing (sketched after the list):

  • Create a cluster
    • which triggers the running reconciler
  • Eventually (for 10s)
    • Get the created cluster
    • fail if the conditions are set, but the loadbalancer and/or sdns field is nil
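In Gomega terms, something like this sketch (hypothetical types; key and an envtest-backed k8sClient are assumed from the test setup, with Ginkgo/Gomega dot-imports):

```go
// Poll the created cluster for up to 10s and fail if conditions are set
// while other status fields are still nil.
Eventually(func(g Gomega) {
	cluster := &infrav1.InfraCluster{}
	g.Expect(k8sClient.Get(ctx, key, cluster)).To(Succeed())
	if len(cluster.Status.Conditions) > 0 {
		g.Expect(cluster.Status.SDNS).NotTo(BeNil())
		g.Expect(cluster.Status.LoadBalancers).NotTo(BeNil())
	}
}, 10*time.Second).Should(Succeed())
```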

I think this doesn't reproduce the issue correctly, as the patchHelper always writes in this order:

  1. status conditions
  2. spec
  3. rest of status

This means that when the Get in Eventually gets a version of the cluster after 1. and before 3., the test will fail.

@sbueringer (Member)

Independent of that, we found a race condition in our KCP controller which probably has the same root cause as your issue. Some background information:

patchHelper:

  • patchHelper calculates a diff between the before/after objects and sends the diff as a patch to the apiserver
  • The patching is done separately for status conditions / spec / rest of status

controller-runtime (CR) reconciler: (a bit simplified and fuzzy as I don't know the exact details)

  • when starting controllers CR sets up watches for objects (e.g. InfraCluster)
  • when CR receives an event for the object your controller is watching it triggers the Reconcile func of your controller

controller-runtime mgr.GetClient:

  • The client that the CR Manager builds by default (and returns via mgr.GetClient) is a cached client (see the sketch after this list)
  • This means it uses a cache for all reads (but the writes are still done directly against the APIserver)
  • The cache of the client is updated by CR based on informers,... (would have to research the details)
  • The cache is not updated by any write calls (update, patch, ....)
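For reference, a sketch of where these two accessors come from (ctrl.NewManager and the two getters are real controller-runtime API; the rest is scaffolding):

```go
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
if err != nil {
	// handle error
}
cachedClient := mgr.GetClient()    // reads served from the informer cache; writes go to the apiserver
directReader := mgr.GetAPIReader() // reads bypass the cache entirely
```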

This information is only approximately accurate, but it should be good enough to explain what I think the problem is.

A very simplified example:

  • 1st Reconcile call
    • patchHelper writes status to an object
  • 2nd Reconcile call
    • r.Client.Get retrieves the object but doesn't see the status written by 1st reconcile
    • instead it sees some previous stale version of the object

Essentially, the patch call in the 1st Reconcile is done directly against the apiserver, and then the Get in the 2nd Reconcile retrieves the object from a stale cache, because CR has not yet updated the cache with the new version of the object.

Interesting edge cases can then occur when the reconcile loop runs with a stale object and the patchHelper also calculates patches based on that stale object.

One simple way to verify whether the stale cache is the cause of the issue would be to use mgr.GetAPIReader() instead of mgr.GetClient() in the Reconcile func, as sketched below.
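A minimal sketch of that switch (the reconciler fields and infrav1 types are hypothetical; mgr.GetAPIReader() and mgr.GetClient() are real manager methods):

```go
// Reconciler wired with both the cached client and the uncached reader.
type InfraClusterReconciler struct {
	Client    client.Client // from mgr.GetClient(): cached reads, direct writes
	APIReader client.Reader // from mgr.GetAPIReader(): reads go straight to the apiserver
}

func (r *InfraClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	infraCluster := &infrav1.InfraCluster{}
	// Reading through APIReader rules out informer-cache staleness.
	if err := r.APIReader.Get(ctx, req.NamespacedName, infraCluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// ... reconcile and patch as before ...
	return ctrl.Result{}, nil
}
```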

@i-prudnikov (Author) commented Jan 13, 2023

@sbueringer, thank you very much for the explanation. Indeed, it feels like the reconcile loop sometimes acts on a stale object (I think I noticed this some time ago).

Regarding your question about the test: you got it right.
One more question here: I was under the impression that util.Patch is a blocking call, so it wouldn't return until it had done its job, avoiding the situation you are describing. If util.Patch is a non-blocking call, then of course that explains why we see partial data in the status.

I will also test with mgr.GetAPIReader() instead of mgr.GetClient()

@sbueringer (Member) commented Jan 13, 2023

The Patch call is blocking, and I think it's doing its job correctly. The problem is just that you can't see the result, because the client is retrieving a stale object.

(There is one edge case where the patch sent to the server is calculated incorrectly, because the diff against the stale object is different from the diff against the actual object on the server side.)

For debugging, you could also Get & log with the Client and with the APIReader; you should see exactly when and how the cache is stale. For example:
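A debugging sketch, reusing the hypothetical reconciler fields from above (log is assumed to be a logr.Logger):

```go
// Fetch the same object from the informer cache and directly from the
// apiserver, then log both resource versions; a mismatch pinpoints staleness.
cached := &infrav1.InfraCluster{}
if err := r.Client.Get(ctx, key, cached); err != nil {
	return err
}
direct := &infrav1.InfraCluster{}
if err := r.APIReader.Get(ctx, key, direct); err != nil {
	return err
}
log.Info("cache staleness check",
	"cachedResourceVersion", cached.ResourceVersion,
	"directResourceVersion", direct.ResourceVersion)
```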

@i-prudnikov (Author)

OK, so the root cause is the stale cache problem, right? As there are no concurrent reconcile calls.

Just to be on the safe side, you mentioned this about the test:

  I think this doesn't reproduce the issue correctly as the patchHelper always writes in this order:

    1. status conditions
    2. spec
    3. rest of status

  Which means that when the Get in Eventually gets a version of the cluster after 1. and before 3., the test will fail.

Since we start polling the object only after the reconcile call has successfully completed, and util.Patch is a blocking call, it should be technically impossible to call Get somewhere between 1 and 3. Right? Or am I missing something?

@sbueringer (Member)

In your test you are just creating the cluster, which is non-blocking. The reconciler is running concurrently in another goroutine.

@i-prudnikov (Author) commented Jan 13, 2023

Ah, yes, I was mixing things up (thinking of another test). Thanks, all clear!

@fabriziopandini (Member)

/triage accepted
/help

@k8s-ci-robot (Contributor)

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 21, 2023
@fabriziopandini (Member)

Thanks, all clear!

Given the comment above and the lack of updates since last year

/close

@k8s-ci-robot (Contributor)

@fabriziopandini: Closing this issue.

In response to this:

Thanks, all clear!

Given comment above and lack of updates since last year

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@i-prudnikov (Author) commented Jan 29, 2024

Colleagues @fabriziopandini, @sbueringer, sorry for writing here after the issue was closed, but I was just pinged with this update.

Since a sane explanation of the root cause (the stale cache) was provided, I didn't dig further, but now I've decided to double-check it.

I have now encountered the same problem in a different controller with CAPI 1.5.2. I switched to using the API reader instead of the cached client at the beginning of the Reconcile() function, and this did not solve the problem. I still got an object with the OLD status (in this case, changes to conditions were missing) after it was patched in the previous reconcile loop. So I decided it was worth reporting here.

P.S. I will double-check this again, to rule out any erroneous conclusion on my side due to the complexity of the code, and will post the result here.

UPD:
OK, double-checked: I had missed one place where the Client was still used instead of the Reader. I can now confirm that using the reader does fix this behaviour.

Is there any other option you can suggest to overcome this? Making direct calls to the API is not a good decision from a scalability perspective, and using the cache is encouraged.

@sbueringer (Member) commented Jan 30, 2024

You can do the following (a sketch follows the list):

  • Patch call
  • Wait until object is changed in cache (directly after the patch)
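A minimal sketch of that pattern (patchAndWaitForCache is a hypothetical helper; it assumes no concurrent writer bumps the object between the patch and the poll):

```go
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/cluster-api/util/patch"
	"sigs.k8s.io/controller-runtime/pkg/client"

	infrav1 "example.com/infra-provider/api/v1alpha1" // hypothetical
)

// patchAndWaitForCache patches the object, then blocks until the cached
// client serves the resource version produced by that patch.
func patchAndWaitForCache(ctx context.Context, c client.Client, helper *patch.Helper, obj *infrav1.InfraCluster) error {
	if err := helper.Patch(ctx, obj); err != nil {
		return err
	}
	patchedRV := obj.GetResourceVersion() // the patch helper refreshes obj from the server response
	return wait.PollUntilContextTimeout(ctx, 100*time.Millisecond, 5*time.Second, true,
		func(ctx context.Context) (bool, error) {
			cached := &infrav1.InfraCluster{}
			if err := c.Get(ctx, client.ObjectKeyFromObject(obj), cached); err != nil {
				return false, err
			}
			// Done once the cache has caught up to the patched version.
			return cached.GetResourceVersion() == patchedRV, nil
		})
}
```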

@i-prudnikov (Author)

Thank you for the suggestion; this is quite straightforward and easy to achieve.
