add AssumeCache from volume binding to client-go #112202
Conversation
The best solution that I could come up with that doesn't depend on ResourceVersion is this:
This ensures that the apiserver always wins. But this may also lead to not having the very latest object locally:
With this approach, the AssumeCache is still an improvement over just using the informer cache, but it cannot fully guarantee that the local object is always the latest.
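The "apiserver always wins" approach described above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not the actual PR code: the `Object`, `AssumeCache`, and method names are made up, and real code would need locking and real Kubernetes types.

```go
package main

import "fmt"

// Object is a minimal stand-in for a Kubernetes object; only the
// fields needed for the sketch. Hypothetical, not a real API type.
type Object struct {
	Name string
	Data string
}

// AssumeCache sketch: Assume() records the result of a write
// optimistically; any event from the informer unconditionally
// replaces it, so the apiserver always wins, even if that means
// the local copy goes back to an older object.
type AssumeCache struct {
	objects map[string]Object
}

func NewAssumeCache() *AssumeCache {
	return &AssumeCache{objects: map[string]Object{}}
}

// Assume stores an object that the client just wrote successfully.
func (c *AssumeCache) Assume(obj Object) { c.objects[obj.Name] = obj }

// OnInformerEvent is called for every add/update from the informer.
// It overwrites any assumed object without comparing versions.
func (c *AssumeCache) OnInformerEvent(obj Object) { c.objects[obj.Name] = obj }

// Get returns the locally known state of the object.
func (c *AssumeCache) Get(name string) (Object, bool) {
	obj, ok := c.objects[name]
	return obj, ok
}

func main() {
	cache := NewAssumeCache()
	cache.Assume(Object{Name: "claim", Data: "updated"})
	// A delayed informer event carrying the older state overwrites
	// the assumed object: the local copy travels back in time.
	cache.OnInformerEvent(Object{Name: "claim", Data: "stale"})
	obj, _ := cache.Get("claim")
	fmt.Println(obj.Data)
}
```

The trade-off is visible in `main`: after the assumed write, a late informer event for the pre-update state still replaces the newer local copy.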
There is a simpler solution that achieves the same:
Let me quote from the comment from @cofyc in #sig-storage:
That would imply padding the integer with enough zeros to ensure that "01" < "20". Not sure whether it can be done on-the-fly for existing objects. How many zeros are "enough"? It might be better to just specify the current behavior (convert to integer, compare the result).
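A minimal sketch of the zero-padding idea mentioned above. The `pad` helper and the 20-digit width are assumptions for illustration (20 decimal digits covers any int64); as noted, it is unclear how many zeros would really be "enough":

```go
package main

import (
	"fmt"
	"strings"
)

// pad left-pads a numeric version string with zeros so that
// lexicographic order matches numeric order. The width is a made-up
// choice for this sketch.
func pad(v string, width int) string {
	if len(v) >= width {
		return v
	}
	return strings.Repeat("0", width-len(v)) + v
}

func main() {
	// Plain string comparison gets the order wrong as soon as the
	// number of digits differs: "9" sorts after "10".
	fmt.Println("9" < "10") // false
	// After zero-padding, lexicographic order matches numeric order.
	fmt.Println(pad("9", 20) < pad("10", 20)) // true
}
```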
force-pushed from 568da8c to bcf48e3
Thanks to some optimization (not allocating a "not found" error when the caller in
That is a concern... I don't think we should provide libraries to clients that go against the explicit documentation that clients should not compare resource versions other than for equality. I'd like to hear @deads2k's and @lavalamp's thoughts as well.
/triage accepted
I'm not very excited about clients parsing RV; our explicit guidance is to not do that. I'll try and think of better alternatives...
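For context, the only comparison the API contract does permit is equality. A minimal sketch (the `changed` helper is hypothetical, not a client-go function):

```go
package main

import "fmt"

// changed implements the only resource-version comparison the API
// contract allows: resource versions are opaque strings, so a client
// may detect that an object changed, but not which version is newer.
func changed(oldRV, newRV string) bool {
	return oldRV != newRV
}

func main() {
	fmt.Println(changed("500", "500")) // false: same object state
	// "7" differs from "500", so the object changed, but ordering
	// between the two values is not part of the contract.
	fmt.Println(changed("500", "7")) // true
}
```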
force-pushed from 60204b5 to f794435
Perhaps two examples in the documentation of AssumeCache make this clearer. I've pushed an update... |
@pohly: The following tests failed.
Ah. (I still haven't actually read this code, btw -- I am still trying to figure out if anything in this space is possible without API changes, and right now I'm thinking not.)
Yeah, there's at least one case where this technique can cause backwards time travel, which is the one thing clients basically can't tolerate at all. I want this kind of cache to exist (I talked about the necessity of this in the least understood part of this old talk), but I think we need to do it right. Currently it seems like we either need to add an entire-object logical clock, or relax our RV client usage constraints. I think this is worth talking about at the next api machinery SIG meeting.
(I put this on the agenda)
I can't speak for other clients, but at least the controller that I am currently working on for dynamic resource allocation has no problem with this. It's entirely stateless and will just react to whatever it currently gets from the cache. The gRPC calls it makes are idempotent, so it's not a problem to do the same operation twice. If a request is invalid because the object was too old, then an error gets logged and it will try again, probably with a newer object. TL;DR: I think this would already be useful in client-go as it is now. But I can also host a local copy elsewhere while something better gets figured out for client-go.
That means you can have oscillation and actuation amplification. (user changes A->B; controller changes external system A->B->A->B before settling)
In my case, the controller triggers two external changes, allocation and deallocation of a claim, and records the result in the claim status. When A = "not allocated, pending" and B = "needs to be allocated", then the external change is an Allocate call when seeing B. The third state then is C = "is allocated", which the AssumeCache needs to store locally to prevent repeating the Allocate call. When the controller sees A->B->A, it only calls Allocate once. Without the AssumeCache, it acts twice on B because of the stale informer cache. That's still harmless (the Allocate call is idempotent), but causes noise in the logs and extra work. The inverse direction (deallocating a claim) is similar.
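The A->B->A argument above can be sketched as follows. This is a hypothetical toy model, not the real DRA controller: the state names, `controller` type, and `observe` method are made up for illustration.

```go
package main

import "fmt"

// Toy states matching the comment's A/B/C naming.
const (
	pending    = "not allocated, pending" // A
	needsAlloc = "needs to be allocated"  // B
	allocated  = "is allocated"           // C
)

// controller remembers locally (AssumeCache stand-in) that a claim
// was already allocated, so stale informer events cannot re-trigger
// the external Allocate call.
type controller struct {
	local map[string]string
	calls int
}

func (c *controller) observe(claim, state string) {
	// The locally assumed state wins over a stale informer event.
	if c.local[claim] == allocated {
		return
	}
	if state == needsAlloc {
		c.calls++ // idempotent external Allocate call
		c.local[claim] = allocated
	}
}

func main() {
	c := &controller{local: map[string]string{}}
	// Informer delivers A -> B -> A (stale) -> B (stale).
	for _, s := range []string{pending, needsAlloc, pending, needsAlloc} {
		c.observe("claim-1", s)
	}
	fmt.Println(c.calls) // one Allocate call despite the stale replays
}
```

Without the local `allocated` marker, both deliveries of B would trigger an Allocate call, which is the redundant-work scenario described above.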
A is "needs to be allocated", B is "allocated", C is "needs to be deallocated", D is "unallocated". I don't know the specifics enough to say the exact sequence that triggers the behavior, but I'll be surprised if there's no A->C write combo which, combined with a time travel event, causes your controller to allocate, deallocate, allocate, deallocate. Idempotence doesn't protect against time travel! If the input (desired state) flaps, the output will flap too. Maybe the trigger will be exceedingly rare or never happen in your case, but that definitely isn't generally true.
I think it is indeed. I can easily trigger the situation where the lack of an AssumeCache causes redundant Allocate calls, but I can't think of a way to trigger thrashing. Anyway, we don't need to merge this. I can copy the end result of this PR into a package elsewhere and just use it in that controller where it is known to help more than it hurts. Shall I close this PR or keep it open as a reminder?
Let's leave this open at least until the SIG discusses this.
This is identical to the proposal from kubernetes#112202. In contrast to the AssumeCache in the volume binding scheduler plugin, this version of the code does not make assumptions about the content of the ResourceVersion fields, i.e. it does no "is newer than" comparison. Therefore objects might go back in time under some circumstances. This was seen as insufficient for inclusion in client-go, but for DRA it's better than not having the AssumeCache, so the code gets included here for use in that controller.
@pohly: PR needs rebase.
This was discussed in the September 21st, 2022 SIG api-machinery meeting. There was a certain tendency to just allow parsing ResourceVersion, but also concerns that it might not always be a signed 64-bit integer. The next step is for @lavalamp to write an email and/or KEP documenting the options and making a formal proposal. My two cents:
Discussion in: #112684
Doesn't the mutation cache in client-go already implement this (more useful, but API-semantics-violating) functionality?
Mutation cache is test-only, and is there to detect people modifying items in the cache accidentally. Unless it has changed substantially since I last looked.
I either didn't know about that or had blocked it from my memory. My first impression is that it seems to be constructed of razor wire :) |
I didn't know about it either. Now that I do I am wondering how it would interact with an informer. There's the "TODO find a way to layer this into an informer/lister". I suppose it's meant to be passed the store that is used by an informer? I like the implementation less than the one from the volume binder, but I guess it would do the job and this PR isn't needed - unless a more modern, type-safe API is desired. |
Let me close this PR. The mutation cache is already in client-go and does essentially the same thing, just with a different API. It would still be good to get the whole question of "is it okay to compare versions" clarified, but we don't need this PR for that. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
The AssumeCache in the kube-scheduler volume binding plugin solves a problem that other components also have: when storing objects in an informer cache, that cache is outdated immediately after making a change in the apiserver. If the component happens to do some work at that point that involves the updated object, it will do so based on stale information.
Components can be designed to handle this, for example by discarding changes if storing an updated object in the apiserver fails with a conflict error. Components probably have to be prepared for such a case anyway, but it is confusing.
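The "discard on conflict" pattern mentioned above can be sketched as follows. This is a hypothetical simulation using plain types (the `update` and `reconcile` helpers are made up); real code against the apiserver would typically use `retry.RetryOnConflict` from client-go with a fresh GET before each attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict simulates the apiserver's conflict error returned when
// the caller's resourceVersion is stale.
var errConflict = errors.New("conflict: object was modified")

// update simulates an apiserver write with optimistic concurrency:
// it fails unless the client saw the latest version.
func update(serverRV, clientRV int) error {
	if clientRV != serverRV {
		return errConflict
	}
	return nil
}

// reconcile discards its change on conflict and relies on the next
// informer event (carrying the newer object) to trigger a retry.
func reconcile(serverRV, cachedRV int) string {
	if err := update(serverRV, cachedRV); errors.Is(err, errConflict) {
		return "discarded, wait for fresh object"
	}
	return "applied"
}

func main() {
	fmt.Println(reconcile(7, 6)) // stale informer cache -> conflict
	fmt.Println(reconcile(7, 7)) // up-to-date cache -> applied
}
```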
Special notes for your reviewer:
This PR leaves the volume binding plugin unchanged. Migrating that code can be handled separately.
Because this uses generics, each commit comes with benchmark results.
In contrast to the original code, the final version of the AssumeCache in this PR no longer makes assumptions about the content of the ResourceVersion field.
Does this PR introduce a user-facing change?