
add AssumeCache from volume binding to client-go #112202

Closed
wants to merge 7 commits

Conversation

pohly
Contributor

@pohly pohly commented Sep 2, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

The AssumeCache in the kube-scheduler volume binding plugin solves a problem that other components also have: when storing objects in an informer cache, that cache is out-dated directly after making a change in the apiserver. If the component happens to do some work at that point that involves the updated object, it will do so based on stale information.

Components can be designed to handle this, for example by discarding their local changes and retrying when storing an updated object in the apiserver fails with a conflict error. Components probably have to be prepared for such a case anyway, but it is confusing.
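
For illustration, that conflict-handling pattern is typically written with client-go's retry helper. This is a minimal sketch; the PersistentVolumeClaim resource and the annotation are arbitrary examples and not part of this PR:

```go
// Minimal sketch of the mitigation described above: re-read the object and
// retry on a conflict error instead of trusting a possibly stale informer copy.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func markProcessed(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Fetch the latest object from the apiserver; an informer lister might
		// still return a version that predates our own previous update.
		pvc, err := client.CoreV1().PersistentVolumeClaims(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if pvc.Annotations == nil {
			pvc.Annotations = map[string]string{}
		}
		pvc.Annotations["example.com/processed"] = "true"
		_, err = client.CoreV1().PersistentVolumeClaims(ns).Update(ctx, pvc, metav1.UpdateOptions{})
		return err // a conflict here triggers another round of the closure
	})
}
```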

Special notes for your reviewer:

This PR leaves the volume binding plugin unchanged. Migrating that code can be handled separately.

Because this uses generics, each commit comes with benchmark results.

In contrast to the original code, the final version of the AssumeCache in this PR no longer makes assumptions about the content of the ResourceVersion field.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 2, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 2, 2022
@pohly
Contributor Author

pohly commented Sep 2, 2022

The best solution that I could come up with that doesn't depend on ResourceVersion is this:

  • before retrieving an object from the cache with the intent to modify it, lock it
  • the informer callbacks also lock it
  • after the Update call, unlock and store the modified object or cancel the lock in case of a failure

This ensures that the apiserver always wins. But this may also lead to not having the very latest object locally:

  • apiserver sends new object with revision X, gets added to AssumeCache
  • apiserver sends update Y, not added yet
  • component locks object with revision X
  • patches object (can succeed even when not using the latest revision) -> revision Z stored in AssumeCache
  • AssumeCache processes object with revision Y, replacing the newer assumed object with revision Z

With this approach, the AssumeCache is still an improvement over just using the informer cache, but it cannot fully guarantee that the local object is always the latest.
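
Very roughly, that locking scheme could look like the sketch below. All of the names (`lockingCache`, `GetForUpdate`, `Commit`, `Abort`) are hypothetical and not what this PR implements; the sketch also simplifies the informer side by skipping a locked key instead of blocking on it, which is exactly where the revision Y vs. Z race described above comes from.

```go
// Hypothetical sketch of the locking variant discussed above; none of these
// names exist in client-go or in this PR.
package assume

import (
	"fmt"
	"sync"

	"k8s.io/apimachinery/pkg/runtime"
)

type lockingCache struct {
	mu      sync.Mutex
	objects map[string]runtime.Object // keyed by namespace/name
	locked  map[string]bool           // keys with an update in flight
}

// GetForUpdate returns the cached object and marks the key as locked so that
// it is not overwritten while the caller talks to the apiserver.
func (c *lockingCache) GetForUpdate(key string) (runtime.Object, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.locked[key] {
		return nil, fmt.Errorf("%s: update already in flight", key)
	}
	obj, ok := c.objects[key]
	if !ok {
		return nil, fmt.Errorf("%s: not found", key)
	}
	c.locked[key] = true
	return obj, nil
}

// Commit stores the object returned by Update and releases the lock.
func (c *lockingCache) Commit(key string, updated runtime.Object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.objects[key] = updated
	delete(c.locked, key)
}

// Abort releases the lock without replacing the cached object, for example
// after a failed Update call.
func (c *lockingCache) Abort(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.locked, key)
}

// onInformerEvent stores what the apiserver sent. In the scheme described
// above the informer callback would block on the lock instead; this sketch
// simply skips locked keys for brevity.
func (c *lockingCache) onInformerEvent(key string, obj runtime.Object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.locked[key] {
		return
	}
	c.objects[key] = obj
}
```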

@pohly
Contributor Author

pohly commented Sep 3, 2022

There is a simpler solution that achieves the same result: Assume must be passed the original object (= the one retrieved from the cache) and the modified one (= the one returned as the response to Update or Patch). The modified object then gets stored only if the cache hasn't received a different object from the apiserver in the meantime.
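
A minimal sketch of that rule, using hypothetical names (the PR's actual AssumeCache API is generic and differs in detail). Note that it only relies on ResourceVersion *equality*, which the API conventions do allow:

```go
// Hypothetical sketch of the "store only if unchanged" rule described above.
package assume

import (
	"sync"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type cache struct {
	mu      sync.Mutex
	objects map[string]metav1.Object // keyed by namespace/name
}

// Assume replaces the cached object with modified (the response from
// Update/Patch), but only if the cache still holds exactly the original
// object that the caller started from. If the informer delivered something
// else in the meantime, that apiserver state wins and modified is dropped.
func (c *cache) Assume(original, modified metav1.Object) bool {
	key := original.GetNamespace() + "/" + original.GetName()
	c.mu.Lock()
	defer c.mu.Unlock()
	current, ok := c.objects[key]
	// Equality comparison of ResourceVersion is permitted by the API
	// conventions; only ordering comparisons are not.
	if !ok || current.GetResourceVersion() != original.GetResourceVersion() {
		return false
	}
	c.objects[key] = modified
	return true
}

// onInformerEvent unconditionally stores whatever the apiserver sent.
func (c *cache) onInformerEvent(obj metav1.Object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.objects[obj.GetNamespace()+"/"+obj.GetName()] = obj
}
```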

@aojea
Member

aojea commented Sep 4, 2022

/assign @lavalamp @liggitt @deads2k

I think this is a recurring topic, but I couldn't find the references explaining why this was not implemented before; the assigned persons will know better for sure 😄

@pohly
Contributor Author

pohly commented Sep 5, 2022

Let me quote from @cofyc's comment in #sig-storage:

Although the docs about resource versions ask clients not to compare them, the GET/WATCH methods can accept a ResourceVersion query parameter to get results "not older than" or "starting at" a specified resource version. IMHO the resource version is designed to be monotonic (at least per object, I guess), but that is currently an internal implementation detail of the API server. Submit a proposal to standardize the ordering of ResourceVersion, e.g. alphanumeric ordering?

@pohly
Contributor Author

pohly commented Sep 5, 2022

alphanumeric ordering

That would imply padding the integers with enough leading zeros so that, for example, "9" sorts before "10" (i.e. "09" < "10"). Not sure whether that could be done on-the-fly for existing objects. How many zeros are "enough"?

It might be better to just specify the current behavior (convert to integer, compare the result).
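
For reference, "convert to integer, compare the result" is roughly what the volume binding plugin's AssumeCache does today. A sketch, and exactly the kind of ResourceVersion parsing the documentation discourages, which is the concern raised further below:

```go
// Roughly the comparison the existing volume binding AssumeCache performs:
// parse ResourceVersion as a signed 64-bit integer and compare numerically.
// The API documentation tells clients not to rely on this.
package assume

import (
	"fmt"
	"strconv"
)

func isNewer(a, b string) (bool, error) {
	av, err := strconv.ParseInt(a, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing ResourceVersion %q: %w", a, err)
	}
	bv, err := strconv.ParseInt(b, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing ResourceVersion %q: %w", b, err)
	}
	return av > bv, nil
}
```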

@pohly pohly force-pushed the client-go-assume-cache branch 2 times, most recently from 568da8c to bcf48e3 on September 6, 2022 09:03
@pohly
Contributor Author

pohly commented Sep 6, 2022

Thanks to some optimization (not allocating a "not found" error when the caller in add just discards it again), the current implementation is now faster than the original one.

@liggitt
Member

liggitt commented Sep 6, 2022

One potential concern is that the AssumeCache must make an assumption (sic!) about the content of the ResourceVersion field.

That is a concern... I don't think we should provide libraries to clients that go against the explicit documentation that clients should not compare resource versions other than for equality. I'd like to hear @deads2k's and @lavalamp's thoughts as well.

@pohly
Contributor Author

pohly commented Sep 6, 2022

The AssumeCache under client-go could be implemented without the "older than" comparison. It would still be useful for my purposes. But can it then be used by the volume binding plugin or do we need to keep that code unchanged?

@jsafrane, @msau42: WDYT?

@leilajal
Contributor

leilajal commented Sep 6, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 6, 2022
@lavalamp
Member

lavalamp commented Sep 6, 2022

I'm not very excited about clients parsing RV; our explicit guidance is not to do that.

You can use .generation for this if your changes are in the spec. Ideal would be a logical clock like generation that covers the whole object.
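
A sketch of that generation-based check; the helper name `specIsNewer` is hypothetical, and the comparison is only meaningful for resources whose metadata.generation is bumped on spec changes:

```go
// Sketch: compare metadata.generation instead of parsing ResourceVersion.
// This says nothing about status-only updates.
package assume

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

func specIsNewer(a, b metav1.Object) bool {
	return a.GetGeneration() > b.GetGeneration()
}
```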

I'll try and think of better alternatives...

@pohly
Contributor Author

pohly commented Sep 12, 2022

Perhaps two examples in the documentation of AssumeCache will make this clearer. I've pushed an update...

@k8s-ci-robot
Contributor

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-kubernetes-unit | f794435 | link | true | /test pull-kubernetes-unit
pull-kubernetes-e2e-kind-ipv6 | f794435 | link | true | /test pull-kubernetes-e2e-kind-ipv6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@lavalamp
Member

The "is equal" comparison does not depend on that. The logic is "if the AssumeCache currently holds the same object that I modified, then it is safe to replace it with the update that I just got back from Update/Patch". There is no comparison when receiving an update through the informer. That update always overwrites the object in the AssumeCache.

Ah. (I still haven't actually read this code, btw -- I am still trying to figure out if anything in this space is possible without API changes, and right now I'm thinking not.)

That's why the revised AssumeCache is less precise: sometimes it might drop the newer object and replace it with an older one.

Yeah, there's at least one case where this technique can cause backwards time travel, which is the one thing clients basically can't tolerate at all.

I want this kind of cache to exist (I talked about the necessity of this in the least understood part of this old talk), but I think we need to do it right.

Currently it seems like we either need to add an entire-object logical clock, or relax our RV client usage constraints. I think this is worth talking about at the next api machinery SIG meeting.

@lavalamp
Member

(I put this on the agenda)

@pohly
Contributor Author

pohly commented Sep 12, 2022

Yeah, there's at least one case where this technique can cause backwards time travel, which is the one thing clients basically can't tolerate at all. [...] I think we need to do it right.

I can't speak for other clients, but at least the controller that I am currently working on for dynamic resource allocation has no problem with this. It's entirely stateless and will just react to whatever it currently gets from the cache. The gRPC calls it makes are idempotent, so it's not a problem to do the same operation twice. If a request is invalid because the object was too old, then an error gets logged and it will try again, probably with a newer object.

TLDR: I think this would already be useful in client-go as it is now. But I can also host a local copy elsewhere while something better gets figured out for client-go.

@lavalamp
Member

It's entirely stateless and will just react to whatever it currently gets from the cache.

That means you can have oscillation and actuation amplification. (user changes A->B; controller changes external system A->B->A->B before settling)

@pohly
Contributor Author

pohly commented Sep 13, 2022

In my case, the controller triggers two external changes, allocation and deallocation of a claim, and records the result in the claim status.

When A = "not allocated, pending" and B = "needs to be allocated", then the external change is an Allocate call when seeing B. The third state then is C = "is allocated" which the AssumeCache needs to store locally to prevent repeating the Allocate call. When the controller sees A->B->A, it only calls Allocate once. Without the AssumeCache, it acts twice on B because of the stale informer cache. That's still harmless (the Allocate call is idempotent), but causes noise in the logs and extra work.

The inverse direction (deallocating a claim) is similar.

@lavalamp
Member

A is "needs to be allocated", B is "allocated", C is "needs to be deallocated", D is "unallocated"

I don't know the specifics enough to say the exact sequence that triggers the behavior, but I'll be surprised if there's no A->C write combo which, combined with a time travel event, causes your controller to allocate, deallocate, allocate, deallocate.

Idempotence doesn't protect against time travel! If the input (desired state) flaps, the output will flap too.

Maybe the trigger will be exceedingly rare or never happen in your case, but that definitely isn't generally true.

@pohly
Contributor Author

pohly commented Sep 13, 2022

Maybe the trigger will be exceedingly rare or never happen in your case,

I think it is indeed. I can easily trigger the situation where the lack of an AssumeCache causes redundant Allocate calls, but I can't think of a way to trigger thrashing.

Anyway, we don't need to merge this. I can copy the end-result of this PR into a package elsewhere and just use it in that controller where it is known to help more than it hurts.

Shall I close this PR or keep it open as a reminder?

@lavalamp
Member

Let's leave this open at least until the sig discusses this.

pohly added a commit to pohly/kubernetes that referenced this pull request Sep 19, 2022
This is identical to the proposal from
kubernetes#112202. In contrast to the
AssumeCache in the volume binding scheduler plugin, this version of the code
does not make assumptions about the content of the ResourceVersion fields,
i.e. it does no "is newer than" comparison. Therefore objects might go back in
time under some circumstances.

This was seen as insufficient for inclusion in client-go, but for DRA it's
better than not having the AssumeCache, so the code gets included here for use
in that controller.
pohly added a commit to pohly/kubernetes that referenced this pull request Sep 19, 2022
This is identical to the proposal from
kubernetes#112202. In contrast to the
AssumeCache in the volume binding scheduler plugin, this version of the code
does not make assumptions about the content of the ResourceVersion fields,
i.e. it does no "is newer than" comparison. Therefore objects might go back in
time under some circumstances.

This was seen as insufficient for inclusion in client-go, but for DRA it's
better than not having the AssumeCache, so the code gets included here for use
in that controller.
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2022
@k8s-ci-robot
Contributor

@pohly: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly added a commit to pohly/kubernetes that referenced this pull request Sep 20, 2022
This is identical to the proposal from
kubernetes#112202. In contrast to the
AssumeCache in the volume binding scheduler plugin, this version of the code
does not make assumptions about the content of the ResourceVersion fields,
i.e. it does no "is newer than" comparison. Therefore objects might go back in
time under some circumstances.

This was seen as insufficient for inclusion in client-go, but for DRA it's
better than not having the AssumeCache, so the code gets included here for use
in that controller.
pohly added a commit to pohly/kubernetes that referenced this pull request Sep 22, 2022
This is identical to the proposal from
kubernetes#112202. In contrast to the
AssumeCache in the volume binding scheduler plugin, this version of the code
does not make assumptions about the content of the ResourceVersion fields,
i.e. it does no "is newer than" comparison. Therefore objects might go back in
time under some circumstances.

This was seen as insufficient for inclusion in client-go, but for DRA it's
better than not having the AssumeCache, so the code gets included here for use
in that controller.
@pohly
Contributor Author

pohly commented Sep 22, 2022

This was discussed in the September 21st, 2022 SIG API Machinery meeting.

There was a certain tendency to just allow parsing ResourceVersion, but also concerns that it might not always be a signed 64-bit int. The next step is for @lavalamp to write an email and/or a KEP documenting the options and making a formal proposal.

My two cents:

  • It's not just me who wants this. The kube-scheduler volume binding plugin already has such an AssumeCache with ResourceVersion parsing.
  • A new field with better-defined semantics may be nicer, but it doesn't solve the problem for a long time, because Kubernetes releases without it still need to be supported. Can we allow parsing ResourceVersion as a fallback for those?

@lavalamp
Member

Discussion in: #112684

@stevekuznetsov
Contributor

Doesn't the mutation cache in client-go already implement this (more useful, but an invalid use of API semantics) functionality?

@lavalamp
Member

Mutation cache is test-only, and is there to detect people modifying items in the cache accidentally. Unless it has changed substantially since I last looked.

@stevekuznetsov
Contributor

Sorry, to be super clear, I mean the cache here, imported in k/k and used here in the ServiceAccount tokens controller.
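
For context, the MutationCache referenced here can be layered on top of an informer's store roughly as sketched below. The constructor arguments shown match client-go around the time of this PR and may differ in other releases; the Secret example is arbitrary. As its name suggests, NewIntegerResourceVersionMutationCache also parses ResourceVersion as an integer internally, so it has the same documentation problem discussed above.

```go
// Sketch of layering client-go's existing MutationCache over an informer store.
package main

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func newSecretMutationCache(client kubernetes.Interface) (cache.SharedIndexInformer, cache.MutationCache) {
	factory := informers.NewSharedInformerFactory(client, 0)
	informer := factory.Core().V1().Secrets().Informer()
	// Reads are answered from the informer store, but objects passed to
	// Mutation() stay visible locally for the given TTL, i.e. before the
	// corresponding watch event has arrived. The informer still needs to be
	// started and synced by the caller.
	mc := cache.NewIntegerResourceVersionMutationCache(
		informer.GetStore(),   // backing store
		informer.GetIndexer(), // indexer used by ByIndex
		time.Minute,           // how long locally "mutated" entries are remembered
		false,                 // includeAdds: whether to remember objects not yet in the store
	)
	return informer, mc
}

// updateAndRemember writes a Secret and immediately makes the result visible
// to local readers via the mutation cache.
func updateAndRemember(ctx context.Context, client kubernetes.Interface, mc cache.MutationCache, secret *v1.Secret) error {
	updated, err := client.CoreV1().Secrets(secret.Namespace).Update(ctx, secret, metav1.UpdateOptions{})
	if err != nil {
		return err
	}
	mc.Mutation(updated)
	return nil
}
```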

@lavalamp
Member

I either didn't know about that or had blocked it from my memory. My first impression is that it seems to be constructed of razor wire :)

@pohly
Contributor Author

pohly commented Sep 22, 2022

I didn't know about it either. Now that I do I am wondering how it would interact with an informer. There's the "TODO find a way to layer this into an informer/lister". I suppose it's meant to be passed the store that is used by an informer?

I like the implementation less than the one from the volume binder, but I guess it would do the job and this PR isn't needed - unless a more modern, type-safe API is desired.

@pohly
Contributor Author

pohly commented Oct 4, 2022

Let me close this PR. The mutation cache is already in client-go and does essentially the same thing, just with a different API.

It would still be good to get the whole question of "is it okay to compare versions" clarified, but we don't need this PR for that.

@pohly pohly closed this Oct 4, 2022