
Need write back cache #1622

Closed
dee0sap opened this issue Aug 3, 2021 · 24 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dee0sap

dee0sap commented Aug 3, 2021

Ticket https://github.com/crossplane/crossplane/issues/2435 was opened due to a leak of resources in AWS.
Multiple reasons for the leak were found. One of the reasons is a caching/concurrency problem in controller-runtime.

In the above ticket the caching/concurrency problem involved an AWS route table managed resource. Reconcile created an additional route table in AWS because, after it had set the external name, it was asked to Reconcile again while the cache held an out-of-date instance of the route table, one where external-name had not been set yet. This happened despite the use of HasSynced in specificInformersMap::Get.

These comments to the ticket give the details of how this is happening
https://github.com/crossplane/crossplane/issues/2435#issuecomment-891318989
https://github.com/crossplane/crossplane/issues/2435#issuecomment-890462636
https://github.com/crossplane/crossplane/issues/2435#issuecomment-890424587

A couple of 'band-aid' fixes would be to

  • Get AWS to update their API so that the route table creation would allow the user to supply the unique id
  • Update ResolveReferences to return an indication of whether the resource was updated or not, in addition to returning an error.

The first would only solve the problem for AWS route tables which isn't ideal.

The second would only solve the problem when the additional Reconcile, the one where an additional resource in AWS is created, is the result of reference resolution. However, since updates to the managed resource object in the API server could come from anywhere, this also isn't ideal.

A write-back cache would provide a complete solution. Here is an outline of the changes I think are required.

== client-go ==
These changes just make it so cache mutation continues to work as expected.

  • In shared_informer.go add indexer which embeds Indexer.
    Move cacheMutationDetector from sharedIndexInformer to indexer.
    indexer will wrap calls to Add, Update and Replace. The wrapper methods will forward the passed-in object to cacheMutationDetector before passing it to the appropriate method of the embedded Indexer. (A sketch follows this list.)
    NOTE: This plugs a gap in the cache mutation detection that exists today, where detection is only performed on objects added to the cache via sharedIndexInformer::HandleDeltas.
  • In shared_informer.go remove cacheMutationDetector from sharedIndexInformer. Change construction of sharedIndexInformer to use the indexer mentioned above.
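
To make the idea concrete, here is a minimal sketch of that wrapper, assuming client-go's exported cache.Indexer and cache.MutationDetector interfaces; the type name mutationDetectingIndexer and the package are hypothetical, not actual client-go code:

```go
package cachewrap

import (
	"k8s.io/client-go/tools/cache"
)

// mutationDetectingIndexer (hypothetical name) embeds a cache.Indexer and
// forwards every object it stores to a MutationDetector before delegating
// to the embedded Indexer, so mutation detection also covers objects that
// enter the cache outside of sharedIndexInformer::HandleDeltas.
type mutationDetectingIndexer struct {
	cache.Indexer
	detector cache.MutationDetector
}

func (i *mutationDetectingIndexer) Add(obj interface{}) error {
	i.detector.AddObject(obj)
	return i.Indexer.Add(obj)
}

func (i *mutationDetectingIndexer) Update(obj interface{}) error {
	i.detector.AddObject(obj)
	return i.Indexer.Update(obj)
}

func (i *mutationDetectingIndexer) Replace(list []interface{}, resourceVersion string) error {
	for _, obj := range list {
		i.detector.AddObject(obj)
	}
	return i.Indexer.Replace(list, resourceVersion)
}
```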

== controller-runtime ==

  • Add CacheUpdater which is similar to CacheReader defined in cache_reader.go.
    It only has an Update method which forwards its parameters to indexer.Update.
  • Add an Updater method to MapEntry.
  • Update addInformerToMap so it sets Updater of MapEntry.
  • Add an Updater interface to interfaces.go. This just has an Update method identical to the one on the Writer interface.
  • Add an Update method to informerCache, which will be similar to Get etc.
  • Update interface Cache in cache.go so it embeds the above Updater.
  • Add a cacheWriteBack struct similar to delegatingReader. It will embed Writer and hold references to the cache and the uncachedGVKs. It will forward calls to Writer first and, if those succeed and uncachedGVKs doesn't indicate otherwise, call Update on the cache, passing the updated object from the server. (A sketch follows this list.)
  • Update NewDelegatingClient so that it constructs a delegatingClient with a cacheWriteBack for the writer.
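
And a rough sketch of the cacheWriteBack piece, assuming controller-runtime's client.Writer and apiutil.GVKForObject; the cacheUpdater interface stands in for the proposed Updater, and all names here are hypothetical:

```go
package writeback

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// cacheUpdater stands in for the proposed Updater interface on the cache.
type cacheUpdater interface {
	Update(ctx context.Context, obj client.Object, opts ...client.UpdateOption) error
}

// cacheWriteBack forwards writes to the API server first and, on success,
// pushes the object returned by the server into the cache (unless the
// object's GVK is configured as uncached).
type cacheWriteBack struct {
	client.Writer
	cache        cacheUpdater
	scheme       *runtime.Scheme
	uncachedGVKs map[schema.GroupVersionKind]struct{}
}

func (c *cacheWriteBack) Update(ctx context.Context, obj client.Object, opts ...client.UpdateOption) error {
	// Write to the API server; obj is populated with the server's response.
	if err := c.Writer.Update(ctx, obj, opts...); err != nil {
		return err
	}
	gvk, err := apiutil.GVKForObject(obj, c.scheme)
	if err != nil {
		return err
	}
	if _, uncached := c.uncachedGVKs[gvk]; uncached {
		return nil
	}
	// Push the server's version of the object into the informer cache so a
	// subsequent cached read reflects this write.
	return c.cache.Update(ctx, obj, opts...)
}
```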

@hasheddan @negz

@coderanger
Contributor

I'm pretty sure this is 100% impossible short of massive code duplication from kube-apiserver. We can't know what kind of transforms would happen on save (webhooks, core types with funky storage adapters, etc) and making this work for Patch requests would be extra super fun. At best we could write back to the cache with the returned object data after the update call but that's different than write-through.

@dee0sap
Author

dee0sap commented Aug 3, 2021

Thanks for the prompt reply Noah @coderanger.

Maybe write-through isn't exactly the right term then :) Shall I replace 'write-through' with 'write-back' and update the description above?

@dee0sap changed the title from "Need write through cache" to "Need write back cache" on Aug 3, 2021
@dee0sap
Author

dee0sap commented Aug 3, 2021

Re-worded things based on coderanger's comments.

@coderanger
Contributor

Is there a reason to not use the substantially simpler solution of "use an uncached client for this particular point in the code"?

@dee0sap
Author

dee0sap commented Aug 4, 2021

I considered that but, looking at
https://github.com/crossplane/crossplane-runtime/blob/master/pkg/reconciler/managed/reconciler.go#L504
( Reconciler::Reconcile ), it seemed we may as well not use caching at all. That is, one of the first things done is to assess the value of the annotation 'external-name' in order to determine whether or not an external reference should be created. ( That happens in the Observe call at line 620. )

@coderanger
Contributor

Yes? I mean this is one of the major downsides to using the objects themselves for stateful storage as you are in Crossplane. My usual recommendation is to either use a configmap/secret or a dedicated state storage type. And in general Crossplane is trying to bridge the gap between a convergent-only system (k8s) and a "we've heard about convergence but not totally on board" system (cloud APIs). You're going to have some cases where caching is going to be unhelpful, because you need external state storage when the cloud provider's API is neither convergent nor stateful.

@dee0sap
Author

dee0sap commented Aug 4, 2021

So what is the downside of updating the cache with the response from the server? Or at least making it easy for someone to opt in to that behavior?
From what I see in the code it doesn't seem hard to put in place.

@coderanger
Contributor

It's very fiddly :) It means there's many places that cache updates can come from as opposed to just one, watch push events.

@dee0sap
Author

dee0sap commented Aug 4, 2021

There could be a boolean option to specify whether write-back caching happens at all, with a default of 'no'.
This way folks could opt in.
That wouldn't be hard to add to the change I proposed.

@coderanger
Contributor

That still adds substantial complexity to the code. And that's generally a terrible design style. If you are worried about something being a problem, you don't fix the problem by making only a few people experience it.

@dee0sap
Author

dee0sap commented Aug 4, 2021

While I agree that the proposed change makes the code more complex, I wouldn't characterize it as substantially more complex.

And looking through I see there are at least a few others who have opened issues because they didn't expect the current behaviour. ( #1543, #1464, #1349 ). And there is this trickiness to try to help avoid use of a stale cache here

// Wait for it to sync before returning the Informer so that folks don't read from a stale cache.

So at least for some the current behaviour doesn't seem to adhere to the principle of least astonishment. That's not so great either.

Btw I don't understand what you meant by 'If you are worried about something being a problem, you don't fix the problem by making only a few people experience it.'. Particularly given that I see delegatingReader has options to disable caching for unstructured data and for specified gvk. Would you please elaborate?

@alvaroaleman
Member

I don't think manipulating the cached data is an option, because the data in the shared informer is managed on the basic assumption that there is a resource version and the data we have reflects the data at that resource version in the apiserver. Breaking this assumption has way too much potential to introduce issues and I would be very surprised if upstream accepts submissions into that direction.

A less invasive approach would be to wrap the client with something that blocks after mutating calls until it finds the response it got from the api in the cache. The main drawback is that it will introduce quite a bit of latency into mutating calls.
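
For illustration, a rough sketch of what such a wrapper could look like, assuming controller-runtime's client package and apimachinery's wait helpers; the type name, poll interval and timeout are made up, and resource versions are only compared for equality:

```go
package blockingclient

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// blockingWriteClient wraps a client so that mutating calls do not return
// until the cached reader has caught up with the write, at the cost of
// added latency (and a timeout if another writer bumps the object first).
type blockingWriteClient struct {
	client.Client               // writes and uncached reads go to the API server
	cached client.Reader        // reads backed by the informer cache
}

func (c *blockingWriteClient) Update(ctx context.Context, obj client.Object, opts ...client.UpdateOption) error {
	if err := c.Client.Update(ctx, obj, opts...); err != nil {
		return err
	}
	want := obj.GetResourceVersion() // resource version from the write response
	key := client.ObjectKeyFromObject(obj)

	// Poll the cache until it returns an object with the same resource
	// version as the write response (equality only; resource versions must
	// not be compared for ordering).
	return wait.PollImmediate(50*time.Millisecond, 5*time.Second, func() (bool, error) {
		current := obj.DeepCopyObject().(client.Object)
		if err := c.cached.Get(ctx, key, current); err != nil {
			return false, nil // tolerate transient cache misses
		}
		return current.GetResourceVersion() == want, nil
	})
}
```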

@dee0sap
Author

dee0sap commented Sep 13, 2021

@alvaroaleman
Not sure I understand the concern.

If the process were to be

  • Make the update in k8s ( which returns the new version )
  • Update the cache with the version just received above

then the cache is always reflecting a resource version that is on the server.

The change would just expedite bringing a just-written change into the cache.

@alvaroaleman
Member

alvaroaleman commented Sep 13, 2021

A very basic example of how things go wrong with a write-back cache, just off the top of my head, and I am very sure there are more:

  • We update the data in the cache
  • We get an update that happened before the update we inserted into the cache
    -> Cached data is again outdated

then the cache is always reflecting a resource version that is on the server.

No, because there are also RVs for lists. If we mess with the cache, the reflector will think we are at a given RV when we are not, because we manually manipulated the data in the store.

@dee0sap
Author

dee0sap commented Sep 13, 2021

What if this write back behavior was limited to only those objects with a generation value? ( https://github.com/zecke/Kubernetes/blob/master/docs/devel/api-conventions.md )

If nothing ever updated the cache with an object whose generation value was less than the value of whatever was in the cache already, there wouldn't be a risk of pushing stale data back into the cache.

@negz
Contributor

negz commented Sep 13, 2021

A less invasive approach would be to wrap the client with something that blocks after mutating calls until it finds the response it got from the api in the cache.

I like the sound of this approach.

I personally (as a representative of the Crossplane project) am aligned with the controller-runtime maintainers here and feel good about coding for the assumption that cache reads may be stale. It would definitely be ideal to make that contract obvious to controller authors though, as it did surprise me at first.

A little more context on the original issue referenced by this one (https://github.com/crossplane/crossplane/issues/2435). Many of our Crossplane controllers orchestrate external APIs - i.e. cloud providers etc - rather than Kubernetes APIs. A subset of those APIs (frustratingly) return non-deterministic unique identifiers at create time, so sometimes when we call an API to create a thing it's critical that we persist the ID of that thing. If we don't, we'll assume we need to call create again on the next reconcile. This means for us a stale cache read can potentially result in us creating something twice (because the stale read may not contain the ID we recorded).

We opted to alleviate this issue by calling Update to add an annotation to our resources immediately before we try to create things in an external API. That way if a stale cache read is causing us to think we need to create something in the external API when we in fact don't, the Update call will fail (due to a stale resource version) and cause us to requeue.
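
Roughly, the guard looks like the sketch below; the annotation key and helper name are made up for illustration and the actual Crossplane implementation may differ:

```go
package guard

import (
	"context"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// markCreatePending records that an external create is about to happen by
// updating the managed resource first. If the reconciler was working from a
// stale cached copy, this Update fails with a resource-version conflict and
// the reconcile requeues instead of creating a duplicate external resource.
func markCreatePending(ctx context.Context, c client.Client, obj client.Object) error {
	annotations := obj.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	// Hypothetical annotation key used only for this sketch.
	annotations["example.org/external-create-pending"] = time.Now().UTC().Format(time.RFC3339)
	obj.SetAnnotations(annotations)
	// A conflict here aborts the reconcile before the external Create call.
	return c.Update(ctx, obj)
}
```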

@alvaroaleman
Member

A subset of those APIs (frustratingly) return non-deterministic unique identifiers at create time, so sometimes when we call an API to create a thing it's critical that we persist the ID of that thing

Orthogonal to this issue here, but a better approach for this for many resources at many cloud providers is to tag them during creation and then use a filtered list request to check if it already exists. This way you don't need to persist the ID, and it also allows you to deal with the occasional "Create request failed but creation actually succeeded".

If nothing ever updated the cache with an object whose generation value was less than the value for whatever was in the cache already there would be a risk of pushing stale data back into the cache.

Lists don't have a generation.

@negz
Contributor

negz commented Sep 14, 2021

Orthogonal to this issue here, but a better approach for this for many resources at many cloudprovides is to tag them during creation and then use a filtered list request to check if it already exists.

I don't necessarily disagree, but "better" is a bit subjective here. When you're dealing with hundreds of different API implementations across tens of different external systems (different clouds, etc) there's not always a reliable and consistent metadata API to use. :)

@matthchr

matthchr commented Nov 5, 2021

A very basic example on how things go wrong with a writeback cache just off of the top of my head and I am very sure there are more:
We update the data in the cache
We get an update that happened before the update we inserted into the cache
-> Cached data is again outdated

If the cache understands resourceVersion, then you could avoid overwriting data with obviously older data. It seems achievable to have a cache that won't miss updates done by the current process. Of course, the cache could still be out of date because of some external actor (of which there can be many) - there's no solving that.

It would solve the "I can do a PUT followed immediately by a GET and not see what I just PUT" situation though, which is the most surprising bit to me. As @dee0sap highlighted, it's quite common in controllers to perform incremental updates, and that pattern seems to interact poorly with the reality that your subsequent reconcile can end up operating on data without the previous change that your own process made. Even if it doesn't cause any problems per se it's still awkward.

It's also weird from a data movement perspective IMO. The fact that the server bothers to reply to a PUT showing me an updated resource, with an updated resourceVersion containing all of the changes I just made suggests that I should do something with that response - why drop it on the floor from a caching perspective? That is data we're already getting (so it's "free") that could be used to improve the cache hit-rate so to speak.

@alvaroaleman
Member

alvaroaleman commented Nov 5, 2021

If the cache understands resourceVersion, then you could avoid overwriting data with obviously older data.

No, refer to https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions, specifically this:

You must not assume resource versions are numeric or collatable. API clients may only compare two resource versions
for equality (this means that you must not compare resource versions for greater-than or less-than relationships).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label ("Denotes an issue or PR has remained open with no activity and has become stale.") on Feb 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label ("Denotes an issue or PR that has aged beyond stale and will be auto-closed.") and removed the lifecycle/stale label on Mar 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
