Informer::init considered harmful? #134

Closed · nightkr opened this issue Feb 19, 2020 · 9 comments
Labels: runtime (controller runtime related)


nightkr commented Feb 19, 2020

Any use case for Informer::init that I can see is prone to race conditions. In particular:

  • "I only care about changes!": You'll miss any changes that happen while your controller is down.
  • "I already handled existing objects via a Reflector!": You'll miss any changes that happen between Reflector::init and Informer::init. There is also no guarantee that the Reflector will be up to date if you try to read from it from the Informer.

It's also somewhat confusing that Reflector::init is required for it to be useful, while Informer::init will mostly be a source of bugs.
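
To make the second race concrete, here is a minimal sketch of the startup sequence being described. The module paths, constructors, and `init` signatures are approximations of the kube-rs 0.2x-era API, not exact code:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Informer, ListParams, Reflector, Resource};
use kube::client::APIClient;

// Sketch only: constructor and init signatures are illustrative approximations.
async fn startup(client: APIClient) -> anyhow::Result<()> {
    let pods = Resource::all::<Pod>();

    let reflector = Reflector::new(client.clone(), ListParams::default(), pods.clone());
    reflector.init().await?; // LIST: the cache now reflects the cluster at resourceVersion X

    // Anything created, modified, or deleted right here is invisible: it is not
    // in the cache snapshot above, and the watch below starts after it.

    let informer = Informer::new(client, ListParams::default(), pods);
    informer.init().await?; // fresh LIST for the latest resourceVersion Y; events in (X, Y] are never seen
    Ok(())
}
```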


clux commented Feb 20, 2020

Yeah, the Informer is very much a best-effort thing. If your controller is down, you can't assume you get to process all events unless you stored the resource version and it hasn't expired in the meantime.

The normal controller advice I've seen is to periodically loop through everything and reconcile each object's state in an idempotent way. However, you can't really do that unless you have the state somewhere else, like in a Reflector.

So currently, to do the right thing you'd need both: periodically reconcile what's inside the Reflector while also handling events. It's not a great pattern atm.

I'm considering writing a more robust Controller-like type that makes the right behaviour a little easier to implement, and only maintains one cache for both.
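
A rough sketch of that dual setup: one task drains Informer events, another periodically walks the Reflector cache and reconciles everything. `handle_event` and `reconcile` are app-specific placeholders, the generics and method names are schematic rather than the exact kube-rs API, and the Reflector's own poll loop (needed to keep its cache current) is omitted:

```rust
use futures::{StreamExt, TryStreamExt};
use k8s_openapi::api::core::v1::Pod;
use std::time::Duration;

// Event-driven half: push every watch event through an idempotent handler.
async fn event_loop(informer: Informer<Pod>) -> anyhow::Result<()> {
    loop {
        // Each poll yields a stream of watch events until the watch times out.
        let mut events = informer.poll().await?.boxed();
        while let Some(event) = events.try_next().await? {
            handle_event(event).await?;
        }
    }
}

// Periodic half: every few minutes, walk the cached state and reconcile all of
// it, so anything the event path missed still converges eventually.
async fn resync_loop(reflector: Reflector<Pod>) -> anyhow::Result<()> {
    loop {
        tokio::time::sleep(Duration::from_secs(300)).await;
        for pod in reflector.state().await? {
            reconcile(&pod).await?;
        }
    }
}
```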


nightkr commented Feb 20, 2020

If you don't care about the cache (and arguably you shouldn't, given that K8s doesn't have transactions...), then an uninitialized Informer should do the right thing from what I can see (it'll just generate a bunch of fake Added events for existing objects). Unless I'm missing some case where it'll drop events midway through?


nightkr commented Feb 20, 2020

After looking a bit closer, Informer will indeed lose events if the connection desyncs.

It would be nice to at least have a conservative mode, where it prefers sending extra duplicate events over losing anything.


clux commented Feb 20, 2020

I'm not sure I understand. Desyncs typically happen due to a 410 Gone from the apiserver, and if that happens, we do lose events. I'm not really sure what we can do about it apart from implementing the bookmark stuff from kube >= 1.15 (see #54), but maybe I misunderstand what you are proposing.


nightkr commented Feb 20, 2020

There are two different options when desyncs happen:

  1. Do a LIST to get the latest resourceVersion, losing all events between the desync and the LIST
  2. Always set resourceVersion to "0", generating WatchEvent::Added events for all resources that already exist

Approach 2 still has a few implications:

  1. Modified resources generate Added events rather than Modified events
    • Applications should usually treat these as generic Upsert events instead. Maybe the abstraction should even erase this distinction, to avoid the confusion?
  2. Unmodified resources also generate Added events
    • So the Added event handler needs to be idempotent
  3. Deletion events are lost
    • Controllers should set finalizers if they care about cleaning up specific resources reliably.

Currently Informer takes approach 1. I'd argue that approach 2 would be more reasonable for most applications, despite the listed implications.
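
A hedged sketch of what approach 2 could look like as a watch loop; `watch_once`, `apply`, `cleanup`, `is_410_gone`, and the `Event` type are hypothetical placeholders rather than the kube-rs API:

```rust
// Approach 2: on a desync (410 Gone), restart the watch from resourceVersion "0"
// so every existing object is re-delivered as Added, and fold Added/Modified
// into a single idempotent "apply" path.
async fn watch_loop() -> anyhow::Result<()> {
    let mut resource_version = "0".to_string();
    loop {
        match watch_once(&resource_version).await {
            Ok(events) => {
                for event in events {
                    match event {
                        // Upsert path: duplicate or re-played Added events are harmless.
                        Event::Added(obj) | Event::Modified(obj) => {
                            apply(&obj).await?;
                            resource_version = obj.resource_version();
                        }
                        // Can still be missed across a desync; finalizers cover reliable cleanup.
                        Event::Deleted(obj) => {
                            cleanup(&obj).await?;
                            resource_version = obj.resource_version();
                        }
                    }
                }
            }
            // No relist to find the newest revision: just replay everything from "0"
            // and rely on the handlers being idempotent.
            Err(e) if is_410_gone(&e) => resource_version = "0".to_string(),
            Err(e) => return Err(e),
        }
    }
}
```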


clux commented Feb 20, 2020

Ohh, I see. I didn't realise the events we get from version zero actually contain at least one event for every resource. This makes approach 2 a lot more reasonable, even if boot becomes a bit heavier. 👍

> Applications should usually treat these as generic Upsert events instead. Maybe the abstraction should even erase this distinction, to avoid the confusion?

Yeah, that is pretty in line with k8s controller-writing advice. We should possibly erase the distinction even for Delete events, but that's a pretty opinionated thing to do.

This is what I imagine a Controller type struct - built on top of Informer - would excel at. It could take a reconcile callback with the name and namespace of the object (like controller-runtime reconcilers), and it'd be up to the app to implement how to reconcile based on that.

If we just stop following approach 1 in Informer, then we get all the flexibility and a best-practice approach.

I like the idea of advocating for finalizers as well. Either that, or the app driving the Controller should periodically diff third-party state against kube state to catch deleted resources.
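
On the finalizer route, the mechanics are just metadata patches. A hedged sketch with a hypothetical `patch_metadata` helper and a made-up finalizer name; note that a JSON merge patch replaces the whole `finalizers` list, so real code needs to preserve any other finalizers:

```rust
use serde_json::json;

// Add a finalizer the first time we see an object we own. The apiserver will
// then keep the object around (with deletionTimestamp set) until the finalizer
// is removed, so a deletion can be delayed but never silently missed.
async fn ensure_finalizer(name: &str) -> anyhow::Result<()> {
    patch_metadata(name, &json!({
        "metadata": { "finalizers": ["example.com/cleanup"] }
    }))
    .await
}

// Once deletionTimestamp is set, do the external cleanup and then remove the
// finalizer; only after that does the apiserver actually delete the object.
async fn finish_deletion(name: &str) -> anyhow::Result<()> {
    do_external_cleanup(name).await?;
    patch_metadata(name, &json!({
        "metadata": { "finalizers": [] }
    }))
    .await
}
```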


clux commented Feb 20, 2020

It's funny that the Kubernetes docs say kind of the opposite in https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes:

> When the requested watch operations fail because the historical version of that resource is not available, clients must handle the case by recognizing the status code 410 Gone, clearing their local cache, performing a list operation, and starting the watch from the resourceVersion returned by that new list operation.
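
For contrast, the flow the docs describe (approach 1) looks roughly like this; `Cache`, `list_all`, `watch_from`, and `is_410_gone` are hypothetical placeholders:

```rust
// Approach 1 per the docs: on 410 Gone, clear the local cache, re-LIST for a
// fresh snapshot, and resume watching from the list's resourceVersion. Events
// between the desync and the LIST are simply lost.
async fn relist_loop(cache: &mut Cache) -> anyhow::Result<()> {
    loop {
        cache.clear();
        let (objects, resource_version) = list_all().await?; // snapshot + current RV
        for obj in objects {
            cache.upsert(obj);
        }
        match watch_from(&resource_version, cache).await {
            Err(e) if is_410_gone(&e) => continue, // history pruned again: relist
            other => return other,                 // success or other error: bubble up
        }
    }
}
```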


nightkr commented Feb 20, 2020

From what I can tell, those docs essentially say:

  1. Use a LIST request during the initial sync to get a current snapshot
    • I don't see how this is different from resourceVersion: 0. The history is pruned anyway, and this saves both a request roundtrip and some extra client logic.
  2. Prefer sending a recent resourceVersion when restarting the watch
    • kube-rs already does this, and it wouldn't be affected as long as we don't stay disconnected while our revision is pruned.
  3. Use watch bookmarks to stay closer to the current resourceVersion

clux added the runtime (controller runtime related) label on Feb 25, 2020
clux added this to the kubecon-eu-2020 milestone on Feb 25, 2020
nightkr added a commit to Appva/kube-rs that referenced this issue Feb 26, 2020
Takes "approach 2" from
kube-rs#134 (comment), see the noted
caveats.

Deletes Informer::init, since it is no longer useful.

clux commented Feb 26, 2020

Informer::init was removed in 0.27.0. See #148 for a controller runtime.
