Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP fix #3587: adding a construct between a Watch and an Informer #3616

Closed
wants to merge 1 commit into from

Conversation

shawkins
Copy link
Contributor

Description

This is a rough cut of supporting something in-between a Watch and an Informer. For now it's easier to develop/show this independently of the informer code as to avoid larger refactorings. This will be rebased after #3609 as there will be some shared changes.

The result is a never ending watch that the user interacts with using a Watcher with an additional method: deletedStateUnknown(String key, String resourceVersion) that conveys that a particular resource has been deleted prior to the relist, so you don't know its full state - more on this below.

This would be constructed off the same context as Informable possibly via new interface (I haven't added that just yet). The key function could always be the uid - note the choice of function affects what shows up in the deletedStateUnknown.

The only other new piece of information needed is a batchSize, which is used to control the pagination size of the batched list fetching.

So this could be as simple as:

client.pods().longWatch(100L /*batch size*/, new LongWatcher() {
  @Override
  public void deletedStateUnknown(String key, String resourceVersion) {
    // not where it needs to be yet
  }

  @Override
  public void eventReceived(Action action, HasMetadata resource) {
    // do something
  }
}); 

A couple of thoughts / alternatives - the handling of deletedStateUnknown could be done like a bookmark. Then rather than passing a key function, we'd assume the use of uid and instead lookup the builder for resource to construct a new instance. Then we'd call deletedStateUnknown(resourceWithUidAndVersion).

Building on that, I think that for delete with unknown state to be useful we need more state - for a typical dependent resource scenario that would at least be the owner references. That could be assumed, or supported via accepting a Unary function for stripping the resource down to only what is needed in the delete case. So you still end up with an in-memory cache, but it will be much smaller than keeping the full objects - likely just uid, resourceVersion, and ownerReferences.

So we can either keep refining this or look to merge some of this into what informers support - at the very least batching.

@attilapiros @manusa @rohanKanojia WDYT?

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@centos-ci
Copy link

Can one of the admins verify this patch?

@shawkins shawkins added the wip label Nov 24, 2021
@shawkins shawkins marked this pull request as draft November 24, 2021 23:29
@shawkins
Copy link
Contributor Author

Thinking about this some more, it seems like this should just be added to informer functionality:

  • bookmarks via fix #3615: always requesting bookmarks #3617
  • support for batch fetching along with inline processing of those events so that we don't have to hold the entire list in memory
  • A function(s) for transforming the saved state to something minimal

I'll update this pr with something that looks more like that.

@sonarcloud
Copy link

sonarcloud bot commented Nov 28, 2021

SonarCloud Quality Gate failed.    Quality Gate failed

Bug B 2 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

1.1% 1.1% Coverage
0.0% 0.0% Duplication

@shawkins
Copy link
Contributor Author

I'll update this pr with something that looks more like that.

I've updated this to be a "targeted" informer. To do that you pass a function to save only the state that is needed to make decisions when things change / are deleted.

To expand on something making the Store/Cache pluggable in general for informers leaves open the interpretation of events / state. The contract of the informer is that it will generate a series of events using its previously observed state as a reference - so you know what changed from one observation to the next. With a plugable cache if a user is allowed to remove entries, either explicitly or via cache expiration, that could change the meaning of future events - a modification becomes an add for example. If a lookup were added to read the current state from the api server it would change the meaning of modification events by referencing the wrong prior state - that is especially problematic for edge triggered systems. That was what was guiding my thinking of trying to come up with something between an informer and a watch and not expose a cache at all and make it clear that it was only for use with level triggered systems. The sticking point there is giving the user control over how much state is needed in the corner case of delete via a relist, it seems clearer to present this like an informer.

@metacosm metacosm self-requested a review November 30, 2021 16:37
@shawkins
Copy link
Contributor Author

Here's an attempt of a summary of constructs:

Events? Prior state? Construct Notes
No None Dsl get  
No Customizable User provided cache - nothing in fabric8  
Yes None Watch After #3617 Watches will rarely fail as they will typically resume at a recent resourceVersion.  If bookmarks are not supported, or you get a 410 http gone for whatever reason, then you have to consider what events you have missed - which is only possible if some state is being tracked.
Yes All Informer In general it’s for edge triggered systems. You may not always need or want all of the prior state: I want to “indefinitely” watch Events.  Or I only need a small amount of prior state to make decisions (level triggering). For these situations we can be more targeted.
Yes Targeted TargetedInformer Specialization for level triggered systems. Like an Informer, but must batch fetch to prevent memory issues.  Will only save the state needed to reason over modifications and deletes where the final state is not known. Would often be paired with a user provided cache. Could be merged into the Informer logic, considerations include: 1. Batch size for fetching.  The user must have a speeding or async handler, or else the continue token could become invalid prior to finishing the fetching 2. A function to determine what to save - which needs some thought as it could impact how we hold the state in-memory, if it’s read-only, and if it fits within the Informer construct (only if it’s held as an instance of the resource) 3. Optionally a customizable key function
Yes All StateStoreInformer Specialization for memory consumption. An Informer, but attempts to keep a managable in-memory footprint.  Will use batch fetching and save state to a (user provided?) store with a limited memory footprint. The store must return the values as saved, or else the meaning of events is not the same. Could be merged into the Informer logic, considerations include: 1. Batch size for fetching.  The user must have a speeding or async handler, or else the continue token could become invalid prior to finishing the fetching 2. The Cache class is the closest to what needs to be pluggable, but that also would require the implementer to take full responsibility over secondary indexing.

So what we're really trying to determine here is:

  • how many modifications are needed in the current informer to support one or both of the specialized constructs
  • or is something new required

@shawkins
Copy link
Contributor Author

To keep the ball moving forward, @metacosm @attilapiros based upon the api module work - here's what the natural Cache interface looks like: https://github.com/fabric8io/kubernetes-client/pull/3654/files#diff-b146dfd2b20edaf10226fb2541645bcf2790a6af52550b975c89c4255e1bbd4b
A pluggable version would have to take responsibility for all of the Cache, Indexer, and Store methods. That seems like a lot of developer burden. More than likely we'd need to create a new interface or creating an abstract base that allowed the pluggable part to focus only on the creation of the item map and the indices map.

Or are you leaning more toward the targeted approach where the memory footprint is less of a concern?

@shawkins
Copy link
Contributor Author

shawkins commented Jan 14, 2022

Looking at things on the go side again, they now have support for pagination built into their informer/reflector: https://github.com/kubernetes/kubernetes/blob/6f896dec4f45ffb82491bc9ce2393e7452886562/staging/src/k8s.io/client-go/tools/cache/reflector.go#L92 - there's a note there on the performance implications.

So I'll move ahead with the change to support batch fetching in the base informer, then we can revisit the other differentiation above.

After re-reading their code, this is only done to reduce the size of the result constructed by the api server - the client still creates a full in-memory list of the result before processing it with the sync logic, so it's not quite the support I was thinking.

@shawkins
Copy link
Contributor Author

After several discussions, the work here to have a pruning function will be combined with anything still relevant from #3479 to allow for some modifications to the cache, but without the expectation that user will provide their own implementation.

@shawkins
Copy link
Contributor Author

shawkins commented Mar 7, 2022

replaced by #3943

@shawkins shawkins closed this Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants